>
> I completely agree that we need to break the documents up more, and I
> also agree with your reasons for needing to do so; however, I'm very
> hesitant to go all the way to individual nodes. There are several
> reasons for this. One is that the current BTree is really slow; in
> fact, it's faster to store things using the FSFiler than it is with the
> BTreeFiler. If we start breaking documents up at that granularity,
> performance is really going to go through the floor. This is especially
> true if we store one node per page. A typical node will be only a few
> bytes to a few hundred bytes, but pages are supposed to be the same
> size as the OS page size (usually 4 or 8K). So we're talking about
> having to read large numbers of pages for relatively small documents.
>

This is true; I had overlooked it.

> Anyway, I've been exploring two things related to this. First I've been
> looking hard at page management, and the idea of storing one node per
> page.

This is indeed questionable. The idea of having pages is to allow a storage
manager to read in parts of a datafile as needed, right? I guess we need a
more flexible way of storing tree nodes in pages... Could a page contain
several nodes, with a TOC at the start of each page indicating the offset at
which each node can be found? Pointers to nodes in the paged file would then
be (pageno, index) pairs instead of just page numbers, as they are now.
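Something like a "slotted" page, maybe? A rough Java sketch of what I mean
(the class name, the 16-bit offsets, and the 4K size are all just
assumptions, nothing decided):

    import java.nio.ByteBuffer;

    // A "slotted" page: a TOC of offsets at the front, node data packed
    // contiguously from the back, so the two grow toward each other.
    public class SlottedPage {
        static final int PAGE_SIZE = 4096;

        private final ByteBuffer buf = ByteBuffer.allocate(PAGE_SIZE);
        private int slotCount = 0;        // entries in the TOC
        private int freeEnd = PAGE_SIZE;  // node data grows down from here

        /** Store a node's bytes; returns its slot index within the page. */
        public int addNode(byte[] node) {
            int tocBytes = 2 * (slotCount + 2); // count + offsets, as shorts
            if (freeEnd - node.length < tocBytes)
                throw new IllegalStateException("page full");
            freeEnd -= node.length;
            buf.position(freeEnd);
            buf.put(node);
            buf.putShort(2 + 2 * slotCount, (short) freeEnd); // TOC entry
            slotCount++;
            buf.putShort(0, (short) slotCount);               // entry count
            return slotCount - 1;
        }

        /** Fetch a node by slot index; a node pointer is (pageno, index). */
        public byte[] getNode(int index) {
            int start = buf.getShort(2 + 2 * index) & 0xFFFF;
            int end = (index == 0)
                    ? PAGE_SIZE
                    : buf.getShort(2 + 2 * (index - 1)) & 0xFFFF;
            byte[] node = new byte[end - start];
            buf.position(start);
            buf.get(node);
            return node;
        }
    }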



> For the current BTree this is OK because the way it is
> implemented it will completely fill the pages for index (i.e. internal)
> nodes before splitting, and most data nodes at the leaves are relatively
> large. If we go to smaller leaf node sizes we need to look at storing
> more than one BTree node per page. This is probably a good thing to do
> anyway, since indexes are already storing small leaf nodes. BTW, I also
> think we should be questioning the use of a BTree for actual record
> storage. It makes sense for indexes, but for direct key-based
> retrievals it may not be the best thing.

Actually B-Trees are only useful for very large keysets. The original intent
was to store many, many documents in a collection, in which case a kind of
"name index" is useful. It could also apply, for example, to an element
containing thousands of attributes (although I've yet to come across such an
XML document myself).

Now, whether the B-Tree "document name-index" should be in the datafile with
the XML data, or whether it should be considered an index in its own right,
is also open to discussion...


>
> The second thing I'm exploring in this area, is a different way to
> partition an XML document into pieces. I've been exploring horizontal
> partitioning where all sibling nodes would be stored together in a
> single record. In a lot of ways this is pretty similar to how
> relational databases work, i.e. each set of sibling nodes represents a
> tuple within a relation, and the entire document is made up through
> joins between relations. In our case our relations will only have one
> tuple, but I'm thinking the general concept is the same. Relational
> databases don't store things at the attribute level, they store full
> tuples together as a unit. We'd still use the compression format for
> the sibling level and just extend it so that each node can find the
> proper BTree node with its children in it. Anyway, I've done a fair
> amount of exploratory work on this including some ideas for structural
> and content based indexes, but I need more time before I'm willing to
> say whether it's a good idea or not. So far I do kind of like it.
>

This might be a problem with long, flat XML files, as you also mentioned.
Maybe a compromise, where a fixed maximum number of sibling nodes is stored
in a record and several records are used for longer sets of siblings?
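Something like this, perhaps (a Java sketch only; MAX_SIBLINGS and the
record-chaining idea are pure assumptions on my part):

    import java.util.ArrayList;
    import java.util.List;

    public class SiblingPartitioner {
        static final int MAX_SIBLINGS = 64; // hypothetical cap per record

        /** Split one (possibly huge) sibling list into record-sized chunks. */
        public static List<List<String>> partition(List<String> siblings) {
            List<List<String>> records = new ArrayList<>();
            for (int i = 0; i < siblings.size(); i += MAX_SIBLINGS) {
                int end = Math.min(i + MAX_SIBLINGS, siblings.size());
                // Each chunk becomes one record; overflow records would be
                // chained, e.g. each one storing the key of the next.
                records.add(new ArrayList<>(siblings.subList(i, end)));
            }
            return records;
        }
    }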

> So far I also see two cases where it will break down: one is where you
> have a large flat document, and the other is where you have a deep
> hierarchical document with only one child node per node. In the first
> case you get a really large list of sibling nodes, and in the second you
> get a deep hierarchy that is basically the same as storing all nodes
> individually.
>
If we manage to move away from the one-page-per-record paradigm, having many
nodes might become less of a problem? The advantage of having many nodes is
that indexes can point to them! This makes for more query optimization
possibilities...

>
> Related to this I've been exploring a versioning technique for
> implementing transactions. This is described very briefly in the book
> you mentioned "Fundamentals of Database Systems" and there are a number
> of research papers published on it. This is probably a really good
> technique from our perspective since we want versioning in the core
> anyway.

I know what you're referring to. It's what I was thinking of for transactions
too. Many RDBMSs work this way: it means that one session can still read
(old-version) data while some other session is in the middle of a
transaction. It does mean that now and then transactions WILL fail because of
conflicting updates, and this is another reason why failure recovery is
needed first.
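In code the idea is roughly this (a minimal sketch; all the names are made
up, and a real version chain would live on disk, not in a List):

    import java.util.ArrayList;
    import java.util.List;

    // One node's version chain: readers see the last committed version,
    // writers append a pending one that only becomes visible on commit.
    public class VersionedNode {
        private final List<byte[]> versions = new ArrayList<>();
        private int committed = -1;  // index of the last committed version

        /** A reading session always gets committed data, never pending data. */
        public synchronized byte[] read() {
            return committed < 0 ? null : versions.get(committed);
        }

        /** A writer appends a new version based on the version it last read.
         *  If someone else committed in between, the update conflicts and
         *  the transaction must fail -- hence the need for recovery. */
        public synchronized int write(byte[] value, int basedOnVersion) {
            if (basedOnVersion != committed)
                throw new IllegalStateException("conflicting update");
            versions.add(value);
            return versions.size() - 1;
        }

        /** Committing is just moving the visible-version pointer. */
        public synchronized void commit(int versionIndex) {
            committed = versionIndex;
        }
    }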

> We of course still need to have a recovery log, but I think
> using a versioning technique might allow us to track smaller amounts of
> information in the log, i.e. we could write the actual data values to
> the repository as a new version without updating the version history
> for the node, and then just record the version history change in the log
> before committing.
Absolutely. I hadn't considered this, but I do agree.
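If I read you right, the commit path would look something like this (a
sketch with hypothetical Repository and Log interfaces, nothing from our
actual code):

    public class VersionedCommit {
        interface Repository { long appendVersion(String key, byte[] data); }
        interface Log { void record(String entry); void flush(); }

        private final Repository repo;
        private final Log log;

        VersionedCommit(Repository repo, Log log) {
            this.repo = repo;
            this.log = log;
        }

        public void update(String key, byte[] newValue) {
            // 1. Write the data itself to the repository as a new version;
            //    the node's version history is NOT updated yet, so the new
            //    data is invisible and needs no log record of its own.
            long versionId = repo.appendVersion(key, newValue);
            // 2. Log only the small version-history change and force it out.
            log.record("commit " + key + " -> v" + versionId);
            log.flush();
            // 3. Only after the log record is safe is the version history
            //    updated; recovery replays or discards that one change.
        }
    }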

> Anyway, just wanted to give you some idea of what
> I'm exploring; there's still a ton to learn here. Picked up a book,
> "Recovery Mechanisms in Database Systems", from the library a few days
> ago. Haven't really gotten into it yet, but I'll let you know if it's
> worth reading.
>
> How comfortable are you with the recovery algorithms you mentioned?
>
I've *read* the samples in the book; that's about it :)


> BTW, is the book on database implementation you mentioned worth picking
> up? I've been looking at that book for a long time and have always
> balked at the price. I've been trying to find a copy locally to skim
> it, but neither the UofA library nor any bookstores around here have it.
>
It's pretty standard, actually; I don't suppose you'll find much in it that
other database implementation books don't mention...

James
