Re: btree on disk

Cam Bazz Tue, 10 Jul 2007 23:56:27 -0700

this sounds interesting. would it be too diffucult to implement linear
hashing for xindices current hashfiler?


On 7/11/07, Joel Nelson <[EMAIL PROTECTED]> wrote:


True for Xindice's implementation of hashing, but if Xindice supported
Linear Hashing, even growing records would be ok.

http://en.wikipedia.org/wiki/Linear_hash

There were a lot of papers written about this, unfortunately I lost
the pdfs I had a while ago. However I think Berkeley DB (the famous
embedded DB) implements linear hashing, and you should be able to find
more information by googling.

On 7/10/07, Natalia Shilenkova <[EMAIL PROTECTED]> wrote:
> On 7/10/07, Joel Nelson <[EMAIL PROTECTED]> wrote:
> > > it otherwords, I have to check if a graph contains a node n, if it
does not
> > > write a new node, and if it does retrieve the node. I need to be
able to do
> > > this very fast. I have tried object oriented dbms, although they
worked
> > > nicely for graph things, they were very poor at identity search.
> > >
> >
> > Remember what a btree is good at: requiring a minimal number of disk
> > accesses to do range queries, e.g. "give me all nodes with a key
> > between 54-98"
> >
> > If you do not need range queries, and you don't need to iterate over
> > the nodes in some sorted order, there is no advantage I know of to
> > using a b-tree, or any other tree structure.
> >
> > Instead, it sounds like you already know what exact key value to look
> > for (there is no range). In this case, a hash table (and therefore the
> > HashFiler) may be a lot faster for you. A hash table can be configured
> > so that it almost always finds (or reports that it cannot find) your
> > node with 1 disk access, versus a btree which can require 3-5.
>
> While it is true, hash table is not always a preferred data structure
> because as more values are added, it will get collisions more often
> and will get slower. I am not sure if it appears anywhere in Xindice
> documentation, but general guidelines for selecting between HashFiler
> and BTreeFiler are the following:
>
> 1. If amount of records that are going to be stored can be estimated
> and won't change too much, HashFiler is a way to go. In this case
> filer can be configured to allocate optimal amount of space for fast
> data access.
>
> 2. If amount of records is unknown or likely to grow or hash function
> does not work too well, BTreeFiler will show better performance.
>
> This applies to any kind of data.
>
> > Also depending on what you are trying to do with these graphs, there
> > are specialized data structures that may work even better than a hash
> > table. If you are willing as you said to create your own database, you
> > should not be afraid to investigate using your own file format.
> >
> > Here are some common structures used for storing graphs:
> >
> > http://en.wikipedia.org/wiki/Adjacency_list
> > http://en.wikipedia.org/wiki/Adjacency_matrix
> >
> > Hope this helps.
> >
>

Re: btree on disk

Reply via email to