Re: btree on disk

Vadim Gritsenko Wed, 11 Jul 2007 04:33:56 -0700

Cam Bazz wrote:

this sounds interesting. would it be too diffucult to implement linearhashing for xindices current hashfiler?

No reason to implement linear hashing within the current HashFiler, but it makessense to implement separate LinearHashFiler class, and if you start withHashFiler's code as a starting point, it should not be too hard.


Are you game? ;)

Vadim

On 7/11/07, *Joel Nelson* < [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>wrote:


    True for Xindice's implementation of hashing, but if Xindice supported
    Linear Hashing, even growing records would be ok.

    http://en.wikipedia.org/wiki/Linear_hash

    There were a lot of papers written about this, unfortunately I lost
    the pdfs I had a while ago. However I think Berkeley DB (the famous
    embedded DB) implements linear hashing, and you should be able to find
    more information by googling.

    On 7/10/07, Natalia Shilenkova < [EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>> wrote:
     > On 7/10/07, Joel Nelson <[EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>> wrote:
     > > > it otherwords, I have to check if a graph contains a node n,
    if it does not
     > > > write a new node, and if it does retrieve the node. I need to
    be able to do
     > > > this very fast. I have tried object oriented dbms, although
    they worked
     > > > nicely for graph things, they were very poor at identity search.
     > > >
     > >
     > > Remember what a btree is good at: requiring a minimal number of
    disk
     > > accesses to do range queries, e.g. "give me all nodes with a key
     > > between 54-98"
     > >
     > > If you do not need range queries, and you don't need to iterate
    over
     > > the nodes in some sorted order, there is no advantage I know of to
     > > using a b-tree, or any other tree structure.
     > >
     > > Instead, it sounds like you already know what exact key value
    to look
     > > for (there is no range). In this case, a hash table (and
    therefore the
     > > HashFiler) may be a lot faster for you. A hash table can be
    configured
     > > so that it almost always finds (or reports that it cannot find)
    your
     > > node with 1 disk access, versus a btree which can require 3-5.
     >
     > While it is true, hash table is not always a preferred data
    structure
     > because as more values are added, it will get collisions more often
     > and will get slower. I am not sure if it appears anywhere in Xindice
     > documentation, but general guidelines for selecting between
    HashFiler
     > and BTreeFiler are the following:
     >
     > 1. If amount of records that are going to be stored can be estimated
     > and won't change too much, HashFiler is a way to go. In this case
     > filer can be configured to allocate optimal amount of space for fast
     > data access.
     >
     > 2. If amount of records is unknown or likely to grow or hash function
     > does not work too well, BTreeFiler will show better performance.
     >
     > This applies to any kind of data.
     >
     > > Also depending on what you are trying to do with these graphs,
    there
     > > are specialized data structures that may work even better than
    a hash
     > > table. If you are willing as you said to create your own
    database, you
     > > should not be afraid to investigate using your own file format.
     > >
     > > Here are some common structures used for storing graphs:
     > >
     > > http://en.wikipedia.org/wiki/Adjacency_list
     > > http://en.wikipedia.org/wiki/Adjacency_matrix
     > >
     > > Hope this helps.
     > >
     >

Re: btree on disk

Reply via email to