Re: btree on disk

Cam Bazz Wed, 11 Jul 2007 23:25:59 -0700

Yeah, I am game. I will at least try.

Best,
-C.B.


On 7/11/07, Vadim Gritsenko <[EMAIL PROTECTED]> wrote:


Cam Bazz wrote:
> this sounds interesting. would it be too diffucult to implement linear
> hashing for xindices current hashfiler?

No reason to implement linear hashing within the current HashFiler, but it
makes
sense to implement separate LinearHashFiler class, and if you start with
HashFiler's code as a starting point, it should not be too hard.

Are you game? ;)

Vadim


> On 7/11/07, *Joel Nelson* < [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>
> wrote:
>
>     True for Xindice's implementation of hashing, but if Xindice
supported
>     Linear Hashing, even growing records would be ok.
>
>     http://en.wikipedia.org/wiki/Linear_hash
>
>     There were a lot of papers written about this, unfortunately I lost
>     the pdfs I had a while ago. However I think Berkeley DB (the famous
>     embedded DB) implements linear hashing, and you should be able to
find
>     more information by googling.
>
>     On 7/10/07, Natalia Shilenkova < [EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>> wrote:
>      > On 7/10/07, Joel Nelson <[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>> wrote:
>      > > > it otherwords, I have to check if a graph contains a node n,
>     if it does not
>      > > > write a new node, and if it does retrieve the node. I need to
>     be able to do
>      > > > this very fast. I have tried object oriented dbms, although
>     they worked
>      > > > nicely for graph things, they were very poor at identity
search.
>      > > >
>      > >
>      > > Remember what a btree is good at: requiring a minimal number of
>     disk
>      > > accesses to do range queries, e.g. "give me all nodes with a
key
>      > > between 54-98"
>      > >
>      > > If you do not need range queries, and you don't need to iterate
>     over
>      > > the nodes in some sorted order, there is no advantage I know of
to
>      > > using a b-tree, or any other tree structure.
>      > >
>      > > Instead, it sounds like you already know what exact key value
>     to look
>      > > for (there is no range). In this case, a hash table (and
>     therefore the
>      > > HashFiler) may be a lot faster for you. A hash table can be
>     configured
>      > > so that it almost always finds (or reports that it cannot find)
>     your
>      > > node with 1 disk access, versus a btree which can require 3-5.
>      >
>      > While it is true, hash table is not always a preferred data
>     structure
>      > because as more values are added, it will get collisions more
often
>      > and will get slower. I am not sure if it appears anywhere in
Xindice
>      > documentation, but general guidelines for selecting between
>     HashFiler
>      > and BTreeFiler are the following:
>      >
>      > 1. If amount of records that are going to be stored can be
estimated
>      > and won't change too much, HashFiler is a way to go. In this case
>      > filer can be configured to allocate optimal amount of space for
fast
>      > data access.
>      >
>      > 2. If amount of records is unknown or likely to grow or hash
function
>      > does not work too well, BTreeFiler will show better performance.
>      >
>      > This applies to any kind of data.
>      >
>      > > Also depending on what you are trying to do with these graphs,
>     there
>      > > are specialized data structures that may work even better than
>     a hash
>      > > table. If you are willing as you said to create your own
>     database, you
>      > > should not be afraid to investigate using your own file format.
>      > >
>      > > Here are some common structures used for storing graphs:
>      > >
>      > > http://en.wikipedia.org/wiki/Adjacency_list
>      > > http://en.wikipedia.org/wiki/Adjacency_matrix
>      > >
>      > > Hope this helps.
>      > >
>      >
>
>

Re: btree on disk

Reply via email to