this sounds interesting. would it be too diffucult to implement linear hashing for xindices current hashfiler?
On 7/11/07, Joel Nelson <[EMAIL PROTECTED]> wrote:
True for Xindice's implementation of hashing, but if Xindice supported Linear Hashing, even growing records would be ok. http://en.wikipedia.org/wiki/Linear_hash There were a lot of papers written about this, unfortunately I lost the pdfs I had a while ago. However I think Berkeley DB (the famous embedded DB) implements linear hashing, and you should be able to find more information by googling. On 7/10/07, Natalia Shilenkova <[EMAIL PROTECTED]> wrote: > On 7/10/07, Joel Nelson <[EMAIL PROTECTED]> wrote: > > > it otherwords, I have to check if a graph contains a node n, if it does not > > > write a new node, and if it does retrieve the node. I need to be able to do > > > this very fast. I have tried object oriented dbms, although they worked > > > nicely for graph things, they were very poor at identity search. > > > > > > > Remember what a btree is good at: requiring a minimal number of disk > > accesses to do range queries, e.g. "give me all nodes with a key > > between 54-98" > > > > If you do not need range queries, and you don't need to iterate over > > the nodes in some sorted order, there is no advantage I know of to > > using a b-tree, or any other tree structure. > > > > Instead, it sounds like you already know what exact key value to look > > for (there is no range). In this case, a hash table (and therefore the > > HashFiler) may be a lot faster for you. A hash table can be configured > > so that it almost always finds (or reports that it cannot find) your > > node with 1 disk access, versus a btree which can require 3-5. > > While it is true, hash table is not always a preferred data structure > because as more values are added, it will get collisions more often > and will get slower. I am not sure if it appears anywhere in Xindice > documentation, but general guidelines for selecting between HashFiler > and BTreeFiler are the following: > > 1. If amount of records that are going to be stored can be estimated > and won't change too much, HashFiler is a way to go. In this case > filer can be configured to allocate optimal amount of space for fast > data access. > > 2. If amount of records is unknown or likely to grow or hash function > does not work too well, BTreeFiler will show better performance. > > This applies to any kind of data. > > > Also depending on what you are trying to do with these graphs, there > > are specialized data structures that may work even better than a hash > > table. If you are willing as you said to create your own database, you > > should not be afraid to investigate using your own file format. > > > > Here are some common structures used for storing graphs: > > > > http://en.wikipedia.org/wiki/Adjacency_list > > http://en.wikipedia.org/wiki/Adjacency_matrix > > > > Hope this helps. > > >