On 7/10/07, Joel Nelson <[EMAIL PROTECTED]> wrote:
> it otherwords, I have to check if a graph contains a node n, if it does not
> write a new node, and if it does retrieve the node. I need to be able to do
> this very fast. I have tried object oriented dbms, although they worked
> nicely for graph things, they were very poor at identity search.
>
Remember what a btree is good at: requiring a minimal number of disk
accesses to do range queries, e.g. "give me all nodes with a key
between 54-98"
If you do not need range queries, and you don't need to iterate over
the nodes in some sorted order, there is no advantage I know of to
using a b-tree, or any other tree structure.
Instead, it sounds like you already know what exact key value to look
for (there is no range). In this case, a hash table (and therefore the
HashFiler) may be a lot faster for you. A hash table can be configured
so that it almost always finds (or reports that it cannot find) your
node with 1 disk access, versus a btree which can require 3-5.
While it is true, hash table is not always a preferred data structure
because as more values are added, it will get collisions more often
and will get slower. I am not sure if it appears anywhere in Xindice
documentation, but general guidelines for selecting between HashFiler
and BTreeFiler are the following:
1. If amount of records that are going to be stored can be estimated
and won't change too much, HashFiler is a way to go. In this case
filer can be configured to allocate optimal amount of space for fast
data access.
2. If amount of records is unknown or likely to grow or hash function
does not work too well, BTreeFiler will show better performance.
This applies to any kind of data.
Also depending on what you are trying to do with these graphs, there
are specialized data structures that may work even better than a hash
table. If you are willing as you said to create your own database, you
should not be afraid to investigate using your own file format.
Here are some common structures used for storing graphs:
http://en.wikipedia.org/wiki/Adjacency_list
http://en.wikipedia.org/wiki/Adjacency_matrix
Hope this helps.