Re: B+-Tree Indexing

Andy Seaborne Thu, 25 May 2017 08:37:06 -0700


On 25/05/17 15:40, Dennis Grinwald wrote:

Thank you so much for your help, my understanding for TDB got a lot better!

One question left : In SPO etc. are the keys in the leaf blocks sorted by growing order of NodeIDs ? So first sorted by Subject NodeIDs, 
then Predicate and lastly Object NodeIDs (for SPO) . In this case one leaf block of branching factor 4 would look like this : | (2, 1, 4) , 
(2, 3, 4) , (5, 1, 6) | , where the NodeIDs: "2" and "5" represent RDF Subjects, "1","3" 
Predicates, "4" and "6" RDF Objects (great simplification of NodeIDs in this example).

Yes - really, the key is byte array made from the concatenation of SPOids into a 24 byte value. The trees don't know it is used as 3 separateparts.


    Andy



Thank you for your help!


Dennis Grinwald

________________________________
From: Andy Seaborne <[email protected]>
Sent: Thursday, May 25, 2017 6:42:22 AM
To: [email protected]
Subject: Re: B+-Tree Indexing



On 24/05/17 17:56, Dennis Grinwald wrote:

Hi Andy,


thank you for your quick and helpful response.

My understanding now is, that every single Node(RDF term) is represented
by a NodeID in TDB. Some NodeIDs contain inlined values in their bottom
56 bits and others an address, which points to an 56 bit sequence on
disk (Node Table),


The 56 bits in the NodeId is the file offset in nodes.dat for the bytes
(variable length) for the node (it's a Turtle-ish string in TDB1, RDF
Thrift in TDB2 : some bytes that go bytes -> Node).

where details of the node are recorded. The top 8
bits represent the type. Does that include an unique id, which makes it
possible for the NodeID to be accessed by TDB?


NodeIds are internal to TDB.  They are on-disk as 8 bytes - they are
created in-memeory by setting those bytes.  See the code for NodeId.java.

Furthermore I would like to know, what some indexes, when creating a TDB
directory mean :

-"nodes.dat" : Is that the node table, which contains the NodeID and the
belonging RDF term?


Its part of the node table: NodeId->bytes/string for the Node
The other part is node2id.


-What's the purpose of the "node2id.idn/.dat" and why does almost every
file has a .dat and .idn version of it ?


.idn : the B+Tree branch nodes.
.dat : the B+Tree data nodes.

These two have different formats.

-Is it possbile to open these files in a readable way ? Is it possbile
to view those files?


No. They are binary.  The layout in in the code.

(jena-cmds has "tdb.tools.dumpbpt" -- it might still work.)

I'm currently doing research on distributing those indexes on several
machines and therefore want to fully understand their purpose and
interactions. If you could therefore recommend me some helpful
resources, literature etc. like the "Clustered TDB: A Clustered Triple
Store for Jena"-paper I would be very thankful. Thank you for your
patience!


First step - is distributing the indexes the best way to scale a triple
store?

I'm not convinced it is the only choice.

We have Claude looking at using Cassandra for RDF storage.
https://github.com/Claudenw/jena-on-cassandra

We have Dick contributing a
https://github.com/apache/jena/pull/233
https://github.com/dick-twocows/jena

(Dick - sorry - haven't caught up on this yet - the holiday in Provance
was wonderful)

Implementing on Apache Spark if the busienss case is for analytic
queries of the bug data style and scale.

See also BlazeGraph which IIRC has some form of distributed B-Tree.

TDB B+Trees are not special so a google search for "distributed B-tree"
is useful e.g. maybe
http://www.vldb.org/pvldb/1/1453922.pdf
https://www.usenix.org/system/files/conference/atc16/atc16_paper-mitchell.pdf

See also 4Store.

See also TDB2 ... github.com/afs/Mantis

TDB2 is a derived TDB where the B+Trees are copy-on-write which makes
the transaction journal only contain a few byutes for each commit. TDB1
is a forward log - the changed blocks go into the journal so it can get
a bit big.  And lots of other engineering changes.

Regards

Dennis Grinwald

On 24.05.2017 01:53, Andy Seaborne wrote:
On 24/05/17 02:33, Dennis Grinwald wrote:
  > Hi there,

Hi Dennis,

  > I'm very new to the Jena TDB and have several questions about it's
  > Architecture. I would be very thankful if you could answer the
following :

All Nodes get given a NodeId which is a fixed length 8 byte sequence.

Some Nodes (RDF Terms) like integers, dateTimes, hold the value in the
NodeId itself (8 bits types, 56 bits of value) otherwise the NodeId
records the location in the node table for the details of the node are
recorded (URIs, strings, out of range integers...).

== Lookup NodeId -> Node

The NodeId is the location in the node table so TDB just fetechs it from
the file.

There is a large cache as well.  This lookup happens on query results so
it is important.

== Lookup Node -> NodeId

For this, the Node is hashed (MD5 : 128 bits) and the hash assumed to be
unique.  128 bits is a very large number!

There is then a B+Tree with key node hash and value NodeId.

This lookup happens when building the query. It's cached and there is a
a "miss" cache as well for nodes not in the database.

== SPO etc

The key is 3 NodeIds - 24 bytes.  There is no value part.

Because the B+Tree has the keys sorted, you can do lookup of S?? to find
the start of all triples with that S (lookup S00), then a range scan
forwards until (S+1)00 through the B+Tree leaves.  The leaves are
chained horizontally. (This is for TDB1 ; not true in TDB2 - the leaves
are not linked.)

Similarly, you can lookup SP?, or SPO.

The B+Trees are about order 200 - they use a 8Kbyte block.


1. In "Clustered TDB: A Clustered Triple Store for Jena" published by
Mr. Seaborn, Owens, Gibbins and Schraefel, it says that the Indexes:
SPO, POS, OSP are B+-Trees. My question is: How are the internal nodes
(keys) of this B+-Tree represented and what are the values/value
pointers in the leaves of the B+-Tree?


That was using a much older version of TDB but the design is the roughly
the same.

2. On "http://jena.apache.org/documentation/tdb/architecture.html"; it
says, that the Node to NodeID mapping is realized by a B+-Tree. My
question as in "1." : How are the internal nodes (keys) of this
B+-Tree represented and what are the values/value pointers in the
leaves of the B+-Tree?


That lookup does not need the sorted property of B+Trees, just a a
key->value lookup such as a external hash index.  But the cache is so
important anyway, it was simpler to have one datastructure.

It is quite important in TDB that it uses it's own implementation of
B+Trees because they are efficient (in space and time) for the TDB
case.  A general purpose B+Tree storage package is doing a lot of things
that TDB does not need (e.g reading and writing the bytes of keys
without copies for safety).  Transactions need to be coordinated across
multiple B+Trees so a transactional B+Tree would not be much help unless
it had a way to integrate into a larger transaction system.

Normally, the B+trees are memory-mapped into the JVM which gives the
caching, using in effect, the OS disk cache.

Hope this description helps,

      Andy



Thank you for your time.


Best regards

Dennis Grinwald

Re: B+-Tree Indexing

Reply via email to