On 24/05/17 17:56, Dennis Grinwald wrote:
Hi Andy,
thank you for your quick and helpful response.
My understanding now is that every single Node (RDF term) is represented by a
NodeId in TDB. Some NodeIds contain inlined values in their bottom 56 bits and
others an address, which points to a place on disk (node table) where details
of the node are recorded. The top 8 bits represent the type. Does that include
a unique id, which makes it possible for the NodeId to be accessed by TDB?

The 56 bits in the NodeId are the file offset in nodes.dat for the bytes
(variable length) for the node (it's a Turtle-ish string in TDB1, RDF Thrift
in TDB2: some bytes that decode bytes -> Node).
NodeIds are internal to TDB. They are on-disk as 8 bytes - they are created
in-memory by setting those bytes. See the code in NodeId.java.
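To make the 8-bit-type / 56-bit-value layout concrete, here is a rough sketch of packing and unpacking such an id in a Java long. The class and method names are mine for illustration, not Jena's actual NodeId.java API:

```java
// Sketch of the 8-bit type / 56-bit value layout of a TDB NodeId.
// Names are illustrative, not Jena's actual API.
public class NodeIdSketch {
    static final long VALUE_MASK = (1L << 56) - 1;

    // Pack an 8-bit type tag and a 56-bit value/offset into one long.
    static long pack(int type, long value) {
        return ((long) (type & 0xFF) << 56) | (value & VALUE_MASK);
    }

    static int type(long nodeId)   { return (int) (nodeId >>> 56); }
    static long value(long nodeId) { return nodeId & VALUE_MASK; }

    public static void main(String[] args) {
        long id = pack(0x02, 123456789L);   // e.g. an inlined integer
        System.out.println(type(id));       // 2
        System.out.println(value(id));      // 123456789
    }
}
```

For an inlined value the 56 bits are the value itself; otherwise they are the file offset into the node table.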
Furthermore, I would like to know what some of the files mean when a TDB
directory is created:
-"nodes.dat" : Is that the node table, which contains the NodeID and the
belonging RDF term?
Its part of the node table: NodeId->bytes/string for the Node
The other part is node2id.
-What's the purpose of the "node2id.idn/.dat" and why does almost every file
has a .dat and .idn version of it ?
.idn : the B+Tree branch nodes.
.dat : the B+Tree data nodes.
These two have different formats.
-Is it possible to open these files in a readable way? Is it possible to view
those files?
No. They are binary. The layout is in the code.
(jena-cmds has "tdb.tools.dumpbpt" -- it might still work.)
I'm currently doing research on distributing those indexes across several
machines and therefore want to fully understand their purpose and
interactions. If you could recommend some helpful resources, literature etc.,
like the "Clustered TDB: A Clustered Triple Store for Jena" paper, I would be
very thankful. Thank you for your patience!
First step - is distributing the indexes the best way to scale a triple store?
I'm not convinced it is the only choice.
We have Claude looking at using Cassandra for RDF storage.
https://github.com/Claudenw/jena-on-cassandra
We have Dick contributing a
https://github.com/apache/jena/pull/233
https://github.com/dick-twocows/jena
(Dick - sorry - haven't caught up on this yet - the holiday in Provence was
wonderful)
Implementing on Apache Spark makes sense if the business case is for analytic
queries of the big-data style and scale.
See also BlazeGraph which IIRC has some form of distributed B-Tree.
TDB B+Trees are not special, so a Google search for "distributed B-tree" is
useful, e.g. maybe
http://www.vldb.org/pvldb/1/1453922.pdf
https://www.usenix.org/system/files/conference/atc16/atc16_paper-mitchell.pdf
See also 4Store.
See also TDB2 ... github.com/afs/Mantis
TDB2 is a derived TDB where the B+Trees are copy-on-write, which makes the
transaction journal contain only a few bytes for each commit. TDB1 uses a
forward log -
the changed blocks go into the journal, so it can get a bit big. And lots of
other engineering changes.
Regards
Dennis Grinwald
On 24.05.2017 01:53, Andy Seaborne wrote:
On 24/05/17 02:33, Dennis Grinwald wrote:
> Hi there,
Hi Dennis,
> I'm very new to the Jena TDB and have several questions about it's
> Architecture. I would be very thankful if you could answer the following :
All Nodes are given a NodeId, which is a fixed-length 8-byte sequence.
Some Nodes (RDF terms), like integers and dateTimes, hold the value in the
NodeId itself (8 bits of type, 56 bits of value); otherwise the NodeId records
the location in the node table where the details of the node are recorded
(URIs, strings, out-of-range integers...).
== Lookup NodeId -> Node
The NodeId is the location in the node table, so TDB just fetches it from the
file.
There is a large cache as well. This lookup happens on query results, so it is
important.
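Conceptually the fetch is a seek to the NodeId's offset plus a read of the node's bytes. Here is a minimal sketch; the length-prefixed record format is invented for illustration, TDB's actual on-disk encoding differs:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Conceptual sketch of a NodeId -> Node fetch: seek to the offset held in
// the NodeId and read one record. The 4-byte length prefix is invented for
// illustration; TDB's real node-table encoding is different.
public class NodeFetchSketch {
    static byte[] readRecord(RandomAccessFile f, long offset) throws IOException {
        f.seek(offset);
        int len = f.readInt();          // illustrative length prefix
        byte[] bytes = new byte[len];
        f.readFully(bytes);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("nodes", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            byte[] term = "<http://example/s>".getBytes("UTF-8");
            f.writeInt(term.length);    // write one record at offset 0
            f.write(term);
            System.out.println(new String(readRecord(f, 0), "UTF-8"));
        }
    }
}
```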
== Lookup Node -> NodeId
For this, the Node is hashed (MD5 : 128 bits) and the hash is assumed to be
unique. 128 bits is a very large number!
There is then a B+Tree with the node hash as key and the NodeId as value.
This lookup happens when building the query. It's cached, and there is a
"miss" cache as well for nodes not in the database.
== SPO etc
The key is 3 NodeIds - 24 bytes. There is no value part.
Because the B+Tree keeps the keys sorted, you can do a lookup of S?? by
finding the start of all triples with that S (lookup S00), then doing a range
scan forwards until (S+1)00 through the B+Tree leaves. The leaves are chained
horizontally. (This is for TDB1; not true in TDB2 - the leaves are not
linked.)
Similarly, you can look up SP? or SPO.
The B+Trees are about order 200 - they use an 8-Kbyte block.
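The S?? range scan can be sketched with a TreeSet standing in for the sorted index; NodeIds are plain longs here, and the bounds are (S,0,0) inclusive to (S+1,0,0) exclusive:

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of the S?? range scan on a sorted SPO index. Keys are three
// NodeIds; a TreeSet stands in for the B+Tree.
public class SpoScanSketch {
    static final Comparator<long[]> SPO =
        Comparator.<long[]>comparingLong(t -> t[0])
                  .thenComparingLong(t -> t[1])
                  .thenComparingLong(t -> t[2]);

    public static void main(String[] args) {
        NavigableSet<long[]> index = new TreeSet<>(SPO);
        index.add(new long[]{1, 10, 100});
        index.add(new long[]{1, 11, 101});
        index.add(new long[]{2, 10, 100});

        long s = 1;  // all triples with subject NodeId 1:
        SortedSet<long[]> matches =
            index.subSet(new long[]{s, 0, 0}, new long[]{s + 1, 0, 0});
        System.out.println(matches.size());   // 2
    }
}
```

The same trick with the leading key components gives SP? and SPO lookups, and the POS and OSP indexes give the other access patterns.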
1. In "Clustered TDB: A Clustered Triple Store for Jena" published by Mr.
Seaborn, Owens, Gibbins and Schraefel, it says that the Indexes: SPO, POS, OSP are
B+-Trees. My question is: How are the internal nodes (keys) of this B+-Tree
represented and what are the values/value pointers in the leaves of the B+-Tree?
That was using a much older version of TDB, but the design is roughly the
same.
2. On "http://jena.apache.org/documentation/tdb/architecture.html" it says, that the Node
to NodeID mapping is realized by a B+-Tree. My question as in "1."
: How are the internal nodes (keys) of this B+-Tree represented and what are
the values/value pointers in the leaves of the B+-Tree?
That lookup does not need the sorted property of B+Trees, just a key->value
lookup such as an external hash index. But the cache is so important anyway
that it was simpler to have one data structure.
It is quite important in TDB that it uses its own implementation of B+Trees
because they are efficient (in space and time) for the TDB case. A
general-purpose B+Tree storage package does a lot of things that TDB does not
need (e.g. reading and writing the bytes of keys without copies, for safety).
Transactions need to be coordinated across multiple B+Trees, so a
transactional B+Tree would not be much help unless it had a way to integrate
into a larger transaction system.
Normally, the B+Trees are memory-mapped into the JVM, which gives the
caching, using, in effect, the OS disk cache.
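As a sketch of that memory-mapping (the 8K block size is from the description above; the file name and block contents here are invented):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch of memory-mapping an index file so the OS page cache does the
// caching, as TDB does for its B+Tree blocks. File name and data invented.
public class MmapSketch {
    static final int BLOCK_SIZE = 8 * 1024;   // TDB uses 8-Kbyte blocks

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("spo", ".idn");
        tmp.deleteOnExit();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            f.setLength(4 * BLOCK_SIZE);      // room for four blocks
            MappedByteBuffer map = f.getChannel()
                .map(FileChannel.MapMode.READ_WRITE, 0, f.length());
            map.putLong(2 * BLOCK_SIZE, 0xCAFEL);   // write into block 2
            System.out.println(map.getLong(2 * BLOCK_SIZE) == 0xCAFEL);  // true
        }
    }
}
```

Reads and writes go through the mapped buffer, so frequently used blocks stay hot in the OS page cache without a separate JVM-level block cache.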
Hope this description helps,
Andy
Thank you for your time.
Best regards
Dennis Grinwald