Re: B+-Tree Indexing

Dennis Grinwald Thu, 25 May 2017 07:40:50 -0700

Thank you so much for your help, my understanding for TDB got a lot better!


One question left : In SPO etc. are the keys in the leaf blocks sorted by 
growing order of NodeIDs ? So first sorted by Subject NodeIDs, then Predicate 
and lastly Object NodeIDs (for SPO) . In this case one leaf block of branching 
factor 4 would look like this : | (2, 1, 4) , (2, 3, 4) , (5, 1, 6) | , where 
the NodeIDs: "2" and "5" represent RDF Subjects, "1","3" Predicates, "4" and 
"6" RDF Objects (great simplification of NodeIDs in this example).


Thank you for your help!


Dennis Grinwald

________________________________
From: Andy Seaborne <[email protected]>
Sent: Thursday, May 25, 2017 6:42:22 AM
To: [email protected]
Subject: Re: B+-Tree Indexing



On 24/05/17 17:56, Dennis Grinwald wrote:
> Hi Andy,
>
>
> thank you for your quick and helpful response.
>
> My understanding now is, that every single Node(RDF term) is represented
> by a NodeID in TDB. Some NodeIDs contain inlined values in their bottom
> 56 bits and others an address, which points to an 56 bit sequence on
> disk (Node Table),

The 56 bits in the NodeId is the file offset in nodes.dat for the bytes
(variable length) for the node (it's a Turtle-ish string in TDB1, RDF
Thrift in TDB2 : some bytes that go bytes -> Node).

> where details of the node are recorded. The top 8
> bits represent the type. Does that include an unique id, which makes it
> possible for the NodeID to be accessed by TDB?

NodeIds are internal to TDB.  They are on-disk as 8 bytes - they are
created in-memeory by setting those bytes.  See the code for NodeId.java.

> Furthermore I would like to know, what some indexes, when creating a TDB
> directory mean :
>
> -"nodes.dat" : Is that the node table, which contains the NodeID and the
> belonging RDF term?

Its part of the node table: NodeId->bytes/string for the Node
The other part is node2id.

>
> -What's the purpose of the "node2id.idn/.dat" and why does almost every
> file has a .dat and .idn version of it ?

.idn : the B+Tree branch nodes.
.dat : the B+Tree data nodes.

These two have different formats.

> -Is it possbile to open these files in a readable way ? Is it possbile
> to view those files?

No. They are binary.  The layout in in the code.

(jena-cmds has "tdb.tools.dumpbpt" -- it might still work.)


> I'm currently doing research on distributing those indexes on several
> machines and therefore want to fully understand their purpose and
> interactions. If you could therefore recommend me some helpful
> resources, literature etc. like the "Clustered TDB: A Clustered Triple
> Store for Jena"-paper I would be very thankful. Thank you for your
> patience!

First step - is distributing the indexes the best way to scale a triple
store?

I'm not convinced it is the only choice.

We have Claude looking at using Cassandra for RDF storage.
https://github.com/Claudenw/jena-on-cassandra

We have Dick contributing a
https://github.com/apache/jena/pull/233
https://github.com/dick-twocows/jena

(Dick - sorry - haven't caught up on this yet - the holiday in Provance
was wonderful)

Implementing on Apache Spark if the busienss case is for analytic
queries of the bug data style and scale.

See also BlazeGraph which IIRC has some form of distributed B-Tree.

TDB B+Trees are not special so a google search for "distributed B-tree"
is useful e.g. maybe
http://www.vldb.org/pvldb/1/1453922.pdf
https://www.usenix.org/system/files/conference/atc16/atc16_paper-mitchell.pdf

See also 4Store.

See also TDB2 ... github.com/afs/Mantis

TDB2 is a derived TDB where the B+Trees are copy-on-write which makes
the transaction journal only contain a few byutes for each commit. TDB1
is a forward log - the changed blocks go into the journal so it can get
a bit big.  And lots of other engineering changes.

>
>
> Regards
>
> Dennis Grinwald
>
>
>
>
>
> On 24.05.2017 01:53, Andy Seaborne wrote:
> On 24/05/17 02:33, Dennis Grinwald wrote:
>  > Hi there,
>
> Hi Dennis,
>
>  > I'm very new to the Jena TDB and have several questions about it's
>  > Architecture. I would be very thankful if you could answer the
> following :
>
> All Nodes get given a NodeId which is a fixed length 8 byte sequence.
>
> Some Nodes (RDF Terms) like integers, dateTimes, hold the value in the
> NodeId itself (8 bits types, 56 bits of value) otherwise the NodeId
> records the location in the node table for the details of the node are
> recorded (URIs, strings, out of range integers...).
>
> == Lookup NodeId -> Node
>
> The NodeId is the location in the node table so TDB just fetechs it from
> the file.
>
> There is a large cache as well.  This lookup happens on query results so
> it is important.
>
> == Lookup Node -> NodeId
>
> For this, the Node is hashed (MD5 : 128 bits) and the hash assumed to be
> unique.  128 bits is a very large number!
>
> There is then a B+Tree with key node hash and value NodeId.
>
> This lookup happens when building the query. It's cached and there is a
> a "miss" cache as well for nodes not in the database.
>
> == SPO etc
>
> The key is 3 NodeIds - 24 bytes.  There is no value part.
>
> Because the B+Tree has the keys sorted, you can do lookup of S?? to find
> the start of all triples with that S (lookup S00), then a range scan
> forwards until (S+1)00 through the B+Tree leaves.  The leaves are
> chained horizontally. (This is for TDB1 ; not true in TDB2 - the leaves
> are not linked.)
>
> Similarly, you can lookup SP?, or SPO.
>
> The B+Trees are about order 200 - they use a 8Kbyte block.
>
>>
>> 1. In "Clustered TDB: A Clustered Triple Store for Jena" published by
>> Mr. Seaborn, Owens, Gibbins and Schraefel, it says that the Indexes:
>> SPO, POS, OSP are B+-Trees. My question is: How are the internal nodes
>> (keys) of this B+-Tree represented and what are the values/value
>> pointers in the leaves of the B+-Tree?
>
> That was using a much older version of TDB but the design is the roughly
> the same.
>
>> 2. On "http://jena.apache.org/documentation/tdb/architecture.html"; it
>> says, that the Node to NodeID mapping is realized by a B+-Tree. My
>> question as in "1." : How are the internal nodes (keys) of this
>> B+-Tree represented and what are the values/value pointers in the
>> leaves of the B+-Tree?
>
> That lookup does not need the sorted property of B+Trees, just a a
> key->value lookup such as a external hash index.  But the cache is so
> important anyway, it was simpler to have one datastructure.
>
> It is quite important in TDB that it uses it's own implementation of
> B+Trees because they are efficient (in space and time) for the TDB
> case.  A general purpose B+Tree storage package is doing a lot of things
> that TDB does not need (e.g reading and writing the bytes of keys
> without copies for safety).  Transactions need to be coordinated across
> multiple B+Trees so a transactional B+Tree would not be much help unless
> it had a way to integrate into a larger transaction system.
>
> Normally, the B+trees are memory-mapped into the JVM which gives the
> caching, using in effect, the OS disk cache.
>
> Hope this description helps,
>
>      Andy
>
>
>
> Thank you for your time.
>
>
> Best regards
>
> Dennis Grinwald
>
>

Re: B+-Tree Indexing

Reply via email to