On 24/05/17 17:56, Dennis Grinwald wrote:
Hi Andy,
thank you for your quick and helpful response.
My understanding now is that every single Node (RDF term) is represented by a
NodeId in TDB. Some NodeIds contain inlined values in their bottom 56 bits and
others an address, which points to a place on disk (node table) where details
of the node are recorded. The top 8 bits represent the type. Does that include
a unique id, which makes it possible for the NodeId to be accessed by TDB?

The 56 bits in the NodeId are the file offset in nodes.dat for the bytes
(variable length) for the node (it's a Turtle-ish string in TDB1, RDF Thrift
in TDB2: some bytes that decode bytes -> Node).
NodeIds are internal to TDB. They are on-disk as 8 bytes - they are created
in-memory by setting those bytes. See the code in NodeId.java.
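To make the 8-bit-type / 56-bit-value layout concrete, here is a rough sketch of packing and unpacking such an id in a Java long. The class and method names are mine for illustration, not Jena's actual NodeId.java API:

```java
// Sketch of the 8-bit type / 56-bit value layout of a TDB NodeId.
// Names are illustrative, not Jena's actual API.
public class NodeIdSketch {
    static final long VALUE_MASK = (1L << 56) - 1;

    // Pack an 8-bit type tag and a 56-bit value/offset into one long.
    static long pack(int type, long value) {
        return ((long) (type & 0xFF) << 56) | (value & VALUE_MASK);
    }

    static int type(long nodeId)   { return (int) (nodeId >>> 56); }
    static long value(long nodeId) { return nodeId & VALUE_MASK; }

    public static void main(String[] args) {
        long id = pack(0x02, 123456789L);   // e.g. an inlined integer
        System.out.println(type(id));       // 2
        System.out.println(value(id));      // 123456789
    }
}
```

For an inlined value the 56 bits are the value itself; otherwise they are the file offset into the node table.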
Furthermore, I would like to know what some of the files mean when a TDB
directory is created:
-"nodes.dat" : Is that the node table, which contains the NodeID and the
belonging RDF term?
Its part of the node table: NodeId->bytes/string for the Node
The other part is node2id.
-What's the purpose of the "node2id.idn/.dat" and why does almost every file
has a .dat and .idn version of it ?
.idn : the B+Tree branch nodes.
.dat : the B+Tree data nodes.
These two have different formats.
-Is it possible to open these files in a readable way? Is it possible to view
those files?
No. They are binary. The layout is in the code.
(jena-cmds has "tdb.tools.dumpbpt" -- it might still work.)
I'm currently doing research on distributing those indexes across several
machines and therefore want to fully understand their purpose and
interactions. If you could recommend some helpful resources, literature etc.,
like the "Clustered TDB: A Clustered Triple Store for Jena" paper, I would be
very thankful. Thank you for your patience!
First step - is distributing the indexes the best way to scale a triple store?
I'm not convinced it is the only choice.
We have Claude looking at using Cassandra for RDF storage.
https://github.com/Claudenw/jena-on-cassandra
We have Dick contributing a
https://github.com/apache/jena/pull/233
https://github.com/dick-twocows/jena
(Dick - sorry - haven't caught up on this yet - the holiday in Provence was
wonderful)
Implementing on Apache Spark makes sense if the business case is for analytic
queries of the big-data style and scale.
See also BlazeGraph which IIRC has some form of distributed B-Tree.
TDB B+Trees are not special, so a Google search for "distributed B-tree" is
useful, e.g. maybe
http://www.vldb.org/pvldb/1/1453922.pdf
https://www.usenix.org/system/files/conference/atc16/atc16_paper-mitchell.pdf
See also 4Store.
See also TDB2 ... github.com/afs/Mantis
TDB2 is a derived TDB where the B+Trees are copy-on-write, which makes the
transaction journal contain only a few bytes for each commit. TDB1 uses a
forward log -
the changed blocks go into the journal, so it can get a bit big. And lots of
other engineering changes.
Regards
Dennis Grinwald
On 24.05.2017 01:53, Andy Seaborne wrote:
On 24/05/17 02:33, Dennis Grinwald wrote:
> Hi there,
Hi Dennis,
> I'm very new to the Jena TDB and have several questions about it's
> Architecture. I would be very thankful if you could answer the following :
All Nodes are given a NodeId, which is a fixed-length 8-byte sequence.
Some Nodes (RDF terms), like integers and dateTimes, hold the value in the
NodeId itself (8 bits of type, 56 bits of value); otherwise the NodeId records
the location in the node table where the details of the node are recorded
(URIs, strings, out-of-range integers...).
== Lookup NodeId -> Node
The NodeId is the location in the node table, so TDB just fetches it from the
file.
There is a large cache as well. This lookup happens on query results, so it is
important.
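Conceptually the fetch is a seek to the NodeId's offset plus a read of the node's bytes. Here is a minimal sketch; the length-prefixed record format is invented for illustration, TDB's actual on-disk encoding differs:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Conceptual sketch of a NodeId -> Node fetch: seek to the offset held in
// the NodeId and read one record. The 4-byte length prefix is invented for
// illustration; TDB's real node-table encoding is different.
public class NodeFetchSketch {
    static byte[] readRecord(RandomAccessFile f, long offset) throws IOException {
        f.seek(offset);
        int len = f.readInt();          // illustrative length prefix
        byte[] bytes = new byte[len];
        f.readFully(bytes);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("nodes", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            byte[] term = "<http://example/s>".getBytes("UTF-8");
            f.writeInt(term.length);    // write one record at offset 0
            f.write(term);
            System.out.println(new String(readRecord(f, 0), "UTF-8"));
        }
    }
}
```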
== Lookup Node -> NodeId
For this, the Node is hashed (MD5 : 128 bits) and the hash is assumed to be
unique. 128 bits is a very large number!
There is then a B+Tree with the node hash as key and the NodeId as value.
This lookup happens when building the query. It's cached, and there is a
"miss" cache as well for nodes not in the database.
== SPO etc
The key is 3 NodeIds - 24 bytes. There is no value part.
Because the B+Tree keeps the keys sorted, you can do a lookup of S?? by
finding the start of all triples with that S (lookup S00), then doing a range
scan forwards until (S+1)00 through the B+Tree leaves. The leaves are chained
horizontally. (This is for TDB1; not true in TDB2 - the leaves are not
linked.)
Similarly, you can look up SP? or SPO.
The B+Trees are about order 200 - they use an 8-Kbyte block.
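The S?? range scan can be sketched with a TreeSet standing in for the sorted index; NodeIds are plain longs here, and the bounds are (S,0,0) inclusive to (S+1,0,0) exclusive:

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of the S?? range scan on a sorted SPO index. Keys are three
// NodeIds; a TreeSet stands in for the B+Tree.
public class SpoScanSketch {
    static final Comparator<long[]> SPO =
        Comparator.<long[]>comparingLong(t -> t[0])
                  .thenComparingLong(t -> t[1])
                  .thenComparingLong(t -> t[2]);

    public static void main(String[] args) {
        NavigableSet<long[]> index = new TreeSet<>(SPO);
        index.add(new long[]{1, 10, 100});
        index.add(new long[]{1, 11, 101});
        index.add(new long[]{2, 10, 100});

        long s = 1;  // all triples with subject NodeId 1:
        SortedSet<long[]> matches =
            index.subSet(new long[]{s, 0, 0}, new long[]{s + 1, 0, 0});
        System.out.println(matches.size());   // 2
    }
}
```

The same trick with the leading key components gives SP? and SPO lookups, and the POS and OSP indexes give the other access patterns.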
1. In "Clustered TDB: A Clustered Triple Store for Jena" published by Mr.
Seaborn, Owens, Gibbins and Schraefel, it says that the Indexes: SPO, POS, OSP are
B+-Trees. My question is: How are the internal nodes (keys) of this B+-Tree
represented and what are the values/value pointers in the leaves of the B+-Tree?
That was using a much older version of TDB, but the design is roughly the
same.
2. On "http://jena.apache.org/documentation/tdb/architecture.html" it says, that the Node
to NodeID mapping is realized by a B+-Tree. My question as in "1."
: How are the internal nodes (keys) of this B+-Tree represented and what are
the values/value pointers in the leaves of the B+-Tree?
That lookup does not need the sorted property of B+Trees, just a key->value
lookup such as an external hash index. But the cache is so important anyway
that it was simpler to have one data structure.
It is quite important in TDB that it uses its own implementation of B+Trees
because they are efficient (in space and time) for the TDB case. A
general-purpose B+Tree storage package does a lot of things that TDB does not
need (e.g. reading and writing the bytes of keys without copies, for safety).
Transactions need to be coordinated across multiple B+Trees, so a
transactional B+Tree would not be much help unless it had a way to integrate
into a larger transaction system.
Normally, the B+Trees are memory-mapped into the JVM, which gives the
caching, using, in effect, the OS disk cache.
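As a sketch of that memory-mapping (the 8K block size is from the description above; the file name and block contents here are invented):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch of memory-mapping an index file so the OS page cache does the
// caching, as TDB does for its B+Tree blocks. File name and data invented.
public class MmapSketch {
    static final int BLOCK_SIZE = 8 * 1024;   // TDB uses 8-Kbyte blocks

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("spo", ".idn");
        tmp.deleteOnExit();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            f.setLength(4 * BLOCK_SIZE);      // room for four blocks
            MappedByteBuffer map = f.getChannel()
                .map(FileChannel.MapMode.READ_WRITE, 0, f.length());
            map.putLong(2 * BLOCK_SIZE, 0xCAFEL);   // write into block 2
            System.out.println(map.getLong(2 * BLOCK_SIZE) == 0xCAFEL);  // true
        }
    }
}
```

Reads and writes go through the mapped buffer, so frequently used blocks stay hot in the OS page cache without a separate JVM-level block cache.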
Hope this description helps,
Andy
Thank you for your time.
Best regards
Dennis Grinwald