On 29/07/13 16:33, Rose Beck wrote:
> Hi
> I read the classic paper "Efficient RDF Storage and Retrieval in Jena2".
> However, from the paper I am unable to understand how GSPO, GOSP, etc
> (employed within Jena TDB) indexes are stored.
> Can you please give me pointers from where I can understand more about
> these indexes in Jena TDB. I'll be highly thankful to you for the same.
http://www.w3.org/TR/sparql11-query/#rdfDataset
The paper you reference describes storing one single graph using
relational technology.
TDB supports "RDF datasets", each a collection of graphs (see link). TDB
does not use an SQL database (see SDB for that). It stores triples (3
slots) and quads (4 slots). There is a node table, and all RDF terms are
represented by a NodeId (8 bytes). Some literals are stored inline within
the 8 bytes; other literals, URIs and bNodes are stored in the node table,
a key-value table.
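
To make the NodeId idea concrete, here is a rough sketch in Python. It is
not TDB's code or its real bit layout: the class name, the marker bit and
the choice to inline only small non-negative integers are all assumptions
made for illustration.

```python
import struct

# Hypothetical marker bit meaning "the value lives inside the id itself".
# TDB's real encoding differs; this only illustrates the idea.
INLINE_FLAG = 1 << 63


class NodeTable:
    """Toy key-value node table mapping RDF terms to 8-byte ids."""

    def __init__(self):
        self._term_to_id = {}
        self._id_to_term = []

    def encode(self, term):
        # Small non-negative integers are packed into the id, so they
        # never touch the table.
        if isinstance(term, int) and 0 <= term < INLINE_FLAG:
            return INLINE_FLAG | term
        # Everything else (URIs, bNodes, other literals) gets a table entry.
        if term not in self._term_to_id:
            self._term_to_id[term] = len(self._id_to_term)
            self._id_to_term.append(term)
        return self._term_to_id[term]

    def decode(self, node_id):
        if node_id & INLINE_FLAG:
            return node_id & (INLINE_FLAG - 1)
        return self._id_to_term[node_id]


def to_bytes(node_id):
    """Serialise an id as 8 big-endian bytes."""
    return struct.pack(">Q", node_id)
```

The point is that the indexes only ever see fixed-width ids; the node table
is consulted when a term has to be turned back into its full form.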
GSPO is one of the quad indexes, holding 4 NodeIds in the order
graph/subject/predicate/object.
There is no triple table or quad table per se; the "indexes" are
sufficient. Alternatively, there are multiple tables, each with a
single access order. It's just a matter of point of view; the code
calls them tuple indexes.
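
As an illustration of "multiple tables each with a single access order",
here is a minimal Python sketch. The TupleIndex class below is hypothetical
and uses a sorted in-memory list where TDB uses an on-disk structure; it
only shows how the same quad is stored once per ordering and how a lookup
becomes a prefix scan on whichever ordering puts the known slots first.

```python
from bisect import bisect_left


class TupleIndex:
    """Toy quad index: quads kept sorted in one slot order, e.g. "GOSP"."""

    def __init__(self, order):
        self.order = order
        # Position in (G, S, P, O) of each slot of this index's order.
        self._perm = ["GSPO".index(c) for c in order]
        self._rows = []  # sorted tuples of NodeIds, in index order

    def add(self, quad):
        """Insert a (g, s, p, o) quad, reordered for this index."""
        row = tuple(quad[i] for i in self._perm)
        pos = bisect_left(self._rows, row)
        if pos == len(self._rows) or self._rows[pos] != row:
            self._rows.insert(pos, row)

    def scan(self, prefix):
        """Yield (g, s, p, o) quads whose leading slots equal prefix."""
        prefix = tuple(prefix)
        n = len(prefix)
        pos = bisect_left(self._rows, prefix)
        while pos < len(self._rows) and self._rows[pos][:n] == prefix:
            row = self._rows[pos]
            quad = [None] * 4
            for slot, i in enumerate(self._perm):
                quad[i] = row[slot]  # map back to G, S, P, O order
            yield tuple(quad)
            pos += 1


# Every quad goes into every index; there is no separate base table.
gspo, gosp = TupleIndex("GSPO"), TupleIndex("GOSP")
for q in [(1, 10, 20, 30), (1, 11, 20, 30), (1, 10, 21, 31), (2, 10, 20, 30)]:
    gspo.add(q)
    gosp.add(q)
```

With graph and object known, gosp.scan((1, 30)) walks one contiguous run;
with graph and subject known, gspo.scan((1, 10)) does the same on the other
ordering.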
The indexes are implemented using conventional B+Trees, with forward
linking of the leaf blocks to facilitate scans. They are custom
implemented and support only what is necessary for the purpose, so they
strip out a lot of overhead (no row overhead: no null map, no
per-row locking, ...).
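
Why the forward links help can be seen in a small Python sketch. This is
not the TDB implementation: the Leaf class, build_leaves and range_scan are
made-up names, the leaves are bulk-built from sorted keys, and a binary
search over the first key of each leaf stands in for the descent through
the interior nodes of a real B+Tree.

```python
from bisect import bisect_right


class Leaf:
    """One leaf block: sorted keys plus a forward link to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def build_leaves(sorted_keys, block_size=3):
    """Split sorted keys into fixed-size leaves and chain them left to right."""
    leaves = [Leaf(sorted_keys[i:i + block_size])
              for i in range(0, len(sorted_keys), block_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Yield keys k with low <= k < high by walking the leaf chain."""
    if not leaves:
        return
    # Stand-in for the tree descent: find the leaf that may contain low.
    firsts = [leaf.keys[0] for leaf in leaves]
    start = max(bisect_right(firsts, low) - 1, 0)
    leaf = leaves[start]
    while leaf is not None:
        for k in leaf.keys:
            if k >= high:
                return
            if k >= low:
                yield k
        # Follow the forward link instead of going back up the tree.
        leaf = leaf.next
```

After one descent to the starting leaf, the scan only follows next
pointers, which is what makes prefix scans over GSPO-style keys cheap.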
Andy