On 29/07/13 16:33, Rose Beck wrote:
Hi

I read the classic paper "Efficient RDF Storage and Retrieval in Jena2".
However, from the paper I am unable to understand how GSPO, GOSP, etc
(employed within Jena TDB) indexes are stored.

Can you please give me pointers from where I can understand more about
these indexes in Jena TDB. I'll be highly thankful to you for the same.

http://www.w3.org/TR/sparql11-query/#rdfDataset

The paper you reference describes storing one single graph using relational technology.

TDB supports "RDF datasets" - a collection of graphs (see link). TDB does not use an SQL database (see SDB for that). It stores triples (3 slots) and quads (4 slots). There is a node table, and every RDF term is represented by a NodeId (8 bytes). Some literals are stored inline within the 8 bytes; other literals, URIs and bNodes are stored in the node table, which is a key-value table.
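To make the NodeId idea concrete, here is a minimal sketch in Python. It is not TDB's code or its actual bit layout: the `NodeTable` class, the `INLINE_FLAG` bit and the `MAX_INLINE` bound are all assumptions made for illustration. It only shows the principle that small values can live inside the 8-byte id while everything else goes through a key-value table.

```python
import struct

# Hypothetical layout, NOT TDB's real encoding: the top bit of the
# 8-byte id marks a value that is stored inline in the id itself.
INLINE_FLAG = 1 << 63
MAX_INLINE = (1 << 56) - 1


class NodeTable:
    """Toy key-value node table mapping RDF terms to 8-byte ids."""

    def __init__(self):
        self._by_term = {}   # term -> id
        self._by_id = []     # id -> term

    def encode(self, term):
        # Small non-negative integers are packed into the id directly.
        if (isinstance(term, int) and not isinstance(term, bool)
                and 0 <= term <= MAX_INLINE):
            return INLINE_FLAG | term
        # Everything else (URIs, bNodes, other literals) gets a table entry.
        node_id = self._by_term.get(term)
        if node_id is None:
            node_id = len(self._by_id)
            self._by_term[term] = node_id
            self._by_id.append(term)
        return node_id

    def decode(self, node_id):
        if node_id & INLINE_FLAG:
            return node_id & ~INLINE_FLAG   # value was inside the id
        return self._by_id[node_id]         # value lives in the table


def to_bytes(node_id):
    """Fixed-width, big-endian form, so byte order matches numeric order."""
    return struct.pack(">Q", node_id)
```

The point of the fixed width is that a triple or quad becomes a fixed-size record of 3 or 4 ids, which is what the indexes store.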

GSPO is one of the quad indexes: each entry is 4 NodeIds in graph/subject/predicate/object order.

There is no triple table nor quad table per se - the "indexes" are sufficient. Or alternatively, there are multiple tables each with a single access order. It is just a matter of point of view - the code calls them tuple indexes.
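A rough way to picture a tuple index is as the same set of quads kept sorted in one particular slot order, so that any pattern with a bound prefix becomes a range scan. The sketch below is an illustration only, using a sorted Python list in place of a B+Tree; the `TupleIndex` class and its methods are hypothetical names, not TDB's API.

```python
from bisect import bisect_left


class TupleIndex:
    """Quads of ids kept sorted in one slot order, e.g. "GSPO" or "GOSP"."""

    def __init__(self, order):
        self.order = order
        # Position of each index slot within the canonical (G, S, P, O) quad.
        self._perm = ["GSPO".index(c) for c in order]
        self._rows = []  # sorted list of permuted tuples

    def add(self, quad):
        row = tuple(quad[i] for i in self._perm)
        pos = bisect_left(self._rows, row)
        if pos == len(self._rows) or self._rows[pos] != row:
            self._rows.insert(pos, row)

    def scan(self, prefix):
        """Return all quads whose leading slots (in index order) match prefix."""
        prefix = tuple(prefix)
        pos = bisect_left(self._rows, prefix)
        out = []
        while (pos < len(self._rows)
               and self._rows[pos][:len(prefix)] == prefix):
            row = self._rows[pos]
            quad = [None] * 4
            for slot, i in enumerate(self._perm):
                quad[i] = row[slot]      # undo the permutation
            out.append(tuple(quad))
            pos += 1
        return out


# The same quads go into every index; only the sort order differs.
gspo, gosp = TupleIndex("GSPO"), TupleIndex("GOSP")
for q in [(1, 10, 20, 30), (1, 11, 20, 30), (2, 10, 21, 31)]:
    gspo.add(q)
    gosp.add(q)
```

With this picture, a pattern with graph and subject bound is answered by `gspo.scan((g, s))`, and one with graph and object bound by `gosp.scan((g, o))`. Each index holds the complete quad, which is why no separate quad table is needed.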

The indexes are implemented using conventional B+Trees, with forward linking of the leaf blocks to facilitate scans. They are custom implemented and only support what's necessary for the purpose, so they strip out a lot of the overhead (no row overhead: no null map, no per-row locking, ...)
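The benefit of forward-linked leaves can be sketched as follows. This is a simplified, bulk-loaded two-level structure written for illustration, not TDB's B+Tree: there is no insertion or block splitting, and the names `Leaf`, `build` and `range_scan` are hypothetical. It shows only that a scan descends once to find its starting leaf and then follows the `next` links without going back up the tree.

```python
from bisect import bisect_right


class Leaf:
    """One leaf block: a run of sorted keys plus a link to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def build(sorted_keys, block_size=3):
    """Pack already-sorted keys into linked leaves with one level of separators."""
    leaves = [Leaf(sorted_keys[i:i + block_size])
              for i in range(0, len(sorted_keys), block_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right                     # forward link
    # The first key of every leaf after the first acts as a separator.
    separators = [leaf.keys[0] for leaf in leaves[1:]]
    return separators, leaves


def range_scan(separators, leaves, low, high):
    """Return keys k with low <= k < high, walking the leaf chain."""
    if not leaves:
        return []
    # One descent to locate the leaf that could contain `low`.
    leaf = leaves[bisect_right(separators, low)]
    out = []
    while leaf is not None:
        for key in leaf.keys:
            if key >= high:
                return out
            if key >= low:
                out.append(key)
        leaf = leaf.next                      # no return to the upper level
    return out
```

Keys here can be plain integers or tuples of ids; in the tuple case a prefix match on an index becomes a `range_scan` between the smallest and largest tuples sharing that prefix.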



        Andy
