Re: Query regarding Jena's indexes

Andy Seaborne Tue, 30 Jul 2013 01:15:14 -0700

On 29/07/13 16:33, Rose Beck wrote:

Hi


I read the classic paper "Efficient RDF Storage and Retrieval in Jena2".
However, from the paper I am unable to understand how GSPO, GOSP, etc
(employed within Jena TDB) indexes are stored.

Can you please give me pointers from where I can understand more about
these indexes in Jena TDB. I'll be highly thankful to you for the same.


http://www.w3.org/TR/sparql11-query/#rdfDataset

The paper you reference describes storing one single graph using relational technology.

TDB supports "RDF datasets" - a collection of graphs (see link). TDB does not use an SQL database (see SDB for that). It stores triples (3 slots) and quads (4 slots). There is a node table, and all RDF terms are represented by a NodeId (8 bytes). Some literals are stored within the 8 bytes - other literals, URIs and bNodes are stored in the node table, a key-value table.
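
As a rough illustration of the node table idea (not TDB's actual encoding), here is a minimal Python sketch. The names NodeTable, INLINE_FLAG and VALUE_MASK are hypothetical, and the choice to inline only small non-negative integers behind a single marker bit is an assumption made for the example.

```python
# Toy sketch only: the bit layout below is an assumption, not TDB's format.
INLINE_FLAG = 1 << 63          # hypothetical marker: value lives in the id itself
VALUE_MASK = (1 << 56) - 1     # hypothetical room for an inlined value


class NodeTable:
    """Maps RDF terms to 8-byte ids and back (a key-value table)."""

    def __init__(self):
        self._term_to_id = {}
        self._id_to_term = []

    def node_id(self, term):
        # Small non-negative integers are packed into the id: no table entry.
        if (isinstance(term, int) and not isinstance(term, bool)
                and 0 <= term <= VALUE_MASK):
            return INLINE_FLAG | term
        # Everything else (URIs, bNodes, other literals) goes in the table.
        if term not in self._term_to_id:
            self._term_to_id[term] = len(self._id_to_term)
            self._id_to_term.append(term)
        return self._term_to_id[term]

    def term(self, node_id):
        # Inlined values are decoded from the id; others are looked up.
        if node_id & INLINE_FLAG:
            return node_id & VALUE_MASK
        return self._id_to_term[node_id]

    def __len__(self):
        return len(self._id_to_term)
```

With this sketch, looking up the id for a URI adds one entry to the table, while the id for the integer 42 is computed directly and the table is not touched.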

GSPO is one of the quad indexes: 4 NodeIds, for graph/subject/predicate/object.

There is no triple table nor quad table per se - the "indexes" are sufficient. Or alternatively, there are multiple tables each with a single access order. It is just a matter of point of view - the code calls them tuple indexes.
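
To make the "multiple tables, each with a single access order" view concrete, here is a small Python sketch. It is an illustration under assumptions, not TDB code: TupleIndex is a hypothetical name, NodeIds are plain integers, and a sorted in-memory list stands in for the on-disk structure.

```python
from bisect import bisect_left


class TupleIndex:
    """Holds every quad, sorted in one slot order such as "GSPO" or "GOSP"."""

    def __init__(self, order):
        self.order = order
        # Position in a (G, S, P, O) quad of each slot of this index.
        self._perm = ["GSPO".index(c) for c in order]
        self._rows = []  # sorted list of reordered tuples

    def add(self, quad):
        row = tuple(quad[i] for i in self._perm)
        pos = bisect_left(self._rows, row)
        if pos == len(self._rows) or self._rows[pos] != row:
            self._rows.insert(pos, row)

    def scan(self, prefix):
        """Yield quads, as (G, S, P, O), whose leading index slots equal prefix."""
        prefix = tuple(prefix)
        pos = bisect_left(self._rows, prefix)
        while (pos < len(self._rows)
               and self._rows[pos][:len(prefix)] == prefix):
            row = self._rows[pos]
            quad = [None] * 4
            for slot, i in enumerate(self._perm):
                quad[i] = row[slot]
            yield tuple(quad)
            pos += 1
```

Each index holds the complete set of quads, so no separate quad table is needed. A pattern with graph and subject bound is a prefix scan on GSPO, and a pattern with graph and object bound is a prefix scan on GOSP.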

The indexes are implemented using conventional B+Trees, with forward linking of the leaf blocks to facilitate scans. They are custom implementations that support only what is necessary for the purpose, so they strip out a lot of the overhead (no row overhead: no null map, no per-row locking, ...)
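
The point of the forward links can be sketched in a few lines of Python. This is a deliberate simplification, not the TDB implementation: Leaf, build_leaves and range_scan are hypothetical names, the leaves are built in bulk from already sorted keys, and a flat list of first keys stands in for the branch nodes of a real B+Tree.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys held in this block
        self.next = None   # forward link to the next leaf block


def build_leaves(sorted_keys, block_size=4):
    """Split sorted keys into fixed-size leaf blocks and link them forward."""
    leaves = [Leaf(sorted_keys[i:i + block_size])
              for i in range(0, len(sorted_keys), block_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Yield keys k with low <= k < high."""
    if not leaves:
        return
    # Stand-in for descending the branch nodes: find the starting leaf.
    firsts = [leaf.keys[0] for leaf in leaves]
    start = max(bisect_right(firsts, low) - 1, 0)
    leaf = leaves[start]
    # From here on, only the forward links are followed.
    while leaf is not None:
        for k in leaf.keys:
            if k >= high:
                return
            if k >= low:
                yield k
        leaf = leaf.next
```

After a single descent to the first relevant leaf, the scan moves from block to block through the links and never goes back up the tree.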




        Andy
