Andy,

Thanks for the clarification regarding ARQ. 

I am happy to hear that ARQ uses the underlying indexes. 

Best regards,
Niels

-----Original Message-----
From: Andy Seaborne [mailto:[email protected]] 
Sent: Monday, November 14, 2016 11:45
To: [email protected]
Subject: Re: How do I do a join between multiple model.listStatements calls?

Jena has APIs for both local and remote SPARQL access.

Many large installations consist of a SPARQL triple store with a business logic layer on top.

On 14/11/16 19:10, Niels Andersen wrote:
> Answering my original question about joins, Andy stated that 
> Jena ARQ uses the Jena API, Graph.find and listStatements (you 
> included this in your response).

What I said was that it uses Graph.find or something faster.

TDB cuts through Graph.find and listStatements to work on the indexes 
themselves.

> Again, if I understand this
> correctly, then Jena ARQ does not implement a join algorithm based on 
> two sorted lists, so the join must be performed using lookups for each 
> element returned from the first list (as I showed in my example). 
> While this is OK for small datasets, it becomes problematic for large 
> datasets. Do I understand this correctly?

It's called an index join, and in TDB it works not on RDF terms but on internal 
ids (which are a fixed 8 bytes long). The representations of the RDF terms are 
left on disk unless needed later ("if you do not need data, do not touch it.").
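A toy sketch of that idea in Java, using only the standard library. It is not TDB's code, and every name in it (IdIndexJoin, encode, decode, objects, join) is hypothetical. Terms are mapped to long ids (standing in for the fixed 8-byte ids), the SPO index stores only id triples, the join probes the index once per left-hand id, and strings are decoded only for the final answer.

```java
import java.util.*;

// Toy illustration only; not TDB's implementation. All names are hypothetical.
public class IdIndexJoin {
    // Node table: RDF term <-> id; a Java long stands in for TDB's 8-byte id.
    private final Map<String, Long> termToId = new HashMap<>();
    private final List<String> idToTerm = new ArrayList<>();
    // SPO index: id triples kept sorted by (subject, predicate, object).
    private final NavigableSet<long[]> spo = new TreeSet<long[]>(IdIndexJoin::compare);

    private static int compare(long[] a, long[] b) {
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    private long encode(String term) {
        return termToId.computeIfAbsent(term, t -> {
            idToTerm.add(t);
            return (long) (idToTerm.size() - 1);
        });
    }

    private String decode(long id) {
        return idToTerm.get((int) id);
    }

    public void add(String s, String p, String o) {
        spo.add(new long[] {encode(s), encode(p), encode(o)});
    }

    // Range scan over the index: object ids of all triples (s, p, ?o).
    private List<Long> objects(long s, long p) {
        List<Long> out = new ArrayList<>();
        long[] from = {s, p, Long.MIN_VALUE};
        long[] to = {s, p, Long.MAX_VALUE};
        for (long[] t : spo.subSet(from, true, to, true)) out.add(t[2]);
        return out;
    }

    // Index join for { s p1 ?y . ?y p2 ?z }: one index probe per ?y id.
    public List<String> join(String s, String p1, String p2) {
        Long sId = termToId.get(s), p1Id = termToId.get(p1), p2Id = termToId.get(p2);
        if (sId == null || p1Id == null || p2Id == null) return Collections.emptyList();
        List<String> results = new ArrayList<>();
        for (long y : objects(sId, p1Id)) {      // left side stays as ids
            for (long z : objects(y, p2Id)) {    // probe the index with the id
                results.add(decode(z));          // decode only terms in the answer
            }
        }
        return results;
    }

    public static void main(String[] args) {
        IdIndexJoin db = new IdIndexJoin();
        db.add(":alice", ":knows", ":bob");
        db.add(":alice", ":knows", ":carol");
        db.add(":bob", ":name", "Bob");
        db.add(":carol", ":name", "Carol");
        db.add(":dave", ":name", "Dave");
        System.out.println(db.join(":alice", ":knows", ":name")); // prints [Bob, Carol]
    }
}
```

The ":dave" triple is never touched: the join only visits index entries reachable from the left-hand ids, and the term strings for intermediate ids are never looked up.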

If the first set is small, an index join is faster than a merge join. A merge 
join still needs to traverse both sides in full unless it uses sideways 
information passing, in which case it becomes a form of index join. 
Due to caching, an index lookup is not necessarily expensive.
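A rough illustration of the small-first-set case, as a stdlib-only Java sketch over sorted arrays of join-key ids. A binary search stands in for an index lookup; the class and method names are hypothetical, keys are assumed unique on each side, and caching is not modelled.

```java
import java.util.*;

// Toy comparison on sorted arrays of join-key ids; names are hypothetical.
public class JoinComparison {
    // Merge join: both inputs sorted, keys assumed unique per side.
    // It advances through both inputs until one runs out, so it can scan
    // most of the large side even when the small side has few keys.
    static List<Long> mergeJoin(long[] left, long[] right, int[] steps) {
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            steps[0]++;
            if (left[i] < right[j]) i++;
            else if (left[i] > right[j]) j++;
            else { out.add(left[i]); i++; j++; }
        }
        return out;
    }

    // Binary search standing in for an index lookup; counts comparisons.
    static boolean probe(long[] sorted, long key, int[] comparisons) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            comparisons[0]++;
            if (sorted[mid] < key) lo = mid + 1;
            else if (sorted[mid] > key) hi = mid - 1;
            else return true;
        }
        return false;
    }

    // Index join: one probe into the sorted right side per left key.
    static List<Long> indexJoin(long[] left, long[] right, int[] comparisons) {
        List<Long> out = new ArrayList<>();
        for (long key : left) {
            if (probe(right, key, comparisons)) out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] right = new long[1_000_000];
        for (int k = 0; k < right.length; k++) right[k] = 2L * k; // even ids
        long[] left = {10, 500_001, 1_999_998};                    // small, sorted
        int[] mergeSteps = {0}, indexComparisons = {0};
        System.out.println(mergeJoin(left, right, mergeSteps) + " after " + mergeSteps[0] + " merge steps");
        System.out.println(indexJoin(left, right, indexComparisons) + " after " + indexComparisons[0] + " comparisons");
    }
}
```

With three keys on the left and a million on the right, the merge join walks essentially the whole right side, while the index join needs only a few dozen comparisons. If the merge join were allowed to jump ahead to the current left key instead of stepping one entry at a time, that jump would itself be a search on the right side, which is the sense in which sideways information passing turns it into a form of index join.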

I would still like to hear what you intend to use RDF for. Which 
features of the semantic web, or of RDF, are you exploiting? Your email address suggests 
an IoT application.

        Andy
