Re: Very very slow query when using a high OFFSET

dandh988 Mon, 18 Dec 2017 00:44:37 -0800

From HDT-FoQ:
The most costly operation when dealing with RDF datasets is indexing, 
especially for big datasets whose index does not fit in main memory. In HDT, 
most of the burden is done already in the server, so only the complementary 
index (the Wavelet and the O-Index) need to be generated in client side. 
Since these are the indexes needed to resolve patterns outside of SPO order, 
it is these which are being created client side, and that accounts for the 
increase in elapsed time. You would need to ask on the HDT mailing list, or 
crack open the code and see where the FoQ part is built...
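
As a rough illustration of why the extra index matters, here is a toy Python sketch. It is not HDT's actual data structure (the Wavelet and O-Index are far more compact); the sample triples and the function names are made up for illustration. With triples held only in SPO order, a pattern with a bound object has to scan everything, whereas an object index built once up front answers it with a single lookup. Building that index is the one-off cost.

```python
from collections import defaultdict

# Toy triples kept in SPO order (hypothetical sample data).
triples = sorted([
    ("alice", "knows", "bob"),
    ("alice", "likes", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
])


def scan_by_object(triples, obj):
    """Answer (?s ?p obj) without an object index: touches every triple."""
    return [(s, p) for s, p, o in triples if o == obj]


def build_object_index(triples):
    """One-off pass mapping each object to its (subject, predicate) pairs.

    This stands in for the up-front work paid when the index is first built.
    """
    index = defaultdict(list)
    for s, p, o in triples:
        index[o].append((s, p))
    return index


def lookup_by_object(index, obj):
    """Answer (?s ?p obj) with the index: a single dictionary lookup."""
    return index.get(obj, [])


object_index = build_object_index(triples)

print(scan_by_object(triples, "carol"))
print(lookup_by_object(object_index, "carol"))
```

Both calls return the same pairs; the difference is that the scan grows with the size of the whole dataset, while the lookup only pays for the index build once.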


Dick
-------- Original message --------
From: Laura Morales <laure...@mail.com>
Date: 18/12/2017 08:07 (GMT+00:00)
To: users@jena.apache.org
Cc: users@jena.apache.org
Subject: Re: Very very slow query when using a high OFFSET

> They don't have index permutations spo, ops, pos, etc.

Yes, they do; what you're saying is wrong. See 
http://www.rdfhdt.org/hdt-binary-format/#triples 
That's what the .hdt.index file is for: storing more index permutations.

> To bring this thread to an end, I guess we finally answered your
> question? Or are there any open issues?

I think the only remaining open questions are:

- since the problem was not with the OFFSET, would the query "SELECT ?s FROM 
<wikidata> WHERE ..." also fail to terminate with a TDB-backed namedGraph 
(instead of HDT)?

- is there any improvement that could be added to Jena to answer this type of 
query faster, or is that just the way it is and nothing can be done about it?
