From HDT-FoQ: "The most costly operation when dealing with RDF datasets is indexing, especially for big datasets whose index does not fit in main memory. In HDT, most of the burden is done already in the server, so only the complementary index (the Wavelet and the O-Index) need to be generated in client side."

These are the indexes needed to resolve triple patterns that cannot be answered from the SPO order. Building them on the client side is what accounts for the increase in elapsed time. This needs to be asked on the HDT mailing list, or we need to crack open the code and see where the FoQ part is built...
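To make that cost concrete, here is a toy Python sketch of what building an object index on the client involves. It is a simplification and an assumption on my part. HDT-FoQ's real O-Index is a compact array/bitmap structure, and the predicate side uses a wavelet tree. Here a plain dict maps each object ID to the positions in the SPO-sorted triple list where it occurs. What it does show is that the index needs one full pass over every triple, which is where the time goes on a big dataset. `build_object_index` and `match_by_object` are hypothetical names, not HDT API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A triple of dictionary IDs (subject, predicate, object).
Triple = Tuple[int, int, int]


def build_object_index(spo: List[Triple]) -> Dict[int, List[int]]:
    """Hypothetical toy O-Index: object ID -> positions in the SPO-sorted list.

    Building it requires one full scan of all triples (the client-side cost).
    """
    index: Dict[int, List[int]] = defaultdict(list)
    for pos, (_s, _p, o) in enumerate(spo):
        index[o].append(pos)
    return dict(index)


def match_by_object(spo: List[Triple], index: Dict[int, List[int]], o: int) -> List[Triple]:
    """Answer the pattern (?s, ?p, o) via the index instead of scanning SPO."""
    return [spo[pos] for pos in index.get(o, [])]


if __name__ == "__main__":
    spo = sorted([(2, 10, 100), (1, 11, 101), (1, 10, 100), (3, 12, 100)])
    idx = build_object_index(spo)
    print(idx)                             # {100: [0, 2, 3], 101: [1]}
    print(match_by_object(spo, idx, 100))  # [(1, 10, 100), (2, 10, 100), (3, 12, 100)]
```

Without such an index, a pattern with only the object bound would have to scan the whole SPO sequence.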
Dick

-------- Original message --------
From: Laura Morales <laure...@mail.com>
Date: 18/12/2017 08:07 (GMT+00:00)
To: users@jena.apache.org
Cc: users@jena.apache.org
Subject: Re: Very very slow query when using a high OFFSET

> They don't have index permutations spo, ops, pos, etc.

Yes, they do; what you're saying is wrong. See http://www.rdfhdt.org/hdt-binary-format/#triples
That's what the .hdt.index file is for: storing additional index permutations.

> To bring this thread to an end, I guess we finally answered your
> question? Or are there any open issues?

I think the only remaining open questions are:

- Since the problem was not with the OFFSET, would the query "SELECT ?s FROM <wikidata> WHERE ..." also fail to terminate with a TDB-backed named graph (instead of HDT)?
- Is there any improvement that could be added to Jena to answer these types of queries faster, or is it just the way it is and nothing can be done about it?