Re: Very very slow query when using a high OFFSET

Lorenz Buehmann Fri, 15 Dec 2017 00:59:15 -0800

Have you tried contacting one of the developers, or asking in the HDT forum
[1]?


[1]
http://www.rdfhdt.org/forum/rdf-hdt-support-group1/bugs-and-problems-forum2/


On 15.12.2017 03:57, Laura Morales wrote:
> During one of my countless tests, I set up Fuseki with an HDT store.
> In particular, the store is "wikidata.hdt".
>
> Then I ran this query from the Fuseki web UI:
>
> SELECT ?s
> WHERE { ?s a <http://wikiba.se/ontology-beta#Item> }
> LIMIT 10
> OFFSET 20000000
>
> This query takes forever; so long, in fact, that I killed it after
> 15 minutes with no results. CPU was at 100% on *all* threads and the Java VM
> was using all the allocated RAM (6G), with no swap or disk activity.
> I don't know where the problem is, especially because I don't know how
> Fuseki/Jena and the HDT binding (hdt-java) interact.
>
> However:
>
> - hdt-cpp has a small CLI tool that can match simple patterns like "? ?
> ?" or "? a ?", so I searched for "? a <http://wikiba.se/ontology-beta#Item>"
> and grepped the output, filtering out the first 20M triples. From when I
> issued the command to when I started to see the first results, about
> 1 minute (and a few seconds) elapsed. *NOTE:* this time also includes
> the time needed to set up the Java VM, map the HDT file into memory, and load
> some HDT indices (in memory).
>
> - hdt-java (the Fuseki binding), on the other hand, has a CLI tool called
> "hdtsparql" that can run SPARQL queries directly against an HDT file, and
> AFAICT it uses Jena ARQ. This tool also has some initialization time for
> loading the HDT file; even so, the query (above) was answered in 35 seconds
> (load time + query).
>
> So as I said, I don't know what's going on between Fuseki and HDT-Java, but 
> this looks like a problem with Fuseki. Can somebody else confirm? Any idea? 
> Hint?
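
To make the comparison above repeatable, here is a rough Python sketch that
builds the same query for any LIMIT/OFFSET, times a call, and reproduces the
"skip the first N rows" check on any iterator of results. It is only a sketch
under assumptions: the helper names (build_query, page, timed,
run_on_endpoint) are made up for illustration, and the endpoint is assumed to
accept a form-encoded POST with a "query" parameter as described in the SPARQL
1.1 Protocol and to return JSON results.

```python
import itertools
import json
import time
import urllib.parse
import urllib.request

# Class IRI taken from the query in the original post.
ITEM_CLASS = "http://wikiba.se/ontology-beta#Item"


def build_query(limit, offset):
    """Return the SELECT query from the post with the given LIMIT and OFFSET."""
    return (
        "SELECT ?s\n"
        f"WHERE {{ ?s a <{ITEM_CLASS}> }}\n"
        f"LIMIT {limit}\n"
        f"OFFSET {offset}"
    )


def page(rows, limit, offset):
    """Skip `offset` rows of an iterator, then return the next `limit` rows.

    This mirrors the manual check of filtering out the first 20M triples:
    every skipped row still has to be produced before it can be discarded.
    """
    return list(itertools.islice(rows, offset, offset + limit))


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def run_on_endpoint(endpoint, query, timeout=900):
    """POST the query to a SPARQL endpoint and return the ?s values.

    Hypothetical helper: the endpoint URL is supplied by the caller, and the
    default timeout of 900 s matches the 15 minutes mentioned in the post.
    """
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [binding["s"]["value"] for binding in body["results"]["bindings"]]
```

For example, timed(run_on_endpoint, "http://localhost:3030/wikidata/sparql",
build_query(10, 20000000)) would time the Fuseki side; that URL, including the
dataset name "wikidata", is an assumption and depends on the local Fuseki
configuration. Running the same call with a small OFFSET first would show
whether the slowdown really scales with the offset.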
