This is
https://github.com/rdfhdt/hdt-cpp/issues/142
https://github.com/rdfhdt/hdt-java/issues/65
(a day ago)
On 15/12/17 08:51, Lorenz Buehmann wrote:
I think we already mentioned it in some previous discussions. Nobody
here, as far as I know, developed or is responsible for hdt-java.
Agreed.
In addition, have you tested it with a default Fuseki/TDB store?
In general, OFFSET isn't the most efficient operator when using triple
stores - at least that was my experience over the years.
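To illustrate the point about OFFSET, here is a minimal Python sketch. It assumes an engine that evaluates OFFSET by enumerating solutions in order and discarding the first N. That is an assumption for illustration only, not a description of how Jena, TDB or hdt-java are implemented. The names `solutions` and `select_page` and the example.org IRIs are hypothetical.

```python
from itertools import islice


def solutions(n, counter):
    """Hypothetical stand-in for a pattern match: yields n subject IRIs."""
    for i in range(n):
        counter["produced"] += 1  # count every solution the "engine" computes
        yield f"http://example.org/item/{i}"


def select_page(stream, limit, offset):
    """Emulate LIMIT/OFFSET by skipping `offset` solutions, then taking `limit`."""
    return list(islice(stream, offset, offset + limit))


counter = {"produced": 0}
page = select_page(solutions(1000, counter), limit=10, offset=200)

# Only 10 rows are returned, but offset + limit solutions had to be produced.
print(len(page), counter["produced"])
```

Under this model the work grows with OFFSET + LIMIT rather than with LIMIT alone, so a large OFFSET is expensive even when few rows come back.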
On 15.12.2017 03:57, Laura Morales wrote:
During one of my countless tests, I set up Fuseki with an HDT store. In
particular, the store is "wikidata.hdt".
Then I ran this query from the Fuseki web UI:
SELECT ?s
WHERE { ?s a <http://wikiba.se/ontology-beta#Item> }
LIMIT 10
OFFSET 20000000
This query takes forever. So long, in fact, that I killed it after 15
minutes with no results. CPU at 100% on *all* threads,
Heap exhaustion in Java8 is signalled by all threads at 100%.
Java VM using all the allocated RAM (6G), no swap nor disk activity.
I don't know where the problem is, especially because I don't know the dynamics
among Fuseki/Jena and the HDT binding (hdt-java).
However:
- hdt-cpp has a small CLI tool that matches simple patterns like "? ? ?" or "? a ?", so
I searched for "? a <http://wikiba.se/ontology-beta#Item>" and grepped the output, filtering out
the first 20M triples. From issuing the command to seeing the first results took about
1 minute (and a few seconds). *NOTE:* this time also includes the time needed to set up the Java VM, map
the HDT file into memory, and load some HDT indices (in memory).
- hdt-java (the Fuseki binding), on the other hand, has a CLI tool called "hdtsparql" that
runs SPARQL queries directly against an HDT file, and AFAICT it uses Jena ARQ.
This tool also has some initialization time for loading the HDT file; even so, the
query (above) was answered in 35 seconds (load time + query).
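For anyone who wants to reproduce the first test without grep, here is a rough Python sketch of the same idea. It reads triples as text, keeps the lines that mention the object, skips the first N matches and returns the next few. It assumes one triple per line in the tool's output, which I have not verified. The function name `matches_after_offset` and the example.org subjects are hypothetical.

```python
import io


def matches_after_offset(lines, obj, offset, limit):
    """Stream lines, keep those containing `obj`, skip `offset` matches, return `limit`."""
    if limit <= 0:
        return []
    seen = 0
    page = []
    for line in lines:
        if obj not in line:
            continue  # not a match for the pattern
        seen += 1
        if seen <= offset:
            continue  # still inside the skipped prefix
        page.append(line.rstrip("\n"))
        if len(page) == limit:
            break  # stop reading as soon as the page is full
    return page


ITEM = "<http://wikiba.se/ontology-beta#Item>"
sample = io.StringIO(
    f"<http://example.org/s0> a {ITEM}\n"
    "<http://example.org/s1> a <http://example.org/Other>\n"
    f"<http://example.org/s2> a {ITEM}\n"
    f"<http://example.org/s3> a {ITEM}\n"
    f"<http://example.org/s4> a {ITEM}\n"
)
result = matches_after_offset(sample, ITEM, offset=2, limit=2)
print(result)
```

Only the current page is held in memory, so the skipped prefix costs time but not heap. In practice `lines` would be `sys.stdin` with the search tool's output piped in.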
So as I said, I don't know what's going on between Fuseki and HDT-Java, but
this looks like a problem with Fuseki. Can somebody else confirm? Any idea?
Hint?