Bonjour Jean-Marc!
04.11.2016, 09:27, Jean-Marc Vanel wrote:
> Looking for Pari* with your SPARQL on dbPedia takes 4 seconds on my
> supposedly efficient laptop CPU:
> [...]
> I should try with SSD.
> I don't know whether TDB can exploit multi-core CPU.
> Also I don't know whether I can pre-compile the query with a parameter for
> runtime.
The obvious problem here is that the query has to count all the triples
with the same subject. I don't think an SSD or a CPU with more cores would
help, at least not much.
What could help is to use a narrower query pattern, for example if you
could look at only a specific property (or a few, expressed using
VALUES) instead of every possible property.
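As an untested sketch of what I mean: the query below counts only triples
whose predicate is in a short VALUES list. The two properties listed, the
rdfs:label index field and the LIMIT are just assumptions for illustration;
pick whatever properties best reflect "importance" in your data.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT ?s (COUNT(*) AS ?count)
WHERE {
  # Full-text lookup via jena-text (assumes rdfs:label is the indexed property)
  ?s text:query (rdfs:label "Pari*") .
  # Count only a few chosen properties instead of every ?p (example choices)
  VALUES ?p { dbo:wikiPageWikiLink dbo:wikiPageExternalLink }
  ?s ?p ?o .
}
GROUP BY ?s
ORDER BY DESC(?count)
LIMIT 10
```

Note that subjects with none of the listed properties drop out of the
results entirely, so the list needs to cover properties that most of your
resources actually have.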
> Anyway, I'll implement the ordering by triple count in Semantic_forms.
> Maybe later can it be helpful within Jena-text.
Although you could probably store the triple count in the jena-text index,
there is also an alternative that doesn't need any new code: precompute
the counts (I'm assuming that your DBpedia data doesn't change very
often) and store them as triples, which could be done with a single
SPARQL Update request. Then you could just look up the count in the same
SPARQL query where you use jena-text and rank the results by it. That
should be a lot faster than the query I gave you.
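Roughly like this (untested, and the ex:tripleCount property and its
namespace are made up for the example; use whatever property you like).
First the one-off SPARQL Update that materializes the counts:

```sparql
PREFIX ex: <http://example.org/ns#>

# Store one count triple per subject (ex:tripleCount is a hypothetical property)
INSERT { ?s ex:tripleCount ?count }
WHERE {
  {
    SELECT ?s (COUNT(*) AS ?count)
    WHERE { ?s ?p ?o }
    GROUP BY ?s
  }
}
```

Then the search query only needs to fetch the stored count for each text
match, again assuming rdfs:label is the indexed property:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ns#>

SELECT ?s ?count
WHERE {
  ?s text:query (rdfs:label "Pari*") .
  ?s ex:tripleCount ?count .   # precomputed, no aggregation at query time
}
ORDER BY DESC(?count)
LIMIT 10
```

If the data does change, you would have to delete the old ex:tripleCount
triples and rerun the update, otherwise subjects end up with several counts.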
-Osma
PS. Are you still interested in completing the Lucene upgrade? I wrote a
comment in JENA-1250 about what to do so that we could at least get the
update to version 5 merged into Jena. At the very minimum, making a PR
against Jena would indicate (from a legal perspective) that you wish to
contribute the work to Apache Jena, so that others can make use of it.
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi