Bonjour Jean-Marc!

04.11.2016, 09:27, Jean-Marc Vanel wrote:
> Looking for Pari* with your SPARQL on DBpedia takes 4 seconds on my
> supposedly efficient laptop CPU:
> [...]
> I should try with an SSD.
> I don't know whether TDB can exploit a multi-core CPU.
> I also don't know whether I can pre-compile the query with a parameter
> supplied at runtime.

The obvious problem here is that the query has to count all the triples of every matching subject. I don't think an SSD or a CPU with more cores would help, at least not much.

What could help is a narrower query pattern: for example, counting only a specific property (or a few, listed using VALUES) instead of every possible property.
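To make the cost concrete, here is a minimal sketch of what the ranking query computes, written in stdlib-only Python rather than SPARQL and assuming the data is just a list of (subject, predicate, object) tuples. The sample data, the rdfs:label prefix match (standing in for the jena-text Pari* search) and the function names are hypothetical. Every call recounts the triples of every hit, and the optional `properties` argument plays the role of a VALUES list. It only shows what gets counted, not how TDB actually evaluates the query.

```python
from collections import defaultdict

# Hypothetical sample data standing in for DBpedia triples (illustrative only).
TRIPLES = [
    ("dbr:Paris", "rdfs:label", "Paris"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
    ("dbr:Paris", "dbo:mayor", "dbr:Anne_Hidalgo"),
    ("dbr:Paris", "dbo:wikiPageWikiLink", "dbr:Seine"),
    ("dbr:Paris_Hilton", "rdfs:label", "Paris Hilton"),
    ("dbr:Paris_Hilton", "dbo:birthPlace", "dbr:New_York_City"),
    ("dbr:Parisii", "rdfs:label", "Parisii"),
    ("dbr:London", "rdfs:label", "London"),
]


def prefix_hits(triples, prefix):
    """Stand-in for the text index: subjects whose rdfs:label starts with prefix."""
    return {s for s, p, o in triples if p == "rdfs:label" and o.startswith(prefix)}


def rank_on_the_fly(triples, prefix, properties=None):
    """Rank hits by triple count, recomputed on every call.

    If `properties` is given, only triples with those predicates are
    counted, mirroring a narrower pattern restricted with VALUES.
    """
    hits = prefix_hits(triples, prefix)
    counts = defaultdict(int)
    for s, p, o in triples:
        if s in hits and (properties is None or p in properties):
            counts[s] += 1
    # Highest count first; ties broken by subject name for a stable order.
    return sorted(hits, key=lambda s: (-counts[s], s))
```

For example, rank_on_the_fly(TRIPLES, "Pari", {"dbo:birthPlace"}) puts dbr:Paris_Hilton first, because it is the only hit that has that property.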

> Anyway, I'll implement the ordering by triple count in Semantic_forms.
> Maybe later it can be helpful within jena-text.

Although you could probably store the triple count in the jena-text index, there is also an alternative that doesn't need any new code: precompute the counts (I'm assuming that your DBpedia data doesn't change very often) and store them as triples, which could be done with a single SPARQL Update query. Then you can look up the stored count in the same SPARQL query where you use jena-text, and rank the results by it. That should be a lot faster than the query I gave you.
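The precompute-and-look-up approach can be sketched in the same stdlib-only Python model. Here precompute_counts plays the role of the one-off SPARQL Update (in SPARQL 1.1 terms, an INSERT ... WHERE whose WHERE clause is a subquery grouping by subject with COUNT), and search_ranked reads the stored count instead of recomputing it. The predicate ex:tripleCount, the sample data and the function names are assumptions made for illustration, not anything Jena provides.

```python
from collections import Counter

# Hypothetical predicate for the precomputed counts; any IRI you control would do.
COUNT_PROP = "ex:tripleCount"

# Hypothetical sample data standing in for DBpedia triples (illustrative only).
TRIPLES = [
    ("dbr:Paris", "rdfs:label", "Paris"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
    ("dbr:Paris", "dbo:mayor", "dbr:Anne_Hidalgo"),
    ("dbr:Paris", "dbo:wikiPageWikiLink", "dbr:Seine"),
    ("dbr:Paris_Hilton", "rdfs:label", "Paris Hilton"),
    ("dbr:Paris_Hilton", "dbo:birthPlace", "dbr:New_York_City"),
    ("dbr:Parisii", "rdfs:label", "Parisii"),
    ("dbr:London", "rdfs:label", "London"),
]


def precompute_counts(triples):
    """One pass over the data, run once (the analogue of the SPARQL Update).

    Returns the extra (subject, COUNT_PROP, count) triples to add to the store.
    """
    counts = Counter(s for s, p, o in triples)
    return [(s, COUNT_PROP, n) for s, n in counts.items()]


def search_ranked(triples, prefix):
    """Prefix match on labels, then rank hits by their stored count triple."""
    hits = {s for s, p, o in triples if p == "rdfs:label" and o.startswith(prefix)}
    # In a real store this is one triple-pattern lookup per hit, not a scan.
    stored = {s: o for s, p, o in triples if p == COUNT_PROP}
    # Subjects without a stored count sort last; ties broken by name.
    return sorted(hits, key=lambda s: (-stored.get(s, 0), s))


store = TRIPLES + precompute_counts(TRIPLES)
```

The counts are computed before the count triples are added, so they don't count themselves. If the data does change, the counts would have to be recomputed (for example by deleting and reinserting the count triples).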

-Osma

PS. Are you still interested in completing the Lucene upgrade? I wrote a comment in JENA-1250 about what to do so that we could at least get the update to version 5 merged into Jena. At the very minimum, making a PR against Jena would indicate (from a legal perspective) that you wish to contribute the work to Apache Jena, so that others can make use of it.

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
