Re: completion with Lucene: desirable from SPARQL

Osma Suominen Fri, 04 Nov 2016 04:49:15 -0700

Bonjour Jean-Marc!

04.11.2016, 09:27, Jean-Marc Vanel wrote:

Looking for Pari* with your SPARQL on dbPedia takes 4 seconds on my
supposedly efficient laptop CPU:

[...]

I should try with SSD.
I don't know whether TDB can exploit multi-core CPU.
Also I don't know whether I can pre-compile the query with a parameter for
runtime.

The obvious problem here is that the query has to count all the triples with the same subject. I don't think an SSD or a CPU with more cores would help, at least not much.

What could help is a narrower query pattern: for example, you could look at only a specific property (or a few, expressed using VALUES) instead of every possible property.
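
A rough sketch of such a narrower pattern, assuming the labels are indexed under rdfs:label in jena-text. The two properties in the VALUES block are chosen purely for illustration and are not taken from the original query:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

# Count only triples with selected properties instead of every property.
SELECT ?s (COUNT(?o) AS ?count)
WHERE {
  # Text search for labels starting with "Pari"
  ?s text:query (rdfs:label 'Pari*') .
  # Illustrative choice of properties; adjust to whatever matters for ranking
  VALUES ?p { dbo:wikiPageWikiLink rdfs:seeAlso }
  ?s ?p ?o .
}
GROUP BY ?s
ORDER BY DESC(?count)
LIMIT 10
```

Note that subjects with no triples for the listed properties drop out of the results entirely, so the property list has to cover everything that should be completable.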

Anyway, I'll implement the ordering by triple count in Semantic_forms.
Maybe later it can be helpful within jena-text.

Although you could probably store the triple count in the jena-text index, there is also an alternative that doesn't need any new code: precompute the counts (I'm assuming that your DBpedia data doesn't change very often) and store them as triples, which could be done using a single SPARQL Update query. Then you could just look up that count in the same SPARQL query where you use jena-text and rank the results by the count. It should be a lot faster than the query I gave you.
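
A minimal sketch of the precomputation, assuming a made-up property ex:tripleCount in a placeholder namespace (any property of your own would do). The update would need to be re-run whenever the data changes:

```sparql
PREFIX ex: <http://example.org/vocab#>

# Store the number of triples per subject as one extra triple each.
# ex:tripleCount is a hypothetical property, not an existing vocabulary term.
INSERT { ?s ex:tripleCount ?count }
WHERE {
  {
    SELECT ?s (COUNT(*) AS ?count)
    WHERE { ?s ?p ?o }
    GROUP BY ?s
  }
}
```

The lookup query could then read the stored count instead of aggregating at query time, again assuming labels are indexed under rdfs:label:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/vocab#>

SELECT ?s ?count
WHERE {
  ?s text:query (rdfs:label 'Pari*') .
  # One precomputed triple per subject, no GROUP BY needed
  ?s ex:tripleCount ?count .
}
ORDER BY DESC(?count)
LIMIT 10
```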


-Osma

PS. Are you still interested in completing the Lucene upgrade? I wrote a comment in JENA-1250 about what to do so that we could at least get the update to version 5 merged into Jena. At the very minimum, making a PR against Jena would indicate (from a legal perspective) that you wish to contribute the work to Apache Jena, so that others can make use of it.


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
