Re: completion with Lucene: desirable from SPARQL

Jean-Marc Vanel Fri, 04 Nov 2016 00:28:37 -0700

Looking for Pari* with your SPARQL query on DBpedia takes 4 seconds on my
supposedly efficient laptop CPU:


$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Stepping:              3
CPU MHz:               2644.789
CPU max MHz:           3500,0000
CPU min MHz:           800,0000
BogoMIPS:              5181.67

I should try with an SSD.
I don't know whether TDB can exploit a multi-core CPU.
I also don't know whether I can pre-compile the query and supply the search
term as a parameter at run time.
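
On the parameter question, here is a minimal sketch in Python. It is plain
string templating, not Jena's API and not real query pre-compilation: it only
fills a search term into the untested query quoted further down in this
message. The names QUERY_TEMPLATE, escape_literal and build_query are made up
for illustration, and the text: prefix is assumed to be declared elsewhere.

```python
from string import Template

# The (untested) query from this thread, with placeholders for the
# search term and the result limit. No PREFIX line, as in the thread.
QUERY_TEMPLATE = Template("""\
SELECT ?s (COUNT(*) AS ?count)
WHERE {
  ?s text:query "$term" .
  ?s ?p ?o .
}
GROUP BY ?s
ORDER BY DESC(?count)
LIMIT $limit
""")


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL "..." literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(term: str, limit: int = 10) -> str:
    """Return the query text for one search term."""
    return QUERY_TEMPLATE.substitute(term=escape_literal(term), limit=int(limit))


if __name__ == "__main__":
    print(build_query("Pari*"))
```

Escaping the term matters because it comes from user input; this sketch only
handles backslashes and double quotes, so terms containing newlines would need
more care.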

Anyway, I'll implement the ordering by triple count in Semantic_forms.
Maybe later it can be helpful within Jena-text.
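
To make "ordering by triple count" concrete, here is a small in-memory sketch
in Python. It is not the Semantic_forms implementation; top_subjects and the
ex: identifiers in the usage note are hypothetical. It assumes the text index
has already returned a list of matching subjects and that the triples are
available as (subject, predicate, object) tuples.

```python
import heapq
from collections import Counter


def top_subjects(hits, triples, k=10):
    """Return up to k (count, subject) pairs, highest triple count first.

    hits:    subjects returned by the text index for the search term
    triples: iterable of (subject, predicate, object) tuples
    """
    hit_set = set(hits)
    # Count every triple whose subject is one of the text-index hits.
    counts = Counter(s for (s, _p, _o) in triples if s in hit_set)
    # Hits without any triple get a count of 0; ties are broken by subject.
    return heapq.nlargest(k, ((counts[s], s) for s in hit_set))
```

For example, with hits ["ex:Paris", "ex:Parista"] and three triples about
ex:Paris but only one about ex:Parista, ex:Paris is ranked first. The counting
step visits the triples of every hit before the top k can be chosen, so the
cost grows with the number of hits.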


2016-11-03 14:30 GMT+01:00 Osma Suominen <[email protected]>:

> Hi Jean-Marc!
>
>> AFAIK using the weights to order results is intimately linked to the text
>> index querying.
>> If I want the top 10 results, the search must have the weights beforehand
>> otherwise I must get all the results to filter later.
>> This is the reason for using AnalyzingInfixSuggester.
>> Lucene 4_9_1
>> https://lucene.apache.org/core/4_9_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>> Lucene 6_2_1
>> https://lucene.apache.org/core/6_2_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>>
>> I guess this is what you call "performance reasons".
>>
>
> I don't see why you couldn't, in principle, do something like this:
>
> SELECT ?s (COUNT(*) as ?count)
> WHERE {
>   ?s text:query "édu*" .
>   ?s ?p ?o .
> }
> GROUP BY ?s
> ORDER BY DESC(?count)
> LIMIT 10
>
> (note: untested query)
>
> I'm sure it will get slow if the number of hits from the text index is
> more than a few dozen. But for a small number of results at a time, it
> might work.
>
>> As I wrote in the original post, "I'll have to implement also the callback
>> for updates
>> like class TextDocProducerTriples in Jena-text."
>> http://jena.apache.org/documentation/javadoc/text/org/apache/jena/query/text/TextDocProducerTriples.html
>>
>
> Isn't that called only when the indexed triple changes (e.g. the one with
> rdfs:label or skos:prefLabel or whatever property you are indexing), but
> not when other data related to the same subject changes? So if new triples
> are added for the same subject, but its label is unchanged, then the text
> index won't see the update and thus the count of references/triples won't
> be updated either.
>
> I may be wrong here, I'm not sure how the update tracking works.
>
> -Osma
>
>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> [email protected]
> http://www.nationallibrary.fi
>



-- 
Jean-Marc Vanel
Profile:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
