Re: completion with Lucene: desirable from SPARQL

Jean-Marc Vanel Thu, 03 Nov 2016 09:36:21 -0700

Osma,

That makes sense,
and the first tests are not bad.


Although I'm surprised that "par*" does not return dbpedia:Paris in the
first 10 results, while "pari*" does return dbpedia:Paris in first position:

"count" "s"
"3090"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Paris
"2676"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/London
"72"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Émile_Durkheim
"68"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Henri_Bergson
"66"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/20th_arrondissement_of_Paris
"64"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Cornelius_Castoriadis
"64"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Jacques_Derrida
"63"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Michel_Foucault
"62"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Louis,_Grand_Condé
"60"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Jean-Jacques_Rousseau


I'll add that SPARQL query to my sandbox as a replacement for the DBpedia
Lookup service, and tell you how it goes.
But I expect that using the Lucene implementation, once the weights are
added, will be more efficient. That requires more work, though...
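
To make the trade-off concrete, here is a minimal sketch in Python. It is
not Jena or Lucene code: the ex: data, the function names and the in-memory
lists are all invented for illustration, and the "index" is scanned linearly
where a real suggester would not be. It only shows that ranking from stored
weights needs no join against the triples, and that stored weights go stale
when triples are added without the label changing (Osma's point below).

```python
import heapq
from collections import Counter

# Toy triple store (hypothetical data, not DBpedia).
TRIPLES = [
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", "ex:country", "ex:France"),
    ("ex:Paris", "ex:river", "ex:Seine"),
    ("ex:Parma", "rdfs:label", "Parma"),
    ("ex:Parma", "ex:country", "ex:Italy"),
    ("ex:Parrot", "rdfs:label", "Parrot"),
]


def labels(triples):
    """Return (subject, label) pairs for the indexed property."""
    return [(s, o) for s, p, o in triples if p == "rdfs:label"]


def complete_by_counting(triples, prefix, limit=10):
    """Query-time ranking: collect every label hit, then count triples per hit."""
    prefix = prefix.lower()
    hits = {s for s, label in labels(triples) if label.lower().startswith(prefix)}
    counts = Counter(s for s, _, _ in triples if s in hits)
    return [s for s, _ in counts.most_common(limit)]


def build_weighted_index(triples):
    """Index-time weights: store (label, subject, triple count) once, up front."""
    counts = Counter(s for s, _, _ in triples)
    return [(label.lower(), s, counts[s]) for s, label in labels(triples)]


def complete_from_index(index, prefix, limit=10):
    """Rank using the stored weights only; the triples are not consulted."""
    prefix = prefix.lower()
    hits = (entry for entry in index if entry[0].startswith(prefix))
    return [s for _, s, _ in heapq.nlargest(limit, hits, key=lambda e: e[2])]


index = build_weighted_index(TRIPLES)
print(complete_by_counting(TRIPLES, "par", limit=2))
print(complete_from_index(index, "par", limit=2))

# Add triples about ex:Parma without touching its label.
more = TRIPLES + [("ex:Parma", "ex:p%d" % i, "ex:o") for i in range(5)]
print(complete_by_counting(more, "par", limit=1))  # sees the new triples
print(complete_from_index(index, "par", limit=1))  # still uses the old weights
```

In this toy, the stored weights only catch up after build_weighted_index is
run again, which is why an update callback would need to be triggered by more
than label changes.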


2016-11-03 14:30 GMT+01:00 Osma Suominen <[email protected]>:

> Hi Jean-Marc!
>
>> AFAIK using the weights to order results is intimately linked to the text
>> index querying.
>> If I want the top 10 results, the search must have the weights beforehand
>> otherwise I must get all the results to filter later.
>> This is the reason for using AnalyzingInfixSuggester.
>> Lucene 4_9_1
>> https://lucene.apache.org/core/4_9_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>> Lucene 6_2_1
>> https://lucene.apache.org/core/6_2_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>>
>> I guess this is what you call "performance reasons".
>>
>
> I don't see why you couldn't, in principle, do something like this:
>
> SELECT ?s (COUNT(*) as ?count)
> WHERE {
>   ?s text:query "édu*" .
>   ?s ?p ?o .
> }
> GROUP BY ?s
> ORDER BY DESC(?count)
> LIMIT 10
>
> (note: untested query)
>
> I'm sure it will get slow if the number of hits from the text index is
> more than a few dozen. But for a small number of results at a time, it
> might work.
>
>> As I wrote in the original post, "I'll have to implement also the callback
>> for updates like class TextDocProducerTriples in Jena-text."
>> http://jena.apache.org/documentation/javadoc/text/org/apache/jena/query/text/TextDocProducerTriples.html
>>
>
> Isn't that called only when the indexed triple changes (e.g. the one with
> rdfs:label or skos:prefLabel or whatever property you are indexing), but
> not when other data related to the same subject changes? So if new triples
> are added for the same subject, but its label is unchanged, then the text
> index won't see the update and thus the count of references/triples won't
> be updated either.
>
> I may be wrong here, I'm not sure how the update tracking works.
>
> -Osma
>
>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> [email protected]
> http://www.nationallibrary.fi
>



-- 
Jean-Marc Vanel
Profile: http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
