Re: completion with Lucene: desirable from SPARQL

Osma Suominen Tue, 01 Nov 2016 06:00:32 -0700

Hi Jean-Marc,

The wildcard queries etc. are basic Lucene features, part of the Lucene query syntax, so that's probably why they are not documented on the jena-text page. The query string is simply passed to the Lucene query parser by jena-text and should support any features of Lucene, see:
http://lucene.apache.org/core/6_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description


Glad you were able to get your lookup service working!

Regarding the saving of weights: I think you could simply save them as triples (perhaps in a separate graph), outside the Lucene index. Then combine the results of the text:query with the weights from triples using SPARQL.

The jena-text query also returns score values. I'm not sure how useful they are in your use case, but they could potentially be used as a factor in the overall "notoriety" calculation. Though if you are searching just for single words or prefixes, chances are that the score values will be the same for all results.
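
A minimal sketch of such a combination, assuming the weights are stored in a named graph ex:weights under a property ex:weight (both names are made up for illustration, as is the ex: namespace), and that multiplying score and weight is an acceptable way to combine them:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ns#>   # hypothetical namespace

SELECT ?s ?label ?score ?weight
WHERE {
  # Lucene prefix search through jena-text
  (?s ?score) text:query (rdfs:label "Pari*") .
  ?s rdfs:label ?label .
  # Weights kept as plain triples, outside the Lucene index
  OPTIONAL { GRAPH ex:weights { ?s ex:weight ?weight } }
}
# Resources without a stored weight fall back to a neutral factor of 1
ORDER BY DESC(?score * COALESCE(?weight, 1))
LIMIT 10
```

If the scores turn out to be identical for all hits, this ordering is effectively by weight alone.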

Thanks for all the work on the Lucene 5 and 6 upgrade (JENA-1250)! I hope we can finish that work and get it merged soon after the 3.1.1 release. In any case the newer Lucene version should perform better and be easier to maintain moving forward.


-Osma

On 01/11/16 11:01, Jean-Marc Vanel wrote:

It's too bad that the * wildcard feature, and other details of the SPARQL to
Lucene query translation, are not documented on the Jena text search page.

Anyway, it works for my use case, I now have on my laptop a (kind of)
replacement of dbPedia lookup service.

To experiment with the original dbPedia lookup service, you can go to
semantic_forms sandbox:
http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
and type a few letters in the dct:subject field.

I don't need the full original literal value, because the URI results of
the query are labelled in the application: a foaf:Person is labelled by
given and family names, etc.

BUT, there is a "but": the results of the dbPedia lookup service are
appropriately ordered by "notoriety".
Instead, I currently get this with http://localhost:9000/lookup?q=*Pari*
on my TDB that mirrors dbPedia:

<ArrayOfResult>
         <Result>
           <Label>Université Pierre-et-Marie-Curie</Label>
           <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
         </Result><Result>
           <Label>Guillaume Le Gentil</Label>
           <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
         </Result><Result>
           <Label>1 E1 m</Label>
           <URI>http://dbpedia.org/resource/1_decametre</URI>
         </Result><Result>
           <Label>1 E4 m</Label>
           <URI>http://dbpedia.org/resource/1_myriametre</URI>
         </Result><Result>
           <Label>Nadia Boulanger</Label>
           <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
         </Result><Result>
           <Label>Luis Mariano</Label>
           <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
         </Result><Result>
           <Label>Paul Chemetov</Label>
           <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
         </Result><Result>
           <Label>Marc Boegner</Label>
           <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
         </Result><Result>
           <Label>Cassandre (graphiste)</Label>
           <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
         </Result><Result>
           <Label>La Norville</Label>
           <URI>http://dbpedia.org/resource/La_Norville</URI>
         </Result>
     </ArrayOfResult>

My understanding is that I need to set a weight on URIs in Lucene to
reflect their "notoriety".
I see 2 ways:

    1. easy to implement: just count the triples from and to the URI
    2. also take into account the URIs consulted by users in my
    application (but currently I don't record that information); there is
    also the issue of combining weights 1) and 2)
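
Option 1 could be sketched as a SPARQL Update like the one below. The graph name ex:weights and the property ex:weight are hypothetical, and on a full dbPedia mirror the aggregation may be expensive, so treat this as an illustration rather than a tuned solution:

```sparql
PREFIX ex: <http://example.org/ns#>   # hypothetical namespace

# Store, for each URI, the number of triples it appears in
# as subject or object, in a separate graph
INSERT { GRAPH ex:weights { ?uri ex:weight ?n } }
WHERE {
  {
    SELECT ?uri (COUNT(*) AS ?n)
    WHERE {
      { ?uri ?p ?o }      # triples from the URI
      UNION
      { ?s ?p ?uri }      # triples to the URI
      FILTER(isIRI(?uri)) # skip literals and blank nodes
    }
    GROUP BY ?uri
  }
}
```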

Google search does both weightings.

So, in the short term I have to figure out how to add weights to the Lucene
- Jena index.

Then I have to read what dbPedia lookup does, and other background material.



2016-10-31 16:42 GMT+01:00 Osma Suominen <[email protected]>:

Hi Jean-Marc,

Depending on what exactly you want from such a service, this may be
already possible with jena-text.

I'm assuming that you want to perform a prefix search such as "édu*" and
get possible completions for that, such as "éducation".

You can of course already do a prefix search with jena-text. What you will
get back will be the RDF resources which have labels that contain this
prefix. If the text index is configured to store literal values, you can
ask for the actual values as well.

E.g. with this data:

ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .

and a suitably configured jena-text index, you can perform this query:

(?s ?score ?literal) text:query (rdfs:label "édu*") .

and get back these bindings:

?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr

However, you will get the full original literal value, not just the
individual word that matched ("éducation"). If you want just the matched
word, you will need special support that jena-text doesn't currently have.
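
For completeness, the pattern above could be embedded in a full query along these lines. The ex: namespace is only an assumption for the example data, and ?literal is bound only when the index is configured to store literal values:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ns#>   # hypothetical namespace for ex:cse

SELECT ?s ?score ?literal
WHERE {
  # Prefix search on rdfs:label; returns the whole matching literal
  (?s ?score ?literal) text:query (rdfs:label "édu*") .
}
ORDER BY DESC(?score)
```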

-Osma

On 17/10/16 11:37, Jean-Marc Vanel wrote:

Hi

I'm implementing an equivalent of the dbPedia lookup service [1] in
semantic_forms, leveraging the Lucene integration in TDB and a dbPedia
mirror with TDB [2].

The dbPedia lookup service is really nice but:

     - the hosted service is often down
     - completion is in English only

A lookup service with TDB and Lucene would overcome these 2 problems.

So I would need completion with Lucene from SPARQL.
According to the Jena documentation, this does not seem to be implemented:
https://jena.apache.org/documentation/query/text-query.html#query-with-sparql

Searching for "lucene completion" turns up plenty of pages.
Among them there is a code snippet here:
http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
but a regular Lucene API may exist.

[1] https://github.com/dbpedia/lookup
[2] https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi


