Hi Jean-Marc,

The wildcard queries etc. are basic Lucene features, part of the Lucene query syntax, which is probably why they are not documented on the jena-text page. jena-text simply passes the query string to the Lucene query parser, so any feature of that syntax should be supported; see: http://lucene.apache.org/core/6_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description

Glad you were able to get your lookup service working!

Regarding the saving of weights: I think you could simply save them as triples (perhaps in a separate graph), outside the Lucene index. Then combine the results of the text:query with the weights from triples using SPARQL.

The jena-text query also returns score values. I'm not sure how useful they are in your use case, but they could potentially be used as a factor in the overall "notoriety" calculation. Though if you are searching just for single words or prefixes, chances are that the score values will be the same for all results.
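
As a rough sketch of that combination, assuming the weights are stored as ex:weight triples in a named graph ex:weights (both names are made up for illustration, they are not part of jena-text), something like this could rank matches by stored weight first and by Lucene score second:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ns#>   # hypothetical namespace

SELECT ?s ?score ?weight
WHERE {
  # Lucene lookup through jena-text; ?score is the Lucene score
  (?s ?score) text:query (rdfs:label "Pari*") .
  # Weight triples kept outside the Lucene index (assumed layout)
  OPTIONAL { GRAPH ex:weights { ?s ex:weight ?weight } }
}
ORDER BY DESC(COALESCE(?weight, 0)) DESC(?score)
LIMIT 10
```

Resources without a stored weight fall back to 0 and sort last. If the scores turn out to vary enough to be useful, the two values could instead be combined into a single ordering expression.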

Thanks for all the work on the Lucene 5 and 6 upgrade (JENA-1250)! I hope we can finish that work and get it merged soon after the 3.1.1 release. In any case the newer Lucene version should perform better and be easier to maintain moving forward.

-Osma

On 01/11/16 11:01, Jean-Marc Vanel wrote:
It's too bad that the * wildcard feature, and other details of the SPARQL to
Lucene query translation, are not documented on the Jena text search page.

Anyway, it works for my use case: I now have on my laptop a (kind of)
replacement for the dbPedia lookup service.

To experiment with the original dbPedia lookup service, you can go to
the semantic_forms sandbox:
http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
and type a few letters in the dct:subject field.

I don't need the full original literal value, because the URI results of
the query are labelled in the application: a foaf:Person is labelled by
given and family names, etc.

BUT, there is a "but": the results of the dbPedia lookup service are
appropriately ordered by "notoriety".
Instead, this is what I currently get with http://localhost:9000/lookup?q=*Pari*
on my TDB that mirrors dbPedia:

<ArrayOfResult>
         <Result>
           <Label>Université Pierre-et-Marie-Curie</Label>
           <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
         </Result><Result>
           <Label>Guillaume Le Gentil</Label>
           <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
         </Result><Result>
           <Label>1 E1 m</Label>
           <URI>http://dbpedia.org/resource/1_decametre</URI>
         </Result><Result>
           <Label>1 E4 m</Label>
           <URI>http://dbpedia.org/resource/1_myriametre</URI>
         </Result><Result>
           <Label>Nadia Boulanger</Label>
           <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
         </Result><Result>
           <Label>Luis Mariano</Label>
           <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
         </Result><Result>
           <Label>Paul Chemetov</Label>
           <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
         </Result><Result>
           <Label>Marc Boegner</Label>
           <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
         </Result><Result>
           <Label>Cassandre (graphiste)</Label>
           <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
         </Result><Result>
           <Label>La Norville</Label>
           <URI>http://dbpedia.org/resource/La_Norville</URI>
         </Result>
     </ArrayOfResult>

My understanding is that I need to set a weight on URIs in Lucene to
reflect their "notoriety".
I see 2 ways:

    1. easy to implement: just count the triples from and to the URI
    2. also take into account the URIs consulted by users in my
    application (but currently I don't record that information); there is
    also the issue of combining weights 1) and 2)
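
Option 1 could be sketched as a SPARQL update along these lines. This is only an illustration: the ex: namespace, the ex:weight property and the ex:weights graph are hypothetical names, and on a full dbPedia mirror the count may be expensive to run in one go.

```sparql
PREFIX ex: <http://example.org/ns#>   # hypothetical namespace

# Store, for each URI, the number of triples from and to it
INSERT { GRAPH ex:weights { ?uri ex:weight ?degree } }
WHERE {
  {
    SELECT ?uri (COUNT(*) AS ?degree)
    WHERE {
      { ?uri ?p ?o }          # triples going out from the URI
      UNION
      { ?s ?p ?uri }          # triples pointing to the URI
      FILTER(isIRI(?uri))     # skip literals and blank nodes
    }
    GROUP BY ?uri
  }
}
```

The resulting weight triples live outside the Lucene index and can be joined with the text search results at query time.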

Google search does both weightings.

So, in the short term I have to figure out how to add weights to the
Lucene/Jena index.

Then I have to read what dbPedia lookup does, and other background material.



2016-10-31 16:42 GMT+01:00 Osma Suominen <[email protected]>:

Hi Jean-Marc,

Depending on what exactly you want from such a service, this may be
already possible with jena-text.

I'm assuming that you want to perform a prefix search such as "édu*" and
get possible completions for that, such as "éducation".

You can of course already do a prefix search with jena-text. What you will
get back will be the RDF resources which have labels that contain this
prefix. If the text index is configured to store literal values, you can
ask for the actual values as well.

E.g. with this data:

ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .

and a suitably configured jena-text index, you can perform this query:

(?s ?score ?literal) text:query (rdfs:label "édu*") .

and get back these bindings:

?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr

However, you will get the full original literal value, not just the
individual word that matched ("éducation"). If you want just the matched
word, you will need special support that jena-text doesn't currently have.

-Osma

On 17/10/16 11:37, Jean-Marc Vanel wrote:

Hi

I'm implementing an equivalent of the dbPedia lookup service [1] in
semantic_forms, leveraging the Lucene integration in TDB and a dbPedia
mirror with TDB [2].

The dbPedia lookup service is really nice but:

     - the hosted service is often down
     - completion is in English only

A lookup service with TDB and Lucene would overcome these 2 problems.

So I would need completion with Lucene from SPARQL.
According to the Jena doc., this does not seem to be implemented:
https://jena.apache.org/documentation/query/text-query.html#query-with-sparql

There are plenty of pages when searching for
lucene completion

Among these pages, there is a code snippet here:
http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
but a regular Lucene API may exist.

[1] https://github.com/dbpedia/lookup
[2] https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia



--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi





