Re: Puzzled with Jena text from API: well indexed in Lucene, but SPARQL returns nothing

Jean-Marc Vanel Sun, 30 Jul 2017 03:05:54 -0700

I modified the test according to your request:
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TestTextIndex2.scala


and here is the result:

[info] Doc: 0
[info]   1 stored,indexed,tokenized,indexOptions=DOCS<uri:test:/test1>
[info]   uri = test:/test1
[info]   2 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6>
[info]   uid = 19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6
[info]   3 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169>
[info]   uid = 59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169
[info] search "test1"
[info] Doc: 0
[info]   1 stored,indexed,tokenized,indexOptions=DOCS<uri:test:/test1>
[info]   uri = test:/test1
[info]   2 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6>
[info]   uid = 19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6
[info]   3 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169>
[info]   uid = 59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169
[info] sparql Query
[info]     PREFIX text: <http://jena.apache.org/text#>
[info]     SELECT * WHERE {
[info]       graph ?g {
[info]         ?thing ?p ?o .
[info]       }
[info]     }
[info]
[info] --------------------------------------------------------------------------------------------------------------
[info] | thing         | p                                            | o                 | g                       |
[info] ==============================================================================================================
[info] | <test:/test1> | <http://www.w3.org/2000/01/rdf-schema#label> | "test-extra-data" | <test:/test-extra-data> |
[info] | <test:/test1> | <http://www.w3.org/2000/01/rdf-schema#label> | "test1"           | <test:/test1>           |
[info] | <test:/test1> | <http://xmlns.com/foaf/0.1/givenName>        | "test1"           | <test:/test1>           |
[info] --------------------------------------------------------------------------------------------------------------
[info] sparql Query
[info]     PREFIX text: <http://jena.apache.org/text#>
[info]     SELECT * WHERE {
[info]       graph ?g {
[info]         ?thing text:query 'test1' .
[info]         ?thing ?p ?o .
[info]       }
[info]     }
[info]
[info] ---------------------
[info] | thing | p | o | g |
[info] =====================
[info] ---------------------
[info] tdb.tdbdump (after dataset.close() )
[info] <test:/test1> <http://www.w3.org/2000/01/rdf-schema#label> "test-extra-data" <test:/test-extra-data> .
[info] <test:/test1> <http://www.w3.org/2000/01/rdf-schema#label> "test1" <test:/test1> .
[info] <test:/test1> <http://xmlns.com/foaf/0.1/givenName> "test1" <test:/test1> .
[success] Total time: 4 s, completed 30 juil. 2017 10:39:11

I can help you compile and run the test in Scala, or even translate it
to Java, or help in any other way :)
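
To check quickly which stored fields each Lucene document in such a dump carries (for instance, whether a "graph" field ever shows up), the log can be grouped by document. This is only a minimal sketch: it assumes the "[info]   name = value" line format printed by the test above, and LuceneDumpFields, fieldsPerDoc and sample are hypothetical names that belong neither to Jena nor to TestTextIndex2.scala.

```scala
// Hypothetical helper, not part of Jena or of the test:
// groups the "name = value" lines of a Lucene dump by document number.
object LuceneDumpFields {
  private val DocLine   = """\[info\] Doc: (\d+)""".r
  // "=\s*" also accepts a wrapped line such as "[info]   uid =" (value on the next line)
  private val FieldLine = """\[info\]\s+(\w+) =\s*(.*)""".r

  /** For each document number, the stored field names in dump order.
    * If a document number appears twice, the last occurrence wins. */
  def fieldsPerDoc(lines: Seq[String]): Map[Int, Seq[String]] = {
    var current = -1
    val acc = scala.collection.mutable.LinkedHashMap.empty[Int, Vector[String]]
    for (line <- lines) line.trim match {
      case DocLine(n) =>
        current = n.toInt
        acc(current) = Vector.empty
      case FieldLine(name, _) if current >= 0 =>
        acc(current) = acc(current) :+ name
      case _ => () // flag lines, queries, result tables: ignored
    }
    acc.toMap
  }

  // Sample lines copied from the dump above
  val sample: Seq[String] = Seq(
    "[info] Doc: 0",
    "[info]   1 stored,indexed,tokenized,indexOptions=DOCS<uri:test:/test1>",
    "[info]   uri = test:/test1",
    "[info]   uid = 19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6"
  )

  def main(args: Array[String]): Unit =
    for ((doc, fields) <- fieldsPerDoc(sample))
      println(s"Doc $doc: fields=${fields.mkString(",")} graph=${fields.contains("graph")}")
}
```

The sketch only reads the log text; it says nothing about why a field is absent from the index.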



2017-07-29 19:04 GMT+02:00 Andy Seaborne <a...@apache.org>:

>
> On 29/07/17 09:54, Jean-Marc Vanel wrote:
>
>> The self-contained test with no semantic_forms nor Banana dependency, that
>> reproduces the scenario by the API:
>> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TestTextIndex2.scala
>>
>> now FAILS!
>>
>> Jena problem:
>>      when adding first a named graph with no relevant data,
>>      and second the named graph with relevant data,
>>      the SPARQL query with text:query FAILS.
>>
>> It looks as if only the first named graph is used in SPARQL processing.
>>
>>
> Could you please try a query that focuses on that point:
>
> SELECT * { GRAPH ?g { ?s ?p ?o } }
>
>     Andy
>
>> It is a regression after the migration to a recent Lucene version.
>> It looks as if nobody tested Jena + Lucene with several named graphs ...
>>
>>
>>
>> 2017-07-28 14:50 GMT+02:00 Jean-Marc Vanel <jeanmarc.va...@gmail.com>:
>>
>>> Forgot to say that I'm using Jena 3.3.0 on Ubuntu 17.04, and
>>> java -version
>>> java version "1.8.0_121"
>>> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
>>>
>>> The semantic_forms sandbox is up-to-date with the source code and the
>>> scenario above:
>>> http://semantic-forms.cc:9111/
>>>
>>>
>>>
>>>
>>> 2017-07-28 13:14 GMT+02:00 Jean-Marc Vanel <jeanmarc.va...@gmail.com>:
>>>
>>>> Hi
>>>>
>>>> I've checked lots of things for 2 days.
>>>>
>>>> I have this scenario in semantic_forms:
>>>>
>>>>      - on fresh TDB and LUCENE directories
>>>>      - load rdfs: (the ontology)
>>>>      - create instance of class bli:bli (sic !)
>>>>      - enter rdfs:comment bli
>>>>      - search bli => NOTHING !!! :(
>>>>
>>>> I wrote a self-contained test with no semantic_forms nor Banana
>>>> dependency, that reproduces the same scenario through the API:
>>>> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TestTextIndex2.scala
>>>>
>>>> But it succeeds !!!
>>>>
>>>> So I wrote another test that runs on the TDB that was prepared in the
>>>> above scenario in semantic_forms:
>>>> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/QueryTextIndex.scala
>>>>
>>>> The indexing seems normal on Lucene + Jena side, but NOT the SPARQL
>>>> search with text:query .
>>>>
>>>> runMain deductions.runtime.jena.lucene.QueryTextIndex bli TDB
>>>> ...
>>>> [info] search with Lucene: bli
>>>> [info] Doc: 30
>>>> [info]   1 stored,indexed,tokenized,indexOptions=DOCS<uri:http://localhost:9000/ldp/1501237821055-8217451390491>
>>>> [info]   uri = http://localhost:9000/ldp/1501237821055-8217451390491
>>>> [info]   2 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<lang:fr>
>>>> [info]   lang = fr
>>>> [info]   3 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:f1e70540a1cd751b78e29b31b4ae57c5520b71a728f8e1c7b24c698e8cd85e83>
>>>> [info]   uid = f1e70540a1cd751b78e29b31b4ae57c5520b71a728f8e1c7b24c698e8cd85e83
>>>> [info]   4 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<lang:fr>
>>>> [info]   lang = fr
>>>> [info]   5 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:435b1578a796765c441ad43a9147e1952abbc44facfa5aebab3d6cb67e98f844>
>>>> [info]   uid = 435b1578a796765c441ad43a9147e1952abbc44facfa5aebab3d6cb67e98f844
>>>> [info] query
>>>> [info]     PREFIX text: <http://jena.apache.org/text#>
>>>> [info]     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>>> [info]     SELECT * WHERE {
>>>> [info]     graph ?g {
>>>> [info]     # ?thing text:query (rdfs:label  "bli" ) .
>>>> [info]     ?thing text:query 'bli' .
>>>> [info]     ?thing ?p ?o .
>>>> [info]   }
>>>> [info] } LIMIT 22
>>>> [info]
>>>> [info] ---------------------
>>>> [info] | thing | p | o | g |
>>>> [info] =====================
>>>> [info] ---------------------
>>>>
>>>> The URI in the Lucene dump is correct. I'm surprised that the field "lang"
>>>> appears twice, and "graph" not at all.
>>>>
>>>> I've looked in the Jena code, and the member "fields" in EntityDefinition
>>>> https://github.com/apache/jena/blob/master/jena-text/src/main/java/org/apache/jena/query/text/EntityDefinition.java#L39
>>>> looks as if it is not always updated.
>>>> "fields" is initialized once from fieldToPredicate, and I'm not sure that
>>>> fieldToPredicate is initialized before that;
>>>> moreover, it is modified by the method
>>>> void set(String field, Node predicate)
>>>> https://github.com/apache/jena/blob/master/jena-text/src/main/java/org/apache/jena/query/text/EntityDefinition.java#L126
>>>>
>>>> --
>>>> Jean-Marc Vanel
>>>> http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
>>>> Déductions SARL - Consulting, services, training,
>>>> Rule-based programming, Semantic Web
>>>> +33 (0)6 89 16 29 52
>>>> Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
>>>>
>>>>


-- 
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
