I have put the Scala test in a separate folder, so that it is easy to run
for a non-Scala user:
https://github.com/jmvanel/semantic_forms/tree/master/scala/jena_only
Hope it helps.
Note that commenting out some of these settings causes an exception:
entMap.setLangField("lang")
entMap.setUidField("uid")
entMap.setGraphField("graph")
2017-07-30 12:05 GMT+02:00 Jean-Marc Vanel <[email protected]>:
> I modified the test according to your request:
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TestTextIndex2.scala
>
> and here is the result:
>
> [info] Doc: 0
> [info] 1 stored,indexed,tokenized,indexOptions=DOCS<uri:test:/test1>
> [info] uri = test:/test1
> [info] 2 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6>
> [info] uid = 19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6
> [info] 3 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169>
> [info] uid = 59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169
> [info] search "test1"
> [info] Doc: 0
> [info] 1 stored,indexed,tokenized,indexOptions=DOCS<uri:test:/test1>
> [info] uri = test:/test1
> [info] 2 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6>
> [info] uid = 19d3a93327bdf0b91b03170dceb6e012423dece6a8a9e0ec48f098e8f742a5f6
> [info] 3 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169>
> [info] uid = 59074d3e3a183c6fd25f3ee84b2603dcbd9de496fbaca72d4f42093bca3ad169
> [info] sparql Query
> [info] PREFIX text: <http://jena.apache.org/text#>
> [info] SELECT * WHERE {
> [info] graph ?g {
> [info] ?thing ?p ?o .
> [info] }
> [info] }
> [info]
> [info] --------------------------------------------------------------------------------------------------------------
> [info] | thing         | p                                            | o                 | g                       |
> [info] ==============================================================================================================
> [info] | <test:/test1> | <http://www.w3.org/2000/01/rdf-schema#label> | "test-extra-data" | <test:/test-extra-data> |
> [info] | <test:/test1> | <http://www.w3.org/2000/01/rdf-schema#label> | "test1"           | <test:/test1>           |
> [info] | <test:/test1> | <http://xmlns.com/foaf/0.1/givenName>        | "test1"           | <test:/test1>           |
> [info] --------------------------------------------------------------------------------------------------------------
> [info] sparql Query
> [info] PREFIX text: <http://jena.apache.org/text#>
> [info] SELECT * WHERE {
> [info] graph ?g {
> [info] ?thing text:query 'test1' .
> [info] ?thing ?p ?o .
> [info] }
> [info] }
> [info]
> [info] ---------------------
> [info] | thing | p | o | g |
> [info] =====================
> [info] ---------------------
> [info] tdb.tdbdump (after dataset.close() )
> [info] <test:/test1> <http://www.w3.org/2000/01/rdf-schema#label> "test-extra-data" <test:/test-extra-data> .
> [info] <test:/test1> <http://www.w3.org/2000/01/rdf-schema#label> "test1" <test:/test1> .
> [info] <test:/test1> <http://xmlns.com/foaf/0.1/givenName> "test1" <test:/test1> .
> [success] Total time: 4 s, completed 30 juil. 2017 10:39:11
>
> I can help you compile and run the test in Scala, or even translate
> it into Java,
> or help in any other way :)
>
>
>
> 2017-07-29 19:04 GMT+02:00 Andy Seaborne <[email protected]>:
>
>>
>> On 29/07/17 09:54, Jean-Marc Vanel wrote:
>>
>>> The self-contained test, with no semantic_forms or Banana dependency,
>>> which reproduces the scenario through the API:
>>> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TestTextIndex2.scala
>>>
>>> now FAILS!
>>>
>>> Jena problem:
>>> when adding first a named graph with no relevant data,
>>> and second the named graph with relevant data,
>>> the SPARQL query with text:query FAILS.
>>>
>>> It looks as if only the first named graph is used in SPARQL processing.
>>>
>>>
>> Could you please try a query that focuses on that point:
>>
>> SELECT * { GRAPH ?g { ?s ?p ?o } }
>>
>> Andy
>>
>>> It is a regression after the migration to a recent Lucene version.
>>> It looks as if nobody tested Jena + Lucene with several named graphs ...
>>>
>>>
>>>
>>> 2017-07-28 14:50 GMT+02:00 Jean-Marc Vanel <[email protected]>:
>>>
>>>> Forgot to say that I'm using Jena 3.3.0 on Ubuntu 17.04, and
>>>> java -version
>>>> java version "1.8.0_121"
>>>> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
>>>>
>>>> The semantic_forms sandbox is up-to-date with the source code and the
>>>> scenario above:
>>>> http://semantic-forms.cc:9111/
>>>>
>>>>
>>>>
>>>>
>>>> 2017-07-28 13:14 GMT+02:00 Jean-Marc Vanel <[email protected]>:
>>>>
>>>>> Hi
>>>>>
>>>>> I've checked lots of things for 2 days.
>>>>>
>>>>> I have this scenario in semantic_forms:
>>>>>
>>>>> - on fresh TDB and LUCENE directories
>>>>> - load rdfs: (the ontology)
>>>>> - create instance of class bli:bli (sic !)
>>>>> - enter rdfs:comment bli
>>>>> - search bli => NOTHING !!! :(
>>>>>
>>>>> I wrote a self-contained test, with no semantic_forms or Banana
>>>>> dependency, that reproduces the same scenario through the API:
>>>>> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TestTextIndex2.scala
>>>>>
>>>>> But it succeeds!!!
>>>>>
>>>>> So I wrote another test that runs on the TDB that was prepared in the
>>>>> above scenario in semantic_forms:
>>>>> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/QueryTextIndex.scala
>>>>>
>>>>> The indexing seems normal on Lucene + Jena side, but NOT the SPARQL
>>>>> search with text:query .
>>>>>
>>>>> runMain deductions.runtime.jena.lucene.QueryTextIndex bli TDB
>>>>> ...
>>>>> [info] search with Lucene: bli
>>>>> [info] Doc: 30
>>>>> [info] 1 stored,indexed,tokenized,indexOptions=DOCS<uri:http://localhost:9000/ldp/1501237821055-8217451390491>
>>>>> [info] uri = http://localhost:9000/ldp/1501237821055-8217451390491
>>>>> [info] 2 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<lang:fr>
>>>>> [info] lang = fr
>>>>> [info] 3 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:f1e70540a1cd751b78e29b31b4ae57c5520b71a728f8e1c7b24c698e8cd85e83>
>>>>> [info] uid = f1e70540a1cd751b78e29b31b4ae57c5520b71a728f8e1c7b24c698e8cd85e83
>>>>> [info] 4 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<lang:fr>
>>>>> [info] lang = fr
>>>>> [info] 5 stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:435b1578a796765c441ad43a9147e1952abbc44facfa5aebab3d6cb67e98f844>
>>>>> [info] uid = 435b1578a796765c441ad43a9147e1952abbc44facfa5aebab3d6cb67e98f844
>>>>> [info] query
>>>>> [info] PREFIX text: <http://jena.apache.org/text#>
>>>>> [info] PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>>>> [info] SELECT * WHERE {
>>>>> [info] graph ?g {
>>>>> [info] # ?thing text:query (rdfs:label "bli" ) .
>>>>> [info] ?thing text:query 'bli' .
>>>>> [info] ?thing ?p ?o .
>>>>> [info] }
>>>>> [info] } LIMIT 22
>>>>> [info]
>>>>> [info] ---------------------
>>>>> [info] | thing | p | o | g |
>>>>> [info] =====================
>>>>> [info] ---------------------
>>>>>
>>>>> The URI in the Lucene dump is correct. I'm surprised that the field "lang"
>>>>> appears twice, and "graph" not at all.
>>>>>
>>>>> I've looked at the Jena code, and the member "fields" in EntityDefinition
>>>>> https://github.com/apache/jena/blob/master/jena-text/src/main/java/org/apache/jena/query/text/EntityDefinition.java#L39
>>>>> looks as if it is not always updated.
>>>>> "fields" is initialized once from fieldToPredicate, and I'm not sure that
>>>>> fieldToPredicate is initialized before it;
>>>>> moreover, fieldToPredicate is modified by the method
>>>>> void set(String field, Node predicate)
>>>>> https://github.com/apache/jena/blob/master/jena-text/src/main/java/org/apache/jena/query/text/EntityDefinition.java#L126
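On the EntityDefinition question above, here is a minimal sketch in plain Java, with no Jena dependency, of how a collection built once from a map's key set behaves. It assumes that "fields" is an unmodifiable wrapper around the key view of fieldToPredicate; I have not re-checked that against the exact Jena source. The FieldRegistry class and its String-valued map are hypothetical stand-ins, not Jena code. Under that assumption the wrapper is a live view, so later calls to set() stay visible through it, and Java runs field initializers in declaration order, so the map exists before the view is created.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class LiveKeyViewDemo {

    /** Hypothetical stand-in for the pattern discussed above; not Jena's class. */
    static final class FieldRegistry {
        // Declared first, so it is initialized first (initializers run in textual order).
        private final Map<String, String> fieldToPredicate = new HashMap<>();

        // Built once, but keySet() is a view of the map and the wrapper only
        // forbids modification through itself; it does not take a snapshot.
        private final Collection<String> fields =
                Collections.unmodifiableCollection(fieldToPredicate.keySet());

        void set(String field, String predicate) {
            fieldToPredicate.put(field, predicate);
        }

        Collection<String> fields() {
            return fields;
        }
    }

    public static void main(String[] args) {
        FieldRegistry reg = new FieldRegistry();
        System.out.println(reg.fields().size());              // prints 0

        reg.set("label", "rdfs:label");
        reg.set("comment", "rdfs:comment");

        // The view created in the field initializer reflects the later puts.
        System.out.println(reg.fields().contains("comment")); // prints true
    }
}
```

If that assumption holds, the one-time initialization of "fields" would not by itself explain the empty text:query result.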
>>>>>
>>>>> --
>>>>> Jean-Marc Vanel
>>>>> http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
>>>>> Déductions SARL - Consulting, services, training,
>>>>> Rule-based programming, Semantic Web
>>>>> +33 (0)6 89 16 29 52
>>>>> Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
--
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui