Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Yang Fri, 05 Jun 2015 01:55:35 -0700

Hi Andy,

Sorry for the late response; we were busy with another project during this 
period. Below I explain step by step how I reproduce the error. I sent an 
email to the mailing list yesterday, but it never showed up, so I am trying 
again today. Apologies for any duplicates.


In short, something is wrong with the text indexing of in-memory datasets.
Here is the configuration file I used. It should be minimal enough: a server 
description, a service description, and a Lucene text index attached to the 
dataset that indexes rdfs:label.

> @prefix :       <#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:   <http://jena.apache.org/text#> .
> @prefix spatial: <http://jena.apache.org/spatial#> .
>
> [] a fuseki:Server ;
>    fuseki:services (
>      <#memory>
>    ) .
>
> <#memory> a fuseki:Service ;
>     fuseki:name                       "memory" ;
>     fuseki:serviceQuery               "sparql" ;
>     fuseki:serviceQuery               "query" ;
>     fuseki:serviceUpdate              "update" ;   # SPARQL update service -- /memory/update
>     fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
>     fuseki:serviceReadWriteGraphStore "data" ;
>     fuseki:serviceReadGraphStore      "get" ;      # Graph store protocol (read only) -- /memory/get
>     fuseki:dataset                    :text_dataset ;
>     .
>
> <#dataset> rdf:type ja:RDFDataset ;
>     ja:defaultGraph
>       [
>         a ja:MemoryModel ;
>       ] .
>
> # Text
> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
> text:TextDataset     rdfs:subClassOf ja:RDFDataset .
> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
>
> :text_dataset a text:TextDataset ;
>     text:dataset <#dataset> ;
>     text:index   <#textIndexLucene> ;
>     .
>
> # Text index description
> <#textIndexLucene> a text:TextIndexLucene ;
>     text:directory <file:Lucene> ;
>     ##text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     .
>
> <#entMap> a text:EntityMap ;
>     text:entityField  "uri" ;
>     text:defaultField "text" ;
>     text:map (
>          [ text:field "text" ; text:predicate rdfs:label ]
>          ) .

The server is started with
> "./fuseki-server --config=config-memory-text.ttl"

and the console reports that it started properly:
> [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
> [2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
> [2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
> [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
> [2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
> [2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
> [2015-06-03 12:13:10] Builder INFO Service: :memory
> [2015-06-03 12:13:11] Config INFO Register: /memory
> [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030

I tested two versions: the official 2.0.0 release and the latest snapshot, 
2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The results are as follows:

In 2.0.0:
If I load triples that do not contain rdfs:label, everything works, but in 
that case the text index is never exercised. As soon as the file I load into 
Fuseki contains even one rdfs:label triple, this error appears:
> [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)
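To make the failing load step scriptable, here is a minimal sketch in Python (standard library only) that builds a POST of a one-triple Turtle document containing rdfs:label. It assumes the server above is running on localhost:3030 and sends the data to the read-write Graph Store endpoint "data" with the ?default graph parameter, not to the "upload" service the Web UI uses, so it approximates the failing step rather than replaying it exactly. The subject URI http://example.org/item1 and the helper names build_upload_request and send are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: Fuseki on localhost:3030 with the "memory" service from the
# configuration above, exposing fuseki:serviceReadWriteGraphStore "data".
FUSEKI = "http://localhost:3030"

# A single rdfs:label triple is enough to exercise the text index.
# The subject URI is hypothetical.
SAMPLE_TTL = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/item1> rdfs:label "licence test" .
"""


def build_upload_request(dataset: str, turtle: str) -> urllib.request.Request:
    """Build (but do not send) a Graph Store POST into the default graph."""
    url = f"{FUSEKI}/{dataset}/data?default"
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="POST",
    )


def send(req: urllib.request.Request) -> int:
    """Send the request to a running server and return the HTTP status."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A server-side failure such as the 500 in the log surfaces here.
        return err.code
```

Calling send(build_upload_request("memory", SAMPLE_TTL)) against a running server should then show whether even this single labelled triple triggers a 500.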

I remember that a few months ago, when 2.0.0 was first released, I ran into 
this issue and reported it to you. At that time I didn't realize that the root 
cause was the indexing. You fixed it in a later snapshot, but my test was 
flawed, so I thought the problem was solved and gave you incorrect feedback. 
My sincere apologies.

In 2.0.1 SNAPSHOT:
The latest snapshot contains the patch mentioned above, so the files load 
successfully. However, the triples are not indexed at all: keyword search 
queries return no results. Following your advice, I tested loading and 
querying from both the Web UI and the s-post/s-query tools; unfortunately (or 
fortunately?) the results are the same.
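A direct way to check whether the index was populated is a keyword query using jena-text's text:query property function against the "query" endpoint. The sketch below only builds the GET URL for such a query in Python; the /memory/query path comes from the configuration above, while the search term and the helper names build_text_query_url and count_hits are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Fuseki on localhost:3030, service "memory", query endpoint "query".
FUSEKI = "http://localhost:3030"

# text:query (rdfs:label "...") searches the Lucene field mapped to rdfs:label
# in the entity map; the second pattern fetches the stored label itself.
TEXT_QUERY = """\
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE {
  ?s text:query (rdfs:label "%s") ;
     rdfs:label ?label .
}
"""


def build_text_query_url(dataset: str, term: str) -> str:
    """Return a SPARQL protocol GET URL for a keyword search on rdfs:label."""
    query = TEXT_QUERY % term.replace('"', '\\"')  # escape quotes in the term
    return f"{FUSEKI}/{dataset}/query?" + urllib.parse.urlencode({"query": query})


def count_hits(url: str) -> int:
    """Run the query on a live server and count result rows (JSON results)."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)
    return len(results["results"]["bindings"])
```

If count_hits(build_text_query_url("memory", "licence")) returns 0 after loading data whose labels contain that word, the index was not updated; with the TDB configuration below, the same query does return hits.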

TDB:
For comparison, I ran the same experiment on Fuseki with TDB in both 2.0.0 and 
2.0.1-SNAPSHOT, and both work properly: loading succeeds and queries return 
search results. The only difference in the configuration file is that the 
in-memory dataset is replaced with TDB:
> @prefix :       <#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:   <http://jena.apache.org/text#> .
>
> [] rdf:type fuseki:Server ;
>    fuseki:services (
>      <#service_text_tdb>
>    ) .
>
> # TDB
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
> tdb:GraphTDB   rdfs:subClassOf ja:Model .
>
> # Text
> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
> text:TextDataset     rdfs:subClassOf ja:RDFDataset .
> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
>
> <#service_text_tdb> a fuseki:Service ;
>     rdfs:label                        "TDB/text service" ;
>     fuseki:name                       "tdb" ;
>     fuseki:serviceQuery               "query" ;
>     fuseki:serviceQuery               "sparql" ;
>     fuseki:serviceUpdate              "update" ;
>     fuseki:serviceUpload              "upload" ;
>     fuseki:serviceReadGraphStore      "get" ;
>     fuseki:serviceReadWriteGraphStore "data" ;
>     fuseki:dataset                    <#text_dataset> ;
>     .
>
> <#text_dataset> a text:TextDataset ;
>     text:dataset <#dataset> ;
>     text:index   <#indexLucene> ;
>     .
>
> <#dataset> a tdb:DatasetTDB ;
>     tdb:location "DB" ;
>     ##tdb:unionDefaultGraph true ;
>     .
>
> <#indexLucene> a text:TextIndexLucene ;
>     text:directory <file:Lucene> ;
>     ##text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     .
>
> <#entMap> a text:EntityMap ;
>     text:entityField  "uri" ;
>     text:defaultField "text" ;
>     text:map (
>          [ text:field "text" ; text:predicate rdfs:label ]
>          ) .

Do you have any advice on this? Thank you very much in advance for your efforts.

Regards,
Yang

PS: I noticed that there is also a 2.3.0 SNAPSHOT. I planned to test it as 
well, but I wasn't able to run it.

On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:
> Hi Andy, 
> 
> Thank you very much for your reply. 
> 
> In fact the problem is unrelated to the preloaded triples. It fails whether 
> we start with an empty dataset or a preloaded one. Moreover, it takes around 
> 1 minute to load 38k triples, while TDB needs only 6 seconds. If we turn off 
> text search for the in-memory dataset, loading takes only about 1 second. 
> That's why I thought the problem is on the Fuseki side. 
> 
> As for TDB with reasoning, I don't agree that the dataset is not attached 
> to a text index. We have defined the dataset: 
>> <#tdb_inf_ds> a ja:RDFDataset ; 
>>     ja:defaultGraph       <#tdb_inf> ; 
>>     . 
> We tell Lucene to index it: 
>> :text_dataset a text:TextDataset ; 
>>     text:dataset   <#tdb_inf_ds> ; 
>>     text:index     <#textIndexLucene> ; 
>>     .
> And we assert that the dataset includes an RDFS inference model: 
>> <#tdb_inf> a ja:InfModel ; 
>>     rdfs:label "RDFS Inference Model" ; 
>>     ja:baseModel <#tdb_graph> ; 
>>     ja:reasoner 
>>          [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ] 
>>     .
> 
> With this, both text search and RDFS reasoning should work. This 
> configuration works properly in Fuseki 1.1.1, but things changed in 1.1.2 
> and 2.0.x, and I don't know how to adapt it to the new versions. 
> 
> Thank you very much for your efforts again and have a nice day. 
> 
> Regards, 
> Yang 
> 
> 
> On 04/17/2015 02:53 PM, Andy Seaborne wrote: 
>> On 14/04/15 18:51, Yang Yuanzhe wrote: 
>>> Hi there, 
>>> 
>>> Sorry to trouble you again. Last month I wrote to you about the bug in 
>>> text search for TDB. With the following configuration, text search 
>>> works with TDB: 
>>> 
>> ... 
>> 
>> Comments inline: 
>> 
>>> Now we want to use text search for in-memory datasets, but after several 
>>> attempts it still does not work. The configuration file we use is as follows: 
>>> 
>>>> @prefix :        <#> . 
>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> . 
>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> . 
>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> . 
>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> . 
>>>> @prefix text:    <http://jena.apache.org/text#> . 
>>>> @prefix spatial: <http://jena.apache.org/spatial#> . 
>>>> 
>>>> [] a fuseki:Server ; 
>>>>    fuseki:services ( 
>>>>      <#memory> 
>>>>    ) . 
>>>> 
>>>> <#memory> a fuseki:Service ; 
>>>>     fuseki:name                     "memory" ; 
>>>>     fuseki:serviceQuery             "sparql" ; 
>>>>     fuseki:serviceQuery             "query" ; 
>>>>     fuseki:serviceUpdate            "update" ;   # SPARQL update service -- /memory/update 
>>>>     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service 
>>>>     fuseki:serviceReadWriteGraphStore      "data" ; 
>>>>     fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get 
>>>>     fuseki:dataset           :text_dataset ; 
>>>>     . 
>>>> 
>>>> <#dataset> rdf:type ja:RDFDataset ; 
>>>>     ja:defaultGraph 
>>>>           [ 
>>>>             a ja:MemoryModel ; 
>>>>             ja:content [ ja:externalContent <file:dcat-vl.ttl> ] ; 
>>>>           ] . 
>> 
>> That is going to load the data each time the server starts, but it does 
>> not attach it in any way to the text index. 
>> 
>> Is it the same data as is loaded (separately) into the text index? 
>> 
>> Similarly for the inference setup (which is in a different Lucene index, 
>> file:Text) ... 
>> 
>>     Andy 
>> 
>>>> 
>>>> # Text 
>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset . 
>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex . 
>>>> 
>>>> :text_dataset a text:TextDataset ; 
>>>>     text:dataset   <#dataset> ; 
>>>>     text:index     <#textIndexLucene> ; 
>>>>     . 
>>>> 
>>>> # Text index description 
>>>> <#textIndexLucene> a text:TextIndexLucene ; 
>>>>     text:directory <file:Lucene> ; 
>>>>     ##text:directory "mem" ; 
>>>>     text:entityMap <#entMap> ; 
>>>>     . 
>>>> 
>>>> <#entMap> a text:EntityMap ; 
>>>>     text:entityField      "uri" ; 
>>>>     text:defaultField     "text" ; 
>>>>     text:map ( 
>>>>          [ text:field "text" ; text:predicate rdfs:label ] 
>>>>          ) . 
>>>> 
>> ... 
>> 
>>> 
>>> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any 
>>> clue or any suggestion for this issue? Thank you very much and have a 
>>> nice day. 
>>> 
>>> Regards, 
>>> Yang 
>>> 
>> 
>
