Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Yang Mon, 08 Jun 2015 06:42:56 -0700

Hi Andy,

Thank you very much for the suggestion. In-memory TDB dataset works properly.
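
For anyone following the thread, the change in my configuration looks roughly 
like the fragment below. It is only a sketch: it assumes the dataset is still 
named <#dataset> as in my earlier file, that the service, text dataset and 
Lucene index descriptions stay unchanged, and that the TDB class name is the 
one from the Jena 2.x line that my TDB configuration already uses.

```turtle
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:   <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Load TDB so the assembler knows the tdb: vocabulary
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .

# Replaces the ja:RDFDataset / ja:MemoryModel description.
# "--mem--" asks TDB for an in-memory (non-persistent) dataset.
<#dataset> a tdb:DatasetTDB ;
    tdb:location "--mem--" ;
    .
```

The text dataset keeps pointing at <#dataset> through text:dataset, so nothing 
else in the file should need to change.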


As for the 500 error on loading, maybe you didn't notice my explanation of it. 
It occurs on 2.0.0 only when an in-memory dataset is used with text search 
enabled. I reported this error to you in March and it was fixed later in a 
snapshot. In the latest snapshot loading works, but Lucene no longer indexes 
anything. 

Anyway, while using in-memory TDB for the moment, we are looking forward to 
your solution (or even a new release) for it. Thank you in advance for your 
efforts and have a nice day.

Regards,
Yang

PS: I am working behind some firewalls so sometimes I can't send out emails. :D


On 06/05/2015 12:32 PM, Andy Seaborne wrote:
> I've logged this as JENA-956 (with details).  The work-round is to use an 
> in-memory TDB dataset. 
> 
>      tdb:location "--mem--" ; 
> 
> 
> 
> > [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation 
> > attempts to continue): Can't abort a write lock-transaction 
> > [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms) 
> 
> You loaded the data twice I guess. 
> 
>     Andy 
> 
> 
> PS Your email address [email protected] does not always work. 
> 
> Reporting-MTA: dns; mailrelay118.isp.belgacom.be 
> 
> Final-Recipient: rfc822;[email protected] 
> Action: failed 
> Status: 5.0.0 (permanent failure) 
> Remote-MTA: dns; [91.183.52.144] 
> Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-'5.4.0 Error: 
> too many hops' (delivery attempts: 0) 
> 
> 
> 
> On 05/06/15 09:17, Yang wrote: 
>> Hi Andy, 
>> 
>> I am sorry for such a late response. We were busy with another project during 
>> this period. Let me explain step by step how I reproduce the error. I sent an 
>> email to the mailing list yesterday, but it never showed up, so I am trying 
>> again today. My apologies for any duplicates. 
>> 
>> The problem is that search indexing does not work for in-memory datasets. 
>> Here is the configuration file I used. It should be basic enough: a server 
>> description, a service description and a text index attached to the dataset 
>> that indexes "rdfs:label". 
>> 
>>> @prefix : <#> . 
>>> @prefix fuseki: <http://jena.apache.org/fuseki#> . 
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . 
>>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> . 
>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> . 
>>> @prefix text: <http://jena.apache.org/text#> . 
>>> @prefix spatial: <http://jena.apache.org/spatial#> . 
>>> 
>>> [] a fuseki:Server ; 
>>>     fuseki:services ( 
>>>       <#memory> 
>>>     ) . 
>>> 
>>> <#memory> a fuseki:Service ; 
>>>     fuseki:name "memory" ; 
>>>     fuseki:serviceQuery "sparql" ; 
>>>     fuseki:serviceQuery "query" ; 
>>>     fuseki:serviceUpdate "update" ; # SPARQL query service -- /memory/update 
>>>     fuseki:serviceUpload "upload" ; # Non-SPARQL upload service 
>>>     fuseki:serviceReadWriteGraphStore "data" ; 
>>>     fuseki:serviceReadGraphStore "get" ; # Graph store protocol (read only) -- /memory/get 
>>>     fuseki:dataset :text_dataset ; 
>>>     . 
>>> 
>>> <#dataset> rdf:type ja:RDFDataset ; 
>>>     ja:defaultGraph 
>>>       [ 
>>>         a ja:MemoryModel ; 
>>>       ] . 
>>> 
>>> # Text 
>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>> text:TextDataset rdfs:subClassOf ja:RDFDataset . 
>>> text:TextIndexLucene rdfs:subClassOf text:TextIndex . 
>>> 
>>> :text_dataset a text:TextDataset ; 
>>>     text:dataset <#dataset> ; 
>>>     text:index <#textIndexLucene> ; 
>>>     . 
>>> 
>>> # Text index description 
>>> <#textIndexLucene> a text:TextIndexLucene ; 
>>>     text:directory <file:Lucene> ; 
>>>     ##text:directory "mem" ; 
>>>     text:entityMap <#entMap> ; 
>>>     . 
>>> 
>>> <#entMap> a text:EntityMap ; 
>>>     text:entityField "uri" ; 
>>>     text:defaultField "text" ; 
>>>     text:map ( 
>>>       [ text:field "text" ; text:predicate rdfs:label ] 
>>>     ) . 
>> 
>> The server is started with 
>>> "./fuseki-server --config=config-memory-text.ttl" 
>> 
>> and console says it starts properly: 
>>> [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 
>>> 2015-05-05T12:48:09+0000 
>>> [2015-06-03 12:13:09] Config INFO 
>>> FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT 
>>> [2015-06-03 12:13:09] Config INFO 
>>> FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run 
>>> [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment 
>>> [2015-06-03 12:13:09] Config INFO Shiro file: 
>>> file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini 
>>> [2015-06-03 12:13:09] Config INFO Configuration file: 
>>> config-memory-text.ttl 
>>> [2015-06-03 12:13:10] Builder INFO Service: :memory 
>>> [2015-06-03 12:13:11] Config INFO Register: /memory 
>>> [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 
>>> 3030 
>> 
>> I tested two versions: the official release 2.0.0 and the latest 
>> snapshot 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The results are as 
>> follows: 
>> 
>> In 2.0.0: 
>> If I load triples that do not contain "rdfs:label", everything works 
>> properly, although in that case the text index is not exercised. As soon 
>> as I add a single "rdfs:label" triple to the file I am loading into Fuseki, 
>> this error appears: 
>>> [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, 
>>> Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 
>>> Triples=40 Quads=0 
>>> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation 
>>> attempts to continue): Can't abort a write lock-transaction 
>>> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms) 
>> 
>> I remember that a few months ago, when 2.0.0 was first released, I 
>> discovered this issue and reported it to you. At that time I didn't 
>> realize that the root cause was the indexing. You fixed it in a later 
>> snapshot, but my test was not thorough, so I thought the problem was solved 
>> and gave you wrong feedback. My sincere apologies. 
>> 
>> In 2.0.1 SNAPSHOT: 
>> The latest snapshot contains the patch I mentioned above, so the triples can 
>> be loaded successfully. However, they are not indexed at all. Queries with 
>> keyword search do not return any results. 
>> Following your advice, I tested loading and querying from both the Web UI 
>> and the s-post/s-query tools; unfortunately (or fortunately?) the results 
>> are the same. 
>> 
>> TDB: 
>> Meanwhile, I ran a similar experiment on Fuseki with TDB in 2.0.0 and 2.0.1 
>> SNAPSHOT, and both work properly. Loading succeeds and queries return 
>> search results. The only difference is that in the configuration file the 
>> in-memory dataset is replaced with TDB. 
>>> @prefix : <#> . 
>>> @prefix fuseki: <http://jena.apache.org/fuseki#> . 
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . 
>>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> . 
>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> . 
>>> @prefix text: <http://jena.apache.org/text#> . 
>>> 
>>> [] rdf:type fuseki:Server ; 
>>>     fuseki:services ( 
>>>       <#service_text_tdb> 
>>>     ) . 
>>> 
>>> # TDB 
>>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" . 
>>> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset . 
>>> tdb:GraphTDB rdfs:subClassOf ja:Model . 
>>> 
>>> # Text 
>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>> text:TextDataset rdfs:subClassOf ja:RDFDataset . 
>>> text:TextIndexLucene rdfs:subClassOf text:TextIndex . 
>>> 
>>> <#service_text_tdb> a fuseki:Service ; 
>>>     rdfs:label "TDB/text service" ; 
>>>     fuseki:name "tdb" ; 
>>>     fuseki:serviceQuery "query" ; 
>>>     fuseki:serviceQuery "sparql" ; 
>>>     fuseki:serviceUpdate "update" ; 
>>>     fuseki:serviceUpload "upload" ; 
>>>     fuseki:serviceReadGraphStore "get" ; 
>>>     fuseki:serviceReadWriteGraphStore "data" ; 
>>>     fuseki:dataset <#text_dataset> ; 
>>>     . 
>>> 
>>> <#text_dataset> a text:TextDataset ; 
>>>     text:dataset <#dataset> ; 
>>>     text:index <#indexLucene> ; 
>>>     . 
>>> 
>>> <#dataset> a tdb:DatasetTDB ; 
>>>     tdb:location "DB" ; 
>>>     ##tdb:unionDefaultGraph true ; 
>>>     . 
>>> 
>>> <#indexLucene> a text:TextIndexLucene ; 
>>>     text:directory <file:Lucene> ; 
>>>     ##text:directory "mem" ; 
>>>     text:entityMap <#entMap> ; 
>>>     . 
>>> 
>>> <#entMap> a text:EntityMap ; 
>>>     text:entityField "uri" ; 
>>>     text:defaultField "text" ; 
>>>     text:map ( 
>>>       [ text:field "text" ; text:predicate rdfs:label ] 
>>>     ) . 
>> 
>> Any advice for it now? Thank you very much for your efforts in advance. 
>> 
>> Regards, 
>> Yang 
>> 
>> PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to test on it 
>> as well. However I wasn't able to run it. 
>> 
>> On 04/17/2015 05:29 PM, Yang Yuanzhe wrote: 
>>> Hi Andy, 
>>> 
>>> Thank you very much for your reply. 
>>> 
>>> In fact the problem is unrelated to the preloaded triples. It fails 
>>> whether we start with an empty dataset or a preloaded one. Moreover, it 
>>> takes around 1 minute to load 38k triples, while TDB needs only 6 seconds. 
>>> If we turn off text search for the in-memory dataset, the loading time drops 
>>> to only 1 second. That's why I thought the problem is on the Fuseki side. 
>>> 
>>> As for TDB with reasoning, I don't agree with your opinion that the dataset 
>>> is not attached to a text index. We have defined the dataset: 
>>>> <#tdb_inf_ds> a ja:RDFDataset ; 
>>>>      ja:defaultGraph       <#tdb_inf> ; 
>>>>      . 
>>> We tell Lucene to index it: 
>>>> :text_dataset a text:TextDataset ; 
>>>>      text:dataset   <#tdb_inf_ds> ; 
>>>>      text:index     <#textIndexLucene> ; 
>>>>      . 
>>> And we assert that the dataset includes an RDFS inference model: 
>>>> <#tdb_inf> a ja:InfModel ; 
>>>>      rdfs:label "RDFS Inference Model" ; 
>>>>      ja:baseModel <#tdb_graph> ; 
>>>>      ja:reasoner 
>>>>           [ ja:reasonerURL 
>>>> <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ] 
>>>>      . 
>>> 
>>> Then both text search and RDFS reasoning should work. Such a configuration 
>>> works properly in Fuseki 1.1.1. However, things changed in 1.1.2 and 2.0.x. 
>>> I don't know what I should do to adjust to the new system. 
>>> 
>>> Thank you very much for your efforts again and have a nice day. 
>>> 
>>> Regards, 
>>> Yang 
>>> 
>>> 
>>> On 04/17/2015 02:53 PM, Andy Seaborne wrote: 
>>>> On 14/04/15 18:51, Yang Yuanzhe wrote: 
>>>>> Hi there, 
>>>>> 
>>>>> Sorry to trouble you again. Last month I wrote to you to figure out the 
>>>>> bug in text search for TDB. Given the following configuration, text 
>>>>> search works with TDB: 
>>>>> 
>>>> ... 
>>>> 
>>>> Comments inline: 
>>>> 
>>>>> Now we want to use text search for in-memory datasets, but we failed 
>>>>> after some trials, the configuration file we use is as follows: 
>>>>> 
>>>>>> @prefix :        <#> . 
>>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> . 
>>>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
>>>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> . 
>>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> . 
>>>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> . 
>>>>>> @prefix text:    <http://jena.apache.org/text#> . 
>>>>>> @prefix spatial: <http://jena.apache.org/spatial#> . 
>>>>>> 
>>>>>> [] a fuseki:Server ; 
>>>>>>     fuseki:services ( 
>>>>>>       <#memory> 
>>>>>>     ) . 
>>>>>> 
>>>>>> <#memory> a fuseki:Service ; 
>>>>>>      fuseki:name                     "memory" ; 
>>>>>>      fuseki:serviceQuery             "sparql" ; 
>>>>>>      fuseki:serviceQuery             "query" ; 
>>>>>>      fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update 
>>>>>>      fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service 
>>>>>>      fuseki:serviceReadWriteGraphStore      "data" ; 
>>>>>>      fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get 
>>>>>>      fuseki:dataset           :text_dataset ; 
>>>>>>      . 
>>>>>> 
>>>>>> <#dataset> rdf:type ja:RDFDataset ; 
>>>>>>      ja:defaultGraph 
>>>>>>            [ 
>>>>>>              a ja:MemoryModel ; 
>>>>>>              ja:content [ja:externalContent <file:dcat-vl.ttl> ] ; 
>>>>>>            ] . 
>>>> 
>>>> That is going to load the data each time the server starts but does not 
>>>> attach it in any way to the text index. 
>>>> 
>>>> Is it the same data as is loaded (separately) into the text index? 
>>>> 
>>>> Similarly for the inference setup (which is in a different Lucene index 
>>>> file:Text) ... 
>>>> 
>>>>      Andy 
>>>> 
>>>>>> 
>>>>>> # Text 
>>>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset . 
>>>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex . 
>>>>>> 
>>>>>> :text_dataset a text:TextDataset ; 
>>>>>>      text:dataset   <#dataset> ; 
>>>>>>      text:index     <#textIndexLucene> ; 
>>>>>>      . 
>>>>>> 
>>>>>> # Text index description 
>>>>>> <#textIndexLucene> a text:TextIndexLucene ; 
>>>>>>      text:directory <file:Lucene> ; 
>>>>>>      ##text:directory "mem" ; 
>>>>>>      text:entityMap <#entMap> ; 
>>>>>>      . 
>>>>>> 
>>>>>> <#entMap> a text:EntityMap ; 
>>>>>>      text:entityField      "uri" ; 
>>>>>>      text:defaultField     "text" ; 
>>>>>>      text:map ( 
>>>>>>           [ text:field "text" ; text:predicate rdfs:label ] 
>>>>>>           ) . 
>>>>>> 
>>>> ... 
>>>> 
>>>>> 
>>>>> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any 
>>>>> clue or any suggestion for this issue? Thank you very much and have a 
>>>>> nice day. 
>>>>> 
>>>>> Regards, 
>>>>> Yang 
>>>>> 
>>>> 
>>> 
>> 
>> 
>
