Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Andy Seaborne Fri, 05 Jun 2015 03:34:21 -0700

I've logged this as JENA-956 (with details). The work-round is to use an in-memory TDB dataset:


     tdb:location "--mem--" ;
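Applied to the in-memory configuration quoted further down, the work-round might look roughly like the following untested sketch. It assumes the prefixes declared in that configuration and borrows the TDB assembler declarations from the TDB configuration later in the thread. It replaces only the <#dataset> description. :text_dataset and <#textIndexLucene> stay as they are, so the Lucene index itself remains at file:Lucene.

```turtle
# Sketch only (untested): replaces the ja:MemoryModel-based <#dataset>
# of the in-memory configuration with an in-memory TDB dataset.
# Assumes the ja:, rdfs: and tdb: prefixes declared in that configuration.

# TDB assembler declarations, copied from the TDB configuration in this thread
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB   rdfs:subClassOf ja:Model .

# The TDB dataset is held in memory ("--mem--") instead of in a directory.
# :text_dataset still refers to <#dataset>, so it wraps this dataset unchanged.
<#dataset> a tdb:DatasetTDB ;
    tdb:location "--mem--" ;
    .
```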

> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction

> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)

You loaded the data twice, I guess.

        Andy


PS Your email address [email protected] does not always work.

Reporting-MTA: dns; mailrelay118.isp.belgacom.be

Final-Recipient: rfc822;[email protected]
Action: failed
Status: 5.0.0 (permanent failure)
Remote-MTA: dns; [91.183.52.144]

Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-'5.4.0Error: too many hops' (delivery attempts: 0)




On 05/06/15 09:17, Yang wrote:

Hi Andy,

I am sorry for such a late response; we were busy with another project during
this period. Below I explain step by step how I reproduce the error. I sent an
email to the mailing list yesterday, but it never showed up, so I am trying
again today. My apologies for any duplicates.

The problem is that something goes wrong with text indexing for in-memory
datasets.
Here is the configuration file I used. It should be basic enough: a server description, a
service description, and an index engine attached to the dataset that indexes
"rdfs:label".

@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix spatial: <http://jena.apache.org/spatial#> .
[] a fuseki:Server ;
    fuseki:services (
      <#memory>
    ) .

<#memory> a fuseki:Service ;
    fuseki:name "memory" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceUpdate "update" ;   # SPARQL update service -- /memory/update
    fuseki:serviceUpload "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceReadGraphStore "get" ;   # Graph store protocol (read only) -- /memory/get
    fuseki:dataset :text_dataset ;
    .
<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph
        [
          a ja:MemoryModel ;
        ] .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

:text_dataset a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#textIndexLucene> ;
    .

# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
        ) .


The server is started with

"./fuseki-server --config=config-memory-text.ttl"


and the console shows that it starts properly:

[2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
[2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
[2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
[2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
[2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
[2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
[2015-06-03 12:13:10] Builder INFO Service: :memory
[2015-06-03 12:13:11] Config INFO Register: /memory
[2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030


I tested it in two versions: the official 2.0.0 release and the latest snapshot,
2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The symptoms are as follows:

In 2.0.0:
If I load triples that do not contain "rdfs:label", everything works, but in
that case the index engine has nothing to index. As soon as I add a single
"rdfs:label" triple to the file I load into Fuseki, an error occurs:

[2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
[2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
[2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)


I remember that a few months ago, when 2.0.0 was first released, I
discovered this issue and reported it to you. At that time I didn't realize
that the root cause was indexing. You fixed it in a later snapshot,
but my test was flawed, so I thought the problem was solved and gave you
wrong feedback. My sincere apologies.

In 2.0.1 SNAPSHOT:
The latest snapshot contains the patch mentioned above, so the triples load
successfully. However, they are not indexed at all: queries with keyword
search return no results.
Following your advice, I tested loading and querying from both the Web UI and
the s-post/s-query tools; unfortunately (or fortunately?) the results are the
same.

TDB:
I also ran the same experiment on Fuseki with TDB in both 2.0.0 and
2.0.1-SNAPSHOT, and both work properly: loading succeeds and queries return
search results. The only difference in the configuration file is that the
in-memory dataset is replaced with TDB:

@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
[] rdf:type fuseki:Server ;
    fuseki:services (
      <#service_text_tdb>
    ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

<#service_text_tdb> a fuseki:Service ;
    rdfs:label "TDB/text service" ;
    fuseki:name "tdb" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset <#text_dataset> ;
    .

<#text_dataset> a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> ;
    .

<#dataset> a tdb:DatasetTDB ;
    tdb:location "DB" ;
    ##tdb:unionDefaultGraph true ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
        ) .


Any advice on this now? Thank you very much in advance for your efforts.

Regards,
Yang

PS: I noticed that there is a 2.3.0 SNAPSHOT. I planned to test it as well,
but I wasn't able to run it.

On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:

Hi Andy,

Thank you very much for your reply.

In fact, the problem is unrelated to the preloaded triples: it fails whether
we start with an empty dataset or a preloaded one. Moreover, it takes around 1
minute to load 38k triples, while TDB needs only 6 seconds. If we turn off text
search for the in-memory dataset, loading takes only about 1 second.
That's why I thought the problem was on the Fuseki side.

As for TDB with reasoning, I don't agree with your opinion that the dataset is 
not attached to a text index. We have defined the dataset:

<#tdb_inf_ds> a ja:RDFDataset ;
     ja:defaultGraph       <#tdb_inf> ;
     .

We tell Lucene to index it:

:text_dataset a text:TextDataset ;
     text:dataset   <#tdb_inf_ds> ;
     text:index     <#textIndexLucene> ;
     .

And we assert that the dataset includes an RDFS inference model:

<#tdb_inf> a ja:InfModel ;
     rdfs:label "RDFS Inference Model" ;
     ja:baseModel <#tdb_graph> ;
     ja:reasoner
          [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
     .


With this configuration, both text search and RDFS reasoning should work, and
it does work properly in Fuseki 1.1.1. However, things changed in 1.1.2 and
2.0.x, and I don't know how to adapt the configuration to the new system.

Thank you very much for your efforts again and have a nice day.

Regards,
Yang


On 04/17/2015 02:53 PM, Andy Seaborne wrote:

On 14/04/15 18:51, Yang Yuanzhe wrote:

Hi there,

Sorry to trouble you again. Last month I wrote to you to figure out the
bug in text search for TDB. Given the following configuration, text
search works with TDB:

...

Comments inline:

Now we want to use text search for in-memory datasets, but after several
attempts we have not succeeded. The configuration file we use is as follows:

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix spatial: <http://jena.apache.org/spatial#> .

[] a fuseki:Server ;
    fuseki:services (
      <#memory>
    ) .

<#memory> a fuseki:Service ;
     fuseki:name                     "memory" ;
     fuseki:serviceQuery             "sparql" ;
     fuseki:serviceQuery             "query" ;
     fuseki:serviceUpdate            "update" ;   # SPARQL update service -- /memory/update
     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
     fuseki:serviceReadWriteGraphStore      "data" ;
     fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
     fuseki:dataset           :text_dataset ;
     .

<#dataset> rdf:type ja:RDFDataset ;
     ja:defaultGraph
           [
             a ja:MemoryModel ;
             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
           ] .


That loads the data each time the server starts, but it does not attach it in
any way to the text index.

Is it the same data as is loaded (separately) into the text index?

Similarly for the inference setup (which is in a different Lucene index,
file:Text) ...

     Andy


# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

:text_dataset a text:TextDataset ;
     text:dataset   <#dataset> ;
     text:index     <#textIndexLucene> ;
     .

# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
     text:directory <file:Lucene> ;
     ##text:directory "mem" ;
     text:entityMap <#entMap> ;
     .

<#entMap> a text:EntityMap ;
     text:entityField      "uri" ;
     text:defaultField     "text" ;
     text:map (
          [ text:field "text" ; text:predicate rdfs:label ]
          ) .

...


All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any
clue or any suggestion for this issue? Thank you very much and have a
nice day.

Regards,
Yang
