Hi Andy,
I am sorry for such a late response; we were busy with another project during
this period. I will now explain, step by step, how I reproduce the error. I
sent an email to the mailing list yesterday, but it never showed up, so I am
trying again today. My apologies for any duplicates.
The problem is that text search indexing does not work for in-memory
datasets.
Here is the configuration file I used. It should be basic enough: a server description, a
service description, and a text index attached to the dataset that indexes
"rdfs:label".
@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix spatial: <http://jena.apache.org/spatial#> .
[] a fuseki:Server ;
fuseki:services (
<#memory>
) .
<#memory> a fuseki:Service ;
fuseki:name "memory" ;
fuseki:serviceQuery "sparql" ;
fuseki:serviceQuery "query" ;
fuseki:serviceUpdate "update" ; # SPARQL update service -- /memory/update
fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
fuseki:serviceReadWriteGraphStore "data" ;
fuseki:serviceReadGraphStore "get" ; # Graph store protocol (read only) -- /memory/get
fuseki:dataset :text_dataset ;
.
<#dataset> rdf:type ja:RDFDataset ;
ja:defaultGraph
[
a ja:MemoryModel ;
] .
# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
:text_dataset a text:TextDataset ;
text:dataset <#dataset> ;
text:index <#textIndexLucene> ;
.
# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
text:directory <file:Lucene> ;
##text:directory "mem" ;
text:entityMap <#entMap> ;
.
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map (
[ text:field "text" ; text:predicate rdfs:label ]
) .
The server is started with
"./fuseki-server --config=config-memory-text.ttl"
and the console shows that it starts properly:
[2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
[2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
[2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
[2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
[2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
[2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
[2015-06-03 12:13:10] Builder INFO Service: :memory
[2015-06-03 12:13:11] Config INFO Register: /memory
[2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030
I tested two versions: the official release 2.0.0 and the latest snapshot
2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The observed behavior is as follows:
In 2.0.0:
If I load triples that do not contain "rdfs:label", the load succeeds, although in
that case the text index is not exercised at all. As soon as I add a single
"rdfs:label" triple to the file I am loading into Fuseki, an error occurs:
[2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
[2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
[2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)
I remember that a few months ago, when 2.0.0 was first released, I
discovered this issue and reported it to you. At that time I didn't realize
that the root cause was the indexing. You fixed it in a later snapshot,
but my test wasn't thorough, so I thought the problem was solved and gave you
incorrect feedback. My sincere apologies.
In 2.0.1 SNAPSHOT:
The latest snapshot contains the fix mentioned above, so the triples can be
loaded successfully. However, they are not indexed at all: queries with keyword
search do not return any results.
Following your advice, I tested loading and querying from both the Web UI and
the s-post/s-query tools; unfortunately (or fortunately?) the results are the
same.
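For reference, a keyword query of the kind described above might look like the
following sketch. It assumes the standard jena-text "text:query" property
function and the "rdfs:label" mapping from the configuration above; the search
term 'licentie' is a made-up placeholder, not taken from the actual test data.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Find subjects whose rdfs:label matches the keyword in the Lucene index.
# 'licentie' is a hypothetical search term used only for illustration.
SELECT ?s ?label
WHERE {
  ?s text:query (rdfs:label 'licentie') .
  ?s rdfs:label ?label .
}
```

With a term that does occur in a loaded label, a query of this shape is the
one that comes back empty on the in-memory setup.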
TDB:
Meanwhile, I also ran a similar experiment on Fuseki with TDB in 2.0.0 and
2.0.1 SNAPSHOT, and both work properly. Loading succeeds and
queries return search results. The only difference is that in the configuration
file the in-memory dataset is replaced with TDB:
@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
[] rdf:type fuseki:Server ;
fuseki:services (
<#service_text_tdb>
) .
# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
<#service_text_tdb> a fuseki:Service ;
rdfs:label "TDB/text service" ;
fuseki:name "tdb" ;
fuseki:serviceQuery "query" ;
fuseki:serviceQuery "sparql" ;
fuseki:serviceUpdate "update" ;
fuseki:serviceUpload "upload" ;
fuseki:serviceReadGraphStore "get" ;
fuseki:serviceReadWriteGraphStore "data" ;
fuseki:dataset <#text_dataset> ;
.
<#text_dataset> a text:TextDataset ;
text:dataset <#dataset> ;
text:index <#indexLucene> ;
.
<#dataset> a tdb:DatasetTDB ;
tdb:location "DB" ;
##tdb:unionDefaultGraph true ;
.
<#indexLucene> a text:TextIndexLucene ;
text:directory <file:Lucene> ;
##text:directory "mem" ;
text:entityMap <#entMap> ;
.
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map (
[ text:field "text" ; text:predicate rdfs:label ]
) .
Do you have any advice? Thank you very much in advance for your efforts.
Regards,
Yang
PS: I noticed that there is a SNAPSHOT for 2.3.0. I planned to test it as
well, but I wasn't able to run it.
On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:
Hi Andy,
Thank you very much for your reply.
In fact, the problem is unrelated to the preloaded triples: it fails whether
we start with an empty dataset or a preloaded one. Moreover, it takes around 1
minute to load 38k triples, while TDB needs only 6 seconds. If we turn off text
search for the in-memory dataset, the load time drops to only 1 second.
That's why I think the problem is on the Fuseki side.
As for TDB with reasoning, I don't agree with your opinion that the dataset is
not attached to a text index. We have defined the dataset:
<#tdb_inf_ds> a ja:RDFDataset ;
ja:defaultGraph <#tdb_inf> ;
.
We tell Lucene to index it:
:text_dataset a text:TextDataset ;
text:dataset <#tdb_inf_ds> ;
text:index <#textIndexLucene> ;
.
And we assert that the dataset includes an RDFS inference model:
<#tdb_inf> a ja:InfModel ;
rdfs:label "RDFS Inference Model" ;
ja:baseModel <#tdb_graph> ;
ja:reasoner
[ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
.
Then both text search and RDFS reasoning should work. This configuration works
properly in Fuseki 1.1.1, but things changed in 1.1.2 and 2.0.x, and I don't
know how to adapt it to the new system.
Thank you very much for your efforts again and have a nice day.
Regards,
Yang
On 04/17/2015 02:53 PM, Andy Seaborne wrote:
On 14/04/15 18:51, Yang Yuanzhe wrote:
Hi there,
Sorry to trouble you again. Last month I wrote to you to figure out the
bug in text search for TDB. Given the following configuration, text
search works with TDB:
...
Comments inline:
Now we want to use text search for in-memory datasets, but we failed
after several attempts. The configuration file we use is as follows:
@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix spatial: <http://jena.apache.org/spatial#> .
[] a fuseki:Server ;
fuseki:services (
<#memory>
) .
<#memory> a fuseki:Service ;
fuseki:name "memory" ;
fuseki:serviceQuery "sparql" ;
fuseki:serviceQuery "query" ;
fuseki:serviceUpdate "update" ; # SPARQL update service -- /memory/update
fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
fuseki:serviceReadWriteGraphStore "data" ;
fuseki:serviceReadGraphStore "get" ; # Graph store protocol (read only) -- /memory/get
fuseki:dataset :text_dataset ;
.
<#dataset> rdf:type ja:RDFDataset ;
ja:defaultGraph
[
a ja:MemoryModel ;
ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
] .
That is going to load the data each time the server starts but does not attach
it in any way to the text index.
Is it the same data as is loaded (separately) into the text index?
Similarly for the inference setup (which is in a different Lucene index, file:Text) ...
Andy
# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
:text_dataset a text:TextDataset ;
text:dataset <#dataset> ;
text:index <#textIndexLucene> ;
.
# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
text:directory <file:Lucene> ;
##text:directory "mem" ;
text:entityMap <#entMap> ;
.
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map (
[ text:field "text" ; text:predicate rdfs:label ]
) .
...
All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any
clue or any suggestion for this issue? Thank you very much and have a
nice day.
Regards,
Yang