Philipp - sorry for the delay.
On 20/04/17 13:01, Philipp Poschmann wrote:
Dear Andy,
thank you very much for your advice. Indeed, my purpose is to build a Java
application which queries the data stored by tdbloader. But I am not sure how
to use the AssemblerUtils.build call in my application, that is, how to choose
the right dataset.
What I have done so far is start the Fuseki server. The administration panel
shows that there is one dataset, "ds". When I run my query via the browser, it
works, so querying my data through Fuseki would be a solution. However, I would
like to avoid running the server in parallel with my application. Is it possible
to run my query against the text index without using the server at all?
Currently, my code gives me the same error ("Failed to find the text index")
and looks like this:
Dataset dataset = TDBFactory.assembleDataset("text-config.ttl");
You'll need to build a text dataset - this wraps the TDB one
Dataset dataset = TextDatasetFactory.create("text-config.ttl");
It'll look for exactly one resource with type text:TextDataset which
your config does have.
It's useful to attach the Jena code and drill down into what the
operations do.
Andy
dataset.begin(ReadWrite.READ);
Model model = dataset.getDefaultModel();
Query query = QueryFactory.create("PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "
+ "PREFIX text: <http://jena.apache.org/text#> "
+ "PREFIX dbo:<http://dbpedia.org/ontology/> "
+ "SELECT * \n"
+ "WHERE { \n"
+ "?entity text:query (rdfs:label \"Stage\") ; \n"
+ "rdfs:label ?label . \n"
+ "} LIMIT " + LIMIT);
try {
QueryExecution qexec = QueryExecutionFactory.create(query, model);
ResultSet rs = qexec.execSelect();
ResultSetFormatter.out(rs);
}
finally {
close();
}
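String literals in Java must not span source lines, so a query assembled by concatenation is easy to break when the code is re-wrapped. A minimal sketch of one way to build the same query with every literal on a single line follows. The class and method names (TextQueryBuilder, build, escape) are hypothetical, the unused dbo prefix is left out, and the escaping only covers the SPARQL string literal, not Lucene query syntax.

```java
final class TextQueryBuilder {

    /** Escapes backslashes and double quotes for use inside a SPARQL "..." literal. */
    static String escape(String term) {
        return term.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the label search query for the given search term and result limit. */
    static String build(String term, int limit) {
        return String.join("\n",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX text: <http://jena.apache.org/text#>",
            "SELECT *",
            "WHERE {",
            "  ?entity text:query (rdfs:label \"" + escape(term) + "\") ;",
            "          rdfs:label ?label .",
            "} LIMIT " + limit);
    }

    public static void main(String[] args) {
        // Print the query so it can be checked before handing it to QueryFactory.create.
        System.out.println(build("Stage", 10));
    }
}
```

The returned string can be passed to QueryFactory.create in place of the concatenated literal.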
Again, thank you very much for your help.
Philipp
On 20.04.2017 at 11:46, Andy Seaborne <[email protected]> wrote:
Philipp,
I'm not completely sure what is going on but this:
java -cp fuseki-server.jar tdb.tdbquery --time --tdb=config.ttl --query=query.txt
will not work because tdbquery looks for a TDB dataset so it will find
<#dataset>.
What I don't know is how to make the general command line tools pick the right
dataset from a configuration that contains two. Someone else may have a trick
for this.
Fuseki will pick the right one when you point the service to the text dataset.
A small Java program can use
AssemblerUtils.build(String assemblerFile, Resource type)
to pick the right dataset.
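A minimal sketch of such a program, assuming jena-text and TDB are on the classpath and that the assembler file is the text-config.ttl from the original post. The class name TextDatasetQuery is hypothetical. The query is executed against the dataset itself rather than against a Model taken from it, on the assumption that the text index is attached at the dataset level.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.ResourceFactory;
import org.apache.jena.sparql.core.assembler.AssemblerUtils;

final class TextDatasetQuery {

    // rdf:type of the text dataset resource in the assembler file.
    static final String TEXT_DATASET_TYPE = "http://jena.apache.org/text#TextDataset";

    public static void main(String[] args) {
        // Assumption: file name taken from the original post; override via args[0].
        String assemblerFile = args.length > 0 ? args[0] : "text-config.ttl";
        Resource type = ResourceFactory.createResource(TEXT_DATASET_TYPE);

        // build(...) returns Object, so the result is cast to Dataset.
        Dataset dataset = (Dataset) AssemblerUtils.build(assemblerFile, type);

        Query query = QueryFactory.create(String.join("\n",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX text: <http://jena.apache.org/text#>",
            "SELECT * {",
            "  ?s text:query (rdfs:label \"Stage\") ;",
            "     rdfs:label ?label",
            "} LIMIT 10"));

        dataset.begin(ReadWrite.READ);
        // Query the dataset directly so the text-enabled wrapper stays in play.
        try (QueryExecution qexec = QueryExecutionFactory.create(query, dataset)) {
            ResultSetFormatter.out(qexec.execSelect());
        } finally {
            dataset.end();
        }
    }
}
```

Run it from the directory that contains the assembler file, or pass the path to the file as the first argument.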
Andy
On 19/04/17 19:57, Philipp Poschmann wrote:
Dear all,
I am sorry for my incompetence but I have a problem with indexing labels using
Apache Jena. I have already checked several posts about this topic but can’t
find my error. Is there anyone who could help me please?
My text-config.ttl file looks like this:
@prefix : <http://localhost/jena_example/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
# TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.
:text_dataset rdf:type text:TextDataset ;
text:dataset <#dataset> ;
text:index <#indexLucene> ;
.
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "storage" ;
## In the example, this would hide the real default graph.
# tdb:unionDefaultGraph true ;
.
<#indexLucene> a text:TextIndexLucene ;
#text:directory <file:Lucene> ;
text:directory <file:storage> ;
text:entityMap <#entMap> ;
.
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ; ## Must be defined in the text:maps
text:map (
# rdfs:label
[ text:field "text" ; text:predicate rdfs:label ]
) .
I have a dataset file from dbpedia with some labels that I load and index with
these commands:
java -cp fuseki-server.jar tdb.tdbloader --tdb=config.ttl infobox_property_definitions_en.ttl
java -cp fuseki-server.jar jena.textindexer --desc=config.ttl
To test for the indexed labels I tried the following query:
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
{ ?s text:query (rdfs:label "Stage") ;
rdfs:label ?label
}
LIMIT 10
With this command:
java -cp fuseki-server.jar tdb.tdbquery --time --tdb=config.ttl --query=query.txt
However, I just get the following results:
WARN Failed to find the text index : tried context and as a text-enabled dataset
WARN No text index - no text search performed
----------------------------------------------------------
| s | label |
==========================================================
| <http://dbpedia.org/property/colwidth> | "colwidth"@en |
| <http://dbpedia.org/property/voy> | "voy"@en |
| <http://dbpedia.org/property/n> | "n"@en |
| <http://dbpedia.org/property/v> | "v"@en |
| <http://dbpedia.org/property/b> | "b"@en |
| <http://dbpedia.org/property/s> | "s"@en |
| <http://dbpedia.org/property/d> | "d"@en |
| <http://dbpedia.org/property/name> | "Name"@en |
| <http://dbpedia.org/property/alt> | "Alt"@en |
| <http://dbpedia.org/property/caption> | "Caption"@en |
----------------------------------------------------------
Time: 0,065 sec
Obviously, these are not the desired results: the query cannot find the text
index. For clarification: I decided on a TDB-backed solution because I want to
reproduce some DBpedia data locally without loading it into memory. I would
prefer not to use the Fuseki server, but it seems to be the only way to use
text indexing.
Thank you very much for your help.
Best regards
Philipp