Hi all,
I have a question about handling one dataset with two named graphs and two
corresponding Lucene indexes/directories.
Given the following situation:
I created one dataset that contains two different named graphs. Each of these
named graphs has its own independent Lucene index for rdfs:label, stored in its
own directory. I use Jena 3.3.0 and jena-text 3.3.0 to connect Jena and Lucene.
To retrieve the first ten results and save them in a JSONArray, I wrote the
following method:
    // Create a TDB dataset
    Dataset dataset = TDBFactory.createDataset(pathToTDB);

    // Define the index mapping: rdfs:label is indexed in the "text" field
    EntityDefinition entDef = new EntityDefinition("uri", "text");
    entDef.setPrimaryPredicate(RDFS.label.asNode());

    // Lucene index stored on disk (file system directory, not in memory)
    Directory dir = null;
    try {
        dir = new SimpleFSDirectory(new File(pathToLucene).toPath());
    } catch (IOException e) {
        e.printStackTrace();
    }
    TextIndexConfig textIndexConfig = new TextIndexConfig(entDef);

    // Connect the Jena dataset with the Lucene index
    dataset = TextDatasetFactory.createLucene(dataset, dir, textIndexConfig);

    dataset.begin(ReadWrite.READ);
    try {
        Query query = QueryFactory.create(sparqlQueryString);
        // Close the query execution once the results have been converted
        try (QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) {
            // Select an RDF result set with data from the Jena TDB
            ResultSet resultsSel = qExec.execSelect();
            return convertResultToJSONArray(resultsSel);
        }
    } finally {
        dataset.end();
    }
The sparqlQueryString contains:

    SELECT *
    WHERE
    { GRAPH <http://www.example.com/namedgraph1>
        { ?s text:query (rdfs:label 'test*' 10) ;
             rdfs:label ?label }
    }

or:

    SELECT *
    WHERE
    { GRAPH <http://www.example.com/namedgraph2>
        { ?s text:query (rdfs:label 'test*' 10) ;
             rdfs:label ?label }
    }
This method is called through a WebSocket service. It works well for queries
against whichever named graph is queried first. If I then send the same query
for the second named graph, I get no results. If I restart the WebSocket
service and send the query for the second named graph, I get the expected
results. But if I then switch back to the first named graph, I get no results
again until I restart. I also wrote a program that calls the method twice
(once for each query) in a single run.
So my question is: How can I reset the connection between the Lucene index and
the Jena dataset without needing a restart?
I also tried using only one Lucene index, but that setup doesn't work for me
either. The SPARQL text query first consults the Lucene index and finds ten
results. Jena then checks whether these results are in the corresponding named
graph. In the worst case (e.g. for a short input) I get no results in my
JSONArray because all ten results are in the other named graph.
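To make the single-index problem concrete, here is a small toy model in plain
Java. It does not use Jena or Lucene at all; the class and method names
(LimitBeforeFilterDemo, searchIndex, restrictToGraph) and the index contents
are made up for illustration. It only mimics the behaviour I think I am seeing:
the limit of ten is applied by the index lookup before the named-graph check.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitBeforeFilterDemo {

    static final String GRAPH_1 = "http://www.example.com/namedgraph1";
    static final String GRAPH_2 = "http://www.example.com/namedgraph2";

    /** One index entry: a subject URI and the named graph it belongs to. */
    static final class Hit {
        final String subject;
        final String graph;

        Hit(String subject, String graph) {
            this.subject = subject;
            this.graph = graph;
        }
    }

    /** Hypothetical shared index: ten matches from graph 1 rank before five from graph 2. */
    static List<Hit> buildIndex() {
        List<Hit> index = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            index.add(new Hit("http://www.example.com/a" + i, GRAPH_1));
        }
        for (int i = 0; i < 5; i++) {
            index.add(new Hit("http://www.example.com/b" + i, GRAPH_2));
        }
        return index;
    }

    /** Stand-in for the text lookup: returns the first `limit` hits, ignoring the graph. */
    static List<Hit> searchIndex(List<Hit> index, int limit) {
        return new ArrayList<>(index.subList(0, Math.min(limit, index.size())));
    }

    /** Stand-in for the GRAPH pattern: keeps only the hits from the requested graph. */
    static List<Hit> restrictToGraph(List<Hit> hits, String graph) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.graph.equals(graph)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> topTen = searchIndex(buildIndex(), 10);
        // All ten hits come from graph 1, so nothing is left for graph 2.
        System.out.println(restrictToGraph(topTen, GRAPH_1).size()); // prints 10
        System.out.println(restrictToGraph(topTen, GRAPH_2).size()); // prints 0
    }
}
```

In this model the query for namedgraph2 comes back empty even though the index
holds five matching entries for it, which is the situation I run into with the
shared index.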
Thanks for any help,
Roman