Handle one dataset with two named graphs and two corresponding Lucene indexes/directories

marscheliniho Thu, 29 Jun 2017 07:49:26 -0700

Hi all,

I have a question about handling one dataset with two named graphs and two 
corresponding Lucene indexes/directories.


Given the following situation:

I created one dataset that contains two different named graphs. Each of these 
named graphs has its own independent Lucene index over rdfs:label, stored in 
its own directory. I used Jena 3.3.0 and Jena-text 3.3.0 to connect Jena and 
Lucene. To retrieve the first ten results and store them in a JSONArray, I 
created the following method:

        // create a TDB-dataset
        Dataset dataset = TDBFactory.createDataset(pathToTDB);

        // Define the index mapping
        EntityDefinition entDef = new EntityDefinition("uri", "text");

        entDef.setPrimaryPredicate(RDFS.label.asNode());

        // Lucene index, stored on disk (SimpleFSDirectory), not in memory.
        Directory dir = null;

        try {

            dir = new SimpleFSDirectory(new File(pathToLucene).toPath());

        } catch (IOException e) {

            e.printStackTrace();

        }

        TextIndexConfig textIndexConfig = new TextIndexConfig(entDef);

        // connect jena dataset with lucene index
        dataset = TextDatasetFactory.createLucene(dataset, dir, textIndexConfig);

        dataset.begin(ReadWrite.READ);

        try {

            Query query = QueryFactory.create(sparqlQueryString);

            QueryExecution qExec = QueryExecutionFactory.create(query, dataset);

            // Select a RDF-result set with data from the Jena-TDB
            ResultSet resultsSel = qExec.execSelect();

            return convertResultToJSONArray(resultsSel);


        } finally {

            dataset.end();

        }

The sparqlQueryString contains:

SELECT  *
WHERE
  { GRAPH <http://www.example.com/namedgraph1>
      { ?s  text:query  (rdfs:label 'test*' 10) ;
            rdfs:label  ?label}
  }

or:

SELECT  *
WHERE
  { GRAPH <http://www.example.com/namedgraph2>
      { ?s  text:query  (rdfs:label 'test*' 10) ;
            rdfs:label  ?label}
  }


This method is called through a websocket service. It works well for queries 
against whichever named graph is used first. If I then send the same query with 
the second named graph, I get no results. If I restart the websocket service 
and send the query with the second named graph, I get the expected results. But 
if I then switch back to the first named graph, I again get no results unless I 
restart. I also built a program that calls the method twice (once for each 
query) in one run.

So my question is: How can I reset the connection between the Lucene index and 
the Jena dataset without needing a restart?


I also tried to use only one Lucene index, but this setup doesn't work for me 
either. The SPARQL text query first checks the Lucene index and finds ten 
results. Afterwards, Jena checks whether these results are in the corresponding 
named graph. In the worst case (e.g. for small input) I get no results in my 
JSONArray because all ten results are in the wrong named graph.
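
To illustrate what I mean with the single shared index, here is a minimal 
sketch in plain Java. It does not use Jena or Lucene at all; the class, the 
Hit type, the sample URIs, and the split of 12 and 5 hits are all hypothetical 
and only model the order of operations I described (take the top ten hits 
first, then filter by named graph), compared with filtering first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of a shared text index; not Jena or Lucene code.
class LimitThenFilterDemo {

    /** Stand-in for one text-index hit and the named graph it belongs to. */
    static final class Hit {
        final String uri;
        final String graph;

        Hit(String uri, String graph) {
            this.uri = uri;
            this.graph = graph;
        }
    }

    static final String G1 = "http://www.example.com/namedgraph1";
    static final String G2 = "http://www.example.com/namedgraph2";

    /** Ranked hits: 12 from named graph 1 come first, then 5 from named graph 2. */
    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            hits.add(new Hit("http://www.example.com/a" + i, G1));
        }
        for (int i = 0; i < 5; i++) {
            hits.add(new Hit("http://www.example.com/b" + i, G2));
        }
        return hits;
    }

    /** What I observe: cut to the top 'limit' hits, then keep only the wanted graph. */
    static List<Hit> limitThenFilter(List<Hit> ranked, int limit, String graph) {
        return ranked.stream()
                .limit(limit)
                .filter(h -> h.graph.equals(graph))
                .collect(Collectors.toList());
    }

    /** What I would need: keep only the wanted graph, then cut to 'limit' hits. */
    static List<Hit> filterThenLimit(List<Hit> ranked, int limit, String graph) {
        return ranked.stream()
                .filter(h -> h.graph.equals(graph))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = sampleHits();
        System.out.println(limitThenFilter(hits, 10, G2).size()); // prints 0
        System.out.println(filterThenLimit(hits, 10, G2).size()); // prints 5
    }
}
```

With this sample data, the top ten hits all belong to the first named graph, so 
nothing is left for the second one once the limit has been applied.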

Thanks for all,

Roman
