Re: Handle one dataset with two named graphs and two corresponding lucene indexes/directories

Andy Seaborne Fri, 30 Jun 2017 01:30:07 -0700

Hi Roman,

The text index is per-dataset. It is found by looking in the Context for the Symbol("http://jena.apache.org/text#index"), which is the Java constant TextQuery.textIndex.


Graph specific indexing is described in:

http://jena.apache.org/documentation/query/text-query.html#graph-specific-indexing

Did that not work for you? Could you produce a complete, minimal example? Details such as what is in each of the graphs and the text index configuration matter here.


As to your current setup:

    GRAPH <http://www.example.com/namedgraph2>
    { ?s  text:query  (rdfs:label 'test*' 10) ;
          rdfs:label  ?label}

This is looking for "?s rdfs:label ?label" in graph2 - presumably it's not there because it's in graph1.


You can switch graphs by using GRAPH again:

    GRAPH <http://www.example.com/namedgraph2>
    { ?s  text:query  (rdfs:label 'test*' 10) .
      GRAPH <http://www.example.com/namedgraph1>
      { ?s rdfs:label  ?label}
    }
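
To make the nesting concrete, here is a minimal sketch of assembling that query as a string in Java. The class and method names (NestedGraphQuery, build) are hypothetical and not part of Jena, and the search term is inserted without any escaping, so treat it as an illustration only.

```java
public class NestedGraphQuery {
    // Hypothetical helper (not a Jena API): the text search runs inside
    // textGraph, and the label lookup switches to dataGraph with a nested GRAPH.
    // Assumption: term contains no quote characters; no escaping is done here.
    static String build(String textGraph, String dataGraph, String term, int limit) {
        return String.join("\n",
            "PREFIX text: <http://jena.apache.org/text#>",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "SELECT * WHERE {",
            "  GRAPH <" + textGraph + "> {",
            "    ?s text:query (rdfs:label '" + term + "' " + limit + ") .",
            "    GRAPH <" + dataGraph + "> { ?s rdfs:label ?label }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Prints the full query text for the two example graphs.
        System.out.println(build(
            "http://www.example.com/namedgraph2",
            "http://www.example.com/namedgraph1",
            "test*", 10));
    }
}
```

The resulting string can be passed to QueryFactory.create in the same way as sparqlQueryString in the quoted method.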

    Andy



On 29/06/17 15:49, [email protected] wrote:

Hi all,

I have a question concerning the handling of one dataset with two named 
graphs and two corresponding Lucene indexes/directories.

Given the following situation:

I created one dataset. This dataset contains two different named graphs. Each 
of these named graphs has an independent Lucene index for rdfs:label in its own 
directory. I used Jena 3.3.0 and Jena-text 3.3.0 to build the connection 
between Jena and Lucene. To retrieve the first ten results and save them 
in a JSONArray, I created the following method:

         // create a TDB-dataset
         Dataset dataset = TDBFactory.createDataset(pathToTDB);

         // Define the index mapping
         EntityDefinition entDef = new EntityDefinition("uri", "text");

         entDef.setPrimaryPredicate(RDFS.label.asNode());

         // Lucene index, stored in a directory on disk.
         Directory dir = null;

         try {

             dir = new SimpleFSDirectory(new File(pathToLucene).toPath());

         } catch (IOException e) {

             e.printStackTrace();

         }

         TextIndexConfig textIndexConfig = new TextIndexConfig(entDef);

         // connect jena dataset with lucene index
         dataset = TextDatasetFactory.createLucene(dataset, dir, textIndexConfig);

         dataset.begin(ReadWrite.READ);

         try {

             Query query = QueryFactory.create(sparqlQueryString);

             QueryExecution qExec = QueryExecutionFactory.create(query, dataset);

             // Select a RDF-result set with data from the Jena-TDB
             ResultSet resultsSel = qExec.execSelect();

             return convertResultToJSONArray(resultsSel);


         } finally {

             dataset.end();

         }

The sparqlQueryString contains:

SELECT  *
WHERE
   { GRAPH <http://www.example.com/namedgraph1>
       { ?s  text:query  (rdfs:label 'test*' 10) ;
             rdfs:label  ?label}
   }

or:

SELECT  *
WHERE
   { GRAPH <http://www.example.com/namedgraph2>
       { ?s  text:query  (rdfs:label 'test*' 10) ;
             rdfs:label  ?label}
   }


This method is used through a websocket service. It works well for queries 
against whichever named graph is used first. If I then send the same query with 
the second named graph, I get no results. If I restart the websocket and send 
the query with the second named graph, I get the expected results. But if I 
then change back to the first named graph, I get no results without a restart. 
I also built a program which calls the method twice (once for each query) in 
one run.

So my question is: How can I reset the connection between the Lucene index and 
the Jena dataset without needing a restart?


I also tried to use only one Lucene index, but this setup doesn't work for me 
either. The SPARQL text query first checks the Lucene index and finds ten 
results. Afterwards, Jena checks whether these results are in the corresponding 
named graph. In the worst case (e.g. for small input) I get no results in my 
JSONArray because all ten results are in the wrong named graph.
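
The effect described above can be reproduced with a small stand-alone simulation. This is not Jena or Lucene code: the Hit class, the graph names "g1"/"g2" and both methods are hypothetical, and the sketch only contrasts applying the limit before the graph check with applying it after, which is presumably what an index that also records the graph would allow.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitThenFilter {
    // Hypothetical stand-in for one index hit: a subject and the graph it is in.
    static final class Hit {
        final String subject;
        final String graph;
        Hit(String subject, String graph) { this.subject = subject; this.graph = graph; }
    }

    // Take the first `limit` hits from the index, then keep those in `graph`.
    static List<String> limitThenFilter(List<Hit> hits, String graph, int limit) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits.subList(0, Math.min(limit, hits.size()))) {
            if (h.graph.equals(graph)) out.add(h.subject);
        }
        return out;
    }

    // Keep only hits in `graph`, stopping once `limit` of them are collected.
    static List<String> filterThenLimit(List<Hit> hits, String graph, int limit) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            if (out.size() == limit) break;
            if (h.graph.equals(graph)) out.add(h.subject);
        }
        return out;
    }

    // Ten hits from g1 rank ahead of three hits from g2.
    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < 10; i++) hits.add(new Hit("s" + i, "g1"));
        for (int i = 10; i < 13; i++) hits.add(new Hit("s" + i, "g2"));
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(limitThenFilter(sampleHits(), "g2", 10)); // prints []
        System.out.println(filterThenLimit(sampleHits(), "g2", 10)); // prints [s10, s11, s12]
    }
}
```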

Thanks for all,

Roman
