On 05/08/13 21:49, Brad Moran wrote:
I have an existing Jena TDB based on this example RDF:

...

I have compiled a Jena TDB based on several of these RDF files so it is a
large TDB and have several SPARQL queries that work as desired. I am now
trying to implement a full text search on this TDB. I have downloaded the
Jena 2.10.2 Snapshot jars and figured out my dependencies. I would like to
implement this text search through java code using the new Jena Text Search
feature. This is my best attempt at solving the problem so far:

  public class TextSearchTest {
     public static void main(String[] args)
     {
         try{
             String DBDirectory = "tdb";

             // Construct the Lucene Index to be queried

             String indexDir = "luceneIndexes";
             File file = new File(indexDir);
             Directory dir = FSDirectory.open(file);

             // Create the in memory text index described
             Dataset ds1 = TDBFactory.createDataset(DBDirectory);
             String uri = "<http://rdf.cdisc.org/mms#dataElement>";
             String property = "<http://rdf.cdisc.org/mms#dataElementName>";
             EntityDefinition entDef = new EntityDefinition(uri, property, RDFS.Literal); //RDFS.label

This defines which property the text index works on.

You want to pass in a resource (a Resource or Property object) for http://rdf.cdisc.org/mms#dataElementName here, not RDFS.Literal.



             // Construct the Lucene Index to be queried
             Dataset dataset = TextDatasetFactory.createLucene(ds1, dir, entDef);

I hope you loaded the data into this dataset, not the underlying TDB one, because otherwise the text indexer would not have seen the RDF triples to index.


             // try query
             dataset.begin(ReadWrite.READ);
                 QueryExecution qExec = QueryExecutionFactory.create(
                         "PREFIX text: <http://jena.apache.org/text#> PREFIX mms: <http://rdf.cdisc.org/mms#> "
                         + "SELECT * WHERE{?s text:query (mms:dataElementName 'AE')}", dataset);

                 ResultSet rs = qExec.execSelect();
                 ResultSetFormatter.out(rs);

             dataset.end();
         }
         catch(Exception e){
             System.out.println(e);
         }
     }
}


This results in: WARN  o.apache.jena.query.text.TextQueryPF - Predicate not
indexed: http://rdf.cdisc.org/mms#dataElementName

Because that field isn't being indexed.

You can have several fields indexed if you call .set on the EntityDefinition with additional predicates.

and an empty result set is printed out by resultSetFormatter. It does not
seem to create an index for the TDB.

I believe my problem occurs with my
EntityDefinition (mainly because I am not sure where the parameters
entityField, primaryField, and primaryPredicate should come from). Also in
the example code it seems a lucene index is created then the data is loaded
by an assembler file. Maybe I am just implementing this wrong. So to try to
wrap this up:

1. Do I need to use an assembler file?

No, but it may be easier that way.

2. Can I create an index from an existing TDB or do I need to create the
index as I create the TDB.

As the data is loaded.

There is a simple application 'jena.textindexer' which will create the index from existing data.

http://jena.staging.apache.org/documentation/query/text-query.html#building-a-text-index
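
To make the difference between "indexed as the data is loaded" and "indexed afterwards from existing data" concrete, here is a toy model in plain Java. It is not Jena code: ToyIndexedStore, its methods, and the ex:de1 / ex:de2 subjects are made up for illustration, and the real index is Lucene, not a HashMap. It only sketches why data that was put into the base store directly is invisible to the index until something walks over it, which is the role jena.textindexer plays.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a store wrapped by a text index. NOT Jena code. */
final class ToyIndexedStore {
    private final List<String[]> base;       // stands in for the existing TDB data: {subject, predicate, literal}
    private final String indexedPredicate;   // the one predicate this toy index covers
    private final Map<String, List<String>> index = new HashMap<>(); // word -> subjects

    ToyIndexedStore(List<String[]> base, String indexedPredicate) {
        this.base = base;
        this.indexedPredicate = indexedPredicate;
    }

    /** Adding through the wrapper stores the triple and indexes it in the same step. */
    void add(String subject, String predicate, String literal) {
        base.add(new String[] {subject, predicate, literal});
        indexTriple(subject, predicate, literal);
    }

    /** One-off pass over data already in the base store (hypothetical analogue of textindexer). */
    void rebuildIndex() {
        index.clear();
        for (String[] t : base) {
            indexTriple(t[0], t[1], t[2]);
        }
    }

    /** Only literals of the configured predicate are indexed, word by word, lowercased. */
    private void indexTriple(String subject, String predicate, String literal) {
        if (!indexedPredicate.equals(predicate)) {
            return;
        }
        for (String word : literal.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(subject);
            }
        }
    }

    /** Returns the subjects whose indexed literal contains the word. */
    List<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        // Data that was loaded before any index existed.
        List<String[]> existing = new ArrayList<>();
        existing.add(new String[] {"ex:de1", "mms:dataElementName", "AE"});

        ToyIndexedStore store = new ToyIndexedStore(existing, "mms:dataElementName");
        System.out.println(store.search("AE")); // empty: the data bypassed the index

        store.rebuildIndex();                   // index the existing data after the fact
        System.out.println(store.search("AE"));

        store.add("ex:de2", "mms:dataElementName", "AE term"); // indexed as it is loaded
        System.out.println(store.search("AE"));
    }
}
```

The same two routes exist in the real setup: either load through the dataset returned by TextDatasetFactory, or build the index from the existing data with the textindexer application.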

3. Could you give me a description of the parameters of EntityDefinition
class and where they come from? (in the rdf maybe?)

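Create a Property object for http://rdf.cdisc.org/mms#dataElementName and pass that in as the 3rd argument. The first two arguments are names of fields in the Lucene index (one to hold the entity URI, one to hold the text), not URIs from your data.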
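
To make the three parameters concrete, here is a toy model in plain Java of the mapping an entity definition holds. It is not the Jena class: ToyEntityDefinition and its methods are made up for illustration, predicates are plain strings rather than Property objects, and the field names "uri" and "text" are assumed example names. It only sketches the idea that the first two arguments name index fields, the third names the predicate whose literals are indexed, and a predicate with no registered field is the "Predicate not indexed" case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for the field/predicate mapping. NOT the Jena EntityDefinition class. */
final class ToyEntityDefinition {
    private final String entityField;   // index field that stores the subject URI
    private final String primaryField;  // index field that stores the text searched by default
    private final Map<String, String> fieldByPredicate = new LinkedHashMap<>();

    ToyEntityDefinition(String entityField, String primaryField, String primaryPredicate) {
        this.entityField = entityField;
        this.primaryField = primaryField;
        // The primary predicate is indexed under the primary field.
        fieldByPredicate.put(primaryPredicate, primaryField);
    }

    /** Registers an additional predicate to be indexed under its own field. */
    void set(String field, String predicate) {
        fieldByPredicate.put(predicate, field);
    }

    /** Returns the index field for a predicate, or null if that predicate is not indexed. */
    String getField(String predicate) {
        return fieldByPredicate.get(predicate);
    }

    String getEntityField() {
        return entityField;
    }

    String getPrimaryField() {
        return primaryField;
    }

    public static void main(String[] args) {
        String name = "http://rdf.cdisc.org/mms#dataElementName";
        String label = "http://www.w3.org/2000/01/rdf-schema#label";

        // Field names first, then the predicate to index.
        ToyEntityDefinition def = new ToyEntityDefinition("uri", "text", name);
        System.out.println(def.getField(name));   // the predicate a query may use
        System.out.println(def.getField(label));  // null: this predicate is not indexed

        // A second predicate can be added under its own field.
        def.set("label", label);
        System.out.println(def.getField(label));
    }
}
```

In the code from the question the third argument was RDFS.Literal, so a lookup for mms:dataElementName finds nothing, which matches the warning.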

4. Any general advice on how I can solve this problem from my code.

I tried to be as specific as possible here in hopes that you may be able to
guide me in the right direction. If I left anything out just let me know and
hopefully I can explain better. Thanks.

Minor in this case, but the data is incomplete RDF/XML (no namespaces), so I didn't try using it.

Our mantra is "complete, minimal example". Both "complete" and "minimal" make it much, much easier to give good answers.


--Brad


        Andy
