Ok, since I already have the TDB built, it seems the best plan would be to
create an assembler file and then use the jena.textindexer application.
Sorry, these are the namespaces:
xmlns:mms="http://rdf.cdisc.org/mms#"
xmlns="http://rdf.cdisc.org/sdtm-1-2/std#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:sdtms="http://rdf.cdisc.org/sdtm-1-2/schema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:cts="http://rdf.cdisc.org/ct/schema#"
xml:base="http://rdf.cdisc.org/sdtm-1-2/std">
I have no experience with assembler files, so I based mine on the example
in the documentation. Does this look right?
@prefix : <http://localhost/jena_example/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix mms: <http://rdf.cdisc.org/mms#> .
## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset rdfs:subClassOf ja:RDFDataset .
# Lucene index
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.
:text_dataset rdf:type text:TextDataset ;
text:dataset <#dataset> ;
text:index <#indexLucene> ;
.
# A TDB dataset used for RDF storage
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "tdb" ;
tdb:unionDefaultGraph true ; # Optional
.
# Text index description
<#indexLucene> a text:TextIndexLucene ;
text:directory <file:luceneIndexes> ;
text:entityMap <#entMap> ;
.
# Mapping in the index
# URI stored in field "uri"
# mms:dataElementName and mms:dataElementDescription are mapped to field "text"
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map (
[ text:field "text" ; text:predicate mms:dataElementName ]
[ text:field "text" ; text:predicate mms:dataElementDescription ]
# the rest of the fields?
) .
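
For the rest of the fields, here is a sketch of what I assume the map could
look like if each predicate gets its own index field instead of sharing
"text". The field name "description" is a placeholder I made up, not
something from the documentation, and I have not tested this:

```turtle
<#entMap> a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "text" ;   # field searched when a query names no predicate
    text:map (
        # One entry per indexed predicate; "description" is a hypothetical field name.
        [ text:field "text" ;        text:predicate mms:dataElementName ]
        [ text:field "description" ; text:predicate mms:dataElementDescription ]
    ) .
```

If that is right, I would expect a query that names a predicate, such as
?s text:query (mms:dataElementDescription 'AE'), to search only that
predicate's field.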
On Tue, Aug 6, 2013 at 7:15 AM, Andy Seaborne <[email protected]> wrote:
> On 05/08/13 21:49, Brad Moran wrote:
>
>> I have an existing Jena TDB based on this example RDF:
>>
>> ...
>
>
>> I have compiled a Jena TDB based on several of these RDF files so it is a
>> large TDB and have several SPARQL queries that work as desired. I am now
>> trying to implement a full text search on this TDB. I have downloaded the
>> Jena 2.10.2 Snapshot jars and figured out my dependencies. I would like to
>> implement this text search through Java code using the new Jena Text Search
>> feature. This is my best attempt at solving the problem so far:
>>
>> public class TextSearchTest {
>> public static void main(String[] args)
>> {
>> try{
>> String DBDirectory = "tdb";
>>
>> // Construct the Lucene Index to be queried
>>
>> String indexDir = "luceneIndexes";
>> File file = new File(indexDir);
>> Directory dir = FSDirectory.open(file);
>>
>> // Create the in memory text index described
>> Dataset ds1 = TDBFactory.createDataset(DBDirectory);
>> String uri = "<http://rdf.cdisc.org/mms#dataElement>";
>> String property = "<http://rdf.cdisc.org/mms#dataElementName>";
>> EntityDefinition entDef = new EntityDefinition(uri, property,
>> RDFS.Literal);//RDFS.label
>>
>
> This defines the text index to be working on a particular property.
>
> You want to pass in a resource (Resource or Property object) for
> http://rdf.cdisc.org/mms#dataElementName here.
>
>
>
>
>> // Construct the Lucene Index to be queried
>> Dataset dataset = TextDatasetFactory.createLucene(ds1,
>> dir,
>> entDef);
>>
>
> I hope you loaded the data into this dataset, not the underlying TDB one,
> because otherwise the text indexer would not have seen the RDF triples to
> index.
>
>
>
>> // try query
>> dataset.begin(ReadWrite.READ);
>> QueryExecution qExec = QueryExecutionFactory.create(
>> "PREFIX text: <http://jena.apache.org/text#> "
>> + "PREFIX mms: <http://rdf.cdisc.org/mms#> "
>> + "SELECT * WHERE{?s text:query (mms:dataElementName 'AE')}",
>> dataset);
>>
>> ResultSet rs = qExec.execSelect();
>> ResultSetFormatter.out(rs);
>>
>> dataset.end();
>> }
>> catch(Exception e){
>> System.out.println(e);
>> }
>> }
>> }
>>
>>
>> This results in: WARN o.apache.jena.query.text.TextQueryPF -
>> Predicate not indexed: http://rdf.cdisc.org/mms#dataElementName
>>
>
> Because that field isn't being indexed.
>
> You can have several fields indexed if you call .set on the EntityDefinition
> with additional predicates.
>
>
>> and an empty result set is printed out by ResultSetFormatter. It does not
>> seem to create an index for the TDB.
>>
>> I believe my problem occurs with my
>> EntityDefinition (mainly because I am not sure where the parameters
>> entityField, primaryField, and primaryPredicate should come from). Also in
>> the example code it seems a Lucene index is created then the data is loaded
>> by an assembler file. Maybe I am just implementing this wrong. So to try to
>> wrap this up:
>>
>> 1. Do I need to use an assembler file?
>>
>
> No, but it may be easier that way.
>
>
>> 2. Can I create an index from an existing TDB or do I need to create the
>> index as I create the TDB?
>>
>
> As the data is loaded.
>
> There is a simple application 'jena.textindexer' which will create the
> index from existing data.
>
> http://jena.staging.apache.org/documentation/query/text-query.html#building-a-text-index
>
>
>> 3. Could you give me a description of the parameters of the EntityDefinition
>> class and where they come from? (in the RDF maybe?)
>>
>
> Create a Property object for
> http://rdf.cdisc.org/mms#dataElementName and
> pass that in as the 3rd argument.
>
>
>> 4. Any general advice on how I can solve this problem from my code?
>>
>> I tried to be as specific as possible here in hopes that you may be able to
>> guide me in the right direction. If I left anything out just let me know and
>> hopefully I can explain better. Thanks.
>>
>
> Minor in this case, but the data is incomplete RDF/XML (no namespaces), so
> I didn't try using it.
>
> Our mantra is "complete, minimal example". Both "complete" and "minimal"
> make it much, much easier to give good answers.
>
>
>> --Brad
>>
>>
> Andy
>