I have an existing Jena TDB based on this example RDF:
<mms:DataElement rdf:ID="DE.Intervention.--MODIFY">
  <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.SynonymQualifier"/>
  <sdtms:supportedBySEND rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</sdtms:supportedBySEND>
  <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger">2</mms:ordinal>
  <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string">If the value for --TRT is modified for coding purposes, then the modified text is placed here.</mms:dataElementDescription>
  <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">--MODIFY</mms:dataElementName>
  <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
  <sdtms:supportedBySDTMIG rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</sdtms:supportedBySDTMIG>
  <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Modified Treatment Name</mms:dataElementLabel>
  <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName">xsd:string</mms:dataElementType>
  <mms:context>
    <mms:VariableGrouping rdf:ID="InterventionVariables">
      <mms:contextLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Interventions Observation Class Variables</mms:contextLabel>
      <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger">1</mms:ordinal>
      <mms:context rdf:resource="#Model.SDTM-1-2"/>
    </mms:VariableGrouping>
  </mms:context>
  <sdtms:qualifies>
    <mms:DataElement rdf:ID="DE.Intervention.--TRT">
      <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">--TRT</mms:dataElementName>
      <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.TopicVariable"/>
      <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
      <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The topic for the intervention observation, usually the verbatim name of the treatment, drug, medicine, or therapy given during the dosing interval for the observation.</mms:dataElementDescription>
      <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Name of Treatment</mms:dataElementLabel>
      <sdtms:supportedBySDTMIG rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</sdtms:supportedBySDTMIG>
      <mms:context rdf:resource="#InterventionVariables"/>
      <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName">xsd:string</mms:dataElementType>
      <sdtms:supportedBySEND rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</sdtms:supportedBySEND>
      <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger">1</mms:ordinal>
    </mms:DataElement>
  </sdtms:qualifies>
</mms:DataElement>
This is one of two forms of RDF in the TDB. The second is:
<mms:PermissibleValue rdf:ID="C81224.C81203">
  <mms:inValueDomain>
    <mms:EnumeratedValueDomain rdf:ID="C81224">
      <cts:cdiscDefinition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Derivation Type: Analysis value derivation method.</cts:cdiscDefinition>
      <cts:nciPreferredTerm rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CDISC ADaM Derivation Type Terminology</cts:nciPreferredTerm>
      <cts:nciCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string">C81224</cts:nciCode>
      <cts:cdiscSynonyms rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Derivation Type</cts:cdiscSynonyms>
      <cts:cdiscSubmissionValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DTYPE</cts:cdiscSubmissionValue>
      <cts:codelistName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Derivation Type</cts:codelistName>
      <cts:isExtensibleCodelist rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</cts:isExtensibleCodelist>
    </mms:EnumeratedValueDomain>
  </mms:inValueDomain>
  <cts:nciPreferredTerm rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Worst Case Imputation Technique</cts:nciPreferredTerm>
  <cts:nciCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string">C81203</cts:nciCode>
  <cts:cdiscDefinition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Worst Case: A data imputation technique which populates missing values with the worst possible outcome.</cts:cdiscDefinition>
  <cts:cdiscSubmissionValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">WC</cts:cdiscSubmissionValue>
</mms:PermissibleValue>
I built the TDB from several of these RDF files, so it is large, and I have
several SPARQL queries that work as desired. I am now trying to implement
full-text search on this TDB. I have downloaded the Jena 2.10.2 snapshot
jars and worked out my dependencies. I would like to implement the text
search in Java code using the new Jena Text Search feature. This is my best
attempt at solving the problem so far:
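import java.io.File;

import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.query.text.TextDatasetFactory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.vocabulary.RDFS;

public class TextSearchTest {
    public static void main(String[] args) {
        try {
            String DBDirectory = "tdb";
            // Open the on-disk Lucene directory that should hold the index
            String indexDir = "luceneIndexes";
            File file = new File(indexDir);
            Directory dir = FSDirectory.open(file);
            // Open the existing TDB dataset
            Dataset ds1 = TDBFactory.createDataset(DBDirectory);
            // Entity definition (the part I am unsure about)
            String uri = "<http://rdf.cdisc.org/mms#dataElement>";
            String property = "<http://rdf.cdisc.org/mms#dataElementName>";
            EntityDefinition entDef = new EntityDefinition(uri, property,
                    RDFS.Literal); // RDFS.label
            // Combine the TDB dataset and the Lucene directory into a text dataset
            Dataset dataset = TextDatasetFactory.createLucene(ds1, dir, entDef);
            // Try a query
            dataset.begin(ReadWrite.READ);
            QueryExecution qExec = QueryExecutionFactory.create(
                    "PREFIX text: <http://jena.apache.org/text#> "
                    + "PREFIX mms: <http://rdf.cdisc.org/mms#> "
                    + "SELECT * WHERE{?s text:query (mms:dataElementName 'AE')}",
                    dataset);
            ResultSet rs = qExec.execSelect();
            ResultSetFormatter.out(rs);
            dataset.end();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}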
This results in:
WARN o.apache.jena.query.text.TextQueryPF - Predicate not indexed: http://rdf.cdisc.org/mms#dataElementName
and ResultSetFormatter prints an empty result set. It does not seem to
create an index for the TDB. I believe the problem is in my
EntityDefinition, mainly because I am not sure where the parameters
entityField, primaryField, and primaryPredicate should come from. Also, in
the example code it seems a Lucene index is created and then the data is
loaded through an assembler file, so maybe I am just implementing this
wrong. To wrap this up:
1. Do I need to use an assembler file?
2. Can I create an index from an existing TDB, or do I need to create the
index as I create the TDB?
3. Could you describe the parameters of the EntityDefinition class and
where they come from (the RDF, maybe)?
4. Any general advice on how I can solve this problem from my code?
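To frame question 3, here is a Jena-free toy model of what the three
parameters might be doing. It rests on an assumption about Jena's behaviour
that is not confirmed here: that the entity definition records which
predicate is indexed under which Lucene field, and that text:query looks its
predicate up in that record and warns when nothing is found. The class
PredicateLookupSketch and the field names "uri" and "text" are hypothetical
and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an entity definition; NOT the Jena class.
class PredicateLookupSketch {
    // Assumed bookkeeping: indexed predicate IRI -> Lucene field name.
    private final Map<String, String> fieldByPredicate = new HashMap<>();
    // Kept only to mirror the three-argument shape used in the post.
    private final String entityField;

    PredicateLookupSketch(String entityField, String primaryField,
                          String primaryPredicate) {
        this.entityField = entityField;
        fieldByPredicate.put(primaryPredicate, primaryField);
    }

    // Returns the field to search, or null when the predicate was never registered.
    String fieldFor(String predicateIri) {
        return fieldByPredicate.get(predicateIri);
    }

    public static void main(String[] args) {
        String name = "http://rdf.cdisc.org/mms#dataElementName";
        String literal = "http://www.w3.org/2000/01/rdf-schema#Literal";

        // Same shape as the definition in the post: the predicate is rdfs:Literal.
        PredicateLookupSketch asPosted = new PredicateLookupSketch(
                "<http://rdf.cdisc.org/mms#dataElement>", "<" + name + ">", literal);

        // Hypothetical alternative: plain field names, predicate is the searched property.
        PredicateLookupSketch alternative =
                new PredicateLookupSketch("uri", "text", name);

        System.out.println(asPosted.fieldFor(name));    // prints "null"
        System.out.println(alternative.fieldFor(name)); // prints "text"
    }
}
```

Under that assumption, the definition as posted registers rdfs:Literal
rather than mms:dataElementName, which would be consistent with the warning.
Whether the real EntityDefinition behaves this way is what needs confirming.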
I tried to be as specific as possible here in the hope that you can guide
me in the right direction. If I left anything out, just let me know and
hopefully I can explain better. Thanks.
--Brad