Hi
If you want to index data from a Java program, you can use the code below.
One thing to keep in mind: if you are dealing with huge RDF data, it might
give you an out-of-memory exception, because it holds everything in memory
and then indexes it. So for huge RDF data, use the command-line tools instead.
// Imports, assuming the Jena 2.11.x package names (com.hp.hpl.jena.*),
// jena-text and Lucene 4.x on the classpath:
import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.query.text.TextDatasetFactory;
import org.apache.jena.query.text.TextQuery;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.sparql.util.QueryExecUtils;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.vocabulary.RDFS;
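public static void main(String[] argv) {
    TextQuery.init();
    Dataset ds = createCode();
    // Alternative: build the dataset from an assembler file instead
    // (createAssembler() is not shown in this mail).
    // Dataset ds = createAssembler();
    loadData(ds, "file path");
    queryData(ds);
}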
public static Dataset createCode() {
    Dataset ds = null;
    System.out.println("Construct a TDB dataset with an in-memory Lucene index using code");
    // Build a text dataset by code.
    // Here: TDB base data and an in-memory Lucene index.
    // Base data (for purely in-memory base data, use this instead):
    // Dataset ds1 = DatasetFactory.createMem();
    String directory = "path of TDB";
    Dataset ds1 = TDBFactory.createDataset(directory);
    try {
        // Define the index mapping
        EntityDefinition entDef = new EntityDefinition("uri",
                "property", RDFS.label);
        // Lucene, in memory.
        Directory dir = new RAMDirectory();
        // Join together into a dataset
        ds = TextDatasetFactory.createLucene(ds1, dir, entDef);
    } catch (Exception e) {
        System.out.println(e.toString());
    }
    return ds;
}
public static void loadData(Dataset dataset, String file) {
    System.out.println("Start loading");
    long startTime = System.nanoTime();
    dataset.begin(ReadWrite.WRITE);
    try {
        // Reading through the text dataset is what gets the data indexed.
        Model m = dataset.getDefaultModel();
        RDFDataMgr.read(m, file);
        dataset.commit();
    } finally {
        dataset.end();
    }
    long finishTime = System.nanoTime();
    double time = (finishTime - startTime) / 1.0e6;
    System.out.println(String.format("Finish loading - %.2fms", time));
}
public static void queryData(Dataset dataset) {
    System.out.println("START");
    long startTime = System.nanoTime();
    String queryString = "Sparql Query"; // placeholder: put the SPARQL query here
    dataset.begin(ReadWrite.READ);
    try {
        Query q = QueryFactory.create(queryString);
        QueryExecution qexec = QueryExecutionFactory.create(q, dataset);
        QueryExecUtils.executeQuery(q, qexec);
    } finally {
        dataset.end();
    }
    long finishTime = System.nanoTime();
    double time = (finishTime - startTime) / 1.0e6;
    System.out.println(String.format("FINISH - %.2fms", time));
}
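
As a rough illustration of why loadData above reads the file through the
text dataset, rather than relying on data already sitting in the TDB store:
the sketch below is a toy model in plain Java (8 or later), not the Jena API.
ToyStore, ToyIndex and AttachThenLoadSketch are hypothetical names made up
for this example. It assumes the index only learns about data through "add"
events, which matches Andy's explanation further down the thread that
attaching an index to a preloaded dataset does not index it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Toy stand-in for a dataset (hypothetical, NOT the Jena API).
class ToyStore {
    private final Map<String, String> labels = new HashMap<>(); // subject -> label
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    // Register a listener that is told about every FUTURE add.
    void attach(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    void add(String subject, String label) {
        labels.put(subject, label);
        for (BiConsumer<String, String> l : listeners) {
            l.accept(subject, label);
        }
    }

    Map<String, String> contents() {
        return labels;
    }
}

// Toy stand-in for a text index: lower-cased word -> subjects whose label contains it.
class ToyIndex {
    private final Map<String, List<String>> postings = new HashMap<>();

    void index(String subject, String label) {
        for (String word : label.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(word, k -> new ArrayList<>()).add(subject);
        }
    }

    List<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(), new ArrayList<>());
    }
}

class AttachThenLoadSketch {
    public static void main(String[] args) {
        // Case 1: data loaded first, index attached afterwards.
        ToyStore preloaded = new ToyStore();
        preloaded.add("ex:s1", "Apache Jena");
        ToyIndex late = new ToyIndex();
        preloaded.attach(late::index);
        System.out.println("late attach:   " + late.search("jena"));

        // Case 2: index attached before loading, so every add is indexed.
        ToyStore fresh = new ToyStore();
        ToyIndex early = new ToyIndex();
        fresh.attach(early::index);
        fresh.add("ex:s1", "Apache Jena");
        System.out.println("early attach:  " + early.search("jena"));

        // Case 3: an explicit second pass over the existing data,
        // the role a separate indexing step plays.
        preloaded.contents().forEach(late::index);
        System.out.println("after reindex: " + late.search("jena"));
    }
}
```

In this toy model the first search finds nothing, while the second and third
find ex:s1. The real jena-text behaviour is more involved, so treat this only
as a picture of the idea.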
Thanks
On Tue, May 20, 2014 at 1:49 PM, Andy Seaborne <[email protected]> wrote:
> On 17/05/14 14:27, Karen Menz wrote:
>
>> Thanks Harkishan & Andy for your help, it works fine now.
>>
>> However, I wonder if there's a way to build the index in java code,
>> instead of the command line.
>>
>
> You can call the command-line programme from Java (you can call any Java
> main method from Java) if you want to index an already loaded dataset, or
> just read the data into a dataset with the text index attached.
>
> Andy
>
>
>
>> Thanks,
>> Karen
>>
>>
>>
>> On Friday, May 16, 2014 8:46 PM, Andy Seaborne <[email protected]> wrote:
>>
>>
>>
>> On 08/05/14 07:21, Karen Menz wrote:
>>
>>> Hi,
>>> I'm trying to get Lucene working with TDB, but no luck so far.
>>> I already have a TDB dataset saved in the "tdb_directory" folder, with
>>> the following platform:
>>> Jena-2.11.1
>>> Lucene-4.8.0 (Apache)
>>> Java-1.7.0, 64-Bit
>>>
>>> And I have the following code:
>>> Dataset ds1 = TDBFactory.createDataset(tdb_directory);
>>> EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label.asNode());
>>> File indexDir = new File(index_directory);
>>> try{
>>> dir = FSDirectory.open(indexDir);
>>> } catch(IOException e) {
>>> e.printStackTrace();
>>> }
>>> Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef);
>>>
>>> Then, when I execute the query as follows:
>>>
>>> ds.begin(ReadWrite.READ);
>>> Model model = ds.getDefaultModel();
>>> Query q = QueryFactory.create(pre + "\n" + qs);
>>>
>>
>> What's 'qs'?
>>
>>> QueryExecution qexec = QueryExecutionFactory.create(q, ds);
>>> QueryExecUtils.executeQuery(q, qexec);
>>>
>>> ds.commit();
>>> ds.end();
>>>
>>> I get an empty table
>>> -------------
>>> | s | label |
>>> =============
>>>
>>> and in the index_directory, only 3 files were created: segments.gen,
>>> segments_1, and write.lock, with sizes 1kb, 1kb, and 0kb, respectively.
>>>
>>> I'm not sure what I'm missing here, and would really appreciate any help.
>>>
>>
>> It looks like the data isn't indexed. Indexing does not happen
>> automatically just by attaching an index to an existing, preloaded dataset.
>>
>> Either have the index attached to the dataset when you load the data,
>> or build the dataset in two steps:
>>
>> http://jena.apache.org/documentation/query/text-query.html#building-a-text-index
>>
>> There is a textindexer to run from the command line for indexing
>> existing data.
>>
>> Andy
>>
>>
>>
>>
>>
>>
>>> Thanks in advance.
>>>
>>> Karen
>>>
>>>
>