Hi

If you want to build the index from a Java program, the following code shows one way to do it:

One thing to keep in mind: if you are dealing with huge RDF data, this
approach may throw an OutOfMemoryError, because the Lucene index here is
held entirely in memory. So for huge RDF data, use the command-line tools
instead.
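For the command-line route, the jena-text documentation describes a `jena.textindexer` tool that indexes an already-loaded dataset described by an assembler file. A sketch (the classpath layout and the `text-config.ttl` assembler file are placeholders for your own setup):

```shell
# Sketch: build the text index for an existing dataset from the command line.
# JENA_HOME and text-config.ttl are placeholders for your installation.
java -cp "$JENA_HOME/lib/*" jena.textindexer --desc=text-config.ttl
```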

public static void main(String[] argv) {
        TextQuery.init();
        Dataset ds = createCode();
        // Alternatively, build the text dataset from an assembler file:
        // Dataset ds = createAssembler();
        loadData(ds, "file path");
        queryData(ds);
    }


    public static Dataset createCode() {
        Dataset ds = null;
        System.out.println("Construct a TDB-backed dataset with an in-memory Lucene index, using code");
        // Build a text dataset by code:
        // here, TDB base data and an in-memory Lucene index.

        // Base data
        // Dataset ds1 = DatasetFactory.createMem();   // pure in-memory alternative
        String directory = "path of TDB";
        Dataset ds1 = TDBFactory.createDataset(directory);

        try {
            // Define the index mapping
            EntityDefinition entDef = new EntityDefinition("uri",
                    "property", RDFS.label);
            // Lucene, in memory.
            Directory dir = new RAMDirectory();

            // Join together into a dataset
            ds = TextDatasetFactory.createLucene(ds1, dir, entDef);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return ds;
    }
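If the Lucene index should survive restarts, the RAMDirectory above can be swapped for an on-disk FSDirectory, as in the code quoted below in this thread. A sketch against the Lucene 4.x API used here, reusing ds1 and entDef from createCode(); "index path" is a placeholder:

```java
// Variant of createCode() with a persistent on-disk Lucene index (sketch).
// "index path" is a placeholder for the index directory.
File indexDir = new File("index path");
Directory dir = FSDirectory.open(indexDir);   // Lucene 4.x: open(File)
Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef);
```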


    public static void loadData(Dataset dataset, String file) {
        System.out.println("Start loading");
        long startTime = System.nanoTime();
        dataset.begin(ReadWrite.WRITE);
        try {
            Model m = dataset.getDefaultModel();
            RDFDataMgr.read(m, file);
            dataset.commit();

        } finally {
            dataset.end();
        }

        long finishTime = System.nanoTime();
        double time = (finishTime - startTime) / 1.0e6;
        System.out.println(String.format("Finish loading - %.2fms", time));
    }

    public static void queryData(Dataset dataset) {
        System.out.println("START");
        long startTime = System.nanoTime();
        String queryString = "Sparql Query";
        dataset.begin(ReadWrite.READ);
        try {
            Query q = QueryFactory.create(queryString);
            QueryExecution qexec = QueryExecutionFactory.create(q, dataset);
            try {
                QueryExecUtils.executeQuery(q, qexec);
            } finally {
                qexec.close();
            }
        } finally {
            dataset.end();
        }
        long finishTime = System.nanoTime();
        double time = (finishTime - startTime) / 1.0e6;
        System.out.println(String.format("FINISH - %.2fms", time));
    }
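The "Sparql Query" placeholder above would be a query using the text:query property function. A minimal sketch of such a query string, assuming the index maps rdfs:label as in createCode() (the search term 'word' is hypothetical):

```java
public class TextQueryExample {
    // Build a jena-text query string matching resources whose
    // indexed rdfs:label contains the given term.
    static String buildQuery(String term) {
        return "PREFIX text: <http://jena.apache.org/text#>\n"
             + "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
             + "SELECT ?s ?label WHERE {\n"
             + "  ?s text:query (rdfs:label '" + term + "') .\n"
             + "  ?s rdfs:label ?label .\n"
             + "}";
    }

    public static void main(String[] args) {
        // Print the query that would be passed to QueryFactory.create(...)
        System.out.println(buildQuery("word"));
    }
}
```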

Thanks


On Tue, May 20, 2014 at 1:49 PM, Andy Seaborne <[email protected]> wrote:

> On 17/05/14 14:27, Karen Menz wrote:
>
>> Thanks Harkishan & Andy for your help, it works fine now.
>>
>> However, I wonder if there's a way to build the index in java code,
>> instead of the command line.
>>
>
> You can call the command line programme from java (you can call any java
> .main from java) if you want to index an already loaded dataset or just
> read data into a dataset with text index attached.
>
>         Andy
>
>
>
>> Thanks,
>> Karen
>>
>>
>>
>> On Friday, May 16, 2014 8:46 PM, Andy Seaborne <[email protected]> wrote:
>>
>>
>>
>> On 08/05/14 07:21, Karen Menz wrote:
>>
>>> Hi,
>>> I'm trying to get Lucene working with TDB, but no luck so far.
>>> I already have a TDB dataset saved in the "tdb_directory" folder, with
>>> the following platform:
>>>        Jena-2.11.1
>>>        Lucene-4.8.0      (Apache)
>>>        Java-1.7.0, 64-Bit
>>>
>>> And I have the following code:
>>>
>>> Dataset ds1 = TDBFactory.createDataset(tdb_directory);
>>> EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label.asNode());
>>> File indexDir = new File(index_directory);
>>> try {
>>>     dir = FSDirectory.open(indexDir);
>>> } catch (IOException e) {
>>>     e.printStackTrace();
>>> }
>>> Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef);
>>>
>>> Then, when I execute the query as follows:
>>>
>>> ds.begin(ReadWrite.READ);
>>> Model model = ds.getDefaultModel();
>>> Query q = QueryFactory.create(pre + "\n" + qs);
>>>
>>
>> What's 'qs'?
>>
>>> QueryExecution qexec = QueryExecutionFactory.create(q, ds);
>>> QueryExecUtils.executeQuery(q, qexec);
>>>
>>> ds.commit();
>>> ds.end();
>>>
>>> I get an empty table:
>>>
>>> -------------
>>> | s | label |
>>> =============
>>>
>>> and in the index_directory, only 3 files were created: segments.gen,
>>> segments_1, and write.lock, with sizes 1kb, 1kb, and 0kb, respectively.
>>>
>>> I'm not sure what I'm missing here, and really appreciate any help.
>>>
>>
>> Looks like the data isn't indexed.  It does not happen automatically
>> just by attaching an index to an existing, preloaded dataset.
>>
>> Either have the index attached to the dataset when you loaded the data
>> or build the dataset in two steps:
>>
>> http://jena.apache.org/documentation/query/text-query.html#building-a-text-index
>>
>> There is a textindexer to run from the command line for indexing
>> existing data.
>>
>>      Andy
>>
>>
>>
>>
>>
>>
>>> Thanks in advance.
>>>
>>> Karen
>>>
>>>
>
