(Others have more experience working with large text-indexed datasets than I have.)

It is possible to bulk load with tdbloader from the command line and then build the text index with textindexer.
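
As a rough sketch of that two-step route (data.nt and assembler.ttl are placeholder file names, and using fuseki-server.jar as the classpath is an assumption about how the jena-text classes are made available on your machine):

```shell
# Placeholder names; adjust to your setup.
TDB_DIR="tdb"                # same directory as tdb:location in the assembler file
DATA="data.nt"               # hypothetical input file
ASSEMBLER="assembler.ttl"    # hypothetical name for the attached assembler file
FUSEKI_JAR="${FUSEKI_HOME:-.}/fuseki-server.jar"   # assumed classpath containing jena-text

if command -v tdbloader >/dev/null 2>&1; then
  # Step 1: bulk load straight into TDB (no text indexing happens here).
  # Step 2: build the Lucene index from the data that is now in TDB.
  tdbloader --loc="$TDB_DIR" "$DATA" &&
    java -cp "$FUSEKI_JAR" jena.textindexer --desc="$ASSEMBLER"
else
  echo "tdbloader not found on PATH" >&2
fi
```

The --loc directory should match the tdb:location in the assembler file, so that textindexer reads the data that tdbloader wrote.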

----

A DatasetGraphText is a wrapped dataset: you can get the wrapped DatasetGraph, which presumably is the TDB one, by casting.

But by updating TDB directly, you are bypassing the text indexing.

        Andy


On 12/02/16 17:28, Joël Kuiper wrote:
Hey all,

I would like to use TDB with a text index; the easiest way, it seems, is to set
this up with an assembler file.
However if I use TDBFactory.assembleDataset or DatasetFactory.assemble I can no 
longer use the TDBLoader, since TDBInternal/getDatasetGraphTDB returns null.
Is there a way to obtain the DatasetGraphTDB for the bulk loader when creating 
a dataset with an assembler file?
Using the RDFDataMgr to load the data is not really an option, since it stalls 
here (presumably due to the size of the files).

Many thanks,

Joël

ATTACHMENT
Assembler file:
@prefix :        <http://localhost/jena_example/#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
     text:dataset   <#dataset> ;
     text:index     <#indexLucene> ;
     .

# A TDB dataset used for RDF storage
<#dataset> rdf:type      tdb:DatasetTDB ;
     tdb:location "tdb" ;
     tdb:unionDefaultGraph true ;
     .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
     text:directory <file:lucene> ;
     ##text:directory "mem" ;
     text:entityMap <#entMap> ;
     .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
     text:entityField      "uri" ;
     text:defaultField     "text" ;
     text:graphField       "graph" ;
     text:map (
       [ text:field "text" ;
         text:predicate rdfs:label
       ]
       [ text:field "text" ;
         text:predicate skos:prefLabel
       ]
       [ text:field "text" ;
         text:predicate skos:altLabel
       ]
     ) .

