Hi Andy,

indexing via Fuseki (#31) and via jena.textindexer both work for me now - 
thanks for your help.

In a production setting, I'd prefer the latter, because

a) the Fuseki datastore should preferably be read-only, and
b) on large datasets, loading and index building may take several hours, which 
is easier to control from a "local" script

From the script, I reference a temporary config file which holds definitions 
for only one dataset (whereas the Fuseki config may hold many), in order to 
(re)build only one index.
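
For illustration, such a single-dataset config might look roughly like the 
sketch below. It is adapted from the config quoted further down; the prefix 
declarations are assumptions (the quoted fragment does not show them), and any 
class-loading declarations the full Fuseki config carries may need to be 
copied over as well.

```turtle
# Sketch only: a temporary assembler file describing a single text dataset.
# Prefix declarations are assumed; they are not shown in the quoted config.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix :     <#> .

# The text dataset wraps the TDB store and the Lucene index.
# No fuseki:Service block is included; only the dataset definition is kept.
:stw_combined rdf:type text:TextDataset ;
      text:dataset <#stw> ;
      text:index   <#stwIndex> ;
      .

<#stw> rdf:type tdb:DatasetTDB ;
      tdb:location "/opt/thes/var/stw/latest/tdb" ;
      .

<#stwIndex> a text:TextIndexLucene ;
      text:directory <file:/opt/thes/var/stw/latest/tdb_lucene> ;
      text:entityMap <#entMap> ;
      .

# All three SKOS label properties are indexed into the same "text" field.
<#entMap> a text:EntityMap ;
      text:entityField      "uri" ;
      text:defaultField     "text" ;
      text:map (
           [ text:field "text" ; text:predicate skos:prefLabel ]
           [ text:field "text" ; text:predicate skos:altLabel ]
           [ text:field "text" ; text:predicate skos:hiddenLabel ]
           ) .
```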

Thanks again - Joachim

-----Original Message-----
From: Andy Seaborne [mailto:a...@apache.org] 
Sent: Friday, 21 June 2013 15:30
To: users@jena.apache.org
Subject: Re: AW: Empty index with Jena Text and Fuseki

On 21/06/13 11:45, Neubert Joachim wrote:
> Hi Andy,
>
> thanks for the quick response, which makes quite clear what was wrong: As 
> before with Joseki, I used a pre-built read-only TDB database.
>
> Well, so I have to use Fuseki for tdb building as well. I'll check and report 
> back.

There is a command line tool, jena.textindexer, that takes a dataset and 
produces an index.

java -cp fuseki-server.jar jena.textindexer YourJosekiConfigFile

But it was broken in the way it handled the command line args  - I've just 
fixed it and used it to index a store that wasn't loaded with text indexing 
enabled:

tdb.tdbloader -loc=DIR
jena.textindexer ....

and it worked for me.  You'll need the latest development build (# 31) which I 
just kicked off for a full rebuild.

https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.8-SNAPSHOT/jena-fuseki-0.2.8-20130621.132913-31-distribution.zip

        Andy

>
> Cheers, Joachim
>
> -----Original Message-----
> From: Andy Seaborne [mailto:a...@apache.org]
> Sent: Friday, 21 June 2013 12:24
> To: users@jena.apache.org
> Subject: Re: Empty index with Jena Text and Fuseki
>
> On 21/06/13 09:33, Neubert Joachim wrote:
>> If I understood it correctly, Fuseki is supposed to build the text index 
>> when it starts up. However, this did not work for me.
>
> Joachim,
>
> Fuseki indexes the data as it's loaded; it does not index existing data on 
> startup.  I see what you see in the Lucene directory before data is loaded.
>
> How is the data being loaded into the store?
>
> Have you tried the config-tdb-text.ttl example? I have just checked using 
> that, and also a version modified to use something more like the entity map 
> you have, and it works for me.
>
> I've tried s-put and the web UI (SPARQL update) to load data into the current 
> snapshot build and text queries returned something.
>
> If you have a complete, minimal example of the load-query lifecycle, that 
> would be most useful.
>
>       Andy
>
>
>>
>> Starting fuseki (jena-fuseki-0.2.8-20130618.075236-28-server.jar) with an 
>> empty index directory, for a very short time, it looks like this:
>>
>> -rw-r--r--. 1 root root   45 Jun 20 13:46 segments_1
>> -rw-r--r--. 1 root root    0 Jun 20 13:46 write.lock
>>
>> and then it stays like this:
>>
>> -rw-r--r--. 1 root root   45 Jun 20 13:46 segments_1
>> -rw-r--r--. 1 root root   20 Jun 20 13:46 segments.gen
>>
>> Text queries yield an empty result, while standard SPARQL queries work.
>>
>> I can't figure out what could be wrong with my config:
>>
>> ## ---------------------------------------------------------------
>> ## Read-only TDB dataset (only read services enabled).
>>
>> <#service_stw_combined> rdf:type fuseki:Service ;
>>       rdfs:label                      "STW combined TDB Service (R)" ;
>>       fuseki:name                     "stw_combined" ;
>>       fuseki:serviceQuery             "query" ;
>>       fuseki:serviceQuery             "sparql" ;
>>       ##fuseki:serviceUpdate            "update" ;
>>       fuseki:serviceReadGraphStore    "data" ;
>>       fuseki:serviceReadGraphStore    "get" ;
>>       fuseki:dataset           :stw_combined ;
>>       .
>>
>> :stw_combined rdf:type      text:TextDataset ;
>>       text:dataset <#stw> ;
>>       text:index   <#stwIndex> ;
>>       .
>>
>> <#stw> rdf:type      tdb:DatasetTDB ;
>>       tdb:location "/opt/thes/var/stw/latest/tdb" ;
>>       ##tdb:unionDefaultGraph true ;
>>       .
>>
>> <#stwIndex> a text:TextIndexLucene ;
>>       text:directory <file:/opt/thes/var/stw/latest/tdb_lucene> ;
>>       text:entityMap <#entMap> ;
>>       .
>>
>> <#entMap> a text:EntityMap ;
>>       text:entityField      "uri" ;
>>       text:defaultField     "text" ; ## Must be defined in the text:map
>>       text:map (
>>            # skos:prefLabel
>>            [ text:field "text" ; text:predicate skos:prefLabel ]
>>            # skos:altLabel
>>            [ text:field "text" ; text:predicate skos:altLabel ]
>>            # skos:hiddenLabel
>>            [ text:field "text" ; text:predicate skos:hiddenLabel ]
>>            ) .
>>
>> Help would be much appreciated.
>>
>> Cheers, Joachim
>>
>
