How to do text search with Jena and Fuseki

Kamble, Ajay, Crest Wed, 04 Nov 2015 04:11:30 -0800

Hi All,

1. Triplestore


I have an existing Triplestore that I setup by putting data in Fuseki. I used 
Java code to put all triples in Fuseki (here is url that I used - 
http://localhost:3030/mydb/data). Before starting loading of data I start 
Fuseki with this command:

fuseki-server --update --loc=/tmp/fuseki-tdb /mydb
(on Mac OS X).

My database is located at /tmp/fuseki-tdb

This setup works well and I can query all triples from console.

2. Free Text Search

I need to setup free text search on top of this Triplestore, so that normal 
Sparql queries and free text queries are both possible.

Here is the assembler file that I used.

@prefix :        <http://mydb.com/ns/dataset#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix no: <http://mydb.com/ns/concepts#> .
@prefix d: <http://mydb.com/ns/data#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
# Solr index
text:TextIndexSolr    rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexLucene> ;
    .

# A TDB datset used for RDF storage
<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location “/tmp/fuseki-tdb" ;
    tdb:unionDefaultGraph true ; # Optional
    .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate no:name ]
         [ text:field "text" ; text:predicate no:alt-name ]
         [ text:field "text" ; text:predicate no:name ]
         [ text:field "text" ; text:predicate no:title ]
         [ text:field "text" ; text:predicate no:author ]
         [ text:field "text" ; text:predicate no:inventor ]
         ) .

[] rdf:type fuseki:Server ;
   # Server-wide context parameters can be given here.
   # For example, to set query timeouts: on a server-wide basis:
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout 
to for rest of query.
   # See java doc for ARQ.queryTimeout
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "10000" ] ;

   # Load custom code (rarely needed)
   # ja:loadClass "your.code.Class" ;

   # Services available.  Only explicitly listed services are configured.
   #  If there is a service description not linked from this list, it is 
ignored.
   fuseki:services (
     <#service>
     #<#service_text_tdb>
   ) .

<#service>  rdf:type fuseki:Service ;
    fuseki:name              “mydb" ;       # http://host:port/tdb
    fuseki:serviceQuery               "query" ;    # SPARQL query service
    fuseki:serviceQuery               "sparql" ;   # SPARQL query service
    fuseki:serviceUpdate              "update" ;   # SPARQL query service
    fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store 
protocol (read and write)
    fuseki:dataset           <#dataset> ;
    fuseki:dataset                  :text_dataset ;
.

With this assembler file, I start my server with following command,

fuseki-server --update 
--desc=/Users/kamb16/projects/nano/data/fuseki-assembler.ttl /mydb

I get following error,

com.hp.hpl.jena.sparql.ARQException: Found two matches: var ?root -> 
http://mydb.com/ns/dataset#text_dataset, 
file:///tmp/fuseki-assembler.ttl#dataset
at com.hp.hpl.jena.sparql.util.QueryExecUtils.getOne(QueryExecUtils.java:360)
at 
com.hp.hpl.jena.sparql.util.graph.GraphUtils.findRootByType(GraphUtils.java:194)
at 
com.hp.hpl.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:91)
at arq.cmdline.ModAssembler.create(ModAssembler.java:68)
at arq.cmdline.ModDatasetAssembler.createDataset(ModDatasetAssembler.java:43)
at org.apache.jena.fuseki.FusekiCmd.processModulesAndArgs(FusekiCmd.java:307)
at arq.cmdline.CmdArgModule.process(CmdArgModule.java:50)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at org.apache.jena.fuseki.FusekiCmd.main(FusekiCmd.java:166)

I do not understand how to fix this issue. Could you please help? I want to do 
regular Sparql queries as well as Free text search.

Regards,
Ajay

How to do text search with Jena and Fuseki

Reply via email to