Re: Apache Jena Fuseki with text indexing

Zhenya Antić Thu, 26 Mar 2020 08:32:45 -0700

Hi Andy,

Thanks. I think I have all the lines you listed in the .ttl file (attached).
I also checked that the data file contains the relevant data, but 0
properties are indexed.


Thanks,
Zhenya



On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
> 
> 
> On 24/03/2020 15:11, Zhenya Antić wrote:
> > Hi Andy,
> > 
> >> Did you load the data before attaching the text index?
> > 
> > How do I do it (or not do it, wasn't sure from your post)?
> 
> Set up the Fuseki system with the text index as the Fuseki service dataset:
> 
>  fuseki:name "biology" ;
>  fuseki:dataset :text_dataset ;
> ...
> 
> :text_dataset rdf:type text:TextDataset ;
>  text:dataset <#dataset> ;
> 
> 
> 
> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> tdb2:location "db" ; #path to TDB;
> .
> 
> then send the data to /biology/data (which is the SPARQL Graph Store Protocol
> (GSP) write endpoint), or push the data to the server however you want
> (SPARQL Update, or the UI).
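A minimal sketch of the first option in Python (standard library only). It assumes Fuseki listens on localhost:3030 and that the "biology" service has been bound to :text_dataset as described above; writes through a service bound directly to the plain TDB2 dataset would bypass the text index. The host, port, and the helper names `build_gsp_post` and `upload_file` are illustrative assumptions. The request POSTs Turtle to the GSP endpoint's default graph.

```python
import urllib.request
from pathlib import Path

# Assumptions: Fuseki listens on localhost:3030 and the text-indexed
# dataset is published as "biology" (the fuseki:name above).
FUSEKI = "http://localhost:3030"
DATASET = "biology"


def build_gsp_post(turtle: bytes, base: str = FUSEKI, dataset: str = DATASET) -> urllib.request.Request:
    """Build a Graph Store Protocol POST that adds Turtle to the default graph."""
    url = f"{base}/{dataset}/data?default"
    return urllib.request.Request(
        url,
        data=turtle,
        method="POST",  # POST adds to the graph; PUT would replace its contents
        headers={"Content-Type": "text/turtle; charset=utf-8"},
    )


def upload_file(path: str) -> int:
    """Send a local Turtle file to the server and return the HTTP status code."""
    req = build_gsp_post(Path(path).read_bytes())
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With the server running, `upload_file("data.ttl")` (file name hypothetical) sends the file; the triples are indexed as they are written because they pass through the text dataset.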
> 
> For very large data:
> 
> Load the TDB2 dataset offline
> Then run the "jena.textindexer" utility
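A sketch of the offline route as a small Python wrapper, mainly to make the order of the two steps explicit. `tdb2.tdbloader --loc` (Jena command-line tools) and `jena.textindexer --desc=...` run from fuseki-server.jar (as in the text-query documentation) are existing tools; the paths `db`, `fuseki-server.jar`, `config.ttl` and the helper names are placeholders. As far as I know, TDB2 does not allow two processes on one database directory, so Fuseki should be stopped while these run.

```python
import shlex
import subprocess

# Placeholders: adjust to your installation. CONFIG is the assembler file
# containing the text:TextDataset description; "db" must be the same
# location that the config's tdb2:location resolves to.
TDB_LOCATION = "db"
FUSEKI_JAR = "fuseki-server.jar"
CONFIG = "config.ttl"


def offline_load_commands(data_files, loc=TDB_LOCATION, jar=FUSEKI_JAR, config=CONFIG):
    """Return the two commands in order: bulk-load TDB2, then build the text index."""
    load = ["tdb2.tdbloader", "--loc", loc, *data_files]
    index = ["java", "-cp", jar, "jena.textindexer", f"--desc={config}"]
    return [load, index]


def run_offline_load(data_files):
    """Run both steps; stops at the first failing command (check=True)."""
    for cmd in offline_load_commands(data_files):
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

The indexer step comes second because it reads whatever is already in the database and writes the Lucene index described in the config.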
> 
> https://jena.apache.org/documentation/query/text-query.html#configuration
> 
> The first way is easier.
> 
>  Andy
> 
> > 
> > Thanks,
> > Zhenya
> > 
> > 
> > 
> > On Sun, Mar 22, 2020, at 9:18 AM, Andy Seaborne wrote:
> >> Just checking one point:
> >>
> >> Did you load the data before attaching the text index?
> >>
> >> The text index is calculated as data is added, so if you first load the
> >> dataset and then set up a text index, it will miss indexing that data.
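To make the ordering point concrete, here is a toy Python model. It is an illustration of write-time indexing only, not how jena-text is implemented: the index is a listener on writes, so anything added before it is attached is never seen. Catching up on existing data is what the `jena.textindexer` utility is for. All class names here are hypothetical.

```python
class ToyDataset:
    """Toy model (not Jena code): stores triples and notifies attached listeners."""

    def __init__(self):
        self.triples = []
        self.listeners = []

    def add(self, s, p, o):
        self.triples.append((s, p, o))
        for listener in self.listeners:  # indexing happens only at write time
            listener(s, p, o)


class ToyTextIndex:
    """Records (subject, text) pairs for every triple it is told about."""

    def __init__(self):
        self.entries = []

    def __call__(self, s, p, o):
        self.entries.append((s, o))


ds = ToyDataset()
ds.add("ex:Biology", "rdfs:label", "Biology")  # loaded before the index exists
index = ToyTextIndex()
ds.listeners.append(index)                     # index attached afterwards
ds.add("ex:Life", "rdfs:label", "Life")        # loaded after the index exists

print(len(ds.triples), len(index.entries))     # 2 1
```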
> >>
> >> Andy
> >>
> >> On 21/03/2020 07:55, Lorenz Buehmann wrote:
> >>> Hi,
> >>>
> >>> welcome to the Semantic Web and Apache Jena.
> >>>
> >>> Comments inline:
> >>>
> >>> On 20.03.20 15:36, Zhenya Antić wrote:
> >>>> Hello,
> >>>>
> >>>> I am a beginner with Fuseki, knowledge graphs and SPARQL, so please 
> >>>> forgive me if the questions seem obvious; the learning curve for this 
> >>>> turned out to be quite steep.
> >>> No problem; nothing is simple in the beginning.
> >>>>
> >>>> I am trying to get text indexing to work with my Fuseki knowledge graph.
> >>> Which DBpedia dataset did you load? I mean, which files?
> >>>>
> >>>> For starters, I tried using a regular expression, but that didn't work:
> >>>>
> >>>> Just a plain query like this:
> >>>> SELECT DISTINCT * WHERE {
> >>>> ?s ?p ?o
> >>>> }
> >>>> gives 98 results such as:
> >>>>
> >>>> 1
> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>> <http://dbpedia.org/resource/Biology>
> >>>> 2
> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>> <http://dbpedia.org/resource/Biology#Branches>
> >>>> 3
> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
> >>>> 4
> >>>> <http://dbpedia.org/ontology/wikiPageID:18393>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>> <http://dbpedia.org/resource/Life>
> >>> That can't be the correct output of this query. rdfs:label should have
> >>> literals as objects (?o), or else you loaded some really weird data.
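A small Python check of the point above. Two things are visible in the sample rows: the predicate is `http://www.w3.org/1999/02/22-rdf-syntax-ns#label` (the rdf: namespace), not rdfs:label (`http://www.w3.org/2000/01/rdf-schema#label`), and every object is an IRI. As far as I know, jena-text only indexes triples whose object is a literal, so rows like these would contribute nothing to the index. The helper below is a hypothetical sketch that treats terms written in N-Triples style: IRIs in `<...>`, literals starting with `"`.

```python
# Terms are written as in N-Triples: IRIs in <...>, literals starting with ".
RDF_LABEL = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#label>"


def literal_objects(triples, predicate):
    """Return the objects of `predicate` that are literals (what a text index can use)."""
    return [o for (_s, p, o) in triples if p == predicate and o.startswith('"')]


# The first rows of the query output quoted above.
rows = [
    ("<http://dbpedia.org/ontology/wikiPageID:9127632>", RDF_LABEL,
     "<http://dbpedia.org/resource/Biology>"),
    ("<http://dbpedia.org/ontology/wikiPageID:9127632>", RDF_LABEL,
     "<http://dbpedia.org/resource/Biology#Branches>"),
    ("<http://dbpedia.org/ontology/wikiPageID:18393>", RDF_LABEL,
     "<http://dbpedia.org/resource/Life>"),
]

print(literal_objects(rows, RDF_LABEL))  # []
```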
> >>>>
> >>>> But a query with a regular expression:
> >>>> SELECT DISTINCT * WHERE {
> >>>> ?s ?p ?o
> >>>> FILTER regex(?o, "Biol", "i")
> >>>> }
> >>>
> >>> 1. You should help the query engine by using rdfs:label as the property.
> >>>
> >>> 2. You should use the str() function on the ?o values:
> >>>
> >>> SELECT DISTINCT * WHERE {
> >>> ?s rdfs:label ?o
> >>> FILTER regex(str(?o), "Biol", "i")
> >>> }
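Once the text index is populated, it is queried with the `text:query` property function rather than `regex`. A minimal Python sketch that builds a SPARQL Protocol GET request, assuming Fuseki on localhost:3030 and the "ds" service from the attached config (the one bound to :text_dataset, with query endpoint "query"). The single-argument form `?s text:query "..."` searches the entity map's default field ("text" here); the search string `biol*` and the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: Fuseki on localhost:3030; "ds" is the service whose
# fuseki:dataset is :text_dataset in the config quoted below.
ENDPOINT = "http://localhost:3030/ds/query"

TEXT_QUERY = """\
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s WHERE {
  ?s text:query "biol*" .
}
"""


def build_query_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str = TEXT_QUERY):
    """Send the query and return the bound ?s values."""
    with urllib.request.urlopen(build_query_request(query)) as resp:
        results = json.load(resp)
    return [b["s"]["value"] for b in results["results"]["bindings"]]
```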
> >>>
> >>>> gives 0 results, although there are clearly results that contain "Biol".
> >>>
> >>>
> >>> I'll have to try your config, or maybe others will spot the issue in the 
> >>> meantime.
> >>>
> >>>>
> >>>> I also tried setting up indexing with a .ttl file, but the result 
> >>>> was "INFO 0 (0 per second) properties indexed". The .ttl file is below:
> >>>>
> >>>> @prefix : <http://base/#> .
> >>>> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
> >>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> >>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> >>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> >>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> >>>> @prefix text: <http://jena.apache.org/text#> .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB>
> >>>> rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#GraphTDB2>
> >>>> rdfs:subClassOf ja:Model .
> >>>>
> >>>> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
> >>
> >> The rdfs:subClassOf declarations should not be necessary with recent versions of Fuseki.
> >>
> >> If any are, let us know so it can be fixed.
> >>
> >>>>
> >>>> <http://jena.apache.org/text#TextDataset>
> >>>> rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> :service_tdb_all a fuseki:Service ;
> >>>> rdfs:label "TDB biology" ;
> >>>> fuseki:dataset :tdb_dataset_readwrite ;
> >>>> fuseki:name "biology" ;
> >>>> fuseki:serviceQuery "query" , "" , "sparql" ;
> >>>> fuseki:serviceReadGraphStore "get" ;
> >>>> fuseki:serviceReadQuads "" ;
> >>>> fuseki:serviceReadWriteGraphStore
> >>>> "data" ;
> >>>> fuseki:serviceReadWriteQuads "" ;
> >>>> fuseki:serviceUpdate "" , "update" ;
> >>>> fuseki:serviceUpload "upload" .
> >>>>
> >>>> :tdb_dataset_readwrite
> >>>> a tdb2:DatasetTDB2 ;
> >>>> tdb2:location "db" .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#GraphTDB>
> >>>> rdfs:subClassOf ja:Model .
> >>>>
> >>>> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB2>
> >>>> rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> >>>> tdb2:location "db" ; #path to TDB;
> >>>> .
> >>>>
> >>>> # Text index description
> >>>> :text_dataset rdf:type text:TextDataset ;
> >>>> text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> >>>> text:index <#indexLucene> ;
> >>>> .
> >>>>
> >>>> <#indexLucene> a text:TextIndexLucene ;
> >>>> text:directory <file:data/luceneIndexing> ;
> >>>> text:entityMap <#entMap> ;
> >>>> .
> >>>>
> >>>> <#entMap> a text:EntityMap ;
> >>>> text:defaultField "text" ;
> >>>> text:entityField "uri" ;
> >>>> text:map (
> >>>> #RDF label abstracts
> >>>> [ text:field "text" ;
> >>>> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
> >>>> text:analyzer [
> >>>> a text:StandardAnalyzer
> >>>> ]
> >>>> ]
> >>>> [ text:field "text" ;
> >>>> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
> >>>> text:analyzer [
> >>>> a text:StandardAnalyzer
> >>>> ]
> >>>> ]
> >>>> ) .
> >>>>
> >>>>
> >>>>
> >>>> <#service_text_tdb> rdf:type fuseki:Service ;
> >>>> fuseki:name "ds" ;
> >>>> fuseki:serviceQuery "query" ;
> >>>> fuseki:serviceQuery "sparql" ;
> >>>> fuseki:serviceUpdate "update" ;
> >>>> fuseki:serviceUpload "upload" ;
> >>>> fuseki:serviceReadGraphStore "get" ;
> >>>> fuseki:serviceReadWriteGraphStore "data" ;
> >>>> fuseki:dataset :text_dataset ;
> >>>> .
> >>>>
> >>>> Thank you so much in advance,
> >>>>
> >>>> __________________________
> >>>> Zhenya Antić, PhD
> >>>> Natural Language Processing
> >>>> https://www.linkedin.com/in/zhenya-antic/
> >>>>
> >>>> Practical Linguistics Inc
> >>>> http://www.practicallinguistics.com
> >>>>
> >>>>
> >>>>
> >>>
> >>
> > 
>
