Hi Andy,

Thanks. I think I have all the lines you listed in the .ttl file (attached). I also checked that the data file contains the relevant data, but I still get 0 properties indexed.

Thanks,
Zhenya

On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
>
> On 24/03/2020 15:11, Zhenya Antić wrote:
> > Hi Andy,
> >
> >> Did you load the data before attaching the text index?
> >
> > How do I do it (or not do it, wasn't sure from your post)?
>
> Set up the Fuseki system, with the text index as the Fuseki service dataset:
>
>      fuseki:name "biology" ;
>      fuseki:dataset :text_dataset ;
>      ...
>
> :text_dataset rdf:type text:TextDataset ;
>      text:dataset <#dataset> ;
>
> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>      tdb2:location "db" ; #path to TDB;
>      .
>
> then send the data to /biology/data (which is the SPARQL GSP write
> endpoint) or however you want to push the data to the server (SPARQL
> Update, or the UI).
>
> For very large data:
>
> Load the TDB2 dataset offline
> Then run the "jena.textindexer" utility
>
> https://jena.apache.org/documentation/query/text-query.html#configuration
>
> The first way is easier.
>
>      Andy
>
> >
> > Thanks,
> > Zhenya
> >
> > On Sun, Mar 22, 2020, at 9:18 AM, Andy Seaborne wrote:
> >> Just checking one point:
> >>
> >> Did you load the data before attaching the text index?
> >>
> >> The text index is calculated as data is added so if you first load the
> >> dataset then set up a text index, it will miss indexing the data.
> >>
> >> Andy
> >>
> >> On 21/03/2020 07:55, Lorenz Buehmann wrote:
> >>> Hi,
> >>>
> >>> welcome to Semantic Web and Apache Jena.
> >>>
> >>> Comments inline:
> >>>
> >>> On 20.03.20 15:36, Zhenya Antić wrote:
> >>>> Hello,
> >>>>
> >>>> I am a beginner with Fuseki, knowledge graphs and SPARQL, so please
> >>>> forgive me if the questions seem obvious, the learning curve for this
> >>>> turned out to be quite steep.
> >>> No problem, nothing is simple in the beginning,
> >>>>
> >>>> I am trying to get text indexing to work with my Fuseki knowledge graph.
> >>> Which DBpedia dataset did you load? I mean, which files?
> >>>>
> >>>> For starters, I tried using a regular expression, but that didn't work:
> >>>>
> >>>> Just a plain query like this:
> >>>> SELECT DISTINCT * WHERE {
> >>>>   ?s ?p ?o
> >>>> }
> >>>> gives 98 results such as:
> >>>>
> >>>> 1
> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>> <http://dbpedia.org/resource/Biology>
> >>>> 2
> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>> <http://dbpedia.org/resource/Biology#Branches>
> >>>> 3
> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
> >>>> 4
> >>>> <http://dbpedia.org/ontology/wikiPageID:18393>
> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>> <http://dbpedia.org/resource/Life>
> >>> That can't be the correct output of this query. rdfs:label should return
> >>> literals as object (?o) - or you loaded some really weird data
> >>>>
> >>>> But a query with a regular expression:
> >>>> SELECT DISTINCT * WHERE {
> >>>>   ?s ?p ?o
> >>>>   FILTER regex(?o, "Biol", "i")
> >>>> }
> >>>
> >>> 1. you should help the query engine and use rdfs:label as property
> >>>
> >>> 2. you should use str() function on the ?o values:
> >>>
> >>> SELECT DISTINCT * WHERE {
> >>>   ?s rdfs:label ?o
> >>>   FILTER regex(str(?o), "Biol", "i")
> >>> }
> >>>
> >>>> gives 0 results, although there are clearly results that contain "Biol".
> >>>
> >>> I've to try your config or maybe others will spot the issue in the
> >>> meantime.
> >>>
> >>>>
> >>>> I also tried setting up indexing with a .ttl file, however the result
> >>>> was "INFO 0 (0 per second) properties indexed". .ttl file below:
> >>>>
> >>>> @prefix : <http://base/#> .
> >>>> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
> >>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> >>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> >>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> >>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> >>>> @prefix text: <http://jena.apache.org/text#> .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB>
> >>>>     rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#GraphTDB2>
> >>>>     rdfs:subClassOf ja:Model .
> >>>>
> >>>> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
> >>
> >> The rdfs:subClassOf should not be necessary (recent versions of Fuseki).
> >>
> >> If any are, let us know so it can be fixed.
> >>
> >>>>
> >>>> <http://jena.apache.org/text#TextDataset>
> >>>>     rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> :service_tdb_all a fuseki:Service ;
> >>>>     rdfs:label "TDB biology" ;
> >>>>     fuseki:dataset :tdb_dataset_readwrite ;
> >>>>     fuseki:name "biology" ;
> >>>>     fuseki:serviceQuery "query" , "" , "sparql" ;
> >>>>     fuseki:serviceReadGraphStore "get" ;
> >>>>     fuseki:serviceReadQuads "" ;
> >>>>     fuseki:serviceReadWriteGraphStore
> >>>>         "data" ;
> >>>>     fuseki:serviceReadWriteQuads "" ;
> >>>>     fuseki:serviceUpdate "" , "update" ;
> >>>>     fuseki:serviceUpload "upload" .
> >>>>
> >>>> :tdb_dataset_readwrite
> >>>>     a tdb2:DatasetTDB2 ;
> >>>>     tdb2:location "db" .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#GraphTDB>
> >>>>     rdfs:subClassOf ja:Model .
> >>>>
> >>>> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB2>
> >>>>     rdfs:subClassOf ja:RDFDataset .
> >>>>
> >>>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> >>>>     tdb2:location "db" ; #path to TDB;
> >>>>     .
> >>>>
> >>>> # Text index description
> >>>> :text_dataset rdf:type text:TextDataset ;
> >>>>     text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> >>>>     text:index <#indexLucene> ;
> >>>>     .
> >>>>
> >>>> <#indexLucene> a text:TextIndexLucene ;
> >>>>     text:directory <file:data/luceneIndexing> ;
> >>>>     text:entityMap <#entMap> ;
> >>>>     .
> >>>>
> >>>> <#entMap> a text:EntityMap ;
> >>>>     text:defaultField "text" ;
> >>>>     text:entityField "uri" ;
> >>>>     text:map (
> >>>>         #RDF label abstracts
> >>>>         [ text:field "text" ;
> >>>>           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
> >>>>           text:analyzer [
> >>>>               a text:StandardAnalyzer
> >>>>           ]
> >>>>         ]
> >>>>         [ text:field "text" ;
> >>>>           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
> >>>>           text:analyzer [
> >>>>               a text:StandardAnalyzer
> >>>>           ]
> >>>>         ]
> >>>>     ) .
> >>>>
> >>>> <#service_text_tdb> rdf:type fuseki:Service ;
> >>>>     fuseki:name "ds" ;
> >>>>     fuseki:serviceQuery "query" ;
> >>>>     fuseki:serviceQuery "sparql" ;
> >>>>     fuseki:serviceUpdate "update" ;
> >>>>     fuseki:serviceUpload "upload" ;
> >>>>     fuseki:serviceReadGraphStore "get" ;
> >>>>     fuseki:serviceReadWriteGraphStore "data" ;
> >>>>     fuseki:dataset :text_dataset ;
> >>>>     .
> >>>>
> >>>> Thank you so much in advance,
> >>>>
> >>>> __________________________
> >>>> Zhenya Antić, PhD
> >>>> Natural Language Processing
> >>>> https://www.linkedin.com/in/zhenya-antic/
> >>>>
> >>>> Practical Linguistics Inc
> >>>> http://www.practicallinguistics.com
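
For reference, the first route Andy describes (sending the data to /biology/data so that the text index is built as the triples arrive) could be sketched in Python roughly as follows. This is only a sketch and not something from the thread: it assumes the server is reachable at http://localhost:3030 (Fuseki's default port), that the "biology" service has the text dataset as its fuseki:dataset, and the helper names `build_gsp_request` and `upload` as well as the file name `biology.ttl` are made up for illustration.

```python
import urllib.request

# Assumption: Fuseki runs locally on its default port (3030).
FUSEKI_BASE = "http://localhost:3030"


def build_gsp_request(dataset_name, turtle_bytes, base_url=FUSEKI_BASE):
    """Build a Graph Store Protocol POST that adds Turtle data to the default graph.

    The "data" endpoint name matches fuseki:serviceReadWriteGraphStore "data"
    in the quoted configuration.
    """
    url = f"{base_url}/{dataset_name}/data?default"
    return urllib.request.Request(
        url,
        data=turtle_bytes,
        method="POST",
        headers={"Content-Type": "text/turtle; charset=utf-8"},
    )


def upload(dataset_name, path, base_url=FUSEKI_BASE):
    """Send the Turtle file at `path` to the running server and return the HTTP status."""
    with open(path, "rb") as f:
        request = build_gsp_request(dataset_name, f.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical usage, with the server already started on the text dataset:
#   upload("biology", "biology.ttl")
```

Building the request separately from sending it makes it easy to check the target URL before anything is written. If the service still points at the plain TDB2 dataset rather than the text dataset, the upload would succeed but nothing would be indexed.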
