@prefix :       <http://base/#> .
@prefix tdb2:   <http://jena.apache.org/2016/tdb#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix text:   <http://jena.apache.org/text#> .

<http://jena.apache.org/2016/tdb#DatasetTDB>
    rdfs:subClassOf ja:RDFDataset .

ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .

tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .

tdb2:GraphTDB2 rdfs:subClassOf ja:Model .

<http://jena.apache.org/2016/tdb#GraphTDB2>
    rdfs:subClassOf ja:Model .

ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .

ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .

<http://jena.apache.org/text#TextDataset>
    rdfs:subClassOf ja:RDFDataset .

:service_tdb_all a fuseki:Service ;
    rdfs:label "TDB biology" ;
    fuseki:dataset :tdb_dataset_readwrite ;
    fuseki:name "biology" ;
    fuseki:serviceQuery "query" , "" , "sparql" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadQuads "" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceReadWriteQuads "" ;
    fuseki:serviceUpdate "" , "update" ;
    fuseki:serviceUpload "upload" .

:tdb_dataset_readwrite
    a tdb2:DatasetTDB2 ;
    tdb2:location "db" .

<http://jena.apache.org/2016/tdb#GraphTDB>
    rdfs:subClassOf ja:Model .

ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .

ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .

<http://jena.apache.org/2016/tdb#DatasetTDB2>
    rdfs:subClassOf ja:RDFDataset .

<#dataset> rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "db" ; #path to TDB;
    .

# Text index description
:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
    text:index <#indexLucene> ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:data/luceneIndexing> ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:defaultField "text" ;
    text:entityField "uri" ;
    text:map (
        #RDF label abstracts
        [ text:field "text" ;
          text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
          text:analyzer [
              a text:StandardAnalyzer
          ]
        ]
        [ text:field "text" ;
          text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
          text:analyzer [
              a text:StandardAnalyzer
          ]
        ]
    ) .

<#service_text_tdb> rdf:type fuseki:Service ;
    fuseki:name "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset :text_dataset ;
    .

On Thu, Mar 26, 2020, at 11:31 AM, Zhenya Antić wrote:
> Hi Andy,
>
> Thanks. So I think I have all the lines you listed in the .ttl file
> (attached). I also checked, the data file contains the relevant data. But I
> have 0 properties indexed.
>
> Thanks,
> Zhenya
>
> On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
>>
>> On 24/03/2020 15:11, Zhenya Antić wrote:
>> > Hi Andy,
>> >
>> >> Did you load the data before attaching the text index?
>> >
>> > How do I do it (or not do it, wasn't sure from your post)?
>>
>> Set up the Fuseki system, with the text index as the Fuseki service dataset:
>>
>>     fuseki:name "biology" ;
>>     fuseki:dataset :text_dataset ;
>>     ...
>>
>> :text_dataset rdf:type text:TextDataset ;
>>     text:dataset <#dataset> ;
>>
>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>>     tdb2:location "db" ; #path to TDB;
>>     .
>>
>> then send the data to /biology/data (which is the SPARQL GSP write
>> endpoint) or however you want to push the data to the server (SPARQL
>> Update, or the UI).
>>
>> For very large data:
>>
>> Load the TDB2 dataset offline
>> Then run the "jena.textindexer" utility
>>
>> https://jena.apache.org/documentation/query/text-query.html#configuration
>>
>> The first way is easier.
>>
>> Andy
>>
>> >
>> > Thanks,
>> > Zhenya
>> >
>> > On Sun, Mar 22, 2020, at 9:18 AM, Andy Seaborne wrote:
>> >> Just checking one point:
>> >>
>> >> Did you load the data before attaching the text index?
>> >>
>> >> The text index is calculated as data is added, so if you first load the
>> >> dataset and then set up a text index, it will miss indexing the data.
>> >>
>> >> Andy
>> >>
>> >> On 21/03/2020 07:55, Lorenz Buehmann wrote:
>> >>> Hi,
>> >>>
>> >>> welcome to Semantic Web and Apache Jena.
>> >>>
>> >>> Comments inline:
>> >>>
>> >>> On 20.03.20 15:36, Zhenya Antić wrote:
>> >>>> Hello,
>> >>>>
>> >>>> I am a beginner with Fuseki, knowledge graphs and SPARQL, so please
>> >>>> forgive me if the questions seem obvious, the learning curve for this
>> >>>> turned out to be quite steep.
>> >>> No problem, nothing is simple in the beginning.
>> >>>>
>> >>>> I am trying to get text indexing to work with my Fuseki knowledge graph.
>> >>> Which DBpedia dataset did you load? I mean, which files?
>> >>>>
>> >>>> For starters, I tried using a regular expression, but that didn't work:
>> >>>>
>> >>>> Just a plain query like this:
>> >>>> SELECT DISTINCT * WHERE {
>> >>>>   ?s ?p ?o
>> >>>> }
>> >>>> gives 98 results such as:
>> >>>>
>> >>>> 1
>> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>> >>>> <http://dbpedia.org/resource/Biology>
>> >>>> 2
>> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>> >>>> <http://dbpedia.org/resource/Biology#Branches>
>> >>>> 3
>> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
>> >>>> 4
>> >>>> <http://dbpedia.org/ontology/wikiPageID:18393>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>> >>>> <http://dbpedia.org/resource/Life>
>> >>> That can't be the correct output of this query. rdfs:label should return
>> >>> literals as object (?o) - or you loaded some really weird data.
>> >>>>
>> >>>> But a query with a regular expression:
>> >>>> SELECT DISTINCT * WHERE {
>> >>>>   ?s ?p ?o
>> >>>>   FILTER regex(?o, "Biol", "i")
>> >>>> }
>> >>>
>> >>> 1. you should help the query engine and use rdfs:label as property
>> >>>
>> >>> 2. you should use the str() function on the ?o values:
>> >>>
>> >>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>> >>> SELECT DISTINCT * WHERE {
>> >>>   ?s rdfs:label ?o
>> >>>   FILTER regex(str(?o), "Biol", "i")
>> >>> }
>> >>>
>> >>>> gives 0 results, although there are clearly results that contain "Biol".
>> >>>
>> >>> I'll have to try your config, or maybe others will spot the issue in the
>> >>> meantime.
>> >>>
>> >>>>
>> >>>> I also tried setting up indexing with a .ttl file, however the result
>> >>>> was "INFO 0 (0 per second) properties indexed". .ttl file below:
>> >>>>
>> >>>> @prefix : <http://base/#> .
>> >>>> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
>> >>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> >>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>> >>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>> >>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>> >>>> @prefix text: <http://jena.apache.org/text#> .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB>
>> >>>>     rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#GraphTDB2>
>> >>>>     rdfs:subClassOf ja:Model .
>> >>>>
>> >>>> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
>> >>
>> >> The rdfs:subClassOf should not be necessary (recent versions of Fuseki).
>> >>
>> >> If any are, let us know so it can be fixed.
>> >>
>> >>>>
>> >>>> <http://jena.apache.org/text#TextDataset>
>> >>>>     rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> :service_tdb_all a fuseki:Service ;
>> >>>>     rdfs:label "TDB biology" ;
>> >>>>     fuseki:dataset :tdb_dataset_readwrite ;
>> >>>>     fuseki:name "biology" ;
>> >>>>     fuseki:serviceQuery "query" , "" , "sparql" ;
>> >>>>     fuseki:serviceReadGraphStore "get" ;
>> >>>>     fuseki:serviceReadQuads "" ;
>> >>>>     fuseki:serviceReadWriteGraphStore "data" ;
>> >>>>     fuseki:serviceReadWriteQuads "" ;
>> >>>>     fuseki:serviceUpdate "" , "update" ;
>> >>>>     fuseki:serviceUpload "upload" .
>> >>>>
>> >>>> :tdb_dataset_readwrite
>> >>>>     a tdb2:DatasetTDB2 ;
>> >>>>     tdb2:location "db" .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#GraphTDB>
>> >>>>     rdfs:subClassOf ja:Model .
>> >>>>
>> >>>> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB2>
>> >>>>     rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>> >>>>     tdb2:location "db" ; #path to TDB;
>> >>>>     .
>> >>>>
>> >>>> # Text index description
>> >>>> :text_dataset rdf:type text:TextDataset ;
>> >>>>     text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
>> >>>>     text:index <#indexLucene> ;
>> >>>>     .
>> >>>>
>> >>>> <#indexLucene> a text:TextIndexLucene ;
>> >>>>     text:directory <file:data/luceneIndexing> ;
>> >>>>     text:entityMap <#entMap> ;
>> >>>>     .
>> >>>>
>> >>>> <#entMap> a text:EntityMap ;
>> >>>>     text:defaultField "text" ;
>> >>>>     text:entityField "uri" ;
>> >>>>     text:map (
>> >>>>         #RDF label abstracts
>> >>>>         [ text:field "text" ;
>> >>>>           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
>> >>>>           text:analyzer [
>> >>>>               a text:StandardAnalyzer
>> >>>>           ]
>> >>>>         ]
>> >>>>         [ text:field "text" ;
>> >>>>           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
>> >>>>           text:analyzer [
>> >>>>               a text:StandardAnalyzer
>> >>>>           ]
>> >>>>         ]
>> >>>>     ) .
>> >>>>
>> >>>> <#service_text_tdb> rdf:type fuseki:Service ;
>> >>>>     fuseki:name "ds" ;
>> >>>>     fuseki:serviceQuery "query" ;
>> >>>>     fuseki:serviceQuery "sparql" ;
>> >>>>     fuseki:serviceUpdate "update" ;
>> >>>>     fuseki:serviceUpload "upload" ;
>> >>>>     fuseki:serviceReadGraphStore "get" ;
>> >>>>     fuseki:serviceReadWriteGraphStore "data" ;
>> >>>>     fuseki:dataset :text_dataset ;
>> >>>>     .
>> >>>>
>> >>>> Thank you so much in advance,
>> >>>>
>> >>>> __________________________
>> >>>> Zhenya Antić, PhD
>> >>>> Natural Language Processing
>> >>>> https://www.linkedin.com/in/zhenya-antic/
>> >>>>
>> >>>> Practical Linguistics Inc
>> >>>> http://www.practicallinguistics.com
>> >>>>
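
For reference, once the index contains entries, a text search against the configuration in this thread might look like the sketch below. Several things here are assumptions rather than anything confirmed in the thread: the query is sent to the "ds" service (the only service whose fuseki:dataset is :text_dataset), the search term "biology" is purely illustrative, and the objects of the mapped predicate are string literals. The sample results quoted above show IRIs as objects, and as far as I know the text index is built from literal values, so those triples would not be expected to produce matches.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Sketch only: look up subjects through the Lucene index, then fetch the label.
# rdf:label expands to the predicate listed in the entity map of this config.
SELECT ?s ?label WHERE {
  ?s text:query (rdf:label "biology") ;   # "biology" is an illustrative term
     rdf:label ?label .
}
```

Note that in the posted configuration the "biology" service points at :tdb_dataset_readwrite, so data loaded through it would go straight to TDB2 without passing through the text dataset; Andy's snippet above has the service's fuseki:dataset set to :text_dataset instead.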