Andy, I think I figured out what the issue is. It seems that I have two datasets with the same name. One was started with the config file I sent; it has no data in it, and hence nothing is indexed. The other was started without a config file (like this: fuseki-server --port 3030 --loc="db" /biology), and it has the data.
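If I read your earlier suggestion correctly, the config should expose a single "biology" service on top of the text dataset, instead of the two services ("biology" and "ds") I have now. Here is a rough sketch of what I think that looks like. It reuses the names from my config, it is untested, and it leaves out the <#indexLucene> and <#entMap> definitions, which I assume stay as they are:

```turtle
# Sketch only, not tested against a running server.
# Assumption: one service named "biology" whose dataset is the text dataset,
# so that data pushed to /biology/data is indexed as it is added.
@prefix :       <http://base/#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix tdb2:   <http://jena.apache.org/2016/tdb#> .
@prefix text:   <http://jena.apache.org/text#> .

:service_tdb_all a fuseki:Service ;
    fuseki:name "biology" ;
    fuseki:serviceQuery "query" , "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset :text_dataset .

# The text dataset wraps the TDB2 dataset and the Lucene index.
:text_dataset a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> .

# Underlying storage, same location as in my config.
<#dataset> a tdb2:DatasetTDB2 ;
    tdb2:location "db" .

# <#indexLucene> and <#entMap> are unchanged from the config quoted below.
```

With this there would be only one dataset called "biology", and it would be the indexed one.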
How do I transfer the data from one to other?

Thanks,
Zhenya

On Thu, Mar 26, 2020, at 12:22 PM, Chris Tomlinson wrote:
> Zhenya,
>
> Do you see any content in the directory:
>
> > text:directory <file:data/luceneIndexing> ;
>
> like the following partial listing:
>
> > fuseki@foo :~/base/lucene-test$ ls -l
> > total 3608108
> > -rw-rw---- 1 fuseki fuseki 7772 Jan 29 21:15 _19a_5x.liv
> > -rw-r----- 1 fuseki fuseki 299 Jan 21 15:53 _19a.cfe
> > -rw-r----- 1 fuseki fuseki 36547721 Jan 21 15:53 _19a.cfs
> > -rw-r----- 1 fuseki fuseki 443 Jan 21 15:53 _19a.si
> > -rw-r----- 1 fuseki fuseki 23621 Jan 21 15:53 _24_17n.liv
> > -rw-r----- 1 fuseki fuseki 22718569 Jan 21 15:53 _24.fdt
> > -rw-r----- 1 fuseki fuseki 9184 Jan 21 15:53 _24.fdx
> > -rw-r----- 1 fuseki fuseki 12975 Jan 21 15:53 _24.fnm
> > -rw-r----- 1 fuseki fuseki 7009762 Jan 21 15:53 _24_Lucene50_0.doc
> > -rw-r----- 1 fuseki fuseki 3804794 Jan 21 15:53 _24_Lucene50_0.pos
> > -rw-r----- 1 fuseki fuseki 16186474 Jan 21 15:53 _24_Lucene50_0.tim
> > -rw-r----- 1 fuseki fuseki 103945 Jan 21 15:53 _24_Lucene50_0.tip
> > -rw-r----- 1 fuseki fuseki 667296 Jan 21 15:53 _24.nvd
> > -rw-r----- 1 fuseki fuseki 4027 Jan 21 15:53 _24.nvm
> > -rw-r----- 1 fuseki fuseki 540 Jan 21 15:53 _24.si
>
> Also if you don’t have storevalues true then queries like:
>
> (?s ?score ?lit) text:query "ribosome"
>
> won’t bind anything to ?lit. The storevalues is set like:
>
> > # Text index description
> > :test_lucene_index a text:TextIndexLucene ;
> >     text:directory <file:/usr/local/fuseki/base/lucene-test> ;
> >     text:storeValues true ;
> >     text:entityMap :test_entmap ;
>
>
> Also you need to reload the data if you change the configuration so that the
> indexing will be done according to the configuration.
>
> ciao,
> Chris
>
>
>
> > On Mar 26, 2020, at 10:33 AM, Zhenya Antić wrote:
> >
> > @prefix : <http://base/#> .
> > @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
> > @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> > @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> > @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> > @prefix fuseki: <http://jena.apache.org/fuseki#> .
> > @prefix text: <http://jena.apache.org/text#> .
> >
> > <http://jena.apache.org/2016/tdb#DatasetTDB>
> >     rdfs:subClassOf ja:RDFDataset .
> >
> > ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
> >
> > tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
> >
> > tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
> >
> > <http://jena.apache.org/2016/tdb#GraphTDB2>
> >     rdfs:subClassOf ja:Model .
> >
> > ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
> >
> > ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
> >
> > <http://jena.apache.org/text#TextDataset>
> >     rdfs:subClassOf ja:RDFDataset .
> >
> > :service_tdb_all a fuseki:Service ;
> >     rdfs:label "TDB biology" ;
> >     fuseki:dataset :tdb_dataset_readwrite ;
> >     fuseki:name "biology" ;
> >     fuseki:serviceQuery "query" , "" , "sparql" ;
> >     fuseki:serviceReadGraphStore "get" ;
> >     fuseki:serviceReadQuads "" ;
> >     fuseki:serviceReadWriteGraphStore
> >         "data" ;
> >     fuseki:serviceReadWriteQuads "" ;
> >     fuseki:serviceUpdate "" , "update" ;
> >     fuseki:serviceUpload "upload" .
> >
> > :tdb_dataset_readwrite
> >     a tdb2:DatasetTDB2 ;
> >     tdb2:location "db" .
> >
> > <http://jena.apache.org/2016/tdb#GraphTDB>
> >     rdfs:subClassOf ja:Model .
> >
> > ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
> >
> > ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
> >
> > <http://jena.apache.org/2016/tdb#DatasetTDB2>
> >     rdfs:subClassOf ja:RDFDataset .
> >
> > <#dataset> rdf:type tdb2:DatasetTDB2 ;
> >     tdb2:location "db" ; #path to TDB;
> >     .
> >
> > # Text index description
> > :text_dataset rdf:type text:TextDataset ;
> >     text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> >     text:index <#indexLucene> ;
> >     .
> >
> > <#indexLucene> a text:TextIndexLucene ;
> >     text:directory <file:data/luceneIndexing> ;
> >     text:entityMap <#entMap> ;
> >     .
> >
> > <#entMap> a text:EntityMap ;
> >     text:defaultField "text" ;
> >     text:entityField "uri" ;
> >     text:map (
> >         #RDF label abstracts
> >         [ text:field "text" ;
> >           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
> >           text:analyzer [
> >               a text:StandardAnalyzer
> >           ]
> >         ]
> >         [ text:field "text" ;
> >           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
> >           text:analyzer [
> >               a text:StandardAnalyzer
> >           ]
> >         ]
> >     ) .
> >
> >
> >
> > <#service_text_tdb> rdf:type fuseki:Service ;
> >     fuseki:name "ds" ;
> >     fuseki:serviceQuery "query" ;
> >     fuseki:serviceQuery "sparql" ;
> >     fuseki:serviceUpdate "update" ;
> >     fuseki:serviceUpload "upload" ;
> >     fuseki:serviceReadGraphStore "get" ;
> >     fuseki:serviceReadWriteGraphStore "data" ;
> >     fuseki:dataset :text_dataset ;
> >     .
> >
> >
> >
> > On Thu, Mar 26, 2020, at 11:31 AM, Zhenya Antić wrote:
> >> Hi Andy,
> >>
> >> Thanks. So I think I have all the lines you listed in the .ttl file
> >> (attached). I also checked, the data file contains the relevant data. But
> >> I have 0 properties indexed.
> >>
> >> Thanks,
> >> Zhenya
> >>
> >>
> >>
> >> On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
> >>>
> >>>
> >>> On 24/03/2020 15:11, Zhenya Antić wrote:
> >>>> Hi Andy,
> >>>>
> >>>>> Did you load the data before attaching the text index?
> >>>>
> >>>> How do I do it (or not do it, wasn't sure from your post)?
> >>>
> >>> Set up the Fuseki system, with the text index as the Fuseki service
> >>> dataset:
> >>>
> >>> fuseki:name "biology" ;
> >>> fuseki:dataset :text_dataset ;
> >>> ...
> >>>
> >>> :text_dataset rdf:type text:TextDataset ;
> >>>     text:dataset <#dataset> ;
> >>>
> >>>
> >>>
> >>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> >>>     tdb2:location "db" ; #path to TDB;
> >>>     .
> >>>
> >>> then send the data to /biology/data (which is the SPARQL GSP write
> >>> endpoint) or however you want to push the data to the server (SPARQL
> >>> Update, or the UI).
> >>>
> >>> For very large data:
> >>>
> >>> Load the TDB2 dataset offline
> >>> Then run the "jena.textindexer" utility
> >>>
> >>> https://jena.apache.org/documentation/query/text-query.html#configuration
> >>>
> >>> The first way is easier.
> >>>
> >>> Andy
> >>>
> >>>>
> >>>> Thanks,
> >>>> Zhenya
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Mar 22, 2020, at 9:18 AM, Andy Seaborne wrote:
> >>>>> Just checking one point:
> >>>>>
> >>>>> Did you load the data before attaching the text index?
> >>>>>
> >>>>> The text index is calculated as data is added so if you first load the
> >>>>> dataset then set up a text index, it will miss indexing the data.
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> On 21/03/2020 07:55, Lorenz Buehmann wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> welcome to Semantic Web and Apache Jena.
> >>>>>>
> >>>>>> Comments inline:
> >>>>>>
> >>>>>> On 20.03.20 15:36, Zhenya Antić wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I am a beginner with Fuseki, knowledge graphs and SPARQL, so please
> >>>>>>> forgive me if the questions seem obvious, the learning curve for this
> >>>>>>> turned out to be quite steep.
> >>>>>> No problem, nothing is simple in the beginning,
> >>>>>>>
> >>>>>>> I am trying to get text indexing to work with my Fuseki knowledge
> >>>>>>> graph.
> >>>>>> Which DBpedia dataset did you load? I mean, which files?
> >>>>>>>
> >>>>>>> For starters, I tried using a regular expression, but that didn't
> >>>>>>> work:
> >>>>>>>
> >>>>>>> Just a plain query like this:
> >>>>>>> SELECT DISTINCT * WHERE {
> >>>>>>>   ?s ?p ?o
> >>>>>>> }
> >>>>>>> gives 98 results such as:
> >>>>>>>
> >>>>>>> 1
> >>>>>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>>>>> <http://dbpedia.org/resource/Biology>
> >>>>>>> 2
> >>>>>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>>>>> <http://dbpedia.org/resource/Biology#Branches>
> >>>>>>> 3
> >>>>>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
> >>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
> >>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
> >>>>>>> 4
> >>>>>>> <http://dbpedia.org/ontology/wikiPageID:18393>
> >>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> >>>>>>> <http://dbpedia.org/resource/Life>
> >>>>>> That can't be the correct output of this query. rdfs:label should return
> >>>>>> literals as object (?o) - or you loaded some really weird data
> >>>>>>>
> >>>>>>> But a query with a regular expression:
> >>>>>>> SELECT DISTINCT * WHERE {
> >>>>>>>   ?s ?p ?o
> >>>>>>>   FILTER regex(?o, "Biol", "i")
> >>>>>>> }
> >>>>>>
> >>>>>> 1. you should help the query engine and use rdfs:label as property
> >>>>>>
> >>>>>> 2. you should use str() function on the ?o values:
> >>>>>>
> >>>>>> SELECT DISTINCT * WHERE {
> >>>>>>   ?s rdfs:label ?o
> >>>>>>   FILTER regex(str(?o), "Biol", "i")
> >>>>>> }
> >>>>>>
> >>>>>>> gives 0 results, although there are clearly results that contain
> >>>>>>> "Biol".
> >>>>>>
> >>>>>>
> >>>>>> I've to try your config or maybe others will spot the issue in the
> >>>>>> meantime.
> >>>>>>
> >>>>>>>
> >>>>>>> I also tried setting up indexing with a .ttl file, however the result
> >>>>>>> was "INFO 0 (0 per second) properties indexed".
> >>>>>>> .ttl file below:
> >>>>>>>
> >>>>>>> @prefix : <http://base/#> .
> >>>>>>> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
> >>>>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> >>>>>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> >>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> >>>>>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> >>>>>>> @prefix text: <http://jena.apache.org/text#> .
> >>>>>>>
> >>>>>>> <http://jena.apache.org/2016/tdb#DatasetTDB>
> >>>>>>>     rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
> >>>>>>>
> >>>>>>> <http://jena.apache.org/2016/tdb#GraphTDB2>
> >>>>>>>     rdfs:subClassOf ja:Model .
> >>>>>>>
> >>>>>>> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
> >>>>>
> >>>>> The rdfs:subClassOf should not be necessary (recent versions of Fuseki).
> >>>>>
> >>>>> If any are, let us know so it can be fixed.
> >>>>>
> >>>>>>>
> >>>>>>> <http://jena.apache.org/text#TextDataset>
> >>>>>>>     rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> :service_tdb_all a fuseki:Service ;
> >>>>>>>     rdfs:label "TDB biology" ;
> >>>>>>>     fuseki:dataset :tdb_dataset_readwrite ;
> >>>>>>>     fuseki:name "biology" ;
> >>>>>>>     fuseki:serviceQuery "query" , "" , "sparql" ;
> >>>>>>>     fuseki:serviceReadGraphStore "get" ;
> >>>>>>>     fuseki:serviceReadQuads "" ;
> >>>>>>>     fuseki:serviceReadWriteGraphStore
> >>>>>>>         "data" ;
> >>>>>>>     fuseki:serviceReadWriteQuads "" ;
> >>>>>>>     fuseki:serviceUpdate "" , "update" ;
> >>>>>>>     fuseki:serviceUpload "upload" .
> >>>>>>>
> >>>>>>> :tdb_dataset_readwrite
> >>>>>>>     a tdb2:DatasetTDB2 ;
> >>>>>>>     tdb2:location "db" .
> >>>>>>>
> >>>>>>> <http://jena.apache.org/2016/tdb#GraphTDB>
> >>>>>>>     rdfs:subClassOf ja:Model .
> >>>>>>>
> >>>>>>> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> <http://jena.apache.org/2016/tdb#DatasetTDB2>
> >>>>>>>     rdfs:subClassOf ja:RDFDataset .
> >>>>>>>
> >>>>>>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> >>>>>>>     tdb2:location "db" ; #path to TDB;
> >>>>>>>     .
> >>>>>>>
> >>>>>>> # Text index description
> >>>>>>> :text_dataset rdf:type text:TextDataset ;
> >>>>>>>     text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> >>>>>>>     text:index <#indexLucene> ;
> >>>>>>>     .
> >>>>>>>
> >>>>>>> <#indexLucene> a text:TextIndexLucene ;
> >>>>>>>     text:directory <file:data/luceneIndexing> ;
> >>>>>>>     text:entityMap <#entMap> ;
> >>>>>>>     .
> >>>>>>>
> >>>>>>> <#entMap> a text:EntityMap ;
> >>>>>>>     text:defaultField "text" ;
> >>>>>>>     text:entityField "uri" ;
> >>>>>>>     text:map (
> >>>>>>>         #RDF label abstracts
> >>>>>>>         [ text:field "text" ;
> >>>>>>>           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
> >>>>>>>           text:analyzer [
> >>>>>>>               a text:StandardAnalyzer
> >>>>>>>           ]
> >>>>>>>         ]
> >>>>>>>         [ text:field "text" ;
> >>>>>>>           text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
> >>>>>>>           text:analyzer [
> >>>>>>>               a text:StandardAnalyzer
> >>>>>>>           ]
> >>>>>>>         ]
> >>>>>>>     ) .
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> <#service_text_tdb> rdf:type fuseki:Service ;
> >>>>>>>     fuseki:name "ds" ;
> >>>>>>>     fuseki:serviceQuery "query" ;
> >>>>>>>     fuseki:serviceQuery "sparql" ;
> >>>>>>>     fuseki:serviceUpdate "update" ;
> >>>>>>>     fuseki:serviceUpload "upload" ;
> >>>>>>>     fuseki:serviceReadGraphStore "get" ;
> >>>>>>>     fuseki:serviceReadWriteGraphStore "data" ;
> >>>>>>>     fuseki:dataset :text_dataset ;
> >>>>>>>     .
> >>>>>>>
> >>>>>>> Thank you so much in advance,
> >>>>>>>
> >>>>>>> __________________________
> >>>>>>> Zhenya Antić, PhD
> >>>>>>> Natural Language Processing
> >>>>>>> https://www.linkedin.com/in/zhenya-antic/
> >>>>>>>
> >>>>>>> Practical Linguistics Inc
> >>>>>>> http://www.practicallinguistics.com
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
