Zhenya,
Do you see any content in the directory:
> text:directory <file:data/luceneIndexing> ;
like the following partial listing:
> fuseki@foo :~/base/lucene-test$ ls -l
> total 3608108
> -rw-rw---- 1 fuseki fuseki 7772 Jan 29 21:15 _19a_5x.liv
> -rw-r----- 1 fuseki fuseki 299 Jan 21 15:53 _19a.cfe
> -rw-r----- 1 fuseki fuseki 36547721 Jan 21 15:53 _19a.cfs
> -rw-r----- 1 fuseki fuseki 443 Jan 21 15:53 _19a.si
> -rw-r----- 1 fuseki fuseki 23621 Jan 21 15:53 _24_17n.liv
> -rw-r----- 1 fuseki fuseki 22718569 Jan 21 15:53 _24.fdt
> -rw-r----- 1 fuseki fuseki 9184 Jan 21 15:53 _24.fdx
> -rw-r----- 1 fuseki fuseki 12975 Jan 21 15:53 _24.fnm
> -rw-r----- 1 fuseki fuseki 7009762 Jan 21 15:53 _24_Lucene50_0.doc
> -rw-r----- 1 fuseki fuseki 3804794 Jan 21 15:53 _24_Lucene50_0.pos
> -rw-r----- 1 fuseki fuseki 16186474 Jan 21 15:53 _24_Lucene50_0.tim
> -rw-r----- 1 fuseki fuseki 103945 Jan 21 15:53 _24_Lucene50_0.tip
> -rw-r----- 1 fuseki fuseki 667296 Jan 21 15:53 _24.nvd
> -rw-r----- 1 fuseki fuseki 4027 Jan 21 15:53 _24.nvm
> -rw-r----- 1 fuseki fuseki 540 Jan 21 15:53 _24.si
Also, if you don't have text:storeValues set to true, then queries like:
(?s ?score ?lit) text:query "ribosome"
won't bind anything to ?lit. text:storeValues is set like this:
> # Text index description
> :test_lucene_index a text:TextIndexLucene ;
> text:directory <file:/usr/local/fuseki/base/lucene-test> ;
> text:storeValues true ;
> text:entityMap :test_entmap ;
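For reference, here is a minimal sketch of a complete query of that shape. It assumes the query is sent to the service that is backed by the text dataset (in your config that appears to be the one named "ds", since "biology" points straight at the TDB2 dataset). The ORDER BY and LIMIT are just my additions for readability.

```sparql
PREFIX text: <http://jena.apache.org/text#>

SELECT ?s ?score ?lit
WHERE {
  # ?lit is only bound when the index was built with text:storeValues true
  (?s ?score ?lit) text:query "ribosome" .
}
ORDER BY DESC(?score)
LIMIT 10
```

If ?lit comes back unbound while ?s and ?score are filled in, the index was most likely built before text:storeValues was turned on.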
Also, you need to reload the data whenever you change the index configuration,
so that the data is re-indexed according to the new settings.
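One quick way to check that indexing follows the new configuration is a small smoke test, sketched below as a SPARQL Update. It assumes you send it to the update endpoint of the service backed by the text dataset. The subject IRI and the literal are made up for illustration, and the predicate is the first one listed in your entity map.

```sparql
# Hypothetical test triple: the subject IRI and the literal are invented.
# The predicate is copied from the text:predicate in the quoted entity map.
# The object must be a literal, since only literals are text-indexed.
INSERT DATA {
  <http://example.org/test/smoke>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
      "smoketest entry" .
}
```

A text:query for "smoketest" on the same service should then return that subject. If it does not, the update probably went to a service that bypasses the text dataset.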
ciao,
Chris
> On Mar 26, 2020, at 10:33 AM, Zhenya Antić <[email protected]> wrote:
>
> @prefix : <http://base/#> .
> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix text: <http://jena.apache.org/text#> .
>
> <http://jena.apache.org/2016/tdb#DatasetTDB>
> rdfs:subClassOf ja:RDFDataset .
>
> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
>
> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
>
> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
>
> <http://jena.apache.org/2016/tdb#GraphTDB2>
> rdfs:subClassOf ja:Model .
>
> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
>
> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
>
> <http://jena.apache.org/text#TextDataset>
> rdfs:subClassOf ja:RDFDataset .
>
> :service_tdb_all a fuseki:Service ;
> rdfs:label "TDB biology" ;
> fuseki:dataset :tdb_dataset_readwrite ;
> fuseki:name "biology" ;
> fuseki:serviceQuery "query" , "" , "sparql" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadQuads "" ;
> fuseki:serviceReadWriteGraphStore
> "data" ;
> fuseki:serviceReadWriteQuads "" ;
> fuseki:serviceUpdate "" , "update" ;
> fuseki:serviceUpload "upload" .
>
> :tdb_dataset_readwrite
> a tdb2:DatasetTDB2 ;
> tdb2:location "db" .
>
> <http://jena.apache.org/2016/tdb#GraphTDB>
> rdfs:subClassOf ja:Model .
>
> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
>
> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
>
> <http://jena.apache.org/2016/tdb#DatasetTDB2>
> rdfs:subClassOf ja:RDFDataset .
>
> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> tdb2:location "db" ; #path to TDB;
> .
>
> # Text index description
> :text_dataset rdf:type text:TextDataset ;
> text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> text:index <#indexLucene> ;
> .
>
> <#indexLucene> a text:TextIndexLucene ;
> text:directory <file:data/luceneIndexing> ;
> text:entityMap <#entMap> ;
> .
>
> <#entMap> a text:EntityMap ;
> text:defaultField "text" ;
> text:entityField "uri" ;
> text:map (
> #RDF label abstracts
> [ text:field "text" ;
> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
> text:analyzer [
> a text:StandardAnalyzer
> ]
> ]
> [ text:field "text" ;
> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
> text:analyzer [
> a text:StandardAnalyzer
> ]
> ]
> ) .
>
>
>
> <#service_text_tdb> rdf:type fuseki:Service ;
> fuseki:name "ds" ;
> fuseki:serviceQuery "query" ;
> fuseki:serviceQuery "sparql" ;
> fuseki:serviceUpdate "update" ;
> fuseki:serviceUpload "upload" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadWriteGraphStore "data" ;
> fuseki:dataset :text_dataset ;
> .
>
>
>
> On Thu, Mar 26, 2020, at 11:31 AM, Zhenya Antić wrote:
>> Hi Andy,
>>
>> Thanks. So I think I have all the lines you listed in the .ttl file
>> (attached). I also checked, the data file contains the relevant data. But I
>> have 0 properties indexed.
>>
>> Thanks,
>> Zhenya
>>
>>
>>
>> On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
>>>
>>>
>>> On 24/03/2020 15:11, Zhenya Antić wrote:
>>>> Hi Andy,
>>>>
>>>>> Did you load the data before attaching the text index?
>>>>
>>>> How do I do it (or not do it, wasn't sure from your post)?
>>>
>>> Set up the Fuseki system, with the text index as the Fuseki service dataset:
>>>
>>> fuseki:name "biology" ;
>>> fuseki:dataset :text_dataset ;
>>> ...
>>>
>>> :text_dataset rdf:type text:TextDataset ;
>>> text:dataset <#dataset> ;
>>>
>>>
>>>
>>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>>> tdb2:location "db" ; #path to TDB;
>>> .
>>>
>>> then send the data to /biology/data (which is the SPARQL GSP write
>>> endpoint) or however you want to push the data to the server (SPARQL
>>> Update, or the UI).
>>>
>>> For very large data:
>>>
>>> Load the TDB2 dataset offline
>>> Then run the "jena.textindexer" utility
>>>
>>> https://jena.apache.org/documentation/query/text-query.html#configuration
>>>
>>> The first way is easier.
>>>
>>> Andy
>>>
>>>>
>>>> Thanks,
>>>> Zhenya
>>>>
>>>>
>>>>
>>>> On Sun, Mar 22, 2020, at 9:18 AM, Andy Seaborne wrote:
>>>>> Just checking one point:
>>>>>
>>>>> Did you load the data before attaching the text index?
>>>>>
>>>>> The text index is calculated as data is added, so if you first load the
>>>>> dataset and then set up a text index, it will miss indexing the data.
>>>>>
>>>>> Andy
>>>>>
>>>>> On 21/03/2020 07:55, Lorenz Buehmann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> welcome to Semantic Web and Apache Jena.
>>>>>>
>>>>>> Comments inline:
>>>>>>
>>>>>> On 20.03.20 15:36, Zhenya Antić wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am a beginner with Fuseki, knowledge graphs and SPARQL, so please
>>>>>>> forgive me if the questions seem obvious, the learning curve for this
>>>>>>> turned out to be quite steep.
>>>>>> No problem, nothing is simple in the beginning.
>>>>>>>
>>>>>>> I am trying to get text indexing to work with my Fuseki knowledge graph.
>>>>>> Which DBpedia dataset did you load? I mean, which files?
>>>>>>>
>>>>>>> For starters, I tried using a regular expression, but that didn't work:
>>>>>>>
>>>>>>> Just a plain query like this:
>>>>>>> SELECT DISTINCT * WHERE {
>>>>>>> ?s ?p ?o
>>>>>>> }
>>>>>>> gives 98 results such as:
>>>>>>>
>>>>>>> 1
>>>>>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>>>>>>> <http://dbpedia.org/resource/Biology>
>>>>>>> 2
>>>>>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>>>>>>> <http://dbpedia.org/resource/Biology#Branches>
>>>>>>> 3
>>>>>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
>>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
>>>>>>> 4
>>>>>>> <http://dbpedia.org/ontology/wikiPageID:18393>
>>>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>>>>>>> <http://dbpedia.org/resource/Life>
>>>>>> That can't be the correct output of this query. rdfs:label should return
>>>>>> literals as object (?o) - or you loaded some really weird data
>>>>>>>
>>>>>>> But a query with a regular expression:
>>>>>>> SELECT DISTINCT * WHERE {
>>>>>>> ?s ?p ?o
>>>>>>> FILTER regex(?o, "Biol", "i")
>>>>>>> }
>>>>>>
>>>>>> 1. you should help the query engine and use rdfs:label as property
>>>>>>
>>>>>> 2. you should use str() function on the ?o values:
>>>>>>
>>>>>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>>>>> SELECT DISTINCT * WHERE {
>>>>>> ?s rdfs:label ?o
>>>>>> FILTER regex(str(?o), "Biol", "i")
>>>>>> }
>>>>>>
>>>>>>> gives 0 results, although there are clearly results that contain "Biol".
>>>>>>
>>>>>>
>>>>>> I'll have to try your config, or maybe others will spot the issue in the
>>>>>> meantime.
>>>>>>
>>>>>>>
>>>>>>> I also tried setting up indexing with a .ttl file, however the result
>>>>>>> was "INFO 0 (0 per second) properties indexed". .ttl file below:
>>>>>>>
>>>>>>> @prefix : <http://base/#> .
>>>>>>> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
>>>>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>>>>>>> @prefix text: <http://jena.apache.org/text#> .
>>>>>>>
>>>>>>> <http://jena.apache.org/2016/tdb#DatasetTDB>
>>>>>>> rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
>>>>>>>
>>>>>>> <http://jena.apache.org/2016/tdb#GraphTDB2>
>>>>>>> rdfs:subClassOf ja:Model .
>>>>>>>
>>>>>>> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
>>>>>
>>>>> The rdfs:subClassOf should not be necessary (recent versions of Fuseki).
>>>>>
>>>>> If any are, let us know so it can be fixed.
>>>>>
>>>>>>>
>>>>>>> <http://jena.apache.org/text#TextDataset>
>>>>>>> rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> :service_tdb_all a fuseki:Service ;
>>>>>>> rdfs:label "TDB biology" ;
>>>>>>> fuseki:dataset :tdb_dataset_readwrite ;
>>>>>>> fuseki:name "biology" ;
>>>>>>> fuseki:serviceQuery "query" , "" , "sparql" ;
>>>>>>> fuseki:serviceReadGraphStore "get" ;
>>>>>>> fuseki:serviceReadQuads "" ;
>>>>>>> fuseki:serviceReadWriteGraphStore
>>>>>>> "data" ;
>>>>>>> fuseki:serviceReadWriteQuads "" ;
>>>>>>> fuseki:serviceUpdate "" , "update" ;
>>>>>>> fuseki:serviceUpload "upload" .
>>>>>>>
>>>>>>> :tdb_dataset_readwrite
>>>>>>> a tdb2:DatasetTDB2 ;
>>>>>>> tdb2:location "db" .
>>>>>>>
>>>>>>> <http://jena.apache.org/2016/tdb#GraphTDB>
>>>>>>> rdfs:subClassOf ja:Model .
>>>>>>>
>>>>>>> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> <http://jena.apache.org/2016/tdb#DatasetTDB2>
>>>>>>> rdfs:subClassOf ja:RDFDataset .
>>>>>>>
>>>>>>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>>>>>>> tdb2:location "db" ; #path to TDB;
>>>>>>> .
>>>>>>>
>>>>>>> # Text index description
>>>>>>> :text_dataset rdf:type text:TextDataset ;
>>>>>>> text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
>>>>>>> text:index <#indexLucene> ;
>>>>>>> .
>>>>>>>
>>>>>>> <#indexLucene> a text:TextIndexLucene ;
>>>>>>> text:directory <file:data/luceneIndexing> ;
>>>>>>> text:entityMap <#entMap> ;
>>>>>>> .
>>>>>>>
>>>>>>> <#entMap> a text:EntityMap ;
>>>>>>> text:defaultField "text" ;
>>>>>>> text:entityField "uri" ;
>>>>>>> text:map (
>>>>>>> #RDF label abstracts
>>>>>>> [ text:field "text" ;
>>>>>>> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
>>>>>>> text:analyzer [
>>>>>>> a text:StandardAnalyzer
>>>>>>> ]
>>>>>>> ]
>>>>>>> [ text:field "text" ;
>>>>>>> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
>>>>>>> text:analyzer [
>>>>>>> a text:StandardAnalyzer
>>>>>>> ]
>>>>>>> ]
>>>>>>> ) .
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <#service_text_tdb> rdf:type fuseki:Service ;
>>>>>>> fuseki:name "ds" ;
>>>>>>> fuseki:serviceQuery "query" ;
>>>>>>> fuseki:serviceQuery "sparql" ;
>>>>>>> fuseki:serviceUpdate "update" ;
>>>>>>> fuseki:serviceUpload "upload" ;
>>>>>>> fuseki:serviceReadGraphStore "get" ;
>>>>>>> fuseki:serviceReadWriteGraphStore "data" ;
>>>>>>> fuseki:dataset :text_dataset ;
>>>>>>> .
>>>>>>>
>>>>>>> Thank you so much in advance,
>>>>>>>
>>>>>>> __________________________
>>>>>>> Zhenya Antić, PhD
>>>>>>> Natural Language Processing
>>>>>>> https://www.linkedin.com/in/zhenya-antic/
>>>>>>>
>>>>>>> Practical Linguistics Inc
>>>>>>> http://www.practicallinguistics.com
>>>>>>>