@prefix : <http://base/#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix text: <http://jena.apache.org/text#> .

<http://jena.apache.org/2016/tdb#DatasetTDB>
 rdfs:subClassOf ja:RDFDataset .

ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .

tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .

tdb2:GraphTDB2 rdfs:subClassOf ja:Model .

<http://jena.apache.org/2016/tdb#GraphTDB2>
 rdfs:subClassOf ja:Model .

ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .

ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .

<http://jena.apache.org/text#TextDataset>
 rdfs:subClassOf ja:RDFDataset .

:service_tdb_all a fuseki:Service ;
 rdfs:label "TDB biology" ;
 fuseki:dataset :tdb_dataset_readwrite ;
 fuseki:name "biology" ;
 fuseki:serviceQuery "query" , "" , "sparql" ;
 fuseki:serviceReadGraphStore "get" ;
 fuseki:serviceReadQuads "" ;
 fuseki:serviceReadWriteGraphStore
 "data" ;
 fuseki:serviceReadWriteQuads "" ;
 fuseki:serviceUpdate "" , "update" ;
 fuseki:serviceUpload "upload" .

:tdb_dataset_readwrite
 a tdb2:DatasetTDB2 ;
 tdb2:location "db" .

<http://jena.apache.org/2016/tdb#GraphTDB>
 rdfs:subClassOf ja:Model .

ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .

ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .

<http://jena.apache.org/2016/tdb#DatasetTDB2>
 rdfs:subClassOf ja:RDFDataset .

<#dataset> rdf:type tdb2:DatasetTDB2 ;
tdb2:location "db" ; #path to TDB;
.

# Text index description
:text_dataset rdf:type text:TextDataset ;
 text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
 text:index <#indexLucene> ;
.

<#indexLucene> a text:TextIndexLucene ;
 text:directory <file:data/luceneIndexing> ;
 text:entityMap <#entMap> ;
 .

<#entMap> a text:EntityMap ;
 text:defaultField "text" ;
 text:entityField "uri" ;
 text:map (
 #RDF label abstracts
 [ text:field "text" ;
 text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
 text:analyzer [
 a text:StandardAnalyzer
 ] 
 ]
 [ text:field "text" ;
 text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
 text:analyzer [
 a text:StandardAnalyzer
 ] 
 ]
 ) .



<#service_text_tdb> rdf:type fuseki:Service ;
 fuseki:name "ds" ;
 fuseki:serviceQuery "query" ;
 fuseki:serviceQuery "sparql" ;
 fuseki:serviceUpdate "update" ;
 fuseki:serviceUpload "upload" ;
 fuseki:serviceReadGraphStore "get" ;
 fuseki:serviceReadWriteGraphStore "data" ;
 fuseki:dataset :text_dataset ;
 .
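
For comparison, here is a minimal sketch of a single-service variant of the file above. It follows the suggestion quoted below, where the text dataset is the dataset of the "biology" service. The file above instead has two services: "biology" is attached directly to the TDB2 dataset at "db", and only "ds" goes through the text dataset. The name :service_biology is made up for illustration; every other name is copied from the file above. Treat this as an untested assumption about what may help, not a confirmed fix. It also does nothing about the point raised in the thread that the objects shown in the query results are IRIs rather than literals.

```turtle
@prefix :       <http://base/#> .
@prefix tdb2:   <http://jena.apache.org/2016/tdb#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix text:   <http://jena.apache.org/text#> .

# Hypothetical service name. The single service uses the text dataset,
# so writes to /biology/data or /biology/update go through the text index.
:service_biology a fuseki:Service ;
    fuseki:name "biology" ;
    fuseki:serviceQuery "query" , "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset :text_dataset .

# Text dataset wrapping the TDB2 store.
:text_dataset a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> .

# The only description of the TDB2 store at "db" in this sketch.
<#dataset> a tdb2:DatasetTDB2 ;
    tdb2:location "db" .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:data/luceneIndexing> ;
    text:entityMap <#entMap> .

# Same field and predicates as in the file above (rdf:label, rdf:synonym).
<#entMap> a text:EntityMap ;
    text:defaultField "text" ;
    text:entityField "uri" ;
    text:map (
        [ text:field "text" ;
          text:predicate rdf:label ;
          text:analyzer [ a text:StandardAnalyzer ] ]
        [ text:field "text" ;
          text:predicate rdf:synonym ;
          text:analyzer [ a text:StandardAnalyzer ] ]
    ) .
```

With a layout like this, data sent to /biology/data would pass through the text dataset as it is added. Data already sitting in "db" from an earlier load would still need the "jena.textindexer" step mentioned in the quoted reply.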



On Thu, Mar 26, 2020, at 11:31 AM, Zhenya Antić wrote:
> Hi Andy,
> 
> Thanks. So I think I have all the lines you listed in the .ttl file 
> (attached). I also checked, the data file contains the relevant data. But I 
> have 0 properties indexed.
> 
> Thanks,
> Zhenya
> 
> 
> 
> On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
>> 
>> 
>> On 24/03/2020 15:11, Zhenya Antić wrote:
>> > Hi Andy,
>> > 
>> >> Did you load the data before attaching the text index?
>> > 
>> > How do I do it (or not do it, wasn't sure from your post)?
>> 
>> Set up the Fuseki system, with the text index as the Fuseki service dataset:
>> 
>>  fuseki:name "biology" ;
>>  fuseki:dataset :text_dataset ;
>> ...
>> 
>> :text_dataset rdf:type text:TextDataset ;
>>  text:dataset <#dataset> ;
>> 
>> 
>> 
>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>> tdb2:location "db" ; #path to TDB;
>> .
>> 
>> then send the data to /biology/data (which is the SPARQL GSP write 
>> endpoint) or however you want to push the data to the server (SPARQL 
>> Update, or the UI).
>> 
>> For very large data:
>> 
>> Load the TDB2 dataset offline
>> Then run the "jena.textindexer" utility
>> 
>> https://jena.apache.org/documentation/query/text-query.html#configuration
>> 
>> The first way is easier.
>> 
>>  Andy
>> 
>> > 
>> > Thanks,
>> > Zhenya
>> > 
>> > 
>> > 
>> > On Sun, Mar 22, 2020, at 9:18 AM, Andy Seaborne wrote:
>> >> Just checking one point:
>> >>
>> >> Did you load the data before attaching the text index?
>> >>
>> >> The text index is calculated as data is added, so if you first load the
>> >> dataset and then set up a text index, it will miss indexing the data.
>> >>
>> >> Andy
>> >>
>> >> On 21/03/2020 07:55, Lorenz Buehmann wrote:
>> >>> Hi,
>> >>>
>> >>> welcome to Semantic Web and Apache Jena.
>> >>>
>> >>> Comments inline:
>> >>>
>> >>> On 20.03.20 15:36, Zhenya Antić wrote:
>> >>>> Hello,
>> >>>>
>> >>>> I am a beginner with Fuseki, knowledge graphs and SPARQL, so please 
>> >>>> forgive me if the questions seem obvious, the learning curve for this 
>> >>>> turned out to be quite steep.
>> >>> No problem, nothing is simple in the beginning.
>> >>>>
>> >>>> I am trying to get text indexing to work with my Fuseki knowledge graph.
>> >>> Which DBpedia dataset did you load? I mean, which files?
>> >>>>
>> >>>> For starters, I tried using a regular expression, but that didn't work:
>> >>>>
>> >>>> Just a plain query like this:
>> >>>> SELECT DISTINCT * WHERE {
>> >>>> ?s ?p ?o
>> >>>> }
>> >>>> gives 98 results such as:
>> >>>>
>> >>>> 1
>> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>> >>>> <http://dbpedia.org/resource/Biology>
>> >>>> 2
>> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>> >>>> <http://dbpedia.org/resource/Biology#Branches>
>> >>>> 3
>> >>>> <http://dbpedia.org/ontology/wikiPageID:9127632>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
>> >>>> 4
>> >>>> <http://dbpedia.org/ontology/wikiPageID:18393>
>> >>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
>> >>>> <http://dbpedia.org/resource/Life>
>> >>> That can't be the correct output of this query. rdfs:label should return
>> >>> literals as object (?o) - or you loaded some really weird data
>> >>>>
>> >>>> But a query with a regular expression:
>> >>>> SELECT DISTINCT * WHERE {
>> >>>> ?s ?p ?o
>> >>>> FILTER regex(?o, "Biol", "i")
>> >>>> }
>> >>>
>> >>> 1. you should help the query engine and use rdfs:label as property
>> >>>
>> >>> 2. you should use str() function on the ?o values:
>> >>>
>> >>> SELECT DISTINCT * WHERE {
>> >>> ?s rdfs:label ?o
>> >>> FILTER regex(str(?o), "Biol", "i")
>> >>> }
>> >>>
>> >>>> gives 0 results, although there are clearly results that contain "Biol".
>> >>>
>> >>>
>> >>> I'll have to try your config, or maybe others will spot the issue in the 
>> >>> meantime.
>> >>>
>> >>>>
>> >>>> I also tried setting up indexing with a .ttl file, however the result 
>> >>>> was "INFO 0 (0 per second) properties indexed". .ttl file below:
>> >>>>
>> >>>> @prefix : <http://base/#> .
>> >>>> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
>> >>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> >>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>> >>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>> >>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>> >>>> @prefix text: <http://jena.apache.org/text#> .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB>
>> >>>> rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#GraphTDB2>
>> >>>> rdfs:subClassOf ja:Model .
>> >>>>
>> >>>> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
>> >>
>> >> The rdfs:subClassOf should not be necessary (recent versions of Fuseki).
>> >>
>> >> If any are, let us know so it can be fixed.
>> >>
>> >>>>
>> >>>> <http://jena.apache.org/text#TextDataset>
>> >>>> rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> :service_tdb_all a fuseki:Service ;
>> >>>> rdfs:label "TDB biology" ;
>> >>>> fuseki:dataset :tdb_dataset_readwrite ;
>> >>>> fuseki:name "biology" ;
>> >>>> fuseki:serviceQuery "query" , "" , "sparql" ;
>> >>>> fuseki:serviceReadGraphStore "get" ;
>> >>>> fuseki:serviceReadQuads "" ;
>> >>>> fuseki:serviceReadWriteGraphStore
>> >>>> "data" ;
>> >>>> fuseki:serviceReadWriteQuads "" ;
>> >>>> fuseki:serviceUpdate "" , "update" ;
>> >>>> fuseki:serviceUpload "upload" .
>> >>>>
>> >>>> :tdb_dataset_readwrite
>> >>>> a tdb2:DatasetTDB2 ;
>> >>>> tdb2:location "db" .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#GraphTDB>
>> >>>> rdfs:subClassOf ja:Model .
>> >>>>
>> >>>> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> <http://jena.apache.org/2016/tdb#DatasetTDB2>
>> >>>> rdfs:subClassOf ja:RDFDataset .
>> >>>>
>> >>>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>> >>>> tdb2:location "db" ; #path to TDB;
>> >>>> .
>> >>>>
>> >>>> # Text index description
>> >>>> :text_dataset rdf:type text:TextDataset ;
>> >>>> text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
>> >>>> text:index <#indexLucene> ;
>> >>>> .
>> >>>>
>> >>>> <#indexLucene> a text:TextIndexLucene ;
>> >>>> text:directory <file:data/luceneIndexing> ;
>> >>>> text:entityMap <#entMap> ;
>> >>>> .
>> >>>>
>> >>>> <#entMap> a text:EntityMap ;
>> >>>> text:defaultField "text" ;
>> >>>> text:entityField "uri" ;
>> >>>> text:map (
>> >>>> #RDF label abstracts
>> >>>> [ text:field "text" ;
>> >>>> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
>> >>>> text:analyzer [
>> >>>> a text:StandardAnalyzer
>> >>>> ]
>> >>>> ]
>> >>>> [ text:field "text" ;
>> >>>> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
>> >>>> text:analyzer [
>> >>>> a text:StandardAnalyzer
>> >>>> ]
>> >>>> ]
>> >>>> ) .
>> >>>>
>> >>>>
>> >>>>
>> >>>> <#service_text_tdb> rdf:type fuseki:Service ;
>> >>>> fuseki:name "ds" ;
>> >>>> fuseki:serviceQuery "query" ;
>> >>>> fuseki:serviceQuery "sparql" ;
>> >>>> fuseki:serviceUpdate "update" ;
>> >>>> fuseki:serviceUpload "upload" ;
>> >>>> fuseki:serviceReadGraphStore "get" ;
>> >>>> fuseki:serviceReadWriteGraphStore "data" ;
>> >>>> fuseki:dataset :text_dataset ;
>> >>>> .
>> >>>>
>> >>>> Thank you so much in advance,
>> >>>>
>> >>>> __________________________
>> >>>> Zhenya Antić, PhD
>> >>>> Natural Language Processing
>> >>>> https://www.linkedin.com/in/zhenya-antic/
>> >>>>
>> >>>> Practical Linguistics Inc
>> >>>> http://www.practicallinguistics.com
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> > 
>> 
> 
