Hello,
I am a beginner with Fuseki, knowledge graphs and SPARQL, so please forgive me
if the questions seem obvious. The learning curve has turned out to be quite
steep.
I am trying to get text indexing to work with my Fuseki knowledge graph.
For starters, I tried filtering with a regular expression, but that didn't
work. A plain query like this:
SELECT DISTINCT * WHERE {
  ?s ?p ?o
}
gives 98 results, such as:
1  <http://dbpedia.org/ontology/wikiPageID:9127632>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
   <http://dbpedia.org/resource/Biology>
2  <http://dbpedia.org/ontology/wikiPageID:9127632>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
   <http://dbpedia.org/resource/Biology#Branches>
3  <http://dbpedia.org/ontology/wikiPageID:9127632>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
4  <http://dbpedia.org/ontology/wikiPageID:18393>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
   <http://dbpedia.org/resource/Life>
But a query with a regular expression:
SELECT DISTINCT * WHERE {
  ?s ?p ?o
  FILTER regex(?o, "Biol", "i")
}
gives 0 results, although there are clearly results that contain "Biol".
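For comparison, here is a variant that might behave differently, assuming the
?o values shown above in angle brackets are IRIs rather than string literals.
SPARQL's regex() expects a string literal as its first argument, and a FILTER
whose expression raises a type error drops the row. This is only a sketch under
that assumption; str() converts an IRI to its string form before matching:

```sparql
SELECT DISTINCT * WHERE {
  ?s ?p ?o
  # Assumption: ?o is an IRI, so convert it to a plain string first.
  FILTER regex(str(?o), "Biol", "i")
}
```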
I also tried setting up text indexing with a .ttl configuration file, but the
result was "INFO 0 (0 per second) properties indexed". The .ttl file is below:
@prefix : <http://base/#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix text: <http://jena.apache.org/text#> .

<http://jena.apache.org/2016/tdb#DatasetTDB>
    rdfs:subClassOf ja:RDFDataset .
ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
<http://jena.apache.org/2016/tdb#GraphTDB2>
    rdfs:subClassOf ja:Model .
ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
<http://jena.apache.org/text#TextDataset>
    rdfs:subClassOf ja:RDFDataset .

:service_tdb_all a fuseki:Service ;
    rdfs:label "TDB biology" ;
    fuseki:dataset :tdb_dataset_readwrite ;
    fuseki:name "biology" ;
    fuseki:serviceQuery "query" , "" , "sparql" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadQuads "" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceReadWriteQuads "" ;
    fuseki:serviceUpdate "" , "update" ;
    fuseki:serviceUpload "upload" .

:tdb_dataset_readwrite
    a tdb2:DatasetTDB2 ;
    tdb2:location "db" .

<http://jena.apache.org/2016/tdb#GraphTDB>
    rdfs:subClassOf ja:Model .
ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
<http://jena.apache.org/2016/tdb#DatasetTDB2>
    rdfs:subClassOf ja:RDFDataset .

<#dataset> rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "db" ; #path to TDB;
    .

# Text index description
:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
    text:index <#indexLucene> ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:data/luceneIndexing> ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:defaultField "text" ;
    text:entityField "uri" ;
    text:map (
        #RDF label abstracts
        [ text:field "text" ;
          text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
          text:analyzer [
              a text:StandardAnalyzer
          ]
        ]
        [ text:field "text" ;
          text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
          text:analyzer [
              a text:StandardAnalyzer
          ]
        ]
    ) .

<#service_text_tdb> rdf:type fuseki:Service ;
    fuseki:name "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset :text_dataset ;
    .

Thank you so much in advance,
__________________________
Zhenya Antić, PhD
Natural Language Processing
https://www.linkedin.com/in/zhenya-antic/
Practical Linguistics Inc
http://www.practicallinguistics.com