Hi,
welcome to the Semantic Web and Apache Jena.
Comments inline:
On 20.03.20 15:36, Zhenya Antić wrote:
> Hello,
>
> I am a beginner with Fuseki, knowledge graphs and SPARQL, so please forgive
> me if the questions seem obvious, the learning curve for this turned out to
> be quite steep.
No problem; nothing is simple in the beginning.
>
> I am trying to get text indexing to work with my Fuseki knowledge graph.
Which DBpedia dataset did you load, i.e., which files exactly?
>
> For starters, I tried using a regular expression, but that didn't work:
>
> Just a plain query like this:
> SELECT DISTINCT * WHERE {
> ?s ?p ?o
> }
> gives 98 results such as:
>
> 1
> <http://dbpedia.org/ontology/wikiPageID:9127632>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> <http://dbpedia.org/resource/Biology>
> 2
> <http://dbpedia.org/ontology/wikiPageID:9127632>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> <http://dbpedia.org/resource/Biology#Branches>
> 3
> <http://dbpedia.org/ontology/wikiPageID:9127632>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#branches_of_biology>
> 4
> <http://dbpedia.org/ontology/wikiPageID:18393>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label>
> <http://dbpedia.org/resource/Life>
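That output looks wrong. The predicate shown is rdf:label
(http://www.w3.org/1999/02/22-rdf-syntax-ns#label), not rdfs:label
(http://www.w3.org/2000/01/rdf-schema#label); the RDF namespace has no
label property, and the same goes for rdf:synonym. Also, rdfs:label values
should be literals, but every object (?o) here is an IRI. So either the
output got mangled while copying, or you loaded some really weird data.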
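To check what was actually loaded, here is a sketch of a plain SPARQL 1.1 query (nothing Jena-specific assumed) that lists every predicate with its number of triples and how many of those triples have a literal object:

```sparql
# One row per predicate: total triples and how many have a literal object
SELECT ?p (COUNT(*) AS ?triples) (SUM(IF(isLiteral(?o), 1, 0)) AS ?literalObjects)
WHERE {
  ?s ?p ?o
}
GROUP BY ?p
ORDER BY DESC(?triples)
```

If rdf:label and rdf:synonym show up with literalObjects = 0, the data uses the wrong namespace and IRIs instead of literals. That would probably also explain the "0 properties indexed" further down since, as far as I know, jena-text only indexes literal values.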
>
> But a query with a regular expression:
> SELECT DISTINCT * WHERE {
> ?s ?p ?o
> FILTER regex(?o, "Biol", "i")
> }
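1. Help the query engine by using rdfs:label as the property instead of
the variable ?p.
2. Apply the str() function to the ?o values. regex() only works on
literals; for an IRI it raises an error, the FILTER treats that as false,
and since your objects all seem to be IRIs, every row is dropped:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT * WHERE {
  ?s rdfs:label ?o
  FILTER regex(str(?o), "Biol", "i")
}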
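Once the index works, a Lucene lookup via text:query is usually a better fit than regex(), because it uses the index instead of scanning every label. A sketch, assuming that the entity map indexes rdfs:label (your config below maps rdf:label, which has to match the predicate actually used in your data) and that the labels are literals. The query must be sent to the service wrapping the text dataset (`ds` in your config), not to `biology`, which uses the plain TDB2 dataset without the index:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label WHERE {
  # Ask the Lucene index for subjects whose rdfs:label matches the prefix "biol"
  # (assumes rdfs:label is mapped to a field in text:entityMap)
  ?s text:query (rdfs:label "biol*") .
  # Fetch the matching label itself from the RDF data
  ?s rdfs:label ?label .
}
```

Putting the text:query pattern first lets the engine start from the index hits rather than from all labels.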
> gives 0 results, although there are clearly results that contain "Biol".
I'd have to try your config myself; maybe others will spot the issue in the
meantime.
>
> I also tried setting up indexing with a .ttl file, however the result was
> "INFO 0 (0 per second) properties indexed". .ttl file below:
>
> @prefix : <http://base/#> .
> @prefix tdb2: <http://jena.apache.org/2016/tdb#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix text: <http://jena.apache.org/text#> .
>
> <http://jena.apache.org/2016/tdb#DatasetTDB>
> rdfs:subClassOf ja:RDFDataset .
>
> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
>
> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
>
> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
>
> <http://jena.apache.org/2016/tdb#GraphTDB2>
> rdfs:subClassOf ja:Model .
>
> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
>
> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
>
> <http://jena.apache.org/text#TextDataset>
> rdfs:subClassOf ja:RDFDataset .
>
> :service_tdb_all a fuseki:Service ;
> rdfs:label "TDB biology" ;
> fuseki:dataset :tdb_dataset_readwrite ;
> fuseki:name "biology" ;
> fuseki:serviceQuery "query" , "" , "sparql" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadQuads "" ;
> fuseki:serviceReadWriteGraphStore
> "data" ;
> fuseki:serviceReadWriteQuads "" ;
> fuseki:serviceUpdate "" , "update" ;
> fuseki:serviceUpload "upload" .
>
> :tdb_dataset_readwrite
> a tdb2:DatasetTDB2 ;
> tdb2:location "db" .
>
> <http://jena.apache.org/2016/tdb#GraphTDB>
> rdfs:subClassOf ja:Model .
>
> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
>
> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
>
> <http://jena.apache.org/2016/tdb#DatasetTDB2>
> rdfs:subClassOf ja:RDFDataset .
>
> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> tdb2:location "db" ; #path to TDB;
> .
>
> # Text index description
> :text_dataset rdf:type text:TextDataset ;
> text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> text:index <#indexLucene> ;
> .
>
> <#indexLucene> a text:TextIndexLucene ;
> text:directory <file:data/luceneIndexing> ;
> text:entityMap <#entMap> ;
> .
>
> <#entMap> a text:EntityMap ;
> text:defaultField "text" ;
> text:entityField "uri" ;
> text:map (
> #RDF label abstracts
> [ text:field "text" ;
> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ;
> text:analyzer [
> a text:StandardAnalyzer
> ]
> ]
> [ text:field "text" ;
> text:predicate <http://www.w3.org/1999/02/22-rdf-syntax-ns#synonym> ;
> text:analyzer [
> a text:StandardAnalyzer
> ]
> ]
> ) .
>
>
>
> <#service_text_tdb> rdf:type fuseki:Service ;
> fuseki:name "ds" ;
> fuseki:serviceQuery "query" ;
> fuseki:serviceQuery "sparql" ;
> fuseki:serviceUpdate "update" ;
> fuseki:serviceUpload "upload" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadWriteGraphStore "data" ;
> fuseki:dataset :text_dataset ;
> .
>
> Thank you so much in advance,
>
> __________________________
> Zhenya Antić, PhD
> Natural Language Processing
> https://www.linkedin.com/in/zhenya-antic/
>
> Practical Linguistics Inc
> http://www.practicallinguistics.com