Hi,

I'm trying to use the Jena full-text search feature as described at https://jena.apache.org/documentation/query/text-query.html. I've noticed that queries using "text:query" are very slow: ~20 times slower than similar queries using a "FILTER contains" clause. There are ~5.5M triples in the database, of which 18230 use the indexed predicate. The database takes 1.3GB and the index 4.2M of disk space. The memory available to the Fuseki server is 16GB.
My config is quite simple; nothing special is configured:
################################################################################################
PREFIX :       <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX tdb:    <http://jena.hpl.hp.com/2008/tdb#>
PREFIX tdb2:   <http://jena.apache.org/2016/tdb#>
PREFIX text:   <http://jena.apache.org/text#>
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX fhir:   <http://hl7.org/fhir/>
PREFIX tes:    <http://mycompany/tes/>

[] rdf:type fuseki:Server ;
    fuseki:services (
        :service
    ) .

:service rdf:type fuseki:Service ;
    fuseki:name "tes" ;
    fuseki:serviceQuery "query" , "sparql" ;    # SPARQL query service
    fuseki:serviceUpdate "update" ;             # SPARQL update service
    fuseki:serviceReadWriteGraphStore "data" ;  # SPARQL Graph store protocol (read and write)
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceUpload "upload" ;
    fuseki:dataset :text_dataset ;
    .

# A TextDataset is a regular dataset with a text index.
:text_dataset rdf:type text:TextDataset ;
    text:dataset :tdb2_dataset_readwrite ;
    text:index :indexLucene ;
    .

# A TDB dataset used for RDF storage
:tdb2_dataset_readwrite rdf:type tdb2:DatasetTDB ;
    tdb2:location "databases/db" ;
    .

:indexLucene a text:TextIndexLucene ;
    text:directory "databases/db-index" ;
    text:entityMap :entMap ;
    text:storeValues true ;
    text:analyzer [
        a text:StandardAnalyzer ;
        # text:stopWords ("the" "a" "an" "and" "but")
    ] ;
    # text:queryAnalyzer [ a text:StandardAnalyzer ] ;
    text:queryParser text:QueryParser ;
    # text:multilingualSupport true ; # optional
    .

# Entity map (see documentation for other options)
:entMap a text:EntityMap ;
    text:defaultField "tesValue" ;
    text:entityField "uri" ;
    text:uidField "uid" ;
    text:langField "lang" ;
    text:graphField "graph" ;
    text:map (
        [ text:field "tesValue" ; text:predicate tes:indexedValue ]
    ) .
################################################################################################

There are very similar SPARQL queries:

* with "text:query" clause:

PREFIX tes:  <http://mycompany/tes/>
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>

SELECT DISTINCT ?this ?json
WHERE {
    ?this rdf:type fhir:CodeSystem .
    ?this fhir:Resource.jsonContent/fhir:value ?json .
    ?this fhir:CodeSystem.name/text:query (tes:indexedValue '*Allergy*')
}

* and with "FILTER contains" clause:

PREFIX tes:  <http://cgm.com/tes/>
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>

SELECT DISTINCT ?this ?json
WHERE {
    ?this rdf:type fhir:CodeSystem .
    ?this fhir:Resource.jsonContent/fhir:value ?json .
    ?this fhir:CodeSystem.name/tes:indexedValue ?name
    FILTER contains(?name, "Allergy")
}

==========================================================================================
Log from fuseki:

15:19:33 INFO Fuseki :: [4] POST http://localhost:3030/tes/sparql
15:19:33 INFO Fuseki :: [4] Query = PREFIX tes: <http://mycomany/tes/>
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>
SELECT DISTINCT ?this ?json WHERE {
    ?this rdf:type fhir:CodeSystem .
    ?this fhir:Resource.jsonContent/fhir:value ?json .
    ?this fhir:CodeSystem.name/tes:indexedValue ?name
    FILTER contains(?name, "Allergy")
}
15:19:33 INFO Fuseki :: [4] 200 OK (55 ms)
15:20:25 INFO Fuseki :: [5] POST http://localhost:3030/tes/sparql
15:20:25 INFO Fuseki :: [5] Query = PREFIX tes: <http://mycomany/tes/>
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>
SELECT DISTINCT ?this ?json WHERE {
    ?this rdf:type fhir:CodeSystem .
    ?this fhir:Resource.jsonContent/fhir:value ?json .
    ?this fhir:CodeSystem.name/text:query (tes:indexedValue '*Allergy*')
}
15:20:36 INFO Fuseki :: [5] 200 OK (10,888 s)
==========================================================================================

There is no difference between the standard and Docker installations. I even found the bug https://issues.apache.org/jira/browse/JENA-999 regarding performance, but it was already fixed in version 3.1.0, and I'm currently using version 4.4.0.

Has anyone else noticed the same problem? Am I doing something wrong, or do I need some additional configuration? Is there any solution for this problem?
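
In case it helps anyone reproduce the numbers, here is a minimal sketch of how the two queries could be timed from a client, assuming Python 3 with only the standard library. The endpoint URL is the one from the Fuseki log above; the helper names (build_request, run_query, time_call, compare) are hypothetical, the prefix list is trimmed to what the queries use, and the tes: namespace is a placeholder that has to match the real data.

```python
import time
import urllib.parse
import urllib.request

# Endpoint as it appears in the Fuseki log; adjust for your installation.
ENDPOINT = "http://localhost:3030/tes/sparql"

# Only the prefixes the two queries need. The tes: namespace is a placeholder.
PREFIXES = """
PREFIX tes:  <http://mycompany/tes/>
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX text: <http://jena.apache.org/text#>
"""

QUERIES = {
    "text:query": PREFIXES + """
SELECT DISTINCT ?this ?json WHERE {
    ?this rdf:type fhir:CodeSystem .
    ?this fhir:Resource.jsonContent/fhir:value ?json .
    ?this fhir:CodeSystem.name/text:query (tes:indexedValue '*Allergy*')
}""",
    "FILTER contains": PREFIXES + """
SELECT DISTINCT ?this ?json WHERE {
    ?this rdf:type fhir:CodeSystem .
    ?this fhir:Resource.jsonContent/fhir:value ?json .
    ?this fhir:CodeSystem.name/tes:indexedValue ?name
    FILTER contains(?name, "Allergy")
}""",
}


def build_request(endpoint, query):
    """Build a SPARQL protocol POST request with a form-encoded 'query' field."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(endpoint, query):
    """Send the query and read the whole response body."""
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        return resp.read()


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare(endpoint, queries, repeats=3, runner=run_query):
    """Time every named query and return a {name: best_seconds} mapping."""
    return {
        name: time_call(lambda q=q: runner(endpoint, q), repeats)
        for name, q in queries.items()
    }
```

With the server running, print(compare(ENDPOINT, QUERIES)) should give one best-of-three time per query. These are client-side times, so they include HTTP overhead and will not match the server-side figures in the log exactly.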
