Hi Pawel,

I think this could be because the text:query is evaluated late in the
query, so the other statements first compute many results before the text
query narrows them down. Maybe the contains filter gets applied earlier?

Would reordering the statements, expanding the property path and/or
enclosing the statement with the text:query in curly brackets help?

SELECT DISTINCT ?this ?json WHERE {
  { ?name text:query (tes:indexedValue '*Allergy*') . }
  ?this fhir:CodeSystem.name ?name .
  ?this rdf:type fhir:CodeSystem .
  ?this fhir:Resource.jsonContent/fhir:value ?json .
}

Another approach I use for text queries is a subquery, which lets me fetch
smaller batches of results. You may have to raise the default text:query
Lucene limit to walk through all results, though.

SELECT DISTINCT ?this ?json WHERE {
  { SELECT ?name { ?name text:query (tes:indexedValue '*Allergy*') . }
    # LIMIT N OFFSET 0
  }
  ?this fhir:CodeSystem.name ?name .
  ?this rdf:type fhir:CodeSystem .
  ?this fhir:Resource.jsonContent/fhir:value ?json .
}
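
For reference, the limit can be passed as an extra element in the text:query
argument list. This is only a sketch: the prefixes are copied from your
queries and config, and 100000 is an arbitrary illustrative value, not a
recommendation, so adjust it to the number of hits you expect.

```sparql
PREFIX  tes:  <http://mycompany/tes/>
PREFIX  fhir: <http://hl7.org/fhir/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  text: <http://jena.apache.org/text#>

SELECT DISTINCT ?this ?json WHERE {
  # Third list element: maximum number of Lucene hits to return
  # (100000 is an assumed, illustrative value).
  { SELECT ?name { ?name text:query (tes:indexedValue '*Allergy*' 100000) . } }
  ?this fhir:CodeSystem.name ?name .
  ?this rdf:type fhir:CodeSystem .
  ?this fhir:Resource.jsonContent/fhir:value ?json .
}
```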

I use text:query on larger indexes with a similar server configuration
without any issues, but I haven't compared FILTER contains with text:query
myself.

Best regards,
Øyvind

On Wed, Jun 15, 2022 at 3:37 PM Goławski, Paweł <[email protected]>
wrote:

> Hi,
>
> I’m trying to use Jena Full Text Search feature according to
> https://jena.apache.org/documentation/query/text-query.html
>
> I’ve noticed that queries using “text:query” are very slow: ~20 times
> slower than similar ones using a “FILTER contains” clause.
>
> There are ~5.5M triples in database, 18230 triples with indexed predicate.
>
> The database takes 1.3GB and the index 4.2M of disk space.
>
> Available memory for fuseki server is 16GB.
>
>
>
> My config is quite simple, there is nothing special configured:
>
>
>
> ################################################################################################
> PREFIX :        <#>
> PREFIX fuseki:  <http://jena.apache.org/fuseki#>
> PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX ja:      <http://jena.hpl.hp.com/2005/11/Assembler#>
> PREFIX tdb:     <http://jena.hpl.hp.com/2008/tdb#>
> PREFIX tdb2:    <http://jena.apache.org/2016/tdb#>
> PREFIX text:    <http://jena.apache.org/text#>
> PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
> PREFIX fhir:    <http://hl7.org/fhir/>
> PREFIX tes:     <http://mycompany/tes/>
>
> [] rdf:type fuseki:Server ;
>    fuseki:services (
>                        :service
>                    ) .
>
> :service rdf:type fuseki:Service ;
>     fuseki:name                       "tes" ;
>     fuseki:serviceQuery               "query" , "sparql" ;  # SPARQL query service
>     fuseki:serviceUpdate              "update" ;            # SPARQL update service
>     fuseki:serviceReadWriteGraphStore "data" ;              # SPARQL Graph store protocol (read and write)
>     fuseki:serviceReadGraphStore      "get" ;
>     fuseki:serviceUpload              "upload" ;
>     fuseki:dataset                    :text_dataset ;
> .
>
>
> # A TextDataset is a regular dataset with a text index.
> :text_dataset rdf:type text:TextDataset ;
>     text:dataset   :tdb2_dataset_readwrite ;
>     text:index     :indexLucene ;
> .
>
>
> # A TDB dataset used for RDF storage
> :tdb2_dataset_readwrite rdf:type tdb2:DatasetTDB ;
>     tdb2:location  "databases/db" ;
> .
>
>
> :indexLucene a text:TextIndexLucene ;
>     text:directory   "databases/db-index" ;
>     text:entityMap   :entMap ;
>     text:storeValues true ;
>     text:analyzer [
>         a text:StandardAnalyzer ;
> #       text:stopWords ("the" "a" "an" "and" "but")
>     ] ;
> #   text:queryAnalyzer [ a text:StandardAnalyzer ] ;
>     text:queryParser text:QueryParser ;
> #   text:multilingualSupport true ; # optional
> .
>
> # Entity map (see documentation for other options)
> :entMap a text:EntityMap ;
>     text:defaultField "tesValue" ;
>     text:entityField  "uri" ;
>     text:uidField     "uid" ;
>     text:langField    "lang" ;
>     text:graphField   "graph" ;
>     text:map (
>         [ text:field     "tesValue" ;
>           text:predicate tes:indexedValue
>         ]
>     )
> .
>
> ################################################################################################
>
>
>
> Here are two very similar SPARQL queries:
>
> ·         with a “text:query” clause:
>
>
>
> PREFIX  tes:  <http://mycompany/tes/>
> PREFIX  fhir: <http://hl7.org/fhir/>
> PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
> PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
> PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
> PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX  text: <http://jena.apache.org/text#>
>
> SELECT DISTINCT  ?this ?json
> WHERE
>   { ?this  rdf:type  fhir:CodeSystem .
>     ?this fhir:Resource.jsonContent/fhir:value ?json .
>     ?this fhir:CodeSystem.name/text:query (tes:indexedValue '*Allergy*')
>   }
>
>
>
> ·         and with a “FILTER contains” clause:
>
>
>
> PREFIX  tes:  <http://cgm.com/tes/>
> PREFIX  fhir: <http://hl7.org/fhir/>
> PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
> PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
> PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
> PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX  text: <http://jena.apache.org/text#>
>
> SELECT DISTINCT  ?this ?json
> WHERE
>   { ?this  rdf:type  fhir:CodeSystem .
>     ?this fhir:Resource.jsonContent/fhir:value ?json .
>     ?this fhir:CodeSystem.name/tes:indexedValue ?name
>     FILTER contains(?name, "Allergy")
>   }
>
>
> ==========================================================================================
>
> Log from fuseki:
>
>
>
> 15:19:33 INFO  Fuseki          :: [4] POST http://localhost:3030/tes/sparql
>
> 15:19:33 INFO  Fuseki          :: [4] Query = PREFIX  tes:  
> http://mycomany/tes/ PREFIX  fhir: http://hl7.org/fhir/ PREFIX  rdf:  
> http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX  owl:  
> http://www.w3.org/2002/07/owl# PREFIX  xsd:  
> http://www.w3.org/2001/XMLSchema# PREFIX  skos: 
> http://www.w3.org/2004/02/skos/core# PREFIX  rdfs: 
> http://www.w3.org/2000/01/rdf-schema# PREFIX  text: 
> http://jena.apache.org/text#  SELECT DISTINCT  ?this ?json WHERE   { ?this  
> rdf:type  fhir:CodeSystem .     ?this fhir:Resource.jsonContent/fhir:value 
> ?json .      ?this fhir:CodeSystem.name/tes:indexedValue ?name FILTER 
> contains(?name, "Allergy")   }
>
> 15:19:33 INFO  Fuseki          :: [4] 200 OK (55 ms)
>
>
>
> 15:20:25 INFO  Fuseki          :: [5] POST http://localhost:3030/tes/sparql
>
> 15:20:25 INFO  Fuseki          :: [5] Query = PREFIX  tes:  
> http://mycomany/tes/ PREFIX  fhir: http://hl7.org/fhir/ PREFIX  rdf:  
> http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX  owl:  
> http://www.w3.org/2002/07/owl# PREFIX  xsd:  
> http://www.w3.org/2001/XMLSchema# PREFIX  skos: 
> http://www.w3.org/2004/02/skos/core# PREFIX  rdfs: 
> http://www.w3.org/2000/01/rdf-schema# PREFIX  text: 
> http://jena.apache.org/text#  SELECT DISTINCT  ?this ?json WHERE   { ?this  
> rdf:type  fhir:CodeSystem .     ?this fhir:Resource.jsonContent/fhir:value 
> ?json .      ?this fhir:CodeSystem.name/text:query (tes:indexedValue 
> '*Allergy*')   }
>
> 15:20:36 INFO  Fuseki          :: [5] 200 OK (10,888 s)
>
>
> ==========================================================================================
>
>
>
> There is no difference between standard and docker installations.
>
> I even found the bug https://issues.apache.org/jira/browse/JENA-999 regarding
> performance, but it was already fixed in version 3.1.0, and I’m currently
> using version 4.4.0.
>
> Did anyone notice the same problem?
>
> Or maybe I’m doing something wrong?
>
> Or must I do some additional magic configuration?
>
> Is there any solution for this problem?
>
