Re: Slow query execution

Rob Vesse Thu, 27 Apr 2017 06:30:37 -0700

You haven’t really given enough details for anyone to provide a specific 
analysis.


How did you load the dataset?

It is possible that the first hour was spent loading the data, though that would
be quite slow. How many triples are in the dataset? Do you have a spinning or a
flash disk? Flash will be much faster. A long load would also explain why the
query timed out: if the database was locked for a write transaction, no queries
(which are read transactions) would be allowed to proceed during that time,
because the concurrency model is Multiple Readers OR Single Writer (MRSW).
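
On the question of size, a query along these lines reports the number of
triples. It assumes the data was loaded into the default graph; triples in
named graphs would need a GRAPH clause. On a large store the count itself can
take a while.

```sparql
# Count every triple in the default graph.
# Assumption: the data was loaded into the default graph, not named graphs.
SELECT (COUNT(*) AS ?triples)
WHERE { ?s ?p ?o }
```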

Did other queries work in the same timeframe? I suspect not.

Since you mention Fuseki, I assume you use the default setup whereby you create
a TDB database, i.e. you specify a --loc argument on the command line?

The performance problem here is most likely caused by the use of the regular
expression. To answer the query, the query engine first has to find all matches
for the triple patterns and then evaluate the regular expression against each
possible match. This requires looking up the full value of the variable being
tested, which lives in a separate lookup table from the main indices, since the
main indices store only internal identifiers for efficiency. This is known as
dictionary encoding and is standard across most RDF databases.

Try removing the filter and simply counting the number of results; this will
tell you how many times it has to attempt the regular expression.
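
Concretely, that means running the same triple patterns as your original query
with the FILTER and LIMIT taken out and COUNT(*) in their place. The result is
roughly the number of candidate rows the regex would be tested against.

```sparql
PREFIX wno:  <http://wordnet-rdf.princeton.edu/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Same patterns as the original query, with the regex FILTER removed.
SELECT (COUNT(*) AS ?candidates)
WHERE
{
  ?synset  a wno:Synset ;
           rdfs:label ?label ;
           wno:gloss ?gloss .
}
```

Because the original query also has LIMIT 10, the engine can stop as soon as it
has ten matches, so this count is an upper bound on the number of regex
evaluations rather than an exact figure.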

For simpler searches you will likely be better served by alternative functions,
e.g.

CONTAINS(?label, "entity")

Or

SAMETERM(?label, "entity")
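
As a sketch, here is your original query with the regex swapped for CONTAINS.
Unlike the "i" flag on your regex, CONTAINS is case-sensitive, so a label such
as "Entity" would not match.

```sparql
PREFIX wno:  <http://wordnet-rdf.princeton.edu/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE
{
  ?synset  a wno:Synset ;
           rdfs:label ?label ;
           wno:gloss ?gloss .

  # Plain (case-sensitive) substring test instead of a regular expression.
  FILTER CONTAINS(?label, "entity")
}
LIMIT 10
```

SAMETERM is stricter still: it tests whether ?label is exactly the same RDF
term, so a label stored with a language tag, e.g. "entity"@en, would not be
matched by the plain literal "entity". If you need case insensitivity without a
text index, CONTAINS(LCASE(?label), "entity") is one option, but it still has
to fetch every candidate label's full value.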

Note, though, that on their own those won't give you case insensitivity. If you
intend to do a lot of text search, you would be better off using the text
indexing extensions.
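
For reference, a text-index version of the query might look like the following.
This is a sketch that assumes the dataset has been set up with Jena's text
indexing (jena-text), with rdfs:label configured as an indexed field; without
that configuration it will not work as-is.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX wno:  <http://wordnet-rdf.princeton.edu/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE
{
  # Ask the Lucene index for subjects whose rdfs:label matches "entity".
  # Assumption: a jena-text index covering rdfs:label has been configured.
  ?synset  text:query (rdfs:label "entity") .

  # The remaining patterns are then joined against the index hits.
  ?synset  a wno:Synset ;
           rdfs:label ?label ;
           wno:gloss ?gloss .
}
LIMIT 10
```

Lucene matches tokens rather than arbitrary substrings, so results can differ
from the regex (for example "entity" will typically not match "entities");
with the usual standard analyzer the match is case-insensitive.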

Rob

On 27/04/2017 11:36, "Laura Morales" <[email protected]> wrote:

    -------------------------------------
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX wn31: <http://wordnet-rdf.princeton.edu/wn31/>
    PREFIX wno: <http://wordnet-rdf.princeton.edu/ontology#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    SELECT *
    WHERE
    {
      ?synset  a wno:Synset ;
           rdfs:label ?label ;
           wno:gloss ?gloss .
      
      FILTER regex(?label, "entity", "i")
    }
    LIMIT 10
    -------------------------------------
    
    what happened:
    
    - for a long time after loading the dataset (more than 1h), this last query
    timed out every time I submitted it. So I thought it was a problem with
    indexes, and that I should read more about Jena indexes
    - but now all of a sudden it seems to work, although the query takes a few
    seconds to complete (which still feels a bit too slow, since the database is
    local on the same machine and the dataset is not *that* huge)
    
    
    Does anybody know what I've run into? Do indexes have anything to do with
    this, or is it maybe some Jena/Fuseki cache/bootstrap activity, or just some
    monkey business going on with my computer? This feels so strange because I
    don't think I have done anything on my computer that could have influenced
    this query. I just loaded the dataset into Fuseki, then started querying...
