Re: Long response times on TDB queries

Andy Seaborne Sun, 17 Jun 2018 08:21:42 -0700

Adam,

The query uses "FROM <urn:x-arq:UnionGraph>".


A better query is "WHERE { GRAPH <urn:x-arq:UnionGraph> { ... } }"

or set the dataset to use tdb:unionDefaultGraph.

The first has to execute a union in a completely general way (calling TDB triple pattern by triple pattern, with no stats), whereas the second passes the pattern to TDB to deal with, and TDB does the union graph more efficiently.

This would also get round a problem with reordering: in this case the reordering isn't happening (a bit of a "doh!" moment), and it should. Recorded as JENA-1564.


Thanks for a detailed and complete report,

    Andy
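As a rough illustration of what the union graph means (not how TDB implements it), the Python sketch below matches a single triple pattern against quads from every named graph, drops the graph name and removes duplicates. The `match_union` helper, the in-memory quad list and the example URIs are all hypothetical. Andy's point is that with GRAPH <urn:x-arq:UnionGraph> or the union-default-graph setting this work is handed to TDB, which he says handles the union more efficiently than the general, pattern-by-pattern evaluation used for FROM <urn:x-arq:UnionGraph>.

```python
from typing import Iterable, Optional, Set, Tuple

# (graph, subject, predicate, object)
Quad = Tuple[str, str, str, str]
Triple = Tuple[str, str, str]


def match_union(quads: Iterable[Quad],
                s: Optional[str] = None,
                p: Optional[str] = None,
                o: Optional[str] = None) -> Set[Triple]:
    """Match one triple pattern against the union of all named graphs.

    None acts as a wildcard. The graph name is dropped and the results are
    collected in a set, so a triple stored in several graphs appears once.
    """
    results: Set[Triple] = set()
    for _graph, qs, qp, qo in quads:
        if s is not None and s != qs:
            continue
        if p is not None and p != qp:
            continue
        if o is not None and o != qo:
            continue
        results.add((qs, qp, qo))
    return results


# Hypothetical data: the same idmapping triple is stored in two named graphs.
EX = "http://example.com/"
quads = [
    (EX + "g1", EX + "n1", EX + "idmapping", "ID-1"),
    (EX + "g2", EX + "n1", EX + "idmapping", "ID-1"),
    (EX + "g2", EX + "n2", EX + "idmapping", "ID-2"),
]
for triple in sorted(match_union(quads, p=EX + "idmapping")):
    print(triple)
```

The three stored quads collapse to two distinct triples, because the n1 triple is counted once even though it lives in both graphs.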

On 15/06/18 13:59, Rob Vesse wrote:

Ok, did the output show that the stats.opt file was actually being used?

Note that the file has to be named exactly that and placed in the database
directory.

You can modify the generated file to tweak the optimiser behaviour if you still
don't get good reorderings even with the statistics. The documentation I linked
also explains the format of that file and how to add custom rules to it as
necessary.

Attachments are scrubbed from Apache lists. If you do want to share the
generated stats file, you can upload it to a service like Gist/Pastebin/Dropbox
etc. and simply send a link to the list.

Rob
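Rob's point that the file must be called exactly stats.opt and sit in the database directory, together with his advice to generate it elsewhere and then copy it in, can be captured in a small Python helper. This is a sketch only: `install_stats` and the paths in the usage comment are hypothetical, and generating the statistics file itself is done with the tool described in the TDB optimizer documentation, which is not shown here.

```python
import shutil
from pathlib import Path


def install_stats(generated_file: str, db_dir: str) -> Path:
    """Copy a separately generated statistics file into a TDB database directory.

    The target name is fixed to "stats.opt" and the file is placed directly in
    the database directory, since TDB only uses it under that exact name and
    location.
    """
    db_path = Path(db_dir)
    if not db_path.is_dir():
        raise FileNotFoundError(f"database directory not found: {db_path}")
    source = Path(generated_file)
    target = db_path / "stats.opt"
    # Refuse to "install" a file that was generated in place.
    if source.resolve() == target.resolve():
        raise ValueError("generate the statistics elsewhere, then copy them in")
    shutil.copyfile(source, target)
    return target


# Usage (hypothetical paths):
# install_stats("/tmp/stats.opt", "/data/tdb/DB")
```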

On 15/06/2018, 13:52, "Adam Ladly" <[email protected]> wrote:

     Rob,
I had actually generated the statistics file prior to getting in touch. I meant to attach it to the original message, but it was too large and the message wouldn't send. However, it didn't seem to make a large dent in the time the query took to execute.

Adam
________________________________
     From: Rob Vesse <[email protected]>
     Sent: 15 June 2018 13:48
     To: [email protected]
     Subject: Re: Long response times on TDB queries
Adam,

Did you try generating the statistics file as well? Doing that should allow TDB to more intelligently reorder triple pattern execution within BGPs and hopefully avoid the need to manually adjust queries.

Rob

On 15/06/2018, 13:13, "Adam Ladly" <[email protected]> wrote:

Hi Rob,

Just tried re-ordering it as you suggested, and this was exactly the improvement that I needed. Now getting returns in 0.1 seconds rather than 200+.

Much appreciated,
Adam
________________________________
         From: Rob Vesse <[email protected]>
         Sent: 15 June 2018 12:12
         To: [email protected]
         Subject: Re: Long response times on TDB queries
On 15/06/2018, 11:46, "Adam Ladly" <[email protected]> wrote:

22:18:21 INFO exec :: Reorder/generic
             ?node <http://example.com/idmapping> ?id
             ?node ?p "http://www.ncbi.nlm.nih.gov/gene/1956";
So it looks like TDB is using a generic execution ordering, and for your data this is clearly sub-optimal; I suspect you have far more triples matching the first pattern than the second.

Firstly, I would try using { } to rewrite your query and explicitly force the execution order, i.e.

    PREFIX els: <http://example.com/>
    SELECT DISTINCT ?id
    FROM <urn:x-arq:UnionGraph>
    WHERE {
      { ?node ?p "http://www.ncbi.nlm.nih.gov/gene/1956"; }
      ?node els:idmapping ?id .
    }

However, the optimiser might flatten that into a single BGP anyway, in which case you might still have the same problem.

Secondly, I would try generating a stats.opt file per the documentation: http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

Once you have this, copy it to stats.opt in your database directory (per the documentation, don't try to generate it directly to this location) and then retry your original query. You should then hopefully see Reorder/stats being used, and if your data is structured as I suspect, the statistics should reverse the order of your triple patterns, leading to faster execution.

Hope this helps,
Rob
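Rob's reasoning (run the pattern that matches fewer triples first, so its bindings narrow down the other pattern) can be sketched as a greedy ordering in Python. This is a hypothetical illustration of the general idea, not TDB's reorder code: `greedy_order`, `estimate` and the counts standing in for statistics are all invented for the example.

```python
from typing import Callable, List, Set, Tuple

# A triple pattern as (subject, predicate, object); terms starting with "?" are variables.
Pattern = Tuple[str, str, str]


def variables(pattern: Pattern) -> Set[str]:
    """Return the variable names used in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def greedy_order(patterns: List[Pattern],
                 estimate: Callable[[Pattern, Set[str]], float]) -> List[Pattern]:
    """Order triple patterns greedily, cheapest first.

    estimate(pattern, bound) should return the expected number of matches for
    pattern once the variables in `bound` have been fixed by earlier patterns.
    """
    remaining = list(patterns)
    bound: Set[str] = set()
    ordered: List[Pattern] = []
    while remaining:
        best = min(remaining, key=lambda tp: estimate(tp, bound))
        remaining.remove(best)
        ordered.append(best)
        bound |= variables(best)  # later patterns can join on these variables
    return ordered


# Hypothetical statistics (invented numbers, not taken from Adam's data):
# many idmapping triples, very few triples with the gene URI as object.
PREDICATE_COUNTS = {"<http://example.com/idmapping>": 1_000_000}
OBJECT_COUNTS = {'"http://www.ncbi.nlm.nih.gov/gene/1956"': 3}
UNKNOWN = 1_000_000_000  # pessimistic guess when nothing is known


def estimate(tp: Pattern, bound: Set[str]) -> float:
    s, p, o = tp
    if s in bound:
        return 1.0  # subject already fixed by an earlier pattern: few matches
    if o in OBJECT_COUNTS:
        return OBJECT_COUNTS[o]
    return PREDICATE_COUNTS.get(p, UNKNOWN)


bgp = [
    ("?node", "<http://example.com/idmapping>", "?id"),
    ("?node", "?p", '"http://www.ncbi.nlm.nih.gov/gene/1956"'),
]
for tp in greedy_order(bgp, estimate):
    print(" ".join(tp))
```

With these made-up counts the sketch prints the gene-URI pattern first and the idmapping pattern second, the reverse of the Reorder/generic order in Adam's log, which is the kind of change Rob expects Reorder/stats to make.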
