Hi Andy,

I took a look at the code under ..\jena-arq\src\main\java\org\apache\jena\sparql\engine
and found that the statistic-based optimizer reorders the patterns by weight and then executes the plan given by the reordered patterns. The weights are assigned based on the type of pattern, e.g., s p ? is given a weight of 2 and ? p o is weighted 10. I think it is a smart and quick way to optimize, although it might not always produce the optimal plan.

Best Regards,
Wei Zhang

-----Original Message-----
From: Andy Seaborne [mailto:[email protected]]
Sent: Tuesday, 8 September 2015 12:41 AM
To: [email protected]
Subject: Re: Does TDB have command to see estimated query execution time and row count ?

On 07/09/15 15:37, Wei Zhang wrote:
> Hi Andy,
>
> I think there are criteria to choose a better plan.
> I will look at the source code. Hope it will help...
>
> Thank you very much for your time.
>
> Wei Zhang

Great - I look forward to hearing from you.

Andy

>
> -----Original Message-----
> From: Andy Seaborne [mailto:[email protected]]
> Sent: Monday, 7 September 2015 11:56 PM
> To: [email protected]
> Subject: Re: Does TDB have command to see estimated query execution time and
> row count ?
>
> Hi there,
>
> The optimizer does not try to estimate the execution time. It is not a fully
> fledged, top-to-bottom cost-based optimizer. It does the best it can based
> on heuristics. As has been discovered in Jena and elsewhere, SPARQL can also
> be used in a very simple fashion where the optimizer cost can be more than just
> doing the query.
>
> QueryExecUtils will actually execute the algebra expression. I mentioned the
> class because you'll probably want to execute algebra directly to get
> comparisons. And of course, you can look at the source code!
>
> Andy
>
> On 07/09/15 13:42, Wei Zhang wrote:
>> Hi Andy,
>>
>> Thank you very much for your help!
>> I think per your suggestion, I can compare the performance with and without
>> optimizer.
>> But how can I get optimizer's estimated query time? Which I plan to compare
>> with the real execution time?
>>
>> Do you mean when I execute algebra expressions directly using tools like
>> QueryExecUtils, then the time I get can be considered as estimated time?
>> I am not sure if my understanding is correct...
>>
>> Best Regards,
>> Wei Zhang
>>
>> -----Original Message-----
>> From: Andy Seaborne [mailto:[email protected]]
>> Sent: Monday, 7 September 2015 9:26 PM
>> To: [email protected]
>> Subject: Re: Does TDB have command to see estimated query execution time and
>> row count ?
>>
>> On 07/09/15 05:30, Wei Zhang wrote:
>>> Dear All,
>>>
>>> From the document
>>> (https://jena.apache.org/documentation/tdb/optimizer.html), it is said TDB
>>> optimizer has both static and dynamic optimizations.
>>> How can I get the estimated query time and row count instead of the actual
>>> time and row count after static/dynamic optimization?
>>> Another question is that I think "tdbquery -explain" gives the query plan
>>> after execution, but it also cannot provide the information I want.
>>>
>>> What I want is to find the TDB optimizer's performance.
>>>
>>> Could anyone help?
>>>
>>> Thank you very much for your time.
>>>
>>> Best Regards,
>>> Wei
>>>
>>
>> Wei,
>>
>> I think you have a model of how the optimizer works but it's at odds with
>> what it actually does. It is not strongly based around cost estimation
>> although TDB does a little of that.
>>
>> The high level optimizations, done at the start of query execution, are a
>> set of rule based rewrites that look for patterns in the algebra and produce
>> better algebra. In particular, these are not based on the data.
>> Rather the rules are ways to turn standard SPARQL algebra (exactly as
>> produced by the transformation in the spec) into better (nearly always!)
>> algebra. That includes introducing new operators (like "TopN") as well as
>> rewriting using existing operators (like filter/equality into a pattern with
>> that term and a BIND).
>>
>> This is printed by "qparse --print=opt"
>>
>> TDB adds reordering basic graph patterns, either by the rule based method
>> described at that link or a fixed way (roughly - choose most grounded triple
>> pattern, but avoid rdf:type).
>>
>> There are tools (QueryExecUtils) to execute algebra expressions directly so
>> combined with the optimizer switched off, you can try out different
>> possibilities.
>>
>> Andy
>>
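
For readers following the thread, here is a minimal sketch of the weight-based reordering described in the top message. It is an illustration only and not Jena's code: the names TriplePattern, WEIGHTS, weight and reorder are hypothetical, and only the two weights quoted in the thread (s p ? = 2, ? p o = 10) come from the message. Every other number is a placeholder chosen for the example. The sketch also ignores rdf:type handling and any effect that variables bound by earlier patterns might have on later ones.

```python
from typing import List, NamedTuple, Optional


class TriplePattern(NamedTuple):
    # None stands for a variable ("?"); a string is a concrete term.
    s: Optional[str]
    p: Optional[str]
    o: Optional[str]


# Weights keyed by which positions (s, p, o) are concrete.
# Only the "s p ?" = 2 and "? p o" = 10 entries are taken from the thread;
# all other values are hypothetical placeholders, not Jena's actual numbers.
WEIGHTS = {
    (True, True, True): 1,       # hypothetical
    (True, True, False): 2,      # from the thread: s p ?
    (False, True, True): 10,     # from the thread: ? p o
    (True, False, True): 10,     # hypothetical
    (True, False, False): 20,    # hypothetical
    (False, False, True): 30,    # hypothetical
    (False, True, False): 40,    # hypothetical
    (False, False, False): 100,  # hypothetical
}


def weight(tp: TriplePattern) -> int:
    """Look up the weight for a pattern from its concrete/variable shape."""
    shape = (tp.s is not None, tp.p is not None, tp.o is not None)
    return WEIGHTS[shape]


def reorder(patterns: List[TriplePattern]) -> List[TriplePattern]:
    """Return the patterns sorted by ascending weight.

    sorted() is stable, so patterns with equal weight keep their
    original relative order.
    """
    return sorted(patterns, key=weight)


if __name__ == "__main__":
    bgp = [
        TriplePattern(None, "rdf:type", None),
        TriplePattern(None, "foaf:name", "Alice"),
        TriplePattern("ex:a", "foaf:knows", None),
    ]
    for tp in reorder(bgp):
        print(weight(tp), tp)
```

In this toy version the lowest-weight pattern runs first, which matches the "most grounded pattern first" idea from the thread. No data statistics are consulted, which is why such a scheme is cheap but cannot guarantee the best order.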
