RE: Does TDB have command to see estimated query execution time and row count ?

Wei Zhang Mon, 07 Sep 2015 18:12:05 -0700

Hi Andy,

I took a look at the code under 
..\jena-arq\src\main\java\org\apache\jena\sparql\engine


I found that the statistics-based optimizer reorders the patterns by weight and 
then executes the plan given by the reordered patterns.
The weights are assigned based on the type of pattern, e.g. s p ? gets a 
weight of 2 and ? p o a weight of 10.

I think it is a smart and quick way to optimize, although it might not always 
find the optimal plan.
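
To make the weighting idea concrete, here is a minimal sketch in plain Java. It is not the Jena code: the class and method names are made up for illustration, and only the two weights mentioned above (2 for s p ?, 10 for ? p o) come from what I read; the other weights are assumptions. The real optimizer may well do more than a single sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in, not part of Jena.
class WeightReorderSketch {

    /** A triple pattern; a term starting with "?" is treated as a variable. */
    static final class TriplePattern {
        final String s, p, o;

        TriplePattern(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        private static boolean isVar(String term) {
            return term.startsWith("?");
        }

        /** Shape key such as "s p ?" or "? p o". */
        String shape() {
            return (isVar(s) ? "?" : "s") + " "
                 + (isVar(p) ? "?" : "p") + " "
                 + (isVar(o) ? "?" : "o");
        }

        @Override
        public String toString() {
            return s + " " + p + " " + o;
        }
    }

    /** Lower weight means "run earlier". */
    static int weight(TriplePattern t) {
        switch (t.shape()) {
            case "s p ?": return 2;   // weight mentioned in this message
            case "? p o": return 10;  // weight mentioned in this message
            case "s p o": return 1;   // assumed for illustration
            default:      return 100; // assumed for illustration
        }
    }

    /** Returns a copy of the patterns sorted by ascending weight (stable sort). */
    static List<TriplePattern> reorder(List<TriplePattern> bgp) {
        List<TriplePattern> out = new ArrayList<>(bgp);
        out.sort(Comparator.comparingInt(WeightReorderSketch::weight));
        return out;
    }

    public static void main(String[] args) {
        List<TriplePattern> bgp = Arrays.asList(
            new TriplePattern("?x", ":name", "\"Alice\""),
            new TriplePattern("?a", "?b", "?c"),
            new TriplePattern(":alice", ":knows", "?y"));
        for (TriplePattern t : reorder(bgp)) {
            System.out.println(weight(t) + "  " + t);
        }
    }
}
```

Because the sort is stable, patterns with equal weight keep their original order, so the result is deterministic for a given query.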


Best Regards,
Wei Zhang


-----Original Message----- 
From: Andy Seaborne [mailto:[email protected]] 
Sent: Tuesday, 8 September 2015 12:41 AM
To: [email protected]
Subject: Re: Does TDB have command to see estimated query execution time and 
row count ?

On 07/09/15 15:37, Wei Zhang wrote:
> Hi Andy,
>
> I think there are criteria for choosing a better plan.
> I will look at the source code. I hope it will help...
>
> Thank you very much for your time.
>
> Wei Zhang

Great - I look forward to hearing from you.
        
        Andy

>
>
> -----Original Message-----
> From: Andy Seaborne [mailto:[email protected]]
> Sent: Monday, 7 September 2015 11:56 PM
> To: [email protected]
> Subject: Re: Does TDB have command to see estimated query execution time and 
> row count ?
>
> Hi there,
>
> The optimizer does not try to estimate the execution time.  It is not a fully 
> fledged, top-to-bottom cost-based optimizer.  It does the best it can based 
> on heuristics.  As has been discovered in Jena and elsewhere, SPARQL is also 
> used in very simple ways where running the optimizer can cost more than just 
> executing the query.
>
> QueryExecUtils will actually execute the algebra expression.  I mentioned the 
> class because you'll probably want to execute algebra directly to get 
> comparisons.  And of course, you can look at the source code!
>
>       Andy
>
> On 07/09/15 13:42, Wei Zhang wrote:
>> Hi Andy,
>>
>> Thank you very much for your help!
>> Following your suggestion, I can compare the performance with and without 
>> the optimizer.
>> But how can I get the optimizer's estimated query time, which I plan to 
>> compare with the real execution time?
>>
>> Do you mean that when I execute algebra expressions directly using tools like 
>> QueryExecUtils, the time I get can be considered the estimated time?
>> I am not sure if my understanding is correct...
>>
>> Best Regards,
>> Wei Zhang
>>
>> -----Original Message-----
>> From: Andy Seaborne [mailto:[email protected]]
>> Sent: Monday, 7 September 2015 9:26 PM
>> To: [email protected]
>> Subject: Re: Does TDB have command to see estimated query execution time and 
>> row count ?
>>
>> On 07/09/15 05:30, Wei Zhang wrote:
>>> Dear All,
>>>
>>> According to the documentation 
>>> (https://jena.apache.org/documentation/tdb/optimizer.html), the TDB 
>>> optimizer has both static and dynamic optimizations.
>>> How can I get the estimated query time and row count instead of the actual 
>>> time and row count after static/dynamic optimization?
>>> Another question is that I think "tdbquery -explain" gives the query plan 
>>> after execution,  but it also cannot provide the information I want.
>>>
>>> What I want is to find the TDB optimizer's performance.
>>>
>>> Could anyone help?
>>>
>>> Thank you very much for your time.
>>>
>>> Best Regards,
>>> Wei
>>>
>>>
>>
>> Wei,
>>
>> I think you have a model of how the optimizer works but it's at odds with 
>> what it actually does.  It is not strongly based around cost estimation 
>> although TDB does a little of that.
>>
>> The high level optimizations, done at the start of query execution, are a 
>> set of rule based rewrites that look for patterns in the algebra and produce 
>> better algebra.  In particular, these are not based on the data.
>>     Rather, the rules are ways to turn standard SPARQL algebra (exactly as 
>> produced by the transformation in the spec) into better (nearly always!) 
>> algebra.  That includes introducing new operators (like "TopN") as well as 
>> rewriting using existing operators (like filter/equality into a pattern with 
>> that term and a BIND).
>>
>> This is printed by "qparse --print=opt"
>>
>> TDB adds reordering of basic graph patterns, either by the rule based method 
>> described at that link or a fixed way (roughly - choose the most grounded 
>> triple pattern, but avoid rdf:type).
>>
>> There are tools (QueryExecUtils) to execute algebra expressions directly so 
>> combined with the optimizer switched off, you can try out different 
>> possibilities.
>>
>>      Andy
>>
>>
>>
>
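
The quoted reply above describes the high-level optimizations as rule-based rewrites that look for patterns in the algebra, with "TopN" as an example of a newly introduced operator. A minimal sketch of what such a rewrite could look like, in plain Java, follows. The node classes here are hypothetical stand-ins, not Jena's algebra classes, and the rule shown (a slice over an order becomes a top-N) is only an assumed reading of that remark; the real transformation may apply further conditions.

```java
// Hypothetical stand-in for an algebra tree; not Jena's Op hierarchy.
class TopNRewriteSketch {

    interface Op {}

    static final class Bgp implements Op {
        final String patterns;
        Bgp(String patterns) { this.patterns = patterns; }
        @Override public String toString() { return "(bgp " + patterns + ")"; }
    }

    static final class Order implements Op {
        final String key;
        final Op sub;
        Order(String key, Op sub) { this.key = key; this.sub = sub; }
        @Override public String toString() { return "(order " + key + " " + sub + ")"; }
    }

    static final class Slice implements Op {
        final long offset, length;
        final Op sub;
        Slice(long offset, long length, Op sub) {
            this.offset = offset;
            this.length = length;
            this.sub = sub;
        }
        @Override public String toString() {
            return "(slice " + offset + " " + length + " " + sub + ")";
        }
    }

    /** Keeps only the first 'limit' rows in 'key' order, without a full sort. */
    static final class TopN implements Op {
        final long limit;
        final String key;
        final Op sub;
        TopN(long limit, String key, Op sub) {
            this.limit = limit;
            this.key = key;
            this.sub = sub;
        }
        @Override public String toString() {
            return "(top " + limit + " " + key + " " + sub + ")";
        }
    }

    /** Rule: slice(offset, length, order(key, X)) becomes top(offset + length, key, X). */
    static Op rewrite(Op op) {
        if (op instanceof Slice) {
            Slice slice = (Slice) op;
            if (slice.sub instanceof Order) {
                Order order = (Order) slice.sub;
                Op top = new TopN(slice.offset + slice.length, order.key, rewrite(order.sub));
                // With a non-zero offset the leading rows still have to be skipped.
                return slice.offset == 0 ? top : new Slice(slice.offset, slice.length, top);
            }
            return new Slice(slice.offset, slice.length, rewrite(slice.sub));
        }
        if (op instanceof Order) {
            Order order = (Order) op;
            return new Order(order.key, rewrite(order.sub));
        }
        return op; // no rule applies; the rewrite never looks at the data
    }

    public static void main(String[] args) {
        Op query = new Slice(0, 10, new Order("?age", new Bgp("?s :age ?age")));
        System.out.println("before: " + query);
        System.out.println("after:  " + rewrite(query));
    }
}
```

The point of the sketch is that the rule inspects only the shape of the algebra, which matches the remark that these rewrites are not based on the data.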
