On 3 March 2014 23:00, Florian Pflug <f...@phlo.org> wrote:
>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>        double tperrow = winaggstate->aggfwdtrans /
>>            (inst->nloops * inst->ntuples);
>> If the node is executed multiple times, aggfwdtrans will be reset in
>> each loop, so the transitions per row figure will be under-estimated.
>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>> reset once per query, but I'm not sure exactly how to do that.
>> ...
>> Actually, I think it's misleading to only count forward transition
>> function calls, because a call to the inverse transition function
>> still represents a state transition, and is likely to be around the
>> same cost. For a window of size 2, there would not be much advantage
>> to using inverse transition functions, because it would be around 2
>> transitions per row either way.
> True. In fact, I pondered whether to avoid using the inverse transition
> function for windows of 2 rows. In the end, I didn't because I felt that
> it makes custom aggregates harder to test.
> On the question of whether to count inverse transition function calls -
> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
> number of state transitions, but rather to show whether the aggregation
> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
> means "inverse transition function works as expected", and larger values
> mean "not working so well".
> Regarding multiple evaluations - I think I based the behaviour on how
> ntuples works, which also only reports the value of the last evaluation
> I think. But maybe I'm confused about this.

No, it doesn't look like that's correct for multiple loops. Consider
this example:

explain (verbose, analyse)
  select * from (values (10), (20), (30), (40)) v(x),
  (select sum(i) over (rows between 4 preceding and current row)
     from generate_series(1, x) i) t;

 Nested Loop  (cost=0.00..170.06 rows=4000 width=12) (actual
time=0.027..0.414 rows=100 loops=1)
   Output: "*VALUES*".column1, (sum(i.i) OVER (?))
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=4 width=4)
(actual time=0.002..0.006 rows=4 loops=1)
         Output: "*VALUES*".column1
   ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual
time=0.019..0.094 rows=25 loops=4)
         Output: sum(i.i) OVER (?)
         Transitions Per Row: 0.2
         ->  Function Scan on pg_catalog.generate_series i
(cost=0.00..10.00 rows=1000 width=4) (actual time=0.010..0.015 rows=25
               Output: i.i
               Function Call: generate_series(1, "*VALUES*".column1)

It turns out that show_windowagg_info() is only called once at the
end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
1, you'd have to use this formula:

    double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

I'm still not convinced that's the most useful thing to report though.
Personally, I'd prefer to just see the separate counts, e.g.:

   ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual
time=0.019..0.094 rows=25 loops=4)
         Output: sum(i.i) OVER (?)
         Forward transitions: 25  Inverse transitions: 25

IMO that gives a clearer picture of what's going on.



Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to