[PERFORM] plan difference between set-returning function with ROWS within IN() and a plain join

2008-05-06 Thread Frank van Vugt
L.S.

I'm noticing a difference in planning between a join and an in() clause;
before trying to create an independent test case, I'd like to know if there's
an obvious reason why this would be happening:


= the relatively simple PLPGSQL si_credit_tree() function has 'ROWS 5' in 
its definition
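
For reference, a minimal sketch of what such a definition can look like (the
real body is unknown here; the placeholder only shows where the 8.3 ROWS
clause sits, which tells the planner to assume five result rows instead of
the default 1000):

create function si_credit_tree(integer) returns setof integer as '
begin
    -- placeholder body; the real function walks the credit-note tree of $1
    return next $1;
    return;
end;' language plpgsql stable rows 5;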


db=# select version();
                                 version
--------------------------------------------------------------------------
 PostgreSQL 8.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
(1 row)



db=# explain analyse
select sum(si.base_total_val)
from sales_invoice si, si_credit_tree(8057) foo(id)
where si.id = foo.id;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=42.73..42.74 rows=1 width=8) (actual time=0.458..0.459 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..42.71 rows=5 width=8) (actual time=0.361..0.429 rows=5 loops=1)
         ->  Function Scan on si_credit_tree foo  (cost=0.00..1.30 rows=5 width=4) (actual time=0.339..0.347 rows=5 loops=1)
         ->  Index Scan using sales_invoice_pkey on sales_invoice si  (cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5)
               Index Cond: (si.id = foo.id)
 Total runtime: 0.562 ms




db=# explain analyse
select sum(base_total_val)
from sales_invoice
where id in (select id from si_credit_tree(8057));
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15338.31..15338.32 rows=1 width=8) (actual time=3349.401..3349.402 rows=1 loops=1)
   ->  Seq Scan on sales_invoice  (cost=0.00..15311.19 rows=10846 width=8) (actual time=0.781..3279.046 rows=21703 loops=1)
         Filter: (subplan)
         SubPlan
           ->  Function Scan on si_credit_tree  (cost=0.00..1.30 rows=5 width=0) (actual time=0.146..0.146 rows=1 loops=21703)
 Total runtime: 3349.501 ms





I'd hoped the planner would use the ROWS=5 knowledge a bit better:


db=# explain analyse
select sum(base_total_val)
from sales_invoice
where id in (8057,8058,8059,80500010,80500011);
                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=40.21..40.22 rows=1 width=8) (actual time=0.105..0.106 rows=1 loops=1)
   ->  Bitmap Heap Scan on sales_invoice  (cost=21.29..40.19 rows=5 width=8) (actual time=0.061..0.070 rows=5 loops=1)
         Recheck Cond: (id = ANY ('{8057,8058,8059,80500010,80500011}'::integer[]))
         ->  Bitmap Index Scan on sales_invoice_pkey  (cost=0.00..21.29 rows=5 width=0) (actual time=0.049..0.049 rows=5 loops=1)
               Index Cond: (id = ANY ('{8057,8058,8059,80500010,80500011}'::integer[]))
 Total runtime: 0.201 ms






-- 
Best,

Frank.



Re: [PERFORM] plan difference between set-returning function with ROWS within IN() and a plain join

2008-05-06 Thread Frank van Vugt
> > db=# explain analyse
> > select sum(base_total_val)
> > from sales_invoice
> > where id in (select id from si_credit_tree(8057));
>
> Did you check whether this query even gives the right answer?

You knew the right answer to that already ;)

> I think you forgot the alias foo(id) in the subselect and it's
> actually reducing to where id in (id), ie, TRUE.

Tricky, but completely obvious once pointed out: that's _exactly_ what was
happening.
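
For anyone else bitten by this, a tiny self-contained demo of the scoping
rule; generate_series stands in for the set-returning function and table t
is hypothetical:

create table t (id int);
insert into t values (1), (2), (3);

-- without an alias, "id" inside the subselect resolves to the outer t.id,
-- so the condition degenerates to "id in (id)": true for every non-null
-- row, and the function result is never even looked at
select count(*) from t where id in (select id from generate_series(42, 42) s);
-- returns 3

-- with the column alias, "id" is the function's column and only matching rows pass
select count(*) from t where id in (select id from generate_series(42, 42) s(id));
-- returns 0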


db=# explain analyse
select sum(base_total_val)
from sales_invoice
where id in (select id from si_credit_tree(8057) foo(id));
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=42.79..42.80 rows=1 width=8) (actual time=0.440..0.441 rows=1 loops=1)
   ->  Nested Loop  (cost=1.31..42.77 rows=5 width=8) (actual time=0.346..0.413 rows=5 loops=1)
         ->  HashAggregate  (cost=1.31..1.36 rows=5 width=4) (actual time=0.327..0.335 rows=5 loops=1)
               ->  Function Scan on si_credit_tree foo  (cost=0.00..1.30 rows=5 width=4) (actual time=0.300..0.306 rows=5 loops=1)
         ->  Index Scan using sales_invoice_pkey on sales_invoice  (cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5)
               Index Cond: (sales_invoice.id = foo.id)
 Total runtime: 0.559 ms




Thanks for the replies!


-- 
Best,

Frank.



Re: [PERFORM] Why is the number of dead tuples causing the performance of deferred triggers to degrade so rapidly (exponentially)?

2004-08-18 Thread Frank van Vugt
Hi Josh,

> > It seems in this case the time needed for a single deferred trigger
> > somehow depends on the number of dead tuples in the table

After further investigation I think I have a better grasp of what's going on.

The thing biting me here is indeed the 'delete from' on a table with a number 
of dead rows, possibly made worse in some cases where not everything can be 
handled in memory.

> I'm not clear on all of the work you're doing in the trigger.
> > NB. My real-world application 'collects' id's in need for deferred work
> I think you're doing a lot more than is wise to do in triggers.

I probably wasn't clear enough on this. I'm not creating types and/or 
temporary tables or anything of that kind.

The rationale is probably explained better by this example:

- the database has knowledge of 'parts' and 'sets'; the sets have a few fields
whose content depends on the parts, but the proper value for these fields can
only be determined by looking at all the parts of the particular set together
(i.e. it's not a plain 'part-count' that one could update by a trigger on the
part)

- during a transaction, a number of things will happen to various parts of 
various sets, so I have after triggers on the parts that will insert the ids 
of the sets that need an update into a set_update holding table; in turn, 
this set_update table has a deferred trigger

- upon execution of the deferred triggers, I now know that all the work on the 
parts is finished, so the deferred trigger initiates an update for the sets 
whose ids are in the update table and it will delete these ids afterwards

Now, because multiple updates to parts of the same set will result in multiple
inserts in the update table, I want to avoid doing the set-update more than
once.
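
In code, the pattern looks roughly like this (a sketch with hypothetical
table and column names, not the production schema):

create table sets (id int primary key, part_summary int);
create table parts (id int, set_id int references sets (id), val int);
create table set_update (set_id int);  -- holding table for collected ids

create function tr_set_update_def() returns trigger as '
begin
    -- later invocations in the same transaction find the holding table
    -- already emptied by the first one and bail out immediately
    if not exists (select 1 from set_update) then
        return null;
    end if;

    -- recompute the dependent field for every collected set id
    update sets
       set part_summary = (select sum(val) from parts
                           where parts.set_id = sets.id)
     where id in (select set_id from set_update);

    delete from set_update;  -- the delete that gets slower and slower
    return null;
end;' language plpgsql;

create constraint trigger set_update_def after insert on set_update
    deferrable initially deferred
    for each row execute procedure tr_set_update_def();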

Obviously, it would be better to be able to 'cancel' the rest of the calls to 
the deferred trigger after it has been executed for the first time, but that 
doesn't seem possible.

Even better would be to use a 'for each statement' trigger on the set_update 
holding table instead, but it is not possible to create a deferred 'for each 
statement' trigger. ;(

So I seem to be a bit between a rock and a hard place here: I must use
deferred triggers in order to avoid a costly set update on each part update,
but in such a deferred trigger I cannot avoid doing the update multiple times
(due to the growing cost of a 'delete from' in the trigger).

Mmm, it seems that by hacking pg_trigger I am able to create a for each 
statement trigger that is 'deferrable initially deferred'.
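
For the record, the hack amounts to something like the following; this is
unsupported catalog surgery (superuser only), and the column names are those
of the 7.4-era catalogs:

-- a regular statement-level trigger; CREATE TRIGGER itself won't
-- accept DEFERRABLE here
create trigger set_update_stmt after insert on set_update
    for each statement execute procedure tr_set_update_def();

-- flip the deferrability flags directly in the catalog (depending on
-- the version, tgisconstraint may need the same treatment)
update pg_trigger
   set tgdeferrable = true, tginitdeferred = true
 where tgname = 'set_update_stmt';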

This probably solves my problem; I will ask on 'general' whether this has any
unforeseen side effects and whether or not a 'regular' deferrable for each
statement trigger will be incorporated in v8.0.

Thanks for your reply!



-- 
Best,

Frank.




[PERFORM] Why is the number of dead tuples causing the performance of deferred triggers to degrade so rapidly (exponentially)?

2004-08-17 Thread Frank van Vugt
Hi,

I'm seeing the following behaviour with the table and functions given below:

db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 197,507 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 341,880 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 692,603 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 985,253 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 1241,334 ms

Or even worse (fresh drop/create of the table and functions):

db=# insert into f select id from full_sequence(1, 10000);
INSERT 0 10000
Time: 22255,767 ms
db=# insert into f select id from full_sequence(1, 10000);
INSERT 0 10000
Time: 45398,433 ms
db=# insert into f select id from full_sequence(1, 10000);
INSERT 0 10000
Time: 67993,476 ms

Wrapping the commands in a transaction only accumulates the penalty at commit.

It seems in this case the time needed for a single deferred trigger somehow 
depends on the number of dead tuples in the table, because a vacuum of the 
table will 'reset' the query-times. However, even if I wanted to, vacuum is 
not allowed from within a function.
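
For what it's worth, this is easy to verify by hand: vacuuming (outside any
function) between runs brings the insert time back down to the first-run
figure:

db=# vacuum f;
VACUUM
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000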

What is happening here? And more importantly, what can I do to prevent this?

NB. My real-world application 'collects' ids in need of deferred work, but
this work is both costly and only needed once per base record. So I use an
'update' table whose content I join with the actual tables in order to do the
work for _all_ the base records involved upon the first execution of the
deferred trigger. At the end of the trigger, this 'update' table is emptied
so any additional deferred triggers on the same table will hardly lose any
time. Or at least, that was the intention.

*** demo script ***
drop table f cascade;
drop function tr_f_def() cascade;
drop function full_sequence(integer, integer);
drop type full_sequence_type;

create table f (id int);
create function tr_f_def() RETURNS trigger LANGUAGE 'plpgsql' STABLE STRICT
SECURITY INVOKER AS '
DECLARE
BEGIN
    -- do stuff with all the ids in the table

    -- delete the contents
    --  delete from f;
    IF EXISTS (SELECT 1 FROM f) THEN
        DELETE FROM F;
        VACUUM F;
    END IF;

    RETURN NULL;
END;';
create type full_sequence_type as (id int);
create function full_sequence(integer, integer)
RETURNS SETOF full_sequence_type
LANGUAGE 'plpgsql' IMMUTABLE STRICT SECURITY INVOKER AS '
DECLARE
    my_from ALIAS FOR $1;
    my_to ALIAS FOR $2;
    result full_sequence_type%ROWTYPE;
BEGIN
    -- just loop
    FOR i IN my_from..my_to LOOP
        result.id = i;
        RETURN NEXT result;
    END LOOP;

    -- finish
    RETURN;
END;';
CREATE CONSTRAINT TRIGGER f_def AFTER INSERT ON f
    DEFERRABLE INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE tr_f_def();
*** demo script ***
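
(Aside for readers on 8.0 or later: the built-in generate_series() makes the
full_sequence helper type and function above unnecessary:

insert into f select * from generate_series(1, 1000);
)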

db=# select version();
                              version
---------------------------------------------------------------------
 PostgreSQL 7.4.3 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66





-- 
Best,

Frank.




[PERFORM] Why is the number of dead tuples causing the performance of deferred triggers to degrade so rapidly (exponentially)?

2004-08-17 Thread Frank van Vugt
Obviously,

this part of tr_f_def():

**
-- delete the contents
--  delete from f;
IF EXISTS (SELECT 1 FROM f) THEN
    DELETE FROM F;
    VACUUM F;
END IF;
**


should simply read:

**
-- delete the contents
delete from f;
**



--
Best,

Frank.




Re: [PERFORM] why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ??

2004-06-16 Thread Frank van Vugt
> We don't attempt to make every possible inference (and I don't think
> you'd like it if we did).

I wasn't really asking you to, either ;))

Just trying to achieve a more in-depth understanding of the way things work.

> This example doesn't
> persuade me that it would be worth expending the cycles to do so.

In the real thing I can easily get good processing times by adding an extra
join with article inside the group-by subselect and simply using the
constraint on that as well, so I'm ok with any choice you make on this.
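
Spelled out, that workaround looks something like this (schema hypothetical,
just to show the shape of it):

-- the constraint on the outer article is not pushed down into the
-- grouped subselect, so repeat the join and the constraint inside it
select a.id, t.total
  from article a
  join (select s.article_id, sum(s.qty) as total
          from stock s
          join article a2 on a2.id = s.article_id
         where a2.id = 1234             -- constraint repeated inside
         group by s.article_id) t on t.article_id = a.id
 where a.id = 1234;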

I thought this might have been some kind of special case though, given its
occurrence with the use of group by.

> for example, zero and minus zero in IEEE-standard float arithmetic.

Good example, brings back a few memories as well ;)




Thanks for your explanation, Tom!



-- 
Best,

Frank.




Re: [PERFORM] *very* inefficient choice made by the planner (regarding IN(...))

2004-06-10 Thread Frank van Vugt
Wow,

The effectiveness of the pgsql mailinglists never ceases to amaze me.

Default sort_mem it was; I guess I'd simply been too cautious with this
per-client setting.
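
For anyone finding this in the archives: that is the sort_mem setting on 7.x
(a per-session value in kilobytes, renamed to work_mem in 8.0), which can be
raised per client, e.g.:

db=# set sort_mem = 16384;
SET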




Stephan & Tom: thanks!




-- 
Best,

Frank.

