Something weird with your example which doesn't have the same result, see row count with explain analyze:
cruz=# SELECT version(); version -------------------------------------------------------------------------------------------- PostgreSQL 8.3.5 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1) 4.3.2 (1 registro) cruz=# EXPLAIN ANALYZE select * from t1 where id not in (select id from t2); QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Seq Scan on t1 (cost=1643.00..4928.00 rows=100000 width=4) (actual time=256.687..585.774 rows=73653 loops=1) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on t2 (cost=0.00..1393.00 rows=100000 width=4) (actual time=0.052..86.867 rows=100000 loops=1) Total runtime: 625.471 ms (5 registros) cruz=# EXPLAIN ANALYZE select * from t1 except all (select id from t2); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- SetOp Except All (cost=34469.90..35969.90 rows=30000 width=4) (actual time=2598.574..3663.712 rows=126733 loops=1) -> Sort (cost=34469.90..35219.90 rows=300000 width=4) (actual time=2598.550..3178.387 rows=300000 loops=1) Sort Key: "*SELECT* 1".id Sort Method: external merge Disk: 5864kB -> Append (cost=0.00..7178.00 rows=300000 width=4) (actual time=0.037..1026.367 rows=300000 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..4785.00 rows=200000 width=4) (actual time=0.035..439.507 rows=200000 loops=1) -> Seq Scan on t1 (cost=0.00..2785.00 rows=200000 width=4) (actual time=0.029..161.355 rows=200000 loops=1) -> Subquery Scan "*SELECT* 2" (cost=0.00..2393.00 rows=100000 width=4) (actual time=0.107..255.160 rows=100000 loops=1) -> Seq Scan on t2 (cost=0.00..1393.00 rows=100000 width=4) (actual time=0.097..110.639 rows=100000 loops=1) Total runtime: 3790.831 ms (10 registros) </pre> Sometimes I got a better result (on older versions) with this kind of query, but in this case it doesn't: cruz=# EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Merge Right Join (cost=30092.86..35251.53 rows=155304 width=8) (actual time=850.232..1671.091 rows=73653 loops=1) Merge Cond: (t2.id = t1.id) Filter: (t2.id IS NULL) -> Sort (cost=9697.82..9947.82 rows=100000 width=4) (actual time=266.501..372.560 rows=100000 loops=1) Sort Key: t2.id Sort Method: quicksort Memory: 4392kB -> Seq Scan on t2 (cost=0.00..1393.00 rows=100000 width=4) (actual time=0.029..78.087 rows=100000 loops=1) -> Sort (cost=20394.64..20894.64 rows=200000 width=4) (actual time=583.699..855.427 rows=273364 loops=1) Sort Key: t1.id Sort Method: quicksort Memory: 8784kB -> Seq Scan on t1 (cost=0.00..2785.00 rows=200000 width=4) (actual time=0.087..155.665 rows=200000 loops=1) Total runtime: 1717.062 ms (12 registros) </pre> Regards, "??????? ????????" <[EMAIL PROTECTED]> escreveu: >Hello. > >It's second query rewrite postgresql seems not to handle - making EXCEPT >from NOT IT. >Here is an example: >Preparation: > >drop table if exists t1; >drop table if exists t2; >create temporary table t1(id) as >select >(random()*100000)::int from generate_series(1,200000) a(id); > >create temporary table t2(id) as >select >(random()*100000)::int from generate_series(1,100000) a(id); >analyze t1; >analyze t2; > >Query 1: >select * from t1 where id not in (select id from t2); >Plan: >"Seq Scan on t1 (cost=1934.00..164105319.00 rows=100000 width=4)" >" Filter: (NOT (subplan))" >" SubPlan" >" -> Materialize (cost=1934.00..3325.00 rows=100000 width=4)" >" -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4)" > >Query 2 (gives same result as Q1): >select * from t1 except all (select id from t2); >Plan: >"SetOp Except All (cost=38721.90..40221.90 rows=30000 width=4)" >" -> Sort (cost=38721.90..39471.90 rows=300000 width=4)" >" Sort Key: "*SELECT* 1".id" >" -> Append (cost=0.00..7328.00 rows=300000 width=4)" >" -> Subquery Scan "*SELECT* 1" (cost=0.00..4885.00 >rows=200000 width=4)" >" -> Seq Scan on t1 (cost=0.00..2885.00 rows=200000 >width=4)" >" -> Subquery Scan "*SELECT* 2" (cost=0.00..2443.00 >rows=100000 width=4)" >" -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 >width=4)" > >If I am correct, planner simply do not know that he can rewrite NOT IN as >"EXCEPT ALL" operator, so all NOT INs when list of values to remove is long >takes very much time. >So the question is: I am willing to participate in postgresql development >because it may be easier to fix planner then to rewrite all my queries :). >How can I? (I mean to work on query planner enhancements by providing new >options of query rewrite, not to work on other thing nor on enhancing >planner in other ways, like better estimations of known plans). > > > > -- <span style="color: #000080">Daniel Cristian Cruz </span>Administrador de Banco de Dados Direção Regional - Núcleo de Tecnologia da Informação SENAI - SC Telefone: 48-3239-1422 (ramal 1422)