Hello I think so HashAggregate goes out of memory - you can try to increase a work_mem.
There are better queries for counting duplicit then cross join Regards Pavel Stehule 2011/3/21 Adam Tistler <atist...@gmail.com>: > logicops2=# explain analyze select count(*) from nodes where node_id = any( > Array(select node_id from nodes limit 100000) ); > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=1718.59..1718.60 rows=1 width=0) (actual > time=509.126..509.127 rows=1 loops=1) > InitPlan 1 (returns $0) > -> Limit (cost=0.00..1637.04 rows=100000 width=4) (actual > time=0.010..76.604 rows=100000 loops=1) > -> Seq Scan on nodes (cost=0.00..12355.41 rows=754741 width=4) > (actual time=0.008..38.105 rows=100000 loops=1) > -> Bitmap Heap Scan on nodes (cost=42.67..81.53 rows=10 width=0) (actual > time=447.274..484.283 rows=100000 loops=1) > Recheck Cond: (node_id = ANY ($0)) > -> Bitmap Index Scan on n_node_id_index (cost=0.00..42.67 rows=10 > width=0) (actual time=447.074..447.074 rows=100000 loops=1) > Index Cond: (node_id = ANY ($0)) > Total runtime: 509.209 ms > (9 rows) > > Time: 510.009 ms > > > logicops2=# explain analyze select count(*) from nodes where node_id in > (select node_id from nodes limit 100000); > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=3017.17..3017.18 rows=1 width=0) (actual > time=1052.866..1052.866 rows=1 loops=1) > -> Nested Loop (cost=2887.04..3016.67 rows=200 width=0) (actual > time=167.310..1021.540 rows=100000 loops=1) > -> HashAggregate (cost=2887.04..2889.04 rows=200 width=4) (actual > time=167.198..251.205 rows=100000 loops=1) > -> Limit (cost=0.00..1637.04 rows=100000 width=4) (actual > time=0.008..80.090 rows=100000 loops=1) > -> Seq Scan on nodes (cost=0.00..12355.41 rows=754741 > width=4) (actual time=0.007..41.566 rows=100000 loops=1) > -> Index Scan using n_node_id_index on nodes (cost=0.00..0.63 > rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=100000) > Index Cond: (public.nodes.node_id = public.nodes.node_id) > Total runtime: 1053.523 ms > (8 rows) > > Time: 1054.864 ms > > > > On Mar 20, 2011, at 2:51 AM, Pavel Stehule wrote: > >> Hello >> >> 2011/3/20 Adam Tistler <atist...@gmail.com>: >>> I have noticed that SELECT ... = ANY(ARRAY(...)) is about twice as fast as >>> SELECT IN ( ... ). >>> Can anyone explain a reason for this? Results are the bottom and are >>> reproducible. I can test with other versions if that is necessary. >>> >> >> send a result of EXPLAIN ANALYZE SELECT ..., please >> >> The reasons can be different - less seq scans, indexes >> >> Regards >> >> Pavel Stehule >> >> >> >>> ./configure --prefix=/usr/local/pgsql84 --with-openssl --with-perl >>> CentOS release 5.4 (Final) >>> psql (PostgreSQL) 8.4.1 >>> >>> prompt2=# select count(*) from nodes; >>> count >>> -------- >>> 754734 >>> (1 row) >>> >>> >>> prompt2=# \d nodes >>> Table "public.nodes" >>> Column | Type | Modifiers >>> --------------+--------------------------+----------------------------------------------------------- >>> node_id | integer | not null default >>> nextval(('node_id_seq'::text)::regclass) >>> node_type_id | integer | not null >>> template_id | integer | not null >>> timestamp | timestamp with time zone | default >>> ('now'::text)::timestamp(6) with time zone >>> Indexes: >>> "nodes_pkey" PRIMARY KEY, btree (node_id) >>> "n_node_id_index" btree (node_id) >>> "n_node_type_id_index" btree (node_type_id) >>> "n_template_id_index" btree (template_id) >>> >>> prompt2=# select count(*) from nodes where node_id = any( Array(select >>> node_id from nodes limit 100000) ); >>> count >>> -------- >>> 100000 >>> (1 row) >>> >>> Time: 404.530 ms >>> prompt2=# select count(*) from nodes where node_id = any( Array(select >>> node_id from nodes limit 100000) ); >>> count >>> -------- >>> 100000 >>> (1 row) >>> >>> Time: 407.316 ms >>> prompt2=# select count(*) from nodes where node_id = any( Array(select >>> node_id from nodes limit 100000) ); >>> count >>> -------- >>> 100000 >>> (1 row) >>> >>> Time: 408.728 ms >>> prompt2=# select count(*) from nodes where node_id in (select node_id from >>> nodes limit 100000 ); >>> count >>> -------- >>> 100000 >>> (1 row) >>> >>> Time: 793.840 ms >>> prompt2=# select count(*) from nodes where node_id in (select node_id from >>> nodes limit 100000 ); >>> count >>> -------- >>> 100000 >>> (1 row) >>> >>> Time: 779.137 ms >>> prompt2=# select count(*) from nodes where node_id in (select node_id from >>> nodes limit 100000 ); >>> count >>> -------- >>> 100000 >>> (1 row) >>> >>> Time: 781.820 ms >>> >>> >>> -- >>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-performance >>> > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance