Hi All, We are currently using PostgreSQL 9.0.3 and we noticed a performance anomaly from a framework (ActiveRecord) generated query to one of our tables. The query uses an in clause to check an indexed column for the presence of either of two values. In this particular case neither of them is present (but in other cases one or more might be). The framework generates a limit 1 query to test for existence. This query ends up using a seq scan and is quite slow, however rewriting it using OR = rather then IN uses the index (as does removing the limit or raising it to a large value). The table has 36 million rows (more details are below) and is read only in typical usage. I was wondering if IN vs OR planning being so differently represented a bug and/or if we might have some misconfiguration somewhere that leads the query planner to pick what in best case can only be a slightly faster plan then using the index but in worst case is much much slower. I would also think the cluster on the table would argue against using a sequence scan for this kind of query (since the hts_code_id's will be colocated, perf, if the id is present, will very greatly depending on what order the seq scan walks the table which we've observed...; if the id(s) are not present then this plan is always terrible). We can use set enable_seqscan TO off around this query if need be, but it seems like something the planner should have done better with unless we have something weird somewhere (conf file details are below).
psql (9.0.3) Type "help" for help. -- Table info dev=> ANALYZE exp_detls; ANALYZE dev=> select count(*) from exp_detls; 36034391 dev=>explain analyze select count(*) from exp_detls; Aggregate (cost=1336141.30..1336141.31 rows=1 width=0) (actual time=43067.620..43067.621 rows=1 loops=1) -> Seq Scan on exp_detls (cost=0.00..1246046.84 rows=36037784 width=0) (actual time=0.011..23703.177 rows=36034391 loops=1) Total runtime: 43067.675 ms dev=>select pg_size_pretty(pg_table_size('exp_detls')); 6919 MB -- Problematic Query dev=> explain analyze SELECT "exp_detls".id FROM "exp_detls" WHERE ("exp_detls"."hts_code_id" IN (12,654)) LIMIT 1; Limit (cost=0.00..158.18 rows=1 width=4) (actual time=9661.363..9661.363 rows=0 loops=1) -> Seq Scan on exp_detls (cost=0.00..1336181.90 rows=8447 width=4) (actual time=9661.360..9661.360 rows=0 loops=1) Filter: (hts_code_id = ANY ('{12,654}'::integer[])) Total runtime: 9661.398 ms (4 rows) -- Using OR =, much faster, though more complicated plan then below dev=> explain analyze SELECT "exp_detls".id FROM "exp_detls" WHERE ("exp_detls"."hts_code_id" = 12 OR "exp_detls"."hts_code_id" = 654) LIMIT 1; Limit (cost=162.59..166.29 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=1) -> Bitmap Heap Scan on exp_detls (cost=162.59..31188.14 rows=8370 width=4) (actual time=0.028..0.028 rows=0 loops=1) Recheck Cond: ((hts_code_id = 12) OR (hts_code_id = 654)) -> BitmapOr (cost=162.59..162.59 rows=8370 width=0) (actual time=0.027..0.027 rows=0 loops=1) -> Bitmap Index Scan on index_exp_detls_on_hts_code_id_and_data_month (cost=0.00..79.20 rows=4185 width=0) (actual time=0.017..0.017 rows=0 loops=1) Index Cond: (hts_code_id = 12) -> Bitmap Index Scan on index_exp_detls_on_hts_code_id_and_data_month (cost=0.00..79.20 rows=4185 width=0) (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (hts_code_id = 654) Total runtime: 0.051 ms (9 rows) -- No limit, much faster, also a cleaner looking plan (of course problematic when there are many matching rows) dev=>explain analyze SELECT "exp_detls".id FROM "exp_detls" WHERE ("exp_detls"."hts_code_id" IN (12,654)); Bitmap Heap Scan on exp_detls (cost=156.93..31161.56 rows=8370 width=4) (actual time=0.028..0.028 rows=0 loops=1) Recheck Cond: (hts_code_id = ANY ('{12,654}'::integer[])) -> Bitmap Index Scan on index_exp_detls_on_hts_code_id_and_data_month (cost=0.00..154.84 rows=8370 width=0) (actual time=0.026..0.026 rows=0 loops=1) Index Cond: (hts_code_id = ANY ('{12,654}'::integer[])) Total runtime: 0.045 ms (5 rows) -- Table Schema Table "public.exp_detls" Column | Type | Modifiers ------------------+-----------------------------+-------------------------------------------------------- id | integer | not null default nextval('exp_detls_id_seq'::regclass) created_at | timestamp without time zone | not null df | integer | hts_code_id | integer | not null uscb_country_id | integer | country_id | integer | uscb_district_id | integer | cards_mo | numeric(15,0) | not null qty_1_mo | numeric(15,0) | not null qty_2_mo | numeric(15,0) | all_val_mo | numeric(15,0) | not null air_val_mo | numeric(15,0) | not null air_wgt_mo | numeric(15,0) | not null ves_val_mo | numeric(15,0) | not null ves_wgt_mo | numeric(15,0) | not null cnt_val_mo | numeric(15,0) | not null cnt_wgt_mo | numeric(15,0) | not null cards_yr | numeric(15,0) | not null qty_1_yr | numeric(15,0) | not null qty_2_yr | numeric(15,0) | all_val_yr | numeric(15,0) | not null air_val_yr | numeric(15,0) | not null air_wgt_yr | numeric(15,0) | not null ves_val_yr | numeric(15,0) | not null ves_wgt_yr | numeric(15,0) | not null cnt_val_yr | numeric(15,0) | not null cnt_wgt_yr | numeric(15,0) | not null data_month | date | not null parent_id | integer | Indexes: "exp_detls_pkey" PRIMARY KEY, btree (id) "index_exp_detls_on_data_month" btree (data_month) WITH (fillfactor=100) "index_exp_detls_on_hts_code_id_and_data_month" btree (hts_code_id, data_month) WITH (fillfactor=100) CLUSTER "index_exp_detls_on_parent_id" btree (parent_id) WITH (fillfactor=100) WHERE parent_id IS NOT NULL <Several FK's> postgresql.conf non-default settings listen_addresses = '*' # what IP address(es) to listen on; port = 5432 # (change requires restart) max_connections = 230 # (change requires restart) tcp_keepalives_idle = 180 # TCP_KEEPIDLE, in seconds; shared_buffers = 4GB # min 128kB, DEFAULT 32MB work_mem = 512MB # min 64kB, DEFAULT 1MB maintenance_work_mem = 256MB # min 1MB, DEFAULT 16MB effective_io_concurrency = 2 # 1-1000. 0 disables prefetching synchronous_commit = off # immediate fsync at commit, DEFAULT on wal_buffers = 16MB # min 32kB, DEFAULT 64kB wal_writer_delay = 330ms # 1-10000 milliseconds, DEFAULT 200ms checkpoint_segments = 24 # in logfile segments, min 1, 16MB each checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0 effective_cache_size = 24GB # DEFAULT 128MB logging_collector = on # Enable capturing of stderr and csvlog log_checkpoints = on # DEFAULT off log_connections = on # DEFAULT off log_disconnections = on # DEFAULT off log_hostname = on # DEFAULT off log_line_prefix = '%t' # special values: track_activity_query_size = 8192 # (change requires restart) bytea_output = 'escape' # hex, escape, Default hex datestyle = 'iso, mdy' lc_messages = 'en_US.UTF-8' # locale for system error message strings lc_monetary = 'en_US.UTF-8' # locale for monetary formatting lc_numeric = 'en_US.UTF-8' # locale for number formatting lc_time = 'en_US.UTF-8' # locale for time formatting default_text_search_config = 'pg_catalog.english'