Hi,

I'm fairly new to PostgreSQL query tuning, so please forgive me if I've 
overlooked something obvious.

I have a query which is running slowly, and the query plan shows an unexpected 
sequence scan where I'd have expected the planner to use an index. Setting 
enable_seqscan=off causes the planner to use the index as expected. The types 
of the field that the join where the index should be used do indeed match 
(int4) - I've read that's the usual reason for an index not being used. I've 
tried rebuilding the twitter_tweet_transmission_id index, and have re-ANALYSE'd 
the table.

I can use the enable_seqscan=off workaround for now, but I'd be interested in 
knowing why the index isn't being used.

Production is PostgreSQL 8.4 on Ubuntu 10.04 LTS, 16G RAM. I get a broadly 
similar query plan both on a local dev machine running PG 9.1 with 8G RAM and 
single SATA drive, and on the production server with 16G RAM and a RAID10 array 
(some of the costs etc. are different, but the use or absence of the index is 
common). I'm using a restored production database on dev, so data volumes are 
the same.

Here's the query, table and index definitions, query plans, and server configs 
where the defaults have changed (query plans and config from the dev machine) - 
I've only really done rudimentary tuning with pgtune, so there are probably 
some howlers in the conf. Anyway:

SELECT "timecode_transmission"."id",
             COUNT("twitter_tweet"."id") AS "tweet_count"
FROM "timecode_transmission"
LEFT OUTER JOIN "twitter_tweet" ON ("timecode_transmission"."id" = 
"twitter_tweet"."transmission_id")
WHERE "timecode_transmission"."tx" <= '2012-04-06 23:59:59'
    AND "timecode_transmission"."tx" >= '2012-04-06 00:00:00'
GROUP BY "timecode_transmission"."id"

The twitter_tweet table has about 25m rows, as you'll see from the query plans.

Any hints are appreciated.

Cheers,
Dan


Table definitions:

CREATE TABLE "public"."twitter_tweet" (
        "id" int8 NOT NULL DEFAULT nextval('twitter_tweet_id_seq'::regclass),
        "twitter_id" int8 NOT NULL,
        "created_at" timestamp(6) WITH TIME ZONE NOT NULL,
        "from_user" varchar(20) NOT NULL,
        "text" text NOT NULL,
        "from_user_id" int8 NOT NULL,
        "sentiment" float8 NOT NULL,
        "transmission_id" int4,
        "from_user_name" varchar(500),
        "gender" int2,
        CONSTRAINT "twitter_tweet_pkey" PRIMARY KEY ("id") NOT DEFERRABLE 
INITIALLY IMMEDIATE,
        CONSTRAINT "transmission_id_refs_id_23b9da6852fe9f37" FOREIGN KEY 
("transmission_id") REFERENCES "public"."timecode_transmission" ("id") ON 
UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED ,
        CONSTRAINT "ck_gender_pstv_4bc0eb22f3ec191e" CHECK ((gender >= 0)) NOT 
DEFERRABLE INITIALLY IMMEDIATE,
        CONSTRAINT "twitter_tweet_gender_check" CHECK ((gender >= 0)) NOT 
DEFERRABLE INITIALLY IMMEDIATE,
        CONSTRAINT "twitter_tweet_klout_score_check" CHECK ((klout_score >= 0)) 
NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
CREATE INDEX "twitter_tweet_created_at" ON "public"."twitter_tweet" USING 
btree(created_at ASC NULLS LAST);
CREATE INDEX "twitter_tweet_transmission_id" ON "public"."twitter_tweet" USING 
btree(transmission_id ASC NULLS LAST);

CREATE TABLE "public"."timecode_transmission" (
        "id" int4 NOT NULL DEFAULT 
nextval('timecode_transmission_id_seq'::regclass),
        "episode_id" int4 NOT NULL,
        "channel_id" int4 NOT NULL,
        "tx" timestamp(6) WITH TIME ZONE NOT NULL,
        "status" varchar(100) NOT NULL,
        "end" timestamp(6) WITH TIME ZONE NOT NULL,
        "duration" int4,
        "include_in_listings" bool NOT NULL,
        CONSTRAINT "timecode_transmission_pkey" PRIMARY KEY ("id") NOT 
DEFERRABLE INITIALLY IMMEDIATE,
        CONSTRAINT "channel_id_refs_id_42fae8846ea37b15" FOREIGN KEY 
("channel_id") REFERENCES "public"."timecode_channel" ("id") ON UPDATE NO 
ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED ,
        CONSTRAINT "episode_id_refs_id_52ab388b54a13ff3" FOREIGN KEY 
("episode_id") REFERENCES "public"."timecode_episode" ("id") ON UPDATE NO 
ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED ,
        CONSTRAINT "timecode_transmission_tx_72eeb3dac42e185_uniq" UNIQUE 
("tx", "channel_id") NOT DEFERRABLE INITIALLY IMMEDIATE,
        CONSTRAINT "timecode_transmission_duration_check" CHECK ((duration >= 
0)) NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
CREATE INDEX "timecode_transmission_channel_id" ON 
"public"."timecode_transmission" USING btree(channel_id ASC NULLS LAST);
CREATE INDEX "timecode_transmission_episode_id" ON 
"public"."timecode_transmission" USING btree(episode_id ASC NULLS LAST);
CREATE INDEX "timecode_transmission_status" ON "public"."timecode_transmission" 
USING btree(status ASC NULLS LAST);
CREATE INDEX "timecode_transmission_tx" ON "public"."timecode_transmission" 
USING btree(tx ASC NULLS LAST);

Query plans (PG 9.1, dev):

With enable_seqscan=on:
HashAggregate  (cost=3722291.37..3722301.78 rows=1041 width=12) (actual 
time=255056.070..255056.167 rows=1074 loops=1)
  ->  Hash Right Join  (cost=68.20..3721927.54 rows=72766 width=12) (actual 
time=229054.781..255005.491 rows=415193 loops=1)
        Hash Cond: (twitter_tweet.transmission_id = timecode_transmission.id)
        ->  Seq Scan on twitter_tweet  (cost=0.00..3628135.86 rows=24798886 
width=12) (actual time=0.003..251157.607 rows=24799190 loops=1)
        ->  Hash  (cost=55.18..55.18 rows=1041 width=4) (actual 
time=0.922..0.922 rows=1074 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 38kB
              ->  Index Scan using timecode_transmission_tx on 
timecode_transmission  (cost=0.00..55.18 rows=1041 width=4) (actual 
time=0.023..0.659 rows=1074 loops=1)
                    Index Cond: ((tx <= '2012-04-06 23:59:59+01'::timestamp 
with time zone) AND (tx >= '2012-04-06 00:00:00+01'::timestamp with time zone))
Total runtime: 255083.009 ms

With enable_seqscan=off:
GroupAggregate  (cost=0.00..5972744.82 rows=1041 width=12) (actual 
time=63.504..272.790 rows=1074 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..5972370.58 rows=72766 width=12) 
(actual time=63.498..244.115 rows=415193 loops=1)
        ->  Index Scan using timecode_transmission_pkey on 
timecode_transmission  (cost=0.00..13520.56 rows=1041 width=4) (actual 
time=63.486..68.130 rows=1074 loops=1)
              Filter: ((tx <= '2012-04-06 23:59:59+01'::timestamp with time 
zone) AND (tx >= '2012-04-06 00:00:00+01'::timestamp with time zone))
        ->  Index Scan using twitter_tweet_transmission_id on twitter_tweet  
(cost=0.00..5677.78 rows=3710 width=12) (actual time=0.002..0.130 rows=386 
loops=1074)
              Index Cond: (timecode_transmission.id = transmission_id)
Total runtime: 272.871 ms

Query plans (PG 8.4, prod):

With enable_seqscan=on:
HashAggregate  (cost=4931753.82..4931766.82 rows=1040 width=12) (actual 
time=130720.665..130720.873 rows=1071 loops=1)
  ->  Hash Left Join  (cost=4205000.99..4931360.42 rows=78679 width=12) (actual 
time=126841.549..130645.028 rows=421286 loops=1)
        Hash Cond: (timecode_transmission.id = twitter_tweet.transmission_id)
        ->  Index Scan using timecode_transmission_tx on timecode_transmission  
(cost=0.00..57.25 rows=1040 width=4) (actual time=8.730..19.873 rows=1071 
loops=1)
              Index Cond: ((tx <= '2012-04-06 23:59:59+00'::timestamp with time 
zone) AND (tx >= '2012-04-06 00:00:00+00'::timestamp with time zone))
        ->  Hash  (cost=3736125.66..3736125.66 rows=26973466 width=12) (actual 
time=126824.846..126824.846 rows=23224197 loops=1)
              ->  Seq Scan on twitter_tweet  (cost=0.00..3736125.66 
rows=26973466 width=12) (actual time=8.084..117210.707 rows=25165171 loops=1)
Total runtime: 130729.920 ms

With enable_seqscan=off;
GroupAggregate  (cost=0.00..6117044.05 rows=1040 width=12) (actual 
time=72.242..470.289 rows=1071 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..6116637.65 rows=78681 width=12) 
(actual time=72.180..426.434 rows=421286 loops=1)
        ->  Index Scan using timecode_transmission_pkey on 
timecode_transmission  (cost=0.00..13610.82 rows=1040 width=4) (actual 
time=72.164..78.575 rows=1071 loops=1)
              Filter: ((tx <= '2012-04-06 23:59:59+00'::timestamp with time 
zone) AND (tx >= '2012-04-06 00:00:00+00'::timestamp with time zone))
        ->  Index Scan using twitter_tweet_transmission_id on twitter_tweet  
(cost=0.00..5816.73 rows=4125 width=12) (actual time=0.003..0.277 rows=393 
loops=1071)
              Index Cond: (timecode_transmission.id = 
twitter_tweet.transmission_id)
Total runtime: 470.421 ms

Non-default configuration (dev, PG9.1):
default_statistics_target = 100 # pgtune wizard 2012-01-11
maintenance_work_mem = 400MB # pgtune wizard 2012-01-11
constraint_exclusion = partition # pgtune wizard 2012-01-11
checkpoint_completion_target = 0.9 # pgtune wizard 2012-01-11
effective_cache_size = 7552MB # pgtune wizard 2012-01-11
work_mem = 50MB # pgtune wizard 2012-01-11
wal_buffers = 16MB # pgtune wizard 2012-01-11
checkpoint_segments = 16 # pgtune wizard 2012-01-11
shared_buffers = 3840MB # pgtune wizard 2012-01-11
max_connections = 80 # pgtune wizard 2012-01-11

Non-default configuration (prod, PG8.4)
maintenance_work_mem = 960MB
checkpoint_completion_target = 0.9
checkpoint_segments = 32
wal_buffers = 16MB
effective_cache_size = 11GB
shared_buffers = 4GB
max_connections = 32
work_mem = 100MB


--
Dan Fairs | dan.fa...@gmail.com | www.fezconsulting.com

Reply via email to