Re: [PERFORM] dissimilar drives in Raid10 , does it make difference ?

2009-02-14 Thread Craig Ringer

Matthew Wakeling wrote:

In fact, it is recommended that two different types of drives are used. 
That way, if there's a mass failure of a whole batch of drives from one 
particular vendor, you don't lose all your data.


Don't think this is just paranoia, either. I've had it happen to me 
SEVERAL times - either a second drive fails before I can replace the 
first, or the second drive in a pair fails during rebuild onto the 
replacement. I use regular RAID scrubbing, so this isn't just a case of 
undetected media errors.


Nothing beats good backups.

--
Craig Ringer



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Craig Ringer

Alexander Staubo wrote:

On Fri, Feb 13, 2009 at 12:53 PM, Alexander Staubo a...@bengler.no wrote:

The upgrade was done with dump/restore using pg_dump -Fc. The old
database lived on a SAN volume, whereas the new database lives on a
local disk volume.


I need to correct myself: The Munin graphs were never set to track the
SAN volume where the old database lived. So when the graph goes from
near-zero to lots, it's actually correct.

When I compare the correct graph, however, it's apparent that I/O
writes have, on average, doubled.


Is there any chance you had pg_xlog stored separately on your old 
database, and I/O for it wasn't being recorded?


--
Craig Ringer



Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-14 Thread Chris Browne
mallah.raj...@gmail.com (Rajesh Kumar Mallah) writes:
 why is it not a good idea to give end users control over when they
 want to run it ?

It's not a particularly good idea to give end users things that they
are likely then to *immediately* use to shoot themselves in the foot.

Turning off vacuuming all day is the sort of thing that is indeed
pretty certain to hurt you when you imagined it was going to help you.

In particular, if you shut off autovac all day, heavily updated tables
with certain sorts of (pretty common!) update patterns are certain to
bloat up, to the point that you'll need to do CLUSTER/VACUUM FULL on
them.

In effect, "autovacuum at lean hours only" is more reasonably described
as "cancel autovacuum and revert to the elder policy of requiring users
to do manual vacuuming."

It's worth looking at how autovacuum has been evolving over time...

- When it was introduced, 8.0-ish (maybe it was 8.1 when it became
  official), it was pretty primitive.

  Autovac was a single process, where you had three controls over
  behaviour:

   - You could run autovac, or not;

   - You could exclude specific tables from being processed by autovac;

   - You could vacuum less aggressively, using delays to reduce
     autovac I/O usage.

- In 8.3, it was enhanced to add the notion of having multiple vacuum
  workers

  There was discussion about having one of those workers restrict
  itself to small tables, so that you'd never have the scenario where
  the workers were all busy and a small table that needed vacuuming
  was left unvacuumed for a long time.  It appears that didn't happen,
  which seems unfortunate, but that's life...

You should look at all the knobs that *are* offered before deciding on
a policy that may be harmful to performance.  As things stand now,
there are a couple of ways I could see tuning this:

 - You might check on the GUC variables autovacuum_vacuum_cost_delay
   and autovacuum_vacuum_cost_limit, which would allow you to restrict
   the I/O cost.

   This might allow you to run autovacuum all the time without
   adversely affecting performance.

 - You might come up with a list of the *LARGE* tables that you don't
   want vacuumed during the day, and set up a cron job that adds/drops
   them from the pg_autovacuum table at the appropriate times.

   This is definitely going to be more finicky, and requires a great
   deal more awareness of the tables being updated by your
   applications.  It makes autovacuum a whole lot less automatic.
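
A rough sketch of the two approaches just described, assuming an 8.3
cluster and using public.my_huge_table (mentioned below) as the example
table; the pg_autovacuum column list is quoted from memory of the 8.3
docs, so check it against your catalog before relying on it:

-- Cluster-wide I/O throttling lives in postgresql.conf (placeholder values):
--   autovacuum_vacuum_cost_delay = 20ms
--   autovacuum_vacuum_cost_limit = 200

-- Per-table settings in 8.3 go through the pg_autovacuum catalog;
-- -1 means "fall back to the global default" for that column.
INSERT INTO pg_autovacuum
       (vacrelid, enabled,
        vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor,
        vac_cost_delay, vac_cost_limit,
        freeze_min_age, freeze_max_age)
VALUES ('public.my_huge_table'::regclass, true,
        -1, -1, -1, -1, -1, -1, -1, -1);

-- Cron job at the start of business hours:
UPDATE pg_autovacuum SET enabled = false
 WHERE vacrelid = 'public.my_huge_table'::regclass;

-- Cron job at the start of the lean hours:
UPDATE pg_autovacuum SET enabled = true
 WHERE vacrelid = 'public.my_huge_table'::regclass;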

There are known enhancements coming up:

 - In 8.4, there is a capability for VACUUM to only process the
   portions of the table known to have been altered.

   That ought to be a better answer than *any* of the fiddling
   suggested, to date.  Right now, a VACUUM on public.my_huge_table,
   a table 18GB in size, will walk through the entire table, even
   though there were only a handful of pages where tuples were
   invalidated.

   This is almost certainly the single best improvement possible to
   resolve your issue; it seems likely to *directly* address the
   problem, and has the considerable merit of not requiring much if
   any configuration/reconfiguration/scheduling.
-- 
(reverse (concatenate 'string gro.mca @ enworbbc))
http://linuxdatabases.info/info/
what would  we do without C?  we   would have PASAL,  BASI, OBOL, and
Ommon Lisp. -- #Erik



Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-14 Thread Rajesh Kumar Mallah
It's nice to know the evolution of autovacuum, and I understand that
the suggestion/requirement of "autovacuum at lean hours only"
was defeating the whole idea.

regds
--rajesh kumar mallah.

On Fri, Feb 13, 2009 at 11:07 PM, Chris Browne cbbro...@acm.org wrote:
 mallah.raj...@gmail.com (Rajesh Kumar Mallah) writes:
 why is it not a good idea to give end users control over when they
 want to run it ?





[PERFORM] Slow queries from information schema

2009-02-14 Thread Octavio Alvarez
I'm aware you already know that information_schema is slow [1] [2], so I
just want to document another case and the tests I did.

I'm using the following view to check what tables depend on what other
tables.

CREATE VIEW raw_relation_tree AS
SELECT
   tc_p.table_catalog AS parent_catalog,
   tc_p.table_schema AS parent_schema,
   tc_p.table_name AS parent_table,
   tc_c.table_catalog AS child_catalog,
   tc_c.table_schema AS child_schema,
   tc_c.table_name AS child_table
FROM
   information_schema.referential_constraints AS rc
   NATURAL JOIN information_schema.table_constraints AS tc_c
   LEFT JOIN information_schema.table_constraints AS tc_p ON
  rc.unique_constraint_catalog = tc_p.constraint_catalog AND
  rc.unique_constraint_schema = tc_p.constraint_schema AND
  rc.unique_constraint_name = tc_p.constraint_name
;

test=# select count(*) from raw_relation_tree;
 count
-------
    11
(1 row)

An EXPLAIN ANALYZE for a simple SELECT on each of the FROM tables give:
referential_constraints: ~9ms.
table_constraints: ~24ms.

The result, on the above view: ~80ms. Fair enough. But if I apply a
condition:

SELECT * FROM ___pgnui_relation_tree.raw_relation_tree WHERE
parent_schema <> child_schema;

it takes ~2 seconds (!) to complete.

I tried using an alternate table_constraints definition by creating my
own view and changing UNION to UNION ALL (as per [2]). The results were:

table_constraints using UNION ALL has the same number of rows as the
UNION version.

table_constraints now takes about 4 ms (as expected).
VIEW raw_relation_tree is now 110 ms.
VIEW raw_relation_tree WHERE parent_schema <> child_schema: 3.3 sec.

EXPLAIN results are way too long to post here. If it is ok, I'll gladly
post them.

Using 8.3.6.

[1] http://archives.postgresql.org/pgsql-bugs/2008-12/msg00144.php
[2]
http://archives.postgresql.org/pgsql-performance/2008-05/msg00062.php





Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Alexander Staubo
On Fri, Feb 13, 2009 at 6:35 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 You should definitely set effective_cache_size.

Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for
cache), and today's average write frequency went from 20MB/sec to just
1MB/sec. The documentation suggests that effective_cache_size is only
used for query planning in conjunction with indexes. So how come it
affects write access?

 If you still see the problem after that, I suggest testing different
 settings for:

 bgwriter_lru_maxpages
 bgwriter_lru_multiplier
 checkpoint_segments
 checkpoint_timeout
 checkpoint_completion_target

 Both the checkpoint process and the background writer changed quite a
 bit, and these are the main knobs for tuning the new version.

We are hoping to set up a duplicate instance and play back the SQL log
against it so we can experiment with different settings. Until we have
such a setup, I'm not sure what to do with the knobs other than frob
them wildly. :-) Are there any statistics, either in PostgreSQL proper
or in the OS, that I can use as metrics to guide the tuning? For
example, is there anything in pg_stat_bgwriter that can help me tune
the bgwriter_lru_* settings?
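
For reference, a minimal sketch of what 8.3 exposes in that view; taking
two snapshots some minutes apart and diffing the counters is presumably
more useful than a single reading:

SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, maxwritten_clean,
       buffers_backend, buffers_alloc
  FROM pg_stat_bgwriter;

Roughly, a large buffers_backend relative to buffers_clean hints that the
background writer is not keeping up (bgwriter_lru_* territory), and a high
checkpoints_req count hints that checkpoint_segments is too low -- but
that reading is a rule of thumb, not gospel.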

Do transactions that only contain query statements end up writing
entries to the WAL when they commit? If yes, can we avoid the writes
by wrapping our queries in read only transactions, and would it be
worth the effort?

Our web application is handling 30 requests per second at peak time;
each request performs dozens of queries in autocommit mode, i.e. one
transaction per query. Only a minority of those requests actually end
up modifying the database. PostgreSQL is committing and fsyncing
600-800 transactions per second, so that's probably a good chunk of
disk/CPU usage wasted, right?
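
One crude way to check the read-only part of that, assuming an otherwise
idle 8.3 instance, sufficient privileges, and a hypothetical table
some_table:

SELECT pg_current_xlog_location();  -- note the WAL position
SELECT count(*) FROM some_table;    -- purely read-only work
SELECT pg_current_xlog_location();  -- an unchanged position means the
                                    -- reads generated no WAL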

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Alexander Staubo
On Sat, Feb 14, 2009 at 9:49 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 Is there any chance you had pg_xlog stored separately on your old database,
 and I/O for it wasn't being recorded?

No, the database files have always been on a single volume.

Alexander.



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Tom Lane
Alexander Staubo a...@bengler.no writes:
 kevin.gritt...@wicourts.gov wrote:
 You should definitely set effective_cache_size.

 Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for
 cache), and today's average write frequency went from 20MB/sec to just
 1MB/sec. The documentation suggests that effective_cache_size is only
 used for query planning in conjunction with indexes. So how come it
 affects write access?

It *is* only used for query planning.  A plausible theory is that you
caused some join queries to change from hash or merge joining involving
a temporary hash or sort file to an index nestloop that doesn't use any
temporary storage.  If the required hash or sort storage exceeded
work_mem, which you have set to just 10MB, that would have created some
write traffic.
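
A quick way to see that effect for an individual query, using a
hypothetical big_table; the Sort Method line in 8.3's EXPLAIN ANALYZE
output shows whether the sort stayed within work_mem:

SET work_mem = '10MB';    -- the current setting
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY some_column;
-- "Sort Method:  external merge  Disk: ...kB"  => spilled to disk
-- "Sort Method:  quicksort  Memory: ...kB"     => stayed in memory
SET work_mem = '128MB';   -- session-only experiment
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY some_column;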

Did you happen to notice whether your queries got faster or slower when
you did this?  Watching only aggregate write traffic is a pretty limited
view of what is happening in your database.

regards, tom lane



Re: [PERFORM] Slow queries from information schema

2009-02-14 Thread Tom Lane
Octavio Alvarez alvar...@alvarezp.ods.org writes:
 The result, on the above view: ~80ms. Fair enough. But if I apply a
 condition:
 SELECT * FROM ___pgnui_relation_tree.raw_relation_tree WHERE
 parent_schema <> child_schema;
 it takes ~2 seconds (!) to complete.

I'm not sure I'm seeing the exact same case as you, but what I see here
is that 8.3 puts the join condition involving _pg_keysequal() at the
top of the tree where it will be executed quite a lot of times (way
more than the planner expects, because of bad rowcount estimates below)
... and _pg_keysequal() is implemented in a depressingly inefficient way.

CVS HEAD seems to avoid this trap in the same case, but I'm not entirely
convinced whether it's getting better rowcount estimates or just got
lucky.

Anyway it seems to help a great deal if you use a less sucky definition
of the function, such as

create or replace function information_schema._pg_keysequal(smallint[], 
smallint[]) RETURNS boolean
LANGUAGE sql STRICT IMMUTABLE AS
'select $1 <@ $2 and $2 <@ $1';
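
A quick sanity check of the replacement, treating the key arrays as sets:

SELECT information_schema._pg_keysequal('{1,2}'::smallint[], '{2,1}'::smallint[]);  -- expected: true
SELECT information_schema._pg_keysequal('{1,2}'::smallint[], '{1,3}'::smallint[]);  -- expected: false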

regards, tom lane



Re: [PERFORM] I/O increase after upgrading to 8.3.5

2009-02-14 Thread Alexander Staubo
On Sat, Feb 14, 2009 at 8:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alexander Staubo a...@bengler.no writes:
 Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for
 cache), and today's average write frequency went from 20MB/sec to just
 1MB/sec. The documentation suggests that effective_cache_size is only
 used for query planning in conjunction with indexes. So how come it
 affects write access?

 It *is* only used for query planning.  A plausible theory is that you
 caused some join queries to change from hash or merge joining involving
 a temporary hash or sort file to an index nestloop that doesn't use any
 temporary storage.  If the required hash or sort storage exceeded
 work_mem, which you have set to just 10MB, that would have created some
 write traffic.

Interesting. Is there any statistic available that can tell me whether
work_mem is being exceeded? The tools to monitor exactly what
PostgreSQL is doing -- especially on a production box -- are fairly
limited, especially since Linux does not yet have anything close to
DTrace in functionality.

 Did you happen to notice whether your queries got faster or slower when
 you did this?  Watching only aggregate write traffic is a pretty limited
 view of what is happening in your database.

Unfortunately we don't log SQL queries at the moment. We do log
application response times, but they look roughly the same as before
the change. I could revert the effective_cache_size setting, turn on
SQL logging for a while, then reapply the change and compare.
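
A lighter-weight comparison might be a per-session override, since
effective_cache_size can be SET per session in 8.3; diffing the two plans
for a suspect query (the join below is just a stand-in with hypothetical
tables) would show whether the join strategy flipped:

SET effective_cache_size = '512MB';  -- stand-in for the previous value
EXPLAIN SELECT * FROM orders o JOIN order_lines l ON l.order_id = o.id;
SET effective_cache_size = '10GB';   -- the new value
EXPLAIN SELECT * FROM orders o JOIN order_lines l ON l.order_id = o.id;
RESET effective_cache_size;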

Alexander.



Re: [PERFORM] Slow queries from information schema

2009-02-14 Thread Octavio Alvarez
On Sat, 2009-02-14 at 15:02 -0500, Tom Lane wrote:
 Octavio Alvarez alvar...@alvarezp.ods.org writes:
  The result, on the above view: ~80ms. Fair enough. But if I apply a
  condition:
  SELECT * FROM ___pgnui_relation_tree.raw_relation_tree WHERE
  parent_schema <> child_schema;
  it takes ~2 seconds (!) to complete.
 
 I'm not sure I'm seeing the exact same case as you, but what I see here
 is that 8.3 puts the join condition involving _pg_keysequal() at the
 top of the tree where it will be executed quite a lot of times (way
 more than the planner expects, because of bad rowcount estimates below)
 ... and _pg_keysequal() is implemented in a depressingly inefficient way.
 
 CVS HEAD seems to avoid this trap in the same case, but I'm not entirely
 convinced whether it's getting better rowcount estimates or just got
 lucky.
 
 Anyway it seems to help a great deal if you use a less sucky definition
 of the function, such as
 
 create or replace function information_schema._pg_keysequal(smallint[], 
 smallint[]) RETURNS boolean
 LANGUAGE sql STRICT IMMUTABLE AS
  'select $1 <@ $2 and $2 <@ $1';

Wow! Just tried it with the UNION (the original) version of
information_schema.table_constraints and it drastically reduced the
total runtime to 309 ms!

I also tested it with UNION ALL and it took 1.6 sec (still only 50% of
the previous UNION ALL time).






[PERFORM] Bad plan for nested loop + limit

2009-02-14 Thread Alexander Staubo
This dinky little query takes about 4 seconds to run:

  select event_occurrences.*
  from event_occurrences
  join section_items on section_items.subject_id = event_occurrences.event_id
and section_items.subject_type = 'Event'
and section_items.sandbox_id = 16399
  where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
  order by event_occurrences.start_time
  limit 4;

Output from explain analyze:

 Limit  (cost=0.00..973.63 rows=4 width=48) (actual
time=61.554..4039.704 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
time=61.552..4039.700 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
(actual time=61.535..4039.682 rows=1 loops=1)
               ->  Index Scan using
index_event_occurrences_on_start_time on event_occurrences
(cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152
rows=155197 loops=1)
                     Index Cond: (start_time > '2009-02-14
18:15:14.739411+01'::timestamp with time zone)
               ->  Index Scan using
index_section_items_on_subject_type_and_subject_id on section_items
(cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0
loops=155197)
                     Index Cond: (((section_items.subject_type)::text
= 'Event'::text) AND (section_items.subject_id =
event_occurrences.event_id))
                     Filter: (section_items.sandbox_id = 16399)
         ->  Index Scan using event_instances_pkey on events
(cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1
loops=1)
               Index Cond: (events.id = event_occurrences.event_id)
               Filter: (events.read_permissions = (-1))
 Total runtime: 4039.788 ms

Now, if I use limit 50 it uses a plan that is several orders of
magnitude more efficient:

 Limit  (cost=6202.38..6202.51 rows=50 width=48) (actual
time=0.170..0.171 rows=1 loops=1)
   ->  Sort  (cost=6202.38..6203.20 rows=326 width=48) (actual
time=0.170..0.170 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=5.09..6191.55 rows=326 width=48)
(actual time=0.160..0.161 rows=1 loops=1)
               ->  Bitmap Heap Scan on section_items
(cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7
loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on
index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0)
(actual time=0.018..0.018 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Index Scan using
index_event_occurrences_on_event_id on event_occurrences
(cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0
loops=7)
                     Index Cond: (event_occurrences.event_id =
section_items.subject_id)
                     Filter: (event_occurrences.start_time >
'2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 0.210 ms

Similarly, if I disable nested loop joins with set enable_nestloop = off:

 Limit  (cost=10900.13..10900.14 rows=4 width=48) (actual
time=191.476..191.478 rows=1 loops=1)
   ->  Sort  (cost=10900.13..10900.95 rows=326 width=48) (actual
time=191.474..191.475 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Hash Join  (cost=8944.52..10895.24 rows=326 width=48)
(actual time=162.104..191.463 rows=1 loops=1)
               Hash Cond: (section_items.subject_id =
event_occurrences.event_id)
               ->  Bitmap Heap Scan on section_items
(cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7
loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on
index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0)
(actual time=0.019..0.019 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Hash  (cost=5580.54..5580.54 rows=157752 width=48)
(actual time=161.832..161.832 rows=155197 loops=1)
                     ->  Seq Scan on event_occurrences
(cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406
rows=155197 loops=1)
                           Filter: (start_time > '2009-02-14
18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 192.496 ms

Some statistics:

# select attname, n_distinct from pg_stats where tablename =
'event_occurrences';
  attname   | n_distinct
------------+------------
 id | -1
 created_at |  -0.291615
 updated_at |  -0.294081
 created_by |715
 updated_by |715
 event_id   |   2146
 start_time |   -0.10047
 end_time   |   5602

# select attname, n_distinct from pg_stats where tablename = 'section_items';
  attname   | n_distinct
------------+------------
 id 

Re: [PERFORM] Bad plan for nested loop + limit

2009-02-14 Thread David Wilson
On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo a...@bengler.no wrote:

 Output from explain analyze:

  Limit  (cost=0.00..973.63 rows=4 width=48) (actual
 time=61.554..4039.704 rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
  time=61.552..4039.700 rows=1 loops=1)
          ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
  (actual time=61.535..4039.682 rows=1 loops=1)

Those estimates are pretty far off. Did you try increasing the
statistics target? Also, is the first query repeatable (that is, is it
already in cache when you do the test, or alternatively, are all queries
*out* of cache when you test?)
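
In case it helps, a sketch of that suggestion aimed at the columns the
misestimated index scans filter on (8.3 caps the per-column target at
1000):

ALTER TABLE section_items ALTER COLUMN sandbox_id SET STATISTICS 1000;
ALTER TABLE section_items ALTER COLUMN subject_id SET STATISTICS 1000;
ALTER TABLE event_occurrences ALTER COLUMN event_id SET STATISTICS 1000;
ANALYZE section_items;
ANALYZE event_occurrences;
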
-- 
- David T. Wilson
david.t.wil...@gmail.com
