> > What are the odds of going through and revamping some of the
> > tunables in postgresql.conf for the 7.4 release?  I was just
> > working with someone on IRC and on their 7800 RPM IDE drives,
> > their random_page_cost was ideally suited to be 0.32: a far cry
> > from 4.  Doing so has been a win across the board and the problem
> > query went from about 40sec (seq scan) down to 0.25ms (using idx,
> > higher than 0.32 resulted in a query time jump to 2sec, and at 0.4
> > it went back up to a full seq scan at 40sec).
> I'm the guy who advocates settings of 1 to 2, and that still sounds
> low to me. :-) I'm wondering if the effective_cache_size was set
> properly, as well as there be enough buffers allocated.
> I generally set effective cache size to 100,000 pages (800 megs or
> so) on my box, which is where it sits most days.  with this setting
> I've found that settings of under 1 are not usually necessary to
> force the planner to take the path of righteousness (i.e. the
> fastest one :-) 1.2 to 1.4 are optimal to me.

This is a nightly report that's run, cache sizes won't impact
performance of the query at all.  The planner was consistently
choosing a sequential scan over using the index until the
random_page_cost was set to 0.32.  After adjustment, the query just
flies ([EMAIL PROTECTED] vs [EMAIL PROTECTED] vs. 40s@>0.4).  Since it's a nightly
report that only gets performed once a day and data is COPY'ed in once
every few minutes, there's a huge amount of data that's not cached nor
should it be.

> Since theoretically a random page of of 1 means no penalty to move
> the heads around, and there's ALWAYS a penalty for moving the heads
> around, we have to assume:
> 1: That either the planner is making poor decisions on some other
> variable, and we can whack the planner in the head with a really low
> random page count.

By all accounts of having played with this query+data, this is the
correct assumption from what I can tell.

> OR 
> 2: The other settings are suboptimal (buffers, sort_mem,
> effective_cache_size, etc...) and lowering random page costs helps
> there.

None of those other than possibly sort_mem had any impact on the
query, but even then, lower sort_mem doesn't help until the data's
been picked out of the table.  Sorting ~16k of rows is quicker with
more sort_mem.  Higher sort_mem has zero impact on fetching ~16K rows
out of a table with 40M rows of data.  Getting the planner to pick
using the index to filter out data inserted in the last 3 days over
doing a seq scan...  well, I don't know how you could do that without
changing the random_page_cost.  A good thump to the side of the head
would be welcome too if I'm wrong, just make sure it's a good thump
with the appropriate clue-bat.

> I've always wondered if most performance issues aren't a bit of both.  

Eh, in my experience, it's generally that random_page_cost needs to be
adjusted to match the hardware and this value every year with new
hardware, seems to be getting lower.

> The answer, of course, is fixing the planner so that a
> random_page_cost of anything less than 1 would never be needed,
> since by design, anything under 1 represents a computer that likely
> doesn't exist (in theory of course.)  A 1 would be a machine that
> was using solid state hard drives and had the same cost in terms of
> OS paths to do random accesses as sequential.

Well, this could be a bug then, but I'm skeptical.  What's odd to me
is that hanging the value between 0.32, 0.33, and 0.4 all radically
change the performance of the query.

> What constants in the planner, and / or formulas would be the likely
> culprits I wonder?  I've wandered through that page and wasn't sure
> what to play with.

random_page_cost should be proportional to the seek time necessary for
the disk to find a page of data on its platters.  It makes sense that
this value, as time progresses, gets smaller as hardware gets faster.


Sean Chittenden

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?


Reply via email to