Re: [PERFORM] cpu_tuple_cost

2005-03-17 Thread Manfred Koizar
On Mon, 14 Mar 2005 21:23:29 -0500, Tom Lane [EMAIL PROTECTED] wrote:
 I think that the "reduce random_page_cost" mantra
 is not an indication that that parameter is wrong, but that the
 cost models it feeds into need more work.

One of these areas is the cost interpolation depending on correlation.
This has been discussed on -hackers in October 2002 and August 2003
("Correlation in cost_index()").  My Postgres installations contain the
patch presented during that discussion (and another index correlation
patch), and I use *higher* values for random_page_cost (up to 10).
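
For readers without those threads handy: the discussion is about how
cost_index() interpolates between the best-case I/O cost (perfectly
correlated index order) and the worst-case cost (uncorrelated, fully
random).  A minimal sketch of the square-of-correlation interpolation in
Python -- illustrative names, not the actual identifiers in costsize.c:

    def interpolated_io_cost(min_io_cost, max_io_cost, correlation):
        # Blend best case and worst case by correlation squared: a
        # half-correlated index (0.5) is costed much nearer the worst
        # case than linear blending would put it.
        csquared = correlation * correlation
        return max_io_cost + csquared * (min_io_cost - max_io_cost)

    print(interpolated_io_cost(10.0, 1000.0, 0.5))   # 752.5; linear: 505

The patches mentioned above target exactly this interpolation step.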

Servus
 Manfred



Re: [PERFORM] cpu_tuple_cost

2005-03-17 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 Yes it is.  I ran experiments back in the late 90s to derive it.
 Check the archives.

 Hmmm ... which list?

-hackers, no doubt.  -performance didn't exist then.

regards, tom lane



Re: [PERFORM] cpu_tuple_cost

2005-03-17 Thread Michael Fuhr
On Thu, Mar 17, 2005 at 09:54:29AM -0800, Josh Berkus wrote:
 
  Yes it is.  I ran experiments back in the late 90s to derive it.
  Check the archives.
 
 Hmmm ... which list?

These look like relevant threads:

http://archives.postgresql.org/pgsql-hackers/2000-01/msg00910.php
http://archives.postgresql.org/pgsql-hackers/2000-02/msg00215.php

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] cpu_tuple_cost

2005-03-16 Thread Magnus Hagander
   The "this day and age" argument isn't very convincing. Hard drive
   capacity growth has far outstripped hard drive seek time and
   bandwidth improvements. Random access has more penalty than ever.

  In point of fact, there haven't been noticeable seek time
  improvements for years. Transfer rates, on the other hand, have
  gone through the roof.

 Er, yeah. I stated it wrong. The real ratio here is between
 seek time and throughput.

 Typical 7200RPM drives have average seek times in the area of 10ms.
 Typical sustained transfer rates are in the range of 40MB/s.
 Postgres reads 8kB blocks at a time.

 So 800kB/s for random access reads. And 40MB/s for sequential
 reads. That's a factor of 49. I don't think anyone wants
 random_page_cost to be set to 50 though.

 For a high end 15k drive I see average seek times get as low
 as 3ms. And sustained transfer rates get as high as 100MB/s.
 So about 2.7MB/s for random access reads, or about a
 random_page_cost of 37. Still pretty extreme.

 So what's going on with the empirically derived value of 4?
 Perhaps this is because even though Postgres is reading an
 entire table sequentially it's unlikely to be the only I/O
 consumer? The sequential reads would be interleaved
 occasionally with some other I/O, forcing a seek to continue.

What about the cache memory on the disk? Even IDE disks have some 8MB
of cache today, which makes a lot of difference for fairly short scans,
even if it's just read cache. That'll bring the speed of random access
down to a 1:1 relationship with sequential access, assuming everything
fits in the cache.


//Magnus



Re: [PERFORM] cpu_tuple_cost

2005-03-16 Thread Josh Berkus
Greg,

 So 800kB/s for random access reads. And 40MB/s for sequential reads. That's
 a factor of 49. I don't think anyone wants random_page_cost to be set to 50
 though.

 For a high end 15k drive I see average seek times get as low as 3ms. And
 sustained transfer rates get as high as 100MB/s. So about 2.7MB/s for
 random access reads or about a random_page_cost of 37. Still pretty
 extreme.

Actually, what you're demonstrating here is that there's really no point in 
having a random_page_cost GUC, since the seek/scan ratio is going to be high 
regardless.   

I will point out, though, that you left out the fact that the disk needs to
do a seek to find the beginning of the seq scan area, and that even then
some file fragmentation is possible.  Finally, I've never seen PostgreSQL
manage more than 70% of the maximum read rate, and in most cases more like
30%.

 So what's going on with the empirically derived value of 4? 

It's not empirically derived; it's a value we plug into an 
internal-to-postgresql formula.   And 4 is a fairly conservative value that 
works for a lot of systems.

Realistically, the values we should be deriving from are:
-- median file cache size for postgresql files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention

However, working those four hardware facts into formulas that allow us to
calculate the actual cost of a query execution plan is somebody's PhD paper.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] cpu_tuple_cost

2005-03-16 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 So what's going on with the empirically derived value of 4? 

 It's not empirically derived;

Yes it is.  I ran experiments back in the late 90s to derive it.
Check the archives.

Disks have gotten noticeably bigger since then, but I don't think
the ratio of seek time to rotation rate has changed much.

regards, tom lane



Re: [PERFORM] cpu_tuple_cost

2005-03-15 Thread David Brown
Gregory Stark wrote:
 The "this day and age" argument isn't very convincing. Hard drive capacity
 growth has far outstripped hard drive seek time and bandwidth improvements.
 Random access has more penalty than ever.
 

In point of fact, there haven't been noticeable seek time improvements 
for years. Transfer rates, on the other hand, have gone through the roof.

Which is why I would question the published tuning advice that 
recommends lowering it to 2 for arrays. Arrays increase the effective 
transfer rate more than they reduce random access times. Dropping from 4 
to 2 would reflect going from a typical single 7200rpm ATA drive to a 
15000rpm SCSI drive, but striping will move it back up again - probably 
even higher than 4 with a big array (at a guess, perhaps the 
relationship might be approximated as a square root after allowing for 
the array type?).
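
To make that square-root guess concrete (purely the speculation above, not
a validated model), a toy Python sketch:

    from math import sqrt

    # Guess: striping multiplies sequential throughput by roughly the
    # stripe width while barely improving seek times, damped here to a
    # square root after allowing for the array type.
    def guessed_random_page_cost(single_disk_rpc, n_disks):
        return single_disk_rpc * sqrt(n_disks)

    print(guessed_random_page_cost(4, 9))   # 12.0 for a 9-disk stripe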

With default settings, I've seen the planner pick the wrong index unless 
random_page_cost was set to 2. But in testing on an ATA drive, I 
achieved slightly better plan costings by increasing cpu_tuple_cost 
(relative to cpu_index_tuple_cost - by default it's only a factor of 10) 
and actually *raising* random_page_cost to 5! So why pick on one 
parameter? It's all going to vary according to the query and the data.

I agree with Tom 100%. Pulling levers on a wonky model is no solution.


Re: [PERFORM] cpu_tuple_cost

2005-03-15 Thread Greg Stark

David Brown [EMAIL PROTECTED] writes:

 Gregory Stark wrote:

  The "this day and age" argument isn't very convincing. Hard drive capacity
  growth has far outstripped hard drive seek time and bandwidth improvements.
  Random access has more penalty than ever.

 In point of fact, there haven't been noticeable seek time improvements for
 years. Transfer rates, on the other hand, have gone through the roof.

Er, yeah. I stated it wrong. The real ratio here is between seek time and
throughput.

Typical 7200RPM drives have average seek times in the area of 10ms.
Typical sustained transfer rates are in the range of 40MB/s. Postgres reads
8kB blocks at a time.

So 800kB/s for random access reads. And 40MB/s for sequential reads. That's a
factor of 49. I don't think anyone wants random_page_cost to be set to 50
though.

For a high end 15k drive I see average seek times get as low as 3ms. And
sustained transfer rates get as high as 100MB/s. So about 2.7MB/s for random
access reads, or about a random_page_cost of 37. Still pretty extreme.
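
As a sanity check on the arithmetic, a minimal Python version of the same
back-of-envelope calculation, using the drive numbers assumed above:

    BLOCK_KB = 8

    def implied_random_page_cost(seek_ms, seq_mb_per_s):
        # Random reads are seek-bound: one 8kB block per average seek.
        random_kb_per_s = (1000.0 / seek_ms) * BLOCK_KB
        seq_kb_per_s = seq_mb_per_s * 1000.0
        return seq_kb_per_s / random_kb_per_s

    print(implied_random_page_cost(10, 40))    # 50.0 (7200RPM case)
    print(implied_random_page_cost(3, 100))    # 37.5 (15k case)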

So what's going on with the empirically derived value of 4? Perhaps this is
because even though Postgres is reading an entire table sequentially it's
unlikely to be the only I/O consumer? The sequential reads would be
interleaved occasionally with some other I/O, forcing a seek to continue.

In which case the true random_page_cost seems like it would be extremely
sensitive to the amount of readahead the OS does. To reach a random_page_cost
of 4 given the numbers above for a 7200RPM drive requires that just under 25%
of the I/O of a sequential table scan be random seeks [*]. That translates to
32kB of sequential reading, which actually does sound like a typical value for
OS readahead.

I wonder if those same empirical tests would show even higher values of
random_page_cost if the readahead were turned up to 64kB or 128kB.




[*] A bit of an algebraic diversion: 

1s/10ms = 100 random buffers/s. 
random_page_cost = 4 so net sequential buffers/s = 400.

solve:

400 buffers = rnd+seq
1000ms = .2*seq + 10*rnd


-- 
greg




Re: [PERFORM] cpu_tuple_cost

2005-03-14 Thread Tom Lane
Greg Sabino Mullane [EMAIL PROTECTED] writes:
 On that note, can I raise the idea again of dropping the default
 value for random_page_cost in postgresql.conf? I think 4 is too
 conservative in this day and age. Certainly the person who will
 be negatively impacted by a default drop of 4 to 3 will be the
 exception and not the rule.

The ones who'd be negatively impacted are the ones we haven't
been hearing from ;-).  To assume that they aren't out there
is a logical fallacy.

I still think that 4 is about right for large databases (where
large is in comparison to available RAM).

Also, to the extent that we think these numbers mean anything at all,
we should try to keep them matching the physical parameters we think
they represent.  I think that the "reduce random_page_cost" mantra
is not an indication that that parameter is wrong, but that the
cost models it feeds into need more work.  One thing we *know*
is wrong is the costing of nestloop inner indexscans: there needs
to be a correction for caching of index blocks across repeated
scans.  I've looked at this a few times but not come up with
anything that seemed convincing.  Another thing I've wondered
about more than once is if we shouldn't discount fetching of
higher-level btree pages on the grounds that they're probably
in RAM already, even if the indexscan isn't inside a loop.
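
For what it's worth, one classical candidate for that kind of repeated-scan
correction is the Mackert-Lohman approximation (VLDB '89) for the number of
distinct pages actually touched.  A minimal Python sketch of its simplest
case (relation not larger than cache), offered as illustration rather than
as the correction Tom is looking for:

    def pages_fetched(tuples_visited, table_pages):
        # Of N tuple fetches against a T-page relation, roughly
        # 2TN/(2T+N) hit distinct pages; the rest re-read pages that
        # are very likely still cached and should be costed cheaply.
        n, t = float(tuples_visited), float(table_pages)
        return min(2.0 * t * n / (2.0 * t + n), t)

    print(pages_fetched(1000, 1000))   # ~667 distinct pages, not 1000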

regards, tom lane



Re: [PERFORM] cpu_tuple_cost

2005-03-14 Thread Bruce Momjian
Greg Sabino Mullane wrote:
 
  Reducing random_page_cost is usually the best way to get the
  planner to favor indexscans more.
 
 On that note, can I raise the idea again of dropping the default
 value for random_page_cost in postgresql.conf? I think 4 is too
 conservative in this day and age. Certainly the person who will
 be negatively impacted by a default drop of 4 to 3 will be the
 exception and not the rule.

Agreed.  I think we should reduce it at least to 3.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] cpu_tuple_cost

2005-03-14 Thread Gregory Stark

Bruce Momjian pgman@candle.pha.pa.us wrote:

 Agreed.  I think we should reduce it at least to 3.

Note that changing it from 4 to 3 or even 2 is unlikely to really change much.
Many of the plans people complain about turn out to have critical points
closer to 1.2 or 1.1. 

The only reason things work out better with such low values is because people
have data sets that fit more or less entirely in RAM. So values close to 1 or
even equal to 1 actually represent the reality.
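
One way to see why (an illustrative blend, not anything the planner
actually computes): if a fraction of nominally random page reads is served
from RAM at roughly sequential cost, the effective ratio collapses toward 1
no matter how brutal the disk's raw ratio is.

    def effective_random_page_cost(disk_rpc, cached_fraction):
        # Cached "random" reads cost about the same as sequential ones.
        return cached_fraction * 1.0 + (1.0 - cached_fraction) * disk_rpc

    print(effective_random_page_cost(50, 0.98))   # 1.98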

The "this day and age" argument isn't very convincing. Hard drive capacity
growth has far outstripped hard drive seek time and bandwidth improvements.
Random access has more penalty than ever.

-- 
greg




Re: [PERFORM] cpu_tuple_cost

2005-03-14 Thread Josh Berkus
Greg,

 On that note, can I raise the idea again of dropping the default
 value for random_page_cost in postgresql.conf? I think 4 is too
 conservative in this day and age. Certainly the person who will
 be negatively impacted by a default drop of 4 to 3 will be the
 exception and not the rule.

I don't agree.  The defaults are there for people who aren't going to read 
enough of the documentation to set them.  As such, conservative for the 
defaults is appropriate.

If we were going to change anything automatically, it would be to set 
effective_cache_size to 1/3 of RAM at initdb time.  However, I don't know any 
method to determine RAM size that works on all the platforms we support.
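
To illustrate the portability problem (an assumption-laden sketch, not a
proposal for initdb): even on POSIX-ish systems, RAM detection leans on
optional sysconf names, and everything else needs per-platform code.

    import os

    def total_ram_bytes():
        # Works where the platform exposes these POSIX sysconf names
        # (e.g. Linux); fails on platforms that don't, which is exactly
        # the cross-platform problem described above.
        try:
            return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        except (ValueError, OSError, AttributeError):
            return None

    ram = total_ram_bytes()
    if ram:
        print("effective_cache_size candidate:", ram // 3, "bytes")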

Tom,

 Also, to the extent that we think these numbers mean anything at all,
 we should try to keep them matching the physical parameters we think
 they represent. 

Personally, what I would love to see is the system determining and caching 
some of these parameters automatically.   For example, in a database which 
has been running in production for a couple of days, it should be possible to 
determine the ratio of average random seek tuple cost to average seq scan 
tuple cost.

Other parameters should really work the same way.   Effective_cache_size, for 
example, is a blunt instrument to replace what the database should ideally do 
through automated interactive fine tuning.  Particularly since we have 2 
separate caches (or 3, if you count t1 and t2 from 2Q).  What the planner
really needs to know is: is this table or index already in the t1 or t2
cache (can't we determine this)?  How likely is it to be in the filesystem
cache?  The latter question is not just one of size (table < memory), but
one of frequency of access.

Of course, this stuff is really, really hard, which is why we rely on the
GUCs ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] cpu_tuple_cost

2005-03-13 Thread Daniel Schuchardt
I forgot this:
CREATE OR REPLACE FUNCTION date_to_yearmonth_dec(TIMESTAMP)
RETURNS VARCHAR AS '
BEGIN
  -- Year concatenated with the zero-based month ("-" binds tighter
  -- than "||"), e.g. 2005-03-15 yields ''20052''.
  RETURN extract(year FROM $1) || extract(month FROM $1) - 1;
END' LANGUAGE plpgsql IMMUTABLE;



Re: [PERFORM] cpu_tuple_cost

2005-03-13 Thread Tom Lane
Daniel Schuchardt [EMAIL PROTECTED] writes:
 I have a query plan that is bad with the standard cpu_tuple_cost and good
 if I raise it.  Is it good practice to raise it if I want to force
 Postgres to use indexes more often?

Reducing random_page_cost is usually the best way to get the planner to
favor indexscans more.

regards, tom lane
