Re: [PERFORM] planner costs in warm cache tests

2010-06-04 Thread Robert Haas
On Mon, May 31, 2010 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Jesper Krogh jes...@krogh.cc writes:
>> On 2010-05-30 20:34, Tom Lane wrote:
>>> Well, hmm, I really doubt that that represents reality either.  A page
>>> access is by no means free even when the page is already in cache.
>>> I don't recall anyone suggesting that you set these numbers to less
>>> than perhaps 0.01.
>>
>> Thank you for the prompt response. Is it a false assumption that the
>> costs should, across different plans, be a measure of actual run-time
>> in a dead-disk run?
>
> Well, the default cost parameters (seq_page_cost=1, random_page_cost=4)
> are intended to model the non-cached state where most page fetches
> actually do require a disk access.  They are definitely too large
> relative to the cpu_xxx_cost parameters when you have a fully-cached
> database, but what I've seen people recommending for that condition
> is to set them both to the same value in the vicinity of 0.1 or 0.01
> or so.  If it's only mostly cached you might try intermediate settings.

I have had to set it as low as 0.005 to get the right things to happen.
Could have been a fluke, I suppose.
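
(As an illustrative sketch of that kind of experiment -- the query is a
placeholder, and 0.005 is the value Robert reports, not a general
recommendation:)

testdb=# set seq_page_cost = 0.005;
SET
testdb=# set random_page_cost = 0.005;
SET
testdb=# explain analyze select ...;   -- re-check which plan wins
testdb=# reset seq_page_cost;
RESET
testdb=# reset random_page_cost;
RESET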

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] planner costs in warm cache tests

2010-06-01 Thread Scott Carey
It is still best to have random_page_cost slightly larger (~50%) than
seq_page_cost, because even when the database is entirely in RAM,
sequential reads are faster than random reads.  Today's CPUs do memory
prefetching on sequential access.  Perhaps try something like 0.3 and
0.2, or half that.  You still don't want it to gratuitously scan a lot
of RAM -- reading a page is not free and can kick out other pages from
shared_buffers.
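
(A concrete sketch of that suggestion; the values are illustrative only,
and the right magnitudes depend on how much of the workload is cached:)

testdb=# set seq_page_cost = 0.2;
SET
testdb=# set random_page_cost = 0.3;
SET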


On May 31, 2010, at 12:55 PM, Tom Lane wrote:

> Jesper Krogh jes...@krogh.cc writes:
>> On 2010-05-30 20:34, Tom Lane wrote:
>>> Well, hmm, I really doubt that that represents reality either.  A page
>>> access is by no means free even when the page is already in cache.
>>> I don't recall anyone suggesting that you set these numbers to less
>>> than perhaps 0.01.
>>
>> Thank you for the prompt response. Is it a false assumption that the
>> costs should, across different plans, be a measure of actual run-time
>> in a dead-disk run?
>
> Well, the default cost parameters (seq_page_cost=1, random_page_cost=4)
> are intended to model the non-cached state where most page fetches
> actually do require a disk access.  They are definitely too large
> relative to the cpu_xxx_cost parameters when you have a fully-cached
> database, but what I've seen people recommending for that condition
> is to set them both to the same value in the vicinity of 0.1 or 0.01
> or so.  If it's only mostly cached you might try intermediate settings.
>
> regards, tom lane




Re: [PERFORM] planner costs in warm cache tests

2010-06-01 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
> It is still best to have random_page_cost slightly larger (~50%)
> than seq_page_cost, because even when the database is entirely in
> RAM, sequential reads are faster than random reads.  Today's CPUs
> do memory prefetching on sequential access.

Do you have any actual evidence of that?  Because I don't believe it.
Neither PG nor any kernel that I've ever heard of makes any effort to
ensure that logically sequential blocks occupy physically sequential
buffers, so even if the CPU tries to do some prefetching, it's not
going to help at all.

Now, if the database isn't entirely cached, then indeed it's a good
idea to keep random_page_cost higher than seq_page_cost.  But that's
because of the actual disk fetches, not anything that happens in RAM.

regards, tom lane



Re: [PERFORM] planner costs in warm cache tests

2010-05-31 Thread Jesper Krogh

On 2010-05-30 20:34, Tom Lane wrote:
> Jesper Krogh jes...@krogh.cc writes:
>> testdb=# set seq_page_cost = 0.1;
>> SET
>> testdb=# set random_page_cost = 0.1;
>> SET
>
> Well, hmm, I really doubt that that represents reality either.  A page
> access is by no means free even when the page is already in cache.
> I don't recall anyone suggesting that you set these numbers to less
> than perhaps 0.01.

Thank you for the prompt response. Is it a false assumption that the
costs should, across different plans, be a measure of actual run-time
in a dead-disk run?

It should most likely match a typical workload situation, but that is
really hard to say anything about, so my feeling is that the dead-disk
case is the closest?
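
(As an aside: if only some databases are fully cached, these costs can
also be pinned per-database instead of globally -- a sketch using the
testdb name from this thread:)

testdb=# alter database testdb set seq_page_cost = 0.1;
ALTER DATABASE
testdb=# alter database testdb set random_page_cost = 0.1;
ALTER DATABASE

(New sessions on testdb then start with those values.)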

--
Jesper



Re: [PERFORM] planner costs in warm cache tests

2010-05-31 Thread Tom Lane
Jesper Krogh jes...@krogh.cc writes:
> On 2010-05-30 20:34, Tom Lane wrote:
>> Well, hmm, I really doubt that that represents reality either.  A page
>> access is by no means free even when the page is already in cache.
>> I don't recall anyone suggesting that you set these numbers to less
>> than perhaps 0.01.
>
> Thank you for the prompt response. Is it a false assumption that the
> costs should, across different plans, be a measure of actual run-time
> in a dead-disk run?

Well, the default cost parameters (seq_page_cost=1, random_page_cost=4)
are intended to model the non-cached state where most page fetches
actually do require a disk access.  They are definitely too large
relative to the cpu_xxx_cost parameters when you have a fully-cached
database, but what I've seen people recommending for that condition
is to set them both to the same value in the vicinity of 0.1 or 0.01
or so.  If it's only mostly cached you might try intermediate settings.

regards, tom lane



[PERFORM] planner costs in warm cache tests

2010-05-30 Thread Jesper Krogh

Hi.

I'm trying to get the planner to do sort of the correct thing
when choosing between index-scans on btree indices and
bitmap heap scans.

There have been several things going on in parallel. One
is that the statistics data is off:
http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/141420/focus=141735

The other is that the cost estimates (number of pages
to read) are inaccurate on GIN indices:
http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php
for which a solution is also coming that I'm testing out.

I was trying to nullify the problem with the wrongly estimated number
of pages to read and see if the rest seems to work as expected.

The theory was that if I set seq_page_cost and random_page_cost to
something really low (0 is not permitted) and ran tests
on a fully cached query (where both costs indeed are really low),
then the cheapest plan should indeed also be the fastest one.
Let me know if the logic is flawed.

The test dataset is 1,365,462 documents, running pg 9.0b1; both queries
are run twice to see that the data actually is fully cached as expected.

testdb=# set seq_page_cost = 0.1;
SET
testdb=# set random_page_cost = 0.1;
SET
testdb=# set enable_indexscan = on;
SET
testdb=# explain analyze select id from testdb.reference where 
document_tsvector @@ to_tsquery('literature') order by accession_number 
limit 200;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..432.82 rows=200 width=11) (actual time=831.456..2167.302 rows=200 loops=1)
   ->  Index Scan using ref_acc_idx on reference  (cost=0.00..61408.12 rows=28376 width=11) (actual time=831.451..2166.434 rows=200 loops=1)
         Filter: (document_tsvector @@ to_tsquery('literature'::text))
 Total runtime: 2167.982 ms
(4 rows)

testdb=# explain analyze select id from testdb.reference where 
document_tsvector @@ to_tsquery('literature') order by accession_number 
limit 200;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..432.82 rows=200 width=11) (actual time=842.990..2187.393 rows=200 loops=1)
   ->  Index Scan using ref_acc_idx on reference  (cost=0.00..61408.12 rows=28376 width=11) (actual time=842.984..2186.540 rows=200 loops=1)
         Filter: (document_tsvector @@ to_tsquery('literature'::text))
 Total runtime: 2188.083 ms
(4 rows)

testdb=# set enable_indexscan = off;
SET
testdb=# explain analyze select id from testdb.reference where 
document_tsvector @@ to_tsquery('literature') order by accession_number 
limit 200;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2510.68..2511.18 rows=200 width=11) (actual time=270.016..270.918 rows=200 loops=1)
   ->  Sort  (cost=2510.68..2581.62 rows=28376 width=11) (actual time=270.011..270.321 rows=200 loops=1)
         Sort Key: accession_number
         Sort Method:  top-N heapsort  Memory: 34kB
         ->  Bitmap Heap Scan on reference  (cost=219.94..1284.29 rows=28376 width=11) (actual time=13.897..216.700 rows=21613 loops=1)
               Recheck Cond: (document_tsvector @@ to_tsquery('literature'::text))
               ->  Bitmap Index Scan on reference_fts_idx  (cost=0.00..212.85 rows=28376 width=0) (actual time=10.053..10.053 rows=21613 loops=1)
                     Index Cond: (document_tsvector @@ to_tsquery('literature'::text))
 Total runtime: 271.323 ms
(9 rows)

testdb=# explain analyze select id from testdb.reference where 
document_tsvector @@ to_tsquery('literature') order by accession_number 
limit 200;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2510.68..2511.18 rows=200 width=11) (actual time=269.881..270.782 rows=200 loops=1)
   ->  Sort  (cost=2510.68..2581.62 rows=28376 width=11) (actual time=269.876..270.182 rows=200 loops=1)
         Sort Key: accession_number
         Sort Method:  top-N heapsort  Memory: 34kB
         ->  Bitmap Heap Scan on reference  (cost=219.94..1284.29 rows=28376 width=11) (actual time=14.113..216.173 rows=21613 loops=1)
               Recheck Cond: (document_tsvector @@ to_tsquery('literature'::text))
               ->  Bitmap Index Scan on reference_fts_idx  (cost=0.00..212.85 rows=28376 width=0) (actual time=10.360..10.360 rows=21613 loops=1)
                     Index Cond: (document_tsvector @@ to_tsquery('literature'::text))

Re: [PERFORM] planner costs in warm cache tests

2010-05-30 Thread Tom Lane
Jesper Krogh jes...@krogh.cc writes:
> testdb=# set seq_page_cost = 0.1;
> SET
> testdb=# set random_page_cost = 0.1;
> SET

Well, hmm, I really doubt that that represents reality either.  A page
access is by no means free even when the page is already in cache.
I don't recall anyone suggesting that you set these numbers to less
than perhaps 0.01.

In the case at hand, the problem is that the planner prefers an
indexscan over an after-the-fact sort to obtain the specified result
ordering.  Making page fetches look too cheap definitely plays into
that.  There may also be a statistical problem, if the location of the
desired records isn't independent of the accession_number ordering, but
you're not doing yourself any favors by pushing the planner cost
parameters several orders of magnitude outside the design envelope.
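
(A rough way to probe that dependence -- a sketch using the table and
columns from Jesper's test, not something from the original exchange:
count how many rows, in accession_number order, the indexscan must walk
before it finds its first 200 matches.)

select count(*)
  from testdb.reference
 where accession_number <=
       (select max(accession_number)
          from (select accession_number
                  from testdb.reference
                 where document_tsvector @@ to_tsquery('literature')
                 order by accession_number
                 limit 200) s);
-- If matches were spread evenly, this would be roughly
-- 200 / (matching fraction of rows); a much larger count means the
-- matches cluster late in accession_number order, which is exactly
-- the case that hurts the indexscan + LIMIT plan.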

regards, tom lane
