Re: [HACKERS] SeqScan costs

2008-08-20 Thread Decibel!

On Aug 18, 2008, at 11:49 AM, Tom Lane wrote:
Perhaps what's also needed here is to measure just how accurate the cpu_*
costs are. Perhaps they need to be raised somewhat if we're underestimating
the cost of digging through 200 tuples on a heap page and the benefit of a
binary search on the index tuples.


Possibly.  I doubt anyone's ever taken a hard look at the cpu_xxx
values.



Josh Berkus indicated at PGCon that he's had luck *decreasing* the  
CPU costs, but IIRC that was mostly on OLAP systems. It seems we need  
some real data here.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] SeqScan costs

2008-08-18 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
 This is only going to matter for a table of 1 block (or at least very
 few blocks), and for such a table it's highly likely that it's in RAM
 anyway.  So I'm unconvinced that the proposed change represents a
 better model of reality.

 I think the first block of a sequential scan is clearly a random access. If
 that doesn't represent reality well then perhaps we need to tackle both
 problems together.

 The point I was trying to make (evidently not too well) is that fooling
 around with fundamental aspects of the cost models is not something that
 should be done without any evidence.  We've spent ten years getting the
 system to behave reasonably well with the current models, and it's quite
 possible that changing them to be more accurate according to a
 five-minute analysis is going to make things markedly worse overall.

 I'm not necessarily opposed to making this change --- it does sound
 kinda plausible --- but I want to see some hard evidence that it does
 more good than harm before we put it in.

I don't want to see this thread completely drop because it also seems pretty
plausible to me too.

So what kind of evidence do we need? I'm thinking a query like

select (select count(*) from 1pagetable) as n1,
   (select count(*) from 2pagetable) as n2,
   (select count(*) from 3pagetable) as n3,
   ...
  from fairlylargetable

for various maximum size subquery tables would give an idea of how much cpu
time is spent thrashing through the sequential scans. If we raise the cost of
small sequential scans do the resulting costs get more accurate or do they get
out of whack?
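
To make that concrete, the small tables could be built along these lines (names
spelled out since identifiers can't start with a digit; sizes are only rough,
since about 200 narrow rows fit on an 8K page):

create table onepagetable     as select g as id, 'row ' || g as value from generate_series(1,200) g;
create table twopagetable     as select g as id, 'row ' || g as value from generate_series(1,400) g;
create table threepagetable   as select g as id, 'row ' || g as value from generate_series(1,600) g;
create table fairlylargetable as select g as id, 'row ' || g as value from generate_series(1,1000000) g;
analyze;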

Perhaps what's also needed here is to measure just how accurate the cpu_*
costs are. Perhaps they need to be raised somewhat if we're underestimating
the cost of digging through 200 tuples on a heap page and the benefit of a
binary search on the index tuples.
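
For reference, the knobs in question and their stock defaults (noted in the
comments; SHOW reports the values currently in effect):

show cpu_tuple_cost;        -- 0.01 by default
show cpu_index_tuple_cost;  -- 0.005 by default
show cpu_operator_cost;     -- 0.0025 by default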

 People lower random_page_cost because we're not doing a good job estimating
 how much of a table is in cache.

 Agreed, the elephant in the room is that we lack enough data to model
 caching effects with any degree of realism.

It looks like we *do* discount the page accesses in index_pages_fetched based
on effective_cache_size. But that's the *only* place we use
effective_cache_size. We aren't discounting sequential scan or heap page
accesses even when the entire table is much smaller than effective_cache_size
and therefore hopefully cached.

We need to think about this. I'm a bit concerned that if we assume small
tables are always cached that we'll run into problems on the poor but common
schema design that has hundreds of tiny tables. But that seems like a narrow
use case and not worth assuming we *never* get any cache effects on sequential
scans at all.
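
As an aside, if the contrib/pg_buffercache module happens to be installed,
something like this shows how much of a small table is sitting in
shared_buffers at a given moment (the OS cache stays invisible, of course;
oneblock is Andrew Gierth's one-page test table from earlier in the thread):

select count(*) as cached_pages
  from pg_buffercache b
  join pg_class c on c.relfilenode = b.relfilenode
 where c.relname = 'oneblock';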

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] SeqScan costs

2008-08-18 Thread Simon Riggs

On Mon, 2008-08-18 at 16:44 +0100, Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:

  Gregory Stark [EMAIL PROTECTED] writes:
  On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
  This is only going to matter for a table of 1 block (or at least very
  few blocks), and for such a table it's highly likely that it's in RAM
  anyway.  So I'm unconvinced that the proposed change represents a
  better model of reality.

  I think the first block of a sequential scan is clearly a random access. If
  that doesn't represent reality well then perhaps we need to tackle both
  problems together.

  The point I was trying to make (evidently not too well) is that fooling
  around with fundamental aspects of the cost models is not something that
  should be done without any evidence.  We've spent ten years getting the
  system to behave reasonably well with the current models, and it's quite
  possible that changing them to be more accurate according to a
  five-minute analysis is going to make things markedly worse overall.

  I'm not necessarily opposed to making this change --- it does sound
  kinda plausible --- but I want to see some hard evidence that it does
  more good than harm before we put it in.

 I don't want to see this thread completely drop because it also seems pretty
 plausible to me too.

 So what kind of evidence do we need? I'm thinking a query like

 select (select count(*) from 1pagetable) as n1,
    (select count(*) from 2pagetable) as n2,
    (select count(*) from 3pagetable) as n3,
    ...
   from fairlylargetable

 for various maximum size subquery tables would give an idea of how much cpu
 time is spent thrashing through the sequential scans. If we raise the cost of
 small sequential scans do the resulting costs get more accurate or do they get
 out of whack?

Sounds OK. I've not given up on this yet...

 Perhaps what's also needed here is to measure just how accurate the cpu_*
 costs are. Perhaps they need to be raised somewhat if we're underestimating
 the cost of digging through 200 tuples on a heap page and the benefit of a
 binary search on the index tuples.

Well, that's a can of worms you just opened. I'm trying to suggest a
specific fix to a specific problem.

  People lower random_page_cost because we're not doing a good job estimating
  how much of a table is in cache.

  Agreed, the elephant in the room is that we lack enough data to model
  caching effects with any degree of realism.

 It looks like we *do* discount the page accesses in index_pages_fetched based
 on effective_cache_size. But that's the *only* place we use
 effective_cache_size. We aren't discounting sequential scan or heap page
 accesses even when the entire table is much smaller than effective_cache_size
 and therefore hopefully cached.

That is about block reuse within the same scan, that's why it only
happens there. It doesn't assume the indexes are already cached, it just
says that they will be if we scan heap blocks in index order.

 We need to think about this. I'm a bit concerned that if we assume
 small tables are always cached 

I've not suggested that and we don't currently assume that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] SeqScan costs

2008-08-18 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I'm not necessarily opposed to making this change --- it does sound
 kinda plausible --- but I want to see some hard evidence that it does
 more good than harm before we put it in.

 I don't want to see this thread completely drop because it also seems pretty
 plausible to me too.

 So what kind of evidence do we need? I'm thinking a query like

 select (select count(*) from 1pagetable) as n1,
(select count(*) from 2pagetable) as n2,
(select count(*) from 3pagetable) as n3,
...
   from fairlylargetable

Well, no, because there's no freedom of choice there.  What we want is
to find out how this change impacts seqscan vs indexscan choices for
small tables, and then whether those choices get better or worse.

 Perhaps what's also needed here is to measure just how accurate the cpu_*
 costs are. Perhaps they need to be raised somewhat if we're underestimating
 the cost of digging through 200 tuples on a heap page and the benefit of a
 binary search on the index tuples.

Possibly.  I doubt anyone's ever taken a hard look at the cpu_xxx
values.

regards, tom lane



Re: [HACKERS] SeqScan costs

2008-08-14 Thread Decibel!

On Aug 13, 2008, at 10:45 PM, Andrew Gierth wrote:

You could likely expose a difference using LIMIT 1 in the subselect,
but that doesn't tell us anything we didn't already know (which is
that yes, index scan is much faster than seqscan even for 1-block
tables, except in the rare case when neither the index page nor the
table page are in cache, causing the indexscan to take two page
fetches rather than just one).

Oddly enough, when I try it with LIMIT 1, it _does_ show a significant
speed difference according to the row position, _but_ the index scan
is still twice as fast even when fetching only row 1 (which is indeed
physically first).



So the question is: why?? How can it be cheaper to hit 2 buffers than 1?

Though, unless we can improve the speed of seqscanning an entire page  
vs pulling the exact row we need it's probably still a moot point.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] SeqScan costs

2008-08-13 Thread Simon Riggs

On Tue, 2008-08-12 at 23:22 -0400, Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
  On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
  This is only going to matter for a table of 1 block (or at least very
  few blocks), and for such a table it's highly likely that it's in RAM
  anyway.  So I'm unconvinced that the proposed change represents a
  better model of reality.
 
  I think the first block of a sequential scan is clearly a random access. If
  that doesn't represent reality well then perhaps we need to tackle both
  problems together.
 
 The point I was trying to make (evidently not too well) is that fooling
 around with fundamental aspects of the cost models is not something that
 should be done without any evidence.  We've spent ten years getting the
 system to behave reasonably well with the current models, and it's quite
 possible that changing them to be more accurate according to a
 five-minute analysis is going to make things markedly worse overall.
 
 I'm not necessarily opposed to making this change --- it does sound
 kinda plausible --- but I want to see some hard evidence that it does
 more good than harm before we put it in.

psql -f seq.sql -v numblocks=2 -v pkval=Anything -v filler=Varies

When numblocks=2 I consistently see that an index scan is actually
faster than a seqscan, yet the planner chooses a seqscan in all cases. 

This is true for any value of pkval and values of filler up to 4-500
bytes. We already take into account the length of rows because we
estimate the CPU costs per row not per block. That is not what I wish to
change.

This same situation occurs for all small tables. What I conclude is that
the disk cost swamps the CPU costs and so we end up with a seq scan
when we really want an index scan.

There are two ways of looking at this
* we work out a complex scheme for knowing when to remove disk costs
* we realise that the disk cost is actually the same on the *first*
block whether we are in memory or on disk. 

If we take the second way, then we have a small but crucial correction
factor that produces better plans in most cases on small tables. Doing
it this way allows us to not worry about the caching, but just have a
scheme that balances the access costs better so that although they are
still present in the total cost the final plan choice is less dependent
upon the disk cost and more dependent upon the CPU costs.

This analysis is the result of experience, then measurement, not theory.
I've been looking for an easy and justifiable way to nudge the cost
factors so that they work better for small tables.

  run_cost += random_page_cost + seq_page_cost * (baserel->pages - 1);
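
Back-of-envelope for the numblocks=2 case, roughly 340 narrow rows in 2 pages
(illustrative arithmetic only, not planner output):

-- assumes seq_page_cost=1, random_page_cost=4, cpu_tuple_cost=0.01, cpu_operator_cost=0.0025
select 2 * 1.0 + 340 * (0.01 + 0.0025) as seqscan_cost_now,       -- ~6.25
       4 + 1.0 + 340 * (0.01 + 0.0025) as seqscan_cost_proposed,  -- ~9.25
       4 + 4 + 0.3                     as indexscan_cost_approx;  -- ~8.3: two random fetches plus CPU

Today the seqscan looks cheaper even though it measures slower; charging
random_page_cost for the first block is enough to tip the choice to the index
scan without touching random_page_cost itself.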


  People lower random_page_cost because we're not doing a good job estimating
  how much of a table is in cache.
 
 Agreed, the elephant in the room is that we lack enough data to model
 caching effects with any degree of realism.

I'm specifically talking about a proposal that works whether or not the
first block of the table is in cache, because I see a problem with small
table access.

I'm not suggesting that we model caching effects (though we may choose
to later). If you did, you might need to consider cross-statement
effects such as the likelihood that a UPDATE .. WHERE CURRENT OF CURSOR
is more likely to find the block in cache, or other effects such as
certain MFVs might actually be more likely to be in cache than non-MFVs
and so index scans against them are actually more preferable than it
might appear.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] SeqScan costs

2008-08-13 Thread Zeugswetter Andreas OSB sIT
   Proposal: Make the first block of a seq scan cost random_page_cost, then
   after that every additional block costs seq_page_cost.

+1

 AFAICS the cost cross-over is much higher than the actual elapsed time
 cross-over for both narrow and wide tables.

Which makes absolute sense, since readahead can only reduce cost when you
read more than one page (and more than a few when lacking fadvise tech).

I am wondering about your test though. It was all cached, so it seems
we also underestimate the CPU cost of accessing and comparing all the rows
during seq scan.

 That's why using SET enable_seqscan=off helps performance in many cases,
 or why people reduce random_page_cost to force index selection.

Sequential scans that cause more IO than an alternate index path are often
counterproductive in highly concurrent scenarios.
In such scenarios it is really reasonable to lower random_page_cost.

Andreas



Re: [HACKERS] SeqScan costs

2008-08-13 Thread Decibel!

On Aug 12, 2008, at 4:52 PM, Andrew Gierth wrote:

Tom == Tom Lane [EMAIL PROTECTED] writes:



Proposal: Make the first block of a seq scan cost
random_page_cost, then after that every additional block costs
seq_page_cost.


 Tom This is only going to matter for a table of 1 block (or at least
 Tom very few blocks), and for such a table it's highly likely that
 Tom it's in RAM anyway.  So I'm unconvinced that the proposed change
 Tom represents a better model of reality.

Simple example which demonstrates a 10x speed improvement for index
scan over seqscan for a 1-block table (on 8.3.3):

create table oneblock (id integer primary key, value text not null);
insert into oneblock select i, 'row ' || i from generate_series(1,200) i;

test= select pg_relation_size('oneblock');
 pg_relation_size 
------------------
             8192

analyze oneblock;

set enable_seqscan=true;

select (select value from oneblock where id = i)
  from generate_series(1,200) i, generate_series(1,5000) j;
Time: 25596.709 ms  (that's 25.6 us per row)

set enable_seqscan=false;

select (select value from oneblock where id = i)
  from generate_series(1,200) i, generate_series(1,5000) j;
Time: 2415.691 ms   (that's 2.4 us per row)


Roughly what I get on my MBP (I'm about a factor of 2 slower). This  
makes me think it's an issue of having to slog through an entire page  
one row at a time vs just using the index. To test this I tested  
selecting i=200 (remember we start filling data at the back of the  
page, so 200 would actually be the front, and I'm assuming the first  
value that would be hit) vs i=1. With seqscans, I saw about a 10%  
difference. With index scans the difference was moot, but also note  
that now index scans are in-between seqscans in performance.


[EMAIL PROTECTED] explain analyze select (select value from oneblock where id = 200)
  from generate_series(1,200) i, generate_series(1,50) j;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=17.00..20029.50 rows=100 width=0) (actual time=270.867..65821.373 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on oneblock  (cost=0.00..3.50 rows=1 width=7) (actual time=0.052..0.053 rows=1 loops=1)
           Filter: (id = 200)
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=0.062..0.351 rows=200 loops=1)
   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0) (actual time=1.368..164.634 rows=50 loops=200)
         ->  Function Scan on generate_series j  (cost=0.00..12.50 rows=1000 width=0) (actual time=270.743..459.335 rows=50 loops=1)
 Total runtime: 79055.822 ms
(8 rows)

[EMAIL PROTECTED] explain analyze select (select value from oneblock where id = 1)
  from generate_series(1,200) i, generate_series(1,50) j;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=17.00..20029.50 rows=100 width=0) (actual time=261.941..72937.226 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on oneblock  (cost=0.00..3.50 rows=1 width=7) (actual time=0.025..0.056 rows=1 loops=1)
           Filter: (id = 1)
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=0.060..0.346 rows=200 loops=1)
   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0) (actual time=1.375..182.474 rows=50 loops=200)
         ->  Function Scan on generate_series j  (cost=0.00..12.50 rows=1000 width=0) (actual time=261.815..448.652 rows=50 loops=1)
 Total runtime: 87702.315 ms
(8 rows)

[EMAIL PROTECTED] set enable_seqscan =off;
SET
[EMAIL PROTECTED] explain analyze select (select value from oneblock where id = 200)
  from generate_series(1,200) i, generate_series(1,50) j;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=21.77..20034.27 rows=100 width=0) (actual time=262.219..69851.786 rows=1 loops=1)
   InitPlan
     ->  Index Scan using oneblock_pkey on oneblock  (cost=0.00..8.27 rows=1 width=7) (actual time=0.024..0.026 rows=1 loops=1)
           Index Cond: (id = 200)
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=0.062..0.355 rows=200 loops=1)
   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0) (actual time=1.325..174.314 rows=50 loops=200)
         ->  Function Scan on generate_series j  (cost=0.00..12.50 rows=1000 width=0) (actual time=262.119..449.383 rows=50 loops=1)
 Total runtime: 83024.952 ms
(8 rows)

[EMAIL 

Re: [HACKERS] SeqScan costs

2008-08-13 Thread Andrew Gierth
 Decibel! == Decibel!  [EMAIL PROTECTED] writes:

 Decibel Roughly what I get on my MBP (I'm about a factor of 2
 Decibel slower). This makes me think it's an issue of having to slog
 Decibel through an entire page one row at a time vs just using the
 Decibel index. To test this I tested selecting i=200 (remember we
 Decibel start filling data at the back of the page, so 200 would
 Decibel actually be the front, and I'm assuming the first value that
 Decibel would be hit) vs i=1. With seqscans, I saw about a 10%
 Decibel difference. With index scans the difference was moot, but
 Decibel also note that now index scans are in-between seqscans in
 Decibel performance.

The problem is that by looking for a constant row, you're actually
eliminating the entire effect being tested, because the uncorrelated
subselect is run ONCE as an initplan, and the entire query time is
then spent elsewhere. The differences in runtime you're seeing are
pure noise (the fact that you had to increase the iteration count so
much should have been a clue here).

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] SeqScan costs

2008-08-13 Thread Decibel!
On Wed, Aug 13, 2008 at 07:33:40PM +0100, Andrew Gierth wrote:
 The following message is a courtesy copy of an article
 that has been posted to pgsql.hackers as well.
 
  Decibel! == Decibel!  [EMAIL PROTECTED] writes:
 
  Decibel Roughly what I get on my MBP (I'm about a factor of 2
  Decibel slower). This makes me think it's an issue of having to slog
  Decibel through an entire page one row at a time vs just using the
  Decibel index. To test this I tested selecting i=200 (remember we
  Decibel start filling data at the back of the page, so 200 would
  Decibel actually be the front, and I'm assuming the first value that
  Decibel would be hit) vs i=1. With seqscans, I saw about a 10%
  Decibel difference. With index scans the difference was moot, but
  Decibel also note that now index scans are in-between seqscans in
  Decibel performance.
 
 The problem is that by looking for a constant row, you're actually
 eliminating the entire effect being tested, because the uncorrelated
 subselect is run ONCE as an initplan, and the entire query time is
 then spent elsewhere. The differences in runtime you're seeing are
 pure noise (the fact that you had to increase the iteration count so
 much should have been a clue here).

Makes sense, and yeah, I was wondering a bit about that. I'll try to
fake it out with offset 0 later on if no one beats me to it; I do still
think we could just be seeing the effect of slogging through 200 tuples
instead of going directly to the one we want.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




Re: [HACKERS] SeqScan costs

2008-08-13 Thread Gregory Stark
Decibel! [EMAIL PROTECTED] writes:

 Makes sense, and yeah, I was wondering a bit about that. I'll try to
 fake it out with offset 0 later on if no one beats me to it; I do still
 think we could just be seeing the effect of slogging through 200 tuples
 instead of going directly to the one we want.

Of course index lookups aren't magic. You don't get to go *directly* to the
one you want. You still have to slog through index tuples to find the right
pointer.

That means going to the index meta page, find the fast root pointer, look up
that page, look at the single leaf page pointer, look up that page, and do a
binary search of the 200 leaf pointers. Once you find the resulting match,
look up the heap page and *then* go directly to the right tuple.

So that means up to four trips to the buffer cache, trips through lwlocks,
etc. And it still means up to 9 btree comparisons. Still less than 200 but
it's not entirely free.
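
If the contrib/pageinspect module is loaded (an assumption; it ships with 8.3),
you can see this on the one-page index directly: the metapage points at a
single block that is both root and fast root, and its 200 items are what the
binary search walks:

select * from bt_metap('oneblock_pkey');           -- root = fastroot = the one leaf block
select * from bt_page_items('oneblock_pkey', 1);   -- the 200 index tuples on that block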

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [HACKERS] SeqScan costs

2008-08-13 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 That means going to the index meta page, find the fast root pointer, look up
 that page, look at the single leaf page pointer, look up that page, and do a
 binary search of the 200 leaf pointers. Once you find the resulting match,
 look up the heap page and *then* go directly to the right tuple.

Actually, the metapage access has been cached for some time, and there's
not going to be a separate root page if you only have 1 page worth of
index entries.  But yeah, for large indexes there are going to be
multiple page accesses before you find what you want.

regards, tom lane



Re: [HACKERS] SeqScan costs

2008-08-13 Thread Decibel!

On Aug 13, 2008, at 3:52 PM, Decibel! wrote:

The problem is that by looking for a constant row, you're actually
eliminating the entire effect being tested, because the uncorrelated
subselect is run ONCE as an initplan, and the entire query time is
then spent elsewhere. The differences in runtime you're seeing are
pure noise (the fact that you had to increase the iteration count so
much should have been a clue here).


Makes sense, and yeah, I was wondering a bit about that. I'll try to
fake it out with offset 0 later on if no one beats me to it; I do  
still
think we could just be seeing the effect of slogging through 200  
tuples

instead of going directly to the one we want.



OK, ran the test again via this query:

explain analyze select (select value from oneblock where id = i) from  
generate_series(1,1) i, generate_series(1,10) j;


changing 1,1 to 200,200 as needed. I don't see any meaningful  
differences between 1,1 and 200,200. The seqscan case is still  
notably slower than the index case (~5500ms vs ~800ms).


It'd be useful to get strace data on this, but OS X doesn't have  
that :/ (and I'm on 10.4 so no dtrace either). Can someone get an  
strace from this?

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] SeqScan costs

2008-08-13 Thread Tom Lane
Decibel! [EMAIL PROTECTED] writes:
 It'd be useful to get strace data on this, but OS X doesn't have  
 that :/ (and I'm on 10.4 so no dtrace either).

See ktrace.

regards, tom lane



Re: [HACKERS] SeqScan costs

2008-08-13 Thread Andrew Gierth
 Decibel == Decibel!  [EMAIL PROTECTED] writes:

 Decibel OK, ran the test again via this query:

 Decibel explain analyze select (select value from oneblock where id = i)
 Decibel from generate_series(1,1) i, generate_series(1,10) j;

 Decibel changing 1,1 to 200,200 as needed. I don't see any
 Decibel meaningful differences between 1,1 and 200,200.

Well of course you don't, since it scans all the rows regardless.
(Scalar subselects don't abort after the first row, they only abort if
they find _more_ than one row, and in this example there is only one,
so the whole of oneblock is scanned every time.)

You could likely expose a difference using LIMIT 1 in the subselect,
but that doesn't tell us anything we didn't already know (which is
that yes, index scan is much faster than seqscan even for 1-block
tables, except in the rare case when neither the index page nor the
table page are in cache, causing the indexscan to take two page
fetches rather than just one).

Oddly enough, when I try it with LIMIT 1, it _does_ show a significant
speed difference according to the row position, _but_ the index scan
is still twice as fast even when fetching only row 1 (which is indeed
physically first).

-- 
Andrew (irc:RhodiumToad)



[HACKERS] SeqScan costs

2008-08-12 Thread Simon Riggs

If we access a 1 block table using a SeqScan then it costs
seq_page_cost, or 1 by default.

If we access the same 1 block table using an IndexScan then the access
costs random_page_cost to access the index block and then
random_page_cost to access the data block.

So the same block accessed for different reasons is judged to have two
different costs. But that clearly must be wrong, since the I/O is a 1
block I/O in either case.

This leads to favouring seq scans in cases where the actual index scan
time is less than the seq scan time.

Proposal: Make the first block of a seq scan cost random_page_cost, then
after that every additional block costs seq_page_cost.
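
For a 1 block table this amounts to charging random_page_cost for the whole
scan, so the effect on plan choice can be previewed without any code change by
temporarily pricing sequential pages like random ones (sketch only; tinytable
stands for any one-page table with an index on id):

set seq_page_cost = 4;   -- i.e. the default random_page_cost, this session only
explain select * from tinytable where id = 42;
reset seq_page_cost;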

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] SeqScan costs

2008-08-12 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Proposal: Make the first block of a seq scan cost random_page_cost, then
 after that every additional block costs seq_page_cost.

This is only going to matter for a table of 1 block (or at least very
few blocks), and for such a table it's highly likely that it's in RAM
anyway.  So I'm unconvinced that the proposed change represents a
better model of reality.

Perhaps more to the point, you haven't provided any actual evidence
that this is a better approach.  I'm disinclined to tinker with the
fundamental cost models on the basis of handwaving.

regards, tom lane



Re: [HACKERS] SeqScan costs

2008-08-12 Thread Simon Riggs

On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Proposal: Make the first block of a seq scan cost random_page_cost, then
  after that every additional block costs seq_page_cost.
 
 This is only going to matter for a table of 1 block (or at least very
 few blocks), and for such a table it's highly likely that it's in RAM
 anyway.  So I'm unconvinced that the proposed change represents a
 better model of reality.

The access cost should be the same for a 1 block table, whether it's on
disk or in memory.

 Perhaps more to the point, you haven't provided any actual evidence
 that this is a better approach.  I'm disinclined to tinker with the
 fundamental cost models on the basis of handwaving.

I've written a simple test suite

psql -f seq.sql -v numblocks=x -v pkval=y -v filler=z

to investigate various costs and elapsed times.

AFAICS the cost cross-over is much higher than the actual elapsed time
cross-over for both narrow and wide tables. 

That's why using SET enable_seqscan=off helps performance in many cases,
or why people reduce random_page_cost to force index selection.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

drop table if exists block:numblocks;
create table block:numblocks (col1 integer not null, col2 text not null);
insert into block:numblocks
select generate_series(1,8192*:numblocks/(48+:filler))
,repeat('a', :filler);
create unique index idx_block:numblocks on block:numblocks (col1);
analyze block:numblocks;
select pg_relation_size('block' || (:numblocks)::text);
\timing 
\pset pager off
explain
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;

set enable_seqscan = off;

explain
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;
select * from block:numblocks where col1 = :pkval;




Re: [HACKERS] SeqScan costs

2008-08-12 Thread Andrew Gierth
 Tom == Tom Lane [EMAIL PROTECTED] writes:

  Proposal: Make the first block of a seq scan cost
  random_page_cost, then after that every additional block costs
  seq_page_cost.

 Tom This is only going to matter for a table of 1 block (or at least
 Tom very few blocks), and for such a table it's highly likely that
 Tom it's in RAM anyway.  So I'm unconvinced that the proposed change
 Tom represents a better model of reality.

Simple example which demonstrates a 10x speed improvement for index
scan over seqscan for a 1-block table (on 8.3.3):

create table oneblock (id integer primary key, value text not null); 
insert into oneblock select i, 'row ' || i from generate_series(1,200) i;

test= select pg_relation_size('oneblock');
 pg_relation_size 
------------------
             8192

analyze oneblock;

set enable_seqscan=true;

select (select value from oneblock where id = i)
  from generate_series(1,200) i, generate_series(1,5000) j;
Time: 25596.709 ms  (that's 25.6 us per row)

set enable_seqscan=false;

select (select value from oneblock where id = i)
  from generate_series(1,200) i, generate_series(1,5000) j;
Time: 2415.691 ms   (that's 2.4 us per row)

(removing the subselect entirely gives 0.4us per row, so it's actually
about a 12x speed difference for the subselect alone.)

The planner costs the seqscan at 3.50 and the indexscan at 8.27.
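
For reference, the 3.50 is just the default per-page and per-row charges added
up, while the 8.27 is dominated by two random page fetches; rough arithmetic
(mine, not the planner's):

select 1 * 1.0 + 200 * 0.01 + 200 * 0.0025 as seqscan_cost;  -- 1 page + 200 tuples + 200 qual evals = 3.50
-- indexscan: ~4.0 (index page) + ~4.0 (heap page) + small per-tuple CPU charges, about 8.27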

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] SeqScan costs

2008-08-12 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Proposal: Make the first block of a seq scan cost random_page_cost, then
  after that every additional block costs seq_page_cost.
 
 This is only going to matter for a table of 1 block (or at least very
 few blocks), and for such a table it's highly likely that it's in RAM
 anyway.  So I'm unconvinced that the proposed change represents a
 better model of reality.

I think the first block of a sequential scan is clearly a random access. If
that doesn't represent reality well then perhaps we need to tackle both
problems together. Somehow we need to discount scan i/o cost based on how much
of the table we expect to be in cache. For 1-block tables, if we expect them
to be in cache we should be zeroing out all the i/o cost, whether random or
sequential.

 The access cost should be the same for a 1 block table, whether it's on
 disk or in memory.

Uhm, huh? That can't be what you meant to write?

 AFAICS the cost cross-over is much higher than the actual elapsed time
 cross-over for both narrow and wide tables. 

 That's why using SET enable_seqscan=off helps performance in many cases,
 or why people reduce random_page_cost to force index selection.

People lower random_page_cost because we're not doing a good job estimating
how much of a table is in cache. I think that would be a great target for some
careful analysis. If you can come up with specific places and reasonable
heuristics to discount i/o costs based on effective_cache_size and then
demonstrate cases where it produces consistently better cost estimates that
would be a huge help.

I've been running benchmarks where I see accurate random_page_costs of 13-80
on uncached data on a moderate sized raid array. But of course when some of
the data is cached the effective random_page_cost is much much lower than
that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] SeqScan costs

2008-08-12 Thread Jeff Davis
On Tue, 2008-08-12 at 23:58 +0100, Gregory Stark wrote:
 People lower random_page_cost because we're not doing a good job
  estimating how much of a table is in cache. 

Is it because of a bad estimate about how much of a table is in cache,
or a bad assumption about the distribution of access to a table?

If the planner were to know, for example, that 10-20% of the table is
likely to be in cache, will that really make a difference in the plan? I
suspect that it would mostly only matter when the entire table is
cached, the correlation is low, and the query is somewhat selective
(which is a possible use case, but fairly narrow).

I suspect that this has more to do with the fact that some data is
naturally going to be accessed much more frequently than other data in a
large table. But how do you determine, at plan time, whether the query
will be mostly accessing hot data, or cold data?

Regards,
Jeff Davis





Re: [HACKERS] SeqScan costs

2008-08-12 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
 This is only going to matter for a table of 1 block (or at least very
 few blocks), and for such a table it's highly likely that it's in RAM
 anyway.  So I'm unconvinced that the proposed change represents a
 better model of reality.

 I think the first block of a sequential scan is clearly a random access. If
 that doesn't represent reality well then perhaps we need to tackle both
 problems together.

The point I was trying to make (evidently not too well) is that fooling
around with fundamental aspects of the cost models is not something that
should be done without any evidence.  We've spent ten years getting the
system to behave reasonably well with the current models, and it's quite
possible that changing them to be more accurate according to a
five-minute analysis is going to make things markedly worse overall.

I'm not necessarily opposed to making this change --- it does sound
kinda plausible --- but I want to see some hard evidence that it does
more good than harm before we put it in.

 People lower random_page_cost because we're not doing a good job estimating
 how much of a table is in cache.

Agreed, the elephant in the room is that we lack enough data to model
caching effects with any degree of realism.

regards, tom lane



Re: [HACKERS] SeqScan costs

2008-08-12 Thread Simon Riggs

On Tue, 2008-08-12 at 23:58 +0100, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
 
  On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   Proposal: Make the first block of a seq scan cost random_page_cost, then
   after that every additional block costs seq_page_cost.
  
  This is only going to matter for a table of 1 block (or at least very
  few blocks), and for such a table it's highly likely that it's in RAM
  anyway.  So I'm unconvinced that the proposed change represents a
  better model of reality.
 
 I think the first block of a sequential scan is clearly a random access. 

Agreed

 If
 that doesn't represent reality well then perhaps we need to tackle both
 problems together. 

It does represent reality.

 Somehow we need to discount scan i/o cost based on how much
 of the table we expect to be in cache. For 1-block tables, if we expect
 them to be in cache we should be zeroing out all the i/o cost, whether random
 or sequential.

This isn't relevant. 

  The access cost should be the same for a 1 block table, whether it's on
  disk or in memory.
 
 Uhm, huh? That can't be what you meant to write?

It can be. The *comparative* access cost (in the optimizer) between
random and sequential access should be exactly the same for a 1 block
table whether the block is on disk or in memory. In reality the block
access time is identical for a 1 block table at each level of the
storage hierarchy:

* in shared_buffers (ReadBuffer)
* in filesystem cache (read() to cache)
* on disk (read() to disk)

  AFAICS the cost cross-over is much higher than the actual elapsed time
  cross-over for both narrow and wide tables. 
 
  That's why using SET enable_seqscan=off helps performance in many cases,
  or why people reduce random_page_cost to force index selection.
 
 People lower random_page_cost because we're not doing a good job estimating
 how much of a table is in cache. I think that would be a great target for some
 careful analysis. If you can come up with specific places and reasonable
 heuristics to discount i/o costs based on effective_cache_size and then
 demonstrate cases where it produces consistently better cost estimates that
 would be a huge help.
 
 I've been running benchmarks where I see accurate random_page_costs of 13-80
  on uncached data on a moderate sized raid array. But of course when some of
 the data is cached the effective random_page_cost is much much lower than
 that.

Agreed, but that's a harder problem and nothing I wish to raise here.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

