Re: [HACKERS] The science of optimization in practical terms?

2009-02-20 Thread decibel

On Feb 17, 2009, at 11:23 PM, Robert Haas wrote:

Actually, a simple algorithm that might work really well would be to
calculate relation cache odds as ( number of page accesses for relation /
number of page accesses for all relations ) * ( sum(relpages)*BLKSZ /
eff_cache_size ), where number of page accesses would be both from
relcache and not.


I don't think that formula makes any sense.  If effective_cache_size
is in the denominator, then increasing it will make the odds of
finding the page in cache go down.


Yes, sorry... I got that part of the equation upside-down. It should be:

( number of page accesses for relation / number of page accesses for  
all relations ) * ( eff_cache_size / sum(relpages)*BLKSZ )



One thing this doesn't address though is the report from a few
months ago that accessing small tables is still faster with an index scan,
even if we know the whole thing is in cache (I don't remember if that was
ever resolved...)


I'm not sure if this is what you're referring to, but there was a
relatively recent post on, I believe, -performance, where a bitmap
index scan that hit almost the entire table beat out a seqscan.  I
don't think there was any further discussion and I'm still mystified
as to how it's possible.



What I was thinking of was that when dealing with a very small table  
(one or maybe a few pages), the planner thinks that a seqscan is the  
fastest way to get a single row, but it's actually faster to use an  
index. The bitmap case is even more interesting. Something is  
seriously screwy with small seqscans it seems.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-20 Thread Robert Haas
On Fri, Feb 20, 2009 at 7:25 PM, decibel deci...@decibel.org wrote:
 On Feb 17, 2009, at 11:23 PM, Robert Haas wrote:

 Actually, a simple algorithm that might work really well would be to
 calculate relation cache odds as ( number of page accesses for relation /
 number of page accesses for all relations ) * ( sum(relpages)*BLKSZ /
 eff_cache_size ), where number of page accesses would be both from
 relcache
 and not.

 I don't think that formula makes any sense.  If effective_cache_size
 is in the denominator, then increasing it will make the odds of
 finding the page in cache go down.

 Yes, sorry... I got that part of the equation upside-down. It should be:

 ( number of page accesses for relation / number of page accesses for all
 relations ) * ( eff_cache_size / sum(relpages)*BLKSZ )

Well, that makes more sense, but it's still not right.  Suppose I have
ten equal-sized relations whose total size is equal to
effective_cache_size.  Relations 1-5 each get 15% of the page accesses
and relations 6-10 each get 5% of the page accesses.  Under your
formula, relations 1-5 will be 150% in cache and relations 6-10 will
be 50% in cache.  In reality, assuming sufficient frequency of access,
100% of all ten relations will be in cache.

I don't think there's any way to do this that doesn't involve some
sort of iterative process.  What you need to do is compute (# of page
accesses for this relation / number of page accesses for all
relations) * effective_cache_size, dole out that amount of cache to it
(capped at 100% of the relation size), and then decrement
effective_cache_size by the amount of cache you doled out and the
number of page accesses by the number for that relation, and then
rerun for the second-most-popular relation.

For example, suppose (in the example above) that effective_cache_size
= 1GB and there are 10K page accesses total.

Relation 1: MIN(1.5K/10K * 1GB, 100MB) = MIN(150MB, 100MB) = 100MB
Relation 2: MIN(1.5K/8.5K * 900MB, 100MB) = MIN(159MB, 100MB) = 100MB
Relation 3: MIN(1.5K/7K * 800MB, 100MB) = MIN(171MB, 100MB) = 100MB
Relation 4: MIN(1.5K/5.5K * 700MB, 100MB) = MIN(190MB, 100MB) = 100MB
Relation 5: MIN(1.5K/4K * 600MB, 100MB) = MIN(225MB, 100MB) = 100MB
Relation 6: MIN(0.5K/2.5K * 500MB, 100MB) = MIN(100MB, 100MB) = 100MB
Relation 7: MIN(0.5K/2.0K * 400MB, 100MB) = MIN(100MB, 100MB) = 100MB
Relation 8: MIN(0.5K/1.5K * 300MB, 100MB) = MIN(100MB, 100MB) = 100MB
Relation 9: MIN(0.5K/1.0K * 200MB, 100MB) = MIN(100MB, 100MB) = 100MB
Relation 10: MIN(0.5K/0.5K * 100MB, 100MB) = MIN(100MB, 100MB) = 100MB
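
For illustration, here is a minimal standalone C sketch of that apportioning
loop (hypothetical struct and names, megabyte units as in the example above;
this is not proposed planner code):

#include <stdio.h>

typedef struct
{
    const char *name;
    double      accesses;       /* page accesses attributed to this relation */
    double      size_mb;        /* physical size of the relation in MB */
} RelStats;

/* Apportion cache_mb among rels[], most popular first, capping at rel size. */
static void
apportion_cache(RelStats *rels, int nrels, double cache_mb)
{
    double      remaining_cache = cache_mb;
    double      remaining_accesses = 0.0;
    int         i;

    for (i = 0; i < nrels; i++)
        remaining_accesses += rels[i].accesses;

    for (i = 0; i < nrels; i++)
    {
        double      share = rels[i].accesses / remaining_accesses *
                            remaining_cache;
        double      granted = (share < rels[i].size_mb) ? share : rels[i].size_mb;

        printf("%s: %.0f MB cached, predicted hit ratio %.2f\n",
               rels[i].name, granted, granted / rels[i].size_mb);

        remaining_cache -= granted;
        remaining_accesses -= rels[i].accesses;
    }
}

int
main(void)
{
    /* the example above: 1GB of cache, 10K accesses, ten 100MB relations */
    RelStats    rels[10];
    char        names[10][8];
    int         i;

    for (i = 0; i < 10; i++)
    {
        snprintf(names[i], sizeof(names[i]), "rel%d", i + 1);
        rels[i].name = names[i];
        rels[i].accesses = (i < 5) ? 1500.0 : 500.0;    /* 15% vs 5% each */
        rels[i].size_mb = 100.0;
    }

    apportion_cache(rels, 10, 1000.0);
    return 0;
}

Run against the ten-relation example above, it grants each relation 100MB,
i.e. a predicted hit ratio of 1.0 across the board.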

 One thing this doesn't address though is the report from a few
 months ago that accessing small tables is still faster with an index
 scan,
 even if we know the whole thing is in cache (I don't remember if that was
 ever resolved...)

 I'm not sure if this is what you're referring to, but there was a
 relatively recent post on, I believe, -performance, where a bitmap
 index scan that hit almost the entire table beat out a seqscan.  I
 don't think there was any further discussion and I'm still mystified
 as to how it's possible.

 What I was thinking of was that when dealing with a very small table (one or
 maybe a few pages), the planner thinks that a seqscan is the fastest way to
 get a single row, but it's actually faster to use an index. The bitmap case
 is even more interesting. Something is seriously screwy with small seqscans
 it seems.

Do you have a good test case for this?  I'd like to poke at it.  It's
really just because the planner thinks that accessing the index pages
will cost a disk read, which is often false in practice.  Does it help
if you set random_page_cost = seq_page_cost = 0.2?

The case I mentioned is qualitatively different because not only is
the planner wrong, but the observed behavior is somewhat inexplicable.
 I have a feeling this may have to do with the fact that bitmap
indices can identify individual tuples on the page when the tbm is
non-lossy.  Consulting the index (which is almost free if the page is
already in shared_buffers) not only finds the right page, but lets you
skip the CPU overhead of testing the quals against the irrelevant
tuples on that page.  But we need to do some better testing here to
figure out what is really going on.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Robert Haas
On Wed, Feb 18, 2009 at 1:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm interested to know whether anyone else shares my belief that
 nested loops are the cause of most really bad plans.  What usually
 happens to me is that the planner develops some unwarranted optimism
 about the number of rows likely to be generated by the outer side of
 the join and decides that it's not worth sorting the inner side or
 building a hash table or using an index, and that the right thing to
 do is just rescan the inner node on every pass.  When the outer side
 returns three or four orders of magnitude more results than expected,
 ka-pow!

 And then there is the other half of the world, who complain because it
 *didn't* pick a nestloop for some query that would have run in much less
 time if it had.

Well, that's my question: is that really the other half of the world,
or is it the other 5% of the world?  And how does it happen?  In my
experience, most bad plans are caused by bad selectivity estimates,
and the #1 source of bad selectivity estimates is selectivity
estimates for unknown expressions.

(Now it appears that Josh is having problems that are caused by
overestimating the cost of a page fetch, perhaps due to caching
effects.  Those are discussed upthread, and I'm still interested to
see whether we can arrive at any sort of consensus about what might be
a reasonable approach to attacking that problem.  My own experience
has been that this problem is not quite as bad, because it can throw
the cost off by a factor of 5, but not by a factor of 800,000, as in
my example of three unknown expressions with a combined selectivity of
0.1.)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Sam Mason
On Wed, Feb 18, 2009 at 01:34:25AM -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I'm interested to know whether anyone else shares my belief that
  nested loops are the cause of most really bad plans.  What usually
  happens to me is that the planner develops some unwarranted optimism
  about the number of rows likely to be generated by the outer side of
  the join and decides that it's not worth sorting the inner side or
  building a hash table or using an index, and that the right thing to
  do is just rescan the inner node on every pass.  When the outer side
  returns three or four orders of magnitude more results than expected,
  ka-pow!
 
 And then there is the other half of the world, who complain because it
 *didn't* pick a nestloop for some query that would have run in much less
 time if it had.

There's something called interval arithmetic I heard about recently
that may help here.  The basic idea is to store two values, representing
the upper and lower bounds of a number, instead of its absolute value.
That way you know that the number is going to be somewhere in the middle
and round-off becomes a non-issue because you know when it's happened
(e.g. the FPU is set up to always round the lower bound down and the
upper bound up).  Round-off isn't a problem here, but it's one of the
algebra's nice properties.

If the planning was done with some sort of interval then you'd be
able to encode information about how well your stats characterized the
underlying data.  Traditionally awkward things like amount of cache
would serve to drop the lower bound, but not alter the upper.  The
planner would then automatically propagate performance information through the
calculations, i.e. a nested loop with a tight estimate on a small number
of rows joined to a table with a wider estimate of a small number of
rows would keep the low lower bound but the upper-bound would tend to
make the planner stay away.

That said, I can't decide if that would help in any meaningful way!  A
good counter argument seems to be why not just always go with the upper
bound?  There are extensions to vanilla interval arithmetic that allow
distributions to be modeled instead of just an unknown interval.  You'd
then be able to use some sort of 95% confidence interval instead of a
horribly pessimistic worst case.
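
To make the idea concrete, a toy C sketch of interval-valued selectivities
(purely illustrative; the struct, the bounds chosen, and the combination rule
are assumptions rather than an existing planner facility):

#include <stdio.h>

/* An interval-valued selectivity estimate: the true value is assumed to
 * lie somewhere in [lo, hi]. */
typedef struct
{
    double      lo;
    double      hi;
} SelInterval;

/* Combine two independent quals by multiplying both bounds. */
static SelInterval
sel_and(SelInterval a, SelInterval b)
{
    SelInterval r;

    r.lo = a.lo * b.lo;
    r.hi = a.hi * b.hi;
    return r;
}

int
main(void)
{
    SelInterval known = {0.09, 0.11};       /* a well-characterized qual */
    SelInterval unknown = {0.0005, 0.5};    /* an "unknown expression" guess */
    SelInterval combined = known;
    int         i;

    for (i = 0; i < 3; i++)
    {
        combined = sel_and(combined, unknown);
        printf("after %d unknown quals: [%g, %g]\n",
               i + 1, combined.lo, combined.hi);
    }
    return 0;
}

Tom's objection further down in the thread shows up even in this toy: after
ANDing just three loosely-bounded quals the result spans roughly
[1e-11, 0.014], already too wide to act on.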

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Robert Haas
 If the planning was done with some sort of interval then you'd be
 able to encode information about how well your stats characterized the
 underlying data.  Traditionally awkward things like amount of cache
 would serve to drop the lower bound, but not alter the upper.  The
 planner would then automatically propagate performance information through the
 calculations, i.e. a nested loop with a tight estimate on a small number
 of rows joined to a table with a wider estimate of a small number of
 rows would keep the low lower bound but the upper-bound would tend to
 make the planner stay away.

Yeah, I thought about this too, but it seems like overkill for the
problem at hand, and as you say it's not clear you'd get any benefit
out of the upper bound anyway.  I was thinking of something simpler:
instead of directly multiplying 0.005 into the selectivity every time
you find something incomprehensible, keep a count of the number of
incomprehensible things you saw and at the end multiply by 0.005/N.
That way more unknown quals look more restrictive than fewer, but
things only get linearly wacky instead of exponentially wacky.
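
A sketch of what that could look like (illustrative only; the real
clauselist_selectivity works over restriction-clause lists, and the way
default estimates are flagged here is an assumption):

#include <stdio.h>

#define DEFAULT_UNKNOWN_SEL 0.005

/*
 * Combine per-clause selectivities.  Clauses flagged as defaults ("we had no
 * clue") are only counted; a single factor of 0.005/N is applied at the end
 * instead of multiplying 0.005 in once per clause.
 */
static double
combine_selectivities(const double *sels, const int *is_default, int n)
{
    double      result = 1.0;
    int         ndefaults = 0;
    int         i;

    for (i = 0; i < n; i++)
    {
        if (is_default[i])
            ndefaults++;
        else
            result *= sels[i];
    }

    if (ndefaults > 0)
        result *= DEFAULT_UNKNOWN_SEL / ndefaults;

    return result;
}

int
main(void)
{
    /* three incomprehensible quals, as in the id % 2 example upthread */
    double      sels[3] = {DEFAULT_UNKNOWN_SEL, DEFAULT_UNKNOWN_SEL,
                           DEFAULT_UNKNOWN_SEL};
    int         defaults[3] = {1, 1, 1};

    printf("naive product: %g\n", 0.005 * 0.005 * 0.005);
    printf("0.005/N rule:  %g\n", combine_selectivities(sels, defaults, 3));
    return 0;
}

On the 10^6-row table from the id % 2 example upthread this moves the row
estimate from 1 (the clamped 0.005^3 product) to about 1,700: still a guess,
but one that degrades linearly rather than exponentially with the number of
unknown quals.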

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Yeah, I thought about this too, but it seems like overkill for the
 problem at hand, and as you say it's not clear you'd get any benefit
 out of the upper bound anyway.  I was thinking of something simpler:
 instead of directly multiplying 0.005 into the selectivity every time
 you find something incomprehensible, keep a count of the number of
 incomprehensible things you saw and at the end multiply by 0.005/N.
 That way more unknown quals look more restrictive than fewer, but
 things only get linearly wacky instead of exponentially wacky.

clauselist_selectivity could perhaps apply such a heuristic, although
I'm not sure how it could recognize default estimates from the various
specific estimators, since they're mostly all different.

Personally I've not seen all that many practical cases where the
estimator simply hasn't got a clue at all.  What's far more commonly
complained of IME is failure to handle *correlated* conditions in
an accurate fashion.  Maybe we should just discount the product
selectivity all the time, not only when we think the components are
default estimates.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Robert Haas
On Wed, Feb 18, 2009 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Yeah, I thought about this too, but it seems like overkill for the
 problem at hand, and as you say it's not clear you'd get any benefit
 out of the upper bound anyway.  I was thinking of something simpler:
 instead of directly multiplying 0.005 into the selectivity every time
 you find something incomprehensible, keep a count of the number of
 incomprehensible things you saw and at the end multiply by 0.005/N.
 That way more unknown quals look more restrictive than fewer, but
 things only get linearly wacky instead of exponentially wacky.

 clauselist_selectivity could perhaps apply such a heuristic, although
 I'm not sure how it could recognize default estimates from the various
 specific estimators, since they're mostly all different.

Presumably the estimators would need to be modified to provide some
information on their level of confidence in their estimate (possibly
this could be more general than whether the number is a default or
not, though I'm not sure what we'd do with that information).  But it
may not be necessary to go through that pain if we implement your idea
below.

 Personally I've not seen all that many practical cases where the
 estimator simply hasn't got a clue at all.  What's far more commonly
 complained of IME is failure to handle *correlated* conditions in
 an accurate fashion.  Maybe we should just discount the product
 selectivity all the time, not only when we think the components are
 default estimates.

That has something going for it, although off the top of my head I'm
not sure exactly what formula would make sense.  Presumably we want
the overall selectivity estimate to be less than the estimate for each
clause taken individually, but how much less?  It doesn't
seem right to estimate the selectivity of S_1...S_n as MIN(S_1 ...
S_n) / n, because that will give you weird results with things like a
= 1 AND a != 2.  You might need to divide the estimates into two
buckets: those that reduce selectivity by a lot, and those that reduce
it only slightly, then multiply the latter bucket and, say, divide
through by the cardinality of the former bucket.  But the exact
details of the math are not obvious to me.

I'm talking off the top of my head here, maybe you have a more clear
thought as to how this would work?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Joshua D. Drake
On Wed, 2009-02-18 at 07:50 -0500, Robert Haas wrote:

 (Now it appears that Josh is having problems that are caused by
 overestimating the cost of a page fetch, perhaps due to caching
 effects.  Those are discussed upthread, and I'm still interested to
 see whether we can arrive at any sort of consensus about what might be
 a reasonable approach to attacking that problem.  My own experience
 has been that this problem is not quite as bad, because it can throw
 the cost off by a factor of 5, but not by a factor of 800,000, as in
 my example of three unknown expressions with a combined selectivity of
 0.1.)
 

Well a very big problem with any solution is that we are creating a
solution for a 2% problem. 98% of the postgresql installations out there
will never need to adjust cpu_tuple_cost, cpu_index_tuple_cost,
cpu_operator_cost, random_page_cost etc... They can get by just fine
with a tweak to shared_buffers, work_mem, effective_cache_size and
default_statistics_target.

What I think should happen is to do some testing on normal installs
and see if upping those parameters to .5 (or some other amount) hinders those
98% installs. If it doesn't hinder those then we should up the default
and walk away.

Joshua D. Drake


 ...Robert
 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Ron Mayer
Robert Haas wrote:
 experience, most bad plans are caused by bad selectivity estimates,
 and the #1 source of bad selectivity estimates is selectivity
 estimates for unknown expressions.

ISTM unknown expressions should be modeled as a range of
values rather than one single arbitrary value.

For example, rather than just guessing 1000 rows, if an
unknown expression picked a wide range (say, 100 - 1
rows; or maybe even 1 - table_size), the planner could
choose a plan which wouldn't be pathologically slow
regardless of if the guess was too low or too high.

For that matter, if all estimates used a range
rather than a single value, ISTM that in general we would
produce less fragile plans.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Robert Haas
On Wed, Feb 18, 2009 at 2:46 PM, Ron Mayer
rm...@cheapcomplexdevices.com wrote:
 Robert Haas wrote:
 experience, most bad plans are caused by bad selectivity estimates,
 and the #1 source of bad selectivity estimates is selectivity
 estimates for unknown expressions.

 ISTM unknown expressions should be modeled as a range of
 values rather than one single arbitrary value.

 For example, rather than just guessing 1000 rows, if an
 unknown expression picked a wide range (say, 100 - 1
 rows; or maybe even 1 - table_size), the planner could
 choose a plan which wouldn't be pathologically slow
 regardless of if the guess was too low or too high.

 For that matter, if all estimates used a range
 rather than a single value, ISTM that in general we would
 produce less fragile plans.

It would be interesting to find out if something like this could be
made to work, but it's more than I'd be willing to bite off.  I think
this would require reworking large portions of the planner, and I am
doubtful that it could be done without a substantial loss of
performance.  The existing code considers A LOT of plans, to the point
where even a few more or fewer floating-point operations per plan
result in a change in planning time that can be measured in
macro-benchmarks.

If we could somehow tamp down the amount of time considering plans
that turn out to be dead ends, it might free up some time to perform
some of these other computations.  But I'm not sure how to go about
that.  The best ideas I've come up with so far involve refactoring
joinpath.c to eliminate some of the duplicate computation and/or
somehow be more intelligent about which nested loops we generate.  But
I haven't come up with anything yet that's demonstrably better than
the add_path patch that I submitted a few weeks ago, which is not bad
but not earth-shattering either.  At any rate, we'd need to save quite
a bit to pay for carting around best and worst case costs for every
plan we consider.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... At any rate, we'd need to save quite
 a bit to pay for carting around best and worst case costs for every
 plan we consider.

Another problem with this is it doesn't really do anything to solve the
problem we were just discussing, namely having an intelligent way of
combining inaccurate estimates for WHERE clauses.  If you just take a
range of plausible values and multiply then it doesn't take very many
clauses to get to a range of [0,1] --- or at least a range of
probabilities wide enough to be unhelpful.

An idea that I think has been mentioned before is to try to identify
cases where we can *prove* there is at most one row emitted by a
sub-path (eg, because of a unique index, DISTINCT subplan, etc).  Then
we could penalize nestloops with outer relations that weren't provably a
single row.  This is basically restricting the notion of estimation
confidence to a special case that's particularly important for SQL.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Robert Haas
On Wed, Feb 18, 2009 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... At any rate, we'd need to save quite
 a bit to pay for carting around best and worst case costs for every
 plan we consider.

 Another problem with this is it doesn't really do anything to solve the
 problem we were just discussing, namely having an intelligent way of
 combining inaccurate estimates for WHERE clauses.  If you just take a
 range of plausible values and multiply then it doesn't take very many
 clauses to get to a range of [0,1] --- or at least a range of
 probabilities wide enough to be unhelpful.

Yeah.

 An idea that I think has been mentioned before is to try to identify
 cases where we can *prove* there is at most one row emitted by a
 sub-path (eg, because of a unique index, DISTINCT subplan, etc).  Then
 we could penalize nestloops with outer relations that weren't provably a
 single row.  This is basically restricting the notion of estimation
 confidence to a special case that's particularly important for SQL.

I thought about this, too, and I agree.  Having this information
available would also be very helpful for join removal.  I believe that
you did some work on this for SEMI/ANTI-join support in the form of
query_is_distinct_for, but I'm not sure if that takes the right sort
of inputs for what we need here.  (It also doesn't seem to consider
the case of a baserel with a unique index for some reason...)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Simon Riggs

On Wed, 2009-02-18 at 15:32 -0500, Tom Lane wrote:

 An idea that I think has been mentioned before is to try to identify
 cases where we can *prove* there is at most one row emitted by a
 sub-path (eg, because of a unique index, DISTINCT subplan, etc).  Then
 we could penalize nestloops with outer relations that weren't provably a
 single row.  This is basically restricting the notion of estimation
 confidence to a special case that's particularly important for SQL.

Proof seems best way forward. IIRC the reason we didn't do this before
HOT is that unique index scans did often return many more than one row.
Now we have a much better chance of it being true.

As you say, propagation of error makes an error bars approach pointless
too quickly to be worth pursuing.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-18 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Wed, 2009-02-18 at 15:32 -0500, Tom Lane wrote:
 An idea that I think has been mentioned before is to try to identify
 cases where we can *prove* there is at most one row emitted by a
 sub-path (eg, because of a unique index, DISTINCT subplan, etc).

 Proof seems best way forward. IIRC the reason we didn't do this before
 HOT is that unique index scans did often return many more than one row.

But those extra tuples didn't make it up to the point of the join, so
they weren't really a problem for nestloop speed.  IMO the reason this
hasn't been done to date is that until recently we didn't have any
mechanism to ensure a plan got invalidated if some constraint it was
depending on (like uniqueness) went away.  Now we do, so it would be
safe to rely on the constraint for proof purposes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-17 Thread decibel

On Feb 15, 2009, at 9:54 PM, Robert Haas wrote:
On Sun, Feb 15, 2009 at 1:16 PM, Greg Smith gsm...@gregsmith.com wrote:

On Fri, 13 Feb 2009, Robert Haas wrote:

This seems plausible, but I'm not totally sold: predicting the
contents of the operating system buffer cache sounds like it might be
pretty tough.  And do we even need to go that far?   I'm kind of
wondering whether we might be able to leverage the information that
the statistics collector already gathers for this purpose - in
particular, the information on blocks fetched and read.  That might
not exactly model the current contents of the buffer cache, but it's
certainly a measure of popularity, and that may be all we really need.
 We're not going to invalidate every plan in the system on every
buffer eviction, so plans have to be based not so much on what is in
the buffer cache right now but on what we have a reasonable
expectation of finding there in the typical case.

Consider, for example, the degenerate (but not necessarily uncommon)
case where the entire database can fit within shared_buffers, or
perhaps shared_buffers + OS cache.  ISTM we're going to want to plan
as if the entire database is in cache all the time, even though that
might not always be true - right after restart, for example.


The shared_buffers + OS cache example is a reason why simply  
examining shared_buffers isn't likely to work well; in that case it  
definitely would not reflect reality. Though, really in that case we  
should be able to simply look at eff_cache_size as well as the size  
of the database and understand everything should be in memory.


Actually, a simple algorithm that might work really well would be to
calculate relation cache odds as ( number of page accesses for relation /
number of page accesses for all relations ) * ( sum(relpages)*BLKSZ /
eff_cache_size ), where number of page accesses would be both from
relcache and not. One thing this doesn't address though is the report
from a few months ago that accessing small tables is still faster with
an index scan, even if we know the whole thing is in cache (I don't
remember if that was ever resolved...)


Another idea would be to look at an efficient way to measure how long
it actually takes to pull data from the OS. This has been suggested in
the past, but the idea there was to measure every block access, and the
concern was the overhead of the timing calls. But what if we sampled
instead? Or, what if we read multiple blocks at one time in the cases
where we knew we'd need to (seqscan and an index scan needing more than
one tuple). Another option would be an async IO process that is
responsible for measuring this stuff; I believe some people have played
with async IO and gotten good results.


Of course, on dtrace platforms we could just plug into dtrace...


You might also run into
problems with relations that have hot segments that are accessed
frequently and stay cached, and cold segments that are never
touched: if 20% of the relation is in cache, but that's the only 20%
of the relation we ever access, then our hit rate will be 100% rather
than 20%.


Yes, but that would be accurate :)

In reality, I think we need to re-visit the idea of evaluating how  
close a chosen query plan is matching reality as we're running. If we  
thought we'd be seeing a 100% hit rate but in reality it's much lower  
we could re-plan (of course this probably only makes sense for  
queries that take many seconds).



But even a primitive algorithm would probably be a lot better than
what we have now. I'm guessing that there are a lot of databases where
either the whole database fits in cache, or a decent chunk of
relatively small core relations fit in cache and then there are some
big or infrequently-used ones that don't.


--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-17 Thread Robert Haas
 Actually, a simple algorithm that might work really well would be to
 calculate relation cache odds as ( number of page accesses for relation /
 number of page accesses for all relations ) * ( sum(relpages)*BLKSZ /
 eff_cache_size ), where number of page accesses would be both from relcache
 and not.

I don't think that formula makes any sense.  If effective_cache_size
is in the denominator, then increasing it will make the odds of
finding the page in cache go down.

 One thing this doesn't address though is the report from a few
 months ago that accessing small tables is still faster with an index scan,
 even if we know the whole thing is in cache (I don't remember if that was
 ever resolved...)

I'm not sure if this is what you're referring to, but there was a
relatively recent post on, I believe, -performance, where a bitmap
index scan that hit almost the entire table beat out a seqscan.  I
don't think there was any further discussion and I'm still mystified
as to how it's possible.

 Another idea would be to look at an efficient way to measure how long it
 actually takes to pull data from the OS. This has been suggested in the
 past, but the idea there was to measure every block access, and the concern
 was the overhead of the timing calls. But what if we sampled instead? Or,
 what if we read multiple blocks at one time in the cases where we knew we'd
 need to (seqscan and an index scan needing more than one tuple). Another
 option would be an async IO process that is responsible for measuring this
 stuff; I believe some people have played with async IO and gotten good
 results.

 Of course, on dtrace platforms we could just plug into dtrace...

 You might also run into
 problems with relations that have hot segments that are accessed
 frequently and stay cached, and cold segments that are never
 touched: if 20% of the relation is in cache, but that's the only 20%
 of the relation we ever access, then our hit rate will be 100% rather
 than 20%.

 Yes, but that would be accurate :)

No, we'd predict the hit rate to be 20%, but the real hit rate would be 100%.

 In reality, I think we need to re-visit the idea of evaluating how close a
 chosen query plan is matching reality as we're running. If we thought we'd
 be seeing a 100% hit rate but in reality it's much lower we could re-plan
 (of course this probably only makes sense for queries that take many
 seconds).

I don't think it's going to be very practical to re-plan the query in
its entirety, because then you'd have to somehow undo all of the work
you'd done thus far (including side effects, if any), which might not
be possible and certainly isn't easy.  What might be practical is to
bail out of a nested loop that turns out to iterate more times than
expected and hash the inner rel, or even sort the remaining portion of
the outer rel and the entire inner rel and then merge-join them.  The
problem is that these sorts of changes can alter the order in which
results are generated, and if the parent node is something like a
merge-join that needs the results to be ordered in a particular way,
then you've got a problem.  Furthermore, it's not easy to decide when
to switch strategies.  If you decide to switch to a hash join just
prior to what would have been the last iteration of the nested loop,
you lose.

I'm interested to know whether anyone else shares my belief that
nested loops are the cause of most really bad plans.  What usually
happens to me is that the planner develops some unwarranted optimism
about the number of rows likely to be generated by the outer side of
the join and decides that it's not worth sorting the inner side or
building a hash table or using an index, and that the right thing to
do is just rescan the inner node on every pass.  When the outer side
returns three or four orders of magnitude more results than expected,
ka-pow!

Another approach to this problem might be to try to make the planner a
little more cautious about choosing nested loops in the first place.
Given a column a with the values 1 .. 10^6, the planner estimates the
number of rows for a = X as 1, a in (X1..Xn) as n, a not in (X1..Xn)
AS 10^6-n, and a  X for all X  100 as 100.  These are all pretty
reasonable estimates (one could hope to get a different result for a 
5 than a  100).  But as soon as you use some operation that the
planner knows nothing about, the guesses get really bad:

CREATE TABLE big (id serial, x text);
INSERT INTO big (x) SELECT random() FROM generate_series(1,1000000);
ANALYZE;
EXPLAIN SELECT * FROM big WHERE id % 2 = 0 AND (id + 0) % 2 = 0 AND
(id - 0) % 2 = 0;

  QUERY PLAN
--
 Seq Scan on big  (cost=0.00..36375.00 rows=1 width=22)
   Filter: (((id % 2) = 0) AND (((id + 0) % 2) = 0) AND (((id - 0) % 2) = 0))
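
(Checking the arithmetic: with the default 0.005 selectivity per unknown
expression mentioned below, the combined estimate is 0.005^3 * 10^6, roughly
0.125, which the planner clamps to a minimum of one row; hence rows=1 above.)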

The fact that the selectivity of an unknown expression is arbitrarily
set to 0.005 is a defensible choice that doesn't happen to 

Re: [HACKERS] The science of optimization in practical terms?

2009-02-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm interested to know whether anyone else shares my belief that
 nested loops are the cause of most really bad plans.  What usually
 happens to me is that the planner develops some unwarranted optimism
 about the number of rows likely to be generated by the outer side of
 the join and decides that it's not worth sorting the inner side or
 building a hash table or using an index, and that the right thing to
 do is just rescan the inner node on every pass.  When the outer side
 returns three or four orders of magnitude more results than expected,
 ka-pow!

And then there is the other half of the world, who complain because it
*didn't* pick a nestloop for some query that would have run in much less
time if it had.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-15 Thread Greg Smith

On Fri, 13 Feb 2009, Robert Haas wrote:

Gather statistics on relation access patterns and use that to estimate 
the fraction of a relation likely to be in cache.


At one point I had a hacked background writer that collected statistics 
about the contents of the buffer cache.  Since it's obtaining a lock on 
the buffer header anyway, it's a perfectly good place to note what 
relfileid the buffer is associated with.  If you set aside some fixed 
amount of space to hold information about the most popular relations 
(perhaps using a continuous top-k model, see 
http://www.mysmu.edu/faculty/kyriakos/topk-SIGMOD06.pdf ), you can end up 
with enough data to estimate how much data in shared_buffers exists for 
the most cached relations in there.


In a typical recommended tuning nowadays, we can only expect that to 
sample about 1/3 of the total caching happening (presuming 
shared_buffers=1/4 RAM and effective_cache_size~=3/4 RAM).  While in 
general it's nice to think that shared_buffers has a similar makeup to 
what the OS is caching, it's not hard to discover common cases where this 
would not be the case.  Particularly given the VACUUM/seq scan ring-buffer 
improvements in 8.3, it's easy to imagine scanning a table that's 
2*shared_buffers in size showing only 256KB in shared_buffers, while the 
whole thing is available in the OS cache.


I had a eureka moment where I realized I could hook the buffer eviction 
code to model that.  Decrement the count for that relation in the main 
top-k count, then have a second count that assumes the last 
2*shared_buffers evicted are also still cached.  That would accurately 
model the ring-buffer case and improve the quality of the model in 
general.  Updating those stats on every eviction would add some overhead, 
but if the background writer is doing enough of them for you that should 
at least be asynchronous from when most backends are blocked waiting for 
an eviction.


And that's as far as I got before I had to return to real work again.
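
For a rough idea of the shape of such a counter, here is a hypothetical,
self-contained C sketch of a fixed-size, Space-Saving-style top-k tally (not
code from the hacked background writer described above; the slot count, field
names, and fake workload are made up):

#include <stdio.h>

#define TOPK_SLOTS 8                /* fixed space budget */

typedef struct
{
    unsigned int relid;             /* 0 = empty slot */
    long        count;              /* (over)estimate of buffers seen */
} TopKSlot;

static TopKSlot slots[TOPK_SLOTS];

/* Called once per buffer examined, with that buffer's relation id. */
static void
topk_note_buffer(unsigned int relid)
{
    int         i;
    int         min_idx = 0;

    for (i = 0; i < TOPK_SLOTS; i++)
    {
        if (slots[i].relid == relid || slots[i].relid == 0)
        {
            slots[i].relid = relid;
            slots[i].count++;
            return;
        }
        if (slots[i].count < slots[min_idx].count)
            min_idx = i;
    }

    /* no slot: evict the least-counted relation, inheriting its count */
    slots[min_idx].relid = relid;
    slots[min_idx].count++;
}

int
main(void)
{
    int         i;

    /* fake workload: relation 42 is hot, relations 100..119 seen once each */
    for (i = 0; i < 1000; i++)
        topk_note_buffer(42);
    for (i = 0; i < 20; i++)
        topk_note_buffer(100 + i);

    for (i = 0; i < TOPK_SLOTS; i++)
        if (slots[i].relid != 0)
            printf("rel %u: ~%ld buffers\n", slots[i].relid, slots[i].count);
    return 0;
}

The useful property of the Space-Saving scheme is that the counting error is
bounded by the smallest retained count, so relations with genuinely high
counts are unlikely to be displaced by a stream of rarely-touched ones.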

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-15 Thread Kevin Grittner
 Greg Smith gsm...@gregsmith.com wrote: 
 have a second count that assumes the last 
 2*shared_buffers evicted are also still cached.
 
Perhaps it would be better to assume that the external cache is
effective_cache_size - shared_buffers?  Of course, we would need to
have some heuristics to cover odd settings (like effective_cache_size
less than shared_buffers).
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-15 Thread Robert Haas
On Sun, Feb 15, 2009 at 1:16 PM, Greg Smith gsm...@gregsmith.com wrote:
 On Fri, 13 Feb 2009, Robert Haas wrote:
 Gather statistics on relation access patterns and use that to estimate the
 fraction of a relation likely to be in cache.

 At one point I had a hacked background writer that collected statistics
 about the contents of the buffer cache.  Since it's obtaining a lock on the
 buffer header anyway, it's a perfectly good place to note what relfileid the
 buffer is associated with.  If you set aside some fixed amount of space to
 hold information about the most popular relations (perhaps using a
 continuous top-k model, see
 http://www.mysmu.edu/faculty/kyriakos/topk-SIGMOD06.pdf ), you can end up
 with enough data to estimate how much data in shared_buffers exists for the
 most cached relations in there.

 In a typical recommended tuning nowadays, we can only expect that to sample
 about 1/3 of the total caching happening (presuming shared_buffers=1/4 RAM
 and effective_cache_size~=3/4 RAM).  While in general it's nice to think
 that shared_buffers has a similar makeup to what the OS is caching, it's not
 hard to discover common cases where this would not be the case.
  Particularly given the VACUUM/seq scan ring-buffer improvements in 8.3,
 it's easy to imagine scanning a table that's 2*shared_buffers in size
 showing only 256KB in shared_buffers, while the whole thing is available in
 the OS cache.

 I had a eureka moment where I realized I could hook the buffer eviction code
 to model that.  Decrement the count for that relation in the main top-k
 count, then have a second count that assumes the last 2*shared_buffers
 evicted are also still cached.  That would accurately model the ring-buffer
 case and improve the quality of the model in general.  Updating those stats
 on every eviction would add some overhead, but if the background writer is
 doing enough of them for you that should at least be asynchronous from when
 most backends are blocked waiting for an eviction.

 And that's as far as I got before I had to return to real work again.

This seems plausible, but I'm not totally sold: predicting the
contents of the operating system buffer cache sounds like it might be
pretty tough.  And do we even need to go that far?   I'm kind of
wondering whether we might be able to leverage the information that
the statistics collector already gathers for this purpose - in
particular, the information on blocks fetched and read.  That might
not exactly model the current contents of the buffer cache, but it's
certainly a measure of popularity, and that may be all we really need.
 We're not going to invalidate every plan in the system on every
buffer eviction, so plans have to be based not so much on what is in
the buffer cache right now but on what we have a reasonable
expectation of finding there in the typical case.

Consider, for example, the degenerate (but not necessarily uncommon)
case where the entire database can fit within shared_buffers, or
perhaps shared_buffers + OS cache.  ISTM we're going to want to plan
as if the entire database is in cache all the time, even though that
might not always be true - right after restart, for example.

I kind of feel like the algorithm for predicting the cache hit ratio
ought to be some sort of ANALYZE-like process that runs periodically
and stores away a value for each table.  When we run the algorithm, we
take a user-supplied estimate of the total cache available on the
machine (effective_cache_size, or some new GUC we invent for this
purpose) and divvy it up among all the relations in the database in
proportion to popularity score.  That is, we divide the number of
popularity points for the most popular relation by the total of all
popularity values and assign that amount of cache to the most popular
relation (but not more than the relation size).  We then repeat this
algorithm with the remaining relations and the remaining amount of
cache, and then approximate the cache hit ratio for each relation as
the allocated amount of cache divided by the relation size.  To
prevent plans from changing too abruptly, the popularity function
should use some sort of sliding window.

Where you're going to run into trouble with any algorithm of this type
is databases that switch between multiple, distinct workloads.  I have
no idea what to do about that problem.  You might also run into
problems with relations that have hot segments that are accessed
frequently and stay cached, and cold segments that are never
touched: if 20% of the relation is in cache, but that's the only 20%
of the relation we ever access, then our hit rate will be 100% rather
than 20%.

But even a primitive algorithm would probably be a lot better than
what we have now. I'm guessing that there are a lot of databases where
either the whole database fits in cache, or a decent chunk of
relatively small core relations fit in cache and then there are some
big or infrequently-used ones that don't.

...Robert


Re: [HACKERS] The science of optimization in practical terms?

2009-02-13 Thread Bernd Helmle
--On Thursday, February 12, 2009 16:06:31 -0800 Joshua D. Drake
j...@commandprompt.com wrote:



However, in recent times I have found that increasing cpu_tuple_cost,
cpu_operator_cost and cpu_index_tuple_cost to be very useful. This is
always in the scenario of, queries were running fine for months and
then all of a sudden, they are not. It is also always on systems that
we are already maintaining and thus (in theory) are in good shape.


Hmm have you tried seq_page_cost and random_page_cost, too? I found them 
really important especially if you have a steadily growing database or a 
fully cached database to reflect the real disk access costs.


--
 Thanks

   Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-13 Thread Grzegorz Jaskiewicz

yet more arguments, to let postgresql estimate those automatically.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-13 Thread Joshua D. Drake
On Fri, 2009-02-13 at 20:10 +, Grzegorz Jaskiewicz wrote:
 yet more arguments, to let postgresql estimate those automatically.
 

Well I haven't seen any arguments actually. Which was the point of my
original question. I don't think anyone actually knows what these knobs
change, in practice.

Joshua D. Drake



 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-13 Thread Hannu Krosing
On Thu, 2009-02-12 at 16:06 -0800, Joshua D. Drake wrote:
 Hello,
 
 I was helping a customer today with what is becoming a common theme with
 a lot of work we do. Basically, It was working fine until recently.
 Now 90% of the time it is as simple as running an ANALYZE VERBOSE and
 picking apart relations that aren't being maintained properly and adjust
 autovacuum or vacuum appropriately. If it isn't that, it is usually
 something like increasing effective_cache_size, or
 default_statistics_target.
 
 However, in recent times I have found that increasing cpu_tuple_cost,
 cpu_operator_cost and cpu_index_tuple_cost to be very useful. This is
 always in the scenario of, queries were running fine for months and
 then all of a sudden, they are not. It is also always on systems that
 we are already maintaining and thus (in theory) are in good shape.
 
 So my question is, what is the science in practical terms behind those
 parameters? 

In most general terms increasing these will favor index access over
seqscans (well, increasing _only_ cpu_index_tuple_cost won't but still).

Basically - for any database with multiple users that is growing anywhere
near the point where it will no longer fit mostly in cache - it is a good
rule of thumb to push postgres in the direction of selecting plans with
mostly index access, as PostgreSQL's planner is not much aware of other
queries running in parallel and thus cannot know by itself that it should.

Things are fast as long as the main tables stay in cache (preferably in pg
shared memory, but the OS cache is also quite ok) and then suddenly
deteriorate once some part of them no longer fits.

You can watch for these things by monitoring pg_stat_user_tables /
pg_stat_user_indexes and making educated guesses about whether some tables
should or should not have that much traffic of the type indicated there.

 Normally I would just accept it as another PostgreSQL
 idiosyncrasy but the performance differences I am talking about are
 large. After changing cpu_tuple_cost and cpu_operator_cost today to 0.5
 I decreased two queries from 10 seconds and 15 seconds to 2 seconds and
 ~900 ms respectively.
 
 Sincerely,
 
 Joshua D. Drake
  
 
 -- 
 PostgreSQL - XMPP: jdr...@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
 

-- 
--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-13 Thread Robert Haas
On Fri, Feb 13, 2009 at 3:27 PM, Joshua D. Drake j...@commandprompt.com wrote:
 On Fri, 2009-02-13 at 20:10 +, Grzegorz Jaskiewicz wrote:
 yet more arguments, to let postgresql estimate those automatically.

 Well I haven't seen any arguments actually. Which was the point of my
 original question. I don't think anyone actually knows what these knobs
 change, in practice.

Well, in broad strokes, it seems to me that what they do is fairly
obvious: they affect the planner's willingness to choose plans that
touch more pages vs. plans that involve more CPU overhead (e.g. qual
evaluation).  If the database is mostly or entirely in shared buffers
or the system buffer cache, and CPU consumption is a problem, then
raising the CPU costs is apt to help.

I think the root of this problem is that we can't model caching
effects.  random_page_cost > seq_page_cost models the cost of seeks,
but min(random_page_cost, seq_page_cost) >> max(cpu_tuple_cost,
cpu_index_tuple_cost, cpu_operator_cost) models the fact that reading
from disk, even sequentially, is always slow.  Unfortunately, if the
whole database is likely already in memory, which seems to be a pretty
common scenario even for relatively large databases (because people
buy more memory to make them fit), then it's just wrong.

If we had a good knowledge of which pages were apt to be cached, we
could add a GUC cached_page_cost with a default value of maybe 0.2,
and presumably we'd get better plans that way.  The bad news is that
it's pretty difficult to get that knowledge (and of course it could
change after the fact if the usage pattern of the database shifts
dramatically).  The good news is that experimentation is possible.
For example, we could:

- Assume that small relations are more likely to be cached (so derate
page costs when accessing them).
- Allow users to override the page cost on a per-rel basis using a reloption.
- Gather statistics on relation access patterns and use that to
estimate the fraction of a relation likely to be in cache.

If your whole database stays in memory all the time, I would guess
that you could either raise the CPU costs or drop the page costs quite
substantially and that would probably work out fine.  What's tougher
is to still be able to generate good plans when only part of the
database fits in memory, or there's other activity on the system that
is periodically purging portions of the system cache.
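
For what it's worth, one possible way a cached_page_cost could be folded in,
assuming some per-relation cache-hit fraction were available (a sketch only;
cached_page_cost and the hit-fraction input are hypothetical, not existing
GUCs or statistics):

#include <stdio.h>

/* Blend a hypothetical cached_page_cost with random_page_cost according to
 * the estimated fraction of the relation's pages found in cache. */
static double
effective_page_cost(double hit_fraction, double cached_page_cost,
                    double random_page_cost)
{
    return hit_fraction * cached_page_cost +
           (1.0 - hit_fraction) * random_page_cost;
}

int
main(void)
{
    double      fractions[] = {0.0, 0.5, 0.9, 1.0};
    int         i;

    for (i = 0; i < 4; i++)
        printf("hit fraction %.1f -> page cost %.2f\n",
               fractions[i],
               effective_page_cost(fractions[i], 0.2, 4.0));
    return 0;
}

With the values assumed here (0.2 and 4.0), a relation believed to be 90%
cached comes out at an effective page cost of about 0.58, which is roughly
the kind of derating the per-relation ideas above are aiming for.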

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] The science of optimization in practical terms?

2009-02-12 Thread Joshua D. Drake
Hello,

I was helping a customer today with what is becoming a common theme with
a lot of work we do. Basically, It was working fine until recently.
Now 90% of the time it is as simple as running an ANALYZE VERBOSE and
picking apart relations that aren't being maintained properly and adjust
autovacuum or vacuum appropriately. If it isn't that, it is usually
something like increasing effective_cache_size, or
default_statistics_target.

However, in recent times I have found that increasing cpu_tuple_cost,
cpu_operator_cost and cpu_index_tuple_cost to be very useful. This is
always in the scenario of, queries were running fine for months and
then all of a sudden, they are not. It is also always on systems that
we are already maintaining and thus (in theory) are in good shape.

So my question is, what is the science in practical terms behind those
parameters? Normally I would just accept it as another PostgreSQL
idiosyncrasy but the performance differences I am talking about are
large. After changing cpu_tuple_cost and cpu_operator_cost today to 0.5
I decreased two queries from 10 seconds and 15 seconds to 2 seconds and
~900 ms respectively.

Sincerely,

Joshua D. Drake
 

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers