Re: [HACKERS] generalizing the planner knobs

2005-12-08 Thread Jaime Casanova

 BTW, there's another end to the 'enable_seqscan=false' problem... it
 sometimes doesn't work! Last I looked, enable_seqscan=false only added a
 fixed overhead cost to a seqscan (100 IIRC). The problem is, some
 queries will produce estimates for other methodes that are more
 expensive than a seqscan even with the added burden. If instead of
 adding a fixed amount enable_seqscan=false multiplied by some amount
 then this would probably be impossible to occur.

 (And before someone asks, no, I don't remember which query was actually
 faster...)
 --
 Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
 Pervasive Software  http://pervasive.comwork: 512-231-6117
 vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461


I have often considered that this is an indication that seq scan is
actually the better plan... although, i have to admit that is a little
confusing that  enable_seqscan = false actually let you use a seqscan
if the other plans are bad enough

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-08 Thread Tom Lane
Jaime Casanova [EMAIL PROTECTED] writes:
 BTW, there's another end to the 'enable_seqscan=false' problem... it
 sometimes doesn't work!

 I have often considered that this is an indication that seq scan is
 actually the better plan...

There are cases where it is the *only* plan, eg, you have no relevant
indexes.  I am not sure that applies to Jim's complaint though.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] generalizing the planner knobs

2005-12-08 Thread Jim C. Nasby
On Thu, Dec 08, 2005 at 01:07:10PM -0500, Tom Lane wrote:
 Jaime Casanova [EMAIL PROTECTED] writes:
  BTW, there's another end to the 'enable_seqscan=false' problem... it
  sometimes doesn't work!
 
  I have often considered that this is an indication that seq scan is
  actually the better plan...
 
 There are cases where it is the *only* plan, eg, you have no relevant
 indexes.  I am not sure that applies to Jim's complaint though.

IIRC I ran into this when I was working on generating some numbers about
how well a high correlation improves the performance of an index scan
(since afaict the cost estimator for index scan is rather broken :( ) In
that case, I had defined an index on a ~120M row table on a collumn with
a very low correlation. It's pretty much a given that a seqscan and sort
would be faster than the index scan, but it would have still been good
to be able to verify that. Because of how enable_seqscan works, I
couldn't.

BTW,
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php is
where I first mentioned this, including the cost function that I think
is broken.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-07 Thread Jim C. Nasby
On Thu, Dec 01, 2005 at 12:32:12PM -0500, Qingqing Zhou wrote:
 
 Neil Conway [EMAIL PROTECTED] wrote
 
  This would also be useful when diagnosing bad query plans: for example,
  setting enable_seqscan=false often causes the planner to disregard the
  use of *any* sequential scan, anywhere in the plan. The ability to
  slightly bump up the cost of particular operations would allow more
  alternative plans to be examined.
 
 
 This method also has the problem of enable_seqscan=false in some 
 situations. I would vote we implement the final general solution like query 
 plan hints directly.

BTW, there's another end to the 'enable_seqscan=false' problem... it
sometimes doesn't work! Last I looked, enable_seqscan=false only added a
fixed overhead cost to a seqscan (100 IIRC). The problem is, some
queries will produce estimates for other methodes that are more
expensive than a seqscan even with the added burden. If instead of
adding a fixed amount enable_seqscan=false multiplied by some amount
then this would probably be impossible to occur.

(And before someone asks, no, I don't remember which query was actually
faster...)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-06 Thread Rod Taylor
On Fri, 2005-12-02 at 15:49 -0500, Greg Stark wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
 
   In the extreme, no amount of added intelligence in the optimizer is going 
   to
   help it come up with any sane selectivity estimate for something like 
   
 WHERE radius_authenticate(user) = 'OK'
  
  Why not?
  
  The missing capability in this case is to be able to provide or generate
  (self learning?) statistics for a function that describe a typical result
  and the cost of getting that result.
 
 Ok, try WHERE radius_authenticate(user, (select ...), ?)
 
 The point is that you can improve the estimates the planner gets. But you can
 never make them omniscient. There will always be cases where the user knows
 his data more than the planner. And those hints are still valid when a new
 optimizer has new plans available.

You missed my point. If the user knows there data there is absolutely no
reason, aside from missing functionality in PostgreSQL, that statistics
cannot be generated to represent what the user knows about their data.

Once the planner knows the statistics it can make the right decision
without any hints.

The missing feature here is the ability to generate or provide
statistics and costs for functions.



-- 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-05 Thread Simon Riggs
On Sun, 2005-12-04 at 12:49 -0300, Alvaro Herrera wrote:
 Simon Riggs wrote:
 
  ISTM we could do some of that with another GUC, lets call it
  prepare_once = on. The system default is to have a prepared statement
  bound to a plan on its first parameter bind. If we set this to off,
  then the statement will replan each time we bind. This would give us
  both flexibility and predictability. (As ever, someone suggest a better
  name?).
 
 Why would all statements behave the same?  

They would be flexible and predictable, but not the same.

prepare_once = off
would reoptimize each statement, so each could have a potentially
different plan. Which, in the case I cited, is the only optimal
behaviour: sticking to any one plan, by any method, would be wrong.

The plans would be predictable because performance never exceeds the
worst case SeqScan; planning would be flexible because it will always
take the best plan.

 I think an important
 percentage of cases would require a fixed plan (thus planning at first
 sight is a good idea), while a limited number of cases would require
 planning every time the sentence is called.  

Yes, that is exactly what I see. Hence a GUC with a default the same as
it is now: they would only be prepared once. You would only set the GUC
to another value when you have a statement that looks like it needs
hinting i.e. the plan flips from SeqScan to IndexScan and back
depending upon the input data. 

 Your idea of qualifying it
 by table name does not make too much sense to me, because you can have
 both types of queries for each table, and further any query where this
 is necessary will involve more than one table anyway, so which one do
 you choose to make the decision?

That was a different idea later down my note, not a variation of the
same one: that had nothing to do with the prepare_once concept. Those
options were meant to be set on a per statement basis, not at the server
level.

I was trying to solve Neil's stated problem: How to force one part of a
query to avoid a SeqScan, yet without touching the others.

 So we would provide a protocol/libpq option to allow first-params-
 planning (the default and current behavior), and another to allow
 planning-every-time.  The latter would tell the server to save only the
 parsetree of the query and replan each time it is invoked.

Or some function similar. I prefer the GUC because it does not imply a
protocol change.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] generalizing the planner knobs

2005-12-05 Thread Simon Riggs
On Sun, 2005-12-04 at 13:47 -0500, Pollard, Mike wrote:
 Simon Riggs wrote
   The system default is to have a prepared statement
  bound to a plan on its first parameter bind. 
 
 We call it deferred optimization.
 
 Do you really stop at the first parameter?  

The first bind of parameters to the query, yes.

 You
 can do the same thing with correlated subqueries

Not currently done, AFAIK.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] generalizing the planner knobs

2005-12-05 Thread Simon Riggs
On Mon, 2005-12-05 at 01:53 -0500, Greg Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:

  There is no such thing as a plan
  that is good for every case --- outlying data values can make a
  usually-good plan blow out your performance guarantee anyway. 
 
 But outlying data is something the user has control over. 

Unfortunately, the DBA cannot choose the data distribution in his
database. So the appearance of control is somewhat illusory.

 The user when
 approving plans needs to be aware not just that the plan is experimentally
 good, but that it will perform reliably within the constraints based on his
 knowledge of the application and the data.

Greg's idea to have a plan comparator is a good one, for most
situations.

What you'll see if you run it though is no matter what you do, there
will be a few queries that are resistant to tuning. Their stored plans
will flip from SeqScan to IndexScan and back depending upon the
parameters used; neither will be suitable all the time and either
setting will cause very variable response times.

For those queries only, I seek a solution.

[Priming the cache by executing IndexScan causing queries does not
work for all cases, so again the appearance of control is illusory.]

My solution is to replan the queries each time, rather than just once on
first parameter bind. By some mechanism; the GUC is just one of those.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-05 Thread Hans-Juergen Schoenig


On Dec 5, 2005, at 4:17 AM, Tom Lane wrote:


Greg Stark [EMAIL PROTECTED] writes:

Plan stability is also an important feature, especially for OLTP
systems which have hard real-time requirements. OLTP systems  
typically
don't care about getting the best plan for a query, only a plan  
that

is good enough.


Good enough means it can keep up with the rate of incoming  
requests; it
doesn't matter whether it keeps up with 10% headroom or 20%  
headroom. But if
one incoming query even one in a thousand takes 1000% of the time  
available

then the entire system risks falling down.


Is it worth pointing out that using the same plan all the time is *no*
recipe for guaranteeing response time?  There is no such thing as a  
plan

that is good for every case --- outlying data values can make a
usually-good plan blow out your performance guarantee anyway.   
Disabling

the planner is just a recipe for ensuring that that will happen, IMHO.

regards, tom lane




I think I know what Greg is trying to say: I think in this plan  
stability does not mean that the plan has to be completely fixed -  
usually it is all about indexing. People start with an empty  
perfectly analyzed database and data is added. However, some day some  
cron job doing ANALYZE or whatever fails and the system will slow  
down or even break down because data is added to some table which is  
still seq-scanned. This is what usually happens and which leads to  
support cases.


Adding hints to some comments or to the statement itself is not a  
good solution as well. This is why I proposed a table or some flag  
telling the planner what to favour (= always use a certain index). So  
the basic idea is not to turn index of in general but to have the  
chance to do it on a per index basis. I guess this would not be to  
complex to implement and it solves 90% of all problems without having  
to hide some information inside comments (which is no good at all).


best regards,

hans




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-05 Thread Greg Stark

Hans-Juergen Schoenig [EMAIL PROTECTED] writes:

 I think I know what Greg is trying to say: I think in this plan stability
 does not mean that the plan has to be completely fixed - usually it is all
 about indexing.

Usually problems occur because someone hasn't run analyze at all. That's not
what I'm talking about. I'm talking about a large mature system where the DBA
has everything tuned and adjusted properly and just wants to get a good
night's sleep, confident that the nightly analyze isn't going to suddenly
change the performance of existing queries.

 Adding hints to some comments or to the statement itself is not a  good
 solution as well. This is why I proposed a table or some flag  telling the
 planner what to favour (= always use a certain index). So  the basic idea is
 not to turn index of in general but to have the  chance to do it on a per 
 index
 basis. I guess this would not be to  complex to implement and it solves 90% of
 all problems without having  to hide some information inside comments (which 
 is
 no good at all).

I disagree that this is a reasonable solution.

I want to be sure my existing queries keep using the plans they've been using
until I allow them to change.

I don't want to sit down and type select count(*) from users and have it not
work correctly (ie, use a sequential scan) because the system is so single
mindedly tuned for the OLTP application.


-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] generalizing the planner knobs

2005-12-05 Thread Csaba Nagy
[snip]
 I want to be sure my existing queries keep using the plans they've been using
 until I allow them to change.
 
 I don't want to sit down and type select count(*) from users and have it not
 work correctly (ie, use a sequential scan) because the system is so single
 mindedly tuned for the OLTP application.
 

Now this is exactly what I've had in mind... it would be nice to
fixate a plan for some of the queries, and let the planner choose the
best for all the rest. I think some other data bases have something like
an optimizer plan stability feature, providing outlines of query
plan bundles. This is maybe too much, but specifying that for a certain
query I definitely want to use one index and not the other would be
nice...

On another note, it might be interesting to have some kind of prepare
analyze, where the planner is allowed to go and get some more detailed
estimation from the actual table data based on the hard-coded parameter
values, and produce some more detailed statistics for the parameterized
values so it can then produce hot-shot plans for the actual parameter
values on each execution... I wonder if this makes any sense. This way
we could have some very detailed statistics directly supporting the
queries we actually use. I would call this kind of prepare for the most
used/problematic queries from time to time, and the planner should
decide what statistics it needs to support it and go and get it...

Cheers,
Csaba.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-04 Thread Simon Riggs
On Fri, 2005-12-02 at 11:07 +0100, Csaba Nagy wrote:
 On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
  Greg Stark [EMAIL PROTECTED] writes:
   On the other hand the type I would prefer to see are hints that feed 
   directly
   into filling in information the planner lacks. This only requires that the
   user understand his own data and still lets the planner pick the best plan
   based on the provided information.
  
  This would avoid some issues, but it still is vulnerable to the problem
  that the hint you put in your code today will fail to track changes in
  your data tomorrow.
 
 Tom, I have to disagree here. At least in our application, we must
 provide for an acceptable worst case scenario, and sometimes a slightly
 wrong estimate can lead to a plan which is very fast 99% of the time but
 completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
 I've had this situation with some limit plans where the planner had
 chosen a wrong index. The problem there was that the planner had
 estimated that the query will have 20 rows as a result, but it had less,
 and resulted in the complete scan of the index... as opposed to a much
 smaller scan that would have resulted by scanning the other index, as
 that one would have provided an end condition orders of magnitudes
 sooner. Now the statistics will always be only an estimation, and +/- a
 few can really make a big difference in some situations. In this
 particular situation the index choice of the planner would have been
 faster for all cases where there were really 20 rows returned, but I
 forced it to always choose the other plan (by adding the proper order
 by) because I can't risk a bad result in any of the cases.
 In this particular case I was able to force the planner choose a
 specific plan, but that might not be always possible, so I guess it
 really would make sense to be able to tell the planner how selective
 some conditions are. And yes, sometimes I would like to freeze a
 specific safe plan for a specific query, even if it is not optimal.

Csaba raises a good point here. Many people say they want hints when
what they actually require the plan to be both stable and predictable.

Tom is right to point out that data can change over time. However,
experience with packaged application tuning is that you actually do want
to have things work in a stable way, even if that is somewhat
sub-optimal because when you have 1000s of statements it is important
that it doesn't change after you tune it - otherwise you never finish.
So I would like to give that requirement a name Plan Stability; the
actual solution to that could be many things.

Another aspect to this is predictability. At the moment, we optimise
according to the first parameter a prepared statement is bound with.
Many data distributions contain a small number of values that represent
a large fraction of the total. This can mean that it is pretty random
whether we will get a SeqScan and be stuck with it, or get an IndexScan
and be stuck with it. Either plan being wrong 50% of the time. In these
cases, hinting is definitely a very bad thing, since whichever you hint,
you'll be wrong. This situation gives us two more requirements:
- predictability - because we want to know the worst case
- flexibility - because we want to be able to take advantage of the best
case, but without causing an unconstrained worst case

So IMHO, the requirements list for prepared statement planning is that
optimization must be:
- Flexible
- Predictable
- Stable

The actual solutions to all of those things could be many and varied.

ISTM we could do some of that with another GUC, lets call it
prepare_once = on. The system default is to have a prepared statement
bound to a plan on its first parameter bind. If we set this to off,
then the statement will replan each time we bind. This would give us
both flexibility and predictability. (As ever, someone suggest a better
name?).


The requirements for very large statement tuning are fairly different
from that. e.g. large reports or Data Warehousing queries. In those
cases, Tom's warning about the data changing can be critical and we
don't have the same need to prepare queries. Neil's original point about
needing to avoid SeqScans on some tables but not others hasn't really
been addressed. That got turned into hinting on particular nodes/node
types, but there are still issues: How do you identify one node in a
complex plan?

Perhaps the way forward would be to make enable_* accept a list of
tables, or a * (like listen_addresses). (And make on a synonym for *
and off a synonym for an empty list). That way, you'd be able to control
on a per-table basis what plan types are possible. (Still wouldn't work
that well for complex plans where same table referenced multiple times,
but its a step in the right direction.) I'm not sure I like that
particular idea much, but I'm attempting to address Neil's original
point, with which I agree.

Best Regards, Simon Riggs



Re: [HACKERS] generalizing the planner knobs

2005-12-04 Thread Alvaro Herrera
Simon Riggs wrote:

 ISTM we could do some of that with another GUC, lets call it
 prepare_once = on. The system default is to have a prepared statement
 bound to a plan on its first parameter bind. If we set this to off,
 then the statement will replan each time we bind. This would give us
 both flexibility and predictability. (As ever, someone suggest a better
 name?).

Why would all statements behave the same?  I think an important
percentage of cases would require a fixed plan (thus planning at first
sight is a good idea), while a limited number of cases would require
planning every time the sentence is called.  Your idea of qualifying it
by table name does not make too much sense to me, because you can have
both types of queries for each table, and further any query where this
is necessary will involve more than one table anyway, so which one do
you choose to make the decision?

So we would provide a protocol/libpq option to allow first-params-
planning (the default and current behavior), and another to allow
planning-every-time.  The latter would tell the server to save only the
parsetree of the query and replan each time it is invoked.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] generalizing the planner knobs

2005-12-04 Thread Pollard, Mike
Simon Riggs wrote
 ISTM we could do some of that with another GUC, lets call it
 prepare_once = on. The system default is to have a prepared statement
 bound to a plan on its first parameter bind. If we set this to off,
 then the statement will replan each time we bind. This would give us
 both flexibility and predictability. (As ever, someone suggest a
better
 name?).


We call it deferred optimization.

Do you really stop at the first parameter?  What if it couldn't possibly
affect the plan (col like '%M%', or col is not involved in an
index)?  You can continue to plan up until the first parameter that can
affect the plan.  At that point, you save off the plan, and when you get
actual values (on the execute command), continue with the planning.  You
can do the same thing with correlated subqueries

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] generalizing the planner knobs

2005-12-04 Thread Greg Stark

Simon Riggs [EMAIL PROTECTED] writes:

 Csaba raises a good point here. Many people say they want hints when
 what they actually require the plan to be both stable and predictable.

Plan stability is also an important feature, especially for OLTP systems which
have hard real-time requirements. OLTP systems typically don't care about
getting the best plan for a query, only a plan that is good enough.

Good enough means it can keep up with the rate of incoming requests; it
doesn't matter whether it keeps up with 10% headroom or 20% headroom. But if
one incoming query even one in a thousand takes 1000% of the time available
then the entire system risks falling down.

But plan stability is something that should be integrated directly in the
server. Not something achieved by having the user hint every query to defeat
the optimizer.

What I'm working on for my purposes here is a perl script that takes all the
queries in the application (either gathered from the log or stored statically)
and runs ANALYZE on all of them. Then within a transaction it runs ANALYZE on
the database and re-ANALYZES every query again. If any plans change then it
mails them to the DBA and rolls back the transaction with the database
analysis. The DBA gets a chance to approve the new plans before they go into
effect.

That's more or less what I expect an integrated plan stability feature to do.
It's like a shared query plan cache except that instead of being a cache it's
a database of plans that are specifically approved by the DBA. Queries that
don't have an approved plan could be configured to either produce a warning or
an error until the plan is approved.

-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-04 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Plan stability is also an important feature, especially for OLTP
 systems which have hard real-time requirements. OLTP systems typically
 don't care about getting the best plan for a query, only a plan that
 is good enough.

 Good enough means it can keep up with the rate of incoming requests; it
 doesn't matter whether it keeps up with 10% headroom or 20% headroom. But if
 one incoming query even one in a thousand takes 1000% of the time available
 then the entire system risks falling down.

Is it worth pointing out that using the same plan all the time is *no*
recipe for guaranteeing response time?  There is no such thing as a plan
that is good for every case --- outlying data values can make a
usually-good plan blow out your performance guarantee anyway.  Disabling
the planner is just a recipe for ensuring that that will happen, IMHO.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] generalizing the planner knobs

2005-12-04 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 Is it worth pointing out that using the same plan all the time is *no*
 recipe for guaranteeing response time?  There is no such thing as a plan
 that is good for every case --- outlying data values can make a
 usually-good plan blow out your performance guarantee anyway.  Disabling
 the planner is just a recipe for ensuring that that will happen, IMHO.

But outlying data is something the user has control over. The user when
approving plans needs to be aware not just that the plan is experimentally
good, but that it will perform reliably within the constraints based on his
knowledge of the application and the data.

My point is that I don't need a plan that is good for every case. I need a
plan I can trust to perform as expected. If my boss asks me what impact
doubling the number of users will have I need to be able to answer it'll be
at worst twice as slow (knowing that my queries and the plans I've seen are
all O(users)).

If twice as slow is still tolerable then that's fine, even if a faster plan
was possible. What I don't want to say is well we'll have to try it and see
which is all I can say if there's a risk the plans will change.

Actually I would expect the facility to only be useful if there was still a
way to update the plans. The DBA would periodically re-analyze the queries in
the system and check any changed plans to ensure they were still reasonable.
Effectively the same as my script except at the query level rather than at the
database statistics level.

The scenario where this is useful is not in a development environment where
things are changing dynamically. But rather in a mature application where the
data distribution is well established. Outlying data almost certainly
represents an application bug and should be signalled, not allowed to
spontaneously bring down the rest of the system.

As anecdotal evidence, in the last job where I worked, once we had 10 million
users and over a hundred web requests per second it would have been pretty
hard to believe any reasonable query could involve a sequential scan.
Certainly no query that the web application should be performing itself
without human intervention. If for whatever reason there was some outlying
data point where that would have been the right plan it would have
immediately brought down the web site.

-- 
greg



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] generalizing the planner knobs

2005-12-03 Thread Hans-Juergen Schoenig
wouldn't it be more flexible to define a multiplicator or some sort of bool flag on a per object level?oracle hints are a total overkill and i agree with tom that usually people will abuse this feature.if we had a per object flag the actual planner hint can be decoupled from the actual query (i don't think putting a hint inside a query is the most clever thing).changing a flag would be as simple as running UPDATE on some system table.this should not be too intrusive as well.	best regards,		hansOn Dec 1, 2005, at 7:45 PM, Jonah H. Harris wrote:Tom,  Don't get me wrong, I agree with you completely.  I would rather put effort into enhancing the planner than in developing work-arounds.  In 99% of all cases the planner works correctly, but I know people who actually have to disable planning options (mergejoin) in production applications because they get bad plans.  The "bad" plans are not really bad in terms of what the planner knows about the query, just in areas where the planner doesn't look at other things.  I also agree that a significant amount of work would be required to add run-time hints which would be better spent enhancing the system as a whole.  My only suggestion was that it would be better than Part 1 of Neil's statement.  Somehow I missed the end mention of multipliers which I agree requires less effort.  On 12/1/05, Tom Lane [EMAIL PROTECTED] wrote: "Jonah H. Harris" [EMAIL PROTECTED] writes: In the last couple weeks I too have been thinking about planner hints. Assuming I have read your post correctly, the issue I see with this idea is  that, in most cases, there won't be much of a difference between adding an arbitrary cost value to each type of node and disabling it completely. Also, by fiddling with an arbitrary cost the user may introduce a lot of  variation into the planner which may actually result in worse query plans.Which is pretty much exactly the problem with "planner hints", too.I've resisted that suggestion in the past and will continue to do so, because hints are accidents waiting to happen.  Even if the hint is righttoday for your current Postgres version and current data distribution,it's likely not to be right further down the road --- but once the hint is embedded in your application, how often are you going to revisit it?As an example, a hint forcing the planner to use an indexscan with aparticular index might have been a great idea in PG 8.0 and a lousy idea in 8.1, because it would prevent substitution of a possibly-far-betterbitmap indexscan.The enable_foo switches are debug aids, not something you are expectedto fool with for production purposes, and the same would be true of Neil's suggested multipliers.  While I don't feel any strong need forvariable multipliers, they'd be a small enough incremental amount ofwork that the suggestion doesn't require a lot of supporting argument. Adding a planner hint facility would be several orders of magnitudemore work, and it would be taking the system in a design direction thatI think is fundamentally misguided.regards, tom lane

Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Csaba Nagy
On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  On the other hand the type I would prefer to see are hints that feed 
  directly
  into filling in information the planner lacks. This only requires that the
  user understand his own data and still lets the planner pick the best plan
  based on the provided information.
 
 This would avoid some issues, but it still is vulnerable to the problem
 that the hint you put in your code today will fail to track changes in
 your data tomorrow.

Tom, I have to disagree here. At least in our application, we must
provide for an acceptable worst case scenario, and sometimes a slightly
wrong estimate can lead to a plan which is very fast 99% of the time but
completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
I've had this situation with some limit plans where the planner had
chosen a wrong index. The problem there was that the planner had
estimated that the query will have 20 rows as a result, but it had less,
and resulted in the complete scan of the index... as opposed to a much
smaller scan that would have resulted by scanning the other index, as
that one would have provided an end condition orders of magnitudes
sooner. Now the statistics will always be only an estimation, and +/- a
few can really make a big difference in some situations. In this
particular situation the index choice of the planner would have been
faster for all cases where there were really 20 rows returned, but I
forced it to always choose the other plan (by adding the proper order
by) because I can't risk a bad result in any of the cases.
In this particular case I was able to force the planner choose a
specific plan, but that might not be always possible, so I guess it
really would make sense to be able to tell the planner how selective
some conditions are. And yes, sometimes I would like to freeze a
specific safe plan for a specific query, even if it is not optimal.

So for me the hint mechanism is good for telling the server that I'm
not interested at all in the BEST plan but which risks getting very bad
on occasions, but in a good enough plan which is safe.

And as for the selectivity changes over time, the hints will change
along. In most of the situations when selectivity change, the SQL has to
change too, sometimes even the complete workflow. I find that if changed
hints will help in some occasions then having them would mean less
maintenance than the code rewriting that would be otherwise involved...
and I'm completely sure the server can't compensate for the change of
the dynamics of the data all the time. And it definitely can't keep up
with highly dynamic data, where the statistics change constantly in big
tables... 

Our application for example has kind of batch processing, where we
insert smaller or larger batches of data in a HUGE table (~200 millions
of rows), and then that data is immediately used for different
operations and then reports, and furthermore it is heavily updated. I
can't think of any reasonable statistics target and ANALYZE strategy
which could satisfy both small batches and large batches without running
ANALYZE permanently with high statistics targets on the key fields...
and even that would not be specific enough when limit 20 is involved.
For queries involving this table I really would like to freeze plans, as
any misplanning has bad consequences.

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Martijn van Oosterhout
On Fri, Dec 02, 2005 at 11:07:06AM +0100, Csaba Nagy wrote:
 So for me the hint mechanism is good for telling the server that I'm
 not interested at all in the BEST plan but which risks getting very bad
 on occasions, but in a good enough plan which is safe.

I'm wondering if long term another approach might be to have another
parameter in the planner, cost_error or selectivity_error which is an
indication of how accurate we think it is.

So for example you have an index scan might cost x but with a possible
error of 15% and the seqscan might cost y but with an error of 1%.

The error for nested loop would be the product of the two inputs,
whereas a merge join whould be much less sensetive to error. A sort or
hash join would react badly to large variations of input.

So in cases where there is a choice between two indexscans with one
slightly more expensive and more accurate but can result in a mergejoin
would be a better choice than a possibly highly selective index but
without accurate info that needs to be fed into a nested loop. Even
though the latter might look better, the former is the safer option.

I think this would solve the problem where people see sudden flip-flops
between good and bad plans. The downside is that it's yet another
parameter for the planner to get wrong.

Unfortunatly, this is the kind of thing people write thesises on and I
don't think many people have the grounding in statistics to make it all
work.

Have a ncie day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpZhqhRS8A1y.pgp
Description: PGP signature


Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Pollard, Mike
Greg Stark [EMAIL PROTECTED] writes:
 You more or less missed my entire point.

Only because I am still getting used to how powerful and flexible
Postgres is; but I am working on expanding my horizons.

 In the extreme, no amount of added intelligence in the optimizer is
going
 to
 help it come up with any sane selectivity estimate for something like
 
   WHERE radius_authenticate(user) = 'OK'

yeah, I can see where something like this would be problematic.  While I
still think that in an ideal world, you want to leave all of this to the
engine, it is true that in the real world sometimes we still have to do
some of the thinking for the computer.  It's just that I've seen code
absolutely littered with optimizer hints, and that really bothers me.
But you can't not build a useful tool just because some would abuse it.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Rod Taylor
 In the extreme, no amount of added intelligence in the optimizer is going to
 help it come up with any sane selectivity estimate for something like 
 
   WHERE radius_authenticate(user) = 'OK'

Why not?

The missing capability in this case is to be able to provide or generate
(self learning?) statistics for a function that describe a typical
result and the cost of getting that result.
-- 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Greg Stark
Rod Taylor [EMAIL PROTECTED] writes:

  In the extreme, no amount of added intelligence in the optimizer is going to
  help it come up with any sane selectivity estimate for something like 
  
WHERE radius_authenticate(user) = 'OK'
 
 Why not?
 
 The missing capability in this case is to be able to provide or generate
 (self learning?) statistics for a function that describe a typical result
 and the cost of getting that result.

Ok, try WHERE radius_authenticate(user, (select ...), ?)

The point is that you can improve the estimates the planner gets. But you can
never make them omniscient. There will always be cases where the user knows
his data more than the planner. And those hints are still valid when a new
optimizer has new plans available.

This is different from hints that tell the planner what plan to use. Every
situation where the predicted cost is inaccurate despite accurate estimates
represents a fixable bug in the optimizer's cost model. When a new version of
the optimizer is available with a more accurate cost model or new available
plans those kinds of hints will only get in the way.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Gregory Maxwell
On 02 Dec 2005 15:49:02 -0500, Greg Stark [EMAIL PROTECTED] wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  The missing capability in this case is to be able to provide or generate
  (self learning?) statistics for a function that describe a typical result
  and the cost of getting that result.

 Ok, try WHERE radius_authenticate(user, (select ...), ?)

 The point is that you can improve the estimates the planner gets. But you can
 never make them omniscient. There will always be cases where the user knows
 his data more than the planner. And those hints are still valid when a new
 optimizer has new plans available.

Actually...  If a statistics engine stores the entire query as well
and used that as a key I don't see why it couldn't figure this out.
I.e. in queries that look like Z operation X has historically had
selectivity Y.

The the instruction to the user is simple: 'make sure that queries
with different results look different' . This is often naturally the
case.

The challenge becomes how do you group together queries which are
mostly the same so that you get enough data, but not falsely cluster
queries with different statistics.

The simplest way check the statistics list for the most similar query
match, and use that information. If the result is similar to what is
expected, use it to update the statistics record. If the measured
selectivity is too different make a new record which will then attract
similar queries.

Sounds like a good research project for someone.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Simon Riggs
On Fri, 2005-12-02 at 11:53 +0100, Martijn van Oosterhout wrote:
 On Fri, Dec 02, 2005 at 11:07:06AM +0100, Csaba Nagy wrote:
  So for me the hint mechanism is good for telling the server that I'm
  not interested at all in the BEST plan but which risks getting very bad
  on occasions, but in a good enough plan which is safe.
 
 I'm wondering if long term another approach might be to have another
 parameter in the planner, cost_error or selectivity_error which is an
 indication of how accurate we think it is.
 
 So for example you have an index scan might cost x but with a possible
 error of 15% and the seqscan might cost y but with an error of 1%.
 
 The error for nested loop would be the product of the two inputs,
 whereas a merge join whould be much less sensetive to error. A sort or
 hash join would react badly to large variations of input.
 
 So in cases where there is a choice between two indexscans with one
 slightly more expensive and more accurate but can result in a mergejoin
 would be a better choice than a possibly highly selective index but
 without accurate info that needs to be fed into a nested loop. Even
 though the latter might look better, the former is the safer option.
 
 I think this would solve the problem where people see sudden flip-flops
 between good and bad plans. The downside is that it's yet another
 parameter for the planner to get wrong.

Measuring parameters more accurately is a lengthy experimental job, not
a theoretical one. I think we are just waiting for someone to do this.

 Unfortunatly, this is the kind of thing people write thesises on and I
 don't think many people have the grounding in statistics to make it all
 work.

I'd considered that before; its just a lot of work.

The theory of error propagation is straightforward: you just take the
root mean square of the errors on the parameters. 

Trouble is, many of the planning parameters are just guesses, so you
have no idea of the error estimates either. Hence you can't really
calculate the error propagation accurately enough to make a sensible
stab at risk control. But it would be useful sometimes, which is about
the best it gets with the planner.

Right now the worst part of the planner is:
- the estimation of number of distinct values, which is an inherent
statistical limitation
- need for multi-column interaction statistics

The two are somewhat related.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Trent Shipley
Is it possible to submit a hand written or arbitrary execution plan to the 
retrieval engine?  (That is, can one bypass the SQL parser and planner or 
optimizer and just provide instructions to nested loop join table a to table 
b ...)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] generalizing the planner knobs

2005-12-01 Thread Neil Conway
There are currently some rather crude knobs for persuading the planner
to favour certain kinds of query plans: the enable_XXX GUC variables.
Several people have asked for a more flexible way to give hints to the
planner. I'm not interested in implementing fully-general planner hints
at the moment, but ISTM that a simple improvement to what we have now
would allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables that
would add an arbitrary constant to the estimated cost of each type of
query node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)

This would also be useful when diagnosing bad query plans: for example,
setting enable_seqscan=false often causes the planner to disregard the
use of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow more
alternative plans to be examined.

On the other hand, the whole mechanism is still a hack. It also means
that applications using this will be more dependent on the actual
costing values produced by the planner, which is not good. However, if
you're in the sort of desperate straights where this sort of hackery is
required, perhaps that's acceptable.

Comments?

-Neil



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Jonah H. Harris
Hey Neil,

In the last couple weeks I too have been thinking about planner
hints. Assuming I have read your post correctly, the issue I see
with this idea is that, in most cases, there won't be much of a
difference between adding an arbitrary cost value to each type of node
and disabling it completely. Also, by fiddling with an arbitrary
cost the user may introduce a lot of variation into the planner which
may actually result in worse query plans.

While Tom's done a great job with the planner, there are certain cases
where a user knows exactly what type of join or index they want to use
for a query. In that case I'd favor run-time hints from the user
similar to Oracle. I've read about seven papers on query
optimization and planning in the last few weeks and have a lot of
ideas... I'm just not sure when I may get time to work on them :(

-JonahOn 12/1/05, Neil Conway [EMAIL PROTECTED] wrote:
There are currently some rather crude knobs for persuading the plannerto favour certain kinds of query plans: the enable_XXX GUC variables.Several people have asked for a more flexible way to give hints to the
planner. I'm not interested in implementing fully-general planner hintsat the moment, but ISTM that a simple improvement to what we have nowwould allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables thatwould add an arbitrary constant to the estimated cost of each type ofquery node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)This would also be useful when diagnosing bad query plans: for example,setting enable_seqscan=false often causes the planner to disregard theuse of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow morealternative plans to be examined.On the other hand, the whole mechanism is still a hack. It also meansthat applications using this will be more dependent on the actual
costing values produced by the planner, which is not good. However, ifyou're in the sort of desperate straights where this sort of hackery isrequired, perhaps that's acceptable.Comments?-Neil
---(end of broadcast)---TIP 4: Have you searched our list archives? http://archives.postgresql.org



Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 ... ISTM that a simple improvement to what we have now
 would allow for a wider range of planner hints with only minor changes:
 we could replace the enable_XXX variables with a set of variables that
 would add an arbitrary constant to the estimated cost of each type of
 query node. (Alternatively, an arbitrary multiplier could be specified;
 I'm not sure which would be better.)

I think the multiplier would be better, because it'd avoid the problem
you mention later that useful values would be dependent on the planner's
cost units.  Also, one could sanely allow a multiplier less than one,
so as to favor instead of penalize a particular plan type.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Qingqing Zhou

Neil Conway [EMAIL PROTECTED] wrote

 This would also be useful when diagnosing bad query plans: for example,
 setting enable_seqscan=false often causes the planner to disregard the
 use of *any* sequential scan, anywhere in the plan. The ability to
 slightly bump up the cost of particular operations would allow more
 alternative plans to be examined.


This method also has the problem of enable_seqscan=false in some 
situations. I would vote we implement the final general solution like query 
plan hints directly.

Regards,
Qingqing 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 In the last couple weeks I too have been thinking about planner hints.
 Assuming I have read your post correctly, the issue I see with this idea is
 that, in most cases, there won't be much of a difference between adding an
 arbitrary cost value to each type of node and disabling it completely.
 Also, by fiddling with an arbitrary cost the user may introduce a lot of
 variation into the planner which may actually result in worse query plans.

Which is pretty much exactly the problem with planner hints, too.
I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen.  Even if the hint is right
today for your current Postgres version and current data distribution,
it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?
As an example, a hint forcing the planner to use an indexscan with a
particular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-better
bitmap indexscan.

The enable_foo switches are debug aids, not something you are expected
to fool with for production purposes, and the same would be true of
Neil's suggested multipliers.  While I don't feel any strong need for
variable multipliers, they'd be a small enough incremental amount of
work that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitude
more work, and it would be taking the system in a design direction that
I think is fundamentally misguided.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Jonah H. Harris
Tom,

Don't get me wrong, I agree with you completely. I would rather
put effort into enhancing the planner than in developing
work-arounds. In 99% of all cases the planner works correctly,
but I know people who actually have to disable planning options
(mergejoin) in production applications because they get bad
plans. The bad plans are not really bad in terms of what the
planner knows about the query, just in areas where the planner doesn't
look at other things.

I also agree that a significant amount of work would be required to add
run-time hints which would be better spent enhancing the system as a
whole. My only suggestion was that it would be better than Part 1
of Neil's statement. Somehow I missed the end mention of
multipliers which I agree requires less effort.

On 12/1/05, Tom Lane [EMAIL PROTECTED] wrote:
Jonah H. Harris [EMAIL PROTECTED] writes: In the last couple weeks I too have been thinking about planner hints. Assuming I have read your post correctly, the issue I see with this idea is
 that, in most cases, there won't be much of a difference between adding an arbitrary cost value to each type of node and disabling it completely. Also, by fiddling with an arbitrary cost the user may introduce a lot of
 variation into the planner which may actually result in worse query plans.Which is pretty much exactly the problem with planner hints, too.I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen.Even if the hint is righttoday for your current Postgres version and current data distribution,it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?As an example, a hint forcing the planner to use an indexscan with aparticular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-betterbitmap indexscan.The enable_foo switches are debug aids, not something you are expectedto fool with for production purposes, and the same would be true of
Neil's suggested multipliers.While I don't feel any strong need forvariable multipliers, they'd be a small enough incremental amount ofwork that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitudemore work, and it would be taking the system in a design direction thatI think is fundamentally misguided.regards,
tom lane


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Greg Stark
Jonah H. Harris [EMAIL PROTECTED] writes:

 Tom,
 
 Don't get me wrong, I agree with you completely.  I would rather put effort
 into enhancing the planner than in developing work-arounds.  In 99% of all
 cases the planner works correctly, but I know people who actually have to
 disable planning options (mergejoin) in production applications because they
 get bad plans.  The bad plans are not really bad in terms of what the
 planner knows about the query, just in areas where the planner doesn't look
 at other things.

I would like to draw a distinction between two sorts of hints. Currently
you're talking about one sort of hint, namely hints that tell the planner to
alter its cost model and choose a different plan than the inputs it has would
dictate. Using these require the user to have a fairly in depth understanding
of the planner and what options it has available.

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

So for example I would love to see a hint that allowed you to specify the
selectivity of a where clause. And one that let you specify the density of a
grouping clause.

Most of the time the planner makes a mistake it's because of a bad
miscalculation in estimating these givens. If it had the correct values for
the inputs then it would make the right decision about the plan.

Making the planner very good at making the right decisions given accurate
inputs is an attainable goal. Computers are pretty deterministic and it's
possible to come up with very accurate cost models. Despite some known
problems with Postgres's current models they're remarkably good already. And
there's no particular reason to think they can't be made nearly perfect.

Making the planner very good at producing accurate estimates is a much harder
goal. No matter how accurate it gets there will always be more complex
expressions that are harder to predict and there will always be cases the
planner can't estimate well. The user however knows his own data and may well
know the answer.

In the extreme consider user-defined operators, which will always be dependent
on the user to provide estimator functions. If it's a rarely used operator the
user may find it easier to simply tell the planner the selectivity of each
expression rather than come up with a general solution.

(I also think things like joins and group by clauses will never be able to be
guaranteed accurate results in general. Not without a lot more costs up front
including giving up on calculating statistics based on only a sample.)

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 On the other hand the type I would prefer to see are hints that feed directly
 into filling in information the planner lacks. This only requires that the
 user understand his own data and still lets the planner pick the best plan
 based on the provided information.

This would avoid some issues, but it still is vulnerable to the problem
that the hint you put in your code today will fail to track changes in
your data tomorrow.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Pollard, Mike
Greg Stark [EMAIL PROTECTED] writes:
 On the other hand the type I would prefer to see are hints that feed
directly
 into filling in information the planner lacks. This only requires that
the
 user understand his own data and still lets the planner pick the best
plan
 based on the provided information.

Optimizer hints were added because some databases just don't have a very
smart optimizer.  But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one.  That way, all users get the benefit of the fix.
Remember, the purpose of SQL is to isolate the end user from having to
care about how the data is retrieved; that is the RDBMS' problem.  (the
other thing forgotten was that it was supposed to be a natural language.
NVL.  Bah.)

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Gregory Maxwell
On 12/1/05, Pollard, Mike [EMAIL PROTECTED] wrote:
 Optimizer hints were added because some databases just don't have a very
 smart optimizer.  But you are much better served tracking down cases in
 which the optimizer makes a bad choice, and teaching the optimizer how
 to make a better one.  That way, all users get the benefit of the fix.
 Remember, the purpose of SQL is to isolate the end user from having to
 care about how the data is retrieved; that is the RDBMS' problem.  (the
 other thing forgotten was that it was supposed to be a natural language.
 NVL.  Bah.)

The flipside there is that a good set of hinting options  may increase
the amount of detailed feedback we get from users on improvements
needed in the optimizer.  The current knobs are pretty blunt and don't
do as much as I'd like when trying to track down exactly where the
optimiser has gone wrong.

If we'd really like to avoid people using the knobs to rig queries,
how about making them only  work with explain analyze, useful for
debugging but not so useful for actual queries.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Pollard, Mike
Gregory Maxwell [EMAIL PROTECTED] wrote:
 The flipside there is that a good set of hinting options  may increase
 the amount of detailed feedback we get from users on improvements
 needed in the optimizer.  The current knobs are pretty blunt and don't
 do as much as I'd like when trying to track down exactly where the
 optimiser has gone wrong.

Point conceded.  Any information that can help diagnose an issue is good
information.  I like the idea of only allowing it on explain.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Neil Conway
On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
 If we'd really like to avoid people using the knobs to rig queries,
 how about making them only  work with explain analyze, useful for
 debugging but not so useful for actual queries.

That seems a pretty arbitrary limitation. I agree that it's not ideal to
have users adjust planner behavior via this means, but until we have
something better, I think applying that limitation would only make the
status quo worse.

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
 If we'd really like to avoid people using the knobs to rig queries,
 how about making them only  work with explain analyze, useful for
 debugging but not so useful for actual queries.

 That seems a pretty arbitrary limitation. I agree that it's not ideal to
 have users adjust planner behavior via this means, but until we have
 something better, I think applying that limitation would only make the
 status quo worse.

Yeah, I agree.  Adding code to prevent people from using a facility
doesn't seem very reasonable, even if it's our policy that using the
facility for production purposes is not a good idea.  In fact, we just
today had a counterexample --- see this thread:
http://archives.postgresql.org/pgsql-performance/2005-12/msg00015.php
Being able to use enable_nestloop got Markus out of a short-term bind,
which to me is exactly what you want to be able to do with this sort
of thing.

I don't have any problem with expending small amounts of work to make
it easier to hack the planner in small ways.  The real problem I have
with a planner hints facility (in the form that I think most people
who ask for it have in mind) is that it would be a *very large* amount
of work to do it reasonably well, and I think that amount of effort
would be better spent in other ways.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Trent Shipley
On Thursday 2005-12-01 19:01, Gregory Maxwell wrote:
 On 12/1/05, Pollard, Mike [EMAIL PROTECTED] wrote:
  Optimizer hints were added because some databases just don't have a very
  smart optimizer.  But you are much better served tracking down cases in
  which the optimizer makes a bad choice, and teaching the optimizer how
  to make a better one.  That way, all users get the benefit of the fix.
  Remember, the purpose of SQL is to isolate the end user from having to
  care about how the data is retrieved; that is the RDBMS' problem.  (the
  other thing forgotten was that it was supposed to be a natural language.
  NVL.  Bah.)

 The flipside there is that a good set of hinting options  may increase
 the amount of detailed feedback we get from users on improvements
 needed in the optimizer.  The current knobs are pretty blunt and don't
 do as much as I'd like when trying to track down exactly where the
 optimiser has gone wrong.

 If we'd really like to avoid people using the knobs to rig queries,
 how about making them only  work with explain analyze, useful for
 debugging but not so useful for actual queries.

I'm all in favor of sticking to the declarative language ideal.

Also, I'm much in favor of protecting people from themselves.


On the other hand, if folks insist on engaging in extreme sports (like second 
guessing the optimizer) I'm against regulating their freedom.  I think 
exposing planner variables would be a good thing, on net.  Naturally, you 
would warn everyone not to touch them.  (Safety and freedom are both 
necessary.)

If you can play with the knobs, you should let them be used to return real 
result sets.  That way, when you get feedback, you will be able to tell if 
the cost estimator is broken.  Just returning a modified plan won't 
challenge costing assumptions.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Greg Stark

Pollard, Mike [EMAIL PROTECTED] writes:

 Optimizer hints were added because some databases just don't have a very
 smart optimizer.  But you are much better served tracking down cases in
 which the optimizer makes a bad choice, and teaching the optimizer how
 to make a better one.  

You more or less missed my entire point.

You can always teach the optimizer to make better decisions based on good
data. Your statement is basically right when talking about tweaking the
optimizer's decisions to ignore its best judgement.

But there are many many cases where the data the optimizer has available isn't
good and for good reason. And in plenty of those cases the data the optimizer
has available *can't* be good.

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like 

  WHERE radius_authenticate(user) = 'OK'

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings