Re: [HACKERS] [PERFORM] Hints proposal

2006-10-20 Thread Zeugswetter Andreas ADI SD

 I actually think the way to attack this issue is to discuss the kinds
of errors the planner makes, and what tweaks we could do to correct
them. 
 Here's the ones I'm aware of:
 
 -- Incorrect selectivity of WHERE clause
 -- Incorrect selectivity of JOIN
 -- Wrong estimate of rows returned from SRF
 -- Incorrect cost estimate for index use
 
 Can you think of any others?

I think your points are too generic, there is no way to get them all
100% correct from statistical
data even with data hints (and it is usually not at all necessary for
good enough plans).
I think we need to more precisely define the problems of our system with
point in time statistics

-- no reaction to degree of other concurrent activity
-- no way to react to abnormal skew that only persists for a very short
duration
-- too late reaction to changing distribution (e.g. current date column
when a new year starts)
and the variant: too late adaption when a table is beeing filled
-- missing cost/selectivity estimates for several parts of the system 

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-20 Thread Gregory Stark
Josh Berkus josh@agliodbs.com writes:

  I actually think the way to attack this issue is to discuss the kinds
  of errors the planner makes, and what tweaks we could do to correct
  them. Here's the ones I'm aware of:
 
  -- Incorrect selectivity of WHERE clause
  -- Incorrect selectivity of JOIN
  -- Wrong estimate of rows returned from SRF
  -- Incorrect cost estimate for index use
 
  Can you think of any others?

 -- Incorrect estimate for result of DISTINCT or GROUP BY.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-20 Thread Mischa Sandberg


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bucky
Jordan
Sent: Thursday, October 12, 2006 2:19 PM
To: josh@agliodbs.com; Jim C. Nasby
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [HACKERS] [PERFORM] Hints proposal

  Well, one nice thing about the per-query method is you can post
before
  and after EXPLAIN ANALYZE along with the hints.
 
 One bad thing is that application designers will tend to use the hint,
fix
 the immediate issue, and never report a problem at all.  And query
hints
 would not be collectable in any organized way except the query log,
which
 would then require very sophisticated text parsing to get any useful
 information at all.
 
Or they'll report it when the next version of Postgres breaks their
app because the hints changed, or because the planner does something
else which makes those hints obsolete.

My main concern with hints (aside from the fact I'd rather see more
intelligence in the planner/stats) is managing them appropriately. I
have two general types of SQL where I'd want to use hints- big OLAP
stuff (where I have a lot of big queries, so it's not just one or two
where I'd need them) or large dynamically generated queries (Users
building custom queries). Either way, I don't want to put them on a
query itself.

What about using regular expressions, plus, if you have a function
(views, or any other statement that is stored), you can assign a rule to
that particular function. So you get matching, plus explicit selection.
This way it's easy to find all your hints, turn them off, manage them,
etc. (Not to mention dynamically generated SQL is ugly enough without
having to put hints in there).

- Bucky

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-17 Thread Robert Treat
On Friday 13 October 2006 12:46, Gregory Stark wrote:
 Josh Berkus josh@agliodbs.com writes:
   I actually think the way to attack this issue is to discuss the kinds
   of errors the planner makes, and what tweaks we could do to correct
   them. Here's the ones I'm aware of:
  
   -- Incorrect selectivity of WHERE clause
   -- Incorrect selectivity of JOIN
   -- Wrong estimate of rows returned from SRF
   -- Incorrect cost estimate for index use
  
   Can you think of any others?

  -- Incorrect estimate for result of DISTINCT or GROUP BY.

Yeah, that one is bad.  I also ran into one the other day where the planner 
did not seem to understand the distinctness of a columns values across table 
partitions... 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 01:58:22PM -0700, Josh Berkus wrote:
  Unless you've got a time machine or a team of coders in your back
  pocket, I don't see how the planner will suddenly become perfect in
  8.4...
 
 Since you're not a core code contributor, I really don't see why you 
 continue to claim that query hints are going to be easier to implement 
 than relation-level statistics modification.  You think it's easier, but 
 the people who actually work on the planner don't believe that it is.
 
Well, that's not what I said (my point being that until the planner and
stats are perfect you need a way to over-ride them)... but I've also
never said hints would be faster or easier than stats modification (I
said I hope they would). But we'll never know which will be faster or
easier until there's actually a proposal for improving the stats.

  We've been seeing the same kinds of problems that are very difficult (or
  impossible) to fix cropping up for literally years... it'd be really
  good to at least be able to force the planner to do the sane thing even
  if we don't have the manpower to fix it right now...
 
 As I've said to other people on this thread, you keep making the incorrect 
 assumption that Oracle-style query hints are the only possible way of 
 manual nuts-and-bolts query tuning.  They are not.

No, I've never said that. What I've said is a) I doubt that any system
will always be correct for every query, meaning you need to be able to
change things on a per-query basis, and b) I'm hoping that simple hints
will be easy enough to implement that they can go into 8.3.

I completely agree that it's much better *in the long run* to improve
the planner and the statistics system so that we don't need hints. But
there's been no plan put forward for how to do that, which means we also
have no idea when some of these problems will be resolved. If someone
comes up with a plan for that, then we can actually look at which options
are better and how soon we can get fixes for these problems in place.

Unfortunately, this problem is difficult enough that I suspect it could
take a long time just to come up with an idea of how to fix these
problems, which means that without some way to override the planner our
users are stuck in the same place for the foreseeable future. If that
turns out to be the case, then I think we should implement per-query
hints now so that users can handle bad plans while we focus on how to
improve the stats and planner so that in the future hints will become
pointless.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 I completely agree that it's much better *in the long run* to improve
 the planner and the statistics system so that we don't need hints. But
 there's been no plan put forward for how to do that, which means we also
 have no idea when some of these problems will be resolved.

You keep arguing on the assumption that the planner is static and
there's no one working on it.  That is false --- although this thread
is certainly wasting a lot of time that could have been used more
productively ;-).

I also dispute your assumption that hints of the style you propose
will be easier to implement or maintain than the sort of
statistical-assumption tweaking that's been counter-proposed.  Just for
starters, how are you going to get those hints through the parser and
rewriter?  That's going to take an entire boatload of very ugly code
that isn't needed at all in a saner design.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Bucky Jordan
 I completely agree that it's much better *in the long run* to improve
 the planner and the statistics system so that we don't need hints. But
 there's been no plan put forward for how to do that, which means we
also
 have no idea when some of these problems will be resolved. If someone
 comes up with a plan for that, then we can actually look at which
options
 are better and how soon we can get fixes for these problems in place.
 

Would it be helpful to have a database of EXPLAIN ANALYZE results and
related details that developers could search through? I guess we sort of
have that on the mailing list, but search/reporting features on that are
pretty limited. Something like the Report Bug feature that seems to be
growing popular in other software (Windows, OS X, Firefox, etc) might
allow collection of useful data. The goal would be to identify the most
common problems, and some hints at what's causing them.

Maybe have a form based submission so you could ask the user some
required questions, ensure that they aren't just submitting EXPLAIN
results (parse and look for times maybe?), etc?

I guess the general question is, what information could the users
provide developers to help with this, and how can it be made easy for
the users to submit the information, and easy for the developers to
access in a meaningful way?

As a developer/contributor, what questions would you want to ask a user?
From reading the mailing lists, these seem to be common ones:
- Copy of your postgres.conf
- Basic hardware info
- Explain Analyze Results of poor performing query
- Explain Analyze Results of anything you've gotten to run better
- Comments

If there's interest- web development is something I can actually do
(unlike pg development) so I might actually be able to help with
something like this.

- Bucky


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Josh Berkus
Jim,

 Well, that's not what I said (my point being that until the planner and
 stats are perfect you need a way to over-ride them)... but I've also
 never said hints would be faster or easier than stats modification (I
 said I hope they would).

Yes, you did.  Repeatedly.  On this and other threads, you've made the 
statement at least three times that per-query hints are the only way to go 
for 8.3.   Your insistence on this view has been so strident that if I 
didn't know you better, I would assume some kind of hidden agenda.

Stop harping on the per-query hints are the true way and the only way, or 
prepare to have people start simply ignoring you.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Josh Berkus
Andreas,

 I think we need to more precisely define the problems of our system with
 point in time statistics

 -- no reaction to degree of other concurrent activity
 -- no way to react to abnormal skew that only persists for a very short
 duration
 -- too late reaction to changing distribution (e.g. current date column
 when a new year starts)
   and the variant: too late adaption when a table is beeing filled
 -- missing cost/selectivity estimates for several parts of the system

How would we manage point-in-time statistics?  How would we collect them  
store them?   I think this is an interesting idea, but very, very hard  to 
do ...

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Jim C. Nasby
On Fri, Oct 13, 2006 at 03:57:23PM -0700, Josh Berkus wrote:
 Jim,
 
  Well, that's not what I said (my point being that until the planner and
  stats are perfect you need a way to over-ride them)... but I've also
  never said hints would be faster or easier than stats modification (I
  said I hope they would).
 
 Yes, you did.  Repeatedly.  On this and other threads, you've made the 
 statement at least three times that per-query hints are the only way to go 
 for 8.3.   Your insistence on this view has been so strident that if I 
 didn't know you better, I would assume some kind of hidden agenda.

Let me clarify, because that's not what I meant. Right now, there's not
even a shadow of a design for anything else, and this is a tough nut to
crack. That means it doesn't appear that anything else could be done for
8.3. If I'm wrong, great. If not, we should get something in place for
users now while we come up with something better.

So, does anyone out there have a plan for how we could give user's the
ability to control the planner at a per-table level in 8.3 or even 8.4?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Alvaro Herrera
Jim C. Nasby wrote:

 So, does anyone out there have a plan for how we could give user's the
 ability to control the planner at a per-table level in 8.3 or even 8.4?

Per-table level?  Some of the problems that have been put forward have
to do with table combinations (for example selectivity of joins), so not
all problems will be solved with a per-table design.

I think if it were per table, you could get away with storing stuff in
pg_statistics or some such.  But how do you express statistics for
joins?  How do you express cross-column correlation?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-13 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 Let me clarify, because that's not what I meant. Right now, there's not
 even a shadow of a design for anything else, and this is a tough nut to
 crack.

I think you are not exactly measuring on a level playing field.  On the
textually-embedded-hints side, I see a very handwavy suggestion of a
syntax and absolutely nothing about how it might be implemented --- in
particular, nothing about how the information would be transmitted
through to the planner, and nothing about exactly how the planner would
use it if it had it.  (No, I don't think the planner will obey the
hints is an implementation sketch.)  On the other side, the concept of
system catalog(s) containing overrides for statistical or costing
estimates is pretty handwavy too, but at least it's perfectly clear
where it would plug into the planner: before running one of the current
stats estimation or costing functions, we'd look for a matching override
command in the catalogs.  The main question seems to be what we'd like
to be able to match on ... but that doesn't sound amazingly harder than
specifying what an embedded hint does.

IMO a textual hint facility will actually require *more* infrastructure
code to be written than what's being suggested for alternatives.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 11:42:32AM -0400, Tom Lane wrote:
 [ This is off-topic for -performance, please continue the thread in
 -hackers ]
 
 Jim C. Nasby [EMAIL PROTECTED] writes:
  These hints would outright force the planner to do things a certain way.
  ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
 
 This proposal seems to deliberately ignore every point that has been
 made *against* doing things that way.  It doesn't separate the hints
 from the queries, it doesn't focus on fixing the statistical or cost
 misestimates that are at the heart of the issue, and it takes no account
 of the problem of hints being obsoleted by system improvements.
 
Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
also my comment below.

  It would also be useful to allow tweaking of planner cost estimates.
  This would take the general form of
  node operator value
 
 This is at least focusing on the right sort of thing, although I still
 find it completely misguided to be attaching hints like this to
 individual queries.
 
Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3. Because this proposal essentially amounts
to limiting plans the planner will consider and tweaking it's cost
estimates, I'm hoping that it should be (relatively) easy to implement.

 What I would like to see is information *stored in a system catalog*
 that affects the planner's cost estimates.  As an example, the DBA might
 know that a particular table is touched sufficiently often that it's
 likely to remain RAM-resident, in which case reducing the page fetch
 cost estimates for just that table would make sense.  (BTW, this is
 something the planner could in principle know, but we're unlikely to
 do it anytime soon, for a number of reasons including a desire for plan
 stability.)

All this stuff is great and I would love to see it! But this is all so
abstract that I'm doubtful this could make it into 8.4, let alone 8.3.
Especially if we want a comprehensive system that will handle most/all
cases. I don't know if we even have a list of all the cases we need to
handle.

 The other general category of thing I think we need is a
 way to override selectivity estimates for particular forms of WHERE
 clauses.

I hadn't thought about that for hints, but it would be a good addition.
I think the stats-tweaking model would work, but we'd probably want to
allow = as well (which could go into the other stats tweaking hints as
well).

... WHERE a = b /* SELECTIVITY {+|-|*|/|=} value */
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Josh Berkus

Jim,


These hints would outright force the planner to do things a certain way.
... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way.  It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.
 
Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See

also my comment below.


I don't see how adding extra tags to queries is easier to implement than 
an ability to modify the system catalogs.  Quite the opposite, really.


And, as I said, if you're going to push for a feature that will be 
obsolesced in one version, then you're going to have a really rocky row 
to hoe.



Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3. Because this proposal essentially amounts
to limiting plans the planner will consider and tweaking it's cost
estimates, I'm hoping that it should be (relatively) easy to implement.


Even I, the chief marketing geek, am more concerned with getting a 
feature that we will still be proud of in 5 years than getting one in 
the next nine months.  Keep your pants on!


I actually think the way to attack this issue is to discuss the kinds of 
errors the planner makes, and what tweaks we could do to correct them. 
Here's the ones I'm aware of:


-- Incorrect selectivity of WHERE clause
-- Incorrect selectivity of JOIN
-- Wrong estimate of rows returned from SRF
-- Incorrect cost estimate for index use

Can you think of any others?

I also feel that a tenet of the design of the planner tweaks system 
ought to be that the tweaks are collectible and analyzable in some form. 
 This would allow DBAs to mail in their tweaks to -performance or 
-hackers, and then allow us to continue improving the planner.


--Josh Berkus





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Josh Berkus

Csaba,


I guess the angle is: I, as a practicing DBA would like to be able to
experiment and get most out of the imperfect tool I have, and you, the
developers, want to make the tool perfect... I don't care about perfect
tools, it just have to do the job... hints or anything else, if I can
make it work GOOD ENOUGH, it's all fine. And hints is something I would
understand and be able to use.


Hmmm, if you already understand Visual Basic syntax, should we support 
that too?  Or maybe we should support MySQL's use of '-00-00' as the 
zero date because people understand that?


We're just not going to adopt a bad design because Oracle DBAs are used 
to it.   If we wanted to do that, we could shut down the project and 
join a proprietary DB staff.


The current discussion is:

a) Planner tweaking is sometimes necessary;
b) Oracle HINTS are a bad design for planner tweaking;
c) Can we come up with a good design for planner tweaking?

So, how about suggestions for a good design?

--Josh Berkus


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Csaba Nagy
 Hmmm, if you already understand Visual Basic syntax, should we support 
 that too?  Or maybe we should support MySQL's use of '-00-00' as the 
 zero date because people understand that?

You completely misunderstood me... I have no idea about oracle hints,
never used Oracle in fact. My company uses oracle, but I have only very
very limited contact with oracle issues, and never touched a hint.

I'm only talking about ease of use, learning curves, and complexity in
general. While I do like the idea of an all automatic system optimizer
which takes your query portofolio and analyzes the data based on those
queries and creates you all the indexes you need and all that, that's
not gonna happen soon, because it's a very complex thing to implement.

The alternative is that you take your query portofolio, analyze it
yourself, figure out what statistics you need, create indexes, tweak
queries, hint the planner for correlations and stuff... which is a
complex task, and if you have to tell the server about some correlations
with the phase of the moon, you're screwed cause there will never be any
DB engine which will understand that. 

But you always can put the corresponding hint in the query when you know
the correlation is there...

The problem is that the application sometimes really knows better than
the server, when the correlations are not standard.

 We're just not going to adopt a bad design because Oracle DBAs are used 
 to it.   If we wanted to do that, we could shut down the project and 
 join a proprietary DB staff.

I have really nothing to do with Oracle. I think you guys are simply too
blinded by Oracle hate... I don't care about Oracle.

 The current discussion is:
 
 a) Planner tweaking is sometimes necessary;
 b) Oracle HINTS are a bad design for planner tweaking;

While there are plenty of arguments you made against query level hints
(can we not call them Oracle-hints ?), there are plenty of users of
postgres who expressed they would like them. I guess they were tweaking
postgres installations when they needed it, and not Oracle
installations. I expressed it clearly that for me query level hinting
would give more control and better understanding of what I have to do
for the desired result. Perfect planning - forget it, I only care about
good enough with reasonable tuning effort. If I have to tweak statistics
I will NEVER be sure postgres will not backfire on me again. On the
other hand if I say never do a seq scan on this table for this query, I
could be sure it won't...

 c) Can we come up with a good design for planner tweaking?

Angles again: good enough now is better for end users, but programmers
always go for perfect tomorrow... pity.

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes:
 Until that point is achieved, the above proposal is one of the simplest
 to understand for the tweaking DBA, and the fastest to deploy when faced
 with catastrophic plans. And I would guess it is one of the simplest to
 be implemented and probably not very high maintenance either, although
 this is just a guess.

That guess is wrong ... but more to the point, if you think that simple
and easy to implement should be the overriding concern for designing a
new feature, see mysql.  They've used that design approach for years and
look what a mess they've got.  This project has traditionally done
things differently and I feel no need to change that mindset now.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Andrew Sullivan
On Thu, Oct 12, 2006 at 02:21:55PM -0400, Merlin Moncure wrote:
 third way: to solve the problem of data (especially constants) not
 being available to the planner at the time the plan was generated.
 this happens most often with prepared statements and sql udfs.  note
 that changes to the plan generation mechanism (i think proposed by
 peter e a few weeks back) might also solve this.

You're right about this, but you also deliver the reason why we don't
need hints for that: the plan generation mechanism is a better
solution to that problem.  It's this latter thing that I keep coming
back to.  As a user of PostgreSQL, the thing that I really like about
it is its pragmatic emphasis on correctness.  In my experience, it's
a system that feels very UNIX-y: there's a willingness to accept
80/20 answers to a problem in the event you at least have a way to
get the last 20, but the developers are opposed to anything that
seems really kludgey.

In the case you're talking about, it seems to me that addressing the
problems where they come from is a better solution that trying to
find some way to work around them.  And most of the use-cases I hear
for a statement-level hints system fall into this latter category.

A
-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Unfortunately reformatting the Internet is a little more painful 
than reformatting your hard drive when it gets out of whack.
--Scott Morris

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 09:40:30AM -0700, Josh Berkus wrote:
 Jim,
 
 These hints would outright force the planner to do things a certain way.
 ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
 This proposal seems to deliberately ignore every point that has been
 made *against* doing things that way.  It doesn't separate the hints
 from the queries, it doesn't focus on fixing the statistical or cost
 misestimates that are at the heart of the issue, and it takes no account
 of the problem of hints being obsoleted by system improvements.
  
 Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
 also my comment below.
 
 I don't see how adding extra tags to queries is easier to implement than 
 an ability to modify the system catalogs.  Quite the opposite, really.
 
 And, as I said, if you're going to push for a feature that will be 
 obsolesced in one version, then you're going to have a really rocky row 
 to hoe.
 
Unless you've got a time machine or a team of coders in your back
pocket, I don't see how the planner will suddenly become perfect in
8.4...

 Yes, but as I mentioned the idea here was to come up with something that
 is (hopefully) easy to define and implement. In other words, something
 that should be doable for 8.3. Because this proposal essentially amounts
 to limiting plans the planner will consider and tweaking it's cost
 estimates, I'm hoping that it should be (relatively) easy to implement.
 
 Even I, the chief marketing geek, am more concerned with getting a 
 feature that we will still be proud of in 5 years than getting one in 
 the next nine months.  Keep your pants on!
 
Hey, I wrote that email while dressed! :P

We've been seeing the same kinds of problems that are very difficult (or
impossible) to fix cropping up for literally years... it'd be really
good to at least be able to force the planner to do the sane thing even
if we don't have the manpower to fix it right now...

 I actually think the way to attack this issue is to discuss the kinds of 
 errors the planner makes, and what tweaks we could do to correct them. 
 Here's the ones I'm aware of:
 
 -- Incorrect selectivity of WHERE clause
 -- Incorrect selectivity of JOIN
 -- Wrong estimate of rows returned from SRF
 -- Incorrect cost estimate for index use
 
 Can you think of any others?
 
There's a range of correlations where the planner will incorrectly
choose a seqscan over an indexscan.

Function problems aren't limited to SRFs... we have 0 statistics ability
for functions.

There's the whole issue of multi-column statistics.

 I also feel that a tenet of the design of the planner tweaks system 
 ought to be that the tweaks are collectible and analyzable in some form. 
  This would allow DBAs to mail in their tweaks to -performance or 
 -hackers, and then allow us to continue improving the planner.

Well, one nice thing about the per-query method is you can post before
and after EXPLAIN ANALYZE along with the hints. But yes, as we move
towards a per-table/index/function solution, there should be an easy way
to see how those hints are affecting the system and to report that data
back to the community.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Josh Berkus
Jim,

  I don't see how adding extra tags to queries is easier to implement
  than an ability to modify the system catalogs.  Quite the opposite,
  really.
 
  And, as I said, if you're going to push for a feature that will be
  obsolesced in one version, then you're going to have a really rocky
  row to hoe.

 Unless you've got a time machine or a team of coders in your back
 pocket, I don't see how the planner will suddenly become perfect in
 8.4...

Since you're not a core code contributor, I really don't see why you 
continue to claim that query hints are going to be easier to implement 
than relation-level statistics modification.  You think it's easier, but 
the people who actually work on the planner don't believe that it is.

 We've been seeing the same kinds of problems that are very difficult (or
 impossible) to fix cropping up for literally years... it'd be really
 good to at least be able to force the planner to do the sane thing even
 if we don't have the manpower to fix it right now...

As I've said to other people on this thread, you keep making the incorrect 
assumption that Oracle-style query hints are the only possible way of 
manual nuts-and-bolts query tuning.  They are not.

  I actually think the way to attack this issue is to discuss the kinds
  of errors the planner makes, and what tweaks we could do to correct
  them. Here's the ones I'm aware of:
 
  -- Incorrect selectivity of WHERE clause
  -- Incorrect selectivity of JOIN
  -- Wrong estimate of rows returned from SRF
  -- Incorrect cost estimate for index use
 
  Can you think of any others?

 There's a range of correlations where the planner will incorrectly
 choose a seqscan over an indexscan.

Please list some if you have ones which don't fall into one of the four 
problems above.

 Function problems aren't limited to SRFs... we have 0 statistics ability
 for functions.

 There's the whole issue of multi-column statistics.

Sure, but again that falls into the category of incorrect selectivity for 
WHERE/JOIN.  Don't make things more complicated than they need to be.

 Well, one nice thing about the per-query method is you can post before
 and after EXPLAIN ANALYZE along with the hints.

One bad thing is that application designers will tend to use the hint, fix 
the immediate issue, and never report a problem at all.  And query hints 
would not be collectable in any organized way except the query log, which 
would then require very sophisticated text parsing to get any useful 
information at all.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Bucky Jordan
  Well, one nice thing about the per-query method is you can post
before
  and after EXPLAIN ANALYZE along with the hints.
 
 One bad thing is that application designers will tend to use the hint,
fix
 the immediate issue, and never report a problem at all.  And query
hints
 would not be collectable in any organized way except the query log,
which
 would then require very sophisticated text parsing to get any useful
 information at all.
 
Or they'll report it when the next version of Postgres breaks their
app because the hints changed, or because the planner does something
else which makes those hints obsolete.

My main concern with hints (aside from the fact I'd rather see more
intelligence in the planner/stats) is managing them appropriately. I
have two general types of SQL where I'd want to use hints- big OLAP
stuff (where I have a lot of big queries, so it's not just one or two
where I'd need them) or large dynamically generated queries (Users
building custom queries). Either way, I don't want to put them on a
query itself.

What about using regular expressions, plus, if you have a function
(views, or any other statement that is stored), you can assign a rule to
that particular function. So you get matching, plus explicit selection.
This way it's easy to find all your hints, turn them off, manage them,
etc. (Not to mention dynamically generated SQL is ugly enough without
having to put hints in there).

- Bucky

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Alvaro Herrera
Bucky Jordan wrote:

 What about using regular expressions, plus, if you have a function
 (views, or any other statement that is stored), you can assign a rule to
 that particular function. So you get matching, plus explicit selection.
 This way it's easy to find all your hints, turn them off, manage them,
 etc. (Not to mention dynamically generated SQL is ugly enough without
 having to put hints in there).

The regular expression idea that's being floated around makes my brain
feel like somebody is screeching a blackboard nearby.  I don't think
it's a sane idea.  I think you could achieve something similar by using
stored plan representations, like we do for rewrite rules.  So you'd
look for, say, a matching join combination in a catalog, and get a
selectivity from a function that would get the selectivities of the
conditions on the base tables.  Or something like that anyway.

That gets ugly pretty fast when you have to extract selectivities for
all the possible join paths in any given query.

But please don't talk about regular expressions.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster