Re: [PERFORM] Scrub one large table against another (vmstat output)

2006-10-12 Thread Markus Schaber
Hi, Brendan,

Brendan Curran wrote:
 What prevents you from using an aggregate function?
 
 I guess I could actually obtain the results in an aggregate function and
 use those to maintain a summary table. There is a web view that requires
 'as accurate as possible' numbers to be queried per group (all 40 groups
 are displayed on the same page) and so constant aggregates over the
 entire table would be a nightmare.

That sounds just like a case for GROUP BY and a materialized view.

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread H.J. Sanders



 Hello.
 
 Simply jumping on the bandwagon, just my 2 cents:
 
 why not just like in some other (commercial) databases:
 
 a statement to say: use index 
 
 I know this is against all though but if even the big ones can not resist
 the pressure of their users, why not?
 
 Henk Sanders
 
  -Oorspronkelijk bericht-
  Van: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] Bucky Jordan
  Verzonden: woensdag 11 oktober 2006 16:27
  Aan: Tom Lane; Brian Herlihy
  CC: Postgresql Performance
  Onderwerp: Re: [PERFORM] Simple join optimized badly? 
  
  
   Brian Herlihy [EMAIL PROTECTED] writes:
What would it take for hints to be added to postgres?
   
   A *whole lot* more thought and effort than has been expended on the
   subject to date.
   
   Personally I have no use for the idea of force the planner to do
   exactly X given a query of exactly Y.  You don't have exactly Y
   today, tomorrow, and the day after (if you do, you don't need a
   hint mechanism at all, you need a mysql-style query cache).
   IMHO most of the planner mistakes we see that could be fixed via
   hinting are really statistical estimation errors, and so the right
   level to be fixing them at is hints about how to estimate the number
   of rows produced for given conditions.  Mind you that's still a plenty
   hard problem, but you could at least hope that a hint of that form
   would be useful for more than one query.
   
  
  Do I understand correctly that you're suggesting it might not be a bad
  idea to allow users to provide statistics?
  
  Is this along the lines of I'm loading a big table and touching every
  row of data, so I may as well collect some stats along the way and I
  know my data contains these statistical properties, but the analyzer
  wasn't able to figure that out (or maybe can't figure it out efficiently
  enough)?
  
  While it seems like this would require more knowledge from the user
  (e.g. more about their data, how the planner works, and how it uses
  statistics) this would actually be helpful/required for those who really
  care about performance. I guess it's the difference between a tool
  advanced users can get long term benefit from, or a quick fix that will
  probably come back to bite you. I've been pleased with Postgres'
  thoughtful design; recently I've been doing some work with MySQL, and
  can't say I feel the same way.
  
  Also, I'm guessing this has already come up at some point, but what
  about allowing PG to do some stat collection during queries? If you're
  touching a lot of data (such as an import process) wouldn't it be more
  efficient (and perhaps more accurate) to collect stats then, rather than
  having to re-scan? It would be nice to be able to turn this on/off on a
  per query basis, seeing as it could have pretty negative impacts on OLTP
  performance...
  
  - Bucky
  
  ---(end of broadcast)---
  TIP 4: Have you searched our list archives?
  
 http://archives.postgresql.org
  

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread Mark Kirkwood

H.J. Sanders wrote:


 why not just like in some other (commercial) databases:
 
 a statement to say: use index 
 
 I know this is against all though but if even the big ones can not resist

 the pressure of their users, why not?
 


Yeah - some could not (e.g. Oracle), but some did (e.g. DB2), and it 
seemed (to me anyway) significant DB2's optimizer worked much better 
than Oracle's last time I used both of them (Oracle 8/9 and DB2 7/8).


cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 10:59:23PM +1300, Mark Kirkwood wrote:
 H.J. Sanders wrote:
 
  why not just like in some other (commercial) databases:
  
  a statement to say: use index 
  
  I know this is against all though but if even the big ones can not resist
  the pressure of their users, why not?
  
 
 Yeah - some could not (e.g. Oracle), but some did (e.g. DB2), and it 
 seemed (to me anyway) significant DB2's optimizer worked much better 
 than Oracle's last time I used both of them (Oracle 8/9 and DB2 7/8).

If someone's going to commit to putting effort into improving the
planner then that's wonderful. But I can't recall any significant
planner improvements since min/max (which I'd argue was more of a bug
fix than an improvement). In fact, IIRC it took at least 2 major
versions to get min/max fixed, and that was a case where it was very
clear-cut what had to be done.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 If someone's going to commit to putting effort into improving the
 planner then that's wonderful. But I can't recall any significant
 planner improvements since min/max (which I'd argue was more of a bug
 fix than an improvement).

Hmph.  Apparently I've wasted most of the last five years.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
Posting here instead of hackers since this is where the thread got
started...

The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for an
actual proposal, so here's one that (I hope) wouldn't be very difficult
to implemen.

My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towards
that end, these hints either tell the planner specifically how to handle
some aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to the
internal representation of a query without much pain, and that the
planner can then use that information when generating plans.

The syntax these hints is something arbitrary. I'm borrowing Oracle's
idea of embedding hints in comments, but we can use some other method if
desired. Right now I'm more concerned with getting the general idea
across.

Since this is such a controversial topic, I've left this at a 'rough
draft' stage - it's meant more as a framework for discussion than a
final proposal for implementation.

Forcing a Plan
--
These hints would outright force the planner to do things a certain way.

... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This would force the planner to access table via a seqscan or
index_name. For the index case, you can also specify if the access must
or must not be via a bitmap scan. If neither is specified, the planner
is free to choose either one.

Theoretically, we could also allow ACCESS INDEX without an index name,
which would simply enforce that a seqscan not be used, but I'm not sure
how useful that would be.

... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */

Force the specified join mechanism on the join. The first form would not
enforce a join order, it would only force table b to be joined to the
rest of the relations using the specified join type. The second form
would specify that a joins to b in that order, and optionally specify
what type of join to use.

... GROUP BY ... /* {HASH|SORT} AGGREGATE */

Specify how aggregation should be handled.

Cost Tweaking
-
It would also be useful to allow tweaking of planner cost estimates.
This would take the general form of

node operator value

where node would be a planner node/hint (ie: ACCESS INDEX), operator
would be +, -, *, /, and value would be the amount to change the
estimate by. So ACCESS INDEX my_index / 2 would tell the planner to
cut the estimated cost of any index scan on a given table in half.

(I realize the syntax will probably need to change to avoid pain in the
grammar code.)

Unlike the hints above that are ment to force a certain behavior on an
operation, you could potentially have multiple cost hints in a single
location, ie:

FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */

The first comment block would apply to any joins against a, while the
second one would apply only to joins between a and b. The effects would
be cumulative, so this example means that any merge join against a gets
an added cost of 5000, unless it's a join with b (because +5000 + -5000
= 0). I think you could end up with odd cases if the second form just
over-rode the first, which is why it should be cummulative.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hints proposal

2006-10-12 Thread Bruce Momjian

Because DB2 doesn't like hints, and the fact that they have gotten to a
point where they feel they do not need them, I feel we too can get to a
point where we don't need them either.  The question is whether we can
get there quickly enough for our userbase.

I perfer attacking the problem at the table definition level, like
something like volatile, or adding to the existing table statistics.

---

Jim C. Nasby wrote:
 Posting here instead of hackers since this is where the thread got
 started...
 
 The argument has been made that producing a hints system will be as hard
 as actually fixing the optimizer. There's also been clamoring for an
 actual proposal, so here's one that (I hope) wouldn't be very difficult
 to implemen.
 
 My goal with this is to keep the coding aspect as simple as possible, so
 that implementation and maintenance of this isn't a big burden. Towards
 that end, these hints either tell the planner specifically how to handle
 some aspect of a query, or they tell it to modify specific cost
 estimates. My hope is that this information could be added to the
 internal representation of a query without much pain, and that the
 planner can then use that information when generating plans.
 
 The syntax these hints is something arbitrary. I'm borrowing Oracle's
 idea of embedding hints in comments, but we can use some other method if
 desired. Right now I'm more concerned with getting the general idea
 across.
 
 Since this is such a controversial topic, I've left this at a 'rough
 draft' stage - it's meant more as a framework for discussion than a
 final proposal for implementation.
 
 Forcing a Plan
 --
 These hints would outright force the planner to do things a certain way.
 
 ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
 
 This would force the planner to access table via a seqscan or
 index_name. For the index case, you can also specify if the access must
 or must not be via a bitmap scan. If neither is specified, the planner
 is free to choose either one.
 
 Theoretically, we could also allow ACCESS INDEX without an index name,
 which would simply enforce that a seqscan not be used, but I'm not sure
 how useful that would be.
 
 ... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
 ... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */
 
 Force the specified join mechanism on the join. The first form would not
 enforce a join order, it would only force table b to be joined to the
 rest of the relations using the specified join type. The second form
 would specify that a joins to b in that order, and optionally specify
 what type of join to use.
 
 ... GROUP BY ... /* {HASH|SORT} AGGREGATE */
 
 Specify how aggregation should be handled.
 
 Cost Tweaking
 -
 It would also be useful to allow tweaking of planner cost estimates.
 This would take the general form of
 
 node operator value
 
 where node would be a planner node/hint (ie: ACCESS INDEX), operator
 would be +, -, *, /, and value would be the amount to change the
 estimate by. So ACCESS INDEX my_index / 2 would tell the planner to
 cut the estimated cost of any index scan on a given table in half.
 
 (I realize the syntax will probably need to change to avoid pain in the
 grammar code.)
 
 Unlike the hints above that are ment to force a certain behavior on an
 operation, you could potentially have multiple cost hints in a single
 location, ie:
 
 FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
 JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */
 
 The first comment block would apply to any joins against a, while the
 second one would apply only to joins between a and b. The effects would
 be cumulative, so this example means that any merge join against a gets
 an added cost of 5000, unless it's a join with b (because +5000 + -5000
 = 0). I think you could end up with odd cases if the second form just
 over-rode the first, which is why it should be cummulative.
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Joshua Marsh
On 10/12/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
Posting here instead of hackers since this is where the thread gotstarted...The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for anactual proposal, so here's one that (I hope) wouldn't be very difficultto implemen.My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towardsthat end, these hints either tell the planner specifically how to handlesome aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to theinternal representation of a query without much pain, and that theplanner can then use that information when generating plans.

I've been following the last thread with a bit of interest. I like the proposal. It seems simple and easy to use. What is it about hinting that makes it so easily breakable with new versions? I don't have any experience with Oracle, so I'm not sure how they screwed logic like this up. Hinting to use a specific merge or scan seems fairly straight forward; if the query requests to use an index on a join, I don't see how hard it is to go with the suggestion. It will become painfully obvious to the developer if his hinting is broken.



Re: [PERFORM] Hints proposal

2006-10-12 Thread Tom Lane
[ This is off-topic for -performance, please continue the thread in
-hackers ]

Jim C. Nasby [EMAIL PROTECTED] writes:
 These hints would outright force the planner to do things a certain way.
 ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way.  It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.

 It would also be useful to allow tweaking of planner cost estimates.
 This would take the general form of
 node operator value

This is at least focusing on the right sort of thing, although I still
find it completely misguided to be attaching hints like this to
individual queries.

What I would like to see is information *stored in a system catalog*
that affects the planner's cost estimates.  As an example, the DBA might
know that a particular table is touched sufficiently often that it's
likely to remain RAM-resident, in which case reducing the page fetch
cost estimates for just that table would make sense.  (BTW, this is
something the planner could in principle know, but we're unlikely to
do it anytime soon, for a number of reasons including a desire for plan
stability.)  The other general category of thing I think we need is a
way to override selectivity estimates for particular forms of WHERE
clauses.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Hints proposal

2006-10-12 Thread Heikki Linnakangas

Bruce Momjian wrote:

Because DB2 doesn't like hints, and the fact that they have gotten to a
point where they feel they do not need them, I feel we too can get to a
point where we don't need them either.  The question is whether we can
get there quickly enough for our userbase.


In all fairness, when I used to work with DB2 we often had to rewrite 
queries to persuade the planner to choose a different plan. Often it was 
more of an issue of plan stability; a query would suddenly become 
horribly slow in production because a table had grown slowly to the 
point that it chose a different plan than before. Then we had to modify 
the query again, or manually set the statistics. In extreme cases we had 
to split a query to multiple parts and use temporary tables and move 
logic to the application to get a query to perform consistently and fast 
enough. I really really missed hints.


Because DB2 doesn't have MVCC, an accidental table scan is very serious, 
because with stricter isolation levels that keeps the whole table locked.


That said, I really don't like the idea of hints like use index X 
embedded in a query. I do like the idea of hints that give the planner 
more information about the data. I don't have a concrete proposal, but 
here's some examples of hints I'd like to see:


table X sometimes has millions of records and sometimes it's empty
Expression (table.foo = table2.bar * 2) has selectivity 0.99
if foo.bar = 5 then foo.field2 IS NULL
Column X is unique
function foobar() always returns either 1 or 2, and it returns 2 90% of 
the time.
if it's Monday, then table NEW_ORDERS has a cardinality of 10, 
otherwise 10.


BTW: Do we make use of CHECK constraints in the planner? In DB2, that 
was one nice and clean way of hinting the planner about things. If I 
remember correctly, you could even define CHECK constraints that weren't 
actually checked at run-time, but were used by the planner.



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hints proposal

2006-10-12 Thread Merlin Moncure

On 10/12/06, Tom Lane [EMAIL PROTECTED] wrote:

[ This is off-topic for -performance, please continue the thread in
-hackers ]



This proposal seems to deliberately ignore every point that has been
made *against* doing things that way.  It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.


what about extending the domain system so that we can put in ranges
that override the statistics or (imo much more importantly) provide
information when the planner would have to restort to a guess. my case
for this is prepared statements with a parameterized limit clause.

prepare foo(l int) as select * from bar limit $1;

maybe:
create domain foo_lmt as int hint 1; -- probably needs to be fleshed out
prepare foo(l foolmt) as select * from bar limit $1;

this says: if you have to guess me, please use this

what I like about this over previous attempts to persuade you is the
grammar changes are localized and also imo future proofed. planner can
ignore the hints if they are not appropriate for the oparation.

merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hints proposal

2006-10-12 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 BTW: Do we make use of CHECK constraints in the planner?

Only for constraint exclusion, and at the moment that's off by default.

The gating problem here is that if the planner relies on a CHECK
constraint, and then you drop the constraint, the previously generated
plan might start to silently deliver wrong answers.  So I'd like to see
a plan invalidation mechanism in place before we go very far down the
path of relying on constraints for planning.  That's something I'm going
to try to make happen for 8.3, though.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 11:42:32AM -0400, Tom Lane wrote:
 [ This is off-topic for -performance, please continue the thread in
 -hackers ]
 
 Jim C. Nasby [EMAIL PROTECTED] writes:
  These hints would outright force the planner to do things a certain way.
  ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
 
 This proposal seems to deliberately ignore every point that has been
 made *against* doing things that way.  It doesn't separate the hints
 from the queries, it doesn't focus on fixing the statistical or cost
 misestimates that are at the heart of the issue, and it takes no account
 of the problem of hints being obsoleted by system improvements.
 
Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
also my comment below.

  It would also be useful to allow tweaking of planner cost estimates.
  This would take the general form of
  node operator value
 
 This is at least focusing on the right sort of thing, although I still
 find it completely misguided to be attaching hints like this to
 individual queries.
 
Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3. Because this proposal essentially amounts
to limiting plans the planner will consider and tweaking it's cost
estimates, I'm hoping that it should be (relatively) easy to implement.

 What I would like to see is information *stored in a system catalog*
 that affects the planner's cost estimates.  As an example, the DBA might
 know that a particular table is touched sufficiently often that it's
 likely to remain RAM-resident, in which case reducing the page fetch
 cost estimates for just that table would make sense.  (BTW, this is
 something the planner could in principle know, but we're unlikely to
 do it anytime soon, for a number of reasons including a desire for plan
 stability.)

All this stuff is great and I would love to see it! But this is all so
abstract that I'm doubtful this could make it into 8.4, let alone 8.3.
Especially if we want a comprehensive system that will handle most/all
cases. I don't know if we even have a list of all the cases we need to
handle.

 The other general category of thing I think we need is a
 way to override selectivity estimates for particular forms of WHERE
 clauses.

I hadn't thought about that for hints, but it would be a good addition.
I think the stats-tweaking model would work, but we'd probably want to
allow = as well (which could go into the other stats tweaking hints as
well).

... WHERE a = b /* SELECTIVITY {+|-|*|/|=} value */
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 10:44:20AM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  If someone's going to commit to putting effort into improving the
  planner then that's wonderful. But I can't recall any significant
  planner improvements since min/max (which I'd argue was more of a bug
  fix than an improvement).
 
 Hmph.  Apparently I've wasted most of the last five years.

Ok, now that I've actually looked at the release notes, I take that back
and apologize. But while there's a lot of improvements that have been
made, there's still some seriously tough problems that have been talked
about for a long time and there's still no light at the end of the
tunnel, like how to handle multi-column statistics.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Bucky Jordan
What is it about hinting that makes it so easily breakable with new versions?  
I don't have any experience with Oracle, so I'm not sure how they screwed 
logic like this up.  

I don't have a ton of experience with oracle either, mostly DB2, MSSQL and PG. 
So, I thought I'd do some googling, and maybe others might find this useful 
info. 

http://asktom.oracle.com/pls/ask/f?p=4950:8:2177642270773127589::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:7038986332061

Interesting quote: In Oracle Applications development (11i apps - HR, CRM, 
etc) Hints are strictly forbidden.  We find the underlying cause and fix it. 
and
Hints -- only useful if you are in RBO and you want to make use of an access 
path.

Maybe because I haven't had access to hints before, I've never been tempted to 
use them. However, I can't remember having to re-write SQL due to a PG upgrade 
either.

Oh, and if you want to see everything that gets broken/depreciated with new 
versions, just take a look at oracle's release notes for 9i and 10g. I 
particularly dislike how they rename stuff for no apparent reason (e.g. 
NOPARALLEL is now NO_PARALLEL - 
http://www.oracle-base.com/articles/10g/PerformanceTuningEnhancements10g.php)

At the very least, I agree it is important to separate the query (what data do 
I want) from performance options (config, indexes, hints, etc). The data I want 
doesn't change unless I have a functionality/requirements change. So I'd prefer 
not to have to go back and change that code just to tweak performance. In 
addition, this creates an even bigger mess for dynamic queries. I would be much 
more likely to consider hints if they could be applied separately.

- Bucky


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
 The syntax these hints is something arbitrary. I'm borrowing Oracle's
 idea of embedding hints in comments, but we can use some other method if
 desired. Right now I'm more concerned with getting the general idea
 across.
 

Is there any advantage to having the hints in the queries? To me that's
asking for trouble with no benefit at all. It would seem to me to be
better to have a system catalog that defined hints as something like:

If user A executes a query matching regex R, then coerce (or force) the
planner in this way.

I'm not suggesting that we do that, but it seems better then embedding
the hints in the queries themselves.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 09:26:24AM -0600, Joshua Marsh wrote:
 On 10/12/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
 
 Posting here instead of hackers since this is where the thread got
 started...
 
 The argument has been made that producing a hints system will be as hard
 as actually fixing the optimizer. There's also been clamoring for an
 actual proposal, so here's one that (I hope) wouldn't be very difficult
 to implemen.
 
 My goal with this is to keep the coding aspect as simple as possible, so
 that implementation and maintenance of this isn't a big burden. Towards
 that end, these hints either tell the planner specifically how to handle
 some aspect of a query, or they tell it to modify specific cost
 estimates. My hope is that this information could be added to the
 internal representation of a query without much pain, and that the
 planner can then use that information when generating plans.
 
 
 I've been following the last thread with a bit of interest.  I like the
 proposal.  It seems simple and easy to use.  What is it about hinting that
 makes it so easily breakable with new versions?  I don't have any experience
 with Oracle, so I'm not sure how they screwed logic like this up.  Hinting
 to use a specific merge or scan seems fairly straight forward; if the query
 requests to use an index on a join, I don't see how hard it is to go with
 the suggestion.  It will become painfully obvious to the developer if his
 hinting is broken.

The problem is that when you 'hint' (which is actually not a great name
for the first part of my proposal, since it's really forcing the planner
to do something), you're tying the planner's hands. As the planner
improves in newer versions, it's very possible to end up with forced
query plans that are much less optimal than what the newer planner could
come up with. This is especially true as new query execution nodes are
created, such as hashaggregate.

The other downside is that it's per-query. It would certainly be useful
to be able to nudge the planner in the right direction on a per-table
level, but it's just not clear how to accomplish that. Like I said, the
idea behind my proposal is to have something that can be done soon, like
for 8.3.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 04:55:17PM +0100, Heikki Linnakangas wrote:
 Bruce Momjian wrote:
 Because DB2 doesn't like hints, and the fact that they have gotten to a
 point where they feel they do not need them, I feel we too can get to a
 point where we don't need them either.  The question is whether we can
 get there quickly enough for our userbase.
 
 In all fairness, when I used to work with DB2 we often had to rewrite 
 queries to persuade the planner to choose a different plan. Often it was 
 more of an issue of plan stability; a query would suddenly become 
 horribly slow in production because a table had grown slowly to the 
 point that it chose a different plan than before. Then we had to modify 
 the query again, or manually set the statistics. In extreme cases we had 
 to split a query to multiple parts and use temporary tables and move 
 logic to the application to get a query to perform consistently and fast 
 enough. I really really missed hints.
 
Oracle has an interesting way to deal with this, in that you can store a
plan that the optimizer generates and tell it to always use it for that
query. There's some other management tools built on top of that. I don't
know how commonly it's used, though...

Also, on the DB2 argument... I'm wondering what happens when people end
up with a query that they can't get to execute the way it should? Is the
planner *that* good that it never happens? Do you have to wait for a
fixpack when it does happen? I'm all for having a super-smart planner,
but I'm highly doubtful it will always know exactly what to do.

 That said, I really don't like the idea of hints like use index X 
 embedded in a query. I do like the idea of hints that give the planner 
 more information about the data. I don't have a concrete proposal, but 

Which is part of the problem... there's nothing to indicate we'll have
support for these improved hints anytime soon, especially if a number of
them depend on plan invalidation.

 here's some examples of hints I'd like to see:
 
 table X sometimes has millions of records and sometimes it's empty
 Expression (table.foo = table2.bar * 2) has selectivity 0.99
 if foo.bar = 5 then foo.field2 IS NULL
 Column X is unique
 function foobar() always returns either 1 or 2, and it returns 2 90% of 
 the time.
 if it's Monday, then table NEW_ORDERS has a cardinality of 10, 
 otherwise 10.
 
 BTW: Do we make use of CHECK constraints in the planner? In DB2, that 
 was one nice and clean way of hinting the planner about things. If I 
 remember correctly, you could even define CHECK constraints that weren't 
 actually checked at run-time, but were used by the planner.

I think you're right... and it is an elegant way to hint the planner.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Csaba Nagy
OK, I just have to comment...

Jim C. Nasby [EMAIL PROTECTED] writes:
  These hints would outright force the planner to do things a certain way.
  ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
 
 This proposal seems to deliberately ignore every point that has been
 made *against* doing things that way.  It doesn't separate the hints
 from the queries, it doesn't focus on fixing the statistical or cost
 misestimates that are at the heart of the issue, and it takes no account
 of the problem of hints being obsoleted by system improvements.

But whatever arguments you made about planner improvements and the like,
it will NEVER be possible to correctly estimate in all cases the
statistics for a query, even if you perfectly know WHAT statistics you
need, which is also not the case all the time. 

Tom, you're the one who knows best how the planner works... can you bet
anything you care about on the fact that one day the planner will never
ever generate a catastrophic plan without DBA tweaking ? And how far in
time we'll get to that point ?

Until that point is achieved, the above proposal is one of the simplest
to understand for the tweaking DBA, and the fastest to deploy when faced
with catastrophic plans. And I would guess it is one of the simplest to
be implemented and probably not very high maintenance either, although
this is just a guess.

If I could hint some of my queries, I would enable anonymous prepared
statements to take into account the parameter values, but I can't
because that results in runaway queries every now and then, so I had to
force postgres generate generic queries without knowing anything about
parameter values... so the effect for me is an overall slower postgres
system because I couldn't fix the particular problems I had and had to
tweak general settings. And when I have a problem I can't wait until the
planner is fixed, I have to solve it immediately... the current means to
do that are suboptimal. 

The argument that planner hints would hide problems from being solved is
a fallacy. To put a hint in place almost the same amount of analysis is
needed from the DBA as solving the problem now, so users who ask now for
help will further do it even in the presence of hints. The ones who
wouldn't are not coming for help now either, they know their way out of
the problems... and the ones who still report a shortcoming of the
planner will do it with hints too.

I would even say it would be an added benefit, cause then you could
really see how well a specific plan will do without having the planner
capable to generate alone that plan... so knowledgeable users could come
to you further down the road when they know where the planner is wrong,
saving you time.

I must say it again, this kind of query-level hinting would be the
easiest to understand for the developers... there are many
trial-end-error type of programmers out there, if you got a hint wrong,
you fix it and move on, doesn't need to be perfect, it just have to be
good enough. I heavily doubt that postgres will get bad publicity
because user Joe sot himself in the foot by using bad hints... the
probability for that is low, you must actively put those hints there,
and if you take the time to do that then you're not the average Joe, and
probably not so lazy either, and if you're putting random hints, then
you would probably mess it up some other way anyway.

And the thing about missing new features is also not very founded. If I
would want to exclude a full table scan on a specific table for a
specific query, than that's about for sure that I want to do that
regardless what new features postgres will offer in the future. Picking
one specific access method is more prone to missing new access methods,
but even then, when I upgrade the DB server to a new version, I usually
have enough other compatibility problems (till now I always had some on
every upgrade I had) that making a round of upgrading hints is not an
outstanding problem. And if the application works good enough with
suboptimal plans, why would I even take that extra effort ?

I guess the angle is: I, as a practicing DBA would like to be able to
experiment and get most out of the imperfect tool I have, and you, the
developers, want to make the tool perfect... I don't care about perfect
tools, it just have to do the job... hints or anything else, if I can
make it work GOOD ENOUGH, it's all fine. And hints is something I would
understand and be able to use.

Thanks for your patience if you're still reading this...

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Josh Berkus

Jim,


These hints would outright force the planner to do things a certain way.
... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way.  It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.
 
Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See

also my comment below.


I don't see how adding extra tags to queries is easier to implement than 
an ability to modify the system catalogs.  Quite the opposite, really.


And, as I said, if you're going to push for a feature that will be 
obsolesced in one version, then you're going to have a really rocky row 
to hoe.



Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3. Because this proposal essentially amounts
to limiting plans the planner will consider and tweaking it's cost
estimates, I'm hoping that it should be (relatively) easy to implement.


Even I, the chief marketing geek, am more concerned with getting a 
feature that we will still be proud of in 5 years than getting one in 
the next nine months.  Keep your pants on!


I actually think the way to attack this issue is to discuss the kinds of 
errors the planner makes, and what tweaks we could do to correct them. 
Here's the ones I'm aware of:


-- Incorrect selectivity of WHERE clause
-- Incorrect selectivity of JOIN
-- Wrong estimate of rows returned from SRF
-- Incorrect cost estimate for index use

Can you think of any others?

I also feel that a tenet of the design of the planner tweaks system 
ought to be that the tweaks are collectible and analyzable in some form. 
 This would allow DBAs to mail in their tweaks to -performance or 
-hackers, and then allow us to continue improving the planner.


--Josh Berkus





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Josh Berkus

Csaba,


I guess the angle is: I, as a practicing DBA would like to be able to
experiment and get most out of the imperfect tool I have, and you, the
developers, want to make the tool perfect... I don't care about perfect
tools, it just have to do the job... hints or anything else, if I can
make it work GOOD ENOUGH, it's all fine. And hints is something I would
understand and be able to use.


Hmmm, if you already understand Visual Basic syntax, should we support 
that too?  Or maybe we should support MySQL's use of '-00-00' as the 
zero date because people understand that?


We're just not going to adopt a bad design because Oracle DBAs are used 
to it.   If we wanted to do that, we could shut down the project and 
join a proprietary DB staff.


The current discussion is:

a) Planner tweaking is sometimes necessary;
b) Oracle HINTS are a bad design for planner tweaking;
c) Can we come up with a good design for planner tweaking?

So, how about suggestions for a good design?

--Josh Berkus


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Csaba Nagy
 Hmmm, if you already understand Visual Basic syntax, should we support 
 that too?  Or maybe we should support MySQL's use of '-00-00' as the 
 zero date because people understand that?

You completely misunderstood me... I have no idea about oracle hints,
never used Oracle in fact. My company uses oracle, but I have only very
very limited contact with oracle issues, and never touched a hint.

I'm only talking about ease of use, learning curves, and complexity in
general. While I do like the idea of an all automatic system optimizer
which takes your query portofolio and analyzes the data based on those
queries and creates you all the indexes you need and all that, that's
not gonna happen soon, because it's a very complex thing to implement.

The alternative is that you take your query portofolio, analyze it
yourself, figure out what statistics you need, create indexes, tweak
queries, hint the planner for correlations and stuff... which is a
complex task, and if you have to tell the server about some correlations
with the phase of the moon, you're screwed cause there will never be any
DB engine which will understand that. 

But you always can put the corresponding hint in the query when you know
the correlation is there...

The problem is that the application sometimes really knows better than
the server, when the correlations are not standard.

 We're just not going to adopt a bad design because Oracle DBAs are used 
 to it.   If we wanted to do that, we could shut down the project and 
 join a proprietary DB staff.

I have really nothing to do with Oracle. I think you guys are simply too
blinded by Oracle hate... I don't care about Oracle.

 The current discussion is:
 
 a) Planner tweaking is sometimes necessary;
 b) Oracle HINTS are a bad design for planner tweaking;

While there are plenty of arguments you made against query level hints
(can we not call them Oracle-hints ?), there are plenty of users of
postgres who expressed they would like them. I guess they were tweaking
postgres installations when they needed it, and not Oracle
installations. I expressed it clearly that for me query level hinting
would give more control and better understanding of what I have to do
for the desired result. Perfect planning - forget it, I only care about
good enough with reasonable tuning effort. If I have to tweak statistics
I will NEVER be sure postgres will not backfire on me again. On the
other hand if I say never do a seq scan on this table for this query, I
could be sure it won't...

 c) Can we come up with a good design for planner tweaking?

Angles again: good enough now is better for end users, but programmers
always go for perfect tomorrow... pity.

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes:
 Until that point is achieved, the above proposal is one of the simplest
 to understand for the tweaking DBA, and the fastest to deploy when faced
 with catastrophic plans. And I would guess it is one of the simplest to
 be implemented and probably not very high maintenance either, although
 this is just a guess.

That guess is wrong ... but more to the point, if you think that simple
and easy to implement should be the overriding concern for designing a
new feature, see mysql.  They've used that design approach for years and
look what a mess they've got.  This project has traditionally done
things differently and I feel no need to change that mindset now.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hints proposal

2006-10-12 Thread Csaba Nagy
 I'm not suggesting that we do that, but it seems better then embedding
 the hints in the queries themselves.

OK, what about this: if I execute the same query from a web client, I
want the not-so-optimal-but-safe plan, if I execute it asynchronously, I
let the planner choose the
best-overall-performance-but-sometimes-may-be-slow plan ?

What kind of statistics/table level hinting will get you this ?

I would say only query level hinting will buy you query level control.
And that's perfectly good in some situations.

I really can't see why a query-level hinting mechanism is so evil, why
it couldn't be kept forever, and augmented with the possibility of
correlation hinting, or table level hinting. 

These are really solving different problems, with some overlapping...

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread Scott Marlowe
On Thu, 2006-10-12 at 09:44, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  If someone's going to commit to putting effort into improving the
  planner then that's wonderful. But I can't recall any significant
  planner improvements since min/max (which I'd argue was more of a bug
  fix than an improvement).
 
 Hmph.  Apparently I've wasted most of the last five years.

I appreciate the work, and trust me, I've noticed the changes in the
query planner over time.  

Thanks for the hard work, and I'm sure there are plenty of other
thankful people too.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Scrub one large table against another

2006-10-12 Thread Brendan Curran



Well, IN != EXISTS != JOIN. Exists just stops as soon as it finds a
record. For some cases, it's equivalent to IN, but not all. IN has to
de-duplicate it's list in some fashion. For small IN lists, you can do
this with an OR, but at some point you need to switch to an actual
unique (actually, I suspect the difference in PostgreSQL just depends on
if you passed values into IN or a subquery). A join on the other hand
doesn't worry about duplicates at all. There may be some brains in the
planner that realize if a subquery will return a unique set (ie: you're
querying on a primary key).



I agree, and it makes sense now that I consider it that IN would force the planner to implement some 
form of unique check - possibly leveraging a PK or unique index if one is already available. Maybe 
I'll tack up a note to the online documentation letting people know so that it's a little more 
explicitly clear that when you choose IN on data that isn't explicitly unique (to the planner i.e. 
post-analyze) you get the baggage of a forced unique whether you need it or not. Or perhaps someone 
that knows the internals of the planner a little better than me should put some info up regarding that?




Just one more thing... I have found that maintaining a btree index on a 
varchar(255) value is extremely expensive on insert/update/delete. It is 
unfortunately necessary for me to maintain this index for queries and 
reports so I am transitioning to using an unindexed staging table to 
import data into before merging it with the larger table. All the docs 
and posts recommend is to drop the index, import your data, and then 
create the index again. This is untenable on a daily / bi-weekly basis. 
Is there a more elegant solution to this indexing problem?


You might be happier with tsearch than a regular index.


Thanks, I'll look into using tsearch2 as a possibility. From what I've seen so far it would add 
quite a bit of complexity (necessary updates after inserts, proprietary query syntax that might 
require a large amount of specialization from client apps) but in the end the overhead may be less 
than that of maintaining the btree.


Thanks and Regards,
B

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Scrub one large table against another

2006-10-12 Thread Tom Lane
Brendan Curran [EMAIL PROTECTED] writes:
 I'll tack up a note to the online documentation letting people know so
 that it's a little more explicitly clear that when you choose IN on
 data that isn't explicitly unique (to the planner i.e.  post-analyze)
 you get the baggage of a forced unique whether you need it or not. Or
 perhaps someone that knows the internals of the planner a little
 better than me should put some info up regarding that?

You get a forced unique step, period --- the planner doesn't try to
shortcut on the basis of noticing a relevant unique constraint.
We have some plan techniques that might look like they are not checking
uniqueness (eg, an IN Join) but they really are.

This is an example of what I was talking about just a minute ago, about
not wanting to rely on constraints that could go away while the plan is
still potentially usable.  It's certainly something that we should look
at adding as soon as the plan-invalidation infrastructure is there to
make it safe to do.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Andrew Sullivan
On Thu, Oct 12, 2006 at 02:21:55PM -0400, Merlin Moncure wrote:
 third way: to solve the problem of data (especially constants) not
 being available to the planner at the time the plan was generated.
 this happens most often with prepared statements and sql udfs.  note
 that changes to the plan generation mechanism (i think proposed by
 peter e a few weeks back) might also solve this.

You're right about this, but you also deliver the reason why we don't
need hints for that: the plan generation mechanism is a better
solution to that problem.  It's this latter thing that I keep coming
back to.  As a user of PostgreSQL, the thing that I really like about
it is its pragmatic emphasis on correctness.  In my experience, it's
a system that feels very UNIX-y: there's a willingness to accept
80/20 answers to a problem in the event you at least have a way to
get the last 20, but the developers are opposed to anything that
seems really kludgey.

In the case you're talking about, it seems to me that addressing the
problems where they come from is a better solution that trying to
find some way to work around them.  And most of the use-cases I hear
for a statement-level hints system fall into this latter category.

A
-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Unfortunately reformatting the Internet is a little more painful 
than reformatting your hard drive when it gets out of whack.
--Scott Morris

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
  I'm not suggesting that we do that, but it seems better then embedding
  the hints in the queries themselves.
 
 OK, what about this: if I execute the same query from a web client, I
 want the not-so-optimal-but-safe plan, if I execute it asynchronously, I
 let the planner choose the
 best-overall-performance-but-sometimes-may-be-slow plan ?
 

Connect as a different user to control whether the hint matches or not.
If this doesn't work for you, read below.

 What kind of statistics/table level hinting will get you this ?
 

It's based not just on the table, but on environment as well, such as
the user/role.

 I would say only query level hinting will buy you query level control.
 And that's perfectly good in some situations.

My particular proposal allows arbitrary regexes on the raw query. You
could add a comment with a query id in it. 

My proposal has these advantages over query comments:
(1) Most people's needs would be solved by just matching the query
form. 
(2) If the DBA really wanted to separate out queries individually (not
based on the query form), he could do it, but it would have an extra
step that might encourage him to reconsider the necessity
(3) If someone went to all that work to shoot themselves in the foot
with unmanagable hints that are way too specific, the postgres
developers are unlikely to be blamed
(4) No backwards compatibility issues that I can see, aside from people
making their own hints unmanagable. If someone started getting bad
plans, they could just remove all the hints from the system catalogs and
it would be just as if they had never used hints. If they added ugly
comments to their queries it wouldn't really have a bad effect.

To formalize the proposal a litte, you could have syntax like:

CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;

Where some_hint would be a hinting language perhaps like Jim's, except
not guaranteed to be compatible between versions of PostgreSQL. The
developers could change the hinting language at every release and people
can just re-write the hints without changing their application.

 I really can't see why a query-level hinting mechanism is so evil, why
 it couldn't be kept forever, and augmented with the possibility of
 correlation hinting, or table level hinting. 

Well, I wouldn't say evil. Query hints are certainly against the
principles of a relational database, which separate the logical query
from the physical storage.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 09:40:30AM -0700, Josh Berkus wrote:
 Jim,
 
 These hints would outright force the planner to do things a certain way.
 ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
 This proposal seems to deliberately ignore every point that has been
 made *against* doing things that way.  It doesn't separate the hints
 from the queries, it doesn't focus on fixing the statistical or cost
 misestimates that are at the heart of the issue, and it takes no account
 of the problem of hints being obsoleted by system improvements.
  
 Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
 also my comment below.
 
 I don't see how adding extra tags to queries is easier to implement than 
 an ability to modify the system catalogs.  Quite the opposite, really.
 
 And, as I said, if you're going to push for a feature that will be 
 obsolesced in one version, then you're going to have a really rocky row 
 to hoe.
 
Unless you've got a time machine or a team of coders in your back
pocket, I don't see how the planner will suddenly become perfect in
8.4...

 Yes, but as I mentioned the idea here was to come up with something that
 is (hopefully) easy to define and implement. In other words, something
 that should be doable for 8.3. Because this proposal essentially amounts
 to limiting plans the planner will consider and tweaking it's cost
 estimates, I'm hoping that it should be (relatively) easy to implement.
 
 Even I, the chief marketing geek, am more concerned with getting a 
 feature that we will still be proud of in 5 years than getting one in 
 the next nine months.  Keep your pants on!
 
Hey, I wrote that email while dressed! :P

We've been seeing the same kinds of problems that are very difficult (or
impossible) to fix cropping up for literally years... it'd be really
good to at least be able to force the planner to do the sane thing even
if we don't have the manpower to fix it right now...

 I actually think the way to attack this issue is to discuss the kinds of 
 errors the planner makes, and what tweaks we could do to correct them. 
 Here's the ones I'm aware of:
 
 -- Incorrect selectivity of WHERE clause
 -- Incorrect selectivity of JOIN
 -- Wrong estimate of rows returned from SRF
 -- Incorrect cost estimate for index use
 
 Can you think of any others?
 
There's a range of correlations where the planner will incorrectly
choose a seqscan over an indexscan.

Function problems aren't limited to SRFs... we have 0 statistics ability
for functions.

There's the whole issue of multi-column statistics.

 I also feel that a tenet of the design of the planner tweaks system 
 ought to be that the tweaks are collectible and analyzable in some form. 
  This would allow DBAs to mail in their tweaks to -performance or 
 -hackers, and then allow us to continue improving the planner.

Well, one nice thing about the per-query method is you can post before
and after EXPLAIN ANALYZE along with the hints. But yes, as we move
towards a per-table/index/function solution, there should be an easy way
to see how those hints are affecting the system and to report that data
back to the community.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jim C. Nasby
On Thu, Oct 12, 2006 at 09:42:55AM -0700, Jeff Davis wrote:
 On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
  The syntax these hints is something arbitrary. I'm borrowing Oracle's
  idea of embedding hints in comments, but we can use some other method if
  desired. Right now I'm more concerned with getting the general idea
  across.
  
 
 Is there any advantage to having the hints in the queries? To me that's
 asking for trouble with no benefit at all. It would seem to me to be
 better to have a system catalog that defined hints as something like:
 
 If user A executes a query matching regex R, then coerce (or force) the
 planner in this way.
 
 I'm not suggesting that we do that, but it seems better then embedding
 the hints in the queries themselves.

My experience is that on the occasions when I want to beat the planner
into submission, it's usually a pretty complex query that's the issue,
and that it's unlikely to have more than a handful of them in the
application. That makes me think a regex facility would just get in the
way, but perhaps others have much more extensive need of hinting.

I also suspect that writing that regex could become a real bear.

Having said that... I see no reason why it couldn't work... but the real
challenge is defining the hints.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Hints proposal

2006-10-12 Thread Arjen van der Meijden

On 12-10-2006 21:07 Jeff Davis wrote:

On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:

To formalize the proposal a litte, you could have syntax like:

CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;

Where some_hint would be a hinting language perhaps like Jim's, except
not guaranteed to be compatible between versions of PostgreSQL. The
developers could change the hinting language at every release and people
can just re-write the hints without changing their application.


There are some disadvantages of not writing the hints in a query. But of 
course there are disadvantages to do as well ;)


One I can think of is that it can be very hard to define which hint 
should apply where. Especially in complex queries, defining at which 
point exaclty you'd like your hint to work is not a simple matter, 
unless you can just place a comment right at that position.


Say you have a complex query with several joins of the same table. And 
in all but one of those joins postgresql actually chooses the best 
option, but somehow you keep getting some form of join while a nested 
loop would be best. How would you pinpoint just that specific clause, 
while the others remain unhinted ?


Your approach seems to be a bit similar to aspect oriented programming 
(in java for instance). You may need a large amount of information about 
the queries and it is likely a general regexp with general hint will 
not do much good (at least I expect a hinting-system to be only useable 
in corner cases and very specific points in a query).


By the way, wouldn't it be possible if the planner learned from a query 
execution, so it would know if a choice for a specific plan or estimate 
was actually correct or not for future reference? Or is that in the line 
of DB2's complexity and a very hard problem and/or would it add too much 
overhead?


Best regards,

Arjen

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Hints proposal

2006-10-12 Thread Richard Broersma Jr
 By the way, wouldn't it be possible if the planner learned from a query 
 execution, so it would know if a choice for a specific plan or estimate 
 was actually correct or not for future reference? Or is that in the line 
 of DB2's complexity and a very hard problem and/or would it add too much 
 overhead?

Just thinking out-loud here...

Wow, a learning cost based planner sounds a-lot like problem for control  
dynamical systems
theory.  As I understand it, much of the advice given for setting PostgreSQL's 
tune-able
parameters are from RULES-OF-THUMB.  I am sure that effect on server 
performance from all of the
parameters could be modeled and an adaptive feed-back controller could be 
designed to tuned these
parameters as demand on the server changes.

Al-thought, I suppose that a controller like this would have limited success 
since some of the
most affective parameters are non-run-time tune-able.

In regards to query planning, I wonder if there is way to model a controller 
that could
adjust/alter query plans based on a comparison of expected and actual query 
execution times.


Regards,

Richard Broersma Jr.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread Mark Kirkwood

Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:

If someone's going to commit to putting effort into improving the
planner then that's wonderful. But I can't recall any significant
planner improvements since min/max (which I'd argue was more of a bug
fix than an improvement).


Hmph.  Apparently I've wasted most of the last five years.



In my opinion your on-going well thought out planner improvements are 
*exactly* the approach we need to keep doing...


Cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: FW: [PERFORM] Simple join optimized badly?

2006-10-12 Thread Mark Kirkwood

Jim C. Nasby wrote:



Ok, now that I've actually looked at the release notes, I take that back
and apologize. But while there's a lot of improvements that have been
made, there's still some seriously tough problems that have been talked
about for a long time and there's still no light at the end of the
tunnel, like how to handle multi-column statistics.


Yeah - multi-column stats and cost/stats for functions look the the next 
 feature additions we need to get going on


Cheers

Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 14:34 -0500, Jim C. Nasby wrote:
 On Thu, Oct 12, 2006 at 09:42:55AM -0700, Jeff Davis wrote:
  On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
   The syntax these hints is something arbitrary. I'm borrowing Oracle's
   idea of embedding hints in comments, but we can use some other method if
   desired. Right now I'm more concerned with getting the general idea
   across.
   
  
  Is there any advantage to having the hints in the queries? To me that's
  asking for trouble with no benefit at all. It would seem to me to be
  better to have a system catalog that defined hints as something like:
  
  If user A executes a query matching regex R, then coerce (or force) the
  planner in this way.
  
  I'm not suggesting that we do that, but it seems better then embedding
  the hints in the queries themselves.
 
 My experience is that on the occasions when I want to beat the planner
 into submission, it's usually a pretty complex query that's the issue,
 and that it's unlikely to have more than a handful of them in the
 application. That makes me think a regex facility would just get in the
 way, but perhaps others have much more extensive need of hinting.
 
 I also suspect that writing that regex could become a real bear.
 

Well, writing the regex is just matching criteria to apply the hint. If
you really need a quick fix, you can just write a comment with a query
id number in the query. The benefit there is that when the hint is
obsolete later (as the planner improves, or data changes
characteristics) you drop the hint and the query is planned without
interference. No application changes required.

Also, and perhaps more importantly, let's say you are trying to improve
the performance of an existing application where it's impractical to
change the query text (24/7 app, closed source, etc.). You can still
apply a hint if you're willing to write the regex. Just enable query
logging or some such to capture the query, and copy it verbatim except
for a few parameters which are unknown. Instant regex. If you have to
change the query text to apply the hint, it would be impossible in this
case.

 Having said that... I see no reason why it couldn't work... but the real
 challenge is defining the hints.

Right. The only thing I was trying to solve was the problems associated
with the hint itself embedded in the client code. I view that as a
problem that doesn't need to exist.

I'll leave it to smarter people to either improve the planner or develop
a hinting language. I don't even need hints myself, just offering a
suggestion.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Josh Berkus
Jim,

  I don't see how adding extra tags to queries is easier to implement
  than an ability to modify the system catalogs.  Quite the opposite,
  really.
 
  And, as I said, if you're going to push for a feature that will be
  obsolesced in one version, then you're going to have a really rocky
  row to hoe.

 Unless you've got a time machine or a team of coders in your back
 pocket, I don't see how the planner will suddenly become perfect in
 8.4...

Since you're not a core code contributor, I really don't see why you 
continue to claim that query hints are going to be easier to implement 
than relation-level statistics modification.  You think it's easier, but 
the people who actually work on the planner don't believe that it is.

 We've been seeing the same kinds of problems that are very difficult (or
 impossible) to fix cropping up for literally years... it'd be really
 good to at least be able to force the planner to do the sane thing even
 if we don't have the manpower to fix it right now...

As I've said to other people on this thread, you keep making the incorrect 
assumption that Oracle-style query hints are the only possible way of 
manual nuts-and-bolts query tuning.  They are not.

  I actually think the way to attack this issue is to discuss the kinds
  of errors the planner makes, and what tweaks we could do to correct
  them. Here's the ones I'm aware of:
 
  -- Incorrect selectivity of WHERE clause
  -- Incorrect selectivity of JOIN
  -- Wrong estimate of rows returned from SRF
  -- Incorrect cost estimate for index use
 
  Can you think of any others?

 There's a range of correlations where the planner will incorrectly
 choose a seqscan over an indexscan.

Please list some if you have ones which don't fall into one of the four 
problems above.

 Function problems aren't limited to SRFs... we have 0 statistics ability
 for functions.

 There's the whole issue of multi-column statistics.

Sure, but again that falls into the category of incorrect selectivity for 
WHERE/JOIN.  Don't make things more complicated than they need to be.

 Well, one nice thing about the per-query method is you can post before
 and after EXPLAIN ANALYZE along with the hints.

One bad thing is that application designers will tend to use the hint, fix 
the immediate issue, and never report a problem at all.  And query hints 
would not be collectable in any organized way except the query log, which 
would then require very sophisticated text parsing to get any useful 
information at all.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Bucky Jordan
  Well, one nice thing about the per-query method is you can post
before
  and after EXPLAIN ANALYZE along with the hints.
 
 One bad thing is that application designers will tend to use the hint,
fix
 the immediate issue, and never report a problem at all.  And query
hints
 would not be collectable in any organized way except the query log,
which
 would then require very sophisticated text parsing to get any useful
 information at all.
 
Or they'll report it when the next version of Postgres breaks their
app because the hints changed, or because the planner does something
else which makes those hints obsolete.

My main concern with hints (aside from the fact I'd rather see more
intelligence in the planner/stats) is managing them appropriately. I
have two general types of SQL where I'd want to use hints- big OLAP
stuff (where I have a lot of big queries, so it's not just one or two
where I'd need them) or large dynamically generated queries (Users
building custom queries). Either way, I don't want to put them on a
query itself.

What about using regular expressions, plus, if you have a function
(views, or any other statement that is stored), you can assign a rule to
that particular function. So you get matching, plus explicit selection.
This way it's easy to find all your hints, turn them off, manage them,
etc. (Not to mention dynamically generated SQL is ugly enough without
having to put hints in there).

- Bucky

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hints proposal

2006-10-12 Thread Tom Lane
[ trying once again to push this thread over to -hackers where it belongs ]

Arjen van der Meijden [EMAIL PROTECTED] writes:
 On 12-10-2006 21:07 Jeff Davis wrote:
 On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
 To formalize the proposal a litte, you could have syntax like:
 CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;
 
 Where some_hint would be a hinting language perhaps like Jim's, except
 not guaranteed to be compatible between versions of PostgreSQL. The
 developers could change the hinting language at every release and people
 can just re-write the hints without changing their application.

Do you have any idea how much push-back there would be to that?  In
practice we'd be bound by backwards-compatibility concerns for the hints
too.

 There are some disadvantages of not writing the hints in a query. But of 
 course there are disadvantages to do as well ;)

 One I can think of is that it can be very hard to define which hint 
 should apply where. Especially in complex queries, defining at which 
 point exaclty you'd like your hint to work is not a simple matter, 
 unless you can just place a comment right at that position.

The problems that you are seeing all come from the insistence that a
hint should be textually associated with a query.  Using a regex is a
little better than putting it right into the query, but the only thing
that really fixes is not having the hints directly embedded into
client-side code.  It's still wrong at the conceptual level.

The right way to think about it is to ask why is the planner not picking
the right plan to start with --- is it missing a statistical
correlation, or are its cost parameters wrong for a specific case, or
is it perhaps unable to generate the desired plan at all?  (If the
latter, no amount of hinting is going to help.)  If it's a statistics or
costing problem, I think the right thing is to try to fix it with hints
at that level.  You're much more likely to fix the behavior across a
class of queries than you will be with a hint textually matched to a
specific query.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] Hints proposal

2006-10-12 Thread Alvaro Herrera
Bucky Jordan wrote:

 What about using regular expressions, plus, if you have a function
 (views, or any other statement that is stored), you can assign a rule to
 that particular function. So you get matching, plus explicit selection.
 This way it's easy to find all your hints, turn them off, manage them,
 etc. (Not to mention dynamically generated SQL is ugly enough without
 having to put hints in there).

The regular expression idea that's being floated around makes my brain
feel like somebody is screeching a blackboard nearby.  I don't think
it's a sane idea.  I think you could achieve something similar by using
stored plan representations, like we do for rewrite rules.  So you'd
look for, say, a matching join combination in a catalog, and get a
selectivity from a function that would get the selectivities of the
conditions on the base tables.  Or something like that anyway.

That gets ugly pretty fast when you have to extract selectivities for
all the possible join paths in any given query.

But please don't talk about regular expressions.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 17:28 -0400, Tom Lane wrote:
 [ trying once again to push this thread over to -hackers where it belongs ]
 
 Arjen van der Meijden [EMAIL PROTECTED] writes:
  On 12-10-2006 21:07 Jeff Davis wrote:
  On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
  To formalize the proposal a litte, you could have syntax like:
  CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;
  
  Where some_hint would be a hinting language perhaps like Jim's, except
  not guaranteed to be compatible between versions of PostgreSQL. The
  developers could change the hinting language at every release and people
  can just re-write the hints without changing their application.
 
 Do you have any idea how much push-back there would be to that?  In
 practice we'd be bound by backwards-compatibility concerns for the hints
 too.
 

No, I don't have any idea, except that it would be less push-back than
changing a language that's embedded in client code. Also, I see no
reason to think that a hint would not be obsolete upon a new release
anyway.

 The problems that you are seeing all come from the insistence that a
 hint should be textually associated with a query.  Using a regex is a
 little better than putting it right into the query, but the only thing

Little better is all I was going for. I was just making the
observation that we can separate two concepts:
(1) Embedding code in the client's queries, which I see as very
undesirable and unnecessary
(2) Providing very specific hints

which at least gives us a place to talk about the debate more
reasonably.

 that really fixes is not having the hints directly embedded into
 client-side code.  It's still wrong at the conceptual level.
 

I won't disagree with that. I will just say it's no more wrong than
applying the same concept in addition to embedding the hints in client
queries.

 The right way to think about it is to ask why is the planner not picking
 the right plan to start with --- is it missing a statistical
 correlation, or are its cost parameters wrong for a specific case, or
 is it perhaps unable to generate the desired plan at all?  (If the
 latter, no amount of hinting is going to help.)  If it's a statistics or
 costing problem, I think the right thing is to try to fix it with hints
 at that level.  You're much more likely to fix the behavior across a
 class of queries than you will be with a hint textually matched to a
 specific query.
 

Agreed.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Hints proposal

2006-10-12 Thread Christopher Browne
Quoth [EMAIL PROTECTED] (Richard Broersma Jr):
 By the way, wouldn't it be possible if the planner learned from a query 
 execution, so it would know if a choice for a specific plan or estimate 
 was actually correct or not for future reference? Or is that in the line 
 of DB2's complexity and a very hard problem and/or would it add too much 
 overhead?

 Just thinking out-loud here...

 Wow, a learning cost based planner sounds a-lot like problem for
 control  dynamical systems theory.

Alas, dynamic control theory, home of considerable numbers of
Hamiltonian equations, as well as Pontryagin's Minimum Principle, is
replete with:
 a) Gory multivariate calculus
 b) Need for all kinds of continuity requirements (e.g. - continuous,
smooth functions with no discontinuities or other nastiness) 
otherwise the math gets *really* nasty

We don't have anything even resembling continuous because our
measures are all discrete (e.g. - the base values are all integers).

 As I understand it, much of the advice given for setting
 PostgreSQL's tune-able parameters are from RULES-OF-THUMB.  I am
 sure that effect on server performance from all of the parameters
 could be modeled and an adaptive feed-back controller could be
 designed to tuned these parameters as demand on the server changes.

Optimal control theory loves the bang-bang control, where you go to
one extreme or another, which requires all those continuity conditions
I mentioned, and is almost certainly not the right answer here.

 Al-thought, I suppose that a controller like this would have limited
 success since some of the most affective parameters are non-run-time
 tune-able.

 In regards to query planning, I wonder if there is way to model a
 controller that could adjust/alter query plans based on a comparison
 of expected and actual query execution times.

I think there would be something awesomely useful about recording
expected+actual statistics along with some of the plans.

The case that is easiest to argue for is where Actual  Expected
(e.g. - Actual was a whole lot larger than Expected); in such cases,
you've already spent a LONG time on the query, which means that
spending millisecond recording the moral equivalent to Explain
Analyze output should be an immaterial cost.

If we could record a whole lot of these cases, and possibly, with some
anonymization / permissioning, feed the data to a central place, then
some analysis could be done to see if there's merit to particular
modifications to the query plan cost model.

Part of the *really* fundamental query optimization problem is that
there seems to be some evidence that the cost model isn't perfectly
reflective of the costs of queries.  Improving the quality of the cost
model is one of the factors that would improve the performance of the
query optimizer.  That would represent a fundamental improvement.
-- 
let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;;
http://linuxdatabases.info/info/languages.html
If I can see farther it is because I am surrounded by dwarves.
-- Murray Gell-Mann 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings