Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-06-06 Thread Robert Klemme
On Thu, Feb 10, 2011 at 7:32 PM, Craig James craig_ja...@emolecules.com wrote:
 On 2/10/11 9:21 AM, Kevin Grittner wrote:

 Shaun Thomasstho...@peak6.com  wrote:

 how difficult would it be to add that syntax to the JOIN
 statement, for example?

 Something like this syntax?:

 JOIN WITH (correlation_factor=0.3)

 Where 1.0 might mean that for each value on the left there was only
 one distinct value on the right, and 0.0 would mean that they were
 entirely independent?  (Just as an off-the-cuff example -- I'm not
 at all sure that this makes sense, let alone is the best thing to
 specify.  I'm trying to get at *syntax* here, not particular knobs.)

 There are two types of problems:

 1. The optimizer is imperfect and makes a sub-optimal choice.

 2. There is theoretical reasons why it's hard for the optimizer. For
 example, in a table with 50 columns, there is a staggering number of
 possible correlations.  An optimizer can't possibly figure this out, but a
 human might know them from the start.  The City/Postal-code correlation is a
 good example.

 For #1, Postgres should never offer any sort of hint mechanism.  As many
 have pointed out, it's far better to spend the time fixing the optimizer
 than adding hacks.

 For #2, it might make sense to give a designer a way to tell Postgres stuff
 that it couldn't possibly figure out. But ... not until the problem is
 clearly defined.

 What should happen is that someone writes with an example query, and the
 community realizes that no amount of cleverness from Postgres could ever
 solve it (for solid theoretical reasons). Only then, when the problem is
 clearly defined, should we talk about solutions and SQL extensions.

I don't have one such query handy.  However, I think your posting is a
good starting point for a discussion how to figure out what we need
and how a good solution could look like.  For example, one thing I
dislike about hints is that they go into the query.  There are a few
drawbacks of this approach

- Applications need to be changed to benefit which is not always possible.
- One important class of such applications are those that use OR
mappers - hinting then would have to be buried in OR mapper code or
configuration.
- Hints in the query work only for exactly that query (this might be
an advantage depending on point of view).

I think the solution should rather be to tell Postgres what it
couldn't possibly figure out.  I imagine that could be some form of
description of the distribution of data in columns and / or
correlations between columns.  Advantage would be that the optimizer
gets additional input which it can use (i.e. the usage can change
between releases), the information is separate from queries (more like
meta data for tables) and thus all queries using a particular table
which was augmented with this meta data would benefit.  Usage of this
meta data could be controlled by a flag per session (as well as
globally) so it would be relatively easy to find out whether this meta
data has become obsolete (because data changed or a new release of the
database is in use).

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-06-05 Thread pasman pasmański
Hi. I have the idea: hints joined to function. For example instead of

WHERE table1.field1=table2.field2
write:
WHERE specificeq(table1.field1,table2.field2)

and hints add to declaration of specificeq function.

2011/2/23, Robert Haas robertmh...@gmail.com:
 On Wed, Feb 16, 2011 at 4:22 PM, Bruce Momjian br...@momjian.us wrote:
 I am not excited about the idea of putting these correlations in
 queries.  What would be more intesting would be for analyze to build a
 correlation coeffficent matrix showing how columns are correlated:

        a   b   c
    a   1   .4  0
    b   .1  1   -.3
    c   .2  .3  1

 and those correlations could be used to weigh how the single-column
 statistics should be combined.

 If you can make it work, I'll take it... it's (much) easier said than
 done, though.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 

pasman

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-22 Thread Robert Haas
On Wed, Feb 16, 2011 at 4:22 PM, Bruce Momjian br...@momjian.us wrote:
 I am not excited about the idea of putting these correlations in
 queries.  What would be more intesting would be for analyze to build a
 correlation coeffficent matrix showing how columns are correlated:

        a   b   c
    a   1   .4  0
    b   .1  1   -.3
    c   .2  .3  1

 and those correlations could be used to weigh how the single-column
 statistics should be combined.

If you can make it work, I'll take it... it's (much) easier said than
done, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-16 Thread Bruce Momjian
Kevin Grittner wrote:
 Shaun Thomas stho...@peak6.com wrote:
  
  how difficult would it be to add that syntax to the JOIN
  statement, for example?
  
 Something like this syntax?:
  
 JOIN WITH (correlation_factor=0.3)
  
 Where 1.0 might mean that for each value on the left there was only
 one distinct value on the right, and 0.0 would mean that they were
 entirely independent?  (Just as an off-the-cuff example -- I'm not
 at all sure that this makes sense, let alone is the best thing to
 specify.  I'm trying to get at *syntax* here, not particular knobs.)

I am not excited about the idea of putting these correlations in
queries.  What would be more intesting would be for analyze to build a
correlation coeffficent matrix showing how columns are correlated:

a   b   c
a   1   .4  0
b   .1  1   -.3
c   .2  .3  1

and those correlations could be used to weigh how the single-column
statistics should be combined.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints

2011-02-13 Thread Rob Wultsch
On Thu, Feb 10, 2011 at 9:25 AM, Chris Browne cbbro...@acm.org wrote:
 robertmh...@gmail.com (Robert Haas) writes:
 On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Well, I'm comfortable digging in my heels against doing *lame* hints
 just because it's what all the other kids are doing, which I think
 is the only thing which would have satisfied the OP on this thread.
 From both on-list posts and ones exchanged off-list with me, it
 seems he was stubbornly resistant to properly tuning the server to
 see if any problems remained, or posting particular problems to see
 how they would be most effectively handled in PostgreSQL.  We
 obviously can't be drawn into dumb approaches because of
 ill-informed demands like that.

 Nor was I proposing any such thing.  But that doesn't make we don't
 want hints an accurate statement.  Despite the impression that OP
 went away with, the real situation is a lot more nuanced than that,
 and the statement on the Todo list gives the wrong impression, IMHO.

 I have added the following comment to the ToDo:

   We are not interested to implement hints in ways they are commonly
   implemented on other databases, and proposals based on because
   they've got them will not be welcomed.  If you have an idea that
   avoids the problems that have been observed with other hint systems,
   that could lead to valuable discussion.

 That seems to me to characterize the nuance.


Where exactly are the problems with other systems noted? Most other
systems have this option so saying They have problems is a giant cop
out.


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints

2011-02-13 Thread Josh Berkus
I've wordsmithed Chris's changes some, and then spun off a completely
separate page for Hints discussion, since the NotToDo item was becoming
too long.

 Something like this syntax?:

 JOIN WITH (correlation_factor=0.3)

Please, NO!

This is exactly the kind of hint that I regard as a last resort if we
run out of implementation alternatives.  Any hint which gets coded into
the actual queries becomes a *massive* maintenance and upgrade headache
thereafter.  If we're implementing a hint alternative, we should look at
stuff in this priority order:

1. Useful tuning of additional cost parameters by GUC (i.e.
cursor_tuple_fraction)
2. Modifying cost parameters on database *objects* (i.e. ndistinct=500)
3. Adding new parameters to modify on database objects (i.e.
distribution=normal(1.5,17),new_rows=0.1)
4. Query hints (if all of the above fails to give fixes for some tested
problem)

 Where exactly are the problems with other systems noted? Most other
 systems have this option so saying They have problems is a giant cop
 out.

I've put my list down:
http://wiki.postgresql.org/wiki/OptimizerHintsDiscussion#Problems_with_existing_Hint_stystems

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints

2011-02-13 Thread Scott Marlowe
On Sun, Feb 13, 2011 at 3:29 PM, Josh Berkus j...@agliodbs.com wrote:
 I've wordsmithed Chris's changes some, and then spun off a completely
 separate page for Hints discussion, since the NotToDo item was becoming
 too long.

 Something like this syntax?:

 JOIN WITH (correlation_factor=0.3)

 Please, NO!

 This is exactly the kind of hint that I regard as a last resort if we
 run out of implementation alternatives.  Any hint which gets coded into
 the actual queries becomes a *massive* maintenance and upgrade headache
 thereafter.  If we're implementing a hint alternative, we should look at
 stuff in this priority order:

 1. Useful tuning of additional cost parameters by GUC (i.e.
 cursor_tuple_fraction)
 2. Modifying cost parameters on database *objects* (i.e. ndistinct=500)
 3. Adding new parameters to modify on database objects (i.e.
 distribution=normal(1.5,17),new_rows=0.1)
 4. Query hints (if all of the above fails to give fixes for some tested
 problem)

I fail to see how 1 through 3 can tell the planner the correlation
between two fields in two separate tables.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints

2011-02-13 Thread Scott Marlowe
On Sun, Feb 13, 2011 at 10:49 PM, Josh Berkus j...@agliodbs.com wrote:

 I fail to see how 1 through 3 can tell the planner the correlation
 between two fields in two separate tables.

 CREATE CORRELATION_ESTIMATE ( table1.colA ) = ( table2.colB ) IS 0.3

 ... and then it fixes the correlation for *every* query in the database, not
 just that one.  And is easy to fix if the correlation changes.

I like that.  Even better, could we setup some kind of simple command
to tell analyze to collect stats for the two columns together?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-11 Thread Віталій Тимчишин
2011/2/10 Tobias Brox tobi...@gmail.com

 On 4 February 2011 04:46, Josh Berkus j...@agliodbs.com wrote:
  Optimizer hints are used to work around problems in the optimizer and
  introduce upgrade and maintenance issues. We would rather have the
  problems reported and fixed. We have discussed a more sophisticated
  system of per-class cost adjustment instead, but a specification remains
  to be developed.

 I have no clue about how hints works in Oracle ... I've never been
 working enterprise level on anything else than Postgres.  Anyway,
 today I just came over an interesting problem in our production
 database today - and I think it would be a benefit to be able to
 explicitly tell the planner what index to use (the dev team is adding
 redundant attributes and more indexes to solve the problem - which
 worries me, because we will run into serious problems as soon as there
 won't be enough memory for all the frequently-used indexes).

 We have users and transactions, and we have transaction types.  The
 transaction table is huge.  The users are able to interactively check
 their transaction listings online, and they have some simple filter
 options available as well.  Slightly simplified, the queries done
 looks like this:

   select * from account_transaction where account_id=? order by
 created desc limit 25;

   select * from account_transaction where trans_type_id in ( ...
 long, hard-coded list ...) and account_id=? order by created desc
 limit 25;

 and we have indexes on:

 account_transaction(account_id, created)

   account_transaction(account_id, trans_type_id, created)

 If the list is hard-coded, you can create partial index  on
account_transaction(account_id, created desc) where trans_type_id in ( ...
long, hard-coded list ...)


-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-11 Thread Tobias Brox
2011/2/11 Віталій Тимчишин tiv...@gmail.com:
 If the list is hard-coded, you can create partial index  on
 account_transaction(account_id, created desc) where trans_type_id in ( ...
 long, hard-coded list ...)

My idea as well, though it looks ugly and it would be a maintenance
head-ache (upgrading the index as new transaction types are added
would mean costly write locks on the table, and we can't rely on
manual processes to get it right ... we might need to set up scripts
to either upgrade the index or alert us if the index needs upgrading).

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-11 Thread Vitalii Tymchyshyn

11.02.11 11:29, Tobias Brox написав(ла):

2011/2/11 Віталій Тимчишинtiv...@gmail.com:

If the list is hard-coded, you can create partial index  on
account_transaction(account_id, created desc) where trans_type_id in ( ...
long, hard-coded list ...)

My idea as well, though it looks ugly and it would be a maintenance
head-ache (upgrading the index as new transaction types are added
would mean costly write locks on the table,

Create new one concurrently.

  and we can't rely on
manual processes to get it right ... we might need to set up scripts
to either upgrade the index or alert us if the index needs upgrading).

Yep. Another option could be to add query rewrite as

select  * from (
select * from account_transaction where trans_type_id =type1 and 
account_id=? order by created desc limit 25 union all
select * from account_transaction where trans_type_id =type2 and 
account_id=? order by created desc limit 25 union all

...
union all
select * from account_transaction where trans_type_id =typeN and 
account_id=? order by created desc limit 25

) a
order by created desc limit 25

This will allow to use three-column index in the way it can be used for 
such query. Yet if N is large query will look ugly. And I am not sure if 
optimizer is smart enough for not to fetch 25*N rows.



Best regards, Vitalii Tymchyshyn


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-11 Thread Tobias Brox
2011/2/11 Vitalii Tymchyshyn tiv...@gmail.com:
 My idea as well, though it looks ugly and it would be a maintenance
 head-ache (upgrading the index as new transaction types are added
 would mean costly write locks on the table,

 Create new one concurrently.

Concurrently?  Are there any ways to add large indexes without
blocking inserts to the table for the time it takes to create the
index?

 Yep. Another option could be to add query rewrite as

 select  * from (
 select * from account_transaction where trans_type_id =type1 and
 account_id=? order by created desc limit 25 union all
 select * from account_transaction where trans_type_id =type2 and
 account_id=? order by created desc limit 25 union all
 ...
 union all
 select * from account_transaction where trans_type_id =typeN and
 account_id=? order by created desc limit 25
 ) a
 order by created desc limit 25

I actually considered that.  For the test case given it works very
fast.  Not sure if it would work universally ... it scales well when
having extreme amounts of transactions outside the given transaction
list (the case we have problems with now), but it wouldn't scale if
some user has an extreme amount of transactions within the list.
However,  I think our extreme amount of transactions-problem is
mostly limited to the transaction types outside the list.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-11 Thread Andrea Suisani

On 02/11/2011 12:26 PM, Tobias Brox wrote:

2011/2/11 Vitalii Tymchyshyntiv...@gmail.com:

My idea as well, though it looks ugly and it would be a maintenance
head-ache (upgrading the index as new transaction types are added
would mean costly write locks on the table,


Create new one concurrently.


Concurrently?  Are there any ways to add large indexes without
blocking inserts to the table for the time it takes to create the
index?


yep, AFAIR since 8.2
see: 
http://www.postgresql.org/docs/8.2/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

[cut]

Andrea

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-11 Thread Pierre C



   select * from account_transaction where trans_type_id in ( ...
long, hard-coded list ...) and account_id=? order by created desc
limit 25;


You could use an index on (account_id, created, trans_type), in  
replacement of your index on (account_id, created). This will not prevent  
the Index Scan Backwards, but at least, index rows with trans_type not  
matching the WHERE clause will not generate any heap access...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Robert Haas
On Thu, Feb 3, 2011 at 8:46 PM, Josh Berkus j...@agliodbs.com wrote:
 Optimizer hints are used to work around problems in the optimizer and
 introduce upgrade and maintenance issues. We would rather have the
 problems reported and fixed. We have discussed a more sophisticated
 system of per-class cost adjustment instead, but a specification remains
 to be developed.

 That seems pretty straightforwards.  There are even links to prior
 discussions about what kind of system would work.  I don't think this
 text needs any adjustment; that's our clear consensus on the hint issue:
 we want a tool which works better than what we've seen in other databases.

I think it's just dumb to say we don't want hints.  We want hints, or
at least many of us do.  We just want them to actually work, and to
not suck.  Can't we just stop saying we don't want them, and say that
we do want something, but it has to be really good?

 Yes, I occasionally run across cases where having a query tweaking
 system would help me fix a pathological failure in the planner.
 However, even on data warehouses that's less than 0.1% of the queries I
 deal with, so this isn't exactly a common event.  And any hinting system
 we develop needs to address those specific cases, NOT a hypothetical
 case which can't be tested.  Otherwise we'll implement hints which
 actually don't improve queries.

No argument.

The bottom line here is that a lot of features that we don't have are
things that we don't want in the sense that we're not interested in
working on them over other things that seem more pressing, and we have
finite manpower.  But if someone feels motivated to work on it, and
can actually come up with something good, then why should we give the
impression that such a thing would be rejected out of hand?  I think
we ought to nuke that item and replace it with some items in the
optimizer section that express what we DO want, which is some better
ways of fixing queries the few queries that suck despite our best (and
very successful) efforts to produce a top-notch optimizer.

The problem with multi-column statistics is a particularly good
example of something in this class.  We may have a great solution to
that problem for PostgreSQL 11.0.  But between now and then, if you
have that problem, there is no good way to adjust the selectivity
estimates.  If this were an academic research project or just being
used for toy projects that didn't really matter, we might not care.
But this is a real database that people are relying on for their
livelihood, and we should be willing to provide a way for those people
to not get fired when they hit the 0.1% of queries that can't be fixed
using existing methods.  I don't know exactly what the right solution
is off the top of my head, but digging in our heels is not it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I think it's just dumb to say we don't want hints.  We want hints,
 or at least many of us do.
 
Well, yeah.  Even those most ostensibly opposed to hints have been
known to post that they would rather not have the optimizer
recognize two logically equivalent constructs and optimize them the
same because they find the current difference useful to coerce the
optimizer to choose a certain plan.  That's implementing hints but
refusing to document them.  And it sometimes bites those who don't
know they're accidentally using a hint construct.  An explicit and
documented hint construct would be better.  Probably not a use this
plan type hint, but some form of optimization barrier hint, maybe. 
You know, like OFFSET 0, but more explicitly hint-like.
 
 The bottom line here is that a lot of features that we don't have
 are things that we don't want in the sense that we're not
 interested in working on them over other things that seem more
 pressing, and we have finite manpower.  But if someone feels
 motivated to work on it, and can actually come up with something
 good, then why should we give the impression that such a thing
 would be rejected out of hand?  I think we ought to nuke that item
 and replace it with some items in the optimizer section that
 express what we DO want, which is some better ways of fixing
 queries the few queries that suck despite our best (and very
 successful) efforts to produce a top-notch optimizer.
 
 The problem with multi-column statistics is a particularly good
 example of something in this class.  We may have a great solution
 to that problem for PostgreSQL 11.0.  But between now and then, if
 you have that problem, there is no good way to adjust the
 selectivity estimates.
 
Yeah, this is probably the most important area to devise some
explicit way for a DBA who knows that such multicolumn selections
are going to be used, and is capable of calculating some correlation
factor, could supply it to the optimizer to override the naive
calculation it currently does.  Even there I would tend to think
that the sort of do it this way hints that people seem to
initially want wouldn't be good; it should be a way to override the
costing factor which the optimizer gets wrong, so it can do its
usual excellent job of evaluating plans with accurate costs.
 
 I don't know exactly what the right solution is off the top of my
 head, but digging in our heels is not it.
 
Well, I'm comfortable digging in my heels against doing *lame* hints
just because it's what all the other kids are doing, which I think
is the only thing which would have satisfied the OP on this thread. 
From both on-list posts and ones exchanged off-list with me, it
seems he was stubbornly resistant to properly tuning the server to
see if any problems remained, or posting particular problems to see
how they would be most effectively handled in PostgreSQL.  We
obviously can't be drawn into dumb approaches because of
ill-informed demands like that.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas robertmh...@gmail.com wrote:
 I don't know exactly what the right solution is off the top of my
 head, but digging in our heels is not it.
 
 Well, I'm comfortable digging in my heels against doing *lame* hints
 just because it's what all the other kids are doing, which I think
 is the only thing which would have satisfied the OP on this thread. 

Right.  If someone comes up with a design that avoids the serious
pitfalls of traditional hinting schemes, that'd be great.  But I'm
not interested in implementing Oracle-like hints just because Oracle
has them, which I think was basically what the OP wanted.  I haven't
seen a hinting scheme that didn't suck (and that includes the aspects
of our own current behavior that are hint-like).  I don't say that
there can't be one.

I believe that the FAQ entry is meant to answer people who come along
and say oh, this is easily solved, just do what $PRODUCT does.  The
generic answer to that is no, it's not that easy.  But maybe the FAQ
should be rephrased to be more like we don't want traditional hints
because of problems X, Y, and Z.  If you have an idea that avoids those
problems, let us know.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Robert Haas
On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Well, I'm comfortable digging in my heels against doing *lame* hints
 just because it's what all the other kids are doing, which I think
 is the only thing which would have satisfied the OP on this thread.
 From both on-list posts and ones exchanged off-list with me, it
 seems he was stubbornly resistant to properly tuning the server to
 see if any problems remained, or posting particular problems to see
 how they would be most effectively handled in PostgreSQL.  We
 obviously can't be drawn into dumb approaches because of
 ill-informed demands like that.

Nor was I proposing any such thing.  But that doesn't make we don't
want hints an accurate statement.  Despite the impression that OP
went away with, the real situation is a lot more nuanced than that,
and the statement on the Todo list gives the wrong impression, IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Shaun Thomas

On 02/10/2011 10:45 AM, Kevin Grittner wrote:


Even there I would tend to think that the sort of do it this way
hints that people seem to initially want wouldn't be good; it should
be a way to override the costing factor which the optimizer gets
wrong, so it can do its usual excellent job of evaluating plans with
accurate costs.


You know... that's an interesting approach. We already do that with 
functions by allowing users to specify the estimated cost, rows 
returned, and even override config settings. It's an inexact science at 
best, but it might help the optimizer out.


Really... how difficult would it be to add that syntax to the JOIN 
statement, for example?


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Shaun Thomas

On 02/10/2011 11:01 AM, Tom Lane wrote:


But I'm not interested in implementing Oracle-like hints just because
Oracle has them, which I think was basically what the OP wanted.


Hilariously, I'm not so sure that's what the OP wanted. Several of us 
pointed him to EnterpriseDB and their Oracle-style syntax, and the only 
thing he said about that was to use it as further evidence that 
PostgreSQL should implement them. I'm very tempted to say he wanted 
something for free, and was angry he couldn't get it.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Kevin Grittner
Shaun Thomas stho...@peak6.com wrote:
 
 how difficult would it be to add that syntax to the JOIN
 statement, for example?
 
Something like this syntax?:
 
JOIN WITH (correlation_factor=0.3)
 
Where 1.0 might mean that for each value on the left there was only
one distinct value on the right, and 0.0 would mean that they were
entirely independent?  (Just as an off-the-cuff example -- I'm not
at all sure that this makes sense, let alone is the best thing to
specify.  I'm trying to get at *syntax* here, not particular knobs.)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 maybe the FAQ should be rephrased to be more like we don't want
 traditional hints because of problems X, Y, and Z.  If you have
 an idea that avoids those problems, let us know.
 
 That's closer to where I think the community is on this issue
 
That sounds pretty good to me.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Shaun Thomas

On 02/10/2011 11:21 AM, Kevin Grittner wrote:


Something like this syntax?:

JOIN WITH (correlation_factor=0.3)


I was thinking more:

JOIN foo_tab USING (foo_id) WITH (COST=50)

or something, to exploit the hooks that already exist for functions, for 
example. But it's still an interesting concept. Tell the optimizer what 
you want and how the data is really related in cases where it's wrong, 
and let it figure out the best path.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Kevin Grittner
Shaun Thomas stho...@peak6.com wrote:
 
 I was thinking more:
 
 JOIN foo_tab USING (foo_id) WITH (COST=50)
 
The problem I have with that syntax is that it would be hard to read
when you have some nested set of joins or a (SELECT) in the JOIN
instead of simple table name.  For me, at least, it would get lost
less easily if it were right next to the JOIN keyword.
 
The problem with a COST factor is that it's not obvious to me what
it would apply to:
 - each row on the left?
 - each row on the right?
 - each row in the result of the JOIN step?
 - the entire step?
 
How would it scale based on other criteria which affected the number
of rows on either side of the join?
 
If I'm understanding the problem correctly, the part the optimizer
gets wrong (because we don't yet have statistics to support a better
assumption) is assuming that selection criteria on opposite sides of
a join affect entirely independent sets of what would be in the
result without the criteria.  To use an oft-cited example, when one
table is selected by zip code and the other by city, that's a bad
assumption about the correlation, leading to bad estimates, leading
to bad costing, leading to bad plans.  The OP wanted to override
step 4, a COST setting would try to override step 3, but I think we
would want to override step 1 (until we get statistics which let us
compute that accurately).
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Greg Smith

Shaun Thomas wrote:
Hilariously, I'm not so sure that's what the OP wanted. 


Someone to blame as a scapegoat for why his badly planned project had 
failed.  I've done several Oracle conversions before, and never met 
someone who was so resistent to doing the right things for such a 
conversion.  You have to relatively flexible in your thinking to work 
with the good and away from the bad parts of PostgreSQL for such a 
project to succeed.  I didn't hear a whole lot of flexible in that 
discussion.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints

2011-02-10 Thread Chris Browne
robertmh...@gmail.com (Robert Haas) writes:
 On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Well, I'm comfortable digging in my heels against doing *lame* hints
 just because it's what all the other kids are doing, which I think
 is the only thing which would have satisfied the OP on this thread.
 From both on-list posts and ones exchanged off-list with me, it
 seems he was stubbornly resistant to properly tuning the server to
 see if any problems remained, or posting particular problems to see
 how they would be most effectively handled in PostgreSQL.  We
 obviously can't be drawn into dumb approaches because of
 ill-informed demands like that.

 Nor was I proposing any such thing.  But that doesn't make we don't
 want hints an accurate statement.  Despite the impression that OP
 went away with, the real situation is a lot more nuanced than that,
 and the statement on the Todo list gives the wrong impression, IMHO.

I have added the following comment to the ToDo:

   We are not interested to implement hints in ways they are commonly
   implemented on other databases, and proposals based on because
   they've got them will not be welcomed.  If you have an idea that
   avoids the problems that have been observed with other hint systems,
   that could lead to valuable discussion.

That seems to me to characterize the nuance.
-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://www3.sympatico.ca/cbbrowne/languages.html
If only women came with pull-down menus and online help.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 Shaun Thomas wrote:
 Hilariously, I'm not so sure that's what the OP wanted. 
 
 Someone to blame as a scapegoat for why his badly planned project
 had failed.  I've done several Oracle conversions before, and
 never met someone who was so resistent to doing the right things
 for such a conversion.  You have to relatively flexible in your
 thinking to work with the good and away from the bad parts of
 PostgreSQL for such a project to succeed.  I didn't hear a whole
 lot of flexible in that discussion.
 
I was thinking along the same lines, but couldn't find the words to
put it so politely, so I held back.  Still biting my tongue, but I
appreciate your milder summary.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Craig James

On 2/10/11 9:21 AM, Kevin Grittner wrote:

Shaun Thomasstho...@peak6.com  wrote:


how difficult would it be to add that syntax to the JOIN
statement, for example?


Something like this syntax?:

JOIN WITH (correlation_factor=0.3)

Where 1.0 might mean that for each value on the left there was only
one distinct value on the right, and 0.0 would mean that they were
entirely independent?  (Just as an off-the-cuff example -- I'm not
at all sure that this makes sense, let alone is the best thing to
specify.  I'm trying to get at *syntax* here, not particular knobs.)


There are two types of problems:

1. The optimizer is imperfect and makes a sub-optimal choice.

2. There is theoretical reasons why it's hard for the optimizer. For example, 
in a table with 50 columns, there is a staggering number of possible 
correlations.  An optimizer can't possibly figure this out, but a human might 
know them from the start.  The City/Postal-code correlation is a good example.

For #1, Postgres should never offer any sort of hint mechanism.  As many have 
pointed out, it's far better to spend the time fixing the optimizer than adding 
hacks.

For #2, it might make sense to give a designer a way to tell Postgres stuff 
that it couldn't possibly figure out. But ... not until the problem is clearly 
defined.

What should happen is that someone writes with an example query, and the 
community realizes that no amount of cleverness from Postgres could ever solve 
it (for solid theoretical reasons). Only then, when the problem is clearly 
defined, should we talk about solutions and SQL extensions.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Tobias Brox
On 4 February 2011 04:46, Josh Berkus j...@agliodbs.com wrote:
 Optimizer hints are used to work around problems in the optimizer and
 introduce upgrade and maintenance issues. We would rather have the
 problems reported and fixed. We have discussed a more sophisticated
 system of per-class cost adjustment instead, but a specification remains
 to be developed.

I have no clue about how hints works in Oracle ... I've never been
working enterprise level on anything else than Postgres.  Anyway,
today I just came over an interesting problem in our production
database today - and I think it would be a benefit to be able to
explicitly tell the planner what index to use (the dev team is adding
redundant attributes and more indexes to solve the problem - which
worries me, because we will run into serious problems as soon as there
won't be enough memory for all the frequently-used indexes).

We have users and transactions, and we have transaction types.  The
transaction table is huge.  The users are able to interactively check
their transaction listings online, and they have some simple filter
options available as well.  Slightly simplified, the queries done
looks like this:

   select * from account_transaction where account_id=? order by
created desc limit 25;

   select * from account_transaction where trans_type_id in ( ...
long, hard-coded list ...) and account_id=? order by created desc
limit 25;

and we have indexes on:

   account_transaction(account_id, created)

   account_transaction(account_id, trans_type_id, created)

(At this point, someone would probably suggest to make three
single-key indexes and use bitmap index scan ... well, pulling 25 rows
from the end of an index may be orders of magnitude faster than doing
bitmap index mapping on huge indexes)

For the second query, the planner would chose the first index - and
maybe it makes sense - most of our customers have between 10-30% of
the transactions from the long list of transaction types, slim indexes
are good and by average the slimmer index would probably do the job a
bit faster.  The problem is with the corner cases - for some of our
extreme customers thousands of transaction index tuples may need to be
scanned before 25 rows with the correct transaction type is pulled
out, and if the index happens to be on disk, it may take tens of
seconds to pull out the answer.  Tens of seconds of waiting leads to
frustration, it is a lot nowadays in an interactive session.  Also, I
haven't really checked it up, but it may very well be that this is
exactly the kind of customers we want to retain.

To summarize, there are two things the planner doesn't know - it
doesn't know that there exists such corner cases where the real cost
is far larger than the estimated cost, and it doesn't know that it's
more important to keep the worst-case cost on a reasonable level than
to minimize the average cost.  In the ideal world postgres would have
sufficiently good statistics to know that for user #7 it is better
to chose the second index, but I suppose it would be easier if I was
able to explicitly hide the account_transaction(account_id, created)
index for this query.  Well, I know of one way to do it ... but I
suppose it's not a good idea to put drop index foo; select ...;
rollback; into production ;-)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-10 Thread Tom Lane
Tobias Brox tobi...@gmail.com writes:
 I have no clue about how hints works in Oracle ... I've never been
 working enterprise level on anything else than Postgres.  Anyway,
 today I just came over an interesting problem in our production
 database today - and I think it would be a benefit to be able to
 explicitly tell the planner what index to use (the dev team is adding
 redundant attributes and more indexes to solve the problem - which
 worries me, because we will run into serious problems as soon as there
 won't be enough memory for all the frequently-used indexes).

 We have users and transactions, and we have transaction types.  The
 transaction table is huge.  The users are able to interactively check
 their transaction listings online, and they have some simple filter
 options available as well.  Slightly simplified, the queries done
 looks like this:

select * from account_transaction where account_id=? order by
 created desc limit 25;

select * from account_transaction where trans_type_id in ( ...
 long, hard-coded list ...) and account_id=? order by created desc
 limit 25;

 and we have indexes on:

account_transaction(account_id, created)

account_transaction(account_id, trans_type_id, created)

Well, in this case the optimizer *is* smarter than you are, and the
reason is that it remembers the correct rules for when indexes are
useful.  That second index is of no value for either query, because
in doesn't work the way you're hoping.

I understand the larger point you're trying to make, but this example
also nicely illustrates the point being made on the other side, that
force the optimizer to use the index I think it should use isn't a
very good solution.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-03 Thread Josh Berkus

 http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want
 
 No. 2 on the list.

Heck, *I* wrote that text.

I quote:

Optimizer hints are used to work around problems in the optimizer and
introduce upgrade and maintenance issues. We would rather have the
problems reported and fixed. We have discussed a more sophisticated
system of per-class cost adjustment instead, but a specification remains
to be developed.

That seems pretty straightforwards.  There are even links to prior
discussions about what kind of system would work.  I don't think this
text needs any adjustment; that's our clear consensus on the hint issue:
we want a tool which works better than what we've seen in other databases.

Quite frankly, the main reason why most DBMSes have a hinting system has
nothing to do with the quality of optimizer and everything to do with
DBAs who think they're smarter than the optimizer (incorrectly).  Oracle
has a darned good query optimizer, and SQL server's is even better.
However, there are a lot of undereducated or fossilized DBAs out there
who don't trust the query planner and want to override it in fairly
arbitrary ways; I refer you to the collected works of Dan Tow, for example.

In many cases Hints are used by DBAs in emergency situations because
they are easier than figuring out what the underlying issue is, even
when that could be done relatively simply.  Improving diagnostic query
tools would be a much better approach here; for example, the team
working on hypothetical indexes has a lot to offer.  If you can figure
out what's really wrong with the query in 10 minutes, you don't need a hint.

Yes, I occasionally run across cases where having a query tweaking
system would help me fix a pathological failure in the planner.
However, even on data warehouses that's less than 0.1% of the queries I
deal with, so this isn't exactly a common event.  And any hinting system
we develop needs to address those specific cases, NOT a hypothetical
case which can't be tested.  Otherwise we'll implement hints which
actually don't improve queries.

Commercial DBMSes have to give in to what their big paying customers
want, no matter how stupid it is. I'm grateful that I can work on a DBMS
-- the third most popular SQL DBMS in the world -- which can focus on
quality instead.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance