Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On Thu, Feb 10, 2011 at 7:32 PM, Craig James craig_ja...@emolecules.com wrote: On 2/10/11 9:21 AM, Kevin Grittner wrote: Shaun Thomasstho...@peak6.com wrote: how difficult would it be to add that syntax to the JOIN statement, for example? Something like this syntax?: JOIN WITH (correlation_factor=0.3) Where 1.0 might mean that for each value on the left there was only one distinct value on the right, and 0.0 would mean that they were entirely independent? (Just as an off-the-cuff example -- I'm not at all sure that this makes sense, let alone is the best thing to specify. I'm trying to get at *syntax* here, not particular knobs.) There are two types of problems: 1. The optimizer is imperfect and makes a sub-optimal choice. 2. There is theoretical reasons why it's hard for the optimizer. For example, in a table with 50 columns, there is a staggering number of possible correlations. An optimizer can't possibly figure this out, but a human might know them from the start. The City/Postal-code correlation is a good example. For #1, Postgres should never offer any sort of hint mechanism. As many have pointed out, it's far better to spend the time fixing the optimizer than adding hacks. For #2, it might make sense to give a designer a way to tell Postgres stuff that it couldn't possibly figure out. But ... not until the problem is clearly defined. What should happen is that someone writes with an example query, and the community realizes that no amount of cleverness from Postgres could ever solve it (for solid theoretical reasons). Only then, when the problem is clearly defined, should we talk about solutions and SQL extensions. I don't have one such query handy. However, I think your posting is a good starting point for a discussion how to figure out what we need and how a good solution could look like. For example, one thing I dislike about hints is that they go into the query. There are a few drawbacks of this approach - Applications need to be changed to benefit which is not always possible. - One important class of such applications are those that use OR mappers - hinting then would have to be buried in OR mapper code or configuration. - Hints in the query work only for exactly that query (this might be an advantage depending on point of view). I think the solution should rather be to tell Postgres what it couldn't possibly figure out. I imagine that could be some form of description of the distribution of data in columns and / or correlations between columns. Advantage would be that the optimizer gets additional input which it can use (i.e. the usage can change between releases), the information is separate from queries (more like meta data for tables) and thus all queries using a particular table which was augmented with this meta data would benefit. Usage of this meta data could be controlled by a flag per session (as well as globally) so it would be relatively easy to find out whether this meta data has become obsolete (because data changed or a new release of the database is in use). Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Hi. I have the idea: hints joined to function. For example instead of WHERE table1.field1=table2.field2 write: WHERE specificeq(table1.field1,table2.field2) and hints add to declaration of specificeq function. 2011/2/23, Robert Haas robertmh...@gmail.com: On Wed, Feb 16, 2011 at 4:22 PM, Bruce Momjian br...@momjian.us wrote: I am not excited about the idea of putting these correlations in queries. What would be more intesting would be for analyze to build a correlation coeffficent matrix showing how columns are correlated: a b c a 1 .4 0 b .1 1 -.3 c .2 .3 1 and those correlations could be used to weigh how the single-column statistics should be combined. If you can make it work, I'll take it... it's (much) easier said than done, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- pasman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On Wed, Feb 16, 2011 at 4:22 PM, Bruce Momjian br...@momjian.us wrote: I am not excited about the idea of putting these correlations in queries. What would be more intesting would be for analyze to build a correlation coeffficent matrix showing how columns are correlated: a b c a 1 .4 0 b .1 1 -.3 c .2 .3 1 and those correlations could be used to weigh how the single-column statistics should be combined. If you can make it work, I'll take it... it's (much) easier said than done, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Kevin Grittner wrote: Shaun Thomas stho...@peak6.com wrote: how difficult would it be to add that syntax to the JOIN statement, for example? Something like this syntax?: JOIN WITH (correlation_factor=0.3) Where 1.0 might mean that for each value on the left there was only one distinct value on the right, and 0.0 would mean that they were entirely independent? (Just as an off-the-cuff example -- I'm not at all sure that this makes sense, let alone is the best thing to specify. I'm trying to get at *syntax* here, not particular knobs.) I am not excited about the idea of putting these correlations in queries. What would be more intesting would be for analyze to build a correlation coeffficent matrix showing how columns are correlated: a b c a 1 .4 0 b .1 1 -.3 c .2 .3 1 and those correlations could be used to weigh how the single-column statistics should be combined. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints
On Thu, Feb 10, 2011 at 9:25 AM, Chris Browne cbbro...@acm.org wrote: robertmh...@gmail.com (Robert Haas) writes: On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. Nor was I proposing any such thing. But that doesn't make we don't want hints an accurate statement. Despite the impression that OP went away with, the real situation is a lot more nuanced than that, and the statement on the Todo list gives the wrong impression, IMHO. I have added the following comment to the ToDo: We are not interested to implement hints in ways they are commonly implemented on other databases, and proposals based on because they've got them will not be welcomed. If you have an idea that avoids the problems that have been observed with other hint systems, that could lead to valuable discussion. That seems to me to characterize the nuance. Where exactly are the problems with other systems noted? Most other systems have this option so saying They have problems is a giant cop out. -- Rob Wultsch wult...@gmail.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints
I've wordsmithed Chris's changes some, and then spun off a completely separate page for Hints discussion, since the NotToDo item was becoming too long. Something like this syntax?: JOIN WITH (correlation_factor=0.3) Please, NO! This is exactly the kind of hint that I regard as a last resort if we run out of implementation alternatives. Any hint which gets coded into the actual queries becomes a *massive* maintenance and upgrade headache thereafter. If we're implementing a hint alternative, we should look at stuff in this priority order: 1. Useful tuning of additional cost parameters by GUC (i.e. cursor_tuple_fraction) 2. Modifying cost parameters on database *objects* (i.e. ndistinct=500) 3. Adding new parameters to modify on database objects (i.e. distribution=normal(1.5,17),new_rows=0.1) 4. Query hints (if all of the above fails to give fixes for some tested problem) Where exactly are the problems with other systems noted? Most other systems have this option so saying They have problems is a giant cop out. I've put my list down: http://wiki.postgresql.org/wiki/OptimizerHintsDiscussion#Problems_with_existing_Hint_stystems -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints
On Sun, Feb 13, 2011 at 3:29 PM, Josh Berkus j...@agliodbs.com wrote: I've wordsmithed Chris's changes some, and then spun off a completely separate page for Hints discussion, since the NotToDo item was becoming too long. Something like this syntax?: JOIN WITH (correlation_factor=0.3) Please, NO! This is exactly the kind of hint that I regard as a last resort if we run out of implementation alternatives. Any hint which gets coded into the actual queries becomes a *massive* maintenance and upgrade headache thereafter. If we're implementing a hint alternative, we should look at stuff in this priority order: 1. Useful tuning of additional cost parameters by GUC (i.e. cursor_tuple_fraction) 2. Modifying cost parameters on database *objects* (i.e. ndistinct=500) 3. Adding new parameters to modify on database objects (i.e. distribution=normal(1.5,17),new_rows=0.1) 4. Query hints (if all of the above fails to give fixes for some tested problem) I fail to see how 1 through 3 can tell the planner the correlation between two fields in two separate tables. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints
On Sun, Feb 13, 2011 at 10:49 PM, Josh Berkus j...@agliodbs.com wrote: I fail to see how 1 through 3 can tell the planner the correlation between two fields in two separate tables. CREATE CORRELATION_ESTIMATE ( table1.colA ) = ( table2.colB ) IS 0.3 ... and then it fixes the correlation for *every* query in the database, not just that one. And is easy to fix if the correlation changes. I like that. Even better, could we setup some kind of simple command to tell analyze to collect stats for the two columns together? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
2011/2/10 Tobias Brox tobi...@gmail.com On 4 February 2011 04:46, Josh Berkus j...@agliodbs.com wrote: Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. I have no clue about how hints works in Oracle ... I've never been working enterprise level on anything else than Postgres. Anyway, today I just came over an interesting problem in our production database today - and I think it would be a benefit to be able to explicitly tell the planner what index to use (the dev team is adding redundant attributes and more indexes to solve the problem - which worries me, because we will run into serious problems as soon as there won't be enough memory for all the frequently-used indexes). We have users and transactions, and we have transaction types. The transaction table is huge. The users are able to interactively check their transaction listings online, and they have some simple filter options available as well. Slightly simplified, the queries done looks like this: select * from account_transaction where account_id=? order by created desc limit 25; select * from account_transaction where trans_type_id in ( ... long, hard-coded list ...) and account_id=? order by created desc limit 25; and we have indexes on: account_transaction(account_id, created) account_transaction(account_id, trans_type_id, created) If the list is hard-coded, you can create partial index on account_transaction(account_id, created desc) where trans_type_id in ( ... long, hard-coded list ...) -- Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
2011/2/11 Віталій Тимчишин tiv...@gmail.com: If the list is hard-coded, you can create partial index on account_transaction(account_id, created desc) where trans_type_id in ( ... long, hard-coded list ...) My idea as well, though it looks ugly and it would be a maintenance head-ache (upgrading the index as new transaction types are added would mean costly write locks on the table, and we can't rely on manual processes to get it right ... we might need to set up scripts to either upgrade the index or alert us if the index needs upgrading). -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
11.02.11 11:29, Tobias Brox написав(ла): 2011/2/11 Віталій Тимчишинtiv...@gmail.com: If the list is hard-coded, you can create partial index on account_transaction(account_id, created desc) where trans_type_id in ( ... long, hard-coded list ...) My idea as well, though it looks ugly and it would be a maintenance head-ache (upgrading the index as new transaction types are added would mean costly write locks on the table, Create new one concurrently. and we can't rely on manual processes to get it right ... we might need to set up scripts to either upgrade the index or alert us if the index needs upgrading). Yep. Another option could be to add query rewrite as select * from ( select * from account_transaction where trans_type_id =type1 and account_id=? order by created desc limit 25 union all select * from account_transaction where trans_type_id =type2 and account_id=? order by created desc limit 25 union all ... union all select * from account_transaction where trans_type_id =typeN and account_id=? order by created desc limit 25 ) a order by created desc limit 25 This will allow to use three-column index in the way it can be used for such query. Yet if N is large query will look ugly. And I am not sure if optimizer is smart enough for not to fetch 25*N rows. Best regards, Vitalii Tymchyshyn -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
2011/2/11 Vitalii Tymchyshyn tiv...@gmail.com: My idea as well, though it looks ugly and it would be a maintenance head-ache (upgrading the index as new transaction types are added would mean costly write locks on the table, Create new one concurrently. Concurrently? Are there any ways to add large indexes without blocking inserts to the table for the time it takes to create the index? Yep. Another option could be to add query rewrite as select * from ( select * from account_transaction where trans_type_id =type1 and account_id=? order by created desc limit 25 union all select * from account_transaction where trans_type_id =type2 and account_id=? order by created desc limit 25 union all ... union all select * from account_transaction where trans_type_id =typeN and account_id=? order by created desc limit 25 ) a order by created desc limit 25 I actually considered that. For the test case given it works very fast. Not sure if it would work universally ... it scales well when having extreme amounts of transactions outside the given transaction list (the case we have problems with now), but it wouldn't scale if some user has an extreme amount of transactions within the list. However, I think our extreme amount of transactions-problem is mostly limited to the transaction types outside the list. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 02/11/2011 12:26 PM, Tobias Brox wrote: 2011/2/11 Vitalii Tymchyshyntiv...@gmail.com: My idea as well, though it looks ugly and it would be a maintenance head-ache (upgrading the index as new transaction types are added would mean costly write locks on the table, Create new one concurrently. Concurrently? Are there any ways to add large indexes without blocking inserts to the table for the time it takes to create the index? yep, AFAIR since 8.2 see: http://www.postgresql.org/docs/8.2/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY [cut] Andrea -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
select * from account_transaction where trans_type_id in ( ... long, hard-coded list ...) and account_id=? order by created desc limit 25; You could use an index on (account_id, created, trans_type), in replacement of your index on (account_id, created). This will not prevent the Index Scan Backwards, but at least, index rows with trans_type not matching the WHERE clause will not generate any heap access... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On Thu, Feb 3, 2011 at 8:46 PM, Josh Berkus j...@agliodbs.com wrote: Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. That seems pretty straightforwards. There are even links to prior discussions about what kind of system would work. I don't think this text needs any adjustment; that's our clear consensus on the hint issue: we want a tool which works better than what we've seen in other databases. I think it's just dumb to say we don't want hints. We want hints, or at least many of us do. We just want them to actually work, and to not suck. Can't we just stop saying we don't want them, and say that we do want something, but it has to be really good? Yes, I occasionally run across cases where having a query tweaking system would help me fix a pathological failure in the planner. However, even on data warehouses that's less than 0.1% of the queries I deal with, so this isn't exactly a common event. And any hinting system we develop needs to address those specific cases, NOT a hypothetical case which can't be tested. Otherwise we'll implement hints which actually don't improve queries. No argument. The bottom line here is that a lot of features that we don't have are things that we don't want in the sense that we're not interested in working on them over other things that seem more pressing, and we have finite manpower. But if someone feels motivated to work on it, and can actually come up with something good, then why should we give the impression that such a thing would be rejected out of hand? I think we ought to nuke that item and replace it with some items in the optimizer section that express what we DO want, which is some better ways of fixing queries the few queries that suck despite our best (and very successful) efforts to produce a top-notch optimizer. The problem with multi-column statistics is a particularly good example of something in this class. We may have a great solution to that problem for PostgreSQL 11.0. But between now and then, if you have that problem, there is no good way to adjust the selectivity estimates. If this were an academic research project or just being used for toy projects that didn't really matter, we might not care. But this is a real database that people are relying on for their livelihood, and we should be willing to provide a way for those people to not get fired when they hit the 0.1% of queries that can't be fixed using existing methods. I don't know exactly what the right solution is off the top of my head, but digging in our heels is not it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Robert Haas robertmh...@gmail.com wrote: I think it's just dumb to say we don't want hints. We want hints, or at least many of us do. Well, yeah. Even those most ostensibly opposed to hints have been known to post that they would rather not have the optimizer recognize two logically equivalent constructs and optimize them the same because they find the current difference useful to coerce the optimizer to choose a certain plan. That's implementing hints but refusing to document them. And it sometimes bites those who don't know they're accidentally using a hint construct. An explicit and documented hint construct would be better. Probably not a use this plan type hint, but some form of optimization barrier hint, maybe. You know, like OFFSET 0, but more explicitly hint-like. The bottom line here is that a lot of features that we don't have are things that we don't want in the sense that we're not interested in working on them over other things that seem more pressing, and we have finite manpower. But if someone feels motivated to work on it, and can actually come up with something good, then why should we give the impression that such a thing would be rejected out of hand? I think we ought to nuke that item and replace it with some items in the optimizer section that express what we DO want, which is some better ways of fixing queries the few queries that suck despite our best (and very successful) efforts to produce a top-notch optimizer. The problem with multi-column statistics is a particularly good example of something in this class. We may have a great solution to that problem for PostgreSQL 11.0. But between now and then, if you have that problem, there is no good way to adjust the selectivity estimates. Yeah, this is probably the most important area to devise some explicit way for a DBA who knows that such multicolumn selections are going to be used, and is capable of calculating some correlation factor, could supply it to the optimizer to override the naive calculation it currently does. Even there I would tend to think that the sort of do it this way hints that people seem to initially want wouldn't be good; it should be a way to override the costing factor which the optimizer gets wrong, so it can do its usual excellent job of evaluating plans with accurate costs. I don't know exactly what the right solution is off the top of my head, but digging in our heels is not it. Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Kevin Grittner kevin.gritt...@wicourts.gov writes: Robert Haas robertmh...@gmail.com wrote: I don't know exactly what the right solution is off the top of my head, but digging in our heels is not it. Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. Right. If someone comes up with a design that avoids the serious pitfalls of traditional hinting schemes, that'd be great. But I'm not interested in implementing Oracle-like hints just because Oracle has them, which I think was basically what the OP wanted. I haven't seen a hinting scheme that didn't suck (and that includes the aspects of our own current behavior that are hint-like). I don't say that there can't be one. I believe that the FAQ entry is meant to answer people who come along and say oh, this is easily solved, just do what $PRODUCT does. The generic answer to that is no, it's not that easy. But maybe the FAQ should be rephrased to be more like we don't want traditional hints because of problems X, Y, and Z. If you have an idea that avoids those problems, let us know. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. Nor was I proposing any such thing. But that doesn't make we don't want hints an accurate statement. Despite the impression that OP went away with, the real situation is a lot more nuanced than that, and the statement on the Todo list gives the wrong impression, IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 02/10/2011 10:45 AM, Kevin Grittner wrote: Even there I would tend to think that the sort of do it this way hints that people seem to initially want wouldn't be good; it should be a way to override the costing factor which the optimizer gets wrong, so it can do its usual excellent job of evaluating plans with accurate costs. You know... that's an interesting approach. We already do that with functions by allowing users to specify the estimated cost, rows returned, and even override config settings. It's an inexact science at best, but it might help the optimizer out. Really... how difficult would it be to add that syntax to the JOIN statement, for example? -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 02/10/2011 11:01 AM, Tom Lane wrote: But I'm not interested in implementing Oracle-like hints just because Oracle has them, which I think was basically what the OP wanted. Hilariously, I'm not so sure that's what the OP wanted. Several of us pointed him to EnterpriseDB and their Oracle-style syntax, and the only thing he said about that was to use it as further evidence that PostgreSQL should implement them. I'm very tempted to say he wanted something for free, and was angry he couldn't get it. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Shaun Thomas stho...@peak6.com wrote: how difficult would it be to add that syntax to the JOIN statement, for example? Something like this syntax?: JOIN WITH (correlation_factor=0.3) Where 1.0 might mean that for each value on the left there was only one distinct value on the right, and 0.0 would mean that they were entirely independent? (Just as an off-the-cuff example -- I'm not at all sure that this makes sense, let alone is the best thing to specify. I'm trying to get at *syntax* here, not particular knobs.) -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Robert Haas robertmh...@gmail.com wrote: maybe the FAQ should be rephrased to be more like we don't want traditional hints because of problems X, Y, and Z. If you have an idea that avoids those problems, let us know. That's closer to where I think the community is on this issue That sounds pretty good to me. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 02/10/2011 11:21 AM, Kevin Grittner wrote: Something like this syntax?: JOIN WITH (correlation_factor=0.3) I was thinking more: JOIN foo_tab USING (foo_id) WITH (COST=50) or something, to exploit the hooks that already exist for functions, for example. But it's still an interesting concept. Tell the optimizer what you want and how the data is really related in cases where it's wrong, and let it figure out the best path. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Shaun Thomas stho...@peak6.com wrote: I was thinking more: JOIN foo_tab USING (foo_id) WITH (COST=50) The problem I have with that syntax is that it would be hard to read when you have some nested set of joins or a (SELECT) in the JOIN instead of simple table name. For me, at least, it would get lost less easily if it were right next to the JOIN keyword. The problem with a COST factor is that it's not obvious to me what it would apply to: - each row on the left? - each row on the right? - each row in the result of the JOIN step? - the entire step? How would it scale based on other criteria which affected the number of rows on either side of the join? If I'm understanding the problem correctly, the part the optimizer gets wrong (because we don't yet have statistics to support a better assumption) is assuming that selection criteria on opposite sides of a join affect entirely independent sets of what would be in the result without the criteria. To use an oft-cited example, when one table is selected by zip code and the other by city, that's a bad assumption about the correlation, leading to bad estimates, leading to bad costing, leading to bad plans. The OP wanted to override step 4, a COST setting would try to override step 3, but I think we would want to override step 1 (until we get statistics which let us compute that accurately). -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Shaun Thomas wrote: Hilariously, I'm not so sure that's what the OP wanted. Someone to blame as a scapegoat for why his badly planned project had failed. I've done several Oracle conversions before, and never met someone who was so resistent to doing the right things for such a conversion. You have to relatively flexible in your thinking to work with the good and away from the bad parts of PostgreSQL for such a project to succeed. I didn't hear a whole lot of flexible in that discussion. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints
robertmh...@gmail.com (Robert Haas) writes: On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. Nor was I proposing any such thing. But that doesn't make we don't want hints an accurate statement. Despite the impression that OP went away with, the real situation is a lot more nuanced than that, and the statement on the Todo list gives the wrong impression, IMHO. I have added the following comment to the ToDo: We are not interested to implement hints in ways they are commonly implemented on other databases, and proposals based on because they've got them will not be welcomed. If you have an idea that avoids the problems that have been observed with other hint systems, that could lead to valuable discussion. That seems to me to characterize the nuance. -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://www3.sympatico.ca/cbbrowne/languages.html If only women came with pull-down menus and online help. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Greg Smith g...@2ndquadrant.com wrote: Shaun Thomas wrote: Hilariously, I'm not so sure that's what the OP wanted. Someone to blame as a scapegoat for why his badly planned project had failed. I've done several Oracle conversions before, and never met someone who was so resistent to doing the right things for such a conversion. You have to relatively flexible in your thinking to work with the good and away from the bad parts of PostgreSQL for such a project to succeed. I didn't hear a whole lot of flexible in that discussion. I was thinking along the same lines, but couldn't find the words to put it so politely, so I held back. Still biting my tongue, but I appreciate your milder summary. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 2/10/11 9:21 AM, Kevin Grittner wrote: Shaun Thomasstho...@peak6.com wrote: how difficult would it be to add that syntax to the JOIN statement, for example? Something like this syntax?: JOIN WITH (correlation_factor=0.3) Where 1.0 might mean that for each value on the left there was only one distinct value on the right, and 0.0 would mean that they were entirely independent? (Just as an off-the-cuff example -- I'm not at all sure that this makes sense, let alone is the best thing to specify. I'm trying to get at *syntax* here, not particular knobs.) There are two types of problems: 1. The optimizer is imperfect and makes a sub-optimal choice. 2. There is theoretical reasons why it's hard for the optimizer. For example, in a table with 50 columns, there is a staggering number of possible correlations. An optimizer can't possibly figure this out, but a human might know them from the start. The City/Postal-code correlation is a good example. For #1, Postgres should never offer any sort of hint mechanism. As many have pointed out, it's far better to spend the time fixing the optimizer than adding hacks. For #2, it might make sense to give a designer a way to tell Postgres stuff that it couldn't possibly figure out. But ... not until the problem is clearly defined. What should happen is that someone writes with an example query, and the community realizes that no amount of cleverness from Postgres could ever solve it (for solid theoretical reasons). Only then, when the problem is clearly defined, should we talk about solutions and SQL extensions. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 4 February 2011 04:46, Josh Berkus j...@agliodbs.com wrote: Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. I have no clue about how hints works in Oracle ... I've never been working enterprise level on anything else than Postgres. Anyway, today I just came over an interesting problem in our production database today - and I think it would be a benefit to be able to explicitly tell the planner what index to use (the dev team is adding redundant attributes and more indexes to solve the problem - which worries me, because we will run into serious problems as soon as there won't be enough memory for all the frequently-used indexes). We have users and transactions, and we have transaction types. The transaction table is huge. The users are able to interactively check their transaction listings online, and they have some simple filter options available as well. Slightly simplified, the queries done looks like this: select * from account_transaction where account_id=? order by created desc limit 25; select * from account_transaction where trans_type_id in ( ... long, hard-coded list ...) and account_id=? order by created desc limit 25; and we have indexes on: account_transaction(account_id, created) account_transaction(account_id, trans_type_id, created) (At this point, someone would probably suggest to make three single-key indexes and use bitmap index scan ... well, pulling 25 rows from the end of an index may be orders of magnitude faster than doing bitmap index mapping on huge indexes) For the second query, the planner would chose the first index - and maybe it makes sense - most of our customers have between 10-30% of the transactions from the long list of transaction types, slim indexes are good and by average the slimmer index would probably do the job a bit faster. The problem is with the corner cases - for some of our extreme customers thousands of transaction index tuples may need to be scanned before 25 rows with the correct transaction type is pulled out, and if the index happens to be on disk, it may take tens of seconds to pull out the answer. Tens of seconds of waiting leads to frustration, it is a lot nowadays in an interactive session. Also, I haven't really checked it up, but it may very well be that this is exactly the kind of customers we want to retain. To summarize, there are two things the planner doesn't know - it doesn't know that there exists such corner cases where the real cost is far larger than the estimated cost, and it doesn't know that it's more important to keep the worst-case cost on a reasonable level than to minimize the average cost. In the ideal world postgres would have sufficiently good statistics to know that for user #7 it is better to chose the second index, but I suppose it would be easier if I was able to explicitly hide the account_transaction(account_id, created) index for this query. Well, I know of one way to do it ... but I suppose it's not a good idea to put drop index foo; select ...; rollback; into production ;-) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Tobias Brox tobi...@gmail.com writes: I have no clue about how hints works in Oracle ... I've never been working enterprise level on anything else than Postgres. Anyway, today I just came over an interesting problem in our production database today - and I think it would be a benefit to be able to explicitly tell the planner what index to use (the dev team is adding redundant attributes and more indexes to solve the problem - which worries me, because we will run into serious problems as soon as there won't be enough memory for all the frequently-used indexes). We have users and transactions, and we have transaction types. The transaction table is huge. The users are able to interactively check their transaction listings online, and they have some simple filter options available as well. Slightly simplified, the queries done looks like this: select * from account_transaction where account_id=? order by created desc limit 25; select * from account_transaction where trans_type_id in ( ... long, hard-coded list ...) and account_id=? order by created desc limit 25; and we have indexes on: account_transaction(account_id, created) account_transaction(account_id, trans_type_id, created) Well, in this case the optimizer *is* smarter than you are, and the reason is that it remembers the correct rules for when indexes are useful. That second index is of no value for either query, because in doesn't work the way you're hoping. I understand the larger point you're trying to make, but this example also nicely illustrates the point being made on the other side, that force the optimizer to use the index I think it should use isn't a very good solution. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want No. 2 on the list. Heck, *I* wrote that text. I quote: Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. That seems pretty straightforwards. There are even links to prior discussions about what kind of system would work. I don't think this text needs any adjustment; that's our clear consensus on the hint issue: we want a tool which works better than what we've seen in other databases. Quite frankly, the main reason why most DBMSes have a hinting system has nothing to do with the quality of optimizer and everything to do with DBAs who think they're smarter than the optimizer (incorrectly). Oracle has a darned good query optimizer, and SQL server's is even better. However, there are a lot of undereducated or fossilized DBAs out there who don't trust the query planner and want to override it in fairly arbitrary ways; I refer you to the collected works of Dan Tow, for example. In many cases Hints are used by DBAs in emergency situations because they are easier than figuring out what the underlying issue is, even when that could be done relatively simply. Improving diagnostic query tools would be a much better approach here; for example, the team working on hypothetical indexes has a lot to offer. If you can figure out what's really wrong with the query in 10 minutes, you don't need a hint. Yes, I occasionally run across cases where having a query tweaking system would help me fix a pathological failure in the planner. However, even on data warehouses that's less than 0.1% of the queries I deal with, so this isn't exactly a common event. And any hinting system we develop needs to address those specific cases, NOT a hypothetical case which can't be tested. Otherwise we'll implement hints which actually don't improve queries. Commercial DBMSes have to give in to what their big paying customers want, no matter how stupid it is. I'm grateful that I can work on a DBMS -- the third most popular SQL DBMS in the world -- which can focus on quality instead. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance