Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On Thu, Feb 3, 2011 at 8:46 PM, Josh Berkus j...@agliodbs.com wrote: Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. That seems pretty straightforwards. There are even links to prior discussions about what kind of system would work. I don't think this text needs any adjustment; that's our clear consensus on the hint issue: we want a tool which works better than what we've seen in other databases. I think it's just dumb to say we don't want hints. We want hints, or at least many of us do. We just want them to actually work, and to not suck. Can't we just stop saying we don't want them, and say that we do want something, but it has to be really good? Yes, I occasionally run across cases where having a query tweaking system would help me fix a pathological failure in the planner. However, even on data warehouses that's less than 0.1% of the queries I deal with, so this isn't exactly a common event. And any hinting system we develop needs to address those specific cases, NOT a hypothetical case which can't be tested. Otherwise we'll implement hints which actually don't improve queries. No argument. The bottom line here is that a lot of features that we don't have are things that we don't want in the sense that we're not interested in working on them over other things that seem more pressing, and we have finite manpower. But if someone feels motivated to work on it, and can actually come up with something good, then why should we give the impression that such a thing would be rejected out of hand? I think we ought to nuke that item and replace it with some items in the optimizer section that express what we DO want, which is some better ways of fixing queries the few queries that suck despite our best (and very successful) efforts to produce a top-notch optimizer. The problem with multi-column statistics is a particularly good example of something in this class. We may have a great solution to that problem for PostgreSQL 11.0. But between now and then, if you have that problem, there is no good way to adjust the selectivity estimates. If this were an academic research project or just being used for toy projects that didn't really matter, we might not care. But this is a real database that people are relying on for their livelihood, and we should be willing to provide a way for those people to not get fired when they hit the 0.1% of queries that can't be fixed using existing methods. I don't know exactly what the right solution is off the top of my head, but digging in our heels is not it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Robert Haas robertmh...@gmail.com wrote: I think it's just dumb to say we don't want hints. We want hints, or at least many of us do. Well, yeah. Even those most ostensibly opposed to hints have been known to post that they would rather not have the optimizer recognize two logically equivalent constructs and optimize them the same because they find the current difference useful to coerce the optimizer to choose a certain plan. That's implementing hints but refusing to document them. And it sometimes bites those who don't know they're accidentally using a hint construct. An explicit and documented hint construct would be better. Probably not a use this plan type hint, but some form of optimization barrier hint, maybe. You know, like OFFSET 0, but more explicitly hint-like. The bottom line here is that a lot of features that we don't have are things that we don't want in the sense that we're not interested in working on them over other things that seem more pressing, and we have finite manpower. But if someone feels motivated to work on it, and can actually come up with something good, then why should we give the impression that such a thing would be rejected out of hand? I think we ought to nuke that item and replace it with some items in the optimizer section that express what we DO want, which is some better ways of fixing queries the few queries that suck despite our best (and very successful) efforts to produce a top-notch optimizer. The problem with multi-column statistics is a particularly good example of something in this class. We may have a great solution to that problem for PostgreSQL 11.0. But between now and then, if you have that problem, there is no good way to adjust the selectivity estimates. Yeah, this is probably the most important area to devise some explicit way for a DBA who knows that such multicolumn selections are going to be used, and is capable of calculating some correlation factor, could supply it to the optimizer to override the naive calculation it currently does. Even there I would tend to think that the sort of do it this way hints that people seem to initially want wouldn't be good; it should be a way to override the costing factor which the optimizer gets wrong, so it can do its usual excellent job of evaluating plans with accurate costs. I don't know exactly what the right solution is off the top of my head, but digging in our heels is not it. Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Kevin Grittner kevin.gritt...@wicourts.gov writes: Robert Haas robertmh...@gmail.com wrote: I don't know exactly what the right solution is off the top of my head, but digging in our heels is not it. Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. Right. If someone comes up with a design that avoids the serious pitfalls of traditional hinting schemes, that'd be great. But I'm not interested in implementing Oracle-like hints just because Oracle has them, which I think was basically what the OP wanted. I haven't seen a hinting scheme that didn't suck (and that includes the aspects of our own current behavior that are hint-like). I don't say that there can't be one. I believe that the FAQ entry is meant to answer people who come along and say oh, this is easily solved, just do what $PRODUCT does. The generic answer to that is no, it's not that easy. But maybe the FAQ should be rephrased to be more like we don't want traditional hints because of problems X, Y, and Z. If you have an idea that avoids those problems, let us know. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. Nor was I proposing any such thing. But that doesn't make we don't want hints an accurate statement. Despite the impression that OP went away with, the real situation is a lot more nuanced than that, and the statement on the Todo list gives the wrong impression, IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 02/10/2011 10:45 AM, Kevin Grittner wrote: Even there I would tend to think that the sort of do it this way hints that people seem to initially want wouldn't be good; it should be a way to override the costing factor which the optimizer gets wrong, so it can do its usual excellent job of evaluating plans with accurate costs. You know... that's an interesting approach. We already do that with functions by allowing users to specify the estimated cost, rows returned, and even override config settings. It's an inexact science at best, but it might help the optimizer out. Really... how difficult would it be to add that syntax to the JOIN statement, for example? -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 02/10/2011 11:01 AM, Tom Lane wrote: But I'm not interested in implementing Oracle-like hints just because Oracle has them, which I think was basically what the OP wanted. Hilariously, I'm not so sure that's what the OP wanted. Several of us pointed him to EnterpriseDB and their Oracle-style syntax, and the only thing he said about that was to use it as further evidence that PostgreSQL should implement them. I'm very tempted to say he wanted something for free, and was angry he couldn't get it. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Shaun Thomas stho...@peak6.com wrote: how difficult would it be to add that syntax to the JOIN statement, for example? Something like this syntax?: JOIN WITH (correlation_factor=0.3) Where 1.0 might mean that for each value on the left there was only one distinct value on the right, and 0.0 would mean that they were entirely independent? (Just as an off-the-cuff example -- I'm not at all sure that this makes sense, let alone is the best thing to specify. I'm trying to get at *syntax* here, not particular knobs.) -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Robert Haas robertmh...@gmail.com wrote: maybe the FAQ should be rephrased to be more like we don't want traditional hints because of problems X, Y, and Z. If you have an idea that avoids those problems, let us know. That's closer to where I think the community is on this issue That sounds pretty good to me. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 02/10/2011 11:21 AM, Kevin Grittner wrote: Something like this syntax?: JOIN WITH (correlation_factor=0.3) I was thinking more: JOIN foo_tab USING (foo_id) WITH (COST=50) or something, to exploit the hooks that already exist for functions, for example. But it's still an interesting concept. Tell the optimizer what you want and how the data is really related in cases where it's wrong, and let it figure out the best path. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Shaun Thomas stho...@peak6.com wrote: I was thinking more: JOIN foo_tab USING (foo_id) WITH (COST=50) The problem I have with that syntax is that it would be hard to read when you have some nested set of joins or a (SELECT) in the JOIN instead of simple table name. For me, at least, it would get lost less easily if it were right next to the JOIN keyword. The problem with a COST factor is that it's not obvious to me what it would apply to: - each row on the left? - each row on the right? - each row in the result of the JOIN step? - the entire step? How would it scale based on other criteria which affected the number of rows on either side of the join? If I'm understanding the problem correctly, the part the optimizer gets wrong (because we don't yet have statistics to support a better assumption) is assuming that selection criteria on opposite sides of a join affect entirely independent sets of what would be in the result without the criteria. To use an oft-cited example, when one table is selected by zip code and the other by city, that's a bad assumption about the correlation, leading to bad estimates, leading to bad costing, leading to bad plans. The OP wanted to override step 4, a COST setting would try to override step 3, but I think we would want to override step 1 (until we get statistics which let us compute that accurately). -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Does auto-analyze work on dirty writes?
On Fri, Feb 4, 2011 at 8:50 PM, Mark Mielke m...@mark.mielke.cc wrote: On 02/04/2011 10:41 AM, Tom Lane wrote: 1. Autovacuum fires when the stats collector's insert/update/delete counts have reached appropriate thresholds. Those counts are accumulated from messages sent by backends at transaction commit or rollback, so they take no account of what's been done by transactions still in progress. 2. Only live rows are included in the stats computed by ANALYZE. (IIRC it uses SnapshotNow to decide whether rows are live.) Although the stats collector does track an estimate of the number of dead rows for the benefit of autovacuum, this isn't used by planning. Table bloat is accounted for only in terms of growth of the physical size of the table in blocks. Thanks, Tom. Does this un-analyzed bloat not impact queries? I guess the worst case here is if autovaccum is disabled for some reason and 99% of the table is dead rows. If I understand the above correctly, I think analyze might generate a bad plan under this scenario, thinking that a value is unique, using the index - but every tuple in the index has the same value and each has to be looked up in the table to see if it is visible? It sounds like you're describing something like a one-row table with a unique index on one of its column, getting updates that can't be made HOT, and not getting vacuumed. That scenario does suck - I had a test case I was using it a while back that generated something similar - but I'm not sure how much it's worth worrying about the plan, because either an index scan or a sequential scan is going to be awful. To put that another way, I've founded that the optimizer copes pretty well with adjusting plans as tables get bloated - mostly by using index scans rather than sequential scans. It's possible there is some improvement still to be had there, but I would be a lot more interested in fixing the bloat, at least based on my own experiences. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Shaun Thomas wrote: Hilariously, I'm not so sure that's what the OP wanted. Someone to blame as a scapegoat for why his badly planned project had failed. I've done several Oracle conversions before, and never met someone who was so resistent to doing the right things for such a conversion. You have to relatively flexible in your thinking to work with the good and away from the bad parts of PostgreSQL for such a project to succeed. I didn't hear a whole lot of flexible in that discussion. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints
robertmh...@gmail.com (Robert Haas) writes: On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. Nor was I proposing any such thing. But that doesn't make we don't want hints an accurate statement. Despite the impression that OP went away with, the real situation is a lot more nuanced than that, and the statement on the Todo list gives the wrong impression, IMHO. I have added the following comment to the ToDo: We are not interested to implement hints in ways they are commonly implemented on other databases, and proposals based on because they've got them will not be welcomed. If you have an idea that avoids the problems that have been observed with other hint systems, that could lead to valuable discussion. That seems to me to characterize the nuance. -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://www3.sympatico.ca/cbbrowne/languages.html If only women came with pull-down menus and online help. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Greg Smith g...@2ndquadrant.com wrote: Shaun Thomas wrote: Hilariously, I'm not so sure that's what the OP wanted. Someone to blame as a scapegoat for why his badly planned project had failed. I've done several Oracle conversions before, and never met someone who was so resistent to doing the right things for such a conversion. You have to relatively flexible in your thinking to work with the good and away from the bad parts of PostgreSQL for such a project to succeed. I didn't hear a whole lot of flexible in that discussion. I was thinking along the same lines, but couldn't find the words to put it so politely, so I held back. Still biting my tongue, but I appreciate your milder summary. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 2/10/11 9:21 AM, Kevin Grittner wrote: Shaun Thomasstho...@peak6.com wrote: how difficult would it be to add that syntax to the JOIN statement, for example? Something like this syntax?: JOIN WITH (correlation_factor=0.3) Where 1.0 might mean that for each value on the left there was only one distinct value on the right, and 0.0 would mean that they were entirely independent? (Just as an off-the-cuff example -- I'm not at all sure that this makes sense, let alone is the best thing to specify. I'm trying to get at *syntax* here, not particular knobs.) There are two types of problems: 1. The optimizer is imperfect and makes a sub-optimal choice. 2. There is theoretical reasons why it's hard for the optimizer. For example, in a table with 50 columns, there is a staggering number of possible correlations. An optimizer can't possibly figure this out, but a human might know them from the start. The City/Postal-code correlation is a good example. For #1, Postgres should never offer any sort of hint mechanism. As many have pointed out, it's far better to spend the time fixing the optimizer than adding hacks. For #2, it might make sense to give a designer a way to tell Postgres stuff that it couldn't possibly figure out. But ... not until the problem is clearly defined. What should happen is that someone writes with an example query, and the community realizes that no amount of cleverness from Postgres could ever solve it (for solid theoretical reasons). Only then, when the problem is clearly defined, should we talk about solutions and SQL extensions. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
On 4 February 2011 04:46, Josh Berkus j...@agliodbs.com wrote: Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. I have no clue about how hints works in Oracle ... I've never been working enterprise level on anything else than Postgres. Anyway, today I just came over an interesting problem in our production database today - and I think it would be a benefit to be able to explicitly tell the planner what index to use (the dev team is adding redundant attributes and more indexes to solve the problem - which worries me, because we will run into serious problems as soon as there won't be enough memory for all the frequently-used indexes). We have users and transactions, and we have transaction types. The transaction table is huge. The users are able to interactively check their transaction listings online, and they have some simple filter options available as well. Slightly simplified, the queries done looks like this: select * from account_transaction where account_id=? order by created desc limit 25; select * from account_transaction where trans_type_id in ( ... long, hard-coded list ...) and account_id=? order by created desc limit 25; and we have indexes on: account_transaction(account_id, created) account_transaction(account_id, trans_type_id, created) (At this point, someone would probably suggest to make three single-key indexes and use bitmap index scan ... well, pulling 25 rows from the end of an index may be orders of magnitude faster than doing bitmap index mapping on huge indexes) For the second query, the planner would chose the first index - and maybe it makes sense - most of our customers have between 10-30% of the transactions from the long list of transaction types, slim indexes are good and by average the slimmer index would probably do the job a bit faster. The problem is with the corner cases - for some of our extreme customers thousands of transaction index tuples may need to be scanned before 25 rows with the correct transaction type is pulled out, and if the index happens to be on disk, it may take tens of seconds to pull out the answer. Tens of seconds of waiting leads to frustration, it is a lot nowadays in an interactive session. Also, I haven't really checked it up, but it may very well be that this is exactly the kind of customers we want to retain. To summarize, there are two things the planner doesn't know - it doesn't know that there exists such corner cases where the real cost is far larger than the estimated cost, and it doesn't know that it's more important to keep the worst-case cost on a reasonable level than to minimize the average cost. In the ideal world postgres would have sufficiently good statistics to know that for user #7 it is better to chose the second index, but I suppose it would be easier if I was able to explicitly hide the account_transaction(account_id, created) index for this query. Well, I know of one way to do it ... but I suppose it's not a good idea to put drop index foo; select ...; rollback; into production ;-) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
Tobias Brox tobi...@gmail.com writes: I have no clue about how hints works in Oracle ... I've never been working enterprise level on anything else than Postgres. Anyway, today I just came over an interesting problem in our production database today - and I think it would be a benefit to be able to explicitly tell the planner what index to use (the dev team is adding redundant attributes and more indexes to solve the problem - which worries me, because we will run into serious problems as soon as there won't be enough memory for all the frequently-used indexes). We have users and transactions, and we have transaction types. The transaction table is huge. The users are able to interactively check their transaction listings online, and they have some simple filter options available as well. Slightly simplified, the queries done looks like this: select * from account_transaction where account_id=? order by created desc limit 25; select * from account_transaction where trans_type_id in ( ... long, hard-coded list ...) and account_id=? order by created desc limit 25; and we have indexes on: account_transaction(account_id, created) account_transaction(account_id, trans_type_id, created) Well, in this case the optimizer *is* smarter than you are, and the reason is that it remembers the correct rules for when indexes are useful. That second index is of no value for either query, because in doesn't work the way you're hoping. I understand the larger point you're trying to make, but this example also nicely illustrates the point being made on the other side, that force the optimizer to use the index I think it should use isn't a very good solution. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance