Re: [HACKERS] Planner hints in Postgresql
On Monday, March 17, 2014, Atri Sharma atri.j...@gmail.com wrote: On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost sfr...@snowman.netjavascript:_e(%7B%7D,'cvml','sfr...@snowman.net'); wrote: * Atri Sharma (atri.j...@gmail.comjavascript:_e(%7B%7D,'cvml','atri.j...@gmail.com');) wrote: Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner Uh, you're worried about the user given us a garbage selectivity, but they're going to get a full-blown plan perfect? I never said that the user plan would be perfect. The entire point of planner hints is based on the assumption that the user knows more about the data than the planner does hence the user's ideas about the plan should be given a preference. Garbage selectivity can screw up the cost estimation of *all* our possible plans and we could end up preferring a sequential scan over an index only scan for e.g. I am trying to think of ways that give some preference to a user plan but do not interfere with the cost estimation of our other potential plans. I'm not opposed to planner hints (or plan mandates), but also not optimistic they will ever get implemented, much less accepted. But if they were, I don't see a use for such fudge factors. By mandating a plan, I am already asserting I know more than the optimizer does. Maybe I am right, maybe I am wrong, but either way I have taken it out of the optimizer's hands and would not welcome it snatching control back. If it is too deranged for me to trust, why would it not become somewhat more deranged and so decide to ignore my hints? The only setting for such a factor I would ever see myself using was the minimum, or the maximum. The feature I would like in such hints, if they are to exist, is to set a version to which they apply. Often a fix is made very quickly after the problem is pointed out, but it could take well over a year for the fix to see production if it is not backpatched and it lands at the wrong part of the release cycle. Cheers, Jeff
Re: [HACKERS] Planner hints in Postgresql
On Monday, March 17, 2014, Tom Lane t...@sss.pgh.pa.us wrote: Claudio Freire klaussfre...@gmail.com javascript:; writes: On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net javascript:; wrote: Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. That's precisely what risk estimation was about. Yeah. I would like to see the planner's cost estimates extended to include some sort of uncertainty estimate, whereupon risk-averse people could ask it to prefer low-uncertainty plans over high-uncertainty ones (the plans we typically choose for ORDER BY ... LIMIT queries being great examples of the latter). But it's a long way from wishing that to making it so. Right now it's not even clear (to me anyway) how we'd measure or model such uncertainty. Most of the cases where I've run into horrible estimates, it seemed like the same level of knowledge/reasoning that could allow us to know it was risky, would allow us to just do a better job in the first place. The exception I can think of is in an antijoin between two huge rels. It is like subtracting two large measurements to get a much smaller result. We should know the uncertainty will be large. Cheers, Jeff
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Claudio Freire klaussfre...@gmail.com writes: On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote: Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. That's precisely what risk estimation was about. Yeah. I would like to see the planner's cost estimates extended to include some sort of uncertainty estimate, whereupon risk-averse people could ask it to prefer low-uncertainty plans over high-uncertainty ones (the plans we typically choose for ORDER BY ... LIMIT queries being great examples of the latter). But it's a long way from wishing that to making it so. Right now it's not even clear (to me anyway) how we'd measure or model such uncertainty. Well, currently, selectivity estimates based on MCV should be pretty low-uncertainty, whereas certainty of other estimates could be modeled as a random variable if ANALYZE gathered a few statistical moments (for variables that are prone to that kind of statistical analysis). That alone could improve things considerably, and statistical info could be propagated along expressions to make it possible to model uncertainty in complex expressions as well.
Re: [HACKERS] Planner hints in Postgresql
On Tue, Mar 18, 2014 at 11:53 AM, Claudio Freire klaussfre...@gmail.com wrote: On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Claudio Freire klaussfre...@gmail.com writes: On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote: Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. That's precisely what risk estimation was about. Yeah. I would like to see the planner's cost estimates extended to include some sort of uncertainty estimate, whereupon risk-averse people could ask it to prefer low-uncertainty plans over high-uncertainty ones (the plans we typically choose for ORDER BY ... LIMIT queries being great examples of the latter). But it's a long way from wishing that to making it so. Right now it's not even clear (to me anyway) how we'd measure or model such uncertainty. Well, currently, selectivity estimates based on MCV should be pretty low-uncertainty, whereas certainty of other estimates could be modeled as a random variable if ANALYZE gathered a few statistical moments (for variables that are prone to that kind of statistical analysis). Sure, plus as noted you have cases where the planer makes SWAGs. Each of those SWAGs can introduce say (in the worst case) an order of magnitude of error in the row count estimate. That alone could improve things considerably, and statistical info could be propagated along expressions to make it possible to model uncertainty in complex expressions as well. But how would that work? I see no solution adumbrated there :-). Let's say you change the rowcount estimate to low/bestguess/high *and* you only engage extra searches when there is enough disparity between those values you still get exponentially more searches. (my thinking is that if bestguess estimated execution time is some user definable amount faster then low/high at any node), a more skeptical plan is introduced. All that could end up being pessimal to the general case though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Tue, Mar 18, 2014 at 4:48 PM, Merlin Moncure mmonc...@gmail.com wrote: That alone could improve things considerably, and statistical info could be propagated along expressions to make it possible to model uncertainty in complex expressions as well. But how would that work? I see no solution adumbrated there :-). I would have to tipify the SQL expression grammar for this, but I don't think it would be impossible. Most non-function expression nodes seem rather trivial. Even CASE, as long as you have a distribution for the conditional, you can derive a distribution for the whole. User defined functions would be another game, though. Correlation would have to be measured, and that can be troublesome and a weak spot of risk computation as much as it is of planning, but it could be fuzzed arbitrarily until properly computed - after all, dependency on correlation or non-correlation is a known source of risk, and accounting for it in any way is better than not. Let's say you change the rowcount estimate to low/bestguess/high *and* you only engage extra searches when there is enough disparity between those values you still get exponentially more searches. I was under the impression the planner already did an exhaustive search for some queries. So it's just a matter of picking the best plan among those (ie: estimating cost). The case of GEQO isn't any different, except perhaps introducing a risk-decreasing transformation would be needed, unless I'm missing something. (my thinking is that if bestguess estimated execution time is some user definable amount faster then low/high at any node), a more skeptical plan is introduced. All that could end up being pessimal to the general case though. I think the cost estimate would be replaced by a distribution (simplified perhaps into an array of moments, or whatever is easily manipulated in the face of complex expressions). What the user would pick, is a sampling method of said distribution. Then, plans get measured by the user's stick (say: arithmetic mean, median, 90th percentile, etc). The arithmetic mean would I guess be the default, and that ought to be roughly equivalent to the planner's current behavior.
Re: [HACKERS] Planner hints in Postgresql
That's precisely what risk estimation was about. Yeah. I would like to see the planner's cost estimates extended to include some sort of uncertainty estimate, whereupon risk-averse people could ask it to prefer low-uncertainty plans over high-uncertainty ones (the plans we typically choose for ORDER BY ... LIMIT queries being great examples of the latter). But it's a long way from wishing that to making it so. Right now it's not even clear (to me anyway) how we'd measure or model such uncertainty. Well, currently, selectivity estimates based on MCV should be pretty low-uncertainty, whereas certainty of other estimates could be modeled as a random variable if ANALYZE gathered a few statistical moments (for variables that are prone to that kind of statistical analysis). That alone could improve things considerably, and statistical info could be propagated along expressions to make it possible to model uncertainty in complex expressions as well. That is a sort of solution that I proposed yesterday on the mailing list. The solution essentially takes lots of samples of the data and then plots the mean and standard deviation of the independent samples to get the probability of the histogram selectivity estimate. The problem is multi faceted (outdated stats, bad guess at distribution of data, cases Merlin mentioned before (CASE statements, COALESCE statements etc.). Finding a general solution to this problem shall require a lot of research and time. I agree with Tom, we should focus on some of the main problems we have in that domain and then try to solve them first. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C csrajmo...@gmail.com wrote: I am implementing Planner hints in Postgresql to force the optimizer to select a particular plan for a query on request from sql input. I am having trouble in modifying the planner code. I want to create a path node of hint plan and make it the plan to be used by executor. How do I enforce this ? Should I create a new Plan for this ..how to create a plan node which can be then given directly to executor for a particular query? Planner hints have been discussed a lot before as well and AFAIK there is a wiki page that says why we shouldnt implement them. Have you referred to them? Please share if you have any new points on the same. Regards, Atri
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma wrote On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C lt; csrajmohan@ gt; wrote: I am implementing Planner hints in Postgresql to force the optimizer to select a particular plan for a query on request from sql input. I am having trouble in modifying the planner code. I want to create a path node of hint plan and make it the plan to be used by executor. How do I enforce this ? Should I create a new Plan for this ..how to create a plan node which can be then given directly to executor for a particular query? Planner hints have been discussed a lot before as well and AFAIK there is a wiki page that says why we shouldnt implement them. Have you referred to them? Please share if you have any new points on the same. Regards, Atri http://wiki.postgresql.org/wiki/Todo (I got to it via the FAQ link on the homepage and the Developer FAQ section there-in. You should make sure you've scanned that as well.) Note the final section titled: Features We Do Not Want Also, you need to consider what you are doing when you cross-post (a bad thing generally) -hackers and -novice. As there is, rightly IMO, no -novice-hackers list you should have probably just hit up -general. Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Planner-hints-in-Postgresql-tp5796347p5796353.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
David Johnston pol...@yahoo.com writes: Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Johnston pol...@yahoo.com writes: Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner assign a higher preference to it during plan evaluation? This shall allow us to still have a fair evaluation of all possible plans as we do right now and yet have a higher preference for the user given plan during evaluation? Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma atri.j...@gmail.com writes: On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? And forcing a plan to be used *isn't* that? Please re-read the older threads, since you evidently have not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
* Atri Sharma (atri.j...@gmail.com) wrote: Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner Uh, you're worried about the user given us a garbage selectivity, but they're going to get a full-blown plan perfect? assign a higher preference to it during plan evaluation? This shall allow us to still have a fair evaluation of all possible plans as we do right now and yet have a higher preference for the user given plan during evaluation? What exactly would such a preference look like? A cost modifier? We'd almost certainly have to make that into a GUC or a value passed in as part of the query, with a high likelihood of users figuring out how to use it to say use my plan forever and always.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma wrote On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane lt; tgl@.pa gt; wrote: David Johnston lt; polobo@ gt; writes: Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner assign a higher preference to it during plan evaluation? This shall allow us to still have a fair evaluation of all possible plans as we do right now and yet have a higher preference for the user given plan during evaluation? The larger question to answer first is whether we want to implement something that is deterministic... How about just dropping the whole concept of hinting and provide a way for someone to say use this plan, or die trying. Maybe require it be used in conjunction with named PREPAREd statements: PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS SELECT ...; Aside from whole-plan specification I can definitely see where join/where specification could be useful if it can overcome the current limitation of not being able to calculate inter-table estimations. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Planner-hints-in-Postgresql-tp5796347p5796378.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost sfr...@snowman.net wrote: * Atri Sharma (atri.j...@gmail.com) wrote: Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner Uh, you're worried about the user given us a garbage selectivity, but they're going to get a full-blown plan perfect? I never said that the user plan would be perfect. The entire point of planner hints is based on the assumption that the user knows more about the data than the planner does hence the user's ideas about the plan should be given a preference. Garbage selectivity can screw up the cost estimation of *all* our possible plans and we could end up preferring a sequential scan over an index only scan for e.g. I am trying to think of ways that give some preference to a user plan but do not interfere with the cost estimation of our other potential plans. What exactly would such a preference look like? A cost modifier? We'd almost certainly have to make that into a GUC or a value passed in as part of the query, with a high likelihood of users figuring out how to use it to say use my plan forever and always.. A factor that we experimentally determine by which we decrease the cost of the user specified plan so that it gets a higher preference in the plan evaluation. Of course, this is not a nice hack. Specifically after our discussion on IRC the other day, I am against planner hints, but if we are just discussing how it could be done, I could think of some ways which I listed. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
* Atri Sharma (atri.j...@gmail.com) wrote: Of course, this is not a nice hack. Specifically after our discussion on IRC the other day, I am against planner hints, but if we are just discussing how it could be done, I could think of some ways which I listed. There's lots of ways to implement planner hints, but I fail to see the point in discussing how to implement something we actively don't want. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Planner hints in Postgresql
The larger question to answer first is whether we want to implement something that is deterministic... How about just dropping the whole concept of hinting and provide a way for someone to say use this plan, or die trying. Maybe require it be used in conjunction with named PREPAREd statements: You mean taking away the entire concept of query planning and cost estimation? Thats like replacing the optimizer with DBA decision and I am not at all comfortable with that idea. That are only my thoughts though. PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS SELECT ...; Aside from whole-plan specification I can definitely see where join/where specification could be useful if it can overcome the current limitation of not being able to calculate inter-table estimations. Prepare plans use a generic plan for the execution. Replacing it with a totally user defined plan does not seem to be clean. The crux is that IMHO planner hints are a bad way of trying to circumvent the need for cross-column statistics. We should do cross-column statistics done and ignore planner hints completely. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
There's lots of ways to implement planner hints, but I fail to see the point in discussing how to implement something we actively don't want. +1. The original poster wanted a way to implement it as a personal project or something ( I think he only replied to me, not the entire list). Planner hints should be ignored :) Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 10:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Atri Sharma atri.j...@gmail.com writes: On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? And forcing a plan to be used *isn't* that? Please re-read the older threads, since you evidently have not. I never said that we force a plan to be used. I just said that we should increase the preference for a user given plan and not interfere in the cost estimation of the other potential plans and the evaluation of the final selected plan. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 11:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Johnston pol...@yahoo.com writes: Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Yeah -- the most common case I see is outlier culling where several repeated low non-deterministic selectivity quals stack reducing the row count estimate to 1. For example: SELECT * FROM foo WHERE length(bar) = 1000 AND length(bar) = 2; The user may have special knowledge that the above is very (or very un-) selective that is difficult or not cost effective to gather in the general case. IIRC in the archives (heh) there is a special workaround using indexes and some discussion regarding how a hypothetical feature involving user input selectivity estimates might look. I don't think that discussion is complete: the syntax for user input selectivity is an unsolved problem. There's a big difference between saying to the planner, Use plan X vs Here's some information describing the data supporting choosing plan X intelligently. The latter allows for better plans in the face of varied/changing data, integrates with the planner in natural way, and encourages users to understand how the planner works. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
* Merlin Moncure (mmonc...@gmail.com) wrote: Yeah -- the most common case I see is outlier culling where several repeated low non-deterministic selectivity quals stack reducing the row count estimate to 1. For example: SELECT * FROM foo WHERE length(bar) = 1000 AND length(bar) = 2; This is exactly the issue that I've seen also- where we end up picking a Nested Loop because we think only one row is going to be returned and instead we end up getting a bunch and it takes forever. There was also some speculation on trying to change plans mid-stream to address a situation like that, once we realize what's happening. Not sure that's really practical but it would be nice to find some solution. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Planner hints in Postgresql
There's a big difference between saying to the planner, Use plan X vs Here's some information describing the data supporting choosing plan X intelligently. The latter allows for better plans in the face of varied/changing data, integrates with the planner in natural way, and encourages users to understand how the planner works. +1 I was thinking of varying the 'weight' of a user defined plan by an fixed experimental factor to tell the planner to give higher/lower preference to this plan, but after your idea above, I think Stephen's point of introducing a GUC for the factor is the only way possible and I agree with him on the point that eventually the user will figure out a way to force usage of his plan using the GUC. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com wrote: There's a big difference between saying to the planner, Use plan X vs Here's some information describing the data supporting choosing plan X intelligently. The latter allows for better plans in the face of varied/changing data, integrates with the planner in natural way, and encourages users to understand how the planner works. +1 I was thinking of varying the 'weight' of a user defined plan by an fixed experimental factor to tell the planner to give higher/lower preference to this plan, but after your idea above, I think Stephen's point of introducing a GUC for the factor is the only way possible and I agree with him on the point that eventually the user will figure out a way to force usage of his plan using the GUC. GUC is not the answer beyond the broad brush mostly debugging level features they already support. What do you do if your plan simultaneously needs and does not need nestloops? A query plan is a complicated thing that is the result of detail analysis of the data. I bet there are less than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) = 1000 WITH SELECTIVITY 0.999 AND length(bar) = 2 WITH SELECTIVITY 0.999; Note, that's a trivial treatment of the syntax challenges. Ultimately it'd probably look different and/or be hooked in a different way (say, via the function call). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com wrote: There's a big difference between saying to the planner, Use plan X vs Here's some information describing the data supporting choosing plan X intelligently. The latter allows for better plans in the face of varied/changing data, integrates with the planner in natural way, and encourages users to understand how the planner works. +1 I was thinking of varying the 'weight' of a user defined plan by an fixed experimental factor to tell the planner to give higher/lower preference to this plan, but after your idea above, I think Stephen's point of introducing a GUC for the factor is the only way possible and I agree with him on the point that eventually the user will figure out a way to force usage of his plan using the GUC. GUC is not the answer beyond the broad brush mostly debugging level features they already support. What do you do if your plan simultaneously needs and does not need nestloops? A query plan is a complicated thing that is the result of detail analysis of the data. I bet there are less than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) = 1000 WITH SELECTIVITY 0.999 AND length(bar) = 2 WITH SELECTIVITY 0.999; Wont this have scaling issues and issues over time as the data in the table changes? Suppose I make a view with the above query. With time, as the data in the table changes, the selectivity values wont be good for planning. This may potentially lead to a lot of changes in the view definition and other places where this query was used. In general, I think I step back on my point that specifying the selectivity is a bad idea. Could this also work (for the time being) for cross-column statistics? Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
2014-03-17 19:35 GMT+01:00 Atri Sharma atri.j...@gmail.com: On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure mmonc...@gmail.comwrote: On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com wrote: There's a big difference between saying to the planner, Use plan X vs Here's some information describing the data supporting choosing plan X intelligently. The latter allows for better plans in the face of varied/changing data, integrates with the planner in natural way, and encourages users to understand how the planner works. +1 I was thinking of varying the 'weight' of a user defined plan by an fixed experimental factor to tell the planner to give higher/lower preference to this plan, but after your idea above, I think Stephen's point of introducing a GUC for the factor is the only way possible and I agree with him on the point that eventually the user will figure out a way to force usage of his plan using the GUC. GUC is not the answer beyond the broad brush mostly debugging level features they already support. What do you do if your plan simultaneously needs and does not need nestloops? A query plan is a complicated thing that is the result of detail analysis of the data. I bet there are less than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) = 1000 WITH SELECTIVITY 0.999 AND length(bar) = 2 WITH SELECTIVITY 0.999; Wont this have scaling issues and issues over time as the data in the table changes? Suppose I make a view with the above query. With time, as the data in the table changes, the selectivity values wont be good for planning. This may potentially lead to a lot of changes in the view definition and other places where this query was used. In general, I think I step back on my point that specifying the selectivity is a bad idea. Could this also work (for the time being) for cross-column statistics? It is another issue. I don't believe so SELECTIVITY can work well too. Slow queries are usually related to some strange points in data. I am thinking so well concept should be based on validity of estimations. Some plans are based on totally wrong estimation, but should be fast due less sensitivity to bad estimations. So well concept is penalization some risk plans - or use brute force - like COLUMN store engine does. Their plan is usually simply and tolerant to bad estimations. Pavel Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma atri.j...@gmail.com writes: Wont this have scaling issues and issues over time as the data in the table changes? It can't possibly have worse problems of that sort than explicitly specifying a plan does. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I don't believe so SELECTIVITY can work well too. Slow queries are usually related to some strange points in data. I am thinking so well concept should be based on validity of estimations. Some plans are based on totally wrong estimation, but should be fast due less sensitivity to bad estimations. So well concept is penalization some risk plans - or use brute force - like COLUMN store engine does. Their plan is usually simply and tolerant to bad estimations. Disagree. There is a special case of slow query where problem is not with the data but with the expression over the data; something in the query defeats sampled selectivity. Common culprits are: *) CASE expressions *) COALESCE *) casts *) simple tranformational expressions *) predicate string concatenation When using those expressions, you often end up with default selectivity assumptions and if they are way off -- watch out. Plan risk analysis solves a different problem: small changes in the data mean big changes in the execution runtime. It probably wouldn't even help cases where the server thinks there is one row and you actually have thousands or millions unless you want to implement a selectivity range with perhaps a risk coefficient. This was also suggested sometime back and was also met with some skepticism (but it'd be interesting to see!). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On 3/17/14, 12:58 PM, Stephen Frost wrote: * Merlin Moncure (mmonc...@gmail.com) wrote: Yeah -- the most common case I see is outlier culling where several repeated low non-deterministic selectivity quals stack reducing the row count estimate to 1. For example: SELECT * FROM foo WHERE length(bar) = 1000 AND length(bar) = 2; This is exactly the issue that I've seen also- where we end up picking a Nested Loop because we think only one row is going to be returned and instead we end up getting a bunch and it takes forever. FWIW, I've also seen problems with merge and hash joins at work, but I don't have any concrete examples handy. :( There was also some speculation on trying to change plans mid-stream to address a situation like that, once we realize what's happening. Not sure that's really practical but it would be nice to find some solution. Just being able to detect that something has possibly gone wrong would be useful. We could log that to alert the DBA/user of a potential bad plan. We could even format this in such a fashion that it's suitable for emailing the community with; the query, the plan, the stats, etc. That might make it easier for us to fix the planner (although at this point it seems like we're hitting statistics gathering problems that we simply don't know how to solve). There is another aspect of this though: plan stability. There are lots of cases where users couldn't care less about getting an optimal plan, but they care *greatly* about not getting a brain-dead plan. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby j...@nasby.net wrote: Just being able to detect that something has possibly gone wrong would be useful. We could log that to alert the DBA/user of a potential bad plan. We could even format this in such a fashion that it's suitable for emailing the community with; the query, the plan, the stats, etc. That might make it easier for us to fix the planner (although at this point it seems like we're hitting statistics gathering problems that we simply don't know how to solve). Again, that's not the case here. The problem is that the server is using hard wired assumptions (like, 10% selective) *instead* of statistics -- at least in the case discussed above. That being said, I think you're on to something: EXPLAIN ANALYZE rowcounts don't indicate if the row count was generated from data based assumptions or SWAGs. So maybe you could decorate the plan description with an indicator that suggests when default selectivity rules were hit. There is another aspect of this though: plan stability. There are lots of cases where users couldn't care less about getting an optimal plan, but they care *greatly* about not getting a brain-dead plan. Except for cases I noted above, I don't understand how you could flag 'sub-optimal' or 'brain-dead' plans. The server always picks the best plan it can. The trick is to (in a very simple and cpu-unintensive way) indicate when there isn't a lot of confidence in the plan -- but that's not the same thing. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 01:20:47PM -0500, Merlin Moncure wrote: A query plan is a complicated thing that is the result of detail analysis of the data. I bet there are less than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) = 1000 WITH SELECTIVITY 0.999 AND length(bar) = 2 WITH SELECTIVITY 0.999; A small issue with selectivity is that the selectivity is probably not what the users are expecting anyway, since many will related to conditional selectivities. PostgreSQL is pretty good at single column statistics, it just sometimes screws up on cross-column correlations. This ties in with alerting about a bad plan: if the EXPLAIN output could list for each condition what the actual selectivity was it might give user a way of understanding the problem. So the example given might lead to output like: clause selectivity estimated length(bar)20.50 0.50 length(bar)1000 | length(bar)2 0.50 0.25 The execution engine can only output conditional selectivities because of the order of execution. But this would at least give users a handle on the problem. Note that a first cut of the problem might simply be something like likely()/unlikely() as in gcc. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [HACKERS] Planner hints in Postgresql
On 3/17/14, 3:32 PM, Merlin Moncure wrote: On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby j...@nasby.net wrote: Just being able to detect that something has possibly gone wrong would be useful. We could log that to alert the DBA/user of a potential bad plan. We could even format this in such a fashion that it's suitable for emailing the community with; the query, the plan, the stats, etc. That might make it easier for us to fix the planner (although at this point it seems like we're hitting statistics gathering problems that we simply don't know how to solve). Again, that's not the case here. The problem is that the server is using hard wired assumptions (like, 10% selective) *instead* of statistics -- at least in the case discussed above. That being said, I think you're on to something: EXPLAIN ANALYZE rowcounts don't indicate if the row count was generated from data based assumptions or SWAGs. So maybe you could decorate the plan description with an indicator that suggests when default selectivity rules were hit. It occurs to me... it should be cheap for us to track actual rowcounts compared to the estimate... perhaps it's worth doing that and flagging plans when the estimates are off by more than X percent. Though... I suspect that will just tell us what we already know. :( There is another aspect of this though: plan stability. There are lots of cases where users couldn't care less about getting an optimal plan, but they care *greatly* about not getting a brain-dead plan. Except for cases I noted above, I don't understand how you could flag 'sub-optimal' or 'brain-dead' plans. The server always picks the best plan it can. The trick is to (in a very simple and cpu-unintensive way) indicate when there isn't a lot of confidence in the plan -- but that's not the same thing. Keep in mind that the use case here is critical queries that MUST perform fast enough. They do NOT need to be optimal, but they definitely can not degrade into something stupid. It's often way better to have a query that's 50-100% slower than optimal as opposed to one that suddenly becomes 100+% slower than it normally is (regardless of if normal is optimal or not). You could possibly do an anti-hint: Never use this plan, because we know it sucks. Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. Another option would be tracking normal execution times (and the plans used) for queries (not completely unreasonable now with pg_stat_statements); if we try to run a query and it takes noticeably longer than normal and it's a different plan then assume the plan is bad, mark it as such, and try again with one of the known good plans. Worst case would be allowing a means to store an acceptable plan and force/strongly suggest that the planner use it. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote: Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. That's precisely what risk estimation was about. Something like SELECT * FROM wherever WHEN id something LIMIT COST 1; Would forbid a sequential scan *if* the table is big enough to suspect the plan might take that much, or a nested loop *if* the planner cannot *prove* it will be faster than that. I don't believe the limit unit is obscure at all (page fetches being a nice measuring stick), but what is, is what do you do when no plan fits the limits.
Re: [HACKERS] Planner hints in Postgresql
On 3/17/14, 2:16 PM, Merlin Moncure wrote: On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehulepavel.steh...@gmail.com wrote: I don't believe so SELECTIVITY can work well too. Slow queries are usually related to some strange points in data. I am thinking so well concept should be based on validity of estimations. Some plans are based on totally wrong estimation, but should be fast due less sensitivity to bad estimations. So well concept is penalization some risk plans - or use brute force - like COLUMN store engine does. Their plan is usually simply and tolerant to bad estimations. Disagree. There is a special case of slow query where problem is not with the data but with the expression over the data; something in the query defeats sampled selectivity. Common culprits are: *) CASE expressions *) COALESCE *) casts *) simple tranformational expressions *) predicate string concatenation *) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) = '2014-1-1' Though, in this case it's probably much better to teach the parser how to turn that into a range expression. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 7:09 PM, Jim Nasby j...@nasby.net wrote: On 3/17/14, 2:16 PM, Merlin Moncure wrote: On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehulepavel.steh...@gmail.com wrote: I don't believe so SELECTIVITY can work well too. Slow queries are usually related to some strange points in data. I am thinking so well concept should be based on validity of estimations. Some plans are based on totally wrong estimation, but should be fast due less sensitivity to bad estimations. So well concept is penalization some risk plans - or use brute force - like COLUMN store engine does. Their plan is usually simply and tolerant to bad estimations. Disagree. There is a special case of slow query where problem is not with the data but with the expression over the data; something in the query defeats sampled selectivity. Common culprits are: *) CASE expressions *) COALESCE *) casts *) simple tranformational expressions *) predicate string concatenation *) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) = '2014-1-1' Though, in this case it's probably much better to teach the parser how to turn that into a range expression. Maybe, maybe not. An index over the truncated time can potentially be much more efficient.
Re: [HACKERS] Planner hints in Postgresql
On 3/17/14, 5:07 PM, Claudio Freire wrote: On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net mailto:j...@nasby.net wrote: Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. That's precisely what risk estimation was about. Something like SELECT * FROM wherever WHEN id something LIMIT COST 1; Would forbid a sequential scan *if* the table is big enough to suspect the plan might take that much, or a nested loop *if* the planner cannot *prove* it will be faster than that. I don't believe the limit unit is obscure at all (page fetches being a nice measuring stick), but what is, is what do you do when no plan fits the limits. I don't think that's the same thing... what you're describing is a way to not begin a query if a low-enough cost plan can't be found. What I'm talking about is when the planner picks one low-cost plan over another and it turns out the estimate of the one that was picked was WAY off. I've actually seen cases where plan estimates that were off by just 100 units produce wildly different results. In that scenario, LIMIT COST won't help at all. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On 3/17/14, 5:12 PM, Claudio Freire wrote: On Mon, Mar 17, 2014 at 7:09 PM, Jim Nasby j...@nasby.net mailto:j...@nasby.net wrote: On 3/17/14, 2:16 PM, Merlin Moncure wrote: On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehulepavel.stehule@gmail.__com mailto:pavel.steh...@gmail.com wrote: I don't believe so SELECTIVITY can work well too. Slow queries are usually related to some strange points in data. I am thinking so well concept should be based on validity of estimations. Some plans are based on totally wrong estimation, but should be fast due less sensitivity to bad estimations. So well concept is penalization some risk plans - or use brute force - like COLUMN store engine does. Their plan is usually simply and tolerant to bad estimations. Disagree. There is a special case of slow query where problem is not with the data but with the expression over the data; something in the query defeats sampled selectivity. Common culprits are: *) CASE expressions *) COALESCE *) casts *) simple tranformational expressions *) predicate string concatenation *) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) = '2014-1-1' Though, in this case it's probably much better to teach the parser how to turn that into a range expression. Maybe, maybe not. An index over the truncated time can potentially be much more efficient. More efficient than a range index? Maybe, but I'm doubtful. Even if that's true, in a warehouse you're going to want to limit by weeks, months, quarters, years, etc. So now you're stuck building tons of special indexes. (Granted, most warehouses build a separate date dimension because of these kinds of problems... I'm hoping that we could do something better.) -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On 03/17/2014 01:57 PM, Martijn van Oosterhout wrote: On Mon, Mar 17, 2014 at 01:20:47PM -0500, Merlin Moncure wrote: A query plan is a complicated thing that is the result of detail analysis of the data. I bet there are less than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) = 1000 WITH SELECTIVITY 0.999 AND length(bar) = 2 WITH SELECTIVITY 0.999; So, if we're going to support query decorators, we might as well go all the way and just offer Oracle-style use this index. Speaking as someone who is often called on to fix performance issues in other people's databases, I find major issues with query decorators: 1. they are impossible to maintain since they're scattered all over the application code. 2. they eventually become a barrier to upgrading, once the performance of the DB engine changes in a way that makes older query decorators crippling and/or erroneous. Because they are scattered all around the code, it then becomes a major refactoring effort to fix them. 3. There's no obvious way to collect cumulative query hints in order to supply data for database-level tuning, or for improving the postgresql query planner. 4. There's no obvious way to use query decorators with ORMs, making them useless to 95% of our users. 5. Application developers will add them without telling the DBAs, and vice-versa. Hilarity ensues. Given that, I would strongly prefer a different mechanism for ad-hoc query plan adjustment. Possible other mechanisms would include: a) ability to set selectivity for database objects, possibly including per-column selectivity (i.e. selectivity for columns a b f is 0.01), and save it. b) ability to execute a query using a presupplied plan. This would include ability to edit the plan using some intermediate format, like JSON or XML. c) ability to finely adjust costs of specific query operations (e.g. bitmapscan_cost = 0.02), possibly for specific database objects. d) ability to save selectivity estimates for specific expressions (e.g. selectivity on log15 ( session_id =, log_time BETWEEN ) = 0.03. Personally, (b) is my favorite version of this feature, becuase it allows me to test the query executor itself, and it raises the bar for clobbering the query planner to people who are willing to spend a little time on it. It would also make a fantastic learning tool for learning about database planning and optimization, and might open the door for more people hacking on our planner. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On 03/17/2014 06:28 PM, Stephen Frost wrote: a value passed in as part of the query, with a high likelihood of users figuring out how to use it to say use my plan forever and always.. Last time this came up, I said on irc that if we ever do implement hints, I'd like them to be tied to a major version somehow so that later versions can issue warnings that the planner might have become smarter about long-forgotten queries and they need to be re-tested to see if the hint is still warrented. -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On 03/17/2014 11:15 PM, Josh Berkus wrote: 2. they eventually become a barrier to upgrading, once the performance of the DB engine changes in a way that makes older query decorators crippling and/or erroneous. Because they are scattered all around the code, it then becomes a major refactoring effort to fix them. My idea of associating them with major versions and issuing warnings would help with this. 5. Application developers will add them without telling the DBAs, and vice-versa. Hilarity ensues. And this would be unmasked at next major upgrade when the logs fill up. -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 7:12 PM, Jim Nasby j...@nasby.net wrote: On 3/17/14, 5:07 PM, Claudio Freire wrote: On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net mailto: j...@nasby.net wrote: Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. That's precisely what risk estimation was about. Something like SELECT * FROM wherever WHEN id something LIMIT COST 1; Would forbid a sequential scan *if* the table is big enough to suspect the plan might take that much, or a nested loop *if* the planner cannot *prove* it will be faster than that. I don't believe the limit unit is obscure at all (page fetches being a nice measuring stick), but what is, is what do you do when no plan fits the limits. I don't think that's the same thing... what you're describing is a way to not begin a query if a low-enough cost plan can't be found. What I'm talking about is when the planner picks one low-cost plan over another and it turns out the estimate of the one that was picked was WAY off. I've actually seen cases where plan estimates that were off by just 100 units produce wildly different results. In that scenario, LIMIT COST won't help at all. The case you describe is different. It's when a plan *effectively* is more expensive than estimated, but the planner could not estimate it. That's what was mentioned about switching plans mid-way through them, which is IMO quite ill-defined as such (it needs a lot of love in order to get a workable spec out of that).
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 5:15 PM, Josh Berkus j...@agliodbs.com wrote: So, if we're going to support query decorators, we might as well go all the way and just offer Oracle-style use this index. Speaking as someone who is often called on to fix performance issues in other people's databases, I find major issues with query decorators: Supplying selectivity estimates in places where the database can't or wont do them properly itself is a completely different thing from Oracle style hints. For example, they are much more 'future proof' -- both to schema changes and postgres enhancments -- in the absolute case the database can peek into your expression in some future version and generate a better estimate than you can. Aside from that, you're in no way locked out of future innovations. This could be done in a relatively clean way: for example, by putting complex quals in an inlineable function that is decorated with analog of COST/ROWS clause -- perhaps allowing for a user defined expression to base selectivity from the input arguments. OTOH, Hints disable the planner and they are much more complex, particularly for large queries since you have to consider how components of the plan relate to one another. Force feeding a plan to the database is the wrong way to go. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
Claudio Freire klaussfre...@gmail.com writes: On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote: Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong. That's precisely what risk estimation was about. Yeah. I would like to see the planner's cost estimates extended to include some sort of uncertainty estimate, whereupon risk-averse people could ask it to prefer low-uncertainty plans over high-uncertainty ones (the plans we typically choose for ORDER BY ... LIMIT queries being great examples of the latter). But it's a long way from wishing that to making it so. Right now it's not even clear (to me anyway) how we'd measure or model such uncertainty. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
On Tue, Mar 18, 2014 at 12:46 AM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I don't believe so SELECTIVITY can work well too. Slow queries are usually related to some strange points in data. I am thinking so well concept should be based on validity of estimations. Some plans are based on totally wrong estimation, but should be fast due less sensitivity to bad estimations. So well concept is penalization some risk plans - or use brute force - like COLUMN store engine does. Their plan is usually simply and tolerant to bad estimations. Disagree. There is a special case of slow query where problem is not with the data but with the expression over the data; something in the query defeats sampled selectivity. Common culprits are: *) CASE expressions *) COALESCE *) casts *) simple tranformational expressions *) predicate string concatenation When using those expressions, you often end up with default selectivity assumptions and if they are way off -- watch out. Plan risk analysis solves a different problem: small changes in the data mean big changes in the execution runtime. It probably wouldn't even help cases where the server thinks there is one row and you actually have thousands or millions unless you want to implement a selectivity range with perhaps a risk coefficient. This was also suggested sometime back and was also met with some skepticism (but it'd be interesting to see!). Another case is with prepared statements, when things like array size are not know to the planner and the planner makes a hard coded estimate for it, leading to selection of a customized plan which is worse than the generic plan. This would be even more useful for prepared statements since they need some support from the user in terms of the selectivity and the user should be allowed to tell more about the data, since he already has given us some indications about the type of query plans he requires using prepared statements. Regards, Atri -- Regards, Atri *l'apprenant*