Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Jeff Janes
On Monday, March 17, 2014, Atri Sharma atri.j...@gmail.com wrote:




 On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost 
 sfr...@snowman.netjavascript:_e(%7B%7D,'cvml','sfr...@snowman.net');
  wrote:

 * Atri Sharma 
 (atri.j...@gmail.comjavascript:_e(%7B%7D,'cvml','atri.j...@gmail.com');)
 wrote:
  Isnt using a user given value for selectivity a pretty risky situation
 as
  it can horribly screw up the plan selection?
 
  Why not allow the user to specify an alternate plan and have the planner

 Uh, you're worried about the user given us a garbage selectivity, but
 they're going to get a full-blown plan perfect?



 I never said that the user plan would be perfect. The entire point of
 planner hints is based on the assumption that the user knows more about the
 data than the planner does hence the user's ideas about the plan should be
 given a preference. Garbage selectivity can screw up  the cost estimation
 of *all* our possible plans and we could end up preferring a sequential
 scan over an index only scan for e.g. I am trying to think of ways that
 give some preference to a user plan but do not interfere with the cost
 estimation of our other potential plans.


I'm not opposed to planner hints (or plan mandates), but also not
optimistic they will ever get implemented, much less accepted.  But if they
were, I don't see a use for such fudge factors.  By mandating a plan, I am
already asserting I know more than the optimizer does.  Maybe I am right,
maybe I am wrong, but either way I have taken it out of the optimizer's
hands and would not welcome it snatching control back.

If it is too deranged for me to trust, why would it not become somewhat
more deranged and so decide to ignore my hints?  The only setting for such
a factor I would ever see myself using was the minimum, or the maximum.

The feature I would like in such hints, if they are to exist, is to set a
version to which they apply.  Often a fix is made very quickly after the
problem is pointed out, but it could take well over a year for the fix to
see production if it is not backpatched and it lands at the wrong part of
the release cycle.

Cheers,

Jeff




Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Jeff Janes
On Monday, March 17, 2014, Tom Lane t...@sss.pgh.pa.us wrote:

 Claudio Freire klaussfre...@gmail.com javascript:; writes:
  On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net javascript:;
 wrote:
  Even better would be if the planner could estimate how bad a plan will
  become if we made assumptions that turn out to be wrong.

  That's precisely what risk estimation was about.

 Yeah.  I would like to see the planner's cost estimates extended to
 include some sort of uncertainty estimate, whereupon risk-averse people
 could ask it to prefer low-uncertainty plans over high-uncertainty ones
 (the plans we typically choose for ORDER BY ... LIMIT queries being great
 examples of the latter).  But it's a long way from wishing that to making
 it so.  Right now it's not even clear (to me anyway) how we'd measure or
 model such uncertainty.


Most of the cases where I've run into horrible estimates, it seemed like
the same level of knowledge/reasoning that could allow us to know it was
risky, would allow us to just do a better job in the first place.

The exception I can think of is in an antijoin between two huge rels.  It
is like subtracting two large measurements to get a much smaller result.
 We should know the uncertainty will be large.

Cheers,

Jeff


Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Claudio Freire
On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Claudio Freire klaussfre...@gmail.com writes:
  On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote:
  Even better would be if the planner could estimate how bad a plan will
  become if we made assumptions that turn out to be wrong.

  That's precisely what risk estimation was about.

 Yeah.  I would like to see the planner's cost estimates extended to
 include some sort of uncertainty estimate, whereupon risk-averse people
 could ask it to prefer low-uncertainty plans over high-uncertainty ones
 (the plans we typically choose for ORDER BY ... LIMIT queries being great
 examples of the latter).  But it's a long way from wishing that to making
 it so.  Right now it's not even clear (to me anyway) how we'd measure or
 model such uncertainty.


Well, currently, selectivity estimates based on MCV should be pretty
low-uncertainty, whereas certainty of other estimates could be modeled as a
random variable if ANALYZE gathered a few statistical moments (for
variables that are prone to that kind of statistical analysis).

That alone could improve things considerably, and statistical info could be
propagated along expressions to make it possible to model uncertainty in
complex expressions as well.


Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Merlin Moncure
On Tue, Mar 18, 2014 at 11:53 AM, Claudio Freire klaussfre...@gmail.com wrote:

 On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Claudio Freire klaussfre...@gmail.com writes:
  On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote:
  Even better would be if the planner could estimate how bad a plan will
  become if we made assumptions that turn out to be wrong.

  That's precisely what risk estimation was about.

 Yeah.  I would like to see the planner's cost estimates extended to
 include some sort of uncertainty estimate, whereupon risk-averse people
 could ask it to prefer low-uncertainty plans over high-uncertainty ones
 (the plans we typically choose for ORDER BY ... LIMIT queries being great
 examples of the latter).  But it's a long way from wishing that to making
 it so.  Right now it's not even clear (to me anyway) how we'd measure or
 model such uncertainty.

 Well, currently, selectivity estimates based on MCV should be pretty
 low-uncertainty, whereas certainty of other estimates could be modeled as a
 random variable if ANALYZE gathered a few statistical moments (for variables
 that are prone to that kind of statistical analysis).

Sure, plus as noted you have cases where the planer makes SWAGs.  Each
of those SWAGs can introduce say (in the worst case) an order of
magnitude of error in the row count estimate.

 That alone could improve things considerably, and statistical info could be
 propagated along expressions to make it possible to model uncertainty in
 complex expressions as well.

But how would that work?  I see no solution adumbrated there :-).
Let's say you change the rowcount estimate to low/bestguess/high *and*
you only engage extra searches when there is enough disparity between
those values you still get exponentially more searches.  (my thinking
is that if bestguess estimated execution time is some user definable
amount faster then low/high at any node), a more skeptical plan is
introduced.   All that could end up being pessimal to the general case
though.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Claudio Freire
On Tue, Mar 18, 2014 at 4:48 PM, Merlin Moncure mmonc...@gmail.com wrote:

  That alone could improve things considerably, and statistical info could
 be
  propagated along expressions to make it possible to model uncertainty in
  complex expressions as well.

 But how would that work?  I see no solution adumbrated there :-).


I would have to tipify the SQL expression grammar for this, but I don't
think it would be impossible. Most non-function expression nodes seem
rather trivial. Even CASE, as long as you have a distribution for the
conditional, you can derive a distribution for the whole. User defined
functions would be another game, though. Correlation would have to be
measured, and that can be troublesome and a weak spot of risk computation
as much as it is of planning, but it could be fuzzed arbitrarily until
properly computed - after all, dependency on correlation or non-correlation
is a known source of risk, and accounting for it in any way is better than
not.


 Let's say you change the rowcount estimate to low/bestguess/high *and*
 you only engage extra searches when there is enough disparity between
 those values you still get exponentially more searches.


I was under the impression the planner already did an exhaustive search for
some queries. So it's just a matter of picking the best plan among those
(ie: estimating cost). The case of GEQO isn't any different, except perhaps
introducing a risk-decreasing transformation would be needed, unless I'm
missing something.


  (my thinking
 is that if bestguess estimated execution time is some user definable
 amount faster then low/high at any node), a more skeptical plan is
 introduced.   All that could end up being pessimal to the general case
 though.


I think the cost estimate would be replaced by a distribution (simplified
perhaps into an array of moments, or whatever is easily manipulated in the
face of complex expressions). What the user would pick, is a sampling
method of said distribution. Then, plans get measured by the user's stick
(say: arithmetic mean, median, 90th percentile, etc). The arithmetic mean
would I guess be the default, and that ought to be roughly equivalent to
the planner's current behavior.


Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Atri Sharma

  That's precisely what risk estimation was about.

 Yeah.  I would like to see the planner's cost estimates extended to
 include some sort of uncertainty estimate, whereupon risk-averse people
 could ask it to prefer low-uncertainty plans over high-uncertainty ones
 (the plans we typically choose for ORDER BY ... LIMIT queries being great
 examples of the latter).  But it's a long way from wishing that to making
 it so.  Right now it's not even clear (to me anyway) how we'd measure or
 model such uncertainty.


 Well, currently, selectivity estimates based on MCV should be pretty
 low-uncertainty, whereas certainty of other estimates could be modeled as a
 random variable if ANALYZE gathered a few statistical moments (for
 variables that are prone to that kind of statistical analysis).

 That alone could improve things considerably, and statistical info could
 be propagated along expressions to make it possible to model uncertainty in
 complex expressions as well.



That is a sort of solution that I proposed yesterday on the mailing list.
The solution essentially takes lots of samples of the data and then plots
the mean and standard deviation of the independent samples to get the
probability of the histogram selectivity estimate.


The problem is multi faceted (outdated stats, bad guess at distribution of
data, cases Merlin mentioned before (CASE statements, COALESCE statements
etc.). Finding a general solution to this problem shall require a lot of
research and time.

I agree with Tom, we should focus on some of the main problems we have in
that domain and then try to solve them first.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C csrajmo...@gmail.com wrote:

 I am implementing Planner hints in Postgresql to force the optimizer to
 select a particular plan for a query on request from sql input. I am having
 trouble in modifying the planner code. I want to create a path node of hint
 plan and make it the plan to be used by executor. How do I enforce this ?
 Should I create a new Plan for this ..how to create a plan node which can
 be then given directly to executor for a particular query?




Planner hints have been discussed a lot before as well and AFAIK there is a
wiki page that says why we shouldnt implement them. Have you referred to
them?

Please share if you have any new points on the same.

Regards,

Atri


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread David Johnston
Atri Sharma wrote
 On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C lt;

 csrajmohan@

 gt; wrote:
 
 I am implementing Planner hints in Postgresql to force the optimizer to
 select a particular plan for a query on request from sql input. I am
 having
 trouble in modifying the planner code. I want to create a path node of
 hint
 plan and make it the plan to be used by executor. How do I enforce this ?
 Should I create a new Plan for this ..how to create a plan node which can
 be then given directly to executor for a particular query?

 
 Planner hints have been discussed a lot before as well and AFAIK there is
 a
 wiki page that says why we shouldnt implement them. Have you referred to
 them?
 
 Please share if you have any new points on the same.
 
 Regards,
 
 Atri

http://wiki.postgresql.org/wiki/Todo

(I got to it via the FAQ link on the homepage and the Developer FAQ
section there-in.  You should make sure you've scanned that as well.)

Note the final section titled: Features We Do Not Want

Also, you need to consider what you are doing when you cross-post (a bad
thing generally) -hackers and -novice.  As there is, rightly IMO, no
-novice-hackers list you should have probably just hit up -general.

Need to discuss the general why before any meaningful help on the how is
going to be considered by hackers.

David J.








--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Planner-hints-in-Postgresql-tp5796347p5796353.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Tom Lane
David Johnston pol...@yahoo.com writes:
 Need to discuss the general why before any meaningful help on the how is
 going to be considered by hackers.

Possibly worth noting is that in past discussions, we've concluded that
the most sensible type of hint would not be use this plan at all, but
here's what to assume about the selectivity of this WHERE clause.
That seems considerably less likely to break than any attempt to directly
specify plan details.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David Johnston pol...@yahoo.com writes:
  Need to discuss the general why before any meaningful help on the
 how is
  going to be considered by hackers.

 Possibly worth noting is that in past discussions, we've concluded that
 the most sensible type of hint would not be use this plan at all, but
 here's what to assume about the selectivity of this WHERE clause.
 That seems considerably less likely to break than any attempt to directly
 specify plan details.


Isnt using a user given value for selectivity a pretty risky situation as
it can horribly screw up the plan selection?

Why not allow the user to specify an alternate plan and have the planner
assign a higher preference to it during plan evaluation? This shall allow
us to still have a fair evaluation of all possible plans as we do right now
and yet have a higher preference for the user given plan during evaluation?

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Possibly worth noting is that in past discussions, we've concluded that
 the most sensible type of hint would not be use this plan at all, but
 here's what to assume about the selectivity of this WHERE clause.
 That seems considerably less likely to break than any attempt to directly
 specify plan details.

 Isnt using a user given value for selectivity a pretty risky situation as
 it can horribly screw up the plan selection?

And forcing a plan to be used *isn't* that?  Please re-read the older
threads, since you evidently have not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Stephen Frost
* Atri Sharma (atri.j...@gmail.com) wrote:
 Isnt using a user given value for selectivity a pretty risky situation as
 it can horribly screw up the plan selection?
 
 Why not allow the user to specify an alternate plan and have the planner

Uh, you're worried about the user given us a garbage selectivity, but
they're going to get a full-blown plan perfect?

 assign a higher preference to it during plan evaluation? This shall allow
 us to still have a fair evaluation of all possible plans as we do right now
 and yet have a higher preference for the user given plan during evaluation?

What exactly would such a preference look like?  A cost modifier?
We'd almost certainly have to make that into a GUC or a value passed in
as part of the query, with a high likelihood of users figuring out how
to use it to say use my plan forever and always..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread David Johnston
Atri Sharma wrote
 On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane lt;

 tgl@.pa

 gt; wrote:
 
 David Johnston lt;

 polobo@

 gt; writes:
  Need to discuss the general why before any meaningful help on the
 how is
  going to be considered by hackers.

 Possibly worth noting is that in past discussions, we've concluded that
 the most sensible type of hint would not be use this plan at all, but
 here's what to assume about the selectivity of this WHERE clause.
 That seems considerably less likely to break than any attempt to directly
 specify plan details.


 Isnt using a user given value for selectivity a pretty risky situation as
 it can horribly screw up the plan selection?
 
 Why not allow the user to specify an alternate plan and have the planner
 assign a higher preference to it during plan evaluation? This shall allow
 us to still have a fair evaluation of all possible plans as we do right
 now
 and yet have a higher preference for the user given plan during
 evaluation?

The larger question to answer first is whether we want to implement
something that is deterministic...

How about just dropping the whole concept of hinting and provide a way for
someone to say use this plan, or die trying.  Maybe require it be used in
conjunction with named PREPAREd statements:

PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS
SELECT ...;

Aside from whole-plan specification I can definitely see where join/where
specification could be useful if it can overcome the current limitation of
not being able to calculate inter-table estimations.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Planner-hints-in-Postgresql-tp5796347p5796378.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  Isnt using a user given value for selectivity a pretty risky situation as
  it can horribly screw up the plan selection?
 
  Why not allow the user to specify an alternate plan and have the planner

 Uh, you're worried about the user given us a garbage selectivity, but
 they're going to get a full-blown plan perfect?



I never said that the user plan would be perfect. The entire point of
planner hints is based on the assumption that the user knows more about the
data than the planner does hence the user's ideas about the plan should be
given a preference. Garbage selectivity can screw up  the cost estimation
of *all* our possible plans and we could end up preferring a sequential
scan over an index only scan for e.g. I am trying to think of ways that
give some preference to a user plan but do not interfere with the cost
estimation of our other potential plans.



 What exactly would such a preference look like?  A cost modifier?
 We'd almost certainly have to make that into a GUC or a value passed in
 as part of the query, with a high likelihood of users figuring out how
 to use it to say use my plan forever and always..


A factor that we experimentally determine by which we decrease the cost of
the user specified plan so that it gets a higher preference in the plan
evaluation.

Of course, this is not a nice hack. Specifically after our discussion on
IRC the other day, I am against planner hints, but if we are just
discussing how it could be done, I  could think of some ways which I listed.

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Stephen Frost
* Atri Sharma (atri.j...@gmail.com) wrote:
 Of course, this is not a nice hack. Specifically after our discussion on
 IRC the other day, I am against planner hints, but if we are just
 discussing how it could be done, I  could think of some ways which I listed.

There's lots of ways to implement planner hints, but I fail to see the
point in discussing how to implement something we actively don't want.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
The larger question to answer first is whether we want to implement

 something that is deterministic...

 How about just dropping the whole concept of hinting and provide a way
 for
 someone to say use this plan, or die trying.  Maybe require it be used in
 conjunction with named PREPAREd statements:



You mean taking away the entire concept of query planning and cost
estimation? Thats like replacing the optimizer with DBA decision and I am
not at all comfortable with that idea. That are only my thoughts though.




 PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS
 SELECT ...;

 Aside from whole-plan specification I can definitely see where join/where
 specification could be useful if it can overcome the current limitation of
 not being able to calculate inter-table estimations.



Prepare plans use a generic plan for the execution. Replacing it with a
totally user defined plan does not seem to be clean.

The crux is that IMHO planner hints are a bad way of trying to circumvent
the need for cross-column statistics. We should do cross-column statistics
done and ignore planner hints completely.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
 There's lots of ways to implement planner hints, but I fail to see the
 point in discussing how to implement something we actively don't want.



+1. The original poster wanted a way to implement it as a personal project
or something ( I think he only replied to me, not the entire list).

Planner hints should be ignored :)

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 10:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Possibly worth noting is that in past discussions, we've concluded that
  the most sensible type of hint would not be use this plan at all, but
  here's what to assume about the selectivity of this WHERE clause.
  That seems considerably less likely to break than any attempt to
 directly
  specify plan details.

  Isnt using a user given value for selectivity a pretty risky situation as
  it can horribly screw up the plan selection?

 And forcing a plan to be used *isn't* that?  Please re-read the older
 threads, since you evidently have not.


I never said that we force a plan to be used. I just said that we should
increase the preference for a user given plan and not interfere in the cost
estimation of the other potential plans and the evaluation of the final
selected plan.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Merlin Moncure
On Mon, Mar 17, 2014 at 11:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 David Johnston pol...@yahoo.com writes:
 Need to discuss the general why before any meaningful help on the how is
 going to be considered by hackers.

 Possibly worth noting is that in past discussions, we've concluded that
 the most sensible type of hint would not be use this plan at all, but
 here's what to assume about the selectivity of this WHERE clause.
 That seems considerably less likely to break than any attempt to directly
 specify plan details.

Yeah -- the most common case I see is outlier culling where several
repeated low non-deterministic selectivity quals stack reducing the
row count estimate to 1.  For example:
SELECT * FROM foo WHERE length(bar) = 1000 AND length(bar) = 2;

The user may have special knowledge that the above is very (or very
un-) selective that is difficult or not cost effective to gather in
the general case.  IIRC in the archives (heh) there is a special
workaround using indexes and some discussion regarding how a
hypothetical feature involving user input selectivity estimates might
look.  I don't think that discussion is complete: the syntax for user
input selectivity is an unsolved problem.

There's a big difference between saying to the planner, Use plan X
vs Here's some information describing the data supporting choosing
plan X intelligently.  The latter allows for better plans in the face
of varied/changing data, integrates with the planner in natural way,
and encourages users to understand how the planner works.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Stephen Frost
* Merlin Moncure (mmonc...@gmail.com) wrote:
 Yeah -- the most common case I see is outlier culling where several
 repeated low non-deterministic selectivity quals stack reducing the
 row count estimate to 1.  For example:
 SELECT * FROM foo WHERE length(bar) = 1000 AND length(bar) = 2;

This is exactly the issue that I've seen also- where we end up picking a
Nested Loop because we think only one row is going to be returned and
instead we end up getting a bunch and it takes forever.

There was also some speculation on trying to change plans mid-stream to
address a situation like that, once we realize what's happening.  Not
sure that's really practical but it would be nice to find some solution.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
There's a big difference between saying to the planner, Use plan X
 vs Here's some information describing the data supporting choosing
 plan X intelligently.  The latter allows for better plans in the face
 of varied/changing data, integrates with the planner in natural way,
 and encourages users to understand how the planner works.



+1

I was thinking of varying the 'weight' of a user defined plan by an fixed
experimental factor to tell the planner to give higher/lower preference to
this plan, but after your idea above, I think Stephen's point of
introducing a GUC for the factor is the only way possible and I agree with
him on the point that eventually the user will figure out a way to force
usage of his plan using the GUC.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Merlin Moncure
On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com wrote:

 There's a big difference between saying to the planner, Use plan X
 vs Here's some information describing the data supporting choosing
 plan X intelligently.  The latter allows for better plans in the face
 of varied/changing data, integrates with the planner in natural way,
 and encourages users to understand how the planner works.

 +1

 I was thinking of varying the 'weight' of a user defined plan by an fixed
 experimental factor to tell the planner to give higher/lower preference to
 this plan, but after your idea above, I think Stephen's point of introducing
 a GUC for the factor is the only way possible and I agree with him on the
 point that eventually the user will figure out a way to force usage of his
 plan using the GUC.

GUC is not the answer beyond the broad brush mostly debugging level
features they already support.   What do you do if your plan
simultaneously needs and does not need nestloops?

A query plan is a complicated thing that is the result of detail
analysis of the data.  I bet there are less than 100 users on the
planet with the architectural knowledge of the planner to submit a
'plan'.  What users do have is knowledge of the data that the database
can't effectively gather for some reason.  Looking at my query above,
what it would need (assuming the planner could not be made to look
through length()) would be something like:

SELECT * FROM foo WHERE
  length(bar) = 1000 WITH SELECTIVITY 0.999
  AND length(bar) = 2 WITH SELECTIVITY 0.999;

Note, that's a trivial treatment of the syntax challenges.  Ultimately
it'd probably look different and/or be hooked in a different way (say,
via the function call).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com wrote:
 
  There's a big difference between saying to the planner, Use plan X
  vs Here's some information describing the data supporting choosing
  plan X intelligently.  The latter allows for better plans in the face
  of varied/changing data, integrates with the planner in natural way,
  and encourages users to understand how the planner works.
 
  +1
 
  I was thinking of varying the 'weight' of a user defined plan by an fixed
  experimental factor to tell the planner to give higher/lower preference
 to
  this plan, but after your idea above, I think Stephen's point of
 introducing
  a GUC for the factor is the only way possible and I agree with him on the
  point that eventually the user will figure out a way to force usage of
 his
  plan using the GUC.

 GUC is not the answer beyond the broad brush mostly debugging level
 features they already support.   What do you do if your plan
 simultaneously needs and does not need nestloops?

 A query plan is a complicated thing that is the result of detail
 analysis of the data.  I bet there are less than 100 users on the
 planet with the architectural knowledge of the planner to submit a
 'plan'.  What users do have is knowledge of the data that the database
 can't effectively gather for some reason.  Looking at my query above,
 what it would need (assuming the planner could not be made to look
 through length()) would be something like:

 SELECT * FROM foo WHERE
   length(bar) = 1000 WITH SELECTIVITY 0.999
   AND length(bar) = 2 WITH SELECTIVITY 0.999;



Wont this have scaling issues and  issues over time as the data in the
table changes?

Suppose I make a view with the above query. With time, as the data in the
table changes, the selectivity values wont be good for planning. This may
potentially lead to a lot of changes in the view definition and other
places where this query was used.



In general, I think I step back on my point that specifying the selectivity
is a bad idea.

Could this also work (for the time being) for cross-column statistics?

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Pavel Stehule
2014-03-17 19:35 GMT+01:00 Atri Sharma atri.j...@gmail.com:




 On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure mmonc...@gmail.comwrote:

 On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com
 wrote:
 
  There's a big difference between saying to the planner, Use plan X
  vs Here's some information describing the data supporting choosing
  plan X intelligently.  The latter allows for better plans in the face
  of varied/changing data, integrates with the planner in natural way,
  and encourages users to understand how the planner works.
 
  +1
 
  I was thinking of varying the 'weight' of a user defined plan by an
 fixed
  experimental factor to tell the planner to give higher/lower preference
 to
  this plan, but after your idea above, I think Stephen's point of
 introducing
  a GUC for the factor is the only way possible and I agree with him on
 the
  point that eventually the user will figure out a way to force usage of
 his
  plan using the GUC.

 GUC is not the answer beyond the broad brush mostly debugging level
 features they already support.   What do you do if your plan
 simultaneously needs and does not need nestloops?

 A query plan is a complicated thing that is the result of detail
 analysis of the data.  I bet there are less than 100 users on the
 planet with the architectural knowledge of the planner to submit a
 'plan'.  What users do have is knowledge of the data that the database
 can't effectively gather for some reason.  Looking at my query above,
 what it would need (assuming the planner could not be made to look
 through length()) would be something like:

 SELECT * FROM foo WHERE
   length(bar) = 1000 WITH SELECTIVITY 0.999
   AND length(bar) = 2 WITH SELECTIVITY 0.999;



 Wont this have scaling issues and  issues over time as the data in the
 table changes?

 Suppose I make a view with the above query. With time, as the data in the
 table changes, the selectivity values wont be good for planning. This may
 potentially lead to a lot of changes in the view definition and other
 places where this query was used.



 In general, I think I step back on my point that specifying the
 selectivity is a bad idea.

 Could this also work (for the time being) for cross-column statistics?


It is another issue.

I don't believe so SELECTIVITY can work well too. Slow queries are usually
related to some strange points in data. I am thinking so well concept
should be based on validity of estimations. Some plans are based on totally
wrong estimation, but should be fast due less sensitivity to bad
estimations. So well concept is penalization some risk plans - or use brute
force - like COLUMN store engine does. Their plan is usually simply and
tolerant to bad estimations.

Pavel


 Regards,

 Atri



 --
 Regards,

 Atri
 *l'apprenant*



Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Wont this have scaling issues and  issues over time as the data in the
 table changes?

It can't possibly have worse problems of that sort than explicitly
specifying a plan does.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Merlin Moncure
On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I don't believe so SELECTIVITY can work well too. Slow queries are usually
 related to some strange points in data. I am thinking so well concept should
 be based on validity of estimations. Some plans are based on totally wrong
 estimation, but should be fast due less sensitivity to bad estimations. So
 well concept is penalization some risk plans - or use brute force - like
 COLUMN store engine does. Their plan is usually simply and tolerant to bad
 estimations.

Disagree.  There is a special case of slow query where problem is not
with the data but with the expression over the data; something in the
query defeats sampled selectivity.  Common culprits are:

*) CASE expressions
*) COALESCE
*) casts
*) simple tranformational expressions
*) predicate string concatenation

When using those expressions, you often end up with default
selectivity assumptions and if they are way off -- watch out.

Plan risk analysis solves a different problem: small changes in the
data mean big changes in the execution runtime.  It probably wouldn't
even help cases where the server thinks there is one row and you
actually have thousands or millions unless you want to implement a
selectivity range with perhaps a risk coefficient.  This was also
suggested sometime back and was also met with some skepticism (but
it'd be interesting to see!).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Jim Nasby

On 3/17/14, 12:58 PM, Stephen Frost wrote:

* Merlin Moncure (mmonc...@gmail.com) wrote:

Yeah -- the most common case I see is outlier culling where several
repeated low non-deterministic selectivity quals stack reducing the
row count estimate to 1.  For example:
SELECT * FROM foo WHERE length(bar) = 1000 AND length(bar) = 2;


This is exactly the issue that I've seen also- where we end up picking a
Nested Loop because we think only one row is going to be returned and
instead we end up getting a bunch and it takes forever.


FWIW, I've also seen problems with merge and hash joins at work, but I don't 
have any concrete examples handy. :(


There was also some speculation on trying to change plans mid-stream to
address a situation like that, once we realize what's happening.  Not
sure that's really practical but it would be nice to find some solution.


Just being able to detect that something has possibly gone wrong would be 
useful. We could log that to alert the DBA/user of a potential bad plan. We 
could even format this in such a fashion that it's suitable for emailing the 
community with; the query, the plan, the stats, etc. That might make it easier 
for us to fix the planner (although at this point it seems like we're hitting 
statistics gathering problems that we simply don't know how to solve).

There is another aspect of this though: plan stability. There are lots of cases 
where users couldn't care less about getting an optimal plan, but they care 
*greatly* about not getting a brain-dead plan.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Merlin Moncure
On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby j...@nasby.net wrote:
 Just being able to detect that something has possibly gone wrong would be
 useful. We could log that to alert the DBA/user of a potential bad plan. We
 could even format this in such a fashion that it's suitable for emailing the
 community with; the query, the plan, the stats, etc. That might make it
 easier for us to fix the planner (although at this point it seems like we're
 hitting statistics gathering problems that we simply don't know how to
 solve).

Again, that's not the case here.  The problem is that the server is
using hard wired assumptions (like, 10% selective) *instead* of
statistics -- at least in the case discussed above.  That being said,
I think you're on to something: EXPLAIN ANALYZE rowcounts don't
indicate if the row count was generated from data based assumptions or
SWAGs.  So maybe you could decorate the plan description with an
indicator that suggests when default selectivity rules were hit.

 There is another aspect of this though: plan stability. There are lots of
 cases where users couldn't care less about getting an optimal plan, but they
 care *greatly* about not getting a brain-dead plan.

Except for cases I noted above, I don't understand how you could flag
'sub-optimal' or 'brain-dead' plans.   The server always picks the
best plan it can.  The trick is to (in a very simple and
cpu-unintensive way) indicate when there isn't a lot of confidence in
the plan -- but that's not the same thing.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Martijn van Oosterhout
On Mon, Mar 17, 2014 at 01:20:47PM -0500, Merlin Moncure wrote:
 A query plan is a complicated thing that is the result of detail
 analysis of the data.  I bet there are less than 100 users on the
 planet with the architectural knowledge of the planner to submit a
 'plan'.  What users do have is knowledge of the data that the database
 can't effectively gather for some reason.  Looking at my query above,
 what it would need (assuming the planner could not be made to look
 through length()) would be something like:
 
 SELECT * FROM foo WHERE
   length(bar) = 1000 WITH SELECTIVITY 0.999
   AND length(bar) = 2 WITH SELECTIVITY 0.999;

A small issue with selectivity is that the selectivity is probably not
what the users are expecting anyway, since many will related to
conditional selectivities.  PostgreSQL is pretty good at single column
statistics, it just sometimes screws up on cross-column correlations. 
This ties in with alerting about a bad plan: if the EXPLAIN output
could list for each condition what the actual selectivity was it might
give user a way of understanding the problem.
   
So the example given might lead to output like:
   
clause   selectivity  estimated
length(bar)20.50 0.50
length(bar)1000 | length(bar)2 0.50 0.25
   
The execution engine can only output conditional selectivities because 
of the order of execution. But this would at least give users a handle 
on the problem.

Note that a first cut of the problem might simply be something like
likely()/unlikely() as in gcc.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Jim Nasby

On 3/17/14, 3:32 PM, Merlin Moncure wrote:

On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby j...@nasby.net wrote:

Just being able to detect that something has possibly gone wrong would be
useful. We could log that to alert the DBA/user of a potential bad plan. We
could even format this in such a fashion that it's suitable for emailing the
community with; the query, the plan, the stats, etc. That might make it
easier for us to fix the planner (although at this point it seems like we're
hitting statistics gathering problems that we simply don't know how to
solve).


Again, that's not the case here.  The problem is that the server is
using hard wired assumptions (like, 10% selective) *instead* of
statistics -- at least in the case discussed above.  That being said,
I think you're on to something: EXPLAIN ANALYZE rowcounts don't
indicate if the row count was generated from data based assumptions or
SWAGs.  So maybe you could decorate the plan description with an
indicator that suggests when default selectivity rules were hit.


It occurs to me... it should be cheap for us to track actual rowcounts compared 
to the estimate... perhaps it's worth doing that and flagging plans when the 
estimates are off by more than X percent. Though... I suspect that will just 
tell us what we already know. :(


There is another aspect of this though: plan stability. There are lots of
cases where users couldn't care less about getting an optimal plan, but they
care *greatly* about not getting a brain-dead plan.


Except for cases I noted above, I don't understand how you could flag
'sub-optimal' or 'brain-dead' plans.   The server always picks the
best plan it can.  The trick is to (in a very simple and
cpu-unintensive way) indicate when there isn't a lot of confidence in
the plan -- but that's not the same thing.


Keep in mind that the use case here is critical queries that MUST perform fast 
enough. They do NOT need to be optimal, but they definitely can not degrade into 
something stupid. It's often way better to have a query that's 50-100% slower than 
optimal as opposed to one that suddenly becomes 100+% slower than it normally is 
(regardless of if normal is optimal or not).

You could possibly do an anti-hint: Never use this plan, because we know it 
sucks.

Even better would be if the planner could estimate how bad a plan will become 
if we made assumptions that turn out to be wrong.

Another option would be tracking normal execution times (and the plans used) for queries 
(not completely unreasonable now with pg_stat_statements); if we try to run a query and 
it takes noticeably longer than normal and it's a different plan then assume the plan is 
bad, mark it as such, and try again with one of the known good plans.

Worst case would be allowing a means to store an acceptable plan and 
force/strongly suggest that the planner use it.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Claudio Freire
On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote:

 Even better would be if the planner could estimate how bad a plan will
 become if we made assumptions that turn out to be wrong.


That's precisely what risk estimation was about.

Something like

SELECT * FROM wherever WHEN id  something LIMIT COST 1;

Would forbid a sequential scan *if* the table is big enough to suspect the
plan might take that much, or a nested loop *if* the planner cannot *prove*
it will be faster than that.

I don't believe the limit unit is obscure at all (page fetches being a nice
measuring stick), but what is, is what do you do when no plan fits the
limits.


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Jim Nasby

On 3/17/14, 2:16 PM, Merlin Moncure wrote:

On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehulepavel.steh...@gmail.com  wrote:

I don't believe so SELECTIVITY can work well too. Slow queries are usually
related to some strange points in data. I am thinking so well concept should
be based on validity of estimations. Some plans are based on totally wrong
estimation, but should be fast due less sensitivity to bad estimations. So
well concept is penalization some risk plans - or use brute force - like
COLUMN store engine does. Their plan is usually simply and tolerant to bad
estimations.

Disagree.  There is a special case of slow query where problem is not
with the data but with the expression over the data; something in the
query defeats sampled selectivity.  Common culprits are:

*) CASE expressions
*) COALESCE
*) casts
*) simple tranformational expressions
*) predicate string concatenation


*) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) = 
'2014-1-1'

Though, in this case it's probably much better to teach the parser how to turn 
that into a range expression.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Claudio Freire
On Mon, Mar 17, 2014 at 7:09 PM, Jim Nasby j...@nasby.net wrote:

 On 3/17/14, 2:16 PM, Merlin Moncure wrote:

 On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehulepavel.steh...@gmail.com
  wrote:

 I don't believe so SELECTIVITY can work well too. Slow queries are
 usually
 related to some strange points in data. I am thinking so well concept
 should
 be based on validity of estimations. Some plans are based on totally
 wrong
 estimation, but should be fast due less sensitivity to bad estimations.
 So
 well concept is penalization some risk plans - or use brute force - like
 COLUMN store engine does. Their plan is usually simply and tolerant to
 bad
 estimations.

 Disagree.  There is a special case of slow query where problem is not
 with the data but with the expression over the data; something in the
 query defeats sampled selectivity.  Common culprits are:

 *) CASE expressions
 *) COALESCE
 *) casts
 *) simple tranformational expressions
 *) predicate string concatenation


 *) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) =
 '2014-1-1'

 Though, in this case it's probably much better to teach the parser how to
 turn that into a range expression.



Maybe, maybe not.

An index over the truncated time can potentially be much more efficient.


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Jim Nasby

On 3/17/14, 5:07 PM, Claudio Freire wrote:


On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net 
mailto:j...@nasby.net wrote:

Even better would be if the planner could estimate how bad a plan will 
become if we made assumptions that turn out to be wrong.


That's precisely what risk estimation was about.

Something like

SELECT * FROM wherever WHEN id  something LIMIT COST 1;

Would forbid a sequential scan *if* the table is big enough to suspect the plan 
might take that much, or a nested loop *if* the planner cannot *prove* it will 
be faster than that.

I don't believe the limit unit is obscure at all (page fetches being a nice 
measuring stick), but what is, is what do you do when no plan fits the limits.


I don't think that's the same thing... what you're describing is a way to not 
begin a query if a low-enough cost plan can't be found.

What I'm talking about is when the planner picks one low-cost plan over another 
and it turns out the estimate of the one that was picked was WAY off. I've 
actually seen cases where plan estimates that were off by just 100 units 
produce wildly different results.

In that scenario, LIMIT COST won't help at all.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Jim Nasby

On 3/17/14, 5:12 PM, Claudio Freire wrote:


On Mon, Mar 17, 2014 at 7:09 PM, Jim Nasby j...@nasby.net 
mailto:j...@nasby.net wrote:

On 3/17/14, 2:16 PM, Merlin Moncure wrote:

On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehulepavel.stehule@gmail.__com 
mailto:pavel.steh...@gmail.com  wrote:

 I don't believe so SELECTIVITY can work well too. Slow queries 
are usually
 related to some strange points in data. I am thinking so well 
concept should
 be based on validity of estimations. Some plans are based on 
totally wrong
 estimation, but should be fast due less sensitivity to bad 
estimations. So
 well concept is penalization some risk plans - or use brute force 
- like
 COLUMN store engine does. Their plan is usually simply and 
tolerant to bad
 estimations.

Disagree.  There is a special case of slow query where problem is not
with the data but with the expression over the data; something in the
query defeats sampled selectivity.  Common culprits are:

*) CASE expressions
*) COALESCE
*) casts
*) simple tranformational expressions
*) predicate string concatenation


*) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) = 
'2014-1-1'

Though, in this case it's probably much better to teach the parser how to 
turn that into a range expression.



Maybe, maybe not.

An index over the truncated time can potentially be much more efficient.


More efficient than a range index? Maybe, but I'm doubtful. Even if that's 
true, in a warehouse you're going to want to limit by weeks, months, quarters, 
years, etc. So now you're stuck building tons of special indexes.

(Granted, most warehouses build a separate date dimension because of these 
kinds of problems... I'm hoping that we could do something better.)
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Josh Berkus
On 03/17/2014 01:57 PM, Martijn van Oosterhout wrote:
 On Mon, Mar 17, 2014 at 01:20:47PM -0500, Merlin Moncure wrote:
 A query plan is a complicated thing that is the result of detail
 analysis of the data.  I bet there are less than 100 users on the
 planet with the architectural knowledge of the planner to submit a
 'plan'.  What users do have is knowledge of the data that the database
 can't effectively gather for some reason.  Looking at my query above,
 what it would need (assuming the planner could not be made to look
 through length()) would be something like:

 SELECT * FROM foo WHERE
   length(bar) = 1000 WITH SELECTIVITY 0.999
   AND length(bar) = 2 WITH SELECTIVITY 0.999;

So, if we're going to support query decorators, we might as well go all
the way and just offer Oracle-style use this index.  Speaking as
someone who is often called on to fix performance issues in other
people's databases, I find major issues with query decorators:

1. they are impossible to maintain since they're scattered all over the
application code.

2. they eventually become a barrier to upgrading, once the performance
of the DB engine changes in a way that makes older query decorators
crippling and/or erroneous.  Because they are scattered all around the
code, it then becomes a major refactoring effort to fix them.

3. There's no obvious way to collect cumulative query hints in order to
supply data for database-level tuning, or for improving the postgresql
query planner.

4. There's no obvious way to use query decorators with ORMs, making them
useless to 95% of our users.

5. Application developers will add them without telling the DBAs, and
vice-versa.  Hilarity ensues.

Given that, I would strongly prefer a different mechanism for ad-hoc
query plan adjustment.  Possible other mechanisms would include:

a) ability to set selectivity for database objects, possibly including
per-column selectivity (i.e. selectivity for columns a  b  f is 0.01),
and save it.

b) ability to execute a query using a presupplied plan.  This would
include ability to edit the plan using some intermediate format, like
JSON or XML.

c) ability to finely adjust costs of specific query operations (e.g.
bitmapscan_cost = 0.02), possibly for specific database objects.

d) ability to save selectivity estimates for specific expressions (e.g.
selectivity on log15 ( session_id =, log_time BETWEEN ) = 0.03.

Personally, (b) is my favorite version of this feature, becuase it
allows me to test the query executor itself, and it raises the bar for
clobbering the query planner to people who are willing to spend a little
time on it.  It would also make a fantastic learning tool for learning
about database planning and optimization, and might open the door for
more people hacking on our planner.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Vik Fearing
On 03/17/2014 06:28 PM, Stephen Frost wrote:
 a value passed in
 as part of the query, with a high likelihood of users figuring out how
 to use it to say use my plan forever and always..

Last time this came up, I said on irc that if we ever do implement
hints, I'd like them to be tied to a major version somehow so that later
versions can issue warnings that the planner might have become smarter
about long-forgotten queries and they need to be re-tested to see if the
hint is still warrented.

-- 
Vik



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Vik Fearing
On 03/17/2014 11:15 PM, Josh Berkus wrote:
 2. they eventually become a barrier to upgrading, once the performance
 of the DB engine changes in a way that makes older query decorators
 crippling and/or erroneous.  Because they are scattered all around the
 code, it then becomes a major refactoring effort to fix them.

My idea of associating them with major versions and issuing warnings
would help with this.

 5. Application developers will add them without telling the DBAs, and
 vice-versa.  Hilarity ensues.

And this would be unmasked at next major upgrade when the logs fill up.

-- 
Vik



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Claudio Freire
On Mon, Mar 17, 2014 at 7:12 PM, Jim Nasby j...@nasby.net wrote:

 On 3/17/14, 5:07 PM, Claudio Freire wrote:


 On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net mailto:
 j...@nasby.net wrote:

 Even better would be if the planner could estimate how bad a plan
 will become if we made assumptions that turn out to be wrong.


 That's precisely what risk estimation was about.

 Something like

 SELECT * FROM wherever WHEN id  something LIMIT COST 1;

 Would forbid a sequential scan *if* the table is big enough to suspect
 the plan might take that much, or a nested loop *if* the planner cannot
 *prove* it will be faster than that.

 I don't believe the limit unit is obscure at all (page fetches being a
 nice measuring stick), but what is, is what do you do when no plan fits the
 limits.


 I don't think that's the same thing... what you're describing is a way to
 not begin a query if a low-enough cost plan can't be found.

 What I'm talking about is when the planner picks one low-cost plan over
 another and it turns out the estimate of the one that was picked was WAY
 off. I've actually seen cases where plan estimates that were off by just
 100 units produce wildly different results.

 In that scenario, LIMIT COST won't help at all.



The case you describe is different. It's when a plan *effectively* is more
expensive than estimated, but the planner could not estimate it.

That's what was mentioned about switching plans mid-way through them, which
is IMO quite ill-defined as such (it needs a lot of love in order to get a
workable spec out of that).


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Merlin Moncure
On Mon, Mar 17, 2014 at 5:15 PM, Josh Berkus j...@agliodbs.com wrote:
 So, if we're going to support query decorators, we might as well go all
 the way and just offer Oracle-style use this index.  Speaking as
 someone who is often called on to fix performance issues in other
 people's databases, I find major issues with query decorators:

Supplying selectivity estimates in places where the database can't or
wont do them properly itself is a completely different thing from
Oracle style hints.  For example, they are much more 'future proof' --
both to schema changes and postgres enhancments -- in the absolute
case the database can peek into your expression in some future version
and generate a better estimate than you can.   Aside from that, you're
in no way locked out of future innovations.  This could be done in a
relatively clean way: for example, by putting complex quals in an
inlineable function that is decorated with analog of COST/ROWS clause
-- perhaps allowing for a user defined expression to base selectivity
from the input arguments.

OTOH, Hints disable the planner and they are much more complex,
particularly for large queries since you have to consider how
components of the plan relate to one another.  Force feeding a plan to
the database is the wrong way to go.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby j...@nasby.net wrote:
 Even better would be if the planner could estimate how bad a plan will
 become if we made assumptions that turn out to be wrong.

 That's precisely what risk estimation was about.

Yeah.  I would like to see the planner's cost estimates extended to
include some sort of uncertainty estimate, whereupon risk-averse people
could ask it to prefer low-uncertainty plans over high-uncertainty ones
(the plans we typically choose for ORDER BY ... LIMIT queries being great
examples of the latter).  But it's a long way from wishing that to making
it so.  Right now it's not even clear (to me anyway) how we'd measure or
model such uncertainty.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Tue, Mar 18, 2014 at 12:46 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  I don't believe so SELECTIVITY can work well too. Slow queries are
 usually
  related to some strange points in data. I am thinking so well concept
 should
  be based on validity of estimations. Some plans are based on totally
 wrong
  estimation, but should be fast due less sensitivity to bad estimations.
 So
  well concept is penalization some risk plans - or use brute force - like
  COLUMN store engine does. Their plan is usually simply and tolerant to
 bad
  estimations.

 Disagree.  There is a special case of slow query where problem is not
 with the data but with the expression over the data; something in the
 query defeats sampled selectivity.  Common culprits are:

 *) CASE expressions
 *) COALESCE
 *) casts
 *) simple tranformational expressions
 *) predicate string concatenation

 When using those expressions, you often end up with default
 selectivity assumptions and if they are way off -- watch out.

 Plan risk analysis solves a different problem: small changes in the
 data mean big changes in the execution runtime.  It probably wouldn't
 even help cases where the server thinks there is one row and you
 actually have thousands or millions unless you want to implement a
 selectivity range with perhaps a risk coefficient.  This was also
 suggested sometime back and was also met with some skepticism (but
 it'd be interesting to see!).




Another case is with prepared statements, when things like array size are
not know to the planner and the planner makes a hard coded estimate for it,
leading to selection of a customized plan which is worse than the generic
plan.

This would be even more useful for prepared statements since they need some
support from the user in terms of the selectivity and the user should be
allowed to tell more about the data, since he already has given us some
indications about the type of query plans he requires using prepared
statements.


Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*