Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-20 Thread David Rowley
On 16 December 2015 at 13:26, Simon Riggs  wrote:

> There is an interesting real world case where we might get some use of
> these thoughts.
>
> If we have Orders and OrderItems (FK->Orders)
> and we also know (and can Assert) Orders.order_date <= OrderItems.ship_date
> then a restriction on Orders.order_date > X => OrderItems.ship_date > X
> when the two tables are joined on OrderId
> and also a restriction on OrderItems.ship_date < X => Orders.order_date <
> X when the two tables are joined on OrderId
>
> Such an assertion could be checked during the FK check, so would not be
> expensive to maintain.
>
> One for the future, at least, since we don't have any way of expressing or
> enforcing that just yet.
>
>
That does sound interesting, but it's important to remember that referenced
tables are not updated in real time in the same way that indexes are. This
was the reason the INNER JOIN removals had problems: we simply can't
determine at planner time that the trigger queue for the foreign key will
be empty during execution, so we can't be certain that the foreign key will
be "true".

I'm just mentioning this as I wouldn't want someone to run off thinking
this was a fantastic idea without being aware of the above, and waste time
making the same mistakes I made last year.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-20 Thread David Fetter
On Sun, Dec 20, 2015 at 10:27:35PM +1300, David Rowley wrote:
> On 16 December 2015 at 13:26, Simon Riggs  wrote:
> 
> > There is an interesting real world case where we might get some
> > use of these thoughts.
> >
> > If we have Orders and OrderItems (FK->Orders) and we also know
> > (and can Assert) Orders.order_date <= OrderItems.ship_date then a
> > restriction on Orders.order_date > X => OrderItems.ship_date > X
> > when the two tables are joined on OrderId and also a restriction
> > on OrderItems.ship_date < X => Orders.order_date < X when the two
> > tables are joined on OrderId
> >
> > Such an assertion could be checked during the FK check, so would
> > not be expensive to maintain.
> >
> > One for the future, at least, since we don't have any way of
> > expressing or enforcing that just yet.
> >
> That does sound interesting, but it's important to remember that
> referenced tables are not updated in real time in the same way that
> indexes are.

Is getting them so even remotely possible, given the system we have
now?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-15 Thread Simon Riggs
On 7 December 2015 at 16:44, Simon Riggs  wrote:

> On 6 December 2015 at 16:38, Tom Lane  wrote:
>
> >> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
> >> that much; it's certainly far less useful than an equality condition.
> >
> > I'd have thought that my link to a thread of a reported 1100 to 2200
> > times performance improvement might have made you think twice about
> > claiming the uselessness of this idea.
> >
>
> Personally, I think this optimization is a long way down the list of
> important items because I don't see it as a typical use case. There are
> already patches submitted for more important items, so this isn't the right
> place to be arguing such things. Not every beneficial optimization has a
> wide use case.
>

There is an interesting real world case where we might get some use of
these thoughts.

If we have Orders and OrderItems (FK->Orders)
and we also know (and can Assert) Orders.order_date <= OrderItems.ship_date
then a restriction on Orders.order_date > X => OrderItems.ship_date > X when
the two tables are joined on OrderId
and also a restriction on OrderItems.ship_date < X => Orders.order_date <
X when the two tables are joined on OrderId

Such an assertion could be checked during the FK check, so would not be
expensive to maintain.

One for the future, at least, since we don't have any way of expressing or
enforcing that just yet.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-15 Thread Tomas Vondra

On 12/16/2015 01:26 AM, Simon Riggs wrote:


There is an interesting real world case where we might get some use of
these thoughts.

If we have Orders and OrderItems (FK->Orders)
and we also know (and can Assert) Orders.order_date <= OrderItems.ship_date
then a restriction on Orders.order_date > X => OrderItems.ship_date > X
when the two tables are joined on OrderId
and also a restriction on OrderItems.ship_date < X => Orders.order_date
< X when the two tables are joined on OrderId

Such an assertion could be checked during the FK check, so would not be
expensive to maintain.

One for the future, at least, since we don't have any way of expressing
or enforcing that just yet.


There's a concept of "correlation maps", described in a paper [1] 
presented at VLDB 2009. It essentially talks about deriving conditions 
between attributes of the same table, but I guess it might be modified 
to track the correlations through foreign keys.
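
For instance (a sketch only; nothing like this exists in PostgreSQL), a
correlation map over a single table might record that ship_date always
falls within 7 days of order_date, letting the optimizer derive a
sargable predicate on the correlated column:

-- Hypothetical table "shipments" with correlated order_date/ship_date.
SELECT * FROM shipments WHERE ship_date = DATE '2015-12-01';
-- could be rewritten, using the recorded bound, as:
SELECT * FROM shipments
WHERE ship_date = DATE '2015-12-01'
  AND order_date BETWEEN DATE '2015-11-24' AND DATE '2015-12-01';
-- making an index or partitioning on order_date usable.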


Interestingly enough, the data for the paper were collected on 
PostgreSQL, but the correlation maps were implemented in an application 
layer on top of the database.


[1] http://www.vldb.org/pvldb/2/vldb09-199.pdf

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Jeremy Harris
On 07/12/15 16:44, Simon Riggs wrote:
> There are many optimizations we might adopt, yet planning time is a factor.
> It seems simple enough to ignore more complex optimizations if we have
> already achieved a threshold cost (say 10). Such a test would add nearly
> zero time for the common case. We can apply the optimizations in some kind
> of ordering depending upon the cost, so we are careful to balance the
> cost/benefit of trying certain optimizations.

Given parallelism, why not continue planning after initiating a
cancellable execution, giving a better plan to be used if the
execution runs for long enough?
-- 
Cheers,
  Jeremy






Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Jim Nasby

On 12/7/15 7:26 PM, David Rowley wrote:

I was talking to Thomas Munro yesterday about this, and he mentioned
that it would be quite nice to have some stats on how much time is spent
in the planner, vs executor. He came up with the idea of adding a column
to pg_stat_statements to record the planning time.


I think that would be useful. Maybe something in pg_stat_database too.


If we could get real statistics on real world cases and we discovered
that, for example, on average the total CPU time of planning was
only 1% of execution time, then it would certainly make adding 2%
overhead onto the planner a bit less of a worry, as this would just be
2% of 1% (0.02%). Such information, fed back into the community, might
be able to drive us in the right direction when it comes to deciding
what needs to be done to resolve this constant issue with accepting
planner improvement patches.


Might be nice, but I think it's also pretty unnecessary.

I've dealt with dozens of queries that took minutes to hours to run. Yet 
I can't recall ever having an EXPLAIN on one of these take more than a 
few seconds. I tend to do more OLTP stuff so maybe others have 
experienced long-running EXPLAIN, in which case it'd be great to know that.


Actually, I have seen long EXPLAIN times, but only as part of trying to 
aggressively increase *_collapse_limit. IIRC I was able to increase one 
of those to 14 and one to 18 before plan time became unpredictably bad 
(it wasn't always bad though, just sometimes).



I believe that with parallel query on the horizon for 9.6 we're now
aiming to support bigger OLAP type databases than ever before. So if we
ignore patches like this one then it appears that we have some
conflicting goals in the community, as it seems that we're willing to add
the brawn, but we're not willing to add the brain. If this is the case
then it's a shame, as I think we can have both. So I very much agree
that we must find a way to maintain support and high
performance of small OLTP databases too.


+1
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Jim Nasby

On 12/8/15 3:52 AM, Jeremy Harris wrote:

On 07/12/15 16:44, Simon Riggs wrote:

There are many optimizations we might adopt, yet planning time is a factor.
It seems simple enough to ignore more complex optimizations if we have
already achieved a threshold cost (say 10). Such a test would add nearly
zero time for the common case. We can apply the optimizations in some kind
of ordering depending upon the cost, so we are careful to balance the
cost/benefit of trying certain optimizations.


Given parallelism, why not continue planning after initiating a
cancellable execution, giving a better plan to be used if the
execution runs for long enough?


Because that would take significantly more work than what Simon is 
proposing.


That said, I think the ability to restart with a different plan is 
something we might need, for cases when we discover the plan estimates 
were way off. If that ever gets built it might be useful for what you 
propose as well.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Robert Haas
On Mon, Dec 7, 2015 at 8:26 PM, David Rowley
 wrote:
> The biggest frustration for me is that Tom always seems to argue his
> point as if planning time is roughly the same or more expensive than
> execution time, and likely that's true in many cases, but I would imagine in
> more normal cases that execution time is longer, although I've never had the
> guts to stand up and argue this as I don't have any stats to back me up.

I think this is missing the point.  It is possible to have a query
planner optimization that is so expensive that it loses even when it
improves the plan, but I don't think this optimization falls into that
category.  This particular change would not have been requested as
many times as it has been if people didn't keep rediscovering that, on
a certain class of queries, it works *really* well.  The problem that
I understand Tom to be concerned about is the queries where the
optimization consumes additional planning time without delivering a
better plan - and those where, without careful thought, it might even
deliver a worse plan.

Now, I'm not sure Tom is right about that, but he might be.  The class
of queries we're talking about here are the ones where somebody
constrains a column that is part of an equivalence class with multiple
members.  Perhaps the best example is a large join, let's say 10
tables, where the join column is the same for all tables; thus, we
have an equivalence class with 10 members.  Suppose further we have an
inequality condition applied to one member of that equivalence class.

Currently, we'll apply that inequality to the table that the user
actually mentioned and ignore all the others; in theory, it could be
right to enforce that inequality condition on any nonempty subset of
the 10 tables we've got.  It might be right to pick exactly one of
those tables and enforce the inequality there, or it might be right to
enforce it on some of them, or it might be right to enforce it on all
of them.  The reason why the answer isn't automatically "all of them"
is because, first of all, it's possible that enforcing the condition
at a particular table costs more to filter out the rows than we save
in execution time at higher levels of the plan tree.  For example,
consider A JOIN B ON A.X = B.X WHERE A.X > 100.  It might be that
the range of A.X is [0,101] but the range of B.X is
[100,200]; so enforcing the inequality against A is very
selective but enforcing it against B filters out basically nothing.
Furthermore, there are some cases involving parameterized paths where
enforcing the inequality multiple times is definitely bad: for
example, if we've got a nested loop where the outer side is a seq scan
that enforces the condition and the inner side is an index probe, it
is just a waste to retest it on the inner side.  We already know that
the outer row passes the inequality, so the inner row will necessarily
pass also.  This doesn't apply to merge or hash joins, and it also
doesn't apply to all nested loops: scans that aren't parameterized by
the equivalence-class column can still benefit from separate
enforcement of the inequality.
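
Here's a runnable sketch of that example, with data ranges invented to
match the description:

CREATE TABLE a AS SELECT generate_series(0, 101) AS x;    -- A.X in [0,101]
CREATE TABLE b AS SELECT generate_series(100, 200) AS x;  -- B.X in [100,200]
ANALYZE a;
ANALYZE b;

-- Enforcing x > 100 against a keeps 1 row out of 102 (very selective);
-- enforcing it against b would keep 100 rows out of 101 (nearly useless).
EXPLAIN SELECT * FROM a JOIN b ON a.x = b.x WHERE a.x > 100;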

Considering the factors mentioned above, it seems
likely to me that a naive patch that propagates the inequalities to
every relation where they could hypothetically be enforced will be a
significant loss in some cases even just in execution cost, ignoring
completely the effects on planning time.  And, on the flip side,
figuring out what non-empty subset of relations forms the optimal set
upon which to enforce the inequality seems like a complicated problem.
  A first cut might be to enforce the inequality against the relation
where it's believed to be most selective, and to also enforce it in
paths for other relations that aren't parameterized by the
equivalence-class column mentioned in the inequality provided that the
selectivity is thought to be above some threshold ... but I'm not sure
this is very principled, and it's certainly not obvious what arbitrary
threshold would be appropriate.  The threshold where multiple
enforcement of the qual starts to pay off also depends on the cost of
the qual: an expensive qual has to filter out more rows than a cheap
qual to be worthwhile.  I'm not sure how to estimate all this, but I
don't have trouble believing that if not done carefully it could
either (a) cost a lot of planning time or (b) generate lousy plans.

Now, all that having been said, I think this is a pretty important
optimization.  Lots of people have asked for it, and I think it would
be worth expending some brainpower to try to figure out a way to be
smarter than we are now, which is, in a nutshell, as dumb as possible.
You're completely right that, on an OLAP query that's going to run for
a minute, a few extra milliseconds of planning time could be totally
OK even if they don't yield any benefit on most queries. Maybe we can
get the cost of this feature down 

Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Robert Haas
On Tue, Dec 8, 2015 at 11:12 PM, David Rowley
 wrote:
> My point was more of a response to the general condition that we cannot add
> any planner overhead unless the extra CPU cycles spent are almost certainly
> going to improve the plan.

I hope that's an overstatement of the requirement, because otherwise
it boils down to "don't improve the planner".  Most things we do to
try to improve plans are going to generate paths that will only
sometimes be better than the paths we're already generating.  "almost
certainly" is a high bar to clear.

> hmm, well I think it's more than likely that my view on this has been skewed
> by the fact that we push equality quals down when the eclass contains a
> Const without any regard that it could generate a worse plan or add needless
> CPU overhead for checking the quals. If you rewrite your above paragraph
> with equality instead of inequality then it still applies. A JOIN B ON A.X =
> B.X WHERE A.X = 100 will cause B.X = 100 to be pushed into B, but
> what if B.X values are all set to 100? I've not seen anyone complain
> about us doing that.  This is the reason I limited the patch to only deal
> with members of the BTREE op-class, I assumed that if there's some BTREE
> index helping with the equality qual then that same index may well just help
> with a qual using the <, <=, > or >= operator, and also I thought that in
> most cases all these btree inequality operations will have about the same
> CPU cost as equality.
>
> Would you be able to explain the difference between the two? Or is it just a
> question of the likelihood?

Likelihood, I guess.  I mean, typically if you are doing an equijoin
on two or more relations, the join column is unique, so there's only
going to be one row in each of A and B that is equal to a given
constant.  If A.X and/or B.X contain duplicate values, then the output
is going to have a number of rows equal to the product of the number
of duplicates in each, sort of like a Cartesian join.  That scenario
could happen, but it's a pretty strange kind of query which I would be
disinclined to spend a lot of time optimizing.  OTOH, something like
SELECT * FROM orders o, lines l WHERE l.order_id = o.order_id AND
o.order_id > 100 is a pretty normal query, at least IMHO.
Worrying about how that's going to perform with various data
distributions seems like a pretty reasonable thing to care about.
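
To illustrate the difference with invented tables:

-- Duplicate join keys on both sides: 3 matching rows on each side
-- produce 3 * 3 = 9 output rows, Cartesian-style.
CREATE TABLE dup_a AS SELECT 100 AS x FROM generate_series(1, 3);
CREATE TABLE dup_b AS SELECT 100 AS x FROM generate_series(1, 3);
SELECT count(*) FROM dup_a a JOIN dup_b b ON a.x = b.x;  -- returns 9
-- whereas in the orders/lines query each line row matches exactly one
-- order row via the join key.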

>> Furthermore, there are some cases involving parameterized paths where
>> enforcing the inequality multiple times is definitely bad: for
>> example, if we've got a nested loop where the outer side is a seq scan
>> that enforces the condition and the inner side is an index probe, it
>> is just a waste to retest it on the inner side.  We already know that
>> the outer row passes the inequality, so the inner row will necessarily
>> pass also.  This doesn't apply to merge or hash joins, and it also
>> doesn't apply to all nested loops: scans that aren't parameterized by
>> the equivalence-class column can still benefit from separate
>> enforcement of the inequality.
>>
> I guess that could be fixed by somehow marking these pushed quals as
> optional and having parameterised scans ignore optional quals.

Yeah.

> I have to admit that I didn't give it much thought as all of the cases that
> I tested turned what used to be a nested loop with a parameterised inner index
> scan into a merge join, which was faster in my test case. Of course, that
> does not mean that I claim that this merge join will be faster in all cases
> or that the plan will switch to using a merge join in all cases.

Interesting that it turned into a merge join and that that was faster.

> Again this is one of the reasons that I limited it to btree operators only.
> I don't quite know how to estimate this either as the extra rows being
> filtered may mean that a sort no longer spills to disk, or a hash join no
> longer multi-batches. I'd imagine filtering would be extra worthwhile in
> such cases, so I doubt setting the threshold to some constant value is the
> correct way, and most likely considering the path with and without the
> qual(s) would be far too costly.

It might be OK, particularly for OLTP queries, but it's certainly not
going to be the cheapest thing anybody ever did.

> Well I agree with that. I've got no interest in slowing anything down. I've
> been busy working with big databases for quite a while now, so perhaps my
> point of view is coming from that direction still.

Yeah.

> So anyway, it's quite clear that we don't want the patch in its current
> form, and I'm not volunteering any more time at this stage to improve it, so
> for this reason I won't add it to the January commit fest.

Too bad, but I understand.  I hope you come back around to working on
this further at some point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread David Rowley
On 9 December 2015 at 15:14, Robert Haas  wrote:

> On Mon, Dec 7, 2015 at 8:26 PM, David Rowley
>  wrote:
> > The biggest frustration for me is that Tom always seems to argue his
> > point as if planning time is roughly the same or more expensive than
> > execution time, and likely that's true in many cases, but I would
> > imagine in more normal cases that execution time is longer, although
> > I've never had the guts to stand up and argue this as I don't have any
> > stats to back me up.
>
> I think this is missing the point.  It is possible to have a query
> planner optimization that is so expensive that it loses even when it
> improves the plan, but I don't think this optimization falls into that
> category.  This particular change would not have been requested as
> many times as it has been if people didn't keep rediscovering that, on
> a certain class of queries, it works *really* well.  The problem that
> I understand Tom to be concerned about is the queries where the
> optimization consumes additional planning time without delivering a
> better plan - and those where, without careful thought, it might even
> deliver a worse plan.
>
>
My point was more of a response to the general condition that we cannot add
any planner overhead unless the extra CPU cycles spent are almost certainly
going to improve the plan.


> Now, I'm not sure Tom is right about that, but he might be.  The class
> of queries we're talking about here are the ones where somebody
> constrains a column that is part of an equivalence class with multiple
> members.  Perhaps the best example is a large join, let's say 10
> tables, where the join column is the same for all tables; thus, we
> have an equivalence class with 10 members.  Suppose further we have an
> inequality condition applied to one member of that equivalence class.
>
> Currently, we'll apply that inequality to the table that the user
> actually mentioned and ignore all the others; in theory, it could be
> right to enforce that inequality condition on any nonempty subset of
> the 10 tables we've got.  It might be right to pick exactly one of
> those tables and enforce the inequality there, or it might be right to
> enforce it on some of them, or it might be right to enforce it on all
> of them.  The reason why the answer isn't automatically "all of them"
> is because, first of all, it's possible that enforcing the condition
> at a particular table costs more to filter out the rows that we save
> in execution time at higher levels of the plan tree.  For example,
> consider A JOIN B ON A.X = B.X WHERE A.X > 100.  It might be that
> the range of A.X is [0,101] but the range of B.X is
> [100,200]; so enforcing the inequality against A is very
> selective but enforcing it against B filters out basically nothing.
>

hmm, well I think it's more than likely that my view on this has been
skewed by the fact that we push equality quals down when the eclass
contains a Const without any regard that it could generate a worse plan or
add needless CPU overhead for checking the quals. If you rewrite your above
paragraph with equality instead of inequality then it still applies. A JOIN
B ON A.X = B.X WHERE A.X = 100 will cause B.X = 100 to be pushed
into B, but what if B.X values are all set to 100? I've not seen anyone
complain about us doing that.  This is the reason I limited the patch to
only deal with members of the BTREE op-class; I assumed that if there's
some BTREE index helping with the equality qual then that same index may
well just help with a qual using the <, <=, > or >= operator, and also I
thought that in most cases all these btree inequality operations will have
about the same CPU cost as equality.
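
That existing behaviour is easy to see with the regression database's
tenk1 (the same table Tom used up-thread):

EXPLAIN SELECT *
FROM tenk1 a JOIN tenk1 b ON a.thousand = b.thousand
WHERE a.thousand = 100;
-- Each scan gets its own "thousand = 100" condition (as an index or
-- filter qual): the constant is propagated through the equivalence
-- class {a.thousand, b.thousand, 100} with no check on whether it
-- actually helps on b.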

Would you be able to explain the difference between the two? Or is it just
a question of the likelihood?



> Furthermore, there are some cases involving parameterized paths where
> enforcing the inequality multiple times is definitely bad: for
> example, if we've got a nested loop where the outer side is a seq scan
> that enforces the condition and the inner side is an index probe, it
> is just a waste to retest it on the inner side.  We already know that
> the outer row passes the inequality, so the inner row will necessarily
> pass also.  This doesn't apply to merge or hash joins, and it also
> doesn't apply to all nested loops: scans that aren't parameterized by
> the equivalence-class column can still benefit from separate
> enforcement of the inequality.
>
>
I guess that could be fixed by somehow marking these pushed quals as
optional and having parameterised scans ignore optional quals.
I have to admit that I didn't give it much thought as all of the cases that
I tested turned what used to be a nested loop with a parameterised inner
index scan into a merge join, which was faster in my test case. Of course,
that does not mean that I claim that this merge join will be faster in all
cases or that the plan will switch to using a merge join in all cases.

Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Tom Lane
Jim Nasby  writes:
> On 12/6/15 10:38 AM, Tom Lane wrote:
>> I said "in most cases".  You can find example cases to support almost any
>> weird planner optimization no matter how expensive and single-purpose;
>> but that is the wrong way to think about it.  What you have to think about
>> is average cases, and in particular, not putting a drag on planning time
>> in cases where no benefit ensues.  We're not committing any patches that
>> give one uncommon case an 1100X speedup by penalizing every other query 10%,
>> or even 1%; especially not when there may be other ways to fix it.

> This is a problem that seriously hurts Postgres in data warehousing 
> applications.

Please provide some specific examples.  I remain skeptical that this
would make a useful difference all that often in the real world ...
and handwaving like that does nothing to change my opinion.  What do
the queries look like, and why would deducing an extra inequality
condition help them?

regards, tom lane




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Jim Nasby

On 12/6/15 10:38 AM, Tom Lane wrote:

I said "in most cases".  You can find example cases to support almost any
weird planner optimization no matter how expensive and single-purpose;
but that is the wrong way to think about it.  What you have to think about
is average cases, and in particular, not putting a drag on planning time
in cases where no benefit ensues.  We're not committing any patches that
give one uncommon case an 1100X speedup by penalizing every other query 10%,
or even 1%; especially not when there may be other ways to fix it.


This is a problem that seriously hurts Postgres in data warehousing 
applications. We can't keep ignoring optimizations that provide even as 
little as 10% execution improvements for 10x worse planner performance, 
because in a warehouse it's next to impossible for planning time to matter.


Obviously it'd be great if there was a fast, easy way to figure out 
whether a query would be expensive enough to go the whole 9 yards on 
planning it but at this point I suspect a simple GUC would be a big 
improvement.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread David G. Johnston
On Mon, Dec 7, 2015 at 8:35 AM, Jim Nasby  wrote:

> On 12/6/15 10:38 AM, Tom Lane wrote:
>
>> I said "in most cases".  You can find example cases to support almost any
>> weird planner optimization no matter how expensive and single-purpose;
>> but that is the wrong way to think about it.  What you have to think about
>> is average cases, and in particular, not putting a drag on planning time
>> in cases where no benefit ensues.  We're not committing any patches that
>> give one uncommon case an 1100X speedup by penalizing every other query
>> 10%, or even 1%; especially not when there may be other ways to fix it.
>>
>
> This is a problem that seriously hurts Postgres in data warehousing
> applications. We can't keep ignoring optimizations that provide even as
> little as 10% execution improvements for 10x worse planner performance,
> because in a warehouse it's next to impossible for planning time to matter.
>
> Obviously it'd be great if there was a fast, easy way to figure out
> whether a query would be expensive enough to go the whole 9 yards on
> planning it but at this point I suspect a simple GUC would be a big
> improvement.


Something like "enable_equivalencefilters" but that defaults to false,
unlike every existing "enable_*" GUC?

It would be a lot more user-friendly to have something along the lines of
"planner_mode (text)" with labels like "star, transactional, bulk_load,
etc..." because I suspect there are other things we'd want to add if we
start identifying queries by their type/usage and optimize accordingly.
Having the knobs available is necessary, but putting a façade on them
would make things more straightforward for the user in the common cases.
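
Purely as a sketch of the two shapes being contrasted (neither GUC
exists today):

-- fine-grained: one switch per optimization
SET enable_equivalencefilters = on;
-- versus a workload-level preset bundling many such decisions
SET planner_mode = 'star';  -- or 'transactional', 'bulk_load', ...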

David J.


Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Jim Nasby

On 12/7/15 9:54 AM, Tom Lane wrote:

Jim Nasby  writes:

>On 12/6/15 10:38 AM, Tom Lane wrote:

>>I said "in most cases".  You can find example cases to support almost any
>>weird planner optimization no matter how expensive and single-purpose;
>>but that is the wrong way to think about it.  What you have to think about
>>is average cases, and in particular, not putting a drag on planning time
>>in cases where no benefit ensues.  We're not committing any patches that
>>give one uncommon case an 1100X speedup by penalizing every other query 10%,
>>or even 1%; especially not when there may be other ways to fix it.

>This is a problem that seriously hurts Postgres in data warehousing
>applications.

Please provide some specific examples.  I remain skeptical that this
would make a useful difference all that often in the real world ...
and handwaving like that does nothing to change my opinion.  What do
the queries look like, and why would deducing an extra inequality
condition help them?


I was speaking more broadly than this particular case. There are a lot of 
planner improvements that get shot down because of the planning overhead 
they would add. That's great for cases when milliseconds count, but 
spending an extra 60 seconds (a planning eternity) to shave an hour off 
a warehouse/reporting query is well worth it.


There needs to be some way to give the planner an idea of how much 
effort it should expend. GEQO and *_collapse_limit address this in the 
opposite direction (putting a cap on planner effort), but I think we 
need something that does the opposite: "I know this query will take a 
long time, so expend extra effort on planning it."

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Jim Nasby

On 12/7/15 10:44 AM, Simon Riggs wrote:

There are many optimizations we might adopt, yet planning time is a
factor. It seems simple enough to ignore more complex optimizations if
we have already achieved a threshold cost (say 10). Such a test would
add nearly zero time for the common case. We can apply the optimizations
in some kind of ordering depending upon the cost, so we are careful to
balance the cost/benefit of trying certain optimizations.


Unfortunately I've seen a lot of millisecond queries that have 6 figure 
estimates, due to data being in cache. So I'm not sure how practical 
that would be.


Maybe a better starting point would be a planner timeout.

I definitely agree we need some method to limit planning time when 
necessary (ie: OLTP). Without that we'll never be able to start testing 
more complex optimizations.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Simon Riggs
On 6 December 2015 at 16:38, Tom Lane  wrote:

>> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
>> that much; it's certainly far less useful than an equality condition.
>
> I'd have thought that my link to a thread of a reported 1100 to 2200
> times performance improvement might have made you think twice about
> claiming the uselessness of this idea.
>

Personally, I think this optimization is a long way down the list of
important items because I don't see it as a typical use case. There are
already patches submitted for more important items, so this isn't the right
place to be arguing such things. Not every beneficial optimization has a
wide use case.

Since the discussion has become more general, I would add a few points.

I said "in most cases".  You can find example cases to support almost any
> weird planner optimization no matter how expensive and single-purpose;
> but that is the wrong way to think about it.  What you have to think about
> is average cases, and in particular, not putting a drag on planning time
> in cases where no benefit ensues.  We're not committing any patches that
> give one uncommon case an 1100X speedup by penalizing every other query
> 10%,
> or even 1%; especially not when there may be other ways to fix it.
>

I agree with this point also, I just don't see it as a blocker for
expensive general case optimizations.

There are many optimizations we might adopt, yet planning time is a factor.
It seems simple enough to ignore more complex optimizations if we have
already achieved a threshold cost (say 10). Such a test would add nearly
zero time for the common case. We can apply the optimizations in some kind
of ordering depending upon the cost, so we are careful to balance the
cost/benefit of trying certain optimizations.

Optimizer directives might be useful also, but we can do just as well with
a heuristic.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Gavin Flower

On 08/12/15 05:27, David G. Johnston wrote:
On Mon, Dec 7, 2015 at 8:35 AM, Jim Nasby  wrote:


On 12/6/15 10:38 AM, Tom Lane wrote:

I said "in most cases".  You can find example cases to support
almost any
weird planner optimization no matter how expensive and
single-purpose;
but that is the wrong way to think about it.  What you have to
think about
is average cases, and in particular, not putting a drag on
planning time
in cases where no benefit ensues.  We're not committing any
patches that
give one uncommon case an 1100X speedup by penalizing every
other query 10%,
or even 1%; especially not when there may be other ways to fix it.


This is a problem that seriously hurts Postgres in data
warehousing applications. We can't keep ignoring optimizations
that provide even as little as 10% execution improvements for 10x
worse planner performance, because in a warehouse it's next to
impossible for planning time to matter.

Obviously it'd be great if there was a fast, easy way to figure
out whether a query would be expensive enough to go the whole 9
yards on planning it but at this point I suspect a simple GUC
would be a big improvement.


Something like "enable_equivalencefilters" but that defaults to false, 
unlike every existing "enable_*" GUC?


It would be a lot more user-friendly to have something along the 
lines of "planner_mode (text)" with labels like "star, transactional, 
bulk_load, etc..." because I suspect there are other things we'd want 
to add if we start identifying queries by their type/usage and 
optimize accordingly.  Having the knobs available is necessary, but 
putting a façade on them would make things more straightforward for 
the user in the common cases.


David J.


How about:

planning_time_base       10  # Default effort, may be increased or
                             # decreased as required - must be at least 1
planning_time_<feature>   0  # By default, the planner makes no (or
                             # minimal) effort to optimise for <feature>


So for some people, adjusting planning_time_base may be sufficient - but 
for more specialised cases, people can tell the planner to consider 
expending more effort.



Cheers,
Gavin





Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Evgeniy Shishkin

> On 07 Dec 2015, at 22:27, Gavin Flower  wrote:
> 
> On 08/12/15 05:27, David G. Johnston wrote:
>> On Mon, Dec 7, 2015 at 8:35 AM, Jim Nasby  wrote:
>> 
>>On 12/6/15 10:38 AM, Tom Lane wrote:
>> 
>>I said "in most cases".  You can find example cases to support
>>almost any
>>weird planner optimization no matter how expensive and
>>single-purpose;
>>but that is the wrong way to think about it.  What you have to
>>think about
>>is average cases, and in particular, not putting a drag on
>>planning time
>>in cases where no benefit ensues.  We're not committing any
>>patches that
>>give one uncommon case an 1100X speedup by penalizing every
>>other query 10%,
>>or even 1%; especially not when there may be other ways to fix it.
>> 
>> 
>>This is a problem that seriously hurts Postgres in data
>>warehousing applications. We can't keep ignoring optimizations
>>that provide even as little as 10% execution improvements for 10x
>>worse planner performance, because in a warehouse it's next to
>>impossible for planning time to matter.
>> 
>>Obviously it'd be great if there was a fast, easy way to figure
>>out whether a query would be expensive enough to go the whole 9
>>yards on planning it but at this point I suspect a simple GUC
>>would be a big improvement.
>> 
>> 
>> Something like "enable_equivalencefilters" but that defaults to false,
>> unlike every existing "enable_*" GUC?
>> 
>> It would be a lot more user-friendly to have something along the lines of 
>> "planner_mode (text)" with labels like "star, transactional, bulk_load, 
>> etc..." because I suspect there are other things we'd want to add if we 
>> start identifying queries by their type/usage and optimize accordingly. 
>> Having the knobs available is necessary, but putting a façade on them 
>> would make things more straightforward for the user in the common cases.
>> 
>> David J.
>> 
> How about:
> 
> planning_time_base       10  # Default effort, may be increased or
>                              # decreased as required - must be at least 1
> planning_time_<feature>   0  # By default, the planner makes no (or
>                              # minimal) effort to optimise for <feature>
> 
> So for some people, adjusting planning_time_base may be sufficient - but for 
> more specialised cases, people can tell the planner to consider expending 
> more effort.
> 

MySQL now has 19 optimizer_switch parameters:
https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

Please don't do that.


I'd rather like some sort of pg_stat_statements which would track execution
and planning time.
For a new query, we could look up whether it can benefit from more planning
time. But I don't know how costly this would be.

> 
> Cheers,
> Gavin
> 
> 
> 





Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Gavin Flower

On 08/12/15 08:34, Evgeniy Shishkin wrote:

On 07 Dec 2015, at 22:27, Gavin Flower  wrote:

On 08/12/15 05:27, David G. Johnston wrote:

On Mon, Dec 7, 2015 at 8:35 AM, Jim Nasby  wrote:

On 12/6/15 10:38 AM, Tom Lane wrote:

I said "in most cases".  You can find example cases to support
almost any
weird planner optimization no matter how expensive and
single-purpose;
but that is the wrong way to think about it.  What you have to
think about
is average cases, and in particular, not putting a drag on
planning time
in cases where no benefit ensues.  We're not committing any
patches that
give one uncommon case an 1100X speedup by penalizing every
other query 10%,
or even 1%; especially not when there may be other ways to fix it.


This is a problem that seriously hurts Postgres in data
warehousing applications. We can't keep ignoring optimizations
that provide even as little as 10% execution improvements for 10x
worse planner performance, because in a warehouse it's next to
impossible for planning time to matter.

Obviously it'd be great if there was a fast, easy way to figure
out whether a query would be expensive enough to go the whole 9
yards on planning it but at this point I suspect a simple GUC
would be a big improvement.


Something like "enable_equivalencefilters" but that defaults to false,
unlike every existing "enable_*" GUC?

It would be a lot more user-friendly to have something along the lines of
"planner_mode (text)" with labels like "star, transactional, bulk_load,
etc..." because I suspect there are other things we'd want to add if we
start identifying queries by their type/usage and optimize accordingly.
Having the knobs available is necessary, but putting a façade on them
would make things more straightforward for the user in the common cases.

David J.


How about:

planning_time_base       10  # Default effort, may be increased or
                             # decreased as required - must be at least 1
planning_time_<feature>   0  # By default, the planner makes no (or
                             # minimal) effort to optimise for <feature>

So for some people, adjusting planning_time_base may be sufficient - but for 
more specialised cases, people can tell the planner to consider expending more 
effort.


MySQL now has 19 optimizer_switch parameters:
https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

I notice that they are either on or off; I suspect it is better to
have some sort of measure of how much extra effort the planner should make.




Please don't do that.
I think having some might be useful - though in most situations, having 
a general indicator to the planner might be sufficient.


From reading the thread, I have the impression that for some extreme 
workloads some extra twiddling would be useful, even though for most 
people it would simply be an unnecessary complication.


In over twenty years I've never needed such knobs, but I might get a 
project next year where they might be useful.  So I agree that for most 
situations such extra stuff is not needed - but I'd like the additional 
options to be available if I ever need them.




I'd rather like some sort of pg_stat_statements which would track execution
and planning time.
For a new query, we could look up whether it can benefit from more planning
time. But I don't know how costly this would be.


Cheers,
Gavin









Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Simon Riggs
On 7 December 2015 at 16:55, Jim Nasby  wrote:


> Unfortunately I've seen a lot of millisecond queries that have 6 figure
> estimates, due to data being in cache. So I'm not sure how practical that
> would be.
>

We seem to be agreed that longer running queries may benefit from more
thorough optimization.

I like the idea of specifying a goal for the optimization, so the user can
explicitly ask to spend more time. But I would also rather that I didn't
have to do that at all, by using a heuristic that allows us to act sensibly
even when not explicitly instructed by the user. It Just Works.

We can decide what the heuristic limit is based upon the cost of the
technique. I don't think it matters how quick your CPU is; what matters is
how long planning takes as a % of the whole expected execution time.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread David Rowley
On 8 December 2015 at 04:35, Jim Nasby  wrote:

> On 12/6/15 10:38 AM, Tom Lane wrote:
>
>> I said "in most cases".  You can find example cases to support almost any
>> weird planner optimization no matter how expensive and single-purpose;
>> but that is the wrong way to think about it.  What you have to think about
>> is average cases, and in particular, not putting a drag on planning time
>> in cases where no benefit ensues.  We're not committing any patches that
>> give one uncommon case an 1100X speedup by penalizing every other query
>> 10%, or even 1%; especially not when there may be other ways to fix it.
>>
>
> This is a problem that seriously hurts Postgres in data warehousing
> applications. We can't keep ignoring optimizations that provide even as
> little as 10% execution improvements for 10x worse planner performance,
> because in a warehouse it's next to impossible for planning time to matter.
>
> Obviously it'd be great if there was a fast, easy way to figure out
> whether a query would be expensive enough to go the whole 9 yards on
> planning it but at this point I suspect a simple GUC would be a big
> improvement.


I've certainly been here before [1], but the idea fell on deaf ears.

The biggest frustration for me is that Tom always seems to argue his
point as if planning time is roughly the same or more expensive
than execution time, and likely that's true in many cases, but I would
imagine in more normal cases that execution time is longer, although I've
never had the guts to stand up and argue this as I don't have any stats to
back me up.

I was talking to Thomas Munro yesterday about this, and he mentioned that
it would be quite nice to have some stats on how much time is spent in the
planner, vs executor. He came up with the idea of adding a column to
pg_stat_statements to record the planning time.
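
As a sketch of what that might look like (total_plan_time below is the
proposed, purely hypothetical column; total_time already exists in
pg_stat_statements):

SELECT query,
       total_plan_time,                                  -- hypothetical
       total_time,
       total_plan_time / nullif(total_time, 0) AS plan_fraction
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;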

If we could get real statistics on real world cases and we discovered that,
for example, on average the total CPU time of planning was only 1% of
execution time, then it would certainly make adding 2% overhead onto the
planner a bit less of a worry, as this would just be 2% of 1% (0.02%). Such
information, fed back into the community, might be able to drive us in
the right direction when it comes to deciding what needs to be done to
resolve this constant issue with accepting planner improvement patches.

I believe that with parallel query on the horizon for 9.6 we're now
aiming to support bigger OLAP type databases than ever before. So if we
ignore patches like this one then it appears that we have some conflicting
goals in the community, as it seems that we're willing to add the brawn, but
we're not willing to add the brain. If this is the case then it's a shame,
as I think we can have both. So I very much agree that we must
find a way to maintain support and high performance of small OLTP databases
too.

[1]
http://www.postgresql.org/message-id/caaphdvrjrz-0xinyiqtiws0mfx17gwd2y8vz+i92nuzsha8...@mail.gmail.com

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-06 Thread Tom Lane
David Rowley  writes:
> On 6 December 2015 at 06:07, Tom Lane  wrote:
>> Another issue that would need consideration is how to avoid skewing
>> planner selectivity estimates with derived clauses that are fully
>> redundant with other clauses.

> Could you give me an example of where this is a problem?

Using the regression database, try

explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand;

explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand 
and a.thousand < 100;

explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand 
and a.thousand < 100 and b.thousand < 100;

The first two give pretty accurate estimates for the join size, the third
not so much, because it thinks b.thousand < 100 is an independent
constraint.

> I've tried fooling
> the planner into giving me bad estimates, but I've not managed to.
> # explain analyze select * from t1 where t1.id < 10 and t1.id < 100 and
> t1.id < 1000;

That particular case works because clausesel.c's AddRangeClause figures
out that the similar inequalities are redundant and drops all but one,
on its way to (not) identifying a range constraint for t1.id.  There's
nothing comparable for constraints on different variables, though,
especially not constraints on Vars of different relations, which will
never even appear in the same restriction list.

> If so, is the current eclass code not prone to the exact same problem?

No, and I already explained why not.

>> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
>> that much; it's certainly far less useful than an equality condition.

> I'd have thought that my link to a thread of a reported 1100 to 2200 times
> performance improvement might have made you think twice about claiming the
> uselessness of this idea.

I said "in most cases".  You can find example cases to support almost any
weird planner optimization no matter how expensive and single-purpose;
but that is the wrong way to think about it.  What you have to think about
is average cases, and in particular, not putting a drag on planning time
in cases where no benefit ensues.  We're not committing any patches that
give one uncommon case an 1100X speedup by penalizing every other query 10%,
or even 1%; especially not when there may be other ways to fix it.

The EC machinery wasn't designed in a vacuum.  It is interested in
deriving equality conditions because those are useful for merge and hash
joins.  It's also tightly tied in with the planner's understanding of sort
ordering and distinct-ness.  None of those considerations provide any
reason to build a similar thing for inequalities.

It may be that computing derived inequalities is something that's worth
adding, but we're going to have to take a very hard look at the costs and
benefits; it's not a slam dunk that such a patch would get committed.

regards, tom lane




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-06 Thread David Rowley
On 6 December 2015 at 06:07, Tom Lane  wrote:

> David Rowley  writes:
> > As of today these Equivalence Classes only incorporate expressions which
> > use the equality operator, but what if the above query had been written
> as:
>
> > select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;
>
> > Should we not be able to assume that t2.id is also <= 10?



> This sort of thing has been suggested multiple times before, and I've
> rejected it every time on the grounds that it would likely add far more
> planner cycles than it'd be worth, eg, time would be spent on searches for
> matching subexpressions whether or not anything was learned (and often
> nothing would be learned).


I guess if it keeps coming up then people must be hitting this limitation.
Perhaps continually rejecting it based on your original opinion is not the
best way to move forward with improving the query planner.


>   While I've only read your description of the
> patch not the patch itself, the search methodology you propose seems
> pretty brute-force and unlikely to solve that issue.  It's particularly
> important to avoid O(N^2) behaviors when there are N expressions ...
>
>

Likely it would be possible to do something to improve on that.
Perhaps if the list of filter clauses grows beyond a certain number then a
hash table can be built on the ec_members list with the em_expr as the key
and the EquivalenceClass as the data. Although likely we don't currently
have anything available for generating hash values for Expr nodes. On
checking it seems that 32 is the estimated tipping point for hashing
in find_join_rel(), perhaps something similar would suit this.


> Another issue that would need consideration is how to avoid skewing
> planner selectivity estimates with derived clauses that are fully
> redundant with other clauses.  The existing EC machinery is mostly
> able to dodge that problem by generating just a minimal set of equality
> clauses from an EC, but I don't see how we'd make that work here.
>
>
Could you give me an example of where this is a problem? I've tried fooling
the planner into giving me bad estimates, but I've not managed to.

# explain analyze select * from t1 where t1.id < 10 and t1.id < 100 and
t1.id < 1000;
                            QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using t1_pkey on t1  (cost=0.29..8.49 rows=9 width=8)
   (actual time=0.155..0.160 rows=9 loops=1)
   Index Cond: ((id < 10) AND (id < 100) AND (id < 1000))

I'd say the t1.id < 100 AND t1.id < 1000 are fully redundant here. The
estimate given by the planner is bang-on.

Perhaps you're talking about another column making the pushed down qual
redundant?  If so, is the current eclass code not prone to the exact same
problem?

> I'm also wondering why you want to limit it to comparisons to constants;
> that seems rather arbitrary.
>
>
Do you have other useful cases in mind?
The other one I considered was "expr op expr" clauses, where both those
exprs were in eclasses. For example:

select * from t1 inner join t2 on t1.col1=t2.col1 and t1.col2=t2.col2 where
t1.col1 < t1.col2;

I'd imagine that would have a narrower use case. I simply stopped with
"Expr op Const" because I thought that would be more common and less planner
overhead to detect. Given that below you also raise concerns that "expr op
const" is likely not useful in enough cases, I'm not holding out
too much hope of adding more complexity to the detection mechanism for less
common wins.


> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
> that much; it's certainly far less useful than an equality condition.
> It's not difficult to imagine that this would often be a net drag on
> runtime performance (never mind planner performance) by doing nothing
> except creating additional filter conditions the executor has to check.
> Certainly it would be valuable to know this if it let us exclude some
> partition of a table, but that's only going to happen in a small minority
> of queries.
>
>
I'd have thought that my link to a thread of a reported 1100 to 2200 times
performance improvement might have made you think twice about claiming the
uselessness of this idea.

I'm also quite surprised at you mentioning partitions here. Personally I'd
be thinking of indexes long before thinking of partitions. I don't think I
need to remind you that btree indexes handle >= and <= just fine. It's not
all that hard to imagine that if they're using that column for a join
condition and are serious about performance that they just might also have
an index defined on that column too. I'd imagine the only case we might
have to worry about is when the selectivity of the pushed qual is getting
close to 1. Perhaps the RestrictInfos could be marked as "optional"
somehow, and we could simply remove them when their selectivity 

Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-05 Thread Tom Lane
David Rowley  writes:
> As of today these Equivalence Classes only incorporate expressions which
> use the equality operator, but what if the above query had been written as:

> select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;

> Should we not be able to assume that t2.id is also <= 10?

This sort of thing has been suggested multiple times before, and I've
rejected it every time on the grounds that it would likely add far more
planner cycles than it'd be worth, eg, time would be spent on searches for
matching subexpressions whether or not anything was learned (and often
nothing would be learned).  While I've only read your description of the
patch not the patch itself, the search methodology you propose seems
pretty brute-force and unlikely to solve that issue.  It's particularly
important to avoid O(N^2) behaviors when there are N expressions ...

Another issue that would need consideration is how to avoid skewing
planner selectivity estimates with derived clauses that are fully
redundant with other clauses.  The existing EC machinery is mostly
able to dodge that problem by generating just a minimal set of equality
clauses from an EC, but I don't see how we'd make that work here.

I'm also wondering why you want to limit it to comparisons to constants;
that seems rather arbitrary.

Lastly, in most cases knowing that t2.id <= 10 is just not worth all
that much; it's certainly far less useful than an equality condition.
It's not difficult to imagine that this would often be a net drag on
runtime performance (never mind planner performance) by doing nothing
except creating additional filter conditions the executor has to check.
Certainly it would be valuable to know this if it let us exclude some
partition of a table, but that's only going to happen in a small minority
of queries.

I'm not necessarily opposed to doing anything in this area, but so far
I've not seen how to do it in a way that is attractive when planner
complexity, performance, and end results are all considered.

regards, tom lane

