Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Tom, Thanks for your input. On Fri, Nov 16, 2018 at 6:06 AM, Tom Lane wrote: > Richard Guo writes: > > On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane wrote: > >> What is the argument that this occurs often enough to be worth expending > >> extra cycles and code space on? > > > I am not using an ORM, but just considering maybe it would be better if > > PostgreSQL can do such pull-up. > > Tom, what's your suggestion? Is it worthwhile expending several lines of > > codes to do this pull-up? > > Well, really, if we're just doing this on speculation and don't even have > any concrete indication that anybody writes code like that, I can't get > excited about expending even a few lines of code on it. > > The reason why we perform optimizations similar to this in places like > eval_const_expressions is (IMO anyway) that transformations such as > function inlining and subquery pullup can create parse trees that look > like this, even when the original query was perfectly sane and without > obvious redundancy. However, because pull_up_sublinks runs before any > of that, it would only ever see NOT NOT if someone had actually written > such a thing. So to justify expending any code space or cycles on > optimizing this, you have to make the case that that actually happens > in the wild, and does so often enough to justify wasting some (admittedly > small) number of cycles for everybody else. I'd kind of like to see some > actual field occurrence of NOT NOT over an optimizable IN/EXISTS before > deciding that it's worth doing. > > It's sort of annoying that we have to run pull_up_sublinks before we do > scalar expression simplification. If we could do that in the other order, > NOT NOT elimination would fall out automatically, and we'd also be able > to recognize some other cases where it initially seems that an IN or > EXISTS is not at top level, but it becomes so after we const-fold, apply > DeMorgan's laws, etc. However, according to the point I made above, > it makes more sense to apply expression simplification after we've > completed pullup-like operations. So we can't really get there from > here unless we wanted to do (parts of?) expression simplification twice, > which is unlikely to win often enough to justify the cost. > Agree! If we do expression simplification in the other order, we have to do that twice, which is not a good idea. > > So I'm inclined to reject this patch as probably being a net loss in > almost all cases. If you can show any real-world cases where it wins, > we could reconsider. > I am convinced. I do not see this happen often enough in real world neither. Feel free to reject his patch. Thanks Richard
Re: Pull up sublink of type 'NOT NOT (expr)'
Richard Guo writes: > On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane wrote: >> What is the argument that this occurs often enough to be worth expending >> extra cycles and code space on? > I am not using an ORM, but just considering maybe it would be better if > PostgreSQL can do such pull-up. > Tom, what's your suggestion? Is it worthwhile expending several lines of > codes to do this pull-up? Well, really, if we're just doing this on speculation and don't even have any concrete indication that anybody writes code like that, I can't get excited about expending even a few lines of code on it. The reason why we perform optimizations similar to this in places like eval_const_expressions is (IMO anyway) that transformations such as function inlining and subquery pullup can create parse trees that look like this, even when the original query was perfectly sane and without obvious redundancy. However, because pull_up_sublinks runs before any of that, it would only ever see NOT NOT if someone had actually written such a thing. So to justify expending any code space or cycles on optimizing this, you have to make the case that that actually happens in the wild, and does so often enough to justify wasting some (admittedly small) number of cycles for everybody else. I'd kind of like to see some actual field occurrence of NOT NOT over an optimizable IN/EXISTS before deciding that it's worth doing. It's sort of annoying that we have to run pull_up_sublinks before we do scalar expression simplification. If we could do that in the other order, NOT NOT elimination would fall out automatically, and we'd also be able to recognize some other cases where it initially seems that an IN or EXISTS is not at top level, but it becomes so after we const-fold, apply DeMorgan's laws, etc. However, according to the point I made above, it makes more sense to apply expression simplification after we've completed pullup-like operations. So we can't really get there from here unless we wanted to do (parts of?) expression simplification twice, which is unlikely to win often enough to justify the cost. So I'm inclined to reject this patch as probably being a net loss in almost all cases. If you can show any real-world cases where it wins, we could reconsider. regards, tom lane
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Tom, Thanks for reviewing. On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane wrote: > Richard Guo writes: > > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink > would > > not be considered when pulling up sublinks. > > Yup. > > > Should we give it a chance, like the attached does? > > What is the argument that this occurs often enough to be worth expending > extra cycles and code space on? > > If we do do something like this, I'd be inclined to make it handle > any-number-of-consecutive-NOTs, and maybe remove NOT NOT over an ANY, > not just EXISTS. But I don't honestly think that it's worth troubling > over. Do even the dumbest ORMs generate such code? > What this patch does is to recursively remove NOT NOT over a SubLink, so it actually can handle any-number-of-consecutive-NOTs, both over ANY and over EXISTS. Over ANY: gpadmin=# explain select * from a where not not not not a.i in (select i from b); QUERY PLAN --- Hash Join (cost=42.75..93.85 rows=1130 width=8) Hash Cond: (a.i = b.i) -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8) -> Hash (cost=40.25..40.25 rows=200 width=4) -> HashAggregate (cost=38.25..40.25 rows=200 width=4) Group Key: b.i -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=4) (7 rows) Over EXISTS: gpadmin=# explain select * from a where not not not not exists (select 1 from b where a.i = b.i); QUERY PLAN --- Hash Join (cost=42.75..93.85 rows=1130 width=8) Hash Cond: (a.i = b.i) -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8) -> Hash (cost=40.25..40.25 rows=200 width=4) -> HashAggregate (cost=38.25..40.25 rows=200 width=4) Group Key: b.i -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=4) (7 rows) I am not using an ORM, but just considering maybe it would be better if PostgreSQL can do such pull-up. Tom, what's your suggestion? Is it worthwhile expending several lines of codes to do this pull-up? Thanks Richard > >
Re: Pull up sublink of type 'NOT NOT (expr)'
Richard Guo writes: > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would > not be considered when pulling up sublinks. Yup. > Should we give it a chance, like the attached does? What is the argument that this occurs often enough to be worth expending extra cycles and code space on? If we do do something like this, I'd be inclined to make it handle any-number-of-consecutive-NOTs, and maybe remove NOT NOT over an ANY, not just EXISTS. But I don't honestly think that it's worth troubling over. Do even the dumbest ORMs generate such code? regards, tom lane
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Alex, Yes hashed SubPlan preserves order and may be faster than hash join in some cases. But I don't think that is a reason good enough to prevent the subplan from being converted to join. Let's suppose the subplan is uncorrelated, otherwise hashed SubPlan would not be used. Hashed SubPlan can only be applied when the size of the subquery result can fit in work_mem. When the data size increases or work_mem is set to a smaller value that more than one batch is needed, hashed SubPlan will not be used and the total cost will increase dramatically. Hash join, by contrast, handles multiple batches better. Below is an example to show the behavior of hash join and hashed SubPlan for single batch and multiple batches. create table a (i int); create table b (i int); insert into a select i from generate_series(1,1)i; insert into b select i from generate_series(1,1)i; *For single batch:* *Hash Join* gpadmin=# explain analyze select * from a where a.i in (select b.i from b); QUERY PLAN - Hash Semi Join (cost=270.00..552.50 rows=1 width=4) (actual time=4.332..9.499 rows=1 loops=1) Hash Cond: (a.i = b.i) -> Seq Scan on a (cost=0.00..145.00 rows=1 width=4) (actual time=0.016..1.328 rows=1 loops=1) -> Hash (cost=145.00..145.00 rows=1 width=4) (actual time=4.292..4.292 rows=1 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 480kB -> Seq Scan on b (cost=0.00..145.00 rows=1 width=4) (actual time=0.013..1.817 rows=1 loops=1) Planning Time: 0.236 ms Execution Time: 10.099 ms (8 rows) *Hashed SubPlan* gpadmin=# explain analyze select * from a where not not a.i in (select b.i from b); QUERY PLAN - Seq Scan on a (cost=170.00..340.00 rows=5000 width=4) (actual time=6.359..11.843 rows=1 loops=1) Filter: (hashed SubPlan 1) SubPlan 1 -> Seq Scan on b (cost=0.00..145.00 rows=1 width=4) (actual time=0.017..1.749 rows=1 loops=1) Planning Time: 0.109 ms Execution Time: 12.905 ms (6 rows) insert into b select i from generate_series(1,10)i; insert into b select i from generate_series(1,10)i; *For multiple batches:* *Hash Join* gpadmin=# explain analyze select * from a where a.i in (select b.i from b); QUERY PLAN - Hash Semi Join (cost=6476.00..7659.50 rows=1 width=4) (actual time=89.915..121.016 rows=1 loops=1) Hash Cond: (a.i = b.i) -> Seq Scan on a (cost=0.00..145.00 rows=1 width=4) (actual time=0.020..1.779 rows=1 loops=1) -> Hash (cost=3030.00..3030.00 rows=21 width=4) (actual time=89.666..89.667 rows=21 loops=1) Buckets: 131072 Batches: 4 Memory Usage: 2860kB -> Seq Scan on b (cost=0.00..3030.00 rows=21 width=4) (actual time=0.015..37.686 rows=21 loops=1) Planning Time: 0.256 ms Execution Time: 121.769 ms (8 rows) *Plain SubPlan* gpadmin=# explain analyze select * from a where not not a.i in (select b.i from b); QUERY PLAN - Seq Scan on a (cost=0.00..27130170.00 rows=5000 width=4) (actual time=0.030..9994.036 rows=1 loops=1) Filter: (SubPlan 1) SubPlan 1 -> Materialize (cost=0.00..4901.00 rows=21 width=4) (actual time=0.000..0.359 rows=5000 loops=1) -> Seq Scan on b (cost=0.00..3030.00 rows=21 width=4) (actual time=0.010..1.673 rows=1 loops=1) Planning Time: 0.077 ms Execution Time: 9995.556 ms (7 rows) Thanks Richard On Wed, Oct 24, 2018 at 12:39 AM, Alexey Bashtanov wrote: > Hello Richard, > > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would > not > be considered when pulling up sublinks. For instance: > > gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from > b)); > QUERY PLAN > - > Seq Scan on a (cost=51.50..85.62 rows=1005 width=8) >Filter: (hashed SubPlan 1) >SubPlan 1 > -> Seq Scan on b (cost=0.00..44.00 rows=3000 width=4) > (4 rows) > > > Should we give it a chance, like the attached does? > > > Sometimes hashed subplan is faster than hash join and than all the other > options, as it preserves the order. > Using NOT NOT, one can control whether to use it or not: > https://pgblog.bashtanov.com/2017/12/08/double-negative- > and-query-performance/ (test case and results in the bottom of
Re: Pull up sublink of type 'NOT NOT (expr)'
Hello Richard, Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would not be considered when pulling up sublinks. For instance: gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from b)); QUERY PLAN - Seq Scan on a (cost=51.50..85.62 rows=1005 width=8) Filter: (hashed SubPlan 1) SubPlan 1 -> Seq Scan on b (cost=0.00..44.00 rows=3000 width=4) (4 rows) Should we give it a chance, like the attached does? Sometimes hashed subplan is faster than hash join and than all the other options, as it preserves the order. Using NOT NOT, one can control whether to use it or not: https://pgblog.bashtanov.com/2017/12/08/double-negative-and-query-performance/ (test case and results in the bottom of the page). Surely dirty tricks should not be the way to control the planner, but when breaking them we should probably provide a way to achieve the same result, ideally making the planner choose the best plan without hints. Best, Alex
Pull up sublink of type 'NOT NOT (expr)'
Hi hackers, Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would not be considered when pulling up sublinks. For instance: gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from b)); QUERY PLAN - Seq Scan on a (cost=51.50..85.62 rows=1005 width=8) Filter: (hashed SubPlan 1) SubPlan 1 -> Seq Scan on b (cost=0.00..44.00 rows=3000 width=4) (4 rows) Should we give it a chance, like the attached does? Thanks Richard 0001-Pull-up-sublink-of-type-NOT-NOT-expr.patch Description: Binary data