subject:"Pull up sublink of type 'NOT NOT \(expr\)'"

Re: Pull up sublink of type 'NOT NOT (expr)'

2018-11-18 Thread Richard Guo

Hi Tom,

Thanks for your input.

On Fri, Nov 16, 2018 at 6:06 AM, Tom Lane  wrote:

> Richard Guo  writes:
> > On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane  wrote:
> >> What is the argument that this occurs often enough to be worth expending
> >> extra cycles and code space on?
>
> > I am not using an ORM, but just considering maybe it would be better if
> > PostgreSQL can do such pull-up.
> > Tom, what's your suggestion? Is it worthwhile expending several lines of
> > codes to do this pull-up?
>
> Well, really, if we're just doing this on speculation and don't even have
> any concrete indication that anybody writes code like that, I can't get
> excited about expending even a few lines of code on it.
>
> The reason why we perform optimizations similar to this in places like
> eval_const_expressions is (IMO anyway) that transformations such as
> function inlining and subquery pullup can create parse trees that look
> like this, even when the original query was perfectly sane and without
> obvious redundancy.  However, because pull_up_sublinks runs before any
> of that, it would only ever see NOT NOT if someone had actually written
> such a thing.  So to justify expending any code space or cycles on
> optimizing this, you have to make the case that that actually happens
> in the wild, and does so often enough to justify wasting some (admittedly
> small) number of cycles for everybody else.  I'd kind of like to see some
> actual field occurrence of NOT NOT over an optimizable IN/EXISTS before
> deciding that it's worth doing.
>
> It's sort of annoying that we have to run pull_up_sublinks before we do
> scalar expression simplification.  If we could do that in the other order,
> NOT NOT elimination would fall out automatically, and we'd also be able
> to recognize some other cases where it initially seems that an IN or
> EXISTS is not at top level, but it becomes so after we const-fold, apply
> DeMorgan's laws, etc.  However, according to the point I made above,
> it makes more sense to apply expression simplification after we've
> completed pullup-like operations.  So we can't really get there from
> here unless we wanted to do (parts of?) expression simplification twice,
> which is unlikely to win often enough to justify the cost.
>

Agree! If we do expression simplification in the other order, we have to do
that twice,
which is not a good idea.


>
> So I'm inclined to reject this patch as probably being a net loss in
> almost all cases.  If you can show any real-world cases where it wins,
> we could reconsider.
>

I am convinced. I do not see this happen often enough in real world neither.
Feel free to reject his patch.

Thanks
Richard

Re: Pull up sublink of type 'NOT NOT (expr)'

2018-11-15 Thread Tom Lane

Richard Guo  writes:
> On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane  wrote:
>> What is the argument that this occurs often enough to be worth expending
>> extra cycles and code space on?

> I am not using an ORM, but just considering maybe it would be better if
> PostgreSQL can do such pull-up.
> Tom, what's your suggestion? Is it worthwhile expending several lines of
> codes to do this pull-up?

Well, really, if we're just doing this on speculation and don't even have
any concrete indication that anybody writes code like that, I can't get
excited about expending even a few lines of code on it.

The reason why we perform optimizations similar to this in places like
eval_const_expressions is (IMO anyway) that transformations such as
function inlining and subquery pullup can create parse trees that look
like this, even when the original query was perfectly sane and without
obvious redundancy.  However, because pull_up_sublinks runs before any
of that, it would only ever see NOT NOT if someone had actually written
such a thing.  So to justify expending any code space or cycles on
optimizing this, you have to make the case that that actually happens
in the wild, and does so often enough to justify wasting some (admittedly
small) number of cycles for everybody else.  I'd kind of like to see some
actual field occurrence of NOT NOT over an optimizable IN/EXISTS before
deciding that it's worth doing.

It's sort of annoying that we have to run pull_up_sublinks before we do
scalar expression simplification.  If we could do that in the other order,
NOT NOT elimination would fall out automatically, and we'd also be able
to recognize some other cases where it initially seems that an IN or
EXISTS is not at top level, but it becomes so after we const-fold, apply
DeMorgan's laws, etc.  However, according to the point I made above,
it makes more sense to apply expression simplification after we've
completed pullup-like operations.  So we can't really get there from
here unless we wanted to do (parts of?) expression simplification twice,
which is unlikely to win often enough to justify the cost.

So I'm inclined to reject this patch as probably being a net loss in
almost all cases.  If you can show any real-world cases where it wins,
we could reconsider.

regards, tom lane

Re: Pull up sublink of type 'NOT NOT (expr)'

2018-11-14 Thread Richard Guo

Hi Tom,

Thanks for reviewing.

On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane  wrote:

> Richard Guo  writes:
> > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink
> would
> > not be considered when pulling up sublinks.
>
> Yup.
>
> > Should we give it a chance, like the attached does?
>
> What is the argument that this occurs often enough to be worth expending
> extra cycles and code space on?
>
> If we do do something like this, I'd be inclined to make it handle
> any-number-of-consecutive-NOTs, and maybe remove NOT NOT over an ANY,
> not just EXISTS.  But I don't honestly think that it's worth troubling
> over.  Do even the dumbest ORMs generate such code?
>

What this patch does is to recursively remove NOT NOT over a SubLink, so it
actually can handle any-number-of-consecutive-NOTs, both over ANY and over
EXISTS.

Over ANY:

gpadmin=# explain select * from a where not not not not a.i in (select i
from b);
  QUERY PLAN
---
 Hash Join  (cost=42.75..93.85 rows=1130 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=40.25..40.25 rows=200 width=4)
 ->  HashAggregate  (cost=38.25..40.25 rows=200 width=4)
   Group Key: b.i
   ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=4)
(7 rows)

Over EXISTS:

gpadmin=# explain select * from a where not not not not exists (select 1
from b where a.i = b.i);
  QUERY PLAN
---
 Hash Join  (cost=42.75..93.85 rows=1130 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=40.25..40.25 rows=200 width=4)
 ->  HashAggregate  (cost=38.25..40.25 rows=200 width=4)
   Group Key: b.i
   ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=4)
(7 rows)


I am not using an ORM, but just considering maybe it would be better if
PostgreSQL can do such pull-up.
Tom, what's your suggestion? Is it worthwhile expending several lines of
codes to do this pull-up?

Thanks
Richard

>
>

Re: Pull up sublink of type 'NOT NOT (expr)'

2018-11-12 Thread Tom Lane

Richard Guo  writes:
> Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would
> not be considered when pulling up sublinks.

Yup.

> Should we give it a chance, like the attached does?

What is the argument that this occurs often enough to be worth expending
extra cycles and code space on?

If we do do something like this, I'd be inclined to make it handle
any-number-of-consecutive-NOTs, and maybe remove NOT NOT over an ANY,
not just EXISTS.  But I don't honestly think that it's worth troubling
over.  Do even the dumbest ORMs generate such code?

regards, tom lane

Re: Pull up sublink of type 'NOT NOT (expr)'

2018-10-25 Thread Richard Guo

Hi Alex,

Yes hashed SubPlan preserves order and may be faster than hash join in some
cases. But I don't think that is a reason good enough to prevent the subplan
from being converted to join.

Let's suppose the subplan is uncorrelated, otherwise hashed SubPlan would
not
be used. Hashed SubPlan can only be applied when the size of the subquery
result can fit in work_mem. When the data size increases or work_mem is set
to
a smaller value that more than one batch is needed, hashed SubPlan will not
be
used and the total cost will increase dramatically. Hash join, by contrast,
handles multiple batches better.

Below is an example to show the behavior of hash join and hashed SubPlan for
single batch and multiple batches.

create table a (i int);
create table b (i int);
insert into a select i from generate_series(1,1)i;
insert into b select i from generate_series(1,1)i;

*For single batch:*

*Hash Join*
gpadmin=# explain analyze select * from a where a.i in (select b.i from b);
   QUERY PLAN
-
 Hash Semi Join  (cost=270.00..552.50 rows=1 width=4) (actual
time=4.332..9.499 rows=1 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=1 width=4) (actual
time=0.016..1.328 rows=1 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=1 width=4) (actual
time=4.292..4.292 rows=1 loops=1)
 Buckets: 16384  Batches: 1  Memory Usage: 480kB
 ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=4) (actual
time=0.013..1.817 rows=1 loops=1)
 Planning Time: 0.236 ms
 Execution Time: 10.099 ms
(8 rows)

*Hashed SubPlan*
gpadmin=# explain analyze select * from a where not not a.i in (select b.i
from b);
 QUERY PLAN
-
 Seq Scan on a  (cost=170.00..340.00 rows=5000 width=4) (actual
time=6.359..11.843 rows=1 loops=1)
   Filter: (hashed SubPlan 1)
   SubPlan 1
 ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=4) (actual
time=0.017..1.749 rows=1 loops=1)
 Planning Time: 0.109 ms
 Execution Time: 12.905 ms
(6 rows)

insert into b select i from generate_series(1,10)i;
insert into b select i from generate_series(1,10)i;


*For multiple batches:*


*Hash Join*
gpadmin=# explain analyze select * from a where a.i in (select b.i from b);
 QUERY PLAN
-
 Hash Semi Join  (cost=6476.00..7659.50 rows=1 width=4) (actual
time=89.915..121.016 rows=1 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=1 width=4) (actual
time=0.020..1.779 rows=1 loops=1)
   ->  Hash  (cost=3030.00..3030.00 rows=21 width=4) (actual
time=89.666..89.667 rows=21 loops=1)
 Buckets: 131072  Batches: 4  Memory Usage: 2860kB
 ->  Seq Scan on b  (cost=0.00..3030.00 rows=21 width=4)
(actual time=0.015..37.686 rows=21 loops=1)
 Planning Time: 0.256 ms
 Execution Time: 121.769 ms
(8 rows)

*Plain SubPlan*
gpadmin=# explain analyze select * from a where not not a.i in (select b.i
from b);
 QUERY PLAN
-
 Seq Scan on a  (cost=0.00..27130170.00 rows=5000 width=4) (actual
time=0.030..9994.036 rows=1 loops=1)
   Filter: (SubPlan 1)
   SubPlan 1
 ->  Materialize  (cost=0.00..4901.00 rows=21 width=4) (actual
time=0.000..0.359 rows=5000 loops=1)
   ->  Seq Scan on b  (cost=0.00..3030.00 rows=21 width=4)
(actual time=0.010..1.673 rows=1 loops=1)
 Planning Time: 0.077 ms
 Execution Time: 9995.556 ms
(7 rows)

Thanks
Richard

On Wed, Oct 24, 2018 at 12:39 AM, Alexey Bashtanov 
wrote:

> Hello Richard,
>
> Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would
> not
> be considered when pulling up sublinks. For instance:
>
> gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from
> b));
>  QUERY PLAN
> -
>  Seq Scan on a  (cost=51.50..85.62 rows=1005 width=8)
>Filter: (hashed SubPlan 1)
>SubPlan 1
>  ->  Seq Scan on b  (cost=0.00..44.00 rows=3000 width=4)
> (4 rows)
>
>
> Should we give it a chance, like the attached does?
>
>
> Sometimes hashed subplan is faster than hash join and than all the other
> options, as it preserves the order.
> Using NOT NOT, one can control whether to use it or not:
> https://pgblog.bashtanov.com/2017/12/08/double-negative-
> and-query-performance/ (test case and results in the bottom of

Re: Pull up sublink of type 'NOT NOT (expr)'

2018-10-23 Thread Alexey Bashtanov


Hello Richard,

Currently for quals in the form of "NOT NOT (SubLink)", this SubLink 
would not

be considered when pulling up sublinks. For instance:

gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i 
from b));

     QUERY PLAN
-
 Seq Scan on a (cost=51.50..85.62 rows=1005 width=8)
   Filter: (hashed SubPlan 1)
   SubPlan 1
     ->  Seq Scan on b  (cost=0.00..44.00 rows=3000 width=4)
(4 rows)


Should we give it a chance, like the attached does?


Sometimes hashed subplan is faster than hash join and than all the other 
options, as it preserves the order.

Using NOT NOT, one can control whether to use it or not:
https://pgblog.bashtanov.com/2017/12/08/double-negative-and-query-performance/ 
(test case and results in the bottom of the page).


Surely dirty tricks should not be the way to control the planner, but 
when breaking them we should probably provide a way to achieve the same 
result,

ideally making the planner choose the best plan without hints.

Best,
  Alex

Pull up sublink of type 'NOT NOT (expr)'

2018-10-17 Thread Richard Guo

Hi hackers,

Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would
not
be considered when pulling up sublinks. For instance:

gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from
b));
 QUERY PLAN
-
 Seq Scan on a  (cost=51.50..85.62 rows=1005 width=8)
   Filter: (hashed SubPlan 1)
   SubPlan 1
 ->  Seq Scan on b  (cost=0.00..44.00 rows=3000 width=4)
(4 rows)


Should we give it a chance, like the attached does?

Thanks
Richard


0001-Pull-up-sublink-of-type-NOT-NOT-expr.patch
Description: Binary data

Re: Pull up sublink of type 'NOT NOT (expr)'

Re: Pull up sublink of type 'NOT NOT (expr)'

Re: Pull up sublink of type 'NOT NOT (expr)'

Re: Pull up sublink of type 'NOT NOT (expr)'

Re: Pull up sublink of type 'NOT NOT (expr)'

Re: Pull up sublink of type 'NOT NOT (expr)'

Pull up sublink of type 'NOT NOT (expr)'

7 matches

Site Navigation

Mail list logo

Footer information