On Wed, Nov 20, 2019 at 8:15 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> Hi Hackers: > > First I found the following queries running bad on pg. > > select count(*) from part2 p1 where p_size > 40 and p_retailprice > > (select avg(p_retailprice) from part2 p2 where p2.p_brand=p1.p_brand); > > the plan is > QUERY PLAN > > ------------------------------------------------------------------------------------ > Aggregate (cost=1899310537.28..1899310537.29 rows=1 width=8) > -> Seq Scan on part2 p1 (cost=0.00..1899310456.00 rows=32513 width=0) > Filter: ((p_size > 40) AND (p_retailprice > (SubPlan 1))) > SubPlan 1 > -> Aggregate (cost=6331.00..6331.01 rows=1 width=32) > -> Seq Scan on part2 p2 (cost=0.00..5956.00 rows=150000 > width=4) > Filter: (p_brand = p1.p_brand) > > however if we change it to the following format, it runs pretty quick. > > select count(*) from part2, > (select p_brand, avg(p_retailprice) as avg_price from part2 where p_size > > 40 group by p_brand) p2 > where p_retailprice > p2.avg_price > and p_size > 40 > and part2.p_brand = p2.p_brand; > > The above example comes from > https://community.pivotal.io/s/article/Pivotal-Query-Optimizer-Explained with > a litter modification. > > 1. why pg can't translate the query 1 to query 2. after some checking > on pull_up_sublinks_qual_recurse, I still doesn't get the idea. > 2. why pg can't do it, while greenplum can? > > Thanks > > add the sql I used for testing for reference. CREATE TABLE part2 ( p_partkey integer NOT NULL, p_brand character(10) NOT NULL, p_size integer NOT NULL, p_retailprice numeric(15,2) NOT NULL ); insert into part2 select 1, 'brand1', random_between(0, 40), random_between(0, 40) from generate_series(1, 100000); insert into part2 select 2, 'brand2', random_between(40, 80), random_between(0, 40) from generate_series(1, 100000); insert into part2 select 3, 'brand1', random_between(0, 40), random_between(0, 40) from generate_series(1, 100000);