Hi: I just tested more cases for the estimation issue for this feature, and we can find **we get a more accurate/stable estimation than before**. Here is the test cases and result (by comparing the master version and patched version).
create table ec_t110 as select i::int as a from generate_series(1, 110) i; create table ec_t200 as select i::int as a from generate_series(1, 200) i; create table ec_t500 as select i::int as a from generate_series(1, 500) i; create table ec_t800 as select i::int as a from generate_series(1, 800) i; create table ec_t1000 as select i::int as a from generate_series(1, 1000) i; analyze; -- 2 table joins. explain analyze select * from ec_t1000, ec_t110 where ec_t1000.a = ec_t110.a and ec_t1000.a > 100; -- (0.9) explain analyze select * from ec_t1000, ec_t110 where ec_t1000.a = ec_t110.a and ec_t110.a > 100; -- (0.1) -- 3 table joins. explain analyze select * from ec_t1000, ec_t110 , ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t1000.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t110.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t200.a > 100; -- 4 table joins. explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t1000.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t110.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t200.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a > 100; -- 5 table joins. explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t1000.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t110.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t200.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t500.a > 100; explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t800.a > 100; | Query Id | Real rows | Est. Rows at master | Est. rows with patched | table # | |----------+-----------+---------------------+------------------------+---------| | 1 | 10 | 99 | 10 | 2 | | 2 | 10 | 10 | 10 | 2 | | 3 | 10 | 20 | 11 | 3 | | 4 | 10 | 2 | 11 | 3 | | 5 | 10 | 11 | 11 | 3 | | 6 | 10 | 10 | 9 | 4 | | 7 | 10 | 1 | 9 | 4 | | 8 | 10 | 6 | 9 | 4 | | 9 | 10 | 9 | 9 | 4 | | 10 | 10 | 8 | 8 | 5 | | 11 | 10 | 1 | 8 | 5 | | 12 | 10 | 5 | 8 | 5 | | 13 | 10 | 7 | 8 | 5 | | 14 | 10 | 8 | 8 | 5 | In the past, we can just use the qual user provided to do estimation. As for now, since we introduce the CorrectiveQuals design, we still keep just only 1 qual counted, but we can choose the best one in CorrectiveQuals no matter which one is provided by the user. we gain a better and stable estimation because of this. I'm happy about the overall design but not pretty confident about the method to "choose the best one to keep". So I did some test case as many as I can to find something is wrong, so far so good. I'm also happy with how to keep only one qual in CorrectiveQuals (not choose the best one). Assume we just have 1 EC filter in this query for simplicity. At the beginning, all the baserel have been impacted by CorrectiveQual. When join 2 relations, we rollback 1 side and keep the other one. when we join this joinrel with another rel, we rollback 1 side and keep the other one and so forth. (rollback is not changing some things which we already computed, it is only used when estimating size for coming joinrel). The patchset can be applied cleanly with 9e98583898c347e007958c8a09911be2ea4acfb9.