Hello,

I verified that the patch is generally faster in my benchmarks, with one exception: anti joins with heavy duplication end up significantly slower, for example:

create table ao (a int not null);
create table ai (k int not null);
insert into ao select g from generate_series(1,100000) g;
insert into ai select g % 50 from generate_series(1,2000000) g;
analyze ao;
analyze ai;
\timing on
explain (analyze, costs off, timing off, summary off)
select count(*) from ao where a not in (select distinct k from ai);

This seems related to parallelization: in these scenarios the patched version chooses a serial plan, whereas master parallelizes the deduplication, and the patched version ends up 2-4x slower. If I force the patched version to use parallel workers, it is faster than master even in these cases.
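
For anyone who wants to reproduce the forced-parallel case, one way to push the planner toward a parallel plan is sketched below. It uses only standard planner settings and is not necessarily the exact configuration behind the numbers above; the worker count of 4 is an arbitrary choice for illustration.

```sql
-- Assumption: these settings are one possible way to force parallelism,
-- not necessarily the ones used for the timings reported above.
set max_parallel_workers_per_gather = 4;  -- arbitrary worker count
set parallel_setup_cost = 0;              -- make starting workers look free
set parallel_tuple_cost = 0;              -- make worker-to-leader transfer look free
set min_parallel_table_scan_size = 0;     -- consider parallel scans for any table size

-- Optionally pin the number of workers planned for the inner table.
alter table ai set (parallel_workers = 4);

explain (analyze, costs off, timing off, summary off)
select count(*) from ao where a not in (select distinct k from ai);

-- Restore the defaults afterwards.
reset max_parallel_workers_per_gather;
reset parallel_setup_cost;
reset parallel_tuple_cost;
reset min_parallel_table_scan_size;
alter table ai reset (parallel_workers);
```

Comparing the plan from this run against the default one on both master and the patched build should show whether the slowdown comes purely from the serial plan choice.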
