Re: Using each rel as both outer and inner for JOIN_ANTI
On Fri, Apr 7, 2023 at 3:28 PM Richard Guo wrote: > On Tue, Aug 2, 2022 at 3:13 PM Richard Guo wrote: > >> On Sun, Jul 31, 2022 at 12:07 AM Tom Lane wrote: >> >>> [ wanders away wondering if JOIN_RIGHT_SEMI should become a thing ... ] >> >> Maybe this is something we can do. Currently for the query below: >> >> # explain select * from foo where a in (select c from bar); >>QUERY PLAN >> - >> Hash Semi Join (cost=154156.00..173691.29 rows=10 width=8) >>Hash Cond: (foo.a = bar.c) >>-> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) >>-> Hash (cost=72124.00..72124.00 rows=500 width=4) >> -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=4) >> (5 rows) >> >> I believe we can get a cheaper plan if we are able to swap the outer and >> inner for SEMI JOIN and use the smaller 'foo' as inner rel. >> > It may not be easy for MergeJoin and NestLoop though, as we do not have > a way to know if an inner tuple has been already matched or not. But > the benefit of swapping inputs for MergeJoin and NestLoop seems to be > small, so I think it's OK to ignore them. > Hmm. Actually we can do it for MergeJoin by avoiding restoring inner scan to the marked tuple in EXEC_MJ_TESTOUTER, in the case when new outer tuple == marked tuple. But I'm not sure how much benefit we can get from Merge Right Semi Join. For HashJoin, though, there are cases that can surely benefit from Hash Right Semi Join. So I go ahead and have a try on it as attached. Thanks Richard v1-0001-Support-Right-Semi-Join.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
On Tue, Aug 2, 2022 at 3:13 PM Richard Guo wrote: > On Sun, Jul 31, 2022 at 12:07 AM Tom Lane wrote: > >> [ wanders away wondering if JOIN_RIGHT_SEMI should become a thing ... ] > > Maybe this is something we can do. Currently for the query below: > > # explain select * from foo where a in (select c from bar); >QUERY PLAN > - > Hash Semi Join (cost=154156.00..173691.29 rows=10 width=8) >Hash Cond: (foo.a = bar.c) >-> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) >-> Hash (cost=72124.00..72124.00 rows=500 width=4) > -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=4) > (5 rows) > > I believe we can get a cheaper plan if we are able to swap the outer and > inner for SEMI JOIN and use the smaller 'foo' as inner rel. > I'm thinking about the JOIN_RIGHT_SEMI thing and it seems that it can be implemented for HashJoin with very short change. What we want to do is to just have the first match for each inner tuple. So after scanning the hash bucket for matches, we just need to check whether the inner tuple has been set match and skip it if so, something like { if (!ExecScanHashBucket(node, econtext)) { /* out of matches; check for possible outer-join fill */ node->hj_JoinState = HJ_FILL_OUTER_TUPLE; continue; } } + /* + * In a right-semijoin, we only need the first match for each + * inner tuple. + */ + if (node->js.jointype == JOIN_RIGHT_SEMI && + HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple))) + continue; + I have a simple implementation locally and tried it with the query below and saw a speedup of 2055.617 ms VS. 1156.772 ms (both best of 3). # explain (costs off, analyze) select * from foo where a in (select c from bar); QUERY PLAN --- Hash Semi Join (actual time=1957.748..2055.058 rows=10 loops=1) Hash Cond: (foo.a = bar.c) -> Seq Scan on foo (actual time=0.026..0.029 rows=10 loops=1) -> Hash (actual time=1938.818..1938.819 rows=500 loops=1) Buckets: 262144 Batches: 64 Memory Usage: 4802kB -> Seq Scan on bar (actual time=0.016..853.010 rows=500 loops=1) Planning Time: 0.327 ms Execution Time: 2055.617 ms (8 rows) # explain (costs off, analyze) select * from foo where a in (select c from bar); QUERY PLAN - Hash Right Semi Join (actual time=11.525..1156.713 rows=10 loops=1) Hash Cond: (bar.c = foo.a) -> Seq Scan on bar (actual time=0.034..523.036 rows=500 loops=1) -> Hash (actual time=0.027..0.029 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on foo (actual time=0.009..0.014 rows=10 loops=1) Planning Time: 0.312 ms Execution Time: 1156.772 ms (8 rows) It may not be easy for MergeJoin and NestLoop though, as we do not have a way to know if an inner tuple has been already matched or not. But the benefit of swapping inputs for MergeJoin and NestLoop seems to be small, so I think it's OK to ignore them. So is it worthwhile to make JOIN_RIGHT_SEMI come true? Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Apr 6, 2023 at 6:40 PM Richard Guo wrote: > Seems it wins if the parallel scan becomes part of a hash join in final > plan. I wonder if we have a way to know that in this early stage. I haven't tried but I'm not sure off the top of my head how to make a decision that early unless it's super coarse grained, like is there an anti join in here somewhere... Generally, this seems to be a consequence of our parallel query planner design where parallel paths are generated separately and compete on cost, as foretold by Hong and Stonebraker[1]. It's going to be hard to prune those early without missing out on some good plans, if you don't have a crystal ball. I wondered for a while what horrible technical problems would come up if we could defer creation of paths, so partial_pathlist is empty but a new RelOptInfo flag says "you can call finish_the_partial_pathlist_please(root, rel)) if you really want one". We could try to be sparing about calling it that so we don't finish up creating them all. That would at least move the should-I-bother-to-make-this-path? logic close to the place with the knowledge that it'd be useful, in this case the inner side of a parallel hash join. One problem is that you have to get a parallel_workers number from somewhere to make a partial path. The hash join path code knows what its own parallel_workers number will be (it inherits it from the outer path, though I can't immediately think of any good reason why it shouldn't be Max(inner, outer)), but if we were to feed that into a created-on-demand inner path that is then cached, we'd have a horrible ordering dependency (some other join in the query gets planned first with a different parallel_workers number and it gets cached differently). Yuck. As you noted, 0 isn't a great number either, but it leads to a another thought... I wondered if we could somehow use the complete (non-partial) path directly in some cases here, if certain conditions are met. Aside from any other technical problems, you might ask "but what about the estimates/costing we already did for that path"; well the numbers are usually wrong anyway! We have complete paths, and partial paths for arbitrary parallel_workers numbers that bubbled up from our scan size-based heuristics. Every time we combine more than one partial path, we have to handle non-matching "parallel degree" (I'm using that word to mean the result of get_parallel_divisor(path), a lightly grilled version of parallel_workers that makes dubious assumptions, but I digress). Concretely, parallel append and parallel hash join already rescale some of their input variables to match their own nominal parallel degree (ugh, I see a few things we should fix in that code, but this email is long enough already). I wonder if we might need some infrastructure to formalise that sort of thing. For example, look at the row estimates in the EXPLAIN of parallel append over tables with parallel_workers explicitly set to different numbers using ALTER TABLE. They are wrong (they're based on different parallel degrees; turn off parallel_leader_participation to make the arithmetic easier to follow), while the append node itself has rescaled them and has a good row estimate for its own nominal parallel degree, which in turn might be wrong depending on what is above. Perhaps EXPLAIN and everything else should use some common infrastructure to deal with this. In other words, we avoid the need for a try-every-parallel-degree explosion by rescaling from some arbitrary input degree to some arbitrary output degree, but we can't go all the way and do the two-phase Hong thing and rescale from non-partial paths in general (for various technical reasons that apply to more complex nodes, but not to basic scans). From where we are, I'm not sure how much of a big step it is to (1) formalise the path rescaling system and (2) be able to rescale some qualifying simple complete paths too, if needed for places like this. Of course you could quibble with the concept of linear scaling of various number by parallel degrees; various things aren't linear or even continuous (probably why Hong's system included hash join thresholds). Even the concept of get_parallel_divisor(path) as applied to row estimates is suspect, because it assumes that the planned workers will show up; if a smaller number shows up (at all, due to max_parallel_workers, or just to this node because a higher level parallel append sent workers to different subplans) then a node might receive many more input tuples than expected and blow through work_mem, even if all estimates were 100% perfect. I have no idea what to do about that. At least *that* problem doesn't apply to parallel hash, which shares the memory for the number of planned workers, even if fewer show up, but that ideas isn't without critics either. I dunno. Sorry for the wall of text/ideas. I see unfinished business in every direction. [1] https://www.postgresql.org/message-id/flat/
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Apr 6, 2023 at 1:06 PM Tom Lane wrote: > This: > > > +#if 0 > > /* If any limit was set to zero, the user doesn't want a > > parallel scan. */ > > if (parallel_workers <= 0) > > return; > > +#endif > > seems like it adds a lot of new paths with a lot lower chance > of win, but maybe we could tighten the conditions to improve > the odds? Seems it wins if the parallel scan becomes part of a hash join in final plan. I wonder if we have a way to know that in this early stage. BTW, zero parallel_workers seems would break some later assumptions, so we may need to give it a meaningful number if we want to do in this way. Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Apr 6, 2023 at 8:18 AM Thomas Munro wrote: > On Thu, Apr 6, 2023 at 9:11 AM Tom Lane wrote: > > Richard Guo writes: > > > Thanks for reminding. Attached is the rebased patch, with no other > > > changes. I think the patch is ready for commit. > > > > Pushed after a little further fooling with the comments. I also had > > to rebase it over 11c2d6fdf (Parallel Hash Full Join). I think I did > > that correctly, but it's not clear to me whether any of the existing > > test cases are now doing parallelized hashed right antijoins. Might > > be worth a little more testing. > > I don't see any (at least that are EXPLAINed). Wondering if we should > add some of these into join_hash.sql, but probably not before I figure > out how to make that whole file run faster... Thanks Tom for the rebase and pushing. Agreed that we need to add more testing to cover Parallel Hash Right Anti Join. I tried one in join_hash.sql as below explain (costs off) select count(*) from simple r right join bigger_than_it_looks s using (id) where r.id is null; QUERY PLAN - Aggregate -> Gather Workers Planned: 2 -> Parallel Hash Right Anti Join Hash Cond: (r.id = s.id) -> Parallel Seq Scan on simple r -> Parallel Hash -> Parallel Seq Scan on bigger_than_it_looks s (8 rows) But as Thomas said, maybe we need to wait until that file becomes faster. Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
Thomas Munro writes: > ... It works if you're OK creating partial paths > for everything... Hmm. The committed patch already causes us to investigate more paths than before, which I was okay with because it only costs more if there's an antijoin involved --- which it seems like there's at least a 50% chance of winning on, depending on which table is bigger. This: > +#if 0 > /* If any limit was set to zero, the user doesn't want a > parallel scan. */ > if (parallel_workers <= 0) > return; > +#endif seems like it adds a lot of new paths with a lot lower chance of win, but maybe we could tighten the conditions to improve the odds? regards, tom lane
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Apr 6, 2023 at 12:17 PM Thomas Munro wrote: > I tried the original example from the top of this thread and saw a > decent speedup from parallelism, but only if I set > min_parallel_table_scan_size=0, and otherwise it doesn't choose > Parallel Hash Right Anti Join. Same if I embiggen bar significantly. > Haven't looked yet but I wonder if there is some left/right confusion > on parallel degree computation or something like that... Ahh, the problem is just that create_plain_partial_paths() doesn't bother to create a partial path for foo at all, because it's so small, so hash_inner_and_outer() can't even consider a parallel join (that needs partial paths on both sides). What we want here is a shared hash table so we can have shared match flags, an entirely new concern, but create_plain_partial_path() can't see any point in a partial scan of such a small table. It works if you're OK creating partial paths for everything... +#if 0 /* If any limit was set to zero, the user doesn't want a parallel scan. */ if (parallel_workers <= 0) return; +#endif
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Apr 6, 2023 at 9:11 AM Tom Lane wrote: > Richard Guo writes: > > Thanks for reminding. Attached is the rebased patch, with no other > > changes. I think the patch is ready for commit. > > Pushed after a little further fooling with the comments. I also had > to rebase it over 11c2d6fdf (Parallel Hash Full Join). I think I did > that correctly, but it's not clear to me whether any of the existing > test cases are now doing parallelized hashed right antijoins. Might > be worth a little more testing. I don't see any (at least that are EXPLAINed). Wondering if we should add some of these into join_hash.sql, but probably not before I figure out how to make that whole file run faster... > I think that Alvaro's concern about incorrect cost estimates may be > misplaced. I couldn't find any obvious errors in the costing logic for > this, given that we concluded that the early-exit runtime logic cannot > apply. Also, when I try simply executing Richard's original test query > (in a non-JIT build), the runtimes I get line up quite well ... maybe > too well? ... with the cost estimates: > > v15 HEAD w/patchRatio > > Cost estimate 173691.19 90875.330.52 > Actual (best of 3) 514.200 ms 268.978 ms 0.52 > > I think the smaller differentials you guys were seeing were all about > EXPLAIN ANALYZE overhead. I tried the original example from the top of this thread and saw a decent speedup from parallelism, but only if I set min_parallel_table_scan_size=0, and otherwise it doesn't choose Parallel Hash Right Anti Join. Same if I embiggen bar significantly. Haven't looked yet but I wonder if there is some left/right confusion on parallel degree computation or something like that...
Re: Using each rel as both outer and inner for JOIN_ANTI
Richard Guo writes: > Thanks for reminding. Attached is the rebased patch, with no other > changes. I think the patch is ready for commit. Pushed after a little further fooling with the comments. I also had to rebase it over 11c2d6fdf (Parallel Hash Full Join). I think I did that correctly, but it's not clear to me whether any of the existing test cases are now doing parallelized hashed right antijoins. Might be worth a little more testing. I think that Alvaro's concern about incorrect cost estimates may be misplaced. I couldn't find any obvious errors in the costing logic for this, given that we concluded that the early-exit runtime logic cannot apply. Also, when I try simply executing Richard's original test query (in a non-JIT build), the runtimes I get line up quite well ... maybe too well? ... with the cost estimates: v15 HEAD w/patchRatio Cost estimate 173691.19 90875.330.52 Actual (best of 3) 514.200 ms 268.978 ms 0.52 I think the smaller differentials you guys were seeing were all about EXPLAIN ANALYZE overhead. regards, tom lane
Re: Using each rel as both outer and inner for JOIN_ANTI
On Wed, Mar 15, 2023 at 2:25 AM Gregory Stark (as CFM) wrote: > So what is the status of this patch? > > It looks like you received some feedback from Emre, Tom, Ronan, and > Alvaro but it also looks like you responded to most or all of that. > Are you still blocked waiting for feedback? Anything specific you need > help with? > > Or is the patch ready for commit now? In which case it would be good > to rebase it since it's currently failing to apply. Well it would be > good to rebase regardless but it would be especially important if we > want to get it committed :) Thanks for reminding. Attached is the rebased patch, with no other changes. I think the patch is ready for commit. Thanks Richard v6-0001-Using-each-rel-as-both-outer-and-inner-for-anti-joins.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
So what is the status of this patch? It looks like you received some feedback from Emre, Tom, Ronan, and Alvaro but it also looks like you responded to most or all of that. Are you still blocked waiting for feedback? Anything specific you need help with? Or is the patch ready for commit now? In which case it would be good to rebase it since it's currently failing to apply. Well it would be good to rebase regardless but it would be especially important if we want to get it committed :)
Re: Using each rel as both outer and inner for JOIN_ANTI
On Wed, Aug 10, 2022 at 4:40 PM Alvaro Herrera wrote: > On 2022-Aug-10, Richard Guo wrote: > > > The right-anti join plan has the same cost estimation with right join > > plan in this case. So would you please help to test what the right join > > plan looks like in your env for the query below? > > > > select * from foo left join bar on foo.a = bar.c; > > You're right, it does. > > 55432 16devel 475322=# explain (analyze, buffers) select * from foo left > join bar on foo.a = bar.c; > QUERY PLAN > > > ── > Hash Right Join (cost=1.23..90875.24 rows=10 width=20) (actual > time=456.410..456.415 rows=10 loops=1) >Hash Cond: (bar.c = foo.a) >Buffers: shared hit=15852 read=6273 >-> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=12) > (actual time=0.036..210.468 rows=500 loops=1) > Buffers: shared hit=15852 read=6272 >-> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.037..0.038 > rows=10 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 9kB > Buffers: shared read=1 > -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual > time=0.022..0.026 rows=10 loops=1) >Buffers: shared read=1 > Planning: >Buffers: shared hit=92 read=13 > Planning Time: 1.077 ms > Execution Time: 456.458 ms > (14 filas) Thanks for help testing. Comparing the anti join plan and the right join plan, the estimated cost and the execution time mismatch a lot. Seems the cost estimate of hashjoin path is not that precise for this case even in the unpatched codes. Maybe this is something we need to improve. Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
On 2022-Aug-10, Richard Guo wrote: > The right-anti join plan has the same cost estimation with right join > plan in this case. So would you please help to test what the right join > plan looks like in your env for the query below? > > select * from foo left join bar on foo.a = bar.c; You're right, it does. 55432 16devel 475322=# explain (analyze, buffers) select * from foo left join bar on foo.a = bar.c; QUERY PLAN ── Hash Right Join (cost=1.23..90875.24 rows=10 width=20) (actual time=456.410..456.415 rows=10 loops=1) Hash Cond: (bar.c = foo.a) Buffers: shared hit=15852 read=6273 -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=12) (actual time=0.036..210.468 rows=500 loops=1) Buffers: shared hit=15852 read=6272 -> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.037..0.038 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB Buffers: shared read=1 -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.022..0.026 rows=10 loops=1) Buffers: shared read=1 Planning: Buffers: shared hit=92 read=13 Planning Time: 1.077 ms Execution Time: 456.458 ms (14 filas) 55432 16devel 475322=# explain (analyze, buffers) select * from foo left join bar on foo.a = bar.c where bar.c is null; QUERY PLAN ── Hash Right Anti Join (cost=1.23..90875.24 rows=10 width=20) (actual time=451.747..451.751 rows=10 loops=1) Hash Cond: (bar.c = foo.a) Buffers: shared hit=15646 read=6479 -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=12) (actual time=0.048..204.940 rows=500 loops=1) Buffers: shared hit=15645 read=6479 -> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.030..0.031 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB Buffers: shared hit=1 -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.017..0.020 rows=10 loops=1) Buffers: shared hit=1 Planning Time: 0.227 ms Execution Time: 451.793 ms -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Every machine is a smoke machine if you operate it wrong enough." https://twitter.com/libseybieda/status/1541673325781196801
Re: Using each rel as both outer and inner for JOIN_ANTI
On Tue, Aug 9, 2022 at 6:54 PM Alvaro Herrera wrote: > I suppose this looks good as far as the plan goes, but the cost estimation > might be a little bit too optimistic: it is reporting that the new plan > costs 50% of the original, yet the execution time is only 5% lower. Thanks for trying this patch. Yeah, the estimated cost doesn't match the execution time here. I tried the query locally and here is what I got (best of three): Unpatched: # explain analyze select * from foo left join bar on foo.a = bar.c where bar.c is null; QUERY PLAN --- Hash Anti Join (cost=154156.00..173691.19 rows=1 width=16) (actual time=1548.622..1548.624 rows=0 loops=1) Hash Cond: (foo.a = bar.c) -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.024..0.026 rows=10 loops=1) -> Hash (cost=72124.00..72124.00 rows=500 width=8) (actual time=1443.157..1443.158 rows=500 loops=1) Buckets: 262144 Batches: 64 Memory Usage: 5107kB -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=8) (actual time=0.045..481.059 rows=500 loops=1) Planning Time: 0.262 ms Execution Time: 1549.138 ms (8 rows) Patched: # explain analyze select * from foo left join bar on foo.a = bar.c where bar.c is null; QUERY PLAN - Hash Right Anti Join (cost=1.23..90875.33 rows=1 width=16) (actual time=985.773..985.775 rows=0 loops=1) Hash Cond: (bar.c = foo.a) -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=8) (actual time=0.095..438.333 rows=500 loops=1) -> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.076..0.077 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.060..0.064 rows=10 loops=1) Planning Time: 0.290 ms Execution Time: 985.830 ms (8 rows) Seems the cost matches the execution time better in my local box. The right-anti join plan has the same cost estimation with right join plan in this case. So would you please help to test what the right join plan looks like in your env for the query below? select * from foo left join bar on foo.a = bar.c; Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
Just for kicks, I ran query in your original post under EXPLAIN ANALYZE in both patched and unpatched with this last version. I got this (best of three): Unpatched: 55432 16devel 437532=# explain (analyze, buffers) select * from foo left join bar on foo.a = bar.c where bar.c is null; QUERY PLAN Hash Anti Join (cost=159039.00..183457.23 rows=10 width=20) (actual time=482.788..483.182 rows=10 loops=1) Hash Cond: (foo.a = bar.c) Buffers: shared hit=161 read=21964, temp read=8 written=8 -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.020..0.022 rows=10 loops=1) Buffers: shared hit=1 -> Hash (cost=72124.00..72124.00 rows=500 width=12) (actual time=482.128..482.129 rows=0 loops=1) Buckets: 262144 Batches: 64 Memory Usage: 2048kB Buffers: shared hit=160 read=21964 -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=12) (actual time=0.092..237.431 rows=500 loops=1) Buffers: shared hit=160 read=21964 Planning Time: 0.182 ms Execution Time: 483.248 ms Patched: QUERY PLAN ── Hash Right Anti Join (cost=1.23..90875.24 rows=10 width=20) (actual time=457.654..457.658 rows=10 loops=1) Hash Cond: (bar.c = foo.a) Buffers: shared hit=33 read=22092 -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=12) (actual time=0.020..229.097 rows=500 loops=1) Buffers: shared hit=32 read=22092 -> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.011..0.012 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB Buffers: shared hit=1 -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.006..0.007 rows=10 loops=1) Buffers: shared hit=1 Planning Time: 0.067 ms Execution Time: 457.679 ms I suppose this looks good as far as the plan goes, but the cost estimation might be a little bit too optimistic: it is reporting that the new plan costs 50% of the original, yet the execution time is only 5% lower. I wonder where does time go (in unpatched) when seqscanning finishes and before hashing starts. (I had to disable JIT for the first one, as it insisted on JITting tuple deforming.) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
Re: Using each rel as both outer and inner for JOIN_ANTI
On Tue, Aug 2, 2022 at 3:13 PM Richard Guo wrote: > On Sun, Jul 31, 2022 at 12:07 AM Tom Lane wrote: > >> I took a quick look through this. The executor changes are indeed >> impressively short, but that's largely because you've paid zero >> attention to updating obsoleted comments. For example, in >> nodeHashjoin.c there are lots of references to right/full joins >> that likely now need to cover right-anti. I'm not sure that the >> empty-rel startup optimizations are correct for this case, either. > > > Thanks for the review! Yeah, you're right. I neglected to update the > related comments. Will do that in the new patch. For the empty-rel > startup optimizations, since the right-anti join also does null-fill on > inner relation (the HJ_FILL_INNER case), I think we cannot skip building > the hash table even when the outer rel is completely empty. > Here is the new patch which addresses the obsoleted comments. Thanks Richard v5-0001-Using-each-rel-as-both-outer-and-inner-for-anti-j.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
On Sun, Jul 31, 2022 at 12:07 AM Tom Lane wrote: > Richard Guo writes: > > [ v4-0001-Using-each-rel-as-both-outer-and-inner-for-anti-j.patch ] > > I took a quick look through this. The executor changes are indeed > impressively short, but that's largely because you've paid zero > attention to updating obsoleted comments. For example, in > nodeHashjoin.c there are lots of references to right/full joins > that likely now need to cover right-anti. I'm not sure that the > empty-rel startup optimizations are correct for this case, either. Thanks for the review! Yeah, you're right. I neglected to update the related comments. Will do that in the new patch. For the empty-rel startup optimizations, since the right-anti join also does null-fill on inner relation (the HJ_FILL_INNER case), I think we cannot skip building the hash table even when the outer rel is completely empty. > I don't have a warm feeling about the planner changes being correct: > it looks like what you mostly did was to treat JOIN_RIGHT_ANTI > identically to JOIN_ANTI everywhere, which is surely wrong. > As an example of this, optimizer/README mentions > > Similarly, parameterized paths do not normally get preference in add_path > for having cheap startup cost; that's seldom of much value when on the > inside of a nestloop, so it seems not worth keeping extra paths solely > for > that. An exception occurs for parameterized paths for the RHS relation > of > a SEMI or ANTI join: in those cases, we can stop the inner scan after the > first match, so it's primarily startup not total cost that we care about. > > For RIGHT_ANTI it'd become startup of the outer scan that counts, but > I don't think you've gotten that right here. I think JOIN_RIGHT_ANTI behaves more like JOIN_RIGHT, except that JOIN_RIGHT returns a matched tuple while JOIN_RIGHT_ANTI does not. For each outer tuple, right-anti needs to scan the inner rel for every match and mark its hashtable entry. So I think the right-anti join should not belong to the case 'in those cases, we can stop the inner scan after the first match, so it's primarily startup not total cost that we care about.' Am I thinking it correctly? > [ wanders away wondering if JOIN_RIGHT_SEMI should become a thing ... ] Maybe this is something we can do. Currently for the query below: # explain select * from foo where a in (select c from bar); QUERY PLAN - Hash Semi Join (cost=154156.00..173691.29 rows=10 width=8) Hash Cond: (foo.a = bar.c) -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) -> Hash (cost=72124.00..72124.00 rows=500 width=4) -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=4) (5 rows) I believe we can get a cheaper plan if we are able to swap the outer and inner for SEMI JOIN and use the smaller 'foo' as inner rel. Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
Richard Guo writes: > [ v4-0001-Using-each-rel-as-both-outer-and-inner-for-anti-j.patch ] I took a quick look through this. The executor changes are indeed impressively short, but that's largely because you've paid zero attention to updating obsoleted comments. For example, in nodeHashjoin.c there are lots of references to right/full joins that likely now need to cover right-anti. I'm not sure that the empty-rel startup optimizations are correct for this case, either. I don't have a warm feeling about the planner changes being correct: it looks like what you mostly did was to treat JOIN_RIGHT_ANTI identically to JOIN_ANTI everywhere, which is surely wrong. As an example of this, optimizer/README mentions Similarly, parameterized paths do not normally get preference in add_path for having cheap startup cost; that's seldom of much value when on the inside of a nestloop, so it seems not worth keeping extra paths solely for that. An exception occurs for parameterized paths for the RHS relation of a SEMI or ANTI join: in those cases, we can stop the inner scan after the first match, so it's primarily startup not total cost that we care about. For RIGHT_ANTI it'd become startup of the outer scan that counts, but I don't think you've gotten that right here. There are various references to JOIN_ANTI in planner peripheral code, e.g. selfuncs.c, that probably need adjustment. [ wanders away wondering if JOIN_RIGHT_SEMI should become a thing ... ] regards, tom lane
Re: Using each rel as both outer and inner for JOIN_ANTI
On Fri, Jul 2, 2021 at 11:23 AM Richard Guo wrote: > Thanks! Test cases are updated in v3 patch. Also merge join can do the > 'right anti join' too in the same patch. > > Thanks again for reviewing this patch. > Rebased this patch with latest master, with no other changes. Thanks Richard v4-0001-Using-each-rel-as-both-outer-and-inner-for-anti-j.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
On Fri, Jul 2, 2021 at 11:59 AM Zhihong Yu wrote: > > > On Thu, Jul 1, 2021 at 8:24 PM Richard Guo wrote: > >> >> On Thu, Jul 1, 2021 at 3:18 PM Ronan Dunklau >> wrote: >> >>> Le jeudi 1 juillet 2021, 09:09:38 CEST Ronan Dunklau a écrit : >>> > > Yes, thanks! I was making a big mistake here thinking the executor >>> can >>> > > stop after the first match. That's not true. We need to use each >>> outer >>> > > tuple to find all the matches and mark the corresponding hashtable >>> > > entries. I have updated the patch with the fix. >>> > >>> > It looks OK to me. >>> >>> I forgot to mention: you also have failing tests due to the plans >>> changing to >>> use the new join type. This might not be the case anymore once you >>> update the >>> cost model, but if that's the case the tests should be updated. >>> >> >> Thanks! Test cases are updated in v3 patch. Also merge join can do the >> 'right anti join' too in the same patch. >> >> Thanks again for reviewing this patch. >> >> Thanks >> Richard >> >> > Hi, > Minor comment: > +* In a right-antijoin, we never return a matched > tuple. > > matched tuple -> matching tuple > Thanks for reviewing. The comment for JOIN_ANTI in ExecHashJoinImpl/ExecMergeJoin is: "In an antijoin, we never return a matched tuple" So I think we'd better keep it consistent for JOIN_RIGHT_ANTI? Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Jul 1, 2021 at 8:24 PM Richard Guo wrote: > > On Thu, Jul 1, 2021 at 3:18 PM Ronan Dunklau > wrote: > >> Le jeudi 1 juillet 2021, 09:09:38 CEST Ronan Dunklau a écrit : >> > > Yes, thanks! I was making a big mistake here thinking the executor can >> > > stop after the first match. That's not true. We need to use each outer >> > > tuple to find all the matches and mark the corresponding hashtable >> > > entries. I have updated the patch with the fix. >> > >> > It looks OK to me. >> >> I forgot to mention: you also have failing tests due to the plans >> changing to >> use the new join type. This might not be the case anymore once you update >> the >> cost model, but if that's the case the tests should be updated. >> > > Thanks! Test cases are updated in v3 patch. Also merge join can do the > 'right anti join' too in the same patch. > > Thanks again for reviewing this patch. > > Thanks > Richard > > Hi, Minor comment: +* In a right-antijoin, we never return a matched tuple. matched tuple -> matching tuple Cheers
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Jul 1, 2021 at 3:18 PM Ronan Dunklau wrote: > Le jeudi 1 juillet 2021, 09:09:38 CEST Ronan Dunklau a écrit : > > > Yes, thanks! I was making a big mistake here thinking the executor can > > > stop after the first match. That's not true. We need to use each outer > > > tuple to find all the matches and mark the corresponding hashtable > > > entries. I have updated the patch with the fix. > > > > It looks OK to me. > > I forgot to mention: you also have failing tests due to the plans changing > to > use the new join type. This might not be the case anymore once you update > the > cost model, but if that's the case the tests should be updated. > Thanks! Test cases are updated in v3 patch. Also merge join can do the 'right anti join' too in the same patch. Thanks again for reviewing this patch. Thanks Richard v3-0001-Using-each-rel-as-both-outer-and-inner-for-anti-join.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
Le jeudi 1 juillet 2021, 09:09:38 CEST Ronan Dunklau a écrit : > > Yes, thanks! I was making a big mistake here thinking the executor can > > stop after the first match. That's not true. We need to use each outer > > tuple to find all the matches and mark the corresponding hashtable > > entries. I have updated the patch with the fix. > > It looks OK to me. I forgot to mention: you also have failing tests due to the plans changing to use the new join type. This might not be the case anymore once you update the cost model, but if that's the case the tests should be updated.
Re: Using each rel as both outer and inner for JOIN_ANTI
> Yes, thanks! I was making a big mistake here thinking the executor can > stop after the first match. That's not true. We need to use each outer > tuple to find all the matches and mark the corresponding hashtable > entries. I have updated the patch with the fix. It looks OK to me. > > > I think we can basically use the same cost calculation as with anti > > > joins, since they share the fact that the executor will stop after the > > > first match. However, there are still some differences. Such as when we > > > consider the number of tuples that will pass the basic join, I think we > > > need to use unmatched inner rows, rather than unmatched outer rows. > > > > Due to the fact we cannot just skip at the first match, I'm not sure this > > works > > either. > > This is not correct any more since the fact that the executor will stop > after the first match does not hold true. A brief thought show me that > we can use the same cost calculation as with right joins. Once you do that, you should also add test coverage for those new plans. Also consider adding a commitfest entry. Regards, -- Ronan Dunklau
Re: Using each rel as both outer and inner for JOIN_ANTI
On Tue, Jun 29, 2021 at 10:41 PM Ronan Dunklau wrote: > Le mardi 29 juin 2021, 10:55:59 CEST Richard Guo a écrit : > > On Tue, Jun 29, 2021 at 3:55 PM Emre Hasegeli wrote: > > > > Thanks for the explanation. Attached is a demo code for the hash-join > > > > case, which is only for PoC to show how we can make it work. It's far > > > > from complete, at least we need to adjust the cost calculation for > this > > > > 'right anti join'. > > > > > > I applied the patch and executed some queries. Hash Right Anti Joins > > > seem to be working correctly. Though, some of the tests are failing. > > > I guessed it's because the other join algorithms do not support right > > > anti join, but I couldn't reproduce it. > > > > Thanks for verifying this patch. > > I also ran the tests on this patch, and can confirm the tests are failing. > > The reason for that is that you request a new outer tuple whenever we have > a > match, even when the outer tuple could match several tuples from the hash > table: we end up emitting the duplicates because we switched to another > tuple > after the first match. > Yes, thanks! I was making a big mistake here thinking the executor can stop after the first match. That's not true. We need to use each outer tuple to find all the matches and mark the corresponding hashtable entries. I have updated the patch with the fix. > > I think we can basically use the same cost calculation as with anti > > joins, since they share the fact that the executor will stop after the > > first match. However, there are still some differences. Such as when we > > consider the number of tuples that will pass the basic join, I think we > > need to use unmatched inner rows, rather than unmatched outer rows. > > Due to the fact we cannot just skip at the first match, I'm not sure this > works > either. > This is not correct any more since the fact that the executor will stop after the first match does not hold true. A brief thought show me that we can use the same cost calculation as with right joins. Thanks Richard v2-0001-Using-each-rel-as-both-outer-and-inner-for-anti-join.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
Le mardi 29 juin 2021, 10:55:59 CEST Richard Guo a écrit : > On Tue, Jun 29, 2021 at 3:55 PM Emre Hasegeli wrote: > > > Thanks for the explanation. Attached is a demo code for the hash-join > > > case, which is only for PoC to show how we can make it work. It's far > > > from complete, at least we need to adjust the cost calculation for this > > > 'right anti join'. > > > > I applied the patch and executed some queries. Hash Right Anti Joins > > seem to be working correctly. Though, some of the tests are failing. > > I guessed it's because the other join algorithms do not support right > > anti join, but I couldn't reproduce it. > > Thanks for verifying this patch. I also ran the tests on this patch, and can confirm the tests are failing. The reason for that is that you request a new outer tuple whenever we have a match, even when the outer tuple could match several tuples from the hash table: we end up emitting the duplicates because we switched to another tuple after the first match. You can set up a simple test case like this: create table t1 (id int); create table t2 (id int); insert into t1 select generate_series (1, 1); insert into t2 VALUES (1), (1), (-1), (-1); analyze t1, t2; set enable_hashjoin = off; explain (analyze) select * from t2 where not exists (SELECT 1 FROM t1 where t1.id = t2.id); set enable_nestloop = off; set enable_hashjoin = on; explain (analyze) select * from t2 where not exists (SELECT 1 FROM t1 where t1.id = t2.id); Also, I'm not familiar enough with the hash join algorithm to know if the approach of "scanning every outer tuple, skip emitting matching inner tuples" would be correct, but this is the first problem I notice. Not getting into the HJ_NEED_NEW_OUTER state when the join type is JOIN_RIGHT_ANTI seems to fix this specific issue tough. > I think we can basically use the same cost calculation as with anti > joins, since they share the fact that the executor will stop after the > first match. However, there are still some differences. Such as when we > consider the number of tuples that will pass the basic join, I think we > need to use unmatched inner rows, rather than unmatched outer rows. Due to the fact we cannot just skip at the first match, I'm not sure this works either. > > Thanks > Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
On Tue, Jun 29, 2021 at 3:55 PM Emre Hasegeli wrote: > > Thanks for the explanation. Attached is a demo code for the hash-join > > case, which is only for PoC to show how we can make it work. It's far > > from complete, at least we need to adjust the cost calculation for this > > 'right anti join'. > > I applied the patch and executed some queries. Hash Right Anti Joins > seem to be working correctly. Though, some of the tests are failing. > I guessed it's because the other join algorithms do not support right > anti join, but I couldn't reproduce it. > Thanks for verifying this patch. > > I am impressed by how simple the patch is: only 2 lines to support a > new join algorithm. This is a good case for the quality of Postgres > code. I hope supporting the other join algorithms would be similar. > Yes, thanks to the excellent design pattern of the execution codes, we only need very few changes to support this new join type. > > I am not sure how the cost estimation should differ from straight anti > join. It seems to me that the planner is already making the right > choice by taking into account the cost of the Hash node which makes > the whole cost greater when the inner table is much bigger. > I think we can basically use the same cost calculation as with anti joins, since they share the fact that the executor will stop after the first match. However, there are still some differences. Such as when we consider the number of tuples that will pass the basic join, I think we need to use unmatched inner rows, rather than unmatched outer rows. > > I am not an expert planner, but it feels to me like a good feature > that can provide better plans in some cases. Given it works correctly > and the implementation is so simple, the only argument against it may > be increased planning time. I know that the planner performance is > affected negatively by the number of join paths to consider. This may > not be a bigger deal as typically there are not many anti joins in a > query, but it'd still be a good idea to do some performance tests. > Agree. Performance tests are necessary if we consider finishing this patch. Thanks Richard
Re: Using each rel as both outer and inner for JOIN_ANTI
> Thanks for the explanation. Attached is a demo code for the hash-join > case, which is only for PoC to show how we can make it work. It's far > from complete, at least we need to adjust the cost calculation for this > 'right anti join'. I applied the patch and executed some queries. Hash Right Anti Joins seem to be working correctly. Though, some of the tests are failing. I guessed it's because the other join algorithms do not support right anti join, but I couldn't reproduce it. I am impressed by how simple the patch is: only 2 lines to support a new join algorithm. This is a good case for the quality of Postgres code. I hope supporting the other join algorithms would be similar. I am not sure how the cost estimation should differ from straight anti join. It seems to me that the planner is already making the right choice by taking into account the cost of the Hash node which makes the whole cost greater when the inner table is much bigger. I am not an expert planner, but it feels to me like a good feature that can provide better plans in some cases. Given it works correctly and the implementation is so simple, the only argument against it may be increased planning time. I know that the planner performance is affected negatively by the number of join paths to consider. This may not be a bigger deal as typically there are not many anti joins in a query, but it'd still be a good idea to do some performance tests.
Re: Using each rel as both outer and inner for JOIN_ANTI
On Thu, Jun 24, 2021 at 10:14 PM Tom Lane wrote: > Heikki Linnakangas writes: > > On 24/06/2021 12:50, Richard Guo wrote: > >> I believe if we use the smaller table 'foo' as inner side for this > >> query, we would have a cheaper plan. > > > How would that work? > > I think you could make it work for the hash-join case by extending > the existing mechanism for right joins: emit nothing during the main > scan, but mark hashtable entries when a match is found. Then make > a post-pass and emit hash entries that never found a match. So > basically just the inverse behavior of a right join, but with the > same state info. > > Merge join could likely support "right anti join" too, though the > benefit of swapping inputs tends to be small there, so it may not be > worth doing. > > Obviously there's a pretty fair amount of code to write to make this > happen. > Thanks for the explanation. Attached is a demo code for the hash-join case, which is only for PoC to show how we can make it work. It's far from complete, at least we need to adjust the cost calculation for this 'right anti join'. Am I going in the right direction? Thanks Richard 0001-Using-each-rel-as-both-outer-and-inner-for-anti-join.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
Heikki Linnakangas writes: > On 24/06/2021 12:50, Richard Guo wrote: >> I believe if we use the smaller table 'foo' as inner side for this >> query, we would have a cheaper plan. > How would that work? I think you could make it work for the hash-join case by extending the existing mechanism for right joins: emit nothing during the main scan, but mark hashtable entries when a match is found. Then make a post-pass and emit hash entries that never found a match. So basically just the inverse behavior of a right join, but with the same state info. Merge join could likely support "right anti join" too, though the benefit of swapping inputs tends to be small there, so it may not be worth doing. Obviously there's a pretty fair amount of code to write to make this happen. regards, tom lane
Re: Using each rel as both outer and inner for JOIN_ANTI
On 24/06/2021 12:50, Richard Guo wrote: Hi hackers, We may have anti-joins in several cases. Sublinks of 'NOT EXISTS' may be pulled up as anti-joins. Left joins whose join quals are strict for any nullable var that is forced null by higher qual levels will also be reduced to anti-joins. So anti-joins are very commonly used in practice. Currently when populating anti-join with paths, we do not try to swap the outer and inner to get both paths. That may make us miss some cheaper paths. # insert into foo select i, i from generate_series(1,10)i; INSERT 0 10 # insert into bar select i, i from generate_series(1,500)i; INSERT 0 500 # explain select * from foo left join bar on foo.a = bar.c where bar.c is null; QUERY PLAN - Hash Anti Join (cost=154156.00..173691.19 rows=1 width=16) Hash Cond: (foo.a = bar.c) -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) -> Hash (cost=72124.00..72124.00 rows=500 width=8) -> Seq Scan on bar (cost=0.00..72124.00 rows=500 width=8) (5 rows) I believe if we use the smaller table 'foo' as inner side for this query, we would have a cheaper plan. How would that work? - Heikki