[GitHub] [spark] agrawaldevesh commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

GitBox Tue, 21 Jul 2020 10:34:26 -0700


agrawaldevesh commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-662002297



   @leanken ... How important is the `isUniqueKeyCodePath` in the BHJ codegen 
when NAAJ is present? That is the key optimization in BHJ, and I am wondering 
whether it is even applicable to NAAJ. The reason I ask is:
   - If that optimization is relevant for NAAJ, then you would have to 
duplicate it into the new broadcast null aware anti join node.
   - If it is not relevant, then you can omit it in the new node.
   
   Basically, I think if we end up having to duplicate a lot of logic into the 
new node, then the new node becomes less appealing. 
   
   I can see why having a new node is a good way to future-proof, but I 
genuinely think that we may never need to extend the not-in query support to 
multiple keys and/or the distributed case, given how complex it is. Or at least 
it would be great to see some real data suggesting that multiple keys and a 
distributed build side are common enough to warrant the complexity.
   
   So my vote would be: If the new node ends up copying a lot of code (like the 
`isUniqueKey` optimization) then we should just consider modifying the BHJ as 
is. That is, at this point I am leaning towards just modifying the BHJ node 
with a flag.
   
   We can always refactor and move this into a new node when we actually get 
around to implementing more complex not-in cases.
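
   For reference, the single-column null-aware anti join semantics under 
discussion can be sketched as below. This is a plain Python illustration of 
SQL `NOT IN` behavior, not Spark code: the function name 
`null_aware_anti_join` and the list-based probe/build inputs are hypothetical 
and only stand in for the streamed and broadcast sides.

```python
from typing import Iterable, List, Optional


def null_aware_anti_join(
    probe: Iterable[Optional[int]],
    build: Iterable[Optional[int]],
) -> List[Optional[int]]:
    """Return probe keys k for which `k NOT IN (build)` is true (hypothetical helper)."""
    build_keys = list(build)

    # Empty build side: NOT IN is true for every probe row, NULL keys included.
    if not build_keys:
        return list(probe)

    # Any NULL on the build side: NOT IN can never evaluate to true.
    if any(k is None for k in build_keys):
        return []

    key_set = set(build_keys)
    # A NULL probe key compares as unknown against a non-empty build side,
    # so it is dropped; other keys survive only if they have no match.
    return [k for k in probe if k is not None and k not in key_set]


print(null_aware_anti_join([1, 2, None, 3], [2, 2]))
```

   In this sketch only membership in the build side is tested, so duplicate 
build keys do not change the result. Whether that makes the unique-key code 
path irrelevant for NAAJ in the actual BHJ codegen is the open question above.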




