leanken commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-661951108


   > I'm on a fence now. We do reuse some codegen stuff in BHJ, but it also 
makes BHJ a bit hard to read since we put `if (isNullAwareAntiJoin)` in several 
places. On the other hand, a new node needs to duplicate some code, but seems 
OK as the duplication is small.
   > 
   > cc @maryannxue @viirya @maropu
   
   We could also take multi-column support into account as another factor.
   
   
![image](https://user-images.githubusercontent.com/17242071/88077041-66a39a00-cbad-11ea-8fb6-c235c4d219b4.png)
   
   The condition "no row has null for all columns" makes it even harder to use 
`HashedRelation`, because `HashedRelation` currently filters out any row in 
which at least one column is null, but we need those rows.
   `Broadcast[Array[InternalRow]]` is more suitable for multi-column support. 
In the long term, BHJ could get messy once multi-column support is added, 
because that logic is much more complicated. Putting the code into a new 
ExecNode would carry less historical burden.
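
   To illustrate why the rows containing nulls matter, here is a minimal sketch in plain Python (not Spark code) of multi-column `NOT IN` semantics under SQL three-valued logic. The names `tuple_equals`, `null_aware_anti_join` and `drop_rows_with_any_null` are hypothetical helpers made up for this example, and the nested-loop evaluation is only a model of the semantics, not a proposed implementation.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], ...]


def tuple_equals(left: Row, right: Row) -> Optional[bool]:
    """SQL three-valued equality of two tuples: True, False, or None (unknown)."""
    unknown = False
    for l, r in zip(left, right):
        if l is None or r is None:
            # A comparison with NULL is unknown, but a later column
            # may still prove the tuples unequal.
            unknown = True
        elif l != r:
            return False
    return None if unknown else True


def null_aware_anti_join(left_rows: List[Row], right_rows: List[Row]) -> List[Row]:
    """Keep a left row only if it is definitely unequal to every right row."""
    return [
        l for l in left_rows
        if all(tuple_equals(l, r) is False for r in right_rows)
    ]


def drop_rows_with_any_null(rows: List[Row]) -> List[Row]:
    """Models a build side that discards rows having a null in any key column."""
    return [r for r in rows if all(v is not None for v in r)]


left = [(1, 2), (3, 4)]
right = [(1, None), (5, 6)]

# (1, 2) vs (1, NULL) is unknown, so (1, 2) must not be returned.
correct = null_aware_anti_join(left, right)

# If the build side drops (1, NULL), (1, 2) is wrongly kept.
wrong = null_aware_anti_join(left, drop_rows_with_any_null(right))
```

   In this sketch `correct` contains only `(3, 4)`, while `wrong` also contains `(1, 2)`, which is why a build side that filters out rows with any null column cannot be used as-is for the multi-column case.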

