leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-661951108
> I'm on the fence now. We do reuse some codegen stuff in BHJ, but it also makes BHJ a bit hard to read since we put `if (isNullAwareAntiJoin)` in several places. On the other hand, a new node needs to duplicate some code, but that seems OK as the duplication is small.
>
> cc @maryannxue @viirya @maropu

We could take multi-column support as another factor. The "no row has null for all columns" condition will make it even harder to use `HashedRelation`, because `HashedRelation` currently filters out any row in which some column is null, but we need those records. `Broadcast[Array[InternalRow]]` is more suitable for multi-column support. As a long-term consideration, BHJ might get messy with multi-column support because it is much more complicated; if we put the code into a new ExecNode, it will carry less historical burden.
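
To illustrate why the rows with a null in only some columns are still needed, here is a minimal sketch in plain Python (not Spark code). It assumes standard SQL three-valued logic for a multi-column `NOT IN`; the helper names `definitely_differs`, `null_aware_anti_join`, and `drop_rows_with_any_null` are hypothetical, and the last one only mimics the filtering behaviour described above.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], ...]


def definitely_differs(probe: Row, build: Row) -> bool:
    """True when some column is non-null on both sides and unequal.

    Only then does the row comparison evaluate to FALSE; otherwise it is
    TRUE (all equal) or NULL (unknown because of a null column).
    """
    return any(
        p is not None and b is not None and p != b
        for p, b in zip(probe, build)
    )


def null_aware_anti_join(probe_rows: List[Row], build_rows: List[Row]) -> List[Row]:
    """Keep a probe row only if it definitely differs from every build row."""
    return [
        p for p in probe_rows
        if all(definitely_differs(p, b) for b in build_rows)
    ]


def drop_rows_with_any_null(rows: List[Row]) -> List[Row]:
    """Assumed stand-in for a build side that discards rows with any null column."""
    return [r for r in rows if all(c is not None for c in r)]


probe = [(1, 5), (2, 5), (3, 7)]
build = [(1, None), (3, 7)]

# Keeping (1, None) on the build side: (1, 5) cannot be proven different, so it is dropped.
correct = null_aware_anti_join(probe, build)

# Discarding (1, None) first: (1, 5) wrongly survives the anti join.
lossy = null_aware_anti_join(probe, drop_rows_with_any_null(build))

print(correct)
print(lossy)
```

In this sketch the build row `(1, None)` is what removes `(1, 5)` from the result, so a build-side structure that drops such rows would return an extra row.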
