amoghmargoor edited a comment on issue #15178: [SPARK-17556][SQL] Executor side broadcast for broadcast joins
URL: https://github.com/apache/spark/pull/15178#issuecomment-481952736

@viirya Thanks for this diff. We found one issue, which I want to point out in case anybody else wants to use this patch. BroadcastHashJoinExec references broadcast.value in code that runs on the driver. Because of how the code generation flow works, this can pull the broadcasted RDD values into the driver's block manager as well. To fix it, we took a shortcut: during code generation we skip the hash join optimization used when the build-side keys are unique. I'm not sure whether there is a solution that avoids sacrificing that optimization.
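A minimal Scala sketch of the trade-off, assuming a simplified model in which dereferencing a broadcast's `value` on the driver stands in for the data landing in the driver's block manager. `SketchBroadcast`, `SketchRelation`, `CodegenSketch` and the `genProbeCode*` methods are hypothetical stand-ins, not Spark APIs; `keyIsUnique` only mirrors the unique-key check described above, and the real BroadcastHashJoinExec codegen is considerably more involved.

```scala
// Hypothetical stand-in for a broadcast variable. Reading `value` here models
// the driver materializing the broadcast data in its own block manager.
final class SketchBroadcast[T](data: T) {
  private var fetches = 0
  def driverFetches: Int = fetches
  def value: T = { fetches += 1; data }
}

// Hypothetical build-side relation: join key -> build-side rows.
final case class SketchRelation(rows: Map[Int, Seq[String]]) {
  // Unique keys: every key maps to exactly one build-side row.
  def keyIsUnique: Boolean = rows.values.forall(_.size == 1)
}

object CodegenSketch {
  // Driver-side "codegen" that dereferences the broadcast to choose the
  // unique-key fast path. This is the driver-side access the comment warns about.
  def genProbeCodeOriginal(bc: SketchBroadcast[SketchRelation]): String =
    if (bc.value.keyIsUnique) "single-row lookup" else "iterate over matches"

  // Workaround: never dereference the broadcast while generating code and
  // always emit the general loop, which is also correct for unique keys.
  // `bc` would only be referenced from the generated executor-side code.
  def genProbeCodePatched(bc: SketchBroadcast[SketchRelation]): String =
    "iterate over matches"
}

object CodegenSketchDemo {
  def main(args: Array[String]): Unit = {
    val unique = SketchRelation(Map(1 -> Seq("a"), 2 -> Seq("b")))

    val b1 = new SketchBroadcast(unique)
    println(CodegenSketch.genProbeCodeOriginal(b1)) // single-row lookup
    println(b1.driverFetches)                       // 1

    val b2 = new SketchBroadcast(unique)
    println(CodegenSketch.genProbeCodePatched(b2))  // iterate over matches
    println(b2.driverFetches)                       // 0
  }
}
```

The patched path never touches the broadcast on the driver, at the cost of the single-row lookup for unique build-side keys, which is the optimization the comment mentions giving up.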
