amoghmargoor edited a comment on issue #15178: [SPARK-17556][SQL] Executor side 
broadcast for broadcast joins
URL: https://github.com/apache/spark/pull/15178#issuecomment-481952736
 
 
   @viirya Thanks for this diff. 
   We found one issue, which I want to point out in case somebody wants to 
use this patch.
   BroadcastHashJoinExec contains references to broadcast.value that get 
executed on the driver. That may pull the RDD values being broadcast into the 
driver's block manager too. This happens because of the code generation flow. 
To fix it, we took a shortcut and stopped using one hash join optimization in 
codegen: the one for cases where the keys on the build side are unique. Not 
sure if we can come up with a solution that does not sacrifice that 
optimization.
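   To illustrate the mechanism, here is a minimal toy model in Scala. It does not use Spark at all: `LazyBroadcast`, `Relation`, `planJoin`, and `allowUniqueKeyPath` are hypothetical names made up for this sketch, and it assumes the driver-side access in question is the check for unique build-side keys made while planning the generated code.

```scala
// Toy model only: none of these types are Spark classes.
object DriverSideBroadcastSketch {

  // Stand-in for an executor-side broadcast handle. The data is pulled
  // into the local process only when `value` is first accessed.
  final class LazyBroadcast[T](fetch: () => T) {
    private var fetches = 0
    lazy val value: T = { fetches += 1; fetch() }
    // Number of times the data was materialized in this process.
    def fetchCount: Int = fetches
  }

  // Stand-in for the hashed relation built from the build side.
  final case class Relation(rows: Map[Int, Seq[String]]) {
    def keyIsUnique: Boolean = rows.values.forall(_.size <= 1)
  }

  // Runs on the "driver" while choosing which probe code to generate.
  // Checking keyIsUnique touches `value`, which materializes the
  // relation locally. Skipping the check avoids the fetch but gives
  // up the unique-key fast path.
  def planJoin(b: LazyBroadcast[Relation], allowUniqueKeyPath: Boolean): String =
    if (allowUniqueKeyPath && b.value.keyIsUnique) "unique-key probe"
    else "generic probe"
}
```

   With `allowUniqueKeyPath = false`, `value` is never touched on the planning side, which mirrors the shortcut described above: the data stays off the driver, at the cost of always using the generic probe.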
   

