mridulm commented on PR #40629: URL: https://github.com/apache/spark/pull/40629#issuecomment-1517342791
A few points to consider:

a) The task binary is already broadcast, so the inlined version should not add overhead by itself.

b) Ignoring (a) for the time being: the PoC above was a strawman proposal to inline. We can use other strategies for the small-block case in torrent broadcast itself - either inline (as in the example), or fetch from the block manager directly, etc.

Essentially, what I am trying to get to is this: having users explicitly reason about whether their broadcast data is small or large is brittle. This needs to be handled automatically and seamlessly by the broadcast implementation. When data is small, use the more efficient paths, and progressively move to more expensive options as data gets larger - without any user code change.
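
To make the idea concrete, here is a minimal sketch of size-based strategy selection. It is not Spark code and not the PoC from this PR: the names `Strategy`, `choose_strategy`, `broadcast`, and both thresholds are hypothetical, chosen only to illustrate how the decision could live inside the broadcast implementation rather than in user code.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical distribution paths, ordered from cheapest to most expensive."""
    INLINE = "inline"              # ship the bytes along with the task itself
    DIRECT_FETCH = "direct_fetch"  # single fetch from the block manager
    TORRENT = "torrent"            # chunked, peer-to-peer distribution


# Illustrative thresholds only; real values would need benchmarking.
INLINE_MAX_BYTES = 8 * 1024
DIRECT_FETCH_MAX_BYTES = 4 * 1024 * 1024


def choose_strategy(size_bytes: int) -> Strategy:
    """Pick the cheapest path whose size limit covers the payload."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= INLINE_MAX_BYTES:
        return Strategy.INLINE
    if size_bytes <= DIRECT_FETCH_MAX_BYTES:
        return Strategy.DIRECT_FETCH
    return Strategy.TORRENT


def broadcast(payload: bytes) -> Strategy:
    """Caller-facing entry point: the caller never states whether data is small or large."""
    return choose_strategy(len(payload))
```

With this shape, a caller writes `broadcast(payload)` regardless of size, and the thresholds can be tuned later without touching user code.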
