Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2933#issuecomment-60501858
@JoshRosen The motivation is not performance, it's stability. Sending tasks to executors is a critical path in Spark, so it should be as stable as possible. Using broadcast to send tasks adds a lot of runtime complexity, and it has actually introduced problems for us (problems we did not have in 1.0). The motivation of this patch is to remove the complexity of broadcast in most cases, using it only when it can bring a performance benefit (i.e., when the tasks are large enough). In the future, we could perhaps increase broadcastTaskMinSizeKB to 100 or even more.
This adds some complexity to the code (not much), but it actually simplifies the runtime behavior. It should also yield some performance gain (no RPC or caching at all).
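For illustration, the size-based dispatch described above can be sketched roughly as follows: serialize the task, and wrap it in a broadcast only when it exceeds the configurable threshold. The object and method names here are hypothetical, not Spark's actual internals.

```scala
// Hypothetical sketch (not Spark's real code): choose direct RPC for small
// serialized tasks, broadcast only for large ones.
object TaskDispatchSketch {
  // Threshold in KB below which tasks are sent directly over RPC
  // (illustrative default; the real config name in the patch is
  // broadcastTaskMinSizeKB).
  val broadcastTaskMinSizeKB = 8

  // Returns true when the serialized task is large enough that
  // broadcasting it is worth the extra runtime machinery.
  def shouldBroadcast(serializedTask: Array[Byte]): Boolean =
    serializedTask.length > broadcastTaskMinSizeKB * 1024

  def main(args: Array[String]): Unit = {
    val smallTask = new Array[Byte](1024)        // 1 KB: sent directly
    val largeTask = new Array[Byte](512 * 1024)  // 512 KB: broadcast
    println(shouldBroadcast(smallTask))  // false
    println(shouldBroadcast(largeTask))  // true
  }
}
```

With a scheme like this, the common case (small closures) never touches the broadcast machinery at all, which is the stability argument above.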