Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/17936
@viirya, this is slightly different from caching an RDD. It is more like
broadcasting: in the final state, each executor holds all of RDD2's data.
The difference is that the sync is executor-to-executor rather than
driver-to-executor.
I have a similar concern. Performance can vary by workload, so we'd
better test several different workloads to see whether the improvement
is general.