Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17936
  
    @viirya, this is slightly different from caching an RDD. It is closer to 
broadcasting: in the final state, each executor holds all of RDD2's data. 
The difference is that the data is synced executor to executor, not 
driver to executor.
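
    To make the distinction concrete, here is a toy model in plain Python. It is only a sketch and not Spark code: the executor names, the `driver_broadcast` and `executor_exchange` functions, and the transfer counting are all hypothetical, made up for illustration. Both paths end with every executor holding all of RDD2; only the route the data takes differs.

```python
from collections import defaultdict

def driver_broadcast(partitions):
    """Toy driver-executor sync: the driver gathers every partition,
    then ships the full dataset to each executor."""
    transfers = defaultdict(int)
    full = []
    for executor_id, part in partitions.items():
        full.extend(part)
        transfers[(executor_id, "driver")] += 1  # executor -> driver
    state = {}
    for executor_id in partitions:
        state[executor_id] = sorted(full)
        transfers[("driver", executor_id)] += 1  # driver -> executor
    return state, dict(transfers)

def executor_exchange(partitions):
    """Toy executor-executor sync: each executor fetches the partitions
    it is missing directly from its peers; the driver moves no data."""
    transfers = defaultdict(int)
    state = {}
    for dst in partitions:
        held = list(partitions[dst])
        for src, part in partitions.items():
            if src != dst:
                held.extend(part)
                transfers[(src, dst)] += 1  # peer -> peer
        state[dst] = sorted(held)
    return state, dict(transfers)

# Hypothetical layout: RDD2 has three partitions, one per executor.
rdd2 = {"exec-1": [1, 2], "exec-2": [3, 4], "exec-3": [5, 6]}

b_state, b_transfers = driver_broadcast(rdd2)
e_state, e_transfers = executor_exchange(rdd2)
```

    In this model `b_state` and `e_state` are identical, while `e_transfers` contains no edge that touches the driver.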
    
    I have a similar concern. Performance can vary by workload, so we 
should test several different workloads to see whether the improvement 
holds in general.

