Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/5572#issuecomment-105199065
@squito I suppose that, because we don't do anything special with them, the
newly cached blocks are treated the same as other cached blocks? I think
applying only idea 1 will not give as significant a performance improvement as
caching remote blocks.
I agree that this is a very narrow case. However, without this kind of
modification, the original approach is very inefficient. If it were just a 2X
or 3X improvement, we could skip it, but given the observed improvement, it is
still worth adding. This PR also keeps the modification limited: only
`CartesianRDD` uses the newly added `iterator`.
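
To illustrate why the original approach can be so inefficient, here is a toy
model in Python. It is not Spark code. `RemotePartition`, `cartesian_naive`
and `cartesian_cached` are hypothetical names, and the sketch assumes that the
inner partition is re-read once per element of the outer partition unless a
local copy is kept.

```python
# Toy model only: not Spark code. All names here are hypothetical.

class RemotePartition:
    """Stands in for a partition whose block lives on another executor."""

    def __init__(self, data):
        self._data = list(data)
        self.fetches = 0  # number of simulated remote reads

    def iterator(self):
        # Each call models one remote fetch of the whole block.
        self.fetches += 1
        return iter(self._data)


def cartesian_naive(left, right):
    # The inner partition is re-read once per outer element.
    return [(x, y) for x in left for y in right.iterator()]


def cartesian_cached(left, right):
    # Read the inner partition once and reuse the local copy.
    local = list(right.iterator())
    return [(x, y) for x in left for y in local]
```

In this model, an outer partition of 1000 elements causes 1000 simulated
fetches in `cartesian_naive` and a single fetch in `cartesian_cached`, while
both return the same pairs.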