GitHub user squito commented on the pull request:
https://github.com/apache/spark/pull/5572#issuecomment-105257051
@viirya @cloud-fan good point, I hadn't thought about multiple tasks on one
executor that are all pulling the same partition of `rdd2`. Still, I'm very
worried about the extra local caching if we don't have an effective way of
undoing it, because I think it will be very confusing to have these extra
blocks stuck in the cache. I agree that "idea 1" is not as general a
solution, but I was hoping it was simple enough to fit your narrow need here.
In any case, this is just my opinion -- I'm not adamantly against this,
but I would really like to get some other reviewers to weigh in before we
merge those changes.