Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/2366#issuecomment-56293724
@tdas Handling (1) deterministically will bring (2) in line with what we
currently have, and that should be sufficient imo.
(3) was not raised in the context of this patch; it is a general shortcoming
of Spark as it stands.
Alleviating (3) might be complicated (not sure how much so), but it would have
some very interesting consequences for performance (among others).
For example: this prevents us from using block persistence for checkpointing -
there was a discussion about this in a JIRA a while back (forgot the id) ...
Resolving this, together with 3x replicated blocks, would mean we get really
cheap and very performant checkpoints (while having fault tolerance on par
with HDFS).