Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2366#issuecomment-56293724

@tdas Handling (1) deterministically will bring (2) in line with what we currently have, and that should be sufficient imo. (3) was not in the context of this patch; it is a general shortcoming of Spark currently. Alleviating (3) might be complicated (not sure how much so), but it would have some very interesting consequences for performance (among others). For example, this shortcoming prevents us from using block persistence for checkpointing; there was a discussion about this in a JIRA a while back (forgot the id). Resolving it, together with 3x-replicated blocks, would give us really cheap and very performant checkpointing, with fault tolerance on par with HDFS.