Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
So there are 2 options:
1. Require the RDD closure to be idempotent. I'm not sure if that's OK for MLlib,
cc @mengxr @WeichenXu123 @yanboliang
2. Require the output committer to be able to overwrite a committed task. Note
that the output committer here is the `FileCommitProtocol` interface in Spark,
not the Hadoop output committer. We don't have to make this work for all Hadoop
output committers.
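
To make the two options concrete, here is a toy sketch in plain Python that does not use Spark. Everything in it is hypothetical: `nondeterministic_task`, `idempotent_task`, and `ToyCommitProtocol` are illustration-only names, and `ToyCommitProtocol` is not the real `FileCommitProtocol` API. It only assumes that a task can be retried and that a commit is keyed by partition.

```python
import random


def nondeterministic_task(partition_id, rows):
    # Depends on the global RNG state, so a retried attempt
    # may keep a different subset of rows than the first attempt.
    return [r for r in rows if random.random() < 0.5]


def idempotent_task(partition_id, rows):
    # Option 1 (sketch): seed the RNG from the partition id only,
    # so every attempt for the same partition yields the same output.
    rng = random.Random(partition_id)
    return [r for r in rows if rng.random() < 0.5]


class ToyCommitProtocol:
    """Hypothetical stand-in for a commit protocol; NOT Spark's API."""

    def __init__(self):
        # partition_id -> (attempt number, committed output)
        self.committed = {}

    def commit_task(self, partition_id, attempt, output):
        # Option 2 (sketch): a later attempt replaces whatever an
        # earlier attempt already committed for this partition.
        self.committed[partition_id] = (attempt, list(output))
```

With option 1, calling `idempotent_task(3, range(100))` twice returns the same list, so a retry is harmless. With option 2, committing attempt 1 for a partition after attempt 0 leaves only attempt 1's output in `committed`, so the task does not need to be deterministic.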