Github user rezasafi commented on the issue:
https://github.com/apache/spark/pull/19848
@mridulm what I meant by the same RDD was running the same job twice on the
same cluster, but in different Spark contexts. So it is not literally the same
RDD, but since each SparkContext starts RDD IDs from zero, we may end up with
the same RDD IDs across executions. The jobTrackerId will be different, but I
haven't actually checked whether Hadoop derives a distinct file path from the
jobTrackerId. If it does, there will not be a problem; if not, then I guess
the commit will fail. I think this can only happen when
spark.hadoop.validateOutputSpecs is true.
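
To illustrate the ID collision I mean, here is a minimal sketch (the app
name and local master are my own choices for the demo, not part of this PR)
showing that each SparkContext numbers its RDDs from zero, so the "same" job
run in two separate contexts produces RDDs with identical IDs:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddIdCollision {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-id-demo").setMaster("local[*]")

    // First "run" of the job: the first RDD created gets id 0.
    val sc1 = new SparkContext(conf)
    val rdd1 = sc1.parallelize(1 to 10)
    println(s"first context, rdd id = ${rdd1.id}")   // prints 0
    sc1.stop()

    // Second "run" in a fresh context: the RDD id counter restarts,
    // so the equivalent RDD again gets id 0.
    val sc2 = new SparkContext(conf)
    val rdd2 = sc2.parallelize(1 to 10)
    println(s"second context, rdd id = ${rdd2.id}")  // prints 0 again
    sc2.stop()
  }
}
```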