Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19388
Internally, Spark can continue to use stageId for its commit protocol,
which is independent of the jobId we expose to the mapred/mapreduce committer.
As an implementation detail, we were making use of stageId on the executor
and rddId on the driver.
Spark does not really care what is presented to the Hadoop committer, and
the Hadoop committer does not care what Spark uses internally, as long as
both sides use the ids consistently.
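To illustrate, here is a minimal, hypothetical sketch (not the actual HadoopMapReduceCommitProtocol code; `makeJobID` and `makeTaskAttemptContext` are made-up helper names): as long as the driver (setupJob/commitJob) and the executors (setupTask/commitTask) derive the Hadoop-facing JobID from the same internal id (stageId here), the committer only ever sees one consistent id:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

// Hypothetical helpers: the Hadoop-facing JobID is derived from Spark's
// internal stageId, and the same derivation is applied on both the driver
// and the executors, so the committer sees one consistent job id.
def makeJobID(jobTrackerId: String, stageId: Int): JobID =
  new JobID(jobTrackerId, stageId)

def makeTaskAttemptContext(
    conf: Configuration,
    jobId: JobID,
    partitionId: Int,
    attemptNumber: Int): TaskAttemptContextImpl = {
  // Task and attempt ids hang off the shared JobID, so commit/abort
  // decisions on driver and executors refer to the same job.
  val taskId = new TaskID(jobId, TaskType.MAP, partitionId)
  val attemptId = new TaskAttemptID(taskId, attemptNumber)
  new TaskAttemptContextImpl(conf, attemptId)
}
```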
Would that not help make things simpler? Or am I missing something here?