Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197540970
--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging {
     // Try to write all RDD partitions as a Hadoop OutputFormat.
     try {
       val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => {
+        // SPARK-24552: Generate a unique "attempt ID" based on the stage and task attempt numbers.
+        // Assumes that there won't be more than Short.MaxValue attempts, at least not concurrently.
+        val attemptId = (context.stageAttemptNumber << 16) | context.attemptNumber
--- End diff ---
I don't think we should generate an ID this way. We already have a unique ID that is exposed in the Spark UI. I'd much rather make it clear that the TID (task attempt ID) passed to committers as an attempt ID is the same as the TID in the stage view. That makes debugging easier. Going with this approach just introduces yet another number to track for an attempt.
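
To make the contrast concrete, here is a minimal, illustrative Scala sketch of the two schemes under discussion. The attempt numbers below are hypothetical stand-ins for values a real TaskContext would provide, and `TaskContext.taskAttemptId` is the TID I'm referring to:

    // Illustrative sketch only: hypothetical attempt numbers stand in for a
    // real TaskContext.
    object AttemptIdSketch {
      def main(args: Array[String]): Unit = {
        val stageAttemptNumber = 1  // hypothetical stage attempt number
        val attemptNumber = 3       // hypothetical task attempt number within the stage

        // Scheme from the diff: pack the stage attempt into the high 16 bits
        // and the task attempt into the low 16 bits. This produces a new
        // number that appears nowhere in the UI, and it collides if either
        // value needs more than 16 bits.
        val packed = (stageAttemptNumber << 16) | attemptNumber
        println(s"packed attempt ID = $packed")  // prints 65539

        // Alternative: inside a running task, TaskContext already exposes a
        // globally unique Long (the TID shown in the UI's stage view), e.g.:
        //   val tid = TaskContext.get().taskAttemptId()
      }
    }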