Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197542704
--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -104,12 +104,12 @@ object SparkHadoopWriter extends Logging {
       jobTrackerId: String,
       commitJobId: Int,
       sparkPartitionId: Int,
-      sparkAttemptNumber: Int,
+      sparkTaskId: Long,
       committer: FileCommitProtocol,
       iterator: Iterator[(K, V)]): TaskCommitMessage = {
     // Set up a task.
     val taskContext = config.createTaskAttemptContext(
-      jobTrackerId, commitJobId, sparkPartitionId, sparkAttemptNumber)
+      jobTrackerId, commitJobId, sparkPartitionId, sparkTaskId.toInt)
--- End diff ---
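As a side note on the `.toInt` narrowing in the new code path: a self-contained sketch of what happens when a Long TID exceeds Int range. The value below is illustrative, not taken from the PR.

```scala
// Sketch: .toInt keeps only the low 32 bits of a Long, so a TID past
// Int.MaxValue wraps around (here to 2). This is the narrowing the diff
// applies before building the Hadoop task attempt id.
object TidTruncation extends App {
  val tid: Long = 4294967298L         // illustrative TID = 2^32 + 2
  val narrowed: Int = tid.toInt       // low 32 bits only
  println(s"tid=$tid narrowed=$narrowed")  // prints: tid=4294967298 narrowed=2
}
```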
I commented before I saw this thread, but I think it is better to use the
TID because it is already exposed in the UI, which makes it easier to correlate
tasks in the UI with log entries. The combined attempt number isn't used anywhere
else, so it would introduce yet another number to identify a task. Besides,
shifting by 16 means those values grow huge regardless.
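For readers following along, a minimal sketch of the two identifiers being compared. The `(stageAttempt << 16) | taskAttempt` encoding is my reading of the "shifting by 16" proposal in this thread, not code from the PR, and the input values are hypothetical.

```scala
// Sketch of the two candidate task identifiers discussed above.
object AttemptIdSchemes extends App {
  // Hypothetical retry counts for illustration.
  val stageAttemptNumber: Int = 3     // stage retry count
  val taskAttemptNumber: Int = 1      // per-task retry count

  // Option A (assumed encoding): pack both counters into one Int.
  // Even small retry counts produce large values: 3 << 16 = 196608.
  val combined: Int = (stageAttemptNumber << 16) | taskAttemptNumber
  println(s"combined attempt number = $combined")  // prints: 196609

  // Option B: use the TID, i.e. TaskContext.get().taskAttemptId() inside a
  // running task. It is already unique per attempt and shown in the Spark UI,
  // so logs and the UI line up without introducing a new number.
}
```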