Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197540970
--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging {
     // Try to write all RDD partitions as a Hadoop OutputFormat.
     try {
       val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => {
+        // SPARK-24552: Generate a unique "attempt ID" based on the stage and task attempt numbers.
+        // Assumes that there won't be more than Short.MaxValue attempts, at least not concurrently.
+        val attemptId = (context.stageAttemptNumber << 16) | context.attemptNumber
--- End diff ---
I don't think we should generate an ID this way. We already have a unique ID that is exposed in the Spark UI. I'd much rather make it clear that the TID (task attempt ID) passed to committers as an attempt ID is the same as the TID in the stage view. That makes debugging easier. Going with this approach just introduces yet another number to track for an attempt.
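
To make the contrast concrete, here is a minimal, illustrative Scala sketch of the two schemes under discussion. The attempt numbers below are hypothetical stand-ins for values a real TaskContext would provide, and `TaskContext.taskAttemptId` is the TID I'm referring to:

    // Illustrative sketch only: hypothetical attempt numbers stand in for a
    // real TaskContext.
    object AttemptIdSketch {
      def main(args: Array[String]): Unit = {
        val stageAttemptNumber = 1  // hypothetical stage attempt number
        val attemptNumber = 3       // hypothetical task attempt number within the stage

        // Scheme from the diff: pack the stage attempt into the high 16 bits
        // and the task attempt into the low 16 bits. This produces a new
        // number that appears nowhere in the UI, and it collides if either
        // value needs more than 16 bits.
        val packed = (stageAttemptNumber << 16) | attemptNumber
        println(s"packed attempt ID = $packed")  // prints 65539

        // Alternative: inside a running task, TaskContext already exposes a
        // globally unique Long (the TID shown in the UI's stage view), e.g.:
        //   val tid = TaskContext.get().taskAttemptId()
      }
    }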