This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
     new ded902c  [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Had…
ded902c is described below

commit ded902c3a90a9340e551091d554245df5982590c
Author: Ryan Blue <b...@apache.org>
AuthorDate: Thu Jan 24 14:17:38 2019 -0800

    [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Had…
    
    ## What changes were proposed in this pull request?
    
    Updates the attempt ID used by FileFormatWriter. Tasks in different stage attempts use the same task attempt number and could conflict. Using Spark's task attempt ID guarantees that Hadoop TaskAttemptID instances are unique.
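    
    For illustration, a minimal Scala sketch of the collision; the TaskInfo case class and the literal IDs below are hypothetical stand-ins, not Spark's real classes:
    
    ```scala
    // Two attempts of the same partition, produced by two *stage* attempts.
    case class TaskInfo(stageAttempt: Int, partitionId: Int,
                        attemptNumber: Int, taskAttemptId: Long)
    
    val fromStage0 = TaskInfo(stageAttempt = 0, partitionId = 3, attemptNumber = 0, taskAttemptId = 17L)
    val fromStage1 = TaskInfo(stageAttempt = 1, partitionId = 3, attemptNumber = 0, taskAttemptId = 42L)
    
    // A Hadoop TaskAttemptID built from (partitionId, attemptNumber) collides
    // across the two stage attempts...
    assert((fromStage0.partitionId, fromStage0.attemptNumber) ==
           (fromStage1.partitionId, fromStage1.attemptNumber))
    
    // ...while Spark's task attempt ID is a unique per-application counter.
    assert(fromStage0.taskAttemptId != fromStage1.taskAttemptId)
    ```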
    
    This is a backport of d5a97c1 to the 2.3 branch.
    
    ## How was this patch tested?
    
    Existing tests. Also validated that we no longer detect this failure case in our logs after deployment.
    
    Closes #23640 from rdblue/SPARK-26682-backport-to-2.3.
    
    Authored-by: Ryan Blue <b...@apache.org>
    Signed-off-by: Marcelo Vanzin <van...@cloudera.com>
---
 .../org/apache/spark/sql/execution/datasources/FileFormatWriter.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
index 1d80a69..2f701ed 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
@@ -198,7 +198,7 @@ object FileFormatWriter extends Logging {
             description = description,
             sparkStageId = taskContext.stageId(),
             sparkPartitionId = taskContext.partitionId(),
-            sparkAttemptNumber = taskContext.attemptNumber(),
+            sparkAttemptNumber = taskContext.taskAttemptId().toInt & Integer.MAX_VALUE,
             committer,
             iterator = iter)
         },
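
For context on the patched line: TaskContext.taskAttemptId() returns a Long, while the downstream Hadoop TaskAttemptID takes an Int, so the patch truncates with toInt and clears the sign bit with & Integer.MAX_VALUE to keep the result non-negative. A standalone sketch of that masking:

```scala
// Map a Long task attempt ID onto a non-negative Int, as the patched line does.
def toNonNegativeInt(taskAttemptId: Long): Int =
  taskAttemptId.toInt & Integer.MAX_VALUE

assert(toNonNegativeInt(42L) == 42)                   // small IDs pass through
assert(toNonNegativeInt(0xFFFFFFFFL) == Int.MaxValue) // plain toInt would give -1
assert(toNonNegativeInt(Long.MaxValue) >= 0)          // never negative
```

The mapping is not injective (distinct Longs can share an Int), but it avoids the cross-stage-attempt collisions that a raw attemptNumber() produces.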


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
