Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197291875
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java ---
@@ -42,15 +42,12 @@
   *                    Usually Spark processes many RDD partitions at the same time,
   *                    implementations should use the partition id to distinguish writers for
   *                    different partitions.
-  * @param attemptNumber Spark may launch multiple tasks with the same task id. For example, a task
-  *                      failed, Spark launches a new task wth the same task id but different
-  *                      attempt number. Or a task is too slow, Spark launches new tasks wth the
-  *                      same task id but different attempt number, which means there are multiple
-  *                      tasks with the same task id running at the same time. Implementations can
-  *                      use this attempt number to distinguish writers of different task attempts.
+  * @param taskId A unique identifier for a task that is performing the write of the partition
+  *               data. Spark may run multiple tasks for the same partition (due to speculation
+  *               or task failures, for example).
   * @param epochId A monotonically increasing id for streaming queries that are split in to
   *                discrete periods of execution. For non-streaming queries,
   *                this ID will always be 0.
   */
-  DataWriter<T> createDataWriter(int partitionId, int attemptNumber, long epochId);
+  DataWriter<T> createDataWriter(int partitionId, int taskId, long epochId);
--- End diff ---
Hmm, interesting. But there is an API in 2.3:
```
DataWriter<T> createDataWriter(int partitionId, int attemptNumber);
```
which I guess would still suffer from the problem Ryan describes in the bug. In any case, that makes it impossible to cleanly backport this, so we can make the type change here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]