Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197290427
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java
---
@@ -42,15 +42,12 @@
    *                    Usually Spark processes many RDD partitions at the same time,
    *                    implementations should use the partition id to distinguish writers for
    *                    different partitions.
-   * @param attemptNumber Spark may launch multiple tasks with the same task id. For example, a task
-   *                      failed, Spark launches a new task wth the same task id but different
-   *                      attempt number. Or a task is too slow, Spark launches new tasks wth the
-   *                      same task id but different attempt number, which means there are multiple
-   *                      tasks with the same task id running at the same time. Implementations can
-   *                      use this attempt number to distinguish writers of different task attempts.
+   * @param taskId A unique identifier for a task that is performing the write of the partition
+   *               data. Spark may run multiple tasks for the same partition (due to speculation
+   *               or task failures, for example).
    * @param epochId A monotonically increasing id for streaming queries that are split in to
    *                discrete periods of execution. For non-streaming queries,
    *                this ID will always be 0.
    */
-  DataWriter<T> createDataWriter(int partitionId, int attemptNumber, long epochId);
+  DataWriter<T> createDataWriter(int partitionId, int taskId, long epochId);
--- End diff ---
If this patch targets 2.3, I'd say we should not change any API or
documentation; just pass `taskId.toInt` as `attemptNumber` and add a comment
explaining this hacky workaround.
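
As a rough sketch of the suggested workaround (not the actual Spark code; the
interfaces below are simplified stand-ins for the DataSource V2 classes, and
`createWriter` is a hypothetical caller on the execution side), the idea is to
keep the 2.3 signature and smuggle the unique task id through the existing
`attemptNumber` slot:

```java
// Illustrative sketch only: simplified stand-ins for the DataSource V2
// interfaces discussed above, not the real Spark classes.
interface DataWriter<T> {
    void write(T record);
}

interface DataWriterFactory<T> {
    // 2.3 public signature kept unchanged: second parameter stays an int.
    DataWriter<T> createDataWriter(int partitionId, int attemptNumber, long epochId);
}

public class TaskIdWorkaround {
    // The execution side has a unique long taskId per task attempt. Instead
    // of changing the public API, truncate it to int and pass it through the
    // existing attemptNumber parameter (`taskId.toInt` in Scala, a cast in
    // Java). A code comment at the call site would document this hack.
    static <T> DataWriter<T> createWriter(DataWriterFactory<T> factory,
                                          int partitionId, long taskId, long epochId) {
        return factory.createDataWriter(partitionId, (int) taskId, epochId);
    }

    public static void main(String[] args) {
        DataWriterFactory<String> factory =
            (partitionId, attemptNumber, epochId) -> record ->
                System.out.println(partitionId + "/" + attemptNumber + ": " + record);
        DataWriter<String> writer = createWriter(factory, 0, 12345L, 0L);
        writer.write("row");  // prints "0/12345: row"
    }
}
```

Note that the narrowing cast keeps the low 32 bits of the task id, which is
still enough to distinguish concurrently running attempts in practice.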