Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21477#discussion_r193839997
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala ---
    @@ -71,23 +110,17 @@ abstract class ForeachWriter[T] extends Serializable {
       // TODO: Move this to org.apache.spark.sql.util or consolidate this with batch API.

       /**
    -   * Called when starting to process one partition of new data in the executor. The `version` is
    -   * for data deduplication when there are failures. When recovering from a failure, some data may
    -   * be generated multiple times but they will always have the same version.
    -   *
    -   * If this method finds using the `partitionId` and `version` that this partition has already been
    -   * processed, it can return `false` to skip the further data processing. However, `close` still
    -   * will be called for cleaning up resources.
    +   * Called when starting to process one partition of new data in the executor.
        *
        * @param partitionId the partition id.
    -   * @param version a unique id for data deduplication.
    +   * @param epochId a unique id for data deduplication.
        * @return `true` if the corresponding partition and version id should be processed. `false`
        *         indicates the partition should be skipped.
        */
    -  def open(partitionId: Long, version: Long): Boolean
    +  def open(partitionId: Long, epochId: Long): Boolean
    --- End diff --
    
    Renaming a parameter breaks Scala source compatibility. I'm totally fine with changing this since it's not a stable API; just pointing it out.
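    
    A minimal sketch of the breakage being pointed out (the `OldWriter`/`NewWriter`/`Demo` names are hypothetical, not part of the PR): in Scala, parameter names are part of the source-level API because callers may pass arguments by name, so renaming `version` to `epochId` compiles fine for positional callers but breaks any caller that used the old name.

    ```scala
    // Signature before the PR (hypothetical stand-in for ForeachWriter.open).
    abstract class OldWriter {
      def open(partitionId: Long, version: Long): Boolean
    }

    // Signature after the PR: same positional shape, different parameter name.
    abstract class NewWriter {
      def open(partitionId: Long, epochId: Long): Boolean
    }

    object Demo {
      val w = new NewWriter {
        def open(partitionId: Long, epochId: Long): Boolean = true
      }

      // Positional calls are unaffected by the rename:
      val ok = w.open(0L, 1L)

      // But a named-argument call written against the old signature no longer
      // compiles, which is the source-compatibility break:
      // w.open(partitionId = 0L, version = 1L)  // error: unknown parameter name: version
    }
    ```

    Binary compatibility is unaffected (parameter names are erased in bytecode), which is why this only matters for source-level callers.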


---
