Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193839997
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala ---
@@ -71,23 +110,17 @@ abstract class ForeachWriter[T] extends Serializable {
  // TODO: Move this to org.apache.spark.sql.util or consolidate this with batch API.
  /**
-  * Called when starting to process one partition of new data in the executor. The `version` is
-  * for data deduplication when there are failures. When recovering from a failure, some data may
-  * be generated multiple times but they will always have the same version.
-  *
-  * If this method finds using the `partitionId` and `version` that this partition has already been
-  * processed, it can return `false` to skip the further data processing. However, `close` still
-  * will be called for cleaning up resources.
+  * Called when starting to process one partition of new data in the executor.
   *
   * @param partitionId the partition id.
-  * @param version a unique id for data deduplication.
+  * @param epochId a unique id for data deduplication.
   * @return `true` if the corresponding partition and version id should be processed. `false`
   *         indicates the partition should be skipped.
   */
- def open(partitionId: Long, version: Long): Boolean
+ def open(partitionId: Long, epochId: Long): Boolean
--- End diff ---
Renaming a parameter breaks Scala source compatibility for any caller that passes it by name. I'm totally fine with changing this since it's not a stable API; just pointing it out.
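To illustrate the compatibility point, here is a minimal sketch (the `ForeachWriter` shape is simplified from the diff; the call site is hypothetical). Scala allows named arguments, so a rename of a parameter breaks existing call sites that use the old name, even though the method signature's types are unchanged:

```scala
// Simplified version of the API after the rename in this PR.
abstract class ForeachWriter[T] extends Serializable {
  def open(partitionId: Long, epochId: Long): Boolean
}

object Demo {
  val writer = new ForeachWriter[String] {
    override def open(partitionId: Long, epochId: Long): Boolean = true
  }

  // A pre-rename caller may have written:
  //   writer.open(partitionId = 0L, version = 1L)
  // That no longer compiles ("unknown parameter name: version");
  // the call site must be updated to the new name:
  def main(args: Array[String]): Unit = {
    println(writer.open(partitionId = 0L, epochId = 1L)) // prints "true"
  }
}
```

Note that an implementation overriding `open` may keep whatever parameter names it likes; only call sites using the old name by keyword break, which is why this is a source-compatibility issue rather than a binary one.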