Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r209020054
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java ---
@@ -21,33 +21,39 @@
import org.apache.spark.annotation.InterfaceStability;
import org.apache.spark.sql.SaveMode;
-import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
+import org.apache.spark.sql.sources.v2.writer.BatchWriteSupport;
import org.apache.spark.sql.types.StructType;
/**
* A mix-in interface for {@link DataSourceV2}. Data sources can implement this interface to
- * provide data writing ability and save the data to the data source.
+ * provide data writing ability for batch processing.
+ *
+ * This interface is used when end users want to use a data source implementation directly, e.g.
+ * {@code Dataset.write.format(...).option(...).save()}.
*/
@InterfaceStability.Evolving
-public interface WriteSupport extends DataSourceV2 {
+public interface BatchWriteSupportProvider extends DataSourceV2 {
/**
- * Creates an optional {@link DataSourceWriter} to save the data to this data source. Data
+ * Creates an optional {@link BatchWriteSupport} to save the data to this data source. Data
* sources can return None if there is no writing needed to be done according to the save mode.
*
* If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*
- * @param writeUUID A unique string for the writing job. It's possible that there are many writing
- *                  jobs running at the same time, and the returned {@link DataSourceWriter} can
- *                  use this job id to distinguish itself from other jobs.
+ * @param queryId A unique string for the writing query. It's possible that there are many
+ *                writing queries running at the same time, and the returned
+ *                {@link BatchWriteSupport} can use this id to distinguish itself from others.
* @param schema the schema of the data to be written.
* @param mode the save mode which determines what to do when the data are already in this data
*             source, please refer to {@link SaveMode} for more details.
* @param options the options for the returned data source writer, which is an immutable
*                case-insensitive string-to-string map.
- * @return a writer to append data to this data source
+ * @return a write support to write data to this data source.
*/
- Optional<DataSourceWriter> createWriter(
-     String writeUUID, StructType schema, SaveMode mode, DataSourceOptions options);
+ Optional<BatchWriteSupport> createBatchWriteSupport(
+ String queryId,
+ StructType schema,
+ SaveMode mode,
--- End diff --
I don't think this is a good idea. Why introduce a legacy API into a new
API? If we are moving old sources to the new API, then they should fully
implement the new API and should not continue to expose the unpredictable v1
behavior.
That said, as long as the `TableCatalog` makes it in, I don't care what
anonymous tables do because I don't intend for any of our sources to use this
path.
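For readers skimming the archive, here is a minimal, self-contained sketch of the provider shape the diff introduces: a `createBatchWriteSupport` method that returns an empty `Optional` when the save mode means no write is needed. The `BatchWriteSupport` and `SaveMode` types below are hypothetical stand-ins defined locally so the sketch compiles without Spark on the classpath, and `ExampleProvider` is an invented name, not part of the PR.

```java
import java.util.Optional;

// Hypothetical stand-ins for the Spark types named in the diff; real code
// would import org.apache.spark.sql.sources.v2.writer.BatchWriteSupport
// and org.apache.spark.sql.SaveMode instead.
interface BatchWriteSupport {}

enum SaveMode { Append, Overwrite, ErrorIfExists, Ignore }

// A minimal provider in the shape of the new interface: data sources can
// return Optional.empty() if the save mode requires no writing, e.g.
// SaveMode.Ignore when the data already exists.
class ExampleProvider {
    private final boolean dataAlreadyExists;

    ExampleProvider(boolean dataAlreadyExists) {
        this.dataAlreadyExists = dataAlreadyExists;
    }

    Optional<BatchWriteSupport> createBatchWriteSupport(String queryId, SaveMode mode) {
        if (mode == SaveMode.Ignore && dataAlreadyExists) {
            // Nothing to do per the save mode: no Spark job should be submitted.
            return Optional.empty();
        }
        // Otherwise hand back a write support; queryId would let it
        // distinguish itself from other concurrent writing queries.
        return Optional.of(new BatchWriteSupport() {});
    }
}

public class Demo {
    public static void main(String[] args) {
        ExampleProvider provider = new ExampleProvider(true);
        System.out.println(provider.createBatchWriteSupport("q-1", SaveMode.Ignore).isPresent());
        System.out.println(provider.createBatchWriteSupport("q-2", SaveMode.Append).isPresent());
    }
}
```

The `Optional` return is the point under discussion: it lets a source skip the write entirely for modes like `Ignore`, which is exactly the v1-style conditional behavior the comment above argues new sources should not rely on.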
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]