aokolnychyi commented on code in PR #49493:
URL: https://github.com/apache/spark/pull/49493#discussion_r1917449166
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWriter.java:
##########
@@ -48,6 +48,23 @@ public interface DeltaWriter<T> extends DataWriter<T> {
*/
void update(T metadata, T id, T row) throws IOException;
+ /**
+ * Inserts a new row with metadata.
+ * <p>
+ * This method is used by row-level operations to handle metadata associated with updates
+ * that are split into deletes and inserts. For new records added during a MERGE operation,
+ * metadata column values are set to {@code null}.
+ *
+ * @param metadata values for metadata columns
+ * @param row a row to insert
+ * @throws IOException if failure happens during disk/network IO like writing files
+ *
+ * @since 4.0.0
+ */
+ default void insert(T metadata, T row) throws IOException {
Review Comment:
Actually, the record passed to `DataWriter` is not always a new record to
insert. Group-based DELETE, UPDATE, and MERGE operations that replace entire
files in Delta and Iceberg have to copy over the records that are unchanged,
so those records aren't really inserts. Leaving the method name as `write` in
`DeltaWriter` keeps its purpose fairly generic and allows us to use it beyond
simple inserts.
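
To illustrate the copy-over case, here is a minimal sketch. It does not use the
actual Spark interfaces: `SimpleWriter`, `RecordingWriter`, and
`deleteFromFile` are hypothetical stand-ins, and using the file name as the
"metadata" value is an assumption made only for this example. It shows a
group-based DELETE rewriting one file, where the surviving rows go through the
same generic `write(metadata, row)` call even though they are not new records.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT the real Spark connector interfaces.
class GroupRewriteSketch {

  /** Simplified stand-in for a data writer; T is the row type. */
  interface SimpleWriter<T> {
    void write(T row) throws IOException;

    // Generic entry point: the row may be brand new or carried over
    // from a file that is being replaced. Metadata is ignored by default.
    default void write(T metadata, T row) throws IOException {
      write(row);
    }
  }

  /** Records every call so the behavior can be inspected. */
  static final class RecordingWriter implements SimpleWriter<String> {
    final List<String> log = new ArrayList<>();

    @Override
    public void write(String row) {
      log.add("new:" + row);
    }

    @Override
    public void write(String metadata, String row) {
      // Assumption for this sketch: null metadata marks a genuinely new row.
      log.add((metadata == null ? "new:" : "copied[" + metadata + "]:") + row);
    }
  }

  /** Group-based DELETE: rewrite a whole file, dropping one row and copying the rest. */
  static List<String> deleteFromFile(List<String> fileRows, String fileName, String toDelete) {
    RecordingWriter writer = new RecordingWriter();
    for (String row : fileRows) {
      if (!row.equals(toDelete)) {
        // Carried over from the replaced file, so not a true insert.
        writer.write(fileName, row);
      }
    }
    return writer.log;
  }

  public static void main(String[] args) {
    System.out.println(deleteFromFile(List.of("a", "b", "c"), "file-1", "b"));
  }
}
```

With a name like `insert`, the calls for rows `a` and `c` would read as if new
data were being added, whereas `write` covers both new and copied rows.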
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]