aokolnychyi commented on code in PR #49493:
URL: https://github.com/apache/spark/pull/49493#discussion_r1915911332
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/MetadataColumn.java:
##########
@@ -36,6 +36,45 @@
*/
@Evolving
public interface MetadataColumn {
+ /**
+ * Indicates whether a row-level operation should preserve the value of the metadata column
+ * for deleted rows. If set to true, the metadata value will be retained and passed back to
+ * the writer. If false, the metadata value will be replaced with {@code null}.
+ * <p>
+ * This flag applies only to row-level operations working with deltas of rows. Group-based
+ * operations handle deletes by discarding matching records.
+ *
+ * @since 4.0.0
+ */
+ String PRESERVE_ON_DELETE = "__preserve_on_delete";
+ boolean PRESERVE_ON_DELETE_DEFAULT = true;
+
+ /**
+ * Indicates whether a row-level operation should preserve the value of the metadata column
+ * for updated rows. If set to true, the metadata value will be retained and passed back to
+ * the writer. If false, the metadata value will be replaced with {@code null}.
+ * <p>
+ * This flag applies to both group-based and delta-based row-level operations.
+ *
+ * @since 4.0.0
+ */
+ String PRESERVE_ON_UPDATE = "__preserve_on_update";
+ boolean PRESERVE_ON_UPDATE_DEFAULT = true;
+
+ /**
+ * Indicates whether a row-level operation should preserve the value of the metadata column
+ * for inserts generated by splitting updated rows into deletes and inserts. If true,
+ * the metadata value will be retained and passed back to the writer. If false, the
+ * metadata value will be replaced with {@code null}.
+ * <p>
+ * This flag applies only to row-level operations working with deltas of rows. Group-based
+ * operations do not represent updates as deletes and inserts.
+ *
+ * @since 4.0.0
+ */
+ String PRESERVE_ON_INSERT_AS_UPDATE = "__preserve_on_insert_as_update";
Review Comment:
In theory, we could drop this separate flag and respect `PRESERVE_ON_UPDATE`
when updates are split. That said, doing so would be a behavior change compared
to the existing implementation, which always discards metadata columns when
updates are split into deletes and inserts. The API is marked as
`@Experimental`, but it probably makes sense to avoid changing the default
behavior, even though we are going from 3.5 to 4.0.
Let me know what everybody thinks.
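
To make the trade-off concrete, here is a minimal, self-contained sketch of the two options. It is not the Spark implementation: the class `PreserveFlagsSketch`, the `RowKind` enum, the helper methods, and the `Map<String, Boolean>` representation of column metadata are all hypothetical. The `false` default for `PRESERVE_ON_INSERT_AS_UPDATE` is an assumption inferred from the note that the existing implementation always discards metadata columns when updates are split; the diff above is cut off before that default is declared.

```java
import java.util.Map;

// Hypothetical sketch, not Spark code: models how the flags could decide
// whether a metadata value is kept (true) or replaced with null (false).
final class PreserveFlagsSketch {
  // Flag names copied from the diff above.
  static final String PRESERVE_ON_DELETE = "__preserve_on_delete";
  static final String PRESERVE_ON_UPDATE = "__preserve_on_update";
  static final String PRESERVE_ON_INSERT_AS_UPDATE = "__preserve_on_insert_as_update";

  // Defaults for delete and update come from the diff.
  static final boolean PRESERVE_ON_DELETE_DEFAULT = true;
  static final boolean PRESERVE_ON_UPDATE_DEFAULT = true;
  // ASSUMPTION: inferred from "always discards metadata columns when updates are split".
  static final boolean ASSUMED_INSERT_AS_UPDATE_DEFAULT = false;

  // Hypothetical classification of rows in a delta-based operation.
  enum RowKind { DELETE, UPDATE, INSERT_FROM_SPLIT_UPDATE }

  private static boolean flag(Map<String, Boolean> metadata, String key, boolean dflt) {
    return metadata.getOrDefault(key, dflt);
  }

  // Option A (as in the PR): a dedicated flag governs inserts produced by split updates.
  static boolean preserveWithSeparateFlag(Map<String, Boolean> metadata, RowKind kind) {
    switch (kind) {
      case DELETE:
        return flag(metadata, PRESERVE_ON_DELETE, PRESERVE_ON_DELETE_DEFAULT);
      case UPDATE:
        return flag(metadata, PRESERVE_ON_UPDATE, PRESERVE_ON_UPDATE_DEFAULT);
      default:
        return flag(metadata, PRESERVE_ON_INSERT_AS_UPDATE, ASSUMED_INSERT_AS_UPDATE_DEFAULT);
    }
  }

  // Option B (the alternative raised in the comment): split updates reuse PRESERVE_ON_UPDATE.
  static boolean preserveWithSharedFlag(Map<String, Boolean> metadata, RowKind kind) {
    switch (kind) {
      case DELETE:
        return flag(metadata, PRESERVE_ON_DELETE, PRESERVE_ON_DELETE_DEFAULT);
      default:
        return flag(metadata, PRESERVE_ON_UPDATE, PRESERVE_ON_UPDATE_DEFAULT);
    }
  }

  public static void main(String[] args) {
    Map<String, Boolean> noFlagsSet = Map.of();
    System.out.println("separate flag: "
        + preserveWithSeparateFlag(noFlagsSet, RowKind.INSERT_FROM_SPLIT_UPDATE));
    System.out.println("shared flag:   "
        + preserveWithSharedFlag(noFlagsSet, RowKind.INSERT_FROM_SPLIT_UPDATE));
  }
}
```

Under these assumptions, a column that sets none of the flags gets different treatment for split updates under the two options, which is the behavior change the comment is concerned about. With the separate flag, connectors have to opt in before metadata is preserved on split updates.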