aokolnychyi commented on code in PR #49493:
URL: https://github.com/apache/spark/pull/49493#discussion_r1915911332


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/MetadataColumn.java:
##########
@@ -36,6 +36,45 @@
  */
 @Evolving
 public interface MetadataColumn {
+  /**
+   * Indicates whether a row-level operation should preserve the value of the metadata column
+   * for deleted rows. If set to true, the metadata value will be retained and passed back to
+   * the writer. If false, the metadata value will be replaced with {@code null}.
+   * <p>
+   * This flag applies only to row-level operations working with deltas of rows. Group-based
+   * operations handle deletes by discarding matching records.
+   *
+   * @since 4.0.0
+   */
+  String PRESERVE_ON_DELETE = "__preserve_on_delete";
+  boolean PRESERVE_ON_DELETE_DEFAULT = true;
+
+  /**
+   * Indicates whether a row-level operation should preserve the value of the metadata column
+   * for updated rows. If set to true, the metadata value will be retained and passed back to
+   * the writer. If false, the metadata value will be replaced with {@code null}.
+   * <p>
+   * This flag applies to both group-based and delta-based row-level operations.
+   *
+   * @since 4.0.0
+   */
+  String PRESERVE_ON_UPDATE = "__preserve_on_update";
+  boolean PRESERVE_ON_UPDATE_DEFAULT = true;
+
+  /**
+   * Indicates whether a row-level operation should preserve the value of the metadata column
+   * for inserts generated by splitting updated rows into deletes and inserts. If true,
+   * the metadata value will be retained and passed back to the writer. If false, the
+   * metadata value will be replaced with {@code null}.
+   * <p>
+   * This flag applies only to row-level operations working with deltas of rows. Group-based
+   * operations do not represent updates as deletes and inserts.
+   *
+   * @since 4.0.0
+   */
+  String PRESERVE_ON_INSERT_AS_UPDATE = "__preserve_on_insert_as_update";

Review Comment:
   In theory, we can drop this separate flag and respect
`PRESERVE_ON_UPDATE` when updates are split. That said, it would be a behavior
change: the existing implementation always discards metadata columns when
updates are split into deletes and inserts. The API is marked as
`@Experimental`, but it probably makes sense to avoid changing the default
behavior. Let me know what everybody thinks.
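
To make the trade-off concrete, here is a minimal sketch of the two options for a metadata column that sets no flags. The constant names come from the diff above; the class `PreserveFlagSketch`, its methods, and the `Map`-based flag lookup are hypothetical and are not how Spark resolves these flags. The diff is cut off before any default for `PRESERVE_ON_INSERT_AS_UPDATE`, so the `false` default below is an assumption chosen to match the "always discards" behavior described above.

```java
import java.util.Map;

// Hypothetical sketch: compares the two ways of deciding whether the insert
// half of a split update keeps its metadata value.
final class PreserveFlagSketch {
  static final String PRESERVE_ON_UPDATE = "__preserve_on_update";
  static final String PRESERVE_ON_INSERT_AS_UPDATE = "__preserve_on_insert_as_update";
  static final boolean PRESERVE_ON_UPDATE_DEFAULT = true;
  // Assumption: not visible in the diff; false mirrors the existing behavior.
  static final boolean PRESERVE_ON_INSERT_AS_UPDATE_DEFAULT = false;

  // Reads a boolean flag from the column's flags, falling back to the default.
  static boolean flag(Map<String, String> flags, String key, boolean defaultValue) {
    String value = flags.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  // Option A: a dedicated flag governs the insert half of a split update.
  static boolean withSeparateFlag(Map<String, String> flags) {
    return flag(flags, PRESERVE_ON_INSERT_AS_UPDATE, PRESERVE_ON_INSERT_AS_UPDATE_DEFAULT);
  }

  // Option B: no dedicated flag; split updates respect PRESERVE_ON_UPDATE.
  static boolean reusingUpdateFlag(Map<String, String> flags) {
    return flag(flags, PRESERVE_ON_UPDATE, PRESERVE_ON_UPDATE_DEFAULT);
  }

  public static void main(String[] args) {
    Map<String, String> noFlags = Map.of();
    // A column that sets nothing keeps today's behavior under option A...
    System.out.println("separate flag: " + withSeparateFlag(noFlags));
    // ...but starts preserving metadata under option B.
    System.out.println("reuse update flag: " + reusingUpdateFlag(noFlags));
  }
}
```

Under these assumptions, an unflagged column is discarded on split updates with the separate flag and preserved when `PRESERVE_ON_UPDATE` is reused, which is the behavior change in question.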



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]