gengliangwang commented on code in PR #56619:
URL: https://github.com/apache/spark/pull/56619#discussion_r3470503004


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala:
##########
@@ -736,33 +763,45 @@ private[sql] object CatalogV2Util {
         assert(e.resolved && e.foldable,
           "The existence default value must be a simple SQL string that is 
resolved and " +
             "foldable, but got: " + f.getExistenceDefaultValue().get)
-        LiteralValue(e.eval(), f.dataType)
+        LiteralValue(e.eval(), dataType)
       } else {
         null
       }
       val defaultValue = new 
ColumnDefaultValue(f.getCurrentDefaultValue().get, existsDefault)
-      val cleanedMetadata = metadataWithKeysRemoved(
-        Seq("comment", CURRENT_DEFAULT_COLUMN_METADATA_KEY, 
EXISTS_DEFAULT_COLUMN_METADATA_KEY))
-      Column.create(f.name, f.dataType, f.nullable, f.getComment().orNull, 
defaultValue,
-        metadataAsJson(cleanedMetadata))
+      val cleanedMetadata = 
metadataWithKeysRemoved(REMOVED_DEFAULT_VALUE_METADATA_KEYS)
+      Column.newBuilder(f.name, dataType)
+        .nullable(f.nullable)
+        .comment(comment)
+        .defaultValue(defaultValue)
+        .metadataInJSON(metadataAsJson(cleanedMetadata))

Review Comment:
   This doesn't compile. The builder method was renamed `metadataInJSON(...)` → 
`metadata(...)` in this commit (`Column.java:270` now exposes only 
`metadata(...)`), but this call still uses the old name — as do `785`, `794`, 
`802` in this file and `ColumnSuite.scala:111`. CI is red across every build 
module (incl. `api, catalyst` and `Precompile Spark`).
   
   ```suggestion
           .metadata(metadataAsJson(cleanedMetadata))
   ```
   Apply the same rename on lines 785, 794, 802, and `ColumnSuite.scala:111`.



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/txns.scala:
##########
@@ -116,16 +116,10 @@ class TxnTable(
   // don't false-positive on the TxnTable wrapper having a different UUID.
   override val id: String = delegate.id
 
-  // The starting version should be the delegate version.
-  setVersion(delegate.version())
-
-  // Preserve column IDs from the delegate so that column ID validation can 
correctly detect
-  // drop-and-re-add scenarios (different IDs) and pass when columns are 
unchanged (same IDs).
-  // Uses assignMissingIds to keep the delegate's IDs for existing columns 
while assigning
-  // fresh IDs for any new columns added by schema evolution.
-  updateColumns(InMemoryBaseTable.assignMissingIds(
-    oldColumns = delegate.columns(),
-    newColumns = columns()))
+  // Column IDs for existing columns are preserved through the StructType 
round-trip via
+  // metadata encoding. assignMissingIds assigns fresh IDs to any new columns 
added by
+  // schema evolution.
+  updateColumns(InMemoryBaseTable.assignMissingIds(columns()))

Review Comment:
   `master` sets `setVersion(delegate.version())` right here ("the starting 
version should be the delegate version"), but this rewrite dropped it along 
with the old version comment. It looks unrelated to the column-ID change and 
reads as an accidental collateral deletion.
   
   It matters: `InMemoryBaseTable` equality/hashCode are version-aware (`id() 
== other.id() && version() == other.version()`) and `tableVersion` defaults to 
`0`, so the wrapper now reports version `0` instead of the delegate's — 
re-introducing the same identity false-positive the `id` override just above 
was added to prevent, now via version. Restore the line unless dropping it was 
intentional (in which case, what makes the version sync unnecessary now?).



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/V2TableUtil.scala:
##########
@@ -69,10 +67,11 @@ private[sql] object V2TableUtil extends SQLConfHelper {
   def validateCapturedColumns(
       table: Table,
       originCols: Seq[Column],
-      mode: SchemaValidationMode = PROHIBIT_CHANGES): Seq[String] = {
+      mode: SchemaValidationMode = PROHIBIT_CHANGES,
+      checkFieldIds: Boolean = true): Seq[String] = {

Review Comment:
   The doc for this method lists type/nullability changes, removed, and added 
columns, but not the field-ID check it now performs — which is the headline 
behavior of this PR. Worth a line here and on 
`SchemaUtils.validateSchemaCompatibility`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to