gengliangwang commented on code in PR #56619:
URL: https://github.com/apache/spark/pull/56619#discussion_r3470503004
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala:
##########
@@ -736,33 +763,45 @@ private[sql] object CatalogV2Util {
assert(e.resolved && e.foldable,
"The existence default value must be a simple SQL string that is
resolved and " +
"foldable, but got: " + f.getExistenceDefaultValue().get)
- LiteralValue(e.eval(), f.dataType)
+ LiteralValue(e.eval(), dataType)
} else {
null
}
val defaultValue = new
ColumnDefaultValue(f.getCurrentDefaultValue().get, existsDefault)
- val cleanedMetadata = metadataWithKeysRemoved(
- Seq("comment", CURRENT_DEFAULT_COLUMN_METADATA_KEY,
EXISTS_DEFAULT_COLUMN_METADATA_KEY))
- Column.create(f.name, f.dataType, f.nullable, f.getComment().orNull,
defaultValue,
- metadataAsJson(cleanedMetadata))
+ val cleanedMetadata =
metadataWithKeysRemoved(REMOVED_DEFAULT_VALUE_METADATA_KEYS)
+ Column.newBuilder(f.name, dataType)
+ .nullable(f.nullable)
+ .comment(comment)
+ .defaultValue(defaultValue)
+ .metadataInJSON(metadataAsJson(cleanedMetadata))
Review Comment:
This doesn't compile. The builder method was renamed `metadataInJSON(...)` →
`metadata(...)` in this commit (`Column.java:270` now exposes only
`metadata(...)`), but this call still uses the old name — as do `785`, `794`,
`802` in this file and `ColumnSuite.scala:111`. CI is red across every build
module (incl. `api, catalyst` and `Precompile Spark`).
```suggestion
.metadata(metadataAsJson(cleanedMetadata))
```
Apply the same rename on lines 785, 794, 802, and `ColumnSuite.scala:111`.
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/txns.scala:
##########
@@ -116,16 +116,10 @@ class TxnTable(
// don't false-positive on the TxnTable wrapper having a different UUID.
override val id: String = delegate.id
- // The starting version should be the delegate version.
- setVersion(delegate.version())
-
- // Preserve column IDs from the delegate so that column ID validation can
correctly detect
- // drop-and-re-add scenarios (different IDs) and pass when columns are
unchanged (same IDs).
- // Uses assignMissingIds to keep the delegate's IDs for existing columns
while assigning
- // fresh IDs for any new columns added by schema evolution.
- updateColumns(InMemoryBaseTable.assignMissingIds(
- oldColumns = delegate.columns(),
- newColumns = columns()))
+ // Column IDs for existing columns are preserved through the StructType
round-trip via
+ // metadata encoding. assignMissingIds assigns fresh IDs to any new columns
added by
+ // schema evolution.
+ updateColumns(InMemoryBaseTable.assignMissingIds(columns()))
Review Comment:
`master` sets `setVersion(delegate.version())` right here ("the starting
version should be the delegate version"), but this rewrite dropped it along
with the old version comment. It looks unrelated to the column-ID change and
reads as an accidental collateral deletion.
It matters: `InMemoryBaseTable` equality/hashCode are version-aware (`id()
== other.id() && version() == other.version()`) and `tableVersion` defaults to
`0`, so the wrapper now reports version `0` instead of the delegate's —
re-introducing the same identity false-positive the `id` override just above
was added to prevent, now via version. Restore the line unless dropping it was
intentional (in which case, what makes the version sync unnecessary now?).
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/V2TableUtil.scala:
##########
@@ -69,10 +67,11 @@ private[sql] object V2TableUtil extends SQLConfHelper {
def validateCapturedColumns(
table: Table,
originCols: Seq[Column],
- mode: SchemaValidationMode = PROHIBIT_CHANGES): Seq[String] = {
+ mode: SchemaValidationMode = PROHIBIT_CHANGES,
+ checkFieldIds: Boolean = true): Seq[String] = {
Review Comment:
The doc for this method lists type/nullability changes, removed, and added
columns, but not the field-ID check it now performs — which is the headline
behavior of this PR. Worth a line here and on
`SchemaUtils.validateSchemaCompatibility`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]