cloud-fan commented on code in PR #56712:
URL: https://github.com/apache/spark/pull/56712#discussion_r3471715912


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -337,23 +377,42 @@ object TableOutputResolver extends SQLConfHelper with 
Logging {
       addError: String => Unit,
       colPath: Seq[String] = Nil,
       defaultValueFillMode: DefaultValueFillMode.Value,
-      enforceFullOutput: Boolean = false): Seq[NamedExpression] = {
+      enforceFullOutput: Boolean = false,
+      autoFilledGenCols: mutable.Set[String] = mutable.Set.empty): 
Seq[NamedExpression] = {
     val matchedCols = mutable.HashSet.empty[String]
     val reordered = expectedCols.flatMap { expectedCol =>
       val matched = inputCols.filter(col => conf.resolver(col.name, 
expectedCol.name))
       val newColPath = colPath :+ expectedCol.name
       if (matched.isEmpty) {
-        val defaultExpr = if (Set(FILL, 
RECURSE).contains(defaultValueFillMode)) {
-          getDefaultValueExprOrNullLit(expectedCol, 
conf.useNullsForMissingDefaultColumnValues)
-        } else {
-          None
-        }
-        if (defaultExpr.isEmpty) {
-          throw 
QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(
-            tableName, newColPath.quoted
-          )
+        // Check for generated column expression first, before falling back to 
defaults,
+        // since getDefaultValueExprOrNullLit may return null for nullable 
columns.
+        GeneratedColumn.getGenerationExpression(expectedCol.metadata) match {
+          case Some(genExprSql) =>
+            autoFilledGenCols += expectedCol.name
+            // The parsed references resolve against the query output (the 
user-provided
+            // columns) later in analysis. References to other generated 
columns are already
+            // rejected at table definition time (see 
GeneratedColumnExpression.validate).
+            val genExpr = CatalystSqlParser.parseExpression(genExprSql)
+            Some(applyColumnMetadata(genExpr, expectedCol))

Review Comment:
   By-name auto-fill leaves the generation expression to resolve against the 
pre-cast `query.output` here, while the by-position path (`:545`) rewrites its 
references to the post-cast `matched` columns via 
`resolveGenerationExprReferences`. When a write's input column type differs 
from the table column type (a store-assignment cast), the two paths then 
evaluate the expression over different values — pre-cast by-name, post-cast 
by-position — which can diverge for overflow/precision-sensitive expressions, 
and can leave the by-name generated value inconsistent with the stored cast 
source column.
   
   No by-name test exercises a value-changing cast: the type-coercion test's `a 
+ 1` (`a:INT, b:LONG`) is identical computed in INT vs LONG, so it can't 
observe this. Can you confirm by-name binds to the intended value? If it binds 
pre-cast, mirroring the by-position resolution (and adding a by-name 
widening-cast test) would close the gap.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableConstraints.scala:
##########
@@ -72,6 +79,57 @@ class ResolveTableConstraints(val catalogManager: 
CatalogManager) extends Rule[L
       
.getOrElse(catalogManager.v1SessionCatalog.parser.parseExpression(c.predicateSql()))
   }
 
+  /**
+   * For each user-provided generated column, add a CheckInvariant that 
validates the column
+   * value matches the generation expression. Auto-filled generated columns 
are excluded:
+   * ResolveOutputRelation strips their GENERATION_EXPRESSION metadata from 
the table output,
+   * so they won't appear here.
+   */
+  private def buildGeneratedColumnConstraints(
+      r: DataSourceV2Relation,
+      v2Write: V2WriteCommand): Seq[Expression] = {
+    if (!GeneratedColumn.supportsGeneratedColumnsOnWrite(r.catalog, 
r.table.columns())) {
+      return Seq.empty
+    }
+
+    // Use V2 columns from the table to access both V2 expressions and SQL 
strings.
+    // Only add constraints for generated columns whose GENERATION_EXPRESSION 
metadata
+    // is still present in the table output -- ResolveOutputRelation strips 
the metadata
+    // from auto-filled columns so they are excluded here.
+    val v2Columns = r.table.columns()
+    val resolver = catalogManager.v1SessionCatalog.conf.resolver
+    val userProvidedGenCols = r.output

Review Comment:
   The user-provided-vs-auto-filled distinction is carried by mutated 
table-output metadata: `ResolveOutputRelation` strips `GENERATION_EXPRESSION` 
from auto-filled columns, and this read-back depends on that having happened. 
It's safe today because `outputResolved` keeps this rule from firing before the 
projection is built and stripped — so this is not a bug.
   
   The coupling is implicit, though: a future write path that builds the 
projection without the strip would make this add a `CheckInvariant` on an 
auto-filled column, validating a value that's correct by construction against a 
possibly differently-bound expression (see the by-name/by-position note) — a 
spurious violation. A comment cross-linking the strip site, or carrying the 
auto-filled set on the write command, would make the contract explicit.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -337,23 +377,42 @@ object TableOutputResolver extends SQLConfHelper with 
Logging {
       addError: String => Unit,
       colPath: Seq[String] = Nil,
       defaultValueFillMode: DefaultValueFillMode.Value,
-      enforceFullOutput: Boolean = false): Seq[NamedExpression] = {
+      enforceFullOutput: Boolean = false,
+      autoFilledGenCols: mutable.Set[String] = mutable.Set.empty): 
Seq[NamedExpression] = {
     val matchedCols = mutable.HashSet.empty[String]
     val reordered = expectedCols.flatMap { expectedCol =>
       val matched = inputCols.filter(col => conf.resolver(col.name, 
expectedCol.name))
       val newColPath = colPath :+ expectedCol.name
       if (matched.isEmpty) {
-        val defaultExpr = if (Set(FILL, 
RECURSE).contains(defaultValueFillMode)) {
-          getDefaultValueExprOrNullLit(expectedCol, 
conf.useNullsForMissingDefaultColumnValues)
-        } else {
-          None
-        }
-        if (defaultExpr.isEmpty) {
-          throw 
QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(
-            tableName, newColPath.quoted
-          )
+        // Check for generated column expression first, before falling back to 
defaults,
+        // since getDefaultValueExprOrNullLit may return null for nullable 
columns.
+        GeneratedColumn.getGenerationExpression(expectedCol.metadata) match {
+          case Some(genExprSql) =>
+            autoFilledGenCols += expectedCol.name
+            // The parsed references resolve against the query output (the 
user-provided
+            // columns) later in analysis. References to other generated 
columns are already
+            // rejected at table definition time (see 
GeneratedColumnExpression.validate).
+            val genExpr = CatalystSqlParser.parseExpression(genExprSql)
+            Some(applyColumnMetadata(genExpr, expectedCol))
+          case None =>
+            // When a table has generated columns and the config is off, do not
+            // fill missing nullable non-generated columns with null.
+            val useNullAsDefault = conf.useNullsForMissingDefaultColumnValues 
&&

Review Comment:
   This `useNullAsDefault` gate is duplicated verbatim in 
`resolveColumnsByPosition` (`:548`). A small shared helper would keep the 
by-name and by-position paths from drifting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to