cloud-fan commented on code in PR #56712:
URL: https://github.com/apache/spark/pull/56712#discussion_r3471715912
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -337,23 +377,42 @@ object TableOutputResolver extends SQLConfHelper with
Logging {
addError: String => Unit,
colPath: Seq[String] = Nil,
defaultValueFillMode: DefaultValueFillMode.Value,
- enforceFullOutput: Boolean = false): Seq[NamedExpression] = {
+ enforceFullOutput: Boolean = false,
+ autoFilledGenCols: mutable.Set[String] = mutable.Set.empty):
Seq[NamedExpression] = {
val matchedCols = mutable.HashSet.empty[String]
val reordered = expectedCols.flatMap { expectedCol =>
val matched = inputCols.filter(col => conf.resolver(col.name,
expectedCol.name))
val newColPath = colPath :+ expectedCol.name
if (matched.isEmpty) {
- val defaultExpr = if (Set(FILL,
RECURSE).contains(defaultValueFillMode)) {
- getDefaultValueExprOrNullLit(expectedCol,
conf.useNullsForMissingDefaultColumnValues)
- } else {
- None
- }
- if (defaultExpr.isEmpty) {
- throw
QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(
- tableName, newColPath.quoted
- )
+ // Check for generated column expression first, before falling back to
defaults,
+ // since getDefaultValueExprOrNullLit may return null for nullable
columns.
+ GeneratedColumn.getGenerationExpression(expectedCol.metadata) match {
+ case Some(genExprSql) =>
+ autoFilledGenCols += expectedCol.name
+ // The parsed references resolve against the query output (the
user-provided
+ // columns) later in analysis. References to other generated
columns are already
+ // rejected at table definition time (see
GeneratedColumnExpression.validate).
+ val genExpr = CatalystSqlParser.parseExpression(genExprSql)
+ Some(applyColumnMetadata(genExpr, expectedCol))
Review Comment:
By-name auto-fill leaves the generation expression to resolve against the
pre-cast `query.output` here, while the by-position path (`:545`) rewrites its
references to the post-cast `matched` columns via
`resolveGenerationExprReferences`. When a write's input column type differs
from the table column type (a store-assignment cast), the two paths then
evaluate the expression over different values — pre-cast by-name, post-cast
by-position — which can diverge for overflow/precision-sensitive expressions,
and can leave the by-name generated value inconsistent with the stored cast
source column.
No by-name test exercises a value-changing cast: the type-coercion test's `a
+ 1` (`a:INT, b:LONG`) is identical computed in INT vs LONG, so it can't
observe this. Can you confirm by-name binds to the intended value? If it binds
pre-cast, mirroring the by-position resolution (and adding a by-name
widening-cast test) would close the gap.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableConstraints.scala:
##########
@@ -72,6 +79,57 @@ class ResolveTableConstraints(val catalogManager:
CatalogManager) extends Rule[L
.getOrElse(catalogManager.v1SessionCatalog.parser.parseExpression(c.predicateSql()))
}
+ /**
+ * For each user-provided generated column, add a CheckInvariant that
validates the column
+ * value matches the generation expression. Auto-filled generated columns
are excluded:
+ * ResolveOutputRelation strips their GENERATION_EXPRESSION metadata from
the table output,
+ * so they won't appear here.
+ */
+ private def buildGeneratedColumnConstraints(
+ r: DataSourceV2Relation,
+ v2Write: V2WriteCommand): Seq[Expression] = {
+ if (!GeneratedColumn.supportsGeneratedColumnsOnWrite(r.catalog,
r.table.columns())) {
+ return Seq.empty
+ }
+
+ // Use V2 columns from the table to access both V2 expressions and SQL
strings.
+ // Only add constraints for generated columns whose GENERATION_EXPRESSION
metadata
+ // is still present in the table output -- ResolveOutputRelation strips
the metadata
+ // from auto-filled columns so they are excluded here.
+ val v2Columns = r.table.columns()
+ val resolver = catalogManager.v1SessionCatalog.conf.resolver
+ val userProvidedGenCols = r.output
Review Comment:
The user-provided-vs-auto-filled distinction is carried by mutated
table-output metadata: `ResolveOutputRelation` strips `GENERATION_EXPRESSION`
from auto-filled columns, and this read-back depends on that having happened.
It's safe today because `outputResolved` keeps this rule from firing before the
projection is built and stripped — so this is not a bug.
The coupling is implicit, though: a future write path that builds the
projection without the strip would make this add a `CheckInvariant` on an
auto-filled column, validating a value that's correct by construction against a
possibly differently-bound expression (see the by-name/by-position note) — a
spurious violation. A comment cross-linking the strip site, or carrying the
auto-filled set on the write command, would make the contract explicit.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -337,23 +377,42 @@ object TableOutputResolver extends SQLConfHelper with
Logging {
addError: String => Unit,
colPath: Seq[String] = Nil,
defaultValueFillMode: DefaultValueFillMode.Value,
- enforceFullOutput: Boolean = false): Seq[NamedExpression] = {
+ enforceFullOutput: Boolean = false,
+ autoFilledGenCols: mutable.Set[String] = mutable.Set.empty):
Seq[NamedExpression] = {
val matchedCols = mutable.HashSet.empty[String]
val reordered = expectedCols.flatMap { expectedCol =>
val matched = inputCols.filter(col => conf.resolver(col.name,
expectedCol.name))
val newColPath = colPath :+ expectedCol.name
if (matched.isEmpty) {
- val defaultExpr = if (Set(FILL,
RECURSE).contains(defaultValueFillMode)) {
- getDefaultValueExprOrNullLit(expectedCol,
conf.useNullsForMissingDefaultColumnValues)
- } else {
- None
- }
- if (defaultExpr.isEmpty) {
- throw
QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(
- tableName, newColPath.quoted
- )
+ // Check for generated column expression first, before falling back to
defaults,
+ // since getDefaultValueExprOrNullLit may return null for nullable
columns.
+ GeneratedColumn.getGenerationExpression(expectedCol.metadata) match {
+ case Some(genExprSql) =>
+ autoFilledGenCols += expectedCol.name
+ // The parsed references resolve against the query output (the
user-provided
+ // columns) later in analysis. References to other generated
columns are already
+ // rejected at table definition time (see
GeneratedColumnExpression.validate).
+ val genExpr = CatalystSqlParser.parseExpression(genExprSql)
+ Some(applyColumnMetadata(genExpr, expectedCol))
+ case None =>
+ // When a table has generated columns and the config is off, do not
+ // fill missing nullable non-generated columns with null.
+ val useNullAsDefault = conf.useNullsForMissingDefaultColumnValues
&&
Review Comment:
This `useNullAsDefault` gate is duplicated verbatim in
`resolveColumnsByPosition` (`:548`). A small shared helper would keep the
by-name and by-position paths from drifting.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]