szehon-ho opened a new pull request, #56712:
URL: https://github.com/apache/spark/pull/56712

### What changes were proposed in this pull request?

Add support for auto-filling generated column values and enforcing generated column constraints during V2 table writes (INSERT).

When a catalog declares `SUPPORT_GENERATED_COLUMN_ON_WRITE`:
- Missing generated columns are auto-filled using the generation expression via a `Project` node added during analysis
- User-provided generated column values are validated against the generation expression via `CheckInvariant(EqualNullSafe(col, genExpr))`
- MERGE, UPDATE, and streaming writes with generated columns are blocked until support is implemented

**Key changes:**
- `TableCatalogCapability`: New `SUPPORT_GENERATED_COLUMN_ON_WRITE` capability
- `CatalogV2Util`: Encode generation expression in V2-to-StructField metadata roundtrip
- `TableOutputResolver`: Fill missing generated columns in by-name and by-position write paths (checking generation expression before null defaults)
- `ResolveTableConstraints`: Add generated column constraints using V2 expression with SQL parser fallback, scoped to user-provided columns only
- `Analyzer` (`ResolveOutputRelation`): Gate on capability; strip generation expression metadata from auto-filled columns so constraints are scoped correctly
- `RewriteRowLevelCommand`: Block MERGE/UPDATE with generated columns
- `ResolveWriteToStream`: Block streaming writes with generated columns
- `SQLConf`: New `spark.sql.generatedColumn.allowNullableIngest.enabled` config

### Why are the changes needed?

Generated columns (defined with `GENERATED ALWAYS AS (expr)`) currently only support DDL: the generation expression is stored but never evaluated during writes. This means connectors must handle generated column values themselves (e.g., Delta Lake does this in its own execution layer). This PR adds Spark-native support so that any V2 catalog can opt in to having Spark auto-compute generated column values during INSERT.

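The INSERT behavior described above (auto-fill when a generated column is omitted, null-safe validation when it is provided) can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not the Spark implementation: `resolve_insert_row`, `null_safe_equal`, and `CheckConstraintViolation` are hypothetical names introduced for illustration, and rows are modeled as dictionaries rather than logical plans.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


class CheckConstraintViolation(Exception):
    """Hypothetical stand-in for the CHECK_CONSTRAINT_VIOLATION error."""


def null_safe_equal(left: Any, right: Any) -> bool:
    # Models SQL's null-safe equality (<=>): NULL <=> NULL is true,
    # NULL <=> non-NULL is false, otherwise ordinary equality.
    if left is None or right is None:
        return left is None and right is None
    return left == right


def resolve_insert_row(row: Row, generated: Dict[str, Callable[[Row], Any]]) -> Row:
    """Fill omitted generated columns and validate provided ones.

    `generated` maps a generated column name to its generation
    expression, modeled here as a function of the incoming row.
    """
    out = dict(row)
    for name, expr in generated.items():
        expected = expr(row)
        if name not in row:
            # Column omitted by the writer: auto-fill from the expression.
            out[name] = expected
        elif not null_safe_equal(row[name], expected):
            # Column provided by the writer: it must match the expression.
            raise CheckConstraintViolation(
                f"{name}: got {row[name]!r}, expected {expected!r}"
            )
    return out


# Hypothetical table with c GENERATED ALWAYS AS (a + b).
generated_columns = {"c": lambda r: r["a"] + r["b"]}
print(resolve_insert_row({"a": 1, "b": 2}, generated_columns))
```

In this model a row that supplies `c` with a value other than `a + b` (including `None`) raises `CheckConstraintViolation`, which mirrors why the check is scoped to user-provided columns only: an auto-filled value matches the expression by construction.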
This brings generated columns closer to parity with default column values, which Spark already handles during writes.

### Does this PR introduce _any_ user-facing change?

Yes. For catalogs that declare `SUPPORT_GENERATED_COLUMN_ON_WRITE`:
- Users can omit generated columns from INSERT statements and the values are auto-computed
- Users who explicitly provide generated column values will get a `CHECK_CONSTRAINT_VIOLATION` error if the value doesn't match the generation expression
- MERGE, UPDATE, and streaming writes to tables with generated columns will fail with `UNSUPPORTED_FEATURE.TABLE_OPERATION`
- New config `spark.sql.generatedColumn.allowNullableIngest.enabled` (default `true`) controls whether nullable non-generated columns can be omitted when writing to tables with generated columns

### How was this patch tested?

New `GeneratedColumnWriteSuite` with 30 tests covering:
- Auto-fill by name and by position
- Constraint validation (matching/non-matching/NULL values)
- Multiple generated columns, multi-column expressions, complex expressions (CAST)
- Type coercion, column reordering, case-insensitive matching
- INSERT OVERWRITE, INSERT SELECT, CTAS, DataFrame API
- Generated column in middle of schema, non-trailing by position (error)
- Generated partition columns
- Config for nullable ingest
- Combined with table CHECK constraints
- Plan inspection (CheckInvariant present for user-provided, absent for auto-filled)
- Streaming write blocking
- Mix of auto-filled and user-provided generated columns

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
