szehon-ho commented on code in PR #56370:
URL: https://github.com/apache/spark/pull/56370#discussion_r3383692063
##########
docs/sql-ref-syntax-dml-insert-table.md:
##########
@@ -259,6 +270,77 @@ SELECT * FROM students WHERE student_id = 11215017;
+------------+----------------------+----------+
```
+##### Insert By Name Using a SELECT Statement
+
+```sql
+CREATE TABLE target (n INT, text STRING, s STRUCT<a INT, b INT>);
+
+-- BY NAME matches both top-level columns and nested struct fields by name,
+-- so they may be listed in any order in the source query.
+INSERT INTO target BY NAME
+ SELECT named_struct('b', 2, 'a', 1) AS s, 0 AS n, 'data' AS text;
+-- s is {1, 2}: nested fields matched by name, not position.
+
+SELECT * FROM target;
++---+----+------+
+| n|text| s|
++---+----+------+
+| 0|data|{1, 2}|
++---+----+------+
+
+CREATE OR REPLACE TABLE target (n INT, arr ARRAY<STRUCT<a INT, b INT>>);
+
+-- A missing top-level column is filled with its default value (NULL here).
+INSERT INTO target BY NAME
+ SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS n;
+INSERT INTO target BY NAME
+ SELECT array(named_struct('a', 1, 'b', 2)) AS arr;
Review Comment:
Minor: this comment sits above both `INSERT` statements, but only the second
one (without `n`) actually omits a column — the first supplies both `arr` and
`n`. Consider moving it directly above the second `INSERT` so it's clear which
statement demonstrates the default fill.
##########
docs/sql-ref-syntax-dml-insert-table.md:
##########
@@ -259,6 +270,77 @@ SELECT * FROM students WHERE student_id = 11215017;
+------------+----------------------+----------+
```
+##### Insert By Name Using a SELECT Statement
+
+```sql
+CREATE TABLE target (n INT, text STRING, s STRUCT<a INT, b INT>);
+
+-- BY NAME matches both top-level columns and nested struct fields by name,
+-- so they may be listed in any order in the source query.
+INSERT INTO target BY NAME
+ SELECT named_struct('b', 2, 'a', 1) AS s, 0 AS n, 'data' AS text;
+-- s is {1, 2}: nested fields matched by name, not position.
+
+SELECT * FROM target;
++---+----+------+
+| n|text| s|
++---+----+------+
+| 0|data|{1, 2}|
++---+----+------+
+
+CREATE OR REPLACE TABLE target (n INT, arr ARRAY<STRUCT<a INT, b INT>>);
+
+-- A missing top-level column is filled with its default value (NULL here).
+INSERT INTO target BY NAME
+ SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS n;
+INSERT INTO target BY NAME
+ SELECT array(named_struct('a', 1, 'b', 2)) AS arr;
+
+SELECT * FROM target;
++----+--------+
+| n| arr|
++----+--------+
+| 0|[{1, 2}]|
+|NULL|[{1, 2}]|
++----+--------+
+
+-- A source column whose name does not match any target column is an error.
+INSERT INTO target BY NAME
+ SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS badname;
+Error
+
+-- Duplicate source column names that resolve to the same target column are an
error.
+INSERT INTO target BY NAME
+ SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS n, 1 AS n;
+Error: INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS
Review Comment:
I'd suggest just removing this example. It doesn't demonstrate
duplicate-name resolution: the arity check runs before any name matching, so
with 3 source columns against the 2-column target you get
`TOO_MANY_DATA_COLUMNS` purely on column count — the duplicate `n` is never
evaluated. The identical error occurs for a positional insert (`INSERT INTO
target SELECT array(...), 0, 1`), so it's neither `BY NAME`-specific nor
illustrative of what the comment claims.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]