szehon-ho commented on code in PR #56370:
URL: https://github.com/apache/spark/pull/56370#discussion_r3383692063


##########
docs/sql-ref-syntax-dml-insert-table.md:
##########
@@ -259,6 +270,77 @@ SELECT * FROM students WHERE student_id = 11215017;
 +------------+----------------------+----------+
 ```
 
+##### Insert By Name Using a SELECT Statement
+
+```sql
+CREATE TABLE target (n INT, text STRING, s STRUCT<a INT, b INT>);
+
+-- BY NAME matches both top-level columns and nested struct fields by name,
+-- so they may be listed in any order in the source query.
+INSERT INTO target BY NAME
+    SELECT named_struct('b', 2, 'a', 1) AS s, 0 AS n, 'data' AS text;
+-- s is {1, 2}: nested fields matched by name, not position.
+
+SELECT * FROM target;
++---+----+------+
+|  n|text|     s|
++---+----+------+
+|  0|data|{1, 2}|
++---+----+------+
+
+CREATE OR REPLACE TABLE target (n INT, arr ARRAY<STRUCT<a INT, b INT>>);
+
+-- A missing top-level column is filled with its default value (NULL here).
+INSERT INTO target BY NAME
+    SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS n;
+INSERT INTO target BY NAME
+    SELECT array(named_struct('a', 1, 'b', 2)) AS arr;

Review Comment:
   Minor: this comment sits above both `INSERT` statements, but only the second 
one (without `n`) actually omits a column — the first supplies both `arr` and 
`n`. Consider moving it directly above the second `INSERT` so it's clear which 
statement demonstrates the default fill.



##########
docs/sql-ref-syntax-dml-insert-table.md:
##########
@@ -259,6 +270,77 @@ SELECT * FROM students WHERE student_id = 11215017;
 +------------+----------------------+----------+
 ```
 
+##### Insert By Name Using a SELECT Statement
+
+```sql
+CREATE TABLE target (n INT, text STRING, s STRUCT<a INT, b INT>);
+
+-- BY NAME matches both top-level columns and nested struct fields by name,
+-- so they may be listed in any order in the source query.
+INSERT INTO target BY NAME
+    SELECT named_struct('b', 2, 'a', 1) AS s, 0 AS n, 'data' AS text;
+-- s is {1, 2}: nested fields matched by name, not position.
+
+SELECT * FROM target;
++---+----+------+
+|  n|text|     s|
++---+----+------+
+|  0|data|{1, 2}|
++---+----+------+
+
+CREATE OR REPLACE TABLE target (n INT, arr ARRAY<STRUCT<a INT, b INT>>);
+
+-- A missing top-level column is filled with its default value (NULL here).
+INSERT INTO target BY NAME
+    SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS n;
+INSERT INTO target BY NAME
+    SELECT array(named_struct('a', 1, 'b', 2)) AS arr;
+
+SELECT * FROM target;
++----+--------+
+|   n|     arr|
++----+--------+
+|   0|[{1, 2}]|
+|NULL|[{1, 2}]|
++----+--------+
+
+-- A source column whose name does not match any target column is an error.
+INSERT INTO target BY NAME
+    SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS badname;
+Error
+
+-- Duplicate source column names that resolve to the same target column are an 
error.
+INSERT INTO target BY NAME
+    SELECT array(named_struct('a', 1, 'b', 2)) AS arr, 0 AS n, 1 AS n;
+Error: INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS

Review Comment:
   I'd suggest just removing this example. It doesn't demonstrate 
duplicate-name resolution: the arity check runs before any name matching, so 
with 3 source columns against the 2-column target you get 
`TOO_MANY_DATA_COLUMNS` purely on column count — the duplicate `n` is never 
evaluated. The identical error occurs for a positional insert (`INSERT INTO 
target SELECT array(...), 0, 1`), so it's neither `BY NAME`-specific nor 
illustrative of what the comment claims.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to