szehon-ho commented on code in PR #56370:
URL: https://github.com/apache/spark/pull/56370#discussion_r3383033518


##########
docs/sql-ref-syntax-dml-insert-table.md:
##########
@@ -259,6 +270,77 @@ SELECT * FROM students WHERE student_id = 11215017;
 +------------+----------------------+----------+
 ```
 
+##### Insert By Name Using a SELECT Statement
+
+```sql
+CREATE TABLE target (n INT, text STRING, s STRUCT<a INT, b INT>);
+
+-- BY NAME matches the top-level columns by name, so they may be listed in any 
order.
+-- Nested struct fields are matched by position, so the struct field order in 
the query
+-- must match the target schema (STRUCT<a INT, b INT>).

Review Comment:
   This note isn't correct: with `BY NAME`, nested struct fields are matched 
**by name**, not by position. The by-name mode propagates recursively into 
nested structs (and array-element / map structs), so it's not "top-level by 
name, nested by position."
   
   In `TableOutputResolver`, a `BY NAME` write goes through 
`reorderColumnsByName`, and a matched struct column recurses with `byName = 
true` into `resolveStructType`, which again calls `reorderColumnsByName` on the 
nested fields. This matches the behavior of the schema-evolution path 
(`ResolveSchemaEvolution`), which threads the same `isByName` flag into nested 
structs.
   
   This is verified by `V2WriteAnalysisSuite` "check fields of struct type 
column": with target struct `{a, b}` and source struct `{x, y}`, the `byName` 
case *fails* with `INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_FIND_DATA` on `` 
`col`.`a` `` (it looked up the nested field by name), while the `byPosition` 
case succeeds mapping `x->a, y->b`.
   
   Also, as written the example can't actually demonstrate the rule, since the 
source struct `named_struct('a', 1, 'b', 2)` has the same field names *and* 
order as the target — the result `{1, 2}` is identical either way. Suggest 
fixing the comment and reordering the nested fields so it's illustrative:
   
   ```sql
   -- BY NAME matches both top-level columns and nested struct fields by name,
   -- so they may be listed in any order in the source query.
   INSERT INTO target BY NAME
       SELECT named_struct('b', 2, 'a', 1) AS s, 0 AS n, 'data' AS text;
   -- s is {1, 2}: nested fields matched by name, not position
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to