ottomata commented on PR #21012:
URL: https://github.com/apache/spark/pull/21012#issuecomment-1874422376

   Okay, @cloud-fan I got something working, but I'm not sure if you are going 
to like it!  I'd like to vet it with you before I keep working on it.
   
   `convertToStructField` now [returns a nested 
StructField](https://github.com/apache/spark/commit/d00ac40a4699b04632367d863b5e4bdbee481348#diff-415c3b78d7558933c759da01218075e011c45c4d0fb5ff3fe522f24d6e4b8e20R608)
 if `col.name.length > 1`. Then, in `AlterTableAddColumnsCommand`, The 
`colsToAdd`, which might be nested structs now, is [merged as a StructType 
schema with the existing table 
schema](https://github.com/apache/spark/commit/d00ac40a4699b04632367d863b5e4bdbee481348#diff-632efff0ffceb2ff47b97f39570b4f2cff24401f132f559158acaa651b27af26R244).
  This gets us a full new schema with new nested columns recursively merged in. 
 Duplicate column names are still checked for, but now only leaf node path 
names are checked.  This works because the merge merges together struct columns 
from the colsToAdd with existing struct columns, so we know there are no 
duplicate container/struct columns in the new schema.
   
   
[SchemaUtils.merge](https://github.com/apache/spark/commit/d00ac40a4699b04632367d863b5e4bdbee481348#diff-79df452a8a9b284ece3d9b034058e5fcbb5a421890e4947926963e624b7be1d5R331)
 needs the most work.  It's based on Wikimedia code we currently use to merge 
schemas together, but as is it will only work in case-insensitive catalogs 
(Hive).  There are probably other things to deal with in review.
   
   If you are okay with this direction, I'll reopen this PR with this code.
   
   ---
   Question though.  This original PR for DataSource v1 via `ALTER TABLE CHANGE 
COLUMN` worked with adding columns to structs in ArrayType element type, and 
should also work for structs in MapType value types too.  I can address a 
specific qualified column (e.g. `a.b.c`) in a struct with `ADD COLUMNS`, but I 
don't think we address the element type of an array column or the value type of 
a map column in this way.  Would one need to use `CHANGE|ALTER COLUMN` syntax 
for this?  And if so, should that be supported?...And if so, would it not be 
easiest to add columns struct fields via `CHANGE|ALTER COLUMN` too?
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to