ottomata commented on PR #21012: URL: https://github.com/apache/spark/pull/21012#issuecomment-1874422376
Okay, @cloud-fan I got something working, but I'm not sure if you are going to like it! I'd like to vet it with you before I keep working on it. `convertToStructField` now [returns a nested StructField](https://github.com/apache/spark/commit/d00ac40a4699b04632367d863b5e4bdbee481348#diff-415c3b78d7558933c759da01218075e011c45c4d0fb5ff3fe522f24d6e4b8e20R608) if `col.name.length > 1`. Then, in `AlterTableAddColumnsCommand`, The `colsToAdd`, which might be nested structs now, is [merged as a StructType schema with the existing table schema](https://github.com/apache/spark/commit/d00ac40a4699b04632367d863b5e4bdbee481348#diff-632efff0ffceb2ff47b97f39570b4f2cff24401f132f559158acaa651b27af26R244). This gets us a full new schema with new nested columns recursively merged in. Duplicate column names are still checked for, but now only leaf node path names are checked. This works because the merge merges together struct columns from the colsToAdd with existing struct columns, so we know there are no duplicate container/struct columns in the new schema. [SchemaUtils.merge](https://github.com/apache/spark/commit/d00ac40a4699b04632367d863b5e4bdbee481348#diff-79df452a8a9b284ece3d9b034058e5fcbb5a421890e4947926963e624b7be1d5R331) needs the most work. It's based on Wikimedia code we currently use to merge schemas together, but as is it will only work in case-insensitive catalogs (Hive). There are probably other things to deal with in review. If you are okay with this direction, I'll reopen this PR with this code. --- Question though. This original PR for DataSource v1 via `ALTER TABLE CHANGE COLUMN` worked with adding columns to structs in ArrayType element type, and should also work for structs in MapType value types too. I can address a specific qualified column (e.g. `a.b.c`) in a struct with `ADD COLUMNS`, but I don't think we address the element type of an array column or the value type of a map column in this way. Would one need to use `CHANGE|ALTER COLUMN` syntax for this? And if so, should that be supported?...And if so, would it not be easiest to add columns struct fields via `CHANGE|ALTER COLUMN` too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
