[GitHub] [spark] Kimahriman commented on pull request #32972: [SPARK-35756][SQL] unionByName supports struct having same col names but different sequence

GitBox Sat, 19 Jun 2021 20:47:08 -0700


Kimahriman commented on pull request #32972:
URL: https://github.com/apache/spark/pull/32972#issuecomment-864495223



   They're mostly different issues; this one is about semantics. If you 
have two nested structs with the same fields but in a different order, you 
currently have to set `allowMissingCol` to true for the struct fields to be 
sorted, which isn't very intuitive. This PR tries to make the `ByName` part 
apply to nested structs as well, leaving `allowMissingCol` to apply only to 
missing (possibly nested) columns.
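
   To make the distinction concrete, here is a minimal sketch in plain Python 
(not Spark code) that models a struct schema as an ordered list of 
`(name, type)` pairs. The helper `reorder_like` is hypothetical. It only 
illustrates what applying `ByName` to nested structs would mean: the right 
side's fields are matched to the left side's order at every level, with no 
missing-column handling involved.

```python
# A struct type is modeled as a list of (field_name, field_type) pairs.
# A field_type is either a string (leaf type) or another such list (nested struct).

def reorder_like(left, right):
    """Reorder `right`'s fields to follow `left`'s field order, recursively.

    Hypothetical helper: assumes both sides have exactly the same field
    names at every level, so only the ordering differs.
    """
    right_by_name = dict(right)
    result = []
    for name, left_type in left:
        right_type = right_by_name[name]
        # Recurse only when both sides hold a nested struct for this field.
        if isinstance(left_type, list) and isinstance(right_type, list):
            right_type = reorder_like(left_type, right_type)
        result.append((name, right_type))
    return result


left = [("id", "int"), ("s", [("a", "int"), ("b", "string")])]
right = [("s", [("b", "string"), ("a", "int")]), ("id", "int")]

# Same fields on both sides, so no "missing column" flag should be needed.
aligned = reorder_like(left, right)
```

   In this model the reordering never needs to know about `allowMissingCol`, 
which is the separation argued for above.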
   
   So I do think this idea makes sense, but I don't think the implementation 
handles multiple levels of nested structs correctly. `addFields` assumes it is 
adding missing columns, so I think you could end up with null nested columns 
being added even when `allowMissingCol` is false.
   
   I think `addFields` itself would need logic to decide whether or not it 
should add missing columns as nulls.
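
   As a rough illustration of that idea, the following plain-Python sketch 
threads an `allow_missing` flag through every level of the recursion. It is 
not the actual `addFields` code: `align_struct`, its parameters, and the 
list-of-pairs schema model are all assumptions made for the example. For 
brevity it only checks for fields that `left` has and `right` lacks, and it 
assumes every field of `right` also exists in `left`.

```python
# A struct type is modeled as a list of (field_name, field_type) pairs.
# A field_type is either a string (leaf type) or another such list (nested struct).

def align_struct(left, right, allow_missing, path=""):
    """Align `right` to `left` by name at every nesting level.

    Fields are always reordered by name. A field missing from `right` is
    only added (conceptually as a null column) when `allow_missing` is
    true; otherwise it is reported as an error.
    """
    right_by_name = dict(right)
    aligned = []
    for name, left_type in left:
        full_name = f"{path}{name}"
        if name not in right_by_name:
            if not allow_missing:
                raise ValueError(f"field {full_name} is missing on the right side")
            # Stand-in for "add a null column of the left side's type".
            aligned.append((name, left_type))
            continue
        right_type = right_by_name[name]
        if isinstance(left_type, list) and isinstance(right_type, list):
            # The flag is passed down, so deeper levels obey it too.
            right_type = align_struct(left_type, right_type, allow_missing, f"{full_name}.")
        aligned.append((name, right_type))
    return aligned


left = [("s", [("a", "int"), ("inner", [("x", "int"), ("y", "int")])])]

# Same fields, different order at two nesting levels: works with the flag off.
right_reordered = [("s", [("inner", [("y", "int"), ("x", "int")]), ("a", "int")])]
reordered = align_struct(left, right_reordered, allow_missing=False)

# `s.inner.x` is absent: rejected with the flag off, filled with the flag on.
right_missing = [("s", [("inner", [("y", "int")]), ("a", "int")])]
try:
    align_struct(left, right_missing, allow_missing=False)
    error = None
except ValueError as exc:
    error = str(exc)
filled = align_struct(left, right_missing, allow_missing=True)
```

   The point of the sketch is that the decision to add a null field is made 
at the level where the field is found to be missing, rather than being 
assumed by the helper that does the adding.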


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



