SaurabhChawla100 edited a comment on pull request #32972:
URL: https://github.com/apache/spark/pull/32972#issuecomment-864544171
> They're mostly different issues. This is more of a semantics thing. If you
have two nested structs with the same fields, but in a different order, you
have to set `allowMissingCol` to true in order for the structs to be sorted,
which isn't very intuitive. This is trying to make the `ByName` part apply to
nested structs as well, and leave `allowMissingCol` to just actually apply to
missing (possibly nested) columns.
>
> So I do think this idea makes sense, but I don't think the implementation
handles multiple levels of nested structs correctly. `addFields` assumes adding
missing columns, so I think you could end up with a case that adds null nested
columns even if `allowMissingCol` is false.
>
> I think the logic would have to be added to `addFields` to handle whether
or not it should add null missing columns.
> So I do think this idea makes sense, but I don't think the implementation
> handles multiple levels of nested structs correctly. `addFields` assumes adding
> missing columns, so I think you could end up with a case that adds null nested
> columns even if `allowMissingCol` is false.

@Kimahriman - I don't understand how this PR could add null nested columns
when `allowMissingCol` is false.
```
case (source: StructType, target: StructType)
    if !allowMissingCol && !source.sameType(target) &&
      target.toAttributes.map(attr => attr.name).sorted
        == source.toAttributes.map(x => x.name).sorted =>
  // Having an output with same name, but different struct type.
  // We will sort columns in the struct expression to make sure two sides of
  // union have consistent schema.
  aliased += foundAttr
  Alias(addFields(foundAttr, target), foundAttr.name)()
```
In this PR, `addFields` is only called when the source and target structs
have the same set of field names, i.e. when
`target.toAttributes.map(attr => attr.name).sorted == source.toAttributes.map(x => x.name).sorted`
holds. Inside `addFields` we then have:
```
val missingFieldsOpt =
  StructType.findMissingFields(col.dataType.asInstanceOf[StructType], target, resolver)
```
So `missingFieldsOpt` is always empty in this case, and we only sort the
struct fields when `allowMissingCol` is false:
```
if (missingFieldsOpt.isEmpty) {
  sortStructFields(col)
}
```
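To make the argument easier to check, here is a minimal sketch that uses hypothetical, simplified stand-ins for Spark's types. `DType`, `Field`, `Struct`, `sameTopLevelNames`, `findMissing` and `sortFields` are all assumptions made for illustration, not the Spark API or the code in this PR. It models the name check from the guard, a recursive missing-field lookup, and a sort by field name. The flat case behaves as described above. The multi-level case from the quoted comment is included as a case worth checking against the real `findMissingFields`.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType/StructField/StructType.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class Field(name: String, dataType: DType)
final case class Struct(fields: Seq[Field]) extends DType

object UnionByNameSketch {
  // Models the name comparison in the guard: only top-level names are compared.
  def sameTopLevelNames(source: Struct, target: Struct): Boolean =
    target.fields.map(_.name).sorted == source.fields.map(_.name).sorted

  // Fields of `target` that are absent from `source`, descending into nested
  // structs. Returns None when nothing is missing (assumed behavior of this model).
  def findMissing(source: Struct, target: Struct): Option[Struct] = {
    val byName = source.fields.map(f => f.name -> f.dataType).toMap
    val missing = target.fields.flatMap { t =>
      (byName.get(t.name), t.dataType) match {
        case (None, _) => Some(t)
        case (Some(s: Struct), tt: Struct) =>
          findMissing(s, tt).map(m => Field(t.name, m))
        case _ => None
      }
    }
    if (missing.isEmpty) None else Some(Struct(missing))
  }

  // Sorts fields by name at every nesting level.
  def sortFields(s: Struct): Struct =
    Struct(s.fields.sortBy(_.name).map {
      case Field(n, inner: Struct) => Field(n, sortFields(inner))
      case f => f
    })

  // Flat case: same fields, different order.
  val ab = Struct(Seq(Field("a", IntT), Field("b", StringT)))
  val ba = Struct(Seq(Field("b", StringT), Field("a", IntT)))

  // Nested case: same top-level name "s", but the inner structs differ.
  val outerA = Struct(Seq(Field("s", Struct(Seq(Field("a", IntT))))))
  val outerAB = Struct(Seq(Field("s", ab)))
}
```

In this model, `findMissing(ab, ba)` is `None` and `sortFields(ab) == sortFields(ba)`, which matches the reasoning above. For `outerA` and `outerAB`, `sameTopLevelNames` is still true while `findMissing(outerA, outerAB)` is non-empty, so whether the guard alone rules out missing nested fields depends on how the real code handles inner structs.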
Please do let me know if my understanding is not correct here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.