ahshahid opened a new pull request, #48174:
URL: https://github.com/apache/spark/pull/48174
### What changes were proposed in this pull request?
1) In the places I could find where a schema is created from a UDT,
the UDT's sql type is now used instead of the UDT itself.
2) When a logical plan containing a deserializer is serialized (i.e. the
plan is shown as a DataFrame) and the target DataType of an UpCast is a
UDT, the UDT's sql type is used as the target type.
This is implemented as an analyzer rule. I am not sure that is the best way to
handle the issue.
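
To illustrate the idea behind both points, here is a minimal sketch of replacing a UDT with its sql type, including UDTs nested inside structs, arrays and maps. It is not the code from this PR: `UdtSqlTypeSketch` and `replaceUdtWithSqlType` are hypothetical names, and it assumes a Spark version in which `UserDefinedType` is accessible from user code.

```scala
import org.apache.spark.sql.types._

object UdtSqlTypeSketch {
  // Hypothetical helper, not taken from the PR: recursively swap every UDT
  // for its underlying sqlType and leave all other types untouched.
  def replaceUdtWithSqlType(dt: DataType): DataType = dt match {
    case udt: UserDefinedType[_] =>
      // The sqlType may itself contain UDTs, so recurse into it.
      replaceUdtWithSqlType(udt.sqlType)
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(dataType = replaceUdtWithSqlType(f.dataType))))
    case ArrayType(elementType, containsNull) =>
      ArrayType(replaceUdtWithSqlType(elementType), containsNull)
    case MapType(keyType, valueType, valueContainsNull) =>
      MapType(
        replaceUdtWithSqlType(keyType),
        replaceUdtWithSqlType(valueType),
        valueContainsNull)
    case other => other
  }
}
```

A schema without any UDT passes through unchanged, so a helper like this would be safe to apply to every schema before it is displayed.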
### Why are the changes needed?
When showing the schema definition, a UDT is presented by the class name
of the UDT instead of its sql representation.
For example, say the message field is a UDT whose sql representation is
StructType(Seq(
  StructField("intField", IntegerType, nullable = false),
  StructField("stringField", StringType, nullable = false)))
But the schema shows it as
root
 |-- message: test (nullable = true)
instead of
root
 |-- message: struct (nullable = true)
 |    |-- intField: integer (nullable = false)
 |    |-- stringField: string (nullable = false)
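
For reference, the expected tree above can be reproduced by building the schema directly from the sql type and printing it. This is only a sketch assuming `spark-sql` is on the classpath; the object and value names are illustrative and do not come from the PR.

```scala
import org.apache.spark.sql.types._

object ExpectedSchemaSketch {
  // The sql representation of the UDT, as given in the description above.
  val messageSqlType: StructType = StructType(Seq(
    StructField("intField", IntegerType, nullable = false),
    StructField("stringField", StringType, nullable = false)))

  // A schema whose message field uses the sql type rather than the UDT.
  val schema: StructType = StructType(Seq(
    StructField("message", messageSqlType, nullable = true)))

  def main(args: Array[String]): Unit = {
    // Prints the nested struct form of the schema.
    schema.printTreeString()
  }
}
```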
### Does this PR introduce _any_ user-facing change?
Yes. A schema containing a UDT will now be represented using the UDT's sql
type.
### How was this patch tested?
Added a bug test. More tests will be added.
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]