kevinwallimann opened a new pull request #35270:
URL: https://github.com/apache/spark/pull/35270
### What changes were proposed in this pull request?
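This PR propagates the metadata of a `GetStructField` expression through the enclosing `Alias` expression, so that the metadata of a nested struct field is kept when that field is selected.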
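The rule can be made concrete with a minimal, self-contained Scala sketch. It does not use Spark and is not the actual patch. `Meta`, `Field`, `StructT`, `LongT`, `ColumnRef`, `GetField`, `Lit` and `Alias` are hypothetical toy stand-ins for Spark's `Metadata`, `StructField`, `StructType`, `LongType`, attribute references, `GetStructField`, literals and `Alias`. The sketch assumes this rule: explicit metadata on the alias wins; otherwise the alias inherits the metadata of a column reference or, with this change, of a struct-field extraction.

```scala
// Hypothetical toy stand-ins for Spark's Metadata / StructType / StructField (not Spark classes).
final case class Meta(entries: Map[String, String] = Map.empty)

sealed trait DType
case object LongT extends DType
final case class StructT(fields: Seq[Field]) extends DType
final case class Field(name: String, dataType: DType, metadata: Meta = Meta())

// Toy expression tree. Only some expressions carry metadata, as in Spark.
sealed trait Expr { def dataType: DType }

// Plays the role of an attribute reference: exposes its field's metadata.
final case class ColumnRef(field: Field) extends Expr {
  def dataType: DType = field.dataType
  def metadata: Meta = field.metadata
}

// Plays the role of GetStructField: extracts the ordinal-th field of a struct child
// and exposes that field's metadata.
final case class GetField(child: Expr, ordinal: Int) extends Expr {
  private def target: Field = child.dataType match {
    case StructT(fields) => fields(ordinal)
    case other           => throw new IllegalArgumentException(s"not a struct: $other")
  }
  def dataType: DType = target.dataType
  def metadata: Meta = target.metadata
}

// A literal carries no metadata.
final case class Lit(value: Long) extends Expr { def dataType: DType = LongT }

// Plays the role of Alias: explicit metadata wins, otherwise inherit from the child.
final case class Alias(child: Expr, name: String, explicitMetadata: Option[Meta] = None) {
  def metadata: Meta = explicitMetadata.getOrElse {
    child match {
      case ref: ColumnRef => ref.metadata
      case get: GetField  => get.metadata // the case corresponding to this PR's change
      case _              => Meta()
    }
  }
}

// Usage: mirrors the "abc"/"xyz" schema from the repro snippet.
val xyz = Field("xyz", LongT, Meta(Map("my-metadata" -> "xyz")))
val abc = Field("abc", StructT(Seq(xyz)), Meta(Map("my-metadata" -> "abc")))
val selected = Alias(GetField(ColumnRef(abc), 0), "xyz")
println(selected.metadata) // Meta(Map(my-metadata -> xyz))
```

Without the `GetField` case, `selected.metadata` would fall through to `Meta()`. That is the toy equivalent of the empty metadata reported for `df.select("abc.xyz")`.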
### Why are the changes needed?
Currently, in a DataFrame with nested structs, the metadata of an inner struct field is lost when that field is selected. For example, suppose
`df.schema.head.dataType.asInstanceOf[StructType].head.metadata`
returns a non-empty `Metadata` object. Then
`df.select("Field0.SubField0").schema.head.metadata`
returns an empty `Metadata` object.
The following snippet demonstrates the issue:
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, MetadataBuilder, StructField, StructType}
import scala.collection.JavaConverters._

val metadataAbc = new MetadataBuilder().putString("my-metadata", "abc").build()
val metadataXyz = new MetadataBuilder().putString("my-metadata", "xyz").build()

// Outer struct "abc" carries metadataAbc; its inner field "xyz" carries metadataXyz.
val schema = StructType(Seq(
  StructField("abc",
    StructType(Seq(
      StructField("xyz", LongType, nullable = true, metadataXyz)
    )), metadata = metadataAbc)))

// `spark` is the SparkSession predefined in spark-shell.
val data = Seq(Row(Row(1L))).asJava
val df = spark.createDataFrame(data, schema)

println(df.select("abc").schema.head.metadata)     // OK, metadata is {"my-metadata":"abc"}
println(df.select("abc.xyz").schema.head.metadata) // NOT OK, metadata is {}, expected {"my-metadata":"xyz"}
```
The issue can be reproduced in Spark versions 3.2.0, 3.1.2 and 2.4.8.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added a new test.