guykhazma commented on pull request #28826:
URL: https://github.com/apache/spark/pull/28826#issuecomment-647571832
@viirya `derivedFromAtt` is set to `false` when the expression requests
a nested field.
The metadata for a nested column is not preserved (in Spark 2.4 as well), so I
am not sure what the expected behaviour should be here.
Note that the metadata for a nested field is also not preserved when using:
```Scala
df.select("col_a.name").schema
```
where `df` is the DataFrame created locally with the schema used in the
test below.
If this is considered a bug, we can fix it as well (it will require
some more changes).
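To make the observation above easy to check, here is a minimal standalone sketch that compares the metadata declared on the nested `name` field with the metadata reported after `select("col_a.name")`. The object name `NestedMetadataCheck` and the local `SparkSession` setup are assumptions made for illustration and are not part of this PR. The schema mirrors the one in the test below.

```Scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical helper object, for illustration only (not part of the PR).
object NestedMetadataCheck {
  // Same shape as the schema in the test: metadata on the struct column
  // and on its nested "name" field.
  val schema: StructType = StructType(List(
    StructField("col_a", StructType(List(
      StructField("name", StringType, nullable = true,
        new MetadataBuilder().putString("check", "b").build()),
      StructField("age", IntegerType, nullable = true)
    )), nullable = true,
      new MetadataBuilder().putString("key", "value").build()),
    StructField("col_b", StringType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient to reproduce the behaviour.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-metadata-check")
      .getOrCreate()
    try {
      val df = spark.createDataFrame(
        spark.sparkContext.parallelize(Seq(Row(Row("a", 45), "b"))),
        schema)
      // Metadata as declared on the nested field in the original schema.
      val declared =
        df.schema("col_a").dataType.asInstanceOf[StructType]("name").metadata
      // Metadata reported for the column produced by selecting the nested field.
      val selected = df.select("col_a.name").schema("name").metadata
      println(s"declared: $declared")
      println(s"selected: $selected")
    } finally {
      spark.stop()
    }
  }
}
```

No expected output is given here, since whether `selected` carries the `check` entry may depend on the Spark version in use.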
For example, this test triggers a code path where `derivedFromAtt` is `false`,
but it currently passes because metadata is not preserved for nested columns:
```Scala
test("SPARK-31988 - make sure schema metadata is preserved - nested schema") {
  withSQLConf((SQLConf.USE_V1_SOURCE_LIST.key,
    "avro,csv,json,kafka,orc,text,parquet")) {
    withTempPath { f =>
      // create custom dataset with schema metadata
      val data = Seq(
        Row(Row("a", 45), "b")
      )
      val schema = List(
        StructField("col_a", StructType(
          List(
            StructField("name", StringType, true,
              new MetadataBuilder().putString("check", "b").build()),
            StructField("age", IntegerType, true)
          )
        ), true,
          new MetadataBuilder().putString("key", "value").build()),
        StructField("col_b", StringType, true)
      )
      val df = spark.createDataFrame(
        spark.sparkContext.parallelize(data),
        StructType(schema)
      )
      df.write.parquet(f.getAbsolutePath)
      // read from storage
      val readDF = spark.read.parquet(f.getAbsolutePath)
      // write again
      withTempPath { f =>
        readDF.select("col_a.name").write.parquet(f.getAbsolutePath)
        // read again and verify the schema is equal (including the metadata)
        val readDF2 = spark.read.parquet(f.getAbsolutePath)
        assert(readDF.select("col_a.name").schema == readDF2.schema)
      }
    }
  }
}
```