guykhazma opened a new pull request #28826:
URL: https://github.com/apache/spark/pull/28826


   ### What changes were proposed in this pull request?
   
   Fix the `getRootFields` function so that it preserves attribute metadata.
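
   A minimal sketch of the kind of loss this addresses, using simplified stand-in
   types rather than Spark's own classes. `Attr`, `Field`, `toFieldLossy` and
   `toFieldPreserving` are hypothetical names chosen for illustration and are not
   part of this patch:

   ```Scala
   object MetadataSketch {
     // Simplified stand-ins for illustration only; these are NOT Spark's classes.
     final case class Field(
         name: String,
         dataType: String,
         nullable: Boolean,
         metadata: Map[String, String] = Map.empty)

     final case class Attr(
         name: String,
         dataType: String,
         nullable: Boolean,
         metadata: Map[String, String])

     // Lossy conversion: metadata is not forwarded, so the empty default is used.
     def toFieldLossy(att: Attr): Field =
       Field(att.name, att.dataType, att.nullable)

     // Metadata-preserving conversion: the attribute's metadata is passed through.
     def toFieldPreserving(att: Attr): Field =
       Field(att.name, att.dataType, att.nullable, att.metadata)

     val att: Attr = Attr("col_a", "string", nullable = true, Map("key" -> "value"))
   }
   ```

   In this sketch the only difference between the two conversions is the fourth
   argument. Any field rebuilt from just the name, type and nullability silently
   falls back to empty metadata.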
   
   ### Why are the changes needed?
   
   Without this fix, the metadata on an attribute can be lost in some code 
paths.
   
   For example, when a parquet file whose schema has metadata is read and then 
written back, the footer of the new parquet file does not contain the metadata 
that was originally there.
   
   Simple code to reproduce (assuming datasource v2):
   ```Scala
       import org.apache.spark.sql.Row
       import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField, StructType}

       // create a custom dataset in which only col_a carries metadata
       val data = Seq(
         Row("a", "b")
       )
       val schema = List(
         StructField("col_a", StringType, true,
           new MetadataBuilder().putString("key", "value").build()),
         StructField("col_b", StringType, true)
       )
       val df = spark.createDataFrame(
         spark.sparkContext.parallelize(data),
         StructType(schema)
       )
       // write
       df.write.parquet("/tmp/check")
       // read and verify that the metadata exists
       val readDF = spark.read.parquet("/tmp/check")
       readDF.schema.foreach(s => println(s.metadata))
       // write again
       readDF.write.parquet("/tmp/check2")
       // read again and observe that the metadata is gone
       val readDF2 = spark.read.parquet("/tmp/check2")
       readDF2.schema.foreach(s => println(s.metadata))
   ```
   
   Note that this doesn't happen in datasource v1.
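
   A hedged configuration sketch for exercising the v2 path in `spark-shell`. It
   assumes Spark 3.x, where parquet is listed in `spark.sql.sources.useV1SourceList`
   by default, and it assumes that `readDF2` from the snippet above is in scope.
   The setting has to be applied before running the repro:

   ```Scala
   // Assumption: Spark 3.x. Clearing the v1 fallback list routes the built-in
   // file sources, including parquet, through the datasource v2 code path.
   spark.conf.set("spark.sql.sources.useV1SourceList", "")

   // After running the repro, check whether col_a's metadata survived the round trip.
   val survived = readDF2.schema("col_a").metadata.contains("key")
   println(s"metadata survived the round trip: $survived")
   ```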
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   
   No tests were added: this is a minor change to a private function, and no 
existing tests cover this function.
   

