[ https://issues.apache.org/jira/browse/SPARK-38314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yaohua Zhao updated SPARK-38314:
--------------------------------
    Description:
Selecting and then writing df containing hidden file metadata column `_metadata` into a file format like `parquet`, `delta` will still keep the internal `Attribute` metadata information. Then when reading those `parquet`, `delta` files again, it will actually break the code, because it wrongly thinks user data schema named `_metadata` is a hidden file source metadata column.

Reproducible code:
{code:java}
// prepare a file source df
df.select("*", "_metadata")
  .write.format("parquet").save(path)

spark.read.format("parquet").load(path)
  .select("*").show()
{code}

  was:
Selecting and then writing df containing hidden file metadata column `_metadata` into a file format like `parquet`, `delta` will still keep the internal `Attribute` metadata information. Then when reading those `parquet`, `delta` files again, it will actually break the code, because it wrongly thinks user data schema named `_metadata` is a hidden file source metadata column.

Reproducible code:
```
// prepare a file source df
df.select("*", "_metadata")
  .write.format("parquet").save(path)

spark.read.format("parquet").load(path)
  .select("*").show()
```

> Fail to read parquet files after writing the hidden file metadata in
> --------------------------------------------------------------------
>
>                 Key: SPARK-38314
>                 URL: https://issues.apache.org/jira/browse/SPARK-38314
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: Yaohua Zhao
>            Priority: Major
>
> Selecting and then writing df containing hidden file metadata column
> `_metadata` into a file format like `parquet`, `delta` will still keep the
> internal `Attribute` metadata information. Then when reading those `parquet`,
> `delta` files again, it will actually break the code, because it wrongly
> thinks user data schema named `_metadata` is a hidden file source metadata
> column.
>
> Reproducible code:
> {code:java}
> // prepare a file source df
> df.select("*", "_metadata")
>   .write.format("parquet").save(path)
> spark.read.format("parquet").load(path)
>   .select("*").show()
> {code}
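
The reproduction in the report leaves `df` and `path` undefined. A minimal self-contained sketch of the same round trip might look like the following. It assumes a local Spark build that exposes the hidden `_metadata` column on file sources; the object name, the temporary directory, and the `sourcePath`/`outputPath` values are hypothetical and are not part of the report. Whether the last step fails depends on the Spark build in use.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical driver object; the name is not taken from the report.
object MetadataRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-38314-repro")
      .getOrCreate()

    // Hypothetical scratch locations; any writable paths would do.
    val base = Files.createTempDirectory("spark-38314").toString
    val sourcePath = s"$base/source"
    val outputPath = s"$base/with_metadata"

    try {
      // Prepare a file source DataFrame by writing and re-reading Parquet.
      spark.range(3).toDF("id").write.format("parquet").save(sourcePath)
      val df = spark.read.format("parquet").load(sourcePath)

      // Write the hidden file metadata column out as ordinary user data.
      df.select("*", "_metadata").write.format("parquet").save(outputPath)

      // Read the files back. This is the step the report describes as
      // breaking, because the stored `_metadata` column is mistaken for
      // the hidden file source metadata column.
      spark.read.format("parquet").load(outputPath).select("*").show()
    } finally {
      spark.stop()
    }
  }
}
```

Writing to a fresh temporary directory keeps the two `save` calls from colliding with existing output, since the default save mode raises an error when the target path already exists.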