gengliangwang commented on a change in pull request #24284: [SPARK-27356][SQL] File source V2: Fix the case that data columns overlap with partition schema
URL: https://github.com/apache/spark/pull/24284#discussion_r272162696

##########
File path: docs/sql-migration-guide-upgrade.md
##########
@@ -50,6 +50,8 @@ license: |

   - In Spark version 2.4 and earlier, JSON datasource and JSON functions like `from_json` convert a bad JSON record to a row with all `null`s in the PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, the returned row can contain non-`null` fields if some of JSON column values were parsed and converted to desired types successfully.

+  - In Spark version 2.4 and earlier, if data columns overlap with partition columns, the output schema of file scan respects the ordering of data columns, and adopts the data type of partition columns. Since Spark 3.0, the output schema of file scan puts all the partition columns at the end. For example, if the data schema is `[a: String, b: String, c: String]` and the partition schema is `[b: Int, d: Int]`, the result schema is `[a: String, b: Int, c: String, d: Int]` in Spark 2.4 and earlier, and `[a: String, c: String, b: Int, d: Int]` since Spark 3.0.

Review comment:
I am OK with either way. Let me remove this.
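
For reference, the two column orderings described in the proposed migration note can be sketched in plain Python, with no Spark dependency. This is only an illustration of the rule as worded in the note, not Spark's actual implementation: the function names `legacy_scan_schema` and `partition_last_scan_schema` are hypothetical, schemas are modeled as lists of `(name, type)` pairs, and column names are assumed to match case-sensitively.

```python
from typing import List, Tuple

# A schema is modeled as an ordered list of (column name, type name) pairs.
# This representation is an assumption made for illustration only.
Field = Tuple[str, str]


def legacy_scan_schema(data: List[Field], partition: List[Field]) -> List[Field]:
    """Ordering described for Spark 2.4 and earlier (hypothetical helper).

    Data columns keep their position; an overlapping column adopts the
    partition column's type; remaining partition columns are appended.
    """
    partition_types = dict(partition)
    data_names = {name for name, _ in data}
    merged = [(name, partition_types.get(name, dtype)) for name, dtype in data]
    merged += [(name, dtype) for name, dtype in partition if name not in data_names]
    return merged


def partition_last_scan_schema(data: List[Field], partition: List[Field]) -> List[Field]:
    """Ordering described for Spark 3.0 in the note (hypothetical helper).

    Overlapping columns are dropped from the data part, and all partition
    columns are placed at the end.
    """
    partition_names = {name for name, _ in partition}
    return [field for field in data if field[0] not in partition_names] + list(partition)


# The example schemas from the migration note.
data_schema = [("a", "String"), ("b", "String"), ("c", "String")]
partition_schema = [("b", "Int"), ("d", "Int")]

legacy_result = legacy_scan_schema(data_schema, partition_schema)
partition_last_result = partition_last_scan_schema(data_schema, partition_schema)
```

With the note's example inputs, `legacy_result` keeps `b` in its data position with type `Int`, while `partition_last_result` moves `b` to the end alongside `d`.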
