[GitHub] [spark] gengliangwang commented on a change in pull request #24284: [SPARK-27356][SQL] File source V2: Fix the case that data columns overlap with partition schema

GitBox Thu, 04 Apr 2019 05:53:02 -0700

gengliangwang commented on a change in pull request #24284: [SPARK-27356][SQL] 
File source V2: Fix the case that data columns overlap with partition schema
URL: https://github.com/apache/spark/pull/24284#discussion_r272162696


 ##########
 File path: docs/sql-migration-guide-upgrade.md
 ##########
 @@ -50,6 +50,8 @@ license: |
 
   - In Spark version 2.4 and earlier, JSON datasource and JSON functions like 
`from_json` convert a bad JSON record to a row with all `null`s in the 
PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, the 
returned row can contain non-`null` fields if some of JSON column values were 
parsed and converted to desired types successfully.
 
+  - In Spark version 2.4 and earlier, if data columns overlap with partition 
columns, the output schema of file scan respects the ordering of data columns, 
and adopts the data type of partition columns. Since Spark 3.0, the output 
schema of file scan puts all the partition columns at the end. For example, if 
the data schema is `[a: String, b: String, c: String]` and the partition schema 
is `[b: Int, d: Int]`, the result schema is `[a: String, b: Int, c: String, d: 
Int]` in Spark 2.4 and earlier, and `[a: String, c: String, b: Int, d: Int]` 
since Spark 3.0.
 
 Review comment:
   I am OK either way. Let me remove this.
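
   For reference, the ordering difference described in the proposed note can be sketched in plain Python. This is only an illustration, not Spark's implementation: schemas are modelled as lists of `(name, type)` tuples, the function names `scan_schema_v2_4` and `scan_schema_3_0` are hypothetical, and column names are assumed to match exactly (case sensitivity is ignored).

```python
def scan_schema_v2_4(data_schema, partition_schema):
    """Spark 2.4 and earlier, per the note: keep the data column order,
    but take the partition column's type for overlapping names."""
    partition_types = dict(partition_schema)
    data_names = {name for name, _ in data_schema}
    # Overlapping columns stay in their data-schema position with the partition type.
    merged = [(name, partition_types.get(name, dtype)) for name, dtype in data_schema]
    # Partition columns that are not data columns are appended at the end.
    merged += [(name, dtype) for name, dtype in partition_schema if name not in data_names]
    return merged


def scan_schema_3_0(data_schema, partition_schema):
    """Spark 3.0, per the note: all partition columns go at the end."""
    partition_names = {name for name, _ in partition_schema}
    # Drop overlapping columns from the data part, then append every partition column.
    non_partition = [(name, dtype) for name, dtype in data_schema if name not in partition_names]
    return non_partition + list(partition_schema)


# The example schemas from the note.
data_schema = [("a", "String"), ("b", "String"), ("c", "String")]
partition_schema = [("b", "Int"), ("d", "Int")]

print(scan_schema_v2_4(data_schema, partition_schema))
print(scan_schema_3_0(data_schema, partition_schema))
```

   With the schemas from the note, the first call yields the order `a, b, c, d` and the second yields `a, c, b, d`, with `b` typed as `Int` in both.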

