[GitHub] [spark] HyukjinKwon commented on a change in pull request #33436: [SPARK-35912][SQL] Fix nullability of `spark.read.json/spark.read.csv`

GitBox Tue, 20 Jul 2021 21:42:39 -0700


HyukjinKwon commented on a change in pull request #33436:
URL: https://github.com/apache/spark/pull/33436#discussion_r673652611




##########
File path: docs/sql-migration-guide.md
##########
@@ -22,6 +22,10 @@ license: |
 * Table of contents
 {:toc}
 
+## Upgrading from Spark SQL 3.2 to 3.3
+
+  - Non-nullable schema was not supported properly in previous Spark version so the output schema of `DataFrameReader.json(jsonDataset: Dataset[String])` and `DataFrameReader.csv(csvDataset: Dataset[String])` became nullable which also matches with `DataFrameReader.json(path: String)` and `DataFrameReader.csv(path: String)`.

Review comment:
       Hm, it gives you a wrong result only when the data contains `null` or 
a field is missing for a non-nullable type. If the data contains no `null` and 
all fields are present, it works fine.
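
To make the condition in this comment concrete, here is a minimal sketch in plain Python. It is a toy model only, not Spark code: `Field`, `as_nullable`, and `violations` are hypothetical names invented for illustration, and the sample records are made up. It flags exactly the two cases named above (a `null` value or a missing field under a non-nullable type) and shows that relaxing the schema to nullable, as the migration note describes, leaves nothing to flag.

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one schema field (not Spark's StructField)."""
    name: str
    nullable: bool


def as_nullable(schema):
    """Return a copy of the schema with every field marked nullable."""
    return [replace(f, nullable=True) for f in schema]


def violations(schema, lines):
    """List (line_index, field_name) pairs where a non-nullable field
    is either JSON null or absent from the record."""
    found = []
    for i, line in enumerate(lines):
        record = json.loads(line)
        for f in schema:
            # dict.get returns None for both a missing key and a JSON null.
            if not f.nullable and record.get(f.name) is None:
                found.append((i, f.name))
    return found


# Made-up sample data: a user-declared non-nullable schema and two datasets.
schema = [Field("id", nullable=False), Field("name", nullable=False)]
clean = ['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}']
dirty = ['{"id": 1, "name": null}', '{"name": "c"}']

print(violations(schema, clean))               # []
print(violations(schema, dirty))               # [(0, 'name'), (1, 'id')]
print(violations(as_nullable(schema), dirty))  # []
```

With the clean data the non-nullable schema is harmless, which is the "works fine" case; only the dirty data hits the two problem cases.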




