HyukjinKwon commented on a change in pull request #35229:
URL: https://github.com/apache/spark/pull/35229#discussion_r786604550
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -434,7 +434,7 @@ case class DataSource(
hs.partitionSchema,
"in the partition schema",
equality)
- DataSourceUtils.verifySchema(hs.fileFormat, hs.dataSchema)
+ DataSourceUtils.checkFieldType(hs.fileFormat, hs.dataSchema)
Review comment:
For Parquet, maybe yes for now. This is actually a bug in Parquet,
right? But is it safe to remove this check for all the other sources? If we
want to fix this, I would scope the change to Parquet only for now. I don't
think it's safe to assume that column-name restrictions can differ between
reading and writing: a format specification should guarantee the same rules
on both the read and the write path. Doing the check on the read side is
good because it fails fast.
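To illustrate the "Parquet only" scoping suggested above, here is a minimal, self-contained sketch. `FileFormat`, `ParquetFormat`, and the invalid-character rule are stand-ins for illustration only, not Spark's actual `FileFormat` hierarchy or `DataSourceUtils` API:

```scala
// Hypothetical sketch: apply the relaxed field-name check only for Parquet,
// while other sources keep their existing behavior. These types are
// illustrative stand-ins, not Spark internals.
sealed trait FileFormat
case object ParquetFormat extends FileFormat
case object OrcFormat extends FileFormat

object SchemaCheck {
  // Illustrative set of characters Parquet rejects in field names.
  private val invalidChars = Set(' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '=')

  // Fail fast on the read side for Parquet; leave other formats untouched.
  def verifyFieldNames(format: FileFormat, names: Seq[String]): Unit =
    format match {
      case ParquetFormat =>
        names.find(_.exists(invalidChars.contains)).foreach { bad =>
          throw new IllegalArgumentException(
            s"Column name '$bad' contains invalid character(s) for Parquet")
        }
      case _ =>
        // Other sources: no change; their existing checks still apply.
    }
}
```

The point of the `match` is that the relaxed rule never leaks into other formats, so removing or loosening the shared `verifySchema` call is not needed.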
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]