HyukjinKwon commented on a change in pull request #35229:
URL: https://github.com/apache/spark/pull/35229#discussion_r786604550
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -434,7 +434,7 @@ case class DataSource(
hs.partitionSchema,
"in the partition schema",
equality)
- DataSourceUtils.verifySchema(hs.fileFormat, hs.dataSchema)
+ DataSourceUtils.checkFieldType(hs.fileFormat, hs.dataSchema)
Review comment:
For Parquet, maybe yes for now. This is actually a bug in Parquet,
right? But is it safe to remove this check for all the other sources? If we
want to fix this, I would scope the change to Parquet only for now. I don't
think it's safe to assume that column-name restrictions can differ between
reading and writing: a format specification should guarantee the same rules
on both the read and the write path. Doing the check on the read side is
good because it fails fast.
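To illustrate the "Parquet only" scoping suggested above, here is a minimal, self-contained sketch. `FileFormat`, `ParquetFormat`, and the invalid-character rule are stand-ins for illustration only, not Spark's actual `FileFormat` hierarchy or `DataSourceUtils` API:

```scala
// Hypothetical sketch: apply the relaxed field-name check only for Parquet,
// while other sources keep their existing behavior. These types are
// illustrative stand-ins, not Spark internals.
sealed trait FileFormat
case object ParquetFormat extends FileFormat
case object OrcFormat extends FileFormat

object SchemaCheck {
  // Illustrative set of characters Parquet rejects in field names.
  private val invalidChars = Set(' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '=')

  // Fail fast on the read side for Parquet; leave other formats untouched.
  def verifyFieldNames(format: FileFormat, names: Seq[String]): Unit =
    format match {
      case ParquetFormat =>
        names.find(_.exists(invalidChars.contains)).foreach { bad =>
          throw new IllegalArgumentException(
            s"Column name '$bad' contains invalid character(s) for Parquet")
        }
      case _ =>
        // Other sources: no change; their existing checks still apply.
    }
}
```

The point of the `match` is that the relaxed rule never leaks into other formats, so removing or loosening the shared `verifySchema` call is not needed.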
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]