HyukjinKwon commented on a change in pull request #35229:
URL: https://github.com/apache/spark/pull/35229#discussion_r786586190



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -434,7 +434,7 @@ case class DataSource(
           hs.partitionSchema,
           "in the partition schema",
           equality)
-        DataSourceUtils.verifySchema(hs.fileFormat, hs.dataSchema)
+        DataSourceUtils.checkFieldType(hs.fileFormat, hs.dataSchema)

Review comment:
       We can, but I am not sure these field names would work properly. 
It would be especially problematic once schema evolution is involved, 
since we would allow reading special characters but not writing them out. 
Another concern is that PARQUET-1809 mentions dots being used for nested 
column access: are we sure such files will be read correctly by Parquet 
without correctness issues? Furthermore, we would potentially be exposed to 
all kinds of these problems from Avro, ORC, and unknown sources that 
implement `FileFormat`.
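
The nested-access concern can be illustrated with a minimal sketch. This is not Spark's or Parquet's actual resolution logic; the `resolve` helper below is hypothetical, purely to show why a literal dot in a column name is ambiguous once dots also denote struct-field access (the issue PARQUET-1809 raises):

```python
def resolve(row: dict, path: str):
    """Resolve a dotted path against a record, preferring an exact
    top-level match over struct-field traversal."""
    if path in row:
        # A top-level column whose name literally contains a dot.
        return row[path]
    value = row
    for part in path.split("."):
        # Otherwise interpret dots as nested-field access.
        value = value[part]
    return value

# A record where both interpretations of "a.b" exist:
row = {"a.b": 1, "a": {"b": 2}}

print(resolve(row, "a.b"))  # prefers the literal column -> 1
```

A reader that only implements the nesting interpretation would return 2 for the same path, so two engines (or two versions of one engine) can silently disagree on what `a.b` means, which is exactly the kind of correctness issue being flagged here.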




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


