[GitHub] [spark] JoshRosen commented on a diff in pull request #37591: [SPARK-40158][SQL] Remove useless configuration & extract common code for parquet read

GitBox Sun, 21 Aug 2022 16:41:25 -0700


JoshRosen commented on code in PR #37591:
URL: https://github.com/apache/spark/pull/37591#discussion_r950916955



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:
##########
@@ -199,33 +199,7 @@ class ParquetFileFormat
       filters: Seq[Filter],
       options: Map[String, String],
       hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = {
-    hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName)
-    hadoopConf.set(
-      ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA,
-      requiredSchema.json)
-    hadoopConf.set(
-      ParquetWriteSupport.SPARK_ROW_SCHEMA,

Review Comment:
   It looks like this configuration is read at
   https://github.com/apache/spark/blob/cf1a80eeae8bf815270fb39568b1846c2bd8d437/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala#L97-L99

   Given that usage, it isn't self-evident that this configuration is no
   longer needed. Can you please explain in more detail why you think it is
   safe to remove?



