advancedxy commented on a change in pull request #25616: [SPARK-28907][CORE] 
Review invalid usage of new Configuration()
URL: https://github.com/apache/spark/pull/25616#discussion_r319373447
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala
 ##########
 @@ -45,6 +45,7 @@ class HadoopFileWholeTextReader(file: PartitionedFile, conf: 
Configuration)
     val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 
0), 0)
     val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
     val reader = new WholeTextFileRecordReader(fileSplit, 
hadoopAttemptContext, 0)
+    reader.setConf(hadoopAttemptContext.getConfiguration)
 
 Review comment:
   `WholeTextFileRecordReader` is `Configurable`, `setConf` should be called 
after creation.
   This is why tests are failing before this patch.
   
   However, I am wondering for 
`org.apache.spark.input.WholeTextFileRecordReader` and 
`org.apache.spark.input.ConfigurableCombineFileRecordReader`, we can already 
retrieve config from `org.apache.hadoop.mapreduce.TaskAttemptContext`.  There 
is no need to make these class `Configurable`
   
   I am wondering if we should remove `Configurable` trait for the related 
classes all at once.  what do you think @gatorsmile 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to