advancedxy commented on a change in pull request #25616: [SPARK-28907][CORE]
Review invalid usage of new Configuration()
URL: https://github.com/apache/spark/pull/25616#discussion_r319373447
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala
##########
@@ -45,6 +45,7 @@ class HadoopFileWholeTextReader(file: PartitionedFile, conf:
Configuration)
val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP,
0), 0)
val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
val reader = new WholeTextFileRecordReader(fileSplit,
hadoopAttemptContext, 0)
+ reader.setConf(hadoopAttemptContext.getConfiguration)
Review comment:
`WholeTextFileRecordReader` is `Configurable`, `setConf` should be called
after creation.
This is why tests are failing before this patch.
However, I am wondering for
`org.apache.spark.input.WholeTextFileRecordReader` and
`org.apache.spark.input.ConfigurableCombineFileRecordReader`, we can already
retrieve config from `org.apache.hadoop.mapreduce.TaskAttemptContext`. There
is no need to make these class `Configurable`
I am wondering if we should remove `Configurable` trait for the related
classes all at once. what do you think @gatorsmile
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]