Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/2684#discussion_r18497896
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -132,24 +132,12 @@ class HadoopRDD[K, V](
   // Returns a JobConf that will be used on slaves to obtain input splits for Hadoop reads.
   protected def getJobConf(): JobConf = {
     val conf: Configuration = broadcastedConf.value.value
-    if (conf.isInstanceOf[JobConf]) {
-      // A user-broadcasted JobConf was provided to the HadoopRDD, so always use it.
-      conf.asInstanceOf[JobConf]
-    } else if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) {
-      // getJobConf() has been called previously, so there is already a local cache of the JobConf
-      // needed by this RDD.
-      HadoopRDD.getCachedMetadata(jobConfCacheKey).asInstanceOf[JobConf]
-    } else {
-      // Create a JobConf that will be cached and used across this RDD's getJobConf() calls in the
-      // local process. The local cache is accessed through HadoopRDD.putCachedMetadata().
-      // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects.
-      // Synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456).
-      HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
-        val newJobConf = new JobConf(conf)
+    HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
+      val newJobConf = new JobConf(conf)
--- End diff --
Does `new JobConf(conf)` actually clone the internal map, or does it just hold
references to the supplied `conf`? If it only holds references, it seems like it
might end up with the same synchronization issues.
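
To make the distinction concrete, here is a minimal sketch of the two possibilities. It assumes a hypothetical `ToyConf` class backed by `java.util.Properties`; it is not Hadoop's `Configuration` or `JobConf`, and the names `aliased` and `copied` are made up for illustration. Which of the two `new JobConf(conf)` resembles depends on the copy constructor in the Hadoop version in use, which this sketch does not settle.

```scala
import java.util.Properties

// Hypothetical stand-in for a Hadoop-style configuration object.
// This is NOT the real Configuration/JobConf class.
final class ToyConf private (private val props: Properties) {
  def this() = this(new Properties())

  def set(key: String, value: String): Unit = props.setProperty(key, value)
  def get(key: String): String = props.getProperty(key)

  // "Pointer" style: the new object shares the same backing map, so writes
  // to the original are visible here and both objects contend on one map.
  def aliased: ToyConf = new ToyConf(props)

  // "Clone" style: the backing map is copied while holding its lock, so
  // later writes to the original do not affect the copy.
  def copied: ToyConf = props.synchronized {
    new ToyConf(props.clone().asInstanceOf[Properties])
  }
}

object CopyVsAliasDemo {
  def main(args: Array[String]): Unit = {
    val base = new ToyConf()
    base.set("mode", "a")

    val alias = base.aliased
    val copy = base.copied

    // Mutate the original after both derived objects exist.
    base.set("mode", "b")

    println(s"alias sees: ${alias.get("mode")}")
    println(s"copy sees:  ${copy.get("mode")}")
  }
}
```

If the constructor behaves like `aliased`, concurrent readers and writers still touch one shared map, so the lock around construction alone would not remove the race. If it behaves like `copied`, the lock only needs to cover the moment the copy is taken.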