Github user colorant commented on a diff in the pull request:
https://github.com/apache/spark/pull/1273#discussion_r14544250
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -141,7 +141,7 @@ class HadoopRDD[K, V](
// local process. The local cache is accessed through HadoopRDD.putCachedMetadata().
// The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects.
// synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456)
- broadcastedConf.synchronized {
+ broadcastedConf.value.value.synchronized {
--- End diff --
@aarondav, yes, I also thought about this before. The reason I keep using
broadcastedConf.value.value here is this: although a broadcast variable is
supposed to be read-only and never changed, I wondered whether, in case someone
misuses it and changes the value, reading the latest value might be helpful. The
original code also reads the latest value on the next line, so I kept that
style. But thinking again, if the value is changed somewhere without holding any
synchronization lock, this still might not solve the problem. I will
update the pull request.
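
To illustrate the concern, here is a minimal sketch of why locking on whatever a mutable holder currently points to gives no protection once someone swaps the value without a lock. `Holder` and `LockIdentityDemo` are hypothetical names used only for illustration; `Holder` is a stand-in for a wrapper like the broadcast variable, not Spark's `Broadcast` class.

```scala
// Hypothetical stand-in for a wrapper whose contents are meant to be read-only.
final class Holder[T <: AnyRef](@volatile var value: T)

object LockIdentityDemo {
  // Returns true if two callers that each lock on `holder.value` would be
  // using the same monitor, given that the value was swapped in between.
  def sameMonitorAfterSwap(): Boolean = {
    val holder = new Holder(new java.util.HashMap[String, String]())

    // First caller picks up the object it would synchronize on.
    val monitorSeenFirst: AnyRef = holder.value

    // Someone misuses the holder and replaces the value without any lock.
    holder.value = new java.util.HashMap[String, String]()

    // Second caller now synchronizes on a different object.
    val monitorSeenSecond: AnyRef = holder.value

    monitorSeenFirst eq monitorSeenSecond
  }

  def main(args: Array[String]): Unit = {
    // The two callers hold different monitors, so they do not exclude each other.
    println(sameMonitorAfterSwap())
  }
}
```

Under these assumptions the two callers end up with different monitors, so synchronizing on the latest value only helps if every writer also takes a lock that the readers share.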