GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/15371#discussion_r82521015
--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -444,7 +444,7 @@ class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]] {
   override def copy(): CollectionAccumulator[T] = {
     val newAcc = new CollectionAccumulator[T]
-    newAcc._list.addAll(_list)
+    newAcc.merge(this)
--- End diff --
Yeah, I see why this is probably worth fixing in the base class too. My only
concern is that `merge` copies the argument's list (via `value`), whereas here
we could get away with `_list.synchronized { newAcc._list.addAll(_list) }`.
That would also avoid a tiny behavior change: subclasses would now find that
`copy` calls `merge` (though I don't know of a reason that would be a problem
now). We could extend the same logic to `merge` to avoid calling `value`, but
that would require holding locks on two lists at once, and I think that opens
the door to a deadlock that I don't want to mess with.
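
To make the trade-off concrete, here is a minimal sketch of the two approaches.
It is not Spark's actual class: `SketchAccumulator` is a hypothetical stand-in,
and it assumes `_list` is a `Collections.synchronizedList`, so that
`_list.synchronized` takes the same lock the list's own methods use.

```scala
import java.util.{ArrayList, Collections, List => JList}

// Hypothetical stand-in for CollectionAccumulator; not the real Spark class.
class SketchAccumulator[T] {
  // Assumption: a synchronized wrapper. Single calls are atomic, but a bulk
  // read of this list by another collection needs an explicit lock on it.
  private val _list: JList[T] = Collections.synchronizedList(new ArrayList[T]())

  def add(v: T): Unit = _list.add(v)

  // Returns a snapshot copy, taken while holding this list's lock.
  def value: JList[T] = _list.synchronized {
    Collections.unmodifiableList(new ArrayList[T](_list))
  }

  // Snapshots `other` first (its lock is released when `value` returns), then
  // appends under this list's lock. Two locks are never held at once, at the
  // cost of one intermediate copy.
  def merge(other: SketchAccumulator[T]): Unit = _list.addAll(other.value)

  // The suggestion from the comment: lock only the source list and append
  // directly. newAcc is not yet visible to any other thread, so no other
  // thread can be contending for its lock.
  def copy(): SketchAccumulator[T] = {
    val newAcc = new SketchAccumulator[T]
    _list.synchronized {
      newAcc._list.addAll(_list)
    }
    newAcc
  }
}
```

Applying the same direct-append trick inside `merge` would mean holding the
locks of two shared lists together; two threads running `a.merge(b)` and
`b.merge(a)` could then acquire them in opposite orders, which is the deadlock
risk mentioned above.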