GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/15065
[SPARK-17463][Core]Add necessary memory barrier for accumulators
## What changes were proposed in this pull request?
Added `volatile` for fields that will be read in the heartbeat thread.
Without them, the worse case is the user cannot see any metric updates until a
task finishes.
Unfortunately, there will be a performance regression comparing to Spark
2.0.0. Note: in Spark 1.6, accumulators have a volatile value, so there should
not be any performance regression comparing to 1.6.
To reduce the performance lost caused by `volatile`, there are two
alternative ways:
1. Use `AtomicXXX.lazySet` to update a metric value. The eventually set
should be enough for metrics. There are some comparison numbers in
http://psy-lob-saw.blogspot.com/2012/12/atomiclazyset-is-performance-win-for.html
. However, this will increase the metric object size a lot.
2. Use `AtomicLongFieldUpdater` to avoid the overhead of AtomicLong. See:
http://normanmaurer.me/blog/2013/10/28/Lesser-known-concurrent-classes-Part-1/
. The problem of this approach is we need to rewrite codes using Java
because we cannot add static codes in a class in Scala. The following codes
will fail with `java.lang.IllegalAccessException: Class Foo$ can not access a
member of class Foo with modifiers "private volatile"`:
```Scala
import java.util.concurrent.atomic.AtomicLongFieldUpdater
class Foo {
@volatile var x: Long = 0
}
object Foo extends App {
classOf[Foo].getDeclaredField("x").setAccessible(true) // newUpdater
doesn't use this
val field = AtomicLongFieldUpdater.newUpdater(classOf[Foo], "x")
}
```
## How was this patch tested?
Jenkins tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zsxwing/spark fix-accmu
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15065.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15065
----
commit 4c6bb0b92b55ca5368b2e2752704fc197f9be4ad
Author: Shixiong Zhu <[email protected]>
Date: 2016-09-12T20:51:53Z
Add necessary memory barrier for accumulators
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]