GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/15065

    [SPARK-17463][Core]Add necessary memory barrier for accumulators

    ## What changes were proposed in this pull request?
    
    Added `volatile` to fields that are read by the heartbeat thread. Without it, the worst case is that the user cannot see any metric updates until a task finishes.
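
    As a minimal sketch of the idea (a hypothetical metric class, not the actual accumulator code touched by this patch): the task thread writes the value and the heartbeat thread reads it, so the field needs `volatile` for the reader to observe recent writes.

    ```Scala
    // Hypothetical illustration only; not the fields changed by this PR.
    class SumMetric {
      // Written by the task thread, read by the heartbeat thread. Without
      // `@volatile`, the heartbeat thread may keep seeing a stale value
      // until the task finishes.
      @volatile private var _sum: Long = 0L

      def add(v: Long): Unit = { _sum += v } // single writer: the task thread
      def sum: Long = _sum                   // reader: the heartbeat thread
    }
    ```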
    
    Unfortunately, there will be a performance regression compared to Spark 2.0.0. Note: in Spark 1.6, accumulators had a volatile value, so there should not be any performance regression compared to 1.6.
    
    To reduce the performance cost of `volatile`, there are two alternative approaches:
    
    1. Use `AtomicXXX.lazySet` to update a metric value. An eventually visible write should be enough for metrics. There are some comparison numbers in http://psy-lob-saw.blogspot.com/2012/12/atomiclazyset-is-performance-win-for.html . However, this would increase the metric object size a lot (a sketch of this approach follows the code example below).
    
    2. Use `AtomicLongFieldUpdater` to avoid the overhead of `AtomicLong`. See: http://normanmaurer.me/blog/2013/10/28/Lesser-known-concurrent-classes-Part-1/ . The problem with this approach is that we would need to rewrite the code in Java, because we cannot add static code to a class in Scala: the companion object compiles to a separate class (`Foo$` below), and `newUpdater` performs its access check against the caller, so it cannot reach `Foo`'s private volatile field. The following code fails with `java.lang.IllegalAccessException: Class Foo$ can not access a member of class Foo with modifiers "private volatile"`:
    
    ```Scala
    import java.util.concurrent.atomic.AtomicLongFieldUpdater
    
    class Foo {
      @volatile var x: Long = 0
    }
    
    object Foo extends App {
      classOf[Foo].getDeclaredField("x").setAccessible(true) // newUpdater doesn't use this
      val field = AtomicLongFieldUpdater.newUpdater(classOf[Foo], "x")
    }
    ```
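
    A minimal sketch of alternative 1, assuming a hypothetical `SumMetric` class (illustrative only, not code in this patch): `AtomicLong.lazySet` performs an ordered store that the heartbeat thread will see eventually, which is cheaper than a full volatile write but carries an extra `AtomicLong` object per metric.

    ```Scala
    import java.util.concurrent.atomic.AtomicLong

    // Hypothetical illustration of alternative 1, not code in this patch.
    class SumMetric {
      // Wrapping the value in an AtomicLong is what increases the metric
      // object size, as noted above.
      private val _sum = new AtomicLong(0L)

      // Only the task thread writes, so a plain get + lazySet is safe.
      // lazySet is an ordered (eventually visible) store, cheaper than the
      // full volatile write done by set().
      def add(v: Long): Unit = _sum.lazySet(_sum.get() + v)

      def sum: Long = _sum.get() // read from the heartbeat thread
    }
    ```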
    
    
    ## How was this patch tested?
    
    Jenkins tests
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark fix-accmu

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15065.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15065
    
----
commit 4c6bb0b92b55ca5368b2e2752704fc197f9be4ad
Author: Shixiong Zhu <[email protected]>
Date:   2016-09-12T20:51:53Z

    Add necessary memory barrier for accumulators

----

