-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48392/
-----------------------------------------------------------

(Updated June 8, 2016, 8:44 a.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.


Bugs: AMBARI-17106
    https://issues.apache.org/jira/browse/AMBARI-17106


Repository: ambari


Description
-------

This is another case of an Ambari cache competing with a JPA transaction. 
Consider these steps:
- A new configuration is created within the context of a Transaction
- Within that same Transaction, the stale configuration cache is told to 
invalidate
- After purging the old data, but before the Transaction is committed, another 
thread tries to read from the cache. It ends up re-populating the old data.

Sometimes the code works because the Transaction is able to committ before the 
cache is re-populated by another thread. In theory, we should be locking around 
reading the cache to ensure that there isn't a transaction writing to it. 
However, this is what caused the deadlock since it interferes with our wonder 
"cluster global lock of doom".

Instead, it's safer in this case to just invalidate the cache after the 
Transaction completes. 
- We do this invalidate on a separate thread to ensure we don't have issues 
with the cluster global lock
- Since the cache isn't needed within the context of the invalidation call, 
it's OK to purge it asynchronously.

```
"Server Action Executor Worker 401" #225 prio=5 os_prio=0 
tid=0x00007fa07c03e800 nid=0x65df waiting on condition [0x00007fa0737ef000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000a059d4f0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)

  --> @TransactionalLock(lockArea = LockArea.STALE_CONFIG_CACHE, lockType = 
LockType.WRITE)

  at 
org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.lockTransaction(AmbariJpaLocalTxnInterceptor.java:291)
        at 
org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:114)
        at 
com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
        at 
com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
        at 
org.apache.ambari.server.state.cluster.ClusterImpl$$EnhancerByGuice$$991b84fc.applyConfigs(<generated>)

  --> clusterGlobalLock.writeLock().lock();

  at 
org.apache.ambari.server.state.cluster.ClusterImpl.addDesiredConfig(ClusterImpl.java:2340)
        at 
org.apache.ambari.server.state.ConfigHelper.createConfigTypes(ConfigHelper.java:897)
        at 
org.apache.ambari.server.controller.internal.UpgradeResourceProvider.applyStackAndProcessConfigurations(UpgradeResourceProvider.java:1174)
```

```
"ambari-hearbeat-monitor" #23 prio=5 os_prio=0 tid=0x00007fa07476c000 
nid=0x20ad waiting on condition [0x00007fa07bbfb000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000a32d44a0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at 
org.apache.ambari.server.state.cluster.ClusterImpl.getDesiredStackVersion(ClusterImpl.java:1052)
        at 
org.apache.ambari.server.state.ConfigHelper.calculateIsStaleConfigs(ConfigHelper.java:1075)

  --> @TransactionalLock(lockArea = LockArea.STALE_CONFIG_CACHE, lockType = 
LockType.READ)
  
        at 
org.apache.ambari.server.state.ConfigHelper.isStaleConfigs(ConfigHelper.java:456)
        at 
org.apache.ambari.server.agent.HeartbeatMonitor.createStatusCommand(HeartbeatMonitor.java:311)
```


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/state/ConfigHelper.java 
1bd0ba9 
  ambari-server/src/main/java/org/apache/ambari/server/state/ConfigImpl.java 
b590c56 
  
ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
 9e9128e 
  
ambari-server/src/test/java/org/apache/ambari/server/state/ConfigHelperTest.java
 b4f0c52 

Diff: https://reviews.apache.org/r/48392/diff/


Testing (updated)
-------

Tests run: 4463, Failures: 0, Errors: 0, Skipped: 34

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 34:11 min
[INFO] Finished at: 2016-06-08T01:10:29-04:00
[INFO] Final Memory: 38M/645M
[INFO] ------------------------------------------------------------------------


Thanks,

Jonathan Hurley

Reply via email to