----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48392/#review136627 -----------------------------------------------------------
Ship it! Ship It! - Nate Cole On June 7, 2016, 11:49 p.m., Jonathan Hurley wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48392/ > ----------------------------------------------------------- > > (Updated June 7, 2016, 11:49 p.m.) > > > Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas. > > > Bugs: AMBARI-17106 > https://issues.apache.org/jira/browse/AMBARI-17106 > > > Repository: ambari > > > Description > ------- > > This is another case of an Ambari cache competing with a JPA transaction. > Consider these steps: > - A new configuration is created within the context of a Transaction > - Within that same Transaction, the stale configuration cache is told to > invalidate > - After purging the old data, but before the Transaction is committed, > another thread tries to read from the cache. It ends up re-populating the old > data. > > Sometimes the code works because the Transaction is able to committ before > the cache is re-populated by another thread. In theory, we should be locking > around reading the cache to ensure that there isn't a transaction writing to > it. However, this is what caused the deadlock since it interferes with our > wonder "cluster global lock of doom". > > Instead, it's safer in this case to just invalidate the cache after the > Transaction completes. > - We do this invalidate on a separate thread to ensure we don't have issues > with the cluster global lock > - Since the cache isn't needed within the context of the invalidation call, > it's OK to purge it asynchronously. > > ``` > "Server Action Executor Worker 401" #225 prio=5 os_prio=0 > tid=0x00007fa07c03e800 nid=0x65df waiting on condition [0x00007fa0737ef000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000000a059d4f0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$FairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > > --> @TransactionalLock(lockArea = LockArea.STALE_CONFIG_CACHE, lockType = > LockType.WRITE) > > at > org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.lockTransaction(AmbariJpaLocalTxnInterceptor.java:291) > at > org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:114) > at > com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72) > at > com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52) > at > org.apache.ambari.server.state.cluster.ClusterImpl$$EnhancerByGuice$$991b84fc.applyConfigs(<generated>) > > --> clusterGlobalLock.writeLock().lock(); > > at > org.apache.ambari.server.state.cluster.ClusterImpl.addDesiredConfig(ClusterImpl.java:2340) > at > org.apache.ambari.server.state.ConfigHelper.createConfigTypes(ConfigHelper.java:897) > at > org.apache.ambari.server.controller.internal.UpgradeResourceProvider.applyStackAndProcessConfigurations(UpgradeResourceProvider.java:1174) > ``` > > ``` > "ambari-hearbeat-monitor" #23 prio=5 os_prio=0 tid=0x00007fa07476c000 > nid=0x20ad waiting on condition [0x00007fa07bbfb000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000000a32d44a0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) > at > org.apache.ambari.server.state.cluster.ClusterImpl.getDesiredStackVersion(ClusterImpl.java:1052) > at > org.apache.ambari.server.state.ConfigHelper.calculateIsStaleConfigs(ConfigHelper.java:1075) > > --> @TransactionalLock(lockArea = LockArea.STALE_CONFIG_CACHE, lockType = > LockType.READ) > > at > org.apache.ambari.server.state.ConfigHelper.isStaleConfigs(ConfigHelper.java:456) > at > org.apache.ambari.server.agent.HeartbeatMonitor.createStatusCommand(HeartbeatMonitor.java:311) > ``` > > > Diffs > ----- > > > ambari-server/src/main/java/org/apache/ambari/server/state/ConfigHelper.java > 1bd0ba9 > ambari-server/src/main/java/org/apache/ambari/server/state/ConfigImpl.java > b590c56 > > ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java > 9e9128e > > Diff: https://reviews.apache.org/r/48392/diff/ > > > Testing > ------- > > PENDING > > > Thanks, > > Jonathan Hurley > >