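Here's the implementation of internalBlockUntilConnectedOrTimedOut(). As the
loop shows, it can't block indefinitely: each call blocks for roughly
connectionTimeoutMs at most (the wait is done in 1-second slices, so the last
slice can overshoot by up to about a second).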

    void internalBlockUntilConnectedOrTimedOut() throws InterruptedException
    {
        long waitTimeMs = connectionTimeoutMs;
        while ( !state.isConnected() && (waitTimeMs > 0) )
        {
            final CountDownLatch latch = new CountDownLatch(1);
            Watcher tempWatcher = new Watcher()
            {
                @Override
                public void process(WatchedEvent event)
                {
                    latch.countDown();
                }
            };

            state.addParentWatcher(tempWatcher);
            long startTimeMs = System.currentTimeMillis();
            try
            {
                latch.await(1, TimeUnit.SECONDS);
            }
            finally
            {
                state.removeParentWatcher(tempWatcher);
            }
            long elapsed = Math.max(1, System.currentTimeMillis() - startTimeMs);
            waitTimeMs -= elapsed;
        }
    }

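To see the bound in isolation, here is a minimal, self-contained sketch of the same loop. It is not Curator code: the class name BoundedWaitSketch, the isConnected supplier (standing in for state.isConnected()) and the pollMs parameter (standing in for the hard-coded 1 second) are assumptions made for illustration, and no watcher is registered, so the latch is never counted down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class BoundedWaitSketch
{
    // Hypothetical stand-in for the method above. Returns the total time
    // spent in the call, in milliseconds.
    public static long blockUntilConnectedOrTimedOut(BooleanSupplier isConnected,
                                                     long connectionTimeoutMs,
                                                     long pollMs) throws InterruptedException
    {
        long beginNs = System.nanoTime();
        long waitTimeMs = connectionTimeoutMs;
        while ( !isConnected.getAsBoolean() && (waitTimeMs > 0) )
        {
            // Nobody counts this latch down, which models the case where
            // no connection event ever arrives.
            CountDownLatch latch = new CountDownLatch(1);
            long startNs = System.nanoTime();
            latch.await(pollMs, TimeUnit.MILLISECONDS);

            // Same accounting as the original: every pass through the loop
            // subtracts at least 1 ms, so waitTimeMs must reach zero.
            long elapsedMs = Math.max(1, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs));
            waitTimeMs -= elapsedMs;
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - beginNs);
    }

    public static void main(String[] args) throws InterruptedException
    {
        long neverConnected = blockUntilConnectedOrTimedOut(() -> false, 300, 100);
        long alreadyConnected = blockUntilConnectedOrTimedOut(() -> true, 300, 100);
        System.out.println("never connected: returned after ~" + neverConnected + " ms");
        System.out.println("already connected: returned after ~" + alreadyConnected + " ms");
    }
}
```

With a supplier that never reports a connection, the call should still return once the 300 ms budget is used up. With one that reports a connection immediately, the loop body is skipped entirely.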
On Nov 8, 2013, at 11:43 AM, "Bae, Jae Hyeon" <[email protected]> wrote:
> Hi
>
> I've been getting very frequent reports of unresponsive servers in
> production, and the symptom is that all HTTP threads are hung on ZK
> activity. It looks like internalBlockUntilConnectedOrTimedOut is being
> called from the InterProcessMutex recipe. The following are stack trace
> snippets. Do you have any clue what's wrong here?
>
> Thank you
> Best, Jae
>
> "http-0.0.0.0-7101-3" daemon prio=10 tid=0x00000000015c6000 nid=0x11ae
> waiting on condition [0x00007f91771ee000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000005ee0b7688> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
> at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:296)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:105)
> at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:660)
> at org.apache.curator.framework.imps.CreateBuilderImpl.access$800(CreateBuilderImpl.java:42)
> at org.apache.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:619)
> at org.apache.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:610)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:606)
> at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:429)
> at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:409)
> at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:42)
> at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:224)
> at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:221)
> at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:96)