https://github.com/apache/geode/pull/3449
On 4/11/19 3:28 PM, Bruce Schuchardt wrote:
I've reopened GEODE-3948 to address this, Vahram. I'll have a pull
request up shortly.
On 4/11/19 8:06 AM, Vahram Aharonyan wrote:
Hi All,
We have 2 VMs that are running Geode 1.7 servers, one server per VM.
Along with the Geode server, each VM also hosts one Geode 1.7 client,
so we have 2 servers and 2 clients in the Geode cluster.
While doing validation, we introduced packet loss (~65%) on the first
VM “A”, and after about 1 minute the client on VM “B” reports the following:
[warning 2019/04/11 16:20:27.502 AMT
Collector-c0f1ee3e-366a-4ac3-8fda-60540cdd21c4 <ThreadsMonitor>
tid=0x1c] Thread <2182> that was executed at <11 Apr 2019 16:19:11
AMT> has been stuck for <76.204 seconds> and number of thread monitor
iteration <1>
Thread Name <poolTimer-CollectorControllerPool-142>
Thread state <RUNNABLE>
Executor Group <ScheduledThreadPoolExecutorWithKeepAlive>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread Stack:
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
sun.security.ssl.InputRecord.read(InputRecord.java:503)
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:809)
org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndBody(Message.java:659)
org.apache.geode.internal.cache.tier.sockets.Message.receiveWithHeaderReadTimeout(Message.java:1124)
org.apache.geode.internal.cache.tier.sockets.Message.receive(Message.java:1135)
org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:205)
org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:386)
org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:276)
org.apache.geode.cache.client.internal.QueueConnectionImpl.execute(QueueConnectionImpl.java:167)
org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:894)
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:387)
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:349)
org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:827)
org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:36)
org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90)
org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1338)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:271)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
This report and stack trace are repeated continuously by the
ThreadsMonitor over time; only the iteration count and “stuck for”
values increase. From the stack trace it appears to be a ping operation
(PingOp) initiated by the client on VM “B” to the server on VM “A”.
Because of the packet drop between the nodes, the response never
reaches the calling client, and this thread remains blocked for hours.
In the source I see that receiveWithHeaderReadTimeout receives
NO_HEADER_READ_TIMEOUT as its timeout argument, which means we will
wait indefinitely. Is this reasonable? So the question is: why is
PingOp executed without a timeout?
Or could it be that this stuck thread will be interrupted by some
monitoring logic at some point?
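For context, a minimal, self-contained sketch (plain JDK sockets, not
Geode code; the in-process "silent peer" server is hypothetical) of how
a socket read timeout (SO_TIMEOUT) turns an otherwise indefinite block
in socketRead0 into a SocketTimeoutException:

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    // Connects to a peer that accepts but never replies (as if the
    // reply were dropped by packet loss) and attempts a read with
    // SO_TIMEOUT set; returns true if the read timed out.
    static boolean pingTimesOut(int timeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread silentPeer = new Thread(() -> {
                try {
                    Socket s = server.accept(); // accept, then stay silent
                    Thread.sleep(10_000);
                    s.close();
                } catch (Exception ignored) {
                }
            });
            silentPeer.setDaemon(true);
            silentPeer.start();

            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                // Without this call, read() below blocks indefinitely
                // in socketRead0, like the thread in the stack trace.
                client.setSoTimeout(timeoutMs);
                InputStream in = client.getInputStream();
                try {
                    in.read();
                    return false;               // peer sent data (not expected here)
                } catch (SocketTimeoutException e) {
                    return true;                // bounded wait expired
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pingTimesOut(500)
                ? "read timed out after 500 ms"
                : "unexpected: peer sent data");
    }
}
```

With a timeout in place the ping thread would fail fast and the pool
could mark the connection dead, instead of staying RUNNABLE in
socketRead0 for hours.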
Thanks,
Vahram.