I looked back through the git history for Connection.java. He's not
running with a recent build. I was looking at 4/5/17 to find matching
line numbers.
On 1/12/18 2:20 PM, Barry Oglesby wrote:
It looks like you're running the function from a peer as opposed to a
client. Is that right? Otherwise, the ExecuteRegionFunction66 would be
deserializing the arguments in the ServerConnection that receives the
client's function execution request. In that case, the
StreamCorruptedException would occur there instead of in the
RemoteFunctionContext.
I wrote a test with a peer accessor member executing an onRegion
function. I implemented the readObject method in the Serializable
argument I passed to throw a StreamCorruptedException. The severe
warning I see when I run the test is pretty similar to yours except
yours.
In my test, the
FunctionStreamingResultCollector.waitForCacheOrFunctionException
method catches the ReplyException from the member that attempted to
deserialize the FunctionRemoteContext. It handles that exception and
completes. The caller gets the exception. This is exactly the behavior
Bruce described in his reply.
What is your ComputedAndSystemMetricsRetriever thread and how exactly
is it being called to execute the function? Maybe there is something
about the way its being called that is causing different behavior.
btw - I'm running this in the latest Geode develop code. What version
are you running?
Thanks,
Barry Oglesby
On Fri, Jan 12, 2018 at 11:46 AM, Bruce Schuchardt
<[email protected] <mailto:[email protected]>> wrote:
This shouldn't normally cause a hang. The code that handles
receipt of tcp/ip messages reads the message's "reply processor"
identifier before trying to deserilize the rest of the message.
If there is a problem in deserializing the message we send an
error response with the identifier so that the sender knows
something went wrong.
Having said that, I am not as familiar with the function execution
streaming-reply processors and how they handle this kind of
response. It's possible that a hang could occur in your situation
if these reply processors aren't prepared to deal with an error
response.
It seems to me that you should be more concerned that a
deserialization problem occurred at all. For instance, was the
treemap being actively modified during serialization? If so, take
steps to prevent that from happening.
On 1/10/18 5:02 AM, Vahram Aharonyan wrote:
Hi All,
We are experiencing an issue with the thread that is performing
onRegion call and expecting some result in response being stacked
forewer in TIMED_WAITING state with below trace:
"ComputedAndSystemMetricsRetriever" Id=490 in TIMED_WAITING on
lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2
Total blocked: 33 Total waited: 261425
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:716)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:793)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:769)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:856)
org.apache.geode.internal.cache.execute.FunctionStreamingResultCollector.waitForCacheOrFunctionException(FunctionStreamingResultCollector.java:438)
org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:91)
platform.gemfire.GemfireFunctionExecutor.onRegion(GemfireFunctionExecutor.java:494)
In the logs of that member we see following:
[warning 2017/12/20 10:49:14.570 UTC
29acc6f1-5384-489d-b2bd-5187b898e482
<ComputedAndSystemMetricsRetriever> tid=0x1ea] 60 seconds have
elapsed while waiting for replies:
<PRFunctionStreamingResultCollector 100547 waiting for 1 replies
from
[gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002]>
on
gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002
whose current membership list is:
[[gbv00458(8d2960b9-a6be-4519-9547-311e2717231e:15532)<ec><v5>:10002,
gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002,
gbv00460(21fd5612-5fe2-451d-aa9d-b8542fa43fa7:20144)<ec><v9>:10002,
gbv00459(3a14f29a-8bdb-46d5-bb67-0f79cb5c7faa:17197)<ec><v7>:10002,
gbv00454(18618:locator)<ec><v1>:20002,
gbv00454(64aed382-0882-44f5-b71f-08a429af46dd:18983)<ec><v8>:10002,
gbv00453(13656:locator)<ec><v0>:20002,
gbv00453(881591a8-ae04-4af1-866a-5074c2ffb133:14490)<ec><v2>:10002,
gbv00456(63cebdf8-dd1e-414e-af5f-f8c4ebecf726:18001)<ec><v6>:10002,
gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002]]
Near that time on the nodes where this call lands, this
exceptions occur:
[severe 2017/12/20 10:48:14.728 UTC
abb6648c-39d6-4c4c-9c6d-ab8589e034a5 <P2P message reader for
gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002
shared unordered uid=8 port=41631> tid=0x44] IOException
deserializing message
java.io.IOException: failure during message deserialization
at
org.apache.geode.internal.tcp.MsgDestreamer.getMessage(MsgDestreamer.java:190)
at
org.apache.geode.internal.tcp.Connection.runOioReader(Connection.java:2218)
at
org.apache.geode.internal.tcp.Connection.run(Connection.java:1728)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.SerializationException: Could not
create an instance of
org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage
.
at
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
at
org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:979)
at
org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2720)
at
org.apache.geode.internal.tcp.MsgDestreamer$DestreamerThread.run(MsgDestreamer.java:261)
Caused by: org.apache.geode.SerializationException: Could not
create an instance of
org.apache.geode.internal.cache.execute.FunctionRemoteContext .
at
org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2521)
at
org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2958)
at
org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
at
org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.fromData(PartitionedRegionFunctionStreamingMessage.java:180)
at
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2477)
... 3 more
Caused by: org.apache.geode.SerializationException: Could not
create an instance of
org.apache.geode.internal.cache.execute.FunctionRemoteContext .
at
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
at
org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2507)
... 7 more
Caused by: java.io <http://java.io>.StreamCorruptedException:
invalid type code: B1
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1563)
at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2567)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2508)
at java.util.TreeMap.readTreeSet(TreeMap.java:2460)
at java.util.TreeSet.readObject(TreeSet.java:533)
at sun.reflect.GeneratedMethodAccessor743.invoke(Unknown
Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at java.util.ArrayList.readObject(ArrayList.java:791)
at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown
Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at java.util.ArrayList.readObject(ArrayList.java:791)
at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown
Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at
java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at
org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2992)
at
org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
at
org.apache.geode.internal.cache.execute.FunctionRemoteContext.fromData(FunctionRemoteContext.java:73)
at
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2479)
... 8 more
So could it be that these exceptions are not being sent back to
caller node resulting caller thread to wait for reply forever?
Thanks,
Vahram.