Hi All,

Could it be someone had a chance to look at info provided in last replies to 
this thread?

Thanks,
Vahram.

From: Vahram Aharonyan [mailto:[email protected]]
Sent: Monday, January 15, 2018 10:02 PM
To: [email protected]
Subject: RE: Function Executor thread stacked

Hi Bruce/Barry,

We are using geode-1.1.0, but we have seen similar failures in 1.2.0 as well. 
This call is peer-to-peer call. BTW, it is not mandatory that we are always 
getting StreamCorruptedException. Once we have ClassCastException as a cause as 
well. Please refer to attached screenshot. So it seems that data that we are 
trying to deserilize seems to be corrupted.

Here is how I was able to reproduce the issue on my side today.

I’ve set a breakpoint in org.apache.geode.internal.tcp.MsgDestreamer#getMessage 
and set new IOException() as a new value for this.failure as a result I’ve 
entered to org.apache.geode.internal.tcp.Connection#sendFailureReply. As 
directAck = false and rpId = 0 in my case, I’ve not got ReplyException sent 
back to caller. BTW, could you please help me to understand what meaning these 
directAck and rpId have and how function executor from the caller peer can have 
some impact on their values?

As I wrote in my previous reply to Dan in this thread, it does not seem that we 
have some concurrent usage of arguments object – anyways I will double check 
this again.

Thanks,
Vahram.

From: Bruce Schuchardt [mailto:[email protected]]
Sent: Saturday, January 13, 2018 3:38 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Function Executor thread stacked


I looked back through the git history for Connection.java.  He's not running 
with a recent build.  I was looking at 4/5/17 to find matching line numbers.
On 1/12/18 2:20 PM, Barry Oglesby wrote:
It looks like you're running the function from a peer as opposed to a client. 
Is that right? Otherwise, the ExecuteRegionFunction66 would be deserializing 
the arguments in the ServerConnection that receives the client's function 
execution request. In that case, the StreamCorruptedException would occur there 
instead of in the RemoteFunctionContext.

I wrote a test with a peer accessor member executing an onRegion function. I 
implemented the readObject method in the Serializable argument I passed to 
throw a StreamCorruptedException. The severe warning I see when I run the test 
is pretty similar to yours except yours.

In my test, the 
FunctionStreamingResultCollector.waitForCacheOrFunctionException method catches 
the ReplyException from the member that attempted to deserialize the 
FunctionRemoteContext. It handles that exception and completes. The caller gets 
the exception. This is exactly the behavior Bruce described in his reply.

What is your ComputedAndSystemMetricsRetriever thread and how exactly is it 
being called to execute the function? Maybe there is something about the way 
its being called that is causing different behavior.

btw - I'm running this in the latest Geode develop code. What version are you 
running?


Thanks,
Barry Oglesby


On Fri, Jan 12, 2018 at 11:46 AM, Bruce Schuchardt 
<[email protected]<mailto:[email protected]>> wrote:

This shouldn't normally cause a hang.  The code that handles receipt of tcp/ip 
messages reads the message's "reply processor" identifier before trying to 
deserilize the rest of the message.  If there is a problem in deserializing the 
message we send an error response with the identifier so that the sender knows 
something went wrong.

Having said that, I am not as familiar with the function execution 
streaming-reply processors and how they handle this kind of response.  It's 
possible that a hang could occur in your situation if these reply processors 
aren't prepared to deal with an error response.

It seems to me that you should be more concerned that a deserialization problem 
occurred at all.  For instance, was the treemap being actively modified during 
serialization?  If so, take steps to prevent that from happening.

On 1/10/18 5:02 AM, Vahram Aharonyan wrote:
Hi All,

We are experiencing an issue with the thread that is performing onRegion call 
and expecting some result in response being stacked forewer in TIMED_WAITING 
state with below  trace:

"ComputedAndSystemMetricsRetriever" Id=490 in TIMED_WAITING on 
lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2<mailto:lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2>
Total blocked: 33   Total waited: 261425
  sun.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
  
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
  java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
  
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
  
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:716)
  
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:793)
  
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:769)
  
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:856)
  
org.apache.geode.internal.cache.execute.FunctionStreamingResultCollector.waitForCacheOrFunctionException(FunctionStreamingResultCollector.java:438)
  
org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:91)
  
platform.gemfire.GemfireFunctionExecutor.onRegion(GemfireFunctionExecutor.java:494)

In the logs of that member we see following:

[warning 2017/12/20 10:49:14.570 UTC 29acc6f1-5384-489d-b2bd-5187b898e482 
<ComputedAndSystemMetricsRetriever> tid=0x1ea] 60 seconds have elapsed while 
waiting for replies: <PRFunctionStreamingResultCollector 100547 waiting for 1 
replies from 
[gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002]> on 
gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002 whose 
current membership list is: 
[[gbv00458(8d2960b9-a6be-4519-9547-311e2717231e:15532)<ec><v5>:10002, 
gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002, 
gbv00460(21fd5612-5fe2-451d-aa9d-b8542fa43fa7:20144)<ec><v9>:10002, 
gbv00459(3a14f29a-8bdb-46d5-bb67-0f79cb5c7faa:17197)<ec><v7>:10002, 
gbv00454(18618:locator)<ec><v1>:20002, 
gbv00454(64aed382-0882-44f5-b71f-08a429af46dd:18983)<ec><v8>:10002, 
gbv00453(13656:locator)<ec><v0>:20002, 
gbv00453(881591a8-ae04-4af1-866a-5074c2ffb133:14490)<ec><v2>:10002, 
gbv00456(63cebdf8-dd1e-414e-af5f-f8c4ebecf726:18001)<ec><v6>:10002, 
gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002]]

Near that time on the nodes where this call lands, this exceptions occur:

[severe 2017/12/20 10:48:14.728 UTC abb6648c-39d6-4c4c-9c6d-ab8589e034a5 <P2P 
message reader for 
gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002 shared 
unordered uid=8 port=41631> tid=0x44] IOException deserializing message
java.io.IOException: failure during message deserialization
        at 
org.apache.geode.internal.tcp.MsgDestreamer.getMessage(MsgDestreamer.java:190)
        at 
org.apache.geode.internal.tcp.Connection.runOioReader(Connection.java:2218)
        at org.apache.geode.internal.tcp.Connection.run(Connection.java:1728)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.SerializationException: Could not create an 
instance of  
org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage
 .
        at 
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
        at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:979)
        at 
org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2720)
        at 
org.apache.geode.internal.tcp.MsgDestreamer$DestreamerThread.run(MsgDestreamer.java:261)
Caused by: org.apache.geode.SerializationException: Could not create an 
instance of  org.apache.geode.internal.cache.execute.FunctionRemoteContext .
        at 
org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2521)
        at 
org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2958)
        at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
        at 
org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.fromData(PartitionedRegionFunctionStreamingMessage.java:180)
        at 
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2477)
        ... 3 more
Caused by: org.apache.geode.SerializationException: Could not create an 
instance of  org.apache.geode.internal.cache.execute.FunctionRemoteContext .
        at 
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
        at 
org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2507)
        ... 7 more
Caused by: 
java.io<https://urldefense.proofpoint.com/v2/url?u=http-3A__java.io&d=DwMDaQ&c=uilaK90D4TOVoH58JNXRgQ&r=wpTWSXVvcGFCkFEMePbOecdHHTbyiIj9aWq7oqKb0J8&m=bjcPkc9czZRlkSZeFiGboXA-eNJYdkPCL0O0wj9woNQ&s=X_oQmT-B_TlpRB1EbA7-EN5vwIru8ed7rRdVOJU4A_w&e=>.StreamCorruptedException:
 invalid type code: B1
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1563)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2567)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
        at java.util.TreeMap.buildFromSorted(TreeMap.java:2508)
        at java.util.TreeMap.readTreeSet(TreeMap.java:2460)
        at java.util.TreeSet.readObject(TreeSet.java:533)
        at sun.reflect.GeneratedMethodAccessor743.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
        at java.util.ArrayList.readObject(ArrayList.java:791)
        at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
       at java.util.ArrayList.readObject(ArrayList.java:791)
        at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
        at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
                    at 
org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2992)
        at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
        at 
org.apache.geode.internal.cache.execute.FunctionRemoteContext.fromData(FunctionRemoteContext.java:73)
        at 
org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2479)
        ... 8 more


So could it be that these exceptions are not being sent back to caller node 
resulting caller thread to wait for reply forever?

Thanks,
Vahram.



Reply via email to