Vahram,

I see a few differences between your test and mine. Your test:

- is using a MsgDestreamer (MsgDestreamer.getMessage)
- is using the OIO reader (Connection.runOioReader)

My test was using neither of those. I guess that means your function
arguments are larger than the socket buffer size and you have SSL enabled.
I'll change my test to use both of those and see if I can reproduce the hang.
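Barry's inference rests on how large messages travel. A message that does not fit in the socket buffer is presumably sent in pieces and reassembled on the receiving side, which is where MsgDestreamer appears in the trace. The sketch below is a simplified, hypothetical illustration of that split/reassemble idea in plain Java. It is not Geode's MsgStreamer/MsgDestreamer code, and `CHUNK_SIZE` is an assumed stand-in for the socket buffer size.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration only: NOT Geode's MsgStreamer/MsgDestreamer code.
// CHUNK_SIZE is a hypothetical stand-in for the socket buffer size.
public class ChunkingSketch {
    static final int CHUNK_SIZE = 32 * 1024;

    // Sender side: split a serialized message into buffer-sized chunks.
    static List<byte[]> split(byte[] message, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < message.length; off += chunkSize) {
            int end = Math.min(message.length, off + chunkSize);
            chunks.add(Arrays.copyOfRange(message, off, end));
        }
        return chunks;
    }

    // Receiver side: reassemble the chunks, in order, before deserializing.
    static byte[] reassemble(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] small = new byte[1000];
        byte[] large = new byte[100_000];
        Arrays.fill(large, (byte) 7);
        System.out.println(split(small, CHUNK_SIZE).size()); // 1
        System.out.println(split(large, CHUNK_SIZE).size()); // 4
        System.out.println(Arrays.equals(large, reassemble(split(large, CHUNK_SIZE)))); // true
    }
}
```

A small argument stays in a single chunk. Only messages larger than the buffer take the multi-chunk path, which is why the destreamer in the trace suggests large arguments.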

Thanks,
Barry Oglesby


On Fri, Jan 19, 2018 at 9:33 AM, Dan Smith <[email protected]> wrote:

> Hi Vahram,
>
> Well, it's definitely getting corrupted on reading the function arguments
> based on your version. I'm still suspicious that it has something to do
> with the arguments you are passing in. Maybe you can share the code for
> your argument class?
>
> One way to narrow down the problem might be to change the code to copy the
> arguments before you send them to the function and then run whatever
> workload hits this issue. You could do something like this:
>
> functionArgument = org.apache.geode.CopyHelper.deepCopy(functionArgument);
> //... execute the function using functionArgument as the argument
>
> This won't fix the problem. But if the issue is really with serializing
> and deserializing the arguments, you will get an exception from the
> deepCopy method instead of within the function execution. That would at
> least mean your caller would get an exception instead of a hang.
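Dan's suggestion amounts to a serialize/deserialize round trip of the argument on the caller before the function is executed. Here is a minimal sketch of the same check using plain Java serialization. It assumes the argument is `java.io.Serializable`. Geode's `CopyHelper.deepCopy` may take a different serialization path, so treat this as an approximation. The ArrayList-of-TreeSet argument is hypothetical, shaped after the TreeSet/ArrayList frames in the stack trace quoted later in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.TreeSet;

// Approximate stand-in for CopyHelper.deepCopy using plain Java serialization.
// Assumption: the function argument implements java.io.Serializable.
public final class ArgumentRoundTrip {

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T arg) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(arg);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical argument: an ArrayList holding a TreeSet.
        ArrayList<TreeSet<String>> functionArgument = new ArrayList<>();
        TreeSet<String> ids = new TreeSet<>();
        ids.add("a");
        ids.add("b");
        functionArgument.add(ids);

        // If the argument cannot survive serialization, this throws here on the
        // caller instead of failing later on the remote member.
        ArrayList<TreeSet<String>> copy = roundTrip(functionArgument);
        System.out.println(copy.equals(functionArgument)); // true
        // ...then execute the function with `copy` as the argument.
    }
}
```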
>
> -Dan
>
> On Fri, Jan 19, 2018 at 6:22 AM, Vahram Aharonyan <[email protected]>
> wrote:
>
>> Hi All,
>>
>>
>>
>> Has anyone had a chance to look at the info provided in the last replies
>> to this thread?
>>
>>
>>
>> Thanks,
>>
>> Vahram.
>>
>>
>>
>> *From:* Vahram Aharonyan [mailto:[email protected]]
>> *Sent:* Monday, January 15, 2018 10:02 PM
>> *To:* [email protected]
>> *Subject:* RE: Function Executor thread stacked
>>
>>
>>
>> Hi Bruce/Barry,
>>
>>
>>
>> We are using geode-1.1.0, but we have seen similar failures in 1.2.0 as
>> well. This is a peer-to-peer call. BTW, we do not always get a
>> StreamCorruptedException; once we also saw a ClassCastException as the
>> cause. Please refer to the attached screenshot. So it seems that the data
>> we are trying to deserialize is corrupted.
>>
>>
>>
>> Here is how I was able to reproduce the issue on my side today.
>>
>> I’ve set a breakpoint in
>> *org.apache.geode.internal.tcp.MsgDestreamer#getMessage* and set *new
>> IOException()* as the new value of *this*.failure. As a result, execution
>> entered org.apache.geode.internal.tcp.Connection#sendFailureReply. Since
>> *directAck = false* and *rpId = 0* in my case, no ReplyException was sent
>> back to the caller. BTW, could you please help me understand what directAck
>> and rpId mean, and how the function executor on the calling peer can
>> influence their values?
>>
>>
>>
>> As I wrote in my previous reply to Dan in this thread, it does not seem
>> that the arguments object is used concurrently; anyway, I will
>> double-check this.
>>
>>
>>
>> Thanks,
>>
>> Vahram.
>>
>>
>>
>> *From:* Bruce Schuchardt [mailto:[email protected]
>> <[email protected]>]
>> *Sent:* Saturday, January 13, 2018 3:38 AM
>> *To:* [email protected]
>> *Subject:* Re: Function Executor thread stacked
>>
>>
>>
>> I looked back through the git history for Connection.java.  He's not
>> running with a recent build.  I had to go back to the 4/5/17 version to
>> find matching line numbers.
>>
>> On 1/12/18 2:20 PM, Barry Oglesby wrote:
>>
>> It looks like you're running the function from a peer as opposed to a
>> client. Is that right? Otherwise, the ExecuteRegionFunction66 would be
>> deserializing the arguments in the ServerConnection that receives the
>> client's function execution request. In that case, the
>> StreamCorruptedException would occur there instead of in the
>> RemoteFunctionContext.
>>
>>
>>
>> I wrote a test with a peer accessor member executing an onRegion
>> function. I implemented the readObject method in the Serializable argument
>> I passed so that it throws a StreamCorruptedException. The severe warning
>> I see when I run the test is pretty similar to yours.
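A minimal sketch of the kind of argument Barry describes: a Serializable class whose readObject always throws StreamCorruptedException, so that the failure happens only on the deserializing side. The class name and message are hypothetical. In Barry's test, the object would be passed as the onRegion function argument. Here it is round-tripped locally instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StreamCorruptedException;

// Hypothetical argument class: serializes normally, always fails on deserialization.
public class CorruptingArgument implements Serializable {
    private static final long serialVersionUID = 1L;

    // Called by ObjectInputStream on the receiving side.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        throw new StreamCorruptedException("simulated corruption");
    }

    static Object serializeThenDeserialize(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o); // writing succeeds
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject(); // reading triggers readObject above
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            serializeThenDeserialize(new CorruptingArgument());
            System.out.println("no exception");
        } catch (StreamCorruptedException e) {
            System.out.println("caught: " + e.getMessage()); // caught: simulated corruption
        }
    }
}
```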
>>
>>
>>
>> In my test, the
>> FunctionStreamingResultCollector.waitForCacheOrFunctionException method
>> catches the ReplyException from the member that attempted to deserialize
>> the FunctionRemoteContext. It handles that exception and completes. The
>> caller gets the exception. This is exactly the behavior Bruce described in
>> his reply.
>>
>>
>>
>> What is your ComputedAndSystemMetricsRetriever thread and how exactly is
>> it being called to execute the function? Maybe there is something about the
>> way it's being called that is causing different behavior.
>>
>>
>>
>> btw - I'm running this in the latest Geode develop code. What version are
>> you running?
>>
>>
>>
>>
>> Thanks,
>>
>> Barry Oglesby
>>
>>
>>
>>
>>
>> On Fri, Jan 12, 2018 at 11:46 AM, Bruce Schuchardt <
>> [email protected]> wrote:
>>
>> This shouldn't normally cause a hang.  The code that handles receipt of
>> tcp/ip messages reads the message's "reply processor" identifier before
>> trying to deserialize the rest of the message.  If there is a problem in
>> deserializing the message we send an error response with the identifier so
>> that the sender knows something went wrong.
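The mechanism Bruce describes can be modeled in a few lines: read the reply processor id first, then the body, and report body failures against that id. This is a simplified, hypothetical model, not Geode's implementation. All names are invented. Treating an id of 0 as "no waiting sender" is an assumption chosen to mirror the rpId = 0 value Vahram observed in the debugger. The bounded await stands in for the caller's wait, so a missing reply surfaces as "timeout" instead of a hang.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StreamCorruptedException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified model only; NOT Geode's code. All names are hypothetical.
public class ReplyIdSketch {
    // Sender side: one latch per outstanding request, keyed by processor id.
    static final Map<Integer, CountDownLatch> latches = new ConcurrentHashMap<>();
    static final Map<Integer, String> outcomes = new ConcurrentHashMap<>();

    static byte[] encode(int processorId, byte[] body, boolean corrupt) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(processorId);                               // header: reply processor id
        out.writeInt(corrupt ? Integer.MAX_VALUE : body.length); // bogus length simulates damage
        out.write(body);
        out.flush();
        return bytes.toByteArray();
    }

    static void reply(int processorId, String outcome) {
        outcomes.put(processorId, outcome);
        CountDownLatch latch = latches.get(processorId);
        if (latch != null) latch.countDown();
    }

    // Receiver side: the id is read before the body, so body failures can still be reported.
    static void receive(byte[] wire) {
        int processorId = 0;
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            processorId = in.readInt();
            int len = in.readInt();
            if (len < 0 || len > in.available()) {
                throw new StreamCorruptedException("invalid body length " + len);
            }
            in.readFully(new byte[len]); // stands in for deserializing the body
            reply(processorId, "ok");
        } catch (IOException e) {
            if (processorId != 0) {
                reply(processorId, "error: " + e.getMessage()); // sender is unblocked
            }
            // processorId == 0 (assumption): no waiting sender is identified, so nothing is sent
        }
    }

    // Sender side: bounded wait, so a missing reply shows up as "timeout".
    static String awaitReply(int processorId, byte[] wire, long millis) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        latches.put(processorId, latch);
        receive(wire); // in a real cluster this runs on another member
        boolean replied = latch.await(millis, TimeUnit.MILLISECONDS);
        latches.remove(processorId);
        return replied ? outcomes.remove(processorId) : "timeout";
    }

    public static void main(String[] args) throws Exception {
        byte[] body = {1, 2, 3};
        System.out.println(awaitReply(42, encode(42, body, false), 100)); // ok
        System.out.println(awaitReply(43, encode(43, body, true), 100));  // error: invalid body length 2147483647
        System.out.println(awaitReply(0, encode(0, body, true), 100));    // timeout
    }
}
```

In the last call, the body is corrupt and the id is 0. The receiver therefore has nobody to notify, and the sender only finds out through its own timeout. With a nonzero id, the same corruption comes back as an error reply.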
>>
>> Having said that, I am not as familiar with the function execution
>> streaming-reply processors and how they handle this kind of response.  It's
>> possible that a hang could occur in your situation if these reply
>> processors aren't prepared to deal with an error response.
>>
>> It seems to me that you should be more concerned that a deserialization
>> problem occurred at all.  For instance, was the TreeMap being actively
>> modified during serialization?  If so, take steps to prevent that from
>> happening.
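One way to "take steps to prevent that" is sketched below. It assumes the TreeSet in the argument is built from state that other threads also update. All writers use one lock, and the function receives a private copy taken under that lock, never the live collection. `MetricsIds` and its methods are hypothetical names for illustration.

```java
import java.util.TreeSet;

// Hypothetical holder of shared state that ends up in a function argument.
// Writers and snapshot() share one lock (this object's monitor), so the copy
// handed to the function cannot change while it is being serialized.
public class MetricsIds {
    private final TreeSet<String> ids = new TreeSet<>();

    public synchronized void add(String id) { ids.add(id); }

    public synchronized void remove(String id) { ids.remove(id); }

    // Returns a private copy; pass this, never the live set, as the argument.
    public synchronized TreeSet<String> snapshot() {
        return new TreeSet<>(ids);
    }

    public static void main(String[] args) {
        MetricsIds live = new MetricsIds();
        live.add("a");
        live.add("b");
        TreeSet<String> argument = live.snapshot();
        live.add("c");                // later writes do not affect the snapshot
        System.out.println(argument); // [a, b]
    }
}
```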
>>
>>
>>
>> On 1/10/18 5:02 AM, Vahram Aharonyan wrote:
>>
>> Hi All,
>>
>>
>>
>> We are experiencing an issue where the thread that performs an onRegion
>> call and waits for its result gets stuck forever in the TIMED_WAITING
>> state with the trace below:
>>
>>
>>
>> "ComputedAndSystemMetricsRetriever" Id=490 in TIMED_WAITING on lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2
>> Total blocked: 33   Total waited: 261425
>>
>>   sun.misc.Unsafe.park(Native Method)
>>   java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>   java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>>   java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>>   java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>>   org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>>   org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:716)
>>   org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:793)
>>   org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:769)
>>   org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:856)
>>   org.apache.geode.internal.cache.execute.FunctionStreamingResultCollector.waitForCacheOrFunctionException(FunctionStreamingResultCollector.java:438)
>>   org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:91)
>>   platform.gemfire.GemfireFunctionExecutor.onRegion(GemfireFunctionExecutor.java:494)
>>
>>
>>
>> In the logs of that member we see the following:
>>
>>
>>
>> [warning 2017/12/20 10:49:14.570 UTC 29acc6f1-5384-489d-b2bd-5187b898e482
>> <ComputedAndSystemMetricsRetriever> tid=0x1ea] 60 seconds have elapsed
>> while waiting for replies: <PRFunctionStreamingResultCollector 100547
>> waiting for 1 replies from
>> [gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002]> on
>> gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002 whose
>> current membership list is:
>> [[gbv00458(8d2960b9-a6be-4519-9547-311e2717231e:15532)<ec><v5>:10002,
>> gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002,
>> gbv00460(21fd5612-5fe2-451d-aa9d-b8542fa43fa7:20144)<ec><v9>:10002,
>> gbv00459(3a14f29a-8bdb-46d5-bb67-0f79cb5c7faa:17197)<ec><v7>:10002,
>> gbv00454(18618:locator)<ec><v1>:20002,
>> gbv00454(64aed382-0882-44f5-b71f-08a429af46dd:18983)<ec><v8>:10002,
>> gbv00453(13656:locator)<ec><v0>:20002,
>> gbv00453(881591a8-ae04-4af1-866a-5074c2ffb133:14490)<ec><v2>:10002,
>> gbv00456(63cebdf8-dd1e-414e-af5f-f8c4ebecf726:18001)<ec><v6>:10002,
>> gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002]]
>>
>>
>>
>> Around that time, the following exceptions occur on the nodes where this
>> call lands:
>>
>>
>>
>> [severe 2017/12/20 10:48:14.728 UTC abb6648c-39d6-4c4c-9c6d-ab8589e034a5
>> <P2P message reader for gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002
>> shared unordered uid=8 port=41631> tid=0x44] IOException deserializing message
>>
>> java.io.IOException: failure during message deserialization
>>         at org.apache.geode.internal.tcp.MsgDestreamer.getMessage(MsgDestreamer.java:190)
>>         at org.apache.geode.internal.tcp.Connection.runOioReader(Connection.java:2218)
>>         at org.apache.geode.internal.tcp.Connection.run(Connection.java:1728)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage .
>>         at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
>>         at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:979)
>>         at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2720)
>>         at org.apache.geode.internal.tcp.MsgDestreamer$DestreamerThread.run(MsgDestreamer.java:261)
>> Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext .
>>         at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2521)
>>         at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2958)
>>         at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
>>         at org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.fromData(PartitionedRegionFunctionStreamingMessage.java:180)
>>         at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2477)
>>         ... 3 more
>> Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext .
>>         at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
>>         at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2507)
>>         ... 7 more
>> Caused by: java.io.StreamCorruptedException: invalid type code: B1
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1563)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2567)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
>>         at java.util.TreeMap.buildFromSorted(TreeMap.java:2508)
>>         at java.util.TreeMap.readTreeSet(TreeMap.java:2460)
>>         at java.util.TreeSet.readObject(TreeSet.java:533)
>>         at sun.reflect.GeneratedMethodAccessor743.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
>>         at java.util.ArrayList.readObject(ArrayList.java:791)
>>         at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
>>         at java.util.ArrayList.readObject(ArrayList.java:791)
>>         at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>>         at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
>>         at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2992)
>>         at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
>>         at org.apache.geode.internal.cache.execute.FunctionRemoteContext.fromData(FunctionRemoteContext.java:73)
>>         at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2479)
>>         ... 8 more
>>
>>
>>
>>
>>
>> So could it be that these exceptions are not being sent back to the caller
>> node, causing the caller thread to wait for a reply forever?
>>
>>
>>
>> Thanks,
>>
>> Vahram.
>>
>>
>>
>>
>>
>>
>>
>
>
