Vahram, I see a few differences between your test and mine. Your test:
- is using a MsgDestreamer (MsgDestreamer.getMessage) - is using the OIO reader (Connection.runOioReader) My test was using neither of those. I guess that means your function arguments are bigger than socket buffer size and you have SSL enabled. I'll change my test to use both of those and see if I can reproduce the hang. Thanks, Barry Oglesby On Fri, Jan 19, 2018 at 9:33 AM, Dan Smith <[email protected]> wrote: > Hi Vahram, > > Well, it's definitely getting corrupted on reading the function arguments > based on your version. I'm still suspicious that it has something to do > with the arguments you are passing in. Maybe you can share the code for > your argument class? > > One way to narrow down the problem might be to change the code to copy the > arguments before you send them to the function and then run whatever > workload hits this issue. You could do something like this > > functionArgument = org.apache.geode.CopyHelper.deepCopy(functionArgument) > //... execute the function using functionArgument as the argument > > This won't fix the problem. But if the issue is really with serializing > and deserializing the arguments, you will get an exception from the > deepCopy method instead of within the function execution. That would at > least mean your caller would get an exception instead of a hang. > > -Dan > > On Fri, Jan 19, 2018 at 6:22 AM, Vahram Aharonyan <[email protected]> > wrote: > >> Hi All, >> >> >> >> Could it be someone had a chance to look at info provided in last replies >> to this thread? >> >> >> >> Thanks, >> >> Vahram. >> >> >> >> *From:* Vahram Aharonyan [mailto:[email protected]] >> *Sent:* Monday, January 15, 2018 10:02 PM >> *To:* [email protected] >> *Subject:* RE: Function Executor thread stacked >> >> >> >> Hi Bruce/Barry, >> >> >> >> We are using geode-1.1.0, but we have seen similar failures in 1.2.0 as >> well. This call is peer-to-peer call. BTW, it is not mandatory that we are >> always getting StreamCorruptedException. Once we have ClassCastException >> as a cause as well. Please refer to attached screenshot. So it seems that >> data that we are trying to deserilize seems to be corrupted. >> >> >> >> Here is how I was able to reproduce the issue on my side today. >> >> I’ve set a breakpoint in >> *org.apache.geode.internal.tcp.MsgDestreamer#getMessage* and set *new >> IOException()* as a new value for *this*.failure as a result I’ve entered to >> org.apache.geode.internal.tcp.Connection#sendFailureReply. As *directAck = >> false* and *rpId = 0* in my case, I’ve not got ReplyException sent back to >> caller. BTW, could you please help me to understand what meaning these >> directAck and rpId have and how function executor from the caller peer can >> have some impact on their values? >> >> >> >> As I wrote in my previous reply to Dan in this thread, it does not seem >> that we have some concurrent usage of arguments object – anyways I will >> double check this again. >> >> >> >> Thanks, >> >> Vahram. >> >> >> >> *From:* Bruce Schuchardt [mailto:[email protected] >> <[email protected]>] >> *Sent:* Saturday, January 13, 2018 3:38 AM >> *To:* [email protected] >> *Subject:* Re: Function Executor thread stacked >> >> >> >> I looked back through the git history for Connection.java. He's not >> running with a recent build. I was looking at 4/5/17 to find matching line >> numbers. >> >> On 1/12/18 2:20 PM, Barry Oglesby wrote: >> >> It looks like you're running the function from a peer as opposed to a >> client. Is that right? Otherwise, the ExecuteRegionFunction66 would be >> deserializing the arguments in the ServerConnection that receives the >> client's function execution request. In that case, the >> StreamCorruptedException would occur there instead of in the >> RemoteFunctionContext. >> >> >> >> I wrote a test with a peer accessor member executing an onRegion >> function. I implemented the readObject method in the Serializable argument >> I passed to throw a StreamCorruptedException. The severe warning I see when >> I run the test is pretty similar to yours except yours. >> >> >> >> In my test, the FunctionStreamingResultCollect >> or.waitForCacheOrFunctionException method catches the ReplyException >> from the member that attempted to deserialize the FunctionRemoteContext. It >> handles that exception and completes. The caller gets the exception. This >> is exactly the behavior Bruce described in his reply. >> >> >> >> What is your ComputedAndSystemMetricsRetriever thread and how exactly is >> it being called to execute the function? Maybe there is something about the >> way its being called that is causing different behavior. >> >> >> >> btw - I'm running this in the latest Geode develop code. What version are >> you running? >> >> >> >> >> Thanks, >> >> Barry Oglesby >> >> >> >> >> >> On Fri, Jan 12, 2018 at 11:46 AM, Bruce Schuchardt < >> [email protected]> wrote: >> >> This shouldn't normally cause a hang. The code that handles receipt of >> tcp/ip messages reads the message's "reply processor" identifier before >> trying to deserilize the rest of the message. If there is a problem in >> deserializing the message we send an error response with the identifier so >> that the sender knows something went wrong. >> >> Having said that, I am not as familiar with the function execution >> streaming-reply processors and how they handle this kind of response. It's >> possible that a hang could occur in your situation if these reply >> processors aren't prepared to deal with an error response. >> >> It seems to me that you should be more concerned that a deserialization >> problem occurred at all. For instance, was the treemap being actively >> modified during serialization? If so, take steps to prevent that from >> happening. >> >> >> >> On 1/10/18 5:02 AM, Vahram Aharonyan wrote: >> >> Hi All, >> >> >> >> We are experiencing an issue with the thread that is performing onRegion >> call and expecting some result in response being stacked forewer in >> TIMED_WAITING state with below trace: >> >> >> >> "ComputedAndSystemMetricsRetriever" Id=490 in TIMED_WAITING on >> lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2 >> >> Total blocked: 33 Total waited: 261425 >> >> sun.misc.Unsafe.park(Native Method) >> >> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) >> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcqu >> ireSharedNanos(AbstractQueuedSynchronizer.java:1037) >> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcq >> uireSharedNanos(AbstractQueuedSynchronizer.java:1328) >> >> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) >> >> org.apache.geode.internal.util.concurrent.StoppableCountDown >> Latch.await(StoppableCountDownLatch.java:64) >> >> org.apache.geode.distributed.internal.ReplyProcessor21.basic >> Wait(ReplyProcessor21.java:716) >> >> org.apache.geode.distributed.internal.ReplyProcessor21.waitF >> orRepliesUninterruptibly(ReplyProcessor21.java:793) >> >> org.apache.geode.distributed.internal.ReplyProcessor21.waitF >> orRepliesUninterruptibly(ReplyProcessor21.java:769) >> >> org.apache.geode.distributed.internal.ReplyProcessor21.waitF >> orRepliesUninterruptibly(ReplyProcessor21.java:856) >> >> org.apache.geode.internal.cache.execute.FunctionStreamingRes >> ultCollector.waitForCacheOrFunctionException(FunctionStreami >> ngResultCollector.java:438) >> >> org.apache.geode.internal.cache.partitioned.PRFunctionStream >> ingResultCollector.getResult(PRFunctionStreamingResultCollector.java:91) >> >> platform.gemfire.GemfireFunctionExecutor.onRegion(GemfireFun >> ctionExecutor.java:494) >> >> >> >> In the logs of that member we see following: >> >> >> >> [warning 2017/12/20 10:49:14.570 UTC 29acc6f1-5384-489d-b2bd-5187b898e482 >> <ComputedAndSystemMetricsRetriever> tid=0x1ea] 60 seconds have elapsed >> while waiting for replies: <PRFunctionStreamingResultCollector 100547 >> waiting for 1 replies from [gbv00457(abb6648c-39d6-4c4c-9 >> c6d-ab8589e034a5:9583)<ec><v4>:10002]> on gbv00455(29acc6f1-5384-489d-b2 >> bd-5187b898e482:22303)<ec><v3>:10002 whose current membership list is: >> [[gbv00458(8d2960b9-a6be-4519-9547-311e2717231e:15532)<ec><v5>:10002, >> gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002, >> gbv00460(21fd5612-5fe2-451d-aa9d-b8542fa43fa7:20144)<ec><v9>:10002, >> gbv00459(3a14f29a-8bdb-46d5-bb67-0f79cb5c7faa:17197)<ec><v7>:10002, >> gbv00454(18618:locator)<ec><v1>:20002, gbv00454(64aed382-0882-44f5-b7 >> 1f-08a429af46dd:18983)<ec><v8>:10002, gbv00453(13656:locator)<ec><v0>:20002, >> gbv00453(881591a8-ae04-4af1-866a-5074c2ffb133:14490)<ec><v2>:10002, >> gbv00456(63cebdf8-dd1e-414e-af5f-f8c4ebecf726:18001)<ec><v6>:10002, >> gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002]] >> >> >> >> Near that time on the nodes where this call lands, this exceptions occur: >> >> >> >> [severe 2017/12/20 10:48:14.728 UTC abb6648c-39d6-4c4c-9c6d-ab8589e034a5 >> <P2P message reader for gbv00455(29acc6f1-5384-489d-b2 >> bd-5187b898e482:22303)<ec><v3>:10002 shared unordered uid=8 port=41631> >> tid=0x44] IOException deserializing message >> >> java.io.IOException: failure during message deserialization >> >> at org.apache.geode.internal.tcp.MsgDestreamer.getMessage(MsgDe >> streamer.java:190) >> >> at org.apache.geode.internal.tcp.Connection.runOioReader(Connec >> tion.java:2218) >> >> at org.apache.geode.internal.tcp.Connection.run(Connection.java >> :1728) >> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >> Executor.java:1142) >> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >> lExecutor.java:617) >> >> at java.lang.Thread.run(Thread.java:748) >> >> Caused by: org.apache.geode.SerializationException: Could not create an >> instance of >> org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage >> . >> >> at org.apache.geode.internal.InternalDataSerializer.invokeFromD >> ata(InternalDataSerializer.java:2492) >> >> at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory. >> java:979) >> >> at org.apache.geode.internal.InternalDataSerializer.readDSFID(I >> nternalDataSerializer.java:2720) >> >> at org.apache.geode.internal.tcp.MsgDestreamer$DestreamerThread >> .run(MsgDestreamer.java:261) >> >> Caused by: org.apache.geode.SerializationException: Could not create an >> instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext >> . >> >> at org.apache.geode.internal.InternalDataSerializer.readDataSer >> ializable(InternalDataSerializer.java:2521) >> >> at org.apache.geode.internal.InternalDataSerializer.basicReadOb >> ject(InternalDataSerializer.java:2958) >> >> at org.apache.geode.DataSerializer.readObject(DataSerializer. >> java:2897) >> >> at org.apache.geode.internal.cache.partitioned.PartitionedRegio >> nFunctionStreamingMessage.fromData(PartitionedRegionFunc >> tionStreamingMessage.java:180) >> >> at org.apache.geode.internal.InternalDataSerializer.invokeFromD >> ata(InternalDataSerializer.java:2477) >> >> ... 3 more >> >> Caused by: org.apache.geode.SerializationException: Could not create an >> instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext >> . >> >> at org.apache.geode.internal.InternalDataSerializer.invokeFromD >> ata(InternalDataSerializer.java:2492) >> >> at org.apache.geode.internal.InternalDataSerializer.readDataSer >> ializable(InternalDataSerializer.java:2507) >> >> ... 7 more >> >> Caused by: java.io >> <https://urldefense.proofpoint.com/v2/url?u=http-3A__java.io&d=DwMDaQ&c=uilaK90D4TOVoH58JNXRgQ&r=wpTWSXVvcGFCkFEMePbOecdHHTbyiIj9aWq7oqKb0J8&m=bjcPkc9czZRlkSZeFiGboXA-eNJYdkPCL0O0wj9woNQ&s=X_oQmT-B_TlpRB1EbA7-EN5vwIru8ed7rRdVOJU4A_w&e=> >> .StreamCorruptedException: invalid type code: B1 >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1563) >> >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java: >> 422) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2567) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) >> >> at java.util.TreeMap.buildFromSorted(TreeMap.java:2508) >> >> at java.util.TreeMap.readTreeSet(TreeMap.java:2460) >> >> at java.util.TreeSet.readObject(TreeSet.java:533) >> >> at sun.reflect.GeneratedMethodAccessor743.invoke(Unknown Source) >> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe >> thodAccessorImpl.java:43) >> >> at java.lang.reflect.Method.invoke(Method.java:498) >> >> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass >> .java:1058) >> >> at java.io.ObjectInputStream.readSerialData(ObjectInputStream. >> java:2136) >> >> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStre >> am.java:2027) >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1535) >> >> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStrea >> m.java:2245) >> >> at java.io.ObjectInputStream.readSerialData(ObjectInputStream. >> java:2169) >> >> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStre >> am.java:2027) >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1535) >> >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java: >> 422) >> >> at java.util.ArrayList.readObject(ArrayList.java:791) >> >> at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source) >> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe >> thodAccessorImpl.java:43) >> >> at java.lang.reflect.Method.invoke(Method.java:498) >> >> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass >> .java:1058) >> >> at java.io.ObjectInputStream.readSerialData(ObjectInputStream. >> java:2136) >> >> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStre >> am.java:2027) >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1535) >> >> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStrea >> m.java:2245) >> >> at java.io.ObjectInputStream.readSerialData(ObjectInputStream. >> java:2169) >> >> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStre >> am.java:2027) >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1535) >> >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java: >> 422) >> >> at java.util.ArrayList.readObject(ArrayList.java:791) >> >> at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source) >> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe >> thodAccessorImpl.java:43) >> >> at java.lang.reflect.Method.invoke(Method.java:498) >> >> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass >> .java:1058) >> >> at java.io.ObjectInputStream.readSerialData(ObjectInputStream. >> java:2136) >> >> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStre >> am.java:2027) >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1535) >> >> at java.io.ObjectInputStream.readArray(ObjectInputStream.java: >> 1933) >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1529) >> >> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStrea >> m.java:2245) >> >> at java.io.ObjectInputStream.readSerialData(ObjectInputStream. >> java:2169) >> >> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStre >> am.java:2027) >> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java >> :1535) >> >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java: >> 422) >> >> at org.apache.geode.internal.Inte >> rnalDataSerializer.basicReadObject(InternalDataSerializer.java:2992) >> >> at org.apache.geode.DataSerializer.readObject(DataSerializer. >> java:2897) >> >> at org.apache.geode.internal.cache.execute.FunctionRemoteContex >> t.fromData(FunctionRemoteContext.java:73) >> >> at org.apache.geode.internal.InternalDataSerializer.invokeFromD >> ata(InternalDataSerializer.java:2479) >> >> ... 8 more >> >> >> >> >> >> So could it be that these exceptions are not being sent back to caller >> node resulting caller thread to wait for reply forever? >> >> >> >> Thanks, >> >> Vahram. >> >> >> >> >> >> >> > >
