Hi Vahram,

Based on the stack trace from your version, the data is definitely getting corrupted while reading the function arguments. I'm still suspicious that it has something to do with the arguments you are passing in. Maybe you can share the code for your argument class?
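To illustrate what I mean (this is a purely hypothetical sketch, not your actual class), an argument class shaped like the one below could produce exactly the trace you posted if another thread modifies it while the function message is being serialized:

    // Hypothetical argument class; the names here are made up for illustration.
    public class MetricsFunctionArgs implements java.io.Serializable {
      // Sorted, mutable collection carried inside the function arguments.
      private final java.util.TreeSet<String> metricNames = new java.util.TreeSet<>();

      // Called from application threads, possibly while the same instance
      // is being serialized for a function execution on another thread.
      public void addMetric(String name) {
        metricNames.add(name);
      }
    }

TreeSet's writeObject writes the element count and then the elements, so if addMetric() runs while Geode is serializing the message, the count and the elements written can disagree. The receiving side then misreads the stream, which is the kind of StreamCorruptedException ("invalid type code") inside TreeSet.readObject/TreeMap.buildFromSorted that shows up in your log. That is the same concern Bruce raised about the treemap being modified during serialization.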
One way to narrow down the problem might be to change the code to copy the arguments before you send them to the function and then run whatever workload hits this issue. You could do something like this:

    functionArgument = org.apache.geode.CopyHelper.deepCopy(functionArgument);
    // ... then execute the function using functionArgument as the argument

This won't fix the problem, but if the issue is really with serializing and deserializing the arguments, you will get an exception from the deepCopy method instead of within the function execution. That would at least mean your caller gets an exception instead of a hang.

-Dan

On Fri, Jan 19, 2018 at 6:22 AM, Vahram Aharonyan <[email protected]> wrote: > Hi All, > > > > Has anyone had a chance to look at the info provided in the last replies > to this thread? > > > > Thanks, > > Vahram. > > > > *From:* Vahram Aharonyan [mailto:[email protected]] > *Sent:* Monday, January 15, 2018 10:02 PM > *To:* [email protected] > *Subject:* RE: Function Executor thread stacked > > > > Hi Bruce/Barry, > > > > We are using geode-1.1.0, but we have seen similar failures in 1.2.0 as > well. This call is a peer-to-peer call. BTW, we are not > always getting StreamCorruptedException; once we got ClassCastException > as the cause as well. Please refer to the attached screenshot. So it seems that > the data that we are trying to deserialize is corrupted. > > > > Here is how I was able to reproduce the issue on my side today. > > I’ve set a breakpoint in > *org.apache.geode.internal.tcp.MsgDestreamer#getMessage* and set *new > IOException()* as a new value for *this*.failure. As a result I’ve entered > org.apache.geode.internal.tcp.Connection#sendFailureReply. As *directAck = > false* and *rpId = 0* in my case, no ReplyException was sent back to the > caller. BTW, could you please help me understand what these > directAck and rpId values mean, and how the function executor on the caller peer can > have some impact on their values? > > > > As I wrote in my previous reply to Dan in this thread, it does not seem > that we have any concurrent usage of the arguments object – anyway, I will > double-check this again. > > > > Thanks, > > Vahram. > > > > *From:* Bruce Schuchardt [mailto:[email protected] > <[email protected]>] > *Sent:* Saturday, January 13, 2018 3:38 AM > *To:* [email protected] > *Subject:* Re: Function Executor thread stacked > > > > I looked back through the git history for Connection.java. He's not > running with a recent build. I was looking at 4/5/17 to find matching line > numbers. > > On 1/12/18 2:20 PM, Barry Oglesby wrote: > > It looks like you're running the function from a peer as opposed to a > client. Is that right? Otherwise, the ExecuteRegionFunction66 would be > deserializing the arguments in the ServerConnection that receives the > client's function execution request. In that case, the > StreamCorruptedException would occur there instead of in the > RemoteFunctionContext. > > > > I wrote a test with a peer accessor member executing an onRegion function. > I implemented the readObject method in the Serializable argument I passed > to throw a StreamCorruptedException. The severe warning I see when I run > the test is pretty similar to yours except yours. > > > > In my test, the FunctionStreamingResultCollector. > waitForCacheOrFunctionException method catches the ReplyException from > the member that attempted to deserialize the FunctionRemoteContext. It > handles that exception and completes. The caller gets the exception.
This > is exactly the behavior Bruce described in his reply. > > > > What is your ComputedAndSystemMetricsRetriever thread and how exactly is > it being called to execute the function? Maybe there is something about the > way it's being called that is causing different behavior. > > > > btw - I'm running this in the latest Geode develop code. What version are > you running? > > > > > Thanks, > > Barry Oglesby > > > > > > On Fri, Jan 12, 2018 at 11:46 AM, Bruce Schuchardt <[email protected]> > wrote: > > This shouldn't normally cause a hang. The code that handles receipt of > tcp/ip messages reads the message's "reply processor" identifier before > trying to deserialize the rest of the message. If there is a problem in > deserializing the message we send an error response with the identifier so > that the sender knows something went wrong. > > Having said that, I am not as familiar with the function execution > streaming-reply processors and how they handle this kind of response. It's > possible that a hang could occur in your situation if these reply > processors aren't prepared to deal with an error response. > > It seems to me that you should be more concerned that a deserialization > problem occurred at all. For instance, was the treemap being actively > modified during serialization? If so, take steps to prevent that from > happening. > > > > On 1/10/18 5:02 AM, Vahram Aharonyan wrote: > > Hi All, > > > > We are experiencing an issue with the thread that is performing an onRegion > call and expecting some result in response being stuck forever in > TIMED_WAITING state with the below trace: > > > > "ComputedAndSystemMetricsRetriever" Id=490 in TIMED_WAITING on > lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2 > > Total blocked: 33 Total waited: 261425 > > sun.misc.Unsafe.park(Native Method) > > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > > java.util.concurrent.locks.AbstractQueuedSynchronizer. > doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > > java.util.concurrent.locks.AbstractQueuedSynchronizer. > tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > > java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) > > org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await( > StoppableCountDownLatch.java:64) > > org.apache.geode.distributed.internal.ReplyProcessor21. > basicWait(ReplyProcessor21.java:716) > > org.apache.geode.distributed.internal.ReplyProcessor21. > waitForRepliesUninterruptibly(ReplyProcessor21.java:793) > > org.apache.geode.distributed.internal.ReplyProcessor21. > waitForRepliesUninterruptibly(ReplyProcessor21.java:769) > > org.apache.geode.distributed.internal.ReplyProcessor21. > waitForRepliesUninterruptibly(ReplyProcessor21.java:856) > > org.apache.geode.internal.cache.execute.FunctionStreamingResultCollect > or.waitForCacheOrFunctionException(FunctionStreamingResultCollect > or.java:438) > > org.apache.geode.internal.cache.partitioned.
> PRFunctionStreamingResultCollector.getResult( > PRFunctionStreamingResultCollector.java:91) > > platform.gemfire.GemfireFunctionExecutor.onRegion( > GemfireFunctionExecutor.java:494) > > > > In the logs of that member we see the following: > > > > [warning 2017/12/20 10:49:14.570 UTC 29acc6f1-5384-489d-b2bd-5187b898e482 > <ComputedAndSystemMetricsRetriever> tid=0x1ea] 60 seconds have elapsed > while waiting for replies: <PRFunctionStreamingResultCollector 100547 > waiting for 1 replies from [gbv00457(abb6648c-39d6-4c4c- > 9c6d-ab8589e034a5:9583)<ec><v4>:10002]> on gbv00455(29acc6f1-5384-489d- > b2bd-5187b898e482:22303)<ec><v3>:10002 whose current membership list is: > [[gbv00458(8d2960b9-a6be-4519-9547-311e2717231e:15532)<ec><v5>:10002, > gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002, > gbv00460(21fd5612-5fe2-451d-aa9d-b8542fa43fa7:20144)<ec><v9>:10002, > gbv00459(3a14f29a-8bdb-46d5-bb67-0f79cb5c7faa:17197)<ec><v7>:10002, > gbv00454(18618:locator)<ec><v1>:20002, gbv00454(64aed382-0882-44f5- > b71f-08a429af46dd:18983)<ec><v8>:10002, gbv00453(13656:locator)<ec><v0>:20002, > gbv00453(881591a8-ae04-4af1-866a-5074c2ffb133:14490)<ec><v2>:10002, > gbv00456(63cebdf8-dd1e-414e-af5f-f8c4ebecf726:18001)<ec><v6>:10002, > gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002]] > > > > Near that time, on the nodes where this call lands, these exceptions occur: > > > > [severe 2017/12/20 10:48:14.728 UTC abb6648c-39d6-4c4c-9c6d-ab8589e034a5 > <P2P message reader for gbv00455(29acc6f1-5384-489d- > b2bd-5187b898e482:22303)<ec><v3>:10002 shared unordered uid=8 port=41631> > tid=0x44] IOException deserializing message > > java.io.IOException: failure during message deserialization > > at org.apache.geode.internal.tcp.MsgDestreamer.getMessage( > MsgDestreamer.java:190) > > at org.apache.geode.internal.tcp.Connection.runOioReader( > Connection.java:2218) > > at org.apache.geode.internal.tcp.Connection.run(Connection. > java:1728) > > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > > at java.lang.Thread.run(Thread.java:748) > > Caused by: org.apache.geode.SerializationException: Could not create an > instance of org.apache.geode.internal.cache.partitioned. > PartitionedRegionFunctionStreamingMessage . > > at org.apache.geode.internal.InternalDataSerializer. > invokeFromData(InternalDataSerializer.java:2492) > > at org.apache.geode.internal.DSFIDFactory.create( > DSFIDFactory.java:979) > > at org.apache.geode.internal.InternalDataSerializer.readDSFID( > InternalDataSerializer.java:2720) > > at org.apache.geode.internal.tcp.MsgDestreamer$ > DestreamerThread.run(MsgDestreamer.java:261) > > Caused by: org.apache.geode.SerializationException: Could not create an > instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext > . > > at org.apache.geode.internal.InternalDataSerializer. > readDataSerializable(InternalDataSerializer.java:2521) > > at org.apache.geode.internal.InternalDataSerializer. > basicReadObject(InternalDataSerializer.java:2958) > > at org.apache.geode.DataSerializer.readObject( > DataSerializer.java:2897) > > at org.apache.geode.internal.cache.partitioned. > PartitionedRegionFunctionStreamingMessage.fromData( > PartitionedRegionFunctionStreamingMessage.java:180) > > at org.apache.geode.internal.InternalDataSerializer. > invokeFromData(InternalDataSerializer.java:2477) > > ...
3 more > > Caused by: org.apache.geode.SerializationException: Could not create an > instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext > . > > at org.apache.geode.internal.InternalDataSerializer. > invokeFromData(InternalDataSerializer.java:2492) > > at org.apache.geode.internal.InternalDataSerializer. > readDataSerializable(InternalDataSerializer.java:2507) > > ... 7 more > > Caused by: java.io.StreamCorruptedException: invalid type code: B1 > > at java.io.ObjectInputStream.readObject0(ObjectInputStream. > java:1563) > > at java.io.ObjectInputStream.readObject(ObjectInputStream. > java:422) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2567) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2551) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2583) > > at java.util.TreeMap.buildFromSorted(TreeMap.java:2508) > > at java.util.TreeMap.readTreeSet(TreeMap.java:2460) > > at java.util.TreeSet.readObject(TreeSet.java:533) > > at sun.reflect.GeneratedMethodAccessor743.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke( > DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at java.io.ObjectStreamClass.invokeReadObject( > ObjectStreamClass.java:1058) > > at java.io.ObjectInputStream.readSerialData( > ObjectInputStream.java:2136) > > at java.io.ObjectInputStream.readOrdinaryObject( > ObjectInputStream.java:2027) > > at java.io.ObjectInputStream.readObject0(ObjectInputStream. > java:1535) > > at java.io.ObjectInputStream.defaultReadFields( > ObjectInputStream.java:2245) > > at java.io.ObjectInputStream.readSerialData( > ObjectInputStream.java:2169) > > at java.io.ObjectInputStream.readOrdinaryObject( > ObjectInputStream.java:2027) > > at java.io.ObjectInputStream.readObject0(ObjectInputStream. > java:1535) > > at java.io.ObjectInputStream.readObject(ObjectInputStream. > java:422) > > at java.util.ArrayList.readObject(ArrayList.java:791) > > at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke( > DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at java.io.ObjectStreamClass.invokeReadObject( > ObjectStreamClass.java:1058) > > at java.io.ObjectInputStream.readSerialData( > ObjectInputStream.java:2136) > > at java.io.ObjectInputStream.readOrdinaryObject( > ObjectInputStream.java:2027) > > at java.io.ObjectInputStream.readObject0(ObjectInputStream.
> java:1535) > > at java.io.ObjectInputStream.defaultReadFields( > ObjectInputStream.java:2245) > > at java.io.ObjectInputStream.readSerialData( > ObjectInputStream.java:2169) > > at java.io.ObjectInputStream.readOrdinaryObject( > ObjectInputStream.java:2027) > > at java.io.ObjectInputStream.readObject0(ObjectInputStream. > java:1535) > > at java.io.ObjectInputStream.readObject(ObjectInputStream. > java:422) > > at java.util.ArrayList.readObject(ArrayList.java:791) > > at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke( > DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at java.io.ObjectStreamClass.invokeReadObject( > ObjectStreamClass.java:1058) > > at java.io.ObjectInputStream.readSerialData( > ObjectInputStream.java:2136) > > at java.io.ObjectInputStream.readOrdinaryObject( > ObjectInputStream.java:2027) > > at java.io.ObjectInputStream.readObject0(ObjectInputStream. > java:1535) > > at java.io.ObjectInputStream.readArray(ObjectInputStream. > java:1933) > > at java.io.ObjectInputStream.readObject0(ObjectInputStream. > java:1529) > > at java.io.ObjectInputStream.defaultReadFields( > ObjectInputStream.java:2245) > > at java.io.ObjectInputStream.readSerialData( > ObjectInputStream.java:2169) > > at java.io.ObjectInputStream.readOrdinaryObject( > ObjectInputStream.java:2027) > > at java.io.ObjectInputStream.readObject0(ObjectInputStream. > java:1535) > > at java.io.ObjectInputStream.readObject(ObjectInputStream. > java:422) > > at org.apache.geode.internal.InternalDataSerializer. > basicReadObject(InternalDataSerializer.java:2992) > > at org.apache.geode.DataSerializer.readObject( > DataSerializer.java:2897) > > at org.apache.geode.internal.cache.execute.FunctionRemoteContext. > fromData(FunctionRemoteContext.java:73) > > at org.apache.geode.internal.InternalDataSerializer. > invokeFromData(InternalDataSerializer.java:2479) > > ... 8 more > > > > > > So could it be that these exceptions are not being sent back to the caller > node, causing the caller thread to wait for a reply forever? > > > > Thanks, > > Vahram. > > > > > > >
