Hi Dan/Mike,

This issue was hit on a setup running geode-1.1.0. Taking into account your input about FunctionRemoteContext.fromData, it seems that the args object is what causes the issue. To pass an argument to a function call, we create an object that holds an Object[] args array specific to the function being executed, along with other fields that are generic to all the function executions we make.
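A minimal sketch of such an argument holder, assuming the argument is written with plain Java serialization (the ObjectInputStream frames in the trace further down this thread suggest that). The names FunctionArgs, requestId, and snapshot are hypothetical and not taken from the actual code. The sketch copies mutable collections at construction time, so that later changes to the caller's collections cannot reach the object while it is being serialized:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class FunctionArgsSnapshot {

    /** Hypothetical argument holder: one generic field plus function-specific args. */
    public static final class FunctionArgs implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String requestId; // stand-in for the "generic" fields
        private final Object[] args;    // function-specific arguments

        public FunctionArgs(String requestId, Object... args) {
            this.requestId = requestId;
            this.args = new Object[args.length];
            for (int i = 0; i < args.length; i++) {
                this.args[i] = snapshot(args[i]);
            }
        }

        // Copies common mutable collections one level deep; other values pass through.
        private static Object snapshot(Object o) {
            if (o instanceof SortedSet) return copySorted((SortedSet<?>) o);
            if (o instanceof List) return new ArrayList<Object>((List<?>) o);
            if (o instanceof Map) return new HashMap<Object, Object>((Map<?, ?>) o);
            return o;
        }

        // Capture helper so the copy keeps the original comparator.
        private static <E> TreeSet<E> copySorted(SortedSet<E> s) {
            return new TreeSet<>(s);
        }

        public String requestId() { return requestId; }
        public Object[] args() { return args.clone(); }
    }

    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] unused) throws Exception {
        TreeSet<String> live = new TreeSet<>(Arrays.asList("cpu", "mem"));
        FunctionArgs fa = new FunctionArgs("req-1", live);
        live.add("disk"); // a later change no longer reaches the argument object
        FunctionArgs back = (FunctionArgs) deserialize(serialize(fa));
        System.out.println(back.args()[0]); // prints [cpu, mem]
    }
}
```

The copy is only one level deep; elements that are themselves mutable would need the same treatment.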
While going over the code, I was not able to find a race condition in the creation of the argument object for this call. It is instantiated in a standalone thread and then passed directly to the Executor service.

Thanks,
Vahram.

From: Michael Stolz [mailto:[email protected]]
Sent: Friday, January 12, 2018 12:41 AM
To: [email protected]
Subject: Re: Function Executor thread stacked

Could this be the thing where, when passing one argument to a function, you receive just what was passed, but when passing more than one argument, you get an array containing the things that you passed?

--
Mike Stolz
Principal Engineer, GemFire Product Lead
Mobile: +1-631-835-4771

On Thu, Jan 11, 2018 at 1:47 PM, Dan Smith <[email protected]> wrote:

I've seen something like this happen before when there is code that is concurrently modifying data that is being serialized. What version of geode are you using? The line number in FunctionRemoteContext.fromData should tell us which of your objects is failing to be deserialized. For example, if you are using 1.3, it is the object you are passing as the argument to the function.

I would look closely at your code and make sure nothing could be concurrently modifying your function argument, or anything that it refers to, while it is being serialized.

-Dan

On Thu, Jan 11, 2018 at 12:21 AM, Vahram Aharonyan <[email protected]> wrote:

Hi Jason,

Basically, we have not modified the function argument list recently. Moreover, this does not occur consistently across all our deployments of the product. And even in this cluster, the function sometimes succeeds.
I think the important point here is that the exception itself is an IOException: it seems that the data being deserialized is corrupted. The reason could be a network issue or another infrastructure problem. Even so, the question remains why the exception is not passed back to the caller.

Thanks,
Vahram.

From: Jason Huynh [mailto:[email protected]]
Sent: Thursday, January 11, 2018 3:19 AM
To: [email protected]
Subject: Re: Function Executor thread stacked

Hi Vahram,

It would be interesting to know what object is not serializing/deserializing correctly. Is there any chance you are passing in function arguments that have had modifications that would impact serialization that the class files on the server do not know about?

-Jason

On Wed, Jan 10, 2018 at 5:02 AM Vahram Aharonyan <[email protected]> wrote:

Hi All,

We are experiencing an issue where the thread that performs the onRegion call and expects a result in response is stuck forever in the TIMED_WAITING state, with the trace below:

"ComputedAndSystemMetricsRetriever" Id=490 in TIMED_WAITING on lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2
Total blocked: 33 Total waited: 261425
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
    org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:716)
    org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:793)
    org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:769)
    org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:856)
    org.apache.geode.internal.cache.execute.FunctionStreamingResultCollector.waitForCacheOrFunctionException(FunctionStreamingResultCollector.java:438)
    org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:91)
    platform.gemfire.GemfireFunctionExecutor.onRegion(GemfireFunctionExecutor.java:494)

In the logs of that member we see the following:

[warning 2017/12/20 10:49:14.570 UTC 29acc6f1-5384-489d-b2bd-5187b898e482 <ComputedAndSystemMetricsRetriever> tid=0x1ea] 60 seconds have elapsed while waiting for replies: <PRFunctionStreamingResultCollector 100547 waiting for 1 replies from [gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002]> on gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002 whose current membership list is: [[gbv00458(8d2960b9-a6be-4519-9547-311e2717231e:15532)<ec><v5>:10002, gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002, gbv00460(21fd5612-5fe2-451d-aa9d-b8542fa43fa7:20144)<ec><v9>:10002, gbv00459(3a14f29a-8bdb-46d5-bb67-0f79cb5c7faa:17197)<ec><v7>:10002, gbv00454(18618:locator)<ec><v1>:20002, gbv00454(64aed382-0882-44f5-b71f-08a429af46dd:18983)<ec><v8>:10002, gbv00453(13656:locator)<ec><v0>:20002, gbv00453(881591a8-ae04-4af1-866a-5074c2ffb133:14490)<ec><v2>:10002, gbv00456(63cebdf8-dd1e-414e-af5f-f8c4ebecf726:18001)<ec><v6>:10002, gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002]]

Near that time, on the nodes where this call lands, these exceptions occur:

[severe 2017/12/20 10:48:14.728 UTC abb6648c-39d6-4c4c-9c6d-ab8589e034a5 <P2P message reader for gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002 shared unordered uid=8 port=41631> tid=0x44] IOException deserializing message
java.io.IOException: failure during message deserialization
    at org.apache.geode.internal.tcp.MsgDestreamer.getMessage(MsgDestreamer.java:190)
    at org.apache.geode.internal.tcp.Connection.runOioReader(Connection.java:2218)
    at org.apache.geode.internal.tcp.Connection.run(Connection.java:1728)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage .
    at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
    at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:979)
    at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2720)
    at org.apache.geode.internal.tcp.MsgDestreamer$DestreamerThread.run(MsgDestreamer.java:261)
Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext .
    at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2521)
    at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2958)
    at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.fromData(PartitionedRegionFunctionStreamingMessage.java:180)
    at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2477)
    ... 3 more
Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext .
    at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
    at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2507)
    ... 7 more
Caused by: java.io.StreamCorruptedException: invalid type code: B1
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1563)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2567)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
    at java.util.TreeMap.buildFromSorted(TreeMap.java:2508)
    at java.util.TreeMap.readTreeSet(TreeMap.java:2460)
    at java.util.TreeSet.readObject(TreeSet.java:533)
    at sun.reflect.GeneratedMethodAccessor743.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at java.util.ArrayList.readObject(ArrayList.java:791)
    at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at java.util.ArrayList.readObject(ArrayList.java:791)
    at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2992)
    at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
    at org.apache.geode.internal.cache.execute.FunctionRemoteContext.fromData(FunctionRemoteContext.java:73)
    at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2479)
    ... 8 more

So could it be that these exceptions are not being sent back to the caller node, causing the caller thread to wait for a reply forever?

Thanks,
Vahram.
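
Until the cause of the missing reply is known, one possible caller-side mitigation is to bound the wait. The following is a sketch under assumptions, using only java.util.concurrent and no Geode API: callWithTimeout and the thread name are hypothetical, and the latch in main merely stands in for a reply that never arrives. It runs the blocking call on a helper thread and gives up after a timeout:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {

    /** Runs a blocking call on a helper thread and stops waiting after the timeout. */
    public static <T> T callWithTimeout(Callable<T> blockingCall, long timeout, TimeUnit unit)
            throws Exception {
        // Daemon thread, so a call that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-wait");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = pool.submit(blockingCall);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the helper thread; a wait that ignores interrupts stays blocked.
            future.cancel(true);
            throw e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch neverReleased = new CountDownLatch(1); // stand-in for the lost reply
        try {
            callWithTimeout(() -> {
                neverReleased.await();
                return "reply";
            }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting for the reply");
        }
    }
}
```

Note that the thread dump above shows the caller inside waitForRepliesUninterruptibly, so cancelling would probably not release the helper thread itself; this approach only frees the calling thread and does not address why the reply is never sent.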
