I looked back through the git history for Connection.java. He's not running with a recent build; I had to go back to 4/5/17 to find matching line numbers.

On 1/12/18 2:20 PM, Barry Oglesby wrote:
It looks like you're running the function from a peer as opposed to a client. Is that right? Otherwise, ExecuteRegionFunction66 would be deserializing the arguments in the ServerConnection that receives the client's function execution request, and the StreamCorruptedException would occur there instead of in the FunctionRemoteContext.
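
Just so we're talking about the same thing, here's roughly what I mean by the two paths (the region name, function id, and argument type below are made up, not taken from your code):

    import java.io.Serializable;

    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.execute.Execution;
    import org.apache.geode.cache.execute.FunctionService;
    import org.apache.geode.cache.execute.ResultCollector;

    public class PeerVsClientExecution {

      // Peer member: the argument travels in a
      // PartitionedRegionFunctionStreamingMessage and is deserialized by the
      // P2P message reader on the target member, which is where your stack
      // trace shows the failure.
      static ResultCollector<?, ?> executeFromPeer(Region<String, Object> region,
                                                   Serializable args) {
        Execution execution = FunctionService.onRegion(region).setArguments(args);
        return execution.execute("MetricsFunction");
      }

      // Client: the ServerConnection handling the ExecuteRegionFunction request
      // deserializes the argument, so the StreamCorruptedException would show
      // up there instead.
      static ResultCollector<?, ?> executeFromClient(ClientCache clientCache,
                                                     Serializable args) {
        Region<String, Object> proxy = clientCache.getRegion("metrics");
        return FunctionService.onRegion(proxy).setArguments(args).execute("MetricsFunction");
      }
    }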

I wrote a test with a peer accessor member executing an onRegion function. I implemented the readObject method of the Serializable argument I passed to throw a StreamCorruptedException. The severe warning I see when I run the test is pretty similar to yours.
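
The argument in my test is just a Serializable whose readObject throws, something along these lines:

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;
    import java.io.StreamCorruptedException;

    // Test-only argument class: forces the remote member's deserialization
    // to fail the same way yours does.
    public class CorruptingArgument implements Serializable {
      private static final long serialVersionUID = 1L;

      private void readObject(ObjectInputStream in)
          throws IOException, ClassNotFoundException {
        throw new StreamCorruptedException("invalid type code: B1");
      }
    }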

In my test, the FunctionStreamingResultCollector.waitForCacheOrFunctionException method catches the ReplyException from the member that attempted to deserialize the FunctionRemoteContext. It handles that exception and completes. The caller gets the exception. This is exactly the behavior Bruce described in his reply.
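
On the calling side my test just does something like this (names are placeholders), and the failure comes back instead of hanging:

    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.execute.FunctionException;
    import org.apache.geode.cache.execute.FunctionService;
    import org.apache.geode.cache.execute.ResultCollector;

    public class CallerSide {

      static Object executeAndGetResult(Region<String, Object> region, Object args) {
        ResultCollector<?, ?> collector =
            FunctionService.onRegion(region).setArguments(args).execute("MetricsFunction");
        try {
          return collector.getResult();
        } catch (FunctionException e) {
          // The remote deserialization failure surfaces here rather than
          // leaving the thread parked in waitForRepliesUninterruptibly.
          e.printStackTrace();
          return null;
        }
      }
    }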

What is your ComputedAndSystemMetricsRetriever thread, and how exactly is it being called to execute the function? Maybe there is something about the way it's being called that is causing the different behavior.

btw - I'm running this in the latest Geode develop code. What version are you running?


Thanks,
Barry Oglesby


On Fri, Jan 12, 2018 at 11:46 AM, Bruce Schuchardt <[email protected]> wrote:

    This shouldn't normally cause a hang.  The code that handles
    receipt of tcp/ip messages reads the message's "reply processor"
    identifier before trying to deserialize the rest of the message.
    If there is a problem deserializing the message, we send an error
    response with that identifier so that the sender knows something
    went wrong.
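
    Roughly the shape of that mechanism, for illustration only (this is
    not the actual Connection/ReplyProcessor21 code):

        import java.io.DataInputStream;
        import java.io.IOException;

        // Conceptual sketch only.  The important point is that the reply
        // processor id is read before the message body, so a deserialization
        // failure can still be reported back to the sender.
        abstract class MessageReaderSketch {

          void readOneMessage(DataInputStream in) throws IOException {
            int processorId = in.readInt();   // read the reply-processor id first
            try {
              Object body = deserializeBody(in);
              dispatch(processorId, body);
            } catch (IOException | RuntimeException failure) {
              // tell the sender's reply processor that something went wrong so
              // it doesn't wait forever for a reply that will never arrive
              sendErrorReply(processorId, failure);
            }
          }

          abstract Object deserializeBody(DataInputStream in) throws IOException;
          abstract void dispatch(int processorId, Object body);
          abstract void sendErrorReply(int processorId, Throwable failure);
        }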

    Having said that, I am not as familiar with the function execution
    streaming-reply processors and how they handle this kind of
    response.  It's possible that a hang could occur in your situation
    if these reply processors aren't prepared to deal with an error
    response.

    It seems to me that you should be more concerned that a
    deserialization problem occurred at all.  For instance, was the
    treemap being actively modified during serialization?  If so, take
    steps to prevent that from happening.
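
    For example (just a sketch, names are illustrative), keep the live
    data in a concurrent map and pass the function a snapshot copy so
    nothing mutates the structure while it's being written to the wire:

        import java.util.SortedMap;
        import java.util.TreeMap;
        import java.util.concurrent.ConcurrentSkipListMap;

        public class MetricsSnapshot {

          // live, concurrently-updated data
          private final ConcurrentSkipListMap<String, Double> liveMetrics =
              new ConcurrentSkipListMap<>();

          void record(String name, double value) {
            liveMetrics.put(name, value);
          }

          // pass this copy as the function argument instead of the live map;
          // the bytes on the wire then come from a structure no other thread
          // touches
          SortedMap<String, Double> snapshotForFunctionArgs() {
            return new TreeMap<>(liveMetrics);
          }
        }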


    On 1/10/18 5:02 AM, Vahram Aharonyan wrote:

    Hi All,

    We are experiencing an issue where the thread that performs an
    onRegion call and expects a result in response gets stuck forever
    in TIMED_WAITING state with the trace below:

    "ComputedAndSystemMetricsRetriever" Id=490 in TIMED_WAITING on
    lock=java.util.concurrent.CountDownLatch$Sync@5630fcc2
    Total blocked: 33   Total waited: 261425
      sun.misc.Unsafe.park(Native Method)
      java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
      java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
      org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
      org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:716)
      org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:793)
      org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:769)
      org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:856)
      org.apache.geode.internal.cache.execute.FunctionStreamingResultCollector.waitForCacheOrFunctionException(FunctionStreamingResultCollector.java:438)
      org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:91)
      platform.gemfire.GemfireFunctionExecutor.onRegion(GemfireFunctionExecutor.java:494)

    In the logs of that member we see the following:

    [warning 2017/12/20 10:49:14.570 UTC
    29acc6f1-5384-489d-b2bd-5187b898e482
    <ComputedAndSystemMetricsRetriever> tid=0x1ea] 60 seconds have
    elapsed while waiting for replies:
    <PRFunctionStreamingResultCollector 100547 waiting for 1 replies
    from
    [gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002]>
    on
    gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002
    whose current membership list is:
    [[gbv00458(8d2960b9-a6be-4519-9547-311e2717231e:15532)<ec><v5>:10002,
    gbv00457(abb6648c-39d6-4c4c-9c6d-ab8589e034a5:9583)<ec><v4>:10002,
    gbv00460(21fd5612-5fe2-451d-aa9d-b8542fa43fa7:20144)<ec><v9>:10002,
    gbv00459(3a14f29a-8bdb-46d5-bb67-0f79cb5c7faa:17197)<ec><v7>:10002,
    gbv00454(18618:locator)<ec><v1>:20002,
    gbv00454(64aed382-0882-44f5-b71f-08a429af46dd:18983)<ec><v8>:10002,
    gbv00453(13656:locator)<ec><v0>:20002,
    gbv00453(881591a8-ae04-4af1-866a-5074c2ffb133:14490)<ec><v2>:10002,
    gbv00456(63cebdf8-dd1e-414e-af5f-f8c4ebecf726:18001)<ec><v6>:10002,
    gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002]]

    Around that time, on the nodes where this call lands, these
    exceptions occur:

    [severe 2017/12/20 10:48:14.728 UTC abb6648c-39d6-4c4c-9c6d-ab8589e482 <P2P message reader for gbv00455(29acc6f1-5384-489d-b2bd-5187b898e482:22303)<ec><v3>:10002 shared unordered uid=8 port=41631> tid=0x44] IOException deserializing message
    java.io.IOException: failure during message deserialization
            at org.apache.geode.internal.tcp.MsgDestreamer.getMessage(MsgDestreamer.java:190)
            at org.apache.geode.internal.tcp.Connection.runOioReader(Connection.java:2218)
            at org.apache.geode.internal.tcp.Connection.run(Connection.java:1728)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.
            at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
            at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:979)
            at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2720)
            at org.apache.geode.internal.tcp.MsgDestreamer$DestreamerThread.run(MsgDestreamer.java:261)
    Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext.
            at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2521)
            at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2958)
            at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
            at org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.fromData(PartitionedRegionFunctionStreamingMessage.java:180)
            at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2477)
            ... 3 more
    Caused by: org.apache.geode.SerializationException: Could not create an instance of org.apache.geode.internal.cache.execute.FunctionRemoteContext.
            at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2492)
            at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2507)
            ... 7 more
    Caused by: java.io.StreamCorruptedException: invalid type code: B1
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1563)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2567)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2551)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2583)
            at java.util.TreeMap.buildFromSorted(TreeMap.java:2508)
            at java.util.TreeMap.readTreeSet(TreeMap.java:2460)
            at java.util.TreeSet.readObject(TreeSet.java:533)
            at sun.reflect.GeneratedMethodAccessor743.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
            at java.util.ArrayList.readObject(ArrayList.java:791)
            at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
            at java.util.ArrayList.readObject(ArrayList.java:791)
            at sun.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
            at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
            at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2992)
            at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2897)
            at org.apache.geode.internal.cache.execute.FunctionRemoteContext.fromData(FunctionRemoteContext.java:73)
            at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2479)
            ... 8 more

    So could it be that these exceptions are not being sent back to the
    caller node, causing the caller thread to wait for a reply forever?

    Thanks,

    Vahram.



