Ashish,

This might not be a server problem. It could very well be a client problem: the client itself may not be responding in a timely manner. Can you also check your client GC?

--Udo


On 10/19/18 11:46, aashish choudhary wrote:
Below is our configuration:
4 data nodes, 3 locator nodes
8 vCPUs per node
128 GB of RAM per data node, with 64 GB of RAM allocated per data node. I don't think it could be because of GC, as our heap utilization is low. Anyway, I will check the GC logs to see whether it's related to that.

Thanks,
Ashish

On Fri, Oct 19, 2018, 11:32 PM Anthony Baker wrote:

    Look for GC pauses. You can add flags to the startup options to
    capture GC behavior and understand whether you’re hitting a “stop
    the world” pause. How much free heap space do you have?


    Here are a few links:
    
https://cwiki.apache.org/confluence/display/GEODE/Resource+Management+in+Geode
    
https://cwiki.apache.org/confluence/display/GEODE/Troubleshooting+Garbage+Collection+Pauses
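
As a rough complement to GC logging, the JVM's standard management beans can report how much time has gone into garbage collection and how full the heap is. The sketch below is only an illustration: the class name `GcSnapshot` and its methods are hypothetical, and the reported collection time is cumulative and approximate (with concurrent collectors it is not purely "stop the world" time), so it does not replace the GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Geode: a quick in-process look at GC activity.
class GcSnapshot {
    // Sums the accumulated collection time (ms) reported by every collector.
    // getCollectionTime() returns -1 when the value is undefined, so skip negatives.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of the maximum heap currently in use, based on Runtime counters.
    static double heapUsedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Total GC time (ms): " + totalGcMillis());
        System.out.println("Heap used fraction: " + heapUsedFraction());
    }
}
```

Sampling these two values periodically on both the client and the servers gives a first hint of whether long collections line up with the timeouts.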

    Anthony


    On Oct 19, 2018, at 10:17 AM, aashish choudhary wrote:

    Thanks Charlie. I have watched the video and it was helpful.
    Apart from overcommitted hardware, could there be any other issue
    from the Geode perspective, e.g. a slow server? We have
    encountered this issue for the first time. We will certainly look
    into steal/ready time.

    Thanks,
    Ashish

    On Fri, Oct 19, 2018, 9:43 PM Charlie Black wrote:

        It's not normal for Geode to stop servicing requests. I
        *do not* recommend changing the fault tolerances until you
        find out why things aren't responding in 10 seconds to 1
        minute. Imagine your users waiting for a minute or more
        for an in-memory system to return a value.

        One thing to look out for is overcommitted hardware. You
        can review steal time on the guest OS. However, most
        enterprises disable host reporting, so you might have to
        review Ready Time In MS on the Geode VM. This shows how
        long a VM was waiting to run. If it's anything larger than
        zero, make sure this is what you want.
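
On a Linux guest, steal time is exposed in the aggregate `cpu` line of `/proc/stat`, where the eighth numeric field counts time stolen by the hypervisor. The following is a minimal sketch of turning two samples of that line into a steal fraction; the class and method names are hypothetical, and it assumes a Linux guest whose hypervisor actually reports steal (as noted above, many do not).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Geode: estimates CPU steal from /proc/stat samples.
class StealTime {
    // Parses the aggregate "cpu" line into its first eight counters:
    // user, nice, system, idle, iowait, irq, softirq, steal.
    static long[] parse(String cpuLine) {
        String[] f = cpuLine.trim().split("\\s+");
        if (f.length < 9 || !f[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + cpuLine);
        }
        long[] counters = new long[8];
        for (int i = 0; i < 8; i++) {
            counters[i] = Long.parseLong(f[i + 1]);
        }
        return counters;
    }

    // Counters are cumulative since boot, so compare two samples:
    // steal fraction = delta(steal) / delta(sum of the eight counters).
    static double stealFraction(String earlierLine, String laterLine) {
        long[] a = parse(earlierLine);
        long[] b = parse(laterLine);
        long total = 0;
        for (int i = 0; i < 8; i++) {
            total += b[i] - a[i];
        }
        long steal = b[7] - a[7];
        return total <= 0 ? 0.0 : (double) steal / total;
    }

    public static void main(String[] args) throws Exception {
        Path stat = Paths.get("/proc/stat");
        if (!Files.exists(stat)) {
            System.out.println("/proc/stat not available on this system");
            return;
        }
        String first = Files.readAllLines(stat).get(0);
        Thread.sleep(1000); // sample interval of one second
        String second = Files.readAllLines(stat).get(0);
        System.out.println("Steal fraction over 1s: " + stealFraction(first, second));
    }
}
```

A persistently non-zero result would point at the host rather than at Geode.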

        Here is a video where I talk about overcommitted hardware,
        which is applicable to all things running on containers / VMs.

        https://www.youtube.com/watch?v=0I2oPBKctgU

        Regards,

        Charlie

        On Fri, Oct 19, 2018 at 5:32 AM aashish choudhary wrote:

            Hi,

            Recently, one of our Geode client applications has been
            throwing the exception below.

            Pool unexpected socket timed out on client

            Server unreachable: could not connect after 1 attempts

            After looking at various threads, I learned that we
            need to set read-timeout in the client configuration to a
            higher value. The default is 10 seconds, I believe. I'm
            just curious why a server would take more than 10
            seconds to respond, as 10 seconds already seems to be on
            the high side. For now we will probably increase it to at
            least 30 seconds and observe whether it makes any
            difference. Also, on the server side we could see the
            warnings below.
            ClientHealthMonitor Unregistering client with member id
            identity xxxx due to: Socket closed.
            Monitoring client with member id identity xxxx It had
            been 60534 ms since the latest heartbeat. Max interval is
            60000. Terminated client.
            Could this be because of high load on a particular
            server? But we have seen these warnings on all of our
            data nodes.
            Are there any parameters we need to tune on the server
            side for this?
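
To make the arithmetic in the server warning concrete: the client was terminated because 60534 ms had passed since its last heartbeat, which exceeds the 60000 ms maximum interval. The sketch below models that comparison only. It is a hypothetical illustration, not Geode's actual ClientHealthMonitor code, and the strict `>` comparison is an assumption.

```java
// Hypothetical model of a heartbeat expiry check; not Geode's implementation.
class HeartbeatCheck {
    // Returns true when the time since the last heartbeat exceeds the allowed interval.
    // Whether the real check is strict (>) or inclusive (>=) is an assumption here.
    static boolean isExpired(long lastHeartbeatMillis, long nowMillis, long maxIntervalMillis) {
        return nowMillis - lastHeartbeatMillis > maxIntervalMillis;
    }

    public static void main(String[] args) {
        long maxInterval = 60_000L; // "Max interval is 60000" from the warning
        long elapsed = 60_534L;     // "It had been 60534 ms since the latest heartbeat"
        System.out.println("Client expired: " + isExpired(0L, elapsed, maxInterval));
    }
}
```

The client missed the window by only about half a second, which is consistent with a pause on the client side (for example a long GC) rather than a permanently dead connection.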

            Thanks,
            Ashish

-- [email protected] <mailto:[email protected]> | +1.858.480.9722
        Principal Realtime Data Engineer


