Look for GC pauses.  You can add flags to the JVM startup options to capture 
GC behavior and see whether you’re hitting a “stop the world” pause.  How much 
free heap space do you have?
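
If it helps, here is a rough sketch of how you could read the same two numbers (accumulated GC time and free heap) from inside a running JVM using the standard java.lang.management MXBeans. The class name GcSnapshot and its method names are made up for illustration and are not part of Geode. Note that getCollectionTime() is an approximate accumulated figure and, for concurrent collectors, is not purely "stop the world" time, so treat this as a quick check alongside GC logging, not a replacement for it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not a Geode class): reports GC time and free heap
// for the JVM it runs in, using only standard JDK management beans.
public class GcSnapshot {

    /** Approximate total time (ms) spent in GC since JVM start, summed over all collectors. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of the heap that is currently free, between 0.0 and 1.0. */
    public static double freeHeapFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() is -1 when no maximum is defined; fall back to committed memory.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return 1.0 - (double) heap.getUsed() / limit;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("total GC time: %d ms%n", totalGcTimeMillis());
        System.out.printf("free heap: %.1f%%%n", freeHeapFraction() * 100.0);
    }
}
```

Sampling these values twice, some seconds apart, shows how much of that interval went to GC. If the difference approaches your client read-timeout, GC is a likely suspect.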


Here are a few links:
https://cwiki.apache.org/confluence/display/GEODE/Resource+Management+in+Geode
https://cwiki.apache.org/confluence/display/GEODE/Troubleshooting+Garbage+Collection+Pauses

Anthony


> On Oct 19, 2018, at 10:17 AM, aashish choudhary 
> <[email protected]> wrote:
> 
> Thanks Charlie. I have watched the video and it was helpful. Apart from 
> overcommitted hardware, could there be any other cause from the Geode side, 
> e.g. a slow server? We have encountered this issue for the first time. We 
> will certainly look into steal/ready time.
> 
> Thanks,
> Ashish
> 
> On Fri, Oct 19, 2018, 9:43 PM Charlie Black <[email protected]> wrote:
> It's not normal for Geode to not be servicing requests.   I do not recommend 
> changing the fault tolerances until you find out why things aren't responding 
> within 10 seconds to 1 minute.    Imagine your users waiting a minute or more 
> for an in-memory system to return a value.
> 
> One thing to look out for is overcommitted hardware.   You can review steal 
> time on the guest OS.   However, most enterprises disable host reporting, so 
> you might have to review Ready Time In MS on the Geode VM.   This shows how 
> long a VM was waiting to run. If it's anything larger than zero, make sure 
> this is what you want.
> 
> Here is a video where I talk about overcommitted hardware, which is 
> applicable to anything running on containers / VMs.
> 
> https://www.youtube.com/watch?v=0I2oPBKctgU
> 
> Regards,
> 
> Charlie
> 
> On Fri, Oct 19, 2018 at 5:32 AM aashish choudhary <[email protected]> wrote:
> Hi,
> 
> Recently, in one of our Geode client applications, we have been getting the 
> exception below.
> 
> Pool unexpected socket timed out on client 
> 
> Server unreachable: could not 
>     connect after 1 attempts
> 
> After looking at various threads, we learned that we need to set 
> read-timeout in the client configuration to a higher value. The default is 10 
> seconds, I believe. I am just curious why a server would take more than 10 
> seconds to respond, as 10 seconds already seems to be on the high side. For 
> now we will probably increase it to at least 30 seconds and observe whether 
> it makes any difference.
> 
> Also, on the server side we could see the warnings below.
> 
> ClientHealthMonitor Unregistering client with member id identity xxxx due to: 
> Socket closed.
> Monitoring client with member id identity xxxx It had been 60534 ms since the 
> latest heartbeat. Max interval is 60000. Terminated client.
> 
> Could this be because of high load on a particular server? But we have seen 
> these warnings on all of our data nodes.
> 
> Are there any parameters we need to tune on the server side for this?
> 
> Thanks,
> Ashish
> -- 
> [email protected] <mailto:[email protected]> | +1.858.480.9722
> Principal Realtime Data Engineer 
