Look for GC pauses. You can add flags to the JVM startup options to capture GC behavior and see whether you’re hitting a “stop the world” pause. How much free heap space do you have?
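
As a rough complement to GC logging, here is a minimal sketch that reads the standard JDK platform MXBeans to show cumulative GC time and remaining heap. The class and method names (GcSnapshot, totalGcMillis, freeHeapBytes) are made up for illustration and are not part of Geode. It only reports on the JVM it runs in, and getCollectionTime() is accumulated collection time, which for concurrent collectors is not the same thing as stop-the-world pause time, so treat the numbers as a hint rather than a measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not a Geode API: a quick look at GC activity and heap headroom.
public class GcSnapshot {

    // Sum of the approximate accumulated collection time (ms) across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Heap still available before the JVM reaches its maximum heap size.
    static long freeHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long mb = 1024L * 1024L;
        System.out.printf("total GC time: %d ms%n", totalGcMillis());
        System.out.printf("free heap: %d MB of %d MB max%n",
                freeHeapBytes() / mb, Runtime.getRuntime().maxMemory() / mb);
    }
}
```

If total GC time climbs quickly between two snapshots while free heap stays low, that points toward the heap rather than the network.
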
Here are a few links:

https://cwiki.apache.org/confluence/display/GEODE/Resource+Management+in+Geode
https://cwiki.apache.org/confluence/display/GEODE/Troubleshooting+Garbage+Collection+Pauses

Anthony

> On Oct 19, 2018, at 10:17 AM, aashish choudhary <[email protected]> wrote:
>
> Thanks Charlie. I have watched the video and it was helpful. Apart from
> overcommitted hardware, could there be any other issue from the Geode
> perspective, e.g. a slow server? We have encountered this issue for the
> first time. For sure we will look into steal/ready time.
>
> Thanks,
> Ashish
>
> On Fri, Oct 19, 2018, 9:43 PM Charlie Black <[email protected]> wrote:
> It's not normal for Geode to be not servicing requests. I do not recommend
> changing the fault tolerances until you find out why things aren't
> responding in 10 seconds to 1 minute. Imagine your users waiting for a
> minute or more for an in-memory system to return a value.
>
> One thing to look out for is overcommitted hardware. You can review steal
> time on the guest OS. However, most enterprises disable host reporting, so
> you might have to review Ready Time In MS on the Geode VM. This shows how
> long a VM was waiting to run - if it's anything larger than zero - make
> sure this is what you want.
>
> Here is a video where I talk about overcommitted hardware - which is
> applicable to all things running on containers / VMs.
>
> https://www.youtube.com/watch?v=0I2oPBKctgU
>
> Regards,
>
> Charlie
>
> On Fri, Oct 19, 2018 at 5:32 AM aashish choudhary <[email protected]> wrote:
> Hi,
>
> Recently, in one of our client applications for Geode, we have been getting
> the exception below.
>
> Pool unexpected socket timed out on client
>
> Server unreachable: could not connect after 1 attempts
>
> After looking at various threads, we came to know that we need to set
> read-timeout in the client configuration to a higher value. The default is
> 10 seconds, I believe. Just curious to know why the server would take more
> than 10 seconds to respond, as 10 seconds already seems to be on the high
> side. For now we will probably increase it to at least 30 seconds and
> observe whether it makes any difference.
>
> Also, on the server side we could see the warnings below.
>
> ClientHealthMonitor Unregistering client with member id identity xxxx due to:
> Socket closed.
> Monitoring client with member id identity xxxx It had been 60534 ms since the
> latest heartbeat. Max interval is 60000. Terminated client.
>
> Could this be because of high load on a particular server? But we have seen
> these warnings on all of our data nodes.
>
> Are there any parameters we need to tune on the server side for this?
>
> Thanks,
> Ashish
> --
> [email protected] | +1.858.480.9722
> Principal Realtime Data Engineer
