So far we have not been able to find anything that points to resource
starvation on either the client side or the server side; this is confirmed by
checking the GC logs on both. Also, on the client where we had set the read
timeout to 30 seconds, we were seeing the warning below on the server side.

Server connection from identity xxxx is being terminated because it's
client time-out of 30,000 has expired.
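
For reference, the 30-second read timeout is set on our client roughly like
this (a minimal sketch of the client bootstrap; the locator host and port are
placeholders):

  import org.apache.geode.cache.client.ClientCache;
  import org.apache.geode.cache.client.ClientCacheFactory;

  public class ClientSetup {
      public static void main(String[] args) {
          // Minimal client cache with the pool read timeout raised from the
          // 10,000 ms default to 30,000 ms.
          ClientCache cache = new ClientCacheFactory()
              .addPoolLocator("locator-host-1", 10334)
              .setPoolReadTimeout(30000)
              .create();
          // ... region lookups and gets/puts go here ...
          cache.close();
      }
  }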

The issues I described earlier were also mostly observed when the client was
connected to one particular data node. Not sure whether rebooting that node
would fix the issue?


Thanks,
Ashish


On Sat, Oct 20, 2018, 3:19 AM John Blum <[email protected]> wrote:

> Yes, as Anthony and Udo point out, I would verify the resource
> utilization on your clients.  You can also verify/change the PING interval
> used by the client Pool configuration [1] to let the server know that your
> client(s) are still around.  That will not help much if the client is
> resource-strapped in the first place, but if it is not, it might help.
> Food for thought.
>
>
> [1]
> http://gemfire-95-javadocs.docs.pivotal.io/org/apache/geode/cache/client/PoolFactory.html#setPingInterval-long-
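>
> For example, something along these lines when creating the client Pool
> (just a sketch; the pool name and locator address are placeholders):
>
>   import org.apache.geode.cache.client.Pool;
>   import org.apache.geode.cache.client.PoolManager;
>
>   // Named pool with an explicit ping interval and read timeout.
>   // Assumes the ClientCache has already been created.
>   Pool pool = PoolManager.createFactory()
>       .addLocator("locator-host-1", 10334)
>       .setPingInterval(5000L)    // default is 10,000 ms
>       .setReadTimeout(30000)     // default is 10,000 ms
>       .create("clientPool");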
>
> On Fri, Oct 19, 2018 at 11:50 AM, Udo Kohlmeyer <[email protected]>
> wrote:
>
>> Ashish,
>>
>> This might not be a server problem... it could very well be a client
>> problem, with the client not responding in a timely manner. Can you also
>> check your client GC?
>>
>> --Udo
>>
>> On 10/19/18 11:46, aashish choudhary wrote:
>>
>> Below is our configuration:
>> 4 data nodes, 3 locator nodes
>> 8 vCPUs per node
>> 128 GB of RAM per data node, with 64 GB allocated per data node. I don't
>> think it could be because of GC, as our heap utilization is low, but we
>> will check the GC logs to see whether it's related to that.
>>
>> Thanks,
>> Ashish
>>
>> On Fri, Oct 19, 2018, 11:32 PM Anthony Baker <[email protected]> wrote:
>>
>>> Look for GC pauses.  You can add flags to the startup options to capture GC
>>> behavior and understand if you’re hitting a “stop the world” pause.  How
>>> much free heap space do you have?
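>>>
>>> For example, on Java 8 HotSpot you might start a server with GC logging
>>> flags along these lines (one gfsh command, wrapped here for readability;
>>> the log path is just a placeholder):
>>>
>>>   start server --name=server1 --J=-Xloggc:/path/to/server1-gc.log
>>>     --J=-XX:+PrintGCDetails --J=-XX:+PrintGCDateStamps
>>>     --J=-XX:+PrintGCApplicationStoppedTime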
>>>
>>>
>>> Here are a few links:
>>>
>>> https://cwiki.apache.org/confluence/display/GEODE/Resource+Management+in+Geode
>>>
>>> https://cwiki.apache.org/confluence/display/GEODE/Troubleshooting+Garbage+Collection+Pauses
>>>
>>> Anthony
>>>
>>>
>>> On Oct 19, 2018, at 10:17 AM, aashish choudhary <
>>> [email protected]> wrote:
>>>
>>> Thanks Charlie. I have watched the video and it was helpful. Apart from
>>> overcommitted hardware, could there be any other issue from a Geode
>>> perspective, e.g. a slow server? This is the first time we have encountered
>>> this issue. We will certainly look into steal/ready time.
>>>
>>> Thanks,
>>> Ashish
>>>
>>> On Fri, Oct 19, 2018, 9:43 PM Charlie Black <[email protected]> wrote:
>>>
>>>> It's not normal for Geode not to be servicing requests.  I *do not*
>>>> recommend changing the fault tolerances until you find out why things
>>>> aren't responding within 10 seconds to 1 minute.  Imagine your users
>>>> waiting a minute or more for an in-memory system to return a value.
>>>>
>>>> One thing to look out for is overcommitted hardware.  You can review
>>>> steal time on the guest OS.  However, most enterprises disable host
>>>> reporting, so you might have to review Ready Time in ms on the Geode VM.
>>>> This shows how long a VM was waiting to run - if it's anything larger
>>>> than zero, make sure this is what you want.
>>>>
>>>> Here is a video where I talk about overcommitted hardware, which is
>>>> applicable to anything running on containers / VMs.
>>>>
>>>> https://www.youtube.com/watch?v=0I2oPBKctgU
>>>>
>>>> Regards,
>>>>
>>>> Charlie
>>>>
>>>> On Fri, Oct 19, 2018 at 5:32 AM aashish choudhary <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Recently, in one of our client applications for Geode, we have been
>>>>> getting the exception below:
>>>>>
>>>>> Pool unexpected socket timed out on client
>>>>> Server unreachable: could not connect after 1 attempts
>>>>>
>>>>> After looking at various threads, we learned that we need to set
>>>>> read-timeout in the client configuration to a higher value. The default
>>>>> is 10 seconds, I believe. Just curious why a server would take more than
>>>>> 10 seconds to respond, as 10 seconds already seems to be on the higher
>>>>> side. For now we will probably increase it to at least 30 seconds and
>>>>> observe whether it makes any difference.
>>>>> Also, on the server side we could see the warnings below:
>>>>>
>>>>> ClientHealthMonitor Unregistering client with member id identity xxxx
>>>>> due to: Socket closed.
>>>>>
>>>>> Monitoring client with member id identity xxxx It had been 60534 ms
>>>>> since the latest heartbeat. Max interval is 60000. Terminated client.
>>>>>
>>>>> Could this be because of high load on a particular server? We have seen
>>>>> these warnings on all of our data nodes.
>>>>> Are there any parameters we need to tune on the server side for this?
>>>>>
>>>>> Thanks,
>>>>> Ashish
>>>>>
>>>> --
>>>> [email protected] | +1.858.480.9722
>>>> Principal Realtime Data Engineer
>>>>
>>>
>>>
>>
>
>
> --
> -John
> john.blum10101 (skype)
>
