Hello Anthony,

Yes, I am running on virtualized infrastructure. But when I checked %id and %st and graphed them over time, I see %st always at 0.0 and %id in the range of 95-98% most of the time.
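For reference, steal time on Linux can also be sampled directly from /proc/stat rather than from a monitoring graph (a minimal sketch; `top`, `vmstat`, and `sar -u` report the same counter as %st/st):

```shell
# The aggregate "cpu" line in /proc/stat lists jiffies spent in each state:
#   user nice system idle iowait irq softirq steal guest guest_nice
# Field 9 (counting the "cpu" label as field 1) is steal time. Print it
# as a percentage of all CPU time accumulated since boot.
awk '/^cpu /{total=0; for(i=2;i<=NF;i++) total+=$i;
     printf "steal: %.2f%% of CPU time since boot\n", 100*$9/total}' /proc/stat
```

Note this is a since-boot average; a brief burst of steal during a hypervisor contention spike can be invisible here, so sampling the counter at short intervals (as `vmstat 1` does) is more likely to catch transient noisy-neighbor episodes.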
Could the number of connections per client app, or tuning member-timeout or ack-wait-threshold, help here?

Thanks,
- Dharam Thacker

On Thu, Sep 27, 2018 at 8:37 PM Anthony Baker <[email protected]> wrote:

> Are you running on cloud or virtualized infrastructure? If so, check
> your steal time stats; you may have "noisy neighbors" causing members to
> become unresponsive. Geode detects this and fences off the unhealthy
> members to maintain consistency and availability.
>
> Anthony
>
>
> On Sep 27, 2018, at 10:31 AM, Dharam Thacker <[email protected]> wrote:
>
> Hi Team,
>
> I currently have the following topology for Geode, and all regions are
> replicated.
>
> Note: unfortunately I am still on version 1.1.1.
>
> *Host1*:
> Locator1
> Server1.1 (Group1) -- 24G
> Server2.1 (Group2) -- 24G
> Client1 (CQ listener only -- 20 CQs registered via locator pool)
> Client2 (Fires OQL queries and functions only via locator pool)
>
> *Host2*:
> Locator2
> Server1.2 (Group1) -- 24G
> Server2.2 (Group2) -- 24G
>
> As shown above, I have Spring Boot web-app Geode clients (client1 and
> client2) only on Host1.
>
> If I scale them by deploying them on Host2 as well, it works, and I now
> see 40 CQs registered for the CQ listener client.
>
> But I now frequently see a "GMS Membership error" complaining about "no
> heartbeat request" and forced disconnection of a member, for all server
> nodes.
>
> It is transient, but really painful!
>
> With 1.1.1 the members cannot auto-reconnect; I know that is fixed in a
> later version, but that is still fine.
>
> I analyzed GC, CPU load, and memory carefully, and at least those three
> look as healthy as expected.
>
> What other possible reasons could cause scaling the client apps to
> result in this? Or can you suggest anything else to look at?
>
> Thanks,
> Dharam
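For context, the two properties mentioned above are standard Geode membership settings configured in gemfire.properties on each member. The values below are illustrative examples of loosening the timeouts, not recommendations; a minimal sketch:

```
# Milliseconds a member may fail to respond to heartbeat/availability
# checks before it is suspected and eventually removed from the
# distributed system (default is 5000). Raising it makes membership
# more tolerant of transient pauses, at the cost of slower failure
# detection.
member-timeout=10000

# Seconds to wait for an acknowledgement of a distribution message
# before logging a warning about the unresponsive member (default 15).
ack-wait-threshold=30
```

Note that papering over the heartbeat failures with larger timeouts only helps if the root cause is a transient stall (GC pause, scheduling delay, network hiccup); if a member is genuinely wedged, larger timeouts just delay the fencing that Anthony described.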
