Adam,

Regarding the 4980MB, that is virtual address space assigned to ZK. In your
case, since the resident memory is only 104MB, I suspect your OS has a
memory allocator that allocates a per-thread memory pool, something like [1].
So if a process has multiple threads (which the ZK server process does), it
would not be a surprise for 4G or more of virtual space to be mapped to the
process. If that is your case, it is harmless, as the 4G is just virtual
address space.
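
To see the virtual vs. resident split for yourself, here is a minimal Python
sketch, assuming a Linux host. It is not part of ZooKeeper, and the helper
names (parse_vm_fields, read_vm_fields) are made up for this example. It reads
the VmSize (virtual), VmRSS (resident) and VmSwap (swapped out) fields from
/proc/<pid>/status:

```python
def parse_vm_fields(status_text):
    """Return {field_name: size_in_kB} for the Vm* lines of a
    /proc/<pid>/status dump (Linux only)."""
    fields = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            name, _, rest = line.partition(":")
            parts = rest.split()
            # Lines look like "VmRSS:    106496 kB"; keep the number.
            if parts and parts[0].isdigit():
                fields[name] = int(parts[0])
    return fields


def read_vm_fields(pid):
    """Read and parse /proc/<pid>/status for a running process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

Calling read_vm_fields() with the PID of the ZK server should show a large
VmSize next to a small VmRSS if this is only reserved address space. A VmSwap
value well above zero would be a hint that part of the process has been
swapped out.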

To debug the client connection timeout, we can rule out the suspects one by
one. If you suspect that (server-side) swapping is causing the issue, you
can set vm.swappiness=0 and give the ZK JVM a reasonably large heap size to
ensure ZK always operates in memory, then run some load tests at a scale
similar to your prod environment. Be aware that swapping can also happen on
the client side (e.g. if too many clients using the ZK Java client library
are running on the same host).
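
One rough way to check whether a host is swapping during such a load test is
to sample the kernel's swap counters before and after. Below is a minimal
Python sketch, again assuming Linux. The pswpin and pswpout fields are read
from /proc/vmstat; the function names and the 5 second interval are just
choices made for this example:

```python
import time


def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counters from the
    contents of /proc/vmstat."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in/out between two samples."""
    return {k: after[k] - before[k] for k in before if k in after}


def sample_swap_activity(interval_s=5.0, path="/proc/vmstat"):
    """Take two samples interval_s seconds apart and return the delta."""
    with open(path) as f:
        before = parse_swap_counters(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_swap_counters(f.read())
    return swap_delta(before, after)
```

If sample_swap_activity() keeps returning non-zero values around the time the
clients disconnect, swapping is worth a closer look; if both counters stay
flat, it is probably not the cause. Note that these counters are host-wide,
not per process.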

[1]
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en

On Wed, Nov 2, 2016 at 3:52 PM, Whitney, Adam <adam.whit...@sony.com> wrote:

> Thanks Ben,
>
> That makes sense.
>
> As for why this timeout keeps on happening ... I'm wondering if I'm
> running into a swapping issue because ZooKeeper doesn't have a max heap
> size specified ... and this host has 10GB of RAM ... so the zookeeper
> process is running currently with 4980MB of RAM with 104MB resident
> (according to top) ... 4980MB is a bit excessive as I'm only using
> zookeeper to support replicated leveldb in activemq.
>
> How could I tell if swapping is causing my disconnects?
>
> Also, is anyone familiar with using zookeeper to support replicated
> leveldb in activemq? If so, is 1GB of heap space enough for zookeeper to
> support that? That is all we're using this zookeeper for, so it seems like
> ~5GB of heap might be a bit excessive. For comparison, we've been running
> this setup in another datacenter where zookeeper hosts only have 2GB of RAM
> and it ran fine there ... but those hosts aren't running anymore and since
> we didn't specify the JVM heap size I'm not sure how much RAM zookeeper was
> actually using ... but I'm guessing it was somewhere near 1GB (1/2 of RAM)?
>
> adam
>
> -----Original Message-----
> From: Benjamin Reed [mailto:br...@apache.org]
> Sent: Wednesday, November 02, 2016 3:02 PM
> To: user@zookeeper.apache.org
> Subject: Re: zookeeper client seems to timeout earlier than it should
>
> clients need to make sure they move off of a dead server on to a new one
> to keep their connection alive, so generally if the client hasn't heard
> from the server in 2/3 * sessionTimeout it will try to connect to someone
> else. if it waited the whole 4 seconds, when connected to an active server
> it would be pronounced dead on arrival.
>
> ben
>
> On Wed, Nov 2, 2016 at 5:11 PM, Whitney, Adam <adam.whit...@sony.com>
> wrote:
> > (Sorry if this is a repost … I got a strange response to my original
> > email so I’m not sure if it went through or not)
> >
> > I have a zookeeper cluster with 3 nodes and tick time set to 2s
> >
> > When a client connects to the cluster I see a log entry like this:
> >
> > INFO  | Session establishment complete on server XXX, sessionid = XXX,
> > negotiated timeout = 4000 | org.apache.zookeeper.ClientCnxn |
> > main-SendThread(XXX:2181)
> >
> > Notice the "negotiated timeout = 4000"
> >
> > But about once a day I see a log entry like this:
> >
> > INFO  | Client session timed out, have not heard from server in 2953ms
> > for sessionid XXX, closing socket connection and attempting reconnect
> > | org.apache.zookeeper.ClientCnxn | main-SendThread(XXX:2181)
> >
> > Why would the client (apparently) timeout the session after only 2953ms
> > if the negotiated timeout was 4000ms?
> >
>
>


-- 
Cheers
Michael.
