Thanks Patrick, I'll look and see if I can figure out a clean change for this. It was the kernel limit on the max number of open fds for the process where the problem showed up (not a ZK limit). FWIW, we tested with a process fd limit of 16K, and ZK performed reasonably well until the fd limit was reached, at which point it choked. There was some throughput degradation, but mostly going from 0 to 4000 connections; from 4000 to 16000 it was mostly flat until the sharp drop. For our use case a bit of performance loss with huge numbers of connections is fine, so long as we can handle the choke, which for the initial rollout I'm planning to handle by just monitoring for it.
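For the monitoring piece, here's a minimal sketch of polling the fd count via the com.sun.management.UnixOperatingSystemMXBean that Patrick links below. The class name and the 85% alert threshold are my own illustration, and it assumes a HotSpot JVM on Unix, where the platform OS bean implements that interface:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdMonitor {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
            long open = unixOs.getOpenFileDescriptorCount();
            long max = unixOs.getMaxFileDescriptorCount();
            System.out.println("open fds: " + open + " / max: " + max);
            // Alert threshold is arbitrary; in practice you'd wire this
            // into whatever alerting system you already run.
            if (open > max * 0.85) {
                System.err.println("WARNING: approaching process fd limit");
            }
        } else {
            // Non-Unix or non-HotSpot JVM: the bean isn't available.
            System.out.println("UnixOperatingSystemMXBean not available");
        }
    }
}
```

In a real deployment this would run periodically (or be exposed over JMX) rather than as a one-shot main, but it shows the two calls needed to watch open-vs-max fds from inside the process.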
-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Wednesday, October 20, 2010 2:06 PM
To: firstname.lastname@example.org
Subject: Re: implications of netty on client connections

It may just be the case that we haven't tested sufficiently for this case (running out of fds) and we need to handle this better even in NIO, probably by cutting off "op_connect" in the selector. We should be able to do something similar in Netty.

Btw, on unix one can access the open/max fd count using this:
http://download.oracle.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html

Secondly, are you running into a kernel limit or a ZK limit? Take a look at this post describing 1 million concurrent connections to a box:
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3

specifically:
--------------
During various tests with lots of connections, I ended up making some additional changes to my sysctl.conf. This was part trial-and-error; I don't really know enough about the internals to make especially informed decisions about which values to change. My policy was to wait for things to break, check /var/log/kern.log and see what mysterious error was reported, then increase stuff that sounded sensible after a spot of googling. Here are the settings in place during the above test:

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
------------------

I'm guessing that even with this, at some point you'll run into a limit in our server implementation. In particular I suspect that we may start to respond more slowly to pings, eventually getting so bad that sessions would time out. We'd have to debug that and address (optimize) it.
Patrick

On Tue, Oct 19, 2010 at 7:16 AM, Fournier, Camille F. [Tech] <
camille.fourn...@gs.com> wrote:
> Hi everyone,
>
> I'm curious what the implications of using netty are going to be for the
> case where a server gets close to its max available file descriptors.
> Right now our somewhat limited testing has shown that a ZK server performs
> fine up to the point when it runs out of available fds, at which point
> performance degrades sharply and new connections get into a somewhat bad
> state. Is netty going to enable the server to handle this situation more
> gracefully (or is there a way to do this already that I haven't found)?
> Limiting connections from the same client is not enough, since we can
> potentially have far more clients wanting to connect than available fds
> for certain use cases we might consider.
>
> Thanks,
> Camille