[ https://issues.apache.org/jira/browse/ZOOKEEPER-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Travis Crawford updated ZOOKEEPER-803:
--------------------------------------

    Attachment: connection-bugfix-diff.png

This diff shows a bug where the client developer confused disconnections with expired sessions. In the ZooKeeper programming model, clients reconnect automatically when disconnected. However, if the session expires, the application is responsible for reconnecting. In this case the developer attempted to throttle reconnects, but due to a bug the application created a new connection each time.

A small number of clients running the buggy code took down a 3-node ZooKeeper cluster by exhausting the 65k file descriptor limit. The cluster recovered only after the clients were shut down, the ZooKeeper servers were restarted, and the well-behaved clients were then restarted.

> Improve defenses against misbehaving clients
> --------------------------------------------
>
>                 Key: ZOOKEEPER-803
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-803
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Travis Crawford
>         Attachments: connection-bugfix-diff.png
>
>
> This issue is in response to ZOOKEEPER-801. Short version is a small number
> of buggy clients opened thousands of connections and caused Zookeeper to fail.
> The misbehaving client did not correctly handle expired sessions, creating a
> new connection each time. The huge number of connections exacerbated the
> issue.
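
For illustration, the distinction described in the comment above (a disconnection is handled by the client library, an expired session by the application) might be sketched as follows. This is a minimal Python sketch under stated assumptions, not the code from the attached diff and not the ZooKeeper client API: SessionEvent, SessionManager, the connect factory, and min_interval are all hypothetical names. It shows one way to keep a single handle open at a time and to throttle reconnects after expiry.

```python
import time
from enum import Enum


class SessionEvent(Enum):
    """Hypothetical stand-ins for the session states a client library reports."""
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    EXPIRED = "expired"


class SessionManager:
    """Owns at most one connection handle and replaces it only on expiry.

    `connect` is an assumed factory that returns a new handle exposing
    close(); it is not a real ZooKeeper call.
    """

    def __init__(self, connect, min_interval=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._connect = connect
        self._min_interval = min_interval  # minimum seconds between connects
        self._clock = clock
        self._sleep = sleep
        self._last_connect = clock()
        self.handle = connect()

    def on_event(self, event):
        if event is SessionEvent.EXPIRED:
            # The session is gone for good: the application must build a
            # new handle itself.
            self._replace_handle()
        # DISCONNECTED and CONNECTED need no action here: the library is
        # assumed to reconnect the existing session on its own, so opening
        # another handle would only leak a connection.

    def _replace_handle(self):
        # Close the old handle first so its file descriptor is released
        # before a new one is opened.
        old, self.handle = self.handle, None
        if old is not None:
            old.close()
        # Throttle: wait until min_interval has passed since the last
        # connect. Blocking here is a simplification; a real client would
        # likely schedule the reconnect instead of sleeping in a callback.
        wait = self._min_interval - (self._clock() - self._last_connect)
        if wait > 0:
            self._sleep(wait)
        self._last_connect = self._clock()
        self.handle = self._connect()
```

With this shape, any number of disconnect/reconnect cycles leaves exactly one handle open, and each expiry closes the old handle before opening its replacement, so the number of connections per client stays bounded.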