[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-485:
---
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

+1 the patch looks good. I just committed this. thanks pat.

need ops documentation that details supervision of ZK server processes
--
Key: ZOOKEEPER-485
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
Project: Zookeeper
Issue Type: Bug
Components: documentation, server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.3.0
Attachments: ZOOKEEPER-485.patch

We need ops documentation detailing what to do if the ZK server VM fails - by fail I mean the jvm process exits/dies/crashes/etc...

In general a supervisor process should be used to start/stop/restart/etc... the ZK server vm. Something like daemontools http://cr.yp.to/daemontools.html could be used, or more simply a wrapper script should monitor the status of the pid and restart if the jvm fails.

It's up to the operator: if this is not done automatically then it will have to be done manually, by the operator restarting the ZK server jvm.

The inherent behavior of ZK wrt failures - i.e. that it automatically recovers as long as quorum is maintained - fits into this nicely.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
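
The "wrapper script" option mentioned in the issue description could look roughly like the sketch below. This is only an illustration, not the content of ZOOKEEPER-485.patch: the function name `supervise`, its parameters, and the fixed restart delay are all assumptions made here. It runs the server command in the foreground, waits for the process to exit, and starts it again.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, delay=1.0, sleep=time.sleep):
    """Run cmd in the foreground and restart it every time it exits.

    Hypothetical helper for illustration only.
    cmd          -- argument list for the server process (must not daemonize)
    max_restarts -- stop after this many restarts; None means loop forever
    delay        -- seconds to wait before each restart
    Returns the list of exit codes observed (only when max_restarts is set).
    """
    exit_codes = []
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # blocks until the child process exits
        exit_codes.append(code)
        if max_restarts is not None and restarts >= max_restarts:
            return exit_codes
        restarts += 1
        print(f"process exited with code {code}; restart #{restarts}",
              file=sys.stderr)
        sleep(delay)  # brief pause so a crash loop does not spin the CPU
```

A call such as `supervise(["bin/zkServer.sh", "start-foreground"])` assumes a ZooKeeper release whose `zkServer.sh` supports running in the foreground; otherwise the command would need to launch the JVM directly. A dedicated supervisor such as daemontools handles logging and signal forwarding more robustly than a loop like this.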
[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
---
Assignee: Patrick Hunt
Status: Patch Available (was: Open)

need ops documentation that details supervision of ZK server processes
--
Key: ZOOKEEPER-485
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
Project: Zookeeper
Issue Type: Bug
Components: documentation, server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.3.0
Attachments: ZOOKEEPER-485.patch
[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
---
Fix Version/s: (was: 3.2.1)

need ops documentation that details supervision of ZK server processes
--
Key: ZOOKEEPER-485
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
Project: Zookeeper
Issue Type: Bug
Components: documentation, server
Reporter: Patrick Hunt
Fix For: 3.3.0