We noticed an odd error over the weekend and would like some advice. One of our services, a Python Thrift[1] server that binds to a port, hit an error and stopped responding to requests. Supervisord "saw" this and tried to bring up another instance. However, the original instance had not actually exited, so it was still running and still bound to the port. Over the weekend supervisord brought up a number of instances of the service, and in total we found ~30 running instances, none of which were responding correctly.
We are about to script a plug-in for supervisord that "pings" the service to monitor the connection. How would we then kill or restart the service if it doesn't respond as expected?

[1]: http://incubator.apache.org/thrift/

_______________________________________________
Supervisor-users mailing list
http://lists.supervisord.org/mailman/listinfo/supervisor-users
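For what it's worth, here is a rough sketch of the "ping, then restart" idea in Python, using only the standard library. Several things here are assumptions and not taken from our setup: the function names are made up for illustration, the request bytes stand in for a real Thrift call (a proper check would probably use the generated Thrift client and call some cheap method), the program name "thrift_service" is hypothetical, and the RPC URL assumes supervisord's inet_http_server is enabled on localhost:9001. The restart goes through supervisord's XML-RPC methods supervisor.stopProcess and supervisor.startProcess.

```python
import socket
import xmlrpc.client


def service_responds(host, port, request, timeout=5.0):
    """Return True if the service sends back at least one byte in time.

    A bare TCP connect is not enough: a hung process that is still bound
    to the port can have connections accepted by the kernel, so we send a
    request and wait for a reply. `request` is a placeholder payload.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            return len(conn.recv(1)) > 0
    except OSError:  # refused, reset, or timed out
        return False


def restart_via_supervisor(process_name,
                           rpc_url="http://localhost:9001/RPC2"):
    """Ask supervisord to stop and then start `process_name`.

    Assumes the XML-RPC interface is reachable at `rpc_url`.
    """
    server = xmlrpc.client.ServerProxy(rpc_url)
    try:
        server.supervisor.stopProcess(process_name)
    except xmlrpc.client.Fault:
        pass  # e.g. supervisord already considers the process stopped
    server.supervisor.startProcess(process_name)


def check_and_restart(probe, restart):
    """Call `restart()` if `probe()` reports failure.

    Returns True if a restart was requested, False otherwise.
    """
    if probe():
        return False
    restart()
    return True
```

Run periodically (from cron, or from an event listener subscribed to TICK events), usage might look like:

```python
check_and_restart(
    probe=lambda: service_responds("127.0.0.1", 9090, b"ping"),
    restart=lambda: restart_via_supervisor("thrift_service"),
)
```

One caveat: stopProcess only reaches the process supervisord itself is tracking. If the stuck instance is a child that detached from it (supervisord expects programs to stay in the foreground and not daemonize), that process would have to be killed separately by pid.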
