We noticed an odd error over the weekend, and would like some advice.

One of our services, a Python Thrift[1] server that binds to a port, 
hit an error and stopped responding to requests. Supervisord "saw" this 
and tried to bring up another instance. However, the original instance 
hadn't actually exited, so it was still running and still bound to the 
port. Over the weekend supervisord brought up a number of instances of 
the service, so in total we found ~30 running instances, none of which 
were responding correctly.

We are about to script a plug-in for supervisord to "ping" the service 
to monitor the connection. How would we then kill/restart the service if 
it doesn't respond as expected?
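
To make the question concrete, here is a rough sketch of the kind of check we have in mind. It is not a working supervisord plug-in: the function names, the timeout, and the idea of sending a probe and waiting for a reply are all our own assumptions, and the pid of the stuck instance is assumed to be known already. The hope is that once the stuck process is really gone, supervisord would restart it as it normally does for an exited process.

```python
import os
import signal
import socket


def service_responds(host, port, timeout=5.0, probe=b"", expect_reply=False):
    """Return True if a TCP connection succeeds (and, optionally, a reply arrives).

    Hypothetical helper: `probe` would be whatever bytes make the service
    answer; we have not worked out what that is for our Thrift server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if probe:
                conn.sendall(probe)
            if expect_reply:
                # A hung process can still have connections accepted on its
                # behalf, so wait for at least one byte of actual data.
                return bool(conn.recv(1))
            return True
    except OSError:
        # Covers refused connections, timeouts, and resets.
        return False


def kill_if_unresponsive(pid, host, port, timeout=5.0):
    """Send SIGKILL to `pid` if the check fails; return True if a kill was sent."""
    if service_responds(host, port, timeout):
        return False
    os.kill(pid, signal.SIGKILL)
    return True
```

A plain connect would probably not have caught our failure, since the stuck instance was still bound to the port, which is why the sketch has the `expect_reply` option.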

[1]: http://incubator.apache.org/thrift/
_______________________________________________
Supervisor-users mailing list
[email protected]
http://lists.supervisord.org/mailman/listinfo/supervisor-users
