On Mon, 2010-06-07 at 10:00 +0100, Phillip Oldham wrote:
> We noticed an odd error over the weekend, and would like some advice.
>
> One of our "services", a Python thrift[1] server, which binds to a port
> had an error and stopped responding to requests. Supervisord "saw" this,
> and tried to bring up another instance.

I think you might mean that superlance httpok saw this and tried to
bring up another instance? "Raw" supervisor doesn't monitor process
behavior, only process up/down status.

> However the original instance
> hadn't actually exited, so was still running and was still bound to the
> port. Over the weekend supervisord brought up a number of instances of
> the service, so in total we found ~30 running instances none of which
> were responding correctly.
>
> We are about to script a plug-in for supervisord to "ping" the service
> to monitor the connection. How would we then kill/restart the service if
> it doesn't respond as expected?

I think you probably need to answer the above question and maybe provide
your current config so we can figure out what's going on before any
other advice can be given.

- C
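
For reference, here is a rough sketch of the kind of "ping, then restart" check being asked about. It is only an illustration under several assumptions, not a recommendation for this particular setup: it assumes supervisord's XML-RPC interface is enabled (e.g. via an [inet_http_server] section), and the process name, host, port and URL in the comments are hypothetical. Note also that a bare TCP connect only shows that something is accepting connections on the port; a hung server may still pass it, so a real check would probably need to make an actual Thrift call.

```python
import socket
import xmlrpc.client


def port_is_responding(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def restart_process(rpc, name):
    """Stop, then start, one supervisor-managed process over XML-RPC."""
    try:
        rpc.supervisor.stopProcess(name)
    except xmlrpc.client.Fault:
        # Assumption: a fault here means the process was not running,
        # so it is safe to go straight to starting it.
        pass
    rpc.supervisor.startProcess(name)


def check_and_restart(rpc, name, host, port, timeout=5.0):
    """Restart `name` if its port is not responding; return True if restarted."""
    if port_is_responding(host, port, timeout=timeout):
        return False
    restart_process(rpc, name)
    return True


# Example wiring (hypothetical names and addresses):
#   rpc = xmlrpc.client.ServerProxy("http://localhost:9001/RPC2")
#   check_and_restart(rpc, "thrift_service", "127.0.0.1", 9090)
```

Going through supervisor's own stop/start means supervisor stays in charge of the process, rather than a script killing it behind supervisor's back.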
