Hi, I'm trying to solve an issue I was having with supervisord. It would quick unexpectedly.
I ran strace on supervisord and saw this going to stderr when it failed. Traceback (most recent call last): File "/usr/local/bin/supervisord", line 9, in <module> load_entry_point(supervisor==3.0a12, console_scripts, supervisord)() File "/usr/local/lib/python2.7/dist-packages/supervisor/supervisord.py", line 363, in main go(options) File "/usr/local/lib/python2.7/dist-packages/supervisor/supervisord.py", line 373, in go d.main() File "/usr/local/lib/python2.7/dist-packages/supervisor/supervisord.py", line 83, in main self.run() File "/usr/local/lib/python2.7/dist-packages/supervisor/supervisord.py", line 100, in run self.runforever() File "/usr/local/lib/python2.7/dist-packages/supervisor/supervisord.py", line 254, in runforever self.reap() File "/usr/local/lib/python2.7/dist-packages/supervisor/supervisord.py", line 288, in reap process.finish(pid, sts) File "/usr/local/lib/python2.7/dist-packages/supervisor/process.py", line 429, in finish self._assertInState(ProcessStates.STARTING) File "/usr/local/lib/python2.7/dist-packages/supervisor/process.py", line 172, in _assertInState self.config.name, current_state, allowable_states)) AssertionError: Assertion failed for nginx: RUNNING not in STARTING The code that is failing comes from here... elif tooquickly: # the program did not stay up long enough to make it to RUNNING # implies STARTING -> BACKOFF self.exitstatus = None self.spawnerr = 'Exited too quickly (process log may have details)' msg = "exited: %s (%s)" % (processname, msg + "; not expected") self._assertInState(ProcessStates.STARTING) self.change_state(ProcessStates.BACKOFF) The problem is that "tooquickly" is True when it should be False, because the time is screwed up somewhere. tooquickly = now - self.laststart < self.config.startsecs Here's what I saw when it failed... tooquickly=True self.killing=0 now=1343447897.04 self.laststart=1343447916.56 self.config.startsecs=1 Notice "self.laststart" is greater than "now". self.laststart is initialized to 0 and later is set to time.time() now is set to time.time() just before this call. So the only way this could be going wrong is if time.time() is going backwards. time.time()¶ […] While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls. I did a little debugging and the numbers are all over the place. It's not just the call to time.time(), because setting the variable and printing the variable show a sequence that does not line up. set now=1343449004.92 tooquickly=False now=1343449004.92 self.laststart=1343449000.73 set self.laststart=1343449005.28 set now=1343449007.64 tooquickly=False now=1343449007.64 self.laststart=1343448867.24 set self.laststart=1343449007.71 set now=1343449009.47 tooquickly=False now=1343449009.47 self.laststart=1343448864.52 set self.laststart=1343449009.62 set now=1343449009.85 tooquickly=False now=1343449009.85 self.laststart=1343449005.28 set self.laststart=1343449010.05 set now=1343449010.69 tooquickly=False now=1343449010.69 self.laststart=1343448989.66 set self.laststart=1343449010.82 set now=1343449012.45 tooquickly=False now=1343449012.45 self.laststart=1343448927.71 set self.laststart=1343449012.49 set now=1343449014.54 tooquickly=False now=1343449014.54 self.laststart=1343449010.05 set self.laststart=1343449014.62 set now=1343449000.04 tooquickly=True now=1343449000.04 self.laststart=1343449014.62 So my questions is if anyone has experienced this or has an idea what might be happening. Does supervisord use threads or anything similar which could be causing this? Thanks, Phil _______________________________________________ Supervisor-users mailing list Supervisor-users@lists.supervisord.org http://lists.supervisord.org/mailman/listinfo/supervisor-users