Re: pid file handling issue
Michael Fischer wrote:
> (1) Previous-generation parent (P) receives SIGUSR2.
> (2) P renames unicorn.pid to unicorn.oldpid
> (3) P forks child (P'); if fork unsuccessful, P renames unicorn.oldpid
> to unicorn.pid.
> (4) P' calls exec and attempts to start; creates unicorn.pid. P
> watches for SIGCHLD from P'. If received, P renames unicorn.oldpid to
> unicorn.pid.
> (5) P' sends SIGQUIT to P. P' unlinks unicorn.oldpid. P' is now P.
>
> What am I missing here? This is, to my knowledge, precisely what
> nginx does
> (http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly).

OK, this is probably safe and does what you want. It's sitting in
master for now:

-- 8<
From: Eric Wong
Subject: [PATCH] attempt to rename PID file when possible

This will preserve mtime on successful renames for comparisons.

While we're at it, avoid writing the new PID until the listeners are
inherited successfully. This can be useful to avoid accidentally
clobbering a good PID if binding the listener or building the app
(preload_app==true) fails
---
 lib/unicorn/http_server.rb | 48 +++---
 1 file changed, 37 insertions(+), 11 deletions(-)

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index bed24d0..cd160c5 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -134,11 +134,22 @@ class Unicorn::HttpServer
     # Note that signals don't actually get handled until the #join method
     QUEUE_SIGS.each { |sig| trap(sig) { SIG_QUEUE << sig; awaken_master } }
     trap(:CHLD) { awaken_master }
-    self.pid = config[:pid]
+
+    # write pid early for Mongrel compatibility if we're not inheriting sockets
+    # This was needed for compatibility with some health checker a long time
+    # ago. This unfortunately has the side effect of clobbering valid PID
+    # files.
+    self.pid = config[:pid] unless ENV["UNICORN_FD"]
 
     self.master_pid = $$
     build_app! if preload_app
     bind_new_listeners!
+
+    # Assuming preload_app==false, we drop the pid file after the app is ready
+    # to process requests. If binding or build_app! fails with
+    # preload_app==true, we'll never get here and the parent will recover
+    self.pid = config[:pid] if ENV["UNICORN_FD"]
+
     spawn_missing_workers
     self
   end
@@ -180,6 +191,21 @@ class Unicorn::HttpServer
     Unicorn::HttpRequest::DEFAULTS["rack.logger"] = @logger = obj
   end
 
+  def clobber_pid(path)
+    unlink_pid_safe(@pid) if @pid
+    if path
+      fp = begin
+        tmp = "#{File.dirname(path)}/#{rand}.#$$"
+        File.open(tmp, File::RDWR|File::CREAT|File::EXCL, 0644)
+      rescue Errno::EEXIST
+        retry
+      end
+      fp.syswrite("#$$\n")
+      File.rename(fp.path, path)
+      fp.close
+    end
+  end
+
   # sets the path for the PID file of the master process
   def pid=(path)
     if path
@@ -194,18 +220,18 @@ class Unicorn::HttpServer
           "(or pid=#{path} is stale)"
       end
     end
-    unlink_pid_safe(pid) if pid
-    if path
-      fp = begin
-        tmp = "#{File.dirname(path)}/#{rand}.#$$"
-        File.open(tmp, File::RDWR|File::CREAT|File::EXCL, 0644)
-      rescue Errno::EEXIST
-        retry
+    # rename the old pid if possible
+    if @pid && path
+      begin
+        File.rename(@pid, path)
+      rescue Errno::ENOENT, Errno::EXDEV
+        # a user may have accidentally removed the original.
+        # Obviously cross-FS renames
+        clobber_pid(path)
       end
-      fp.syswrite("#$$\n")
-      File.rename(fp.path, path)
-      fp.close
+    else
+      clobber_pid(path)
     end
     @pid = path
   end

-- 
1.8.4.483.g7fe67e6.dirty
___
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying
Re: pid file handling issue
Michael Fischer wrote:
> On Thu, Oct 24, 2013 at 11:21 AM, Eric Wong wrote:
>
> > Right, we looked at using rename last year but I didn't think it's possible
> > given we need to write the pid file before binding new listen sockets
> >
> > http://mid.gmane.org/20121127215146.ga23...@dcvr.yhbt.net
> >
> > But perhaps we can drop the pid file late iff ENV["UNICORN_FD"] is
> > detected. I'll see if that can be done w/o breaking compatibility.
>
> My opinion is that supporting backward compatibility cases that are
> clearly poorly designed, at least in open-source software, is
> ill-advised. (I'm referring to the Mongrel compatibility semantics
> discussed in that article.)
>
> That aside, I don't yet understand this "need" you're referring to.
> The control flow I'm proposing is as follows:

I'm not really sure, either; I just remember it was somewhat important
to Mongrel back then. I'll get back to this later today/tomorrow.
Your control flow looks correct, though.

> > But NTP syncs early in the boot process before most processes (including
> > unicorn) are started. It shouldn't matter, then, right?
>
> Truth be told, I'm not completely certain why this is an issue. My
> reading of procps and the kernel suggests it should be doing the right
> thing, but I tried this at first:
>
> - Touch a timestamp file before sending P a SIGUSR2.
> - Wait for oldpid to disappear
> - Read the stime field from ps(1) for the remaining master process (P or P')
> - If stime < mtime of timestamp: new process failed. If stime >
> mtime, new process succeeded.
>
> But for reasons unclear to me, sometimes the stime of P' (successful
> reload) would predate the timestamp! This was obviously agonizing.

OK, comparing mtime vs calculated value of stime is not possible because
of time adjustments.

Process start time is stored as monotonic time, and calculated in ps(1)
to real clock time. So you can only compare stimes between different
processes. Comparing stime to the mtime/ctime/atime of any file will
not work reliably.
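
As a rough illustration of comparing start times between processes rather than against file timestamps, the sketch below pulls the raw `starttime` value out of `/proc/<pid>/stat` on Linux (field 22, in clock ticks since boot, per proc(5)). It is only a sketch under those assumptions: the helper names `starttime_ticks` and `started_before?` are hypothetical and nothing here is part of unicorn.

```ruby
# Hypothetical helpers (not part of unicorn); Linux-only, layout per proc(5).

# Extract starttime (field 22) from the contents of /proc/<pid>/stat.
# The comm field (field 2) is wrapped in parentheses and may itself contain
# spaces or ")", so split only what follows the LAST ")".
def starttime_ticks(stat_line)
  rest = stat_line[(stat_line.rindex(")") + 1)..-1].split
  Integer(rest[19]) # rest[0] is field 3 (state), so field 22 is rest[19]
end

# Compare two processes by their raw start times; no wall clock is involved,
# so later clock adjustments cannot reorder the result.
def started_before?(stat_a, stat_b)
  starttime_ticks(stat_a) < starttime_ticks(stat_b)
end
```

On a Linux host this might be called as `started_before?(File.read("/proc/#{old_pid}/stat"), File.read("/proc/#{new_pid}/stat"))`, where `old_pid` and `new_pid` are placeholders for the two master PIDs.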
Re: pid file handling issue
On Thu, Oct 24, 2013 at 11:21 AM, Eric Wong wrote:

> Right, we looked at using rename last year but I didn't think it's possible
> given we need to write the pid file before binding new listen sockets
>
> http://mid.gmane.org/20121127215146.ga23...@dcvr.yhbt.net
>
> But perhaps we can drop the pid file late iff ENV["UNICORN_FD"] is
> detected. I'll see if that can be done w/o breaking compatibility.

My opinion is that supporting backward compatibility cases that are
clearly poorly designed, at least in open-source software, is
ill-advised. (I'm referring to the Mongrel compatibility semantics
discussed in that article.)

That aside, I don't yet understand this "need" you're referring to.
The control flow I'm proposing is as follows:

(1) Previous-generation parent (P) receives SIGUSR2.
(2) P renames unicorn.pid to unicorn.oldpid
(3) P forks child (P'); if fork unsuccessful, P renames unicorn.oldpid
to unicorn.pid.
(4) P' calls exec and attempts to start; creates unicorn.pid. P
watches for SIGCHLD from P'. If received, P renames unicorn.oldpid to
unicorn.pid.
(5) P' sends SIGQUIT to P. P' unlinks unicorn.oldpid. P' is now P.

What am I missing here? This is, to my knowledge, precisely what
nginx does
(http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly).

>> If the file's mtime or inode number changes under my proposal, that
>> means the reload must have been successful. What race condition are
>> you referring to that would render this conclusion inaccurate?
>
> It doesn't mean the process didn't exit/crash right after writing the PID.

That should not happen per (4) above.

> But NTP syncs early in the boot process before most processes (including
> unicorn) are started. It shouldn't matter, then, right?

Truth be told, I'm not completely certain why this is an issue. My
reading of procps and the kernel suggests it should be doing the right
thing, but I tried this at first:

- Touch a timestamp file before sending P a SIGUSR2.
- Wait for oldpid to disappear
- Read the stime field from ps(1) for the remaining master process (P or P')
- If stime < mtime of timestamp: new process failed. If stime >
mtime, new process succeeded.

But for reasons unclear to me, sometimes the stime of P' (successful
reload) would predate the timestamp! This was obviously agonizing.

>> To reiterate, I'm not using the PID file in this instance to determine
>> Unicorn's PID. It could be empty, for all I care.
>
> OK. I assume you do the same for nginx?

With nginx we have -t; we can at least test the config file and have a
reasonable degree of certainty that it will reload properly. With
Rack apps, not so much. :)

--Michael
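
The file-level part of steps (2) through (5) above can be sketched roughly as follows. This is a hedged illustration only, not unicorn's or nginx's implementation: `handoff_pid_file` is a hypothetical name, and the block stands in for the fork/exec of P', which is assumed to create the new pid file itself and return a truthy value on success.

```ruby
# Hypothetical sketch of steps (2)-(5); not unicorn's implementation.
# The block stands in for fork+exec of the new master (P') and must
# create pid_path itself, returning a truthy value on success.
def handoff_pid_file(pid_path, old_path)
  File.rename(pid_path, old_path)     # (2) park the current pid file
  started = begin
    yield                             # (3)-(4) try to start P'
  rescue StandardError
    false
  end
  if started
    File.unlink(old_path)             # (5) P' took over; drop the old file
    true
  else
    File.rename(old_path, pid_path)   # roll back: same inode and mtime as before
    false
  end
end
```

Because the rollback is a plain rename, a failed upgrade leaves the pid file's inode and mtime untouched, which is the property an external watcher would rely on.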
Re: pid file handling issue
Michael Fischer wrote:
> On Wed, Oct 23, 2013 at 7:03 PM, Eric Wong wrote:
>
> >> > I read and stash the value of the pid file before issuing any USR2.
> >> > Later, you can issue "kill -0 $old_pid" after sending SIGQUIT
> >> > to ensure it's dead.
> >>
> >> That's inherently racy; another process can claim the old PID in the
> >> interim.
> >
> > Right, but raciness goes for anything regarding pid files.
> >
> > The OS does make an effort to avoid recycling PIDs too often,
> > and going through all the PIDs in a system quickly is
> > probably rare. I haven't hit it, at least.
>
> That's not good enough.
>
> The fact that the pid file contains a pid is immaterial to me; I don't
> even need to look at it. I only care about when it was created, or
> what its inode number is, so that I can detect whether Unicorn was
> last successfully started or restarted. rename(2) is atomic per POSIX
> and is not subject to race conditions.

Right, we looked at using rename last year but I didn't think it's possible
given we need to write the pid file before binding new listen sockets

http://mid.gmane.org/20121127215146.ga23...@dcvr.yhbt.net

But perhaps we can drop the pid file late iff ENV["UNICORN_FD"] is
detected. I'll see if that can be done w/o breaking compatibility.

> >> > Checking the mtime of the pidfile is really bizarre...
> >>
> >> Perhaps (though it's a normative criticism), but on the other hand, it
> >> isn't subject to the race above.
> >
> > It's still racy in a different way, though (file could change right
> > after checking).
>
> If the file's mtime or inode number changes under my proposal, that
> means the reload must have been successful. What race condition are
> you referring to that would render this conclusion inaccurate?

It doesn't mean the process didn't exit/crash right after writing the PID.

> > Having the process start time in /proc be unreliable because the server
> > has the wrong time is also in the same category of corner cases.
>
> This is absolutely not true. A significant minority, if not a
> majority, of servers will have at least slightly inaccurate wall
> clocks on boot. This is usually corrected during boot by an NTP sync,
> but by then the die has already been cast insofar as ps(1) output is
> concerned.

But NTP syncs early in the boot process before most processes (including
unicorn) are started. It shouldn't matter, then, right?

> > Also, can you check the inode of the /proc/$pid entry? Perhaps
>
> That's not portable.
>
> > PID files are horrible, really :<
>
> To reiterate, I'm not using the PID file in this instance to determine
> Unicorn's PID. It could be empty, for all I care.

OK. I assume you do the same for nginx?
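
For reference, the `kill -0 $old_pid` check quoted in this message has a direct Ruby equivalent. The sketch below uses a hypothetical helper name, `process_alive?`, and carries the same caveat raised in the thread: the PID may have been recycled by an unrelated process by the time the check runs.

```ruby
# Ruby equivalent of `kill -0 $pid`: signal 0 performs the existence and
# permission checks without delivering a signal.
# Hypothetical helper; subject to the PID-reuse race discussed in the thread.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end
```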
Re: pid file handling issue
On Wed, Oct 23, 2013 at 7:03 PM, Eric Wong wrote:

>> > I read and stash the value of the pid file before issuing any USR2.
>> > Later, you can issue "kill -0 $old_pid" after sending SIGQUIT
>> > to ensure it's dead.
>>
>> That's inherently racy; another process can claim the old PID in the interim.
>
> Right, but raciness goes for anything regarding pid files.
>
> The OS does make an effort to avoid recycling PIDs too often,
> and going through all the PIDs in a system quickly is
> probably rare. I haven't hit it, at least.

That's not good enough.

The fact that the pid file contains a pid is immaterial to me; I don't
even need to look at it. I only care about when it was created, or
what its inode number is, so that I can detect whether Unicorn was
last successfully started or restarted. rename(2) is atomic per POSIX
and is not subject to race conditions.

>> > Checking the mtime of the pidfile is really bizarre...
>>
>> Perhaps (though it's a normative criticism), but on the other hand, it
>> isn't subject to the race above.
>
> It's still racy in a different way, though (file could change right
> after checking).

If the file's mtime or inode number changes under my proposal, that
means the reload must have been successful. What race condition are
you referring to that would render this conclusion inaccurate?

> Having the process start time in /proc be unreliable because the server
> has the wrong time is also in the same category of corner cases.

This is absolutely not true. A significant minority, if not a
majority, of servers will have at least slightly inaccurate wall
clocks on boot. This is usually corrected during boot by an NTP sync,
but by then the die has already been cast insofar as ps(1) output is
concerned.

> Also, can you check the inode of the /proc/$pid entry? Perhaps

That's not portable.

> PID files are horrible, really :<

To reiterate, I'm not using the PID file in this instance to determine
Unicorn's PID. It could be empty, for all I care.

--Michael
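
The watcher-side check described in this message (ignore the file's contents, look only at its inode and mtime) might look something like the sketch below. It assumes the server only ever produces a fresh pid file on a successful start or restart and uses plain renames otherwise; `PidFileSnapshot`, `snapshot_pid_file`, and `reloaded?` are hypothetical names, not unicorn APIs.

```ruby
# Hypothetical watcher-side helpers; not part of unicorn.
PidFileSnapshot = Struct.new(:ino, :mtime)

# Record the identity of the pid file without reading its contents.
def snapshot_pid_file(path)
  st = File.stat(path)
  PidFileSnapshot.new(st.ino, st.mtime)
rescue Errno::ENOENT
  nil # pid file is currently absent (for example, mid-handoff)
end

# True once a pid file exists at +path+ whose inode or mtime differs from
# the snapshot taken before SIGUSR2 was sent.
def reloaded?(before, path)
  now = snapshot_pid_file(path)
  !now.nil? && now != before
end
```

A deploy script would take the snapshot, send SIGUSR2, and then poll `reloaded?` until it returns true or a timeout expires. As noted elsewhere in the thread, a changed file does not by itself rule out the new master exiting shortly after it wrote the file.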
Re: Forking non web processes
On Thu, Oct 24, 2013 at 9:17 AM, Eric Wong wrote:

> I'm also wondering why... sidekiq/resque are standalone daemons
> themselves. Shouldn't that be done as part of the deploy/init process?
> (unicorn isn't going to become init/upstart/systemd)

Agree with Eric here. You probably want to run unicorn and sidekiq /
resque in a way that they're not coupled to one another. They should
have different startup scripts and monitoring properties. And
eventually you may want to move your background worker processes to
another machine.

- alex sharp
Re: Forking non web processes
Sam Saffron wrote:
> Hi Eric,
>
> I have been trying to get unicorn to allow me to fork off non-web
> processes like sidekiq/resque.
>
> I got this working, except that I am constantly fighting with the
> unicorn reaper. Any chance we can add some sort of api to fork off non
> web processes? It helps save memory and cut down on master processes.

I've been trying to avoid adding unicorn-specific APIs unless absolutely
necessary.

You're forking off from the master? Worst case is you'll get a log
message about an unknown process, right?

I'm also wondering why... sidekiq/resque are standalone daemons
themselves. Shouldn't that be done as part of the deploy/init process?
(unicorn isn't going to become init/upstart/systemd)
Forking non web processes
Hi Eric,

I have been trying to get unicorn to allow me to fork off non-web
processes like sidekiq/resque.

I got this working, except that I am constantly fighting with the
unicorn reaper. Any chance we can add some sort of api to fork off non
web processes? It helps save memory and cut down on master processes.

Cheers
Sam