Yep, it occurs after 30 minutes of inactivity. Down to the minute; I hit the site at 3:40 and tried at 4:10 and sure enough:

E, [2014-08-04T16:10:52.143541 #2596] ERROR -- : worker=0 PID:2599 timeout (21s > 20s), killing
E, [2014-08-04T16:10:52.158459 #2596] ERROR -- : reaped #<Process::Status: pid 2599 SIGKILL (signal 9)> worker=0
I, [2014-08-04T16:10:52.181648 #3086]  INFO -- : worker=0 ready

2014/08/04 16:10:52 [error] 1684#0: *13 upstream prematurely closed connection while reading response header from upstream, client: *.*.*.*, server: ***.org, request: "GET /outages HTTP/1.1", upstream: "http://unix:/var/www/sites/***/shared/sockets/.unicorn.sock.0:/outages", host: "***.org", referrer: "http://***.org/outages"

===

This occurs on both instances of unicorn workers that we have opened. I'm
going to reduce that to one instance, per Eric, to continue troubleshooting
in the smallest possible way.

1) It does not appear to be an nginx persistent connection issue, because
once the worker is reaped and restarted, nginx serves the content with no
problems.

2) No NFS mounts, no file locks, no FIFO issues. (Note: one of the apps
does write to files, aside from logs, but the problem exists in both apps.)

It's also important to note that once the worker is reaped the site is
blazingly fast, with sub-second responses (at most 2s to show the biggest
page). That lasts until 30 minutes of inactivity, at which point the
timeout hits and the worker is reaped (rinse and repeat).

For the database portion, the DBA says idle connections are killed after
3 hours, a far greater time span than the one in which this issue occurs.

Have any other ideas of places I can look? It's too consistent; it has to
be some specific setting or functionality that does this. I checked my TCP
timeout settings just in case, but the timeout is set to 2hrs.

On Mon, Aug 4, 2014 at 3:34 PM, Eric Wong <e...@80x24.org> wrote:

> Daniel Condomitti <dan...@condomitti.com> wrote:
> > It could also be that your TCP keepalive interval is higher than your
> > database server’s connection timeout. I’ve run into that in the past.
>
> That kicks in at around 2 hours by default on Linux systems.
> I'm not sure it would matter for Tony's case since he hit it
> after ~30 minutes of idle (unless he tuned the knobs himself).
>
> ref: tcp_keep* knobs in
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/ip-sysctl.txt
>
> unicorn itself has no timers outside of the configurable timeout.
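
For anyone wanting to double-check the tcp_keep* knobs mentioned above, here is a rough sketch in Ruby. It assumes a Linux host exposing the values under /proc/sys/net/ipv4; the helper names (read_knob, keepalive_summary) are made up for illustration and are not part of unicorn. Keepalive probes only apply to sockets that have SO_KEEPALIVE enabled, so these numbers say nothing about connections that never set it.

```ruby
# Illustrative helpers only (hypothetical names, not a unicorn API).
# Assumes the Linux sysctl files under /proc/sys/net/ipv4.
KEEPALIVE_DIR = "/proc/sys/net/ipv4"

# Returns the integer value of a sysctl file, or nil if it is not readable.
def read_knob(name, dir = KEEPALIVE_DIR)
  path = File.join(dir, name)
  File.readable?(path) ? Integer(File.read(path).strip) : nil
end

# time:   idle seconds before the first keepalive probe is sent
# intvl:  seconds between unanswered probes
# probes: unanswered probes before the connection is declared dead
def keepalive_summary(time:, intvl:, probes:)
  {
    first_probe_after: time,
    dead_peer_detected_after: time + intvl * probes
  }
end

if __FILE__ == $PROGRAM_NAME
  knobs = {
    time:   read_knob("tcp_keepalive_time"),
    intvl:  read_knob("tcp_keepalive_intvl"),
    probes: read_knob("tcp_keepalive_probes")
  }

  if knobs.values.any?(&:nil?)
    puts "tcp_keepalive_* knobs not readable (not a Linux host?)"
  else
    summary = keepalive_summary(**knobs)
    puts "first probe after #{summary[:first_probe_after]}s idle"
    puts "dead peer detected after #{summary[:dead_peer_detected_after]}s idle"
  end
end
```

With the stock Linux values (7200, 75, 9) the first probe goes out after 2 hours of idle, which matches the figure quoted above and is well past the 30 minute mark seen here.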