Yep, it occurs after 30 minutes of inactivity, down to the minute.  I hit
the site at 3:40, tried again at 4:10, and sure enough:

E, [2014-08-04T16:10:52.143541 #2596] ERROR -- : worker=0 PID:2599 timeout (21s > 20s), killing
E, [2014-08-04T16:10:52.158459 #2596] ERROR -- : reaped #<Process::Status: pid 2599 SIGKILL (signal 9)> worker=0
I, [2014-08-04T16:10:52.181648 #3086]  INFO -- : worker=0 ready

2014/08/04 16:10:52 [error] 1684#0: *13 upstream prematurely closed
connection while reading response header from upstream, client: *.*.*.*,
server: ***.org, request: "GET /outages HTTP/1.1", upstream:
"http://unix:/var/www/sites/***/shared/sockets/.unicorn.sock.0:/outages",
host: "***.org", referrer: "http://***.org/outages"

===

This occurs on both instances of unicorn workers that we have opened.  I'm
going to reduce that to one instance, per Eric, to continue troubleshooting
in the smallest possible way.

1) It does not appear to be an nginx persistent connection issue, because
once the worker is reaped and restarted, nginx serves the content with no
problems.
2) No NFS mounts, no file locks, no FIFO issues.  (Note: one of the apps
does write to files other than logs, but the problem exists in both apps.)

It's also important to note that once the worker is reaped the site is
blazingly fast, with sub-second responses (2s at most, for the biggest
page).  That lasts until the next 30 minutes of inactivity, at which point
the timeout hits and the worker is reaped again (rinse and repeat).
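
To check the 30-minute pattern against the log rather than by hand, here is
a rough Python sketch that pulls the worker-timeout lines out of the unicorn
log and reports the gap between consecutive kills.  It assumes the log format
shown above, with each entry on a single line; the function names are my own,
not anything unicorn provides.  The gap between kills is only a lower bound on
the idle time, so treat it as a sanity check.

```python
import re
from datetime import datetime

# Matches unicorn's "worker=N PID:M timeout (...), killing" error lines,
# assuming the log format quoted above (one entry per line).
TIMEOUT_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) #\d+\]\s+ERROR -- : "
    r"worker=(?P<worker>\d+) PID:(?P<pid>\d+) timeout"
)


def timeout_events(lines):
    """Return (timestamp, worker, pid) for each worker-timeout line."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, int(m.group("worker")), int(m.group("pid"))))
    return events


def gaps_in_minutes(events):
    """Minutes elapsed between consecutive timeout kills."""
    return [
        (later[0] - earlier[0]).total_seconds() / 60.0
        for earlier, later in zip(events, events[1:])
    ]
```

Feeding it the lines of the unicorn stderr log (wherever yours is written)
and printing gaps_in_minutes(timeout_events(lines)) should show gaps of 30
minutes or more if the kills really do track inactivity.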

On the database side, the DBA says inactive connections are killed after
3 hours, a far longer span than the 30 minutes at which this issue occurs.

Have any other ideas of places I can look?  It's too consistent, it has to
be some specific setting or functionality that does this.

I checked my TCP Timeout settings just in case, but the timeout is set to
2hrs.
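
For anyone who wants to check the same thing, a small Python sketch that
reads the Linux keepalive knobs from /proc.  The three file names are the
standard tcp_keepalive_* sysctls; the function names are just mine.  Keep in
mind these values only apply to sockets that have SO_KEEPALIVE enabled.

```python
from pathlib import Path

# Standard Linux TCP keepalive sysctls (documented in ip-sysctl.txt).
KEEPALIVE_KNOBS = (
    "tcp_keepalive_time",    # idle seconds before the first probe
    "tcp_keepalive_intvl",   # seconds between probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_keepalive(base="/proc/sys/net/ipv4"):
    """Return {knob: int value} for the keepalive sysctls under `base`."""
    return {
        name: int((Path(base) / name).read_text().strip())
        for name in KEEPALIVE_KNOBS
    }


def seconds_until_dead_peer_detected(knobs):
    """Idle time before the first probe plus the time to exhaust all probes."""
    return (
        knobs["tcp_keepalive_time"]
        + knobs["tcp_keepalive_intvl"] * knobs["tcp_keepalive_probes"]
    )
```

On a stock Linux box read_keepalive() should report 7200 for
tcp_keepalive_time, which is the 2 hours mentioned above.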


On Mon, Aug 4, 2014 at 3:34 PM, Eric Wong <e...@80x24.org> wrote:

> Daniel Condomitti <dan...@condomitti.com> wrote:
> > It could also be that your TCP keepalive interval is higher than your
> > database server’s connection timeout. I’ve run into that in the past.
>
> That kicks in at around 2 hours by default on Linux systems.
> I'm not sure it would matter for Tony's case since he hit it
> after ~30 minutes of idle (unless he tuned the knobs himself).
>
> ref: tcp_keep* knobs in
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/ip-sysctl.txt
>
> unicorn itself has no timers outside of the configurable timeout.
>

