I appreciate all your help, Eric and Daniel.  I have not solved this yet,
but I think I have narrowed it down to a firewall timeout issue.  One app
uses a database connection to Oracle; the other app uses a third-party API
(still on site, but across the network).  The ping times to both of these
devices are extremely fast, but 30 minutes of inactivity across the
firewall seems to drop these connections.  At least that appears to be
what the strace is telling me.  The place in the strace where the timeout
occurs is the same every time.  For example, the strace of the app that
connects to Oracle shows this:

[pid  7825] write(14,
"\0\373\0\0\6\0\0\0\0\0\21iB\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\v\0\0\0\3^Ca\201\0\0\0\0\0\0\376\377\377\377\377\377\377\377\22\0\0\0\376\377\377\377\377\377\377\377\r\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\0\0\0\0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\22select
1 from
dual\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
251) = 251
[pid  7825] read(14,  <unfinished ...>
[pid  7827] +++ killed by SIGKILL +++
PANIC: handle_group_exit: 7827 leader 7825
[pid  7846] +++ killed by SIGKILL +++
PANIC: handle_group_exit: 7846 leader 7825
+++ killed by SIGKILL +++

That is clearly the database query 'select 1 from dual'.  The write
succeeds, but the read of the response never completes before the process
is killed.  If I watch lsof -p <pid> at the same time, I see the database
connection drop after 30 minutes.
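
For anyone reading this later: a common mitigation for this kind of
idle-timeout drop is TCP keepalive, so that an otherwise idle connection
still passes traffic across the firewall before the 30-minute limit is
reached.  The sketch below only illustrates the mechanism on a plain
socket.  The function name and the 10-minute idle value are assumptions
made for illustration, not something taken from the apps in this thread,
and the Oracle client and the third-party API library own their sockets,
so in practice the equivalent setting would have to be made in their
configuration.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive for sock (hypothetical helper).

    idle_s is deliberately well below the 30-minute firewall timeout.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (present on Linux),
    # so each one is applied only if this platform exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the drop
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Whether keepalive probes are enough to reset the idle timer depends on the
firewall, so this would need to be verified against the device in question.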

I'll update this thread again once it is solved, for the archives (in case
someone else experiences something similar).

Again, thank you for your help, Eric!


On Mon, Aug 4, 2014 at 4:46 PM, Eric Wong <e...@80x24.org> wrote:

> Eric Wong <e...@80x24.org> wrote:
> > Did you try strace-ing for 30 minutes and reproducing the error?
>
> You can also try setting the unicorn timeout to longer than 30
> minutes and get a longer/stalled strace.
>

