Hi, Nathan. Is the couch under heavy load? Thanks.

On Wed, Aug 14, 2013 at 6:15 AM, Joan Touzet <[email protected]> wrote:

> On Tue, Aug 13, 2013 at 02:49:28PM -0500, Nathan Vander Wilt wrote:
> > I've got 1.7GB disk free and 2GB of memory available at the moment, so
> > it doesn't seem to be either of those. (I could not find any out-of-memory
> > process kill logs in /var/log/syslog.) The only clue I can find is in
> > couchdb.stderr:
> >     heart_beat_kill_pid = 1390
> >     heart_beat_timeout = 11
> >     heart: Tue Aug 13 18:34:21 2013: heart-beat time-out, no activity for 15 seconds
> >     Killed
>
> So 15s of system clock time passed without Erlang's heart receiving a
> ping back. There are a number of possibilities. For instance, if this is
> a VM and the clock was advanced/changed by 15s to synchronize with the
> main system, heart might see that and issue a kill command. Another
> could be extremely heavy load on the system forcing the second couch
> process to get swapped out.
>
> Three suggestions:
>
>   1. Set RESPAWN_TIMEOUT to a non-zero value to force couch to restart
>      after a kill. Because of its crash-only design this is safe, and
>      since restarts are rare you're unlikely to run into serious
>      issues.
>   2. Crank up logging to debug level to see what might be going on
>      when the heartbeat fails to respond.
>   3. Add some additional system monitoring to ensure that you're not
>      overloading your system on CPU, RAM, I/O or network traffic.
>      Do you have a lot of views building / heavy system load due to
>      couchjs processes?
>
> --
> Joan Touzet | [email protected] | wohali everywhere else
>
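To tell the two causes Joan mentions apart (a clock step versus heavy load), a small watcher along these lines might help. This is only a rough sketch, not anything that ships with CouchDB or Erlang: the names clock_jump and watch, the 1-second tolerance, and the 5-second interval are all made up for illustration. It compares wall-clock time against the monotonic clock, so a step of roughly 15s in the wall clock would show up as a jump, and it logs the 1-minute load average next to it.

```python
import os
import time


def clock_jump(wall_elapsed, mono_elapsed, tolerance=1.0):
    """Return how far the wall clock moved beyond the monotonic clock.

    Both arguments are elapsed seconds over the same interval. Differences
    within `tolerance` (an arbitrary illustrative threshold) count as 0.0.
    """
    drift = wall_elapsed - mono_elapsed
    return drift if abs(drift) > tolerance else 0.0


def watch(interval=5.0, samples=3):
    """Print the 1-minute load average and any wall-clock jump per sample."""
    last_wall, last_mono = time.time(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        wall, mono = time.time(), time.monotonic()
        jump = clock_jump(wall - last_wall, mono - last_mono)
        load1, _load5, _load15 = os.getloadavg()  # Unix-like systems only
        print(f"{time.ctime(wall)} load1={load1:.2f} jump={jump:+.1f}s")
        last_wall, last_mono = wall, mono
```

Calling watch() with a larger samples value and leaving it running alongside couch should show whether a kill lines up with a clock jump or with a load spike. It needs Python 3.3 or later for time.monotonic().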
