Hi, Nathan. Is the couch under heavy load? Thanks.
On Wed, Aug 14, 2013 at 6:15 AM, Joan Touzet <[email protected]> wrote:

> On Tue, Aug 13, 2013 at 02:49:28PM -0500, Nathan Vander Wilt wrote:
> > I've got 1.7GB disk free and 2GB of memory available at the moment, so
> > it doesn't seem to be either of those. (I could not find any out-of-memory
> > process kill logs in /var/log/syslog.) The only clue I can find is in
> > couchdb.stderr:
> >
> > heart_beat_kill_pid = 1390
> > heart_beat_timeout = 11
> > heart: Tue Aug 13 18:34:21 2013: heart-beat time-out, no activity for 15 seconds
> > Killed
>
> So 15s of system clock time passed without erlang's heart receiving a
> ping back. There's a number of possibilities; for instance, if this is a
> VM and the clock was advanced/changed by 15s to synchronize with the
> main system, heart might see that and issue a kill command. Another
> could be extremely heavy load on the system forcing the second couch
> process to get swapped out.
>
> Three suggestions:
>
> 1. set RESPAWN_TIMEOUT to a non-zero value to force couch to restart
>    after a kill. Because of its crash-only design this is safe, and
>    since restarts are rare you're liable to not really be running
>    into serious issues.
> 2. Crank up logging to debug level to see what might be going on
>    when the heartbeat fails to respond.
> 3. Add some additional system monitoring to ensure that you're not
>    overloading your system on CPU, RAM, I/O or network traffic.
>    Do you have a lot of views building / heavy system load due to
>    couchjs processes?
>
> --
> Joan Touzet | [email protected] | wohali everywhere else
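
If it helps with lining the kills up against load graphs, here is a rough Python sketch that pulls the heart time-out events out of couchdb.stderr. The function name and the regular expression are hypothetical, and the line format is assumed from the single sample quoted above, so it may need adjusting for other Erlang versions or locales.

```python
import re
from datetime import datetime

# Assumed line format, taken from the one sample in this thread:
#   heart: Tue Aug 13 18:34:21 2013: heart-beat time-out, no activity for 15 seconds
HEART_TIMEOUT = re.compile(
    r"^heart: (?P<when>.+?): heart-beat time-out, "
    r"no activity for (?P<secs>\d+) seconds"
)


def find_heart_timeouts(lines):
    """Return a list of (timestamp, idle_seconds) for each heart time-out line."""
    events = []
    for line in lines:
        match = HEART_TIMEOUT.match(line.strip())
        if match:
            # Timestamp layout assumed to be the C-locale ctime() style.
            when = datetime.strptime(match.group("when"), "%a %b %d %H:%M:%S %Y")
            events.append((when, int(match.group("secs"))))
    return events


# The stderr excerpt quoted in this thread, used as sample input.
SAMPLE = [
    "heart_beat_kill_pid = 1390",
    "heart_beat_timeout = 11",
    "heart: Tue Aug 13 18:34:21 2013: heart-beat time-out, no activity for 15 seconds",
    "Killed",
]

for when, secs in find_heart_timeouts(SAMPLE):
    print(when.isoformat(), secs)
```

To run it against the real log, pass an open file object instead of SAMPLE; the timestamps can then be compared with whatever CPU, RAM and I/O monitoring is in place.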
