Hi All,

I probably have the following account partly wrong, but let me present it
just the same and those who know better can correct me as needed.

I have an application that runs several MongoDB shards, each in its own
Docker container on a distinct node (VM); some of the VMs are on separate
ESXi hosts.

I've lately seen situations where, because of very slow disks for the
database, the following sequence occurs (I think):

   1. The Linux (Ubuntu 14.04 LTS) virtual memory manager hits the dirty-page
   thresholds defined by vm.dirty_background_ratio and/or vm.dirty_ratio
   (probably both).
   2. Many, many pages are flushed synchronously to a slow disk.
   3. (Around this time one might see in /var/log/syslog "task X blocked
   for more than 120 seconds" for all kinds of tasks, including mesos-master)
   4. The mesos-slaves get shut down (this is the part I'm unclear about, but
   I am quite certain that on 2 nodes the executors and their in-flight
   MongoDB tasks got zapped, because I can see that Marathon restarted them).
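
For anyone wanting to check steps 1 and 2 on their own nodes, here is a
minimal Python sketch that pulls the Dirty and Writeback counters out of
/proc/meminfo. The function names are mine and purely illustrative; the only
thing it relies on is the usual "Name: value kB" layout of that file.

```python
def parse_meminfo(text):
    """Return a dict of field name -> integer value from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_summary(text):
    """Pick out the two counters relevant to writeback pressure (values in kB)."""
    info = parse_meminfo(text)
    return {key: info.get(key) for key in ("Dirty", "Writeback")}
```

Feeding it the contents of /proc/meminfo every few seconds during a run
should show whether Dirty climbs steadily and then collapses at about the
time the "blocked for more than 120 seconds" messages appear.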

The consequence is a corrupt MongoDB database. In the case at hand, the job
had run for over 50 hours, processing close to 120 million files.

Steps I've taken so far to remedy include:

   - Tune vm.dirty_background_ratio and vm.dirty_ratio down to 5 and 10,
   respectively (from 10 and 20). The intent is to trade infrequent, massive
   flushes that suspend threads for very long periods for more frequent,
   smaller ones.
   - Increase the agent ping timeout to 10 minutes in total (every 30
   seconds, 20 times).
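
To put rough numbers on those two changes, here is a small Python sketch of
the arithmetic. As I understand it, the kernel applies the dirty ratios to
available (free plus reclaimable) memory rather than to total RAM, so the
16 GiB figure below is a hypothetical stand-in, not a measurement from my
nodes. The 30-second and 20-times values are the ones I set, which I believe
correspond to the master's agent_ping_timeout and max_agent_ping_timeouts
settings.

```python
def dirty_thresholds(available_bytes, background_ratio, ratio):
    """Return (background, hard) dirty-page thresholds in bytes.

    available_bytes stands in for the kernel's notion of available memory;
    it is an input here, not something this sketch measures.
    """
    background = available_bytes * background_ratio // 100
    hard = available_bytes * ratio // 100
    return background, hard


def ping_window_seconds(ping_timeout_seconds, max_timeouts):
    """Approximate time an agent may go unanswered before the master gives up."""
    return ping_timeout_seconds * max_timeouts


GIB = 2 ** 30
ASSUMED_AVAILABLE = 16 * GIB  # hypothetical figure, not taken from my nodes

old_bg, old_hard = dirty_thresholds(ASSUMED_AVAILABLE, 10, 20)
new_bg, new_hard = dirty_thresholds(ASSUMED_AVAILABLE, 5, 10)
print("old thresholds (GiB):", old_bg / GIB, old_hard / GIB)
print("new thresholds (GiB):", new_bg / GIB, new_hard / GIB)
print("ping window (s):", ping_window_seconds(30, 20))
```

Under that assumption the new settings halve both thresholds, so each flush
has at most half as much data to push to the slow disk, and the ping window
works out to 600 seconds.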

So the questions are:

   - Is there some way to be given control (a callback, or an "exit"
   routine) so that the container about to be nuked can be given a chance to
   exit gracefully?
   - Are there other steps I can take to avoid this mildly calamitous
   occurrence?
   - (Also, I'd be grateful for more clarity on anything in steps 1-4 above
   that is a bit hand-wavy!)
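
To make the first question concrete, this is the kind of hook I have in
mind, sketched in Python. It assumes the container's main process is sent
SIGTERM and given some time before a hard kill, which is exactly the part I
have not confirmed for Mesos. The work_step and cleanup callables are
hypothetical placeholders. As far as I know, mongod itself already shuts
down cleanly on SIGTERM, so the real question is whether that signal ever
arrives with enough time to act on it.

```python
import signal
import threading

# Set when a termination signal arrives; the main loop polls it.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only record the request, do the real work elsewhere."""
    stop_requested.set()


def install_handlers():
    """Register the handler for SIGTERM and SIGINT (call from the main thread)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, request_stop)


def run(work_step, cleanup, poll_seconds=1.0):
    """Repeat work_step until a stop is requested, then run cleanup once."""
    while not stop_requested.is_set():
        work_step()
        stop_requested.wait(poll_seconds)
    cleanup()
```

A wrapper like this would call install_handlers() once at startup and pass
in a cleanup that flushes and closes the database before exiting.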

As always, thanks.

-Paul
