Hi All, I probably have the following account partly wrong, but let me present it just the same and those who know better can correct me as needed.
I have an application that runs several MongoDB shards, each in a Dockerized container, each on a distinct node (VM); in fact, some of the VMs are on separate ESXi hosts. I've lately seen situations where, because of very slow disks for the database, the following sequence occurs (I think):

1. The Linux (Ubuntu 14.04 LTS) virtual memory manager hits the thresholds defined by vm.dirty_background_ratio and/or vm.dirty_ratio (probably both).
2. Synchronous flushing of many, many pages occurs, writing to a slow disk.
3. (Around this time one might see in /var/log/syslog "task X blocked for more than 120 seconds" for all kinds of tasks, including mesos-master.)
4. The mesos-slaves get shut down. (This is the part I'm unclear about, but I am quite certain that on 2 nodes the executors and their in-flight MongoDB tasks got zapped, because I can see that Marathon restarted them.)

The consequence is a corrupt MongoDB database. In the case at hand, the job had run for over 50 hours, processing close to 120 million files.

Steps I've taken so far to remedy this:

- Tuned vm.dirty_background_ratio and vm.dirty_ratio down to 5 and 10, respectively (from 10 and 20). The intent here is to tolerate more frequent, smaller flushes and thus avoid less frequent massive flushes that suspend threads for very long periods.
- Increased the agent ping timeout to 10 minutes (every 30 seconds, 20 times).

So the questions are:

- Is there some way to be given control (a callback, or an "exit" routine) so that the container about to be nuked can be given a chance to exit gracefully?
- Are there other steps I can take to avoid this mildly calamitous occurrence?
- (Also, I'd be grateful for more clarity on anything in steps 1-4 above that is a bit hand-wavy!)

As always, thanks.

-Paul
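
P.S. To make the arithmetic behind the tuning concrete, here is a rough Python sketch of how I'm reasoning about it. It is a back-of-the-envelope model, not a description of what the kernel actually does: the 32 GiB of "dirtyable" memory and the 20 MiB/s disk throughput are made-up numbers (not measurements from my nodes), the function names are my own, and as I understand it the kernel applies the ratios to free plus reclaimable memory rather than to total RAM.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_thresholds(dirtyable_bytes, background_ratio, ratio):
    """Approximate (background, hard) dirty-page thresholds in bytes.

    The ratios are percentages, as in vm.dirty_background_ratio and
    vm.dirty_ratio. dirtyable_bytes is an assumed figure for the memory
    the kernel applies them to.
    """
    background = dirtyable_bytes * background_ratio // 100
    hard = dirtyable_bytes * ratio // 100
    return background, hard


def flush_seconds(dirty_bytes, disk_bytes_per_sec):
    """Time to write dirty_bytes at a steady (assumed) disk throughput."""
    return dirty_bytes / disk_bytes_per_sec


def ping_window_seconds(interval_seconds, max_missed):
    """Total time an agent may miss pings before being given up on."""
    return interval_seconds * max_missed


if __name__ == "__main__":
    dirtyable = 32 * GIB   # hypothetical dirtyable memory
    disk = 20 * MIB        # hypothetical slow-disk throughput, bytes/sec

    # Old settings (10, 20) versus new settings (5, 10).
    for bg_ratio, hard_ratio in ((10, 20), (5, 10)):
        bg, hard = dirty_thresholds(dirtyable, bg_ratio, hard_ratio)
        print(
            f"ratios {bg_ratio}/{hard_ratio}: "
            f"background at {bg / GIB:.1f} GiB, hard limit at {hard / GIB:.1f} GiB, "
            f"worst-case flush ~{flush_seconds(hard, disk):.0f} s"
        )

    # 30-second pings, 20 allowed misses.
    print(f"ping window: {ping_window_seconds(30, 20)} s")
```

On those invented numbers, draining a full hard limit's worth of dirty pages drops from roughly 328 seconds to roughly 164, which is still longer than the 120-second blocked-task warning, so halving the ratios may not be enough on its own if the disk really is that slow. The ping window comes out at 600 seconds, i.e. the 10 minutes mentioned above.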

