Hi all,

We are running Mesos 0.22.1 on CentOS 6 and are hitting frequent mesos-slave crashes when we upgrade our Marathon applications. The crash happens when Marathon deploys a new version of an application and stops a running task. The error in the Mesos logs is:

tag=mesos-slave[12858]: F0831 09:37:29.838184 12898 slave.cpp:3354] CHECK_SOME(os::touch(path)): Failed to open file: No such file or directory
tag=mesos-slave[12858]: *** Check failure stack trace: ***
tag=mesos-slave[12858]: @ 0x36a46765cd (unknown)
tag=mesos-slave[12858]: @ 0x36a467a5e7 (unknown)
tag=mesos-slave[12858]: @ 0x36a4678469 (unknown)
tag=mesos-slave[12858]: @ 0x36a467876d (unknown)
tag=mesos-slave[12858]: @ 0x36a3fc5696 (unknown)
tag=mesos-slave[12858]: @ 0x36a421855a (unknown)
tag=mesos-slave[12858]: @ 0x36a421c0a9 (unknown)
tag=mesos-slave[12858]: @ 0x36a42510ff (unknown)
tag=mesos-slave[12858]: @ 0x36a4618b83 (unknown)
tag=mesos-slave[12858]: @ 0x36a461978c (unknown)
tag=mesos-slave[12858]: @ 0x3699407a51 (unknown)
tag=mesos-slave[12858]: @ 0x36990e89ad (unknown)
tag=init: mesos-slave main process (12858) killed by ABRT signal

It appears in the log immediately after the Docker container stops. The mesos-slave process respawns, but in doing so kills all of the running Docker containers on that slave. It then appears that the mesos-slave process terminates a second time, then comes up successfully. The logs from this process are below.

This has been reported by at least one other Marathon user here: https://groups.google.com/forum/#!topic/marathon-framework/oKXhfQUcoMQ

Any advice on how to go about troubleshooting this would be most appreciated!
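
As a starting point for sifting through logs like the ones in this message, here is a minimal Python sketch that pulls the fatal (F) and error (E) entries out of a syslog capture. It assumes the lines follow the glog prefix seen in this post (severity letter, mmdd, time, thread id, file:line]); the names parse_glog and filter_severity are made up for illustration and are not part of Mesos or any library.

```python
import re

# Assumed glog prefix, as seen in the lines quoted in this post:
#   <severity><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"(?<!\w)(?P<sev>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)"
)


def parse_glog(line):
    """Return a dict of glog fields, or None if the line has no glog prefix."""
    m = GLOG_RE.search(line)  # search, so syslog prefixes like "tag=...:" are skipped
    if m is None:
        return None
    rec = m.groupdict()
    rec["line"] = int(rec["line"])
    return rec


def filter_severity(lines, wanted="EF"):
    """Keep only parsed entries whose severity letter is in `wanted`."""
    out = []
    for line in lines:
        rec = parse_glog(line)
        if rec is not None and rec["sev"] in wanted:
            out.append(rec)
    return out


if __name__ == "__main__":
    sample = [
        "tag=mesos-slave[12858]: F0831 09:37:29.838184 12898 slave.cpp:3354] "
        "CHECK_SOME(os::touch(path)): Failed to open file: No such file or directory",
        "tag=mesos-slave[12858]: *** Check failure stack trace: ***",
        "tag=init: mesos-slave main process (12858) killed by ABRT signal",
    ]
    for rec in filter_severity(sample):
        print(rec["sev"], rec["time"], f'{rec["src"]}:{rec["line"]}', rec["msg"])
```

Lines without a glog prefix (the stack frames, the init messages, the ZooKeeper client line) are skipped rather than guessed at, so the output is only as complete as the prefix assumption allows.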

Thanks,
Scott

tag=mesos-slave[17756]: W0831 09:37:42.474733 17783 slave.cpp:2568] Could not find the executor for status update TASK_FINISHED (UUID: 8583e68d-99f0-4a89-a0fd-af5012a1b35d) for task app_pingfederate-console.37953216-4ffe-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: W0831 09:37:42.861536 17781 slave.cpp:2557] Ignoring status update TASK_FINISHED (UUID: 7251ad5f-7850-471f-9976-b7162e183d0e) for task app_legacy.74d76339-4c08-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001 for terminating framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: W0831 09:37:42.962225 17779 slave.cpp:2557] Ignoring status update TASK_FINISHED (UUID: b6c60f4b-3e7d-46f9-ad54-630f5be1241f) for task app_pingfederate-engine.aa4f77a1-46ce-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001 for terminating framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: W0831 09:37:43.363952 17780 slave.cpp:2557] Ignoring status update TASK_FAILED (UUID: 0d44ee67-f9e3-48d7-b4e1-39d66babcd42) for task marathon-hipache-bridge.1461d8c2-411a-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001 for terminating framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: W0831 09:37:46.479511 17781 slave.cpp:2557] Ignoring status update TASK_FINISHED (UUID: f0cf57e3-3cbd-43f2-bbb2-55ad442a8abc) for task service_userservice.b4d14d32-45b9-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001 for terminating framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: W0831 09:37:52.476265 17779 status_update_manager.cpp:472] Resending status update TASK_FINISHED (UUID: 8583e68d-99f0-4a89-a0fd-af5012a1b35d) for task app_pingfederate-console.37953216-4ffe-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: W0831 09:37:52.476434 17779 slave.cpp:2731] Dropping status update TASK_FINISHED (UUID: 8583e68d-99f0-4a89-a0fd-af5012a1b35d) for task app_pingfederate-console.37953216-4ffe-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001 sent by status update manager because the slave is in TERMINATING state
tag=mesos-slave[17756]: W0831 09:37:54.727569 17782 slave.cpp:2557] Ignoring status update TASK_FAILED (UUID: c5e4092e-75cd-44c8-9ee5-efc53f304df3) for task service_tripbatchservice.6228c9d7-4a99-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001 for terminating framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: W0831 09:37:54.814648 17782 slave.cpp:2557] Ignoring status update TASK_FAILED (UUID: a681b752-9522-4acf-8c9f-c6530999d096) for task service_mapservice.18904037-411a-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001 for terminating framework 20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]: E0831 09:37:57.225787 17783 slave.cpp:3112] Container 'f3da678a-e566-4179-b66a-084e055d32e4' for executor 'app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679' of framework '20141209-011108-1378273290-5050-23221-0001' failed to start: Container was destroyed while launching
Critical tag=mesos-slave[17756]: E0831 09:37:57.225831 17783 slave.cpp:3207] Termination of executor 'app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679' of framework '20141209-011108-1378273290-5050-23221-0001' failed: Container 'f3da678a-e566-4179-b66a-084e055d32e4' not found
tag=mesos-slave[17756]: E0831 09:37:57.234539 17783 slave.cpp:3461] Failed to unmonitor container for executor app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001: Not monitored
tag=mesos-slave[17756]: W0831 09:37:57.236387 17783 slave.cpp:2181] Shutting down executor 'app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679' of framework 20141209-011108-1378273290-5050-23221-0001 because the slave is terminating
tag=mesos-slave[17756]: E0831 09:38:01.600878 17781 slave.cpp:3112] Container 'd391d728-efab-4b69-b94f-e4fb65917554' for executor 'service_userservice.941aa6b0-4ffe-11e5-bd36-005056a00679' of framework '20141209-011108-1378273290-5050-23221-0001' failed to start: Container was destroyed while launching
Critical tag=mesos-slave[17756]: E0831 09:38:01.601580 17781 slave.cpp:3207] Termination of executor 'service_userservice.941aa6b0-4ffe-11e5-bd36-005056a00679' of framework '20141209-011108-1378273290-5050-23221-0001' failed: Container 'd391d728-efab-4b69-b94f-e4fb65917554' not found
tag=mesos-slave[17756]: E0831 09:38:01.601749 17784 slave.cpp:3461] Failed to unmonitor container for executor service_userservice.941aa6b0-4ffe-11e5-bd36-005056a00679 of framework 20141209-011108-1378273290-5050-23221-0001: Not monitored
tag=mesos-slave[17756]: 2015-08-31 09:38:01,602:17756(0x7fce986b4820):ZOO_INFO@zookeeper_close@2505: Closing zookeeper sessionId=0x24e62be7f030025 to [10.200.38.82:2181]
tag=init: mesos-slave main process ended, respawning

SCOTT RANKIN
VP, Technology
Motus, LLC