Hi all,

We are running Mesos 0.22.1 on CentOS 6 and are seeing frequent mesos-slave
crashes when we upgrade our Marathon applications. The crash happens when
Marathon deploys a new version of an application and stops a running task. The
error in the Mesos logs is:

tag=mesos-slave[12858]:  F0831 09:37:29.838184 12898 slave.cpp:3354] 
CHECK_SOME(os::touch(path)): Failed to open file: No such file or directory
tag=mesos-slave[12858]:  *** Check failure stack trace: ***
tag=mesos-slave[12858]:      @       0x36a46765cd  (unknown)
tag=mesos-slave[12858]:      @       0x36a467a5e7  (unknown)
tag=mesos-slave[12858]:      @       0x36a4678469  (unknown)
tag=mesos-slave[12858]:      @       0x36a467876d  (unknown)
tag=mesos-slave[12858]:      @       0x36a3fc5696  (unknown)
tag=mesos-slave[12858]:      @       0x36a421855a  (unknown)
tag=mesos-slave[12858]:      @       0x36a421c0a9  (unknown)
tag=mesos-slave[12858]:      @       0x36a42510ff  (unknown)
tag=mesos-slave[12858]:      @       0x36a4618b83  (unknown)
tag=mesos-slave[12858]:      @       0x36a461978c  (unknown)
tag=mesos-slave[12858]:      @       0x3699407a51  (unknown)
tag=mesos-slave[12858]:      @       0x36990e89ad  (unknown)
tag=init:  mesos-slave main process (12858) killed by ABRT signal
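
For anyone triaging similar output: the Mesos lines above use the glog prefix
(severity letter, MMDD date, time, thread ID, source file:line). Below is a
minimal sketch, not a tested tool, for pulling those fields out so the fatal
(F) line can be found quickly. It assumes each log entry has been rejoined
onto a single line (the wrapping in this mail comes from the mail client), and
the names parse_glog and fatal_lines are hypothetical helpers of my own.

```python
import re

# glog prefix: severity (I/W/E/F), MMDD, HH:MM:SS.microseconds, thread id,
# then "file:line]" followed by the message. The syslog "tag=...:" prefix is
# skipped because we search rather than match from the start of the line.
GLOG_RE = re.compile(
    r"\b(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+)\s+"
    r"(?P<file>[\w.\-]+):(?P<line>\d+)\]\s*"
    r"(?P<message>.*)"
)


def parse_glog(line):
    """Return a dict of glog fields, or None if the line has no glog prefix."""
    m = GLOG_RE.search(line)
    if m is None:
        return None
    record = m.groupdict()
    record["line"] = int(record["line"])
    return record


def fatal_lines(lines):
    """Return parsed records for every FATAL (severity F) entry."""
    parsed = (parse_glog(line) for line in lines)
    return [rec for rec in parsed if rec is not None and rec["severity"] == "F"]
```

Feeding the unwrapped slave log through fatal_lines() should surface the
CHECK_SOME failure at slave.cpp:3354 along with its timestamp, which can then
be compared against the time the Docker container stopped.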

The error appears in the log immediately after the Docker container stops. The
mesos-slave process respawns, but in doing so kills all of the running Docker
containers on that slave. The respawned mesos-slave process then appears to
terminate a second time before coming up successfully. The logs from the
respawned process (PID 17756) are at the end of this message.

This has been reported by at least one other Marathon user here:  
https://groups.google.com/forum/#!topic/marathon-framework/oKXhfQUcoMQ

Any advice on how to go about troubleshooting this would be most appreciated!

Thanks,
Scott



tag=mesos-slave[17756]:  W0831 09:37:42.474733 17783 slave.cpp:2568] Could not 
find the executor for status update TASK_FINISHED (UUID: 
8583e68d-99f0-4a89-a0fd-af5012a1b35d) for task 
app_pingfederate-console.37953216-4ffe-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  W0831 09:37:42.861536 17781 slave.cpp:2557] Ignoring 
status update TASK_FINISHED (UUID: 7251ad5f-7850-471f-9976-b7162e183d0e) for 
task app_legacy.74d76339-4c08-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001 for terminating framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  W0831 09:37:42.962225 17779 slave.cpp:2557] Ignoring 
status update TASK_FINISHED (UUID: b6c60f4b-3e7d-46f9-ad54-630f5be1241f) for 
task app_pingfederate-engine.aa4f77a1-46ce-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001 for terminating framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  W0831 09:37:43.363952 17780 slave.cpp:2557] Ignoring 
status update TASK_FAILED (UUID: 0d44ee67-f9e3-48d7-b4e1-39d66babcd42) for task 
marathon-hipache-bridge.1461d8c2-411a-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001 for terminating framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  W0831 09:37:46.479511 17781 slave.cpp:2557] Ignoring 
status update TASK_FINISHED (UUID: f0cf57e3-3cbd-43f2-bbb2-55ad442a8abc) for 
task service_userservice.b4d14d32-45b9-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001 for terminating framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  W0831 09:37:52.476265 17779 
status_update_manager.cpp:472] Resending status update TASK_FINISHED (UUID: 
8583e68d-99f0-4a89-a0fd-af5012a1b35d) for task 
app_pingfederate-console.37953216-4ffe-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  W0831 09:37:52.476434 17779 slave.cpp:2731] Dropping 
status update TASK_FINISHED (UUID: 8583e68d-99f0-4a89-a0fd-af5012a1b35d) for 
task app_pingfederate-console.37953216-4ffe-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001 sent by status update manager 
because the slave is in TERMINATING state
tag=mesos-slave[17756]:  W0831 09:37:54.727569 17782 slave.cpp:2557] Ignoring 
status update TASK_FAILED (UUID: c5e4092e-75cd-44c8-9ee5-efc53f304df3) for task 
service_tripbatchservice.6228c9d7-4a99-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001 for terminating framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  W0831 09:37:54.814648 17782 slave.cpp:2557] Ignoring 
status update TASK_FAILED (UUID: a681b752-9522-4acf-8c9f-c6530999d096) for task 
service_mapservice.18904037-411a-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001 for terminating framework 
20141209-011108-1378273290-5050-23221-0001
tag=mesos-slave[17756]:  E0831 09:37:57.225787 17783 slave.cpp:3112] Container 
'f3da678a-e566-4179-b66a-084e055d32e4' for executor 
'app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679' of framework 
'20141209-011108-1378273290-5050-23221-0001' failed to start: Container was 
destroyed while launching Critical
tag=mesos-slave[17756]:  E0831 09:37:57.225831 17783 slave.cpp:3207] 
Termination of executor 
'app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679' of framework 
'20141209-011108-1378273290-5050-23221-0001' failed: Container 
'f3da678a-e566-4179-b66a-084e055d32e4' not found
tag=mesos-slave[17756]:  E0831 09:37:57.234539 17783 slave.cpp:3461] Failed to 
unmonitor container for executor 
app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001: Not monitored
tag=mesos-slave[17756]:  W0831 09:37:57.236387 17783 slave.cpp:2181] Shutting 
down executor 'app_pingfederate-engine.97ae7bd6-4ffe-11e5-bd36-005056a00679' of 
framework 20141209-011108-1378273290-5050-23221-0001 because the slave is 
terminating
tag=mesos-slave[17756]:  E0831 09:38:01.600878 17781 slave.cpp:3112] Container 
'd391d728-efab-4b69-b94f-e4fb65917554' for executor 
'service_userservice.941aa6b0-4ffe-11e5-bd36-005056a00679' of framework 
'20141209-011108-1378273290-5050-23221-0001' failed to start: Container was 
destroyed while launching Critical
tag=mesos-slave[17756]:  E0831 09:38:01.601580 17781 slave.cpp:3207] 
Termination of executor 
'service_userservice.941aa6b0-4ffe-11e5-bd36-005056a00679' of framework 
'20141209-011108-1378273290-5050-23221-0001' failed: Container 
'd391d728-efab-4b69-b94f-e4fb65917554' not found
tag=mesos-slave[17756]:  E0831 09:38:01.601749 17784 slave.cpp:3461] Failed to 
unmonitor container for executor 
service_userservice.941aa6b0-4ffe-11e5-bd36-005056a00679 of framework 
20141209-011108-1378273290-5050-23221-0001: Not monitored
tag=mesos-slave[17756]:  2015-08-31 
09:38:01,602:17756(0x7fce986b4820):ZOO_INFO@zookeeper_close@2505: Closing 
zookeeper sessionId=0x24e62be7f030025 to [10.200.38.82:2181]
tag=init:  mesos-slave main process ended, respawning Context




SCOTT RANKIN
VP, Technology
Motus, LLC
Two Financial Center, 60 South Street, Boston, MA 02111
617.467.1931 (W) | sran...@motus.com<mailto:rcaraf...@motus.com>

