Frédéric Comte created MESOS-9936:
-------------------------------------

             Summary: Slave recovery is very slow with large local persistent volumes (Marathon app)
                 Key: MESOS-9936
                 URL: https://issues.apache.org/jira/browse/MESOS-9936
             Project: Mesos
          Issue Type: Bug
          Components: agent
            Reporter: Frédéric Comte

I run some applications that use local persistent volumes. After an unplanned shutdown of the nodes running these applications, the Mesos agent recovery process takes a very long time (more than 8 hours). The duration depends on the amount of data in those volumes.

What does Mesos do during this process? In the log excerpt below, about seven hours pass between "Recovering provisioner" and "Finished recovering all containerizers":

{code:java}
Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.771447 13370 docker.cpp:890] Recovering Docker containers
Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.783957 13375 containerizer.cpp:801] Recovering Mesos containers
Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.799252 13373 linux_launcher.cpp:286] Recovering Linux launcher
Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.810429 13375 containerizer.cpp:1127] Recovering isolators
Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.817328 13389 containerizer.cpp:1166] Recovering provisioner
Jul 08 14:42:10 boss1 mesos-agent[13345]: I0708 14:42:10.928683 13373 composing.cpp:339] Finished recovering all containerizers
Jul 08 14:42:10 boss1 mesos-agent[13345]: I0708 14:42:10.950503 13354 status_update_manager_process.hpp:314] Recovering operation status update manager
Jul 08 14:42:10 boss1 mesos-agent[13345]: I0708 14:42:10.957418 13399 slave.cpp:7729] Recovering executors
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
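
For anyone who wants to locate the slow step in a longer agent log, here is a minimal sketch that parses the glog-style prefix of each line and reports the largest gap between consecutive messages. The helper names (`parse`, `largest_gap`) and the `year` default are assumptions made for illustration: glog timestamps omit the year, so the value only matters for logs that cross a year boundary. The two sample lines are copied from the log in this report.

```python
import re
from datetime import datetime

# glog prefix as it appears in the agent log, e.g.
# "I0708 07:40:44.817328 13389 containerizer.cpp:1166] Recovering provisioner"
GLOG = re.compile(r"[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ (\S+)\] (.*)")


def parse(line, year=2019):
    """Return (timestamp, source, message), or None if the line has no glog prefix.

    `year` is an assumption: glog lines carry only month and day.
    """
    m = GLOG.search(line)
    if m is None:
        return None
    mmdd, clock, source, message = m.groups()
    ts = datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")
    return ts, source, message


def largest_gap(lines, year=2019):
    """Return (gap, message_before, message_after) for the longest pause, or None."""
    events = [e for e in (parse(l, year) for l in lines) if e is not None]
    best = None
    for prev, cur in zip(events, events[1:]):
        gap = cur[0] - prev[0]
        if best is None or gap > best[0]:
            best = (gap, prev[2], cur[2])
    return best


SAMPLE = [
    "Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.817328 13389 "
    "containerizer.cpp:1166] Recovering provisioner",
    "Jul 08 14:42:10 boss1 mesos-agent[13345]: I0708 14:42:10.928683 13373 "
    "composing.cpp:339] Finished recovering all containerizers",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print(f"{gap} between '{before}' and '{after}'")
```

This only shows where the time is spent in the log; it does not explain what the agent is doing during that interval.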