Gilbert Song created MESOS-9507:
-----------------------------------

             Summary: Agent could not recover due to empty docker volume checkpointed files.
                 Key: MESOS-9507
                 URL: https://issues.apache.org/jira/browse/MESOS-9507
             Project: Mesos
          Issue Type: Bug
          Components: containerization
            Reporter: Gilbert Song
The agent could not recover because a checkpointed docker volume file was empty. Parsing that file as JSON failed while recovering the docker volumes of an orphan container, so recovery aborted and the agent exited with status 1. See the logs:

{noformat}
Nov 12 17:12:00 guppy mesos-agent[38960]: E1112 17:12:00.978682 38969 slave.cpp:6279] EXIT with status 1: Failed to perform recovery: Collect failed: Collect failed: Failed to recover docker volumes for orphan container e1b04051-1e4a-47a9-b866-1d625cda1d22: JSON parse failed: syntax error at line 1 near:
Nov 12 17:12:00 guppy mesos-agent[38960]: To remedy this do as follows:
Nov 12 17:12:00 guppy mesos-agent[38960]: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
Nov 12 17:12:00 guppy mesos-agent[38960]: This ensures agent doesn't recover old live executors.
Nov 12 17:12:00 guppy mesos-agent[38960]: Step 2: Restart the agent.
Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Nov 12 17:12:00 guppy systemd[1]: Unit dcos-mesos-slave.service entered failed state.
Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service failed.
{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
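The agent log for this issue ends with "JSON parse failed: syntax error at line 1 near:" followed by no excerpt, which is consistent with the parser having been handed a zero-byte file. The following is a minimal sketch in Python (not Mesos's actual C++ recovery code) of the distinction a recovery path could draw, assuming the checkpoint file normally holds JSON: an empty file is treated as "nothing checkpointed", while non-empty but malformed content still fails loudly. The helper name `read_volume_checkpoint` and the file name `volumes` are hypothetical, and the idea that the file was left empty by an interrupted write is an assumption, not something the log confirms.

```python
import json
from typing import Any, Optional


def read_volume_checkpoint(path: str) -> Optional[Any]:
    """Parse a checkpointed docker-volume file (hypothetical helper).

    Returns None for an empty or whitespace-only file instead of failing,
    and still raises json.JSONDecodeError for any other malformed content.
    """
    with open(path, "r", encoding="utf-8") as f:
        contents = f.read()

    # Assumption: a zero-byte file means the process stopped after creating
    # the checkpoint file but before writing to it, so nothing is recoverable.
    if not contents.strip():
        return None

    # Non-empty but malformed content is still treated as a hard error.
    return json.loads(contents)


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "volumes")  # hypothetical file name
        open(path, "w").close()  # simulate the empty checkpointed file

        # A strict JSON parse of empty input fails at line 1, column 1,
        # similar to the "syntax error at line 1" reported by the agent.
        try:
            json.loads("")
        except json.JSONDecodeError as err:
            print("strict parse:", err)

        print("tolerant read:", read_volume_checkpoint(path))
```

Restricting the tolerance to empty files keeps genuine corruption visible: a half-written or garbled JSON document still aborts recovery instead of silently discarding volume state.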