[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.
[ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qian Zhang reassigned MESOS-9507:
---------------------------------

    Assignee: Qian Zhang  (was: Andrei Budnik)

> Agent could not recover due to empty docker volume checkpointed files.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-9507
>                 URL: https://issues.apache.org/jira/browse/MESOS-9507
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Gilbert Song
>            Assignee: Qian Zhang
>            Priority: Critical
>              Labels: containerizer
>
> Agent could not recover due to empty docker volume checkpointed files. Please see logs:
> {noformat}
> Nov 12 17:12:00 guppy mesos-agent[38960]: E1112 17:12:00.978682 38969 slave.cpp:6279] EXIT with status 1: Failed to perform recovery: Collect failed: Collect failed: Failed to recover docker volumes for orphan container e1b04051-1e4a-47a9-b866-1d625cda1d22: JSON parse failed: syntax error at line 1 near:
> Nov 12 17:12:00 guppy mesos-agent[38960]: To remedy this do as follows:
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
> Nov 12 17:12:00 guppy mesos-agent[38960]: This ensures agent doesn't recover old live executors.
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 2: Restart the agent.
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service: main process exited, code=exited, status=1/FAILURE
> Nov 12 17:12:00 guppy systemd[1]: Unit dcos-mesos-slave.service entered failed state.
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service failed.
> {noformat}
> This is caused by agent recovery after the volume state file is created but before checkpointing finishes. Basically the docker volume is not mounted yet, so the docker volume isolator should skip recovering this volume.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.
[ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Budnik reassigned MESOS-9507:
------------------------------------

    Assignee: Andrei Budnik  (was: Gilbert Song)
[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.
[ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilbert Song reassigned MESOS-9507:
-----------------------------------

        Assignee: Gilbert Song
          Sprint: Containerization RI10 Spr 39
    Story Points: 5
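
The root cause given in the issue description (recovery running after the volume state file is created but before checkpointing finishes) suggests that recovery could treat an empty checkpoint file as "nothing to recover" rather than passing it to the JSON parser. The following is only a minimal sketch of that idea in standard C++, not the actual Mesos change: `readVolumeCheckpoint` is a hypothetical helper, it does not use Mesos or stout APIs, and treating a missing file the same way as an empty one is an assumption made for illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not a Mesos API: returns the raw checkpointed volume
// state for one container, or std::nullopt when the file is missing or empty
// (that is, the checkpoint was started but never completed).
std::optional<std::string> readVolumeCheckpoint(const fs::path& path)
{
  std::ifstream in(path, std::ios::binary);
  if (!in.is_open()) {
    return std::nullopt;  // Assumption: no file means nothing to recover.
  }

  // Read the whole file into memory.
  std::string contents((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());

  if (contents.empty()) {
    // Assumption: an empty file means the volume was never mounted, so the
    // caller should skip this container instead of failing recovery.
    return std::nullopt;
  }

  return contents;  // The caller would parse this as JSON.
}
```

A caller would skip the container's docker volumes when the result is `std::nullopt` and attempt JSON parsing only on a non-empty result. A complementary approach, also an assumption here, is to write the checkpoint to a temporary file and rename it into place so that a partially written file is never visible.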