[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.

2019-02-11 Thread Qian Zhang (JIRA)


 [ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qian Zhang reassigned MESOS-9507:
---------------------------------

Assignee: Qian Zhang (was: Andrei Budnik)

> Agent could not recover due to empty docker volume checkpointed files.
> ----------------------------------------------------------------------
>
>        Key: MESOS-9507
>        URL: https://issues.apache.org/jira/browse/MESOS-9507
>    Project: Mesos
> Issue Type: Bug
> Components: containerization
>   Reporter: Gilbert Song
>   Assignee: Qian Zhang
>   Priority: Critical
>     Labels: containerizer
>
> The agent could not recover because the checkpointed docker volume files
> were empty. Please see the logs:
> {noformat}
> Nov 12 17:12:00 guppy mesos-agent[38960]: E1112 17:12:00.978682 38969 slave.cpp:6279] EXIT with status 1: Failed to perform recovery: Collect failed: Collect failed: Failed to recover docker volumes for orphan container e1b04051-1e4a-47a9-b866-1d625cda1d22: JSON parse failed: syntax error at line 1 near:
> Nov 12 17:12:00 guppy mesos-agent[38960]: To remedy this do as follows:
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
> Nov 12 17:12:00 guppy mesos-agent[38960]: This ensures agent doesn't recover old live executors.
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 2: Restart the agent.
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service: main process exited, code=exited, status=1/FAILURE
> Nov 12 17:12:00 guppy systemd[1]: Unit dcos-mesos-slave.service entered failed state.
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service failed.
> {noformat}
> This happens when the agent recovers after the volume state file has been
> created but before checkpointing has finished, so the file is still empty.
> At that point the docker volume has not been mounted yet, so the docker
> volume isolator should skip recovering this volume.
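
The proposed behavior (skip a volume whose checkpoint file is empty rather
than failing the whole recovery) can be illustrated with a minimal sketch.
This is Python, not the Mesos C++ code, and it is not the actual fix. The
function name recover_volume_state, the file name volume_state, and the JSON
payload are hypothetical and chosen only for illustration. It assumes an
empty or missing file means checkpointing never finished, while a non-empty
but malformed file should still surface as an error.

```python
import json
import os
import tempfile
from typing import Optional


def recover_volume_state(path: str) -> Optional[dict]:
    """Return the checkpointed state, or None if there is nothing to recover.

    Hypothetical helper: an empty or missing checkpoint file is treated as
    "checkpointing never finished", so the caller skips this volume.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            contents = f.read()
    except FileNotFoundError:
        return None

    if not contents.strip():
        # Empty file: the state was never written, so no volume was mounted.
        return None

    # A non-empty file that fails to parse is a real error; let it propagate.
    return json.loads(contents)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "volume_state")

        # Simulate a crash after the file was created but before it was written.
        open(path, "w").close()
        print(recover_volume_state(path))

        # Simulate a completed checkpoint (payload shape is made up).
        with open(path, "w", encoding="utf-8") as f:
            f.write('{"volumes": []}')
        print(recover_volume_state(path))
```

Returning None instead of raising lets the caller continue recovering the
remaining containers, which is the outcome the description asks for.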



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.

2019-02-07 Thread Andrei Budnik (JIRA)


 [ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Budnik reassigned MESOS-9507:
------------------------------------

Assignee: Andrei Budnik (was: Gilbert Song)






[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.

2019-01-30 Thread Gilbert Song (JIRA)


 [ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilbert Song reassigned MESOS-9507:
-----------------------------------

    Assignee: Gilbert Song
      Sprint: Containerization RI10 Spr 39
Story Points: 5



