Re: Review Request 72734: Implemented the `cleanup` method of `volume/csi` isolator.

Qian Zhang Fri, 07 Aug 2020 00:21:13 -0700


> On Aug. 7, 2020, 6:04 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 383-387 (patched)
> > <https://reviews.apache.org/r/72734/diff/1/?file=2237067#file2237067line383>
> >
> >     After looking at the code, it seems that a failure here would surface 
> > as a TASK_FAILED status update to the scheduler due to a failed container 
> > termination, is that your understanding as well?
> >     
> >     I'm wondering if we need to add some extra handling for the failed 
> > unpublish case, either here or in the Mesos containerizer somehow. What 
> > action needs to be taken in the case where an unpublish call to the CSI 
> > plugin fails for some transient reason?


> After looking at the code, it seems that a failure here would surface as a 
> TASK_FAILED status update to the scheduler due to a failed container 
> termination, is that your understanding as well?

Yes.

> I'm wondering if we need to add some extra handling for the failed unpublish 
> case, either here or in the Mesos containerizer somehow. What action needs to 
> be taken in the case where an unpublish call to the CSI plugin fails for some 
> transient reason?

Retry the unpublish call? But how can we know whether it fails for a 
**transient** reason?


> On Aug. 7, 2020, 6:04 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 389-391 (patched)
> > <https://reviews.apache.org/r/72734/diff/1/?file=2237067#file2237067line389>
> >
> >     I'm wondering if there's an issue that if we return a failure here, the 
> > state on disk (the `volumes` file for this container) and the state in 
> > memory (the state stored in `infos`) will be inconsistent.
> >     
> >     If the agent were to restart, the CSI volume isolator could clean up 
> > that file if it refers to an unknown container. In that case, we will have 
> > left a volume published on this node without a record of it, but maybe 
> > that's OK?

In that case, when agent is restarted, containerizer will try to destroy the 
container again which in turn will call `cleanup` to try to unpublish the 
volume again.

Before the agent is restarted, yes we will have left a volume published on the 
node without a record of it in memory (since we have already done 
`infos.erase(containerId)` before making the unpublish call), but I think it 
should OK. Do you see any problems with it?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72734/#review221485
-----------------------------------------------------------


On Aug. 5, 2020, 5:20 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72734/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2020, 5:20 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10154
>     https://issues.apache.org/jira/browse/MESOS-10154
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implemented the `cleanup` method of `volume/csi` isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp 
> PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/72734/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

Re: Review Request 72734: Implemented the `cleanup` method of `volume/csi` isolator.

Reply via email to