> On March 12, 2016, 4:02 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/isolators/network/cni.hpp, line 85
> > <https://reviews.apache.org/r/44514/diff/2/?file=1294397#file1294397line85>
> >
> >     I would suggest we have an `Info` struct for each container.
> >     
> >     ```
> >     struct Info
> >     {
> >       ...
> >     };
> >     ```
> >     
> >     Remember that for any field you put into the Info struct, you need to 
> > be able to 'recover' it in the recover function. It's also possible that 
> > ExecutorInfo/TaskInfo are not available during recovery (orphans due to 
> > wiped metadata). You need to think about how to recover state such as the 
> > name of the network that the container has joined.
> Avinash sridharan wrote:
>     We also need to recover the IP address associated with the container, 
> since it has to be returned as part of `NetworkInfo`. We need to figure out 
> how to recover it.
> Jie Yu wrote:
>     I am not sure. Why do we need to store/recover the IP address? When 
> destroying a container, the CNI plugin only needs the container ID (which can 
> be the same as our container ID), the namespace handle and the network config 
> file. The rest can all be fixed.
>     I think the tricky part is: when there is an orphan container (no 
> ExecutorInfo/TaskInfo), how can we figure out its network namespace handle 
> and network config file so that we can destroy it (call CNI plugin DEL)? We 
> need to know the name of the network to get the network config file, and the 
> pid to get the network namespace handle.
> Qian Zhang wrote:
>     Yes, that is the tricky part. For an orphan container (e.g., a container 
> created by a framework without checkpointing enabled), when the agent 
> restarts, in `recover()` we only have the container ID, which is not enough 
> to invoke CNI plugin DEL. In total we need:
>     1) Container ID.
>     2) Network namespace path.
>     3) Network configuration.
>     4) Name of the interface inside the container.
>     I think currently we only have 1).
>     For 2), my idea is, in `isolate()`, before calling the CNI plugin, we do a 
> bind mount (`/var/run/netns/[pid]` -> `/proc/[pid]/ns/net`), and then create 
> a symbolic link (`/var/run/mesos/cni/netns/[containerId]` -> 
> `/var/run/netns/[pid]`). Then in `recover()`, since we have the `containerId` 
> for orphan containers, we should be able to figure out the `pid` from the 
> symbolic link, and then get the network namespace path `/proc/[pid]/ns/net`.
>     For 3), when we call the CNI plugin for a container in `isolate()`, the 
> CNI plugin (actually the CNI `host-local` IPAM plugin) will create a file 
> under `/var/lib/cni/networks/[networkName]/`. The name of the file is the IP 
> assigned to the container, and the content of the file is the container ID. 
> So if a container is assigned an IP from a CNI network, it must have a file 
> under `/var/lib/cni/networks/[networkName]/`, and we may leverage this 
> information to figure out the names of the CNI networks the orphan container 
> has joined. But this may only work for CNI networks that use the `host-local` 
> IPAM plugin; for the `dhcp` IPAM plugin, I think it will not write any files 
> for containers.
>     For 4), I do not have an idea for it yet :-(
>     Another solution would be, when we create the symbolic link in 
> `isolate()`, to encode more info into the name of the symbolic link, e.g., 
> `/var/run/mesos/cni/netns/[containerId]-[net1]_[ifName in 
> net1]-[net2]_[ifName in net2]...`; however, this seems ugly to me.
> Avinash sridharan wrote:
>     For (2), why not bind mount directly to 
> /var/run/mesos/cni/netns/[containerId] and create a symlink to 
> /var/run/netns/? Also, Jie mentioned it would be a good idea to prefix the 
> `[containerId]` with a canonical string like mesos-cni. For (3), I don't 
> think we should rely on files created by CNI plugins. There is no guarantee 
> that the IPAM plugin would maintain those files. Instead, if we have already 
> bind mounted the network namespace, the namespace will persist in the kernel 
> even after the container has exited (become an orphan). At recovery we just 
> need to walk the list of `containerId`s stored under 
> /var/run/mesos/cni/netns, and for each network namespace we can run the 
> `ip netns` command to retrieve the link and IP address for that network 
> namespace.

For 2), what symlink do you want to create under `/var/run/netns`? 
`/var/run/netns/[pid] -> /var/run/mesos/cni/netns/[containerId]`? Or 
`/var/run/netns/[containerId] -> /var/run/mesos/cni/netns/[containerId]`? And 
you can take a look at 
 What it does is the same as what I proposed above.
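
To make the lookup in 2) concrete, here is a rough sketch of what `recover()` would have to do for one orphan container. It assumes the layout I proposed above (`/var/run/mesos/cni/netns/[containerId]` -> `/var/run/netns/[pid]`); the helper names `readLink`, `netnsPathFromTarget` and `recoverNetnsPath` are hypothetical and not part of the patch, and the real code would likely go through the isolator's existing utilities rather than raw `readlink(2)`.

```cpp
#include <cassert>
#include <cctype>
#include <string>

#include <unistd.h>

// Hypothetical helper: returns the target of a symbolic link,
// or an empty string if the link cannot be read.
std::string readLink(const std::string& path)
{
  char buffer[4096];
  ssize_t length = ::readlink(path.c_str(), buffer, sizeof(buffer));
  if (length < 0) {
    return "";
  }
  // readlink(2) does not NUL-terminate, so pass the length explicitly.
  return std::string(buffer, static_cast<std::size_t>(length));
}

// Hypothetical helper: given a link target such as `/var/run/netns/1234`,
// returns `/proc/1234/ns/net`, or an empty string if the last path
// component is not a pid.
std::string netnsPathFromTarget(const std::string& target)
{
  const std::size_t slash = target.find_last_of('/');
  const std::string pid =
    (slash == std::string::npos) ? target : target.substr(slash + 1);

  if (pid.empty()) {
    return "";
  }
  for (char c : pid) {
    if (!std::isdigit(static_cast<unsigned char>(c))) {
      return "";
    }
  }
  return "/proc/" + pid + "/ns/net";
}

// Hypothetical helper: resolves the network namespace path of an orphan
// container from the symlink directory and its container ID.
std::string recoverNetnsPath(
    const std::string& linkDir,
    const std::string& containerId)
{
  assert(!containerId.empty());
  const std::string target = readLink(linkDir + "/" + containerId);
  return target.empty() ? "" : netnsPathFromTarget(target);
}
```

An empty result here would mean we cannot call CNI plugin DEL for that container, so `recover()` would need to decide how to handle that case.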

And good point for 3). I agree that in `recover()` we can run `ip netns exec 
[namespace] ip addr` to retrieve the link and IP, but the problem is: how can 
we know which CNI networks these links/IPs belong to? E.g., after running `ip 
netns exec [namespace] ip addr` for a container, we get eth0/ and 
eth1/, then how can we know the CNI networks they belong to? Maybe we 
can encode the CNI network name in the interface name when calling the CNI 
plugin in `isolate()`, e.g., name the interface `net0-eth0` rather than `eth0`, 
but this seems a bit strange to me and is not consistent with the interface 
naming convention in Linux.
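
Just to illustrate the interface-name idea, a minimal sketch is below. The names `encodeIfName` and `decodeIfName` are hypothetical, and the `-` separator is only the one from my `net0-eth0` example. One extra constraint to keep in mind is that Linux limits interface names to `IFNAMSIZ` (16) bytes including the terminating NUL, so long CNI network names would not fit.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Linux interface names are limited to IFNAMSIZ (16) bytes including the
// terminating NUL, i.e. at most 15 visible characters.
const std::size_t MAX_IFNAME_LENGTH = 15;

// Hypothetical helper: builds `[network]-[ifName]`, e.g. `net0-eth0`.
// Returns an empty string if the result would exceed the kernel limit.
std::string encodeIfName(
    const std::string& network,
    const std::string& ifName)
{
  const std::string encoded = network + "-" + ifName;
  return encoded.size() <= MAX_IFNAME_LENGTH ? encoded : "";
}

// Hypothetical helper: splits an encoded name at the last `-` into
// (network, ifName). If there is no `-`, the network part is empty.
std::pair<std::string, std::string> decodeIfName(const std::string& encoded)
{
  const std::size_t dash = encoded.rfind('-');
  if (dash == std::string::npos) {
    return std::make_pair(std::string(), encoded);
  }
  return std::make_pair(encoded.substr(0, dash), encoded.substr(dash + 1));
}
```

Splitting at the last `-` assumes the original interface name itself contains no `-`, which is another reason this approach feels fragile to me.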

- Qian

This is an automatically generated e-mail. To reply, visit:

On March 10, 2016, 10:20 p.m., Qian Zhang wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44514/
> -----------------------------------------------------------
> (Updated March 10, 2016, 10:20 p.m.)
> Review request for mesos, Avinash sridharan, Gilbert Song, and Jie Yu.
> Bugs: MESOS-4759
>     https://issues.apache.org/jira/browse/MESOS-4759
> Repository: mesos
> Description
> -------
> Implemented prepare() method of "network/cni" isolator.
> Diffs
> -----
>   src/slave/containerizer/mesos/isolators/network/cni.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/network/cni.cpp PRE-CREATION 
> Diff: https://reviews.apache.org/r/44514/diff/
> Testing
> -------
> make check
> Thanks,
> Qian Zhang
