> On March 12, 2016, 4:02 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/isolators/network/cni.hpp, line 85
> > <https://reviews.apache.org/r/44514/diff/2/?file=1294397#file1294397line85>
> >
> > I would suggest we have a `Info` for each container.
> >
> > ```
> > struct Info
> > {
> > ...
> > };
> > ```
> >
> > Remember that for any field you put into the Info struct, you need to
> > be able to 'recover' it in recover function. It's also possible that
> > ExecutorInfo/TaskInfo are not available during recover (orphans due to
> > wiped meta data). You need to think about how to recover (e.g., the name of
> > the network) that the container has joined.
>
> Avinash sridharan wrote:
> We also need to recover the IP address associated with the container.
> Since this has to be returned as part of `NetworkINfo` . Need to figure out
> how to recover this?
>
> Jie Yu wrote:
> I am not sure. Why do we need to store/recover IP address? When destroy a
> container, the CNI plugin only needs container ID (which can be the same as
> our container Id), the namespace handle and the networkconfig file. The rest
> can all be fixed.
>
> I think the trick part is: when there is an orphan container (no
> executorinfo/taskinfo), how can we figure out its network namespace handle
> and networconfig file so that we can destroy it (call CNI plugin DEL). We
> need to know the name of the network to get the networkcofig file, and the
> pid to get the network namespace handle.
>
> Qian Zhang wrote:
> Yes, that is the tricky part. For orphan container (e.g., the container
> created by the framework without checkpoint enabled), when agent restarts, in
> the recover() we only have container ID which is not enough to invoke CNI
> plugin DEL. Totally we need:
> 1) Container ID.
> 2) Network namespace path.
> 3) Network configuration.
> 4) Name of the interface inside the container.
>
> I think currently we only have 1).
> For 2), my idea is, in `isolate()`, before calling CNI plugin, we do a
> bind mount (`/var/run/netns/[pid]` -> `proc/[pid]/ns/net`), and then a symbol
> link (`/var/run/mesos/cni/netns/[containerId]` -> `/var/run/netns/[pid]`).
> And then in `recover()`, since we have `containerId` for orphan containers,
> we should be able to figure out the `pid` based on the symbol link, and then
> get the network namespace path `/proc/[pid]/ns/net`.
> For 3), actually when we call CNI plugin for a container in `isolate()`,
> the CNI plugin (actually the CNI `host-local` IPAM plugin) will create a file
> under `/var/lib/cni/networks/[neworkName]/`, the name of the file is the IP
> assigned to the container, and the content of file is the container ID. So if
> a container is assigned an IP from a CNI network, it must have a file under
> `/var/lib/cni/networks/[neworkName]/`, we may leverage this information to
> figure out the name of CNI networks the orphan container joins. But this may
> only work for the CNI network which uses `host-local` IPAM plugin, for `dhcp`
> IPAM plugin, I think it will not write any files for containers.
> For 4), I do not have idea for it yet :-(
>
> Or another solution would be, when we create the symbol link in
> `isolate()`, we may encode more info into the name of symbol link, e.g.,
> `/var/run/mesos/cni/netns/[containerId]-[net1]_[ifName in
> net1]-[net2]_[ifName in net2]...`, however this seems ugly to me.
>
> Avinash sridharan wrote:
> For (2) why not bind mount directly to
> /var/run/mesos/cni/netns/[continaerId] and create a simlink to
> /var/run/netns/. Also, Jie mentioned it would be a good idea to prefix the
> `[containerId]` with a conanical string like mesos-cni. For (3) I don't think
> we should rely on files created by CNI plugins. There is no guarantee that
> the IPAM plugin would maintain those files. Instead if we have already bind
> moutned the network namespace, the namespace will persist in the kernel even
> after the container has exited (became an orphan). At recover we just need to
> walk to list of `containerID` stored under /var/run/mesos/cni/netns and for
> each network namespace we can run the `ip netns comamnd` to retrieve the link
> and IP address for that network namespace.
>
> Qian Zhang wrote:
> For 2), what symlink do you want to create under `/var/run/netns`?
> `/var/run/netns/[pid] -> /var/run/mesos/cni/netns/[containerId]`? Or
> `/var/run/netns/[containerId] -> /var/run/mesos/cni/netns/[contiainerId]`?
> And you can take a look at
> https://github.com/apache/mesos/blob/0.27.2/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp#L2205:L2243,
> What it does is same as what I proposed above.
>
> And good point for 3), I agree in `recover()`, we can run `ip netns exec
> [namespace] ip addr` to retrieve the link and IP, but the problem is how we
> can know which CNI networks that these links/IPs belong to? E.g., after run
> `ip netns exec [namespace] ip addr` for a container, we get eth0/10.0.0.3 and
> eth1/10.0.1.3, then how can we know the CNI networks they belong to? Maybe we
> can encode CNI network name in the interface name when calling CNI plugin in
> `isolate()`, e.g., name the interface name as `net0-eth0` rather than `eth0`,
> but this seems a bit strange to me and not consistent with the interface
> naming convention in Linux.
>
> Jie Yu wrote:
> I don't think we need the symlink. We can just use a bind mount
> /var/run/netns/mesos_cni_<ContainerID>. You can specify the namespace handle
> using the bind mount directly (no need to get the pid). I think the symlink
> in port mapping isolator is mainly for legacy reason (because initially, we
> only have the pid bind mount). But turns out that we definitely need to
> checkpoint some data under /var/run (for ifname and network name). The layout
> might be something like the following:
> ```
> /var/run/mesos/isolators/network/cni/<ContainerID>/
> |-- ns -> /proc/<pid>/ns/net (bind mount)
> |-- <net1>/
> |-- ifname
> |-- ipv4
> |-- <net2>/
> |-- ifname
> |-- ipv4
> ```
Thank Jie. I agree that we only need a bind mount and no need the symlink.
And for the layout under `/var/run/mesos/isolators/network/cni/<ContainerID>/`,
I have a few questions:
1. If we create `|-- ns -> /proc/<pid>/ns/net (bind mount)`, then I think we do
not need the bind mount `/var/run/netns/mesos_cni_<ContainerID> ->
/proc/<pid>/ns/net`, right?
2. Do we need a sub-dir (e.g., `<net1>/` in your layout) for each CNI network
that the container joins? Or what about just a file with network name as its
file name and the line in the file is `ifname:ipv4`.
3. Do you think that we need to introduce an agent flag for operator to specify
the directory `/var/run/mesos/isolators/network/cni/` when starting agent? Or
we can just keep it internal?
So basically the layout in my mind is like:
/var/run/mesos/isolators/network/cni/<ContainerID>/
|-- ns -> /proc/<pid>/ns/net (bind mount)
|-- <net1>: ifname1:ip1
ifname2:ip2
|-- <net2>: ifname3:ip3
ifname4:ip4
- Qian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44514/#review123200
-----------------------------------------------------------
On March 10, 2016, 10:20 p.m., Qian Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44514/
> -----------------------------------------------------------
>
> (Updated March 10, 2016, 10:20 p.m.)
>
>
> Review request for mesos, Avinash sridharan, Gilbert Song, and Jie Yu.
>
>
> Bugs: MESOS-4759
> https://issues.apache.org/jira/browse/MESOS-4759
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Implemented prepare() method of "network/cni" isolator.
>
>
> Diffs
> -----
>
> src/slave/containerizer/mesos/isolators/network/cni.hpp PRE-CREATION
> src/slave/containerizer/mesos/isolators/network/cni.cpp PRE-CREATION
>
> Diff: https://reviews.apache.org/r/44514/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Qian Zhang
>
>