Repository: mesos
Updated Branches:
  refs/heads/master ff14433da -> de508d94d
Stopped applying empty executor resources in `network/ports` recovery.

At recovery time, the containerizer sends only the executor resources,
but ports are typically allocated against tasks. This means that we
don't know about the complete set of container resources until the
containerizer makes the subsequent `update()` call. If we perform a
container ports check between the two calls, we can erroneously kill
the container because we believe it to have no allocated ports.

Review: https://reviews.apache.org/r/65153/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/de508d94
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/de508d94
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/de508d94

Branch: refs/heads/master
Commit: de508d94dadb585fff171a5f44a38ce7dd0dfc01
Parents: 73fc563
Author: James Peach <jpe...@apache.org>
Authored: Tue Jan 16 08:37:29 2018 -0800
Committer: James Peach <jpe...@apache.org>
Committed: Tue Jan 16 09:00:45 2018 -0800

----------------------------------------------------------------------
 .../containerizer/mesos/isolators/network/ports.cpp | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/de508d94/src/slave/containerizer/mesos/isolators/network/ports.cpp
----------------------------------------------------------------------
diff --git a/src/slave/containerizer/mesos/isolators/network/ports.cpp b/src/slave/containerizer/mesos/isolators/network/ports.cpp
index d9f8370..1f84ed4 100644
--- a/src/slave/containerizer/mesos/isolators/network/ports.cpp
+++ b/src/slave/containerizer/mesos/isolators/network/ports.cpp
@@ -385,6 +385,14 @@ static bool hasNamedNetwork(const ContainerInfo& container_info)
 }
 
 
+// Recover the given list of containers. Note that we don't look at
+// the executor resources from the ContainerState because from the
+// perspective of the container, they are incomplete. That is, the
+// resources here are only the resources for the executor, not the
+// resources for the whole container. At this point, we don't know
+// whether any of the tasks in the container have been allocated ports,
+// so we must not start isolating it until we receive the resources
+// update.
 Future<Nothing> NetworkPortsIsolatorProcess::recover(
     const list<ContainerState>& states,
     const hashset<ContainerID>& orphans)
@@ -403,7 +411,6 @@ Future<Nothing> NetworkPortsIsolatorProcess::recover(
 
     if (!cniIsolatorEnabled) {
       infos.emplace(state.container_id(), Owned<Info>(new Info()));
-      update(state.container_id(), state.executor_info().resources());
       continue;
     }
 
@@ -413,11 +420,6 @@ Future<Nothing> NetworkPortsIsolatorProcess::recover(
     if (!state.executor_info().has_container() ||
         !hasNamedNetwork(state.executor_info().container())) {
       infos.emplace(state.container_id(), Owned<Info>(new Info()));
-
-      // Update the resources for this container so that we can isolate
-      // it from now until the executor re-registers and the containerizer
-      // sends us a new update.
-      update(state.container_id(), state.executor_info().resources());
     }
   }