[mesos] branch master updated: Added support of `null` values for syscall `args` in Seccomp.

2020-06-17 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new a679eb4  Added support of `null` values for syscall `args` in Seccomp.
a679eb4 is described below

commit a679eb4bc35bd2d7c4cffdd9440ab301d8fc8986
Author: Andrei Budnik 
AuthorDate: Wed Jun 10 17:39:05 2020 +0200

Added support of `null` values for syscall `args` in Seccomp.

This patch adds support for `null` values of syscall arguments.
If the value is `null`, then syscall arguments are ignored.

Review: https://reviews.apache.org/r/72596
---
 src/linux/seccomp/seccomp_parser.cpp | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/linux/seccomp/seccomp_parser.cpp 
b/src/linux/seccomp/seccomp_parser.cpp
index 3dcfcd6..8242bd5 100644
--- a/src/linux/seccomp/seccomp_parser.cpp
+++ b/src/linux/seccomp/seccomp_parser.cpp
@@ -435,23 +435,26 @@ Try parseSyscalls(
 
 // Parse `args` section which contains seccomp filtering rules for syscall
 // arguments.
-const auto args = item.as().at("args");
+const auto args = item.as().find("args");
 if (!args.isSome()) {
   return Error(
   "Cannot determine 'args' field for 'syscalls' item: " +
   (args.isError() ? args.error() : "Not found"));
 }
 
-foreach (const JSON::Value& argsItem, args->values) {
-  if (!argsItem.is()) {
-return Error("'names' contains a non-object item");
-  }
+// `args` can be either `null` or an array.
+if (args->is()) {
+  foreach (const JSON::Value& argsItem, args->as().values) {
+if (!argsItem.is()) {
+  return Error("'args' contains a non-object item");
+}
 
-  Try arg =
-parseSyscallArgument(argsItem.as(), syscall.add_args());
+Try arg =
+  parseSyscallArgument(argsItem.as(), 
syscall.add_args());
 
-  if (arg.isError()) {
-return Error(arg.error());
+if (arg.isError()) {
+  return Error(arg.error());
+}
   }
 }
 



[mesos] branch master updated: Added URI to failure messages in URI fetcher plugins.

2020-06-10 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 5cc10d6  Added URI to failure messages in URI fetcher plugins.
5cc10d6 is described below

commit 5cc10d680f85570a0ce1bf695484f5ff7271e1a8
Author: Andrei Budnik 
AuthorDate: Mon Jun 8 14:16:17 2020 +0200

Added URI to failure messages in URI fetcher plugins.

Review: https://reviews.apache.org/r/72575
---
 src/uri/fetchers/curl.cpp   | 26 +++---
 src/uri/fetchers/docker.cpp | 37 +
 2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/src/uri/fetchers/curl.cpp b/src/uri/fetchers/curl.cpp
index 1796620..1e6b382 100644
--- a/src/uri/fetchers/curl.cpp
+++ b/src/uri/fetchers/curl.cpp
@@ -153,49 +153,53 @@ Future CurlFetcherPlugin::fetch(
   s->status(),
   io::read(s->out().get()),
   io::read(s->err().get()))
-.then([](const tuple<
+.then([uri](const tuple<
 Future>,
 Future,
 Future>& t) -> Future {
   const Future>& status = std::get<0>(t);
   if (!status.isReady()) {
 return Failure(
-"Failed to get the exit status of the curl subprocess: " +
+"Failed to get the exit status of the curl subprocess for '" +
+stringify(uri) + "': " +
 (status.isFailed() ? status.failure() : "discarded"));
   }
 
   if (status->isNone()) {
-return Failure("Failed to reap the curl subprocess");
+return Failure("Failed to reap the curl subprocess for '" +
+   stringify(uri) + "'");
   }
 
   if (status->get() != 0) {
 const Future& error = std::get<2>(t);
 if (!error.isReady()) {
   return Failure(
-  "Failed to perform 'curl'. Reading stderr failed: " +
+  "Failed to perform 'curl' for '" + stringify(uri) +
+  "'. Reading stderr failed: " +
   (error.isFailed() ? error.failure() : "discarded"));
 }
 
-return Failure("Failed to perform 'curl': " + error.get());
+return Failure("Failed to perform 'curl' for '" + stringify(uri) +
+   "': " + error.get());
   }
 
   const Future& output = std::get<1>(t);
   if (!output.isReady()) {
 return Failure(
-"Failed to read stdout from 'curl': " +
-(output.isFailed() ? output.failure() : "discarded"));
+"Failed to read stdout from 'curl' for '" + stringify(uri) +
+"': " + (output.isFailed() ? output.failure() : "discarded"));
   }
 
   // Parse the output and get the HTTP response code.
   Try code = numify(output.get());
   if (code.isError()) {
-return Failure("Unexpected output from 'curl': " + output.get());
+return Failure("Unexpected output from 'curl' for '" + stringify(uri) +
+   "': " + output.get());
   }
 
   if (code.get() != http::Status::OK) {
-return Failure(
-"Unexpected HTTP response code: " +
-http::Status::string(code.get()));
+return Failure("Unexpected HTTP response code for '" + stringify(uri) +
+   "': " + http::Status::string(code.get()));
   }
 
   return Nothing();
diff --git a/src/uri/fetchers/docker.cpp b/src/uri/fetchers/docker.cpp
index 09feb68..ed71495 100644
--- a/src/uri/fetchers/docker.cpp
+++ b/src/uri/fetchers/docker.cpp
@@ -170,36 +170,38 @@ static Future curl(
   s->status(),
   io::read(s->out().get()),
   io::read(s->err().get()))
-.then([](const tuple<
+.then([uri](const tuple<
 Future>,
 Future,
 Future>& t) -> Future {
   const Future>& status = std::get<0>(t);
   if (!status.isReady()) {
 return Failure(
-"Failed to get the exit status of the curl subprocess: " +
-(status.isFailed() ? status.failure() : "discarded"));
+"Failed to get the exit status of the curl subprocess for '" +
+uri + "': " + (status.isFailed() ? status.failure() : 
"discarded"));
   }
 
   if (status->isNone()) {
-return Failure("Failed to reap the curl subprocess");
+return Failure("Failed to reap the curl subprocess for '" + uri + "'");
   }
 
   if (status->get() != 0) {
 const Future& error

[mesos] branch master updated: Keep retrying to remove cgroup on EBUSY.

2020-05-07 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 0cb1591  Keep retrying to remove cgroup on EBUSY.
0cb1591 is described below

commit 0cb1591b709e3c9f32093d943b8e2ddcdcf7999f
Author: Charles-Francois Natali 
AuthorDate: Sat May 2 01:41:09 2020 +0100

Keep retrying to remove cgroup on EBUSY.

This is a follow-up to MESOS-10107, which introduced retries when
calling `rmdir` on a seemingly empty cgroup fails with `EBUSY`
because of various kernel bugs.
At the time, the fix introduced a bounded number of retries, using an
exponential backoff summing up to slightly over 1s. This was done
because it was similar to what Docker does, and worked during testing.
However, after 1 month without seeing this error in our cluster at work,
we finally experienced one case where the 1s timeout wasn't enough.
It could be because the machine was busy at the time, or some other
random factor.
So instead of only trying for 1s, I think it might make sense to just
keep retrying, until the top-level container destruction timeout - set
at 1 minute - kicks in.
This actually makes more sense, and avoids having a magical timeout in
the cgroup code.
We just need to ensure that when the destroyer is finalized, it discards
the future in charge of doing the periodic remove.

This closes #362
---
 src/linux/cgroups.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/linux/cgroups.cpp b/src/linux/cgroups.cpp
index 48be19e..43d8ac0 100644
--- a/src/linux/cgroups.cpp
+++ b/src/linux/cgroups.cpp
@@ -280,7 +280,6 @@ Future remove(const string& hierarchy, const 
string& cgroup)
   // with EBUSY even though the cgroup appears empty.
 
   Duration delay = Duration::zero();
-  int retry = 10;
 
   return loop(
   [=]() mutable {
@@ -291,8 +290,10 @@ Future remove(const string& hierarchy, const 
string& cgroup)
   [=](const Nothing&) mutable -> Future> {
 if (::rmdir(path.c_str()) == 0) {
   return process::Break();
-} else if ((errno == EBUSY) && (retry > 0)) {
-  --retry;
+} else if (errno == EBUSY) {
+  LOG(WARNING) << "Removal of cgroup " << path
+   << " failed with EBUSY, will try again";
+
   return process::Continue();
 } else {
   // If the `cgroup` still exists in the hierarchy, treat this as
@@ -1572,6 +1573,7 @@ protected:
 
   void finalize() override
   {
+remover.discard();
 discard(killers);
 promise.discard();
   }
@@ -1580,8 +1582,8 @@ private:
   void killed(const Future>& kill)
   {
 if (kill.isReady()) {
-  internal::remove(hierarchy, cgroups)
-.onAny(defer(self(), ::removed, lambda::_1));
+  remover = internal::remove(hierarchy, cgroups);
+  remover.onAny(defer(self(), ::removed, lambda::_1));
 } else if (kill.isDiscarded()) {
   promise.discard();
   terminate(self());
@@ -1611,6 +1613,9 @@ private:
 
   // The killer processes used to atomically kill tasks in each cgroup.
   vector> killers;
+
+  // Future used to destroy the cgroups once the tasks have been killed.
+  Future remover;
 };
 
 } // namespace internal {



[mesos] branch master updated (8682b5d -> 515b239)

2020-05-07 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 8682b5d  Added MESOS-10118 to the 1.9.1 CHANGELOG.
 new 63ca5c1  Logged connection error message before shutting down the 
executor.
 new 515b239  Changed permissions for domain sockets to allow non-root 
executors.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/common/domain_sockets.hpp | 2 +-
 src/executor/executor.cpp | 9 ++---
 2 files changed, 7 insertions(+), 4 deletions(-)



[mesos] 02/02: Changed permissions for domain sockets to allow non-root executors.

2020-05-07 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 515b239983e51dd06c8b5c347b4b739644113f8f
Author: Andrei Budnik 
AuthorDate: Wed May 6 19:48:38 2020 +0200

Changed permissions for domain sockets to allow non-root executors.

Previously, the default permissions for domain sockets allowed
r/w access only for the file's user, so an executor launched under
a non-privileged user could not open the agent's socket. This patch
adds r/w permissions for the group and other users to address
the access problem.

Review: https://reviews.apache.org/r/72478
---
 src/common/domain_sockets.hpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/common/domain_sockets.hpp b/src/common/domain_sockets.hpp
index 6d2b0ab..630ea93 100644
--- a/src/common/domain_sockets.hpp
+++ b/src/common/domain_sockets.hpp
@@ -33,7 +33,7 @@ namespace internal {
 namespace common {
 
 constexpr size_t DOMAIN_SOCKET_MAX_PATH_LENGTH = 108;
-constexpr int DOMAIN_SOCKET_DEFAULT_MODE = 0600;
+constexpr int DOMAIN_SOCKET_DEFAULT_MODE = 0666;
 
 
 inline Try createDomainSocket(



[mesos] 01/02: Logged connection error message before shutting down the executor.

2020-05-07 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 63ca5c159ef0272fc1d2add6ff73a88c8e16ea0c
Author: Andrei Budnik 
AuthorDate: Wed May 6 13:13:51 2020 +0200

Logged connection error message before shutting down the executor.

Previously, if an executor failed to connect to the agent, it would
silently shutdown itself without writing an error message to the log.
After we added the support for the domain sockets, a set of potential
failures during `connect` increased. In this patch, we logged
the connection failures to help in debugging.

Review: https://reviews.apache.org/r/72475
---
 src/executor/executor.cpp | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/executor/executor.cpp b/src/executor/executor.cpp
index 213f38e..974049c 100644
--- a/src/executor/executor.cpp
+++ b/src/executor/executor.cpp
@@ -561,11 +561,13 @@ protected:
   recoveryTimer = delay(
   recoveryTimeout.get(),
   self(),
-  ::_recoveryTimeout);
+  ::_recoveryTimeout,
+  failure);
 
   // Backoff and reconnect only if framework checkpointing is enabled.
   backoff();
 } else {
+  LOG(INFO) << "Disconnected from agent: " << failure << "; Shutting down";
   shutdown();
 }
   }
@@ -599,7 +601,7 @@ protected:
 return future;
   }
 
-  void _recoveryTimeout()
+  void _recoveryTimeout(const string& failure)
   {
 // It's possible that a new connection was established since the timeout
 // fired and we were unable to cancel this timeout. If this occurs, don't
@@ -612,7 +614,8 @@ protected:
 
 CHECK_SOME(recoveryTimeout);
 LOG(INFO) << "Recovery timeout of " << recoveryTimeout.get()
-  << " exceeded; Shutting down";
+  << " exceeded following the first connection failure: " << 
failure
+  << "; Shutting down";
 
 shutdown();
   }



[mesos] branch master updated: Fixed unreachable code warning for `cgroups::destroy`.

2020-04-16 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 6d76751  Fixed unreachable code warning for `cgroups::destroy`.
6d76751 is described below

commit 6d767512bbc46029632e0b7f991f54a61d7ee6a3
Author: Andrei Budnik 
AuthorDate: Thu Apr 16 12:50:22 2020 +0200

Fixed unreachable code warning for `cgroups::destroy`.

This was noted by coverity in CID 146.

Review: https://reviews.apache.org/r/72370
---
 src/linux/cgroups.cpp | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/linux/cgroups.cpp b/src/linux/cgroups.cpp
index 5f10470..bfc48d6 100644
--- a/src/linux/cgroups.cpp
+++ b/src/linux/cgroups.cpp
@@ -1637,12 +1637,10 @@ Future destroy(const string& hierarchy, const 
string& cgroup)
 Future future = destroyer->future();
 spawn(destroyer, true);
 return future;
-  } else {
-// Otherwise, attempt to remove the cgroups in bottom-up fashion.
-return internal::remove(hierarchy, candidates);
   }
 
-  return Nothing();
+  // Attempt to remove the cgroups in a bottom-up fashion.
+  return internal::remove(hierarchy, candidates);
 }
 
 



[mesos] 01/02: Handled EBUSY when destroying a cgroup.

2020-04-15 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit af3ca189aced5fbc537bfca571264142d4cd37b3
Author: Charles-Francois Natali 
AuthorDate: Wed Apr 1 13:40:16 2020 +0100

Handled EBUSY when destroying a cgroup.

It's a workaround for kernel bugs which can cause `rmdir` to fail with
`EBUSY` even though the cgroup - appears - empty.
See for example https://lkml.org/lkml/2020/1/15/1349

This closes #355
---
 src/linux/cgroups.cpp  | 121 ++---
 src/linux/cgroups.hpp  |  14 ---
 src/tests/containerizer/cgroups_isolator_tests.cpp |   3 +-
 src/tests/containerizer/cgroups_tests.cpp  |  14 +--
 4 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/src/linux/cgroups.cpp b/src/linux/cgroups.cpp
index 2234f0d..5f10470 100644
--- a/src/linux/cgroups.cpp
+++ b/src/linux/cgroups.cpp
@@ -41,11 +41,13 @@ extern "C" {
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -263,6 +265,73 @@ static Try cloneCpusetCpusMems(
   return Nothing();
 }
 
+
+// Removes a cgroup from a given hierachy.
+// @param   hierarchy  Path to hierarchy root.
+// @param   cgroup Path of the cgroup relative to the hierarchy root.
+// @return  Some if the operation succeeds.
+//  Error if the operation fails.
+Future remove(const string& hierarchy, const string& cgroup)
+{
+  const string path = path::join(hierarchy, cgroup);
+
+  // We retry on EBUSY as a workaround for kernel bug
+  // https://lkml.org/lkml/2020/1/15/1349 and others which cause rmdir to fail
+  // with EBUSY even though the cgroup appears empty.
+
+  Duration delay = Duration::zero();
+  int retry = 10;
+
+  return loop(
+  [=]() mutable {
+auto timeout = process::after(delay);
+delay = (delay == Duration::zero()) ? Milliseconds(1) : delay * 2;
+return timeout;
+  },
+  [=](const Nothing&) mutable -> Future> {
+if (::rmdir(path.c_str()) == 0) {
+  return process::Break();
+} else if ((errno == EBUSY) && (retry > 0)) {
+  --retry;
+  return process::Continue();
+} else {
+  // If the `cgroup` still exists in the hierarchy, treat this as
+  // an error; otherwise, treat this as a success since the `cgroup`
+  // has actually been cleaned up.
+  // Save the error string as os::exists may clobber errno.
+  const string error = os::strerror(errno);
+  if (os::exists(path::join(hierarchy, cgroup))) {
+return Failure(
+"Failed to remove directory '" + path + "': " + error);
+  }
+
+  return process::Break();
+}
+  }
+);
+}
+
+
+// Removes a list of cgroups from a given hierachy.
+// The cgroups are removed in order, so this function can be used to remove a
+// cgroup hierarchy in a bottom-up fashion.
+// @param   hierarchy  Path to hierarchy root.
+// @param   cgroupsPath of the cgroups relative to the hierarchy root.
+// @return  Some if the operation succeeds.
+//  Error if the operation fails.
+Future remove(const string& hierarchy, const vector& cgroups)
+{
+  Future f = Nothing();
+
+  foreach (const string& cgroup, cgroups) {
+f = f.then([=] {
+  return internal::remove(hierarchy, cgroup);
+});
+  }
+
+  return f;
+}
+
 } // namespace internal {
 
 
@@ -691,21 +760,6 @@ Try create(
 }
 
 
-Try remove(const string& hierarchy, const string& cgroup)
-{
-  string path = path::join(hierarchy, cgroup);
-
-  // Do NOT recursively remove cgroups.
-  Try rmdir = os::rmdir(path, false);
-  if (rmdir.isError()) {
-return Error(
-"Failed to remove cgroup '" + path + "': " + rmdir.error());
-  }
-
-  return rmdir;
-}
-
-
 bool exists(const string& hierarchy, const string& cgroup)
 {
   return os::exists(path::join(hierarchy, cgroup));
@@ -1522,7 +1576,8 @@ private:
   void killed(const Future>& kill)
   {
 if (kill.isReady()) {
-  remove();
+  internal::remove(hierarchy, cgroups)
+.onAny(defer(self(), ::removed, lambda::_1));
 } else if (kill.isDiscarded()) {
   promise.discard();
   terminate(self());
@@ -1533,24 +1588,16 @@ private:
 }
   }
 
-  void remove()
+  void removed(const Future& removeAll)
   {
-foreach (const string& cgroup, cgroups) {
-  Try remove = cgroups::remove(hierarchy, cgroup);
-  if (remove.isError()) {
-// If the `cgroup` still exists in the hierarchy, treat this as
-// an error; otherwise, treat this as a success since the `cgroup`
-// has actually been cleaned up.
-if (os::exists(path::join(hierarchy, cgroup))) {
-  

[mesos] branch master updated (e8793b2 -> 8c0b882)

2020-04-15 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from e8793b2  Fixed libevent SSL socket shutdown race condition.
 new af3ca18  Handled EBUSY when destroying a cgroup.
 new 8c0b882  Added Charles-Francois Natali to contributors list.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/contributors.yaml |   6 +
 src/linux/cgroups.cpp  | 121 ++---
 src/linux/cgroups.hpp  |  14 ---
 src/tests/containerizer/cgroups_isolator_tests.cpp |   3 +-
 src/tests/containerizer/cgroups_tests.cpp  |  14 +--
 5 files changed, 94 insertions(+), 64 deletions(-)



[mesos] 02/02: Added Charles-Francois Natali to contributors list.

2020-04-15 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 8c0b8825eae962c210d76eb47928672fbb55d7d3
Author: Charles-Francois Natali 
AuthorDate: Wed Apr 8 14:33:16 2020 +0100

Added Charles-Francois Natali to contributors list.
---
 docs/contributors.yaml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/docs/contributors.yaml b/docs/contributors.yaml
index 23fd7c5..bffa0e5 100644
--- a/docs/contributors.yaml
+++ b/docs/contributors.yaml
@@ -231,6 +231,12 @@
   jira_user: drcrallen
   reviewboard_user: drcrallen
 
+- name: Charles-Francois Natali
+  emails:
+- cf.nat...@gmail.com
+  jira_user: cf.natali
+  reviewboard_user: cf.natali
+
 - name: Chen Runcong
   affiliations:
 - {organization: Asiainfo}



[mesos] 06/06: Added a test `LaunchNestedShareCgroups`.

2020-03-13 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit e452533a29baecd8ae748852dda2d4385d2270a6
Author: Andrei Budnik 
AuthorDate: Fri Feb 28 16:15:47 2020 +0100

Added a test `LaunchNestedShareCgroups`.

Review: https://reviews.apache.org/r/72190
---
 .../nested_mesos_containerizer_tests.cpp   | 89 +-
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/src/tests/containerizer/nested_mesos_containerizer_tests.cpp 
b/src/tests/containerizer/nested_mesos_containerizer_tests.cpp
index 13c6d28..8aaf80a 100644
--- a/src/tests/containerizer/nested_mesos_containerizer_tests.cpp
+++ b/src/tests/containerizer/nested_mesos_containerizer_tests.cpp
@@ -53,7 +53,7 @@
 using mesos::internal::slave::Containerizer;
 using mesos::internal::slave::Fetcher;
 using mesos::internal::slave::MesosContainerizer;
-
+using mesos::internal::slave::containerizer::paths::getCgroupPath;
 using mesos::internal::slave::containerizer::paths::getContainerConfig;
 using mesos::internal::slave::containerizer::paths::getRuntimePath;
 using mesos::internal::slave::containerizer::paths::getSandboxPath;
@@ -274,6 +274,93 @@ TEST_P(NestedMesosContainerizerTest, 
ROOT_CGROUPS_LaunchNested)
 }
 
 
+// This test verifies that a separate cgroup is created for a nested
+// container only if it does not share cgroups with its parent container.
+TEST_P(NestedMesosContainerizerTest, ROOT_CGROUPS_LaunchNestedShareCgroups)
+{
+  const bool shareCgroups = GetParam();
+
+  slave::Flags flags = CreateSlaveFlags();
+  flags.launcher = "linux";
+  flags.isolation = "cgroups/cpu,filesystem/linux,namespaces/pid";
+
+  Fetcher fetcher(flags);
+
+  Try create = MesosContainerizer::create(
+  flags,
+  false,
+  );
+
+  ASSERT_SOME(create);
+
+  Owned containerizer(create.get());
+
+  SlaveState state;
+  state.id = SlaveID();
+
+  AWAIT_READY(containerizer->recover(state));
+
+  ContainerID containerId;
+  containerId.set_value(id::UUID::random().toString());
+
+  Try directory = environment->mkdtemp();
+  ASSERT_SOME(directory);
+
+  Future launch = containerizer->launch(
+  containerId,
+  createContainerConfig(
+  None(),
+  createExecutorInfo("executor", "sleep 1000", "cpus:1"),
+  directory.get()),
+  map(),
+  None());
+
+  AWAIT_ASSERT_EQ(Containerizer::LaunchResult::SUCCESS, launch);
+
+  // Now launch nested container.
+  ContainerID nestedContainerId;
+  nestedContainerId.mutable_parent()->CopyFrom(containerId);
+  nestedContainerId.set_value(id::UUID::random().toString());
+
+  launch = containerizer->launch(
+  nestedContainerId,
+  createNestedContainerConfig("cpus:0.1", createCommandInfo("sleep 1000")),
+  map(),
+  None());
+
+  AWAIT_ASSERT_EQ(Containerizer::LaunchResult::SUCCESS, launch);
+
+  // Check that a separate cgroup is created for a nested container only
+  // if `share_cgroups` field is set to false.
+  Result cpuHierarchy = cgroups::hierarchy("cpu");
+  ASSERT_SOME(cpuHierarchy);
+
+  const string cgroup = getCgroupPath(flags.cgroups_root, nestedContainerId);
+
+  ASSERT_NE(shareCgroups, cgroups::exists(cpuHierarchy.get(), cgroup));
+
+  Future> nestedTermination =
+containerizer->destroy(nestedContainerId);
+
+  AWAIT_READY(nestedTermination);
+  ASSERT_SOME(nestedTermination.get());
+  ASSERT_TRUE(nestedTermination.get()->has_status());
+  EXPECT_WTERMSIG_EQ(SIGKILL, nestedTermination.get()->status());
+
+  Future> termination =
+containerizer->destroy(containerId);
+
+  AWAIT_READY(termination);
+  ASSERT_SOME(termination.get());
+  ASSERT_TRUE(termination.get()->has_status());
+  EXPECT_WTERMSIG_EQ(SIGKILL, termination.get()->status());
+
+  // Check that the cgroups isolator cleaned up a nested cgroup
+  // for the nested container.
+  ASSERT_FALSE(cgroups::exists(cpuHierarchy.get(), cgroup));
+}
+
+
 // This test verifies that a debug container inherits the
 // environment of its parent even after agent failover.
 TEST_F(NestedMesosContainerizerTest,



[mesos] branch master updated (9ec5e2b -> e452533)

2020-03-13 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 9ec5e2b  Fixed the broken PathTest.PathIteration on windows.
 new e855f1e  Moved cgroup path helpers to `paths.hpp`.
 new 81555e8  Fixed `cgroups::create` for nested cgroups.
 new 220cf10  Cgroups isolator: added support for nested cgroups during 
launch.
 new 969836b  Cgroups isolator: added support for nested cgroups during 
recovery.
 new 96291c0  Updated nested mesos containerizer tests to support 
`share_cgroups`.
 new e452533  Added a test `LaunchNestedShareCgroups`.

The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/linux/cgroups.cpp  |  26 ++-
 .../mesos/isolators/cgroups/cgroups.cpp| 182 +
 .../mesos/isolators/cgroups/cgroups.hpp|   2 +
 .../mesos/isolators/network/ports.cpp  |   5 +-
 src/slave/containerizer/mesos/linux_launcher.cpp   |  77 ++---
 src/slave/containerizer/mesos/linux_launcher.hpp   |   6 -
 src/slave/containerizer/mesos/paths.cpp|  60 +++
 src/slave/containerizer/mesos/paths.hpp|  14 ++
 .../nested_mesos_containerizer_tests.cpp   | 138 ++--
 9 files changed, 359 insertions(+), 151 deletions(-)



[mesos] 02/06: Fixed `cgroups::create` for nested cgroups.

2020-03-13 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 81555e8d73507afcc28bc6ee92c2ef456adbaf87
Author: Andrei Budnik 
AuthorDate: Tue Mar 3 14:57:49 2020 +0100

Fixed `cgroups::create` for nested cgroups.

This patch modifies `cgroups::create` function to call
`cloneCpusetCpusMems` for all absent nested cgroups along
the path to a cgroup that is accepted as an argument to this function.
For instance, if `cgroups::create` is called to create three
non-existent cgroups recursively for the path `/a/b/c`, then
`cloneCpusetCpusMems` is called to clone both `cpuset.cpus` and
`cpuset.mems` for `/a` from its parent, then `/a/b` from `/a`,
and so on down the path.

Review: https://reviews.apache.org/r/72122/
---
 src/linux/cgroups.cpp | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/linux/cgroups.cpp b/src/linux/cgroups.cpp
index 73646c9..2234f0d 100644
--- a/src/linux/cgroups.cpp
+++ b/src/linux/cgroups.cpp
@@ -646,7 +646,19 @@ Try create(
 const string& cgroup,
 bool recursive)
 {
+  vector missingCgroups;
+  string currentCgroup;
+  Path cgroupPath(cgroup);
+  for (auto it = cgroupPath.begin(); it != cgroupPath.end(); ++it) {
+currentCgroup = path::join(currentCgroup, *it);
+if (!missingCgroups.empty() ||
+!os::exists(path::join(hierarchy, currentCgroup))) {
+  missingCgroups.push_back(currentCgroup);
+}
+  }
+
   string path = path::join(hierarchy, cgroup);
+
   Try mkdir = os::mkdir(path, recursive);
   if (mkdir.isError()) {
 return Error(
@@ -661,8 +673,18 @@ Try create(
 "Failed to determine if hierarchy '" + hierarchy +
 "' has the 'cpuset' subsystem attached: " + attached.error());
   } else if (attached->count("cpuset") > 0) {
-string parent = Path(path::join("/", cgroup)).dirname();
-return internal::cloneCpusetCpusMems(hierarchy, parent, cgroup);
+foreach (const string& cgroup, missingCgroups) {
+  string parent = Path(cgroup).dirname();
+
+  Try clone =
+internal::cloneCpusetCpusMems(hierarchy, parent, cgroup);
+
+  if (clone.isError()) {
+return Error(
+"Failed to clone `cpuset.cpus` and `cpuset.mems` from '" +
+parent + "' to '" + cgroup + "': " + clone.error());
+  }
+}
   }
 
   return Nothing();



[mesos] 05/06: Updated nested mesos containerizer tests to support `share_cgroups`.

2020-03-13 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 96291c0e157e0d3c125b75868186dc4661986c96
Author: Andrei Budnik 
AuthorDate: Tue Mar 3 14:53:25 2020 +0100

Updated nested mesos containerizer tests to support `share_cgroups`.

Parametrized some `NestedMesosContainerizerTest` tests on whether we
are launching nested containers with `share_cgroups=false` or not.
Previously, all nested containers shared cgroups with their parent by
default. Now, since we've added support for nested containers with
their own cgroups, we need to verify that a nested container with
`share_cgroups=false` can be successfully launched and it does not
lead to problems during the recovery of Mesos containerizer.

Review: https://reviews.apache.org/r/72189
---
 .../nested_mesos_containerizer_tests.cpp   | 49 ++
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/src/tests/containerizer/nested_mesos_containerizer_tests.cpp 
b/src/tests/containerizer/nested_mesos_containerizer_tests.cpp
index c6f96e6..13c6d28 100644
--- a/src/tests/containerizer/nested_mesos_containerizer_tests.cpp
+++ b/src/tests/containerizer/nested_mesos_containerizer_tests.cpp
@@ -87,7 +87,8 @@ namespace internal {
 namespace tests {
 
 class NestedMesosContainerizerTest
-  : public ContainerizerTest
+  : public ContainerizerTest,
+public ::testing::WithParamInterface
 {
 protected:
   Try createSlaveState(
@@ -135,9 +136,39 @@ protected:
 
 return slaveState;
   }
+
+  template 
+  mesos::slave::ContainerConfig createNestedContainerConfig(
+  const string& resources, Args... args) const
+  {
+mesos::slave::ContainerConfig containerConfig =
+  createContainerConfig(std::forward(args)...);
+
+const bool shareCgroups = GetParam();
+
+ContainerInfo* container = containerConfig.mutable_container_info();
+container->set_type(ContainerInfo::MESOS);
+container->mutable_linux_info()->set_share_cgroups(shareCgroups);
+
+if (!shareCgroups) {
+  containerConfig.mutable_resources()->CopyFrom(
+Resources::parse(resources).get());
+}
+
+return containerConfig;
+  }
 };
 
 
+// Some nested containerizer tests are parameterized by the boolean
+// `shared_cgroups` flag that specifies whether cgroups are shared
+// between nested containers and their parent container.
+INSTANTIATE_TEST_CASE_P(
+NestedContainerShareCgroups,
+NestedMesosContainerizerTest,
+::testing::Values(true, false));
+
+
 TEST_F(NestedMesosContainerizerTest, NestedContainerID)
 {
   ContainerID id1;
@@ -173,7 +204,7 @@ TEST_F(NestedMesosContainerizerTest, NestedContainerID)
 }
 
 
-TEST_F(NestedMesosContainerizerTest, ROOT_CGROUPS_LaunchNested)
+TEST_P(NestedMesosContainerizerTest, ROOT_CGROUPS_LaunchNested)
 {
   slave::Flags flags = CreateSlaveFlags();
   flags.launcher = "linux";
@@ -219,7 +250,7 @@ TEST_F(NestedMesosContainerizerTest, 
ROOT_CGROUPS_LaunchNested)
 
   launch = containerizer->launch(
   nestedContainerId,
-  createContainerConfig(createCommandInfo("exit 42")),
+  createNestedContainerConfig("cpus:0.1", createCommandInfo("exit 42")),
   map(),
   None());
 
@@ -1600,7 +1631,7 @@ TEST_F(NestedMesosContainerizerTest, 
ROOT_CGROUPS_ParentSigterm)
 }
 
 
-TEST_F(NestedMesosContainerizerTest, ROOT_CGROUPS_RecoverNested)
+TEST_P(NestedMesosContainerizerTest, ROOT_CGROUPS_RecoverNested)
 {
   slave::Flags flags = CreateSlaveFlags();
   flags.launcher = "linux";
@@ -1659,7 +1690,7 @@ TEST_F(NestedMesosContainerizerTest, 
ROOT_CGROUPS_RecoverNested)
 
   launch = containerizer->launch(
   nestedContainerId,
-  createContainerConfig(createCommandInfo("sleep 1000")),
+  createNestedContainerConfig("cpus:0.1", createCommandInfo("sleep 1000")),
   map(),
   None());
 
@@ -1740,7 +1771,7 @@ TEST_F(NestedMesosContainerizerTest, 
ROOT_CGROUPS_RecoverNested)
 // This test verifies that the agent could recover if the agent
 // metadata is empty but container runtime dir is not cleaned
 // up. This is a regression test for MESOS-8416.
-TEST_F(NestedMesosContainerizerTest,
+TEST_P(NestedMesosContainerizerTest,
ROOT_CGROUPS_RecoverNestedWithoutSlaveState)
 {
   slave::Flags flags = CreateSlaveFlags();
@@ -1798,7 +1829,7 @@ TEST_F(NestedMesosContainerizerTest,
 
   launch = containerizer->launch(
   nestedContainerId,
-  createContainerConfig(createCommandInfo("sleep 1000")),
+  createNestedContainerConfig("cpus:0.1", createCommandInfo("sleep 1000")),
   map(),
   None());
 
@@ -1844,7 +1875,7 @@ TEST_F(NestedMesosContainerizerTest,
 }
 
 
-TEST_F(NestedMesosContainerizerTest, ROOT_CGROUPS_RecoverNestedWithoutConfig)
+TEST_P(NestedMes

[mesos] 01/06: Moved cgroup path helpers to `paths.hpp`.

2020-03-13 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit e855f1ee7fdcc812cf2bc48d198054a49c5320f2
Author: Andrei Budnik 
AuthorDate: Tue Mar 3 14:57:09 2020 +0100

Moved cgroup path helpers to `paths.hpp`.

Review: https://reviews.apache.org/r/72121/
---
 .../mesos/isolators/network/ports.cpp  |  5 +-
 src/slave/containerizer/mesos/linux_launcher.cpp   | 77 --
 src/slave/containerizer/mesos/linux_launcher.hpp   |  6 --
 src/slave/containerizer/mesos/paths.cpp| 60 +
 src/slave/containerizer/mesos/paths.hpp| 14 
 5 files changed, 88 insertions(+), 74 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/network/ports.cpp 
b/src/slave/containerizer/mesos/isolators/network/ports.cpp
index fec74d3..86d3053 100644
--- a/src/slave/containerizer/mesos/isolators/network/ports.cpp
+++ b/src/slave/containerizer/mesos/isolators/network/ports.cpp
@@ -38,7 +38,7 @@
 
 #include "slave/constants.hpp"
 
-#include "slave/containerizer/mesos/linux_launcher.hpp"
+#include "slave/containerizer/mesos/paths.hpp"
 
 using std::list;
 using std::set;
@@ -94,7 +94,8 @@ collectContainerListeners(
 
   foreach (const ContainerID& containerId, containerIds) {
 // Reconstruct the cgroup path from the container ID.
-string cgroup = LinuxLauncher::cgroup(cgroupsRoot, containerId);
+string cgroup =
+  containerizer::paths::getCgroupPath(cgroupsRoot, containerId);
 
 VLOG(1) << "Checking processes for container " << containerId
 << " in cgroup " << cgroup;
diff --git a/src/slave/containerizer/mesos/linux_launcher.cpp 
b/src/slave/containerizer/mesos/linux_launcher.cpp
index c10092b..0c8c890 100644
--- a/src/slave/containerizer/mesos/linux_launcher.cpp
+++ b/src/slave/containerizer/mesos/linux_launcher.cpp
@@ -37,7 +37,6 @@
 
 #include "mesos/resources.hpp"
 
-#include "slave/containerizer/mesos/constants.hpp"
 #include "slave/containerizer/mesos/linux_launcher.hpp"
 #include "slave/containerizer/mesos/paths.hpp"
 
@@ -105,10 +104,6 @@ private:
 
   Future _destroy(const ContainerID& containerId);
 
-  // Helper for parsing the cgroup path to determine the container ID
-  // it belongs to.
-  Option parse(const string& cgroup);
-
   static const string subsystem;
   const Flags flags;
   const string freezerHierarchy;
@@ -194,19 +189,6 @@ bool LinuxLauncher::available()
 }
 
 
-string LinuxLauncher::cgroup(
-const string& cgroupsRoot,
-const ContainerID& containerId)
-{
-  return path::join(
-  cgroupsRoot,
-  containerizer::paths::buildPath(
-  containerId,
-  CGROUP_SEPARATOR,
-  containerizer::paths::JOIN));
-}
-
-
 LinuxLauncher::LinuxLauncher(
 const Flags& flags,
 const string& freezerHierarchy,
@@ -326,7 +308,9 @@ Future> LinuxLauncherProcess::recover(
 // created (e.g., in the future we might have nested containers
 // that are managed by something else rooted within the freezer
 // hierarchy).
-Option containerId = parse(cgroup);
+Option containerId =
+  containerizer::paths::parseCgroupPath(flags.cgroups_root, cgroup);
+
 if (containerId.isNone()) {
   LOG(INFO) << "Not recovering cgroup " << cgroup;
   continue;
@@ -510,7 +494,9 @@ Try LinuxLauncherProcess::fork(
   parentHooks.emplace_back(Subprocess::ParentHook([=](pid_t child) {
 return cgroups::isolate(
 freezerHierarchy,
-LinuxLauncher::cgroup(this->flags.cgroups_root, containerId),
+containerizer::paths::getCgroupPath(
+this->flags.cgroups_root,
+containerId),
 child);
   }));
 
@@ -519,7 +505,9 @@ Try LinuxLauncherProcess::fork(
 parentHooks.emplace_back(Subprocess::ParentHook([=](pid_t child) {
   return cgroups::isolate(
   systemdHierarchy.get(),
-  LinuxLauncher::cgroup(this->flags.cgroups_root, containerId),
+  containerizer::paths::getCgroupPath(
+  this->flags.cgroups_root,
+  containerId),
   child);
 }));
   }
@@ -591,7 +579,7 @@ Future LinuxLauncherProcess::destroy(const 
ContainerID& containerId)
   }
 
   const string cgroup =
-LinuxLauncher::cgroup(flags.cgroups_root, container->id);
+containerizer::paths::getCgroupPath(flags.cgroups_root, container->id);
 
   // We remove the container so that we don't attempt multiple
   // destroys simultaneously and no other functions will return
@@ -641,7 +629,7 @@ Future LinuxLauncherProcess::_destroy(const 
ContainerID& containerId)
   }
 
   const string cgroup =
-LinuxLauncher::cgroup(flags.cgroups_root, containerId);
+containerizer::paths::getCgroupPath

[mesos] 04/06: Cgroups isolator: added support for nested cgroups during recovery.

2020-03-13 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 969836bd653437fd131530520e8875ac55125bc2
Author: Andrei Budnik 
AuthorDate: Tue Mar 3 14:57:34 2020 +0100

Cgroups isolator: added support for nested cgroups during recovery.

This patch enables recovery for nested cgroups and implements
the detection of orphaned nested cgroups.

Review: https://reviews.apache.org/r/71966/
---
 .../mesos/isolators/cgroups/cgroups.cpp| 41 +++---
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
index 09feaf3..bf2a4d8 100644
--- a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
+++ b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
@@ -206,10 +206,17 @@ Future CgroupsIsolatorProcess::recover(
   // Recover active containers first.
   vector> recovers;
   foreach (const ContainerState& state, states) {
-// If we are a nested container, we do not need to recover
-// anything since only top-level containers will have cgroups
+// If we are a nested container with shared cgroups, we do not
+// need to recover anything since its ancestor will have cgroups
 // created for them.
-if (state.container_id().has_parent()) {
+const bool shareCgroups =
+  (state.has_container_info() &&
+   state.container_info().has_linux_info() &&
+   state.container_info().linux_info().has_share_cgroups())
+? state.container_info().linux_info().share_cgroups()
+: true;
+
+if (state.container_id().has_parent() && shareCgroups) {
   continue;
 }
 
@@ -248,7 +255,6 @@ Future CgroupsIsolatorProcess::_recover(
   hashset unknownOrphans;
 
   foreach (const string& hierarchy, subsystems.keys()) {
-// TODO(jieyu): Use non-recursive version of `cgroups::get`.
 Try> cgroups = cgroups::get(
 hierarchy,
 flags.cgroups_root);
@@ -267,18 +273,28 @@ Future CgroupsIsolatorProcess::_recover(
 continue;
   }
 
-  ContainerID containerId;
-  containerId.set_value(Path(cgroup).basename());
+  // Need to parse the cgroup to see if it's one we created (i.e.,
+  // matches our separator structure) or one that someone else
+  // created (e.g., in the future we might have nested containers
+  // that are managed by something else rooted within the cgroup
+  // hierarchy).
+  Option containerId =
+containerizer::paths::parseCgroupPath(flags.cgroups_root, cgroup);
+
+  if (containerId.isNone()) {
+LOG(INFO) << "Not recovering cgroup " << cgroup;
+continue;
+  }
 
   // Skip containerId which already have been recovered.
-  if (infos.contains(containerId)) {
+  if (infos.contains(containerId.get())) {
 continue;
   }
 
-  if (orphans.contains(containerId)) {
-knownOrphans.insert(containerId);
+  if (orphans.contains(containerId.get())) {
+knownOrphans.insert(containerId.get());
   } else {
-unknownOrphans.insert(containerId);
+unknownOrphans.insert(containerId.get());
   }
 }
   }
@@ -335,7 +351,8 @@ Future CgroupsIsolatorProcess::__recover(
 Future CgroupsIsolatorProcess::___recover(
 const ContainerID& containerId)
 {
-  const string cgroup = path::join(flags.cgroups_root, containerId.value());
+  const string cgroup =
+containerizer::paths::getCgroupPath(flags.cgroups_root, containerId);
 
   vector> recovers;
   hashset recoveredSubsystems;
@@ -397,7 +414,7 @@ Future CgroupsIsolatorProcess::recover(
 
   infos[containerId] = Owned(new Info(
   containerId,
-  path::join(flags.cgroups_root, containerId.value(;
+  containerizer::paths::getCgroupPath(flags.cgroups_root, containerId)));
 
   infos[containerId]->subsystems = recoveredSubsystems;
 



[mesos] 03/06: Cgroups isolator: added support for nested cgroups during launch.

2020-03-13 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 220cf1049d33170650059cc3fcd64657b1ea0407
Author: Andrei Budnik 
AuthorDate: Tue Mar 3 14:57:24 2020 +0100

Cgroups isolator: added support for nested cgroups during launch.

This patch adds support for nested cgroups for nested containers.
Nested cgroups are created only for a nested container with explicitly
disabled `share_cgroups` flag. The cgroups isolator stores info about
nested cgroups in the `infos` class variable, which is used to
determine whether a nested container has its nested cgroup.

Review: https://reviews.apache.org/r/71965/
---
 .../mesos/isolators/cgroups/cgroups.cpp| 141 +
 .../mesos/isolators/cgroups/cgroups.hpp|   2 +
 2 files changed, 90 insertions(+), 53 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
index b12b73d..09feaf3 100644
--- a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
+++ b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
@@ -44,6 +44,7 @@
 
 using mesos::internal::protobuf::slave::containerSymlinkOperation;
 
+using mesos::slave::ContainerClass;
 using mesos::slave::ContainerConfig;
 using mesos::slave::ContainerLaunchInfo;
 using mesos::slave::ContainerLimitation;
@@ -408,10 +409,18 @@ Future> 
CgroupsIsolatorProcess::prepare(
 const ContainerID& containerId,
 const ContainerConfig& containerConfig)
 {
-  // Only prepare cgroups for top-level containers. Nested container
-  // will inherit cgroups from its root container, so here we just
+  // If the nested container shares cgroups with its parent container,
+  // we don't need to prepare cgroups. In this case, the nested container
+  // will inherit cgroups from its ancestor, so here we just
   // need to do the container-specific cgroups mounts.
-  if (containerId.has_parent()) {
+  const bool shareCgroups =
+(containerConfig.has_container_info() &&
+ containerConfig.container_info().has_linux_info() &&
+ containerConfig.container_info().linux_info().has_share_cgroups())
+  ? containerConfig.container_info().linux_info().share_cgroups()
+  : true;
+
+  if (containerId.has_parent() && shareCgroups) {
 return __prepare(containerId, containerConfig);
   }
 
@@ -419,11 +428,16 @@ Future> 
CgroupsIsolatorProcess::prepare(
 return Failure("Container has already been prepared");
   }
 
+  CHECK(containerConfig.container_class() != ContainerClass::DEBUG);
+
+  CHECK(!containerId.has_parent() || !containerId.parent().has_parent())
+<< "2nd level or greater nested cgroups are not supported";
+
   // We save 'Info' into 'infos' first so that even if 'prepare'
   // fails, we can properly cleanup the *side effects* created below.
   infos[containerId] = Owned(new Info(
   containerId,
-  path::join(flags.cgroups_root, containerId.value(;
+  containerizer::paths::getCgroupPath(flags.cgroups_root, containerId)));
 
   vector> prepares;
 
@@ -456,7 +470,8 @@ Future> 
CgroupsIsolatorProcess::prepare(
   infos[containerId]->cgroup));
 }
 
-// Chown the cgroup so the executor can create nested cgroups. Do
+// Chown the cgroup so the executor or a nested container whose
+// `share_cgroups` is false can create nested cgroups. Do
 // not recurse so the control files are still owned by the slave
 // user and thus cannot be changed by the executor.
 //
@@ -553,10 +568,6 @@ Future> 
CgroupsIsolatorProcess::__prepare(
 return None();
   }
 
-  const ContainerID rootContainerId = 
protobuf::getRootContainerId(containerId);
-
-  CHECK(infos.contains(rootContainerId));
-
   ContainerLaunchInfo launchInfo;
   launchInfo.add_clone_namespaces(CLONE_NEWNS);
 
@@ -580,9 +591,16 @@ Future> 
CgroupsIsolatorProcess::__prepare(
   // For the subsystem loaded by this isolator, do the container-specific
   // cgroups mount, e.g.:
   //   mount --bind /sys/fs/cgroup/memory/mesos/ 
/sys/fs/cgroup/memory // NOLINT(whitespace/line_length)
+  Owned info = findCgroupInfo(containerId);
+  if (!info.get()) {
+return Failure(
+"Failed to find cgroup for container " +
+stringify(containerId));
+  }
+
   foreach (const string& hierarchy, subsystems.keys()) {
 *launchInfo.add_mounts() = protobuf::slave::createContainerMount(
-path::join(hierarchy, infos[rootContainerId]->cgroup),
+path::join(hierarchy, info->cgroup),
 path::join(
 containerConfig.rootfs(),
 "/sys/fs/cgroup",
@@ -664,11 +682,11 @@ Future CgroupsIsolatorProcess::isolate(
 pid_t pid)
 {
   // We currently can't call `subsystem->isolate()`

[mesos] branch master updated (cb8106f -> ea670ec)

2020-02-24 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from cb8106f  Made `http`, `pid` and `heartbeater` of `Framework` private.
 new 360e883  Removed reimplementation of `cloexec` from systemd activation 
code.
 new ea670ec  Fixed systemd socket activation on old systemd versions.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/linux/systemd.cpp | 32 
 src/linux/systemd.hpp |  3 ++-
 src/slave/main.cpp|  5 -
 3 files changed, 10 insertions(+), 30 deletions(-)



[mesos] 02/02: Fixed systemd socket activation on old systemd versions.

2020-02-24 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit ea670eca55af847bd7fcba1510726dd0059edf32
Author: Andrei Budnik 
AuthorDate: Thu Feb 20 17:30:18 2020 +0100

Fixed systemd socket activation on old systemd versions.

This patch fixes the bug when `listenFdsWithName` function returns
an empty set of file descriptors on pre-227 systemd versions when
`domain_socket_location` value is not equals to the "systemd:unknown".
This happens when a user expects a newer version of systemd and
specifies a "systemd/", but
the actual systemd version does not support `FileDescriptorName`.
In this case, `LISTEN_FDNAMES` env variable is not specified,
so all socket FDs, which are specified by the `LISTEN_FDS`,
must be used to locate the path to the domain socket.

Review: https://reviews.apache.org/r/72158
---
 src/linux/systemd.cpp | 6 +++---
 src/linux/systemd.hpp | 3 ++-
 src/slave/main.cpp| 5 -
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/linux/systemd.cpp b/src/linux/systemd.cpp
index c7815a0..75d3c2b 100644
--- a/src/linux/systemd.cpp
+++ b/src/linux/systemd.cpp
@@ -395,7 +395,7 @@ Try> listenFds()
 }
 
 
-Try> listenFdsWithName(const std::string& name)
+Try> listenFdsWithNames(const hashset& names)
 {
   Try> fds = listenFds();
   if (fds.isError()) {
@@ -420,8 +420,8 @@ Try> listenFdsWithName(const std::string& 
name)
   }
 
   std::vector result;
-  for (size_t i=0; i < listenFdnames.size(); ++i) {
-if (listenFdnames[i] == name) {
+  for (size_t i = 0; i < listenFdnames.size(); ++i) {
+if (names.contains(listenFdnames[i])) {
   result.push_back(fds->at(i));
 }
   }
diff --git a/src/linux/systemd.hpp b/src/linux/systemd.hpp
index ba96000..e3644c0 100644
--- a/src/linux/systemd.hpp
+++ b/src/linux/systemd.hpp
@@ -20,6 +20,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -101,7 +102,7 @@ Try> listenFds();
 // The names are set by the `FileDescriptorName=` directive in the unit file.
 // This requires systemd 227 or newer. Since any number of unit files can
 // specify the same name, this can return more than one file descriptor.
-Try> listenFdsWithName(const std::string& name);
+Try> listenFdsWithNames(const hashset& names);
 
 // Clear the `$LISTEN_PID`, `$LISTEN_FDS` and `$LISTEN_FDNAMES` environment
 // variables.
diff --git a/src/slave/main.cpp b/src/slave/main.cpp
index c1e6551..0aa2cc9 100644
--- a/src/slave/main.cpp
+++ b/src/slave/main.cpp
@@ -651,8 +651,11 @@ int main(int argc, char** argv)
   // Chop off `systemd:` prefix.
   std::string name = flags.domain_socket_location->substr(8);
 #ifdef __linux__
+  // NOTE: Some systemd versions do not support FileDescriptorName,
+  // thus we also need to listen on descriptors with name "unknown",
+  // which is used for sockets where no name could be determined.
   Try> socketFds =
-systemd::socket_activation::listenFdsWithName(name);
+systemd::socket_activation::listenFdsWithNames({name, "unknown"});
 #else
   Try> socketFds =
 Try>({}); // Dummy to avoid compile errors.



[mesos] 01/02: Removed reimplementation of `cloexec` from systemd activation code.

2020-02-24 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 360e8838d08eb07496b2c3e6bcd511348ce488b0
Author: Andrei Budnik 
AuthorDate: Fri Feb 21 12:08:47 2020 +0100

Removed reimplementation of `cloexec` from systemd activation code.

Review: https://reviews.apache.org/r/72157
---
 src/linux/systemd.cpp | 26 +-
 1 file changed, 1 insertion(+), 25 deletions(-)

diff --git a/src/linux/systemd.cpp b/src/linux/systemd.cpp
index 9897473..c7815a0 100644
--- a/src/linux/systemd.cpp
+++ b/src/linux/systemd.cpp
@@ -337,30 +337,6 @@ Try start(const string& name)
 
 namespace socket_activation {
 
-static Try setCloexecFlag(int fd, bool cloexec)
-{
-  CHECK(fd >= 0) << "Invalid file desciptor was passed";
-
-  int flags = ::fcntl(fd, F_GETFD, 0);
-  if (flags < 0) {
-return ErrnoError();
-  }
-
-  int nflags = cloexec == true ? flags | FD_CLOEXEC
-   : flags & ~FD_CLOEXEC;
-
-  if (nflags == flags) {
-return Nothing();
-  }
-
-  if (::fcntl(fd, F_SETFD, nflags) < 0) {
-return ErrnoError();
-  }
-
-  return Nothing();
-}
-
-
 // See `src/libsystemd/sd-daemon/sd-daemon.c` in the systemd source tree
 // for the reference implementation. We follow that implementation to
 // decide which conditions should result in errors and which should return
@@ -404,7 +380,7 @@ Try> listenFds()
   }
 
   for (int fd = SD_LISTEN_FDS_START; fd < SD_LISTEN_FDS_START + n; ++fd) {
-Try cloexec = setCloexecFlag(fd, true);
+Try cloexec = os::cloexec(fd);
 if (cloexec.isError()) {
   return Error(
   "Could not set CLOEXEC flag for file descriptor " + stringify(fd) +



[mesos] 01/02: Moved the Docker executor declaration into a header.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 1995f63352a5a8c2c8e73adefed708a8620a5d47
Author: Greg Mann 
AuthorDate: Thu Jul 25 12:17:41 2019 -0700

Moved the Docker executor declaration into a header.

This moves the declaration of the Docker executor into the
Docker executor header file and moves the code for the Docker
executor binary into a new launcher implementation file.

This change will enable the Mesos executor driver
implementation to make use of the `DockerExecutor` symbol.

Review: https://reviews.apache.org/r/71033/
---
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 348 +--
 src/docker/executor.hpp  |  53 ++
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 266 ++
 7 files changed, 408 insertions(+), 289 deletions(-)

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 1d4f541..e1a5980 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -249,6 +249,7 @@ set(CSI_SRC
 
 set(DOCKER_SRC
   docker/docker.cpp
+  docker/executor.cpp
   docker/spec.cpp)
 
 set(EXECUTOR_SRC
@@ -620,7 +621,6 @@ endif ()
 ##
 add_subdirectory(checks)
 add_subdirectory(cli)
-add_subdirectory(docker)
 add_subdirectory(examples)
 add_subdirectory(launcher)
 add_subdirectory(local)
diff --git a/src/Makefile.am b/src/Makefile.am
index 5f97523..ba4d453 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -1089,6 +1089,7 @@ libmesos_no_3rdparty_la_SOURCES +=
\
   docker/docker.cpp\
   docker/docker.hpp\
   docker/executor.hpp  \
+  docker/executor.cpp  \
   docker/spec.cpp  \
   examples/flags.hpp   \
   examples/test_anonymous_module.hpp   \
@@ -1809,7 +1810,7 @@ mesos_usage_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_usage_LDADD = libmesos.la $(LDADD)
 
 pkglibexec_PROGRAMS += mesos-docker-executor
-mesos_docker_executor_SOURCES = docker/executor.cpp
+mesos_docker_executor_SOURCES = launcher/docker_executor.cpp
 mesos_docker_executor_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_docker_executor_LDADD = libmesos.la $(LDADD)
 
diff --git a/src/docker/CMakeLists.txt b/src/docker/CMakeLists.txt
deleted file mode 100644
index 1196664..000
--- a/src/docker/CMakeLists.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# THE DOCKER EXECUTOR EXECUTABLE.
-#
-add_executable(mesos-docker-executor executor.cpp)
-target_link_libraries(mesos-docker-executor PRIVATE mesos)
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index ed1b718..642ed65 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -854,291 +854,105 @@ private:
 };
 
 
-class DockerExecutor : public Executor
+DockerExecutor::DockerExecutor(
+const Owned& docker,
+const string& container,
+const string& sandboxDirectory,
+const string& mappedDirectory,
+const Duration& shutdownGracePeriod,
+const string& launcherDir,
+const map& taskEnvironment,
+const Option& defaultContainerDNS,
+bool cgroupsEnableCfs)
 {
-public:
-  DockerExecutor(
-  const Owned& docker,
-  const string& container,
-  const string& sandboxDirectory,
-  const string& mappedDirectory,
-  const Duration& shutdownGracePeriod,
-  const string& launcherDir,
-  const map& taskEnvironment,
-  const Option& defaultContainerDNS,
-  bool cgroupsEnableCfs)
-  {
-process = Owned(new DockerExecutorProcess(
-docker,
-containe

[mesos] branch 1.8.x updated (a2ca451 -> b17ed79)

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from a2ca451  Changed termination logic of the default executor.
 new 1995f63  Moved the Docker executor declaration into a header.
 new b17ed79  Fixed compiler error.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 348 +--
 src/docker/executor.hpp  |  53 ++
 src/exec/exec.cpp|   2 +
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 266 ++
 8 files changed, 410 insertions(+), 289 deletions(-)
 delete mode 100644 src/docker/CMakeLists.txt
 create mode 100644 src/launcher/docker_executor.cpp



[mesos] 02/02: Fixed compiler error.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit b17ed79106b11c5f382006c453b4f885721e3f23
Author: Andrei Budnik 
AuthorDate: Tue Feb 4 13:12:19 2020 +0100

Fixed compiler error.
---
 src/exec/exec.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index dee2074..5838565 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -47,6 +47,8 @@
 
 #include "common/protobuf_utils.hpp"
 
+#include "docker/executor.hpp"
+
 #include "logging/flags.hpp"
 #include "logging/logging.hpp"
 



[mesos] branch 1.7.x updated (5b39908 -> 8af5a6d)

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 5b39908  Changed termination logic of the default executor.
 new 0567b31  Moved the Docker executor declaration into a header.
 new 8af5a6d  Fixed compiler error.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 348 +--
 src/docker/executor.hpp  |  53 ++
 src/exec/exec.cpp|   2 +
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 266 ++
 8 files changed, 410 insertions(+), 289 deletions(-)
 delete mode 100644 src/docker/CMakeLists.txt
 create mode 100644 src/launcher/docker_executor.cpp



[mesos] 01/02: Moved the Docker executor declaration into a header.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 0567b31212105821d0b37ad049228dab6e98ed63
Author: Greg Mann 
AuthorDate: Thu Jul 25 12:17:41 2019 -0700

Moved the Docker executor declaration into a header.

This moves the declaration of the Docker executor into the
Docker executor header file and moves the code for the Docker
executor binary into a new launcher implementation file.

This change will enable the Mesos executor driver
implementation to make use of the `DockerExecutor` symbol.

Review: https://reviews.apache.org/r/71033/
---
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 348 +--
 src/docker/executor.hpp  |  53 ++
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 266 ++
 7 files changed, 408 insertions(+), 289 deletions(-)

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 5bfc8db..8b3e84a 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -242,6 +242,7 @@ set(CSI_SRC
 
 set(DOCKER_SRC
   docker/docker.cpp
+  docker/executor.cpp
   docker/spec.cpp)
 
 set(EXECUTOR_SRC
@@ -599,7 +600,6 @@ endif ()
 ##
 add_subdirectory(checks)
 add_subdirectory(cli)
-add_subdirectory(docker)
 add_subdirectory(examples)
 add_subdirectory(launcher)
 add_subdirectory(local)
diff --git a/src/Makefile.am b/src/Makefile.am
index fb2a27e..4e99723 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -1006,6 +1006,7 @@ libmesos_no_3rdparty_la_SOURCES +=
\
   common/validation.cpp
\
   common/values.cpp\
   docker/docker.cpp\
+  docker/executor.cpp  \
   docker/spec.cpp  \
   exec/exec.cpp
\
   executor/executor.cpp
\
@@ -1743,7 +1744,7 @@ mesos_usage_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_usage_LDADD = libmesos.la $(LDADD)
 
 pkglibexec_PROGRAMS += mesos-docker-executor
-mesos_docker_executor_SOURCES = docker/executor.cpp
+mesos_docker_executor_SOURCES = launcher/docker_executor.cpp
 mesos_docker_executor_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_docker_executor_LDADD = libmesos.la $(LDADD)
 
diff --git a/src/docker/CMakeLists.txt b/src/docker/CMakeLists.txt
deleted file mode 100644
index 1196664..000
--- a/src/docker/CMakeLists.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# THE DOCKER EXECUTOR EXECUTABLE.
-#
-add_executable(mesos-docker-executor executor.cpp)
-target_link_libraries(mesos-docker-executor PRIVATE mesos)
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index ed1b718..642ed65 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -854,291 +854,105 @@ private:
 };
 
 
-class DockerExecutor : public Executor
+DockerExecutor::DockerExecutor(
+const Owned& docker,
+const string& container,
+const string& sandboxDirectory,
+const string& mappedDirectory,
+const Duration& shutdownGracePeriod,
+const string& launcherDir,
+const map& taskEnvironment,
+const Option& defaultContainerDNS,
+bool cgroupsEnableCfs)
 {
-public:
-  DockerExecutor(
-  const Owned& docker,
-  const string& container,
-  const string& sandboxDirectory,
-  const string& mappedDirectory,
-  const Duration& shutdownGracePeriod,
-  const string& launcherDir,
-  const map& taskEnvironment,
-  const Option& defaultContainerDNS,
-  bool cgroupsEnableCfs)
-  {
-process = Owned(new DockerExecutorProcess(
-   

[mesos] 02/02: Fixed compiler error.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 8af5a6d50c64abb683b803f25bbc02e320fe70f6
Author: Andrei Budnik 
AuthorDate: Tue Feb 4 13:12:19 2020 +0100

Fixed compiler error.
---
 src/exec/exec.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index dee2074..5838565 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -47,6 +47,8 @@
 
 #include "common/protobuf_utils.hpp"
 
+#include "docker/executor.hpp"
+
 #include "logging/flags.hpp"
 #include "logging/logging.hpp"
 



[mesos] branch 1.6.x updated (205525e -> 11cdf04)

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 205525e  Changed termination logic of the default executor.
 new 02eb0ce  Moved the Docker executor declaration into a header.
 new 11cdf04  Fixed compiler error.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 323 ++-
 src/docker/executor.hpp  |  53 +++
 src/exec/exec.cpp|   2 +
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 266 
 8 files changed, 410 insertions(+), 264 deletions(-)
 delete mode 100644 src/docker/CMakeLists.txt
 create mode 100644 src/launcher/docker_executor.cpp



[mesos] 02/02: Fixed compiler error.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 11cdf04e39f80a44d062627be39b53d05d76f88d
Author: Andrei Budnik 
AuthorDate: Tue Feb 4 13:12:19 2020 +0100

Fixed compiler error.
---
 src/exec/exec.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index adeb413..e5a03d9 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -47,6 +47,8 @@
 
 #include "common/protobuf_utils.hpp"
 
+#include "docker/executor.hpp"
+
 #include "logging/flags.hpp"
 #include "logging/logging.hpp"
 



[mesos] 01/02: Moved the Docker executor declaration into a header.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 02eb0ceb87dadc0a5ac6f6cd9f141347e852fb80
Author: Greg Mann 
AuthorDate: Thu Jul 25 12:17:41 2019 -0700

Moved the Docker executor declaration into a header.

This moves the declaration of the Docker executor into the
Docker executor header file and moves the code for the Docker
executor binary into a new launcher implementation file.

This change will enable the Mesos executor driver
implementation to make use of the `DockerExecutor` symbol.

Review: https://reviews.apache.org/r/71033/
---
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 323 ++-
 src/docker/executor.hpp  |  53 +++
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 266 
 7 files changed, 408 insertions(+), 264 deletions(-)

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 8cd56bb..232cf86 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -245,6 +245,7 @@ endif ()
 
 set(DOCKER_SRC
   docker/docker.cpp
+  docker/executor.cpp
   docker/spec.cpp)
 
 set(EXECUTOR_SRC
@@ -554,7 +555,6 @@ endif ()
 ##
 add_subdirectory(checks)
 add_subdirectory(cli)
-add_subdirectory(docker)
 add_subdirectory(examples)
 add_subdirectory(launcher)
 add_subdirectory(local)
diff --git a/src/Makefile.am b/src/Makefile.am
index 98ba106..6e18cde 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -992,6 +992,7 @@ libmesos_no_3rdparty_la_SOURCES +=  
\
   common/validation.cpp
\
   common/values.cpp\
   docker/docker.cpp\
+  docker/executor.cpp  \
   docker/spec.cpp  \
   exec/exec.cpp
\
   executor/executor.cpp
\
@@ -1762,7 +1763,7 @@ mesos_usage_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_usage_LDADD = libmesos.la $(LDADD)
 
 pkglibexec_PROGRAMS += mesos-docker-executor
-mesos_docker_executor_SOURCES = docker/executor.cpp
+mesos_docker_executor_SOURCES = launcher/docker_executor.cpp
 mesos_docker_executor_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_docker_executor_LDADD = libmesos.la $(LDADD)
 
diff --git a/src/docker/CMakeLists.txt b/src/docker/CMakeLists.txt
deleted file mode 100644
index 1196664..000
--- a/src/docker/CMakeLists.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# THE DOCKER EXECUTOR EXECUTABLE.
-#
-add_executable(mesos-docker-executor executor.cpp)
-target_link_libraries(mesos-docker-executor PRIVATE mesos)
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index d5c905d..714ed68 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -847,266 +847,105 @@ private:
 };
 
 
-class DockerExecutor : public Executor
+DockerExecutor::DockerExecutor(
+const Owned& docker,
+const string& container,
+const string& sandboxDirectory,
+const string& mappedDirectory,
+const Duration& shutdownGracePeriod,
+const string& launcherDir,
+const map& taskEnvironment,
+const Option& defaultContainerDNS,
+bool cgroupsEnableCfs)
 {
-public:
-  DockerExecutor(
-  const Owned& docker,
-  const string& container,
-  const string& sandboxDirectory,
-  const string& mappedDirectory,
-  const Duration& shutdownGracePeriod,
-  const string& launcherDir,
-  const map& taskEnvironment,
-  const Option& defaultContainerDNS,
-  bool cgroupsEnableCfs)
-  {
-process = Owned(new DockerExecutorProcess(
-   

[mesos] branch 1.5.x updated (84b7af3 -> d440095)

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 84b7af3  Changed termination logic of the default executor.
 new 68e8655  Moved the Docker executor declaration into a header.
 new d440095  Fixed compiler error.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 323 ++-
 src/docker/executor.hpp  |  53 +++
 src/exec/exec.cpp|   2 +
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 265 
 8 files changed, 409 insertions(+), 264 deletions(-)
 delete mode 100644 src/docker/CMakeLists.txt
 create mode 100644 src/launcher/docker_executor.cpp



[mesos] 01/02: Moved the Docker executor declaration into a header.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 68e8655c8fb6dbc41de6afb66a569583b32f78d3
Author: Greg Mann 
AuthorDate: Thu Jul 25 12:17:41 2019 -0700

Moved the Docker executor declaration into a header.

This moves the declaration of the Docker executor into the
Docker executor header file and moves the code for the Docker
executor binary into a new launcher implementation file.

This change will enable the Mesos executor driver
implementation to make use of the `DockerExecutor` symbol.

Review: https://reviews.apache.org/r/71033/
---
 src/CMakeLists.txt   |   2 +-
 src/Makefile.am  |   3 +-
 src/docker/CMakeLists.txt|  20 ---
 src/docker/executor.cpp  | 323 ++-
 src/docker/executor.hpp  |  53 +++
 src/launcher/CMakeLists.txt  |   5 +
 src/launcher/docker_executor.cpp | 265 
 7 files changed, 407 insertions(+), 264 deletions(-)

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 865ad59..f911bab 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -212,6 +212,7 @@ set(COMMON_SRC
 
 set(DOCKER_SRC
   docker/docker.cpp
+  docker/executor.cpp
   docker/spec.cpp)
 
 set(EXECUTOR_SRC
@@ -519,7 +520,6 @@ endif ()
 ##
 add_subdirectory(checks)
 add_subdirectory(cli)
-add_subdirectory(docker)
 add_subdirectory(examples)
 add_subdirectory(launcher)
 add_subdirectory(local)
diff --git a/src/Makefile.am b/src/Makefile.am
index 3d6751f..30ff894 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -995,6 +995,7 @@ libmesos_no_3rdparty_la_SOURCES +=  
\
   common/validation.cpp
\
   common/values.cpp\
   docker/docker.cpp\
+  docker/executor.cpp  \
   docker/spec.cpp  \
   exec/exec.cpp
\
   executor/executor.cpp
\
@@ -1754,7 +1755,7 @@ mesos_usage_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_usage_LDADD = libmesos.la $(LDADD)
 
 pkglibexec_PROGRAMS += mesos-docker-executor
-mesos_docker_executor_SOURCES = docker/executor.cpp
+mesos_docker_executor_SOURCES = launcher/docker_executor.cpp
 mesos_docker_executor_CPPFLAGS = $(MESOS_CPPFLAGS)
 mesos_docker_executor_LDADD = libmesos.la $(LDADD)
 
diff --git a/src/docker/CMakeLists.txt b/src/docker/CMakeLists.txt
deleted file mode 100644
index 1196664..000
--- a/src/docker/CMakeLists.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# THE DOCKER EXECUTOR EXECUTABLE.
-#
-add_executable(mesos-docker-executor executor.cpp)
-target_link_libraries(mesos-docker-executor PRIVATE mesos)
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 0d75801..c930ce0 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -810,266 +810,105 @@ private:
 };
 
 
-class DockerExecutor : public Executor
+DockerExecutor::DockerExecutor(
+const Owned& docker,
+const string& container,
+const string& sandboxDirectory,
+const string& mappedDirectory,
+const Duration& shutdownGracePeriod,
+const string& launcherDir,
+const map& taskEnvironment,
+const Option& defaultContainerDNS,
+bool cgroupsEnableCfs)
 {
-public:
-  DockerExecutor(
-  const Owned& docker,
-  const string& container,
-  const string& sandboxDirectory,
-  const string& mappedDirectory,
-  const Duration& shutdownGracePeriod,
-  const string& launcherDir,
-  const map& taskEnvironment,
-  const Option& defaultContainerDNS,
-  bool cgroupsEnableCfs)
-  {
-process = Owned(new DockerExecutorProcess(
-   

[mesos] 02/02: Fixed compiler error.

2020-02-04 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit d440095b5a94872eabbd66def49a588d6f8f2b38
Author: Andrei Budnik 
AuthorDate: Tue Feb 4 13:12:19 2020 +0100

Fixed compiler error.
---
 src/exec/exec.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index e964a11..05d62b1 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -47,6 +47,8 @@
 
 #include "common/protobuf_utils.hpp"
 
+#include "docker/executor.hpp"
+
 #include "logging/flags.hpp"
 #include "logging/logging.hpp"
 



[mesos] 01/02: Changed termination logic of the Docker executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit ff98f12a50a56c13688b87068a116d1d08142f49
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 13:35:02 2020 +0100

Changed termination logic of the Docker executor.

Previously, the Docker executor terminated itself after a task's
container had terminated. This could lead to termination of the
executor before processing of a terminal status update by the agent.
In order to mitigate this issue, the executor slept for one second to
give a chance to send all status updates and receive all status update
acknowledgments before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the Docker executor after receiving a terminal
status update acknowledgment. Also, this patch increases the timeout
from one second to one minute for fail-safety.

Review: https://reviews.apache.org/r/72055
---
 src/docker/executor.cpp | 14 ++
 src/exec/exec.cpp   | 19 +++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 7b48f84..0d75801 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -199,7 +199,9 @@ public:
 
   driver->sendStatusUpdate(status);
 
-  _stop();
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // the terminal update for some reason.
+  delay(Seconds(60), self(), ::_stop);
   return;
 }
 
@@ -691,17 +693,13 @@ private:
 CHECK_SOME(driver);
 driver.get()->sendStatusUpdate(taskStatus);
 
-_stop();
+// This is a fail safe in case the agent doesn't send an ACK for
+// the terminal update for some reason.
+delay(Seconds(60), self(), ::_stop);
   }
 
   void _stop()
   {
-// A hack for now ... but we need to wait until the status update
-// is sent to the slave before we shut ourselves down.
-// TODO(tnachen): Remove this hack and also the same hack in the
-// command executor when we have the new HTTP APIs to wait until
-// an ack.
-os::sleep(Seconds(1));
 driver.get()->stop();
   }
 
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index 0c76a3f..e964a11 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -395,10 +395,29 @@ protected:
   return;
 }
 
+if (!updates.contains(uuid_.get())) {
+  LOG(WARNING) << "Ignoring unknown status update acknowledgement "
+   << uuid_.get() << " for task " << taskId
+   << " of framework " << frameworkId;
+  return;
+}
+
 VLOG(1) << "Executor received status update acknowledgement "
 << uuid_.get() << " for task " << taskId
 << " of framework " << frameworkId;
 
+// If this is a terminal status update acknowledgment for the Docker
+// executor, stop the driver to terminate the executor.
+//
+// TODO(abudnik): This is a workaround for MESOS-9847. A better solution
+// is to update supported API for the Docker executor from V0 to V1. It
+// will allow the executor to handle status update acknowledgments itself.
+if (mesos::internal::protobuf::isTerminalState(
+updates[uuid_.get()].status().state()) &&
+dynamic_cast(executor)) {
+  driver->stop();
+}
+
 // Remove the corresponding update.
 updates.erase(uuid_.get());
 



[mesos] branch 1.5.x updated (2f2146a -> 84b7af3)

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 2f2146a  Remembered whether an executor was agent-generated.
 new ff98f12  Changed termination logic of the Docker executor.
 new 84b7af3  Changed termination logic of the default executor.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/docker/executor.cpp   | 14 ++
 src/exec/exec.cpp | 19 +++
 src/launcher/default_executor.cpp | 25 +++--
 3 files changed, 44 insertions(+), 14 deletions(-)



[mesos] 02/02: Changed termination logic of the default executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 84b7af3409d8af343da0f0420e168a42de4b110f
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 19:07:50 2020 +0100

Changed termination logic of the default executor.

Previously, the default executor terminated itself after all containers
had terminated. This could lead to termination of the executor before
processing of a terminal status update by the agent. In order
to mitigate this issue, the executor slept for one second to give a
chance to send all status updates and receive all status update
acknowledgements before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the default executor if all status updates have
been acknowledged by the agent and no running containers left.
Also, this patch increases the timeout from one second to one minute
for fail-safety.

Review: https://reviews.apache.org/r/72029
---
 src/launcher/default_executor.cpp | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/launcher/default_executor.cpp 
b/src/launcher/default_executor.cpp
index 8720dad..c6b13e9 100644
--- a/src/launcher/default_executor.cpp
+++ b/src/launcher/default_executor.cpp
@@ -267,6 +267,12 @@ public:
   containers.at(taskId)->acknowledged = true;
 }
 
+// Terminate the executor if all status updates have been acknowledged
+// by the agent and no running containers left.
+if (containers.empty() && unacknowledgedUpdates.empty()) {
+  terminate(self());
+}
+
 break;
   }
 
@@ -1060,14 +1066,21 @@ protected:
 
   void _shutdown()
   {
-const Duration duration = Seconds(1);
+if (unacknowledgedUpdates.empty()) {
+  terminate(self());
+} else {
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // a status update for some reason.
+  const Duration duration = Seconds(60);
 
-LOG(INFO) << "Terminating after " << duration;
+  LOG(INFO) << "Terminating after " << duration;
 
-// TODO(qianzhang): Remove this hack since the executor now receives
-// acknowledgements for status updates. The executor can terminate
-// after it receives an ACK for a terminal status update.
-os::sleep(duration);
+  delay(duration, self(), ::__shutdown);
+}
+  }
+
+  void __shutdown()
+  {
 terminate(self());
   }
 



[mesos] 02/02: Changed termination logic of the default executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 205525eb56a33e58bed1fc38e0b32189b19d3fbc
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 19:07:50 2020 +0100

Changed termination logic of the default executor.

Previously, the default executor terminated itself after all containers
had terminated. This could lead to termination of the executor before
processing of a terminal status update by the agent. In order
to mitigate this issue, the executor slept for one second to give a
chance to send all status updates and receive all status update
acknowledgements before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the default executor if all status updates have
been acknowledged by the agent and no running containers left.
Also, this patch increases the timeout from one second to one minute
for fail-safety.

Review: https://reviews.apache.org/r/72029
---
 src/launcher/default_executor.cpp | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/launcher/default_executor.cpp 
b/src/launcher/default_executor.cpp
index 76c6106..ca17c70 100644
--- a/src/launcher/default_executor.cpp
+++ b/src/launcher/default_executor.cpp
@@ -276,6 +276,12 @@ public:
   containers.at(taskId)->acknowledged = true;
 }
 
+// Terminate the executor if all status updates have been acknowledged
+// by the agent and no running containers left.
+if (containers.empty() && unacknowledgedUpdates.empty()) {
+  terminate(self());
+}
+
 break;
   }
 
@@ -1070,14 +1076,21 @@ protected:
 
   void _shutdown()
   {
-const Duration duration = Seconds(1);
+if (unacknowledgedUpdates.empty()) {
+  terminate(self());
+} else {
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // a status update for some reason.
+  const Duration duration = Seconds(60);
 
-LOG(INFO) << "Terminating after " << duration;
+  LOG(INFO) << "Terminating after " << duration;
 
-// TODO(qianzhang): Remove this hack since the executor now receives
-// acknowledgements for status updates. The executor can terminate
-// after it receives an ACK for a terminal status update.
-os::sleep(duration);
+  delay(duration, self(), ::__shutdown);
+}
+  }
+
+  void __shutdown()
+  {
 terminate(self());
   }
 



[mesos] branch 1.6.x updated (7e4d380 -> 205525e)

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 7e4d380  Remembered whether an executor was agent-generated.
 new f511f25  Changed termination logic of the Docker executor.
 new 205525e  Changed termination logic of the default executor.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/docker/executor.cpp   | 14 ++
 src/exec/exec.cpp | 19 +++
 src/launcher/default_executor.cpp | 25 +++--
 3 files changed, 44 insertions(+), 14 deletions(-)



[mesos] 01/02: Changed termination logic of the Docker executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit f511f25be9d850ee9b65fc3ec5f54d149beb2f19
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 13:35:02 2020 +0100

Changed termination logic of the Docker executor.

Previously, the Docker executor terminated itself after a task's
container had terminated. This could lead to termination of the
executor before processing of a terminal status update by the agent.
In order to mitigate this issue, the executor slept for one second to
give a chance to send all status updates and receive all status update
acknowledgments before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the Docker executor after receiving a terminal
status update acknowledgment. Also, this patch increases the timeout
from one second to one minute for fail-safety.

Review: https://reviews.apache.org/r/72055
---
 src/docker/executor.cpp | 14 ++
 src/exec/exec.cpp   | 19 +++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 57cfbcd..d5c905d 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -217,7 +217,9 @@ public:
 
   driver->sendStatusUpdate(status);
 
-  _stop();
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // the terminal update for some reason.
+  delay(Seconds(60), self(), ::_stop);
   return;
 }
 
@@ -750,17 +752,13 @@ private:
 CHECK_SOME(driver);
 driver.get()->sendStatusUpdate(taskStatus);
 
-_stop();
+// This is a fail safe in case the agent doesn't send an ACK for
+// the terminal update for some reason.
+delay(Seconds(60), self(), ::_stop);
   }
 
   void _stop()
   {
-// A hack for now ... but we need to wait until the status update
-// is sent to the slave before we shut ourselves down.
-// TODO(tnachen): Remove this hack and also the same hack in the
-// command executor when we have the new HTTP APIs to wait until
-// an ack.
-os::sleep(Seconds(1));
 driver.get()->stop();
   }
 
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index 65a671d..adeb413 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -395,10 +395,29 @@ protected:
   return;
 }
 
+if (!updates.contains(uuid_.get())) {
+  LOG(WARNING) << "Ignoring unknown status update acknowledgement "
+   << uuid_.get() << " for task " << taskId
+   << " of framework " << frameworkId;
+  return;
+}
+
 VLOG(1) << "Executor received status update acknowledgement "
 << uuid_.get() << " for task " << taskId
 << " of framework " << frameworkId;
 
+// If this is a terminal status update acknowledgment for the Docker
+// executor, stop the driver to terminate the executor.
+//
+// TODO(abudnik): This is a workaround for MESOS-9847. A better solution
+// is to update supported API for the Docker executor from V0 to V1. It
+// will allow the executor to handle status update acknowledgments itself.
+if (mesos::internal::protobuf::isTerminalState(
+updates[uuid_.get()].status().state()) &&
+dynamic_cast(executor)) {
+  driver->stop();
+}
+
 // Remove the corresponding update.
 updates.erase(uuid_.get());
 



[mesos] branch 1.7.x updated (47f2a11 -> 5b39908)

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 47f2a11  Remembered whether an executor was agent-generated.
 new 6a7da28  Changed termination logic of the Docker executor.
 new 5b39908  Changed termination logic of the default executor.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/docker/executor.cpp   | 14 ++
 src/exec/exec.cpp | 19 +++
 src/launcher/default_executor.cpp | 25 +++--
 3 files changed, 44 insertions(+), 14 deletions(-)



[mesos] 02/02: Changed termination logic of the default executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 5b399080eee11ee03f4bc6c09b791c24670da6c1
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 19:07:50 2020 +0100

Changed termination logic of the default executor.

Previously, the default executor terminated itself after all containers
had terminated. This could lead to termination of the executor before
processing of a terminal status update by the agent. In order
to mitigate this issue, the executor slept for one second to give a
chance to send all status updates and receive all status update
acknowledgements before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the default executor if all status updates have
been acknowledged by the agent and no running containers left.
Also, this patch increases the timeout from one second to one minute
for fail-safety.

Review: https://reviews.apache.org/r/72029
---
 src/launcher/default_executor.cpp | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/launcher/default_executor.cpp 
b/src/launcher/default_executor.cpp
index b89b036..6a049b3 100644
--- a/src/launcher/default_executor.cpp
+++ b/src/launcher/default_executor.cpp
@@ -278,6 +278,12 @@ public:
   containers.at(taskId)->acknowledged = true;
 }
 
+// Terminate the executor if all status updates have been acknowledged
+// by the agent and no running containers left.
+if (containers.empty() && unacknowledgedUpdates.empty()) {
+  terminate(self());
+}
+
 break;
   }
 
@@ -1072,14 +1078,21 @@ protected:
 
   void _shutdown()
   {
-const Duration duration = Seconds(1);
+if (unacknowledgedUpdates.empty()) {
+  terminate(self());
+} else {
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // a status update for some reason.
+  const Duration duration = Seconds(60);
 
-LOG(INFO) << "Terminating after " << duration;
+  LOG(INFO) << "Terminating after " << duration;
 
-// TODO(qianzhang): Remove this hack since the executor now receives
-// acknowledgements for status updates. The executor can terminate
-// after it receives an ACK for a terminal status update.
-os::sleep(duration);
+  delay(duration, self(), ::__shutdown);
+}
+  }
+
+  void __shutdown()
+  {
 terminate(self());
   }
 



[mesos] 01/02: Changed termination logic of the Docker executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 6a7da284d1b89f8a144ed2f896f005a5ee9d4aea
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 13:35:02 2020 +0100

Changed termination logic of the Docker executor.

Previously, the Docker executor terminated itself after a task's
container had terminated. This could lead to termination of the
executor before processing of a terminal status update by the agent.
In order to mitigate this issue, the executor slept for one second to
give a chance to send all status updates and receive all status update
acknowledgments before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the Docker executor after receiving a terminal
status update acknowledgment. Also, this patch increases the timeout
from one second to one minute for fail-safety.

Review: https://reviews.apache.org/r/72055
---
 src/docker/executor.cpp | 14 ++
 src/exec/exec.cpp   | 19 +++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index f638e4b..ed1b718 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -224,7 +224,9 @@ public:
 
   driver->sendStatusUpdate(status);
 
-  _stop();
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // the terminal update for some reason.
+  delay(Seconds(60), self(), ::_stop);
   return;
 }
 
@@ -757,17 +759,13 @@ private:
 CHECK_SOME(driver);
 driver.get()->sendStatusUpdate(taskStatus);
 
-_stop();
+// This is a fail safe in case the agent doesn't send an ACK for
+// the terminal update for some reason.
+delay(Seconds(60), self(), ::_stop);
   }
 
   void _stop()
   {
-// A hack for now ... but we need to wait until the status update
-// is sent to the slave before we shut ourselves down.
-// TODO(tnachen): Remove this hack and also the same hack in the
-// command executor when we have the new HTTP APIs to wait until
-// an ack.
-os::sleep(Seconds(1));
 driver.get()->stop();
   }
 
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index c0fa3b6..dee2074 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -395,10 +395,29 @@ protected:
   return;
 }
 
+if (!updates.contains(uuid_.get())) {
+  LOG(WARNING) << "Ignoring unknown status update acknowledgement "
+   << uuid_.get() << " for task " << taskId
+   << " of framework " << frameworkId;
+  return;
+}
+
 VLOG(1) << "Executor received status update acknowledgement "
 << uuid_.get() << " for task " << taskId
 << " of framework " << frameworkId;
 
+// If this is a terminal status update acknowledgment for the Docker
+// executor, stop the driver to terminate the executor.
+//
+// TODO(abudnik): This is a workaround for MESOS-9847. A better solution
+// is to update supported API for the Docker executor from V0 to V1. It
+// will allow the executor to handle status update acknowledgments itself.
+if (mesos::internal::protobuf::isTerminalState(
+updates[uuid_.get()].status().state()) &&
+dynamic_cast(executor)) {
+  driver->stop();
+}
+
 // Remove the corresponding update.
 updates.erase(uuid_.get());
 



[mesos] 02/02: Changed termination logic of the default executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit a2ca451aab4625e126b9e7b470eb9f7c232dd746
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 19:07:50 2020 +0100

Changed termination logic of the default executor.

Previously, the default executor terminated itself after all containers
had terminated. This could lead to termination of the executor before
processing of a terminal status update by the agent. In order
to mitigate this issue, the executor slept for one second to give a
chance to send all status updates and receive all status update
acknowledgements before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the default executor if all status updates have
been acknowledged by the agent and no running containers left.
Also, this patch increases the timeout from one second to one minute
for fail-safety.

Review: https://reviews.apache.org/r/72029
---
 src/launcher/default_executor.cpp | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/launcher/default_executor.cpp 
b/src/launcher/default_executor.cpp
index 5837cfa..af520f7 100644
--- a/src/launcher/default_executor.cpp
+++ b/src/launcher/default_executor.cpp
@@ -278,6 +278,12 @@ public:
   containers.at(taskId)->acknowledged = true;
 }
 
+// Terminate the executor if all status updates have been acknowledged
+// by the agent and no running containers left.
+if (containers.empty() && unacknowledgedUpdates.empty()) {
+  terminate(self());
+}
+
 break;
   }
 
@@ -1076,14 +1082,21 @@ protected:
 
   void _shutdown()
   {
-const Duration duration = Seconds(1);
+if (unacknowledgedUpdates.empty()) {
+  terminate(self());
+} else {
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // a status update for some reason.
+  const Duration duration = Seconds(60);
 
-LOG(INFO) << "Terminating after " << duration;
+  LOG(INFO) << "Terminating after " << duration;
 
-// TODO(qianzhang): Remove this hack since the executor now receives
-// acknowledgements for status updates. The executor can terminate
-// after it receives an ACK for a terminal status update.
-os::sleep(duration);
+  delay(duration, self(), ::__shutdown);
+}
+  }
+
+  void __shutdown()
+  {
 terminate(self());
   }
 



[mesos] 02/02: Changed termination logic of the default executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.9.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit f37ae68a8f0d23a2e0f31812b8fe4494109769c6
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 19:07:50 2020 +0100

Changed termination logic of the default executor.

Previously, the default executor terminated itself after all containers
had terminated. This could lead to termination of the executor before
processing of a terminal status update by the agent. In order
to mitigate this issue, the executor slept for one second to give a
chance to send all status updates and receive all status update
acknowledgements before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the default executor if all status updates have
been acknowledged by the agent and no running containers left.
Also, this patch increases the timeout from one second to one minute
for fail-safety.

Review: https://reviews.apache.org/r/72029
---
 src/launcher/default_executor.cpp | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/launcher/default_executor.cpp 
b/src/launcher/default_executor.cpp
index 521494a..7781382 100644
--- a/src/launcher/default_executor.cpp
+++ b/src/launcher/default_executor.cpp
@@ -279,6 +279,12 @@ public:
   containers.at(taskId)->acknowledged = true;
 }
 
+// Terminate the executor if all status updates have been acknowledged
+// by the agent and no running containers left.
+if (containers.empty() && unacknowledgedUpdates.empty()) {
+  terminate(self());
+}
+
 break;
   }
 
@@ -1088,14 +1094,21 @@ protected:
 
   void _shutdown()
   {
-const Duration duration = Seconds(1);
+if (unacknowledgedUpdates.empty()) {
+  terminate(self());
+} else {
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // a status update for some reason.
+  const Duration duration = Seconds(60);
 
-LOG(INFO) << "Terminating after " << duration;
+  LOG(INFO) << "Terminating after " << duration;
 
-// TODO(qianzhang): Remove this hack since the executor now receives
-// acknowledgements for status updates. The executor can terminate
-// after it receives an ACK for a terminal status update.
-os::sleep(duration);
+  delay(duration, self(), ::__shutdown);
+}
+  }
+
+  void __shutdown()
+  {
 terminate(self());
   }
 



[mesos] 01/02: Changed termination logic of the Docker executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 1bd0b37a7e522d63319db426dae7068b901eaea6
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 13:35:02 2020 +0100

Changed termination logic of the Docker executor.

Previously, the Docker executor terminated itself after a task's
container had terminated. This could lead to termination of the
executor before processing of a terminal status update by the agent.
In order to mitigate this issue, the executor slept for one second to
give a chance to send all status updates and receive all status update
acknowledgments before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the Docker executor after receiving a terminal
status update acknowledgment. Also, this patch increases the timeout
from one second to one minute for fail-safety.

Review: https://reviews.apache.org/r/72055
---
 src/docker/executor.cpp | 14 ++
 src/exec/exec.cpp   | 19 +++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index f638e4b..ed1b718 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -224,7 +224,9 @@ public:
 
   driver->sendStatusUpdate(status);
 
-  _stop();
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // the terminal update for some reason.
+  delay(Seconds(60), self(), ::_stop);
   return;
 }
 
@@ -757,17 +759,13 @@ private:
 CHECK_SOME(driver);
 driver.get()->sendStatusUpdate(taskStatus);
 
-_stop();
+// This is a fail safe in case the agent doesn't send an ACK for
+// the terminal update for some reason.
+delay(Seconds(60), self(), ::_stop);
   }
 
   void _stop()
   {
-// A hack for now ... but we need to wait until the status update
-// is sent to the slave before we shut ourselves down.
-// TODO(tnachen): Remove this hack and also the same hack in the
-// command executor when we have the new HTTP APIs to wait until
-// an ack.
-os::sleep(Seconds(1));
 driver.get()->stop();
   }
 
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index c0fa3b6..dee2074 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -395,10 +395,29 @@ protected:
   return;
 }
 
+if (!updates.contains(uuid_.get())) {
+  LOG(WARNING) << "Ignoring unknown status update acknowledgement "
+   << uuid_.get() << " for task " << taskId
+   << " of framework " << frameworkId;
+  return;
+}
+
 VLOG(1) << "Executor received status update acknowledgement "
 << uuid_.get() << " for task " << taskId
 << " of framework " << frameworkId;
 
+// If this is a terminal status update acknowledgment for the Docker
+// executor, stop the driver to terminate the executor.
+//
+// TODO(abudnik): This is a workaround for MESOS-9847. A better solution
+// is to update supported API for the Docker executor from V0 to V1. It
+// will allow the executor to handle status update acknowledgments itself.
+if (mesos::internal::protobuf::isTerminalState(
+updates[uuid_.get()].status().state()) &&
+dynamic_cast(executor)) {
+  driver->stop();
+}
+
 // Remove the corresponding update.
 updates.erase(uuid_.get());
 



[mesos] branch 1.8.x updated (a66419b -> a2ca451)

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from a66419b  Remembered whether an executor was agent-generated.
 new 1bd0b37  Changed termination logic of the Docker executor.
 new a2ca451  Changed termination logic of the default executor.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/docker/executor.cpp   | 14 ++
 src/exec/exec.cpp | 19 +++
 src/launcher/default_executor.cpp | 25 +++--
 3 files changed, 44 insertions(+), 14 deletions(-)



[mesos] branch 1.9.x updated (298d1d7 -> f37ae68)

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.9.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 298d1d7  Remembered whether an executor was agent-generated.
 new 3d60cba  Changed termination logic of the Docker executor.
 new f37ae68  Changed termination logic of the default executor.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/docker/executor.cpp   | 14 ++
 src/exec/exec.cpp | 19 +++
 src/launcher/default_executor.cpp | 25 +++--
 3 files changed, 44 insertions(+), 14 deletions(-)



[mesos] 01/02: Changed termination logic of the Docker executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.9.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 3d60cba39d0377a7dc19b4c47f3bb0807418fe50
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 13:35:02 2020 +0100

Changed termination logic of the Docker executor.

Previously, the Docker executor terminated itself after a task's
container had terminated. This could lead to termination of the
executor before processing of a terminal status update by the agent.
In order to mitigate this issue, the executor slept for one second to
give a chance to send all status updates and receive all status update
acknowledgments before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the Docker executor after receiving a terminal
status update acknowledgment. Also, this patch increases the timeout
from one second to one minute for fail-safety.

Review: https://reviews.apache.org/r/72055
---
 src/docker/executor.cpp | 14 ++
 src/exec/exec.cpp   | 19 +++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 132f42b..ebbbc0d 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -224,7 +224,9 @@ public:
 
   driver->sendStatusUpdate(status);
 
-  _stop();
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // the terminal update for some reason.
+  delay(Seconds(60), self(), ::_stop);
   return;
 }
 
@@ -773,17 +775,13 @@ private:
 CHECK_SOME(driver);
 driver.get()->sendStatusUpdate(taskStatus);
 
-_stop();
+// This is a fail safe in case the agent doesn't send an ACK for
+// the terminal update for some reason.
+delay(Seconds(60), self(), ::_stop);
   }
 
   void _stop()
   {
-// A hack for now ... but we need to wait until the status update
-// is sent to the slave before we shut ourselves down.
-// TODO(tnachen): Remove this hack and also the same hack in the
-// command executor when we have the new HTTP APIs to wait until
-// an ack.
-os::sleep(Seconds(1));
 driver.get()->stop();
   }
 
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index 67e082e..8ed77f2 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -409,10 +409,29 @@ protected:
   return;
 }
 
+if (!updates.contains(uuid_.get())) {
+  LOG(WARNING) << "Ignoring unknown status update acknowledgement "
+   << uuid_.get() << " for task " << taskId
+   << " of framework " << frameworkId;
+  return;
+}
+
 VLOG(1) << "Executor received status update acknowledgement "
 << uuid_.get() << " for task " << taskId
 << " of framework " << frameworkId;
 
+// If this is a terminal status update acknowledgment for the Docker
+// executor, stop the driver to terminate the executor.
+//
+// TODO(abudnik): This is a workaround for MESOS-9847. A better solution
+// is to update supported API for the Docker executor from V0 to V1. It
+// will allow the executor to handle status update acknowledgments itself.
+if (mesos::internal::protobuf::isTerminalState(
+updates[uuid_.get()].status().state()) &&
+dynamic_cast(executor)) {
+  driver->stop();
+}
+
 // Remove the corresponding update.
 updates.erase(uuid_.get());
 



[mesos] branch master updated (3d197d8 -> 683dfc1)

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 3d197d8  Fixed the build on Windows.
 new 457c389  Changed termination logic of the Docker executor.
 new 683dfc1  Changed termination logic of the default executor.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/docker/executor.cpp   | 14 ++
 src/exec/exec.cpp | 19 +++
 src/launcher/default_executor.cpp | 25 +++--
 3 files changed, 44 insertions(+), 14 deletions(-)



[mesos] 02/02: Changed termination logic of the default executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 683dfc1ffb0b1ca758a07d19ab3badd8cac62dc7
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 19:07:50 2020 +0100

Changed termination logic of the default executor.

Previously, the default executor terminated itself after all containers
had terminated. This could lead to termination of the executor before
processing of a terminal status update by the agent. In order
to mitigate this issue, the executor slept for one second to give a
chance to send all status updates and receive all status update
acknowledgements before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the default executor if all status updates have
been acknowledged by the agent and no running containers left.
Also, this patch increases the timeout from one second to one minute
for fail-safety.

Review: https://reviews.apache.org/r/72029
---
 src/launcher/default_executor.cpp | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/launcher/default_executor.cpp 
b/src/launcher/default_executor.cpp
index 7997003..b431d8f 100644
--- a/src/launcher/default_executor.cpp
+++ b/src/launcher/default_executor.cpp
@@ -279,6 +279,12 @@ public:
   containers.at(taskId)->acknowledged = true;
 }
 
+// Terminate the executor if all status updates have been acknowledged
+// by the agent and no running containers left.
+if (containers.empty() && unacknowledgedUpdates.empty()) {
+  terminate(self());
+}
+
 break;
   }
 
@@ -1088,14 +1094,21 @@ protected:
 
   void _shutdown()
   {
-const Duration duration = Seconds(1);
+if (unacknowledgedUpdates.empty()) {
+  terminate(self());
+} else {
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // a status update for some reason.
+  const Duration duration = Seconds(60);
 
-LOG(INFO) << "Terminating after " << duration;
+  LOG(INFO) << "Terminating after " << duration;
 
-// TODO(qianzhang): Remove this hack since the executor now receives
-// acknowledgements for status updates. The executor can terminate
-// after it receives an ACK for a terminal status update.
-os::sleep(duration);
+  delay(duration, self(), ::__shutdown);
+}
+  }
+
+  void __shutdown()
+  {
 terminate(self());
   }
 



[mesos] 01/02: Changed termination logic of the Docker executor.

2020-02-03 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 457c38967bf9a53c1c5cd2743385937a26f413f6
Author: Andrei Budnik 
AuthorDate: Wed Jan 29 13:35:02 2020 +0100

Changed termination logic of the Docker executor.

Previously, the Docker executor terminated itself after a task's
container had terminated. This could lead to termination of the
executor before processing of a terminal status update by the agent.
In order to mitigate this issue, the executor slept for one second to
give a chance to send all status updates and receive all status update
acknowledgments before terminating itself. This might have led to
various race conditions in some circumstances (e.g., on a slow host).
This patch terminates the Docker executor after receiving a terminal
status update acknowledgment. Also, this patch increases the timeout
from one second to one minute for fail-safety.

Review: https://reviews.apache.org/r/72055
---
 src/docker/executor.cpp | 14 ++
 src/exec/exec.cpp   | 19 +++
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 132f42b..ebbbc0d 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -224,7 +224,9 @@ public:
 
   driver->sendStatusUpdate(status);
 
-  _stop();
+  // This is a fail safe in case the agent doesn't send an ACK for
+  // the terminal update for some reason.
+  delay(Seconds(60), self(), ::_stop);
   return;
 }
 
@@ -773,17 +775,13 @@ private:
 CHECK_SOME(driver);
 driver.get()->sendStatusUpdate(taskStatus);
 
-_stop();
+// This is a fail safe in case the agent doesn't send an ACK for
+// the terminal update for some reason.
+delay(Seconds(60), self(), ::_stop);
   }
 
   void _stop()
   {
-// A hack for now ... but we need to wait until the status update
-// is sent to the slave before we shut ourselves down.
-// TODO(tnachen): Remove this hack and also the same hack in the
-// command executor when we have the new HTTP APIs to wait until
-// an ack.
-os::sleep(Seconds(1));
 driver.get()->stop();
   }
 
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index 69e5e24..0fba5e0 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -409,10 +409,29 @@ protected:
   return;
 }
 
+if (!updates.contains(uuid_.get())) {
+  LOG(WARNING) << "Ignoring unknown status update acknowledgement "
+   << uuid_.get() << " for task " << taskId
+   << " of framework " << frameworkId;
+  return;
+}
+
 VLOG(1) << "Executor received status update acknowledgement "
 << uuid_.get() << " for task " << taskId
 << " of framework " << frameworkId;
 
+// If this is a terminal status update acknowledgment for the Docker
+// executor, stop the driver to terminate the executor.
+//
+// TODO(abudnik): This is a workaround for MESOS-9847. A better solution
+// is to update supported API for the Docker executor from V0 to V1. It
+// will allow the executor to handle status update acknowledgments itself.
+if (mesos::internal::protobuf::isTerminalState(
+updates[uuid_.get()].status().state()) &&
+dynamic_cast(executor)) {
+  driver->stop();
+}
+
 // Remove the corresponding update.
 updates.erase(uuid_.get());
 



[mesos] branch master updated: Fixed broken links to Linux scheduler documentation.

2019-09-25 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new afd86db  Fixed broken links to Linux scheduler documentation.
afd86db is described below

commit afd86db2ef651b33188586a5cafe3e532439534d
Author: Andrei Budnik 
AuthorDate: Wed Sep 25 13:55:15 2019 +0200

Fixed broken links to Linux scheduler documentation.
---
 docs/isolators/cgroups-cpu.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/isolators/cgroups-cpu.md b/docs/isolators/cgroups-cpu.md
index 5392132..6c08b1d 100644
--- a/docs/isolators/cgroups-cpu.md
+++ b/docs/isolators/cgroups-cpu.md
@@ -17,8 +17,8 @@ and assigns `cpu` and `cpuacct` cgroups to each container 
launched by
 Mesos Containerizer.
 
 Cgroups `cpu` subsystem provides 2 mechanisms of limiting the amount
-of CPU time used by cgroups: [CFS 
shares](https://github.com/torvalds/linux/blob/master/Documentation/scheduler/sched-design-CFS.txt)
-and [CFS 
bandwidth](https://github.com/torvalds/linux/blob/master/Documentation/scheduler/sched-bwc.txt)
+of CPU time used by cgroups: [CFS 
shares](https://github.com/torvalds/linux/blob/master/Documentation/scheduler/sched-design-CFS.rst)
+and [CFS 
bandwidth](https://github.com/torvalds/linux/blob/master/Documentation/scheduler/sched-bwc.rst)
 control. The first one can guarantee some minimum number of CPU
 "shares" to a cgroup when the system is under heavy load. It, however,
 does not limit the amount of CPU time available to a cgroup when the



[mesos] 01/02: Added `futureTracker` to the `SlaveOptions` in tests.

2019-09-24 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit dee4b849c8179ea46947c8ea4dd031f6eb37b659
Author: Andrei Budnik 
AuthorDate: Fri Sep 6 17:01:56 2019 +0200

Added `futureTracker` to the `SlaveOptions` in tests.

`PendingFutureTracker` is shared across both Mesos containerizer and
the agent, so we need to add an option to be able to start a slave in
tests with an instance of the `futureTrack` as a parameter.

Review: https://reviews.apache.org/r/71454
---
 src/tests/cluster.cpp | 20 +---
 src/tests/cluster.hpp |  5 +
 src/tests/mesos.cpp   |  1 +
 src/tests/mesos.hpp   |  8 
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/src/tests/cluster.cpp b/src/tests/cluster.cpp
index 1646516..f7bc882 100644
--- a/src/tests/cluster.cpp
+++ b/src/tests/cluster.cpp
@@ -416,6 +416,7 @@ Try> Slave::create(
 const Option& qosController,
 const Option& secretGenerator,
 const Option& providedAuthorizer,
+const Option& futureTracker,
 bool mock)
 {
   process::Owned slave(new Slave());
@@ -447,10 +448,15 @@ Try> Slave::create(
   }
 #endif // __WINDOWS__
 
-  Try futureTracker = PendingFutureTracker::create();
-  if (futureTracker.isError()) {
-return Error(
-"Failed to create pending future tracker: " + futureTracker.error());
+  // If the future tracker is not provided, create a default one.
+  if (futureTracker.isNone()) {
+Try _futureTracker = PendingFutureTracker::create();
+if (_futureTracker.isError()) {
+  return Error(
+  "Failed to create pending future tracker: " + 
_futureTracker.error());
+}
+
+slave->futureTracker.reset(_futureTracker.get());
   }
 
   // If the containerizer is not provided, create a default one.
@@ -468,7 +474,7 @@ Try> Slave::create(
   gc.getOrElse(slave->gc.get()),
   nullptr,
   volumeGidManager,
-  futureTracker.get());
+  futureTracker.getOrElse(slave->futureTracker.get()));
 
 if (_containerizer.isError()) {
   return Error("Failed to create containerizer: " + 
_containerizer.error());
@@ -621,7 +627,7 @@ Try> Slave::create(
 qosController.getOrElse(slave->qosController.get()),
 secretGenerator.getOrElse(slave->secretGenerator.get()),
 volumeGidManager,
-futureTracker.get(),
+futureTracker.getOrElse(slave->futureTracker.get()),
 authorizer));
   } else {
 slave->slave.reset(new slave::Slave(
@@ -636,7 +642,7 @@ Try> Slave::create(
 qosController.getOrElse(slave->qosController.get()),
 secretGenerator.getOrElse(slave->secretGenerator.get()),
 volumeGidManager,
-futureTracker.get(),
+futureTracker.getOrElse(slave->futureTracker.get()),
 authorizer));
   }
 
diff --git a/src/tests/cluster.hpp b/src/tests/cluster.hpp
index c04ee14..415a60f 100644
--- a/src/tests/cluster.hpp
+++ b/src/tests/cluster.hpp
@@ -170,6 +170,7 @@ public:
   const Option& qosController = None(),
   const Option& secretGenerator = None(),
   const Option& authorizer = None(),
+  const Option& futureTracker = None(),
   bool mock = false);
 
   ~Slave();
@@ -227,6 +228,10 @@ private:
   // of who created it).
   slave::Containerizer* containerizer = nullptr;
 
+  // Pending future tracker must be destroyed last since there may be
+  // pending requests related to the dependant objects declared below.
+  process::Owned futureTracker;
+
   // Dependencies that are created by the factory method.
   process::Owned authorizer;
   process::Owned ownedContainerizer;
diff --git a/src/tests/mesos.cpp b/src/tests/mesos.cpp
index e77db22..664c302 100644
--- a/src/tests/mesos.cpp
+++ b/src/tests/mesos.cpp
@@ -347,6 +347,7 @@ Try> MesosTest::StartSlave(const 
SlaveOptions& options)
   options.qosController,
   options.secretGenerator,
   options.authorizer,
+  options.futureTracker,
   options.mock);
 
   if (slave.isSome() && !options.mock) {
diff --git a/src/tests/mesos.hpp b/src/tests/mesos.hpp
index ecde518..73b1866 100644
--- a/src/tests/mesos.hpp
+++ b/src/tests/mesos.hpp
@@ -186,6 +186,13 @@ struct SlaveOptions
 return *this;
   }
 
+  SlaveOptions& withFutureTracker(
+  const Option& futureTracker)
+  {
+this->futureTracker = futureTracker;
+return *this;
+  }
+
   mesos::master::detector::MasterDetector* detector;
   bool mock;
   Option flags;
@@ -197,6 +204,7 @@ struct SlaveOptions
   Option qosController;
   Option secretGenerator;
   Option authorizer;
+  Option futureTracker;
 };
 
 



[mesos] branch master updated (ce801bd -> 1122674)

2019-09-24 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from ce801bd  Fixed documentation on master fault domains.
 new dee4b84  Added `futureTracker` to the `SlaveOptions` in tests.
 new 1122674  Implemented an integration test for /containerizer/debug 
endpoint.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/tests/cluster.cpp |  20 --
 src/tests/cluster.hpp |   5 ++
 src/tests/mesos.cpp   |   1 +
 src/tests/mesos.hpp   |   8 +++
 src/tests/slave_tests.cpp | 158 ++
 5 files changed, 185 insertions(+), 7 deletions(-)



[mesos] 02/02: Implemented an integration test for /containerizer/debug endpoint.

2019-09-24 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 1122674a5c03894e4552d46cfa26dca0557a8f68
Author: Andrei Budnik 
AuthorDate: Fri Sep 6 13:25:35 2019 +0200

Implemented an integration test for /containerizer/debug endpoint.

This test starts an agent with the MockIsolator to intercept calls to
its `prepare` method, then it launches a task, which gets stuck.
We check that the /containerizer/debug endpoint returns a non-empty
list of pending futures including `MockIsolator::prepare`. After
setting the promise for the `prepare`, the task successfully starts
and we expect for the /containerizer/debug endpoint to return an
empty list of pending operations.

Review: https://reviews.apache.org/r/71455
---
 src/tests/slave_tests.cpp | 158 ++
 1 file changed, 158 insertions(+)

diff --git a/src/tests/slave_tests.cpp b/src/tests/slave_tests.cpp
index c147bfc..fd4fd6b 100644
--- a/src/tests/slave_tests.cpp
+++ b/src/tests/slave_tests.cpp
@@ -63,6 +63,7 @@
 #endif // USE_SSL_SOCKET
 
 #include "common/build.hpp"
+#include "common/future_tracker.hpp"
 #include "common/http.hpp"
 #include "common/protobuf_utils.hpp"
 
@@ -83,6 +84,8 @@
 #include "slave/containerizer/fetcher_process.hpp"
 
 #include "slave/containerizer/mesos/containerizer.hpp"
+#include "slave/containerizer/mesos/isolator_tracker.hpp"
+#include "slave/containerizer/mesos/launcher.hpp"
 
 #include "tests/active_user_test_helper.hpp"
 #include "tests/containerizer.hpp"
@@ -94,6 +97,7 @@
 #include "tests/resources_utils.hpp"
 #include "tests/utils.hpp"
 
+#include "tests/containerizer/isolator.hpp"
 #include "tests/containerizer/mock_containerizer.hpp"
 
 using namespace mesos::internal::slave;
@@ -112,7 +116,9 @@ using mesos::master::detector::StandaloneMasterDetector;
 using mesos::v1::resource_provider::Event;
 
 using mesos::slave::ContainerConfig;
+using mesos::slave::ContainerLaunchInfo;
 using mesos::slave::ContainerTermination;
+using mesos::slave::Isolator;
 
 using mesos::v1::scheduler::Call;
 using mesos::v1::scheduler::Mesos;
@@ -2787,6 +2793,158 @@ TEST_F(SlaveTest, ContainersEndpoint)
 }
 
 
+// This test ensures that `/containerizer/debug` endpoint returns a non-empty
+// list of pending futures when an isolator becomes unresponsive during
+// container launch.
+TEST_F(SlaveTest, ROOT_ContainerizerDebugEndpoint)
+{
+  Try> master = StartMaster();
+  ASSERT_SOME(master);
+
+  slave::Flags flags = CreateSlaveFlags();
+
+  Try _launcher = SubprocessLauncher::create(flags);
+  ASSERT_SOME(_launcher);
+
+  Owned launcher(_launcher.get());
+
+  MockIsolator* mockIsolator = new MockIsolator();
+
+  Future prepare;
+  Promise> promise;
+
+  EXPECT_CALL(*mockIsolator, recover(_, _))
+.WillOnce(Return(Nothing()));
+
+  // Simulate a long prepare from the isolator.
+  EXPECT_CALL(*mockIsolator, prepare(_, _))
+.WillOnce(DoAll(FutureSatisfy(),
+Return(promise.future(;
+
+  EXPECT_CALL(*mockIsolator, update(_, _))
+.WillOnce(Return(Nothing()));
+
+  // Wrap `mockIsolator` in `PendingFutureTracker`.
+  Try _futureTracker = PendingFutureTracker::create();
+  ASSERT_SOME(_futureTracker);
+
+  Owned futureTracker(_futureTracker.get());
+
+  Owned isolator = Owned(new IsolatorTracker(
+  Owned(mockIsolator), "MockIsolator", futureTracker.get()));
+
+  Fetcher fetcher(flags);
+
+  Try> provisioner = Provisioner::create(flags);
+  ASSERT_SOME(provisioner);
+
+  Try create = MesosContainerizer::create(
+  flags,
+  true,
+  ,
+  nullptr,
+  launcher,
+  provisioner->share(),
+  {isolator});
+
+  ASSERT_SOME(create);
+
+  Owned containerizer(create.get());
+
+  Owned detector = master.get()->createDetector();
+
+  Try> slave =
+StartSlave(SlaveOptions(detector.get())
+ .withContainerizer(containerizer.get())
+ .withFutureTracker(futureTracker.get()));
+
+  ASSERT_SOME(slave);
+
+  MockScheduler sched;
+  MesosSchedulerDriver driver(
+  , DEFAULT_FRAMEWORK_INFO, master.get()->pid, DEFAULT_CREDENTIAL);
+
+  EXPECT_CALL(sched, registered(, _, _));
+
+  Future> offers;
+  EXPECT_CALL(sched, resourceOffers(, _))
+.WillOnce(FutureArg<1>())
+.WillRepeatedly(Return());// Ignore subsequent offers.
+
+  driver.start();
+
+  AWAIT_READY(offers);
+  ASSERT_FALSE(offers->empty());
+
+  const Offer& offer = offers.get()[0];
+
+  // Launch a task and wait until it is in RUNNING status.
+  TaskInfo task = createTask(
+  offer.slave_id(),
+  Resources::parse("cpus:1;mem:32").get(),
+  SLEEP_COMMAND(1000));
+
+  Future statusStarting;
+

[mesos] branch master updated: Added `SlaveOptions` for wrapping all parameters of `StartSlave`.

2019-09-06 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 6c2a94c  Added `SlaveOptions` for wrapping all parameters of 
`StartSlave`.
6c2a94c is described below

commit 6c2a94ca0eca90e6d3517e4400f4529ddce0b84c
Author: Andrei Budnik 
AuthorDate: Mon Sep 2 17:15:52 2019 +0200

Added `SlaveOptions` for wrapping all parameters of `StartSlave`.

This patch introduces a `SlaveOptions` struct which holds optional
parameters accepted by `cluster::Slave::create`. Added an overload
of `StartSlave` that accepts `SlaveOptions`. It's a preferred way of
creating and starting an instance of `cluster::Slave` in tests, since
underlying `cluster::Slave::create` accepts a long list of optional
arguments, which might be extended in the future.

Review: https://reviews.apache.org/r/71424
---
 src/tests/mesos.cpp | 281 +---
 src/tests/mesos.hpp | 104 +++
 2 files changed, 153 insertions(+), 232 deletions(-)

diff --git a/src/tests/mesos.cpp b/src/tests/mesos.cpp
index 0396ce7..e77db22 100644
--- a/src/tests/mesos.cpp
+++ b/src/tests/mesos.cpp
@@ -334,25 +334,22 @@ Try> MesosTest::StartMaster(
 }
 
 
-Try> MesosTest::StartSlave(
-MasterDetector* detector,
-const Option& flags,
-bool mock)
+Try> MesosTest::StartSlave(const SlaveOptions& options)
 {
   Try> slave = cluster::Slave::create(
-  detector,
-  flags.isNone() ? CreateSlaveFlags() : flags.get(),
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  mock);
-
-  if (slave.isSome() && !mock) {
+  options.detector,
+  options.flags.isNone() ? CreateSlaveFlags() : options.flags.get(),
+  options.id,
+  options.containerizer,
+  options.gc,
+  options.taskStatusUpdateManager,
+  options.resourceEstimator,
+  options.qosController,
+  options.secretGenerator,
+  options.authorizer,
+  options.mock);
+
+  if (slave.isSome() && !options.mock) {
 slave.get()->start();
   }
 
@@ -362,28 +359,23 @@ Try> MesosTest::StartSlave(
 
 Try> MesosTest::StartSlave(
 MasterDetector* detector,
-slave::Containerizer* containerizer,
 const Option& flags,
 bool mock)
 {
-  Try> slave = cluster::Slave::create(
-  detector,
-  flags.isNone() ? CreateSlaveFlags() : flags.get(),
-  None(),
-  containerizer,
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  mock);
+  return StartSlave(SlaveOptions(detector, mock)
+.withFlags(flags));
+}
 
-  if (slave.isSome() && !mock) {
-slave.get()->start();
-  }
 
-  return slave;
+Try> MesosTest::StartSlave(
+MasterDetector* detector,
+slave::Containerizer* containerizer,
+const Option& flags,
+bool mock)
+{
+  return StartSlave(SlaveOptions(detector, mock)
+.withFlags(flags)
+.withContainerizer(containerizer));
 }
 
 
@@ -393,24 +385,9 @@ Try> MesosTest::StartSlave(
 const Option& flags,
 bool mock)
 {
-  Try> slave = cluster::Slave::create(
-  detector,
-  flags.isNone() ? CreateSlaveFlags() : flags.get(),
-  id,
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  mock);
-
-  if (slave.isSome() && !mock) {
-slave.get()->start();
-  }
-
-  return slave;
+  return StartSlave(SlaveOptions(detector, mock)
+.withFlags(flags)
+.withId(id));
 }
 
 
@@ -420,17 +397,10 @@ Try> MesosTest::StartSlave(
 const string& id,
 const Option& flags)
 {
-  Try> slave = cluster::Slave::create(
-  detector,
-  flags.isNone() ? CreateSlaveFlags() : flags.get(),
-  id,
-  containerizer);
-
-  if (slave.isSome()) {
-slave.get()->start();
-  }
-
-  return slave;
+  return StartSlave(SlaveOptions(detector)
+.withFlags(flags)
+.withId(id)
+.withContainerizer(containerizer));
 }
 
 
@@ -440,24 +410,9 @@ Try> MesosTest::StartSlave(
 const Option& flags,
 bool mock)
 {
-  Try> slave = cluster::Slave::create(
-  detector,
-  flags.isNone() ? CreateSlaveFlags() : flags.get(),
-  None(),
-  None(),
-  gc,
-  None(),
-  None(),
-  None(),
-  None(),
-  None(),
-  mock);
-
-  if (slave.isSome() && !mock) {
-slave.get()->start();
-  }
-
-  return slave;
+  return StartSlave(SlaveOptions(detector, mock)
+.withFlags(flags)
+.withGc(gc));
 }
 
 
@@ -466,20 +421,9 @@ Try> MesosTest::StartSlave(
 mesos::slave::ResourceEstimator* resourceEstimator,
 const Option& flags)
 {
-  Try> slave = cluster::Slave::

[mesos] branch 1.9.x updated: Added /containerizer/debug endpoint to 1.9.0 CHANGELOG.

2019-08-28 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.9.x
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/1.9.x by this push:
 new bfcd446  Added /containerizer/debug endpoint to 1.9.0 CHANGELOG.
bfcd446 is described below

commit bfcd4461aaf51444398783a511f973a3a1d9a101
Author: Andrei Budnik 
AuthorDate: Wed Aug 28 12:10:34 2019 +0200

Added /containerizer/debug endpoint to 1.9.0 CHANGELOG.

Review: https://reviews.apache.org/r/71387
---
 CHANGELOG | 5 +
 1 file changed, 5 insertions(+)

diff --git a/CHANGELOG b/CHANGELOG
index aaad58f..307f116 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -36,6 +36,11 @@ This release contains the following highlights:
 * The Mesos containerizer now includes ephemeral overlayfs storage in the
   task disk quota as well as sandbox storage. (MESOS-9900)
 
+* A new `/containerizer/debug` HTTP endpoint has been added. This endpoint
+  exposes debug information for the Mesos containerizer. At the moment, it
+  returns a list of pending operations related to Isolators and Launchers.
+  (MESOS-9756)
+
 Additional API Changes:
 
   * Mesos components will now forego TLS certificate validation for incoming



[mesos] branch master updated: Added /containerizer/debug endpoint to 1.9.0 CHANGELOG.

2019-08-28 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 279af61  Added /containerizer/debug endpoint to 1.9.0 CHANGELOG.
279af61 is described below

commit 279af615b3413f2ff42f96c180189f37d05d1f5b
Author: Andrei Budnik 
AuthorDate: Wed Aug 28 12:10:34 2019 +0200

Added /containerizer/debug endpoint to 1.9.0 CHANGELOG.

Review: https://reviews.apache.org/r/71387
---
 CHANGELOG | 5 +
 1 file changed, 5 insertions(+)

diff --git a/CHANGELOG b/CHANGELOG
index aaad58f..307f116 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -36,6 +36,11 @@ This release contains the following highlights:
 * The Mesos containerizer now includes ephemeral overlayfs storage in the
   task disk quota as well as sandbox storage. (MESOS-9900)
 
+* A new `/containerizer/debug` HTTP endpoint has been added. This endpoint
+  exposes debug information for the Mesos containerizer. At the moment, it
+  returns a list of pending operations related to Isolators and Launchers.
+  (MESOS-9756)
+
 Additional API Changes:
 
   * Mesos components will now forego TLS certificate validation for incoming



[mesos] 01/03: Added MESOS-9887 to the 1.8.2 CHANGELOG.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 967f105cea4bc31780bfe76bd2d62ad71ffae221
Author: Andrei Budnik 
AuthorDate: Mon Aug 26 15:02:40 2019 +0200

Added MESOS-9887 to the 1.8.2 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index fe08b76..a215e5c 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -49,6 +49,7 @@ Release Notes - Mesos - Version 1.8.2 (WIP)
   * [MESOS-9785] - Frameworks recovered from reregistered agents are not 
reported to master `/api/v1` subscribers.
   * [MESOS-9836] - Docker containerizer overwrites `/mesos/slave` cgroups.
   * [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
+  * [MESOS-9887] - Race condition between two terminal task status updates for 
Docker/Command executor.
   * [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret 
from runtime directory when the container is destroyed.
   * [MESOS-9925] - Default executor takes a couple of seconds to start and 
subscribe Mesos agent.
 



[mesos] 03/03: Added MESOS-9887 to the 1.6.3 CHANGELOG.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 24e989e66507932809c7d852a4a62720de7cb27b
Author: Andrei Budnik 
AuthorDate: Mon Aug 26 14:44:54 2019 +0200

Added MESOS-9887 to the 1.6.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 3cd7661..58cf418 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -986,6 +986,7 @@ Release Notes - Mesos - Version 1.6.3 (WIP)
   * [MESOS-9856] - REVIVE call with specified role(s) clears filters for all 
roles of a framework.
   * [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
   * [MESOS-9870] - Simultaneous adding/removal of a role from framework's 
roles and its suppressed roles crashes the master.
+  * [MESOS-9887] - Race condition between two terminal task status updates for 
Docker/Command executor.
   * [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret 
from runtime directory when the container is destroyed.
 
 ** Improvement



[mesos] branch master updated (f0be237 -> 24e989e)

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from f0be237  Fixed out-of-order processing of terminal status updates in 
agent.
 new 967f105  Added MESOS-9887 to the 1.8.2 CHANGELOG.
 new 6b2d101  Added MESOS-9887 to the 1.7.3 CHANGELOG.
 new 24e989e  Added MESOS-9887 to the 1.6.3 CHANGELOG.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG | 3 +++
 1 file changed, 3 insertions(+)



[mesos] 02/03: Added MESOS-9887 to the 1.7.3 CHANGELOG.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 6b2d101770ae8853d021b8cc5d0f5ae587302a54
Author: Andrei Budnik 
AuthorDate: Mon Aug 26 14:58:45 2019 +0200

Added MESOS-9887 to the 1.7.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index a215e5c..3cd7661 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -504,6 +504,7 @@ Release Notes - Mesos - Version 1.7.3 (WIP)
   * [MESOS-9856] - REVIVE call with specified role(s) clears filters for all 
roles of a framework.
   * [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
   * [MESOS-9870] - Simultaneous adding/removal of a role from framework's 
roles and its suppressed roles crashes the master.
+  * [MESOS-9887] - Race condition between two terminal task status updates for 
Docker/Command executor.
   * [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret 
from runtime directory when the container is destroyed.
   * [MESOS-9925] - Default executor takes a couple of seconds to start and 
subscribe Mesos agent.
 



[mesos] 02/03: Fixed out-of-order processing of terminal status updates in agent.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 4bbb0376cd584a4160a2c5c2f0ac4f3ecaa5e622
Author: Andrei Budnik 
AuthorDate: Tue Aug 20 19:24:44 2019 +0200

Fixed out-of-order processing of terminal status updates in agent.

Previously, Mesos agent could send TASK_FAILED status update on
executor termination while processing of TASK_FINISHED status update
was in progress. Processing of task status updates involves sending
requests to the containerizer, which might finish processing of these
requests out-of-order, e.g. `MesosContainerizer::status`. Also,
the agent does not overwrite status of the terminal status update once
it's stored in the `terminatedTasks`. Hence, there was a race condition
between two terminal status updates.

Note that V1 Executors are not affected by this problem because they
wait for an acknowledgement of the terminal status update by the agent
before terminating.

This patch introduces a new data structure `pendingStatusUpdates`,
which holds a list of status updates that are being processed. This
data structure allows validating the order of processing of status
updates by the agent.

Review: https://reviews.apache.org/r/71343
---
 src/slave/slave.cpp | 62 ++---
 src/slave/slave.hpp |  6 ++
 2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 50a7d68..8d8cef3 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5727,6 +5727,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
   metrics.valid_status_updates++;
 
+  executor->addPendingTaskStatus(status);
+
   // Before sending update, we need to retrieve the container status
   // if the task reached the executor. For tasks that are queued, we
   // do not need to send the container status and we must
@@ -5938,6 +5940,17 @@ void Slave::___statusUpdate(
   VLOG(1) << "Task status update manager successfully handled status update "
   << update;
 
+  const TaskStatus& status = update.status();
+
+  Executor* executor = nullptr;
+  Framework* framework = getFramework(update.framework_id());
+  if (framework != nullptr) {
+executor = framework->getExecutor(status.task_id());
+if (executor != nullptr) {
+  executor->removePendingTaskStatus(status);
+}
+  }
+
   if (pid == UPID()) {
 return;
   }
@@ -5945,7 +5958,7 @@ void Slave::___statusUpdate(
   StatusUpdateAcknowledgementMessage message;
   message.mutable_framework_id()->MergeFrom(update.framework_id());
   message.mutable_slave_id()->MergeFrom(update.slave_id());
-  message.mutable_task_id()->MergeFrom(update.status().task_id());
+  message.mutable_task_id()->MergeFrom(status.task_id());
   message.set_uuid(update.uuid());
 
   // Task status update manager successfully handled the status update.
@@ -5957,14 +5970,12 @@ void Slave::___statusUpdate(
 send(pid.get(), message);
   } else {
 // Acknowledge the HTTP based executor.
-Framework* framework = getFramework(update.framework_id());
 if (framework == nullptr) {
   LOG(WARNING) << "Ignoring sending acknowledgement for status update "
<< update << " of unknown framework";
   return;
 }
 
-Executor* executor = framework->getExecutor(update.status().task_id());
 if (executor == nullptr) {
   // Refer to the comments in 'statusUpdate()' on when this can
   // happen.
@@ -10520,6 +10531,33 @@ void Executor::recoverTask(const TaskState& state, 
bool recheckpointTask)
 }
 
 
+void Executor::addPendingTaskStatus(const TaskStatus& status)
+{
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+  pendingStatusUpdates[status.task_id()][uuid] = status;
+}
+
+
+void Executor::removePendingTaskStatus(const TaskStatus& status)
+{
+  const TaskID& taskId = status.task_id();
+
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+
+  if (!pendingStatusUpdates.contains(taskId) ||
+  !pendingStatusUpdates[taskId].contains(uuid)) {
+LOG(WARNING) << "Unknown pending status update (uuid: " << uuid << ")";
+return;
+  }
+
+  pendingStatusUpdates[taskId].erase(uuid);
+
+  if (pendingStatusUpdates[taskId].empty()) {
+pendingStatusUpdates.erase(taskId);
+  }
+}
+
+
 Try Executor::updateTaskState(const TaskStatus& status)
 {
   bool terminal = protobuf::isTerminalState(status.state());
@@ -10543,6 +10581,24 @@ Try Executor::updateTaskState(const 
TaskStatus& status)
 task = launchedTasks.at(status.task_id());
 
 if (terminal) {
+  if (pendingStatusUpdates.contains(status.task_id())) {
+auto statusUpdates = pending

[mesos] 01/03: Added missing `return` statement in `Slave::statusUpdate`.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 14abb82925cdbce746238bc20dc7b8c279a96a67
Author: Andrei Budnik 
AuthorDate: Fri Aug 23 14:36:18 2019 +0200

Added missing `return` statement in `Slave::statusUpdate`.

Previously, if `statusUpdate` was called for a pending task, it would
forward the status update and then continue executing `statusUpdate`,
which then checks if there is an executor that is aware of this task.
Given that a pending task is not known to any executor, it would always
handle it by forwarding status update one more time. This patch adds
missing `return` statement, which fixes the issue.

Review: https://reviews.apache.org/r/71361
---
 src/slave/slave.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index bf87be0..50a7d68 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5659,6 +5659,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
 taskStatusUpdateManager->update(update, info.id())
   .onAny(defer(self(), ::___statusUpdate, lambda::_1, update, pid));
+
+return;
   }
 
   Executor* executor = framework->getExecutor(status.task_id());



[mesos] branch 1.8.x updated (f3aa802 -> adc958f)

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from f3aa802  Added MESOS-9836 to the 1.8.2 CHANGELOG.
 new 14abb82  Added missing `return` statement in `Slave::statusUpdate`.
 new 4bbb037  Fixed out-of-order processing of terminal status updates in 
agent.
 new adc958f  Added MESOS-9887 to the 1.8.2 CHANGELOG.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG   |  1 +
 src/slave/slave.cpp | 64 ++---
 src/slave/slave.hpp |  6 +
 3 files changed, 68 insertions(+), 3 deletions(-)



[mesos] 03/03: Added MESOS-9887 to the 1.8.2 CHANGELOG.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit adc958f553c3728aab5529de56b0ddc30c0f9b68
Author: Andrei Budnik 
AuthorDate: Mon Aug 26 15:02:40 2019 +0200

Added MESOS-9887 to the 1.8.2 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index b3fca25..ff89605 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,7 @@ Release Notes - Mesos - Version 1.8.2 (WIP)
   * [MESOS-9785] - Frameworks recovered from reregistered agents are not 
reported to master `/api/v1` subscribers.
   * [MESOS-9836] - Docker containerizer overwrites `/mesos/slave` cgroups.
   * [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
+  * [MESOS-9887] - Race condition between two terminal task status updates for 
Docker/Command executor.
   * [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret 
from runtime directory when the container is destroyed.
   * [MESOS-9925] - Default executor takes a couple of seconds to start and 
subscribe Mesos agent.
 



[mesos] 02/03: Fixed out-of-order processing of terminal status updates in agent.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit b7dcc984476904d6d17f7bf699295dfa9ac8a66e
Author: Andrei Budnik 
AuthorDate: Tue Aug 20 19:24:44 2019 +0200

Fixed out-of-order processing of terminal status updates in agent.

Previously, Mesos agent could send TASK_FAILED status update on
executor termination while processing of TASK_FINISHED status update
was in progress. Processing of task status updates involves sending
requests to the containerizer, which might finish processing of these
requests out-of-order, e.g. `MesosContainerizer::status`. Also,
the agent does not overwrite status of the terminal status update once
it's stored in the `terminatedTasks`. Hence, there was a race condition
between two terminal status updates.

Note that V1 Executors are not affected by this problem because they
wait for an acknowledgement of the terminal status update by the agent
before terminating.

This patch introduces a new data structure `pendingStatusUpdates`,
which holds a list of status updates that are being processed. This
data structure allows validating the order of processing of status
updates by the agent.

Review: https://reviews.apache.org/r/71343
---
 src/slave/slave.cpp | 62 ++---
 src/slave/slave.hpp |  6 ++
 2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index edfe3d0..f10aac2 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5486,6 +5486,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
   metrics.valid_status_updates++;
 
+  executor->addPendingTaskStatus(status);
+
   // Before sending update, we need to retrieve the container status
   // if the task reached the executor. For tasks that are queued, we
   // do not need to send the container status and we must
@@ -5697,6 +5699,17 @@ void Slave::___statusUpdate(
   VLOG(1) << "Task status update manager successfully handled status update "
   << update;
 
+  const TaskStatus& status = update.status();
+
+  Executor* executor = nullptr;
+  Framework* framework = getFramework(update.framework_id());
+  if (framework != nullptr) {
+executor = framework->getExecutor(status.task_id());
+if (executor != nullptr) {
+  executor->removePendingTaskStatus(status);
+}
+  }
+
   if (pid == UPID()) {
 return;
   }
@@ -5704,7 +5717,7 @@ void Slave::___statusUpdate(
   StatusUpdateAcknowledgementMessage message;
   message.mutable_framework_id()->MergeFrom(update.framework_id());
   message.mutable_slave_id()->MergeFrom(update.slave_id());
-  message.mutable_task_id()->MergeFrom(update.status().task_id());
+  message.mutable_task_id()->MergeFrom(status.task_id());
   message.set_uuid(update.uuid());
 
   // Task status update manager successfully handled the status update.
@@ -5716,14 +5729,12 @@ void Slave::___statusUpdate(
 send(pid.get(), message);
   } else {
 // Acknowledge the HTTP based executor.
-Framework* framework = getFramework(update.framework_id());
 if (framework == nullptr) {
   LOG(WARNING) << "Ignoring sending acknowledgement for status update "
<< update << " of unknown framework";
   return;
 }
 
-Executor* executor = framework->getExecutor(update.status().task_id());
 if (executor == nullptr) {
   // Refer to the comments in 'statusUpdate()' on when this can
   // happen.
@@ -9861,6 +9872,33 @@ void Executor::recoverTask(const TaskState& state, bool 
recheckpointTask)
 }
 
 
+void Executor::addPendingTaskStatus(const TaskStatus& status)
+{
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+  pendingStatusUpdates[status.task_id()][uuid] = status;
+}
+
+
+void Executor::removePendingTaskStatus(const TaskStatus& status)
+{
+  const TaskID& taskId = status.task_id();
+
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+
+  if (!pendingStatusUpdates.contains(taskId) ||
+  !pendingStatusUpdates[taskId].contains(uuid)) {
+LOG(WARNING) << "Unknown pending status update (uuid: " << uuid << ")";
+return;
+  }
+
+  pendingStatusUpdates[taskId].erase(uuid);
+
+  if (pendingStatusUpdates[taskId].empty()) {
+pendingStatusUpdates.erase(taskId);
+  }
+}
+
+
 Try Executor::updateTaskState(const TaskStatus& status)
 {
   bool terminal = protobuf::isTerminalState(status.state());
@@ -9884,6 +9922,24 @@ Try Executor::updateTaskState(const TaskStatus& 
status)
 task = launchedTasks.at(status.task_id());
 
 if (terminal) {
+  if (pendingStatusUpdates.contains(status.task_id())) {
+auto statusUpdates = pending

[mesos] 01/03: Added missing `return` statement in `Slave::statusUpdate`.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 2d62e8ae0ef94f78c9b32be258a08d1e6e2382df
Author: Andrei Budnik 
AuthorDate: Fri Aug 23 14:36:18 2019 +0200

Added missing `return` statement in `Slave::statusUpdate`.

Previously, if `statusUpdate` was called for a pending task, it would
forward the status update and then continue executing `statusUpdate`,
which then checks if there is an executor that is aware of this task.
Given that a pending task is not known to any executor, it would always
handle it by forwarding status update one more time. This patch adds
missing `return` statement, which fixes the issue.

Review: https://reviews.apache.org/r/71361
---
 src/slave/slave.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 1c33579..edfe3d0 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5418,6 +5418,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
 taskStatusUpdateManager->update(update, info.id())
   .onAny(defer(self(), ::___statusUpdate, lambda::_1, update, pid));
+
+return;
   }
 
   Executor* executor = framework->getExecutor(status.task_id());



[mesos] 03/03: Added MESOS-9887 to the 1.7.3 CHANGELOG.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 80d42b9a2c9223665a82bbaaf3cbc222a094e2ef
Author: Andrei Budnik 
AuthorDate: Mon Aug 26 14:58:45 2019 +0200

Added MESOS-9887 to the 1.7.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 06c88db..1178228 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -29,6 +29,7 @@ Release Notes - Mesos - Version 1.7.3 (WIP)
   * [MESOS-9856] - REVIVE call with specified role(s) clears filters for all 
roles of a framework.
   * [MESOS-9868] - NetworkInfo from the agent /state endpoint is not correct.
   * [MESOS-9870] - Simultaneous adding/removal of a role from framework's 
roles and its suppressed roles crashes the master.
+  * [MESOS-9887] - Race condition between two terminal task status updates for 
Docker/Command executor.
   * [MESOS-9893] - `volume/secret` isolator should cleanup the stored secret 
from runtime directory when the container is destroyed.
   * [MESOS-9925] - Default executor takes a couple of seconds to start and 
subscribe Mesos agent.
 



[mesos] 01/03: Added missing `return` statement in `Slave::statusUpdate`.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit cc79f22fb07cfad8f248150d5a3040f846998c3a
Author: Andrei Budnik 
AuthorDate: Fri Aug 23 14:36:18 2019 +0200

Added missing `return` statement in `Slave::statusUpdate`.

Previously, if `statusUpdate` was called for a pending task, it would
forward the status update and then continue executing `statusUpdate`,
which then checks if there is an executor that is aware of this task.
Given that a pending task is not known to any executor, it would always
handle it by forwarding status update one more time. This patch adds
missing `return` statement, which fixes the issue.

Review: https://reviews.apache.org/r/71361
---
 src/slave/slave.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 2a90e96..176d3fb 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5388,6 +5388,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
 taskStatusUpdateManager->update(update, info.id())
   .onAny(defer(self(), ::___statusUpdate, lambda::_1, update, pid));
+
+return;
   }
 
   Executor* executor = framework->getExecutor(status.task_id());



[mesos] 02/03: Fixed out-of-order processing of terminal status updates in agent.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 3ad802ebbe34565a2fa995d834ba4928c20e5e62
Author: Andrei Budnik 
AuthorDate: Tue Aug 20 19:24:44 2019 +0200

Fixed out-of-order processing of terminal status updates in agent.

Previously, Mesos agent could send TASK_FAILED status update on
executor termination while processing of TASK_FINISHED status update
was in progress. Processing of task status updates involves sending
requests to the containerizer, which might finish processing of these
requests out-of-order, e.g. `MesosContainerizer::status`. Also,
the agent does not overwrite status of the terminal status update once
it's stored in the `terminatedTasks`. Hence, there was a race condition
between two terminal status updates.

Note that V1 Executors are not affected by this problem because they
wait for an acknowledgement of the terminal status update by the agent
before terminating.

This patch introduces a new data structure `pendingStatusUpdates`,
which holds a list of status updates that are being processed. This
data structure allows validating the order of processing of status
updates by the agent.

Review: https://reviews.apache.org/r/71343
---
 src/slave/slave.cpp | 62 ++---
 src/slave/slave.hpp |  6 ++
 2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 176d3fb..0861ac2 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5456,6 +5456,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
   metrics.valid_status_updates++;
 
+  executor->addPendingTaskStatus(status);
+
   // Before sending update, we need to retrieve the container status
   // if the task reached the executor. For tasks that are queued, we
   // do not need to send the container status and we must
@@ -5667,6 +5669,17 @@ void Slave::___statusUpdate(
   VLOG(1) << "Task status update manager successfully handled status update "
   << update;
 
+  const TaskStatus& status = update.status();
+
+  Executor* executor = nullptr;
+  Framework* framework = getFramework(update.framework_id());
+  if (framework != nullptr) {
+executor = framework->getExecutor(status.task_id());
+if (executor != nullptr) {
+  executor->removePendingTaskStatus(status);
+}
+  }
+
   if (pid == UPID()) {
 return;
   }
@@ -5674,7 +5687,7 @@ void Slave::___statusUpdate(
   StatusUpdateAcknowledgementMessage message;
   message.mutable_framework_id()->MergeFrom(update.framework_id());
   message.mutable_slave_id()->MergeFrom(update.slave_id());
-  message.mutable_task_id()->MergeFrom(update.status().task_id());
+  message.mutable_task_id()->MergeFrom(status.task_id());
   message.set_uuid(update.uuid());
 
   // Task status update manager successfully handled the status update.
@@ -5686,14 +5699,12 @@ void Slave::___statusUpdate(
 send(pid.get(), message);
   } else {
 // Acknowledge the HTTP based executor.
-Framework* framework = getFramework(update.framework_id());
 if (framework == nullptr) {
   LOG(WARNING) << "Ignoring sending acknowledgement for status update "
<< update << " of unknown framework";
   return;
 }
 
-Executor* executor = framework->getExecutor(update.status().task_id());
 if (executor == nullptr) {
   // Refer to the comments in 'statusUpdate()' on when this can
   // happen.
@@ -9759,6 +9770,33 @@ void Executor::recoverTask(const TaskState& state, bool 
recheckpointTask)
 }
 
 
+void Executor::addPendingTaskStatus(const TaskStatus& status)
+{
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+  pendingStatusUpdates[status.task_id()][uuid] = status;
+}
+
+
+void Executor::removePendingTaskStatus(const TaskStatus& status)
+{
+  const TaskID& taskId = status.task_id();
+
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+
+  if (!pendingStatusUpdates.contains(taskId) ||
+  !pendingStatusUpdates[taskId].contains(uuid)) {
+LOG(WARNING) << "Unknown pending status update (uuid: " << uuid << ")";
+return;
+  }
+
+  pendingStatusUpdates[taskId].erase(uuid);
+
+  if (pendingStatusUpdates[taskId].empty()) {
+pendingStatusUpdates.erase(taskId);
+  }
+}
+
+
 Try Executor::updateTaskState(const TaskStatus& status)
 {
   bool terminal = protobuf::isTerminalState(status.state());
@@ -9782,6 +9820,24 @@ Try Executor::updateTaskState(const TaskStatus& 
status)
 task = launchedTasks.at(status.task_id());
 
 if (terminal) {
+  if (pendingStatusUpdates.contains(status.task_id())) {
+auto statusUpdates = pending

[mesos] branch 1.6.x updated (9badb3b -> d77029f)

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 9badb3b  Added MESOS-9836 to the 1.6.3 CHANGELOG.
 new cc79f22  Added missing `return` statement in `Slave::statusUpdate`.
 new 3ad802e  Fixed out-of-order processing of terminal status updates in 
agent.
 new d77029f  Added MESOS-9887 to the 1.6.3 CHANGELOG.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG   |  1 +
 src/slave/slave.cpp | 64 ++---
 src/slave/slave.hpp |  6 +
 3 files changed, 68 insertions(+), 3 deletions(-)



[mesos] branch master updated (48c20bf -> f0be237)

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 48c20bf  Updated site's dependencies.
 new 8aae23e  Added missing `return` statement in `Slave::statusUpdate`.
 new f0be237  Fixed out-of-order processing of terminal status updates in 
agent.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/slave/slave.cpp | 64 ++---
 src/slave/slave.hpp |  6 +
 2 files changed, 67 insertions(+), 3 deletions(-)



[mesos] 01/02: Added missing `return` statement in `Slave::statusUpdate`.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 8aae23ec7cd4bc50532df0b1d1ea6ec23ce078f8
Author: Andrei Budnik 
AuthorDate: Fri Aug 23 14:36:18 2019 +0200

Added missing `return` statement in `Slave::statusUpdate`.

Previously, if `statusUpdate` was called for a pending task, it would
forward the status update and then continue executing `statusUpdate`,
which then checks if there is an executor that is aware of this task.
Given that a pending task is not known to any executor, it would always
handle it by forwarding status update one more time. This patch adds
missing `return` statement, which fixes the issue.

Review: https://reviews.apache.org/r/71361
---
 src/slave/slave.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 882040d..45f1584 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5879,6 +5879,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
 taskStatusUpdateManager->update(update, info.id())
   .onAny(defer(self(), ::___statusUpdate, lambda::_1, update, pid));
+
+return;
   }
 
   Executor* executor = framework->getExecutor(status.task_id());



[mesos] 02/02: Fixed out-of-order processing of terminal status updates in agent.

2019-08-26 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit f0be23765531b05661ed7f1b124faf96744aa80b
Author: Andrei Budnik 
AuthorDate: Tue Aug 20 19:24:44 2019 +0200

Fixed out-of-order processing of terminal status updates in agent.

Previously, Mesos agent could send TASK_FAILED status update on
executor termination while processing of TASK_FINISHED status update
was in progress. Processing of task status updates involves sending
requests to the containerizer, which might finish processing of these
requests out-of-order, e.g. `MesosContainerizer::status`. Also,
the agent does not overwrite status of the terminal status update once
it's stored in the `terminatedTasks`. Hence, there was a race condition
between two terminal status updates.

Note that V1 Executors are not affected by this problem because they
wait for an acknowledgement of the terminal status update by the agent
before terminating.

This patch introduces a new data structure `pendingStatusUpdates`,
which holds a list of status updates that are being processed. This
data structure allows validating the order of processing of status
updates by the agent.

Review: https://reviews.apache.org/r/71343
---
 src/slave/slave.cpp | 62 ++---
 src/slave/slave.hpp |  6 ++
 2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 45f1584..4e93656 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -5947,6 +5947,8 @@ void Slave::statusUpdate(StatusUpdate update, const 
Option& pid)
 
   metrics.valid_status_updates++;
 
+  executor->addPendingTaskStatus(status);
+
   // Before sending update, we need to retrieve the container status
   // if the task reached the executor. For tasks that are queued, we
   // do not need to send the container status and we must
@@ -6158,6 +6160,17 @@ void Slave::___statusUpdate(
   VLOG(1) << "Task status update manager successfully handled status update "
   << update;
 
+  const TaskStatus& status = update.status();
+
+  Executor* executor = nullptr;
+  Framework* framework = getFramework(update.framework_id());
+  if (framework != nullptr) {
+executor = framework->getExecutor(status.task_id());
+if (executor != nullptr) {
+  executor->removePendingTaskStatus(status);
+}
+  }
+
   if (pid == UPID()) {
 return;
   }
@@ -6165,7 +6178,7 @@ void Slave::___statusUpdate(
   StatusUpdateAcknowledgementMessage message;
   message.mutable_framework_id()->MergeFrom(update.framework_id());
   message.mutable_slave_id()->MergeFrom(update.slave_id());
-  message.mutable_task_id()->MergeFrom(update.status().task_id());
+  message.mutable_task_id()->MergeFrom(status.task_id());
   message.set_uuid(update.uuid());
 
   // Task status update manager successfully handled the status update.
@@ -6177,14 +6190,12 @@ void Slave::___statusUpdate(
 send(pid.get(), message);
   } else {
 // Acknowledge the HTTP based executor.
-Framework* framework = getFramework(update.framework_id());
 if (framework == nullptr) {
   LOG(WARNING) << "Ignoring sending acknowledgement for status update "
<< update << " of unknown framework";
   return;
 }
 
-Executor* executor = framework->getExecutor(update.status().task_id());
 if (executor == nullptr) {
   // Refer to the comments in 'statusUpdate()' on when this can
   // happen.
@@ -10795,6 +10806,33 @@ void Executor::recoverTask(const TaskState& state, 
bool recheckpointTask)
 }
 
 
+void Executor::addPendingTaskStatus(const TaskStatus& status)
+{
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+  pendingStatusUpdates[status.task_id()][uuid] = status;
+}
+
+
+void Executor::removePendingTaskStatus(const TaskStatus& status)
+{
+  const TaskID& taskId = status.task_id();
+
+  auto uuid = id::UUID::fromBytes(status.uuid()).get();
+
+  if (!pendingStatusUpdates.contains(taskId) ||
+  !pendingStatusUpdates[taskId].contains(uuid)) {
+LOG(WARNING) << "Unknown pending status update (uuid: " << uuid << ")";
+return;
+  }
+
+  pendingStatusUpdates[taskId].erase(uuid);
+
+  if (pendingStatusUpdates[taskId].empty()) {
+pendingStatusUpdates.erase(taskId);
+  }
+}
+
+
 Try Executor::updateTaskState(const TaskStatus& status)
 {
   bool terminal = protobuf::isTerminalState(status.state());
@@ -10818,6 +10856,24 @@ Try Executor::updateTaskState(const 
TaskStatus& status)
 task = launchedTasks.at(status.task_id());
 
 if (terminal) {
+  if (pendingStatusUpdates.contains(status.task_id())) {
+auto statusUpdates = pending

[mesos] branch master updated: Fixed compilation error on Mac OS.

2019-06-11 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new f2bedda  Fixed compilation error on Mac OS.
f2bedda is described below

commit f2beddaf04b82c7bf89b4abc6a8f76ab7e41705b
Author: Andrei Budnik 
AuthorDate: Fri Jun 7 18:02:01 2019 +0200

Fixed compilation error on Mac OS.

This patch adds missing switch case to fix compilation error introduced
in `bc5a57122635`.

Review: https://reviews.apache.org/r/70811
---
 src/slave/containerizer/mesos/launch.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/slave/containerizer/mesos/launch.cpp 
b/src/slave/containerizer/mesos/launch.cpp
index a69a688..b29ec55 100644
--- a/src/slave/containerizer/mesos/launch.cpp
+++ b/src/slave/containerizer/mesos/launch.cpp
@@ -496,8 +496,8 @@ static Try executeFileOperation(const 
ContainerFileOperation& op)
   return Nothing();
 }
 
-#ifdef __linux__
 case ContainerFileOperation::MOUNT: {
+#ifdef __linux__
   Try result = mountContainerFilesystem(op.mount());
   if (result.isError()) {
 return Error(
@@ -506,8 +506,10 @@ static Try executeFileOperation(const 
ContainerFileOperation& op)
   }
 
   return Nothing();
-}
+#else
+  return Error("Container mount is not supported on non-Linux systems");
 #endif // __linux__
+}
 
 case ContainerFileOperation::RENAME: {
   Try result =



[mesos] branch 1.8.x updated: Added license for libseccomp to toplevel LICENSE file.

2019-05-08 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/1.8.x by this push:
 new ea28002  Added license for libseccomp to toplevel LICENSE file.
ea28002 is described below

commit ea2800276ceff4154c8c13add01ce79c4d97554a
Author: Benjamin Bannier 
AuthorDate: Wed May 8 15:50:05 2019 +0200

Added license for libseccomp to toplevel LICENSE file.

Review: https://reviews.apache.org/r/70610/
---
 LICENSE | 462 
 1 file changed, 462 insertions(+)

diff --git a/LICENSE b/LICENSE
index 8632389..e5eb300 100644
--- a/LICENSE
+++ b/LICENSE
@@ -833,3 +833,465 @@ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY,
 WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 OTHER DEALINGS IN THE SOFTWARE.
+
+==
+For the libseccomp library
+(3rdparty/libseccomp-2.3.3.tar.gz):
+==
+
+  GNU LESSER GENERAL PUBLIC LICENSE
+   Version 2.1, February 1999
+
+ Copyright (C) 1991, 1999 Free Software Foundation, Inc.
+ 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+[This is the first released version of the Lesser GPL.  It also counts
+ as the successor of the GNU Library Public License, version 2, hence
+ the version number 2.1.]
+
+Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+Licenses are intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+
+  This license, the Lesser General Public License, applies to some
+specially designated software packages--typically libraries--of the
+Free Software Foundation and other authors who decide to use it.  You
+can use it too, but we suggest you first think carefully about whether
+this license or the ordinary General Public License is the better
+strategy to use in any particular case, based on the explanations below.
+
+  When we speak of free software, we are referring to freedom of use,
+not price.  Our General Public Licenses are designed to make sure that
+you have the freedom to distribute copies of free software (and charge
+for this service if you wish); that you receive source code or can get
+it if you want it; that you can change the software and use pieces of
+it in new free programs; and that you are informed that you can do
+these things.
+
+  To protect your rights, we need to make restrictions that forbid
+distributors to deny you these rights or to ask you to surrender these
+rights.  These restrictions translate to certain responsibilities for
+you if you distribute copies of the library or if you modify it.
+
+  For example, if you distribute copies of the library, whether gratis
+or for a fee, you must give the recipients all the rights that we gave
+you.  You must make sure that they, too, receive or can get the source
+code.  If you link other code with the library, you must provide
+complete object files to the recipients, so that they can relink them
+with the library after making changes to the library and recompiling
+it.  And you must show them these terms so they know their rights.
+
+  We protect your rights with a two-step method: (1) we copyright the
+library, and (2) we offer you this license, which gives you legal
+permission to copy, distribute and/or modify the library.
+
+  To protect each distributor, we want to make it very clear that
+there is no warranty for the free library.  Also, if the library is
+modified by someone else and passed on, the recipients should know
+that what they have is not the original version, so that the original
+author's reputation will not be affected by problems that might be
+introduced by others.
+
+  Finally, software patents pose a constant threat to the existence of
+any free program.  We wish to make sure that a company cannot
+effectively restrict the users of a free program by obtaining a
+restrictive license from a patent holder.  Therefore, we insist that
+any patent license obtained for a version of the library must be
+consistent with the full freedom of use specified in this license.
+
+  Most GNU software, including some libraries, is covered by the
+ordinary GNU General Public License.  This license, the GNU Lesser
+General Public License, applies to certain designated libraries, and
+is quite different from the ordinary General Public License.  We use
+this license for certain libraries

[mesos] branch master updated: Added license for libseccomp to toplevel LICENSE file.

2019-05-08 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new d3d83ee  Added license for libseccomp to toplevel LICENSE file.
d3d83ee is described below

commit d3d83eed0df88582885fe10fc9d1f8c6b3a3b41e
Author: Benjamin Bannier 
AuthorDate: Wed May 8 15:47:27 2019 +0200

Added license for libseccomp to toplevel LICENSE file.

Review: https://reviews.apache.org/r/70610/
---
 LICENSE | 462 
 1 file changed, 462 insertions(+)

diff --git a/LICENSE b/LICENSE
index 1d9958d..2c5919d 100644
--- a/LICENSE
+++ b/LICENSE
@@ -833,3 +833,465 @@ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY,
 WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 OTHER DEALINGS IN THE SOFTWARE.
+
+==
+For the libseccomp library
+(3rdparty/libseccomp-2.3.3.tar.gz):
+==
+
+  GNU LESSER GENERAL PUBLIC LICENSE
+   Version 2.1, February 1999
+
+ Copyright (C) 1991, 1999 Free Software Foundation, Inc.
+ 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+[This is the first released version of the Lesser GPL.  It also counts
+ as the successor of the GNU Library Public License, version 2, hence
+ the version number 2.1.]
+
+Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+Licenses are intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+
+  This license, the Lesser General Public License, applies to some
+specially designated software packages--typically libraries--of the
+Free Software Foundation and other authors who decide to use it.  You
+can use it too, but we suggest you first think carefully about whether
+this license or the ordinary General Public License is the better
+strategy to use in any particular case, based on the explanations below.
+
+  When we speak of free software, we are referring to freedom of use,
+not price.  Our General Public Licenses are designed to make sure that
+you have the freedom to distribute copies of free software (and charge
+for this service if you wish); that you receive source code or can get
+it if you want it; that you can change the software and use pieces of
+it in new free programs; and that you are informed that you can do
+these things.
+
+  To protect your rights, we need to make restrictions that forbid
+distributors to deny you these rights or to ask you to surrender these
+rights.  These restrictions translate to certain responsibilities for
+you if you distribute copies of the library or if you modify it.
+
+  For example, if you distribute copies of the library, whether gratis
+or for a fee, you must give the recipients all the rights that we gave
+you.  You must make sure that they, too, receive or can get the source
+code.  If you link other code with the library, you must provide
+complete object files to the recipients, so that they can relink them
+with the library after making changes to the library and recompiling
+it.  And you must show them these terms so they know their rights.
+
+  We protect your rights with a two-step method: (1) we copyright the
+library, and (2) we offer you this license, which gives you legal
+permission to copy, distribute and/or modify the library.
+
+  To protect each distributor, we want to make it very clear that
+there is no warranty for the free library.  Also, if the library is
+modified by someone else and passed on, the recipients should know
+that what they have is not the original version, so that the original
+author's reputation will not be affected by problems that might be
+introduced by others.
+
+  Finally, software patents pose a constant threat to the existence of
+any free program.  We wish to make sure that a company cannot
+effectively restrict the users of a free program by obtaining a
+restrictive license from a patent holder.  Therefore, we insist that
+any patent license obtained for a version of the library must be
+consistent with the full freedom of use specified in this license.
+
+  Most GNU software, including some libraries, is covered by the
+ordinary GNU General Public License.  This license, the GNU Lesser
+General Public License, applies to certain designated libraries, and
+is quite different from the ordinary General Public License.  We use
+this license for certain libraries

[mesos] branch master updated: Added MESOS-9695 to the 1.4.4 CHANGELOG.

2019-05-01 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new fc8847d  Added MESOS-9695 to the 1.4.4 CHANGELOG.
fc8847d is described below

commit fc8847d0002a6c937e59341aace12e739102449d
Author: Andrei Budnik 
AuthorDate: Wed May 1 12:25:30 2019 +0200

Added MESOS-9695 to the 1.4.4 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 1a4d782..f3f6f92 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1979,6 +1979,7 @@ Release Notes - Mesos - Version 1.4.4 (WIP)
 
 ** Bug
   * [MESOS-9507] - Agent could not recover due to empty docker volume 
checkpointed files.
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
 
 ** Improvement:
   * [MESOS-9159] - Support Foreign URLs in docker registry puller.



[mesos] branch 1.4.x updated (1a76202 -> f05058d)

2019-05-01 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.4.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 1a76202  Added MESOS-9159 and MESOS-9675 to the 1.4.4 CHANGELOG.
 new ca65f99  Removed the duplicate pid check in Docker containerizer.
 new f05058d  Added MESOS-9695 to the 1.4.4 CHANGELOG.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG  |  1 +
 src/slave/containerizer/docker.cpp | 27 ++-
 2 files changed, 7 insertions(+), 21 deletions(-)



[mesos] 01/02: Removed the duplicate pid check in Docker containerizer.

2019-05-01 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.4.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit ca65f991727494b9b78e64a19306231ac004289f
Author: Qian Zhang 
AuthorDate: Tue Apr 30 13:23:26 2019 +0200

Removed the duplicate pid check in Docker containerizer.

Review: https://reviews.apache.org/r/70561/
---
 src/slave/containerizer/docker.cpp | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/src/slave/containerizer/docker.cpp 
b/src/slave/containerizer/docker.cpp
index 31a47f0..97cd75e 100644
--- a/src/slave/containerizer/docker.cpp
+++ b/src/slave/containerizer/docker.cpp
@@ -925,10 +925,6 @@ Future DockerContainerizerProcess::_recover(
   }
 }
 
-// Collection of pids that we've started reaping in order to
-// detect very unlikely duplicate scenario (see below).
-hashmap pids;
-
 foreachvalue (const FrameworkState& framework, state->frameworks) {
   foreachvalue (const ExecutorState& executor, framework.executors) {
 if (executor.info.isNone()) {
@@ -1007,9 +1003,12 @@ Future DockerContainerizerProcess::_recover(
 
 // Only reap the executor process if the executor can be connected
 // otherwise just set `container->status` to `None()`. This is to
-// avoid reaping an irrelevant process, e.g., after the agent host is
-// rebooted, the executor pid happens to be reused by another process.
-// See MESOS-8125 for details.
+// avoid reaping an irrelevant process, e.g., agent process is stopped
+// for a long time, and during this time executor terminates and its
+// pid happens to be reused by another irrelevant process. When agent
+// is restarted, it still considers this executor not complete (i.e.,
+// `run->completed` is false), so we would reap the irrelevant process
+// if we do not check whether that process can be connected.
 // Note that if both the pid and the port of the executor are reused
 // by another process or two processes respectively after the agent
 // host reboots we will still reap an irrelevant process, but that
@@ -1045,20 +1044,6 @@ Future DockerContainerizerProcess::_recover(
 container->status.future().get()
   .onAny(defer(self(), ::reaped, containerId));
 
-if (pids.containsValue(pid)) {
-  // This should (almost) never occur. There is the
-  // possibility that a new executor is launched with the same
-  // pid as one that just exited (highly unlikely) and the
-  // slave dies after the new executor is launched but before
-  // it hears about the termination of the earlier executor
-  // (also unlikely).
-  return Failure(
-  "Detected duplicate pid " + stringify(pid) +
-  " for container " + stringify(containerId));
-}
-
-pids.put(containerId, pid);
-
 const string sandboxDirectory = paths::getExecutorRunPath(
 flags.work_dir,
 state->id,



[mesos] 02/02: Added MESOS-9695 to the 1.4.4 CHANGELOG.

2019-05-01 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.4.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit f05058dad7219950fd9bdc2748ab9c9a79d6e7f1
Author: Andrei Budnik 
AuthorDate: Wed May 1 12:25:30 2019 +0200

Added MESOS-9695 to the 1.4.4 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 0f32baa..0ce1715 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -4,6 +4,7 @@ Release Notes - Mesos - Version 1.4.4 (WIP)
 
 ** Bug
   * [MESOS-9507] - Agent could not recover due to empty docker volume 
checkpointed files.
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
 
 ** Improvement:
   * [MESOS-9159] - Support Foreign URLs in docker registry puller.



[mesos] branch 1.8.x updated: Allowed compiling Seccomp isolator on older kernel versions.

2019-05-01 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/1.8.x by this push:
 new a684f07  Allowed compiling Seccomp isolator on older kernel versions.
a684f07 is described below

commit a684f073db683d83965fccbc7e5a308e35a7da9c
Author: Andrei Budnik 
AuthorDate: Wed May 1 11:52:49 2019 +0200

Allowed compiling Seccomp isolator on older kernel versions.

This patch removes dependency on `linux/seccomp.h` header, which
may be missing on some Linux distributions.

Review: https://reviews.apache.org/r/70567/
---
 configure.ac  |  8 
 src/slave/containerizer/mesos/isolators/linux/seccomp.cpp | 10 --
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/configure.ac b/configure.ac
index a367c28..8f3a36e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1627,14 +1627,6 @@ The Seccomp isolator is only supported on Linux.
 ---
   ])])
 
-  AC_CHECK_HEADERS([linux/seccomp.h], [],
-   [AC_MSG_ERROR([Cannot find seccomp system headers

-Please install the Linux kernel headers and make sure that you have
-Linux kernel 3.5+ installed.

-  ])])
-
   # Check if libseccomp prefix path was supplied and if so, add it to
   # CPPFLAGS while extending it by /include and to LDFLAGS while
   # extending it by /lib.
diff --git a/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp 
b/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp
index 5624c24..bb35c6e 100644
--- a/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp
+++ b/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp
@@ -14,8 +14,6 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-#include 
-
 #include 
 
 #include 
@@ -34,6 +32,14 @@ using mesos::slave::ContainerConfig;
 using mesos::slave::ContainerLaunchInfo;
 using mesos::slave::Isolator;
 
+// NOTE: The definition below was taken from the Linux Kernel sources.
+//
+// TODO(abudnik): This definition should be removed in favor of using
+// `linux/seccomp.h` once we drop support for kernels older than 3.5.
+#if !defined(SECCOMP_MODE_FILTER)
+#define SECCOMP_MODE_FILTER 2
+#endif
+
 namespace mesos {
 namespace internal {
 namespace slave {



[mesos] branch master updated: Allowed compiling Seccomp isolator on older kernel versions.

2019-05-01 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 1954bea  Allowed compiling Seccomp isolator on older kernel versions.
1954bea is described below

commit 1954beae2133fa55efe53038a020de4309b7ae91
Author: Andrei Budnik 
AuthorDate: Wed May 1 11:52:49 2019 +0200

Allowed compiling Seccomp isolator on older kernel versions.

This patch removes dependency on `linux/seccomp.h` header, which
may be missing on some Linux distributions.

Review: https://reviews.apache.org/r/70567/
---
 configure.ac  |  8 
 src/slave/containerizer/mesos/isolators/linux/seccomp.cpp | 10 --
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/configure.ac b/configure.ac
index e5afde4..b4bad57 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1627,14 +1627,6 @@ The Seccomp isolator is only supported on Linux.
 ---
   ])])
 
-  AC_CHECK_HEADERS([linux/seccomp.h], [],
-   [AC_MSG_ERROR([Cannot find seccomp system headers

-Please install the Linux kernel headers and make sure that you have
-Linux kernel 3.5+ installed.

-  ])])
-
   # Check if libseccomp prefix path was supplied and if so, add it to
   # CPPFLAGS while extending it by /include and to LDFLAGS while
   # extending it by /lib.
diff --git a/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp 
b/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp
index 5624c24..bb35c6e 100644
--- a/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp
+++ b/src/slave/containerizer/mesos/isolators/linux/seccomp.cpp
@@ -14,8 +14,6 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-#include 
-
 #include 
 
 #include 
@@ -34,6 +32,14 @@ using mesos::slave::ContainerConfig;
 using mesos::slave::ContainerLaunchInfo;
 using mesos::slave::Isolator;
 
+// NOTE: The definition below was taken from the Linux Kernel sources.
+//
+// TODO(abudnik): This definition should be removed in favor of using
+// `linux/seccomp.h` once we drop support for kernels older than 3.5.
+#if !defined(SECCOMP_MODE_FILTER)
+#define SECCOMP_MODE_FILTER 2
+#endif
+
 namespace mesos {
 namespace internal {
 namespace slave {



[mesos] 03/04: Added MESOS-9695 to the 1.6.3 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 138bfe41c822823afe6d2a0532e3c95e4f7d3bfe
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:36:47 2019 +0200

Added MESOS-9695 to the 1.6.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index e7ec1a5..425df0e 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -886,6 +886,7 @@ Release Notes - Mesos - Version 1.6.3 (WIP)
   * [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary 
commands in the Mesos agent's namespace.
   * [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port 
Resources
   * [MESOS-9692] - Quota may be under allocated for disk resources.
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
   * [MESOS-9707] - Calling link::lo() may cause runtime error
 
 ** Improvement



[mesos] 04/04: Added MESOS-9695 to the 1.5.4 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 4eec48e17d08575c18458f713d2e8280faab99d6
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:45:01 2019 +0200

Added MESOS-9695 to the 1.5.4 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 425df0e..1a4d782 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1350,6 +1350,7 @@ Release Notes - Mesos - Version 1.5.4 (WIP)
 ** Bug
   * [MESOS-9529] - `/proc` should be remounted even if a nested container set 
`share_pid_namespace` to true.
   * [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port 
Resources
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
   * [MESOS-9707] - Calling link::lo() may cause runtime error
 
 ** Improvement



[mesos] branch master updated (c8004ee -> 4eec48e)

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from c8004ee  Removed the duplicate pid check in Docker containerizer.
 new a8fe548  Added MESOS-9695 to the 1.8.1 CHANGELOG.
 new 592f7c4  Added MESOS-9695 to the 1.7.3 CHANGELOG.
 new 138bfe4  Added MESOS-9695 to the 1.6.3 CHANGELOG.
 new 4eec48e  Added MESOS-9695 to the 1.5.4 CHANGELOG.

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)



[mesos] 01/04: Added MESOS-9695 to the 1.8.1 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit a8fe548b091732c2b46f5e6a7d7392de92644f5a
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:08:58 2019 +0200

Added MESOS-9695 to the 1.8.1 CHANGELOG.
---
 CHANGELOG | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG b/CHANGELOG
index 9c01040..89452a0 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -4,7 +4,7 @@ Release Notes - Mesos - Version 1.8.1 (WIP)
 
 ** Bug
   * [MESOS-9536] - Nested container launched with non-root user may not be 
able to write to its sandbox via the environment variable `MESOS_SANDBOX`.
-
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
 
 Release Notes - Mesos - Version 1.8.0
 -



[mesos] 02/04: Added MESOS-9695 to the 1.7.3 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 592f7c44a304f5c02804a0de98df07ec295ce070
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:30:32 2019 +0200

Added MESOS-9695 to the 1.7.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 89452a0..e7ec1a5 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -417,6 +417,7 @@ Release Notes - Mesos - Version 1.7.3 (WIP)
   * [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port 
Resources
   * [MESOS-9661] - Agent crashes when SLRP recovers dropped operations.
   * [MESOS-9692] - Quota may be under allocated for disk resources.
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
   * [MESOS-9707] - Calling link::lo() may cause runtime error
 
 ** Improvements



[mesos] branch 1.5.x updated (f8e0e41 -> 791ac63)

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from f8e0e41  Added MESOS-9619 to the 1.5.4 CHANGELOG.
 new f4a6453  Removed the duplicate pid check in Docker containerizer.
 new 791ac63  Added MESOS-9695 to the 1.5.4 CHANGELOG.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG  |  1 +
 src/slave/containerizer/docker.cpp | 27 ++-
 2 files changed, 7 insertions(+), 21 deletions(-)



[mesos] 01/02: Removed the duplicate pid check in Docker containerizer.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit f4a6453c77719ed531a9287b6c9cdeb7ad268865
Author: Qian Zhang 
AuthorDate: Tue Apr 30 13:23:26 2019 +0200

Removed the duplicate pid check in Docker containerizer.

Review: https://reviews.apache.org/r/70561/
---
 src/slave/containerizer/docker.cpp | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/src/slave/containerizer/docker.cpp 
b/src/slave/containerizer/docker.cpp
index 9dbb286..81f72d4 100644
--- a/src/slave/containerizer/docker.cpp
+++ b/src/slave/containerizer/docker.cpp
@@ -934,10 +934,6 @@ Future DockerContainerizerProcess::_recover(
   }
 }
 
-// Collection of pids that we've started reaping in order to
-// detect very unlikely duplicate scenario (see below).
-hashmap pids;
-
 foreachvalue (const FrameworkState& framework, state->frameworks) {
   foreachvalue (const ExecutorState& executor, framework.executors) {
 if (executor.info.isNone()) {
@@ -1016,9 +1012,12 @@ Future DockerContainerizerProcess::_recover(
 
 // Only reap the executor process if the executor can be connected
 // otherwise just set `container->status` to `None()`. This is to
-// avoid reaping an irrelevant process, e.g., after the agent host is
-// rebooted, the executor pid happens to be reused by another process.
-// See MESOS-8125 for details.
+// avoid reaping an irrelevant process, e.g., agent process is stopped
+// for a long time, and during this time executor terminates and its
+// pid happens to be reused by another irrelevant process. When agent
+// is restarted, it still considers this executor not complete (i.e.,
+// `run->completed` is false), so we would reap the irrelevant process
+// if we do not check whether that process can be connected.
 // Note that if both the pid and the port of the executor are reused
 // by another process or two processes respectively after the agent
 // host reboots we will still reap an irrelevant process, but that
@@ -1054,20 +1053,6 @@ Future DockerContainerizerProcess::_recover(
 container->status.future().get()
   .onAny(defer(self(), ::reaped, containerId));
 
-if (pids.containsValue(pid)) {
-  // This should (almost) never occur. There is the
-  // possibility that a new executor is launched with the same
-  // pid as one that just exited (highly unlikely) and the
-  // slave dies after the new executor is launched but before
-  // it hears about the termination of the earlier executor
-  // (also unlikely).
-  return Failure(
-  "Detected duplicate pid " + stringify(pid) +
-  " for container " + stringify(containerId));
-}
-
-pids.put(containerId, pid);
-
 const string sandboxDirectory = paths::getExecutorRunPath(
 flags.work_dir,
 state->id,



[mesos] 02/02: Added MESOS-9695 to the 1.5.4 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 791ac63c72c1c4b868c2b3a126c04075575757c1
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:45:01 2019 +0200

Added MESOS-9695 to the 1.5.4 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 6e175cf..fd85213 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -5,6 +5,7 @@ Release Notes - Mesos - Version 1.5.4 (WIP)
 ** Bug
   * [MESOS-9529] - `/proc` should be remounted even if a nested container set 
`share_pid_namespace` to true.
   * [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port 
Resources
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
   * [MESOS-9707] - Calling link::lo() may cause runtime error
 
 ** Improvement



[mesos] branch 1.6.x updated (45bfa2a -> 13fdaa4)

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 45bfa2a  Added MESOS-9536 to the 1.6.3 CHANGELOG.
 new d627a91  Removed the duplicate pid check in Docker containerizer.
 new 13fdaa4  Added MESOS-9695 to the 1.6.3 CHANGELOG.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG  |  1 +
 src/slave/containerizer/docker.cpp | 27 ++-
 2 files changed, 7 insertions(+), 21 deletions(-)



[mesos] 02/02: Added MESOS-9695 to the 1.6.3 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 13fdaa4e7be41f5129d2b96944569abca2b60905
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:36:47 2019 +0200

Added MESOS-9695 to the 1.6.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 58fda2e..55b74d1 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -9,6 +9,7 @@ Release Notes - Mesos - Version 1.6.3 (WIP)
   * [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary 
commands in the Mesos agent's namespace.
   * [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port 
Resources
   * [MESOS-9692] - Quota may be under allocated for disk resources.
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
   * [MESOS-9707] - Calling link::lo() may cause runtime error
 
 ** Improvement



[mesos] 01/02: Removed the duplicate pid check in Docker containerizer.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit d627a919c651e8dacd14569c027fe4422ef87828
Author: Qian Zhang 
AuthorDate: Tue Apr 30 13:23:26 2019 +0200

Removed the duplicate pid check in Docker containerizer.

Review: https://reviews.apache.org/r/70561/
---
 src/slave/containerizer/docker.cpp | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/src/slave/containerizer/docker.cpp 
b/src/slave/containerizer/docker.cpp
index ef468ed..85b22f4 100644
--- a/src/slave/containerizer/docker.cpp
+++ b/src/slave/containerizer/docker.cpp
@@ -937,10 +937,6 @@ Future DockerContainerizerProcess::_recover(
   }
 }
 
-// Collection of pids that we've started reaping in order to
-// detect very unlikely duplicate scenario (see below).
-hashmap pids;
-
 foreachvalue (const FrameworkState& framework, state->frameworks) {
   foreachvalue (const ExecutorState& executor, framework.executors) {
 if (executor.info.isNone()) {
@@ -1019,9 +1015,12 @@ Future DockerContainerizerProcess::_recover(
 
 // Only reap the executor process if the executor can be connected
 // otherwise just set `container->status` to `None()`. This is to
-// avoid reaping an irrelevant process, e.g., after the agent host is
-// rebooted, the executor pid happens to be reused by another process.
-// See MESOS-8125 for details.
+// avoid reaping an irrelevant process, e.g., agent process is stopped
+// for a long time, and during this time executor terminates and its
+// pid happens to be reused by another irrelevant process. When agent
+// is restarted, it still considers this executor not complete (i.e.,
+// `run->completed` is false), so we would reap the irrelevant process
+// if we do not check whether that process can be connected.
 // Note that if both the pid and the port of the executor are reused
 // by another process or two processes respectively after the agent
 // host reboots we will still reap an irrelevant process, but that
@@ -1057,20 +1056,6 @@ Future DockerContainerizerProcess::_recover(
 container->status.future()
   ->onAny(defer(self(), ::reaped, containerId));
 
-if (pids.containsValue(pid)) {
-  // This should (almost) never occur. There is the
-  // possibility that a new executor is launched with the same
-  // pid as one that just exited (highly unlikely) and the
-  // slave dies after the new executor is launched but before
-  // it hears about the termination of the earlier executor
-  // (also unlikely).
-  return Failure(
-  "Detected duplicate pid " + stringify(pid) +
-  " for container " + stringify(containerId));
-}
-
-pids.put(containerId, pid);
-
 const string sandboxDirectory = paths::getExecutorRunPath(
 flags.work_dir,
 state->id,



[mesos] 01/02: Removed the duplicate pid check in Docker containerizer.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit bd438b797a15724945a60ee8d57cec656e23ae2d
Author: Qian Zhang 
AuthorDate: Tue Apr 30 13:23:26 2019 +0200

Removed the duplicate pid check in Docker containerizer.

Review: https://reviews.apache.org/r/70561/
---
 src/slave/containerizer/docker.cpp | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/src/slave/containerizer/docker.cpp 
b/src/slave/containerizer/docker.cpp
index 192dc29..dacf4de 100644
--- a/src/slave/containerizer/docker.cpp
+++ b/src/slave/containerizer/docker.cpp
@@ -945,10 +945,6 @@ Future DockerContainerizerProcess::_recover(
   }
 }
 
-// Collection of pids that we've started reaping in order to
-// detect very unlikely duplicate scenario (see below).
-hashmap pids;
-
 foreachvalue (const FrameworkState& framework, state->frameworks) {
   foreachvalue (const ExecutorState& executor, framework.executors) {
 if (executor.info.isNone()) {
@@ -1027,9 +1023,12 @@ Future DockerContainerizerProcess::_recover(
 
 // Only reap the executor process if the executor can be connected
 // otherwise just set `container->status` to `None()`. This is to
-// avoid reaping an irrelevant process, e.g., after the agent host is
-// rebooted, the executor pid happens to be reused by another process.
-// See MESOS-8125 for details.
+// avoid reaping an irrelevant process, e.g., agent process is stopped
+// for a long time, and during this time executor terminates and its
+// pid happens to be reused by another irrelevant process. When agent
+// is restarted, it still considers this executor not complete (i.e.,
+// `run->completed` is false), so we would reap the irrelevant process
+// if we do not check whether that process can be connected.
 // Note that if both the pid and the port of the executor are reused
 // by another process or two processes respectively after the agent
 // host reboots we will still reap an irrelevant process, but that
@@ -1065,20 +1064,6 @@ Future DockerContainerizerProcess::_recover(
 container->status.future()
   ->onAny(defer(self(), ::reaped, containerId));
 
-if (pids.containsValue(pid)) {
-  // This should (almost) never occur. There is the
-  // possibility that a new executor is launched with the same
-  // pid as one that just exited (highly unlikely) and the
-  // slave dies after the new executor is launched but before
-  // it hears about the termination of the earlier executor
-  // (also unlikely).
-  return Failure(
-  "Detected duplicate pid " + stringify(pid) +
-  " for container " + stringify(containerId));
-}
-
-pids.put(containerId, pid);
-
 const string sandboxDirectory = paths::getExecutorRunPath(
 flags.work_dir,
 state->id,



[mesos] 02/02: Added MESOS-9695 to the 1.7.3 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 80c9fd79448b1ea6f6367da8def911b75ababafd
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:30:32 2019 +0200

Added MESOS-9695 to the 1.7.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index ae40637..369a2c8 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -15,6 +15,7 @@ Release Notes - Mesos - Version 1.7.3 (WIP)
   * [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port 
Resources
   * [MESOS-9661] - Agent crashes when SLRP recovers dropped operations.
   * [MESOS-9692] - Quota may be under allocated for disk resources.
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
   * [MESOS-9707] - Calling link::lo() may cause runtime error
 
 ** Improvements



[mesos] branch 1.7.x updated (f0cbafd -> 80c9fd7)

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from f0cbafd  Added MESOS-9536 to the 1.7.3 CHANGELOG.
 new bd438b7  Removed the duplicate pid check in Docker containerizer.
 new 80c9fd7  Added MESOS-9695 to the 1.7.3 CHANGELOG.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG  |  1 +
 src/slave/containerizer/docker.cpp | 27 ++-
 2 files changed, 7 insertions(+), 21 deletions(-)



[mesos] 01/02: Removed the duplicate pid check in Docker containerizer.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 02f532ae876196c0c8abad9d6effb75d3ffa5db7
Author: Qian Zhang 
AuthorDate: Tue Apr 30 13:59:54 2019 +0200

Removed the duplicate pid check in Docker containerizer.

Review: https://reviews.apache.org/r/70561/
---
 src/slave/containerizer/docker.cpp | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/src/slave/containerizer/docker.cpp 
b/src/slave/containerizer/docker.cpp
index 7f1d471..e4ad945 100644
--- a/src/slave/containerizer/docker.cpp
+++ b/src/slave/containerizer/docker.cpp
@@ -936,10 +936,6 @@ Future DockerContainerizerProcess::_recover(
   }
 }
 
-// Collection of pids that we've started reaping in order to
-// detect very unlikely duplicate scenario (see below).
-hashmap pids;
-
 foreachvalue (const FrameworkState& framework, state->frameworks) {
   foreachvalue (const ExecutorState& executor, framework.executors) {
 if (executor.info.isNone()) {
@@ -1018,9 +1014,12 @@ Future DockerContainerizerProcess::_recover(
 
 // Only reap the executor process if the executor can be connected
 // otherwise just set `container->status` to `None()`. This is to
-// avoid reaping an irrelevant process, e.g., after the agent host is
-// rebooted, the executor pid happens to be reused by another process.
-// See MESOS-8125 for details.
+// avoid reaping an irrelevant process, e.g., agent process is stopped
+// for a long time, and during this time executor terminates and its
+// pid happens to be reused by another irrelevant process. When agent
+// is restarted, it still considers this executor not complete (i.e.,
+// `run->completed` is false), so we would reap the irrelevant process
+// if we do not check whether that process can be connected.
 // Note that if both the pid and the port of the executor are reused
 // by another process or two processes respectively after the agent
 // host reboots we will still reap an irrelevant process, but that
@@ -1056,20 +1055,6 @@ Future DockerContainerizerProcess::_recover(
 container->status.future()
   ->onAny(defer(self(), ::reaped, containerId));
 
-if (pids.contains_value(pid)) {
-  // This should (almost) never occur. There is the
-  // possibility that a new executor is launched with the same
-  // pid as one that just exited (highly unlikely) and the
-  // slave dies after the new executor is launched but before
-  // it hears about the termination of the earlier executor
-  // (also unlikely).
-  return Failure(
-  "Detected duplicate pid " + stringify(pid) +
-  " for container " + stringify(containerId));
-}
-
-pids.put(containerId, pid);
-
 const string sandboxDirectory = paths::getExecutorRunPath(
 flags.work_dir,
 state->id,



[mesos] 02/02: Added MESOS-9695 to the 1.8.1 CHANGELOG.

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a commit to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit b032d4ec81c76277b274150f8027766f0e3d2275
Author: Andrei Budnik 
AuthorDate: Tue Apr 30 14:08:58 2019 +0200

Added MESOS-9695 to the 1.8.1 CHANGELOG.
---
 CHANGELOG | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG b/CHANGELOG
index d19085d..c99523c 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -4,7 +4,7 @@ Release Notes - Mesos - Version 1.8.1 (WIP)
 
 ** Bug
   * [MESOS-9536] - Nested container launched with non-root user may not be 
able to write to its sandbox via the environment variable `MESOS_SANDBOX`.
-
+  * [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
 
 Release Notes - Mesos - Version 1.8.0
 -



[mesos] branch 1.8.x updated (6160315 -> b032d4e)

2019-04-30 Thread abudnik
This is an automated email from the ASF dual-hosted git repository.

abudnik pushed a change to branch 1.8.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 6160315  Added MESOS-9536 to the 1.8.1 CHANGELOG.
 new 02f532a  Removed the duplicate pid check in Docker containerizer.
 new b032d4e  Added MESOS-9695 to the 1.8.1 CHANGELOG.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG  |  2 +-
 src/slave/containerizer/docker.cpp | 27 ++-
 2 files changed, 7 insertions(+), 22 deletions(-)



  1   2   >