Re: [PR] [cgroups2] Implement Cgroups 2 isolator w/o nested containers and systemd. [mesos]

2024-04-19 Thread via GitHub


DevinLeamy commented on code in PR #556:
URL: https://github.com/apache/mesos/pull/556#discussion_r1572831258


##
src/slave/containerizer/mesos/isolators/cgroups2/cgroups2.cpp:
##
@@ -120,6 +144,569 @@ bool Cgroups2IsolatorProcess::supportsStandalone()
   return true;
 }
 
+
+Future> Cgroups2IsolatorProcess::prepare(
+const ContainerID& containerId,
+const ContainerConfig& containerConfig)
+{
+  if (containerId.has_parent()) {
+return Failure("cgroups v2 does not support nested containers");
+  }
+
+  if (infos.contains(containerId)) {
+return Failure(
+"Container with id '" + stringify(containerId) + "' "
+"has already been prepared");
+  }
+
+  CHECK(containerConfig.container_class() != ContainerClass::DEBUG);
+
+  // Create the non-leaf and leaf cgroups for the container, enable
+  // controllers in the non-leaf cgroup, and `prepare` each of the controllers.
+  const string& cgroup = cgroups2_paths::container(
+  flags.cgroups_root, containerId);
+  if (cgroups2::exists(cgroup)) {
+return Failure("Cgroup '" + cgroup + "' already exists");
+  }
+
+  Try createCgroup = cgroups2::create(cgroup);
+  if (createCgroup.isError()) {
+return Failure("Failed to create cgroup '" + cgroup + "': "
+   + createCgroup.error());
+  }
+
+  const string& leafCgroup = cgroups2_paths::container(
+  flags.cgroups_root, containerId, true);
+  if (cgroups2::exists(leafCgroup)) {
+return Failure("Cgroup '" + leafCgroup + "' already exists");
+  }
+
+  Try createLeafCgroup = cgroups2::create(leafCgroup);
+  if (createLeafCgroup.isError()) {
+return Failure("Failed to create cgroup '" + leafCgroup + "': "
+   + createLeafCgroup.error());
+  }
+
+  LOG(INFO) << "Created cgroups '" << cgroup << "' and '" << leafCgroup << "'";
+
+  infos[containerId] = Owned(new Info(containerId, cgroup));
+  vector> prepares;
+  foreachvalue (const Owned& controller, controllers) {
+// Enable controllers in the non-leaf cgroup.
+Try enable =
+  cgroups2::controllers::enable(cgroup, {controller->name()});
+
+infos[containerId]->controllers.insert(controller->name());
+prepares.push_back(
+controller->prepare(containerId, cgroup, containerConfig));
+  }
+
+  return await(prepares)
+.then(defer(
+PID(this),
+::_prepare,
+containerId,
+containerConfig,
+lambda::_1));
+}
+
+
+Future> Cgroups2IsolatorProcess::_prepare(
+const ContainerID& containerId,
+const ContainerConfig& containerConfig,
+const vector>& futures)
+{
+  vector errors;
+  foreach (const Future& future, futures) {
+if (!future.isReady()) {
+  errors.push_back(future.isFailed() ? future.failure() : "discarded");
+}
+  }
+
+  if (!errors.empty()) {
+return Failure(
+"Failed to prepare controllers: " + strings::join(": ", errors));
+  }
+
+  return update(
+  containerId,
+  containerConfig.resources(),
+  containerConfig.limits())
+.then(defer(
+PID(this),
+::__prepare,
+containerId,
+containerConfig));
+}
+
+
+Future> Cgroups2IsolatorProcess::__prepare(
+const ContainerID& containerId,
+const ContainerConfig& containerConfig)
+{
+  // Only create cgroup mounts for containers with rootfs.
+  if (!containerConfig.has_rootfs()) {
+return None();
+  }
+
+  Owned info = cgroupInfo(containerId);
+  if (!info.get()) {
+return Failure(
+"Failed to get cgroup for container "
+"'" + stringify(containerId) + "'");
+  }
+
+  ContainerLaunchInfo launchInfo;
+  // Create a new cgroup namespace. The child process will only be able to
+  // see the cgroups that are in its cgroup subtree.
+  launchInfo.add_clone_namespaces(CLONE_NEWCGROUP);
+
+  // Create a new mount namespace and mount the root cgroup at /sys/fs/cgroup.
+  launchInfo.add_clone_namespaces(CLONE_NEWNS);
+  *launchInfo.add_mounts() = protobuf::slave::createContainerMount(
+  cgroups2::path(info->cgroup),
+  path::join(containerConfig.rootfs(), "/sys/fs/cgroup"),
+  MS_BIND | MS_REC);
+
+  return launchInfo;
+}
+
+
+Future Cgroups2IsolatorProcess::recover(
+const vector& states,
+const hashset& orphans)
+{
+  vector> recovers;
+  foreach (const ContainerState& state, states) {
+recovers.push_back(___recover(state.container_id()));
+  }
+
+  return await(recovers)
+.then(defer(
+PID(this),
+::_recover,
+orphans,
+lambda::_1));
+}
+
+
+Future Cgroups2IsolatorProcess::_recover(
+  const hashset& orphans,
+  const vector>& futures)
+{
+  vector errors;
+  foreach (const Future& future, futures) {
+if (!future.isReady()) {
+  errors.push_back(future.isFailed() ? future.failure() : "discarded");
+}
+  }
+
+  if (errors.size() > 0) {
+return Failure("Failed to recover active containers: " +
+   strings::join(": ", errors));
+  }
+
+  hashset knownOrphans;
+  hashset 

[PR] [cgroups2] Error if `--cgroups_limit_swap` is used when cgroups v2 is used. [mesos]

2024-04-19 Thread via GitHub


DevinLeamy opened a new pull request, #565:
URL: https://github.com/apache/mesos/pull/565

   Mesos does not support limiting swap memory when using cgroups v2. This is 
because the cgroups v2 API does not allow us to directly control the swap 
memory usage limit.  Rather, we can only control the combined swap _and_ RAM 
usage limit.
   
   Therefore, when the `--cgroups_limit_swap` flag is provided and cgroups v2 
is used we error during flag validation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [cgroups2] Introduces an interface to read a subset of the memory usage statistics. [mesos]

2024-04-19 Thread via GitHub


bmahler closed pull request #564: [cgroups2] Introduces an interface to read a 
subset of the memory usage statistics.
URL: https://github.com/apache/mesos/pull/564


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [cgroups2] Error if `--cgroups_limit_swap` is used when cgroups v2 is used. [mesos]

2024-04-19 Thread via GitHub


bmahler closed pull request #565: [cgroups2] Error if `--cgroups_limit_swap` is 
used when cgroups v2 is used.
URL: https://github.com/apache/mesos/pull/565


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [cgroups2] Introduces an interface to read a subset of the memory usage statistics. [mesos]

2024-04-19 Thread via GitHub


bmahler commented on code in PR #564:
URL: https://github.com/apache/mesos/pull/564#discussion_r1573045523


##
src/linux/cgroups2.hpp:
##
@@ -243,6 +243,41 @@ Try max(const std::string& cgroup);
 // See: https://docs.kernel.org/admin-guide/cgroup-v2.html
 namespace memory {
 
+// Memory usage statistics.
+//
+// Snapshot of the 'memory.stat' control file.
+//
+// Note:
+// We only record a subset of the memory statistics; a complete list can be
+// found in the kernel documentation.
+// https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files
+struct Stats
+{
+  // Anonymous memory usage.

Review Comment:
   let's just use the cgroups names and docs for these?



##
src/linux/cgroups2.cpp:
##
@@ -964,6 +1005,17 @@ Result high(const string& cgroup)
   return internal::parse_bytelimit(*contents);
 }
 
+
+Try stats(const string& cgroup)
+{
+  Try contents = cgroups2::read(cgroup, control::STAT);
+  if (contents.isError()) {
+return Error("Failed to read 'memory.stat': " + contents.error());
+  }
+
+  return control::stat::parse(strings::trim(*contents));

Review Comment:
   hm.. the parse function should know how to parse without the trimming, was 
this actually needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [cgroups2] Error if `--cgroups_limit_swap` is used when cgroups v2 is used. [mesos]

2024-04-19 Thread via GitHub


bmahler commented on code in PR #565:
URL: https://github.com/apache/mesos/pull/565#discussion_r1573038742


##
src/slave/flags.cpp:
##
@@ -652,8 +657,25 @@ mesos::internal::slave::Flags::Flags()
   add(::cgroups_limit_swap,
   "cgroups_limit_swap",
   "Cgroups feature flag to enable memory limits on both memory and\n"
-  "swap instead of just memory.\n",
-  false);
+  "swap instead of just memory. Not supported if cgroups v2 is used.\n",
+  false,
+  [](const bool& limit_swap) -> Option {
+#ifdef ENABLE_CGROUPS_V2
+Try mounted = cgroups2::mounted();
+if (mounted.isError()) {
+return Error("Failed to check if cgroup2 filesystem is mounted: "

Review Comment:
   nit: 2 space indent here and below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [cgroups2] Introduces an interface to read a subset of the memory usage statistics. [mesos]

2024-04-19 Thread via GitHub


DevinLeamy opened a new pull request, #564:
URL: https://github.com/apache/mesos/pull/564

   Cgroups v2 exposes memory statistics through the 'memory.stat' control.
   
   Here we introduce `cgroups2::memory::stats` to read a subset of the memory 
usage statistics into a new `memory::Stats` object. These statistics will be 
used by the `MemoryControllerProcess` to populate a `ResourceStatistics` 
object, like is done by the `MemorySubsystemProcess` in cgroups v1.
   
   Additional statistics from the 'memory.stat' control can be included as they 
are required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [cgroups2] Error if `--cgroups_limit_swap` is used when cgroups v2 is used. [mesos]

2024-04-19 Thread via GitHub


bmahler commented on PR #565:
URL: https://github.com/apache/mesos/pull/565#issuecomment-2067370799

   > This is because the cgroups v2 API does not allow us to directly control 
the swap memory usage limit. Rather, we can only control the combined swap and 
RAM usage limit.
   
   This doesn't sound accurate? in v2 we are able to separately control swap 
usage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [cgroups2] Introduces API to make cgroups threaded and read their type. [mesos]

2024-04-19 Thread via GitHub


DevinLeamy commented on PR #562:
URL: https://github.com/apache/mesos/pull/562#issuecomment-2066644211

   Since we want the isolated ("containerized") process to be able to 
manipulate its own cgroups, if it so chooses, we're opting to no have the leaf 
cgroup for each container be threaded, hence removing the motivation for this 
patch. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [cgroups2] Introduces API to make cgroups threaded and read their type. [mesos]

2024-04-19 Thread via GitHub


DevinLeamy closed pull request #562: [cgroups2] Introduces API to make cgroups 
threaded and read their type.
URL: https://github.com/apache/mesos/pull/562


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@mesos.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org