Hi,

I just spent a few days on this and have a working proposal.
Functionality:

  1.  execd_params can now be set to "USE_CGROUPS=systemd".
  2.  This causes shepherd to launch jobs via "systemd-run" scope units rather 
than exec'ing them directly.
  3.  Jobs are no longer monitored via the PDC (SGE process data collector) but 
via the cgroup that the systemd-run call above creates.
  4.  Processes are no longer assigned an additional GID (gid_range is 
ignored), as this is no longer needed.
  5.  As we are using cgroups, all forked tasks are automatically killed once 
the job finishes (i.e. just like ENABLE_ADDGRP_KILL=true).
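
As a rough illustration of point 2, the command line that shepherd hands to 
systemd-run might be assembled along these lines. This is only a Python sketch 
and not the actual patch (which lives in shepherd's C code); the function name 
and the "sge-job-<id>" unit naming are hypothetical.

```python
def build_scope_argv(job_id, command):
    """Wrap a job command so it runs inside a transient systemd scope unit."""
    if not command:
        raise ValueError("command must not be empty")
    unit = f"sge-job-{job_id}"  # hypothetical naming scheme, not from the patch
    return [
        "systemd-run",
        "--scope",         # run the command in a scope unit rather than a service
        f"--unit={unit}",  # predictable name, so the cgroup can be found later
        "--",              # everything after this is the job's own argv
        *command,
    ]


print(" ".join(build_scope_argv(42, ["/bin/sleep", "60"])))
```

With a predictable unit name, the job's cgroup can later be located by tools 
such as systemd-cgls without any extra bookkeeping.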

Advantages:

  1.  100% reliable process tracking via the kernel's control groups
  2.  We can now use tools like systemd-cgls or systemd-cgtop to monitor a 
job's performance in real time
  3.  Unlike the classic systemd setup for the exec daemon (when it runs in the 
foreground), restarting the sgeexecd service does not kill all jobs (because 
jobs run in separate systemd scopes)
  4.  Should be backwards compatible with the traditional shepherd 
functionality (i.e. you just switch this on and see)
  5.  Easy to implement control group limits for jobs (e.g. MemoryLimit, 
CPUShares, memory reservation for a job)
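
To make point 5 a bit more concrete, limits such as MemoryLimit and CPUShares 
could be turned into extra systemd-run arguments roughly as follows. This is a 
Python sketch under my own assumptions: the helper name and its parameters are 
made up for illustration, and nothing here is wired to SGE complexes. 
MemoryLimit= and CPUShares= are the older cgroup v1 property names; on cgroup 
v2 systems the corresponding properties are MemoryMax= and CPUWeight=.

```python
def limit_properties(memory_limit=None, cpu_shares=None):
    """Translate optional per-job limits into systemd-run --property flags."""
    props = []
    if memory_limit is not None:
        # systemd accepts sizes with K, M, G or T suffixes, e.g. "2G"
        props.append(f"--property=MemoryLimit={memory_limit}")
    if cpu_shares is not None:
        props.append(f"--property=CPUShares={cpu_shares}")
    return props


print(limit_properties(memory_limit="2G", cpu_shares=512))
```

Limits that are not requested simply produce no flag, so a job without limits 
would be launched exactly as before.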

Implementation details:
Only two important functions were added; the rest is minor:
- start_command_via_systemd(), the counterpart of start_command() in shepherd
- ptf_get_usage_from_systemd(), the counterpart of 
ptf_get_usage_from_data_collector()
- minor fixes in other places
- systemd does not seem to support setting CPUAffinity via control groups, so 
the existing code for handling cpusets works the same way as before.
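
For readers wondering what collecting usage from systemd can look like, one 
possible approach is to ask systemctl for the unit's accounting properties and 
parse the KEY=VALUE output. The Python sketch below is an assumption on my 
side and not necessarily how ptf_get_usage_from_systemd() works; the helper 
names are hypothetical. CPUUsageNSec and MemoryCurrent are standard unit 
properties, but they only carry numbers when accounting is enabled for the 
unit.

```python
import subprocess

USAGE_PROPERTIES = ("CPUUsageNSec", "MemoryCurrent")


def parse_show_output(text):
    """Parse `systemctl show` KEY=VALUE lines into a dict of ints (or None)."""
    usage = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip anything that is not KEY=VALUE
        # Non-numeric values (e.g. when accounting is off) are reported as None
        usage[key] = int(value) if value.isdigit() else None
    return usage


def query_unit_usage(unit):
    """Run systemctl for one unit; needs a systemd host, so not called here."""
    argv = ["systemctl", "show", unit]
    for name in USAGE_PROPERTIES:
        argv += ["-p", name]
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return parse_show_output(result.stdout)


print(parse_show_output("CPUUsageNSec=1500000000\nMemoryCurrent=[not set]\n"))
```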

TODO:

  1.  I wanted to introduce some new complex attributes (like "cgroup_memmax" 
etc.) that could be used to enforce cgroup limits, but unfortunately that does 
not seem to be a trivial task. If anyone knows how to do it, that would be 
great, as it would make my patches a lot more attractive.
  2.  Testing: so far everything looks good; interactive and non-interactive 
jobs are working, the environment is passed, etc. It just needs more testing.

Is there somewhere I can send my patches so that they can possibly be merged 
into the SoGE main repo?
Thanks,

Ondrej



From: Ondrej Valousek
Sent: Friday, August 9, 2019 1:40 PM
To: 'users@gridengine.org' <users@gridengine.org>
Subject: SGE & systemd integration

Hi all,

I am thinking of making SGE (or sge_execd) more systemd friendly.
Right now (as of 8.1.9) there is some support for cgroups via:
USE_CGROUPS=y/n
My proposal is to make it:
USE_CGROUPS=y/n/systemd
When set to systemd, we would no longer need to detect any cgroups (or set up 
the cpuset controller) manually.
Instead, the shepherd daemon would run the job via the "systemd-run" binary.

https://www.freedesktop.org/software/systemd/man/systemd-run.html


systemd-run can set various cgroup controllers via its "--property" flag, 
achieving the same thing we now do manually. We would probably also use the 
"--scope" flag to make the job run synchronously.

Initially, I was thinking about implementing this via the "starter_method" 
setting, but systemd-run needs to be run as root, so it has to be hardcoded 
into shepherd.c, and the sge_execd daemon also needs to be running with root 
privileges. I am not sure if capabilities would help here.

Does this initiative make any sense?
I can try to implement it myself, but I am not familiar with SGE internals. I 
can try...

Ondrej
