Source: slurm-wlm-contrib
Version: 22.05.8-4+deb12u1
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

   After the latest security update, part of our slurm cluster (the GPU
   nodes) became unusable.  These nodes were configured using the NVML
   autodetect feature of slurm.  After the deb12u2 update, the NVML
   plugins could no longer be installed because slurm-wlm-contrib has
   no corresponding security update:

   The following packages have unmet dependencies:
 slurm-wlm-nvml-plugin : Depends: slurm-wlm-basic-plugins (= 22.05.8-4+deb12u1) but 22.05.8-4+deb12u2 is to be installed
 slurm-wlm-nvml-plugin-dev : Depends: slurm-wlm-basic-plugins-dev (= 22.05.8-4+deb12u1) but 22.05.8-4+deb12u2 is to be installed
E: Unable to correct problems, you have held broken packages.

Without the NVML plugin, the slurmd daemon would not start, so no new
jobs could be submitted to any of our GPU nodes.

We discovered that slurm-wlm-contrib was removed from testing in
December 2023, but we found no bug report or explanation for the removal.

We have worked around the issue by switching our GPU nodes to manual
GPU configuration and removing the NVML packages for now.  However,
this was disruptive and unexpected in our environment.
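
For anyone hitting the same problem, the workaround amounts to roughly
the following change in gres.conf.  This is only a sketch and not our
exact configuration: the node names, GPU count, and device paths are
illustrative assumptions.

```
# Before: GPUs discovered through NVML (needs slurm-wlm-nvml-plugin)
#AutoDetect=nvml

# After: GPUs listed by hand
# (hypothetical node names and device files; adjust to the real hardware)
NodeName=gpu[01-04] Name=gpu File=/dev/nvidia[0-3]
```

With the devices listed explicitly, slurmd no longer needs the NVML
plugin to start, at the cost of maintaining the device list by hand.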


-- System Information:
Debian Release: 12.4
  APT prefers stable-security
  APT policy: (750, 'stable-security'), (750, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.1.0-15-amd64 (SMP w/96 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
