Package: nvidia-kernel-dkms
Version: 390.116-1
Severity: serious
Justification: Policy 3.5
Dear Maintainer,
An upgrade from kernel 4.9.0-8 to kernel 4.9.0-9 broke nvidia-kernel-dkms on
our server, which has 2 GPUs for GPGPU computing. Although nvidia-kernel-dkms
was upgraded in the same run (as part of the Debian 9 upgrade 7 release),
the modules weren't rebuilt for the new kernel, as the following commands show:
root@physix58:~# lsmod | grep nvidia
root@physix58:~# find /lib/modules/ -name "nvidia*"
/lib/modules/4.9.0-9-amd64/kernel/drivers/net/ethernet/nvidia
/lib/modules/4.9.0-8-amd64/kernel/drivers/net/ethernet/nvidia
/lib/modules/4.9.0-8-amd64/updates/dkms/nvidia-current-modeset.ko
/lib/modules/4.9.0-8-amd64/updates/dkms/nvidia-current-uvm.ko
/lib/modules/4.9.0-8-amd64/updates/dkms/nvidia-current.ko
/lib/modules/4.9.0-8-amd64/updates/dkms/nvidia-current-drm.ko
I managed to get the nvidia modules rebuilt for kernel 4.9.0-9 by using
"dpkg-reconfigure nvidia-kernel-dkms", but only after I installed
linux-headers-4.9.0-9-all-amd64 (linux-headers-4.9.0-8-all-amd64 was installed
but not linux-headers-4.9.0-9-all-amd64). I suspect that "dpkg-reconfigure
nvidia-kernel-dkms" failed because of missing headers when invoked by "aptitude
full-upgrade", but I can't be sure...
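For anyone wanting to check the same thing on their own machine, here is a minimal sketch that lists the kernels under /lib/modules with no "build" entry. It assumes that the linux-headers package for a kernel provides /lib/modules/<version>/build and that DKMS needs it to compile; the function name and its optional argument are only illustrative.

```shell
#!/bin/sh
# List kernel versions under a modules root that have no usable "build"
# entry, i.e. kernels for which the headers are most likely missing.
# Assumption: the headers package provides /lib/modules/<version>/build.
# The function name and the optional root argument are illustrative.
kernels_without_headers() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        # Skip the unexpanded pattern when the root has no subdirectories.
        [ -d "$dir" ] || continue
        # -e is false for a dangling symlink too, which also counts as missing.
        if [ ! -e "${dir}build" ]; then
            basename "$dir"
        fi
    done
}

kernels_without_headers
```

On the server described here, I would expect this to have printed 4.9.0-9-amd64 before the headers were installed, but I did not run it at that point.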
This problem seems to be the same as the one in a very old bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=585862
From what I found on the Internet, this problem seems quite common, and people
usually work around it by rebuilding the modules as I did. However, I have the
impression that the intended behaviour is for the nvidia modules to be rebuilt
automatically whenever the kernel is upgraded, which is indeed what the user
wants. So I suspect that this automatic mechanism fails in some cases.
On
https://www.reddit.com/r/linuxquestions/comments/6mqudq/ran_aptget_distupgrade_which_updated_kernel_now/,
debian_miner says "I just did some testing with this, and I believe the reason
some people are seeing this behavior is because they didn't install the
linux-headers- metapackage"
Indeed, on our server, "linux-headers-4.9.0-8-all-amd64" was installed but not
"linux-headers-amd64". I believe that if the package "linux-headers-amd64" had
been installed in the first place, then aptitude full-upgrade would have
brought in "linux-headers-4.9.0-9-all-amd64" along with the kernel, and the
rebuild of the nvidia modules would then have succeeded. But that's merely a
hypothesis, as I'm not an expert.
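A quick way to check this hypothesis on a given machine could look like the sketch below. It asks dpkg whether the "linux-headers-amd64" metapackage and the headers for the running kernel are installed. The helper names are illustrative, and I am assuming the per-kernel headers package is named "linux-headers-" followed by the output of "uname -r".

```shell
#!/bin/sh
# Return success if the named Debian package is fully installed.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'ok installed'
}

# Name of the headers package matching a kernel release such as
# 4.9.0-9-amd64. (Helper name is illustrative; the naming scheme is an
# assumption based on the packages mentioned in this report.)
headers_pkg_for() {
    printf 'linux-headers-%s\n' "$1"
}

# Check the metapackage and the headers for the running kernel.
for pkg in linux-headers-amd64 "$(headers_pkg_for "$(uname -r)")"; do
    if pkg_installed "$pkg"; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed"
    fi
done
```

If the metapackage shows up as not installed, new header packages would presumably not be pulled in automatically on the next kernel upgrade.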
Some things that might be worth noting:
1. The command that upgraded the kernel was "UCF_FORCE_CONFFNEW=1
DEBIAN_FRONTEND=noninteractive APT_LISTCHANGES_FRONTEND=none yes '' | aptitude
-y -o Dpkg::Options::="--force-confnew" -o
Aptitude::Cmdline::ignore-trust-violations=true full-upgrade". It upgraded the
following packages:
root@physix58:~# grep -B 1 linux-image /var/log/apt/history.log-20190501
Start-Date: 2019-04-30 08:40:22
Install: linux-image-4.9.0-9-amd64:amd64 (4.9.168-1, automatic)
Upgrade: ca-certificates-java:amd64 (20170929~deb9u1, 20170929~deb9u3),
postfix:amd64 (3.1.9-0+deb9u2, 3.1.12-0+deb9u1), libglx0-glvnd-nvidia:amd64
(390.87-8~deb9u1, 390.116-1), postfix-pcre:amd64 (3.1.9-0+deb9u2,
3.1.12-0+deb9u1), linux-libc-dev:amd64 (4.9.144-3.1, 4.9.168-1),
libnvidia-ml1:amd64 (390.87-8~deb9u1, 390.116-1), nvidia-egl-icd:amd64
(390.87-8~deb9u1, 390.116-1), nvidia-driver:amd64 (390.87-8~deb9u1, 390.116-1),
libpng-dev:amd64 (1.6.28-1, 1.6.28-1+deb9u1), postfix-sqlite:amd64
(3.1.9-0+deb9u2, 3.1.12-0+deb9u1), libmagickwand-6.q16-3:amd64
(8:6.9.7.4+dfsg-11+deb9u6, 8:6.9.7.4+dfsg-11+deb9u7), python3-pip:amd64
(9.0.1-2, 9.0.1-2+deb9u1), libjs-jquery:amd64 (3.1.1-2, 3.1.1-2+deb9u1),
nvidia-vdpau-driver:amd64 (390.87-8~deb9u1, 390.116-1),
libgl1-nvidia-glvnd-glx:amd64 (390.87-8~deb9u1, 390.116-1),
libglx-nvidia0:amd64 (390.87-8~deb9u1, 390.116-1),
linux-compiler-gcc-6-x86:amd64 (4.9.144-3.1, 4.9.168-1), libpq5:amd64
(9.6.11-0+deb9u1, 9.6.12-0+deb9u1), nvidia-kernel-dkms:amd64
(390.87-8~deb9u1, 390.116-1), libegl-nvidia0:amd64 (390.87-8~deb9u1,
390.116-1), nvidia-egl-common:amd64 (390.87-8~deb9u1, 390.116-1),
python-cryptography:amd64 (1.7.1-3, 1.7.1-3+deb9u1),
libnvidia-ptxjitcompiler1:amd64 (390.87-8~deb9u1, 390.116-1),
nvidia-legacy-check:amd64 (390.87-8~deb9u1, 390.116-1), libzzip-0-13:amd64
(0.13.62-3.1, 0.13.62-3.2~deb9u1), libnvidia-fatbinaryloader:amd64
(390.87-8~deb9u1, 390.116-1), linux-image-amd64:amd64 (4.9+80+deb9u6,
4.9+80+deb9u7), nvidia-kernel-support:amd64 (390.87-8~deb9u1, 390.116-1),
libgstreamer-plugins-base1.0-0:amd64 (1.10.4-1, 1.10.4-1+deb9u1),
linux-kbuild-4.9:amd64 (4.9.144-3.1, 4.9.168-1), nvidia-driver-libs:amd64
(390.87-8~deb9u1, 390.116-1), nvidia-driver-bin:amd64 (390.87-8~deb9u1,
390.116-1), libjs-bootstrap:amd64 (3.3.7+dfsg-2+deb9u1, 3.3.7+dfsg-2+deb9u2),
libmagickcore-6.q16-3:amd64 (8:6.9.7.4+dfsg-11+deb9u6,
8:6.9.7.4