Hi Jens,

I highly recommend you go through the pain of upgrading the kernel on
your GPU cluster to something modern, like 4.15.0-91-generic. There were
quite a few regressions around the 4.15.0-56 to 4.15.0-58 mark, as we
merged a lot of upstream stable patches at that time.

4.15.0-91 is pretty stable these days, and you can probably leave the
cluster on that kernel long term.

In this bug, the fix landed in the mlx5_core driver, which is a kernel
module. Kernel modules are only compatible with the kernel that they
were compiled for, since Linux does not have a stable in-kernel ABI
(binary interface) for modules.
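
If you want to see this on one of your nodes, here is a rough sketch of
the check. It compares the first field of a module's "vermagic" string
with a kernel release string. The helper name module_matches_kernel is
just something I made up for illustration, and I am assuming the usual
vermagic layout where the kernel release comes first, followed by flags
such as "SMP mod_unload".

```shell
# Hypothetical helper: succeed (exit 0) when a module's vermagic string
# was built for the given kernel release.
#   $1: vermagic string, e.g. output of `modinfo -F vermagic mlx5_core`
#   $2: kernel release, e.g. output of `uname -r`
module_matches_kernel() {
    # Assumption: the kernel release is the first space-separated
    # field of vermagic, so strip everything from the first space on.
    [ "${1%% *}" = "$2" ]
}
```

On a node you could call it as
module_matches_kernel "$(modinfo -F vermagic mlx5_core)" "$(uname -r)"
and a non-zero exit status would mean the module on disk was built for
a different kernel than the one currently running.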

So, this isn't as easy as just copying over a fixed kernel module. The
kmod package doesn't actually contain any kernel modules, just the
blacklists and other settings defined in /etc/modules-load.d and
/etc/modprobe.d.

Nvidia drivers should be built with dkms, and *should* work without too
much hassle. I know that theory doesn't always align with reality
though.

Anyway, I recommend you upgrade to a newer kernel on your GPU cluster.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1840854

Title:
  mlx5_core reports hardware checksum error for padded packets on
  Mellanox NICs

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840854/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
