Hi Jens,

I highly recommend you go through the pain of upgrading the kernel on your GPU cluster to something modern, like 4.15.0-91-generic. There were quite a few regressions around the 4.15.0-56 to 4.15.0-58 mark, as we merged a lot of upstream stable patches at that time.
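
If it helps when auditing nodes, here is a rough sketch of how you might check where a given kernel sits relative to the versions mentioned above. The helper names are made up for illustration, the regression window is only the approximate range from this message, and the sketch assumes the usual Ubuntu release string layout such as `4.15.0-91-generic`:

```python
import re

# Versions taken from this message; the window is approximate.
RECOMMENDED = "4.15.0-91-generic"
REGRESSION_RANGE = ("4.15.0-56-generic", "4.15.0-58-generic")


def parse_release(release):
    """Turn '4.15.0-91-generic' into (4, 15, 0, 91).

    Assumes the usual Ubuntu layout: upstream version, a dash, then the
    ABI number. Anything after the ABI number (the flavour) is ignored.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def classify(release):
    """Describe where a release sits relative to the versions above."""
    version = parse_release(release)
    low, high = (parse_release(r) for r in REGRESSION_RANGE)
    if low <= version <= high:
        return "in the regression window"
    if version < parse_release(RECOMMENDED):
        return "older than recommended"
    return "at or beyond recommended"
```

You would feed it the output of `uname -r` from each node (in Python, `platform.release()` returns the same string).
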
4.15.0-91 is pretty stable these days, and you can probably leave it long term on that kernel.

In this bug, the fix landed in the mlx5_core driver, which is a kernel module. Kernel modules are only compatible with the kernel that they were compiled for, since Linux does not have a stable in-kernel ABI / binary interface. So, this isn't as easy as just copying over a fixed kernel module.

The kmod package doesn't actually have any kernel modules in it, just the blacklists and things defined in /etc/modules-load.d and /etc/modprobe.d.

Nvidia drivers should be built with dkms, and *should* work without too much hassle. I know that theory doesn't always align with reality though.

Anyway, I recommend you upgrade to a newer kernel on your GPU cluster.

Thanks,
Matthew

--
https://bugs.launchpad.net/bugs/1840854
Title: mlx5_core reports hardware checksum error for padded packets on Mellanox NICs
