Re: [Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04
Glad it is not just me.

I've acquired some other IB cards (Mellanox MHJH29-XTC X5 and an Oracle
7046442) and hope to try them against the later kernels' IB drivers too,
but haven't had the time to take down the server yet.

On 7/6/23 09:53, Shurak wrote:
> Hello!
>
>
> Same problem here: Ubuntu 22.04 (kernel 5.15.0-76-generic) with mother
> S3200SHC (with latest fw) and pci-e card (with latest fw 1.2.0)
>
> 01:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx
> HCA] (rev 20)
>
> I can submit any report needed (just tell me the link to the procedure
> or the console commands)
>
> Thank you very much
> Best Regards
>

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2007038

Title:
  22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

Status in linux package in Ubuntu:
  Expired

Bug description:
  I run some x86_64 machines with Infiniband interfaces (Mellanox
  MT25204, ib_mthca driver + ib_ipoib for IP-over-IB). This had worked
  fine for years under Ubuntu 20.04.1 LTS and under RHEL6 before it.

  But as soon as I updated to 22.04.1 LTS -- with both its default
  5.15.0-60-generic kernel and also 6.1.0-1006-oem (the latest packaged
  one I could find), the IB interface doesn't work.

  dmesg shows some UBSAN shift-out-of-bounds warnings in mthca modules,
  e.g. "shift exponent -25557 is negative". That's a bizarre number -
  maybe a hint of something uninitialized?

  The crippling symptom shows up within a second after that: a NULL
  dereference within the ib_mthca driver -- the "BUG: kernel NULL
  pointer dereference", in mthca_poll_one. The interface never sets its
  RUNNING flag (as shown by ifconfig).

  The rest of the system remains usable after the "BUG" message -- the
  ethernet, disk, etc. drivers and other functions work as expected.

  Attempting to unload the ib_mthca driver causes a kernel panic.

  Is there anything I should try? Should I build a kernel from source
  with debugging?

  I could try installing the 5.4.0 kernel from 20.04, but would rather
  use something that will continue to get security patches.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.15.0-60-generic 5.15.0-60.66
  ProcVersionSignature: Ubuntu 5.15.0-60.66-generic 5.15.78
  Uname: Linux 5.15.0-60-generic x86_64
  AlsaDevices:
   total 0
   crw-rw+ 1 root audio 116, 1 Feb 12 14:12 seq
   crw-rw+ 1 root audio 116, 33 Feb 12 14:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: pass
  Date: Sun Feb 12 14:17:28 2023
  InstallationDate: Installed on 2020-11-22 (812 days ago)
  InstallationMedia: Ubuntu-Server 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  MachineType: Supermicro X7DBR-8
  PciMultimedia:

  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-60-generic root=UUID=8624cf02-e743-4da6-9209-14ef2c2abd10 ro
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-60-generic N/A
   linux-backports-modules-5.15.0-60-generic  N/A
   linux-firmware                             20220329.git681281e4-0ubuntu3.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to jammy on 2023-02-10 (2 days ago)
  dmi.bios.date: 12/03/2007
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: X7DBR-8
  dmi.board.vendor: Supermicro
  dmi.board.version: PCB Version
  dmi.chassis.type: 1
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd12/03/2007:svnSupermicro:pnX7DBR-8:pvr0123456789:rvnSupermicro:rnX7DBR-8:rvrPCBVersion:cvnSupermicro:ct1:cvr0123456789:sku:
  dmi.product.name: X7DBR-8
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2007038/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04
Hello!

Same problem here: Ubuntu 22.04 (kernel 5.15.0-76-generic) with mother
S3200SHC (with latest fw) and pci-e card (with latest fw 1.2.0)

01:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx
HCA] (rev 20)

I can submit any report needed (just tell me the link to the procedure
or the console commands)

Thank you very much
Best Regards
[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04
Why is this expired? I responded promptly to the last suggestion, and
would respond to another. I still hope this can be addressed.
[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04
[Expired for linux (Ubuntu) because there has been no activity for 60
days.]

** Changed in: linux (Ubuntu)
       Status: Incomplete => Expired
Re: [Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04
OK, I tried it. It still fails in the infiniband infrastructure, but in
a new way, and from an ib_core function rather than ib_mthca. There were
still a couple of shift-out-of-bounds UBSAN warnings, but then an attempt
to execute in a non-executable page, as if following a trashed function
pointer.

If there is other debugging information I should gather, please let me
know.

The kern.log is attached.

On 2/22/23 18:45, Kai-Heng Feng wrote:
> Please test latest mainline kernel:
> https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.2/amd64/
>
> Headers are not needed.
>
> ** Changed in: linux (Ubuntu)
>        Status: Confirmed => Incomplete
>

** Attachment added: "tate-6.2.0-kern.log"
   https://bugs.launchpad.net/bugs/2007038/+attachment/5649830/+files/tate-6.2.0-kern.log
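
For anyone comparing logs from similar hardware, the symptoms named in
this thread can be pulled out of a kernel log with a short script. What
follows is only a sketch: the marker strings come from the messages
quoted in the bug description, while the function name
find_ib_symptoms, the labels, and the sample log lines are made up for
illustration and are not taken from the attached kern.log.

```python
import re
from typing import Iterable, List, Tuple

# Marker strings taken from the symptoms described in this thread.
# How they are grouped and labelled here is an assumption for illustration.
SYMPTOM_PATTERNS = {
    "ubsan_shift": re.compile(r"shift-out-of-bounds"),
    "shift_exponent": re.compile(r"shift exponent -?\d+ is negative"),
    "null_deref": re.compile(r"BUG: kernel NULL pointer dereference"),
    "mthca": re.compile(r"mthca"),
}


def find_ib_symptoms(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, label, text) for each line matching a marker.

    Each line is reported once, under the first label whose pattern matches.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.rstrip("\n")))
                break
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, NOT copied from any real log.
    sample = [
        "UBSAN: shift-out-of-bounds in (location elided)\n",
        "shift exponent -25557 is negative\n",
        "some unrelated message\n",
        "BUG: kernel NULL pointer dereference (details elided)\n",
        "(trace elided) mthca_poll_one\n",
    ]
    for lineno, label, text in find_ib_symptoms(sample):
        print(f"{lineno:4d}  {label:15s} {text}")
```

To run it against a real log, pass an open file object, for example
find_ib_symptoms(open("kern.log", errors="replace")); the file name is
a placeholder.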
[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04
Please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.2/amd64/

Headers are not needed.

** Changed in: linux (Ubuntu)
       Status: Confirmed => Incomplete
[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04
I'll note that the infiniband interface is pretty old. It is DDR data
rate, while modern ones might use QDR, FDR, HDR, EDR. It might be that
the ib_mthca driver has a regression relative to old hardware that isn't
noticeable on more recent hardware.

Confirmed that this 22.04 system, changed only by running a 5.4.0-139
kernel from 20.04, works fine with this infiniband device.

I do have Qlogic IBA7322 IB hardware (ib_qib driver) on other machines
here. Will update this thread if I find whether the 5.15.0 kernel
appears to work with it.