Re: [Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

2023-07-06 Thread Stuart Levy
Glad it is not just me.

I've acquired some other IB cards (Mellanox MHJH29-XTC X5 and an Oracle 
7046442) and hope to try them against the later kernels' IB drivers too, 
but haven't had the time to take down the server yet.

On 7/6/23 09:53, Shurak wrote:
> Hello!
>
>
> Same problem here: Ubuntu 22.04 (kernel 5.15.0-76-generic) on an S3200SHC
> motherboard (latest firmware) with a PCIe card (latest firmware, 1.2.0):
>
> 01:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx
> HCA] (rev 20)
>
> I can submit any report needed (just tell me the link to the procedure
> or the console commands)
>
> Thank you very much
> Best Regards
>

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2007038

Title:
  22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

Status in linux package in Ubuntu:
  Expired

Bug description:
  I run some x86_64 machines with Infiniband interfaces (Mellanox
  MT25204, ib_mthca driver + ib_ipoib for IP-over-IB).

  This had worked fine for years under Ubuntu 20.04.1 LTS and under
  RHEL6 before it.

  But as soon as I updated to 22.04.1 LTS -- with both its default
  5.15.0-60-generic kernel and also 6.1.0-1006-oem (the latest packaged
  one I could find), the IB interface doesn't work.

  dmesg shows some UBSAN shift-out-of-bounds warnings in mthca modules,
  e.g. "shift exponent -25557 is negative". That's a bizarre number -
  maybe a hint of something uninitialized?

  The crippling symptom shows up within a second after that: a NULL
  dereference within the ib_mthca driver -- the "BUG: kernel NULL
  pointer dereference", in mthca_poll_one.  The interface never sets its
  RUNNING flag (as shown by ifconfig).

  The rest of the system remains usable after the "BUG" message -- the
  ethernet, disk, etc. drivers and other functions work as expected.

  Attempting to unload the ib_mthca driver causes a kernel panic.

  Is there anything I should try?   Should I build a kernel from source
  with debugging?   I could try installing the 5.4.0 kernel from 20.04,
  but would rather use something that will continue to get security
  patches.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.15.0-60-generic 5.15.0-60.66
  ProcVersionSignature: Ubuntu 5.15.0-60.66-generic 5.15.78
  Uname: Linux 5.15.0-60-generic x86_64
  AlsaDevices:
   total 0
   crw-rw+ 1 root audio 116,  1 Feb 12 14:12 seq
   crw-rw+ 1 root audio 116, 33 Feb 12 14:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: pass
  Date: Sun Feb 12 14:17:28 2023
  InstallationDate: Installed on 2020-11-22 (812 days ago)
  InstallationMedia: Ubuntu-Server 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  MachineType: Supermicro X7DBR-8
  PciMultimedia:
   
  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-60-generic root=UUID=8624cf02-e743-4da6-9209-14ef2c2abd10 ro
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-60-generic N/A
   linux-backports-modules-5.15.0-60-generic  N/A
   linux-firmware 20220329.git681281e4-0ubuntu3.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to jammy on 2023-02-10 (2 days ago)
  dmi.bios.date: 12/03/2007
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: X7DBR-8
  dmi.board.vendor: Supermicro
  dmi.board.version: PCB Version
  dmi.chassis.type: 1
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd12/03/2007:svnSupermicro:pnX7DBR-8:pvr0123456789:rvnSupermicro:rnX7DBR-8:rvrPCBVersion:cvnSupermicro:ct1:cvr0123456789:sku:
  dmi.product.name: X7DBR-8
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2007038/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

2023-07-06 Thread Shurak
Hello!


Same problem here: Ubuntu 22.04 (kernel 5.15.0-76-generic) on an S3200SHC
motherboard (latest firmware) with a PCIe card (latest firmware, 1.2.0):

01:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx
HCA] (rev 20)

I can submit any report needed (just tell me the link to the procedure
or the console commands)

Thank you very much
Best Regards
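
One way to gather the basics for a report like this is sketched below in
Python. The choice of commands (uname -r, lspci -nnk, dmesg) is an
assumption about what is useful for an InfiniBand driver problem, not an
official Launchpad procedure, and the collect helper is hypothetical. On
Ubuntu, running apport-collect 2007038 is the usual way to attach system
data to an existing bug, if the apport tooling is installed.

```python
import shutil
import subprocess

# Commands assumed to be useful for a driver report; illustrative only.
COMMANDS = {
    "kernel": ["uname", "-r"],   # running kernel release
    "pci": ["lspci", "-nnk"],    # PCI devices with IDs and bound drivers
    "dmesg": ["dmesg"],          # kernel log (may need root on some systems)
}


def collect(commands=COMMANDS, run=subprocess.run, which=shutil.which):
    """Run each command that is installed and return {name: output or note}."""
    report = {}
    for name, argv in commands.items():
        if which(argv[0]) is None:
            # Tool missing: record that instead of raising.
            report[name] = f"(skipped: {argv[0]} not found)"
            continue
        result = run(argv, capture_output=True, text=True, check=False)
        # Keep stderr when the command fails, e.g. dmesg without permission.
        report[name] = result.stdout if result.returncode == 0 else result.stderr
    return report


if __name__ == "__main__":
    for name, output in collect().items():
        print(f"== {name} ==")
        print(output)
```

The run and which parameters exist only so the helper can be exercised
without touching the real system; the defaults call the actual tools.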


[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

2023-04-26 Thread Stuart Levy
Why is this expired?   I responded promptly to the last suggestion, and
would respond to another.   I still hope this can be addressed.


[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

2023-04-25 Thread Launchpad Bug Tracker
[Expired for linux (Ubuntu) because there has been no activity for 60
days.]

** Changed in: linux (Ubuntu)
   Status: Incomplete => Expired


Re: [Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

2023-02-24 Thread Stuart Levy
OK, I tried it. It still fails in the InfiniBand stack, but in a new
way, and in an ib_core function rather than ib_mthca. There were still
a couple of UBSAN shift-out-of-bounds warnings, followed by an attempt
to execute code in a non-executable page, as if following a trashed
function pointer.

If there is other debugging information I should gather, please let me 
know.   The kern.log is attached.
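
When reading through a long kern.log, a small filter can pull out the
lines of interest. The sketch below, in Python, looks only for the two
messages quoted in the bug description ("shift exponent ... is negative"
and "BUG: kernel NULL pointer dereference"). The function name and the
sample text are made up for illustration; the sample is not taken from
the attached log.

```python
import re

# Patterns built from the messages quoted in the bug description.
SHIFT_RE = re.compile(r"shift exponent (-?\d+) is negative")
NULL_DEREF = "BUG: kernel NULL pointer dereference"


def summarize_kernel_log(text):
    """Return negative shift exponents and NULL-dereference lines in text."""
    exponents = [int(m.group(1)) for m in SHIFT_RE.finditer(text)]
    null_derefs = [
        line.strip() for line in text.splitlines() if NULL_DEREF in line
    ]
    return {"shift_exponents": exponents, "null_derefs": null_derefs}


if __name__ == "__main__":
    # Hypothetical sample input, not the real kern.log.
    sample = "\n".join([
        "shift exponent -25557 is negative",
        "BUG: kernel NULL pointer dereference",
    ])
    print(summarize_kernel_log(sample))
```

Other markers, such as the non-executable-page fault seen with 6.2, could
be added the same way once their exact wording is known from the log.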

On 2/22/23 18:45, Kai-Heng Feng wrote:
> Please test latest mainline kernel:
> https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.2/amd64/
>
> Headers are not needed.
>
> ** Changed in: linux (Ubuntu)
> Status: Confirmed => Incomplete
>


** Attachment added: "tate-6.2.0-kern.log"
   https://bugs.launchpad.net/bugs/2007038/+attachment/5649830/+files/tate-6.2.0-kern.log


[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

2023-02-22 Thread Kai-Heng Feng
Please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.2/amd64/

Headers are not needed.

** Changed in: linux (Ubuntu)
   Status: Confirmed => Incomplete


[Kernel-packages] [Bug 2007038] Re: 22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

2023-02-18 Thread Stuart Levy
I'll note that the InfiniBand interface is pretty old. It runs at the
DDR data rate, while newer ones use QDR, FDR, EDR, or HDR. It might be
that the ib_mthca driver has a regression affecting old hardware that
isn't noticeable on more recent hardware.

Confirmed that this 22.04 system, changed only by running a 5.4.0-139
kernel from 20.04, works fine with this infiniband device.
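
The results reported so far bracket the regression between two packaged
kernels. A minimal Python sketch of that bookkeeping follows; the helper
names are hypothetical, and the version strings are the ones given in
this bug. Ubuntu package versions are not upstream commits, so narrowing
the window further would mean testing intermediate kernel builds.

```python
import re


def version_key(release):
    """Turn a release such as '5.15.0-60-generic' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", release))


def regression_window(results):
    """Given {release: works?}, return (newest good, oldest bad after it)."""
    ordered = sorted(results, key=version_key)
    newest_good = None
    for release in ordered:
        if results[release]:
            newest_good = release
    oldest_bad = None
    for release in ordered:
        is_newer = (
            newest_good is None
            or version_key(release) > version_key(newest_good)
        )
        if not results[release] and is_newer:
            oldest_bad = release
            break
    return newest_good, oldest_bad


# Results as reported in this bug: True means the IB interface worked.
REPORTED = {
    "5.4.0-139": True,
    "5.15.0-60-generic": False,
    "6.1.0-1006-oem": False,
}

if __name__ == "__main__":
    print(regression_window(REPORTED))
```

With the three results above, the window is 5.4.0-139 (works) to
5.15.0-60-generic (fails).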

I do have QLogic IBA7322 IB hardware (ib_qib driver) on other machines
here. I will update this thread once I find out whether the 5.15.0
kernel works with it.
