[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
This bug was fixed in the package linux-nvidia-6.2 - 6.2.0-1009.9

---
linux-nvidia-6.2 (6.2.0-1009.9) jammy; urgency=medium

  * jammy/linux-nvidia-6.2: 6.2.0-1009.9 -proposed tracker (LP: #2031342)

  * Pull-request to address ARM SMMU issue (LP: #2031320)
    - NVIDIA: SAUCE: iommu/arm-smmu-v3: Allow default substream bypass with a
      pasid support

  * GDS: Add NFS patches to optimized kernel (LP: #1982519)
    - NVMe/MVMEeOF: Patch NVMe/NVMeOF driver to support GDS on Linux 6.2 Kernel

  * Miscellaneous upstream changes
    - Revert "NVIDIA: SAUCE: Add NVMe Patches to enable GDS"
    - NVIDIA: [Config] CONFIG_NR_CPUS=512 for Grace
    - NVIDIA: [Config] CONFIG_MTD_SPI_NOR=y for Grace

 -- Ian May  Mon, 14 Aug 2023 18:45:28 -0500

** Changed in: linux-nvidia-6.2 (Ubuntu)
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2026891

Title:
  linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at
  init/main.c:1065 start_kernel+0x4da/0x540"

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Released

Bug description:
  We started testing the jammy/linux-nvidia-6.2 kernels on the nvidia
  servers (DGX-1/DGX-2/H100) and hit the following warning during boot:

  [7.690486] [ cut here ]
  [7.690487] Interrupts were enabled early
  [7.690490] WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540
  [7.690498] Modules linked in:
  [7.690501] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-1004-nvidia #4~22.04.1-Ubuntu
  [7.690504] Hardware name: NVIDIA NVIDIA DGX-2/NVIDIA DGX-2, BIOS 0.29 06/07/2021
  [7.690505] RIP: 0010:start_kernel+0x4da/0x540
  [7.690508] Code: ff 48 c7 c7 e8 26 f0 97 e8 b3 59 a8 fd 0f 0b e9 96 fd ff ff e8 a7 1d 04 00 e9 7c fe ff ff 48 c7 c7 18 27 f0 97 e8 96 59 a8 fd <0f> 0b e9 ed fd ff ff 48 c7 c7 b0 26 f0 97 e8 83 59 a8 fd 0f 0b ff
  [7.690510] RSP: :98803f08 EFLAGS: 00010246
  [7.690512] RAX: RBX: RCX:
  [7.690513] RDX: RSI: RDI:
  [7.690514] RBP: 98803f20 R08: R09:
  [7.690515] R10: R11: R12: 00e0
  [7.690516] R13: 5a1ccde0 R14: 5a1c7469 R15: 5a1d7ee0
  [7.690518] FS: () GS:96490060() knlGS:
  [7.690520] CS: 0010 DS: ES: CR0: 80050033
  [7.690521] CR2: 970bf000 CR3: 00ecd7810001 CR4: 000606f0
  [7.690522] DR0: DR1: DR2:
  [7.690523] DR3: DR6: fffe0ff0 DR7: 0400
  [7.690524] Call Trace:
  [7.690526]
  [7.690529] x86_64_start_kernel+0x102/0x180
  [7.690536] secondary_startup_64_no_verify+0xe5/0xeb
  [7.690544]
  [7.690544] ---[ end trace ]---

  I also see pretty much the same thing on some Ampere based arm64
  servers:

  [0.000519] [ cut here ]
  [0.000521] Interrupts were enabled early
  [0.000525] WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x3ac/0x514
  [0.000531] Modules linked in:
  [0.000535] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-1004-nvidia #4~22.04.1-Ubuntu
  [0.000538] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [0.000540] pc : start_kernel+0x3ac/0x514
  [0.000543] lr : start_kernel+0x3ac/0x514
  [0.000545] sp : dec5ff733e60
  [0.000546] x29: dec5ff733e60 x28: 0819aa09baac x27: 403ffdd124e0
  [0.000549] x26: bfdf3788 x25: 9b6fc000 x24: 001dba7b
  [0.000552] x23: 5ec57c98 x22: 0819ab2a x21: dec5ff749140
  [0.000555] x20: dec5ff73d9c0 x19: dec5ffbe4000 x18: dec5ff74a1c8
  [0.000558] x17: x16: x15:
  [0.000560] x14: x13: 0a796c7261652064 x12: 656c62616e652065
  [0.000563] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 :
  [0.000565] x8 : x7 : x6 :
  [0.000568] x5 : x4 : x3 :
  [0.000571] x2 : x1 : x0 :
  [0.000573] Call trace:
  [0.000574] start_kernel+0x3ac/0x514
  [0.000577] __primary_switched+0xc0/0xc8
  [0.000580] ---[ end trace ]---

  The warning does not appear on an older thunderx2 server.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2026891/+subscriptions
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
** Changed in: linux-nvidia-6.2 (Ubuntu)
       Status: New => In Progress

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
The following changes since commit 3d28f6c10d6940b0c6a497482fe90cc4dbd5549a:

  UBUNTU: Ubuntu-nvidia-6.2-6.2.0-1004.4~22.04.1 (2023-07-03 10:01:31 -0700)

are available in the Git repository at:

  https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/pull/new/bfigg-lp2026891

for you to fetch changes up to 8029e7fc883e8a86076e1bb4379f8d6d0236ab97:

  mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early (2023-07-15 19:30:31 -0700)

Thomas Gleixner (1):
      mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early

 mm/slab.c | 18 ++
 mm/slub.c |  9 +
 2 files changed, 15 insertions(+), 12 deletions(-)
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
Thanks. I see the same behavior (i.e. no warning) with the patch. I will add the patch 'commit f5451547b8310868f5b5acff7cd4aa7c0267edb3' to linux-nvidia-6.2 then.
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
I built and tested a 6.2.0-1004-nvidia based kernel with this patch applied and did not see the warning message on boot. I'll follow up further with Ian on Monday.
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
Can you try the below commit from Linus's "linux" tree and see if the warning goes away?

commit f5451547b8310868f5b5acff7cd4aa7c0267edb3
Author: Thomas Gleixner
Date:   Tue Feb 7 15:16:53 2023 +0100

    mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early
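One way to check whether a tree already carries this fix is to search its history by commit subject rather than by hash, since a backported commit normally gets a new hash. The sketch below is only an illustration: the helper names `grep_log_command` and `commits_with_subject` are made up for this example, and it assumes `git` is installed and `repo_dir` points at a checkout of the kernel tree being inspected.

```python
import subprocess

# Subject line of the upstream commit quoted in this thread.
SUBJECT = "mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early"


def grep_log_command(subject, rev="HEAD"):
    """Build a `git log` command listing commits whose message contains subject.

    --fixed-strings makes git treat the subject literally, so characters
    such as "(" and ")" are not interpreted as a regular expression.
    """
    return ["git", "log", "--oneline", "--fixed-strings", f"--grep={subject}", rev]


def commits_with_subject(repo_dir, subject, rev="HEAD"):
    """Return abbreviated hashes of commits reachable from rev that match subject.

    repo_dir is a hypothetical path to a local checkout; an empty list
    means the fix was not found in that history.
    """
    result = subprocess.run(
        grep_log_command(subject, rev),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.split()[0] for line in result.stdout.splitlines() if line.strip()]
```

Calling `commits_with_subject("/path/to/linux-nvidia-6.2", SUBJECT)` on a tree without the backport would be expected to return an empty list; the path here is a placeholder.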
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
Yeah, I suspect that. There are a couple of IRQ patches in 6.2.0-1004-nvidia that could be the cause. I will update here shortly!
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
I ran through several kernels on our DGX-2 server; only the latest 6.2.0-1004-nvidia kernel emitted the warning. Here are all the kernels I tried:

Lunar 6.2.0-24.24 generic - PASS
Jammy 5.15.0-1028-nvidia - PASS
Jammy 5.19.0-46-generic - PASS
Jammy 5.19.0-1014-nvidia - PASS
Jammy 6.2.0-25-generic - PASS
Jammy 6.2.0-1003-nvidia - PASS
Jammy 6.2.0-1004-nvidia - FAIL
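The PASS/FAIL comparison across kernels comes down to checking each boot log for the warning. A minimal sketch of that check follows, assuming the boot log has been saved as text (for example from `dmesg`). The function names are hypothetical and the regular expression is written against the trace format shown in this report only.

```python
import re

# Message text taken from the trace in this bug report.
WARNING_TEXT = "Interrupts were enabled early"

# Matches lines such as:
# "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
WARNING_SITE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")


def find_early_irq_warning(log_text):
    """Return (source_location, symbol) for the warning, or None if absent."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if WARNING_TEXT in line:
            # In the traces above the "WARNING:" line directly follows the
            # message, so only a few following lines are inspected.
            for follow in lines[i + 1 : i + 4]:
                match = WARNING_SITE.search(follow)
                if match:
                    return match.group(1), match.group(2)
            return ("unknown", "unknown")
    return None


def verdict(log_text):
    """Map a boot log to the PASS/FAIL labels used in the list above."""
    return "FAIL" if find_early_irq_warning(log_text) else "PASS"
```

Running `verdict()` on the saved log of each kernel under test would reproduce a table like the one above, provided the logs cover early boot.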
[Kernel-packages] [Bug 2026891] Re: linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
** Changed in: linux-nvidia-6.2 (Ubuntu)
     Assignee: (unassigned) => Tushar Dave (tdavenvidia)