[Bug 1940690] Re: amdgpu kernel crash

2022-04-06 Thread torel
*** This bug is a duplicate of bug 1956845 ***
https://bugs.launchpad.net/bugs/1956845

Hardware: dual-processor (DP) EPYC Milan 7763 node with two AMD Instinct MI100 GPUs

Kernel:   Ubuntu 18.04.6 LTS with linux-hwe 5.4.0-107-generic

ROCm 5.1.0 with amdgpu driver version 5.13.20.5.1

Homegrown software developed using ROCm 5.1.0.

Might the page faults below be related to this bug?


Logs:

[304726.475355] beegfs: enabling unsafe global rkey
[304734.912424] amdgpu :23:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769, for process hyprep pid 122284 thread hyprep pid 122284)
[304734.928526] amdgpu :23:00.0: amdgpu:   in page starting at address 0x01753000 from IH client 0x1b (UTCL2)
[304734.939972] amdgpu :23:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[304734.948130] amdgpu :23:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[304734.955858] amdgpu :23:00.0: amdgpu: MORE_FAULTS: 0x1
[304734.962115] amdgpu :23:00.0: amdgpu: WALKER_ERROR: 0x0
[304734.968441] amdgpu :23:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[304734.975196] amdgpu :23:00.0: amdgpu: MAPPING_ERROR: 0x0
[304734.981580] amdgpu :23:00.0: amdgpu: RW: 0x0
[304735.568400] amdgpu :23:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769, for process  pid 0 thread  pid 0)
[304735.582318] amdgpu :23:00.0: amdgpu:   in page starting at address 0x01753000 from IH client 0x1b (UTCL2)
[304735.593722] amdgpu :23:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x
[304735.601851] amdgpu :23:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[304735.609465] amdgpu :23:00.0: amdgpu: MORE_FAULTS: 0x0
[304735.615686] amdgpu :23:00.0: amdgpu: WALKER_ERROR: 0x0
[304735.621994] amdgpu :23:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[304735.628737] amdgpu :23:00.0: amdgpu: MAPPING_ERROR: 0x0
[304735.635104] amdgpu :23:00.0: amdgpu: RW: 0x0
[321839.599489] beegfs: enabling unsafe global rkey
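
For anyone reading the fault dump above: the per-field values the driver prints (MORE_FAULTS, PERMISSION_FAULTS, and so on) can be recovered from the raw VM_L2_PROTECTION_FAULT_STATUS word. Below is a minimal Python sketch of that decoding. The bit layout is an assumption (a GFX9-style layout, not stated anywhere in this thread), although it does reproduce the values printed for the first fault. The names `FIELDS` and `decode_fault_status` are made up for illustration.

```python
# ASSUMPTION: GFX9-style bit layout of VM_L2_PROTECTION_FAULT_STATUS.
# This layout is not given in the bug report; it is used here only
# because it reproduces the field values printed in the log.
# Each entry is (field name, bit offset, width in bits).
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),    # client ID, printed as "Faulty UTCL2 client ID"
    ("RW", 18, 1),
    ("VMID", 20, 4),
]


def decode_fault_status(status: int) -> dict:
    """Split a raw status word into its (assumed) bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


if __name__ == "__main__":
    # Status value taken from the first fault in the log above.
    for name, value in decode_fault_status(0x00301031).items():
        print(f"{name}: {hex(value)}")
```

For 0x00301031 this yields MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x3, MAPPING_ERROR 0x0, CID 0x8 (reported as TCP in the log), RW 0x0 and VMID 3, matching the first fault. The status value of the second fault is truncated in the log, so it cannot be decoded this way.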

Driver
Apr 02 22:19:59 n004 kernel: [drm] amdgpu kernel modesetting enabled.
Apr 02 22:19:59 n004 kernel: [drm] amdgpu version: 5.13.20.5.1
Apr 02 22:19:59 n004 kernel: amdgpu: Ignoring ACPI CRAT on non-APU system
Apr 02 22:19:59 n004 kernel: amdgpu: Virtual CRAT table created for CPU
Apr 02 22:19:59 n004 kernel: amdgpu: Topology: Add CPU node
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x678 -> 0x67f
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x680 -> 0x680001f
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xeb40 -> 0xeb47
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: enabling device ( -> 0003)
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: Fetched VBIOS from ROM BAR
Apr 02 22:19:59 n004 kernel: amdgpu: ATOM BIOS: 113-D3431401-100
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: MEM ECC is active.
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: SRAM ECC is active.
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[7fff] ras_mask[7fff]
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: VRAM: 32752M 0x0080 - 0x0087FEFF (32752M used)
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: GART: 512M 0x - 0x1FFF
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: AGP: 267878400M 0x0088 - 0x
Apr 02 22:19:59 n004 kernel: [drm] amdgpu: 32752M of VRAM memory ready
Apr 02 22:19:59 n004 kernel: [drm] amdgpu: 2064153M of GTT memory ready.
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: PSP runtime database doesn't exist
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: Will use PSP to load VCN firmware
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: DTM: optional dtm ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: RAP: optional rap ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: use vbios provided pptable
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.6
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: PMFW based fan control disabled
Apr 02 22:19:59 n004 kernel: amdgpu :23:00.0: amdgpu: SMU is initialized successfully!
Apr 02 22:19:59 n004 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Apr 02 22:19:59 n004 kernel: amdgpu: Virtual CRAT table created for 

[Bug 1940690] Re: amdgpu kernel crash

2022-04-06 Thread Launchpad Bug Tracker
*** This bug is a duplicate of bug 1956845 ***
https://bugs.launchpad.net/bugs/1956845

Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1940690

Title:
  amdgpu kernel crash

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1940690/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1940690] Re: amdgpu kernel crash

2022-01-09 Thread Daniel van Vugt
*** This bug is a duplicate of bug 1956845 ***
https://bugs.launchpad.net/bugs/1956845

** This bug is no longer a duplicate of bug 1939417
   Ubuntu GUI crashes (w and w/o Wayland), VM_L2_PROTECTION_FAULT,  amdgpu :09:00.0: [gfxhub0] retry page fault
** This bug has been marked a duplicate of bug 1956845
   amdgpu: [gfxhub0] retry page fault

[Bug 1940690] Re: amdgpu kernel crash

2021-08-24 Thread Daniel van Vugt
*** This bug is a duplicate of bug 1939417 ***
https://bugs.launchpad.net/bugs/1939417

** Summary changed:

- Xorg crash
+ amdgpu kernel crash

** Package changed: xorg (Ubuntu) => linux (Ubuntu)

** This bug has been marked a duplicate of bug 1939417
   Ubuntu GUI crashes (w and w/o Wayland), VM_L2_PROTECTION_FAULT,  amdgpu :09:00.0: [gfxhub0] retry page fault
