Re: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
On Fri, Jul 14, 2023 at 4:09 PM Chen, Guchun wrote:
>
> Thanks for your patience on this, Mike. I think
> https://patchwork.freedesktop.org/patch/547592/ can help with this; please give it a
> try.

Tested-by: Mikhail Gavrilov

Thanks, it looks good. I spent the whole weekend with these patches on top of 3f01e9fed845 and didn't notice any regressions.

--
Best Regards,
Mike Gavrilov.
RE: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
[Public]

> -----Original Message-----
> From: Mikhail Gavrilov
> Sent: Saturday, July 8, 2023 6:27 AM
> To: Chen, Guchun
> Cc: amd-gfx list; Koenig, Christian; Deucher, Alexander;
> Linux List Kernel Mailing ker...@vger.kernel.org>
> Subject: Re: [regression][6.5] KASAN: slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
>
> On Fri, Jul 7, 2023 at 6:01 AM Chen, Guchun
> wrote:
> >
> > [Public]
> >
> > Hi Mike,
> >
> > Yes, we are aware of this problem, and we are working on that. The
> > problem is caused by recent code that stores xcp_id in the amdgpu bo for
> > memory usage accounting and so on. However, not all VMs are attached to
> > that, as in the case of amdgpu_mes_self_test.
> >
>
> I would like to take part in testing the fix.

Thanks for your patience on this, Mike. I think https://patchwork.freedesktop.org/patch/547592/ can help with this; please give it a try.

Regards,
Guchun

> --
> Best Regards,
> Mike Gavrilov.
Re: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
On Fri, Jul 7, 2023 at 6:01 AM Chen, Guchun wrote:
>
> [Public]
>
> Hi Mike,
>
> Yes, we are aware of this problem, and we are working on that. The problem is
> caused by recent code that stores xcp_id in the amdgpu bo for memory usage
> accounting and so on. However, not all VMs are attached to that, as in the case of
> amdgpu_mes_self_test.
>

I would like to take part in testing the fix.

--
Best Regards,
Mike Gavrilov.
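Guchun's explanation (per-bo accounting needs an xcp_id, but a VM created by amdgpu_mes_self_test has none) can be illustrated with a small, self-contained C sketch. It is not the amdgpu code: every name below (struct vm, struct file_ctx, struct page_table, xcp_id_from_vm_unsafe(), pt_init(), NUM_PARTITIONS, NO_PARTITION) is hypothetical. It assumes, as one plausible reading of the "Read of size 4" in the report, that the partition id is derived from a VM by stepping back to an enclosing per-file structure, which is only valid when the VM really is embedded in one. The safer variant has the caller pass the partition explicitly, with a sentinel for internal users such as a self-test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu structures. */
#define NUM_PARTITIONS 4
#define NO_PARTITION (-1)

/* Same idea as the kernel's container_of(): step back from a member
 * to the structure that is assumed to contain it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct vm {
	int32_t id;
};

/* Hypothetical per-open-file context that embeds its VM. */
struct file_ctx {
	struct vm vm;
	int32_t xcp_id; /* 4-byte field stored right after the VM */
};

struct page_table {
	int32_t xcp_id; /* partition charged for this allocation */
};

/* Unsafe pattern (assumption): treats every vm as embedded in a file_ctx.
 * Correct for &ctx->vm, but for a vm allocated on its own (as a
 * self-test might do) the xcp_id read lands just past the end of that
 * allocation: a 4-byte out-of-bounds read. Only call it on embedded VMs. */
int32_t xcp_id_from_vm_unsafe(struct vm *vm)
{
	return container_of(vm, struct file_ctx, vm)->xcp_id;
}

/* Safer pattern: the caller states the partition explicitly; internal
 * users with no file context pass NO_PARTITION. Out-of-range ids are
 * mapped to NO_PARTITION instead of being used later as an index. */
void pt_init(struct page_table *pt, int32_t xcp_id)
{
	if (xcp_id < 0 || xcp_id >= NUM_PARTITIONS)
		xcp_id = NO_PARTITION;
	pt->xcp_id = xcp_id;
}
```

The design choice is to make the dependency explicit: a VM with no owning file simply cannot supply a partition id, rather than reading one from memory it does not own. Whether the patch referenced in this thread takes this approach is not stated here.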
RE: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
[Public]

Hi Mike,

Yes, we are aware of this problem, and we are working on that. The problem is caused by recent code that stores xcp_id in the amdgpu bo for memory usage accounting and so on. However, not all VMs are attached to that, as in the case of amdgpu_mes_self_test.

Regards,
Guchun

> -----Original Message-----
> From: Mikhail Gavrilov
> Sent: Friday, July 7, 2023 5:34 AM
> To: amd-gfx list; Koenig, Christian; Deucher, Alexander; Chen, Guchun;
> Linux List Kernel Mailing
> Subject: [regression][6.5] KASAN: slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
>
> Hi,
> On the Radeon 7900XTX, the issue "slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670" appeared between commits 3a8a670eeeaa and
> e55e5df193d2.
> Graphics cards with 6800M and 6900XT chips are unaffected.
>
> [ 12.562762]
> ==
> [ 12.562775] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [ 12.563173] Read of size 4 at addr 8881347a8dc8 by task (udev-worker)/660
>
> [ 12.563183] CPU: 0 PID: 660 Comm: (udev-worker) Tainted: G W L --- --- 6.5.0-0.rc0.20230630gite55e5df193d2.5.fc39.x86_64+debug #1
> [ 12.563192] Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.30 05/24/2023
> [ 12.563199] Call Trace:
> [ 12.563203]
> [ 12.563206] dump_stack_lvl+0x76/0xd0
> [ 12.563213] print_report+0xcf/0x670
> [ 12.563220] ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [ 12.563433] kasan_report+0xa6/0xe0
> [ 12.563436] ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [ 12.563637] amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [ 12.563835] ? __pfx_amdgpu_vm_pt_create+0x10/0x10 [amdgpu]
> [ 12.564030] ? __module_address+0x95/0x240
> [ 12.564035] ? lockdep_init_map_type+0x1a5/0x840
> [ 12.564040] ? __raw_spin_lock_init+0x3f/0x110
> [ 12.564044] amdgpu_vm_init+0x749/0x10c0 [amdgpu]
> [ 12.564240] ? __pfx_amdgpu_vm_init+0x10/0x10 [amdgpu]
> [ 12.564441] amdgpu_mes_self_test+0x16e/0x9e0 [amdgpu]
> [ 12.564661] ? lock_acquire+0x1a6/0x4f0
> [ 12.564664] ? __pfx_amdgpu_mes_self_test+0x10/0x10 [amdgpu]
> [ 12.564871] ? local_clock_noinstr+0xd/0xc0
> [ 12.564876] ? find_held_lock+0x34/0x120
> [ 12.564882] ? _raw_spin_unlock_irqrestore+0x4f/0x80
> [ 12.564886] ? amdgpu_irq_update+0x1b2/0x2c0 [amdgpu]
> [ 12.565094] mes_v11_0_late_init+0xb8/0xe0 [amdgpu]
> [ 12.565304] amdgpu_device_ip_late_init+0x100/0x7b0 [amdgpu]
> [ 12.565509] amdgpu_device_init+0x7569/0x8660 [amdgpu]
> [ 12.565721] ? __pfx_amdgpu_device_init+0x10/0x10 [amdgpu]
> [ 12.565920] ? __pfx_pci_bus_read_config_word+0x10/0x10
> [ 12.565925] ? do_pci_enable_device+0x22d/0x2a0
> [ 12.565928] ? pci_wait_for_pending+0xa1/0x110
> [ 12.565933] amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
> [ 12.566131] amdgpu_pci_probe+0x287/0x9e0 [amdgpu]
> [ 12.566337] ? __pfx_amdgpu_pci_probe+0x10/0x10 [amdgpu]
> [ 12.566536] local_pci_probe+0xda/0x190
> [ 12.566540] pci_device_probe+0x23a/0x770
> [ 12.566544] ? kernfs_add_one+0x326/0x490
> [ 12.566548] ? kernfs_get.part.0+0x4c/0x70
> [ 12.566552] ? __pfx_pci_device_probe+0x10/0x10
> [ 12.566555] ? kernfs_create_link+0x16b/0x230
> [ 12.566559] ? kernfs_put+0x1c/0x40
> [ 12.566562] ? sysfs_do_create_link_sd+0x8e/0x100
> [ 12.566566] really_probe+0x3df/0xb80
> [ 12.566570] __driver_probe_device+0x18c/0x450
> [ 12.566573] driver_probe_device+0x4a/0x120
> [ 12.566576] __driver_attach+0x1e5/0x4a0
> [ 12.566579] ? __pfx___driver_attach+0x10/0x10
> [ 12.566582] bus_for_each_dev+0x106/0x190
> [ 12.566586] ? __pfx_bus_for_each_dev+0x10/0x10
> [ 12.566591] bus_add_driver+0x2a1/0x570
> [ 12.566594] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [ 12.566794] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [ 12.566993] driver_register+0x134/0x460
> [ 12.566996] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [ 12.567193] do_one_initcall+0xd2/0x430
> [ 12.567197] ? __pfx_do_one_initcall+0x10/0x10
> [ 12.567202] ? kasan_unpoison+0x44/0x70
> [ 12.567206] do_init_module+0x238/0x770
> [ 12.567210] load_module+0x5581/0x6f10
> [ 12.567216] ? __pfx_load_module+0x10/0x10
> [ 12.567220] ? find_held_lock+0x34/0x120
> [ 12.567223] ? local_clock_noinstr+0xd/0xc0
> [ 12.567227] ? __pfx___might_resched+0x10/0x10
> [ 12.567232] ? __do_sys_init_module+0x1f2/0x220
> [ 12.567235] __do_sys_init_module+0x1f2/0x220
> [ 12.567238] ? __pfx___do_sys_init_module+0x10/0x10
> [ 12.567243] do_syscall_64+0x5d/0x90
> [ 12.567247] ? asm_exc_page_fault+0x26/0x30
> [ 12.567251] ? lockdep_hardirqs_on+0x81/0x110
> [ 12.567255] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 12.567258] RIP: 0033:0x7fdb4e92b5de
> [ 12.567267] Code: 48 8b 0d 55 08 12 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8