Re: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX

2023-07-16 Thread Mikhail Gavrilov
On Fri, Jul 14, 2023 at 4:09 PM Chen, Guchun wrote:
>
> Thanks for your patience on this, Mike. I think
> https://patchwork.freedesktop.org/patch/547592/ can help with this; please
> give it a try.

Tested-by: Mikhail Gavrilov 
Thanks, it looks good. I spent the whole weekend with these patches on
top of 3f01e9fed845 and didn't notice any regressions.

-- 
Best Regards,
Mike Gavrilov.


RE: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX

2023-07-14 Thread Chen, Guchun
[Public]

> -----Original Message-----
> From: Mikhail Gavrilov 
> Sent: Saturday, July 8, 2023 6:27 AM
> To: Chen, Guchun 
> Cc: amd-gfx list; Koenig, Christian; Deucher, Alexander;
> Linux List Kernel Mailing
> Subject: Re: [regression][6.5] KASAN: slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
>
> On Fri, Jul 7, 2023 at 6:01 AM Chen, Guchun 
> wrote:
> >
> > [Public]
> >
> > Hi Mike,
> >
> > Yes, we are aware of this problem, and we are working on it. The
> > problem is caused by recent code that stores xcp_id in the amdgpu bo
> > for accounting memory usage and so on. However, not all VMs are
> > attached to that, as in the case of amdgpu_mes_self_test.
> >
>
> I would like to take part in testing the fix.

Thanks for your patience on this, Mike. I think
https://patchwork.freedesktop.org/patch/547592/ can help with this; please
give it a try.

Regards,
Guchun

> --
> Best Regards,
> Mike Gavrilov.


Re: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX

2023-07-07 Thread Mikhail Gavrilov
On Fri, Jul 7, 2023 at 6:01 AM Chen, Guchun wrote:
>
> [Public]
>
> Hi Mike,
>
> Yes, we are aware of this problem, and we are working on it. The problem is
> caused by recent code that stores xcp_id in the amdgpu bo for accounting
> memory usage and so on. However, not all VMs are attached to that, as in the
> case of amdgpu_mes_self_test.
>

I would like to take part in testing the fix.

-- 
Best Regards,
Mike Gavrilov.


RE: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX

2023-07-06 Thread Chen, Guchun
[Public]

Hi Mike,

Yes, we are aware of this problem, and we are working on it. The problem is
caused by recent code that stores xcp_id in the amdgpu bo for accounting
memory usage and so on. However, not all VMs are attached to that, as in the
case of amdgpu_mes_self_test.

Regards,
Guchun
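
For illustration only, here is a minimal userspace C sketch of the class of
bug described above. It is not the amdgpu code and not the patch linked in
this thread. All names (struct vm, struct file_priv, struct bo, XCP_ID_NONE,
vm_init_root) are hypothetical stand-ins, and the assumption that the id is
read through the VM's enclosing structure is mine, not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu structures. */
#define XCP_ID_NONE UINT32_MAX   /* assumed "no partition" marker */

struct vm { int id; };

struct file_priv {
    struct vm vm;        /* VM embedded in per-file private data */
    uint32_t xcp_id;     /* partition id used for accounting */
};

struct bo { uint32_t xcp_id; };

/*
 * Risky pattern (shown only as a comment): recover the container from
 * the VM pointer and read xcp_id from it,
 *
 *     fp = (struct file_priv *)((char *)vm - offsetof(struct file_priv, vm));
 *     root->xcp_id = fp->xcp_id;
 *
 * If vm was allocated on its own rather than inside a file_priv,
 * fp->xcp_id lies outside that allocation, which is the kind of
 * access KASAN reports as slab-out-of-bounds.
 */

/* Safer pattern: the caller states the partition id explicitly. */
static void vm_init_root(const struct vm *vm, struct bo *root, uint32_t xcp_id)
{
    assert(vm != NULL && root != NULL);
    root->xcp_id = xcp_id;
}
```

With this shape, a caller that owns a file_priv passes its xcp_id, while a
caller with a standalone VM (such as a self test) passes XCP_ID_NONE, so no
code has to guess what memory surrounds the VM.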

> -----Original Message-----
> From: Mikhail Gavrilov 
> Sent: Friday, July 7, 2023 5:34 AM
> To: amd-gfx list; Koenig, Christian; Deucher, Alexander; Chen, Guchun;
> Linux List Kernel Mailing
> Subject: [regression][6.5] KASAN: slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
>
> Hi,
> On the Radeon 7900XTX, the issue "slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670" appeared between commits 3a8a670eeeaa and
> e55e5df193d2.
> Graphics cards with 6800M and 6900XT chips are unaffected.
>
> [   12.562762] ==================================================================
> [   12.562775] BUG: KASAN: slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563173] Read of size 4 at addr 8881347a8dc8 by task
> (udev-worker)/660
>
> [   12.563183] CPU: 0 PID: 660 Comm: (udev-worker) Tainted: GW
>L---  ---
> 6.5.0-0.rc0.20230630gite55e5df193d2.5.fc39.x86_64+debug #1
> [   12.563192] Hardware name: Micro-Star International Co., Ltd.
> MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.30 05/24/2023
> [   12.563199] Call Trace:
> [   12.563203]  <TASK>
> [   12.563206]  dump_stack_lvl+0x76/0xd0
> [   12.563213]  print_report+0xcf/0x670
> [   12.563220]  ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563433]  kasan_report+0xa6/0xe0
> [   12.563436]  ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563637]  amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563835]  ? __pfx_amdgpu_vm_pt_create+0x10/0x10 [amdgpu]
> [   12.564030]  ? __module_address+0x95/0x240
> [   12.564035]  ? lockdep_init_map_type+0x1a5/0x840
> [   12.564040]  ? __raw_spin_lock_init+0x3f/0x110
> [   12.564044]  amdgpu_vm_init+0x749/0x10c0 [amdgpu]
> [   12.564240]  ? __pfx_amdgpu_vm_init+0x10/0x10 [amdgpu]
> [   12.564441]  amdgpu_mes_self_test+0x16e/0x9e0 [amdgpu]
> [   12.564661]  ? lock_acquire+0x1a6/0x4f0
> [   12.564664]  ? __pfx_amdgpu_mes_self_test+0x10/0x10 [amdgpu]
> [   12.564871]  ? local_clock_noinstr+0xd/0xc0
> [   12.564876]  ? find_held_lock+0x34/0x120
> [   12.564882]  ? _raw_spin_unlock_irqrestore+0x4f/0x80
> [   12.564886]  ? amdgpu_irq_update+0x1b2/0x2c0 [amdgpu]
> [   12.565094]  mes_v11_0_late_init+0xb8/0xe0 [amdgpu]
> [   12.565304]  amdgpu_device_ip_late_init+0x100/0x7b0 [amdgpu]
> [   12.565509]  amdgpu_device_init+0x7569/0x8660 [amdgpu]
> [   12.565721]  ? __pfx_amdgpu_device_init+0x10/0x10 [amdgpu]
> [   12.565920]  ? __pfx_pci_bus_read_config_word+0x10/0x10
> [   12.565925]  ? do_pci_enable_device+0x22d/0x2a0
> [   12.565928]  ? pci_wait_for_pending+0xa1/0x110
> [   12.565933]  amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
> [   12.566131]  amdgpu_pci_probe+0x287/0x9e0 [amdgpu]
> [   12.566337]  ? __pfx_amdgpu_pci_probe+0x10/0x10 [amdgpu]
> [   12.566536]  local_pci_probe+0xda/0x190
> [   12.566540]  pci_device_probe+0x23a/0x770
> [   12.566544]  ? kernfs_add_one+0x326/0x490
> [   12.566548]  ? kernfs_get.part.0+0x4c/0x70
> [   12.566552]  ? __pfx_pci_device_probe+0x10/0x10
> [   12.566555]  ? kernfs_create_link+0x16b/0x230
> [   12.566559]  ? kernfs_put+0x1c/0x40
> [   12.566562]  ? sysfs_do_create_link_sd+0x8e/0x100
> [   12.566566]  really_probe+0x3df/0xb80
> [   12.566570]  __driver_probe_device+0x18c/0x450
> [   12.566573]  driver_probe_device+0x4a/0x120
> [   12.566576]  __driver_attach+0x1e5/0x4a0
> [   12.566579]  ? __pfx___driver_attach+0x10/0x10
> [   12.566582]  bus_for_each_dev+0x106/0x190
> [   12.566586]  ? __pfx_bus_for_each_dev+0x10/0x10
> [   12.566591]  bus_add_driver+0x2a1/0x570
> [   12.566594]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [   12.566794]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [   12.566993]  driver_register+0x134/0x460
> [   12.566996]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [   12.567193]  do_one_initcall+0xd2/0x430
> [   12.567197]  ? __pfx_do_one_initcall+0x10/0x10
> [   12.567202]  ? kasan_unpoison+0x44/0x70
> [   12.567206]  do_init_module+0x238/0x770
> [   12.567210]  load_module+0x5581/0x6f10
> [   12.567216]  ? __pfx_load_module+0x10/0x10
> [   12.567220]  ? find_held_lock+0x34/0x120
> [   12.567223]  ? local_clock_noinstr+0xd/0xc0
> [   12.567227]  ? __pfx___might_resched+0x10/0x10
> [   12.567232]  ? __do_sys_init_module+0x1f2/0x220
> [   12.567235]  __do_sys_init_module+0x1f2/0x220
> [   12.567238]  ? __pfx___do_sys_init_module+0x10/0x10
> [   12.567243]  do_syscall_64+0x5d/0x90
> [   12.567247]  ? asm_exc_page_fault+0x26/0x30
> [   12.567251]  ? lockdep_hardirqs_on+0x81/0x110
> [   12.567255]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [   12.567258] RIP: 0033:0x7fdb4e92b5de
> [   12.567267] Code: 48 8b 0d 55 08 12 00 f7 d8 64 89 01 48 83 c8 ff
> c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8