Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
Thanks for the patch, patch pushed for staging.

Regards
Shashank

On 25/03/2024 00:23, Alex Deucher wrote:
> [snip]
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
[AMD Official Use Only - General]

Hey Alex,

Sure, I will pick it up and push it to staging.

Regards
Shashank

From: Alex Deucher
Sent: Monday, March 25, 2024 12:23 AM
To: Sharma, Shashank
Cc: Johannes Weiner; Christian König; Deucher, Alexander; Koenig, Christian; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs

> [snip]
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
On Sat, Mar 23, 2024 at 4:47 PM Sharma, Shashank wrote:
>
> On 23/03/2024 15:52, Johannes Weiner wrote:
> > [snip]
> > Shashank, is your Reviewed-by still good for this patch, given the
> > above?
>
> Ah, sorry I missed this due to some parallel work, and just realized the
> memcpy/volatile limitation.
>
> I also feel the need of protecting MQD read under a lock to avoid
> parallel change in MQD while we do byte-by-byte copy, but I will add
> that in my to-do list.
>
> Please feel free to use my R-b.

Shashank, if the patch looks good, can you pick it up and apply it?

Alex
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
On 23/03/2024 15:52, Johannes Weiner wrote:
> On Thu, Mar 14, 2024 at 01:09:57PM -0400, Johannes Weiner wrote:
> > [snip]
> > So I would propose leaving the patch as-is. Shashank, does that sound
> > good to you?
>
> Friendly ping :)
>
> Shashank, is your Reviewed-by still good for this patch, given the
> above?

Ah, sorry I missed this due to some parallel work, and just realized the
memcpy/volatile limitation.

I also feel the need to protect the MQD read with a lock, to avoid a
parallel change to the MQD while we do the byte-by-byte copy, but I will
add that to my to-do list.

Please feel free to use my R-b.

- Shashank

Thanks
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
On Thu, Mar 14, 2024 at 01:09:57PM -0400, Johannes Weiner wrote:
> Hello,
>
> On Fri, Mar 08, 2024 at 12:32:33PM +0100, Christian König wrote:
> > Am 07.03.24 um 23:07 schrieb Johannes Weiner:
> > > Lastly I went with an open loop instead of a memcpy() as I wasn't
> > > sure if that memory is safe to address a byte at a time.
>
> Shashank pointed out to me in private that byte access would indeed be
> safe. However, after actually trying it, it won't work because memcpy()
> doesn't play nice with mqd being volatile:
>
> [compiler warning snipped]
>
> So I would propose leaving the patch as-is. Shashank, does that sound
> good to you?

Friendly ping :)

Shashank, is your Reviewed-by still good for this patch, given the above?

Thanks
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
Hello,

On Fri, Mar 08, 2024 at 12:32:33PM +0100, Christian König wrote:
> Am 07.03.24 um 23:07 schrieb Johannes Weiner:
> > Lastly I went with an open loop instead of a memcpy() as I wasn't
> > sure if that memory is safe to address a byte at a time.

Shashank pointed out to me in private that byte access would indeed be
safe. However, after actually trying it, it won't work because memcpy()
doesn't play nice with mqd being volatile:

/home/hannes/src/linux/linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c: In function 'amdgpu_debugfs_mqd_read':
/home/hannes/src/linux/linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:550:22: warning: passing argument 1 of '__builtin_dynamic_object_size' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
  550 |         memcpy(kbuf, mqd, ring->mqd_size);

So I would propose leaving the patch as-is. Shashank, does that sound
good to you?

(Please keep me CC'd on replies, as I'm not subscribed to the graphics lists.)

Thanks!
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
+ Johannes

Regards
Shashank

On 13/03/2024 18:22, Sharma, Shashank wrote:

Hello Johannes,

On 07/03/2024 23:07, Johannes Weiner wrote:

An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine also
hard-resets immediately after those lines are printed (although I wasn't
able to reproduce that part when reading by hand):

[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] 98c44175d6a0 (&mm->mmap_lock){}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] 98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082]        __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082]        ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082]        dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082]        kernel_init+0x15/0x1a0
[ 1318.027242][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082]        dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082]        kernel_init+0x15/0x1a0
[ 1318.031168][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){}-{3:3}:
[ 1318.032778][ T1082]        __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082]        lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082]        __might_fault+0x58/0x80
[ 1318.033814][ T1082]        amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082]        full_proxy_read+0x55/0x80
[ 1318.034487][ T1082]        vfs_read+0xa7/0x360
[ 1318.034788][ T1082]        ksys_read+0x70/0xf0
[ 1318.035085][ T1082]        do_syscall_64+0x94/0x180
[ 1318.035375][ T1082]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082]   &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082] Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082]        CPU0                    CPU1
[ 1318.038101][ T1082]        ----                    ----
[ 1318.038350][ T1082]   lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082]                                lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082]                                lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082]   rlock(&mm->mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082]  *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082]  #0: 98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082]  <TASK>
[ 1318.042175][ T1082]  dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082]  check_noncircular+0x145/0x160
[ 1318.042743][ T1082]  __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082]  lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043580][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043856][ T1082]  __might_fault+0x58/0x80
[ 1318.044131][ T1082]  ? __might_fault+0x40/0x80
[ 1318.044408][ T1082]  amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082]  full_proxy_read+0x55/0x80
[ 1318.045042][ T1082]  vfs_read+0xa7/0x360
[ 1318.045333][ T1082]  ksys_read+0x70/0xf0
[ 1318.045623][ T1082]  do_syscall_64+0x94/0x180
[ 1318.045913][ T1082]  ? do_syscall_64+0xa0/0x180 [
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
Hello Johannes,

On 07/03/2024 23:07, Johannes Weiner wrote:
> An errant disk backup on my desktop got into debugfs and triggered the
> following deadlock scenario in the amdgpu debugfs files. The machine also
> hard-resets immediately after those lines are printed (although I wasn't
> able to reproduce that part when reading by hand):
>
> [lockdep splat snipped]
Re: [PATCH] drm/amdgpu: fix deadlock while reading mqd from debugfs
Good catch, Shashank can you take a closer look?

Thanks,
Christian.

Am 07.03.24 um 23:07 schrieb Johannes Weiner:
> An errant disk backup on my desktop got into debugfs and triggered the
> following deadlock scenario in the amdgpu debugfs files. The machine also
> hard-resets immediately after those lines are printed (although I wasn't
> able to reproduce that part when reading by hand):
>
> [lockdep splat snipped]