Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-10 Thread Christian König

On 10.02.23 at 12:30, Xiao, Jack wrote:


[AMD Official Use Only - General]

>> The drivers are resumed before the core Linux memory management is 
ready to serve allocations. For example, swap isn't turned on yet.


>> This means that this stuff only worked because we were able to 
allocate memory from the pool which isn't guaranteed in any way.


Memory allocation failure can happen at any time; every programmer 
should handle it correctly.




We are not talking about memory allocation failure, we are talking about 
the kernel calling panic() because it can't properly resume.


Regards,
Christian.

If a memory allocation failure is not a critical error and the driver can 
gracefully continue to run, it should be acceptable.

A memory allocation failure during the MES self test should be an 
acceptable one. It will not hang the system, and the driver can 
gracefully continue to run.

Regards,

Jack

*From:* Koenig, Christian 
*Sent:* Friday, February 10, 2023 6:25 PM
*To:* Xiao, Jack ; Quan, Evan ; 
Christian König ; 
amd-gfx@lists.freedesktop.org; Deucher, Alexander 

*Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
is unavailable


Hi Jack,

On 10.02.23 at 10:51, Xiao, Jack wrote:

[AMD Official Use Only - General]

Hi Christian,

>> Temporarily allocating buffers for stuff like that is illegal
during resume.

Can you explain **in depth** why it is illegal during the IP late_init
stage, which is part of resume?


Well no, I don't have the time to explain this to everybody individually.

[Jack] …

In my understanding, once GMC is ready the driver can allocate/free
kernel BOs, and once SDMA is ready, eviction should be ready too.
What else prevents the driver from doing that during resume?


The drivers are resumed before the core Linux memory management is 
ready to serve allocations. For example, swap isn't turned on yet.


This means that this stuff only worked because we were able to 
allocate memory from the pool which isn't guaranteed in any way.


>> I strongly suggest to just remove the MES test. It's abusing
the kernel ring interface in a way we didn't want anyway and is
currently replaced by Shahanks work.

The kernel MES unit test is meaningful and important: it catches MES
firmware issues early, before they spread to other components such as
KFD/UMD and complicate the problem. Otherwise, such issues would
become hard to catch and debug.

Secondly, the MES unit test is self-contained and has no dependencies.
It is part of the milestone that qualifies MES as ready, indicating
that it can be delivered to other components, especially during
bringup. It is like the ring test and IB test, which indicate that gfx
is ready to go. After the full transition to gfx user queues, the MES
unit test may be the only unit test that can indicate gfx is ready at
the very early stage of bringup, when the UMD is not ready.


Alex and I are the maintainers of the driver who decide stuff 
like that, and at least I don't really buy that argument. The ring, IB 
and benchmark tests are in the kernel module because they are simple.

If we have a complicated unit test, like simulating the creation of an 
MES user queue to test the firmware functionality, then that's really 
overkill. Especially when you need to allocate memory for it.

We previously had people requesting to add shader code and other 
complicated testing and rejected that as well, because it just bloats 
the kernel driver unnecessarily.

If we can modify the MES test to not abuse the amdgpu_ring structure 
and to only work with memory from the SA, for example, we could keep 
this, but not really in the current state.
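
A rough userspace sketch of the idea of working only from memory reserved up front, so that nothing is requested from the core allocator during resume. This assumes "SA" refers to the driver's suballocator; it is not the amdgpu implementation, and `demo_pool`, `demo_pool_init`, `demo_pool_alloc` and `demo_pool_reset` are hypothetical names used for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POOL_SIZE 4096u

/* Hypothetical pool whose backing memory is reserved once, at init time. */
struct demo_pool {
    uint8_t mem[DEMO_POOL_SIZE];
    size_t used; /* bytes handed out so far, including alignment padding */
};

void demo_pool_init(struct demo_pool *p)
{
    p->used = 0;
}

/*
 * Carve `size` bytes out of the pool at an offset that is a multiple of
 * `align` (a small power of two). Returns NULL when the pool is exhausted.
 * It never calls the system allocator, so it cannot fail because the core
 * memory management is not ready yet.
 */
void *demo_pool_alloc(struct demo_pool *p, size_t size, size_t align)
{
    size_t start = (p->used + align - 1) & ~(align - 1);

    if (start > DEMO_POOL_SIZE || size > DEMO_POOL_SIZE - start)
        return NULL;

    p->used = start + size;
    return &p->mem[start];
}

/* Hand everything back at once, e.g. after a self test has finished. */
void demo_pool_reset(struct demo_pool *p)
{
    p->used = 0;
}
```

A self test built this way would call `demo_pool_alloc()` for its scratch memory and `demo_pool_reset()` when done, so no buffer is created or freed between suspend and resume.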


Regards,
Christian.

Regards,

Jack

*From:* Koenig, Christian 
<mailto:christian.koe...@amd.com>
*Sent:* Friday, February 10, 2023 4:08 PM
*To:* Quan, Evan  <mailto:evan.q...@amd.com>;
Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>; Xiao, Jack
 <mailto:jack.x...@amd.com>;
amd-gfx@lists.freedesktop.org; Deucher, Alexander
 <mailto:alexander.deuc...@amd.com>
    *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
DMA is unavailable

Hi Evan,

yeah, exactly that's what this warning should prevent. Temporarily
allocating buffers for stuff like that is illegal during resume.

I strongly suggest to just remove the MES test. It's abusing the
kernel ring interface in a way we didn't want anyway and is
currently replaced by Shahanks work.

Regards,
Christian.

On 10.02.23 at 05:12, Quan, Evan wrote:

[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?

It seems mes created some bo for its self test and freed it up
later at the final stage of the resuming process.

All these happened before the in_suspend f

RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-10 Thread Xiao, Jack
[AMD Official Use Only - General]

>> The drivers are resumed before the core Linux memory management is ready to 
>> serve allocations. For example, swap isn't turned on yet.

>> This means that this stuff only worked because we were able to allocate 
>> memory from the pool which isn't guaranteed in any way.

Memory allocation failure can happen at any time; every programmer should 
handle it correctly.
If a memory allocation failure is not a critical error and the driver can 
gracefully continue to run, it should be acceptable.
A memory allocation failure during the MES self test should be an acceptable 
one. It will not hang the system, and the driver can gracefully continue to run.

Regards,
Jack

From: Koenig, Christian 
Sent: Friday, February 10, 2023 6:25 PM
To: Xiao, Jack ; Quan, Evan ; Christian 
König ; amd-gfx@lists.freedesktop.org; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

Hi Jack,

On 10.02.23 at 10:51, Xiao, Jack wrote:

[AMD Official Use Only - General]

Hi Christian,

>> Temporarily allocating buffers for stuff like that is illegal during resume.

Can you explain *in depth* why it is illegal during the IP late_init stage, 
which is part of resume?

Well no, I don't have the time to explain this to everybody individually.

[Jack] ...

In my understanding, once GMC is ready the driver can allocate/free kernel BOs, 
and once SDMA is ready, eviction should be ready too. What else prevents the 
driver from doing that during resume?

The drivers are resumed before the core Linux memory management is ready to 
serve allocations. For example, swap isn't turned on yet.

This means that this stuff only worked because we were able to allocate memory 
from the pool which isn't guaranteed in any way.

>> I strongly suggest to just remove the MES test. It's abusing the kernel ring 
>> interface in a way we didn't want anyway and is currently replaced by 
>> Shahanks work.

The kernel MES unit test is meaningful and important: it catches MES firmware 
issues early, before they spread to other components such as KFD/UMD and 
complicate the problem. Otherwise, such issues would become hard to catch and 
debug.

Secondly, the MES unit test is self-contained and has no dependencies. It is 
part of the milestone that qualifies MES as ready, indicating that it can be 
delivered to other components, especially during bringup. It is like the ring 
test and IB test, which indicate that gfx is ready to go. After the full 
transition to gfx user queues, the MES unit test may be the only unit test that 
can indicate gfx is ready at the very early stage of bringup, when the UMD is 
not ready.

Alex and I are the maintainers of the driver who decide stuff like that, and 
at least I don't really buy that argument. The ring, IB and benchmark tests 
are in the kernel module because they are simple.

If we have a complicated unit test, like simulating the creation of an MES user 
queue to test the firmware functionality, then that's really overkill. 
Especially when you need to allocate memory for it.

We previously had people requesting to add shader code and other complicated 
testing and rejected that as well, because it just bloats the kernel driver 
unnecessarily.

If we can modify the MES test to not abuse the amdgpu_ring structure and to 
only work with memory from the SA, for example, we could keep this, but not 
really in the current state.

Regards,
Christian.


Regards,
Jack

From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Friday, February 10, 2023 4:08 PM
To: Quan, Evan <mailto:evan.q...@amd.com>; Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>; 
Xiao, Jack <mailto:jack.x...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

Hi Evan,

yeah, exactly that's what this warning should prevent. Temporarily allocating 
buffers for stuff like that is illegal during resume.

I strongly suggest to just remove the MES test. It's abusing the kernel ring 
interface in a way we didn't want anyway and is currently replaced by Shahanks 
work.

Regards,
Christian.
On 10.02.23 at 05:12, Quan, Evan wrote:

[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?
It seems mes created some bo for its self test and freed it up later at the 
final stage of the resuming process.
All these happened before the in_suspend flag cleared. And that triggered the 
call trace.
Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 
[amdgpu]
[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched 
drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit 
fb_sys_fops

Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-10 Thread Christian König

Hi Jack,

On 10.02.23 at 10:51, Xiao, Jack wrote:


[AMD Official Use Only - General]


Hi Christian,

>> Temporarily allocating buffers for stuff like that is illegal during 
resume.

Can you explain **in depth** why it is illegal during the IP late_init 
stage, which is part of resume?




Well no, I don't have the time to explain this to everybody individually.

In my understanding, once GMC is ready the driver can allocate/free 
kernel BOs, and once SDMA is ready, eviction should be ready too. 
What else prevents the driver from doing that during resume?




The drivers are resumed before the core Linux memory management is ready 
to serve allocations. For example, swap isn't turned on yet.


This means that this stuff only worked because we were able to allocate 
memory from the pool which isn't guaranteed in any way.


>> I strongly suggest to just remove the MES test. It's abusing the 
kernel ring interface in a way we didn't want anyway and is currently 
replaced by Shahanks work.


The kernel MES unit test is meaningful and important: it catches MES 
firmware issues early, before they spread to other components such as 
KFD/UMD and complicate the problem. Otherwise, such issues would 
become hard to catch and debug.

Secondly, the MES unit test is self-contained and has no dependencies. 
It is part of the milestone that qualifies MES as ready, indicating 
that it can be delivered to other components, especially during 
bringup. It is like the ring test and IB test, which indicate that gfx 
is ready to go. After the full transition to gfx user queues, the MES 
unit test may be the only unit test that can indicate gfx is ready at 
the very early stage of bringup, when the UMD is not ready.




Alex and I are the maintainers of the driver who decide stuff like 
that, and at least I don't really buy that argument. The ring, IB and 
benchmark tests are in the kernel module because they are simple.

If we have a complicated unit test, like simulating the creation of an 
MES user queue to test the firmware functionality, then that's really 
overkill. Especially when you need to allocate memory for it.

We previously had people requesting to add shader code and other 
complicated testing and rejected that as well, because it just bloats 
the kernel driver unnecessarily.

If we can modify the MES test to not abuse the amdgpu_ring structure 
and to only work with memory from the SA, for example, we could keep 
this, but not really in the current state.


Regards,
Christian.


Regards,

Jack

*From:* Koenig, Christian 
*Sent:* Friday, February 10, 2023 4:08 PM
*To:* Quan, Evan ; Christian König 
; Xiao, Jack ; 
amd-gfx@lists.freedesktop.org; Deucher, Alexander 

*Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
is unavailable


Hi Evan,

yeah, exactly that's what this warning should prevent. Temporarily 
allocating buffers for stuff like that is illegal during resume.


I strongly suggest to just remove the MES test. It's abusing the 
kernel ring interface in a way we didn't want anyway and is currently 
replaced by Shahanks work.


Regards,
Christian.

On 10.02.23 at 05:12, Quan, Evan wrote:

[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?

It seems mes created some bo for its self test and freed it up
later at the final stage of the resuming process.

All these happened before the in_suspend flag cleared. And that
triggered the call trace.

Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425
amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]

[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE)
iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper
drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect
sysimgblt snd_sm

[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp
hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video
i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi

[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G
   W IOE  6.0.0-custom #1

[74084.923146] Hardware name: ASUS System Product Name/PRIME
Z390-A, BIOS 2004 11/02/2021

[74084.931074] Workqueue: events_unbound async_run_entry_fn

[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]

[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d
85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d
c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96
47 00 ebf

[74084.961199] RSP: :bed6812ebb90 EFLAGS: 00010202

[74084.966435] RAX:  RBX: bed6812ebc50 RCX:


[74084.973578] RDX: bed6812ebc70 RSI: bed6812ebc60 RDI:
bed6812ebc50

[74084.980725] RBP: bed6812ebbb8 R08:  R09:
  

RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-10 Thread Xiao, Jack
[AMD Official Use Only - General]

Hi Christian,

>> Temporarily allocating buffers for stuff like that is illegal during resume.

Can you explain *in depth* why it is illegal during the IP late_init stage, 
which is part of resume?
In my understanding, once GMC is ready the driver can allocate/free kernel BOs, 
and once SDMA is ready, eviction should be ready too. What else prevents the 
driver from doing that during resume?

>> I strongly suggest to just remove the MES test. It's abusing the kernel ring 
>> interface in a way we didn't want anyway and is currently replaced by 
>> Shahanks work.

The kernel MES unit test is meaningful and important: it catches MES firmware 
issues early, before they spread to other components such as KFD/UMD and 
complicate the problem. Otherwise, such issues would become hard to catch and 
debug.

Secondly, the MES unit test is self-contained and has no dependencies. It is 
part of the milestone that qualifies MES as ready, indicating that it can be 
delivered to other components, especially during bringup. It is like the ring 
test and IB test, which indicate that gfx is ready to go. After the full 
transition to gfx user queues, the MES unit test may be the only unit test that 
can indicate gfx is ready at the very early stage of bringup, when the UMD is 
not ready.

Regards,
Jack

From: Koenig, Christian 
Sent: Friday, February 10, 2023 4:08 PM
To: Quan, Evan ; Christian König 
; Xiao, Jack ; 
amd-gfx@lists.freedesktop.org; Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

Hi Evan,

yeah, exactly that's what this warning should prevent. Temporarily allocating 
buffers for stuff like that is illegal during resume.

I strongly suggest to just remove the MES test. It's abusing the kernel ring 
interface in a way we didn't want anyway and is currently replaced by Shahanks 
work.

Regards,
Christian.
On 10.02.23 at 05:12, Quan, Evan wrote:

[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?
It seems mes created some bo for its self test and freed it up later at the 
final stage of the resuming process.
All these happened before the in_suspend flag cleared. And that triggered the 
call trace.
Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 
[amdgpu]
[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched 
drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit 
fb_sys_fops syscopyarea sysfillrect sysimgblt snd_sm
[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj 
hid_generic e1000e usbhid ptp uas hid video i2c_i801 ahci pps_core crc32_pclmul 
i2c_smbus usb_storage libahci wmi
[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: GW IOE
  6.0.0-custom #1
[74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 
11/02/2021
[74084.931074] Workqueue: events_unbound async_run_entry_fn
[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 e4 74 08 
49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b e9 
39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
[74084.961199] RSP: :bed6812ebb90 EFLAGS: 00010202
[74084.966435] RAX:  RBX: bed6812ebc50 RCX: 
[74084.973578] RDX: bed6812ebc70 RSI: bed6812ebc60 RDI: bed6812ebc50
[74084.980725] RBP: bed6812ebbb8 R08:  R09: 01ff
[74084.987869] R10: bed6812ebb40 R11:  R12: bed6812ebc70
[74084.995015] R13: bed6812ebc60 R14: 963a2945cc00 R15: 9639c7da5630
[74085.002160] FS:  () GS:963d1dc8() 
knlGS:
[74085.010262] CS:  0010 DS:  ES:  CR0: 80050033
[74085.016016] CR2:  CR3: 000377c0a001 CR4: 003706e0
[74085.023164] DR0:  DR1:  DR2: 
[74085.030307] DR3:  DR6: fffe0ff0 DR7: 0400
[74085.037453] Call Trace:
[74085.039911]  
[74085.042023]  amdgpu_mes_self_test+0x385/0x460 [amdgpu]
[74085.047293]  mes_v11_0_late_init+0x44/0x50 [amdgpu]
[74085.052291]  amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
[74085.058032]  amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
[74085.063187]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[74085.068162]  pci_pm_resume+0x68/0x100
[74085.071836]  ? pci_legacy_resume+0x80/0x80
[74085.075943]  dpm_run_callback+0x4c/0x160
[74085.079873]  device_resume+0xad/0x210
[74085.083546]  async_resume+0x1e/0x40
[74085.087046]  async_run_entry_fn+0x30/0x120
[74085.091152]  process_one_work+0x21a/0x3f0
[74085.095173]  worker_thread+0x50/0x3e0
[74085.098845]  ? process_one_work+0x3f0/0x3f0
[74085.103039]  kthread+0xfa/0x130
[74085.106189]  ? 

Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-10 Thread Christian König

Hi Evan,

yeah, exactly that's what this warning should prevent. Temporarily 
allocating buffers for stuff like that is illegal during resume.


I strongly suggest to just remove the MES test. It's abusing the kernel 
ring interface in a way we didn't want anyway and is currently replaced 
by Shahanks work.


Regards,
Christian.

On 10.02.23 at 05:12, Quan, Evan wrote:


[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?

It seems mes created some bo for its self test and freed it up later 
at the final stage of the resuming process.


All these happened before the in_suspend flag cleared. And that 
triggered the call trace.


Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 
amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]


[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 
gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper 
drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect 
sysimgblt snd_sm


[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp 
hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801 
ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi


[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G 
   W IOE  6.0.0-custom #1


[74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, 
BIOS 2004 11/02/2021


[74084.931074] Workqueue: events_unbound async_run_entry_fn

[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]

[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 
e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc 
cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf


[74084.961199] RSP: :bed6812ebb90 EFLAGS: 00010202

[74084.966435] RAX:  RBX: bed6812ebc50 RCX: 



[74084.973578] RDX: bed6812ebc70 RSI: bed6812ebc60 RDI: 
bed6812ebc50


[74084.980725] RBP: bed6812ebbb8 R08:  R09: 
01ff


[74084.987869] R10: bed6812ebb40 R11:  R12: 
bed6812ebc70


[74084.995015] R13: bed6812ebc60 R14: 963a2945cc00 R15: 
9639c7da5630


[74085.002160] FS:  () GS:963d1dc8() 
knlGS:


[74085.010262] CS:  0010 DS:  ES:  CR0: 80050033

[74085.016016] CR2:  CR3: 000377c0a001 CR4: 
003706e0


[74085.023164] DR0:  DR1:  DR2: 



[74085.030307] DR3:  DR6: fffe0ff0 DR7: 
0400


[74085.037453] Call Trace:

[74085.039911]  

[74085.042023] amdgpu_mes_self_test+0x385/0x460 [amdgpu]

[74085.047293] mes_v11_0_late_init+0x44/0x50 [amdgpu]

[74085.052291] amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]

[74085.058032] amdgpu_device_resume+0xb0/0x2d0 [amdgpu]

[74085.063187] amdgpu_pmops_resume+0x37/0x70 [amdgpu]

[74085.068162]  pci_pm_resume+0x68/0x100

[74085.071836]  ? pci_legacy_resume+0x80/0x80

[74085.075943]  dpm_run_callback+0x4c/0x160

[74085.079873]  device_resume+0xad/0x210

[74085.083546]  async_resume+0x1e/0x40

[74085.087046] async_run_entry_fn+0x30/0x120

[74085.091152] process_one_work+0x21a/0x3f0

[74085.095173]  worker_thread+0x50/0x3e0

[74085.098845]  ? process_one_work+0x3f0/0x3f0

[74085.103039]  kthread+0xfa/0x130

[74085.106189]  ? kthread_complete_and_exit+0x20/0x20

[74085.110993]  ret_from_fork+0x1f/0x30

[74085.114576]  

[74085.116773] ---[ end trace  ]---

BR

Evan

*From:* amd-gfx  *On Behalf Of 
*Christian König

*Sent:* Monday, February 6, 2023 5:00 PM
*To:* Xiao, Jack ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org; Deucher, 
Alexander 
*Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
is unavailable


On 06.02.23 at 09:28, Xiao, Jack wrote:

[AMD Official Use Only - General]

   >> >> It's simply not allowed to free up resources
during suspend since those can't be acquired again during resume.

      >> The in_suspend flag is set at the
beginning of suspend and unset at the end of resume. It can’t
filter the case you mentioned.


   Why not? This is exactly what it should do.

[Jack] Freeing up resources during resume should not hit
the issue you described. But checking only the in_suspend flag would
also treat these cases as warnings.


No, once more: Freeing up or allocating resources between suspend and 
resume is illegal!


If you free up a resource during resume you should absolutely hit 
that, this is intentional!


Regards,
Christian.

Regards,

Jack

*From:* Koenig, Christian 
<mailto:christian.koe...@amd.com>
*Sent:* Monday, February 6, 2023 4:06 PM
*To:* Xiao, Jack  <mailto:jack.x...@amd.com>;
Christian Kö

RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-09 Thread Quan, Evan
[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?
It seems mes created some bo for its self test and freed it up later at the 
final stage of the resuming process.
All these happened before the in_suspend flag cleared. And that triggered the 
call trace.
Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 
[amdgpu]
[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched 
drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit 
fb_sys_fops syscopyarea sysfillrect sysimgblt snd_sm
[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj 
hid_generic e1000e usbhid ptp uas hid video i2c_i801 ahci pps_core crc32_pclmul 
i2c_smbus usb_storage libahci wmi
[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: GW IOE
  6.0.0-custom #1
[74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 
11/02/2021
[74084.931074] Workqueue: events_unbound async_run_entry_fn
[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 e4 74 08 
49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b e9 
39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
[74084.961199] RSP: :bed6812ebb90 EFLAGS: 00010202
[74084.966435] RAX:  RBX: bed6812ebc50 RCX: 
[74084.973578] RDX: bed6812ebc70 RSI: bed6812ebc60 RDI: bed6812ebc50
[74084.980725] RBP: bed6812ebbb8 R08:  R09: 01ff
[74084.987869] R10: bed6812ebb40 R11:  R12: bed6812ebc70
[74084.995015] R13: bed6812ebc60 R14: 963a2945cc00 R15: 9639c7da5630
[74085.002160] FS:  () GS:963d1dc8() 
knlGS:
[74085.010262] CS:  0010 DS:  ES:  CR0: 80050033
[74085.016016] CR2:  CR3: 000377c0a001 CR4: 003706e0
[74085.023164] DR0:  DR1:  DR2: 
[74085.030307] DR3:  DR6: fffe0ff0 DR7: 0400
[74085.037453] Call Trace:
[74085.039911]  
[74085.042023]  amdgpu_mes_self_test+0x385/0x460 [amdgpu]
[74085.047293]  mes_v11_0_late_init+0x44/0x50 [amdgpu]
[74085.052291]  amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
[74085.058032]  amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
[74085.063187]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[74085.068162]  pci_pm_resume+0x68/0x100
[74085.071836]  ? pci_legacy_resume+0x80/0x80
[74085.075943]  dpm_run_callback+0x4c/0x160
[74085.079873]  device_resume+0xad/0x210
[74085.083546]  async_resume+0x1e/0x40
[74085.087046]  async_run_entry_fn+0x30/0x120
[74085.091152]  process_one_work+0x21a/0x3f0
[74085.095173]  worker_thread+0x50/0x3e0
[74085.098845]  ? process_one_work+0x3f0/0x3f0
[74085.103039]  kthread+0xfa/0x130
[74085.106189]  ? kthread_complete_and_exit+0x20/0x20
[74085.110993]  ret_from_fork+0x1f/0x30
[74085.114576]  
[74085.116773] ---[ end trace  ]---

BR
Evan
From: amd-gfx  On Behalf Of Christian 
König
Sent: Monday, February 6, 2023 5:00 PM
To: Xiao, Jack ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

On 06.02.23 at 09:28, Xiao, Jack wrote:

[AMD Official Use Only - General]

  >> >> It's simply not allowed to free up 
resources during suspend since those can't be acquired again during resume.
  >> The in_suspend flag is set at the beginning of 
suspend and unset at the end of resume. It can't filter the case you mentioned.

   Why not? This is exactly what it should do.

[Jack] Freeing up resources during resume should not hit the issue you 
described. But checking only the in_suspend flag would also treat these cases 
as warnings.

No, once more: Freeing up or allocating resources between suspend and resume is 
illegal!

If you free up a resource during resume you should absolutely hit that, this is 
intentional!

Regards,
Christian.


Regards,
Jack

From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Monday, February 6, 2023 4:06 PM
To: Xiao, Jack <mailto:jack.x...@amd.com>; Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

On 06.02.23 at 08:23, Xiao, Jack wrote:

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

can use other flag.

>> It's simply not allowed to free up resources during suspend since those 
&g

Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-06 Thread Christian König

On 06.02.23 at 09:28, Xiao, Jack wrote:


[AMD Official Use Only - General]

   >> >> It's simply not allowed to free up resources 
during suspend since those can't be acquired again during resume.


      >> The in_suspend flag is set at the 
beginning of suspend and unset at the end of resume. It can’t filter 
the case you mentioned.



   Why not? This is exactly what it should do.

[Jack] Freeing up resources during resume should not hit the 
issue you described. But checking only the in_suspend flag would 
also treat these cases as warnings.




No, once more: Freeing up or allocating resources between suspend and 
resume is illegal!


If you free up a resource during resume you should absolutely hit that, 
this is intentional!


Regards,
Christian.


Regards,

Jack

*From:* Koenig, Christian 
*Sent:* Monday, February 6, 2023 4:06 PM
*To:* Xiao, Jack ; Christian König 
; amd-gfx@lists.freedesktop.org; 
Deucher, Alexander 
*Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
is unavailable


On 06.02.23 at 08:23, Xiao, Jack wrote:

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

can use other flag.

>> It's simply not allowed to free up resources during suspend
since those can't be acquired again during resume.

The in_suspend flag is set at the beginning of suspend and unset
at the end of resume. It can’t filter the case you mentioned.


Why not? This is exactly what it should do.

Do you know the root cause of these cases hitting the issue? So
that we can get an exact point to warn the freeing up behavior.


Well, the root cause is programming errors. Between suspending and 
resuming you should neither allocate nor free memory.


Otherwise we can run into trouble. And this check here is one part of 
that, we should probably add another warning during allocation of 
memory. But this here is certainly correct.
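
To illustrate the check being discussed, here is a minimal userspace sketch that warns on both freeing and allocating while an `in_suspend` flag is set. It is only a model under stated assumptions, not the amdgpu code: `demo_dev`, `demo_warn_on`, `demo_buf_alloc` and `demo_buf_free` are hypothetical names, `malloc`/`free` stand in for kernel buffer objects, and the warning is a plain `fprintf` rather than the kernel's `WARN_ON()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device state; only the flag matters here. */
struct demo_dev {
    bool in_suspend;       /* set at the start of suspend, cleared at the end of resume */
    unsigned int warnings; /* how many times the check has fired */
};

/* Userspace stand-in for a kernel warning: report the problem and continue. */
static void demo_warn_on(struct demo_dev *dev, bool cond, const char *what)
{
    if (cond) {
        dev->warnings++;
        fprintf(stderr, "WARNING: %s while in_suspend is set\n", what);
    }
}

/* Allocation path with the additional warning suggested in the thread. */
void *demo_buf_alloc(struct demo_dev *dev, size_t size)
{
    demo_warn_on(dev, dev->in_suspend, "buffer allocation");
    return malloc(size);
}

/* Free path: a NULL buffer returns early, otherwise the check runs first. */
void demo_buf_free(struct demo_dev *dev, void **buf)
{
    if (*buf == NULL)
        return;

    demo_warn_on(dev, dev->in_suspend, "buffer free");
    free(*buf);
    *buf = NULL;
}
```

In this model the flag covers the whole window from the start of suspend to the end of resume, so an allocation or free anywhere in that window, including late in resume, trips the warning.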


Regards,
Christian.

Thanks,

Jack

*From:* Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>
*Sent:* Friday, February 3, 2023 9:20 PM
*To:* Xiao, Jack  <mailto:jack.x...@amd.com>;
Koenig, Christian 
<mailto:christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org;
Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
    *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
DMA is unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since
those can't be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a
warning from that please fix the code which tried to free
something during suspend instead.

Regards,
Christian.

On 03.02.23 at 07:04, Xiao, Jack wrote:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.

Why? In my understanding, the limit was caused by DMA shutdown.

Regards,

Jack

*From:* Koenig, Christian 
<mailto:christian.koe...@amd.com>
*Sent:* Thursday, February 2, 2023 7:43 PM
*To:* Xiao, Jack 
<mailto:jack.x...@amd.com>; amd-gfx@lists.freedesktop.org;
Deucher, Alexander 
        <mailto:alexander.deuc...@amd.com>
*Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers
when DMA is unavailable

Big NAK to this! This warning is not related in any way to the
hw state.

It's simply illegal to free up memory during suspend.

Regards,

Christian.



*From:* Xiao, Jack
*Sent:* Thursday, February 2, 2023 10:54
*To:* amd-gfx@lists.freedesktop.org; Deucher, Alexander;
Koenig, Christian
*Cc:* Xiao, Jack
*Subject:* [PATCH] drm/amdgpu: only WARN freeing buffers when
DMA is unavailable

Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm

RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-06 Thread Xiao, Jack
[AMD Official Use Only - General]

>> >> It's simply not allowed to free up resources during suspend since
>> >> those can't be acquired again during resume.
>> The in_suspend flag is set at the beginning of suspend and unset at
>> the end of resume. It can't filter the case you mentioned.

> Why not? This is exactly what it should do.

[Jack] If resources are freed during resume, that should not hit the issue you
described. But checking only the in_suspend flag would report these cases as
warnings too.

Regards,
Jack

From: Koenig, Christian 
Sent: Monday, February 6, 2023 4:06 PM
To: Xiao, Jack; Christian König; amd-gfx@lists.freedesktop.org;
Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

On 06.02.23 at 08:23, Xiao, Jack wrote:

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

We can use another flag.

>> It's simply not allowed to free up resources during suspend since those 
>> can't be acquired again during resume.
The in_suspend flag is set at the beginning of suspend and unset at the end of 
resume. It can't filter the case you mentioned.

Why not? This is exactly what it should do.

Do you know the root cause of these cases hitting the issue? So that we can get 
an exact point to warn the freeing up behavior.

Well the root cause are programming errors. See between suspending and resuming 
you should not allocate nor free memory.

Otherwise we can run into trouble. And this check here is one part of that, we 
should probably add another warning during allocation of memory. But this here 
is certainly correct.

Regards,
Christian.


Thanks,
Jack

From: Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack <mailto:jack.x...@amd.com>; Koenig, Christian 
<mailto:christian.koe...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't 
be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning from 
that please fix the code which tried to free something during suspend instead.

Regards,
Christian.
On 03.02.23 at 07:04, Xiao, Jack wrote:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <mailto:jack.x...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is
unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.


From: Xiao, Jack <jack.x...@amd.com>
Sent: Thursday, February 2, 2023 10:54
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
<alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>
Cc: Xiao, Jack <jack.x...@amd.com>
Subject: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <jack.x...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
--
2.37.3




Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-06 Thread Christian König

On 06.02.23 at 08:23, Xiao, Jack wrote:


[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

We can use another flag.

>> It's simply not allowed to free up resources during suspend since 
those can't be acquired again during resume.


The in_suspend flag is set at the beginning of suspend and unset at 
the end of resume. It can’t filter the case you mentioned.




Why not? This is exactly what it should do.

Do you know the root cause of these cases hitting the issue? So that 
we can get an exact point to warn the freeing up behavior.




Well the root cause are programming errors. See between suspending and 
resuming you should not allocate nor free memory.


Otherwise we can run into trouble. And this check here is one part of 
that, we should probably add another warning during allocation of 
memory. But this here is certainly correct.


Regards,
Christian.


Thanks,

Jack

*From:* Christian König 
*Sent:* Friday, February 3, 2023 9:20 PM
*To:* Xiao, Jack; Koenig, Christian;
amd-gfx@lists.freedesktop.org; Deucher, Alexander
*Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
is unavailable


Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since 
those can't be acquired again during resume.


We had a couple of cases now where this was wrong. If you get a 
warning from that please fix the code which tried to free something 
during suspend instead.


Regards,
Christian.

On 03.02.23 at 07:04, Xiao, Jack wrote:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.

Why? In my understanding, the limit was caused by DMA shutdown.

Regards,

Jack

*From:* Koenig, Christian 
<mailto:christian.koe...@amd.com>
*Sent:* Thursday, February 2, 2023 7:43 PM
*To:* Xiao, Jack  <mailto:jack.x...@amd.com>;
amd-gfx@lists.freedesktop.org; Deucher, Alexander
<mailto:alexander.deuc...@amd.com>
*Subject:* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when
DMA is unavailable

Big NAK to this! This warning is not related in any way to the hw
state.

It's simply illegal to free up memory during suspend.

Regards,

Christian.



*From:* Xiao, Jack
*Sent:* Thursday, February 2, 2023 10:54
*To:* amd-gfx@lists.freedesktop.org; Deucher, Alexander;
Koenig, Christian
*Cc:* Xiao, Jack
*Subject:* [PATCH] drm/amdgpu: only WARN freeing buffers when DMA
is unavailable

Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
-- 
2.37.3
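
The reply at the top of this message also suggests a matching warning
on the allocation side. The following is only a sketch of what such a
symmetric check could look like, modelled in user space. struct
pm_window, checked_alloc() and checked_free() are hypothetical names,
not amdgpu functions; malloc()/free() stand in for kernel buffer
allocation, and plain counters stand in for WARN_ON().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the window between suspend start and resume end. */
struct pm_window {
	bool in_suspend;          /* stands in for adev->in_suspend */
	unsigned alloc_warnings;  /* counts what would be WARN_ON() hits on alloc */
	unsigned free_warnings;   /* counts what would be WARN_ON() hits on free */
};

/* Allocate, but record a warning if called inside the suspend window. */
void *checked_alloc(struct pm_window *w, size_t size)
{
	if (w->in_suspend)
		w->alloc_warnings++;
	return malloc(size);
}

/* Free, but record a warning if called inside the suspend window. */
void checked_free(struct pm_window *w, void *p)
{
	if (p == NULL)
		return;           /* same early return as amdgpu_bo_free_kernel() */
	if (w->in_suspend)
		w->free_warnings++;
	free(p);
}
```

With both checks in place, an allocation and a free issued while
in_suspend is set each bump their own counter, while the same calls
outside the window stay silent.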




RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-05 Thread Xiao, Jack
[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

We can use another flag.

>> It's simply not allowed to free up resources during suspend since those 
>> can't be acquired again during resume.
The in_suspend flag is set at the beginning of suspend and unset at the end of 
resume. It can't filter the case you mentioned.
Do you know the root cause of these cases hitting the issue? So that we can get 
an exact point to warn the freeing up behavior.

Thanks,
Jack

From: Christian König 
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack; Koenig, Christian; amd-gfx@lists.freedesktop.org;
Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't 
be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning from 
that please fix the code which tried to free something during suspend instead.

Regards,
Christian.
On 03.02.23 at 07:04, Xiao, Jack wrote:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <mailto:jack.x...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is
unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.


From: Xiao, Jack <jack.x...@amd.com>
Sent: Thursday, February 2, 2023 10:54
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
<alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>
Cc: Xiao, Jack <jack.x...@amd.com>
Subject: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <jack.x...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
--
2.37.3



Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-03 Thread Christian König

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those 
can't be acquired again during resume.


We had a couple of cases now where this was wrong. If you get a warning 
from that please fix the code which tried to free something during 
suspend instead.


Regards,
Christian.

On 03.02.23 at 07:04, Xiao, Jack wrote:


[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.

Why? In my understanding, the limit was caused by DMA shutdown.

Regards,

Jack

*From:* Koenig, Christian 
*Sent:* Thursday, February 2, 2023 7:43 PM
*To:* Xiao, Jack; amd-gfx@lists.freedesktop.org;
Deucher, Alexander
*Subject:* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA
is unavailable


Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,

Christian.



*From:* Xiao, Jack
*Sent:* Thursday, February 2, 2023 10:54
*To:* amd-gfx@lists.freedesktop.org; Deucher, Alexander;
Koenig, Christian
*Cc:* Xiao, Jack
*Subject:* [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is
unavailable

Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
--
2.37.3
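
The advice in this message, to fix the caller rather than relax the
warning, usually comes down to keeping the buffer alive across the
suspend/resume window. Below is a minimal user-space sketch of that
pattern, not driver code. struct scratch_owner and the owner_*()
functions are hypothetical names; malloc()/free() stand in for kernel
BO allocation, and a counter stands in for WARN_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical component that owns one scratch buffer. */
struct scratch_owner {
	void *buf;          /* allocated once at init, kept across suspend */
	size_t size;
	bool in_suspend;    /* models the window from suspend start to resume end */
	unsigned warnings;  /* counts rule violations in place of WARN_ON() */
};

int owner_init(struct scratch_owner *o, size_t size)
{
	o->buf = malloc(size);
	o->size = size;
	o->in_suspend = false;
	o->warnings = 0;
	return o->buf ? 0 : -1;
}

/* Free helper carrying the same kind of check as the WARN_ON discussed here. */
void owner_free_buf(struct scratch_owner *o)
{
	if (o->buf == NULL)
		return;
	if (o->in_suspend)
		o->warnings++;
	free(o->buf);
	o->buf = NULL;
}

void owner_suspend(struct scratch_owner *o)
{
	o->in_suspend = true;
	/* Deliberately no owner_free_buf() here: the buffer stays allocated. */
}

void owner_resume(struct scratch_owner *o)
{
	/* Nothing has to be re-acquired, so resume cannot fail on allocation. */
	assert(o->buf != NULL);
	o->in_suspend = false;
}

void owner_fini(struct scratch_owner *o)
{
	owner_free_buf(o);  /* teardown runs outside the suspend window */
}
```

A full init, suspend, resume, fini cycle leaves the warning counter at
zero. Calling owner_free_buf() from owner_suspend() instead would bump
it, which corresponds to the kind of caller this message asks to be
fixed.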



RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-02 Thread Xiao, Jack
[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian 
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack; amd-gfx@lists.freedesktop.org; Deucher, Alexander
Subject: RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is
unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.


From: Xiao, Jack <jack.x...@amd.com>
Sent: Thursday, February 2, 2023 10:54
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
<alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>
Cc: Xiao, Jack <jack.x...@amd.com>
Subject: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <jack.x...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
--
2.37.3


RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-02 Thread Koenig, Christian
Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.


From: Xiao, Jack
Sent: Thursday, February 2, 2023 10:54
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander; Koenig, Christian
Cc: Xiao, Jack
Subject: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
--
2.37.3



[PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-02 Thread Jack Xiao
Reduce warnings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
-- 
2.37.3
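
For readers following the thread, the effect of this hunk can be
checked outside the kernel. The following is a minimal user-space
sketch, not driver code. struct dev_state and the function names are
made up for illustration; the two booleans merely stand in for
adev->in_suspend and the SDMA block's status.hw. It walks the four
state combinations and counts where the patched condition stays silent
although the original one warns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two pieces of device state the hunk reads. */
struct dev_state {
	bool in_suspend;  /* models adev->in_suspend */
	bool sdma_hw_up;  /* models ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw */
};

/* Condition before the patch: warn on any free while in_suspend is set. */
bool warn_before_patch(const struct dev_state *s)
{
	return s->in_suspend;
}

/* Condition after the patch: warn only if SDMA is also down. */
bool warn_after_patch(const struct dev_state *s)
{
	return s->in_suspend && !s->sdma_hw_up;
}

/* Print the truth table and return how many states lose their warning. */
int count_silenced_states(void)
{
	int silenced = 0;

	for (int i = 0; i < 4; i++) {
		struct dev_state s = {
			.in_suspend = (i & 2) != 0,
			.sdma_hw_up = (i & 1) != 0,
		};
		bool before = warn_before_patch(&s);
		bool after = warn_after_patch(&s);

		/* The patch only removes warnings, it never adds one. */
		assert(!after || before);

		printf("in_suspend=%d sdma_hw_up=%d before=%d after=%d\n",
		       s.in_suspend, s.sdma_hw_up, before, after);
		if (before && !after)
			silenced++;
	}
	return silenced;
}
```

count_silenced_states() returns 1: the only row that changes is
in_suspend set with SDMA still up, that is, a free inside the
suspend/resume window that no longer triggers the warning.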