Re: BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address: 0000000000000008

2024-01-24 Thread Ma, Jun
Hi Mirsad,


On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:
> Hi, Ma Jun,
> 
> Normally, I would reply under the quoted text, but I will adjust to your 
> convention.
> 
> I have just discovered that your patch causes Ubuntu 22.04 LTS GNOME XWayland 
> session
> to block at typing password and ENTER in the graphical logon screen (tested 
> several times).
> 
This problem is not caused by my patch.
Based on your syslog, it looks more like a scheduling issue.
I just saw a similar problem; please refer to the link below:
https://gitlab.freedesktop.org/drm/amd/-/issues/3124

Regards,
Ma Jun
> After that, I was not able to even log from another box with ssh, or the 
> session would
> block (tested one time, second time too, third time it passed after I 
> connected before
> attempt to login on XWayland console).
> 
> You might find useful syslog and dmesg of the freeze on this link (they were 
> +100K):
> 
> https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/
> 
> The exact applied patch was this:
> 
> marvin@defiant:~/linux/kernel/linux_torvalds$ git diff
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 73f6d7e72c73..6ef333df9adf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct 
> amdgpu_device *adev)
>
>   if (!amdgpu_sriov_vf(adev)) {
>   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", 
> ucode_prefix);
> -   err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
> -   /* don't check this.  There are apparently firmwares in the 
> wild with
> -* incorrect size in the header
> -*/
> -   if (err == -ENODEV)
> -   goto out;
> +   err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
>   if (err)
> -   dev_dbg(adev->dev,
> -   "gfx10: amdgpu_ucode_request() failed 
> \"%s\"\n",
> -   fw_name);
> +   goto out;
> +
> +   /* don't validate this firmware.  There are apparently 
> firmwares
> +* in the wild with incorrect size in the header
> +*/
>   rlc_hdr = (const struct rlc_firmware_header_v2_0 
> *)adev->gfx.rlc_fw->data;
>   version_major = 
> le16_to_cpu(rlc_hdr->header.header_version_major);
>   version_minor = 
> le16_to_cpu(rlc_hdr->header.header_version_minor);
> marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.7.0-xway-09721-g61da593f4458 x86_64
> marvin@defiant:~/linux/kernel/linux_torvalds$
> 
> So, there seems to be a problem with the way the patch affects XWayland.
> 
> Checked multiple times the exact commit with and without the diff.
> 
> Hope this helps, because I am not familiar with the amdgpu driver.
> 
> Best regards,
> Mirsad Todorovac
> 
> On 1/22/24 09:34, Ma, Jun wrote:
>> Perhaps similar to the problem I encountered earlier, you can
>> try the following patch
>>
>> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html
>>
>> Regards,
>> Ma Jun
>>
>> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>>> Hi,
>>>
>>> The last email did not reach most of the recipients due to the banned .xz
>>> attachment.
>>>
>>> As the .config is too big to send inline or uncompressed either, I will 
>>> omit it in this
>>> attempt. In the meantime, I had some success in decoding the stack trace, 
>>> but sadly not
>>> complete.
>>>
>>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>>
>>> The platform is Ubuntu 22.04 LTS.
>>>
>>> Complete list of hardware and .config is available here:
>>>
>>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7/
>>>
>>> Best regards,
>>> Mirsad
>>>
>>> ---
>>> kernel: [5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> kernel: [5.576707] #PF: supervisor read access in kernel mode
>>> kernel: [5.576710] #PF: error_code(0x) - not-present page
>>> kernel: [5.576712] PGD 0 

Re: BUG [RESEND]: kernel NULL pointer dereference, address: 0000000000000008

2024-01-22 Thread Ma, Jun
Perhaps similar to the problem I encountered earlier, you can
try the following patch

https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html

Regards,
Ma Jun

On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
> Hi,
> 
> The last email did not reach most of the recipients due to the banned .xz
> attachment.
> 
> As the .config is too big to send inline or uncompressed either, I will omit 
> it in this
> attempt. In the meantime, I had some success in decoding the stack trace, but 
> sadly not
> complete.
> 
> I don't think this Oops is deterministic, but I am working on a reproducer.
> 
> The platform is Ubuntu 22.04 LTS.
> 
> Complete list of hardware and .config is available here:
> 
> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7/
> 
> Best regards,
> Mirsad
> 
> ---
> kernel: [5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
> kernel: [5.576707] #PF: supervisor read access in kernel mode
> kernel: [5.576710] #PF: error_code(0x) - not-present page
> kernel: [5.576712] PGD 0 P4D 0
> kernel: [5.576715] Oops:  [#1] PREEMPT SMP NOPTI
> kernel: [5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 
> 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
> kernel: [5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG 
> Lightning, BIOS 1.21 04/26/2023
> kernel: [5.576726] RIP: 0010:gfx_v10_0_early_init 
> (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 
> 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff 
> <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
> All code
> ========
>    0: 8d 55 a8                lea    -0x58(%rbp),%edx
>    3: 4c 89 ff                mov    %r15,%rdi
>    6: e8 e4 83 ec ff          call   0xffec83ef
>    b: 41 89 c2                mov    %eax,%r10d
>    e: 83 f8 ed                cmp    $0xffed,%eax
>   11: 0f 84 b3 fd ff ff       je     0xfdca
>   17: 85 c0                   test   %eax,%eax
>   19: 74 05                   je     0x20
>   1b: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>   20: 49 8b 87 08 87 01 00    mov    0x18708(%r15),%rax
>   27: 4c 89 ff                mov    %r15,%rdi
>   2a:* 48 8b 40 08            mov    0x8(%rax),%rax   <-- trapping instruction
>   2e: 0f b7 50 0a             movzwl 0xa(%rax),%edx
>   32: 0f b7 70 08             movzwl 0x8(%rax),%esi
>   36: e8 e4 42 fb ff          call   0xfffb431f
>   3b: 41 89 c2                mov    %eax,%r10d
>   3e: 85 c0                   test   %eax,%eax
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 48 8b 40 08             mov    0x8(%rax),%rax
>    4: 0f b7 50 0a             movzwl 0xa(%rax),%edx
>    8: 0f b7 70 08             movzwl 0x8(%rax),%esi
>    c: e8 e4 42 fb ff          call   0xfffb42f5
>   11: 41 89 c2                mov    %eax,%r10d
>   14: 85 c0                   test   %eax,%eax
> kernel: [5.576878] RSP: 0018:a5b3c103f720 EFLAGS: 00010282
> kernel: [5.576881] RAX:  RBX: c1d73489 RCX: 
> 
> kernel: [5.576884] RDX:  RSI:  RDI: 
> 91ae4fa8
> kernel: [5.576886] RBP: a5b3c103f7b0 R08:  R09: 
> 
> kernel: [5.576889] R10: ffea R11:  R12: 
> 91ae4fa986e8
> kernel: [5.576892] R13: 91ae4fa986d8 R14: 91ae4fa986f8 R15: 
> 91ae4fa8
> kernel: [5.576895] FS:  7fdaa343c8c0() GS:91bd5844() 
> knlGS:
> kernel: [5.576898] CS:  0010 DS:  ES:  CR0: 80050033
> kernel: [5.576900] CR2: 0008 CR3: 0001222d CR4: 
> 00750ef0
> kernel: [5.576903] PKRU: 5554
> kernel: [5.576905] Call Trace:
> kernel: [5.576907]  
> kernel: [5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> kernel: [5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 
> arch/x86/kernel/dumpstack.c:434)
> kernel: [5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
> kernel: [5.576921] ? srso_alias_return_thunk 
> (arch/x86/lib/retpoline.S:181)
> kernel: [
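
Note on the faulting address: the trapping instruction is `mov 0x8(%rax),%rax`, and the subject line gives the fault address as 0000000000000008, which is what a load at offset 8 from a NULL base would produce. The line reported in the RIP (gfx_v10_0.c:4009) presumably corresponds to the `adev->gfx.rlc_fw->data` access visible in the patch context. The following is a minimal userspace sketch, not kernel code: `struct mock_firmware` and `data_load_address()` are hypothetical names, and the layout (a `size_t` followed by a data pointer) is an assumption about how the firmware descriptor is laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's firmware descriptor.
 * Assumption: a size_t member followed directly by the data pointer.
 */
struct mock_firmware {
	size_t size;
	const uint8_t *data;
};

/*
 * Address that a load of fw->data would touch if fw lived at `base`.
 * Only arithmetic is performed; nothing is dereferenced.
 */
static uintptr_t data_load_address(uintptr_t base)
{
	/* Guard against wrap-around for very large bases. */
	assert(base <= UINTPTR_MAX - offsetof(struct mock_firmware, data));
	return base + offsetof(struct mock_firmware, data);
}
```

On a 64-bit build, `data_load_address(0)` evaluates to 8, so a NULL firmware pointer followed by a read of its data member would fault at address 8 rather than 0.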

[PATCH] drm/buddy: Fix drm buddy info output format

2023-08-03 Thread Ma Jun
[1] Change pages to blocks to avoid confusion.
[2] Fix output format to align the output info.

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/drm_buddy.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 3d1f50f481cf..ef3dd15c334a 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -781,15 +781,15 @@ void drm_buddy_print(struct drm_buddy *mm, struct 
drm_printer *p)
count++;
}
 
-   drm_printf(p, "order-%d ", order);
+   drm_printf(p, "order-%2d ", order);
 
free = count * (mm->chunk_size << order);
if (free < SZ_1M)
-   drm_printf(p, "free: %lluKiB", free >> 10);
+   drm_printf(p, "free: %8llu KiB", free >> 10);
else
-   drm_printf(p, "free: %lluMiB", free >> 20);
+   drm_printf(p, "free: %8llu MiB", free >> 20);
 
-   drm_printf(p, ", pages: %llu\n", count);
+   drm_printf(p, ", blocks: %llu\n", count);
}
 }
 EXPORT_SYMBOL(drm_buddy_print);
-- 
2.34.1
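
For illustration, the effect of the new format strings can be tried in userspace. This is only a sketch under stated assumptions: `demo_format_order()` and `DEMO_SZ_1M` are hypothetical names, `snprintf()` stands in for `drm_printf()`, and the three format strings from the patch are merged into one call.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_SZ_1M (1ULL << 20)

/*
 * Build one output line using the field widths from the patch:
 * "%2d" for the order and "%8llu" for the free size, so that columns
 * line up across orders. Returns what snprintf() returns.
 */
static int demo_format_order(char *buf, size_t len, int order,
			     unsigned long long chunk_size,
			     unsigned long long count)
{
	unsigned long long free_bytes = count * (chunk_size << order);
	int below_1m = free_bytes < DEMO_SZ_1M;

	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "order-%2d free: %8llu %s, blocks: %llu\n",
			order,
			below_1m ? free_bytes >> 10 : free_bytes >> 20,
			below_1m ? "KiB" : "MiB",
			count);
}
```

Because both numeric fields are padded to a fixed width, a single-digit order and a two-digit order produce lines of the same length as long as the size fits in eight digits, which is the alignment the commit message refers to.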



Re: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields

2023-05-31 Thread Ma, Jun



On 5/31/2023 4:45 PM, Christian König wrote:
> Am 31.05.23 um 08:20 schrieb Chen, Guchun:
>> [Public]
>>
>>> -----Original Message-----
>>> From: amd-gfx  On Behalf Of Ma
>>> Jun
>>> Sent: Wednesday, May 31, 2023 1:31 PM
>>> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig,
>>> Christian 
>>> Cc: Ma, Jun 
>>> Subject: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
>>>
>>> Remove redundant assignment code for ttm->caching as it's overwritten
>>>
>>> just a few lines later.
>> Please drop the blank line in above message. With it fixed, the patch is: 
>> Reviewed-by: Guchun Chen 
> 
> Seconded, I'm going to pick that patch up and submit it to drm-misc-next 
> with the commit message fixed.
> 

Thanks for the help.

Regards,
Ma Jun
> Regards,
> Christian.
> 
>>
>> Regards,
>> Guchun
>>
>>> v2:
>>>   - Update the commit message.
>>>
>>> Signed-off-by: Ma Jun 
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>>>   1 file changed, 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>> index 02b812dacc5d..45a44544b656 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>>   unsigned long extra_pages)
>>>   {
>>>ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) +
>>> extra_pages;
>>> - ttm->caching = ttm_cached;
>>>ttm->page_flags = page_flags;
>>>ttm->dma_address = NULL;
>>>ttm->swap_storage = NULL;
>>> --
>>> 2.34.1
> 


Re: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields

2023-05-31 Thread Ma, Jun



On 5/31/2023 2:20 PM, Chen, Guchun wrote:
> [Public]
> 
>> -----Original Message-----
>> From: amd-gfx  On Behalf Of Ma
>> Jun
>> Sent: Wednesday, May 31, 2023 1:31 PM
>> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Koenig,
>> Christian 
>> Cc: Ma, Jun 
>> Subject: [PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields
>>
>> Remove redundant assignment code for ttm->caching as it's overwritten
>>
>> just a few lines later.
> 
> Please drop the blank line in above message. With it fixed, the patch is: 
> Reviewed-by: Guchun Chen 
> 
Thanks for the review. I will fix it when pushing.

Regards,
Ma Jun
> Regards,
> Guchun
> 
>> v2:
>>  - Update the commit message.
>>
>> Signed-off-by: Ma Jun 
>> ---
>>  drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 02b812dacc5d..45a44544b656 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>  unsigned long extra_pages)
>>  {
>>   ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) +
>> extra_pages;
>> - ttm->caching = ttm_cached;
>>   ttm->page_flags = page_flags;
>>   ttm->dma_address = NULL;
>>   ttm->swap_storage = NULL;
>> --
>> 2.34.1
> 


[PATCH v2] drm/ttm: Remove redundant code in ttm_tt_init_fields

2023-05-30 Thread Ma Jun
Remove redundant assignment code for ttm->caching as it's overwritten

just a few lines later.

v2:
 - Update the commit message.

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 02b812dacc5d..45a44544b656 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
   unsigned long extra_pages)
 {
ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + 
extra_pages;
-   ttm->caching = ttm_cached;
ttm->page_flags = page_flags;
ttm->dma_address = NULL;
ttm->swap_storage = NULL;
-- 
2.34.1



Re: [PATCH] drm/ttm: Remove redundant code in ttm_tt_init_fields

2023-05-30 Thread Ma, Jun



On 5/30/2023 4:59 PM, Christian König wrote:
> Am 29.05.23 um 11:28 schrieb Ma Jun:
>> Remove redundant assignment code for ttm->caching
> 
> The explanation is missing why this is redundant, e.g. something like 
> "this is overwritten just a few lines later"..
> 

Thanks for review. Will update the commit message in v2

Regards,
Ma Jun

> Apart from that, looks good to me,
> Christian.
> 
>>
>> Signed-off-by: Ma Jun 
>> ---
>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 02b812dacc5d..45a44544b656 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>> unsigned long extra_pages)
>>   {
>>  ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + 
>> extra_pages;
>> -ttm->caching = ttm_cached;
>>  ttm->page_flags = page_flags;
>>  ttm->dma_address = NULL;
>>  ttm->swap_storage = NULL;
> 


[PATCH] drm/ttm: Remove redundant code in ttm_tt_init_fields

2023-05-29 Thread Ma Jun
Remove redundant assignment code for ttm->caching

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 02b812dacc5d..45a44544b656 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -143,7 +143,6 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
   unsigned long extra_pages)
 {
ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + 
extra_pages;
-   ttm->caching = ttm_cached;
ttm->page_flags = page_flags;
ttm->dma_address = NULL;
ttm->swap_storage = NULL;
-- 
2.34.1



Re: [PATCH 1/2] drm/ttm: Check ttm_debugfs_root before creating files under it

2023-01-15 Thread Ma, Jun



On 1/13/2023 5:37 PM, Christian König wrote:
> Am 13.01.23 um 06:34 schrieb Ma Jun:
>> Check the ttm_debugfs_root before creating files under it.
>> If the ttm_debugfs_root is NULL, all the files created for
>> ttm/ will be placed in /sys/kernel/debug/ instead of
>> /sys/kernel/debug/ttm/
> 
> Well NAK for upstreaming. Why should ttm_debugfs_root be NULL here?
> 

In my case, when the ttm/ removal fails during amdgpu uninstall and then
we try to modprobe the amdgpu again, the ttm_debugfs_root will be NULL
because the ttm/ already exists.

Regards,
Ma Jun

> Regards,
> Christian.
> 
>>
>> Signed-off-by: Ma Jun 
>> ---
>>   drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
>>   drivers/gpu/drm/ttm/ttm_pool.c   | 10 ++
>>   drivers/gpu/drm/ttm/ttm_tt.c |  5 +++--
>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
>> b/drivers/gpu/drm/ttm/ttm_device.c
>> index e7147e304637..967bc2244df3 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -105,7 +105,8 @@ static int ttm_global_init(void)
>>  INIT_LIST_HEAD(&glob->device_list);
>>  atomic_set(&glob->bo_count, 0);
>>   
>> -debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
>> +if(ttm_debugfs_root)
>> +debugfs_create_atomic_t("buffer_objects", 0444, 
>> ttm_debugfs_root,
>>  &glob->bo_count);
>>   out:
>>  if (ret && ttm_debugfs_root)
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>> index 21b61631f73a..d95a65f759df 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -713,10 +713,12 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>>  }
>>   
>>   #ifdef CONFIG_DEBUG_FS
>> -debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
>> -&ttm_pool_debugfs_globals_fops);
>> -debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, NULL,
>> -&ttm_pool_debugfs_shrink_fops);
>> +if(ttm_debugfs_root) {
>> +debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
>> +&ttm_pool_debugfs_globals_fops);
>> +debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, 
>> NULL,
>> +&ttm_pool_debugfs_shrink_fops);
>> +}
>>   #endif
>>   
>>  mm_shrinker.count_objects = ttm_pool_shrinker_count;
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index d505603930a7..fec443494ef0 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -394,8 +394,9 @@ DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
>>   void ttm_tt_mgr_init(unsigned long num_pages, unsigned long 
>> num_dma32_pages)
>>   {
>>   #ifdef CONFIG_DEBUG_FS
>> -debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
>> -&ttm_tt_debugfs_shrink_fops);
>> +if(ttm_debugfs_root)
>> +debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
>> +&ttm_tt_debugfs_shrink_fops);
>>   #endif
>>   
>>  if (!ttm_pages_limit)
> 


Re: [PATCH 2/2] drm/ttm: Use debugfs_remove_recursive to remove ttm directory

2023-01-15 Thread Ma, Jun



On 1/13/2023 5:38 PM, Christian König wrote:
> Am 13.01.23 um 06:34 schrieb Ma Jun:
>> Use debugfs_remove_recursive to remove the /sys/kernel/debug/ttm
>> directory for better compatibility, because debugfs_remove fails
>> on older kernels.
> 
> Again NAK for upstreaming.
> 
> The upstream kernel is made for the newest kernel version and should not 
> contain any compatibility handling for older kernels.
> 
Yes, generally so.

But debugfs_remove_recursive() and debugfs_remove() are the same function now.
debugfs_remove_recursive() is used here so that we don't need to make a kcl
patch for it.

Regards,
Ma Jun

> Christian.
> 
>>
>> Signed-off-by: Ma Jun 
>> ---
>>   drivers/gpu/drm/ttm/ttm_device.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
>> b/drivers/gpu/drm/ttm/ttm_device.c
>> index 967bc2244df3..590297123bb2 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -55,7 +55,7 @@ static void ttm_global_release(void)
>>  goto out;
>>   
>>  ttm_pool_mgr_fini();
>> -debugfs_remove(ttm_debugfs_root);
>> +debugfs_remove_recursive(ttm_debugfs_root);
>>   
>>  __free_page(glob->dummy_read_page);
>>  memset(glob, 0, sizeof(*glob));
> 


[PATCH 2/2] drm/ttm: Use debugfs_remove_recursive to remove ttm directory

2023-01-12 Thread Ma Jun
Use debugfs_remove_recursive to remove the /sys/kernel/debug/ttm
directory for better compatibility, because debugfs_remove fails
on older kernels.

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 967bc2244df3..590297123bb2 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -55,7 +55,7 @@ static void ttm_global_release(void)
goto out;
 
ttm_pool_mgr_fini();
-   debugfs_remove(ttm_debugfs_root);
+   debugfs_remove_recursive(ttm_debugfs_root);
 
__free_page(glob->dummy_read_page);
memset(glob, 0, sizeof(*glob));
-- 
2.25.1



[PATCH 1/2] drm/ttm: Check ttm_debugfs_root before creating files under it

2023-01-12 Thread Ma Jun
Check the ttm_debugfs_root before creating files under it.
If the ttm_debugfs_root is NULL, all the files created for
ttm/ will be placed in /sys/kernel/debug/ instead of
/sys/kernel/debug/ttm/

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
 drivers/gpu/drm/ttm/ttm_pool.c   | 10 ++
 drivers/gpu/drm/ttm/ttm_tt.c |  5 +++--
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index e7147e304637..967bc2244df3 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -105,7 +105,8 @@ static int ttm_global_init(void)
INIT_LIST_HEAD(&glob->device_list);
atomic_set(&glob->bo_count, 0);
 
-   debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
+   if(ttm_debugfs_root)
+   debugfs_create_atomic_t("buffer_objects", 0444, 
ttm_debugfs_root,
&glob->bo_count);
 out:
if (ret && ttm_debugfs_root)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 21b61631f73a..d95a65f759df 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -713,10 +713,12 @@ int ttm_pool_mgr_init(unsigned long num_pages)
}
 
 #ifdef CONFIG_DEBUG_FS
-   debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
-   &ttm_pool_debugfs_globals_fops);
-   debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, NULL,
-   &ttm_pool_debugfs_shrink_fops);
+   if(ttm_debugfs_root) {
+   debugfs_create_file("page_pool", 0444, ttm_debugfs_root, NULL,
+   &ttm_pool_debugfs_globals_fops);
+   debugfs_create_file("page_pool_shrink", 0400, ttm_debugfs_root, 
NULL,
+   &ttm_pool_debugfs_shrink_fops);
+   }
 #endif
 
mm_shrinker.count_objects = ttm_pool_shrinker_count;
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index d505603930a7..fec443494ef0 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -394,8 +394,9 @@ DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
 void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages)
 {
 #ifdef CONFIG_DEBUG_FS
-   debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
-   &ttm_tt_debugfs_shrink_fops);
+   if(ttm_debugfs_root)
+   debugfs_create_file("tt_shrink", 0400, ttm_debugfs_root, NULL,
+   &ttm_tt_debugfs_shrink_fops);
 #endif
 
if (!ttm_pages_limit)
-- 
2.25.1
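
The behaviour described in the commit message follows from how debugfs treats a NULL parent: the file is created in the root of the debugfs filesystem. A small userspace model of that placement rule, and of the guard the patch adds, might look like this. All names here (`demo_debugfs_path()`, `demo_debugfs_path_checked()`, `DEMO_DEBUGFS_ROOT`) are hypothetical, and the mount point /sys/kernel/debug is taken from the commit message.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_DEBUGFS_ROOT "/sys/kernel/debug"

/*
 * Model of where a debugfs file ends up. `parent_dir` is a directory
 * name under the debugfs root, or NULL for "no parent", in which case
 * the file lands directly in the root.
 */
static int demo_debugfs_path(char *buf, size_t len, const char *parent_dir,
			     const char *name)
{
	assert(buf != NULL && len > 0 && name != NULL);

	if (parent_dir == NULL)
		return snprintf(buf, len, "%s/%s", DEMO_DEBUGFS_ROOT, name);
	return snprintf(buf, len, "%s/%s/%s", DEMO_DEBUGFS_ROOT, parent_dir, name);
}

/* Guarded variant in the spirit of the patch: create nothing without a parent. */
static int demo_debugfs_path_checked(char *buf, size_t len,
				     const char *parent_dir, const char *name)
{
	assert(buf != NULL && len > 0);

	if (parent_dir == NULL) {
		buf[0] = '\0';
		return 0;
	}
	return demo_debugfs_path(buf, len, parent_dir, name);
}
```

With a NULL parent the unguarded helper yields a path directly under the debugfs root, while the guarded one produces no path at all, which mirrors the difference between the code before and after the patch.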



[PATCH V2] drm/plane-helper: Add the missing declaration of drm_atomic_state

2022-12-15 Thread Ma Jun
Add the missing declaration of struct drm_atomic_state to fix the
compile error below:

error: 'struct drm_atomic_state' declared inside parameter
list will not be visible outside of this definition or declaration [-Werror]

Signed-off-by: Ma Jun 
---
 include/drm/drm_plane_helper.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index b00ad36cf5b6..90156e13ac11 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -26,6 +26,7 @@
 
 #include 
 
+struct drm_atomic_state;
 struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
-- 
2.25.1



Re: [RESEND PATCH] drm/plane-helper: Add the missing declaration of drm_atomic_state

2022-12-15 Thread Ma, Jun



On 12/15/2022 4:40 PM, Thomas Zimmermann wrote:
> Hi
> 
> Am 15.12.22 um 04:01 schrieb Ma Jun:
>> Add the missing declaration of struct drm_atomic_state to fix the
>> compile error below:
>>
>> error: 'struct drm_atomic_state' declared inside parameter
>> list will not be visible outside of this definition or declaration [-Werror]
>>
>> Signed-off-by: Ma Jun 
>> ---
>>   include/drm/drm_plane_helper.h | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
>> index b00ad36cf5b6..530f88176db4 100644
>> --- a/include/drm/drm_plane_helper.h
>> +++ b/include/drm/drm_plane_helper.h
>> @@ -30,6 +30,7 @@ struct drm_crtc;
>>   struct drm_framebuffer;
>>   struct drm_modeset_acquire_ctx;
>>   struct drm_plane;
>> +struct drm_atomic_state;
> 
> Thanks for the patch. Please sort the forward declarations alphabetically.
> 
Thanks for review. Will fix in v2

Regards,
Ma Jun
> Best regards
> Thomas
> 
>>   
>>   int drm_plane_helper_update_primary(struct drm_plane *plane, struct 
>> drm_crtc *crtc,
>>  struct drm_framebuffer *fb,
> 


[RESEND PATCH] drm/plane-helper: Add the missing declaration of drm_atomic_state

2022-12-14 Thread Ma Jun
Add the missing declaration of struct drm_atomic_state to fix the
compile error below:

error: 'struct drm_atomic_state' declared inside parameter
list will not be visible outside of this definition or declaration [-Werror]

Signed-off-by: Ma Jun 
---
 include/drm/drm_plane_helper.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index b00ad36cf5b6..530f88176db4 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -30,6 +30,7 @@ struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
 struct drm_plane;
+struct drm_atomic_state;
 
 int drm_plane_helper_update_primary(struct drm_plane *plane, struct drm_crtc 
*crtc,
struct drm_framebuffer *fb,
-- 
2.25.1



[PATCH] drm/plane-helper: Add the missing declaration of drm_atomic_state

2022-12-07 Thread Ma Jun
Add the missing declaration of struct drm_atomic_state

Signed-off-by: Ma Jun 
---
 include/drm/drm_plane_helper.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/drm_plane_helper.h b/include/drm/drm_plane_helper.h
index b00ad36cf5b6..530f88176db4 100644
--- a/include/drm/drm_plane_helper.h
+++ b/include/drm/drm_plane_helper.h
@@ -30,6 +30,7 @@ struct drm_crtc;
 struct drm_framebuffer;
 struct drm_modeset_acquire_ctx;
 struct drm_plane;
+struct drm_atomic_state;
 
 int drm_plane_helper_update_primary(struct drm_plane *plane, struct drm_crtc 
*crtc,
struct drm_framebuffer *fb,
-- 
2.25.1



Re: Coverity: kfd_parse_subtype_cache(): Memory - corruptions

2022-11-06 Thread Ma, Jun
Thanks, I will send the fix patch.

Regards,
Ma Jun

On 11/5/2022 4:40 AM, Felix Kuehling wrote:
> On 2022-11-04 15:41, coverity-bot wrote:
>> Hello!
>>
>> This is an experimental semi-automated report about issues detected by
>> Coverity from a scan of next-20221104 as part of the linux-next scan project:
>> https://scan.coverity.com/projects/linux-next-weekly-scan
>>
>> You're getting this email because you were associated with the identified
>> lines of code (noted below) that were touched by commits:
>>
>>Fri Dec 8 23:08:59 2017 -0500
>>  3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
>>
>> Coverity reported the following:
>>
>> *** CID 1527133:  Memory - corruptions  (OVERRUN)
>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c:1113 in kfd_parse_subtype_cache()
>> 1107 props->cache_size = cache->cache_size;
>> 1108 props->cacheline_size = cache->cache_line_size;
>> 1109 props->cachelines_per_tag = 
>> cache->lines_per_tag;
>> 1110 props->cache_assoc = cache->associativity;
>> 1111 props->cache_latency = cache->cache_latency;
>> 1112
>> vvv CID 1527133:  Memory - corruptions  (OVERRUN)
>> vvv Overrunning array "cache->sibling_map" of 32 bytes by passing it to 
>> a function which accesses it at byte offset 63 using argument "64UL". [Note: 
>> The source code implementation of the function has been overridden by a 
>> builtin model.]
>> 1113 memcpy(props->sibling_map, cache->sibling_map,
>> 1114 sizeof(props->sibling_map));
>> 1115
>> 1116 /* set the sibling_map_size as 32 for CRAT from 
>> ACPI */
>> 1117 props->sibling_map_size = CRAT_SIBLINGMAP_SIZE;
>> 1118
>>
>> If this is a false positive, please let us know so we can mark it as
>> such, or teach the Coverity rules to be smarter. If not, please make
>> sure fixes get into linux-next. :) For patches fixing this, please
>> include these lines (but double-check the "Fixes" first):
>>
>> Reported-by: coverity-bot 
>> Addresses-Coverity-ID: 1527133 ("Memory - corruptions")
>> Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
>>
>> I'm not sure why this suddenly appeared after 5 years, but the read
>> over-run looks legit:
> 
> 
> I think this was introduced by a more recent patch that was in fact 
> meant to fix an array overrun on HW that is outgrowing the CRAT sibling 
> map size:
>
>> commit 0938fbeb6f53fc44bc9b19784dee28496e68ba0c
>> Author: Ma Jun 
>> Date:   Wed Nov 2 15:53:26 2022 +0800
>>
>>     drm/amdkfd: Fix the warning of array-index-out-of-bounds
>>
>>     For some GPUs with more CUs, the original sibling_map[32]
>>     in struct crat_subtype_cache is not enough
>>     to save the cache information when create the VCRAT table,
>>     so skip filling the struct crat_subtype_cache info instead
>>     fill struct kfd_cache_properties directly to fix this problem.
>>
>>     Signed-off-by: Ma Jun 
>>     Reviewed-by: Felix Kuehling 
>>     Signed-off-by: Alex Deucher 
> I added Ma Jun to the email.
> 
> Regards,
>    Felix
> 
> 
>>
>> struct crat_subtype_cache {
>>  ...
>>  uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>>
>> #define CRAT_SIBLINGMAP_SIZE 32
>>
>>
>> struct kfd_cache_properties {
>>  ...
>>  uint8_t sibling_map[CACHE_SIBLINGMAP_SIZE];
>>
>> #define CACHE_SIBLINGMAP_SIZE 64
>>
>> Thanks for your attention!
>>
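
The overrun reported above comes from sizing the copy by the destination (64 bytes) while the source array holds only 32. One way to avoid it is to size the copy by the source and zero the remainder. The sketch below is a userspace illustration with hypothetical `demo_` names and is not necessarily the fix that was eventually submitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CRAT_SIBLINGMAP_SIZE  32
#define DEMO_CACHE_SIBLINGMAP_SIZE 64

/* The source map must never be larger than the destination map. */
_Static_assert(DEMO_CRAT_SIBLINGMAP_SIZE <= DEMO_CACHE_SIBLINGMAP_SIZE,
	       "source sibling map must fit into the destination");

struct demo_crat_cache {
	uint8_t sibling_map[DEMO_CRAT_SIBLINGMAP_SIZE];
};

struct demo_cache_props {
	uint8_t sibling_map[DEMO_CACHE_SIBLINGMAP_SIZE];
	uint32_t sibling_map_size;
};

/* Copy only as many bytes as the source holds; clear the rest. */
static void demo_copy_sibling_map(struct demo_cache_props *props,
				  const struct demo_crat_cache *cache)
{
	assert(props != NULL && cache != NULL);

	memset(props->sibling_map, 0, sizeof(props->sibling_map));
	memcpy(props->sibling_map, cache->sibling_map,
	       sizeof(cache->sibling_map));
	props->sibling_map_size = DEMO_CRAT_SIBLINGMAP_SIZE;
}
```

Sizing the `memcpy()` by `sizeof(cache->sibling_map)` keeps the read inside the 32-byte source, and the static assertion documents the assumption that the destination is at least as large.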