Renoir: visual artifacts associated with scrolling

2020-10-15 Thread Dirk Gouders
I am dealing with visual artifacts on my laptop.

I'm not sure if screenshots are OK on this list, so I will try to
describe the two affected applications I have identified so far.  Both of
them can be fixed temporarily by forcing a full redraw (switching workspaces
back and forth, minimizing and maximizing windows) when they misbehave:

1) Xterm
   Using the mouse wheel is OK; SHIFT-PgUp and SHIFT-PgDn result
   in damaged lower and upper halves of the screen, respectively.

   I could work around this problem by using another terminal program.

2) Emacs
   I see visual artifacts when navigating mails or source code,
   for example.  These artifacts are more varied: sometimes a
   top line that seems to be pinned, remnants of long lines,
   sometimes pinned half lines where the top half and the bottom
   half of a text line seem to be a mixture of two different lines.
   Sometimes these artifacts are only identifiable because the
   displayed text makes no sense in context.

The predecessor of that laptop is Intel based and does not behave that
way.  Running the two applications on that Intel based laptop via ssh
from the Renoir machine shows the same problems, though.

Because the predecessor of the laptop does not show these problems, I
tried to bisect but did not find a "good" candidate.  I see these
artifacts with every kernel that an X server would start with (> v5.4).

Still, none of this rules out user space as the cause of the
problems.  Perhaps someone could give me some hints about what else I
could do to examine this further.

Dirk

P.S: Scrolling this text up for review before sending it out also
 partially scrambles the text.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/1] drm/amdgpu: fix NULL pointer dereference for Renoir

2020-10-01 Thread Dirk Gouders
Dirk Gouders  writes:

> Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
> introduced a NULL pointer dereference when booting with
> amdgpu.discovery=0, because it removed the call of vega10_reg_base_init()
> for that case.
>
> Fix this by calling that function if amdgpu_discovery == 0 in addition to
> the case that amdgpu_discovery_reg_base_init() failed.
>
> Fixes: c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
> Signed-off-by: Dirk Gouders 
> Cc: Hawking Zhang 
> Cc: Evan Quan 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 84d811b6e48b..f8cb62b326d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -694,12 +694,12 @@ static void soc15_reg_base_init(struct amdgpu_device *adev)
>* it doesn't support SRIOV. */
>   if (amdgpu_discovery) {
>   r = amdgpu_discovery_reg_base_init(adev);
> - if (r) {
> - DRM_WARN("failed to init reg base from ip discovery table, "
> -  "fallback to legacy init method\n");
> - vega10_reg_base_init(adev);
> - }
> + if (r == 0)
> +   break;

Grrr, wrong indentation here.
But I will wait for your review before v1.

Dirk


> + DRM_WARN("failed to init reg base from ip discovery table, "
> +  "fallback to legacy init method\n");
>   }
> + vega10_reg_base_init(adev);
>   break;
>   case CHIP_VEGA20:
>   vega20_reg_base_init(adev);


[PATCH 0/1] drm/amdgpu: fix NULL pointer dereference for Renoir

2020-10-01 Thread Dirk Gouders
Alex Deucher  writes:

> On Wed, Sep 30, 2020 at 4:46 PM Dirk Gouders  wrote:
>>
>> Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
>> introduced a NULL pointer dereference when booting with
>> amdgpu.discovery=0.
>>
>> For amdgpu.discovery=0 that commit effectively removed the call of
>> vega10_reg_base_init(adev), so I tested the correctness of the bisect
>> session by restoring that function call for amdgpu_discovery == 0 and with
>> that change, the NULL pointer dereference does not occur:
>>
>
> Can I add your Signed-off-by?

I did not expect the diff to be taken as a proposed patch, nor even that it
shows the correct fix.

Anyway, I did my best to create a hopefully acceptable patch, with a small
modification of the code that avoids the "else" and an identical function
call in two places in the code.

I tested that patch with amdgpu.discovery={0,1}, together with the patch for
the first issue you helped me with.  The result is no more call traces.
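
For readers who want to see the control flow in isolation, here is a minimal
standalone sketch of what the patch changes.  It is not driver code: every
fake_* name and the reg_base field are made up for illustration, and the
"before" variant only models my reading of the code after c1cf79ca5ced46.

```c
#include <assert.h>
#include <stdio.h>

/* All names below are hypothetical stand-ins, not the amdgpu API. */
enum reg_base_source { REG_BASE_NONE, REG_BASE_DISCOVERY, REG_BASE_LEGACY };
enum fake_chip { FAKE_CHIP_RENOIR, FAKE_CHIP_OTHER };

struct fake_adev {
	enum fake_chip chip;
	enum reg_base_source reg_base;	/* which path set up the register bases */
	int discovery_fails;		/* simulate an unusable IP discovery table */
};

static int fake_discovery_reg_base_init(struct fake_adev *adev)
{
	if (adev->discovery_fails)
		return -1;
	adev->reg_base = REG_BASE_DISCOVERY;
	return 0;
}

static void fake_legacy_reg_base_init(struct fake_adev *adev)
{
	adev->reg_base = REG_BASE_LEGACY;
}

/* Model of the code before the fix: no legacy init when discovery is off. */
static void fake_reg_base_init_before(struct fake_adev *adev, int discovery)
{
	switch (adev->chip) {
	case FAKE_CHIP_RENOIR:
		if (discovery) {
			if (fake_discovery_reg_base_init(adev))
				fake_legacy_reg_base_init(adev);
		}
		break;
	default:
		break;
	}
}

/* Model of the patched code: one legacy call covers both fallback cases. */
static void fake_reg_base_init_after(struct fake_adev *adev, int discovery)
{
	switch (adev->chip) {
	case FAKE_CHIP_RENOIR:
		if (discovery) {
			if (fake_discovery_reg_base_init(adev) == 0)
				break;
			fprintf(stderr, "discovery failed, using legacy init\n");
		}
		fake_legacy_reg_base_init(adev);
		break;
	default:
		break;
	}
}

/* Run one variant on a fresh device and report which path was used. */
static enum reg_base_source fake_run(int patched, int discovery,
				     int discovery_fails)
{
	struct fake_adev adev = { FAKE_CHIP_RENOIR, REG_BASE_NONE,
				  discovery_fails };

	if (patched) {
		fake_reg_base_init_after(&adev, discovery);
		/* The patched flow must always initialize the register bases. */
		assert(adev.reg_base != REG_BASE_NONE);
	} else {
		fake_reg_base_init_before(&adev, discovery);
	}
	return adev.reg_base;
}
```

In this model, fake_run(0, 0, 0) leaves the register bases uninitialized,
which is the situation I believe leads to the NULL pointer dereference, while
the patched variant reaches the single legacy call both when discovery is
disabled and when it fails.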

Thank you for your patient assistance with the two issues.

Dirk


> Thanks,
>
> Alex
>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> index 84d811b6e48b..2e93c5e1e7e6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> @@ -699,7 +699,8 @@ static void soc15_reg_base_init(struct amdgpu_device 
>> *adev)
>>  "fallback to legacy init method\n");
>> vega10_reg_base_init(adev);
>> }
>> -   }
>> +   } else
>> +   vega10_reg_base_init(adev);
>> break;
>> case CHIP_VEGA20:
>> vega20_reg_base_init(adev);
>>
>> Dirk

Dirk Gouders (1):
  drm/amdgpu: fix NULL pointer dereference for Renoir

 drivers/gpu/drm/amd/amdgpu/soc15.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
2.26.2



[PATCH 1/1] drm/amdgpu: fix NULL pointer dereference for Renoir

2020-10-01 Thread Dirk Gouders
Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
introduced a NULL pointer dereference when booting with
amdgpu.discovery=0, because it removed the call of vega10_reg_base_init()
for that case.

Fix this by calling that function if amdgpu_discovery == 0 in addition to
the case that amdgpu_discovery_reg_base_init() failed.

Fixes: c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
Signed-off-by: Dirk Gouders 
Cc: Hawking Zhang 
Cc: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 84d811b6e48b..f8cb62b326d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -694,12 +694,12 @@ static void soc15_reg_base_init(struct amdgpu_device *adev)
 * it doesn't support SRIOV. */
if (amdgpu_discovery) {
r = amdgpu_discovery_reg_base_init(adev);
-   if (r) {
-   DRM_WARN("failed to init reg base from ip discovery table, "
-"fallback to legacy init method\n");
-   vega10_reg_base_init(adev);
-   }
+   if (r == 0)
+ break;
+   DRM_WARN("failed to init reg base from ip discovery table, "
+"fallback to legacy init method\n");
}
+   vega10_reg_base_init(adev);
break;
case CHIP_VEGA20:
vega20_reg_base_init(adev);
-- 
2.26.2



BUG: amdgpu: NULL pointer dereference introduced in 5.9-rc1

2020-09-30 Thread Dirk Gouders
Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
introduced a NULL pointer dereference when booting with
amdgpu.discovery=0.

For amdgpu.discovery=0 that commit effectively removed the call of
vega10_reg_base_init(adev), so I tested the correctness of the bisect
session by restoring that function call for amdgpu_discovery == 0 and with
that change, the NULL pointer dereference does not occur:

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 84d811b6e48b..2e93c5e1e7e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -699,7 +699,8 @@ static void soc15_reg_base_init(struct amdgpu_device *adev)
 "fallback to legacy init method\n");
vega10_reg_base_init(adev);
}
-   }
+   } else
+   vega10_reg_base_init(adev);
break;
case CHIP_VEGA20:
vega20_reg_base_init(adev);

Dirk


Re: [PATCH] drm/amd/pm: setup APU dpm clock table in SMU HW initialization

2020-09-30 Thread Dirk Gouders
Evan Quan  writes:

> As the dpm clock table is needed during DC HW initialization.
> And that (DC HW initialization) comes before smu_late_init()
> where current APU dpm clock table setup is performed. So, NULL
> pointer dereference will be triggered. By moving APU dpm clock
> table setup to smu_hw_init(), this can be avoided.

Thanks for the quick response.  I tested the patch and it fixes the call
trace I initially reported (#1 in the table below).  #2 is unaffected by
this patch.  I could try to bisect it as well but have not done so yet.

I probably caused some confusion in the original thread, so I will try to
sort it out a bit.  What I noticed is:

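 with amdgpu.discovery value | noticed issue
 ============================+=======================================
 1) unset or "1"             | call trace because of assert(0) in
                             | rn_clk_mgr_helper_populate_bw_params()
 ----------------------------+---------------------------------------
 2) 0                        | NULL pointer dereference in
                             | soc15_set_ip_blocks()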

This patch fixes #1, i.e. it avoids the ASSERT() in the following code in
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c:

	for (i = PP_SMU_NUM_FCLK_DPM_LEVELS - 1; i >= 0; i--) {
		if (clock_table->FClocks[i].Freq != 0 && clock_table->FClocks[i].Vol != 0) {
			j = i;
			break;
		}
	}

	if (j == -1) {
		/* clock table is all 0s, just use our own hardcode */
		ASSERT(0);
		return;
	}
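
The scan above boils down to "find the highest populated level, or report
that the table is empty".  A minimal standalone sketch of that idea, assuming
a simplified entry type (struct fclk_level and highest_valid_fclk_level() are
hypothetical names, not taken from the driver):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified clock-table entry; the driver struct differs. */
struct fclk_level {
	unsigned int freq;	/* 0 means "not populated" */
	unsigned int vol;	/* 0 means "not populated" */
};

/*
 * Return the index of the highest level with both fields populated,
 * or -1 if every entry is zero (the case that hits ASSERT(0) above).
 */
static int highest_valid_fclk_level(const struct fclk_level *levels,
				    int num_levels)
{
	int i;

	assert(levels != NULL);
	for (i = num_levels - 1; i >= 0; i--) {
		if (levels[i].freq != 0 && levels[i].vol != 0)
			return i;
	}
	return -1;
}
```

An all-zero table, which is what I assume the display code saw before the
SMU had filled it in, yields -1 here; a table with any populated entry yields
that entry's index.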

To me, the commit message sounds as if the patch fixes #2, whereas it
really is #1 that gets fixed.  I also wonder whether we want a
Fixes: line for completeness:

Fixes: 02cf91c113ea (drm/amd/powerplay: postpone operations not required for hw setup to late_init)

Dirk

> Change-Id: I2bb1f9ba26f9c8820c08241da62f7be64ab75840
> Signed-off-by: Evan Quan 
> Reported-by: Dirk Gouders 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index f46cf9ea355e..8f6045def272 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -482,17 +482,6 @@ static int smu_late_init(void *handle)
>   return ret;
>   }
>  
> - /*
> -  * Set initialized values (get from vbios) to dpm tables context such as
> -  * gfxclk, memclk, dcefclk, and etc. And enable the DPM feature for each
> -  * type of clks.
> -  */
> - ret = smu_set_default_dpm_table(smu);
> - if (ret) {
> - dev_err(adev->dev, "Failed to setup default dpm clock tables!\n");
> - return ret;
> - }
> -
>   ret = smu_populate_umd_state_clk(smu);
>   if (ret) {
>   dev_err(adev->dev, "Failed to populate UMD state clocks!\n");
> @@ -1021,6 +1010,17 @@ static int smu_smc_hw_setup(struct smu_context *smu)
>   return ret;
>   }
>  
> + /*
> +  * Set initialized values (get from vbios) to dpm tables context such as
> +  * gfxclk, memclk, dcefclk, and etc. And enable the DPM feature for each
> +  * type of clks.
> +  */
> + ret = smu_set_default_dpm_table(smu);
> + if (ret) {
> + dev_err(adev->dev, "Failed to setup default dpm clock tables!\n");
> + return ret;
> + }
> +
>   ret = smu_notify_display_change(smu);
>   if (ret)
>   return ret;


Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir

2020-09-28 Thread Dirk Gouders
Alex Deucher  writes:

> On Wed, Sep 23, 2020 at 3:45 PM Dirk Gouders  wrote:
>>
>> Dirk Gouders  writes:
>>
>> > Alex Deucher  writes:
>> >
>> >> On Wed, Sep 23, 2020 at 8:54 AM Dirk Gouders  wrote:
>> >>>
>> >>> Dirk Gouders  writes:
>> >>>
>> >>> > Hi,
>> >>> >
>> >>> > I noticed a call trace (attached) when starting my machine (ThinkPad
>> >>> > L14).  This machine is new and I am still working on its
>> >>> > configuration but visually noticeable is that scrolling in xterms with
>> >>> > SHIFT-PgUp/PgDn is broken.  Using the mouse wheel works.
>> >>> >
>> >>> > It seems the call trace has been introduced between 5.8 and 5.9-rc1 and
>> >>> > I tried to bisect this but always end in situations where I don't find
>> >>> > a bootable commit around the current bisect position.  Mainly the
>> >>> > machine then hangs when udevd is started.
>> >>>
>> >>> I fixed my netconsole setup (had to use a switch instead of the
>> >>> ports of a FritzBox) and tried a bisect, again (log below).  With the
>> >>> commits between the earliest bad and latest good commits I marked, my
>> >>> machine does not boot and hangs very early with the message:
>> >>>
>> >>> fb0: switching to amdgpudrmfb from EFI VGA
>> >>>
>> >>> That was introduced with
>> >>>
>> >>> c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir
>> >>>
>> >>> and ended with a commit that instead produces the call trace
>> >>>
>> >>> b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not
>> >>>  existing in discovery table
>> >>>
>> >>> I was hoping to get further with the bisect but have no idea how to
>> >>> avoid the early hangs.
>> >>
>> >> You can disable use of the IP discovery table by setting
>> >> amdgpu.discovery=0 on the kernel command line in grub.
>> >
>> > I tried that with b770f04ba2ee (next step in bisect), but no success
>> > with this option, unfortunately.
>> >
>> > I'm not using grub but directly booting from UEFI using CONFIG_CMDLINE.
>> > Any other option I am using (root, loglevel and netconsole) works as
>> > expected and I verified that "amdgpu.discovery=0" is included in
>> > vmlinux.
>>
>> Apologies if I'm causing too much noise.
>>
>> While thinking about this I recalled that I changed amdgpu from modular to
>> static when I had problems with netconsole.  I changed it back to
>> modular to see if that helps and I get the earlier mentioned hangs later
>> in the boot process when udevd starts and netconsole is up working.
>> This enables me to inspect boot messages and I tested with
>> amdgpu.discovery=0:
>>
>> 5,175,49060,-;Kernel command line: root=PARTLABEL=system1 amdgpu.discovery=0 
>> loglevel=15 netconsole=...
>>
>> I'm afraid I now get traces that commit b6df946ef4b5 (drm/amdgpu: fix
>> the nullptr issue as for PWR IP not existing in discovery table) is
>> fixing (output attached below) and I cannot decide how to continue with
>> bisecting...
>
> You get the issue with discovery=0?  You can try skipping that commit
> (mark as skip) to finish the bisection.

I get issues with both, amdgpu.discovery={0,1}.  With "0" I hit the NULL
pointer dereference in soc15_set_ip_blocks() and with "1" I hit the
assert in rn_clk_mgr_helper_populate_bw_params().

I only noticed the issue with "0" after you told me about
amdgpu.discovery, so I continued to find the commit that introduced the
issue with "1".

Using bisect skip alone did not help, but keeping b6df946ef4b5
(drm/amdgpu: fix the nullptr issue as for PWR IP not existing in
discovery table) in the working tree made the bisect session much more
straightforward.

It resulted in 02cf91c113ea (drm/amd/powerplay: postpone operations not
required for hw setup to late_init) as the first bad commit.

Not that I understand anything about the driver but I wanted to know if
that commit really is causing the issue.  So, I tried to move back some
initialization code from smu_late_init() to smu_smc_hw_setup() (diff
below) and with that the issue is gone.  I'm not sure if you prefer a
full dmesg output, for now I'll append the [drm] part.
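
To illustrate why the placement of that initialization code matters, here is
a minimal standalone sketch of the ordering.  It assumes, as I understand it,
that display (DC) initialization runs after the SMU hardware setup but before
the late-init step; the fake_* names and the dpm_table_ready flag are
hypothetical and do not exist in the driver.

```c
#include <assert.h>

/* Hypothetical stand-in for the SMU context; only tracks table state. */
struct fake_smu {
	int dpm_table_ready;	/* 1 once the default dpm table is set up */
};

static void fake_set_default_dpm_table(struct fake_smu *smu)
{
	smu->dpm_table_ready = 1;
}

/*
 * Walk through the three init steps in order and return whether the
 * display init step (step 2) saw a populated table.
 */
static int fake_display_init_sees_table(int populate_in_hw_setup)
{
	struct fake_smu smu = { 0 };
	int seen;

	/* Step 1: SMU hardware setup. */
	if (populate_in_hw_setup)
		fake_set_default_dpm_table(&smu);

	/* Step 2: display init reads the clock table. */
	seen = smu.dpm_table_ready;

	/* Step 3: late init, too late for the display code. */
	if (!populate_in_hw_setup)
		fake_set_default_dpm_table(&smu);

	/* Either way the table is populated by the end of init. */
	assert(smu.dpm_table_ready);
	return seen;
}
```

With the table set up in the hardware setup step the display code sees it;
with the setup postponed to late init, the display code sees an empty table
even though the table is populated by the time initialization finishes.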

Dirk

= diff 

[PATCH] drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version()

2020-09-28 Thread Dirk Gouders
Commit 78fe9f63947a2b (drm/amd/display: Remove DISPCLK Limit Floor
for Certain SMU Versions) added a call to
rn_vbios_smu_get_smu_version() to set clk_mgr->smu_ver.  That field is
already initialized before the if-statement, so the added call is redundant.

Fixes: 78fe9f63947a2b (drm/amd/display: Remove DISPCLK Limit Floor for Certain SMU Versions)
Signed-off-by: Dirk Gouders 
Cc: Alex Deucher 
Cc: Sung Lee 
Cc: Yongqiang Sun 
Cc: Rodrigo Siqueira 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
index 543afa34d87a..21a3073c8929 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
@@ -783,7 +783,6 @@ void rn_clk_mgr_construct(
} else {
struct clk_log_info log_info = {0};
 
-   clk_mgr->smu_ver = rn_vbios_smu_get_smu_version(clk_mgr);
clk_mgr->periodic_retraining_disabled = rn_vbios_smu_is_periodic_retraining_disabled(clk_mgr);
 
/* SMU Version 55.51.0 and up no longer have an issue
-- 
2.26.2


Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir

2020-09-23 Thread Dirk Gouders
Dirk Gouders  writes:

> Alex Deucher  writes:
>
>> On Wed, Sep 23, 2020 at 8:54 AM Dirk Gouders  wrote:
>>>
>>> Dirk Gouders  writes:
>>>
>>> > Hi,
>>> >
>>> > I noticed a call trace (attached) when starting my machine (ThinkPad
>>> > L14).  This machine is new and I am still working on its
>>> > configuration but visually noticeable is that scrolling in xterms with
>>> > SHIFT-PgUp/PgDn is broken.  Using the mouse wheel works.
>>> >
>>> > It seems the call trace has been introduced between 5.8 and 5.9-rc1 and
>>> > I tried to bisect this but always end in situations where I don't find
>>> > a bootable commit around the current bisect position.  Mainly the
>>> > machine then hangs when udevd is started.
>>>
>>> I fixed my netconsole setup (had to use a switch instead of the
>>> ports of a FritzBox) and tried a bisect, again (log below).  With the
>>> commits between the earliest bad and latest good commits I marked, my
>>> machine does not boot and hangs very early with the message:
>>>
>>> fb0: switching to amdgpudrmfb from EFI VGA
>>>
>>> That was introduced with
>>>
>>> c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir
>>>
>>> and ended with a commit that instead produces the call trace
>>>
>>> b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not
>>>  existing in discovery table
>>>
>>> I was hoping to get further with the bisect but have no idea how to
>>> avoid the early hangs.
>>
>> You can disable use of the IP discovery table by setting
>> amdgpu.discovery=0 on the kernel command line in grub.
>
> I tried that with b770f04ba2ee (next step in bisect), but no success
> with this option, unfortunately.
>
> I'm not using grub but directly booting from UEFI using CONFIG_CMDLINE.
> Any other option I am using (root, loglevel and netconsole) works as
> expected and I verified that "amdgpu.discovery=0" is included in
> vmlinux.

Apologies if I'm causing too much noise.

While thinking about this I recalled that I changed amdgpu from modular to
static when I had problems with netconsole.  I changed it back to
modular to see if that helps and I get the earlier mentioned hangs later
in the boot process when udevd starts and netconsole is up and working.
This enables me to inspect boot messages and I tested with
amdgpu.discovery=0:

5,175,49060,-;Kernel command line: root=PARTLABEL=system1 amdgpu.discovery=0 loglevel=15 netconsole=...

I'm afraid I now get traces that commit b6df946ef4b5 (drm/amdgpu: fix
the nullptr issue as for PWR IP not existing in discovery table) is
fixing (output attached below) and I cannot decide how to continue with
bisecting...

Dirk

1,840,5418458,-;BUG: kernel NULL pointer dereference, address: 0008
1,841,5418472,-;#PF: supervisor read access in kernel mode
1,842,5418474,-;#PF: error_code(0x) - not-present page
6,843,5418476,-;PGD 0 P4D 0 
4,844,5418480,-;Oops:  [#1] SMP NOPTI
4,845,5418483,-;CPU: 3 PID: 744 Comm: udevd Not tainted 5.7.0-rc2-x86_64-01641-gb770f04ba2ee #216
4,846,5418486,-;Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020
4,847,5418559,-;RIP: 0010:nbio_v7_0_get_rev_id+0x9/0x1b [amdgpu]
4,848,5418562,-;Code: 5d 41 5d 41 5e e9 9a f0 f9 ff 48 8b 87 e8 5f 01 00 31 d2 8b 70 08 81 c6 c3 00 00 00 e9 9d ef f9 ff 48 8b 87 e8 5f 01 00 31 d2 <8b> 70 08 83 c6 0f e8 89 ef f9 ff c1 e8 18 83 e0 0f c3 49 89 f8 48
4,849,5418566,-;RSP: 0018:c900011dba90 EFLAGS: 00010246
4,850,5418568,-;RAX:  RBX: 000fffe0 RCX: 0018
4,851,5418571,-;RDX:  RSI: a0970e20 RDI: 8883f554
4,852,5418573,-;RBP: 8883f554 R08: 0001 R09:
4,853,5418575,-;R10:  R11: 0048 R12: ffea
4,854,5418577,-;R13: 7fff R14: 8883f9486800 R15: c900011dbe98
4,855,5418580,-;FS:  7f750db3dd80() GS:88840ecc() knlGS:
4,856,5418583,-;CS:  0010 DS:  ES:  CR0: 80050033
4,857,5418586,-;CR2: 0008 CR3: 0003f9728000 CR4: 00340ee0
4,858,5418588,-;Call Trace:
4,859,5418660,-; soc15_set_ip_blocks+0x105/0x4fd [amdgpu]
4,860,5418714,-; amdgpu_device_init+0xcab/0x1862 [amdgpu]
4,861,5418720,-; ? __kmalloc+0xb2/0xc4
4,862,5418766,-; amdgpu_driver_load_kms+0x41/0x178 [amdgpu]
4,863,5418813,-; amdgpu_pci_probe+0x147/0x1c7 [amdgpu]
4,864,5418818,-; pci_device_probe+0xc6/0x135
4,865,5418822,-; really_probe+0x157/0x2d1
4,866,5418825,-; driver_probe_device+0x97/0xcc
4,867,5418828,-; de

Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir

2020-09-23 Thread Dirk Gouders
Alex Deucher  writes:

> On Wed, Sep 23, 2020 at 8:54 AM Dirk Gouders  wrote:
>>
>> Dirk Gouders  writes:
>>
>> > Hi,
>> >
>> > I noticed a call trace (attached) when starting my machine (ThinkPad
>> > L14).  This machine is new and I am still working on its
>> > configuration but visually noticeable is that scrolling in xterms with
>> > SHIFT-PgUp/PgDn is broken.  Using the mouse wheel works.
>> >
>> > It seems the call trace has been introduced between 5.8 and 5.9-rc1 and
>> > I tried to bisect this but always end in situations where I don't find
>> > a bootable commit around the current bisect position.  Mainly the
>> > machine then hangs when udevd is started.
>>
>> I fixed my netconsole setup (had to use a switch instead of the
>> ports of a FritzBox) and tried a bisect, again (log below).  With the
>> commits between the earliest bad and latest good commits I marked, my
>> machine does not boot and hangs very early with the message:
>>
>> fb0: switching to amdgpudrmfb from EFI VGA
>>
>> That was introduced with
>>
>> c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir
>>
>> and ended with a commit that instead produces the call trace
>>
>> b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not
>>  existing in discovery table
>>
>> I was hoping to get further with the bisect but have no idea how to
>> avoid the early hangs.
>
> You can disable use of the IP discovery table by setting
> amdgpu.discovery=0 on the kernel command line in grub.

I tried that with b770f04ba2ee (next step in bisect), but no success
with this option, unfortunately.

I'm not using grub but directly booting from UEFI using CONFIG_CMDLINE.
Any other option I am using (root, loglevel and netconsole) works as
expected and I verified that "amdgpu.discovery=0" is included in
vmlinux.

Dirk

>
> Alex
>
>
>>
>> Dirk
>>
>> = bisect log ===
>> git bisect start
>> # bad: [9123e3a74ec7b934a4a099e98af6a61c2f80bbf5] Linux 5.9-rc1
>> git bisect bad 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5
>> # good: [bcf876870b95592b52519ed4aafcf9d95999bc9c] Linux 5.8
>> git bisect good bcf876870b95592b52519ed4aafcf9d95999bc9c
>> # bad: [8186749621ed6b8fc42644c399e8c755a2b6f630] Merge tag
>> 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
>> git bisect bad 8186749621ed6b8fc42644c399e8c755a2b6f630
>> # good: [2324d50d051ec0f14a548e78554fb02513d6dcef] Merge tag 'docs-5.9' of 
>> git://git.lwn.net/linux
>> git bisect good 2324d50d051ec0f14a548e78554fb02513d6dcef
>> # bad: [54d44bfc56308d105b0da37392d8398bdc9d4745] drm/nouveau/nvif:
>> give every disp object a human-readable identifier
>> git bisect bad 54d44bfc56308d105b0da37392d8398bdc9d4745
>> # bad: [9555152beb1143c85c03f9b9de59863cbbe89f4b] Merge tag
>> 'amd-drm-next-5.9-2020-07-01' of
>> git://people.freedesktop.org/~agd5f/linux into drm-next
>> git bisect bad 9555152beb1143c85c03f9b9de59863cbbe89f4b
>> # bad: [dfd991794685b1228387214f28630b6e94e56944] drm/amd/display: Not doing 
>> bios data pack.
>> git bisect bad dfd991794685b1228387214f28630b6e94e56944
>> # good: [ba806f98f868ce107aa9c453fef751de9980e4af] drm/radeon: disable AGP 
>> by default
>> git bisect good ba806f98f868ce107aa9c453fef751de9980e4af
>> # good: [97d798b276e94a366dfb03d62bc90d4742ab3a31] drm/amdgpu: simplify ATIF 
>> backlight handling
>> git bisect good 97d798b276e94a366dfb03d62bc90d4742ab3a31
>> # good: [ac4e189a5623579c023c9cf8006422aef2a487b4] drm/amdgpu/gfx10: add 
>> navi12 to gfxoff case
>> git bisect good ac4e189a5623579c023c9cf8006422aef2a487b4
>> # good: [70534d1ee89ceadd03292d0c2da4dd4020189678] drm/amdgpu: simplify 
>> raven and renoir checks
>> git bisect good 70534d1ee89ceadd03292d0c2da4dd4020189678
>> # good: [4541ea81edde6ce9a1d9be082489aca7e8e7e1dc]
>> drm/[radeon|amdgpu]: Replace one-element array and use struct_size()
>> helper
>> git bisect good 4541ea81edde6ce9a1d9be082489aca7e8e7e1dc
>> # good: [84034ad4c0c0813c1350b43087eed036066edd5a] drm/amd/display: combine 
>> public interfaces into single header
>> git bisect good 84034ad4c0c0813c1350b43087eed036066edd5a
>> # good: [4f1fad0e9dbd762497df7c79309697ed8b2b6cfc] drm/amd/powerplay: stop 
>> thermal IRQs on suspend
>> git bisect good 4f1fad0e9dbd762497df7c79309697ed8b2b6cfc
>> # good: [4292b0b2026bc10bced32636ea02dd8eed00cea9] drm/amdgpu: clean up 
>> discovery testing
>> g

Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir

2020-09-23 Thread Dirk Gouders
Dirk Gouders  writes:

> Hi,
>
> I noticed a call trace (attached) when starting my machine (ThinkPad
> L14).  This machine is new and I am still working on its
> configuration but visually noticeable is that scrolling in xterms with
> SHIFT-PgUp/PgDn is broken.  Using the mouse wheel works.
>
> It seems the call trace has been introduced between 5.8 and 5.9-rc1 and
> I tried to bisect this but always end in situations where I don't find
> a bootable commit around the current bisect position.  Mainly the
> machine then hangs when udevd is started.

I fixed my netconsole setup (had to use a switch instead of the
ports of a FritzBox) and tried a bisect, again (log below).  With the
commits between the earliest bad and latest good commits I marked, my
machine does not boot and hangs very early with the message:

fb0: switching to amdgpudrmfb from EFI VGA

That was introduced with

c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir

and ended with a commit that instead produces the call trace

b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not
 existing in discovery table

I was hoping to get further with the bisect but have no idea how to
avoid the early hangs.

Dirk

= bisect log ===
git bisect start
# bad: [9123e3a74ec7b934a4a099e98af6a61c2f80bbf5] Linux 5.9-rc1
git bisect bad 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5
# good: [bcf876870b95592b52519ed4aafcf9d95999bc9c] Linux 5.8
git bisect good bcf876870b95592b52519ed4aafcf9d95999bc9c
# bad: [8186749621ed6b8fc42644c399e8c755a2b6f630] Merge tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
git bisect bad 8186749621ed6b8fc42644c399e8c755a2b6f630
# good: [2324d50d051ec0f14a548e78554fb02513d6dcef] Merge tag 'docs-5.9' of git://git.lwn.net/linux
git bisect good 2324d50d051ec0f14a548e78554fb02513d6dcef
# bad: [54d44bfc56308d105b0da37392d8398bdc9d4745] drm/nouveau/nvif: give every disp object a human-readable identifier
git bisect bad 54d44bfc56308d105b0da37392d8398bdc9d4745
# bad: [9555152beb1143c85c03f9b9de59863cbbe89f4b] Merge tag 'amd-drm-next-5.9-2020-07-01' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad 9555152beb1143c85c03f9b9de59863cbbe89f4b
# bad: [dfd991794685b1228387214f28630b6e94e56944] drm/amd/display: Not doing bios data pack.
git bisect bad dfd991794685b1228387214f28630b6e94e56944
# good: [ba806f98f868ce107aa9c453fef751de9980e4af] drm/radeon: disable AGP by default
git bisect good ba806f98f868ce107aa9c453fef751de9980e4af
# good: [97d798b276e94a366dfb03d62bc90d4742ab3a31] drm/amdgpu: simplify ATIF backlight handling
git bisect good 97d798b276e94a366dfb03d62bc90d4742ab3a31
# good: [ac4e189a5623579c023c9cf8006422aef2a487b4] drm/amdgpu/gfx10: add navi12 to gfxoff case
git bisect good ac4e189a5623579c023c9cf8006422aef2a487b4
# good: [70534d1ee89ceadd03292d0c2da4dd4020189678] drm/amdgpu: simplify raven and renoir checks
git bisect good 70534d1ee89ceadd03292d0c2da4dd4020189678
# good: [4541ea81edde6ce9a1d9be082489aca7e8e7e1dc] drm/[radeon|amdgpu]: Replace one-element array and use struct_size() helper
git bisect good 4541ea81edde6ce9a1d9be082489aca7e8e7e1dc
# good: [84034ad4c0c0813c1350b43087eed036066edd5a] drm/amd/display: combine public interfaces into single header
git bisect good 84034ad4c0c0813c1350b43087eed036066edd5a
# good: [4f1fad0e9dbd762497df7c79309697ed8b2b6cfc] drm/amd/powerplay: stop thermal IRQs on suspend
git bisect good 4f1fad0e9dbd762497df7c79309697ed8b2b6cfc
# good: [4292b0b2026bc10bced32636ea02dd8eed00cea9] drm/amdgpu: clean up discovery testing
git bisect good 4292b0b2026bc10bced32636ea02dd8eed00cea9
# bad: [c0838cbee2d05c3eb8a2b5a3d1ce706a73008044] drm/amd/display: Revert "enable plane if plane_status changed"
git bisect bad c0838cbee2d05c3eb8a2b5a3d1ce706a73008044
# bad: [651a146526a04993c5bebf0e19cd9256f5e6511d] drm/amdgpu/jpeg: fix race condition issue for jpeg start
git bisect bad 651a146526a04993c5bebf0e19cd9256f5e6511d
# bad: [3bda8acd974e362069e291a78c59a10624debc6e] drm/amdgpu/sriov: Add clear vf fw support
git bisect bad 3bda8acd974e362069e291a78c59a10624debc6e
# bad: [b6df946ef4b5ae29183b2fdb2d12c381c757b3fb] drm/amdgpu: fix the nullptr issue as for PWR IP not existing in discovery table
git bisect bad b6df946ef4b5ae29183b2fdb2d12c381c757b3fb



> Please let me know if I can help with further information.
>
> Dirk
>
> = lspci -vk 
>
> 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c3) (prog-if 00 [VGA controller])
> Subsystem: Lenovo Renoir
> Flags: bus master, fast devsel, latency 0, IRQ 64
> Memory at 46000 (64-bit, prefetchable) [size=256M]
> Memory at 4

amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir

2020-09-22 Thread Dirk Gouders
Hi,

I noticed a call trace (attached) when starting my machine (ThinkPad
L14).  This machine is new and I am still working on its
configuration but visually noticeable is that scrolling in xterms with
SHIFT-PgUp/PgDn is broken.  Using the mouse wheel works.

It seems the call trace has been introduced between 5.8 and 5.9-rc1 and
I tried to bisect this but always end in situations where I don't find
a bootable commit around the current bisect position.  Mainly the
machine then hangs when udevd is started.

Please let me know if I can help with further information.

Dirk

= lspci -vk 

06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c3) (prog-if 00 [VGA controller])
Subsystem: Lenovo Renoir
Flags: bus master, fast devsel, latency 0, IRQ 64
Memory at 46000 (64-bit, prefetchable) [size=256M]
Memory at 47000 (64-bit, prefetchable) [size=2M]
I/O ports at 1000 [size=256]
Memory at fd30 (32-bit, non-prefetchable) [size=512K]
Capabilities: [48] Vendor Specific Information: Len=08 
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
Capabilities: [270] Secondary PCI Express
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [400] Data Link Feature 
Capabilities: [410] Physical Layer 16.0 GT/s 
Capabilities: [440] Lane Margining at the Receiver 
Kernel driver in use: amdgpu
Kernel modules: amdgpu

= call trace ===

[5.181468] amdgpu :06:00.0: amdgpu: SMU is initialized successfully!
[5.182857] [drm] kiq ring mec 2 pipe 1 q 0
[5.183374] ------------[ cut here ]------------
[5.183448] WARNING: CPU: 1 PID: 684 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr.c:716 rn_clk_mgr_construct+0x242/0x389 [amdgpu]
[5.183449] Modules linked in: btusb btrtl btbcm btintel bluetooth ecdh_generic ecc iwlmvm mac80211 libarc4 wmi_bmof crct10dif_pclmul snd_hda_codec_realtek crc32c_intel iwlwifi snd_hda_codec_generic amdgpu(+) tpm_crb snd_hda_codec_hdmi gpu_sched i2c_algo_bit ttm sdhci_pci aesni_intel drm_kms_helper cqhci sdhci ccp syscopyarea snd_hda_intel sysfillrect tpm_tis snd_intel_dspcfg sysimgblt xhci_pci tpm_tis_core fb_sys_fops r8169 snd_hda_codec mmc_core snd_hda_core xhci_hcd thinkpad_acpi cfg80211 realtek drm snd_pcm rng_core mdio_devres sha1_generic snd_timer nvram libphy i2c_piix4 snd k10temp soundcore ledtrig_audio rfkill tpm hwmon wmi battery ac video backlight pinctrl_amd acpi_cpufreq button efivarfs
[5.183470] CPU: 1 PID: 684 Comm: udevd Not tainted 5.9.0-rc6-x86_64+ #170
[5.183471] Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020
[5.183531] RIP: 0010:rn_clk_mgr_construct+0x242/0x389 [amdgpu]
[5.183533] Code: 30 4d 85 c9 74 26 ba 03 00 00 00 83 bc d4 a8 00 00 00 00 89 d6 74 0a 83 bc d4 ac 00 00 00 00 75 40 48 ff ca 48 83 fa ff 75 e1 <0f> 0b 83 7b 20 01 0f 84 13 01 00 00 81 bd e8 00 00 00 ff 14 37 00
[5.183533] RSP: 0018:c9000111f798 EFLAGS: 00010246
[5.183534] RAX: 8883fc1d8e00 RBX: 8883f925c9c0 RCX: 
[5.183535] RDX:  RSI:  RDI: 8883f8da70c8
[5.183535] RBP: 8883fe8da000 R08:  R09: 8883f724fc00
[5.183535] R10: 7fc9117f R11: 8883f925c9c0 R12: 8883f925c900
[5.183536] R13: 8883f598 R14:  R15: 0001
[5.183537] FS:  7f9e31a83d80() GS:88840ec4() 
knlGS:
[5.183537] CS:  0010 DS:  ES:  CR0: 80050033
[5.183538] CR2: 55fdf9ec5568 CR3: 0003fb2b6000 CR4: 00350ee0
[5.183538] Call Trace:
[5.183595]  dc_clk_mgr_create+0x135/0x18b [amdgpu]
[5.183651]  dc_create+0x238/0x5e3 [amdgpu]
[5.183708]  amdgpu_dm_init+0x167/0x1101 [amdgpu]
[5.183762]  dm_hw_init+0xa/0x17 [amdgpu]
[5.183805]  amdgpu_device_init+0x1566/0x1853 [amdgpu]
[5.183811]  ? __kmalloc+0xad/0xbf
[5.183852]  ? amdgpu_driver_load_kms+0x1c/0x17f [amdgpu]
[5.183892]  amdgpu_driver_load_kms+0x41/0x17f [amdgpu]
[5.183959]  amdgpu_pci_probe+0x139/0x1c0 [amdgpu]
[5.183967]  pci_device_probe+0xc6/0x135
[5.183971]  really_probe+0x157/0x32a
[5.183974]  driver_probe_device+0x63/0x97
[5.183976]  device_driver_attach+0x37/0x50
[5.183978]  __driver_attach+0x92/0x9a
[5.183980]  ? device_driver_attach+0x50/0x50
[