Renoir: visual artifacts associated with scrolling
I am dealing with visual artifacts on my laptop. I'm not sure if screen dumps are OK on this list, so I will try to describe the two affected applications I have identified so far. Both can be healed by forcing a full redraw (changing workspace back and forth, minimizing and maximizing windows) when they misbehave:

1) Xterm

Using the mouse wheel is OK; SHIFT-PgUp and SHIFT-PgDn result in damaged lower and upper halves of the screen, respectively. I could work around this problem by using another terminal program.

2) Emacs

I see visual artifacts when navigating mails or source code, for example. These artifacts are more varied: sometimes a top line that seems to be pinned, remainders of long lines, sometimes pinned half lines where the top half and the bottom half of a text line seem to come from two different lines. Sometimes these artifacts are only identifiable because the displayed text makes no sense in context.

The predecessor of this laptop is Intel based and does not behave that way. Running the two applications on that Intel-based laptop via ssh from the Renoir machine shows the same problems, though.

Because the predecessor of the laptop does not cause these problems, I tried to do a bisect but did not find a "good" candidate. With all kernels that an X server would start with (>v5.4) I see these artifacts. Still, all this probably does not mean it is not user space that causes the problems.

Perhaps someone could give me some hints about what else I could do to further examine this problem.

Dirk

P.S.: Scrolling this text up for review before sending it out also partially scrambles the text.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/1] drm/amdgpu: fix NULL pointer dereference for Renoir
Dirk Gouders writes:

> Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
> introduced a NULL pointer dereference when booting with
> amdgpu.discovery=0, because it removed the call of vega10_reg_base_init()
> for that case.
>
> Fix this by calling that function if amdgpu_discovery == 0 in addition to
> the case that amdgpu_discovery_reg_base_init() failed.
>
> Fixes: c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
> Signed-off-by: Dirk Gouders
> Cc: Hawking Zhang
> Cc: Evan Quan
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 84d811b6e48b..f8cb62b326d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -694,12 +694,12 @@ static void soc15_reg_base_init(struct amdgpu_device *adev)
>  		 * it doesn't support SRIOV. */
>  		if (amdgpu_discovery) {
>  			r = amdgpu_discovery_reg_base_init(adev);
> -			if (r) {
> -				DRM_WARN("failed to init reg base from ip discovery table, "
> -					 "fallback to legacy init method\n");
> -				vega10_reg_base_init(adev);
> -			}
> +			if (r == 0)
> +				break;

Grrr, wrong indentation here. But I will wait for your review before v1.

Dirk

> +			DRM_WARN("failed to init reg base from ip discovery table, "
> +				 "fallback to legacy init method\n");
>  		}
> +		vega10_reg_base_init(adev);
>  		break;
>  	case CHIP_VEGA20:
>  		vega20_reg_base_init(adev);
[PATCH 0/1] drm/amdgpu: fix NULL pointer dereference for Renoir
Alex Deucher writes:

> On Wed, Sep 30, 2020 at 4:46 PM Dirk Gouders wrote:
>>
>> Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
>> introduced a NULL pointer dereference when booting with
>> amdgpu.discovery=0.
>>
>> For amdgpu.discovery=0 that commit effectively removed the call of
>> vega10_reg_base_init(adev), so I tested the correctness of the bisect
>> session by restoring that function call for amdgpu_discovery == 0 and with
>> that change, the NULL pointer dereference does not occur:
>>
>
> Can I add your Signed-off-by?

I did not expect the diff to be seen as a proposed patch, nor that it
would turn out to be the correct fix.

Anyway, I did my best to create a hopefully acceptable patch with some
modification of the code that avoids "else" and an identical function
call at two places in the code.

I tested that patch with amdgpu.discovery={0,1} and together with the
patch for the first issue you helped me with. The result is no more
call traces.

Thank you for your patient assistance with the two issues.

Dirk

> Thanks,
>
> Alex
>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> index 84d811b6e48b..2e93c5e1e7e6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> @@ -699,7 +699,8 @@ static void soc15_reg_base_init(struct amdgpu_device *adev)
>>  					 "fallback to legacy init method\n");
>>  				vega10_reg_base_init(adev);
>>  			}
>> -		}
>> +		} else
>> +			vega10_reg_base_init(adev);
>>  		break;
>>  	case CHIP_VEGA20:
>>  		vega20_reg_base_init(adev);
>>
>> Dirk

Dirk Gouders (1):
  drm/amdgpu: fix NULL pointer dereference for Renoir

 drivers/gpu/drm/amd/amdgpu/soc15.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
2.26.2
[PATCH 1/1] drm/amdgpu: fix NULL pointer dereference for Renoir
Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
introduced a NULL pointer dereference when booting with
amdgpu.discovery=0, because it removed the call of vega10_reg_base_init()
for that case.

Fix this by calling that function if amdgpu_discovery == 0 in addition to
the case that amdgpu_discovery_reg_base_init() failed.

Fixes: c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
Signed-off-by: Dirk Gouders
Cc: Hawking Zhang
Cc: Evan Quan
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 84d811b6e48b..f8cb62b326d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -694,12 +694,12 @@ static void soc15_reg_base_init(struct amdgpu_device *adev)
 		 * it doesn't support SRIOV. */
 		if (amdgpu_discovery) {
 			r = amdgpu_discovery_reg_base_init(adev);
-			if (r) {
-				DRM_WARN("failed to init reg base from ip discovery table, "
-					 "fallback to legacy init method\n");
-				vega10_reg_base_init(adev);
-			}
+			if (r == 0)
+				break;
+			DRM_WARN("failed to init reg base from ip discovery table, "
+				 "fallback to legacy init method\n");
 		}
+		vega10_reg_base_init(adev);
 		break;
 	case CHIP_VEGA20:
 		vega20_reg_base_init(adev);
-- 
2.26.2
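For readers without the driver source at hand, the control flow that the patch above changes can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: struct fake_adev, the fake_* helpers and the enums are hypothetical stand-ins invented for illustration, not amdgpu code. The "regressed" function mirrors the flow after commit c1cf79ca5ced46, the "fixed" function mirrors the break-on-success flow of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: records which path initialised the register bases. */
enum fake_source { SRC_NONE, SRC_DISCOVERY, SRC_LEGACY };
enum fake_chip { FAKE_CHIP_RENOIR };

struct fake_adev {
	enum fake_chip chip;
	enum fake_source source;
};

/* Stand-in for amdgpu_discovery_reg_base_init(); rc == 0 means success. */
int fake_discovery_init(struct fake_adev *adev, int rc)
{
	if (rc == 0)
		adev->source = SRC_DISCOVERY;
	return rc;
}

/* Stand-in for vega10_reg_base_init(). */
void fake_legacy_init(struct fake_adev *adev)
{
	adev->source = SRC_LEGACY;
}

/* Flow as described for the regression: legacy init only on discovery failure. */
void renoir_init_regressed(struct fake_adev *adev, bool discovery, int rc)
{
	if (discovery) {
		if (fake_discovery_init(adev, rc))
			fake_legacy_init(adev);
	}
	/* discovery == false: nothing initialises the register bases */
}

/* Flow of the patch: leave the case early on success, else fall back. */
void renoir_init_fixed(struct fake_adev *adev, bool discovery, int rc)
{
	switch (adev->chip) {
	case FAKE_CHIP_RENOIR:
		if (discovery) {
			if (fake_discovery_init(adev, rc) == 0)
				break;	/* discovery worked, skip the fallback */
			/* the real code prints a DRM_WARN() at this point */
		}
		fake_legacy_init(adev);	/* discovery disabled or failed */
		break;
	}
}

/* Walks through the combinations discussed in the thread. */
void reg_base_demo(void)
{
	struct fake_adev a = { FAKE_CHIP_RENOIR, SRC_NONE };

	renoir_init_regressed(&a, false, 0);
	assert(a.source == SRC_NONE);		/* the gap behind the NULL dereference */

	renoir_init_fixed(&a, false, 0);
	assert(a.source == SRC_LEGACY);		/* discovery=0 now uses legacy init */

	a.source = SRC_NONE;
	renoir_init_fixed(&a, true, 0);
	assert(a.source == SRC_DISCOVERY);	/* success path unchanged */

	a.source = SRC_NONE;
	renoir_init_fixed(&a, true, -1);
	assert(a.source == SRC_LEGACY);		/* failure still falls back */
}
```

Calling reg_base_demo() from a small main() exercises all four cases. The single fallback call after the if-block is what lets the patch avoid both an "else" branch and a duplicated vega10_reg_base_init() call.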
BUG: amdgpu: NULL pointer dereference introduced in 5.9-rc1
Commit c1cf79ca5ced46 (drm/amdgpu: use IP discovery table for renoir)
introduced a NULL pointer dereference when booting with
amdgpu.discovery=0.

For amdgpu.discovery=0 that commit effectively removed the call of
vega10_reg_base_init(adev), so I tested the correctness of the bisect
session by restoring that function call for amdgpu_discovery == 0 and with
that change, the NULL pointer dereference does not occur:

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 84d811b6e48b..2e93c5e1e7e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -699,7 +699,8 @@ static void soc15_reg_base_init(struct amdgpu_device *adev)
 					 "fallback to legacy init method\n");
 				vega10_reg_base_init(adev);
 			}
-		}
+		} else
+			vega10_reg_base_init(adev);
 		break;
 	case CHIP_VEGA20:
 		vega20_reg_base_init(adev);

Dirk
Re: [PATCH] drm/amd/pm: setup APU dpm clock table in SMU HW initialization
Evan Quan writes:

> As the dpm clock table is needed during DC HW initialization.
> And that (DC HW initialization) comes before smu_late_init()
> where current APU dpm clock table setup is performed. So, NULL
> pointer dereference will be triggered. By moving APU dpm clock
> table setup to smu_hw_init(), this can be avoided.

Thanks for the quick response.

I tested the patch and it fixes the call trace I initially reported
(#1 in the table below). #2 is unaffected by this patch. I could try to
bisect it as well but have not done so yet.

I probably caused some confusion in the original thread, so I will try
to put things in order. What I noticed is:

   amdgpu.discovery value | noticed issue
   =======================+==========================================
1) unset or "1"           | call trace because of assert(0) in
                          | rn_clk_mgr_helper_populate_bw_params()
   -----------------------+------------------------------------------
2) 0                      | NULL pointer dereference in
                          | soc15_set_ip_blocks()

This patch fixes #1, i.e. it avoids the assert() in the following code in
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c:

	for (i = PP_SMU_NUM_FCLK_DPM_LEVELS - 1; i >= 0; i--) {
		if (clock_table->FClocks[i].Freq != 0 && clock_table->FClocks[i].Vol != 0) {
			j = i;
			break;
		}
	}

	if (j == -1) {
		/* clock table is all 0s, just use our own hardcode */
		ASSERT(0);
		return;
	}

To me, the commit message sounds as if the patch fixes #2 whereas it
really is #1 that gets fixed.

I also wonder whether we want a Fixes: line for completeness:

Fixes: 02cf91c113ea (drm/amd/powerplay: postpone operations not required for hw setup to late_init)

Dirk

> Change-Id: I2bb1f9ba26f9c8820c08241da62f7be64ab75840
> Signed-off-by: Evan Quan
> Reported-by: Dirk Gouders
> ---
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index f46cf9ea355e..8f6045def272 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -482,17 +482,6 @@ static int smu_late_init(void *handle)
>  		return ret;
>  	}
>
> -	/*
> -	 * Set initialized values (get from vbios) to dpm tables context such as
> -	 * gfxclk, memclk, dcefclk, and etc. And enable the DPM feature for each
> -	 * type of clks.
> -	 */
> -	ret = smu_set_default_dpm_table(smu);
> -	if (ret) {
> -		dev_err(adev->dev, "Failed to setup default dpm clock tables!\n");
> -		return ret;
> -	}
> -
>  	ret = smu_populate_umd_state_clk(smu);
>  	if (ret) {
>  		dev_err(adev->dev, "Failed to populate UMD state clocks!\n");
> @@ -1021,6 +1010,17 @@ static int smu_smc_hw_setup(struct smu_context *smu)
>  		return ret;
>  	}
>
> +	/*
> +	 * Set initialized values (get from vbios) to dpm tables context such as
> +	 * gfxclk, memclk, dcefclk, and etc. And enable the DPM feature for each
> +	 * type of clks.
> +	 */
> +	ret = smu_set_default_dpm_table(smu);
> +	if (ret) {
> +		dev_err(adev->dev, "Failed to setup default dpm clock tables!\n");
> +		return ret;
> +	}
> +
>  	ret = smu_notify_display_change(smu);
>  	if (ret)
>  		return ret;
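To illustrate why the ordering matters for issue #1, here is a minimal user-space sketch of the table scan quoted in the mail above. Everything prefixed fake_ or FAKE_ is a hypothetical stand-in: the level count of 4 and the frequency/voltage numbers are assumptions chosen for illustration, not values taken from the driver. The scan returns -1 for an all-zero table, which corresponds to the situation where the display code runs before the dpm clock table has been populated.

```c
#include <assert.h>
#include <string.h>

/* Assumed number of FCLK DPM levels; illustrative only. */
#define FAKE_NUM_FCLK_LEVELS 4

struct fake_fclk {
	unsigned int freq;
	unsigned int vol;
};

struct fake_clock_table {
	struct fake_fclk fclocks[FAKE_NUM_FCLK_LEVELS];
};

/*
 * Same scan as in the quoted rn_clk_mgr.c snippet: walk from the highest
 * level down and return the first index with non-zero frequency and
 * voltage, or -1 if the table is all zeros.
 */
int highest_valid_fclk(const struct fake_clock_table *t)
{
	int i;

	for (i = FAKE_NUM_FCLK_LEVELS - 1; i >= 0; i--) {
		if (t->fclocks[i].freq != 0 && t->fclocks[i].vol != 0)
			return i;
	}
	return -1;
}

/* Stand-in for the table setup that the patch moves to hw setup time. */
void fake_set_default_dpm_table(struct fake_clock_table *t)
{
	int i;

	for (i = 0; i < FAKE_NUM_FCLK_LEVELS; i++) {
		t->fclocks[i].freq = 400 * (i + 1);	/* made-up MHz values */
		t->fclocks[i].vol = 700 + 50 * i;	/* made-up mV values */
	}
}

void clock_table_demo(void)
{
	struct fake_clock_table t;

	memset(&t, 0, sizeof(t));

	/* Display init before table setup: nothing valid, the ASSERT(0) case. */
	assert(highest_valid_fclk(&t) == -1);

	/* Table setup first, as after the patch: the scan finds the top level. */
	fake_set_default_dpm_table(&t);
	assert(highest_valid_fclk(&t) == FAKE_NUM_FCLK_LEVELS - 1);
}
```

Calling clock_table_demo() from a small main() runs both orderings. The sketch says nothing about issue #2, which takes a different code path (soc15_set_ip_blocks()).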
Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir
Alex Deucher writes: > On Wed, Sep 23, 2020 at 3:45 PM Dirk Gouders wrote: >> >> Dirk Gouders writes: >> >> > Alex Deucher writes: >> > >> >> On Wed, Sep 23, 2020 at 8:54 AM Dirk Gouders wrote: >> >>> >> >>> Dirk Gouders writes: >> >>> >> >>> > Hi, >> >>> > >> >>> > I noticed a call trace (attached) when starting my machine (ThinkPad >> >>> > L14). This machine is new and I am still working on it's >> >>> > configuration but visually noticeable is that scrolling in xterms with >> >>> > SHIFT-PgUp/PgDn is broken. Using the mouse wheel works. >> >>> > >> >>> > It seems the call trace has been introduced between 5.8 and 5.9-rc1 and >> >>> > I tried to bisect this but always end in situations where I dont't find >> >>> > a bootable commit around the current bisect position. Mainly the >> >>> > machine then hangs when udevd is started. >> >>> >> >>> I fixed my netconsole setup (had to use a switch instead of the >> >>> ports of a FritzBox) and tried a bisect, again (log below). With the >> >>> commits between the earliest bad and latest good commits I marked, my >> >>> machine does not boot and hangs very early with the message: >> >>> >> >>> fb0: switching to amdgpudrmfb from EFI VGA >> >>> >> >>> That was introduced with >> >>> >> >>> c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir >> >>> >> >>> and ended with a commit that instead produces the call trace >> >>> >> >>> b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not >> >>> existing in discovery table >> >>> >> >>> I was hoping to get further with the bisect but have no idea how to >> >>> avoid the early hangs. >> >> >> >> You can disable use of the IP discovery table by setting >> >> amdgpu.discovery=0 on the kernel command line in grub. >> > >> > I tried that with b770f04ba2ee (next step in bisect), but no success >> > with this option, unfortunately. >> > >> > I'm not using grub but directly booting from UEFI using CONFIG_CMDLINE. 
>> > Any other option I am using (root, loglevel and netconsole) works as
>> > expected and I verified that "amdgpu.discovery=0" is included in
>> > vmlinux.
>>
>> Apologies if I'm causing too much noise.
>>
>> While thinking about this I recalled that I changed amdgpu from modular to
>> static when I had problems with netconsole. I changed it back to
>> modular to see if that helps and I get the earlier mentioned hangs later
>> in the boot process when udevd starts and netconsole is up working.
>> This enables me to inspect boot messages and I tested with
>> amdgpu.discovery=0:
>>
>> 5,175,49060,-;Kernel command line: root=PARTLABEL=system1 amdgpu.discovery=0 loglevel=15 netconsole=...
>>
>> I'm afraid I now get traces that commit b6df946ef4b5 (drm/amdgpu: fix
>> the nullptr issue as for PWR IP not existing in discovery table) is
>> fixing (output attached below) and I cannot decide how to continue with
>> bisecting...
>
> You get the issue with discovery=0? You can try skipping that commit
> (mark as skip) to finish the bisection.

I get issues with both settings, amdgpu.discovery={0,1}. With "0" I hit
the NULL pointer dereference in soc15_set_ip_blocks() and with "1" I hit
the assert in rn_clk_mgr_helper_populate_bw_params().

I only noticed the issue with "0" after you told me about
amdgpu.discovery, so I continued to search for the commit that
introduced the issue with "1".

Using bisect skip alone did not help, but keeping b6df946ef4b5
(drm/amdgpu: fix the nullptr issue as for PWR IP not existing in
discovery table) in the working tree made the bisect session much more
straightforward. It resulted in 02cf91c113ea (drm/amd/powerplay:
postpone operations not required for hw setup to late_init) as the
first bad commit.

Not that I understand anything about the driver, but I wanted to know
whether that commit really is causing the issue. So I tried to move some
initialization code back from smu_late_init() to smu_smc_hw_setup()
(diff below), and with that the issue is gone.

I'm not sure if you prefer a full dmesg output; for now I'll append the
[drm] part.

Dirk

= diff
[PATCH] drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version()
Commit 78fe9f63947a2b (drm/amd/display: Remove DISPCLK Limit Floor for
Certain SMU Versions) added a call to rn_vbios_smu_get_smu_version()
to set clk_mgr->smu_ver. That field is already initialized before the
if-statement, so the second call is redundant.

Fixes: 78fe9f63947a2b (drm/amd/display: Remove DISPCLK Limit Floor for Certain SMU Versions)
Signed-off-by: Dirk Gouders
Cc: Alex Deucher
Cc: Sung Lee
Cc: Yongqiang Sun
Cc: Rodrigo Siqueira
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
index 543afa34d87a..21a3073c8929 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
@@ -783,7 +783,6 @@ void rn_clk_mgr_construct(
 	} else {
 		struct clk_log_info log_info = {0};
 
-		clk_mgr->smu_ver = rn_vbios_smu_get_smu_version(clk_mgr);
 		clk_mgr->periodic_retraining_disabled = rn_vbios_smu_is_periodic_retraining_disabled(clk_mgr);
 
 		/* SMU Version 55.51.0 and up no longer have an issue
-- 
2.26.2
Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir
Dirk Gouders writes: > Alex Deucher writes: > >> On Wed, Sep 23, 2020 at 8:54 AM Dirk Gouders wrote: >>> >>> Dirk Gouders writes: >>> >>> > Hi, >>> > >>> > I noticed a call trace (attached) when starting my machine (ThinkPad >>> > L14). This machine is new and I am still working on it's >>> > configuration but visually noticeable is that scrolling in xterms with >>> > SHIFT-PgUp/PgDn is broken. Using the mouse wheel works. >>> > >>> > It seems the call trace has been introduced between 5.8 and 5.9-rc1 and >>> > I tried to bisect this but always end in situations where I dont't find >>> > a bootable commit around the current bisect position. Mainly the >>> > machine then hangs when udevd is started. >>> >>> I fixed my netconsole setup (had to use a switch instead of the >>> ports of a FritzBox) and tried a bisect, again (log below). With the >>> commits between the earliest bad and latest good commits I marked, my >>> machine does not boot and hangs very early with the message: >>> >>> fb0: switching to amdgpudrmfb from EFI VGA >>> >>> That was introduced with >>> >>> c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir >>> >>> and ended with a commit that instead produces the call trace >>> >>> b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not >>> existing in discovery table >>> >>> I was hoping to get further with the bisect but have no idea how to >>> avoid the early hangs. >> >> You can disable use of the IP discovery table by setting >> amdgpu.discovery=0 on the kernel command line in grub. > > I tried that with b770f04ba2ee (next step in bisect), but no success > with this option, unfortunately. > > I'm not using grub but directly booting from UEFI using CONFIG_CMDLINE. > Any other option I am using (root, loglevel and netconsole) works as > expected and I veryfied that "amdgpu.discovery=0" is included in > vmlinux. Apologies if I'm causing too much noise. 
While thinking about this I recalled that I changed amdgpu from modular to static when I had problems with netconsole. I changed it back to modular to see if that helps and I get the earlier mentioned hangs later in the boot process when udevd starts and netconsole is up working. This enables me to inspect boot messages and I tested with amdgpu.discovery=0: 5,175,49060,-;Kernel command line: root=PARTLABEL=system1 amdgpu.discovery=0 loglevel=15 netconsole=... I'm afraid I now get traces that commit b6df946ef4b5 (drm/amdgpu: fix the nullptr issue as for PWR IP not existing in discovery table) is fixing (output attached below) and I cannot decide how to continue with bisecting... Dirk 1,840,5418458,-;BUG: kernel NULL pointer dereference, address: 0008 1,841,5418472,-;#PF: supervisor read access in kernel mode 1,842,5418474,-;#PF: error_code(0x) - not-present page 6,843,5418476,-;PGD 0 P4D 0 4,844,5418480,-;Oops: [#1] SMP NOPTI 4,845,5418483,-;CPU: 3 PID: 744 Comm: udevd Not tainted 5.7.0-rc2-x86_64-01641-gb770f04ba2ee #216 4,846,5418486,-;Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020 4,847,5418559,-;RIP: 0010:nbio_v7_0_get_rev_id+0x9/0x1b [amdgpu] 4,848,5418562,-;Code: 5d 41 5d 41 5e e9 9a f0 f9 ff 48 8b 87 e8 5f 01 00 31 d2 8b 70 08 81 c6 c3 00 00 00 e9 9d ef f9 ff 48 8b 87 e8 5f 01 00 31 d2 <8b> 70 08 83 c6 0f e8 89 ef f9 ff c1 e8 18 83 e0 0f c3 49 89 f8 48 4,849,5418566,-;RSP: 0018:c900011dba90 EFLAGS: 00010246 4,850,5418568,-;RAX: RBX: 000fffe0 RCX: 0018 4,851,5418571,-;RDX: RSI: a0970e20 RDI: 8883f554 4,852,5418573,-;RBP: 8883f554 R08: 0001 R09: 4,853,5418575,-;R10: R11: 0048 R12: ffea 4,854,5418577,-;R13: 7fff R14: 8883f9486800 R15: c900011dbe98 4,855,5418580,-;FS: 7f750db3dd80() GS:88840ecc() knlGS: 4,856,5418583,-;CS: 0010 DS: ES: CR0: 80050033 4,857,5418586,-;CR2: 0008 CR3: 0003f9728000 CR4: 00340ee0 4,858,5418588,-;Call Trace: 4,859,5418660,-; soc15_set_ip_blocks+0x105/0x4fd [amdgpu] 4,860,5418714,-; 
amdgpu_device_init+0xcab/0x1862 [amdgpu] 4,861,5418720,-; ? __kmalloc+0xb2/0xc4 4,862,5418766,-; amdgpu_driver_load_kms+0x41/0x178 [amdgpu] 4,863,5418813,-; amdgpu_pci_probe+0x147/0x1c7 [amdgpu] 4,864,5418818,-; pci_device_probe+0xc6/0x135 4,865,5418822,-; really_probe+0x157/0x2d1 4,866,5418825,-; driver_probe_device+0x97/0xcc 4,867,5418828,-; de
Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir
Alex Deucher writes: > On Wed, Sep 23, 2020 at 8:54 AM Dirk Gouders wrote: >> >> Dirk Gouders writes: >> >> > Hi, >> > >> > I noticed a call trace (attached) when starting my machine (ThinkPad >> > L14). This machine is new and I am still working on it's >> > configuration but visually noticeable is that scrolling in xterms with >> > SHIFT-PgUp/PgDn is broken. Using the mouse wheel works. >> > >> > It seems the call trace has been introduced between 5.8 and 5.9-rc1 and >> > I tried to bisect this but always end in situations where I dont't find >> > a bootable commit around the current bisect position. Mainly the >> > machine then hangs when udevd is started. >> >> I fixed my netconsole setup (had to use a switch instead of the >> ports of a FritzBox) and tried a bisect, again (log below). With the >> commits between the earliest bad and latest good commits I marked, my >> machine does not boot and hangs very early with the message: >> >> fb0: switching to amdgpudrmfb from EFI VGA >> >> That was introduced with >> >> c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir >> >> and ended with a commit that instead produces the call trace >> >> b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not >> existing in discovery table >> >> I was hoping to get further with the bisect but have no idea how to >> avoid the early hangs. > > You can disable use of the IP discovery table by setting > amdgpu.discovery=0 on the kernel command line in grub. I tried that with b770f04ba2ee (next step in bisect), but no success with this option, unfortunately. I'm not using grub but directly booting from UEFI using CONFIG_CMDLINE. Any other option I am using (root, loglevel and netconsole) works as expected and I veryfied that "amdgpu.discovery=0" is included in vmlinux. 
Dirk > > Alex > > >> >> Dirk >> >> = bisect log === >> git bisect start >> # bad: [9123e3a74ec7b934a4a099e98af6a61c2f80bbf5] Linux 5.9-rc1 >> git bisect bad 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5 >> # good: [bcf876870b95592b52519ed4aafcf9d95999bc9c] Linux 5.8 >> git bisect good bcf876870b95592b52519ed4aafcf9d95999bc9c >> # bad: [8186749621ed6b8fc42644c399e8c755a2b6f630] Merge tag >> 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm >> git bisect bad 8186749621ed6b8fc42644c399e8c755a2b6f630 >> # good: [2324d50d051ec0f14a548e78554fb02513d6dcef] Merge tag 'docs-5.9' of >> git://git.lwn.net/linux >> git bisect good 2324d50d051ec0f14a548e78554fb02513d6dcef >> # bad: [54d44bfc56308d105b0da37392d8398bdc9d4745] drm/nouveau/nvif: >> give every disp object a human-readable identifier >> git bisect bad 54d44bfc56308d105b0da37392d8398bdc9d4745 >> # bad: [9555152beb1143c85c03f9b9de59863cbbe89f4b] Merge tag >> 'amd-drm-next-5.9-2020-07-01' of >> git://people.freedesktop.org/~agd5f/linux into drm-next >> git bisect bad 9555152beb1143c85c03f9b9de59863cbbe89f4b >> # bad: [dfd991794685b1228387214f28630b6e94e56944] drm/amd/display: Not doing >> bios data pack. 
>> git bisect bad dfd991794685b1228387214f28630b6e94e56944 >> # good: [ba806f98f868ce107aa9c453fef751de9980e4af] drm/radeon: disable AGP >> by default >> git bisect good ba806f98f868ce107aa9c453fef751de9980e4af >> # good: [97d798b276e94a366dfb03d62bc90d4742ab3a31] drm/amdgpu: simplify ATIF >> backlight handling >> git bisect good 97d798b276e94a366dfb03d62bc90d4742ab3a31 >> # good: [ac4e189a5623579c023c9cf8006422aef2a487b4] drm/amdgpu/gfx10: add >> navi12 to gfxoff case >> git bisect good ac4e189a5623579c023c9cf8006422aef2a487b4 >> # good: [70534d1ee89ceadd03292d0c2da4dd4020189678] drm/amdgpu: simplify >> raven and renoir checks >> git bisect good 70534d1ee89ceadd03292d0c2da4dd4020189678 >> # good: [4541ea81edde6ce9a1d9be082489aca7e8e7e1dc] >> drm/[radeon|amdgpu]: Replace one-element array and use struct_size() >> helper >> git bisect good 4541ea81edde6ce9a1d9be082489aca7e8e7e1dc >> # good: [84034ad4c0c0813c1350b43087eed036066edd5a] drm/amd/display: combine >> public interfaces into single header >> git bisect good 84034ad4c0c0813c1350b43087eed036066edd5a >> # good: [4f1fad0e9dbd762497df7c79309697ed8b2b6cfc] drm/amd/powerplay: stop >> thermal IRQs on suspend >> git bisect good 4f1fad0e9dbd762497df7c79309697ed8b2b6cfc >> # good: [4292b0b2026bc10bced32636ea02dd8eed00cea9] drm/amdgpu: clean up >> discovery testing >> g
Re: amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir
Dirk Gouders writes:

> Hi,
>
> I noticed a call trace (attached) when starting my machine (ThinkPad
> L14). This machine is new and I am still working on its
> configuration but visually noticeable is that scrolling in xterms with
> SHIFT-PgUp/PgDn is broken. Using the mouse wheel works.
>
> It seems the call trace has been introduced between 5.8 and 5.9-rc1 and
> I tried to bisect this but always end in situations where I don't find
> a bootable commit around the current bisect position. Mainly the
> machine then hangs when udevd is started.

I fixed my netconsole setup (had to use a switch instead of the
ports of a FritzBox) and tried a bisect again (log below). With the
commits between the earliest bad and latest good commits I marked, my
machine does not boot and hangs very early with the message:

fb0: switching to amdgpudrmfb from EFI VGA

That was introduced with

c1cf79ca5ced drm/amdgpu: use IP discovery table for renoir

and ended with a commit that instead produces the call trace

b6df946ef4b5 drm/amdgpu: fix the nullptr issue as for PWR IP not existing in discovery table

I was hoping to get further with the bisect but have no idea how to
avoid the early hangs.

Dirk

= bisect log ===
git bisect start
# bad: [9123e3a74ec7b934a4a099e98af6a61c2f80bbf5] Linux 5.9-rc1
git bisect bad 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5
# good: [bcf876870b95592b52519ed4aafcf9d95999bc9c] Linux 5.8
git bisect good bcf876870b95592b52519ed4aafcf9d95999bc9c
# bad: [8186749621ed6b8fc42644c399e8c755a2b6f630] Merge tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
git bisect bad 8186749621ed6b8fc42644c399e8c755a2b6f630
# good: [2324d50d051ec0f14a548e78554fb02513d6dcef] Merge tag 'docs-5.9' of git://git.lwn.net/linux
git bisect good 2324d50d051ec0f14a548e78554fb02513d6dcef
# bad: [54d44bfc56308d105b0da37392d8398bdc9d4745] drm/nouveau/nvif: give every disp object a human-readable identifier
git bisect bad 54d44bfc56308d105b0da37392d8398bdc9d4745
# bad: [9555152beb1143c85c03f9b9de59863cbbe89f4b] Merge tag 'amd-drm-next-5.9-2020-07-01' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad 9555152beb1143c85c03f9b9de59863cbbe89f4b
# bad: [dfd991794685b1228387214f28630b6e94e56944] drm/amd/display: Not doing bios data pack.
git bisect bad dfd991794685b1228387214f28630b6e94e56944
# good: [ba806f98f868ce107aa9c453fef751de9980e4af] drm/radeon: disable AGP by default
git bisect good ba806f98f868ce107aa9c453fef751de9980e4af
# good: [97d798b276e94a366dfb03d62bc90d4742ab3a31] drm/amdgpu: simplify ATIF backlight handling
git bisect good 97d798b276e94a366dfb03d62bc90d4742ab3a31
# good: [ac4e189a5623579c023c9cf8006422aef2a487b4] drm/amdgpu/gfx10: add navi12 to gfxoff case
git bisect good ac4e189a5623579c023c9cf8006422aef2a487b4
# good: [70534d1ee89ceadd03292d0c2da4dd4020189678] drm/amdgpu: simplify raven and renoir checks
git bisect good 70534d1ee89ceadd03292d0c2da4dd4020189678
# good: [4541ea81edde6ce9a1d9be082489aca7e8e7e1dc] drm/[radeon|amdgpu]: Replace one-element array and use struct_size() helper
git bisect good 4541ea81edde6ce9a1d9be082489aca7e8e7e1dc
# good: [84034ad4c0c0813c1350b43087eed036066edd5a] drm/amd/display: combine public interfaces into single header
git bisect good 84034ad4c0c0813c1350b43087eed036066edd5a
# good: [4f1fad0e9dbd762497df7c79309697ed8b2b6cfc] drm/amd/powerplay: stop thermal IRQs on suspend
git bisect good 4f1fad0e9dbd762497df7c79309697ed8b2b6cfc
# good: [4292b0b2026bc10bced32636ea02dd8eed00cea9] drm/amdgpu: clean up discovery testing
git bisect good 4292b0b2026bc10bced32636ea02dd8eed00cea9
# bad: [c0838cbee2d05c3eb8a2b5a3d1ce706a73008044] drm/amd/display: Revert "enable plane if plane_status changed"
git bisect bad c0838cbee2d05c3eb8a2b5a3d1ce706a73008044
# bad: [651a146526a04993c5bebf0e19cd9256f5e6511d] drm/amdgpu/jpeg: fix race condition issue for jpeg start
git bisect bad 651a146526a04993c5bebf0e19cd9256f5e6511d
# bad: [3bda8acd974e362069e291a78c59a10624debc6e] drm/amdgpu/sriov: Add clear vf fw support
git bisect bad 3bda8acd974e362069e291a78c59a10624debc6e
# bad: [b6df946ef4b5ae29183b2fdb2d12c381c757b3fb] drm/amdgpu: fix the nullptr issue as for PWR IP not existing in discovery table
git bisect bad b6df946ef4b5ae29183b2fdb2d12c381c757b3fb

> Please let me know if I can help with further information.
>
> Dirk
>
> = lspci -vk
>
> 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c3) (prog-if 00 [VGA controller])
>         Subsystem: Lenovo Renoir
>         Flags: bus master, fast devsel, latency 0, IRQ 64
>         Memory at 46000 (64-bit, prefetchable) [size=256M]
>         Memory at 4
amdgpu: call trace introduced in 5.9-rc1 for Lenovo L14 Renoir
Hi, I noticed a call trace (attached) when starting my machine (ThinkPad L14). This machine is new and I am still working on it's configuration but visually noticeable is that scrolling in xterms with SHIFT-PgUp/PgDn is broken. Using the mouse wheel works. It seems the call trace has been introduced between 5.8 and 5.9-rc1 and I tried to bisect this but always end in situations where I dont't find a bootable commit around the current bisect position. Mainly the machine then hangs when udevd is started. Please let me know if I can help with further information. Dirk = lspci -vk 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c3) (prog-if 00 [VGA controller]) Subsystem: Lenovo Renoir Flags: bus master, fast devsel, latency 0, IRQ 64 Memory at 46000 (64-bit, prefetchable) [size=256M] Memory at 47000 (64-bit, prefetchable) [size=2M] I/O ports at 1000 [size=256] Memory at fd30 (32-bit, non-prefetchable) [size=512K] Capabilities: [48] Vendor Specific Information: Len=08 Capabilities: [50] Power Management version 3 Capabilities: [64] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=4 Masked- Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 Capabilities: [270] Secondary PCI Express Capabilities: [2b0] Address Translation Service (ATS) Capabilities: [2c0] Page Request Interface (PRI) Capabilities: [2d0] Process Address Space ID (PASID) Capabilities: [400] Data Link Feature Capabilities: [410] Physical Layer 16.0 GT/s Capabilities: [440] Lane Margining at the Receiver Kernel driver in use: amdgpu Kernel modules: amdgpu = call trace === [5.181468] amdgpu :06:00.0: amdgpu: SMU is initialized successfully! 
[5.182857] [drm] kiq ring mec 2 pipe 1 q 0 [5.183374] [ cut here ] [5.183448] WARNING: CPU: 1 PID: 684 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr.c:716 rn_clk_mgr_construct+0x242/0x389 [amdgpu] [5.183449] Modules linked in: btusb btrtl btbcm btintel bluetooth ecdh_generic ecc iwlmvm mac80211 libarc4 wmi_bmof crct10dif_pclmul snd_hda_codec_realtek crc32c_intel iwlwifi snd_hda_codec_generic amdgpu(+) tpm_crb snd_hda_codec_hdmi gpu_sched i2c_algo_bit ttm sdhci_pci aesni_intel drm_kms_helper cqhci sdhci ccp syscopyarea snd_hda_intel sysfillrect tpm_tis snd_intel_dspcfg sysimgblt xhci_pci tpm_tis_core fb_sys_fops r8169 snd_hda_codec mmc_core snd_hda_core xhci_hcd thinkpad_acpi cfg80211 realtek drm snd_pcm rng_core mdio_devres sha1_generic snd_timer nvram libphy i2c_piix4 snd k10temp soundcore ledtrig_audio rfkill tpm hw mon wmi battery ac video backlight pinctrl_amd acpi_cpufreq button efivarfs [5.183470] CPU: 1 PID: 684 Comm: udevd Not tainted 5.9.0-rc6-x86_64+ #170 [5.183471] Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020 [5.183531] RIP: 0010:rn_clk_mgr_construct+0x242/0x389 [amdgpu] [5.183533] Code: 30 4d 85 c9 74 26 ba 03 00 00 00 83 bc d4 a8 00 00 00 00 89 d6 74 0a 83 bc d4 ac 00 00 00 00 75 40 48 ff ca 48 83 fa ff 75 e1 <0f> 0b 83 7 b 20 01 0f 84 13 01 00 00 81 bd e8 00 00 00 ff 14 37 00 [5.183533] RSP: 0018:c9000111f798 EFLAGS: 00010246 [5.183534] RAX: 8883fc1d8e00 RBX: 8883f925c9c0 RCX: [5.183535] RDX: RSI: RDI: 8883f8da70c8 [5.183535] RBP: 8883fe8da000 R08: R09: 8883f724fc00 [5.183535] R10: 7fc9117f R11: 8883f925c9c0 R12: 8883f925c900 [5.183536] R13: 8883f598 R14: R15: 0001 [5.183537] FS: 7f9e31a83d80() GS:88840ec4() knlGS: [5.183537] CS: 0010 DS: ES: CR0: 80050033 [5.183538] CR2: 55fdf9ec5568 CR3: 0003fb2b6000 CR4: 00350ee0 [5.183538] Call Trace: [5.183595] dc_clk_mgr_create+0x135/0x18b [amdgpu] [5.183651] dc_create+0x238/0x5e3 [amdgpu] [5.183708] amdgpu_dm_init+0x167/0x1101 [amdgpu] [5.183762] 
dm_hw_init+0xa/0x17 [amdgpu] [5.183805] amdgpu_device_init+0x1566/0x1853 [amdgpu] [5.183811] ? __kmalloc+0xad/0xbf [5.183852] ? amdgpu_driver_load_kms+0x1c/0x17f [amdgpu] [5.183892] amdgpu_driver_load_kms+0x41/0x17f [amdgpu] [5.183959] amdgpu_pci_probe+0x139/0x1c0 [amdgpu] [5.183967] pci_device_probe+0xc6/0x135 [5.183971] really_probe+0x157/0x32a [5.183974] driver_probe_device+0x63/0x97 [5.183976] device_driver_attach+0x37/0x50 [5.183978] __driver_attach+0x92/0x9a [5.183980] ? device_driver_attach+0x50/0x50 [