from:"Thorsten Leemhuis"

Re: 6.10/regression/bisected commit c4cb23111103 causes sleeping function called from invalid context at kernel/locking/mutex.c:585

2024-05-28 Thread Linux regression tracking (Thorsten Leemhuis)

On 22.05.24 23:18, Chris Bainbridge wrote:
> On Tue, May 21, 2024 at 02:39:06PM +0500, Mikhail Gavrilov wrote:
>> Yesterday on the fresh kernel snapshot
>> I spotted a new bug message with the following stack trace:
>> [4.307097] BUG: sleeping function called from invalid context at
>> kernel/locking/mutex.c:585
> I am also getting this error on every boot. Decoded stacktrace:

TWIMC & for the record: Boris also reported this; Vasant Hegde replied
and said a fix is in the works:

https://lore.kernel.org/all/898d356d-ec7d-41de-82d8-3ed4dc559...@amd.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot dup:
https://lore.kernel.org/all/cabxgcsn1z2gj99zsdhqwynptxbymrqhejdff8axxxoiz_0g...@mail.gmail.com/

Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-05-21 Thread Linux regression tracking (Thorsten Leemhuis)

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Hmm, from here it looks like the patch, although it was reviewed more
than a week ago, is still not even in -next. Is there a reason?

I know, we are in the merge window. But at the same time this is a fix
(that already lingered on the lists for way too long before it was
reviewed) for a regression in a somewhat recent kernel, so in Linus'
own words it should be "expedited"[1].

Or are we again just missing the right person for the job in the CC?
Adding Dave and Sima just in case.

Ciao, Thorsten

[1]
https://lore.kernel.org/all/CAHk-=wis_qqy4odnynnki5b7qhosmxtoj1jxo5wmb6sruwq...@mail.gmail.com/

On 12.05.24 18:11, Limonciello, Mario wrote:
> On 5/10/2024 4:24 AM, Jani Nikula wrote:
>> On Fri, 10 May 2024, "Lin, Wayne"  wrote:
>>>> -----Original Message-----
>>>> From: Limonciello, Mario 
>>>> Sent: Friday, May 10, 2024 3:18 AM
>>>> To: Linux regressions mailing list ;
>>>> Wentland, Harry
>>>> ; Lin, Wayne 
>>>> Cc: ly...@redhat.com; imre.d...@intel.com; Leon Weiß
>>>> >>> bochum.de>; sta...@vger.kernel.org; dri-devel@lists.freedesktop.org;
>>>> amd-
>>>> g...@lists.freedesktop.org; intel-...@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/mst: Fix NULL pointer dereference at
>>>> drm_dp_add_payload_part2
>>>>
>>>> On 5/9/2024 07:43, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On 18.04.24 21:43, Harry Wentland wrote:
>>>>>> On 2024-03-07 01:29, Wayne Lin wrote:
>>>>>>> [Why]
>>>>>>> Commit:
>>>>>>> - commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
>>>>>>> allocation/removement") accidentally overwrote the commit
>>>>>>> - commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in
>>>>>>> drm_dp_add_payload_part2"), which causes a regression.
>>>>>>>
>>>>>>> [How]
>>>>>>> Recover the original NULL fix and remove the unnecessary input
>>>>>>> parameter 'state' for drm_dp_add_payload_part2().
>>>>>>>
>>>>>>> Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
>>>>>>> allocation/removement")
>>>>>>> Reported-by: Leon Weiß 
>>>>>>> Link:
>>>>>>> https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.c
>>>>>>> a...@ruhr-uni-bochum.de/
>>>>>>> Cc: ly...@redhat.com
>>>>>>> Cc: imre.d...@intel.com
>>>>>>> Cc: sta...@vger.kernel.org
>>>>>>> Cc: regressi...@lists.linux.dev
>>>>>>> Signed-off-by: Wayne Lin 
>>>>>>
>>>>>> I haven't been deep in MST code in a while but this all looks pretty
>>>>>> straightforward and good.
>>>>>>
>>>>>> Reviewed-by: Harry Wentland 
>>>>>
>>>>> Hmmm, that was three weeks ago, but it seems since then nothing
>>>>> happened to fix the linked regression through this or some other
>>>>> patch. Is there a reason? The build failure report from the CI maybe?
>>>>
>>>> It touches files outside of amd but only has an ack from AMD.  I
>>>> think we
>>>> /probably/ want an ack from i915 and nouveau to take it through.
>>>
>>> Thanks, Mario!
>>>
>>> Hi Thorsten,
>>> Yeah, like what Mario said. Would also like to have ack from i915 and
>>> nouveau.
>>
>> It usually works better if you Cc the folks you want an ack from! ;)
>>
>> Acked-by: Jani Nikula 
>>
> 
> Thanks! Can someone with commit permissions take this to drm-misc?
> 
> 
>

Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-05-09 Thread Linux regression tracking (Thorsten Leemhuis)

On 18.04.24 21:43, Harry Wentland wrote:
> On 2024-03-07 01:29, Wayne Lin wrote:
>> [Why]
>> Commit:
>> - commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
>> allocation/removement")
>> accidentally overwrote the commit
>> - commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in 
>> drm_dp_add_payload_part2"),
>> which causes a regression.
>>
>> [How]
>> Recover the original NULL fix and remove the unnecessary input parameter 
>> 'state' for
>> drm_dp_add_payload_part2().
>>
>> Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
>> allocation/removement")
>> Reported-by: Leon Weiß 
>> Link: 
>> https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.ca...@ruhr-uni-bochum.de/
>> Cc: ly...@redhat.com
>> Cc: imre.d...@intel.com
>> Cc: sta...@vger.kernel.org
>> Cc: regressi...@lists.linux.dev
>> Signed-off-by: Wayne Lin 
> 
> I haven't been deep in MST code in a while but this all looks
> pretty straightforward and good.
> 
> Reviewed-by: Harry Wentland 

Hmmm, that was three weeks ago, but it seems since then nothing happened
to fix the linked regression through this or some other patch. Is there
a reason? The build failure report from the CI maybe?

Wayne Lin, do you know what's up?

Ciao, Thorsten

>> ---
>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 2 +-
>>  drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 +---
>>  drivers/gpu/drm/i915/display/intel_dp_mst.c   | 2 +-
>>  drivers/gpu/drm/nouveau/dispnv50/disp.c   | 2 +-
>>  include/drm/display/drm_dp_mst_helper.h   | 1 -
>>  5 files changed, 4 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> index c27063305a13..2c36f3d00ca2 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> @@ -363,7 +363,7 @@ void dm_helpers_dp_mst_send_payload_allocation(
>>  mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
>>  new_payload = drm_atomic_get_mst_payload_state(mst_state, 
>> aconnector->mst_output_port);
>>  
>> -ret = drm_dp_add_payload_part2(mst_mgr, mst_state->base.state, 
>> new_payload);
>> +ret = drm_dp_add_payload_part2(mst_mgr, new_payload);
>>  
>>  if (ret) {
>>  amdgpu_dm_set_mst_status(&aconnector->mst_status,
>> diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
>> b/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> index 03d528209426..95fd18f24e94 100644
>> --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> @@ -3421,7 +3421,6 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
>>  /**
>>   * drm_dp_add_payload_part2() - Execute payload update part 2
>>   * @mgr: Manager to use.
>> - * @state: The global atomic state
>>   * @payload: The payload to update
>>   *
>>   * If @payload was successfully assigned a starting time slot by 
>> drm_dp_add_payload_part1(), this
>> @@ -3430,14 +3429,13 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
>>   * Returns: 0 on success, negative error code on failure.
>>   */
>>  int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
>> - struct drm_atomic_state *state,
>>   struct drm_dp_mst_atomic_payload *payload)
>>  {
>>  int ret = 0;
>>  
>>  /* Skip failed payloads */
>>  if (payload->payload_allocation_status != 
>> DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) {
>> -drm_dbg_kms(state->dev, "Part 1 of payload creation for %s 
>> failed, skipping part 2\n",
>> +drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s 
>> failed, skipping part 2\n",
>>  payload->port->connector->name);
>>  return -EIO;
>>  }
>> diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c 
>> b/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> index 53aec023ce92..2fba66aec038 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> @@ -1160,7 +1160,7 @@ static void intel_mst_enable_dp(struct 
>> intel_atomic_state *state,
>>  if (first_mst_stream)
>>  intel_ddi_wait_for_fec_status(encoder, pipe_config, true);
>>  
>> -drm_dp_add_payload_part2(&intel_dp->mst_mgr, &state->base,
>> +drm_dp_add_payload_part2(&intel_dp->mst_mgr,
>>   drm_atomic_get_mst_payload_state(mst_state, 
>> connector->port));
>>  
>>  if (DISPLAY_VER(dev_priv) >= 12)
>> diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
>> b/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> index 0c3d88ad0b0e..88728a0b2c25 100644
>> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> @@ -915,7 +915,7 @@ nv50_msto_cleanup(struct drm_atomic_state *state,
>>  msto->disabled = false;
>>
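
The idea behind the quoted patch can be sketched outside the kernel. The following is a minimal illustration only, assuming (as the subject line suggests) that the atomic state pointer handed to the helper may be NULL while the topology manager is always valid. All struct and function names here are simplified, hypothetical stand-ins and not the real DRM API.

```c
#include <stddef.h>

/* Simplified stand-ins for the DRM structures; all names are hypothetical. */
struct device { const char *name; };
struct atomic_state { struct device *dev; };
struct topology_mgr { struct device *dev; };

/*
 * Pick the device used for a debug message from the manager. The
 * manager is the object the helper operates on, so its device pointer
 * is available even when the caller has no usable atomic state.
 */
const struct device *log_device(const struct topology_mgr *mgr)
{
	return mgr->dev;
}

/*
 * The pattern the patch removes: reaching the device through a state
 * pointer that may be NULL. Guarded here so the sketch itself is safe.
 */
const struct device *log_device_from_state(const struct atomic_state *state)
{
	return state ? state->dev : NULL;
}
```

Dropping the state parameter entirely, as the patch does, removes the need for callers to supply a pointer that the helper only used for logging.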

Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]

2024-05-08 Thread Linux regression tracking (Thorsten Leemhuis)

On 08.05.24 14:35, Anders Blomdell wrote:
> On 2024-05-07 07:04, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 06.05.24 16:30, David Wang wrote:
>>>> On 30.04.24 08:13, David Wang wrote:
>>
>>>> And confirmed that the warning is caused by
>>>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
>>>
>>> The kernel warning still shows up in 6.9.0-rc7.
>>> (I think 4 high load processes on a 2-Core VM could easily trigger
>>> the kernel warning.)
>>
>> Thx for the report. Linus just reverted the commit 07ed11afb68 you
>> mentioned in your initial mail (I put that quote in again, see above):
>>
>> 3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
>> https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10
>>
>> So this hopefully should be history now.
>>
> Since this affects the 6.8 series (6.8.7 and onwards), I made a CC to
> sta...@vger.kernel.org

Ohh, good idea, I thought Linus had added a stable tag, but that is not
the case. Adding Greg as well and making things explicit:

@Greg: you might want to add 3628e0383dd349 ("Reapply "drm/qxl: simplify
qxl_fence_wait"") to all branches that received 07ed11afb68d94 ("Revert
"drm/qxl: simplify qxl_fence_wait"") (which afaics went into v6.8.7,
v6.6.28, v6.1.87, and v5.15.156).

Ciao, Thorsten

Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]

2024-05-06 Thread Linux regression tracking (Thorsten Leemhuis)

On 06.05.24 16:30, David Wang wrote:
>> On 30.04.24 08:13, David Wang wrote:

>> And confirmed that the warning is caused by
>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
>
> The kernel warning still shows up in 6.9.0-rc7.
> (I think 4 high load processes on a 2-Core VM could easily trigger the kernel 
> warning.)

Thx for the report. Linus just reverted the commit 07ed11afb68 you
mentioned in your initial mail (I put that quote in again, see above):

3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10

So this hopefully should be history now.

Ciao, Thorsten

Re: nouveau: r535.c:1266:3: error: label at end of compound statement default: with gcc-8

2024-04-29 Thread Linux regression tracking (Thorsten Leemhuis)

On 29.04.24 17:06, Naresh Kamboju wrote:
> Following build warnings / errors noticed on Linux next-20240429 tag on the
> arm64, arm and riscv with gcc-8 and gcc-13 builds pass.
> 
> Reported-by: Linux Kernel Functional Testing 
> 
> Commit id:
>  b58a0bc904ff nouveau: add command-line GSP-RM registry support
> 
> Builds:
> --
>   gcc-8-arm64-defconfig - Fail
>   gcc-8-arm-defconfig - Fail
>   gcc-8-riscv-defconfig - Fail
> 
> Build log:
> 
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c: In function 'build_registry':
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1266:3: error: label at
> end of compound statement
>default:
>^~~
> make[7]: *** [scripts/Makefile.build:244:
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.o] Error 1

TWIMC, there is another report about this in this thread (sadly some of
its posts did not make it to lore):

https://lore.kernel.org/all/162ef3c0-1d7b-4220-a21f-b0008657f...@redhat.com/

Ciao, Thorsten

> metadata:
>   git_describe: next-20240429
>   git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git_short_log: b0a2c79c6f35 ("Add linux-next specific files for 20240429")
>   arch: arm64, arm, riscv
>   toolchain: gcc-8
> 
> Steps to reproduce:
> 
> # tuxmake --runtime podman --target-arch arm64 --toolchain gcc-8
> --kconfig defconfig
> 
> Links:
>  - 
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2flcoOuqVJfhTvX4AOYsWMd5hqe/
>  - 
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240429/testrun/23704376/suite/build/test/gcc-8-defconfig/history/
>  - 
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240429/testrun/23705756/suite/build/test/gcc-8-defconfig/details/
> 
> 
> --
> Linaro LKFT
> https://lkft.linaro.org
> 
>
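
The build error quoted above comes from a C language rule: before C23, a label (including `case` and `default`) has to be followed by a statement, so a `default:` placed directly before the closing brace of a `switch` is rejected by older compilers such as gcc-8, while the gcc-13 builds in the report pass. Below is a minimal sketch of the usual remedy, adding a `break;` after the label. The enum and function names are hypothetical and not taken from r535.c.

```c
#include <stddef.h>

/* Hypothetical value kinds; not the real types used in r535.c. */
enum entry_kind { ENTRY_DWORD, ENTRY_STRING, ENTRY_UNKNOWN };

/* Returns how many payload bytes an entry of the given kind needs. */
size_t entry_payload_size(enum entry_kind kind, size_t string_len)
{
	size_t size = 0;

	switch (kind) {
	case ENTRY_DWORD:
		size = 4;
		break;
	case ENTRY_STRING:
		size = string_len + 1; /* include the terminating NUL */
		break;
	default:
		/*
		 * Without this statement the label would be the last thing
		 * in the block, which gcc-8 rejects with "label at end of
		 * compound statement".
		 */
		break;
	}

	return size;
}
```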

Re: [REGRESSION] external monitor+Dell dock in 6.8

2024-04-02 Thread Linux regression tracking (Thorsten Leemhuis)

[Adding a few folks and list while dropping the stable list, as this is
unrelated to it]

On 31.03.24 07:59, Andrei Gaponenko wrote:
> 
> I noticed a regression with the mainline kernel pre-compiled by EPEL.
> I have just tried linux-6.9-rc1.tar.gz from kernel.org, and it still
> misbehaves.
> 
> The default setup: a laptop is connected to a dock, Dell WD22TB4, via
> a USB-C cable.  The dock is connected to an external monitor via a
> Display Port cable.  With a "good" kernel everything works.  With a
> "broken" kernel, the external monitor is still correctly identified by
> the system, and is shown as enabled in plasma systemsettings. The
> system also behaves like the monitor is working, for example, one can
> move the mouse pointer off the laptop screen.  However the external
> monitor screen stays black, and it eventually goes to sleep.

Just a quick heads up to ensure people are aware of it:

Imre Deak, turns out this is caused by a patch of yours: 55eaef16417448
("drm/i915/dp_mst: Handle the Synaptics HBlank expansion quirk"). Andrei
Gaponenko meanwhile filed a ticket about it here:

https://gitlab.freedesktop.org/drm/intel/-/issues/10637

Ciao, Thorsten

> Everything worked with EPEL mainline kernels up to and including
> kernel-ml-6.7.9-1.el9.elrepo.x86_64
> 
> The breakage is observed in
> 
> kernel-ml-6.8.1-1.el9.elrepo.x86_64
> kernel-ml-6.8.2-1.el9.elrepo.x86_64
> linux-6.9-rc1.tar.gz from kernel.org (with olddefconfig)
> 
> Other tests: using an HDMI cable instead of the Display Port cable
> between the monitor and the dock does not change things, black screen
> with the newer kernels.
> 
> Using a small HDMI-to-USB-C adapter instead of the dock results in a
> working system, even with the newer kernels.  So the breakage appears
> to be specific to the Dell WD22TB4 dock.
> 
> Operating System: AlmaLinux 9.3 (Shamrock Pampas Cat)
> 
> uname -mi: x86_64 x86_64
> 
> Laptop: Dell Precision 5470/02RK6V
> 
> lsusb |grep dock
> Bus 003 Device 007: ID 413c:b06e Dell Computer Corp. Dell dock
> Bus 003 Device 008: ID 413c:b06f Dell Computer Corp. Dell dock
> Bus 003 Device 006: ID 0bda:5413 Realtek Semiconductor Corp. Dell dock
> Bus 003 Device 005: ID 0bda:5487 Realtek Semiconductor Corp. Dell dock
> Bus 002 Device 004: ID 0bda:0413 Realtek Semiconductor Corp. Dell dock
> Bus 002 Device 003: ID 0bda:0487 Realtek Semiconductor Corp. Dell dock
> 
> dmesg and kernel config are attached to 
> https://bugzilla.kernel.org/show_bug.cgi?id=218663
> 
> #regzbot introduced: v6.7.9..v6.8.1

P.S.:

#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218663
#regzbot duplicate: https://gitlab.freedesktop.org/drm/intel/-/issues/10637
#regzbot title: drm/i915/dp_mst: external monitor on Dell dock broke

Re: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-20 Thread Linux regression tracking (Thorsten Leemhuis)

On 08.03.24 02:08, Alex Constantino wrote:
> Fix OOM scenario by doing multiple notifications to the OOM handler through
> a busy wait logic.
> Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> timeout.
> 
> Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> Link: 
> https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info
> Reported-by: Timo Lindfors 
> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
> Signed-off-by: Alex Constantino 
> ---
>  drivers/gpu/drm/qxl/qxl_release.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)

Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two
I just added to the CC), it seems to me this regression fix did not
make any progress since it was posted. Did I miss something, is it just
"we are busy with the merge window", or is there some other reason?
Just wondering, I just saw someone on a Fedora IRC channel complaining
about the regression, that's why I'm asking. Would be really good to
finally get this resolved...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

> diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
> b/drivers/gpu/drm/qxl/qxl_release.c
> index 368d26da0d6a..51c22e7f9647 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -20,8 +20,6 @@
>   * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>   */
>  
> -#include 
> -
>  #include 
>  
>  #include "qxl_drv.h"
> @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool 
> intr,
>  {
>   struct qxl_device *qdev;
>   unsigned long cur, end = jiffies + timeout;
> + signed long iterations = 1;
> + signed long timeout_fraction = timeout;
>  
>   qdev = container_of(fence->lock, struct qxl_device, release_lock);
>  
> - if (!wait_event_timeout(qdev->release_event,
> + // using HZ as a factor since it is used in ttm_bo_wait_ctx too
> + if (timeout_fraction > HZ) {
> + iterations = timeout_fraction / HZ;
> + timeout_fraction = HZ;
> + }
> + for (int i = 0; i < iterations; i++) {
> + if (wait_event_timeout(
> + qdev->release_event,
>   (dma_fence_is_signaled(fence) ||
> -  (qxl_io_notify_oom(qdev), 0)),
> - timeout))
> - return 0;
> + (qxl_io_notify_oom(qdev), 0)),
> + timeout_fraction))
> + break;
> + }
>  
>   cur = jiffies;
>   if (time_after(cur, end))
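
The arithmetic in the quoted hunk, which splits one long wait into several waits of at most HZ ticks, can be tried in isolation. This is a sketch only: `split_timeout`, `struct wait_plan`, and the `hz` parameter are hypothetical names introduced for illustration, with `hz` standing in for the kernel's HZ constant.

```c
/* Result of splitting a timeout into slices of at most hz ticks each. */
struct wait_plan {
	long iterations; /* number of waits to perform */
	long slice;      /* length of each wait, in ticks */
};

/*
 * Mirrors the arithmetic in the quoted hunk; hz is a parameter only to
 * keep the sketch self-contained.
 */
struct wait_plan split_timeout(long timeout, long hz)
{
	struct wait_plan plan = { 1, timeout };

	if (timeout > hz) {
		/* Integer division drops any remainder. */
		plan.iterations = timeout / hz;
		plan.slice = hz;
	}

	return plan;
}
```

With a timeout of 250 ticks and hz of 100, the plan is two waits of 100 ticks each, so the remainder of 50 ticks is not waited for; a timeout at or below hz is left as a single wait.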

Re: [PATCH] Fix divide-by-zero on DP unplug with nouveau

2024-03-11 Thread Linux regression tracking (Thorsten Leemhuis)

On 11.03.24 17:09, Imre Deak wrote:
> On Sat, Feb 10, 2024 at 09:24:59PM +, Chris Bainbridge wrote:
> Sorry for the delay.

Happens, thx for looking into this!

>> The following trace occurs when using nouveau and unplugging a DP MST
>> adaptor:
> [...] 
>> +if (bpp_x16 == 0)
>> +return 0;
> 
> Could you please move the check to the beginning of the function and add
> a debug message in case bpp_x16 is 0?
> 
> It looks odd that a driver calls this function with a 0 bpp_x16, and
> ideally it should be fixed in the driver. However as it's a regression
> and we don't have a better idea now:
> 
> Acked-by: Imre Deak 

Chris: as this went into 6.8, please consider adding a stable-tag to
ensure Greg picks this up.

Ciao, Thorsten
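
The review request above (check at the beginning of the function, plus a debug message when bpp_x16 is 0) can be sketched as follows. This is an illustration under stated assumptions, not the real drm_dp_bw_overhead(): `scaled_div_round_up` is a hypothetical helper, and `fprintf` stands in for the kernel's debug logging.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: divides a scaled value by bpp_x16, rounding up,
 * and bails out early when the divisor is zero instead of trapping
 * with a divide error.
 */
uint64_t scaled_div_round_up(uint64_t scaled, uint32_t bpp_x16)
{
	/* Check first, before any arithmetic that depends on bpp_x16. */
	if (bpp_x16 == 0) {
		fprintf(stderr, "debug: bpp_x16 is 0, returning 0\n");
		return 0;
	}

	return (scaled + bpp_x16 - 1) / bpp_x16;
}
```

Returning 0 hides the underlying problem in the caller, which is why the review also notes that the driver should ideally not pass a zero bpp_x16 in the first place.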

Re: [REGRESSION] Divide-by-zero on DisplayPort MST unplug with nouveau

2024-03-11 Thread Linux regression tracking (Thorsten Leemhuis)

On 07.03.24 18:58, Chris Bainbridge wrote:
> ----- Forwarded message from Chris Bainbridge -----
> 
> Date: Sat, 10 Feb 2024 21:24:59 +

Hmm, it looks like nobody is looking into this regression. Is there a
good reason?

Imre, or did you maybe just miss that Chris' regression seems to be
caused by a commit of yours? He initially proposed a fix (the forwarded
mail that is quoted here) more than a month ago already here:
https://lore.kernel.org/all/ZcfpqwnkSoiJxeT9@debian.local/

Chris recently filed a ticket, too:
https://gitlab.freedesktop.org/drm/misc/kernel/-/issues/36

Mostly silence there as well. :-/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S: Chris, sorry, I had missed that you initially proposed the fix a
month ago; if I had noticed this earlier I would have sent a mail like
this one earlier.

> From: Chris Bainbridge 
> To: dri-devel@lists.freedesktop.org
> Cc: ly...@redhat.com, ville.syrj...@linux.intel.com, 
> stanislav.lisovs...@intel.com,
>   mrip...@kernel.org, imre.d...@intel.com
> Subject: [PATCH] Fix divide-by-zero on DP unplug with nouveau
> 
> The following trace occurs when using nouveau and unplugging a DP MST
> adaptor:
>  divide error:  [#1] PREEMPT SMP PTI
>  CPU: 7 PID: 2962 Comm: Xorg Not tainted 6.8.0-rc3+ #744
>  Hardware name: Razer Blade/DANA_MB, BIOS 01.01 08/31/2018
>  RIP: 0010:drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>  Code: c6 b8 01 00 00 00 75 61 01 c6 41 0f af f3 41 0f af f1 c1 e1 04 48 63 
> c7 31 d2 89 ff 48 8b 5d f8 c9 48 0f af f1 48 8d 44 06 ff <48> f7 f7 31 d2 31 
> c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 45 31
>  RSP: 0018:b2c5c211fa30 EFLAGS: 00010206
>  RAX:  RBX:  RCX: 00f59b00
>  RDX:  RSI:  RDI: 
>  RBP: b2c5c211fa48 R08: 0001 R09: 0020
>  R10: 0004 R11:  R12: 00023b4a
>  R13: 91d37d165800 R14: 91d36fac6d80 R15: 91d34a764010
>  FS:  7f4a1ca3fa80() GS:91d6edbc() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 559491d49000 CR3: 00011d180002 CR4: 003706f0
>  Call Trace:
>   
>   ? show_regs+0x6d/0x80
>   ? die+0x37/0xa0
>   ? do_trap+0xd4/0xf0
>   ? do_error_trap+0x71/0xb0
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? exc_divide_error+0x3a/0x70
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? asm_exc_divide_error+0x1b/0x20
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? drm_dp_calc_pbn_mode+0x2e/0x70 [drm_display_helper]
>   nv50_msto_atomic_check+0xda/0x120 [nouveau]
>   drm_atomic_helper_check_modeset+0xa87/0xdf0 [drm_kms_helper]
>   drm_atomic_helper_check+0x19/0xa0 [drm_kms_helper]
>   nv50_disp_atomic_check+0x13f/0x2f0 [nouveau]
>   drm_atomic_check_only+0x668/0xb20 [drm]
>   ? drm_connector_list_iter_next+0x86/0xc0 [drm]
>   drm_atomic_commit+0x58/0xd0 [drm]
>   ? __pfx___drm_printfn_info+0x10/0x10 [drm]
>   drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
>   drm_mode_obj_set_property_ioctl+0x1c5/0x450 [drm]
>   ? __pfx_drm_connector_property_set_ioctl+0x10/0x10 [drm]
>   drm_connector_property_set_ioctl+0x3b/0x60 [drm]
>   drm_ioctl_kernel+0xb9/0x120 [drm]
>   drm_ioctl+0x2d0/0x550 [drm]
>   ? __pfx_drm_connector_property_set_ioctl+0x10/0x10 [drm]
>   nouveau_drm_ioctl+0x61/0xc0 [nouveau]
>   __x64_sys_ioctl+0xa0/0xf0
>   do_syscall_64+0x76/0x140
>   ? do_syscall_64+0x85/0x140
>   ? do_syscall_64+0x85/0x140
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
>  RIP: 0033:0x7f4a1cd1a94f
>  Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 
> 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 
> ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
>  RSP: 002b:7ffd2f1df520 EFLAGS: 0246 ORIG_RAX: 0010
>  RAX: ffda RBX: 7ffd2f1df5b0 RCX: 7f4a1cd1a94f
>  RDX: 7ffd2f1df5b0 RSI: c01064ab RDI: 000f
>  RBP: c01064ab R08: 56347932deb8 R09: 56347a7d99c0
>  R10:  R11: 0246 R12: 56347938a220
>  R13: 000f R14: 563479d9f3f0 R15: 
>   
>  Modules linked in: rfcomm xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat 
> nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user 
> xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp 
> llc ccm cmac algif_hash overlay algif_skcipher af_alg bnep binfmt_misc 
> snd_sof_pci_intel_cnl snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci 
> snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_sof_utils 
> snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress 
> snd_sof_intel_hda_mlink

Re: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-08 Thread Thorsten Leemhuis

On 08.03.24 02:08, Alex Constantino wrote:
> Fix OOM scenario by doing multiple notifications to the OOM handler through
> a busy wait logic.
> Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> timeout.

Thx for working on this.

> Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> Link: 
> https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info

Nitpicking: that ideally should be pointing to
https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/ , as
that is the report and not just a reply to prod things.

Ciao, Thorsten

Re: [pull] drm/msm: drm-msm-next-2024-02-29 for v6.9

2024-03-05 Thread Linux regression tracking (Thorsten Leemhuis)

On 29.02.24 20:04, Rob Clark wrote:
> 
> This is the main pull for v6.9, description below.
> 
> [...]
>
> GPU:
> - fix sc7180 UBWC config

Why was that queued for 6.9? That is a fix for a 6.8 regression that for
untrained eyes like mine does not look overly dangerous (but of course I
might be wrong with that).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

Re: [PATCH] drm/nouveau: keep DMA buffers required for suspend/resume

2024-03-03 Thread Linux regression tracking (Thorsten Leemhuis)

[adding a bunch of lists and people as well as Timur Tabi, who authored
the culprit]

Sid Pranjale, thx for the report. FWIW, I'm just replying to add this to
the regression tracking to ensure it does not fall through the cracks.
Nevertheless let me mention two things while at it:

On 29.02.24 18:58, Sid Pranjale wrote:
> Nouveau deallocates a few buffers post GPU init which are required for GPU 
> suspend/resume to function correctly.
> This is likely not as big an issue on systems where the NVGPU is the only 
> GPU, but on multi-GPU set ups it leads to a regression where the kernel 
> module errors and results in a system-wide rendering freeze.

These lines are too long, see
Documentation/process/submitting-patches.rst for details.

> This commit addresses that regression by moving the two buffers required for 
> suspend and resume to be deallocated at driver unload instead of post init.
> 
> Fixes: 042b5f8 ("drm/nouveau: fix several DMA buffer leaks")

And that should be:

Fixes: 042b5f83841fbf ("drm/nouveau: fix several DMA buffer leaks")

> Signed-off-by: Sid Pranjale 
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
> b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index a64c81385..a73a5b589 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1054,8 +1054,6 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
>   /* Release the DMA buffers that were needed only for boot and init */
>   nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw);
>   nvkm_gsp_mem_dtor(gsp, &gsp->libos);
> - nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> - nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>  
>   return ret;
>  }
> @@ -2163,6 +2161,8 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
>  
>   r535_gsp_dtor_fws(gsp);
>  
> + nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> + nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>   nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
>   nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
>   nvkm_gsp_mem_dtor(gsp, &gsp->logintr);

To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced 042b5f83841fbf
#regzbot title drm/nouveau: rendering freezes with multi-GPU setup
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: drm/msm: VT console DisplayPort regression in 6.8-rc1

2024-02-27 Thread Linux regression tracking #update (Thorsten Leemhuis)

[sent with a reduced set of recipients, we all get enough mail already]

On 27.02.24 13:40, Johan Hovold wrote:
> 
> Since 6.8-rc1 the VT console is no longer mirrored on an external
> display on coldplug or hotplug on the Lenovo ThinkPad X13s.
>

Thx for the report!

> I've previously reported this here:
> 
>   https://gitlab.freedesktop.org/drm/msm/-/issues/50

Then let's tell regzbot about it as well, in case the ticket comes back
to life now:

#regzbot duplicate: https://gitlab.freedesktop.org/drm/msm/-/issues/50

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

Re: Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-02-14 Thread Linux regression tracking #update (Thorsten Leemhuis)

On 27.01.24 14:14, Salvatore Bonaccorso wrote:
> 
> In Debian (https://bugs.debian.org/1061449) we got the following
> quoted report:
> 
> On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
>> Package: src:linux
>> Version: 6.7.1-1~exp1
>> Severity: normal
>>
>> Giving a try to 6.7, here is a message extracted from dmesg:
>>
>> [4.177226] [ cut here ]
>> [4.177227] WARNING: CPU: 6 PID: 248 at
>> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
>> construct_phy+0xb26/0xd60 [amdgpu]
> 
> Analysis showed that this appears to be a regression from b17ef04bf3a4
> ("drm/amd/display: Pass pwrseq inst for backlight and ABM"). Does that
> ring some bells?
> 
> See: https://bugs.debian.org/1061449#27
> 
> #regzbot introduced: b17ef04bf3a4
> #regzbot link: https://bugs.debian.org/1061449
> #regzbot title: Regression by b17ef04bf3a4 ("drm/amd/display: Pass pwrseq 
> inst for backlight and ABM")

#regzbot monitor:
https://lore.kernel.org/amd-gfx/20240214184006.1356137-8-rodrigo.sique...@amd.com/
#regzbot fix: drm/amd/display: Only allow dig mapping to pwrseq in new asic
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
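
The "#regzbot" lines in mails like the one above are plain-text commands
embedded in the message body. A minimal sketch of how such lines could be
pulled out of a mail, assuming nothing about regzbot's real parser: the
function name, the regular expression, and the rule that an argument may
wrap onto the following line after a trailing colon are illustrative
assumptions only.

```python
import re

# Hypothetical helper, NOT regzbot's actual parser: it matches lines such as
# "#regzbot fix: <subject>" and tolerates leading "> " quote markers.
REGZBOT_LINE = re.compile(r"^[>\s]*#regzbot\s+([\w-]+)(:?)\s*(.*)$")


def extract_regzbot_commands(body):
    """Return (command, argument) tuples found in a plain-text mail body."""
    commands = []
    lines = body.splitlines()
    for i, line in enumerate(lines):
        match = REGZBOT_LINE.match(line)
        if match is None:
            continue
        name, colon, arg = match.group(1), match.group(2), match.group(3).strip()
        # Assumption: "command:" with nothing after the colon means the
        # argument was wrapped onto the next line (as with "monitor:" above).
        if colon and not arg and i + 1 < len(lines):
            arg = lines[i + 1].lstrip("> ").strip()
        commands.append((name, arg))
    return commands
```

Arguments that wrap in the middle, such as the quoted "title:" line above,
are not rejoined by this sketch.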

Re: drm/msm: DisplayPort regressions in 6.8-rc1

2024-02-14 Thread Linux regression tracking (Thorsten Leemhuis)

On 13.02.24 19:00, Abhinav Kumar wrote:
> 
> Thanks for the report.
> 
> I do agree that pm runtime eDP driver got merged that time but I think
> the issue is either a combination of that along with DRM aux bridge
> https://patchwork.freedesktop.org/series/122584/ OR just the latter as
> even that went in around the same time.

In that case allow me a stupid question from the cheap seats:

Is there anything affected users can do to help getting us closer to the
real problem? Like testing a specific commit or two before or after the
merge of one of those features for example? That might help to rule out
a few things.

Ciao, Thorsten

> That's why perhaps this issue was not seen with the chromebooks we tested
> on as they do not use pmic_glink (aux bridge).
> 
> So we will need to debug this on sc8280xp specifically or an equivalent
> device which uses aux bridge.
> 
> On 2/13/2024 3:42 AM, Johan Hovold wrote:
>> Hi,
>>
>> Since 6.8-rc1 the internal eDP display on the Lenovo ThinkPad X13s does
>> not always show up on boot.
>>
>> The logs indicate problems with the runtime PM and eDP rework that went
>> into 6.8-rc1:
>>
>> [    6.006236] Console: switching to colour dummy device 80x25
>> [    6.007542] [drm:dpu_kms_hw_init:1048] dpu hardware
>> revision:0x8000
>> [    6.007872] [drm:drm_bridge_attach [drm]] *ERROR* failed to
>> attach bridge /soc@0/phy@88eb000 to encoder TMDS-31: -16
>> [    6.007934] [drm:dp_bridge_init [msm]] *ERROR* failed to attach
>> panel bridge: -16
>> [    6.007983] msm_dpu ae01000.display-controller:
>> [drm:msm_dp_modeset_init [msm]] *ERROR* failed to create dp bridge: -16
>> [    6.008030] [drm:_dpu_kms_initialize_displayport:588] [dpu
>> error]modeset_init failed for DP, rc = -16
>> [    6.008050] [drm:_dpu_kms_setup_displays:681] [dpu
>> error]initialize_DP failed, rc = -16
>> [    6.008068] [drm:dpu_kms_hw_init:1153] [dpu error]modeset init
>> failed: -16
>> [    6.008388] msm_dpu ae01000.display-controller:
>> [drm:msm_drm_kms_init [msm]] *ERROR* kms hw init failed: -16
>> 
>> and this can also manifest itself as a NULL-pointer dereference:
>>
>> [    7.339447] Unable to handle kernel NULL pointer dereference at
>> virtual address 
>> 
>> [    7.643705] pc : drm_bridge_attach+0x70/0x1a8 [drm]
>> [    7.686415] lr : drm_aux_bridge_attach+0x24/0x38 [aux_bridge]
>> 
>> [    7.769039] Call trace:
>> [    7.771564]  drm_bridge_attach+0x70/0x1a8 [drm]
>> [    7.776234]  drm_aux_bridge_attach+0x24/0x38 [aux_bridge]
>> [    7.781782]  drm_bridge_attach+0x80/0x1a8 [drm]
>> [    7.786454]  dp_bridge_init+0xa8/0x15c [msm]
>> [    7.790856]  msm_dp_modeset_init+0x28/0xc4 [msm]
>> [    7.795617]  _dpu_kms_drm_obj_init+0x19c/0x680 [msm]
>> [    7.800731]  dpu_kms_hw_init+0x348/0x4c4 [msm]
>> [    7.805306]  msm_drm_kms_init+0x84/0x324 [msm]
>> [    7.809891]  msm_drm_bind+0x1d8/0x3a8 [msm]
>> [    7.814196]  try_to_bring_up_aggregate_device+0x1f0/0x2f8
>> [    7.819747]  __component_add+0xa4/0x18c
>> [    7.823703]  component_add+0x14/0x20
>> [    7.827389]  dp_display_probe+0x47c/0x568 [msm]
>> [    7.832052]  platform_probe+0x68/0xd8
>>
>> Users have also reported random crashes at boot since 6.8-rc1, and I've
>> been able to trigger hard crashes twice when testing an external display
>> (USB-C/DP), which may also be related to the DP regressions.
>>
>> I've opened an issue here:
>>
>> https://gitlab.freedesktop.org/drm/msm/-/issues/51
>>
>> but I also want Thorsten's help to track this so that it gets fixed
>> before 6.8 is released.
>>
>> #regzbot introduced: v6.7..v6.8-rc1
>>
>> The following series is likely the culprit:
>>
>> 
>> https://lore.kernel.org/all/1701472789-25951-1-git-send-email-quic_khs...@quicinc.com/
>>
>> Johan
> 
>
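
Call-trace entries like the ones quoted above have the shape
"symbol+offset/size [module]", where the module part is absent for
built-in code. A small sketch for splitting such an entry into its parts
when comparing traces; the function name and the returned dictionary
layout are illustrative assumptions, not part of any kernel tool.

```python
import re

# Matches entries such as "drm_bridge_attach+0x70/0x1a8 [drm]".
FRAME = re.compile(
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return symbol, offset, size and module for one trace line, or None."""
    match = FRAME.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),  # bytes into the function
        "size": int(match.group("size"), 16),      # total function size
        "module": match.group("module"),           # None for built-in code
    }
```

Lines without a symbol+offset/size entry, such as "Call trace:", yield None.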

Re: Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-01-28 Thread Linux regression tracking (Thorsten Leemhuis)

On 27.01.24 14:14, Salvatore Bonaccorso wrote:
>
> In Debian (https://bugs.debian.org/1061449) we got the following
> quoted report:
> 
> On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
>>
>> Giving a try to 6.7, here is a message extracted from dmesg:
>> [4.177226] [ cut here ]
>> [4.177227] WARNING: CPU: 6 PID: 248 at
>> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
>> construct_phy+0xb26/0xd60 [amdgpu]
> [...]

Not my area of expertise, but looks a lot like a duplicate of
https://gitlab.freedesktop.org/drm/amd/-/issues/3122#note_2252835

Mario (now CCed) already prepared a patch for that issue that seems to work.

HTH, Ciao, Thorsten

Re: [git pull] drm for 6.8

2024-01-24 Thread Thorsten Leemhuis

Linus, if you have a minute, I'd really like to know...

On 24.01.24 17:41, Mario Limonciello wrote:
> On 1/24/2024 10:24, Vlastimil Babka wrote:
>> On 1/24/24 16:31, Donald Carr wrote:
>>> On Wed, Jan 24, 2024 at 7:06 AM Vlastimil Babka  wrote:
>>>> When testing the rc1 on my openSUSE Tumbleweed desktop, I've started
>>>> experiencing "frozen desktop" (KDE/Wayland) issues. The symptoms are that
>>>> everything freezes including mouse cursor. After a while it either resolves,
>>>> or e.g. firefox crashes (if it was actively used when it froze) or it's
>>>> frozen for too long and I reboot with alt-sysrq-b. When it's frozen I can
>>>> still ssh to the machine, and there's nothing happening in dmesg.
>>>> The machine is based on AMD Ryzen 7 2700 and Radeon RX7600.
>>> [...]
>>> I am experiencing the exact same symptoms;
>>
>> Big thanks to Thorsten who suggested I look at the following:
>>
>> https://lore.kernel.org/all/20240123021155.2775-1-mario.limoncie...@amd.com/
>> https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuowtoeee+q26z...@mail.gmail.com/
>>
>> Instead of further bisection I've applied Mario's revert from the
>> first link
>> on top of 6.8-rc1 and the issue seems gone for me now.
> 
> Thanks for confirming.  I don't think we should jump right to the revert
> right now.
>
>  I posted it in case that is the direction we need to go
> (simple git revert didn't work due to contextual changes).
> 
> Let's give the folks who work on GPU scheduler some time to understand
> the failure and see if they can fix it.

...what you think about this and other situations like this. Given that
we have

* two affected people in this thread
* one earlier thread about it
* the machine that made Mario write the patch
* and I have someone in #fedora-kernel that likely is affected as well

it seems that this is not some corner case very few people run into.
Hence I tend to say that this should be dealt with rather sooner than
later. Maybe before rc2? Or is this asking too much?

The thing from my point of view is, that each such problem might
discourage testers from testing again or lead to thoughts like "I only
start testing after -rc4". Not to mention that other people will try to
bisect the problem like Vlastimil did, which will cost them quite some
time and effort -- only to find out that we knew about the problem
already and did not quickly fix it. That is discouraging for them as
well and thus bad for field testing I'd assume.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: [PATCH] drm/Makefile: Move tiny drivers before native drivers

2024-01-23 Thread Thorsten Leemhuis

On 23.01.24 10:17, Thorsten Leemhuis wrote:
> On 23.01.24 09:53, Jani Nikula wrote:
>> On Wed, 08 Nov 2023, Thomas Zimmermann  wrote:
>>>
>>> thanks for the patch.
>>>
>>> Am 08.11.23 um 03:46 schrieb Huacai Chen:
>>>> After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
>>>> device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
>>>> screen until the display manager starts.
>>>>
>>>> This regression occurs with such a Kconfig combination:
>>>> CONFIG_SYSFB=y
>>>> CONFIG_SYSFB_SIMPLEFB=y
>>>> CONFIG_DRM_SIMPLEDRM=y
>>>> CONFIG_DRM_I915=y  # Or other native drivers such as radeon, amdgpu
>>>>
>>>> If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
>>>> device), there is no blank screen. The root cause is the initialization
>>>> order, and this order depends on the Makefile.
>>>>
>>>> FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
>>>> so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
>>>> will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
>>>> DRM_SIMPLEDRM will try to takeover I915, but fails to work.
>>>
>>> But what exactly is the problem? From the lengthy discussion thread, it
>>> looks like you've stumbled across a long-known problem, where the 
>>> firmware driver probes a device that has already been taken by a native 
>>> driver. But that should not be possible.
>>>
>>> As you know, there's a platform device that represents the firmware 
>>> framebuffer. The firmware drivers, such as simpledrm, bind to it. In 
>>> i915 and the other native drivers we remove that platform device, so 
>>> that simpledrm does not run.
>>
>> The problem is still not resolved. Another bug report at [1].
>>
>> The commit message here points at 60aebc955949 ("drivers/firmware: Move
>> sysfb_init() from device_initcall to subsys_initcall_sync") as
>> regressing, and Jaak also bisected it (see Closes:).
>>
>> I agree the patch here is just papering over the issue, but lacking a
>> proper fix, for months, a revert would be in order, no?
>>
>> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/10133
> 
> Interesting.
> 
> JFYI for those that don't follow this closely: Huacai Chen proposed a
> fix

Sorry, I did not look closely and misremembered: that was not a fix, it
was just a test patch to gather more data for debugging.

Ciao, Thorsten

> and asked an earlier reporter to test it, but afaics heard nothing back:
> 
> https://lore.kernel.org/all/CAAhV-H5eXM7FNzuRCMthAziG_jg75XwQV3grpw=sdyj-9gx...@mail.gmail.com/
> 
> That's afaics why this got stuck (and why I didn't escalate this weeks
> ago).
> 
> Ciao, Thorsten
> 
>>> We call the DRM aperture helpers at [1]. It's implemented at [2]. The 
>>> function contains a call to sysfb_disable(), [3] which should be invoked 
>>> for the i915 device and remove the platform device.
>>>
>>> [1] 
>>> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/i915_driver.c#L489
>>> [2] 
>>> https://elixir.bootlin.com/linux/v6.5/source/drivers/video/aperture.c#L347
>>> [3] 
>>> https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L63
>>>
>>> Can you investigate why this does not work? Is sysfb_disable() not being 
>>> called? Does it remove the platform device?
>>>
>>>>
>>>> So we can move the "tiny" directory before native DRM drivers to solve
>>>> this problem.
>>>
>>> Relying on linking order is just as unreliable. The usual workaround is 
>>> to build native drivers as modules. But first, please investigate where 
>>> the current code fails.
>>>
>>> Best regards
>>> Thomas
>>>
>>>>
>>>> Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from 
>>>> device_initcall to subsys_initcall_sync")
>>>> Closes: 
>>>> https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
>>>> Reported-by: Jaak Ristioja 
>>>> Signed-off-by: Huacai Chen 
>>>> ---
>>>>   drivers/gpu/drm/Makefile | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>>> index 8e1bde059170..db0f3d3aff43 100644
>>>> --- a/drivers/gpu/drm/Makefile
>>>> +++ b/drivers/gpu/drm/Makefile
>>>> @@ -141,6 +141,7 @@ obj-y  += arm/
>>>>   obj-y+= display/
>>>>   obj-$(CONFIG_DRM_TTM)+= ttm/
>>>>   obj-$(CONFIG_DRM_SCHED)  += scheduler/
>>>> +obj-y += tiny/
>>>>   obj-$(CONFIG_DRM_RADEON)+= radeon/
>>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
>>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
>>>> @@ -182,7 +183,6 @@ obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
>>>>   obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
>>>>   obj-y+= hisilicon/
>>>>   obj-y+= mxsfb/
>>>> -obj-y += tiny/
>>>>   obj-$(CONFIG_DRM_PL111) += pl111/
>>>>   obj-$(CONFIG_DRM_TVE200) += tve200/
>>>>   obj-$(CONFIG_DRM_XEN) += xen/
>>
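
The patch quoted above relies on the fact that built-in drivers at the
same initcall level are initialized in link order, which follows the
order of the entries in the Makefile. A toy model of that effect, under
stated assumptions: the directory lists are shortened, the position of
"i915/" after "amd/amdgpu/" is assumed (it is not visible in the quoted
hunk), and probe_order() is a hypothetical helper, not kernel code.

```python
def probe_order(makefile_entries, enabled):
    """Return the enabled built-in directories in Makefile (link) order."""
    return [entry for entry in makefile_entries if entry in enabled]


# Shortened, partly assumed directory lists; only the relative position of
# "tiny/" versus the native drivers matters for this illustration.
before_patch = ["scheduler/", "radeon/", "amd/amdgpu/", "i915/", "tiny/"]
after_patch = ["scheduler/", "tiny/", "radeon/", "amd/amdgpu/", "i915/"]

# Kconfig combination from the commit message: simpledrm (tiny/) plus i915.
enabled = {"tiny/", "i915/"}

print(probe_order(before_patch, enabled))  # i915 initializes before simpledrm
print(probe_order(after_patch, enabled))   # simpledrm initializes before i915
```

As Thomas notes in the quoted review, depending on this ordering is
fragile, which is why he asks for the underlying failure to be found.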

Re: [PATCH] drm/Makefile: Move tiny drivers before native drivers

2024-01-23 Thread Thorsten Leemhuis

On 23.01.24 09:53, Jani Nikula wrote:
> On Wed, 08 Nov 2023, Thomas Zimmermann  wrote:
>>
>> thanks for the patch.
>>
>> Am 08.11.23 um 03:46 schrieb Huacai Chen:
>>> After commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from
>>> device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
>>> screen until the display manager starts.
>>>
>>> This regression occurs with such a Kconfig combination:
>>> CONFIG_SYSFB=y
>>> CONFIG_SYSFB_SIMPLEFB=y
>>> CONFIG_DRM_SIMPLEDRM=y
>>> CONFIG_DRM_I915=y  # Or other native drivers such as radeon, amdgpu
>>>
>>> If replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
>>> device), there is no blank screen. The root cause is the initialization
>>> order, and this order depends on the Makefile.
>>>
>>> FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
>>> so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
>>> will takeover FB_SIMPLE, then no problem; and if we use DRM_SIMPLEDRM,
>>> DRM_SIMPLEDRM will try to takeover I915, but fails to work.
>>
>> But what exactly is the problem? From the lengthy discussion thread, it
>> looks like you've stumbled across a long-known problem, where the 
>> firmware driver probes a device that has already been taken by a native 
>> driver. But that should not be possible.
>>
>> As you know, there's a platform device that represents the firmware 
>> framebuffer. The firmware drivers, such as simpledrm, bind to it. In 
>> i915 and the other native drivers we remove that platform device, so 
>> that simpledrm does not run.
> 
> The problem is still not resolved. Another bug report at [1].
> 
> The commit message here points at 60aebc955949 ("drivers/firmware: Move
> sysfb_init() from device_initcall to subsys_initcall_sync") as
> regressing, and Jaak also bisected it (see Closes:).
> 
> I agree the patch here is just papering over the issue, but lacking a
> proper fix, for months, a revert would be in order, no?
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/10133

Interesting.

JFYI for those that don't follow this closely: Huacai Chen proposed a
fix and asked an earlier reporter to test it, but afaics heard nothing back:

https://lore.kernel.org/all/CAAhV-H5eXM7FNzuRCMthAziG_jg75XwQV3grpw=sdyj-9gx...@mail.gmail.com/

That's afaics why this got stuck (and why I didn't escalate this weeks
ago).

Ciao, Thorsten

>> We call the DRM aperture helpers at [1]. It's implemented at [2]. The 
>> function contains a call to sysfb_disable(), [3] which should be invoked 
>> for the i915 device and remove the platform device.
>>
>> [1] 
>> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/i915/i915_driver.c#L489
>> [2] 
>> https://elixir.bootlin.com/linux/v6.5/source/drivers/video/aperture.c#L347
>> [3] 
>> https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L63
>>
>> Can you investigate why this does not work? Is sysfb_disable() not being 
>> called? Does it remove the platform device?
>>
>>>
>>> So we can move the "tiny" directory before native DRM drivers to solve
>>> this problem.
>>
>> Relying on linking order is just as unreliable. The usual workaround is 
>> to build native drivers as modules. But first, please investigate where 
>> the current code fails.
>>
>> Best regards
>> Thomas
>>
>>>
>>> Fixes: 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from 
>>> device_initcall to subsys_initcall_sync")
>>> Closes: 
>>> https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
>>> Reported-by: Jaak Ristioja 
>>> Signed-off-by: Huacai Chen 
>>> ---
>>>   drivers/gpu/drm/Makefile | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>> index 8e1bde059170..db0f3d3aff43 100644
>>> --- a/drivers/gpu/drm/Makefile
>>> +++ b/drivers/gpu/drm/Makefile
>>> @@ -141,6 +141,7 @@ obj-y   += arm/
>>>   obj-y += display/
>>>   obj-$(CONFIG_DRM_TTM) += ttm/
>>>   obj-$(CONFIG_DRM_SCHED)   += scheduler/
>>> +obj-y  += tiny/
>>>   obj-$(CONFIG_DRM_RADEON)+= radeon/
>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
>>>   obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
>>> @@ -182,7 +183,6 @@ obj-$(CONFIG_DRM_FSL_DCU) += fsl-dcu/
>>>   obj-$(CONFIG_DRM_ETNAVIV) += etnaviv/
>>>   obj-y += hisilicon/
>>>   obj-y += mxsfb/
>>> -obj-y  += tiny/
>>>   obj-$(CONFIG_DRM_PL111) += pl111/
>>>   obj-$(CONFIG_DRM_TVE200) += tve200/
>>>   obj-$(CONFIG_DRM_XEN) += xen/
>

Re: [REGRESSION] rx7600 stopped working after "1cfb4d612127 drm/amdgpu: put MQDs in VRAM"

2023-12-06 Thread Linux regression tracking #update (Thorsten Leemhuis)

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 26.10.23 19:33, Alexey Klimov wrote:
> #regzbot introduced: 1cfb4d612127
> #regzbot title: rx7600 stopped working after "1cfb4d612127 drm/amdgpu: put 
> MQDs in VRAM"
> 
> Hi all,
> 
> I've been playing with RX7600 and it was observed that amdgpu stopped working 
> between kernel 6.2 and 6.5.
> Then I narrowed it down to 6.4 <-> 6.5-rc1 and finally bisect pointed at 
> 1cfb4d6121276a829aa94d0e32a7f5e1830ebc21
> And I manually checked if it boots/works on the previous commit and the 
> mentioned one.

#regzbot fix: ba0fb4b48c19a
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
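
The narrowing described in the quoted report (6.2..6.5, then
6.4..6.5-rc1, then a single commit) is a binary search over history. A
minimal sketch of the idea, assuming a linear list of commits with a
known-good first entry and a known-bad last entry; real "git bisect"
works on the full commit graph, and first_bad() and is_bad() are
hypothetical names used only for illustration.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in a linear, oldest-to-newest list.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression was introduced after mid
    return commits[bad]
```

Checking the parent of the result by hand, as the reporter did, confirms
that the search ended on the right commit.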

Re: Bug#1054514: linux-image-6.1.0-13-amd64: Debian VM with qxl graphics freezes frequently

2023-12-06 Thread Linux regression tracking (Thorsten Leemhuis)

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Gerd, it seems this regression[1] fell through the cracks. Could you
please take a look? Or is there a good reason why this can't be
addressed? Or was it dealt with and I just missed it?

[1] apparently caused by 5a838e5d5825c8 ("drm/qxl: simplify
qxl_fence_wait") [v5.13-rc1] from Gerd; for details see
https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 24.10.23 23:39, Timo Lindfors wrote:
> Hi,
> 
> On Tue, 24 Oct 2023, Salvatore Bonaccorso wrote:
>> Thanks for the excellently constructed report! I think it's best to
>> forward this directly to upstream including the people for the
>> bisected commit to get some idea.
> 
> Thanks for the quick reply!
> 
>> Can you reproduce the issue with 6.5.8-1 in unstable as well?
> 
> Unfortunately yes:
> 
> ansible@target:~$ uname -r
> 6.5.0-3-amd64
> ansible@target:~$ time sudo ./reproduce.bash
> Wed 25 Oct 2023 12:27:00 AM EEST starting round 1
> Wed 25 Oct 2023 12:27:24 AM EEST starting round 2
> Wed 25 Oct 2023 12:27:48 AM EEST starting round 3
> bug was reproduced after 3 tries
> 
> real    0m48.838s
> user    0m1.115s
> sys 0m45.530s
> 
> I also tested upstream tag v6.6-rc6:
> 
> ...
> + detected_version=6.6.0-rc6
> + '[' 6.6.0-rc6 '!=' 6.6.0-rc6 ']'
> + exec ssh target sudo ./reproduce.bash
> Wed 25 Oct 2023 12:37:16 AM EEST starting round 1
> Wed 25 Oct 2023 12:37:42 AM EEST starting round 2
> Wed 25 Oct 2023 12:38:10 AM EEST starting round 3
> Wed 25 Oct 2023 12:38:36 AM EEST starting round 4
> Wed 25 Oct 2023 12:39:01 AM EEST starting round 5
> Wed 25 Oct 2023 12:39:27 AM EEST starting round 6
> bug was reproduced after 6 tries
> 
> 
> For completeness, here is also the grub_set_default_version.bash script
> that I had to write to automate this (maybe these could be in debian
> wiki?):
> 
> #!/bin/bash
> set -x
> 
> version="$1"
> 
> idx=$(expr $(grep "menuentry " /boot/grub/grub.cfg | sed 1d |grep -n
> "'Debian GNU/Linux, with Linux $version'"|cut -d: -f1) - 1)
> exec sudo grub-set-default "1>$idx"
> 
> 
> 
> -Timo
> 
> 
>
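
The index arithmetic in the quoted grub_set_default_version.bash script
can be restated as follows: collect the "menuentry " lines, drop the
first one, find the position of the wanted kernel, and turn it into a
zero-based index below submenu 1. This is only a sketch of that
arithmetic; the function name is hypothetical and it does not call
grub-set-default.

```python
def grub_default_for(cfg_text, version):
    """Return the "1>N" selector the quoted script would compute."""
    entries = [line for line in cfg_text.splitlines() if "menuentry " in line]
    entries = entries[1:]  # same effect as `sed 1d`: skip the first match
    wanted = f"'Debian GNU/Linux, with Linux {version}'"
    for idx, line in enumerate(entries):
        # enumerate() is zero-based, matching `grep -n` minus 1 in the script
        if wanted in line:
            return f"1>{idx}"
    raise ValueError(f"no menu entry found for {version}")
```

The closing quote in the search string keeps "(recovery mode)" entries
for the same kernel from matching, as in the original grep pattern.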

Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-12-05 Thread Thorsten Leemhuis

Karol, Lyude, and Daniel:

On 29.11.23 01:37, Owen T. Heisler wrote:
> On 11/21/23 14:23, Owen T. Heisler wrote:
>> On 11/21/23 09:16, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 15.11.23 07:19, Owen T. Heisler wrote:
>>>> On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On 28.10.23 04:46, Owen T. Heisler wrote:
>>>>>> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
>>>>>> #regzbot link:
>>>>>> https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
>>>>>>
>>>>>> 3. Suddenly the secondary Nvidia-connected display turns off and X
>>>>>> stops responding to keyboard/mouse input.
> 
>> I am currently testing v6.6 with the culprit commit reverted.
> 
> - v6.6: fails
> - v6.6 with the culprit commit reverted: works
> 
> See <https://gitlab.freedesktop.org/drm/nouveau/-/issues/180> for full
> details including a decoded kernel log.

Not sure about the others, but it's kind of confusing that you update
the issue descriptions all the time and never add a comment to that ticket.

Anyway: Nouveau maintainers, could any of you at least comment on this?
Sure, the regression is caused by an old commit (6eaa1f3c59a707 was
merged for v5.14-rc7) and reverting it likely is not an option, but it
nevertheless would be great if this could be solved somehow.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

Re: [PATCH v2 2/2] drm/msm/dp: attach the DP subconnector property

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)

On 21.11.23 19:50, Abhinav Kumar wrote:
> On 11/21/2023 9:57 AM, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 15.11.23 19:06, Abhinav Kumar wrote:
>>> On 11/15/2023 12:06 AM, Johan Hovold wrote:
>>>> On Wed, Oct 25, 2023 at 12:23:10PM +0300, Dmitry Baryshkov wrote:
>>>>> While developing and testing the commit bfcc3d8f94f4 ("drm/msm/dp:
>>>>> support setting the DP subconnector type") I had the patch [1] in my
>>>>> tree. I haven't noticed that it was a dependency for the commit in
>>>>> question. Mea culpa.
>>>>
>>>> This also broke boot on the Lenovo ThinkPad X13s.
>>>>
>>>> Would be nice to get this fixed ASAP so that further people don't have
>>>> to debug this known regression.
>>>
>>> I will queue this patch for -fixes right away.
>>
>> Thx. I noticed that this fix is still not in -next. I then investigated
>> and I found it was applied on Thursday last week here:
>> https://gitlab.freedesktop.org/drm/msm/-/commits/msm-fixes?ref_type=heads
>>
>> Makes me wonder: when will that patch go to a branch that is included in
>> -next? And when will it move on towards mainline?
> 
> This has been included in a pull request for 6.7-rc3 to the DRM tree and
> shall make it to -next from there.

Ahh, great, thx, I was slowly getting worried.

Ciao, Thorsten

>>>>> Since the patch has not landed yet (and even was not reviewed)
>>>>> and since one of the bridges erroneously uses USB connector type
>>>>> instead
>>>>> of DP, attach the property directly from the MSM DP driver.
>>>>>
>>>>> This fixes the following oops on DP HPD event:
>>>>>
>>>>>    drm_object_property_set_value
>>>>> (drivers/gpu/drm/drm_mode_object.c:288)
>>>>>    dp_display_process_hpd_high
>>>>> (drivers/gpu/drm/msm/dp/dp_display.c:402)
>>>>>    dp_hpd_plug_handle.isra.0 (drivers/gpu/drm/msm/dp/dp_display.c:604)
>>>>>    hpd_event_thread (drivers/gpu/drm/msm/dp/dp_display.c:1110)
>>>>>    kthread (kernel/kthread.c:388)
>>>>>    ret_from_fork (arch/arm64/kernel/entry.S:858)
>>>>
>>>> This only says where the oops happened, it doesn't necessarily in
>>>> itself
>>>> indicate an oops at all or that in this case it's a NULL pointer
>>>> dereference.
>>>>
>>>> On the X13s I'm seeing the NULL deref in a different path during boot,
>>>> and when this happens after a deferred probe (due to the panel lookup
>>>> mess) it hangs the machine, which makes it a bit of a pain to debug:
>>>>
>>>>  Unable to handle kernel NULL pointer dereference at virtual
>>>> address 0060
>>>>  ...
>>>>  CPU: 4 PID: 57 Comm: kworker/u16:1 Not tainted 6.7.0-rc1 #4
>>>>  Hardware name: Qualcomm QRD, BIOS
>>>> 6.0.220110.BOOT.MXF.1.1-00470-MAKENA-1 01/10/2022
>>>>  ...
>>>>  Call trace:
>>>>   drm_object_property_set_value+0x0/0x88 [drm]
>>>>   dp_display_process_hpd_high+0xa0/0x14c [msm]
>>>>   dp_hpd_plug_handle.constprop.0.isra.0+0x90/0x110 [msm]
>>>>   dp_bridge_atomic_enable+0x184/0x21c [msm]
>>>>   edp_bridge_atomic_enable+0x60/0x94 [msm]
>>>>   drm_atomic_bridge_chain_enable+0x54/0xc8 [drm]
>>>>   drm_atomic_helper_commit_modeset_enables+0x194/0x26c
>>>> [drm_kms_helper]
>>>>   msm_atomic_commit_tail+0x204/0x804 [msm]
>>>>   commit_tail+0xa4/0x18c [drm_kms_helper]
>>>>   drm_atomic_helper_commit+0x19c/0x1b0 [drm_kms_helper]
>>>>   drm_atomic_commit+0xa4/0x104 [drm]
>>>>   drm_client_modeset_commit_atomic+0x22c/0x298 [drm]
>>>>   drm_client_modeset_commit_locked+0x60/0x1c0 [drm]
>>>>   drm_client_modeset_commit+0x30/0x58 [drm]
>>>>   __drm_fb_helper_restore_fbdev_mode_unlocked+0xbc/0xfc
>>>> [drm_kms_helper]
>>>>   drm_fb_helper_set_par+0x30/0x4c [drm_kms_helper]
>>>>   fbcon_init+0x224/0x49c
>>>>   visual_init+0xb0/0x108
>>>>   do_bind_con_driver.isra.0+0x19c/0x38c
>>>>   do_take_over_console+0x140/0x1ec
>>>>   do_fbcon_takeover+0x6c/0xe4
>>>>   fbcon_fb_registered+0x180/0x1f0
>>>>   register_framebuffer+0x19c/0x228
>>>>   __drm_fb_helper_initial_config_and_unlock+0x2e8/0x4e8
>>>> [drm_kms_helper]
>>>>   drm_fb_helper_initial_config+0x3c/0x4c [drm_kms_helper]
>>>>   msm_fbdev_client_hotplug+0x84/0xcc [msm]
>>>>   drm_client_register+0x5c/0xa0 [drm]
>>>>   msm_fbdev_setup+0x94/0x148 [msm]
>>>>   msm_drm_bind+0x3d0/0x42c [msm]
>>>>   try_to_bring_up_aggregate_device+0x1ec/0x2f4
>>>>   __component_add+0xa8/0x194
>>>>   component_add+0x14/0x20
>>>>   dp_display_probe+0x278/0x41c [msm]
>>>>
>>>>> [1] https://patchwork.freedesktop.org/patch/30/
>>>>>
>>>>> Fixes: bfcc3d8f94f4 ("drm/msm/dp: support setting the DP subconnector
>>>>> type")
>>>>> Reviewed-by: Abhinav Kumar 
>>>>> Signed-off-by: Dmitry Baryshkov 
>>>>
>>>> Reviewed-by: Johan Hovold 
>>>> Tested-by: Johan Hovold 
>>>>
>>>
>>> Thanks !
>>>
>>>> Johan
> 
>

Re: [PATCH v2 2/2] drm/msm/dp: attach the DP subconnector property

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)

On 15.11.23 19:06, Abhinav Kumar wrote:
> On 11/15/2023 12:06 AM, Johan Hovold wrote:
>> On Wed, Oct 25, 2023 at 12:23:10PM +0300, Dmitry Baryshkov wrote:
>>> While developing and testing the commit bfcc3d8f94f4 ("drm/msm/dp:
>>> support setting the DP subconnector type") I had the patch [1] in my
>>> tree. I haven't noticed that it was a dependency for the commit in
>>> question. Mea culpa.
>>
>> This also broke boot on the Lenovo ThinkPad X13s.
>>
>> Would be nice to get this fixed ASAP so that further people don't have
>> to debug this known regression.
> 
> I will queue this patch for -fixes right away.

Thx. I noticed that this fix is still not in -next. I then investigated
and I found it was applied on Thursday last week here:
https://gitlab.freedesktop.org/drm/msm/-/commits/msm-fixes?ref_type=heads

Makes me wonder: when will that patch go to a branch that is included in
-next? And when will it move on towards mainline?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>>> Since the patch has not landed yet (and even was not reviewed)
>>> and since one of the bridges erroneously uses USB connector type instead
>>> of DP, attach the property directly from the MSM DP driver.
>>>
>>> This fixes the following oops on DP HPD event:
>>>
>>>   drm_object_property_set_value (drivers/gpu/drm/drm_mode_object.c:288)
>>>   dp_display_process_hpd_high (drivers/gpu/drm/msm/dp/dp_display.c:402)
>>>   dp_hpd_plug_handle.isra.0 (drivers/gpu/drm/msm/dp/dp_display.c:604)
>>>   hpd_event_thread (drivers/gpu/drm/msm/dp/dp_display.c:1110)
>>>   kthread (kernel/kthread.c:388)
>>>   ret_from_fork (arch/arm64/kernel/entry.S:858)
>>
>> This only says where the oops happened, it doesn't necessarily in itself
>> indicate an oops at all or that in this case it's a NULL pointer
>> dereference.
>>
>> On the X13s I'm seeing the NULL deref in a different path during boot,
>> and when this happens after a deferred probe (due to the panel lookup
>> mess) it hangs the machine, which makes it a bit of a pain to debug:
>>
>>     Unable to handle kernel NULL pointer dereference at virtual
>> address 0060
>>     ...
>>     CPU: 4 PID: 57 Comm: kworker/u16:1 Not tainted 6.7.0-rc1 #4
>>     Hardware name: Qualcomm QRD, BIOS
>> 6.0.220110.BOOT.MXF.1.1-00470-MAKENA-1 01/10/2022
>>     ...
>>     Call trace:
>>  drm_object_property_set_value+0x0/0x88 [drm]
>>  dp_display_process_hpd_high+0xa0/0x14c [msm]
>>  dp_hpd_plug_handle.constprop.0.isra.0+0x90/0x110 [msm]
>>  dp_bridge_atomic_enable+0x184/0x21c [msm]
>>  edp_bridge_atomic_enable+0x60/0x94 [msm]
>>  drm_atomic_bridge_chain_enable+0x54/0xc8 [drm]
>>  drm_atomic_helper_commit_modeset_enables+0x194/0x26c
>> [drm_kms_helper]
>>  msm_atomic_commit_tail+0x204/0x804 [msm]
>>  commit_tail+0xa4/0x18c [drm_kms_helper]
>>  drm_atomic_helper_commit+0x19c/0x1b0 [drm_kms_helper]
>>  drm_atomic_commit+0xa4/0x104 [drm]
>>  drm_client_modeset_commit_atomic+0x22c/0x298 [drm]
>>  drm_client_modeset_commit_locked+0x60/0x1c0 [drm]
>>  drm_client_modeset_commit+0x30/0x58 [drm]
>>  __drm_fb_helper_restore_fbdev_mode_unlocked+0xbc/0xfc
>> [drm_kms_helper]
>>  drm_fb_helper_set_par+0x30/0x4c [drm_kms_helper]
>>  fbcon_init+0x224/0x49c
>>  visual_init+0xb0/0x108
>>  do_bind_con_driver.isra.0+0x19c/0x38c
>>  do_take_over_console+0x140/0x1ec
>>  do_fbcon_takeover+0x6c/0xe4
>>  fbcon_fb_registered+0x180/0x1f0
>>  register_framebuffer+0x19c/0x228
>>  __drm_fb_helper_initial_config_and_unlock+0x2e8/0x4e8
>> [drm_kms_helper]
>>  drm_fb_helper_initial_config+0x3c/0x4c [drm_kms_helper]
>>  msm_fbdev_client_hotplug+0x84/0xcc [msm]
>>  drm_client_register+0x5c/0xa0 [drm]
>>  msm_fbdev_setup+0x94/0x148 [msm]
>>  msm_drm_bind+0x3d0/0x42c [msm]
>>  try_to_bring_up_aggregate_device+0x1ec/0x2f4
>>  __component_add+0xa8/0x194
>>  component_add+0x14/0x20
>>  dp_display_probe+0x278/0x41c [msm]
>>
>>> [1] https://patchwork.freedesktop.org/patch/30/
>>>
>>> Fixes: bfcc3d8f94f4 ("drm/msm/dp: support setting the DP subconnector
>>> type")
>>> Reviewed-by: Abhinav Kumar 
>>> Signed-off-by: Dmitry Baryshkov 
>>
>> Reviewed-by: Johan Hovold 
>> Tested-by: Johan Hovold 
>>
> 
> Thanks !
> 
>> Johan

Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)

On 15.11.23 07:19, Owen T. Heisler wrote:
> On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 28.10.23 04:46, Owen T. Heisler wrote:
>>> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
>>> #regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
>>>
>>> ## Problem
>>>
>>> 1. Connect external display to DVI port on dock and run X with both
>>>     displays in use.
>>> 2. Wait hours or days.
>>> 3. Suddenly the secondary Nvidia-connected display turns off and X stops
>>>     responding to keyboard/mouse input. In *some* cases it is
>>> possible to
>>>     switch to a virtual TTY with Ctrl+Alt+Fn and log in there.
> 
>> You thus might want to check if the problem occurs with 6.6 -- and
>> ideally also check if reverting the culprit there fixes things for you.
> 
> The problem also occurs with v6.6.

Meanwhile, you might want to give 6.7-rc a try as well on the off chance that
it improves things, even if that is unlikely.

> Here is a decoded kernel log from an
> untainted kernel:
> 
> https://gitlab.freedesktop.org/drm/nouveau/uploads/c120faf09da46f9c74006df9f1d14442/async-wait-on-fence-180.log
> 
> The culprit commit does not revert cleanly on v6.6. I have not yet
> attempted to resolve the conflicts.
> 
> I have also updated the bug description at
> <https://gitlab.freedesktop.org/drm/nouveau/-/issues/180>.

Maybe one of the nouveau developers can take a quick look at
d386a4b54607cf and suggest a simple way to revert it in latest mainline.
Maybe just removing the main chunk of code that is added is all that it
takes.

Ciao, Thorsten

Re: Radeon regression in 6.6 kernel

2023-11-19 Thread Linux regression tracking (Thorsten Leemhuis)

On 19.11.23 14:24, Bagas Sanjaya wrote:
> On Sun, Nov 19, 2023 at 04:47:01PM +1000, Dave Airlie wrote:
>>> On 12.11.23 01:46, Phillip Susi wrote:
>>>> I had been testing some things on a post 6.6-rc5 kernel for a week or
>>>> two and then when I pulled to a post 6.6 release kernel, I found that
>>>> system suspend was broken.  It seems that the radeon driver failed to
>>>> suspend, leaving the display dead, the wayland display server hung, and
>>>> the system still running.  I have been trying to bisect it for the last
>>>> few days and have only been able to narrow it down to the following 3
>>>> commits:
>>>>
>>>> There are only 'skip'ped commits left to test.
>>>> The first bad commit could be any of:
>>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>>> c07bf1636f0005f9eb7956404490672286ea59d3
>>>> b70438004a14f4d0f9890b3297cd66248728546c
>>>> We cannot bisect more!
>>>
>>> Hmm, not a single reply from the amdgpu folks. Wondering how we can
>>> encourage them to look into this.
>>>
>>> Phillip, reporting issues by mail should still work, but you might have
>>> more luck here, as that's where the amdgpu afaics prefer to track bugs:
>>> https://gitlab.freedesktop.org/drm/amd/-/issues
>>>
>>> When you file an issue there, please mention it here.
>>>
>>> Furthermore it might help if you could verify if 6.7-rc1 (or rc2, which
>>> comes out later today) or 6.6.2-rc1 improve things.

BTW, ignore the "6.6.2-rc1" here, I misunderstood one detail earlier. Sorry.

>> It would also be good to test if reverting any of these is possible or not.

Good point, sorry, forgot to mention that.

> Hi Dave,
> 
> AFAIK commit c07bf1636f0005 ("MAINTAINERS: Update the GPU Scheduler email")
> doesn't seem to do with this regression as it doesn't change any amdgpu code
> that may introduce the regression.

Bagas, sorry for being blunt here, I know you mean well. But I feel the
need to say the following in the open, as this otherwise falls back on
me and regression tracking.

Stating the above is not very helpful, as Dave for sure will know.
Telling Phillip that he likely can skip that commit might have been
something different. But I guess even for most users that are able to do
a bisection it's obvious and maybe not worth pointing out.

Ciao, Thorsten

Re: Radeon regression in 6.6 kernel

2023-11-18 Thread Linux regression tracking (Thorsten Leemhuis)

Lo!

On 12.11.23 01:46, Phillip Susi wrote:
> I had been testing some things on a post 6.6-rc5 kernel for a week or
> two and then when I pulled to a post 6.6 release kernel, I found that
> system suspend was broken.  It seems that the radeon driver failed to
> suspend, leaving the display dead, the wayland display server hung, and
> the system still running.  I have been trying to bisect it for the last
> few days and have only been able to narrow it down to the following 3
> commits:
> 
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 56e449603f0ac580700621a356d35d5716a62ce5
> c07bf1636f0005f9eb7956404490672286ea59d3
> b70438004a14f4d0f9890b3297cd66248728546c
> We cannot bisect more!

Hmm, not a single reply from the amdgpu folks. Wondering how we can
encourage them to look into this.

Phillip, reporting issues by mail should still work, but you might have
more luck here, as that's where the amdgpu developers afaics prefer to track bugs:
https://gitlab.freedesktop.org/drm/amd/-/issues

When you file an issue there, please mention it here.

Furthermore it might help if you could verify if 6.7-rc1 (or rc2, which
comes out later today) or 6.6.2-rc1 improve things.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


> It appears that there was a late merge in the 6.6 window that originally
> forked from the -rc2, as many of the later commits that I bisected had
> that version number.
> 
> I couldn't get it more narrowed down because I had to skip the
> surrounding commits because they wouldn't even boot up to a gui desktop,
> let alone try to suspend.
> 
> When system suspend fails, I find the following in my syslog after I
> have to magic-sysrq reboot because the display is dead:
> 
> Nov 11 18:44:39 faldara kernel: PM: suspend entry (deep)
> Nov 11 18:44:39 faldara kernel: Filesystems sync: 0.035 seconds
> Nov 11 18:44:40 faldara kernel: Freezing user space processes
> Nov 11 18:44:40 faldara kernel: Freezing user space processes completed 
> (elapsed 0.001 seconds)
> Nov 11 18:44:40 faldara kernel: OOM killer disabled.
> Nov 11 18:44:40 faldara kernel: Freezing remaining freezable tasks
> Nov 11 18:44:40 faldara kernel: Freezing remaining freezable tasks completed 
> (elapsed 0.001 seconds)
> Nov 11 18:44:40 faldara kernel: printk: Suspending console(s) (use 
> no_console_suspend to debug)
> Nov 11 18:44:40 faldara kernel: serial 00:01: disabled
> Nov 11 18:44:40 faldara kernel: e1000e: EEE TX LPI TIMER: 0011
> Nov 11 18:44:40 faldara kernel: sd 4:0:0:0: [sdb] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 1:0:0:0: [sda] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 4:0:0:0: [sdb] Stopping disk
> Nov 11 18:44:40 faldara kernel: sd 1:0:0:0: [sda] Stopping disk
> Nov 11 18:44:40 faldara kernel: sd 5:0:0:0: [sdc] Stopping disk
> Nov 11 18:44:40 faldara kernel: amdgpu: Move buffer fallback to memcpy 
> unavailable
> Nov 11 18:44:40 faldara kernel: [TTM] Buffer eviction failed
> Nov 11 18:44:40 faldara kernel: [drm] evicting device resources failed
> Nov 11 18:44:40 faldara kernel: amdgpu :03:00.0: PM: pci_pm_suspend(): 
> amdgpu_pmops_suspend+0x0/0x80 [amdgpu] returns -19
> Nov 11 18:44:40 faldara kernel: amdgpu :03:00.0: PM: dpm_run_callback(): 
> pci_pm_suspend+0x0/0x170 returns -19
> Nov 11 18:44:40 faldara kernel: amdgpu :03:00.0: PM: failed to suspend 
> async: error -19
> Nov 11 18:44:40 faldara kernel: PM: Some devices failed to suspend, or early 
> wake event detected
> Nov 11 18:44:40 faldara kernel: xhci_hcd :06:00.0: xHC error in resume, 
> USBSTS 0x401, Reinit
> Nov 11 18:44:40 faldara kernel: usb usb3: root hub lost power or was reset
> Nov 11 18:44:40 faldara kernel: usb usb4: root hub lost power or was reset
> Nov 11 18:44:40 faldara kernel: serial 00:01: activated
> Nov 11 18:44:40 faldara kernel: nvme nvme0: 4/0/0 default/read/poll queues
> Nov 11 18:44:40 faldara kernel: ata8: SATA link down (SStatus 0 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata7: SATA link down (SStatus 0 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 
> SControl 300)
> Nov 11 18:44:40 faldara kernel: ata1: SATA link down (SStatus 4 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata3: SATA link down (SStatus 4 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata4.00: configured for UDMA/133
> Nov 11 18:44:40 faldara kernel: OOM killer enabled.
> Nov 11 18:44:40 faldara kernel: Restarting tasks ... done.
> Nov 11 18:44:40 faldara kernel: random: crng reseeded on system resumption
> Nov 11 18:44:40 faldara kernel: PM: suspend exit
> Nov 11 18:44:40 faldara kernel: PM: suspend entry (s2idle)
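
For readers decoding the log above: the `-19` returned by the suspend callbacks is a negated errno value, and on Linux errno 19 is ENODEV ("No such device"). The following is only a rough sketch for pulling such failures out of a log in the format quoted above; the helper name `failed_callbacks` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import errno
import re

# Matches fragments such as "pci_pm_suspend+0x0/0x170 returns -19" or
# "amdgpu_pmops_suspend+0x0/0x80 [amdgpu] returns -19".
RETURNS = re.compile(
    r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[\w+\])? returns (-\d+)"
)


def failed_callbacks(log_text):
    """Return (function, errno name) pairs for callbacks that returned an error.

    Hypothetical helper: it only understands the 'returns -N' form shown
    in the quoted log.
    """
    results = []
    for func, code in RETURNS.findall(log_text):
        # The kernel reports -errno; flip the sign before the lookup.
        name = errno.errorcode.get(-int(code), "unknown")
        results.append((func, name))
    return results


sample = (
    "amdgpu_pmops_suspend+0x0/0x80 [amdgpu] returns -19\n"
    "pci_pm_suspend+0x0/0x170 returns -19\n"
)
for func, name in failed_callbacks(sample):
    print(func, name)
```

Because the pattern matches the `function+offset returns -N` fragment on its own, it still works when the mail client has wrapped the log line, as happened in the quote above.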

Re: [PATCH v2 2/2] drm/msm/dp: attach the DP subconnector property

2023-11-18 Thread Linux regression tracking #adding (Thorsten Leemhuis)

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 15.11.23 09:06, Johan Hovold wrote:
> On Wed, Oct 25, 2023 at 12:23:10PM +0300, Dmitry Baryshkov wrote:
>> While developing and testing the commit bfcc3d8f94f4 ("drm/msm/dp:
>> support setting the DP subconnector type") I had the patch [1] in my
>> tree. I haven't noticed that it was a dependency for the commit in
>> question. Mea culpa.
> This also broke boot on the Lenovo ThinkPad X13s.
> [...]

>> Fixes: bfcc3d8f94f4 ("drm/msm/dp: support setting the DP subconnector type")
>> Reviewed-by: Abhinav Kumar 
>> Signed-off-by: Dmitry Baryshkov 
> 
> Reviewed-by: Johan Hovold 
> Tested-by: Johan Hovold 

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced bfcc3d8f94f4
#regzbot title drm/msm/dp: boot broken on the Lenovo ThinkPad X13s and
some other machines
#regzbot fix: drm/msm/dp: attach the DP subconnector property
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
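
As an illustration of how the `#regzbot` lines in a mail like the one above could be collected by a script, here is a minimal sketch. It is explicitly not how regzbot itself parses mail; the function name `regzbot_commands` and the rule that a wrapped line continues the previous command are assumptions made for this example.

```python
def regzbot_commands(body):
    """Collect '#regzbot' commands from a mail body as (command, argument) pairs.

    Hypothetical helper. Assumption: a non-empty line directly after a
    command line is a wrapped continuation of that command's argument.
    """
    commands = []
    current = None
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith("#regzbot"):
            parts = stripped[len("#regzbot"):].split(None, 1)
            if not parts:
                current = None
                continue
            name = parts[0].rstrip(":")
            arg = parts[1] if len(parts) > 1 else ""
            current = [name, arg]
            commands.append(current)
        elif stripped and current is not None:
            # Wrapped continuation of the previous command's argument.
            current[1] = (current[1] + " " + stripped).strip()
        else:
            current = None
    return [tuple(c) for c in commands]


mail = """\
#regzbot ^introduced bfcc3d8f94f4
#regzbot title drm/msm/dp: boot broken on the Lenovo ThinkPad X13s and
some other machines
#regzbot fix: drm/msm/dp: attach the DP subconnector property
#regzbot ignore-activity

This isn't a regression?
"""
for name, arg in regzbot_commands(mail):
    print(name, "->", arg)
```

Quoted commands (lines starting with `>`) are deliberately ignored here, since they belong to an earlier mail in the thread.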

Re: mainline build failure due to 7966f319c66d ("drm/amd/display: Introduce DML2")

2023-11-12 Thread Linux regression tracking #update (Thorsten Leemhuis)

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 04.11.23 10:42, Sudip Mukherjee wrote:
> On Thu, 2 Nov 2023 at 22:53, Alex Deucher  wrote:
>> On Thu, Nov 2, 2023 at 1:07 PM Sudip Mukherjee
>>  wrote:
>>> On Thu, 2 Nov 2023 at 16:52, Alex Deucher  wrote:
>>>> On Thu, Nov 2, 2023 at 5:32 AM Sudip Mukherjee (Codethink)
>>>>  wrote:
>>
>> Should be fixed with Nathan's patch:
>> https://patchwork.freedesktop.org/patch/565675/
> 
> Yes, it does. Thanks.
> 
> Tested-by: Sudip Mukherjee 

#regzbot fix: 6740ec97bcdbe9
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: [Nouveau] Fwd: System (Xeon Nvidia) hangs at boot terminal after kernel 6.4.7

2023-11-01 Thread Linux regression tracking #update (Thorsten Leemhuis)

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 10.08.23 06:19, Thorsten Leemhuis wrote:
> On 10.08.23 05:03, Bagas Sanjaya wrote:
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>
>> [...]
>> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217776

#regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/255
#regzbot fix: 6eb4a83e612af65bab8492957cba
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: Blank screen on boot of Linux 6.5 and later on Lenovo ThinkPad L570

2023-10-25 Thread Linux regression tracking (Thorsten Leemhuis)

On 25.10.23 15:23, Huacai Chen wrote:
> On Wed, Oct 25, 2023 at 6:08 PM Thorsten Leemhuis
>  wrote:
>>
>> Javier, Dave, Sima,
>>
>> On 23.10.23 00:54, Evan Preston wrote:
>>> On 2023-10-20 Fri 05:48pm, Huacai Chen wrote:
>>>> On Fri, Oct 20, 2023 at 5:35 PM Linux regression tracking (Thorsten
>>>> Leemhuis)  wrote:
>>>>> On 09.10.23 10:54, Huacai Chen wrote:
>>>>>> On Mon, Oct 9, 2023 at 4:45 PM Bagas Sanjaya  
>>>>>> wrote:
>>>>>>> On Mon, Oct 09, 2023 at 09:27:02AM +0800, Huacai Chen wrote:
>>>>>>>> On Tue, Sep 26, 2023 at 10:31 PM Huacai Chen  
>>>>>>>> wrote:
>>>>>>>>> On Tue, Sep 26, 2023 at 7:15 PM Linux regression tracking (Thorsten
>>>>>>>>> Leemhuis)  wrote:
>>>>>>>>>> On 13.09.23 14:02, Jaak Ristioja wrote:
>>>>>>>>>>>
>>>>>>>>>>> Upgrading to Linux 6.5 on a Lenovo ThinkPad L570 (Integrated Intel HD
>>>>>>>>>>> Graphics 620 (rev 02), Intel(R) Core(TM) i7-7500U) results in a blank
>>>>>>>>>>> screen after boot until the display manager starts... if it does start
>>>>>>>>>>> at all. Using the nomodeset kernel parameter seems to be a workaround.
>>>>>>>>>>>
>>>>>>>>>>> I've bisected this to commit 60aebc9559492cea6a9625f514a8041717e3a2e4
>>>>>>>>>>> ("drivers/firmware: Move sysfb_init() from device_initcall to
>>>>>>>>>>> subsys_initcall_sync").
>>>>>>>>>>
>>>>>>>> As confirmed by Jaak, disabling DRM_SIMPLEDRM makes things work fine
>>>>>>>> again. So I guess the reason:
>>>>>
>>>>> Well, this to me still looks a lot (please correct me if I'm wrong) like
>>>>> a regression that should be fixed, as DRM_SIMPLEDRM was enabled beforehand
>>>>> if I understood things correctly. Or is there a proper fix for this
>>>>> already in the works and I just missed this? Or is there some good
>>>>> reason why this won't/can't be fixed?
>>>>
>>>> DRM_SIMPLEDRM was enabled but it didn't work at all because there was
>>>> no corresponding platform device. Now DRM_SIMPLEDRM works but it has a
>>>> blank screen. Of course it is valuable to investigate further about
>>>> DRM_SIMPLEDRM on Jaak's machine, but that needs Jaak's effort because
>>>> I don't have a same machine.
>>
>> Side note: Huacai, have you tried working with Jaak to get down to the
>> real problem? Evan, might you be able to help out here?
> No, Jaak has no response after he 'fixed' his problem by disabling SIMPLEDRM.

Yeah, understood, already suspected something like that, thx for confirming.

>> But I write this mail for a different reason:
>>
>>> I am having the same issue on a Lenovo Thinkpad P70 (Intel
>>> Corporation HD Graphics 530 (rev 06), Intel(R) Core(TM) i7-6700HQ).
>>> Upgrading from Linux 6.4.12 to 6.5 and later results in only a blank
>>> screen after boot and a rapidly flashing device-access-status
>>> indicator.
>>
>> This additional report makes me wonder if we should revert the culprit
>> (60aebc9559492c ("drivers/firmware: Move sysfb_init() from
>> device_initcall to subsys_initcall_sync") [v6.5-rc1]). But I guess that
>> might lead to regressions for some users? But the patch description says
>> that this is not a common configuration, so can we maybe get away with that?
> From my point of view, this is not a regression, 60aebc9559492c
> doesn't cause a problem, but exposes a problem.

From my understanding of Linus' stance in cases like this, I think that
aspect doesn't matter. To for example quote
https://lore.kernel.org/lkml/CAHk-=wiP4K8DRJWsCo=20hn_6054xbamgkf2kpguzpb5ama...@mail.gmail.com/

"""
But it ended up exposing another problem, and as such caused a kernel
upgrade to fail for a user. So it got reverted.
"""

For other examples of his view see the bottom half of
https://docs.kernel.org/process/handling-regressions.html

We could bring Linus in to clarify if needed, but I for now didn't CC
him, as I hope we can solve this without h

Re: Blank screen on boot of Linux 6.5 and later on Lenovo ThinkPad L570

2023-10-25 Thread Thorsten Leemhuis

Javier, Dave, Sima,

On 23.10.23 00:54, Evan Preston wrote:
> On 2023-10-20 Fri 05:48pm, Huacai Chen wrote:
>> On Fri, Oct 20, 2023 at 5:35 PM Linux regression tracking (Thorsten
>> Leemhuis)  wrote:
>>> On 09.10.23 10:54, Huacai Chen wrote:
>>>> On Mon, Oct 9, 2023 at 4:45 PM Bagas Sanjaya  wrote:
>>>>> On Mon, Oct 09, 2023 at 09:27:02AM +0800, Huacai Chen wrote:
>>>>>> On Tue, Sep 26, 2023 at 10:31 PM Huacai Chen  
>>>>>> wrote:
>>>>>>> On Tue, Sep 26, 2023 at 7:15 PM Linux regression tracking (Thorsten
>>>>>>> Leemhuis)  wrote:
>>>>>>>> On 13.09.23 14:02, Jaak Ristioja wrote:
>>>>>>>>>
>>>>>>>>> Upgrading to Linux 6.5 on a Lenovo ThinkPad L570 (Integrated Intel HD
>>>>>>>>> Graphics 620 (rev 02), Intel(R) Core(TM) i7-7500U) results in a blank
>>>>>>>>> screen after boot until the display manager starts... if it does start
>>>>>>>>> at all. Using the nomodeset kernel parameter seems to be a workaround.
>>>>>>>>>
>>>>>>>>> I've bisected this to commit 60aebc9559492cea6a9625f514a8041717e3a2e4
>>>>>>>>> ("drivers/firmware: Move sysfb_init() from device_initcall to
>>>>>>>>> subsys_initcall_sync").
>>>>>>>>
>>>>>> As confirmed by Jaak, disabling DRM_SIMPLEDRM makes things work fine
>>>>>> again. So I guess the reason:
>>>
>>> Well, this to me still looks a lot (please correct me if I'm wrong) like
>>> a regression that should be fixed, as DRM_SIMPLEDRM was enabled beforehand
>>> if I understood things correctly. Or is there a proper fix for this
>>> already in the works and I just missed this? Or is there some good
>>> reason why this won't/can't be fixed?
>>
>> DRM_SIMPLEDRM was enabled but it didn't work at all because there was
>> no corresponding platform device. Now DRM_SIMPLEDRM works but it has a
>> blank screen. Of course it is valuable to investigate further about
>> DRM_SIMPLEDRM on Jaak's machine, but that needs Jaak's effort because
>> I don't have a same machine.

Side note: Huacai, have you tried working with Jaak to get down to the
real problem? Evan, might you be able to help out here?

But I write this mail for a different reason:

> I am having the same issue on a Lenovo Thinkpad P70 (Intel 
> Corporation HD Graphics 530 (rev 06), Intel(R) Core(TM) i7-6700HQ).
> Upgrading from Linux 6.4.12 to 6.5 and later results in only a blank
> screen after boot and a rapidly flashing device-access-status
> indicator.

This additional report makes me wonder if we should revert the culprit
(60aebc9559492c ("drivers/firmware: Move sysfb_init() from
device_initcall to subsys_initcall_sync") [v6.5-rc1]). But I guess that
might lead to regressions for some users? But the patch description says
that this is not a common configuration, so can we maybe get away with that?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>>>>>> When SIMPLEDRM takes over the framebuffer, the screen is blank (don't
>>>>>> know why). And before 60aebc9559492cea6a9625f ("drivers/firmware: Move
>>>>>> sysfb_init() from device_initcall to subsys_initcall_sync") there is
>>>>>> no platform device created for SIMPLEDRM at early stage, so it seems
>>>>>> also "no problem".
>>>>> I don't understand above. You mean that after that commit the platform
>>>>> device is also none, right?
>>>> No. The SIMPLEDRM driver needs a platform device to work, and that
>>>> commit makes the platform device created earlier. So, before that
>>>> commit, SIMPLEDRM doesn't work, but the screen isn't blank; after that
>>>> commit, SIMPLEDRM works, but the screen is blank.
>>>>
>>>> Huacai
>>>>>
>>>>> Confused...
>>>>>
>>>>> --
>>>>> An old man doll... just what I always wanted! - Clara
>>>>
>>>>
> 
>
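
To make the ordering argument quoted above easier to follow: built-in kernel code runs its initcalls level by level, and `subsys_initcall_sync` comes before `device_initcall`. Below is a toy model of that ordering, a sketch only; the level list follows include/linux/init.h (early_initcall is left out), while the helper names `runs_before` and `created_before_device_drivers` are made up for this illustration.

```python
# Initcall levels in the order the kernel runs them for built-in code
# (per include/linux/init.h; early_initcall is omitted in this toy model).
INITCALL_ORDER = [
    "pure_initcall",
    "core_initcall", "core_initcall_sync",
    "postcore_initcall", "postcore_initcall_sync",
    "arch_initcall", "arch_initcall_sync",
    "subsys_initcall", "subsys_initcall_sync",
    "fs_initcall", "fs_initcall_sync",
    "rootfs_initcall",
    "device_initcall", "device_initcall_sync",
    "late_initcall", "late_initcall_sync",
]


def runs_before(a, b):
    """True if level a is strictly earlier than level b."""
    return INITCALL_ORDER.index(a) < INITCALL_ORDER.index(b)


def created_before_device_drivers(sysfb_level):
    """True if level ordering alone guarantees that a platform device
    registered at sysfb_level exists before device_initcall code runs."""
    return runs_before(sysfb_level, "device_initcall")


print(created_before_device_drivers("device_initcall"))       # before the commit: False
print(created_before_device_drivers("subsys_initcall_sync"))  # after the commit: True
```

Within a single level the order depends on link order, which is why the model reports no guarantee for the pre-commit case rather than claiming the device always appeared late.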

Re: Blank screen on boot of Linux 6.5 and later on Lenovo ThinkPad L570

2023-10-20 Thread Linux regression tracking (Thorsten Leemhuis)

On 09.10.23 10:54, Huacai Chen wrote:
> On Mon, Oct 9, 2023 at 4:45 PM Bagas Sanjaya  wrote:
>> On Mon, Oct 09, 2023 at 09:27:02AM +0800, Huacai Chen wrote:
>>> On Tue, Sep 26, 2023 at 10:31 PM Huacai Chen  wrote:
>>>> On Tue, Sep 26, 2023 at 7:15 PM Linux regression tracking (Thorsten
>>>> Leemhuis)  wrote:
>>>>> On 13.09.23 14:02, Jaak Ristioja wrote:
>>>>>>
>>>>>> Upgrading to Linux 6.5 on a Lenovo ThinkPad L570 (Integrated Intel HD
>>>>>> Graphics 620 (rev 02), Intel(R) Core(TM) i7-7500U) results in a blank
>>>>>> screen after boot until the display manager starts... if it does start
>>>>>> at all. Using the nomodeset kernel parameter seems to be a workaround.
>>>>>>
>>>>>> I've bisected this to commit 60aebc9559492cea6a9625f514a8041717e3a2e4
>>>>>> ("drivers/firmware: Move sysfb_init() from device_initcall to
>>>>>> subsys_initcall_sync").
>>>>>
>>>>> Hmmm, no reaction since it was posted a while ago, unless I'm missing
>>>>> something.
>>>>>
>>>>> Huacai Chen, did you maybe miss this report? The problem is apparently
>>>>> caused by a commit of yours (that Javier applied), you hence should look
>>>>> into this.
>>>> I'm sorry but it looks very strange, could you please share your config 
>>>> file?
>>> As confirmed by Jaak, disabling DRM_SIMPLEDRM makes things work fine
>>> again. So I guess the reason:
>>
>> Did Jaak reply privately? It should have been disclosed in public
>> ML here instead.
> Yes, he replied privately, and disabling DRM_SIMPLEDRM was suggested by me.

Well, this to me still looks a lot (please correct me if I'm wrong) like
a regression that should be fixed, as DRM_SIMPLEDRM was enabled beforehand
if I understood things correctly. Or is there a proper fix for this
already in the works and I just missed this? Or is there some good
reason why this won't/can't be fixed?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>>> When SIMPLEDRM takes over the framebuffer, the screen is blank (don't
>>> know why). And before 60aebc9559492cea6a9625f ("drivers/firmware: Move
>>> sysfb_init() from device_initcall to subsys_initcall_sync") there is
>>> no platform device created for SIMPLEDRM at early stage, so it seems
>>> also "no problem".
>>
>> I don't understand above. You mean that after that commit the platform
>> device is also none, right?
> No. The SIMPLEDRM driver needs a platform device to work, and that
> commit makes the platform device created earlier. So, before that
> commit, SIMPLEDRM doesn't work, but the screen isn't blank; after that
> commit, SIMPLEDRM works, but the screen is blank.
> 
> Huacai
>>
>> Confused...
>>
>> --
>> An old man doll... just what I always wanted! - Clara
> 
>

Re: [REGRESSION] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-29 Thread Linux regression tracking #update (Thorsten Leemhuis)

On 19.09.23 16:08, Bagas Sanjaya wrote:
> On Sat, Sep 02, 2023 at 06:14:12PM +0200, Oleksandr Natalenko wrote:
>>
>> Since v6.5 kernel the following HW:
>>
>> * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
>> * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> 
> #regzbot ^introduced: 0b62af28f249b9
> #regzbot title: gen8_ggtt_insert_entries() panic on Lenovo T14s (Tiger Lake) 
> due to folio_batch() on shmem_sg_free_table()
> #regzbot link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256

#regzbot fix: i915: Limit the length of an sg list to the requested length
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: Blank screen on boot of Linux 6.5 and later on Lenovo ThinkPad L570

2023-09-26 Thread Linux regression tracking (Thorsten Leemhuis)

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

Hi, Thorsten here, the Linux kernel's regression tracker.

On 13.09.23 14:02, Jaak Ristioja wrote:
> 
> Upgrading to Linux 6.5 on a Lenovo ThinkPad L570 (Integrated Intel HD
> Graphics 620 (rev 02), Intel(R) Core(TM) i7-7500U) results in a blank
> screen after boot until the display manager starts... if it does start
> at all. Using the nomodeset kernel parameter seems to be a workaround.
> 
> I've bisected this to commit 60aebc9559492cea6a9625f514a8041717e3a2e4
> ("drivers/firmware: Move sysfb_init() from device_initcall to
> subsys_initcall_sync").

Hmmm, no reaction since it was posted a while ago, unless I'm missing
something.

Huacai Chen, did you maybe miss this report? The problem is apparently
caused by a commit of yours (that Javier applied), you hence should look
into this.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> git bisect start
> # status: waiting for both good and bad commits
> # good: [6995e2de6891c724bfeb2db33d7b87775f913ad1] Linux 6.4
> git bisect good 6995e2de6891c724bfeb2db33d7b87775f913ad1
> # status: waiting for bad commit, 1 good commit known
> # bad: [2dde18cd1d8fac735875f2e4987f11817cc0bc2c] Linux 6.5
> git bisect bad 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
> # bad: [b775d6c5859affe00527cbe74263de05cfe6b9f9] Merge tag 'mips_6.5'
> of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
> git bisect bad b775d6c5859affe00527cbe74263de05cfe6b9f9
> # good: [3a8a670eeeaa40d87bd38a587438952741980c18] Merge tag
> 'net-next-6.5' of
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
> git bisect good 3a8a670eeeaa40d87bd38a587438952741980c18
> # bad: [188d3f80fc6d8451ab5e570becd6a7b2d3033023] drm/amdgpu: vcn_4_0
> set instance 0 init sched score to 1
> git bisect bad 188d3f80fc6d8451ab5e570becd6a7b2d3033023
> # good: [12fb1ad70d65edc3405884792d044fa79df7244f] drm/amdkfd: update
> process interrupt handling for debug events
> git bisect good 12fb1ad70d65edc3405884792d044fa79df7244f
> # bad: [9cc31938d4586f72eb8e0235ad9d9eb22496fcee] i915/perf: Drop the
> aging_tail logic in perf OA
> git bisect bad 9cc31938d4586f72eb8e0235ad9d9eb22496fcee
> # bad: [51d86ee5e07ccef85af04ee9850b0baa107999b6] drm/msm: Switch to
> fdinfo helper
> git bisect bad 51d86ee5e07ccef85af04ee9850b0baa107999b6
> # good: [bfdede3a58ea970333d77a05144a7bcec13cf515] drm/rockchip: cdn-dp:
> call drm_connector_update_edid_property() unconditionally
> git bisect good bfdede3a58ea970333d77a05144a7bcec13cf515
> # good: [123ee07ba5b7123e0ce0e0f9d64938026c16a2ce] drm: sun4i_tcon: use
> devm_clk_get_enabled in `sun4i_tcon_init_clocks`
> git bisect good 123ee07ba5b7123e0ce0e0f9d64938026c16a2ce
> # bad: [20d54e48d9c705091a025afff5839da2ea606f6b] fbdev: Rename
> fb_mem*() helpers
> git bisect bad 20d54e48d9c705091a025afff5839da2ea606f6b
> # bad: [728cb3f061e2b3a002fd76d91c2449b1497b6640] gpu: drm: bridge: No
> need to set device_driver owner
> git bisect bad 728cb3f061e2b3a002fd76d91c2449b1497b6640
> # bad: [0f1cb4d777281ca3360dbc8959befc488e0c327e] drm/ssd130x: Fix
> include guard name
> git bisect bad 0f1cb4d777281ca3360dbc8959befc488e0c327e
> # good: [0bd5bd65cd2e4d1335ea6c17cd2c8664decbc630] dt-bindings: display:
> simple: Add BOE EV121WXM-N10-1850 panel
> git bisect good 0bd5bd65cd2e4d1335ea6c17cd2c8664decbc630
> # bad: [60aebc9559492cea6a9625f514a8041717e3a2e4] drivers/firmware: Move
> sysfb_init() from device_initcall to subsys_initcall_sync
> git bisect bad 60aebc9559492cea6a9625f514a8041717e3a2e4
> # good: [8bb7c7bca5b70f3cd22d95b4d36029295c4274f6] drm/panel:
> panel-simple: Add BOE EV121WXM-N10-1850 panel support
> git bisect good 8bb7c7bca5b70f3cd22d95b4d36029295c4274f6
> # first bad commit: [60aebc9559492cea6a9625f514a8041717e3a2e4]
> drivers/firmware: Move sysfb_init() from device_initcall to
> subsys_initcall_sync
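
A bisect log like the one quoted above can be summarized mechanically. The sketch below counts the good/bad steps and extracts the first bad commit; the function name `summarize_bisect_log` is hypothetical, and it assumes full 40-character hashes and the `git bisect log` line shapes visible in the quote (including the leading `> ` quote markers).

```python
import re

STEP = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})$")
FIRST_BAD = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\]")


def summarize_bisect_log(text):
    """Return the tested steps and the first bad commit from a bisect log.

    Hypothetical helper; tolerates '> ' mail quote prefixes.
    """
    steps = []
    first_bad = None
    for raw in text.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        step = STEP.match(line)
        if step:
            steps.append((step.group(1), step.group(2)))
            continue
        found = FIRST_BAD.match(line)
        if found:
            first_bad = found.group(1)
    return {"steps": steps, "first_bad": first_bad}


log = """\
> git bisect start
> # good: [6995e2de6891c724bfeb2db33d7b87775f913ad1] Linux 6.4
> git bisect good 6995e2de6891c724bfeb2db33d7b87775f913ad1
> # bad: [2dde18cd1d8fac735875f2e4987f11817cc0bc2c] Linux 6.5
> git bisect bad 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
> # first bad commit: [60aebc9559492cea6a9625f514a8041717e3a2e4]
> drivers/firmware: Move sysfb_init() from device_initcall to
"""
summary = summarize_bisect_log(log)
print(len(summary["steps"]), "steps, first bad:", summary["first_bad"])
```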

Re: mainline build failure due to 501126083855 ("fbdev/g364fb: Use fbdev I/O helpers")

2023-09-03 Thread Linux regression tracking #update (Thorsten Leemhuis)

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 31.08.23 20:48, Sudip Mukherjee (Codethink) wrote:
> Hi All,
> 
> The latest mainline kernel branch fails to build mips jazz_defconfig with
> the error:
> 
> drivers/video/fbdev/g364fb.c:115:9: error: 'FB_DEFAULT_IOMEM_HELPERS' 
> undeclared here (not in a function); did you mean 'FB_DEFAULT_IOMEM_OPS'?
>   115 | FB_DEFAULT_IOMEM_HELPERS,
>   | ^~~~
>   | FB_DEFAULT_IOMEM_OPS
> 
> 
> git bisect pointed to 501126083855 ("fbdev/g364fb: Use fbdev I/O helpers").
> 
> Reverting the commit has fixed the build failure.
> 
> I will be happy to test any patch or provide any extra log if needed.
> 
> #regzbot introduced: 5011260838551cefbf23d60b48c3243b6d5530a2
> 

#regzbot fix: 8df0f84c3bb921f5aa1036223dd932bbc7df6d
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: nouveau bug in linux/6.1.38-2

2023-08-31 Thread Linux regression tracking #update (Thorsten Leemhuis)

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 04.08.23 14:02, Thorsten Leemhuis wrote:
> On 02.08.23 23:28, Olaf Skibbe wrote:
>> Dear Maintainers,
>>
>> Hereby I would like to report an apparent bug in the nouveau driver in
>> linux/6.1.38-2.
> 
> Thx for your report. Maybe your problem is caused by an incomplete
> backport. I Cced the maintainers for the drivers (and the regressions
> and the stable list), maybe one of them has an idea, as they know the
> driver.

#regzbot fix: 98e470dc73a9b3539e5a7a3c72f6b7c01c98
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: [REGRESSION] HDMI connector detection broken in 6.3 on Intel(R) Celeron(R) N3060 integrated graphics

2023-08-13 Thread Linux regression tracking (Thorsten Leemhuis)

On 11.08.23 20:10, Mikhail Rudenko wrote:
> On 2023-08-11 at 08:45 +02, Thorsten Leemhuis  
> wrote:
>> On 10.08.23 21:33, Mikhail Rudenko wrote:
>>> The following is a copy of an issue I posted to drm/i915 gitlab [1] two
>>> months ago. I repost it to the mailing lists in hope that it will help
>>> the right people pay attention to it.
>>
>> Thx for your report. Wonder why Dmitry (who authored a4e771729a51) or
>> Thomas (who committed it) didn't look into this, but maybe the i915
>> devs didn't forward the report to them.

For the record: they did, and Jani mentioned that already. Sorry, should have
phrased this differently.

>> Let's see if these mails help. Just wondering: does reverting
>> a4e771729a51 from 6.5-rc5 or drm-tip help as well?
> 
> I've redone my tests with 6.5-rc5, and here are the results:
> (1) 6.5-rc5 -> still affected
> (2) 6.5-rc5 + revert a4e771729a51 -> not affected
> (3) 6.5-rc5 + two patches [1][2] suggested on i915 gitlab by @ideak -> not affected (!)
> 
> Should we somehow tell regzbot about (3)?

That's good to know, thx. But the more important things are:

* When will those be merged? They are not in next yet afaics, so it
might take some time to mainline them, especially at this point of the
devel cycle. Imre, could you try to prod the right people so that these
are ideally upstreamed rather sooner than later, as they fix a regression?
* If possible, they should ideally be tagged for backporting to 6.4, as
this is a regression from the 6.3 cycle.

But yes, let's tell regzbot that fixes are available, too:

#regzbot fix: drm/i915: Fix HPD polling, reenabling the output poll work
as needed

(for the record: that's the second of two patches apparently needed)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>> BTW, there was an earlier report about a problem with a4e771729a51 that
>> afaics was never addressed, but it might be unrelated.
>> https://lore.kernel.org/all/20230328023129.3596968-1-zhouzong...@kylinos.cn/
> [1] https://patchwork.freedesktop.org/patch/548590/?series=121050=1
> [2] https://patchwork.freedesktop.org/patch/548591/?series=121050=1

Re: [REGRESSION] HDMI connector detection broken in 6.3 on Intel(R) Celeron(R) N3060 integrated graphics

2023-08-11 Thread Thorsten Leemhuis

[CCing the i915 maintainers and the dri maintainers]

Hi, Thorsten here, the Linux kernel's regression tracker.

On 10.08.23 21:33, Mikhail Rudenko wrote:
> The following is a copy of an issue I posted to drm/i915 gitlab [1] two
> months ago. I repost it to the mailing lists in hope that it will help
> the right people pay attention to it.

Thx for your report. Wonder why Dmitry (who authored a4e771729a51) or
Thomas (who committed it) didn't look into this, but maybe the i915
devs didn't forward the report to them.

Let's see if these mails help. Just wondering: does reverting
a4e771729a51 from 6.5-rc5 or drm-tip help as well?

BTW, there was an earlier report about a problem with a4e771729a51 that
afaics was never addressed, but it might be unrelated.

https://lore.kernel.org/all/20230328023129.3596968-1-zhouzong...@kylinos.cn/

Ciao, Thorsten

> After kernel upgrade from 6.2.13 to 6.3 HDMI connector detection is
> broken for me. Issue is 100% reproducible:
> 
> 1. Start system as usual with HDMI connected.
> 2. Disconnect HDMI
> 3. Connect HDMI back
> 4. Get "no signal" on display, connector status in sysfs is disconnected
> 
> Curiously, running xrandr over ssh like
> 
> ssh qnap251.local env DISPLAY=:0 xrandr
> 
> makes display come back. drm-tip tip is affected as well (last test
> 2023-08-02).
> 
> Bisecting points at a4e771729a51 ("drm/probe_helper: sort out poll_running vs poll_enabled").
> Reverting that commit on top of 6.3 fixes the issue for me.
> 
> System information:
> * System architecture: x86_64
> * Kernel version: 6.3.arch1
> * Linux distribution: Arch Linux
> * Machine: QNAP TS-251A, CPU: Intel(R) Celeron(R) CPU N3060 @ 1.60GHz
> * Display connector: single HDMI display
> * dmesg with debug information (captured on drm-tip, following above 4 steps): [2]
> * xrandr output:
> 
> Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 16384 x 16384
> DP-1 disconnected (normal left inverted right x axis y axis)
> HDMI-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 708mm x 398mm
>    1920x1080     60.00*+  50.00    59.94    30.00    25.00    24.00    29.97    23.98
>    1920x1080i    60.00    50.00    59.94
>    1360x768      59.80
>    1280x768      60.35
>    1280x720      60.00    50.00    59.94
>    1024x768      75.03    70.07    60.00
>    832x624       74.55
>    800x600       75.00    60.32
>    720x576       50.00
>    720x480       60.00    59.94
>    640x480       75.00    60.00    59.94
>    720x400       70.08
> DP-2 disconnected (normal left inverted right x axis y axis)
> HDMI-2 disconnected (normal left inverted right x axis y axis)
> 
> I'm willing to provide additional information and/or test fixes.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/8451
> [2] 
> https://gitlab.freedesktop.org/drm/intel/uploads/fda7aff0b13ef20962856c2c7be51544/dmesg.txt
> 
> #regzbot introduced: a4e771729a51
> 
> --
> Best regards,
> Mikhail Rudenko

Re: [Nouveau] Fwd: System (Xeon Nvidia) hangs at boot terminal after kernel 6.4.7

2023-08-09 Thread Thorsten Leemhuis

On 10.08.23 05:03, Bagas Sanjaya wrote:
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
>> Kernel 6.4.6 compiled from source worked AOK on my desktop with Intel Xeon 
>> cpu and Nvidia graphics - see below for system specs.
>>
>> Kernels 6.4.7 & 6.4.8 also compiled from source with identical configs hang 
>> with a frozen boot terminal screen after a significant way through the boot 
>> sequence (e.g. whilst running /etc/profile). The system may still be running 
>> as a sound is emitted when the power button is pressed (only way to escape 
>> from the system hang).
> [...]
>> Computer Profile:
>>  Machine    Dell Inc. Precision WorkStation T5400 (version: Not Specified)
>>  Mainboard  Dell Inc. 0RW203 (version: NA)
>>  • BIOS     Dell Inc. A11 | Date: 04/30/2012 | Type: Legacy
>>  • CPU      Intel(R) Xeon(R) CPU E5450 @ 3.00GHz (4 cores)
>>  • RAM      Total: 7955 MB | Used: 1555 MB (19.5%) | Actual Used: 775 MB (9.7%)
>>  Graphics   Resolution: 1366x768 pixels | Display Server: X.Org 21.1.8
>>  • device-0 NVIDIA Corporation GT218 [NVS 300] [10de:10d8] (rev a2)
>>  Audio      ALSA
>>  • device-0 Intel Corporation 631xESB/632xESB High Definition Audio Controller [8086:269a] (rev 09)
>>  • device-1 NVIDIA Corporation High Definition Audio Controller [10de:0be3] (rev a1)
>>  Network    wlan1
>>  • device-0 Ethernet: Broadcom Inc. and subsidiaries NetXtreme BCM5754 Gigabit Ethernet PCI Express [14e4:167a] (rev 02)
> 
> See Bugzilla for the full thread.
> [...]
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217776

Not my area of expertise, but nevertheless pretty sure this is the same
issue already discussed here, as it's a GT218 there as well and 6.4.7 is
the version that commit was backported to:

https://lore.kernel.org/all/20230806213107.GFZNARG6moWpFuSJ9W@fat_crate.local/

No final solution ready yet, but looks like the culprit will be reverted.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

2023-08-09 Thread Thorsten Leemhuis

On 09.08.23 15:13, Takashi Iwai wrote:
> 
> If this can't be fixed quickly, I suppose it's safer to revert it from
> 6.4.y for now.  6.5 is still being cooked, but 6.4.x is already in
> wide deployment, hence the regression has to be addressed quickly.

Good luck with that. To quote
https://docs.kernel.org/process/handling-regressions.html :

```
Regarding stable and longterm kernels:

[...]

* Whenever you want to swiftly resolve a regression that recently also
made it into a proper mainline, stable, or longterm release, fix it
quickly in mainline; when appropriate thus involve Linus to fast-track
the fix (see above). That's because the stable team normally does
neither revert nor fix any changes that cause the same problems in mainline.
```

Note the "normally" in there, so there is a chance.

Ciao, Thorsten

Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

2023-08-07 Thread Thorsten Leemhuis

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 06.08.23 23:31, Borislav Petkov wrote:
> 
> the patch in $Subject

Side note, in case anyone cares: it was also included in 6.4.7.

> breaks booting here on one of my test boxes, see
> below.
> 
> Reverting it on top of -rc4 fixes the issue.
> 
> Thx.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 2b5d1c29f6c4
#regzbot title drm/nouveau: stopped booting
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

> [3.580535] ACPI: \_PR_.CP04: Found 4 idle states
> [3.585694] ACPI: \_PR_.CP05: Found 4 idle states
> [3.590852] ACPI: \_PR_.CP06: Found 4 idle states
> [3.596037] ACPI: \_PR_.CP07: Found 4 idle states
> [3.644065] Freeing initrd memory: 6740K
> [3.742932] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [3.750409] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 
> 16550A
> [3.762111] serial :00:16.3: enabling device ( -> 0003)
> [3.771589] :00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 
> 115200) is a 16550A
> [3.782503] Linux agpgart interface v0.103
> [3.787805] ACPI: bus type drm_connector registered
> 
> <--- boot stops here.
> 
> It should continue with this:
> 
> [3.795491] Console: switching to colour dummy device 80x25
> [3.801933] nouveau :03:00.0: vgaarb: deactivate vga console
> [3.808303] nouveau :03:00.0: NVIDIA GT218 (0a8c00b1)
> [3.931002] nouveau :03:00.0: bios: version 70.18.83.00.08
> [3.941731] nouveau :03:00.0: fb: 512 MiB DDR3
> [4.110348] tsc: Refined TSC clocksource calibration: 3591.349 MHz
> [4.116627] clocksource: tsc: mask: 0x max_cycles: 
> 0x33c466a1ab5, max_idle_ns: 440795209767 ns
> [4.126871] clocksource: Switched to clocksource tsc
> [4.252013] nouveau :03:00.0: DRM: VRAM: 512 MiB
> [4.257088] nouveau :03:00.0: DRM: GART: 1048576 MiB
> [4.262501] nouveau :03:00.0: DRM: TMDS table version 2.0
> [4.268333] nouveau :03:00.0: DRM: DCB version 4.0
> [4.273561] nouveau :03:00.0: DRM: DCB outp 00: 02000360 
> [4.280104] nouveau :03:00.0: DRM: DCB outp 01: 02000362 00020010
> [4.286630] nouveau :03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> [4.293176] nouveau :03:00.0: DRM: DCB outp 03: 01011380 
> [4.299711] nouveau :03:00.0: DRM: DCB outp 04: 08011382 00020010
> [4.306243] nouveau :03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> [4.312772] nouveau :03:00.0: DRM: DCB conn 00: 00101064
> [4.318520] nouveau :03:00.0: DRM: DCB conn 01: 00202165
> [4.329488] nouveau :03:00.0: DRM: MM: using COPY for buffer copies
> [4.336261] stackdepot: allocating hash table of 1048576 entries via 
> kvcalloc
> ...
> 
>

Re: nouveau bug in linux/6.1.38-2

2023-08-04 Thread Thorsten Leemhuis

Hi!

On 02.08.23 23:28, Olaf Skibbe wrote:
> Dear Maintainers,
> 
> Hereby I would like to report an apparent bug in the nouveau driver in
> linux/6.1.38-2.

Thx for your report. Maybe your problem is caused by an incomplete
backport. I Cced the maintainers for the drivers (and the regressions
and the stable list), maybe one of them has an idea, as they know the
driver.

If they don't reply in the next few days, please check if the problem is
also present in mainline. If not, check if the latest 6.1.y release
already fixes this. If not, try to check which of the four patches you
reverted to get things going is actually causing this (e.g. first only
revert the one that was applied last; then the two last ones; ...).
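
The revert order suggested above (first only the patch applied last, then
the last two, and so on) could be sketched like this. This is only an
illustration: the function name and the "patch-N" labels are made-up
placeholders, not real commit ids, and the sketch merely lists what to
revert in each test round; building and booting each kernel is still
manual work.

```python
from typing import List


def progressive_revert_sets(commits_oldest_first: List[str]) -> List[List[str]]:
    """Return the commits to revert in each test round.

    Round 1 reverts only the commit applied last, round 2 the last two,
    and so on. Within each round the newest commit comes first, which is
    the order in which the reverts would be applied.
    """
    newest_first = list(reversed(commits_oldest_first))
    return [newest_first[:n] for n in range(1, len(newest_first) + 1)]


if __name__ == "__main__":
    # Hypothetical placeholders; "patch-4" is the one that was applied last.
    suspects = ["patch-1", "patch-2", "patch-3", "patch-4"]
    for round_no, to_revert in enumerate(progressive_revert_sets(suspects), 1):
        print(f"round {round_no}: revert {', '.join(to_revert)}")
```

The first round in which the problem disappears points at the commit added
to the revert set in that round.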

> Running a current debian stable on a Dell Latitude E6510 with a
> "NVIDIA Corporation GT218M" graphic card, the monitor turns black
> after the grub screen. Also switching to a console (Ctrl-Alt-F2) shows
> just a black screen. Access via ssh is possible.
> 
> ~# uname -r
> 6.1.0-10-amd64
> 
> dmesg shows the following error message:
> 
> [    3.560153] WARNING: CPU: 0 PID: 176 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> video wmi button
> [    3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> 6.1.0-10-amd64 #1  Debian 6.1.38-2
> [    3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> 05/12/2017
> [    3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [    3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [    3.560541] RSP: 0018:9899c048bd60 EFLAGS: 00010246
> [    3.560542] RAX: 00041eb0 RBX: 88e0209d2600 RCX:
> 00041eb0
> [    3.560544] RDX: c079f760 RSI:  RDI:
> 9899c048bcf0
> [    3.560545] RBP: 0001 R08: 9899c048bc64 R09:
> 5b76
> [    3.560546] R10: 000d R11: 9899c048bde0 R12:
> ffea
> [    3.560548] R13: 88e00b39e480 R14: 00044d45 R15:
> 
> [    3.560549] FS:  () GS:88e123c0()
> knlGS:
> [    3.560551] CS:  0010 DS:  ES:  CR0: 80050033
> [    3.560552] CR2: 7f57f4e90451 CR3: 00018141 CR4:
> 06f0
> [    3.560554] Call Trace:
> [    3.560558]  
> [    3.560560]  ? __warn+0x7d/0xc0
> [    3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560671]  ? report_bug+0xe6/0x170
> [    3.560675]  ? handle_bug+0x41/0x70
> [    3.560679]  ? exc_invalid_op+0x13/0x60
> [    3.560681]  ? asm_exc_invalid_op+0x16/0x20
> [    3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
> [    3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [    3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
> [    3.561103]  process_one_work+0x1c7/0x380
> [    3.561109]  worker_thread+0x4d/0x380
> [    3.561113]  ? rescuer_thread+0x3a0/0x3a0
> [    3.561116]  kthread+0xe9/0x110
> [    3.561120]  ? kthread_complete_and_exit+0x20/0x20
> [    3.561122]  ret_from_fork+0x22/0x30
> [    3.561130]  
> 
> Further information:
> 
> $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> (rev a2) (prog-if 00 [VGA controller])
> Subsystem: Dell Latitude E6510
> Flags: bus master, fast devsel, latency 0, IRQ 27
> Memory at e200 (32-bit, non-prefetchable) [size=16M]
> Memory at d000 (64-bit, prefetchable) [size=256M]
> Memory at e000 (64-bit, prefetchable) [size=32M]
> I/O ports at 7000 [size=128]
> Expansion ROM at 000c [disabled] [size=128K]
> Capabilities: 
> Kernel driver in use: nouveau
> Kernel modules: nouveau
> 
> I reported this bug to debian already, see
> https://bugs.debian.org/1042753 for context.
> 
> With support (thanks Diederik!) I managed to figure out that the cause
> was a regression between upstream kernel version 6.1.27 and 6.1.38.
> 
> I built a new 6.1.38 kernel with these commits reverted:
> 
> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA

Re: [PATCH 2/2] drm/bridge: lt9611: Do not generate HFP/HBP/HSA and EOT packet

2023-07-26 Thread Linux regression tracking (Thorsten Leemhuis)

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

What's the status wrt this regression (caused by 8ddce13ae69 from
Marek)? It looks like things are stalled and the regression still is
unresolved, but I ask because I might be missing something.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 14.07.23 08:11, Amit Pundir wrote:
> On Thu, 13 Jul 2023 at 23:58, Marek Vasut wrote:
>>
>> On 7/13/23 20:09, Abhinav Kumar wrote:
>>>
>>>
>>> On 7/12/2023 10:41 AM, Marek Vasut wrote:
>>>> On 7/9/23 03:03, Abhinav Kumar wrote:
>>>>>
>>>>>
>>>>> On 7/7/2023 1:47 AM, Neil Armstrong wrote:
>>>>>> On 07/07/2023 09:18, Neil Armstrong wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 06/07/2023 11:20, Amit Pundir wrote:
>>>>>>>> On Wed, 5 Jul 2023 at 11:09, Dmitry Baryshkov wrote:
>>>>>>>>>
>>>>>>>>> [Adding freedreno@ to cc list]
>>>>>>>>>
>>>>>>>>> On Wed, 5 Jul 2023 at 08:31, Jagan Teki wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Amit,
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 5, 2023 at 10:15 AM Amit Pundir wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Marek,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 5 Jul 2023 at 01:48, Marek Vasut wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Do not generate the HS front and back porch gaps, the HSA gap and
>>>>>>>>>>>> EOT packet, as these packets are not required. This makes the
>>>>>>>>>>>> bridge
>>>>>>>>>>>> work with Samsung DSIM on i.MX8MM and i.MX8MP.
>>>>>>>>>>>
>>>>>>>>>>> This patch broke display on Dragonboard 845c (SDM845) devboard
>>>>>>>>>>> running
>>>>>>>>>>> AOSP. This is what I see
>>>>>>>>>>> https://people.linaro.org/~amit.pundir/db845c-userdebug/v6.5-broken-display/PXL_20230704_150156326.jpg.
>>>>>>>>>>> Reverting this patch fixes this regression for me.
>>>>>>>>>>
>>>>>>>>>> Might be msm dsi host require proper handling on these updated
>>>>>>>>>> mode_flags? did they?
>>>>>>>>>
>>>>>>>>> The msm DSI host supports those flags. Also, I'd like to point out
>>>>>>>>> that the patch didn't change the rest of the driver code. So even if
>>>>>>>>> drm/msm ignored some of the flags, it should not have caused the
>>>>>>>>> issue. Most likely the issue is on the lt9611 side. I'd suspect that
>>>>>>>>> additional programming is required to make it work with these flags.
>>>>>>>>
>>>>>>>> I spent some time today on smoke testing these flags (individually
>>>>>>>> and
>>>>>>>> in limited combination) on DB845c, to narrow down this breakage to
>>>>>>>> one
>>>>>>>> or more flag(s) triggering it. Here are my observations in limited
>>>>>>>> testing done so far.
>>>>>>>>
>>>>>>>> There is no regression with MIPI_DSI_MODE_NO_EOT_PACKET when enabled
>>>>>>>> alone and system boots to UI as usual.
>>>>>>>>
>>>>>>>> MIPI_DSI_MODE_VIDEO_NO_HFP always trigger the broken display as in
>>>>>>>> the
>>>>>>>> screenshot[1] shared earlier as well.
>>>>>>>>
>>>>>>>> Adding either of MIPI_DSI_MODE_VIDEO_NO_HSA and
>>>>>>>> MIPI_DSI_MODE_VIDEO_NO_HBP always result in no display, unless paired
>>>>>>>> with MIPI_DSI_MODE_VIDEO_NO_HFP and in that case we get the broken
>>>>>>>> display as reported.
>>>>>>>>
>>>>>>>> In short other than MIPI_DSI_MODE_NO_EOT_PACKET flag, all other flags
>>>>>>>> added in this commit break the display on DB845c one way or another.
>>>>>>>
>>>>>>> I think the investigation would be to understand why samsung-dsim
>>>>>>> requires
>>>>>>> such flags and/or what are the difference in behavior between MSM
>>>>>>> DSI and samsung DSIM
>>>>>>> for those flags ?
>>>>>>>
>>>>>>> If someone has access to the lt9611 datasheet, so it requires
>>>>>>> HSA/HFP/HBP to be
>>>>>>> skipped ? and does MSM DSI and samsung DSIM skip them in the same
>>>>>>> way ?
>>>>>>
>>>>>> I think there's a mismatch, where on one side this flag sets the
>>>>>> link in LP-11 while
>>>>>> in HSA/HFP/HBP while on the other it completely removes those
>>>>>> blanking packets.
>>>>>>
>>>>>> The name MIPI_DSI_MODE_VIDEO_NO_HBP suggests removal of HBP, not
>>>>>> LP-11 while HBP.
>>>>>> the registers used in both controllers are different:
>>>>>> - samsung-dsim: DSIM_HBP_DISABLE_MODE
>>>>>> - msm dsi: DSI_VID_CFG0_HBP_POWER_STOP
>>>>>>
>>>>>> The first one suggest removing the packet, while the second one
>>>>>> suggests powering
>>>>>> off the line while in the blanking packet period.
>>>>>>
>>>>>> @Abhinav, can you comment on that ?
>>>>>>
>>>>>
>>>>> I don't get what it means by completely removes blanking packets in DSIM.
>>>>
>>>> MIPI_DSI_MODE_VIDEO_NO_HFP means the HBP period is just skipped by DSIM.
>>>>
>>>> Maybe there is a need for new set of flags which differentiate between
>>>> HBP skipped (i.e. NO HBP) and HBP LP11 ?
>>>>
>>>
>>> No, the section of the MIPI DSI spec I posted below clearly states there
>>> are

Re: Fwd: Unexplainable packet drop starting at v6.4

2023-07-19 Thread Thorsten Leemhuis

On 19.07.23 14:30, Bagas Sanjaya wrote:
> On 7/19/23 18:49, Thorsten Leemhuis wrote:
>> On 18.07.23 02:51, Bagas Sanjaya wrote:
>>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>>
>>>> After I updated to 6.4 through Archlinux kernel update, suddenly I noticed 
>>>> random packet losses on my routers like nodes. I have these networking 
>>>> relevant config on my nodes
>>>>
>>>> 1. Using archlinux
>>>> 2. Network config through systemd-networkd
>>>> 3. Using bird2 for BGP routing, but not relevant to this bug.
>>>> 4. Using nftables for traffic control, but seems not relevant to this bug. 
>>>> 5. Not using fail2ban like dynamic filtering tools, at least at L3/L4 level
>>>>
>>>> After I ruled out systemd-networkd, nftables related issues. I tracked 
>>>> down issues to kernel.
>>> [...]
>>> See Bugzilla for the full thread.
>>>
>>> Thorsten: The reporter had a bad bisect (some bad commits were marked as good
>>> instead), hence SoB chain for culprit (unrelated) ipvu commit is in To:
>>> list. I also asked the reporter (also in To:) to provide dmesg and request
>>> rerunning bisection, but he doesn't currently have a reliable reproducer.
>>> Is it the best I can do?
>>
>> When a bisection apparently went sideways it's best to not bother the
>> culprit's developers with it, they most likely will just be annoyed by
>> it (and then they might become annoyed by regression tracking, which we
>> need to avoid).
>
> I mean don't Cc: the culprit author in that case?

Yes. If a bisection lands on a commit that seems like a pretty unlikely
culprit for the problem at hand (which even the reporter admitted in the
report), then ask the reporter to verify the result (e.g. ideally by
trying to revert it on top of latest mainline; checking the parent commit
again sometimes can do the trick as well) before involving the people
that authored and handled said change. Otherwise you just raise a false
alarm and then people will be annoyed by our work or, if we are unlucky,
start to ignore us -- and we need to prevent that.

Ciao, Thorsten

Re: Fwd: Unexplainable packet drop starting at v6.4

2023-07-19 Thread Thorsten Leemhuis

On 18.07.23 02:51, Bagas Sanjaya wrote:
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
>> After I updated to 6.4 through Archlinux kernel update, suddenly I noticed 
>> random packet losses on my routers like nodes. I have these networking 
>> relevant config on my nodes
>>
>> 1. Using archlinux
>> 2. Network config through systemd-networkd
>> 3. Using bird2 for BGP routing, but not relevant to this bug.
>> 4. Using nftables for traffic control, but seems not relevant to this bug. 
>> 5. Not using fail2ban like dynamic filtering tools, at least at L3/L4 level
>>
>> After I ruled out systemd-networkd, nftables related issues. I tracked down 
>> issues to kernel.
> [...]
> See Bugzilla for the full thread.
> 
> Thorsten: The reporter had a bad bisect (some bad commits were marked as good
> instead), hence SoB chain for culprit (unrelated) ipvu commit is in To:
> list. I also asked the reporter (also in To:) to provide dmesg and request
> rerunning bisection, but he doesn't currently have a reliable reproducer.
> Is it the best I can do?

When a bisection apparently went sideways it's best to not bother the
culprit's developers with it, they most likely will just be annoyed by
it (and then they might become annoyed by regression tracking, which we
need to avoid).

I'd have forwarded this to the network folks, but in a style along the
lines of "FYI, in case somebody has an idea or has heard about something
similar and thus can help; if not, no worries, reporter is repeating the
bisection".

> Anyway, I'm adding this regression to be tracked in regzbot:
> 
> #regzbot introduced: a3efabee5878b8 
> https://bugzilla.kernel.org/show_bug.cgi?id=217678
> #regzbot title: packet drop on Intel X710-T4L due to ipvu boot fix
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217678

Side note for the record: Stephen also forwarded this. And let me also
clear the commit you specified, as it sounds like it's unlikely to be
causing this.

#regzbot introduced: v6.3..v6.4
#regzbot monitor:
https://lore.kernel.org/all/20230717115352.79aecc71@hermes.local/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: [PATCH v2] drm/ast: report connection status on Display Port.

2023-07-10 Thread Linux regression tracking (Thorsten Leemhuis)

On 10.07.23 10:12, Jocelyn Falempe wrote:
> On 06/07/2023 15:03, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 06.07.23 11:58, Jocelyn Falempe wrote:
>>> Aspeed always report the display port as "connected", because it
>>> doesn't set a .detect callback.
>>> Fix this by providing the proper detect callback for astdp and dp501.
>>>
>>> This also fixes the following regression:
>>> Since commit fae7d186403e ("drm/probe-helper: Default to 640x480 if no
>>>   EDID on DP")
>>> The default resolution is now 640x480 when no monitor is connected.
>>> But Aspeed graphics is mostly used in servers, where no monitor
>>> is attached. This also affects the remote BMC resolution to 640x480,
>>> which is inconvenient, and breaks the anaconda installer.
>>>
>>> v2: Add .detect callback to the dp/dp501 connector (Jani Nikula)
>>>
>>> Signed-off-by: Jocelyn Falempe 
>>
>> So if this "also fixes a regression" how about a Fixes: tag and a CC:
>> > also in all affected stable and longterm kernels?
> 
> In this case, the regression only affects one userspace program
> (anaconda),

That is (mostly) irrelevant when it comes to regressions.

> and the fix looks too risky to backport to all stable kernels.

Not sure, but I tend to think that decision would better be left to the
stable team. Each developer will have a different opinion about what's
too risky or not and they might be in the better position to judge what
they want for their trees. A "Fixes:" tag thus still seems appropriate
here; it will also tell downstream distros that might want to pick this up.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: [PATCH 2/2] drm/bridge: lt9611: Do not generate HFP/HBP/HSA and EOT packet

2023-07-08 Thread Linux regression tracking (Thorsten Leemhuis)

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 05.07.23 06:45, Amit Pundir wrote:
> 
> On Wed, 5 Jul 2023 at 01:48, Marek Vasut wrote:
>>
>> Do not generate the HS front and back porch gaps, the HSA gap and
>> EOT packet, as these packets are not required. This makes the bridge
>> work with Samsung DSIM on i.MX8MM and i.MX8MP.
> 
> This patch broke display on Dragonboard 845c (SDM845) devboard running
> AOSP. This is what I see
> https://people.linaro.org/~amit.pundir/db845c-userdebug/v6.5-broken-display/PXL_20230704_150156326.jpg.
> Reverting this patch fixes this regression for me.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 8ddce13ae69
#regzbot title drm/bridge: lt9611: Dragonboard 845c (SDM845) devboard
broken when running AOSP
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: [PATCH v2] drm/ast: report connection status on Display Port.

2023-07-06 Thread Linux regression tracking (Thorsten Leemhuis)

On 06.07.23 11:58, Jocelyn Falempe wrote:
> Aspeed always report the display port as "connected", because it
> doesn't set a .detect callback.
> Fix this by providing the proper detect callback for astdp and dp501.
> 
> This also fixes the following regression:
> Since commit fae7d186403e ("drm/probe-helper: Default to 640x480 if no
>  EDID on DP")
> The default resolution is now 640x480 when no monitor is connected.
> But Aspeed graphics is mostly used in servers, where no monitor
> is attached. This also affects the remote BMC resolution to 640x480,
> which is inconvenient, and breaks the anaconda installer.
> 
> v2: Add .detect callback to the dp/dp501 connector (Jani Nikula)
> 
> Signed-off-by: Jocelyn Falempe 

So if this "also fixes a regression" how about a Fixes: tag and a CC:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: [PATCH 1/2] fbdev/offb: Update expected device name

2023-06-15 Thread Linux regression tracking (Thorsten Leemhuis)

On 16.04.23 14:34, Salvatore Bonaccorso wrote:
> 
> On Wed, Apr 12, 2023 at 11:55:08AM +0200, Cyril Brulebois wrote:
>> Since commit 241d2fb56a18 ("of: Make OF framebuffer device names unique"),
>> as spotted by Frédéric Bonnard, the historical "of-display" device is
>> gone: the updated logic creates "of-display.0" instead, then as many
>> "of-display.N" as required.
>>
>> This means that offb no longer finds the expected device, which prevents
>> the Debian Installer from setting up its interface, at least on ppc64el.
>>
>> It might be better to iterate on all possible nodes, but updating the
>> hardcoded device from "of-display" to "of-display.0" is confirmed to fix
>> the Debian Installer at the very least.
> [...]
> #regzbot ^introduced 241d2fb56a18
> #regzbot title: Open Firmware framebuffer cannot find of-display
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=217328
> #regzbot link: 
> https://lore.kernel.org/all/20230412095509.2196162-1-cy...@debamax.com/T/#m34493480243a2cad2ae359abfd9db5e755f41add
> #regzbot link: https://bugs.debian.org/1033058

No reply to my status inquiry[1] a few weeks ago, so I have to assume
nobody cares anymore. If somebody still cares, holler!

#regzbot inconclusive: no answer to a status inquiry
#regzbot ignore-activity

[1]
https://lore.kernel.org/lkml/d1aee7d3-05f6-0920-b8e1-4ed5cf3f9...@leemhuis.info/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: [PATCH] Revert "drm/msm/dp: set self refresh aware based on PSR support"

2023-06-06 Thread Linux regression tracking #adding (Thorsten Leemhuis)

On 05.06.23 12:18, Johan Hovold wrote:
> On Mon, Jun 05, 2023 at 01:05:36PM +0300, Dmitry Baryshkov wrote:
>> On Mon, 5 Jun 2023 at 13:02, Johan Hovold  wrote:
> 
>>> Virtual terminals are still broken with 6.4-rc5 on the Lenovo ThinkPad
>>> X13s two weeks after I reported this, and there has been no indication
>>> of any progress in the other related thread:
>>>
>>> https://lore.kernel.org/lkml/zhyphnwodbxb-...@hovoldconsulting.com
>>>
>>> Seems like it is time to merge this revert to get this sorted.

BTW, thx for bringing this to my attention!

>>> Rob, Abhinav, Dmitry, can either of you merge this one and get it into
>>> 6.4-rc6?
>>
>> Rob sent the pull request few hours ago, see
>> https://lore.kernel.org/dri-devel/caf6aeguhujkfjra6ys36uyh0kur4hd16u1emqjo8toz3ifv...@mail.gmail.com/
> 
> Ok, so you guys went with the module parameter hack. Whatever. As long
> as the regression is finally fixed.

Yup. Let me tell regzbot about the fix:

#regzbot fix: drm/msm/dp: add module parameter for PSR
#regzbot ignore-activity

> Next time, some visibility into your process would be appreciated to
> avoid unnecessary work.

Yeah, that's something we IMHO sooner or later need to improve for all
of kernel development -- among others to give people that find existing
bug reports a chance to find patches that were posted or applied to
address the issue (and of course reporters also, like in this case).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: [PATCH] drm/probe_helper: fix the warning reported when calling drm_kms_helper_poll_disable during suspend

2023-06-05 Thread Thorsten Leemhuis

On 17.05.23 17:15, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
> 
> Dmitry, was any progress made to address this regression? Doesn't look
> like it, but I strongly suspect I'm missing something,

FWIW, I'm dropping this from the regression tracking. It might be still
around, but it's a warning and nothing that really breaks things afaics.

Please holler if I got this wrong and you think it's something that needs
to be fixed to ensure the "no regressions" rule is adhered to.

#regzbot inconclusive: a warn() without serious impact, see discussion
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> as I'm not really
> sure if I properly understood this thread. It sounded a bit like
> a4e771729a51 should be reverted for now until all
> drm_kms_helper_poll_disable() calls have been verified. Is that right?
> Or did somebody already verify and fix all of them with bugs?
>
> On 28.04.23 03:17, zongmin zhou wrote:
>> On Wed, 2023-04-26 at 16:10 +0300, Dmitry Baryshkov wrote:
>>> On Wed, 26 Apr 2023 at 12:09, zongmin zhou 
>>> wrote:
>>>> On Sun, 2023-04-23 at 22:51 +0200, Janne Grunau wrote:
>>>>> On 2023-04-20 23:07:01 +0300, Dmitry Baryshkov wrote:
>>>>>> On Thu, 20 Apr 2023 at 23:01, Janne Grunau 
>>>>>> wrote:
>>>>>>>
>>>>>>> On 2023-03-28 10:31:29 +0800, Zongmin Zhou wrote:
>>>>>>>> When drivers call drm_kms_helper_poll_disable from
>>>>>>>> their device suspend implementation without enabled output
>>>>>>>> polling before,
>>>>>>>> following warning will be reported,due to work->func not be
>>>>>>>> initialized:
>>>>>>>
>>>>>>> we see the same warning with the wpork in progress kms driver
>>>>>>> for
>>>>>>> apple
>>>>>>> silicon SoCs. The connectors do not need to polled so the
>>>>>>> driver
>>>>>>> never
>>>>>>> calls drm_kms_helper_poll_init().
>>>>>>>
>>>>>>>> [   55.141361] WARNING: CPU: 3 PID: 372 at
>>>>>>>> kernel/workqueue.c:3066 __flush_work+0x22f/0x240
>>>>>>>> [   55.141382] Modules linked in: nls_iso8859_1
>>>>>>>> snd_hda_codec_generic ledtrig_audio snd_hda_intel
>>>>>>>> snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
>>>>>>>> snd_hda_core
>>>>>>>> snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
>>>>>>>> snd_rawmidi
>>>>>>>> snd_seq intel_rapl_msr intel_rapl_common bochs
>>>>>>>> drm_vram_helper
>>>>>>>> drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul
>>>>>>>> snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3
>>>>>>>> aesni_intel drm_kms_helper joydev input_leds syscopyarea
>>>>>>>> crypto_simd snd cryptd sysfillrect sysimgblt mac_hid
>>>>>>>> serio_raw
>>>>>>>> soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp
>>>>>>>> parport drm ramoops reed_solomon pstore_blk pstore_zone
>>>>>>>> efi_pstore virtio_rng ip_tables x_tables autofs4
>>>>>>>> hid_generic
>>>>>>>> usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse
>>>>>>>> virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover
>>>>>>>> virtio_blk xhci_pci_renesas failover
>>>>>>>> [   55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not
>>>>>>>> tainted
>>>>>>>> 6.2.0-rc6+ #16
>>>>>>>> [   55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9,
>>>>>>>> 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org
>>>>>>>> 04/01/2014
>>>>>>>> [   55.141435] Workqueue: events_unbound async_run_entry_fn
>>>>>>>> [   55.141441] RIP: 0010:__flush_work+0x22f/0x240
>>>>>>>> [   55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9

Re: [PATCH v3 11/13] drm/fb-helper: Fix single-probe color-format selection

2023-05-26 Thread Linux regression tracking #update (Thorsten Leemhuis)

[TLDR: This mail is primarily relevant for Linux regression tracking. A
change or fix related to the regression discussed in this thread was
posted or applied, but it did not use a Link: tag to point to the
report, as Linus and the documentation call for. Things happen, no
worries -- but now the regression tracking bot needs to be told manually
about the fix. See link in footer if these mails annoy you.]

On 14.05.23 14:10, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
> On 12.05.23 15:20, Linus Walleij wrote:
>> Sorry for late regression detection but this patch regresses
>> the Integrator AB IMPD-1 graphics, I bisected down to this
>> patch.
> 
> #regzbot ^introduced 37c90d589dc
> #regzbot title drm/fb-helper: downscaling apparently stopped to work
> with pl110_impd1
> #regzbot ignore-activity

#regzbot monitor:
https://lore.kernel.org/all/20230515092943.1401558-1-linus.wall...@linaro.org/
#regzbot fix: drm/pl111: Fix FB depth on IMPD-1 framebuffer
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
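
The "#regzbot" lines in mails like the one above are plain, unquoted
lines of the form "#regzbot <command>[:] <argument>". A minimal Python
sketch of picking them out of a mail body follows. The function name
extract_regzbot_commands and its regular expression are hypothetical
and are not regzbot's actual parser; the sketch also assumes that a
command ending in ":" with nothing after it continues on the next line,
as with "#regzbot monitor:" above.

```python
import re

# Hypothetical helper for illustration; NOT regzbot's real parsing logic.
# Group 1: command (optionally prefixed with '^'), group 2: optional colon,
# group 3: the rest of the line.
COMMAND_RE = re.compile(r"^#regzbot\s+(\^?[\w-]+)(:?)\s*(.*)$")


def extract_regzbot_commands(body):
    """Return (command, argument) pairs from unquoted '#regzbot' lines."""
    lines = body.splitlines()
    commands = []
    for index, line in enumerate(lines):
        match = COMMAND_RE.match(line)
        if match is None:
            continue  # quoted ('> #regzbot ...') and ordinary lines are skipped
        command, colon, argument = match.group(1), match.group(2), match.group(3).strip()
        # Assumption: 'command:' with an empty argument continues on the next line.
        if colon and not argument and index + 1 < len(lines):
            following = lines[index + 1].strip()
            if following and not following.startswith("#regzbot"):
                argument = following
        commands.append((command, argument))
    return commands


mail = """\
> #regzbot ^introduced 37c90d589dc
> #regzbot ignore-activity

#regzbot monitor:
https://lore.kernel.org/all/20230515092943.1401558-1-linus.wall...@linaro.org/
#regzbot fix: drm/pl111: Fix FB depth on IMPD-1 framebuffer
#regzbot ignore-activity
"""

for command, argument in extract_regzbot_commands(mail):
    print(command, "->", argument or "(no argument)")
```

The quoted commands are deliberately ignored, since they belong to the
earlier mail. Titles wrapped over two lines are not handled by this
sketch.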

Re: [Nouveau] Fwd: absent both plymouth, and video= on linu lines, vtty[1-6] framebuffers produce vast raster right and bottom borders on the larger resolution of two displays

2023-05-25 Thread Thorsten Leemhuis

On 25.05.23 12:55, Bagas Sanjaya wrote:
> On 5/25/23 17:52, Bagas Sanjaya wrote:
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>> [...]
>> Anyway, I'm adding it to regzbot:
>>
>> #regzbot introduced: v6.1.12..v6.2.12
>> #regzbot title: vast raster right and bottom borders on larger display (two 
>> displays with inequal resolution) unless forcing resolution with video= 
>> parameter

Bagas, thx again for your efforts, much appreciated. But I guess for drm
drivers that have a line like

B: https://gitlab.freedesktop.org/drm/[...]

in MAINTAINERS (which includes all the popular drm drivers) this just
creates a lot of confusion for everyone, as one issue will likely end up
being discussed in two or three places in parallel (bugzilla,
freedesktop, email). Better tell reporters to move their issue to the
freedesktop drm tracker and close the ticket in bugzilla. And don't get
regzbot involved, as for now it sadly is unable to monitor the
freedesktop drm tracker (sooner or later I'll fix that, but for now it's
a blind spot :-/).

Pretty sure none of the DRM developers will disagree, but if I'm wrong,
please holler.

> Oops, I forget to add bugzilla link:
> 
> #regzbot introduced: v6.1.12..v6.2.12 
> https://bugzilla.kernel.org/show_bug.cgi?id=217479
> #regzbot from: Felix Miata 

Side note: that currently does not work with regzbot. :-/ Whatever, I'll
remove it from the tracking due to above reasons:

#regzbot inconclusive: sadly not tracked for now

Ciao, Thorsten

Re: [PATCH 2/2] drm/ofdrm: Update expected device name

2023-05-22 Thread Linux regression tracking (Thorsten Leemhuis)

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Was a proper solution for the regression the initial mail in this thread
is about ever found? Doesn't look like it from here, but maybe I'm
missing something.

Reminder, the problem afaik is caused by 241d2fb56a ("of: Make OF
framebuffer device names unique") [merged for v6.2-rc8, authored by
Michal Suchanek; committed by Rob Herring].

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 24.04.23 11:35, Helge Deller wrote:
> On 4/24/23 11:07, Thomas Zimmermann wrote:
>> Am 24.04.23 um 09:33 schrieb Geert Uytterhoeven:
>>> On Wed, Apr 12, 2023 at 12:05 PM Cyril Brulebois 
>>> wrote:
>>>> Since commit 241d2fb56a18 ("of: Make OF framebuffer device names
>>>> unique"),
>>>> as spotted by Frédéric Bonnard, the historical "of-display" device is
>>>> gone: the updated logic creates "of-display.0" instead, then as many
>>>> "of-display.N" as required.
>>>>
>>>> This means that offb no longer finds the expected device, which
>>>> prevents
>>>> the Debian Installer from setting up its interface, at least on
>>>> ppc64el.
>>>>
>>>> Given the code similarity it is likely to affect ofdrm in the same way.
>>>>
>>>> It might be better to iterate on all possible nodes, but updating the
>>>> hardcoded device from "of-display" to "of-display.0" is likely to help
>>>> as a first step.
>>>>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217328
>>>> Link: https://bugs.debian.org/1033058
>>>> Fixes: 241d2fb56a18 ("of: Make OF framebuffer device names unique")
>>>> Cc: sta...@vger.kernel.org # v6.2+
>>>> Signed-off-by: Cyril Brulebois
>>>
>>> Thanks for your patch, which is now commit 3a9d8ea2539ebebd
>>> ("drm/ofdrm: Update expected device name") in fbdev/for-next.
>>>
>>>> --- a/drivers/gpu/drm/tiny/ofdrm.c
>>>> +++ b/drivers/gpu/drm/tiny/ofdrm.c
>>>> @@ -1390,7 +1390,7 @@ MODULE_DEVICE_TABLE(of, ofdrm_of_match_display);
>>>>
>>>>   static struct platform_driver ofdrm_platform_driver = {
>>>>  .driver = {
>>>> -   .name = "of-display",
>>>> +   .name = "of-display.0",
>>>>  .of_match_table = ofdrm_of_match_display,
>>>>  },
>>>>  .probe = ofdrm_probe,
>>>
>>> Same comment as for "[PATCH 1/2] fbdev/offb: Update expected device
>>> name".
>>>
>>> https://lore.kernel.org/r/camuhmdvgeeasmb4tauuqqgj-4+bbetwewyja+m9nyjv0bj_...@mail.gmail.com
>>
>> Sorry that I missed this patch. I agree that it's probably not
>> correct. At least in ofdrm, we want to be able to use multiple
>> framebuffers at the same time; a feature that has been broken by this
>> change.
> 
> Geert & Thomas, thanks for the review!
> 
> I've dropped both patches from fbdev tree for now.
> Would be great to find another good solution though, as it breaks the
> debian
> installer.
> 
> Helge
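
The naming problem discussed in this thread can be pictured with a toy
model: a driver that looks for one hardcoded device name misses
"of-display.0", and hardcoding "of-display.0" instead only ever picks
up the first framebuffer. The Python sketch below is only an
illustration of that reasoning. The helper names and the device list
are made up, and it does not reproduce how the kernel actually binds
drivers to devices.

```python
import re

# Toy model only: the real binding logic lives in the kernel and is not
# reproduced here. The device list is a made-up example.
DEVICES = ["of-display.0", "of-display.1"]

# Accepts the historical name as well as the numbered variants.
OF_DISPLAY_RE = re.compile(r"of-display(\.\d+)?")


def bound_by_exact_name(driver_name, devices):
    """Devices a driver would pick up if it only compared one fixed name."""
    return [device for device in devices if device == driver_name]


def bound_by_pattern(devices):
    """Devices picked up when every 'of-display' / 'of-display.N' is accepted."""
    return [device for device in devices if OF_DISPLAY_RE.fullmatch(device)]


print(bound_by_exact_name("of-display", DEVICES))    # []
print(bound_by_exact_name("of-display.0", DEVICES))  # ['of-display.0']
print(bound_by_pattern(DEVICES))                     # ['of-display.0', 'of-display.1']
```

The second call shows why the proposed patch helps the installer while
still leaving additional framebuffers unbound, which is the objection
raised in the review.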

Re: [PATCH] drm/probe_helper: fix the warning reported when calling drm_kms_helper_poll_disable during suspend

2023-05-17 Thread Linux regression tracking (Thorsten Leemhuis)

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Dmitry, was any progress made to address this regression? Doesn't look
like it, but I strongly suspect I'm missing something, as I'm not really
sure if I properly understood this thread. It sounded a bit like
a4e771729a51 should be reverted for now until all
drm_kms_helper_poll_disable() calls have been verified. Is that right?
Or did somebody already verify and fix all of them with bugs?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 28.04.23 03:17, zongmin zhou wrote:
> On Wed, 2023-04-26 at 16:10 +0300, Dmitry Baryshkov wrote:
>> On Wed, 26 Apr 2023 at 12:09, zongmin zhou 
>> wrote:
>>> On Sun, 2023-04-23 at 22:51 +0200, Janne Grunau wrote:
>>>> On 2023-04-20 23:07:01 +0300, Dmitry Baryshkov wrote:
>>>>> On Thu, 20 Apr 2023 at 23:01, Janne Grunau
>>>>> wrote:
>>>>>>
>>>>>> On 2023-03-28 10:31:29 +0800, Zongmin Zhou wrote:
>>>>>>> When drivers call drm_kms_helper_poll_disable from
>>>>>>> their device suspend implementation without enabled output
>>>>>>> polling before,
>>>>>>> following warning will be reported,due to work->func not be
>>>>>>> initialized:
>>>>>>
>>>>>> we see the same warning with the wpork in progress kms driver
>>>>>> for
>>>>>> apple
>>>>>> silicon SoCs. The connectors do not need to polled so the
>>>>>> driver
>>>>>> never
>>>>>> calls drm_kms_helper_poll_init().
>>>>>>
>>>>>>> [   55.141361] WARNING: CPU: 3 PID: 372 at
>>>>>>> kernel/workqueue.c:3066 __flush_work+0x22f/0x240
>>>>>>> [   55.141382] Modules linked in: nls_iso8859_1
>>>>>>> snd_hda_codec_generic ledtrig_audio snd_hda_intel
>>>>>>> snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
>>>>>>> snd_hda_core
>>>>>>> snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
>>>>>>> snd_rawmidi
>>>>>>> snd_seq intel_rapl_msr intel_rapl_common bochs
>>>>>>> drm_vram_helper
>>>>>>> drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul
>>>>>>> snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3
>>>>>>> aesni_intel drm_kms_helper joydev input_leds syscopyarea
>>>>>>> crypto_simd snd cryptd sysfillrect sysimgblt mac_hid
>>>>>>> serio_raw
>>>>>>> soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp
>>>>>>> parport drm ramoops reed_solomon pstore_blk pstore_zone
>>>>>>> efi_pstore virtio_rng ip_tables x_tables autofs4
>>>>>>> hid_generic
>>>>>>> usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse
>>>>>>> virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover
>>>>>>> virtio_blk xhci_pci_renesas failover
>>>>>>> [   55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not
>>>>>>> tainted
>>>>>>> 6.2.0-rc6+ #16
>>>>>>> [   55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9,
>>>>>>> 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org
>>>>>>> 04/01/2014
>>>>>>> [   55.141435] Workqueue: events_unbound async_run_entry_fn
>>>>>>> [   55.141441] RIP: 0010:__flush_work+0x22f/0x240
>>>>>>> [   55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff
>>>>>>> ff
>>>>>>> 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff
>>>>>>> ff
>>>>>>> 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2
>>>>>>> 54
>>>>>>> d8 00 66 90 90 90 90 90 90
>>>>>>> [   55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246
>>>>>>> [   55.141449] RAX:  RBX: 
>>>>>>> RCX:
>>>>>>> 9b72bcbe
>>>>>>> [   55.141450] RDX: 0001 RSI: 0001
>>>>>>> RDI:
>>>>>>> ff3ea01e4265e330
>>>>>>> [   55.141451] RBP: ff59221940833c90 R08: 
>>>>>>> R09:
>>>>>>> 8080808080808080
>>>>>>> [   55.141453] R10: ff3ea01e42b3caf4 R11: 000f
>>>>>>> R12:
>>>>>>> ff3ea01e4265e330
>>>>>>> [   55.141454] R13: 0001 R14: ff3ea01e505e5e80
>>>>>>> R15:
>>>>>>> 0001
>>>>>>> [   55.141455] FS:  ()
>>>>>>> GS:ff3ea01fb7cc() knlGS:
>>>>>>> [   55.141456] CS:  0010 DS:  ES:  CR0:
>>>>>>> 80050033
>>>>>>> [   55.141458] CR2: 563543ad1546 CR3: 00010ee82005
>>>>>>> CR4:
>>>>>>> 00771ee0
>>>>>>> [   55.141464] DR0:  DR1: 
>>>>>>> DR2:
>>>>>>> 
>>>>>>> [   55.141465] DR3:  DR6: fffe0ff0
>>>>>>> DR7:
>>>>>>> 0400
>>>>>>> [   55.141466] PKRU: 5554
>>>>>>> [   55.141467] Call Trace:
>>>>>>> [   55.141469]  
>>>>>>> [   55.141472]  ? pcie_wait_cmd+0xdf/0x220
>>>>>>> [   55.141478]  ? mptcp_seq_show+0xe0/0x180
>>>>>>> [   55.141484]  __cancel_work_timer+0x124/0x1b0
>>>>>>> [   55.141487]  cancel_delayed_work_sync+0x17/0x20
>>>>>>> [   55.141490]  drm_kms_helper_poll_disable+0x26/0x40
>>>>>>> [drm_kms_helper]
>>>>>>> [   55.141516]

Re: [PATCH v3 11/13] drm/fb-helper: Fix single-probe color-format selection

2023-05-14 Thread Linux regression tracking #adding (Thorsten Leemhuis)

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 12.05.23 15:20, Linus Walleij wrote:
> Sorry for late regression detection but this patch regresses
> the Integrator AB IMPD-1 graphics, I bisected down to this
> patch.
> 
> On Mon, Jan 2, 2023 at 12:30 PM Thomas Zimmermann  wrote:
> [...]
> Before this patch:
> 
> [drm] Initialized pl111 1.0.0 20170317 for c100.display on minor 0
> drm-clcd-pl111 c100.display: [drm] requested bpp 16, scaled depth down to 
> 15
> drm-clcd-pl111 c100.display: enable IM-PD1 CLCD connectors
> Console: switching to colour frame buffer device 80x30
> drm-clcd-pl111 c100.display: [drm] fb0: pl111drmfb frame buffer device
> 
> After this patch:
> 
> [drm] Initialized pl111 1.0.0 20170317 for c100.display on minor 0
> drm-clcd-pl111 c100.display: [drm] bpp/depth value of 16/16 not supported
> drm-clcd-pl111 c100.display: [drm] No compatible format found
> drm-clcd-pl111 c100.display: [drm] *ERROR* fbdev: Failed to setup
> generic emulation (ret=-12)
> 
> It seems the bpp downscaling stopped to work? [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 37c90d589dc
#regzbot title drm/fb-helper: downscaling apparently stopped to work
with pl110_impd1
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: Fwd: Kernel 5.11 crashes when it boots, it produces black screen.

2023-05-10 Thread Linux regression tracking (Thorsten Leemhuis)

Hi!

On 10.05.23 10:26, Bagas Sanjaya wrote:
> 
> I noticed a regression report on Bugzilla ([1]). As many developers don't
> have a look on it, I decided to forward it by email. See the report
> for the full thread.
> 
> Quoting from the report:
> 
>>  Azamat S. Kalimoulline 2021-04-06 15:45:08 UTC
>>
>> Same as in https://bugzilla.kernel.org/show_bug.cgi?id=212133, but not 
>> StoneyRidge related. I have same issue in 5.11.9 kernel, but on Renoir 
>> architecture. I have AMD Ryzen 5 PRO 4650U with Radeon Graphics. Same stuck 
>> on loading initial ramdisk. modprobe.blacklist=amdgpu 3` didn't help to 
>> boot. Same stuck. Also iommu=off and acpi=off too. 5.10.26 boots fine. I 
>> boot via efi and I have no option boot without it.
> 
> Azamat, can you try reproducing this issue on latest mainline?
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=212579

Bagas, thx for all your help with regression tracking, much appreciated
(side note, as I'm curious for a while already: what is your motivation?
Just want to help? But whatever, any help is great!).

That being said: I'm not sure if I like what you did in this particular
case, as developers might start getting annoyed by regression tracking
if we throw too many bug reports of lesser quality at their feet --
and then they might start to ignore us, which we really need to prevent.

That's why I would not have forwarded that report at this point of time,
mainly for these reasons:

 * The initial report is quite old already, as it fell through the
cracks (not good, but happens; sorry Azamat!). Hence in this case it
would definitely be better to *first* ask the reporter to check if the
problem still happens with latest mainline (or at least latest stable)
before involving the kernel developers, as it might have been fixed
already.

 * This might not be an amdgpu bug at all; in fact the other bug the
reporter mentioned was an iommu thing. Hence this might be one of those
regressions where a bisection is the only way to get down to the
problem. Sure, sending a few developers a quick inquiry along the lines
of "do you maybe have an idea what's up there" is fine, but that's not
what you did in your mail. Your list of recipients is also quite long;
that's risky: if you do that too often, they might start ignoring mail
from you.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: PROBLEM: AMD Ryzen 9 7950X iGPU - Blinking Issue

2023-05-02 Thread Linux regression tracking (Thorsten Leemhuis)

On 02.05.23 15:48, Felix Richter wrote:
> On 5/2/23 15:34, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 02.05.23 15:13, Alex Deucher wrote:
>>> On Tue, May 2, 2023 at 7:45 AM Linux regression tracking (Thorsten
>>> Leemhuis)  wrote:
>>>
>>>> On 30.04.23 13:44, Felix Richter wrote:
>>>>> Hi,
>>>>>
>>>>> I am running into an issue with the integrated GPU of the Ryzen 9
>>>>> 7950X. It seems to be a regression from kernel version 6.1 to 6.2.
>>>>> The bug materializes in from of my monitor blinking, meaning it
>>>>> turns full white shortly. This happens very often so that the
>>>>> system becomes unpleasant to use.
>>>>>
>>>>> I am running the Archlinux Kernel:
>>>>> The Issue happens on the bleeding edge kernel: 6.2.13
>>>>> Switching back to the LTS kernel resolves the issue: 6.1.26
>>>>>
>>>>> I have two monitors attached to the system. One 42 inch 4k Display
>>>>> and a 24 inch 1080p Display and am running sway as my desktop.
>>>>>
>>>>> Let me know if there is more information I could provide to help
>>>>> narrow down the issue.
>>>> Thanks for the report. To be sure the issue doesn't fall through the
>>>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
>>>> tracking bot:
>>>>
>>>> #regzbot ^introduced v6.1..v6.2
>>>> #regzbot title drm: amdgpu: system becomes unpleasant to use after
>>>> monitor starts blinking and turns full white
>>>> #regzbot ignore-activity
>>>>
>>>> This isn't a regression? This issue or a fix for it are already
>>>> discussed somewhere else? It was fixed already? You want to clarify
>>>> when
>>>> the regression started to happen? Or point out I got the title or
>>>> something else totally wrong? Then just reply and tell me -- ideally
>>>> while also telling regzbot about it, as explained by the page listed in
>>>> the footer of this mail.
>>>>
>>>> Developers: When fixing the issue, remember to add 'Link:' tags
>>>> pointing
>>>> to the report (the parent of this mail). See page linked in footer for
>>>> details.
>>> This sounds exactly like the issue that was fixed in this patch which
>>> is already on it's way to Linus:
>>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/08da182175db4c7f80850354849d95f2670e8cd9
>> FWIW, you in the flood of emails likely missed that this is the same
>> thread where you yesterday replied "If the module parameter didn't help
>> then perhaps you are seeing some other issue.  Can you bisect?". That's
>> why I decided to add this to the tracking. Or am I missing something
>> obvious here?
>>
>> /me looks around again and can't see anything, but that doesn't have to
>> mean anything...
>>
>> Felix, btw, this guide might help you with the bisection, even if it's
>> just for kernel compilation:
>>
>> https://docs.kernel.org/next/admin-guide/quickly-build-trimmed-linux.html
>>
>> And to indirectly reply to your mail from yesterday[1]. You might want
>> to ignore the arch linux kernel git repo and just do a bisection between
>> 6.1 and the latest 6.2.y kernel using upstream repos; and if I were you
>> I'd also try 6.3 or even mainline before that, in case the issue was
>> fixed already.
>>
>> [1]
>> https://lore.kernel.org/all/04749ee4-0728-92fe-bcb0-a7320279e...@felixrichter.tech/
>>
> Thanks for the pointers, I'll do a bisection on my desktop from 6.1 to
> the newest commit.

FWIW, I wonder what you actually mean by "newest commit" here: a
bisection between 6.1 and mainline HEAD might be a waste of time, *if*
this is something that only happens in 6.2.y (say due to a broken or
incomplete backport).

> That was the part I was mostly unsure about … where
> to start from.
> 
> I was planning to use PKGBUILD scripts from arch to achieve the same
> configuration as I would when installing
> the package and just rewrite the script to use a local copy of the
> source code instead of the repository.
> That way I can just use the bisect command, rebuild the package and test
> again.

In my experience trying to deal with Linux distro's package managers
creates more trouble than it's worth.

> But I probably won't be able to finish it this week, since I am on
> vacation starting tomorrow and will not have access to the computer in
> question. I will be back next week, by that time the patch Alex is
> talking about might
> already be in mainline. So if that fixes it, I will notice and let you
> know. If not I will do the bisection to figure out what the actual issue
> is.

Enjoy your vacation!

Ciao, Thorsten

Re: PROBLEM: AMD Ryzen 9 7950X iGPU - Blinking Issue

2023-05-02 Thread Linux regression tracking (Thorsten Leemhuis)

On 02.05.23 15:13, Alex Deucher wrote:
> On Tue, May 2, 2023 at 7:45 AM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>
>> On 30.04.23 13:44, Felix Richter wrote:
>>> Hi,
>>>
>>> I am running into an issue with the integrated GPU of the Ryzen 9 7950X. It 
>>> seems to be a regression from kernel version 6.1 to 6.2.
>>> The bug materializes in from of my monitor blinking, meaning it turns full 
>>> white shortly. This happens very often so that the system becomes 
>>> unpleasant to use.
>>>
>>> I am running the Archlinux Kernel:
>>> The Issue happens on the bleeding edge kernel: 6.2.13
>>> Switching back to the LTS kernel resolves the issue: 6.1.26
>>>
>>> I have two monitors attached to the system. One 42 inch 4k Display and a 24 
>>> inch 1080p Display and am running sway as my desktop.
>>>
>>> Let me know if there is more information I could provide to help narrow 
>>> down the issue.
>>
>> Thanks for the report. To be sure the issue doesn't fall through the
>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
>> tracking bot:
>>
>> #regzbot ^introduced v6.1..v6.2
>> #regzbot title drm: amdgpu: system becomes unpleasant to use after
>> monitor starts blinking and turns full white
>> #regzbot ignore-activity
>>
>> This isn't a regression? This issue or a fix for it are already
>> discussed somewhere else? It was fixed already? You want to clarify when
>> the regression started to happen? Or point out I got the title or
>> something else totally wrong? Then just reply and tell me -- ideally
>> while also telling regzbot about it, as explained by the page listed in
>> the footer of this mail.
>>
>> Developers: When fixing the issue, remember to add 'Link:' tags pointing
>> to the report (the parent of this mail). See page linked in footer for
>> details.
> 
> This sounds exactly like the issue that was fixed in this patch which
> is already on it's way to Linus:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/08da182175db4c7f80850354849d95f2670e8cd9

FWIW, in the flood of emails you likely missed that this is the same
thread where you yesterday replied "If the module parameter didn't help
then perhaps you are seeing some other issue.  Can you bisect?". That's
why I decided to add this to the tracking. Or am I missing something
obvious here?

/me looks around again and can't see anything, but that doesn't have to
mean anything...

Felix, btw, this guide might help you with the bisection, even if it's
just for kernel compilation:

https://docs.kernel.org/next/admin-guide/quickly-build-trimmed-linux.html

And to indirectly reply to your mail from yesterday[1]. You might want
to ignore the arch linux kernel git repo and just do a bisection between
6.1 and the latest 6.2.y kernel using upstream repos; and if I were you
I'd also try 6.3 or even mainline before that, in case the issue was
fixed already.

[1]
https://lore.kernel.org/all/04749ee4-0728-92fe-bcb0-a7320279e...@felixrichter.tech/

Ciao, Thorsten
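
The bisection suggested above boils down to a binary search over the
commits between a known-good and a known-bad kernel. A minimal Python
sketch of the idea follows, assuming a linear history and a test that
reliably tells good from bad. The function first_bad_commit, the commit
names and the predicate are made up for illustration; real "git bisect"
also copes with merges and non-linear history, which this does not
model.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. is_bad(commit) stands in for the manual
    'build, boot and check whether the screen blinks' step.
    Returns the first bad commit and the number of commits tested.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        steps += 1
        if is_bad(commits[middle]):
            bad = middle   # the culprit is at or before 'middle'
        else:
            good = middle  # the culprit is after 'middle'
    return commits[bad], steps


# Hypothetical history of 1000 commits between a good and a bad release.
history = [f"commit-{number:04d}" for number in range(1000)]
culprit = "commit-0618"

found, steps = first_bad_commit(history, lambda commit: commit >= culprit)
print(found, "found after", steps, "test builds")
```

The number of test builds grows only with the logarithm of the range,
which is why narrowing the range first (for example to 6.1..6.2.y
rather than 6.1..mainline) saves little time when the range is valid,
but picking a range that does not contain the culprit wastes all of it.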

Re: PROBLEM: AMD Ryzen 9 7950X iGPU - Blinking Issue

2023-05-02 Thread Linux regression tracking (Thorsten Leemhuis)

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 30.04.23 13:44, Felix Richter wrote:
> Hi,
> 
> I am running into an issue with the integrated GPU of the Ryzen 9 7950X. It 
> seems to be a regression from kernel version 6.1 to 6.2. 
> The bug materializes in from of my monitor blinking, meaning it turns full 
> white shortly. This happens very often so that the system becomes unpleasant 
> to use.
> 
> I am running the Archlinux Kernel:
> The Issue happens on the bleeding edge kernel: 6.2.13
> Switching back to the LTS kernel resolves the issue: 6.1.26
> 
> I have two monitors attached to the system. One 42 inch 4k Display and a 24 
> inch 1080p Display and am running sway as my desktop.
> 
> Let me know if there is more information I could provide to help narrow down 
> the issue.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced v6.1..v6.2
#regzbot title drm: amdgpu: system becomes unpleasant to use after
monitor starts blinking and turns full white
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
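
For readers unfamiliar with the '#regzbot' lines above: they are plain-text
commands embedded in the mail body. The following is only a toy sketch of how
such lines could be picked out of a mail. The function name and the rule for
wrapped arguments are assumptions made for this example and do not describe
how regzbot itself is implemented.

```python
import re

# Hypothetical helper for illustration; not part of regzbot.
REGZBOT_RE = re.compile(r"^#regzbot\s+(\S+)\s*(.*)$")


def extract_regzbot_commands(body: str) -> list[tuple[str, str]]:
    """Return (command, argument) pairs for unquoted '#regzbot' lines."""
    commands = []
    current = None  # [command, argument] of the command being collected
    for line in body.splitlines():
        match = REGZBOT_RE.match(line)
        if match:
            if current:
                commands.append((current[0], current[1]))
            current = [match.group(1).rstrip(":"), match.group(2).strip()]
        elif current and line.strip() and not line.startswith(">"):
            # Assumption: a wrapped argument continues on the next
            # non-blank, unquoted line.
            current[1] = (current[1] + " " + line.strip()).strip()
        else:
            # A blank or quoted line ends the command being collected.
            if current:
                commands.append((current[0], current[1]))
            current = None
    if current:
        commands.append((current[0], current[1]))
    return commands
```

Fed the three command lines of this mail, the sketch yields an 'introduced'
entry with the range, a 'title' entry with the re-joined title, and an
'ignore-activity' entry with an empty argument.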

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

build failure from drm/ttm commit now in mainline (was: linux-next: build failure after merge of the drm tree)

2023-04-25 Thread Thorsten Leemhuis

Lo!

Sometimes the regression tracker runs into regressions himself... :-D

On 11.04.23 08:47, Stephen Rothwell wrote:
> 
> After merging the drm tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> drivers/gpu/drm/ttm/ttm_pool.c:73:29: error: variably modified 'global_write_combined' at file scope
>    73 | static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
>       | ^
> drivers/gpu/drm/ttm/ttm_pool.c:74:29: error: variably modified 'global_uncached' at file scope
>    74 | static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>       | ^~~
> drivers/gpu/drm/ttm/ttm_pool.c:76:29: error: variably modified 'global_dma32_write_combined' at file scope
>    76 | static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
>       | ^~~
> drivers/gpu/drm/ttm/ttm_pool.c:77:29: error: variably modified 'global_dma32_uncached' at file scope
>    77 | static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
>       | ^
> 
> Caused by commit
> 
>   322458c2bb1a ("drm/ttm: Reduce the number of used allocation orders for TTM 
> pages")
> 
> PMD_SHIFT is not necessarily a constant on ppc (see
> arch/powerpc/include/asm/book3s/64/pgtable.h).
> 
> I have reverted that commit for today.

Did anyone look into this? I today ran into what looks like the same
compiler error when building a mainline snapshot using a Fedora rawhide
config for ppc64le:

```
drivers/gpu/drm/ttm/ttm_pool.c:73:29: error: variably modified 'global_write_combined' at file scope
   73 | static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
      | ^
drivers/gpu/drm/ttm/ttm_pool.c:74:29: error: variably modified 'global_uncached' at file scope
   74 | static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
      | ^~~
drivers/gpu/drm/ttm/ttm_pool.c:76:29: error: variably modified 'global_dma32_write_combined' at file scope
   76 | static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
      | ^~~
drivers/gpu/drm/ttm/ttm_pool.c:77:29: error: variably modified 'global_dma32_uncached' at file scope
   77 | static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
      | ^
```

Full build log:

https://copr-be.cloud.fedoraproject.org/results/@kernel-vanilla/mainline/fedora-37-ppc64le/05850588-mainline-mainline-releases/build.log.gz

Ciao, Thorsten

Re: [PATCH v3] firmware/sysfb: Fix VESA format selection

2023-04-21 Thread Linux regression tracking (Thorsten Leemhuis)

On 20.04.23 17:57, Pierre Asselin wrote:
> Some legacy BIOSes report no reserved bits in their 32-bit rgb mode,
> breaking the calculation of bits_per_pixel in commit f35cd3fa7729
> ("firmware/sysfb: Fix EFI/VESA format selection").  However they report
> lfb_depth correctly for those modes.  Keep the computation but
> set bits_per_pixel to lfb_depth if the latter is larger.
> 
> v2 fixes the warnings from a max3() macro with arguments of different
> types;  split the bits_per_pixel assignment to avoid uglifying the code
> with too many casts.
> 
> v3 fixes space and formatting blips pointed out by Javier, and changes
> the bits_per_pixel assignment back to a single statement using two casts.
> 
> Link: https://lore.kernel.org/r/4psm6b6lqkz1...@panix3.panix.com
> Link: https://lore.kernel.org/r/20230412150225.3757223-1-javi...@redhat.com
> Link: 
> https://lore.kernel.org/dri-devel/20230418183325.2327-1...@panix.com/T/#u
> Link: 
> https://lore.kernel.org/dri-devel/20230419044834.10816-1...@panix.com/T/#u
> Fixes: f35cd3fa7729 ("firmware/sysfb: Fix EFI/VESA format selection")
> Signed-off-by: Pierre Asselin 
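
The rule described in the patch description above can be sketched in a few
lines. This is only an illustration in Python, not the kernel code: it
assumes the calculation takes the highest bit used by any colour or reserved
field (size plus position), and the function and parameter names are made up
for this example.

```python
def vesa_bits_per_pixel(red, green, blue, rsvd, lfb_depth):
    """Sketch of the described rule; each colour argument is (size, pos) in bits.

    Assumption: the computed value is the highest bit position covered by any
    of the four fields. The fix described above keeps that computation but
    never lets the result fall below the depth the firmware reported.
    """
    computed = max(size + pos for size, pos in (red, green, blue, rsvd))
    return max(computed, lfb_depth)


# A 32-bit RGB mode where the BIOS reports no reserved bits: the fields
# alone only account for 24 bits, but lfb_depth says 32.
legacy = vesa_bits_per_pixel((8, 16), (8, 8), (8, 0), (0, 0), lfb_depth=32)

# The same mode with the reserved bits reported properly.
regular = vesa_bits_per_pixel((8, 16), (8, 8), (8, 0), (8, 24), lfb_depth=32)
```

Under these assumptions both calls give the same result, which is the point
of the change: missing reserved bits no longer shrink the pixel size.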

Linus might release the final this weekend and this is among the last
few 6.3 regressions I track. Hence please allow me to ask:

Pierre, Tomas, Javier, et al.: how many "legacy BIOSes" do we suspect
are affected by this? So many that it might be worth delaying the
release by one week? And in case everybody involved might agree that
this patch is ready by today or tomorrow: might it be worth asking Linus
to merge this patch directly[1]?

[FWIW, I highly suspect the answer to the last two questions is "no,
that's definitely not worth it", just wanted to confirm]

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

[1] yes, that's a thing we do:
https://lore.kernel.org/all/CAHk-=wis_qqy4odnynnki5b7qhosmxtoj1jxo5wmb6sruwq...@mail.gmail.com/

Re: [PATCH v3 01/13] firmware/sysfb: Fix EFI/VESA format selection

2023-04-16 Thread Linux regression tracking #update (Thorsten Leemhuis)

[TLDR: This mail is primarily relevant for Linux regression tracking. A
change or fix related to the regression discussed in this thread was
posted or applied, but it did not use a Link: tag to point to the
report, as Linus and the documentation call for. Things happen, no
worries -- but now the regression tracking bot needs to be told manually
about the fix. See link in footer if these mails annoy you.]

On 08.04.23 13:26, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
> 
> On 06.04.23 17:45, Pierre Asselin wrote:
>> Thomas Zimmermann  wrote:
>> [...] 
>> Starting at linux-6.3-rc1 my simplefb picks the wrong mode and garbles
>> the display.  This is on a 16-year old i686 laptop.  I can post lshw or
>> dmidecode output if it helps.
>> [...] 
>> I bisected it to f35cd3fa77293c2cd03e94b6a6151e1a7d9309cf
>> firmware/sysfb: Fix EFI/VESA format selection
> 
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced f35cd3fa77293c2cd03e
> #regzbot title firmware/sysfb: wrong mode and display garbled on 16-year
> old i686 laptop
> #regzbot ignore-activity

#regzbot monitor:
https://lore.kernel.org/lkml/20230412150225.3757223-1-javi...@redhat.com/
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: [PATCH v3 01/13] firmware/sysfb: Fix EFI/VESA format selection

2023-04-08 Thread Linux regression tracking #adding (Thorsten Leemhuis)

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 06.04.23 17:45, Pierre Asselin wrote:
> Thomas Zimmermann  wrote:
> [...] 
> Starting at linux-6.3-rc1 my simplefb picks the wrong mode and garbles
> the display.  This is on a 16-year old i686 laptop.  I can post lshw or
> dmidecode output if it helps.
> [...] 
> I bisected it to f35cd3fa77293c2cd03e94b6a6151e1a7d9309cf
> firmware/sysfb: Fix EFI/VESA format selection

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced f35cd3fa77293c2cd03e
#regzbot title firmware/sysfb: wrong mode and display garbled on 16-year
old i686 laptop
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Re: [PATCH v4 1/5] docs: process: allow Closes tags with links

2023-04-04 Thread Thorsten Leemhuis

On 03.04.23 18:23, Matthieu Baerts wrote:
> [...]
> diff --git a/Documentation/process/submitting-patches.rst 
> b/Documentation/process/submitting-patches.rst
> index 828997bc9ff9..12d58ddc2b8a 100644
> --- a/Documentation/process/submitting-patches.rst
> +++ b/Documentation/process/submitting-patches.rst
> @@ -113,11 +113,9 @@ there is no collision with your six-character ID now, 
> that condition may
>  change five years from now.
>  
>  If related discussions or any other background information behind the change
> -can be found on the web, add 'Link:' tags pointing to it. In case your patch
> -fixes a bug, for example, add a tag with a URL referencing the report in the
> -mailing list archives or a bug tracker; if the patch is a result of some
> -earlier mailing list discussion or something documented on the web, point to
> -it.
> +can be found on the web, add 'Link:' tags pointing to it. If the patch is a
> +result of some earlier mailing list discussions or something documented on 
> the
> +web, point to it.
>  
>  When linking to mailing list archives, preferably use the lore.kernel.org
>  message archiver service. To create the link URL, use the contents of the
> @@ -134,6 +132,16 @@ resources. In addition to giving a URL to a mailing list 
> archive or bug,
>  summarize the relevant points of the discussion that led to the
>  patch as submitted.
>  
> +In case your patch fixes a bug, use the 'Closes:' tag with a URL referencing
> +the report in the mailing list archives or a public bug tracker. For 
> example::
> +
> + Closes: https://example.com/issues/1234

YMMV, but is this...

> +Some bug trackers have the ability to close issues automatically when a
> +commit with such a tag is applied. Some bots monitoring mailing lists can
> +also track such tags and take certain actions. Private bug trackers and
> +invalid URLs are forbidden.
> +

...section (and a similar one in the other document) really worth it
and/or does it have to be that long? A simple "Some bug trackers then
will automatically close the issue when the commit is merged" IMHO would
suffice, but OTOH it might be considered common knowledge. And the
"found on the web", "a public bug tracker" (both quoted above) and
"available on the web" (quoted below) already make it pretty clear that
links to private bug trackers are not desired. And there is also a
"Please check the link to make sure that it is actually working and
points to the relevant message." in submitting-patches.rst already, so
invalid URLs are obviously not wanted either.

>  If your patch fixes a bug in a specific commit, e.g. you found an issue using
>  ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
>  the SHA-1 ID, and the one line summary.  Do not split the tag across multiple
> @@ -498,9 +506,11 @@ Using Reported-by:, Tested-by:, Reviewed-by:, 
> Suggested-by: and Fixes:
>  The Reported-by tag gives credit to people who find bugs and report them and 
> it
>  hopefully inspires them to help us again in the future. The tag is intended 
> for
>  bugs; please do not use it to credit feature requests. The tag should be
> -followed by a Link: tag pointing to the report, unless the report is not
> -available on the web. Please note that if the bug was reported in private, 
> then
> -ask for permission first before using the Reported-by tag.
> +followed by a Closes: tag pointing to the report, unless the report is not
> +available on the web. The Link: tag can be used instead of Closes: if the 
> patch
> +fixes a part of the issue(s) being reported. Please note that if the bug was
> +reported in private, then ask for permission first before using the 
> Reported-by
> +tag.
>  
>  A Tested-by: tag indicates that the patch has been successfully tested (in
>  some environment) by the person named.  This tag informs maintainers that

Ciao, Thorsten

Re: [PATCH v3 0/4] docs & checkpatch: allow Closes tags with links

2023-03-31 Thread Thorsten Leemhuis

On 31.03.23 12:08, Conor Dooley wrote:
> On Fri, Mar 31, 2023 at 11:39:22AM +0200, Thorsten Leemhuis wrote:
> 
>> -Please check the link to make sure that it is actually working and points
>> -to the relevant message.
>> +If the URL points to a bug report that is fixed by the patch, use 'Closes:'
>> +instead.
> 
> This is not specifically a comment about your additional diff, but this
> sprang to mind (again) while reading it.
> I have been wondering if this sort of thing will lead to inconsistency. 
> Reports sometimes report more than one issue at once. Other times a
> patch is (intentionally) not a complete fix for the problem.
> Using Closes: in those cases is not really true, as it does not close
> the report.
>
> Having a series of N patches, each of which purport to close an issue,
> also doesn't seem quite right.
> The word Closes has a meaning and "forcing" the use of Closes: for
> reports implies meaning that may not be present.
> 
> I suppose it is true that just because documentation or checkpatch says
> to do something, doesn't mean that you **have** to do it but I don't
> want to be the one on the Rx side of a rant...

Yeah, maybe checkpatch.pl should allow a "Link" after a "Reported-by" for
cases like this, then developers could save "Closes" for the patch that
addresses the last of the issues the report is about.

OTOH checkpatch.pl currently just prints a warning, so developers could
ignore this and do the above already now, as you say. Guess it depends
on how often we expect "one report with multiple issues" to happen.

Maybe this is an indicator that we are on the wrong track in general and
should not do any of this and just stick to "Link:".

Ciao, Thorsten

Re: [PATCH v3 4/4] checkpatch: check for misuse of the link tags

2023-03-31 Thread Thorsten Leemhuis


On 31.03.23 11:44, Matthieu Baerts wrote:
> Hi Thorsten,
> 
> On 31/03/2023 10:57, Thorsten Leemhuis wrote:
>> On 30.03.23 20:13, Matthieu Baerts wrote:
>>> "Link:" and "Closes:" tags have to be used with public URLs.
>>>
>>> It is difficult to make sure the link is public but at least we can
>>> verify the tag is followed by 'http(s):'.
>>>
>>> With that, we avoid such a tag that is not allowed [1]:
>>>
>>>   Closes: 
>>>
>>> Link: 
>>> https://lore.kernel.org/linux-doc/CAHk-=wh0v1EeDV3v8TzK81nDC40=xutdy2mcr0xy3m3fibv...@mail.gmail.com/
>>>  [1]
>>> Signed-off-by: Matthieu Baerts 
>>> [...]
>>> +# Check for misuse of the link tags
>>> +        if ($in_commit_log &&
>>> +            $line =~ /^\s*(\w+:)\s*(\S+)/) {
>>> +            my $tag = $1;
>>> +            my $value = $2;
>>> +            if ($tag =~ /^$link_tags_search$/ && $value !~ /^https?:/) {
>>> +                WARN("COMMIT_LOG_WRONG_LINK",
>>> +                     "'$tag' should be followed by a public http(s) link\n" . $herecurr);
>>> +            }
>>> +        }
>>> +
>>
>> I must be missing something here, but it looks to me like this is
>> checked twice now. See this line in patch2 (which is changed there, but
>> the check itself remains):
>>
>>> } elsif ($rawlines[$linenr] !~ m{^link:\s*https?://}i) {
> 
> If I'm not mistaken, we had the following checks:
> 
> - after Reported-by, there is a link tag (Link:|Closes:)
>
> - (link tags can take more than 75 chars)
> - tags followed by "http(s)://" are restricted to link ones
> 
> But not: link tags (Link:|Closes:) are followed by "http(s):".

Not in general, afaics -- and ensuring that is likely wise, so thx for
this. But for Link: and Closes: tags after a Reported-by it is already
checked, that's what I meant (and didn't communicate well, sorry). It's
just a detail, but might be wise to do this in patch 4:

- } elsif ($rawlines[$linenr] !~ m{^$link_tags_search\s*https?://}i) {
+ } elsif ($rawlines[$linenr] !~ m{^$link_tags_search}i) {

(that's a line changed in patch2)
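
For anyone who wants to play with the general check discussed here outside of
checkpatch.pl, a rough Python sketch follows. It is not the Perl code from the
patch: the tag list stands in for $link_tags_search and is an assumption, and
so is the function name.

```python
import re

# Assumption: stand-in for checkpatch.pl's $link_tags_search, limited to the
# two tags discussed in this thread and matched case-sensitively.
LINK_TAGS = ("Link:", "Closes:")

# Same shape as the regex in the quoted patch: a "Tag:" followed by a value.
TAG_LINE_RE = re.compile(r"^\s*(\w+:)\s*(\S+)")


def misused_link_tags(commit_log: str) -> list[str]:
    """Return commit-log lines where a link tag is not followed by http(s)."""
    bad = []
    for line in commit_log.splitlines():
        match = TAG_LINE_RE.match(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag in LINK_TAGS and not re.match(r"https?:", value):
            bad.append(line)
    return bad
```

Note that, like the regex it copies, this sketch does not flag a tag with
nothing at all after it, because the pattern requires a value to be present.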

Ciao, Thorsten

Re: [PATCH v3 0/4] docs & checkpatch: allow Closes tags with links

2023-03-31 Thread Thorsten Leemhuis

On 30.03.23 20:13, Matthieu Baerts wrote:
> Since v6.3, checkpatch.pl now complains about the use of "Closes:" tags
> followed by a link [1]. It also complains if a "Reported-by:" tag is
> followed by a "Closes:" one [2].
> 
> As detailed in the first patch, this "Closes:" tag is used for a bit of
> time, mainly by DRM and MPTCP subsystems. It is used by some bug
> trackers to automate the closure of issues when a patch is accepted.
> It is even planned to use this tag with bugzilla.kernel.org [3].
> 
> The first patch updates the documentation to explain what is this
> "Closes:" tag and how/when to use it. The second patch modifies
> checkpatch.pl to stop complaining about it.
> 
> The DRM maintainers and their mailing list have been added in Cc as they
> are probably interested by these two patches as well.
> 
> [1] 
> https://lore.kernel.org/all/3b036087d80b8c0e07a46a1dbaaf4ad0d018f8d5.1674217480.git.li...@leemhuis.info/
> [2] 
> https://lore.kernel.org/all/bb5dfd55ea2026303ab2296f4a6df3da7dd64006.1674217480.git.li...@leemhuis.info/
> [3] 
> https://lore.kernel.org/linux-doc/20230315181205.f3av7h6owqzzw64p@meerkat.local/
> 
> Signed-off-by: Matthieu Baerts 

Maybe it's just me, but I think those changes do not make it clear
enough when to use Link: and when to use Closes. Find below an
alternative proposal how I'd do it for consideration that goes
'all-in' for the sake of simplicity.

[untested -- and I hope thunderbird won't mangle the patch]

Ciao, Thorsten


diff --git a/Documentation/process/5.Posting.rst 
b/Documentation/process/5.Posting.rst
index 7a670a075ab6..fc194b4d1674 100644
--- a/Documentation/process/5.Posting.rst
+++ b/Documentation/process/5.Posting.rst
@@ -207,11 +207,17 @@ the patch::
Fixes: 1f2e3d4c5b6a ("The first line of the commit specified by the 
first 12 characters of its SHA-1 ID")
 
 Another tag is used for linking web pages with additional backgrounds or
-details, for example a report about a bug fixed by the patch or a document
+details, for example earlier discussion which lead to the patch or a document
 with a specification implemented by the patch::
 
Link: https://example.com/somewhere.html  optional-other-stuff
 
+If the URL points to a report about a bug fixed by the patch, use this 
instead::
+
+   Closes: https://example.com/somewhere.html  optional-other-stuff
+
+Ensure any such links are publicly accessible.
+
 Many maintainers when applying a patch also add this tag to link to the
 latest public review posting of the patch; often this is automatically done
 by tools like b4 or a git hook like the one described in
@@ -251,7 +257,7 @@ The tags in common use are:
  - Reported-by: names a user who reported a problem which is fixed by this
patch; this tag is used to give credit to the (often underappreciated)
people who test our code and let us know when things do not work
-   correctly. Note, this tag should be followed by a Link: tag pointing to the
+   correctly. Note, this tag should be followed by a Closes: tag pointing to 
the
report, unless the report is not available on the web.
 
  - Cc: the named person received a copy of the patch and had the
diff --git a/Documentation/process/submitting-patches.rst 
b/Documentation/process/submitting-patches.rst
index 69ce64e03c70..73611cf1c372 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -126,8 +126,10 @@ For example::
 
 Link: https://lore.kernel.org/r/30th.anniversary.rep...@klaava.helsinki.fi/
 
-Please check the link to make sure that it is actually working and points
-to the relevant message.
+If the URL points to a bug report that is fixed by the patch, use 'Closes:'
+instead.
+
+Ensure any such links are publicly accessible.
 
 However, try to make your explanation understandable without external
 resources. In addition to giving a URL to a mailing list archive or bug,
@@ -498,7 +500,7 @@ Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: 
and Fixes:
 The Reported-by tag gives credit to people who find bugs and report them and it
 hopefully inspires them to help us again in the future. The tag is intended for
 bugs; please do not use it to credit feature requests. The tag should be
-followed by a Link: tag pointing to the report, unless the report is not
+followed by a Closes: tag pointing to the report, unless the report is not
 available on the web. Please note that if the bug was reported in private, then
 ask for permission first before using the Reported-by tag.
 
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index bd44d12965c9..f9a7c2b856ae 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3158,14 +3158,14 @@ sub process {
}
}
 
-# check if Reported-by: is followed by a Link:
+# check if Reported-by: is followed by a Closes: tag
if ($sign_off =~ /^reported(?:|-and-tested)-by:$/i) {

Re: [PATCH v3 4/4] checkpatch: check for misuse of the link tags

2023-03-31 Thread Thorsten Leemhuis

On 30.03.23 20:13, Matthieu Baerts wrote:
> "Link:" and "Closes:" tags have to be used with public URLs.
> 
> It is difficult to make sure the link is public but at least we can
> verify the tag is followed by 'http(s):'.
> 
> With that, we avoid such a tag that is not allowed [1]:
> 
>   Closes: 
> 
> Link: 
> https://lore.kernel.org/linux-doc/CAHk-=wh0v1EeDV3v8TzK81nDC40=xutdy2mcr0xy3m3fibv...@mail.gmail.com/
>  [1]
> Signed-off-by: Matthieu Baerts 
> [...]
> +# Check for misuse of the link tags
> +        if ($in_commit_log &&
> +            $line =~ /^\s*(\w+:)\s*(\S+)/) {
> +            my $tag = $1;
> +            my $value = $2;
> +            if ($tag =~ /^$link_tags_search$/ && $value !~ /^https?:/) {
> +                WARN("COMMIT_LOG_WRONG_LINK",
> +                     "'$tag' should be followed by a public http(s) link\n" . $herecurr);
> +            }
> +        }
> +

I must be missing something here, but it looks to me like this is
checked twice now. See this line in patch2 (which is changed there, but
the check itself remains):

> } elsif ($rawlines[$linenr] !~ m{^link:\s*https?://}i) {

Ciao, Thorsten

Re: [PATCH v2 2/2] checkpatch: allow Closes tags with links

2023-03-27 Thread Thorsten Leemhuis

On 27.03.23 15:06, Matthieu Baerts wrote:
> Hi Thorsten,
> 
> On 25/03/2023 07:25, Thorsten Leemhuis wrote:
>> On 24.03.23 19:52, Matthieu Baerts wrote:
>>> As a follow-up of the previous patch modifying the documentation to
>>> allow using the "Closes:" tag, checkpatch.pl is updated accordingly.
>>>
>>> checkpatch.pl now mentions the "Closes:" tag between brackets to express
>>> the fact it should be used only if it makes sense.
>>>
>>> While at it, checkpatch.pl will not complain if the "Closes" tag is used
>>> with a "long" line, similar to what is done with the "Link" tag.
>>>
>>> [...]
>>>  
>>> -# check if Reported-by: is followed by a Link:
>>> +# check if Reported-by: is followed by a Link: (or Closes:) tag
>>
>> Small detail: why the parentheses here? Why not simply "check if
>> Reported-by: is followed by either a Link: or Closes: tag"? Same below...
>>
>>> if ($sign_off =~ /^reported(?:|-and-tested)-by:$/i) {
>>> if (!defined $lines[$linenr]) {
>>> WARN("BAD_REPORTED_BY_LINK",
>>> -"Reported-by: should be 
>>> immediately followed by Link: to the report\n" . $herecurr . 
>>> $rawlines[$linenr] . "\n");
>>> -   } elsif ($rawlines[$linenr] !~ 
>>> m{^link:\s*https?://}i) {
>>> +"Reported-by: should be 
>>> immediately followed by Link: (or Closes:) to the report\n" . $herecurr . 
>>> $rawlines[$linenr] . "\n");
>>
>> ...here, where users actually get to see this and might wonder why it's
>> written like that, without getting any answer.
> 
> I tried to explain that in the cover-letter but maybe I should add an
> additional comment in the code: checkpatch.pl now mentions the "Closes:"
> tag between parenthesis to express the fact it should be used only if it
> makes sense. I didn't find any other short ways to express that but I'm
> open to suggestions.
> 
> Now as discussed on patch 1/2, if the "Closes:" tag can be used with any
> public link, we should definitively remove the parenthesis here and
> probably below (see "Check for odd tags before a URI/URL") as well.

Well, ymmd, but if we go down that route I'd say this code should
suggest to use "Closes:" all the time (or primarily).
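
To make the check under discussion concrete, here is a small Python sketch of
the "Reported-by: must be directly followed by a link tag" rule. It is a loose
re-statement of the quoted Perl, not a port of it; the function name is made
up, and accepting both Link: and Closes: is just one of the options debated
in this thread.

```python
import re

# Mirrors the tag pattern in the quoted checkpatch.pl hunk.
REPORTED_BY_RE = re.compile(r"^reported(?:|-and-tested)-by:", re.IGNORECASE)

# Assumption: either tag is accepted, as long as an http(s) URL follows.
REPORT_LINK_RE = re.compile(r"^(?:link|closes):\s*https?://", re.IGNORECASE)


def reported_by_without_link(lines: list[str]) -> list[int]:
    """Return 0-based indexes of Reported-by lines lacking a link tag right after."""
    missing = []
    for i, line in enumerate(lines):
        if not REPORTED_BY_RE.match(line):
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if not REPORT_LINK_RE.match(following):
            missing.append(i)
    return missing
```

Going "all-in" on Closes: would only mean dropping 'link' from the second
pattern; the rest of the logic stays the same.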

Ciao, Thorsten

Re: [PATCH v2 1/2] docs: process: allow Closes tags with links

2023-03-27 Thread Thorsten Leemhuis

On 27.03.23 15:05, Matthieu Baerts wrote:
> 
> Thank you for your reply!

Thank you for working on this!

> On 26/03/2023 13:28, Thorsten Leemhuis wrote:
>> On 24.03.23 19:52, Matthieu Baerts wrote:
>>> Making sure a bug tracker is up to date is not an easy task. For
>>> example, a first version of a patch fixing a tracked issue can be sent a
>>> long time after having created the issue. But also, it can take some
>>> time to have this patch accepted upstream in its final form. When it is
>>> done, someone -- probably not the person who accepted the patch -- has
>>> to remember about closing the corresponding issue.
>>>
>>> This task of closing and tracking the patch can be done automatically by
>>> bug trackers like GitLab [1], GitHub [2] and hopefully soon [3]
>>> bugzilla.kernel.org when the appropriated tag is used. The two first
>>> ones accept multiple tags but it is probably better to pick one.
>>>
>>> [...]
>>>
>>> diff --git a/Documentation/process/5.Posting.rst 
>>> b/Documentation/process/5.Posting.rst
>>> index 7a670a075ab6..20f0b6b639b7 100644
>>> --- a/Documentation/process/5.Posting.rst
>>> +++ b/Documentation/process/5.Posting.rst
>>> @@ -217,6 +217,15 @@ latest public review posting of the patch; often this 
>>> is automatically done
>>>  by tools like b4 or a git hook like the one described in
>>>  'Documentation/maintainer/configure-git.rst'.
>>>  
>>> +Similarly, there is also the "Closes:" tag that can be used to close issues
>>> +when the underlying public bug tracker can do this operation automatically.
>>> +For example::
>>> +
>>> +   Closes: https://example.com/issues/1234
>>> +
>>> +Private bug trackers and invalid URLs are forbidden. For other public bug
>>> +trackers not supporting automations, keep using the "Link:" tag instead.
>>> [...]
>>
>> This more and more seems half-hearted to me.
>>
>> One reason: it makes things unnecessarily complicated for developers, as
>> they'd then have to remember `is this a public bug tracker that is
>> supporting automations? Then use "Closes", otherwise "Link:"`.
>>
>> Another reason: the resulting situation ignores my regression tracking
>> bot, which (among others) tracks emailed reports. It would benefit from
>> "Closes" as well to avoid the ambiguity problem Konstantin brought up
>> (the one about "Link: might just point to a report for background
>> information in patches that don't address the problem the link points
>> to"[1]). But FWIW, I'm not sure if this ambiguity is much of a problem in
>> practice, I have a feeling that it's rare and most of the time will
>> happen after the reported problem has been addressed or in the same
>> patch-set.
> 
> Even if they are rare, I think it might be good to avoid false-positives
> that can be frustrating or create confusions. Using a dedicated tag plus
> some safeguards might then be required. (And it is not compatible with
> existing forges.)

Yeah, FWIW, I was all for such clear tags myself not that long ago (and
even twice proposed some), but due to the experience with regzbot and
Linus' recent comment on Closes: I'm more in the neutral camp these days.

>> I thus think we should use either of these approaches:
>>
>> * just stick to "Link: "
>>
>> * go "all-in" and tell developers to use "Closes: "[2] all the time
>> when a patch is resolving an issue that was reported in public
>>
>> I'm not sure which of them I prefer myself. Maybe I'm slightly leaning
>> towards the latter: it avoids the ambiguity, checkpatch.pl will yell if
>> it's used with something else than a URL, it makes things easier for
>> MPTCP & DRM developers, and (maybe most importantly) is something new
>> developers are often used to already from git forges.
> 
> I think it makes sense not to restrict this tag to bug trackers with
> automations as long as they are public of course. After having looked at
> the comments from v1, I didn't feel like it would have been OK to extend
> its usage but I can send a v3 taking this direction hoping to get more
> feedback. After all, thanks to regzbot, we can also say that there are
> some automations behind lore.kernel.org and other ML's :)

:-D

> If we do that, would it be blocking to have this included in v6.3?

You mean whether this can still go in for 6.3? Well, the patches afaics need
to be ACKed by the right people first (Joe for checkpatch I guess, Jon
for docs). It likely

Re: [PATCH v2 1/2] docs: process: allow Closes tags with links

2023-03-26 Thread Thorsten Leemhuis

On 24.03.23 19:52, Matthieu Baerts wrote:
> Making sure a bug tracker is up to date is not an easy task. For
> example, a first version of a patch fixing a tracked issue can be sent a
> long time after having created the issue. But also, it can take some
> time to have this patch accepted upstream in its final form. When it is
> done, someone -- probably not the person who accepted the patch -- has
> to remember about closing the corresponding issue.
> 
> This task of closing and tracking the patch can be done automatically by
> bug trackers like GitLab [1], GitHub [2] and hopefully soon [3]
> bugzilla.kernel.org when the appropriated tag is used. The two first
> ones accept multiple tags but it is probably better to pick one.
> 
> [...]
> 
> diff --git a/Documentation/process/5.Posting.rst 
> b/Documentation/process/5.Posting.rst
> index 7a670a075ab6..20f0b6b639b7 100644
> --- a/Documentation/process/5.Posting.rst
> +++ b/Documentation/process/5.Posting.rst
> @@ -217,6 +217,15 @@ latest public review posting of the patch; often this is 
> automatically done
>  by tools like b4 or a git hook like the one described in
>  'Documentation/maintainer/configure-git.rst'.
>  
> +Similarly, there is also the "Closes:" tag that can be used to close issues
> +when the underlying public bug tracker can do this operation automatically.
> +For example::
> +
> + Closes: https://example.com/issues/1234
> +
> +Private bug trackers and invalid URLs are forbidden. For other public bug
> +trackers not supporting automations, keep using the "Link:" tag instead.
> [...]

This more and more seems half-hearted to me.

One reason: it makes things unnecessarily complicated for developers, as
they'd then have to remember `is this a public bug tracker that is
supporting automations? Then use "Closes", otherwise "Link:"`.

Another reason: the resulting situation ignores my regression tracking
bot, which (among others) tracks emailed reports. It would benefit from
"Closes" as well to avoid the ambiguity problem Konstantin brought up
(the one about "Link: might just point to a report for background
information in patches that don't address the problem the link points
to"[1]). But FWIW, I'm not sure if this ambiguity is much of a problem in
practice, I have a feeling that it's rare and most of the time will
happen after the reported problem has been addressed or in the same
patch-set.

I thus think we should use either of these approaches:

* just stick to "Link: "

* go "all-in" and tell developers to use "Closes: "[2] all the time
when a patch is resolving an issue that was reported in public

I'm not sure which of them I prefer myself. Maybe I'm slightly leaning
towards the latter: it avoids the ambiguity, checkpatch.pl will yell if
it's used with something else than a URL, it makes things easier for
MPTCP & DRM developers, and (maybe most importantly) is something new
developers are often used to already from git forges.
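
To illustrate what the "all-in" variant would mean for tooling, here is a
minimal Python sketch of how a tracker could pull issue references out of a
commit message. The function name `issue_refs` and the sample commit message
are made up for this example; this is not how regzbot or any git forge
actually implements it.

```python
import re

# Matches "Link: <url>" or "Closes: <url>" trailers (case-insensitive),
# one per line, as they typically appear at the end of a commit message.
TRAILER_RE = re.compile(r"^(link|closes):[ \t]*(https?://\S+)[ \t]*$",
                        re.IGNORECASE | re.MULTILINE)


def issue_refs(commit_msg):
    """Return (tag, url) pairs; the tag is normalized to 'Link' or 'Closes'."""
    return [(tag.capitalize(), url) for tag, url in TRAILER_RE.findall(commit_msg)]


# Hypothetical commit message, for illustration only.
EXAMPLE_MSG = """\
subsys: fix frobnication on reboot

Reported-by: Jane Doe <jane@example.com>
Closes: https://example.com/issues/1234
Link: https://example.com/background-discussion
"""

for tag, url in issue_refs(EXAMPLE_MSG):
    # Only "Closes" states unambiguously that the patch resolves the issue.
    note = "resolves issue" if tag == "Closes" else "reference only"
    print(tag, url, "(" + note + ")")
```

With "Closes:" used everywhere, a tool no longer has to guess whether a
"Link:" points to the report being fixed or merely to background material.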

Ciao, Thorsten

[1]
https://lore.kernel.org/linux-doc/20230317185637.ebxzsdxivhgzkqqw@meerkat.local/

[2] fwiw, I still prefer "Resolves:" over "Closes". Yes, I've seen
Konstantin's comment on the subtle difference between the two[3], but as
he said, Bugbot can work with it as well. To me "Resolves" sounds
way friendlier and more descriptive; but well, I'm not a native
speaker, so I don't think my opinion should count much here.

[3]
https://lore.kernel.org/linux-doc/20230316162227.727rhima2tejdl5j@meerkat.local/

Re: [PATCH v2 2/2] checkpatch: allow Closes tags with links

2023-03-25 Thread Thorsten Leemhuis

On 24.03.23 19:52, Matthieu Baerts wrote:
> As a follow-up of the previous patch modifying the documentation to
> allow using the "Closes:" tag, checkpatch.pl is updated accordingly.
> 
> checkpatch.pl now mentions the "Closes:" tag between brackets to express
> the fact it should be used only if it makes sense.
> 
> While at it, checkpatch.pl will not complain if the "Closes" tag is used
> with a "long" line, similar to what is done with the "Link" tag.
> 
> [...]
>  
> -# check if Reported-by: is followed by a Link:
> +# check if Reported-by: is followed by a Link: (or Closes:) tag

Small detail: why the parentheses here? Why not simply "check if
Reported-by: is followed by either a Link: or Closes: tag"? Same below...

>   if ($sign_off =~ /^reported(?:|-and-tested)-by:$/i) {
>   if (!defined $lines[$linenr]) {
>   WARN("BAD_REPORTED_BY_LINK",
> -  "Reported-by: should be immediately followed by Link: to the report\n" . $herecurr . $rawlines[$linenr] . "\n");
> - } elsif ($rawlines[$linenr] !~ m{^link:\s*https?://}i) {
> +  "Reported-by: should be immediately followed by Link: (or Closes:) to the report\n" . $herecurr . $rawlines[$linenr] . "\n");

...here, where users actually get to see this and might wonder why it's
written like that, without getting any answer.
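
For reference, the check under discussion boils down to roughly the
following logic, sketched here in Python rather than Perl. The function name
and the sample tag block are invented for illustration; the regexes are
modeled on the ones in the quoted hunk, extended with the "Closes:"
alternative the patch proposes.

```python
import re

REPORTED_RE = re.compile(r"^reported(?:|-and-tested)-by:", re.IGNORECASE)
# Accept either tag, as proposed in the patch under review.
REPORT_URL_RE = re.compile(r"^(?:link|closes):\s*https?://", re.IGNORECASE)


def missing_report_links(lines):
    """Return indexes of Reported-by: lines that are not immediately
    followed by a Link: or Closes: tag carrying a URL."""
    bad = []
    for i, line in enumerate(lines):
        if not REPORTED_RE.match(line):
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if not REPORT_URL_RE.match(following):
            bad.append(i)
    return bad


# Hypothetical tag block, for illustration only.
TAGS = [
    "Reported-by: A <a@example.com>",
    "Closes: https://example.com/issues/1",
    "Reported-by: B <b@example.com>",
    "Signed-off-by: C <c@example.com>",
]
print(missing_report_links(TAGS))
```

Since both tags are handled identically here, the warning text can name them
side by side without parentheses.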

Ciao, Thorsten

Re: [PATCH 0/2] docs & checkpatch: allow Closes tags with links

2023-03-16 Thread Thorsten Leemhuis

On 15.03.23 18:44, Matthieu Baerts wrote:
> Since v6.3, checkpatch.pl now complains about the use of "Closes:" tags
> followed by a link [1]. It also complains if a "Reported-by:" tag is
> followed by a "Closes:" one [2].
> 
> As detailed in the first patch, this "Closes:" tag is used for a bit of
> time, mainly by DRM and MPTCP subsystems. It is used by some bug
> trackers to automate the closure of issues when a patch is accepted.
> 
> Because this tag is used for a bit of time by different subsystems and
> it looks like it makes sense and it is useful for them, I didn't bother
> Linus to get his permission to continue using it. If you think this is
> necessary to do that up front, please tell me and I will be happy to ask
> for his agreement.

Due to how he reacted to some "invented" tags recently, I'd think it
would be appropriate to CC him on this patchset, as he then can speak up
if he wants to (and I assume a few more mails don't bother him).

> The first patch updates the documentation to explain what is this
> "Closes:" tag and how/when to use it. The second patch modifies
> checkpatch.pl to stop complaining about it.

I liked Andrew's `have been using "Addresses:" on occasion. [...] more
humble [...]` comment.  Sadly that tag is not supported by GitLab and
GitHub. But well, "Resolves" is and also a bit more humble if you ask
me. How about using that instead? Assuming that Konstantin can work with
that tag, too, but I guess he can.

I also wonder if the texts for the documentation could be shorter.
Wouldn't something like this do?

`Instead of "Link:" feel free to use "Resolves:" with a URL, if
the issue was filed in a public bug tracker that will consider the issue
resolved when it notices that tag.`

[s/Resolves/Closes/ if we stick to that]

Side note: makes me wonder if we should go "all in" here to avoid
confusion and allow "Resolves" everywhere, even for links to lore.

> [...]

Ciao, Thorsten

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-03-12 Thread Linux regression tracking (Thorsten Leemhuis)

On 10.03.23 11:20, Karol Herbst wrote:
> On Fri, Mar 10, 2023 at 10:26 AM Chris Clayton wrote:
>>
>> Is it likely that this fix will be submitted to mainline during the ongoing
>> 6.3 development cycle?
>>
> 
> yes, it's already pushed to drm-misc-fixes, which then will go into
> the current devel cycle. I just don't know when it's the next time it
> will be pushed upwards, but it should get there eventually.

FWIW, the fix landed now as 1b9b4f922f96; sadly without a Link: tag to
the report, hence I have to mark this manually as resolved:

#regzbot fix: 1b9b4f922f96108da3bb5d87b2d603f5dfbc5650

> And
> because it also contains a Fixes tag it will be backported to older
> branches as well.

FWIW, nope, that's not enough: you have to tag those explicitly to ensure
backporting, as explained in
Documentation/process/stable-kernel-rules.rst. Greg points that out every
few weeks, recently here for example:

https://lore.kernel.org/all/y6bwpo9s9qbns...@kroah.com/
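
To make the distinction concrete: a "Fixes:" tag alone does not guarantee a
backport, whereas a "Cc: stable@vger.kernel.org" tag is the explicit request
described in stable-kernel-rules.rst. Below is a minimal sketch of a check
for that, assuming the commit message is available as plain text; the
function name, the status strings and the sample message are made up for
this example.

```python
import re

FIXES_RE = re.compile(r"^fixes:\s*[0-9a-f]{8,40}\b",
                      re.IGNORECASE | re.MULTILINE)
STABLE_RE = re.compile(r"^cc:.*\bstable@vger\.kernel\.org\b",
                       re.IGNORECASE | re.MULTILINE)


def backport_status(commit_msg):
    """Classify a commit message by the backport hints it carries."""
    if STABLE_RE.search(commit_msg):
        return "explicitly tagged for stable"
    if FIXES_RE.search(commit_msg):
        return "Fixes: only, no explicit stable tag"
    return "no backport hints"


# Hypothetical commit message, for illustration only.
EXAMPLE_MSG = """\
subsys: fix hang on poweroff

Fixes: 0123456789ab ("subsys: add frobnication")
Signed-off-by: Jane Doe <jane@example.com>
"""
print(backport_status(EXAMPLE_MSG))
```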

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>> Chris
>>
>> On 20/02/2023 22:16, Ben Skeggs wrote:
>>> On Mon, 20 Feb 2023 at 21:27, Karol Herbst  wrote:
>>>>
>>>> On Mon, Feb 20, 2023 at 11:51 AM Chris Clayton  
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 20/02/2023 05:35, Ben Skeggs wrote:
>>>>>> On Sun, 19 Feb 2023 at 04:55, Chris Clayton  
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 18/02/2023 15:19, Chris Clayton wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18/02/2023 12:25, Karol Herbst wrote:
>>>>>>>>> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton 
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 15/02/2023 11:09, Karol Herbst wrote:
>>>>>>>>>>> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>>>>>>>>>>> (Thorsten Leemhuis)  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 13.02.23 10:14, Chris Clayton wrote:
>>>>>>>>>>>>> On 13/02/2023 02:57, Dave Airlie wrote:
>>>>>>>>>>>>>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten 
>>>>>>>>>>>>>>> Leemhuis) wrote:
>>>>>>>>>>>>>>>> On 10.02.23 20:01, Karol Herbst wrote:
>>>>>>>>>>>>>>>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking 
>>>>>>>>>>>>>>>>> (Thorsten
>>>>>>>>>>>>>>>>> Leemhuis)  wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 08.02.23 09:48, Chris Clayton wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm assuming  that we are not going to see a fix for this 
>>>>>>>>>>>>>>>>>>> regression before 6.2 is released.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yeah, looks like it. That's unfortunate, but happens. But 
>>>>>>>>>>>>>>>>>> there is still
>>>>>>>>>>>>>>>>>> time to fix it and there is one thing I wonder:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Did any of the nouveau developers look at the netconsole 
>>>>>>>>>>>>>>>>>> captures Chris

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-15 Thread Linux regression tracking #update (Thorsten Leemhuis)

On 13.02.23 10:14, Chris Clayton wrote:
> On 13/02/2023 02:57, Dave Airlie wrote:
>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>>>
>>>
>>>
>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> On 10.02.23 20:01, Karol Herbst wrote:
>>>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
>>>>> Leemhuis)  wrote:
>>>>>>
>>>>>> On 08.02.23 09:48, Chris Clayton wrote:
>>>>>>>
>>>>>>> I'm assuming  that we are not going to see a fix for this regression 
>>>>>>> before 6.2 is released.
>>>>>>
>>>>>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>>>>>> time to fix it and there is one thing I wonder:
>>>>>>
>>>>>> Did any of the nouveau developers look at the netconsole captures Chris
>>>>>> posted more than a week ago to check if they somehow help to track down
>>>>>> the root of this problem?
>>>>>
>>>>> I did now and I can't spot anything. I think at this point it would
>>>>> make sense to dump the active tasks/threads via sysrq keys to see if
>>>>> any is in a weird state preventing the machine from shutting down.
>>>>
>>>> Many thx for looking into it!
>>>
>>> Yes, thanks Karol.
>>>
>>> Attached is the output from dmesg when this block of code:
>>>
>>> /bin/mount /dev/sda7 /mnt/sda7
>>> /bin/mountpoint /proc || /bin/mount /proc
>>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>>> /bin/echo t > /proc/sysrq-trigger
>>> /bin/sleep 1
>>> /bin/sync
>>> /bin/sleep 1
>>> kill $(pidof dmesg)
>>> /bin/umount /mnt/sda7
>>>
>>> is executed immediately before /sbin/reboot is called as the final step of 
>>> rebooting my system.
>>>
>>> I hope this is what you were looking for, but if not, please let me know 
>>> what you need
> 
> Thanks Dave. [...]
FWIW, in case anyone ends up here in the archives: the message was
truncated. The full post can be found in a new thread:

https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/

Sadly it seems the info "With runpm=0, both reboot and poweroff work on
my laptop." didn't bring us much closer to a solution. :-/ I don't
really like it, but for regression tracking I'm now putting this on the
back-burner, as a fix is not in sight.

#regzbot monitor:
https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
#regzbot backburner: hard to debug and apparently rare
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

#regzbot ignore-activity

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)

On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming  that we are not going to see a fix for this regression before 
>>> 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole captures Chris
>> posted more than a week ago to check if they somehow help to track down
>> the root of this problem?
> 
> I did now and I can't spot anything. I think at this point it would
> make sense to dump the active tasks/threads via sysrq keys to see if
> any is in a weird state preventing the machine from shutting down.

Many thx for looking into it!
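
In case it helps whoever captures such a dump: once the SysRq task list is
in a log file, the tasks stuck in uninterruptible sleep can be filtered out
with a few lines of Python. This is only a sketch; it assumes the
"task:<name> state:<char>" header line that recent kernels print for each
task, and the log excerpt below is made up for illustration.

```python
import re

# Assumed per-task header format in a SysRq task dump, for example:
# "task:reboot  state:D stack:0 pid:2411 ppid:1 flags:0x00004006"
TASK_RE = re.compile(r"task:(\S+)\s+state:(\S)")


def blocked_tasks(log_text):
    """Return the names of tasks in uninterruptible sleep (state 'D')."""
    return [name for name, state in TASK_RE.findall(log_text) if state == "D"]


# Made-up log excerpt, for illustration only.
SAMPLE_LOG = """\
[  100.000001] task:systemd         state:S stack:0     pid:1     ppid:0      flags:0x00000002
[  100.000002] task:kworker/3:1     state:D stack:0     pid:87    ppid:2      flags:0x00004000
[  100.000003] task:reboot          state:D stack:0     pid:2411  ppid:1      flags:0x00004006
"""

print(blocked_tasks(SAMPLE_LOG))
```

Tasks in state D during shutdown are the first candidates to look at, as
they might be what prevents the machine from powering off.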

Ciao, Thorsten

>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>>> Consequently, I've
>>> implemented a (very simple) workaround. All that happens is that in the 
>>> (sysv) init script that starts and stops SDDM,
>>> the nouveau module is removed once SDDM is stopped. With that in place, my 
>>> system no longer freezes on reboot or poweroff.
>>>
>>> Let me know if I can provide any additional diagnostics although, with the 
>>> problem seemingly occurring so late in the
>>> shutdown process, I may need help on how to go about capturing.
>>>
>>> Chris
>>>
>>> On 02/02/2023 20:45, Chris Clayton wrote:
>>>>
>>>>
>>>> On 01/02/2023 13:51, Chris Clayton wrote:
>>>>>
>>>>>
>>>>> On 30/01/2023 23:27, Ben Skeggs wrote:
>>>>>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton  
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi again.
>>>>>>>
>>>>>>> On 30/01/2023 20:19, Chris Clayton wrote:
>>>>>>>> Thanks, Ben.
>>>>>>>
>>>>>>> 
>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>>>>>>>> *any* of my boards.  Could you try the attached patch please?
>>>>>>>>
>>>>>>>> Unfortunately, the patch made no difference.
>>>>>>>>
>>>>>>>> I've been looking at how the graphics on my laptop is set up, and have 
>>>>>>>> a bit of a worry about whether the firmware might
>>>>>>>> be playing a part in this problem. In order to offload video decoding 
>>>>>>>> to the NVidia TU117 GPU, it seems the scrubber
>>>>>>>> firmware must be available, but as far as I know,that has not been 
>>>>>>>> released by NVidia. To get it to work, I followed
>>>>>>>> what ubuntu have done and the scrubber in 
>>>>>>>> /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
>>>>>>>> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of 
>>>>>>>> the firmware loaded is for a different card is being
>>>>>>>> loaded. I note that processing related to firmware is being changed in 
>>>>>>>> the patch. Might my set up be at the root of my
>>>>>>>> problem?
>>>>>>>>
>>>>>>>> I'll have a fiddle an see what I can work out.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ben.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>> Well, my fiddling has got my system rebooting and shutting down 
>>>>>>> successfully again. I found that if I delete the symlink
>>>>>>> to the scrubber firmware, reboot and shutdown work again. There are 
>>>>>>> however, a number of other files in the tu117
>>>>>>> firmware directory tree that that are symlinks to actual files in its 

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)

On 08.02.23 09:48, Chris Clayton wrote:
> 
> I'm assuming  that we are not going to see a fix for this regression before 
> 6.2 is released.

Yeah, looks like it. That's unfortunate, but happens. But there is still
time to fix it and there is one thing I wonder:

Did any of the nouveau developers look at the netconsole captures Chris
posted more than a week ago to check if they somehow help to track down
the root of this problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Consequently, I've
> implemented a (very simple) workaround. All that happens is that in the 
> (sysv) init script that starts and stops SDDM,
> the nouveau module is removed once SDDM is stopped. With that in place, my 
> system no longer freezes on reboot or poweroff.
> 
> Let me know if I can provide any additional diagnostics although, with the 
> problem seemingly occurring so late in the
> shutdown process, I may need help on how to go about capturing.
> 
> Chris
> 
> On 02/02/2023 20:45, Chris Clayton wrote:
>>
>>
>> On 01/02/2023 13:51, Chris Clayton wrote:
>>>
>>>
>>> On 30/01/2023 23:27, Ben Skeggs wrote:
>>>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton wrote:
>>>>>
>>>>> Hi again.
>>>>>
>>>>> On 30/01/2023 20:19, Chris Clayton wrote:
>>>>>> Thanks, Ben.
>>>>>
>>>>>
>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>>>>>> *any* of my boards.  Could you try the attached patch please?
>>>>>>
>>>>>> Unfortunately, the patch made no difference.
>>>>>>
>>>>>> I've been looking at how the graphics on my laptop is set up, and have a
>>>>>> bit of a worry about whether the firmware might be playing a part in this
>>>>>> problem. In order to offload video decoding to the NVidia TU117 GPU, it
>>>>>> seems the scrubber firmware must be available, but as far as I know, that
>>>>>> has not been released by NVidia. To get it to work, I followed what ubuntu
>>>>>> have done and the scrubber in /lib/firmware/nvidia/tu117/nvdec/ is a
>>>>>> symlink to ../../tu116/nvdev/scrubber.bin. That, of course, means that
>>>>>> some of the firmware being loaded is for a different card. I note that
>>>>>> processing related to firmware is being changed in the patch. Might my
>>>>>> set up be at the root of my problem?
>>>>>>
>>>>>> I'll have a fiddle and see what I can work out.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ben.
>>>>>>>
>>>>>>>>
>>>>>
>>>>> Well, my fiddling has got my system rebooting and shutting down
>>>>> successfully again. I found that if I delete the symlink to the scrubber
>>>>> firmware, reboot and shutdown work again. There are, however, a number of
>>>>> other files in the tu117 firmware directory tree that are symlinks to
>>>>> actual files in its tu116 counterpart. So I deleted all of those too.
>>>>> Unfortunately, the absence of one or more of those symlinks causes Xorg
>>>>> to fail to start. I've reinstated all the links except scrubber and I now
>>>>> have a system that works as it did until I tried to run a kernel that
>>>>> includes the bad commit I identified in my bisection. That includes
>>>>> offloading video decoding to the NVidia card, so whatever I read that said
>>>>> the scrubber firmware was needed seems to have been wrong. I get a new
>>>>> message that (nouveau :01:00.0: fb: VPR locked, but no scrubber binary!),
>>>>> but, hey, we can't have everything.
>>>>>
>>>>> If you still want to get to the bottom of this, let me know what you need
>>>>> me to provide and I'll do my best. I suspect you might want to because
>>>>> there will be an awful lot of Ubuntu-based systems out there with that
>>>>> scrubber.bin symlink in place. On the other hand, it could be quite a
>>>>> while before Ubuntu are deploying 6.2 or later kernels.
>>>> The symlinks are correct - whole groups of GPUs share the same FW, and
>>>> we use symlinks in linux-firmware to represent this.
>>>>
>>>> I don't really have any ideas how/why this patch causes issues with
>>>> shutdown - it's a path that only gets executed during initialisation.
>>>> Can you try and capture the kernel log during shutdown ("dmesg -w"
>>>> over ssh? netconsole?), and see if there's any relevant messages
>>>> providing a hint at what's going on?  Alternatively, you could try
>>>> unloading the module (you will have to stop X/wayland/gdm/etc/etc
>>>> first) and seeing if that hangs too.
>>>>
>>>> Ben.
>>>
>>> Sorry for the delay - I've been learning about netconsole and netcat. 
>>> However, I had no success with ssh and netconsole
>>> produced a log with nothing unusual in it.
>>>
>>> Simply stopping Xorg and removing the nouveau module succeeds.
>>>
>>> So, I rebuilt rc6+ after a pull from linus'

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)

On 27.01.23 20:46, Chris Clayton wrote:
> [Resend because the mail client on my phone decided to turn HTML on behind my 
> back, so my reply got bounced.]
> 
> Thanks Thorsten.
> 
> I did try to revert but it didn't revert cleanly and I don't have the
> knowledge to fix it up.
> 
> The patch was part of a merge that included a number of related patches. 
> Tomorrow, I'll try to revert the lot and report
> back.

You are free to do so, but there is no need for that from my side. I
only wanted to know if a simple revert would do the trick; if it
doesn't, in my experience it is often best to leave things to the
developers of the code in question, as they know it best and thus have a
better idea which hidden side effects a more complex revert might have.

Ciao, Thorsten

> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
> wrote:
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> @nouveau-maintainers, did anyone take a look at this? The report is
>> already 8 days old and I don't see a single reply. Sure, we'll likely
>> get a -rc8, but still it would be good to not fix this on the finish line.
>>
>> Chris, btw, did you try if you can revert the commit on top of latest
>> mainline? And if so, does it fix the problem?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>>
>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>> wrote:
>>> [adding various lists and the two other nouveau maintainers to the list
>>> of recipients]
>>
>>> On 18.01.23 21:59, Chris Clayton wrote:
>>>> Hi.
>>>>
>>>> I build and installed the lastest development kernel earlier this week. 
>>>> I've found that when I try the laptop down (or
>>>> reboot it), it hangs right at the end of closing the current session. The 
>>>> last line I see on  the screen when rebooting is:
>>>>
>>>>sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>>
>>>> when closing down I see one additional line:
>>>>
>>>>sd 4:0:0:0 [sda]Stopping disk
>>>>
>>>> In both cases the machine then hangs and I have to hold down the power 
>>>> button fot a few seconds to switch it off.
>>>>
>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between this two and 
>>>> landed on:
>>>>
>>>># first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>>>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>>>> (VPR scrubber)
>>>>
>>>> I built and installed a kernel with 
>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit) 
>>>> checked out
>>>> and that shuts down and reboots fine. It the did the same with the bad 
>>>> commit checked out and that does indeed hang, so
>>>> I'm confident the bisect outcome is OK.
>>>>
>>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>>
>>>> My system had dual GPUs - one intel and one NVidia. Related extracts from 
>>>> 'lscpi -v' is:
>>>>
>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
>>>> Graphics] (rev 05) (prog-if 00 [VGA controller])
>>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>>
>>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>>
>>>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>>>
>>>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>>>
>>>> I/O ports at 5000 [size=64]
>>>>
>>>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>>>
>>>> Capabilities: [40] Vendor Specific Information: Len=0c 
>>>>
>>>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>>>
>>>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>>>
>>>> Capabilities: [d0] Power Management version 2
>>>>
>>>> Kernel driver in use: i915
>>>>
>>>> Kernel modul

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

@nouveau-maintainers, did anyone take a look at this? The report is
already 8 days old and I don't see a single reply. Sure, we'll likely
get a -rc8, but still it would be good to not fix this on the finish line.

Chris, btw, did you try if you can revert the commit on top of latest
mainline? And if so, does it fix the problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
wrote:
> [adding various lists and the two other nouveau maintainers to the list
> of recipients]

> On 18.01.23 21:59, Chris Clayton wrote:
>> Hi.
>>
>> I built and installed the latest development kernel earlier this week. I've
>> found that when I try to shut the laptop down (or reboot it), it hangs right
>> at the end of closing the current session. The last line I see on the screen
>> when rebooting is:
>>
>>  sd 4:0:0:0: [sda] Synchronising SCSI cache
>>
>> when closing down I see one additional line:
>>
>>  sd 4:0:0:0 [sda]Stopping disk
>>
>> In both cases the machine then hangs and I have to hold down the power
>> button for a few seconds to switch it off.
>>
>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and landed
>> on:
>>
>>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>> (VPR scrubber)
>>
>> I built and installed a kernel with f15cde64b66161bfa74fb58f4e5697d8265b802e
>> (the parent of the bad commit) checked out and that shuts down and reboots
>> fine. I then did the same with the bad commit checked out and that does
>> indeed hang, so I'm confident the bisect outcome is OK.
>>
>> Kernels 6.1.6 and 5.15.88 are also OK.
>>
>> My system has dual GPUs - one Intel and one NVidia. Related extracts from
>> 'lspci -v' are:
>>
>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
>> Graphics] (rev 05) (prog-if 00 [VGA controller])
>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>
>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>
>> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>>
>> Memory at a000 (64-bit, prefetchable) [size=256M]
>>
>> I/O ports at 5000 [size=64]
>>
>> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>>
>> Capabilities: [40] Vendor Specific Information: Len=0c 
>>
>> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>>
>> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>
>> Capabilities: [d0] Power Management version 2
>>
>> Kernel driver in use: i915
>>
>> Kernel modules: i915
>>
>>
>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
>> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
>> controller])
>> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
>> Flags: bus master, fast devsel, latency 0, IRQ 141
>> Memory at c400 (32-bit, non-prefetchable) [size=16M]
>> Memory at b000 (64-bit, prefetchable) [size=256M]
>> Memory at c000 (64-bit, prefetchable) [size=32M]
>> I/O ports at 4000 [size=128]
>> Expansion ROM at c300 [disabled] [size=512K]
>> Capabilities: [60] Power Management version 3
>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Capabilities: [78] Express Legacy Endpoint, MSI 00
>> Kernel driver in use: nouveau
>> Kernel modules: nouveau
>>
>> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
>> sysvinit).
>>
>> I've attached the bisect.log, but please let me know if I can provide any 
>> other diagnostics. Please cc me as I'm not
>> subscribed.
> 
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced 0e44c2170876197
> #regzbot title drm: nouveau: hangs on poweroff/reboot
> #regzbot ignore-activity
> 
> This isn't a regression? This is

Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-27 Thread Linux kernel regression tracking (Thorsten Leemhuis)

On 27.01.23 08:39, Greg KH wrote:
> On Fri, Jan 20, 2023 at 11:51:04AM -0600, Limonciello, Mario wrote:
>> On 1/20/2023 11:46, Guenter Roeck wrote:
>>> On Thu, Jan 12, 2023 at 04:50:44PM +0800, Wayne Lin wrote:
>>>> This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.
>>>>
>>>> [Why]
>>>> Changes cause regression on amdgpu mst.
>>>> E.g.
>>>> In fill_dc_mst_payload_table_from_drm(), amdgpu expects to add/remove
>>>> payload one by one and call fill_dc_mst_payload_table_from_drm() to update
>>>> the HW maintained payload table. But previous change tries to go through
>>>> all the payloads in mst_state and update amdgpu hw maintained table in once
>>>> everytime driver only tries to add/remove a specific payload stream only.
>>>> The newly design idea conflicts with the implementation in amdgpu nowadays.
>>>>
>>>> [How]
>>>> Revert this patch first. After addressing all regression problems caused by
>>>> this previous patch, will add it back and adjust it.
>>>
>>> Has there been any progress on this revert, or on fixing the underlying
>>> problem ?
>>>
>>> Thanks,
>>> Guenter
>>
>> Hi Guenter,
>>
>> Wayne is OOO for CNY, but let me update you.
>>
>> Harry has sent out this series which is a collection of proper fixes.
>> https://patchwork.freedesktop.org/series/113125/
>>
>> Once that's reviewed and accepted, 4 of them are applicable for 6.1.
> 
> Any hint on when those will be reviewed and accepted?  patchwork doesn't
> show any activity on them, or at least I can't figure it out...

I didn't look closer (hence please correct me if I'm wrong), but the
core changes afaics are in the DRM pull airlied sent a few hours ago to
Linus (note the "amdgpu […] DP MST fixes" line):

https://lore.kernel.org/all/capm%3d9tzuu4xnx6t5v7sksk%2ba5heapoc1iemyznsyqzgztj%3d...@mail.gmail.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-01-19 Thread Linux kernel regression tracking (Thorsten Leemhuis)

[adding various lists and the two other nouveau maintainers to the list
of recipients]

For the rest of this mail:

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 18.01.23 21:59, Chris Clayton wrote:
> Hi.
> 
> I built and installed the latest development kernel earlier this week. I've
> found that when I try to shut the laptop down (or reboot it), it hangs right
> at the end of closing the current session. The last line I see on the screen
> when rebooting is:
> 
>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> 
> when closing down I see one additional line:
> 
>   sd 4:0:0:0 [sda]Stopping disk
> 
> In both cases the machine then hangs and I have to hold down the power button
> for a few seconds to switch it off.
>
> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and landed
> on:
> 
>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
> drm/nouveau/flcn: new code to load+boot simple HS FWs
> (VPR scrubber)
> 
> I built and installed a kernel with f15cde64b66161bfa74fb58f4e5697d8265b802e
> (the parent of the bad commit) checked out and that shuts down and reboots
> fine. I then did the same with the bad commit checked out and that does indeed
> hang, so I'm confident the bisect outcome is OK.
> 
> Kernels 6.1.6 and 5.15.88 are also OK.
> 
> My system has dual GPUs - one Intel and one NVidia. Related extracts from
> 'lspci -v' are:
> 
> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
> Graphics] (rev 05) (prog-if 00 [VGA controller])
> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> Flags: bus master, fast devsel, latency 0, IRQ 142
> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> Memory at a000 (64-bit, prefetchable) [size=256M]
> I/O ports at 5000 [size=64]
> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> Capabilities: [40] Vendor Specific Information: Len=0c
> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [d0] Power Management version 2
> Kernel driver in use: i915
> Kernel modules: i915
> 
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> controller])
> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
> Flags: bus master, fast devsel, latency 0, IRQ 141
> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> Memory at b000 (64-bit, prefetchable) [size=256M]
> Memory at c000 (64-bit, prefetchable) [size=32M]
> I/O ports at 4000 [size=128]
> Expansion ROM at c300 [disabled] [size=512K]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Legacy Endpoint, MSI 00
> Kernel driver in use: nouveau
> Kernel modules: nouveau
> 
> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still using 
> sysvinit).
> 
> I've attached the bisect.log, but please let me know if I can provide any 
> other diagnostics. Please cc me as I'm not
> subscribed.
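
The bisection described in the report above can be illustrated with a
minimal sketch. It assumes a strictly linear history in which the oldest
commit is known good and the newest is known bad; real 'git bisect' works
on the full commit graph, and the function name and the 'is_bad' callback
are hypothetical, chosen only for illustration:

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    Assumptions (hypothetical, for illustration only): `commits` is ordered
    oldest to newest, commits[0] is known good, commits[-1] is known bad,
    and once a commit is bad every later commit is bad as well.
    Returns a (last_good, first_bad) pair.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression was introduced after mid
    return commits[good], commits[bad]
```

Testing the returned pair once more by hand, as the reporter did with the
parent commit and the first bad commit, is a cheap way to confirm that the
bisection did not go astray.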

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 0e44c2170876197
#regzbot title drm: nouveau: hangs on poweroff/reboot
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
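
The '#regzbot' lines in mails like the one above follow a simple pattern: a
marker at the start of an unquoted line, a command word, and an optional
argument. A minimal sketch of how such lines could be extracted from a mail
body is shown below. This is not regzbot's own code; the function name is
hypothetical, and the sketch ignores arguments that wrap onto a following
line:

```python
import re

# Hypothetical helper for illustration; not part of regzbot itself.
REGZBOT_LINE = re.compile(r"^#regzbot\s+(\S+)\s*(.*)$")


def parse_regzbot_commands(body):
    """Return (command, argument) pairs for each unquoted '#regzbot' line."""
    commands = []
    for line in body.splitlines():
        # Quoted lines start with '>' and therefore never match the pattern,
        # so commands quoted from earlier mails are skipped.
        match = REGZBOT_LINE.match(line.strip())
        if match is None:
            continue
        command, argument = match.groups()
        # Normalise '^introduced' and 'monitor:' style spellings.
        commands.append((command.lstrip("^").rstrip(":"), argument.strip()))
    return commands
```

Called on a body containing '#regzbot title drm: nouveau: hangs on
poweroff/reboot', the sketch yields the pair ('title', 'drm: nouveau: hangs
on poweroff/reboot').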

Re: [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-01-13 Thread Linux kernel regression tracking (Thorsten Leemhuis)

[CCing Daniel]

On 05.01.23 13:28, Thorsten Leemhuis wrote:
> [adding Karol and Lyude to the list of recipients]
> 
> On 28.12.22 15:49, Diogo Ivo wrote:
>> Hello,
>>
>> Commit 2541626cfb79 breaks GM20B probe with
>> the following kernel log:
> Just wondering: is anyone looking into this? The report was posted more
> than a week ago and didn't even get a single reply yet afaics. This of
> course can happen at this time of the year, but I nevertheless thought a
> quick status inquiry might be a good idea at this point.

Hmmm, the report is now more than two weeks old and didn't get a single
reply. My prodding about a week ago also didn't help. Then I guess I
have to bring this to Linus' attention, unless something happens in the
next 2 days.

Diogo, for that it would be really helpful to know: is the issue still
happening with latest mainline? Is it possible to revert 2541626cfb79
easily? And if so: do things work afterwards again?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>> [2.153892] [ cut here ]
>> [2.153897] WARNING: CPU: 1 PID: 36 at 
>> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:273 
>> gf100_vmm_valid+0x2c4/0x390
>> [2.153916] Modules linked in:
>> [2.153922] CPU: 1 PID: 36 Comm: kworker/u8:1 Not tainted 6.1.0+ #1
>> [2.153929] Hardware name: Google Pixel C (DT)
>> [2.153933] Workqueue: events_unbound deferred_probe_work_func
>> [2.153943] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS 
>> BTYPE=--)
>> [2.153950] pc : gf100_vmm_valid+0x2c4/0x390
>> [2.153959] lr : gf100_vmm_valid+0xb4/0x390
>> [2.153966] sp : ffc009e134b0
>> [2.153969] x29: ffc009e134b0 x28:  x27: 
>> ffc008fd44c8
>> [2.153979] x26: ffea x25: ffc0087b98d0 x24: 
>> ff8080f89038
>> [2.153987] x23: ff8081fadc08 x22:  x21: 
>> 
>> [2.153995] x20: ff8080f8a000 x19: ffc009e13678 x18: 
>> 
>> [2.154003] x17: f37a8b93418958e6 x16: ffc009f0d000 x15: 
>> 
>> [2.154011] x14: 0002 x13: 0003a020 x12: 
>> ffc00800
>> [2.154019] x11: 000102913000 x10:  x9 : 
>> 
>> [2.154026] x8 : ffc009e136d8 x7 : ffc008fd44c8 x6 : 
>> ff80803d0f00
>> [2.154034] x5 :  x4 : ff8080f88c00 x3 : 
>> 0010
>> [2.154041] x2 : 000c x1 : ffea x0 : 
>> ffea
>> [2.154050] Call trace:
>> [2.154053]  gf100_vmm_valid+0x2c4/0x390
>> [2.154061]  nvkm_vmm_map_valid+0xd4/0x204
>> [2.154069]  nvkm_vmm_map_locked+0xa4/0x344
>> [2.154076]  nvkm_vmm_map+0x50/0x84
>> [2.154083]  nvkm_firmware_mem_map+0x84/0xc4
>> [2.154094]  nvkm_falcon_fw_oneinit+0xc8/0x320
>> [2.154101]  nvkm_acr_oneinit+0x428/0x5b0
>> [2.154109]  nvkm_subdev_oneinit_+0x50/0x104
>> [2.154114]  nvkm_subdev_init_+0x3c/0x12c
>> [2.154119]  nvkm_subdev_init+0x60/0xa0
>> [2.154125]  nvkm_device_init+0x14c/0x2a0
>> [2.154133]  nvkm_udevice_init+0x60/0x9c
>> [2.154140]  nvkm_object_init+0x48/0x1b0
>> [2.154144]  nvkm_ioctl_new+0x168/0x254
>> [2.154149]  nvkm_ioctl+0xd0/0x220
>> [2.154153]  nvkm_client_ioctl+0x10/0x1c
>> [2.154162]  nvif_object_ctor+0xf4/0x22c
>> [2.154168]  nvif_device_ctor+0x28/0x70
>> [2.154174]  nouveau_cli_init+0x150/0x590
>> [2.154180]  nouveau_drm_device_init+0x60/0x2a0
>> [2.154187]  nouveau_platform_device_create+0x90/0xd0
>> [2.154193]  nouveau_platform_probe+0x3c/0x9c
>> [2.154200]  platform_probe+0x68/0xc0
>> [2.154207]  really_probe+0xbc/0x2dc
>> [2.154211]  __driver_probe_device+0x78/0xe0
>> [2.154216]  driver_probe_device+0xd8/0x160
>> [2.154221]  __device_attach_driver+0xb8/0x134
>> [2.154226]  bus_for_each_drv+0x78/0xd0
>> [2.154230]  __device_attach+0x9c/0x1a0
>> [2.154234]  device_initial_probe+0x14/0x20
>> [2.154239]  bus_probe_device+0x98/0xa0
>> [2.154243]  deferred_probe_work_func+0x88/0xc0
>> [2.154247]  process_one_work+0x204/0x40c
>> [2.154256]  worker_thread+0x230/0x450
>> [2.154261]  kthread+0xc8/0xcc
>> [2.154266]  ret_from_fork+0x10/0x20
>> [2.154273]

Re: [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-01-05 Thread Thorsten Leemhuis

[adding Karol and Lyude to the list of recipients]

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

On 28.12.22 15:49, Diogo Ivo wrote:
> Hello,
> 
> Commit 2541626cfb79 breaks GM20B probe with
> the following kernel log:

Just wondering: is anyone looking into this? The report was posted more
than a week ago and didn't even get a single reply yet afaics. This of
course can happen at this time of the year, but I nevertheless thought a
quick status inquiry might be a good idea at this point.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> [2.153892] [ cut here ]
> [2.153897] WARNING: CPU: 1 PID: 36 at 
> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:273 
> gf100_vmm_valid+0x2c4/0x390
> [2.153916] Modules linked in:
> [2.153922] CPU: 1 PID: 36 Comm: kworker/u8:1 Not tainted 6.1.0+ #1
> [2.153929] Hardware name: Google Pixel C (DT)
> [2.153933] Workqueue: events_unbound deferred_probe_work_func
> [2.153943] pstate: 8005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [2.153950] pc : gf100_vmm_valid+0x2c4/0x390
> [2.153959] lr : gf100_vmm_valid+0xb4/0x390
> [2.153966] sp : ffc009e134b0
> [2.153969] x29: ffc009e134b0 x28:  x27: 
> ffc008fd44c8
> [2.153979] x26: ffea x25: ffc0087b98d0 x24: 
> ff8080f89038
> [2.153987] x23: ff8081fadc08 x22:  x21: 
> 
> [2.153995] x20: ff8080f8a000 x19: ffc009e13678 x18: 
> 
> [2.154003] x17: f37a8b93418958e6 x16: ffc009f0d000 x15: 
> 
> [2.154011] x14: 0002 x13: 0003a020 x12: 
> ffc00800
> [2.154019] x11: 000102913000 x10:  x9 : 
> 
> [2.154026] x8 : ffc009e136d8 x7 : ffc008fd44c8 x6 : 
> ff80803d0f00
> [2.154034] x5 :  x4 : ff8080f88c00 x3 : 
> 0010
> [2.154041] x2 : 000c x1 : ffea x0 : 
> ffea
> [2.154050] Call trace:
> [2.154053]  gf100_vmm_valid+0x2c4/0x390
> [2.154061]  nvkm_vmm_map_valid+0xd4/0x204
> [2.154069]  nvkm_vmm_map_locked+0xa4/0x344
> [2.154076]  nvkm_vmm_map+0x50/0x84
> [2.154083]  nvkm_firmware_mem_map+0x84/0xc4
> [2.154094]  nvkm_falcon_fw_oneinit+0xc8/0x320
> [2.154101]  nvkm_acr_oneinit+0x428/0x5b0
> [2.154109]  nvkm_subdev_oneinit_+0x50/0x104
> [2.154114]  nvkm_subdev_init_+0x3c/0x12c
> [2.154119]  nvkm_subdev_init+0x60/0xa0
> [2.154125]  nvkm_device_init+0x14c/0x2a0
> [2.154133]  nvkm_udevice_init+0x60/0x9c
> [2.154140]  nvkm_object_init+0x48/0x1b0
> [2.154144]  nvkm_ioctl_new+0x168/0x254
> [2.154149]  nvkm_ioctl+0xd0/0x220
> [2.154153]  nvkm_client_ioctl+0x10/0x1c
> [2.154162]  nvif_object_ctor+0xf4/0x22c
> [2.154168]  nvif_device_ctor+0x28/0x70
> [2.154174]  nouveau_cli_init+0x150/0x590
> [2.154180]  nouveau_drm_device_init+0x60/0x2a0
> [2.154187]  nouveau_platform_device_create+0x90/0xd0
> [2.154193]  nouveau_platform_probe+0x3c/0x9c
> [2.154200]  platform_probe+0x68/0xc0
> [2.154207]  really_probe+0xbc/0x2dc
> [2.154211]  __driver_probe_device+0x78/0xe0
> [2.154216]  driver_probe_device+0xd8/0x160
> [2.154221]  __device_attach_driver+0xb8/0x134
> [2.154226]  bus_for_each_drv+0x78/0xd0
> [2.154230]  __device_attach+0x9c/0x1a0
> [2.154234]  device_initial_probe+0x14/0x20
> [2.154239]  bus_probe_device+0x98/0xa0
> [2.154243]  deferred_probe_work_func+0x88/0xc0
> [2.154247]  process_one_work+0x204/0x40c
> [2.154256]  worker_thread+0x230/0x450
> [2.154261]  kthread+0xc8/0xcc
> [2.154266]  ret_from_fork+0x10/0x20
> [2.154273] ---[ end trace  ]---
> [2.154278] nouveau 5700.gpu: pmu: map -22
> [2.154285] nouveau 5700.gpu: acr: one-time init failed, -22
> [2.154559] nouveau 5700.gpu: init failed with -22
> [2.154564] nouveau: DRM-master::0080: init failed with -22
> [2.154574] nouveau 5700.gpu: DRM-master: Device allocation failed: -22
> [2.162905] nouveau: probe of 5700.gpu failed with error -22
> 
> #regzbot introduced: 2541626cfb79
> 
> Thanks,
> 
> Diogo Ivo
> 
> 

#regzbot poke

Re: [6.2][regression] looks like commit aab9cf7b6954136f4339136a1a7fc0602a2c4d8b leads to use-after-free and random computer hangs

2022-12-22 Thread Thorsten Leemhuis

Hi, this is your Linux kernel regression tracker.

On 18.12.22 14:28, Mikhail Gavrilov wrote:
>
> The kernel 6.2 preparation cycle has begun.
> And after the kernel was updated on my Fedora Rawhide I started
> receiving use-after-free errors with complete computer hangs.
> At least a good reproducer of this behaviour is launch of the game
> "Marvel's Avengers".
> 
> The backtrace of the issue looks like:
> [...]

Thx for your report. I'm not one of the developers for this area of the
kernel, but to my untrained eyes it looks like this patch might fix your
problem:

https://lore.kernel.org/all/20221219104718.21677-1-christian.koe...@amd.com/

Anyway, to be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced aab9cf7b695413
#regzbot title drm: amdgpu: use-after-free and random computer hangs
#regzbot monitor:
https://lore.kernel.org/all/20221219104718.21677-1-christian.koe...@amd.com/
#regzbot fix: drm/amdgpu: grab extra fence reference for
drm_sched_job_add_dependency
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

Re: [PATCH] drm/vmwgfx: Fix passing partly uninitialized drm_mode_fb_cmd2 struct #forregzbot

2022-12-22 Thread Thorsten Leemhuis

[Note: this mail contains only information for Linux kernel regression
tracking. Mails like these contain '#forregzbot' in the subject to make
them easy to spot and filter out. The author also tried to remove most
or all individuals from the list of recipients to spare them the hassle.]

On 21.12.22 03:23, Kaiwan N Billimoria wrote:
> [REGRESSION] ?
>
> Testing with 6.1, I find the same issue - VirtualBox VMs seem to hang
> on boot, though the kernel has this patch applied of course...
> Am running VirtualBox 7.0.4 on an x86_64 Linux (Ubuntu 22.04.1) host;
> the system hangs on boot with the screen
> going blank.
> Passing 'nomodeset' via GRUB fixes it..

Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced v6.0..v6.1
#regzbot title dri: vmwgfx: boot hang in VirtualBox due to an oops
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

Re: [PATCH] drm/vmwgfx: Fix passing partly uninitialized drm_mode_fb_cmd2 struct

2022-12-21 Thread Thorsten Leemhuis

Hi, this is your Linux kernel regression tracker. The relevant code here
is not my area of expertise, nevertheless a few questions:

On 21.12.22 03:23, Kaiwan N Billimoria wrote:
> [REGRESSION] ?

> Testing with 6.1, I find the same issue - VirtualBox VMs seem to hang
> on boot, though the kernel has this patch applied of course...

Maybe I'm missing something, but what made you assume that it's the same
issue? The fix for that issue talked about "garbage" in some structures
that "can cause random failures during the bringup of the fbcon." Yeah,
maybe that can result in a hang, but I didn't see it in that thread (but
maybe I missed it).

> Am running VirtualBox 7.0.4 on an x86_64 Linux (Ubuntu 22.04.1) host;
> the system hangs on boot with the screen
> going blank.

A bit more details would be helpful. For example: is anything printed at
all before the system hangs? What's the last kernel that worked for you
(and is the newer kernel using a similar build configuration)? Which
graphics adapter did you configure in VirtualBox?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

Re: [PATCH v2.5] drm/msm/dsi: switch to DRM_PANEL_BRIDGE #forregzbot

2022-12-08 Thread Thorsten Leemhuis




On 13.11.22 11:23, Thorsten Leemhuis wrote:
> [Note: this mail is primarily sent for documentation purposes and/or for
> regzbot, my Linux kernel regression tracking bot. That's why I removed
> most or all folks from the list of recipients, but left any that looked
> like mailing lists. These mails usually contain '#forregzbot' in the
> subject, to make them easy to spot and filter out.]
> 
> [TLDR: I'm adding this regression report to the list of tracked
> regressions; all text from me you find below is based on a few template
> paragraphs you might have encountered already in similar form.]
> 
> Hi, this is your Linux kernel regression tracker.
> 
> On 11.11.22 16:30, Caleb Connolly wrote:
>>
>> This patch has caused a regression on 6.1-rc for some devices that use
>> DSI panels. The new behaviour results in the DSI controller being
>> switched off before the panel unprepare hook is called. As a result,
>> panel drivers which call mipi_dsi_dcs_write() or similar in unprepare()
>> fail.
>>
>> I've noticed it specifically on the OnePlus 6 (with upstream Samsung
>> s0fef00 panel driver) and the SHIFT6mq with an out of tree driver.
> 
> Thanks for the report. To be sure below issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced 007ac0262b0d
> #regzbot title drm: msm: DSI controller being switched off before the
> panel unprepare hook is called
> #regzbot ignore-activity

#regzbot inconclusive: reporter MIA

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot

2022-11-27 Thread Thorsten Leemhuis




On 20.11.22 18:25, Thorsten Leemhuis wrote:
> [Note: this mail is primarily sent for documentation purposes and/or for
> regzbot, my Linux kernel regression tracking bot. That's why I removed
> most or all folks from the list of recipients, but left any that looked
> like mailing lists. These mails usually contain '#forregzbot' in the
> subject, to make them easy to spot and filter out.]
> 
> On 14.11.22 14:22, Christian König wrote:
>>
>> I've found and fixed a few problems around the userptr handling which
>> might explain what you see here.
>>
>> A series of four patches starting with "drm/amdgpu: always register an
>> MMU notifier for userptr" is under review now.
> 
> #regzbot monitor:
> https://lore.kernel.org/all/20221115133853.7950-1-christian.koe...@amd.com/
> #regzbot fixed-by: fec8fdb54e8f

#regzbot fixed-by: 4458da0bb09d443595

Re: [PATCH v2.5] drm/msm/dsi: switch to DRM_PANEL_BRIDGE

2022-11-23 Thread Thorsten Leemhuis

Hi, this is your Linux kernel regression tracker.

On 13.11.22 14:28, Dmitry Baryshkov wrote:
> Hi Caleb,
> 
> On Fri, 11 Nov 2022 at 18:30, Caleb Connolly wrote:
>>
>> Hi,
>>
>> This patch has caused a regression on 6.1-rc for some devices that use
>> DSI panels. The new behaviour results in the DSI controller being
>> switched off before the panel unprepare hook is called. As a result,
>> panel drivers which call mipi_dsi_dcs_write() or similar in
>> unprepare() fail.
> 
> Thanks for the notice. Can you move your command stream to
> panel_disable() hook? (even if it's just as a temporary workaround)

Caleb, did you look into what Dmitry suggested? This issue is on my list
of tracked regressions in 6.1 and time is running out to get it fixed
before the release.

Or was there any progress to get this fixed and I just missed it?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot ignore-activity

> From what I see from other panels, some of them call
> mipi_dsi_dcs_set_display_off() in the unprepare() hook, while others
> do it in disable().
> 
> Yes, this is (again) the DSI host vs device order here. Short story:
> the DRM has a notion of 'the display pipe (i.e. clocks and timing
> signals) feeding the bridge being running'. That's the difference
> between enable/pre_enable and disable/post_disable. For the DSI we
> have a third state, when the DSI clock and ln0 allow transferring
> commands to the panel, but the image is not enabled.
> 
> There was a somewhat promising patchset at [1], but it seems it fell
> off the radar. I can try working on an alternative (explicit)
> approach if I have time.
> 
> With respect to your panel. Let me quote the docs: 'Before stopping
> video transmission from the display controller it can be necessary to
> turn off the panel to avoid visual glitches. This is done in the
> .disable() function. Analogously to .enable() this typically involves
> turning off the backlight and waiting for some time to make sure no
> image is visible on the panel. It is then safe for the display
> controller to cease transmission of video data.'
> 
> So, if we stop the call chain after switching the DSI host off but
> before calling the panel's unprepare() hook, will we see any
> artifacts/image leftover/etc. on the panel? Generally I have the
> feeling that all panels should call mipi_dsi_dcs_set_display_off() in
> the .disable() hook, not in the .unprepare() one.
> 
> [1] 
> https://lore.kernel.org/dri-devel/cover.1646406653.git.dave.steven...@raspberrypi.com/
> 
>>
>> I've noticed it specifically on the OnePlus 6 (with upstream Samsung
>> s0fef00 panel driver) and the SHIFT6mq with an out of tree driver.
>>
>> On 12/07/2022 14:22, Dmitry Baryshkov wrote:
>>> Currently the DSI driver has two separate paths: one if the next device
>>> in a chain is a bridge and another one if the panel is connected
>>> directly to the DSI host. Simplify the code path by using panel-bridge
>>> driver (already selected in Kconfig) and dropping support for
>>> handling the panel directly.
>>>
>>> Signed-off-by: Dmitry Baryshkov 
>>> ---
>>>
>>> I'm not sending this as a separate patchset (I'd like to sort out mdp5
>>> first), but more of a preview of changes related to
>>> msm_dsi_manager_ext_bridge_init().
>>>
>>> ---
>>>   drivers/gpu/drm/msm/dsi/dsi.c |  35 +---
>>>   drivers/gpu/drm/msm/dsi/dsi.h |  16 +-
>>>   drivers/gpu/drm/msm/dsi/dsi_host.c|  25 ---
>>>   drivers/gpu/drm/msm/dsi/dsi_manager.c | 283 +++---
>>>   4 files changed, 36 insertions(+), 323 deletions(-)
> 
> [skipped the patch itself]
>

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot

2022-11-20 Thread Thorsten Leemhuis

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

On 14.11.22 14:22, Christian König wrote:
> 
> I've found and fixed a few problems around the userptr handling which
> might explain what you see here.
> 
> A series of four patches starting with "drm/amdgpu: always register an
> MMU notifier for userptr" is under review now.

#regzbot monitor:
https://lore.kernel.org/all/20221115133853.7950-1-christian.koe...@amd.com/
#regzbot fixed-by: fec8fdb54e8f

Re: [regression] Bug 216475 - fbcon crashes during single gpu passthough reattachment to host #forregzbot

2022-11-15 Thread Thorsten Leemhuis

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

On 19.09.22 11:10, Thorsten Leemhuis wrote:
>
> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> kernel developers don't keep an eye on it, I decided to forward it by
> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216475 :
> [...]
> #regzbot introduced: 3647d6d3dbdafc55f8c4ca8225966963252abe7b
> https://bugzilla.kernel.org/show_bug.cgi?id=216475
> #regzbot ignore-activity

#regzbot invalid: reporter found workaround, quite special use case anyway

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-14 Thread Thorsten Leemhuis

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Christian, was any progress made to address this? It looks stalled for
10+ days, as I looked for posts and commits that referenced this report,
but couldn't find anything.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

On 02.11.22 14:43, Christian König wrote:
> Am 02.11.22 um 14:36 schrieb Mikhail Gavrilov:
>> On Tue, Nov 1, 2022 at 10:52 PM Christian König
>>  wrote:
>>> Let's focus on one problem at a time.
>>>
>>> The issue here is that somehow userptr handling became racy after we
>>> removed the lock, but I don't see why.
>>>
>>> We need to fix this ASAP since it is probably a much wider problem and
>>> the additional lock just hides it somehow.
>>>
>>> Going to provide you with an updated patch tomorrow.
>>>
>>> Thanks,
>>> Christian.
>> Recently sackboy has been updated and now the kernel log contains a
>> trace very similar to the one in the first post, even with the patch
>> applied.
>>
>> [  155.948044] [ cut here ]
>> [  155.948164] WARNING: CPU: 3 PID: 4850 at
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:678
>> amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
>> [  155.948342] Modules linked in: uinput rfcomm snd_seq_dummy
>> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
>> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
>> qrtr bnep intel_rapl_msr intel_rapl_common snd_hda_codec_realtek
>> snd_sof_amd_renoir snd_sof_amd_acp snd_hda_codec_generic
>> snd_hda_codec_hdmi snd_sof_pci sunrpc binfmt_misc snd_sof
>> snd_hda_intel snd_sof_utils snd_intel_dspcfg mt7921e
>> snd_intel_sdw_acpi snd_hda_codec mt7921_common snd_soc_core
>> edac_mce_amd mt76_connac_lib btusb snd_hda_core snd_compress snd_hwdep
>> mt76 btrtl ac97_bus kvm_amd snd_pcm_dmaengine btbcm snd_rpl_pci_acp6x
>> snd_pci_acp6x btintel mac80211 btmtk snd_seq snd_seq_device kvm
>> snd_pcm snd_pci_acp5x libarc4 bluetooth irqbypass vfat snd_timer
>> snd_rn_pci_acp3x fat rapl snd_acp_config asus_nb_wmi snd cfg80211
>> snd_soc_acpi wmi_bmof k10temp pcspkr
>> [  155.948436]  snd_pci_acp3x i2c_piix4 soundcore asus_wireless
>> amd_pmc joydev zram amdgpu drm_ttm_helper ttm crct10dif_pclmul
>> hid_asus crc32_pclmul asus_wmi crc32c_intel iommu_v2 ledtrig_audio
>> polyval_clmulni gpu_sched sparse_keymap polyval_generic
>> platform_profile drm_buddy drm_display_helper nvme rfkill
>> ghash_clmulni_intel hid_multitouch ucsi_acpi sha512_ssse3 nvme_core
>> typec_ucsi serio_raw sp5100_tco r8169 ccp cec nvme_common typec
>> i2c_hid_acpi i2c_hid video wmi ip6_tables ip_tables fuse
>> [  155.948540] CPU: 3 PID: 4850 Comm: Sackboy-Win64-T Tainted: G
>>   W    L    ---  ---
>> 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
>> [  155.948544] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
>> G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
>> [  155.948547] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190
>> [amdgpu]
>> [  155.948748] Code: 9e f1 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 a8
>> a3 fd c0 48 c7 c7 88 81 1e c1 e8 af 97 ea f1 eb 8e 66 90 bd f2 ff ff
>> ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff
>> ff 48
>> [  155.948751] RSP: 0018:960b544d3a50 EFLAGS: 00010282
>> [  155.948756] RAX: 8a4e40d44e00 RBX: 8a4f0e564140 RCX:
>> 0001
>> [  155.948759] RDX:  RSI: 8a4e40d44e00 RDI:
>> 8a4f4b52b400
>> [  155.948761] RBP: 8a4e8c979000 R08: 0dc0 R09:
>> 
>> [  155.948764] R10: 0001 R11:  R12:
>> 8a4e8aaad558
>> [  155.948767] R13: 3b91 R14: 8a4f0e667180 R15:
>> 8a4f4b52b458
>> [  155.948770] FS:  7fa13fe006c0() GS:8a5d16e0()
>> knlGS:36f8
>> [  155.948772] CS:  0010 DS:  ES:  CR0: 80050033
>> [  155.948775] CR2: 25c9e1d0 CR3: 00036199 CR4:
>> 00750ee0
>> [  155.948778] PKRU: 5554
>> [  155.948780] Call Trace:
>> [  155.948783]  
>> [  155.948790]  amdgpu_cs_ioctl+0x9fd/0x2030 [amdgpu]
>> [  155.948992]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>> [  155.949155]  drm_ioctl_kernel+0xac/0x160
>> [  155.949165]  drm_ioctl+0x1e7/0x450
>> [  155.949172]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>> [  155.949344]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
>> [  155.949528]  __x64_sys_ioctl+0x90/0xd0
>> [  155.949537]  do_syscall_64+0x5b/0x80
>> [  155.949547]  ? lock_is_held_type+0xe8/0x140
>> [  155.949559]

Re: [PATCH v2.5] drm/msm/dsi: switch to DRM_PANEL_BRIDGE #forregzbot

2022-11-13 Thread Thorsten Leemhuis

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker.

On 11.11.22 16:30, Caleb Connolly wrote:
> 
> This patch has caused a regression on 6.1-rc for some devices that use
> DSI panels. The new behaviour results in the DSI controller being
> switched off before the panel unprepare hook is called. As a result,
> panel drivers which call mipi_dsi_dcs_write() or similar in unprepare()
> fail.
> 
> I've noticed it specifically on the OnePlus 6 (with upstream Samsung
> s0fef00 panel driver) and the SHIFT6mq with an out of tree driver.

Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced 007ac0262b0d
#regzbot title drm: msm: DSI controller being switched off before the
panel unprepare hook is called
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally while also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained
in the Linux kernel's documentation; the above webpage explains why this
is important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

> On 12/07/2022 14:22, Dmitry Baryshkov wrote:
>> Currently the DSI driver has two separate paths: one if the next device
>> in a chain is a bridge and another one if the panel is connected
>> directly to the DSI host. Simplify the code path by using panel-bridge
>> driver (already selected in Kconfig) and dropping support for
>> handling the panel directly.
>>
>> Signed-off-by: Dmitry Baryshkov 
>> ---
>>
>> I'm not sending this as a separate patchset (I'd like to sort out mdp5
>> first), but more of a preview of changes related to
>> msm_dsi_manager_ext_bridge_init().
>>
>> ---
>>   drivers/gpu/drm/msm/dsi/dsi.c |  35 +---
>>   drivers/gpu/drm/msm/dsi/dsi.h |  16 +-
>>   drivers/gpu/drm/msm/dsi/dsi_host.c    |  25 ---
>>   drivers/gpu/drm/msm/dsi/dsi_manager.c | 283 +++---
>>   4 files changed, 36 insertions(+), 323 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/dsi/dsi.c
>> b/drivers/gpu/drm/msm/dsi/dsi.c
>> index 1625328fa430..4edb9167e600 100644
>> --- a/drivers/gpu/drm/msm/dsi/dsi.c
>> +++ b/drivers/gpu/drm/msm/dsi/dsi.c
>> @@ -6,14 +6,6 @@
>>   #include "dsi.h"
>>   #include "dsi_cfg.h"
>>
>> -struct drm_encoder *msm_dsi_get_encoder(struct msm_dsi *msm_dsi)
>> -{
>> -    if (!msm_dsi || !msm_dsi_device_connected(msm_dsi))
>> -    return NULL;
>> -
>> -    return msm_dsi->encoder;
>> -}
>> -
>>   bool msm_dsi_is_cmd_mode(struct msm_dsi *msm_dsi)
>>   {
>>   unsigned long host_flags =
>> msm_dsi_host_get_mode_flags(msm_dsi->host);
>> @@ -220,7 +212,6 @@ int msm_dsi_modeset_init(struct msm_dsi *msm_dsi,
>> struct drm_device *dev,
>>    struct drm_encoder *encoder)
>>   {
>>   struct msm_drm_private *priv;
>> -    struct drm_bridge *ext_bridge;
>>   int ret;
>>
>>   if (WARN_ON(!encoder) || WARN_ON(!msm_dsi) || WARN_ON(!dev))
>> @@ -254,26 +245,10 @@ int msm_dsi_modeset_init(struct msm_dsi
>> *msm_dsi, struct drm_device *dev,
>>   goto fail;
>>   }
>>
>> -    /*
>> - * check if the dsi encoder output is connected to a panel or an
>> - * external bridge. We create a connector only if we're connected
>> to a
>> - * drm_panel device. When we're connected to an external bridge, we
>> - * assume that the drm_bridge driver will create the connector
>> itself.
>> - */
>> -    ext_bridge = msm_dsi_host_get_bridge(msm_dsi->host);
>> -
>> -    if (ext_bridge)
>> -    msm_dsi->connector =
>> -    msm_dsi_manager_ext_bridge_init(msm_dsi->id);
>> -    else
>> -    msm_dsi->connector =
>> -    msm_dsi_manager_connector_init(msm_dsi->id);
>> -
>> -    if

Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"

2022-10-24 Thread Thorsten Leemhuis

Hi! Thx for the reply.

On 24.10.22 12:26, Thomas Zimmermann wrote:
> Am 23.10.22 um 10:04 schrieb Thorsten Leemhuis:
>>
>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>> kernel developers don't keep an eye on it, I decided to forward it by
>> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>
>>>   Andreas 2022-10-22 14:25:32 UTC
>>>
>>> Created attachment 303074 [details]
>>> dmesg
> 
> I've looked at the kernel log and found that simpledrm has been loaded
> *after* amdgpu, which should never happen. The problematic patch has
> been taken from a long list of refactoring work on this code. No wonder
> that it doesn't work as expected.
> 
> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
> report on the results. It should fix the problem.

Greg, is that enough for you to pick this up? Or do you want Andreas to
test first if it really fixes the reported problem?

Ciao, Thorsten


>>> 6.0.2 works.
>>>
>>> On 6.0.3 the system is very sluggish with graphic glitches all over
>>> the place in KDE Plasma Desktop X11 (no graphic glitches when using
>>> Wayland, but also sluggish). SDDM works fine.
>>>
>>> Hardware: Lenovo Legion 5 Pro 16ACH6H: AMD Ryzen 7 5800H "Cezanne",
>>> hybrid graphics AMD "Green Sardine" (Vega 8 GCN 5.1, AMDGPU) and
>>> Nvidia GeForce RTX 3070 Mobile (GA104M, not working with nouveau, I'm
>>> not using the proprietary nvidia driver).
>>>
>>> [reply] [−] Comment 1 Andreas 2022-10-22 14:27:15 UTC
>>>
>>> Created attachment 303075 [details]
>>> my kernel .config for 6.0.3
>>>
>>> Only was CONFIG_HID_TOPRE added in 6.0.3, otherwise it is identical
>>> as my .config for 6.0.2.
>>>
>>> [reply] [−] Comment 2 Andreas 2022-10-22 14:51:23 UTC
>>>
>>> In /var/log/Xorg.0.log the only obvious difference is the last line:
>>>  snap
>>> randr: falling back to unsynchronized pixmap sharing
>>>  snap
>>> The line is present when I boot with 6.0.3, but isn't when I boot 6.0.2.
>>>
>>> (Obviously this is when I login to KDE with X11, not with Wayland,
>>> from SDDM.)
>>>
>>> [reply] [−] Comment 3 Andreas 2022-10-22 22:10:19 UTC
>>>
>>> I did a git bisect on stable kernels 5.0.3 as bad and 5.0.2 as good,
>>> this is the result:
>>>
>>> cfecfc98a78d97a49807531b5b224459bda877de is the first bad commit
>>> commit cfecfc98a78d97a49807531b5b224459bda877de (HEAD, refs/bisect/bad)
>>> Author: Thomas Zimmermann 
>>> Date:   Mon Jul 18 09:23:18 2022 +0200
>>>
>>>  video/aperture: Disable and unregister sysfb devices via
>>> aperture helpers
>>>   [ Upstream commit 5e01376124309b4dbd30d413f43c0d9c2f60edea ]
>>>   Call sysfb_disable() before removing conflicting devices in
>>> aperture
>>>  helpers. Fixes sysfb state if fbdev has been disabled.
>>>   Signed-off-by: Thomas Zimmermann 
>>>  Reviewed-by: Javier Martinez Canillas 
>>>  Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before
>>> internal helpers")
>>>
>>> [reply] [−] Comment 4 Andreas 2022-10-22 22:11:51 UTC
>>>
>>> Link to the suspect patch:
>>>
>>> https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-8-tzimmerm...@suse.de
>>> (or https://patchwork.freedesktop.org/patch/494608/)
>>>
>>> [reply] [−] Comment 5 Andreas 2022-10-22 22:38:14 UTC
>>>
>>> Okay, so I reverted
>>> v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
>>>  on stable 5.0.3 and the fault is gone.
>>>
>>> I always logged out immediately, which worked (even though everything
>>> is very very sluggish). Also, when I killed the X session within a
>>> couple of seconds (15 or so), no error was shown (I used "systemctl
>>> stop sddm" from another virtual console).
>>>
>>> Noteworthy: I once compiled a kernel from within the Plasma Desktop,
>>> while it was sluggish. The kernel compiled alright. When it was
>>> finished I moved the mouse to reboot, at which point it completely
>>> froze and I had to hard-reset the system.
>>>
>>> While still running, after > 15 seconds, the fault looked like this
>>> (dmesg):
>>>  snap 

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot

2022-10-23 Thread Thorsten Leemhuis

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing list. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have already encountered in similar form.]

Hi, this is your Linux kernel regression tracker. CCing the regression
mailing list, as it should be in the loop for all regressions, as
explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 21.10.22 10:08, Mikhail Gavrilov wrote:
> Hi!
> I found that some games (Cyberpunk 2077, Forza Horizon 4/5) hang at
> start after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6.

Thanks for the report. To be sure the issue below doesn't fall through
the cracks unnoticed, I'm adding it to regzbot, my Linux kernel
regression tracking bot:

#regzbot ^introduced dd80d9c8eecac
#regzbot title drm: amdgpu: some games (Cyberpunk 2077, Forza Horizon
4/5) hang at start
#regzbot ignore-activity
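
For readers unfamiliar with the '#regzbot' lines above, here is a minimal
Python sketch of how such commands could be pulled out of a mail body. It
is not regzbot's actual parser: the function name parse_regzbot_commands
and the rule that a wrapped line continues the previous command are
assumptions made purely for illustration.

```python
def parse_regzbot_commands(body):
    """Return (command, argument) pairs for each '#regzbot' line in a mail body.

    Hypothetical helper, not part of regzbot. Assumption: a non-empty,
    non-quoted line directly after a command is a wrapped continuation of
    that command's argument (as with the 'title' line in the mail above).
    """
    commands = []
    open_command = False
    for line in body.splitlines():
        text = line.strip()
        parts = text.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#regzbot":
            # A trailing colon ("fixed-by:") is treated as optional.
            name = parts[1].rstrip(":")
            argument = parts[2] if len(parts) == 3 else ""
            commands.append([name, argument])
            open_command = True
        elif open_command and text and not text.startswith(">"):
            # Wrapped line: append it to the argument of the last command.
            commands[-1][1] = (commands[-1][1] + " " + text).strip()
        else:
            # Blank or quoted line ends the current command.
            open_command = False
    return [tuple(command) for command in commands]


if __name__ == "__main__":
    mail = (
        "#regzbot ^introduced dd80d9c8eecac\n"
        "#regzbot title drm: amdgpu: some games (Cyberpunk 2077, Forza Horizon\n"
        "4/5) hang at start\n"
        "#regzbot ignore-activity\n"
    )
    for name, argument in parse_regzbot_commands(mail):
        print(name, "|", argument)
```

The continuation rule is only there because mail clients wrap long titles;
a real tool may well handle wrapping differently.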

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally while also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained in
the Linux kernel's documentation; the above webpage explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.

> dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit
> commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6
> Author: Christian König 
> Date:   Thu Jul 14 10:23:38 2022 +0200
> 
> drm/amdgpu: revert "partial revert "remove ctx->lock" v2"
> 
> This reverts commit 94f4c4965e5513ba624488f4b601d6b385635aec.
> 
> We found that the bo_list is missing a protection for its list entries.
> Since that is fixed now this workaround can be removed again.
> 
> Signed-off-by: Christian König 
> Reviewed-by: Alex Deucher 
> Signed-off-by: Alex Deucher 
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
>  3 files changed, 6 insertions(+), 18 deletions(-)
> 
> 
> And when it happens, a backtrace like this appears in the kernel log:
> [  231.331210] [ cut here ]
> [  231.331262] WARNING: CPU: 11 PID: 6555 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:675
> amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  231.331424] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep intel_rapl_msr intel_rapl_common snd_sof_amd_renoir
> snd_sof_amd_acp snd_sof_pci snd_hda_codec_realtek snd_sof
> snd_hda_codec_generic snd_hda_codec_hdmi snd_sof_utils mt7921e
> snd_hda_intel sunrpc snd_intel_dspcfg mt7921_common binfmt_misc
> snd_intel_sdw_acpi snd_hda_codec mt76_connac_lib edac_mce_amd btusb
> snd_soc_core mt76 snd_hda_core btrtl snd_hwdep snd_compress kvm_amd
> ac97_bus snd_seq btbcm snd_pcm_dmaengine btintel snd_rpl_pci_acp6x
> mac80211 btmtk snd_pci_acp6x kvm snd_seq_device snd_pcm snd_pci_acp5x
> libarc4 irqbypass bluetooth snd_rn_pci_acp3x snd_timer pcspkr
> asus_nb_wmi rapl joydev wmi_bmof snd_acp_config cfg80211 snd_soc_acpi
> vfat snd
> [  231.331490]  snd_pci_acp3x i2c_piix4 soundcore fat k10temp amd_pmc
> asus_wireless zram amdgpu drm_ttm_helper ttm hid_asus asus_wmi
> iommu_v2 crct10dif_pclmul crc32_pclmul gpu_sched crc32c_intel
> ledtrig_audio sparse_keymap polyval_clmulni platform_profile drm_buddy
> polyval_generic hid_multitouch drm_display_helper rfkill nvme
> ucsi_acpi ghash_clmulni_intel nvme_core video typec_ucsi serio_raw ccp
> sha512_ssse3 sp5100_tco r8169 cec nvme_common typec wmi i2c_hid_acpi
> i2c_hid ip6_tables ip_tables fuse
> [  231.331532] CPU: 11 PID: 6555 Comm: GameThread Tainted: GW
>   L---  ---
>

[Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"

2022-10-23 Thread Thorsten Leemhuis

Hi, this is your Linux kernel regression tracker speaking.

I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616  :

>  Andreas 2022-10-22 14:25:32 UTC
> 
> Created attachment 303074 [details]
> dmesg
> 
> 6.0.2 works.
> 
> On 6.0.3 the system is very sluggish with graphic glitches all over the place 
> in KDE Plasma Desktop X11 (no graphic glitches when using Wayland, but also 
> sluggish). SDDM works fine.
> 
> Hardware: Lenovo Legion 5 Pro 16ACH6H: AMD Ryzen 7 5800H "Cezanne", hybrid 
> graphics AMD "Green Sardine" (Vega 8 GCN 5.1, AMDGPU) and Nvidia GeForce RTX 
> 3070 Mobile (GA104M, not working with nouveau, I'm not using the proprietary 
> nvidia driver).
> 
> [reply] [−] Comment 1 Andreas 2022-10-22 14:27:15 UTC
> 
> Created attachment 303075 [details]
> my kernel .config for 6.0.3
> 
> Only was CONFIG_HID_TOPRE added in 6.0.3, otherwise it is identical as my 
> .config for 6.0.2.
> 
> [reply] [−] Comment 2 Andreas 2022-10-22 14:51:23 UTC
> 
> In /var/log/Xorg.0.log the only obvious difference is the last line:
>  snap
> randr: falling back to unsynchronized pixmap sharing
>  snap
> The line is present when I boot with 6.0.3, but isn't when I boot 6.0.2.
> 
> (Obviously this is when I login to KDE with X11, not with Wayland, from SDDM.)
> 
> [reply] [−] Comment 3 Andreas 2022-10-22 22:10:19 UTC
> 
> I did a git bisect on stable kernels 5.0.3 as bad and 5.0.2 as good, this is 
> the result:
> 
> cfecfc98a78d97a49807531b5b224459bda877de is the first bad commit
> commit cfecfc98a78d97a49807531b5b224459bda877de (HEAD, refs/bisect/bad)
> Author: Thomas Zimmermann 
> Date:   Mon Jul 18 09:23:18 2022 +0200
> 
> video/aperture: Disable and unregister sysfb devices via aperture helpers
> 
> [ Upstream commit 5e01376124309b4dbd30d413f43c0d9c2f60edea ]
> 
> Call sysfb_disable() before removing conflicting devices in aperture
> helpers. Fixes sysfb state if fbdev has been disabled.
> 
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Javier Martinez Canillas 
> Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before internal 
> helpers")
> 
> [reply] [−] Comment 4 Andreas 2022-10-22 22:11:51 UTC
> 
> Link to the suspect patch:
> 
> https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-8-tzimmerm...@suse.de
> (or https://patchwork.freedesktop.org/patch/494608/)
> 
> [reply] [−] Comment 5 Andreas 2022-10-22 22:38:14 UTC
> 
> Okay, so I reverted 
> v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
>  on stable 5.0.3 and the fault is gone.
> 
> I always logged out immediately, which worked (even though everything is very 
> very sluggish). Also, when I killed the X session within a couple of seconds 
> (15 or so), no error was shown (I used "systemctl stop sddm" from another 
> virtual console).
> 
> Noteworthy: I once compiled a kernel from within the Plasma Desktop, while it 
> was sluggish. The kernel compiled alright. When it was finished I moved the 
> mouse to reboot, at which point it completely froze and I had to hard-reset 
> the system.
> 
> While still running, after > 15 seconds, the fault looked like this (dmesg):
>  snap 
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13- } 7 
> jiffies s: 165 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X   state:R  running task stack:0 pid: 4242 ppid:  
> 4228 flags:0x0008
> Call Trace:
>  
>  ? commit_tail+0xd7/0x130
>  ? drm_atomic_helper_commit+0x126/0x150
>  ? drm_atomic_commit+0xa4/0xe0
>  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>  ? drm_atomic_helper_dirtyfb+0x19e/0x280
>  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? drm_ioctl_kernel+0xc4/0x150
>  ? drm_ioctl+0x246/0x3f0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? __x64_sys_ioctl+0x91/0xd0
>  ? do_syscall_64+0x60/0xd0
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>  
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13- } 29 
> jiffies s: 165 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X   state:R  running task stack:0 pid: 4242 ppid:  
> 4228 flags:0x0008
> Call Trace:
>  
>  ? commit_tail+0xd7/0x130
>  ? drm_atomic_helper_commit+0x126/0x150
>  ? drm_atomic_commit+0xa4/0xe0
>  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>  ? drm_atomic_helper_dirtyfb+0x19e/0x280
>  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? drm_ioctl_kernel+0xc4/0x150
>  ? drm_ioctl+0x246/0x3f0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? __x64_sys_ioctl+0x91/0xd0
>  ? do_syscall_64+0x60/0xd0
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>  
> rcu: INFO: rcu_sched detected expedited stalls on

Re: [PATCH] drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()

2022-10-11 Thread Thorsten Leemhuis

[removed a lot of people from the list of recipients, as this is mainly
for Guenter]

Hi Guenter!

On 06.10.22 19:23, Guenter Roeck wrote:
> On Wed, Oct 05, 2022 at 11:46:15PM -0700, Guenter Roeck wrote:
>> On Tue, Sep 27, 2022 at 03:12:00PM -0400, Hamza Mahfooz wrote:
>>> Address the following error:
>>> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c: In function 
>>> ‘dc_stream_remove_writeback’:
>>> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:55: error: 
>>> array subscript [0, 0] is outside array bounds of ‘struct 
>>> dc_writeback_info[1]’ [-Werror=array-bounds]
>>>   527 | stream->writeback_info[j] = 
>>> stream->writeback_info[i];
>>>   | ~~^~~
>>> In file included from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dc.h:1269,
>>>  from 
>>> ./drivers/gpu/drm/amd/amdgpu/../display/dc/inc/core_types.h:29,
>>>  from 
>>> ./drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dc_common.h:29,
>>>  from 
>>> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:27:
>>> ./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_stream.h:241:34: note: while 
>>> referencing ‘writeback_info’
>>>   241 | struct dc_writeback_info writeback_info[MAX_DWB_PIPES];
>>>   |
>>>
>>> Currently, we aren't checking to see if j remains within
>>> writeback_info[]'s bounds. So, add a check to make sure that we aren't
>>> overflowing the buffer.
>>>
>>> Signed-off-by: Hamza Mahfooz 
>>
>> With gcc 11.3, this patch doesn't fix a problem, it introduces one.
>>
>> Building csky:allmodconfig ... failed
>> --
>> Error log:
>> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c: In function 
>> 'dc_stream_remove_writeback':
>> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:83: error: 
>> array subscript 1 is above array bounds of 'struct dc_writeback_info[1]' 
>> [-Werror=array-bounds]
>>   527 | stream->writeback_info[j] = 
>> stream->writeback_info[i];
>
> [...]
>
> #regzbot introduced: 5d8c3e836fc2

Thx for using regzbot, much appreciated. JFYI, the initial report was
your own mail you were replying to here, so a "#regzbot ^introduced:
..."  would have been more appropriate. In this case it didn't matter,
as the fix didn't include a "Link:" tag to the initial report anyway.
No worries, I just have to tell regzbot about the fix manually then:

#regzbot fixed-by: faf4d8e07f5b67

Ciao, Thorsten
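
The quoted patch description talks about keeping the destination index j
inside writeback_info[] while disabled entries are dropped. The Python
sketch below models that kind of in-place compaction on a fixed-size
list. It is neither the kernel code nor the actual fix: the function name
compact_enabled, the dict layout, and the "wb_enabled" key are
assumptions chosen for illustration.

```python
def compact_enabled(entries, count):
    """Move enabled entries to the front of a fixed-size list, in place.

    Illustrative model only. 'entries' stands in for a fixed-size array and
    'count' for the number of valid slots. Returns the new valid count.
    """
    capacity = len(entries)
    if count > capacity:
        # Refuse to read past the end of the array.
        raise ValueError("count exceeds array capacity")

    j = 0  # next free destination slot
    for i in range(count):
        if entries[i]["wb_enabled"]:
            if j < i:
                # j < i < capacity, so the write stays within bounds.
                entries[j] = entries[i]
            j += 1
    return j


if __name__ == "__main__":
    pipes = [
        {"id": 0, "wb_enabled": False},
        {"id": 1, "wb_enabled": True},
        {"id": 2, "wb_enabled": True},
    ]
    kept = compact_enabled(pipes, len(pipes))
    print(kept, [p["id"] for p in pipes[:kept]])
```

Because j never runs ahead of i, bounding i by the array capacity is
enough to bound j as well; the compiler warnings quoted above are about
whether the C code makes that relationship visible to gcc.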
