from:"\"Brian Norris\""

Re: [NOT A REGRESSION] firmware: framebuffer-coreboot: duplicate device name "simple-framebuffer.0"

2024-09-13 Thread Brian Norris

Hi Javier,

On Thu, Sep 12, 2024 at 06:33:58PM +0200, Javier Martinez Canillas wrote:
> That's a very good point. I'm actually not familiar with Coreboot and I
> used an educated guess (in the case of DT for example, that's the main
> source of truth and I didn't know if a Core table was in a similar vein).
> 
> Maybe something like the following (untested) patch then?

Julius is more familiar with the Coreboot + payload ecosystem than me,
but his explanations make sense to me, as does this patch.

> From de1c32017006f4671d91b695f4d6b4e99c073ab2 Mon Sep 17 00:00:00 2001
> From: Javier Martinez Canillas 
> Date: Thu, 12 Sep 2024 18:31:55 +0200
> Subject: [PATCH] firmware: coreboot: Don't register a pdev if screen_info data
>  is available
> 
> On Coreboot platforms, a system framebuffer may be provided to the Linux
> kernel by filling a LB_TAG_FRAMEBUFFER entry in the Coreboot table. But
> a Coreboot payload (e.g: SeaBIOS) could also provide this information to
> the Linux kernel.
> 
> If that the case, early arch x86 boot code will fill the global struct
> screen_info data and that data used by the Generic System Framebuffers
> (sysfb) framework to add a platform device with platform data about the
> system framebuffer.

Normally, these sorts of "early" and "later" ordering descriptions would
set alarm bells when talking about independent drivers. But I suppose
the "early arch" code has better ordering guaranteeds than drivers, so
this should be fine.

> But later then the framebuffer_coreboot driver will try to do the same
> framebuffer (using the information from the Coreboot table), which will
> lead to an error due a simple-framebuffer.0 device already registered:
> 
> sysfs: cannot create duplicate filename 
> '/bus/platform/devices/simple-framebuffer.0'
> ...
> coreboot: could not register framebuffer
> framebuffer coreboot8: probe with driver framebuffer failed with error -17
> 
> To prevent the issue, make the framebuffer_core driver to not register a
> platform device if the global struct screen_info data has been filled.
> 
> Reported-by: Brian Norris 
> Link: 
> https://lore.kernel.org/all/ZuCG-DggNThuF4pj@b20ea791c01f/T/#ma7fb65acbc1a56042258adac910992bb225a20d2
> Suggested-by: Julius Werner 
> Signed-off-by: Javier Martinez Canillas 
> ---
>  drivers/firmware/google/framebuffer-coreboot.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/firmware/google/framebuffer-coreboot.c 
> b/drivers/firmware/google/framebuffer-coreboot.c
> index daadd71d8ddd..4e50da17cd7e 100644
> --- a/drivers/firmware/google/framebuffer-coreboot.c
> +++ b/drivers/firmware/google/framebuffer-coreboot.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "coreboot_table.h"
>  
> @@ -27,6 +28,7 @@ static int framebuffer_probe(struct coreboot_device *dev)
>   int i;
>   u32 length;
>   struct lb_framebuffer *fb = &dev->framebuffer;
> + struct screen_info *si = &screen_info;
>   struct platform_device *pdev;
>   struct resource res;
>   struct simplefb_platform_data pdata = {
> @@ -36,6 +38,20 @@ static int framebuffer_probe(struct coreboot_device *dev)
>   .format = NULL,
>   };
>  
> + /*
> +  * If the global screen_info data has been filled, the Generic
> +  * System Framebuffers (sysfb) will already register a platform

Did you mean 'platform_device'?

> +  * and pass the screen_info as platform_data to a driver that
> +  * could scan-out using the system provided framebuffer.
> +  *
> +  * On Coreboot systems, the advertise LB_TAG_FRAMEBUFFER entry

s/advertise/advertised/ ?

> +  * in the Coreboot table should only be used if the payload did
> +  * not set video mode info and passed it to the Linux kernel.

s/passed/pass/

> +  */
> + if (si->orig_video_isVGA == VIDEO_TYPE_VLFB ||
> +si->orig_video_isVGA == VIDEO_TYPE_EFI)

This line is using spaces for indentation. It should use a tab, and then
spaces for alignment. But presumably this will change based on Thomas's
suggestions anyway.

> + return -EINVAL;

Is EINVAL right? IIUC, that will print a noisier error to the logs. I
believe the "expected" sorts of return codes are ENODEV or ENXIO. (See
call_driver_probe().) ENODEV seems like a fine choice, similar to
several of the other return codes already used here.

Anyway, this seems along the right track. Thanks for tackling, and feel
free to carry a:

Reviewed-by: Brian Norris 

> +
>   if (!fb->physical_address)
>   return -ENODEV;
>  
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat
>

[NOT A REGRESSION] firmware: framebuffer-coreboot: duplicate device name "simple-framebuffer.0"

2024-09-10 Thread Brian Norris

(Tweaking subject; this indeed isn't related to the regression at all)

Hi,

On Mon, Sep 09, 2024 at 10:02:00AM +0200, Borislav Petkov wrote:
> Looking at your log, the first warn is in framebuffer_coreboot. Some mess in
> the sysfs platform devices registration.
> 
> Adding the relevant people for that:
> 
> Aug 20 20:29:36 luna kernel: sysfs: cannot create duplicate filename 
> '/bus/platform/devices/simple-framebuffer.0'
> Aug 20 20:29:36 luna kernel: CPU: 5 PID: 571 Comm: (udev-worker) Tainted: G   
> OE  6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
> Aug 20 20:29:36 luna kernel: Hardware name: Purism Librem 14/Librem 14, BIOS 
> 4.14-Purism-1 06/18/2021
> Aug 20 20:29:36 luna kernel: Call Trace:
> Aug 20 20:29:36 luna kernel:  
> Aug 20 20:29:36 luna kernel:  dump_stack_lvl+0x5d/0x80
> Aug 20 20:29:36 luna kernel:  sysfs_warn_dup.cold+0x17/0x23
> Aug 20 20:29:36 luna kernel:  sysfs_do_create_link_sd+0xcf/0xe0
> Aug 20 20:29:36 luna kernel:  bus_add_device+0x6b/0x130
> Aug 20 20:29:36 luna kernel:  device_add+0x3b3/0x870
> Aug 20 20:29:36 luna kernel:  platform_device_add+0xed/0x250
> Aug 20 20:29:36 luna kernel:  platform_device_register_full+0xbb/0x140
> Aug 20 20:29:36 luna kernel:  
> platform_device_register_resndata.constprop.0+0x54/0x80 [framebuffer_coreboot 
> a587d2fc243ebaa0205c3badd33442a004d284e0]
> Aug 20 20:29:36 luna kernel:  framebuffer_probe+0x165/0x1b0 
> [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
> Aug 20 20:29:36 luna kernel:  really_probe+0xdb/0x340
> Aug 20 20:29:36 luna kernel:  ? pm_runtime_barrier+0x54/0x90
> Aug 20 20:29:36 luna kernel:  ? __pfx___driver_attach+0x10/0x10
> Aug 20 20:29:36 luna kernel:  __driver_probe_device+0x78/0x110
> Aug 20 20:29:36 luna kernel:  driver_probe_device+0x1f/0xa0
> Aug 20 20:29:36 luna kernel:  __driver_attach+0xba/0x1c0
> Aug 20 20:29:36 luna kernel:  bus_for_each_dev+0x8c/0xe0
> Aug 20 20:29:36 luna kernel:  bus_add_driver+0x112/0x1f0
> Aug 20 20:29:36 luna kernel:  driver_register+0x72/0xd0
> Aug 20 20:29:36 luna kernel:  ? __pfx_framebuffer_driver_init+0x10/0x10 
> [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
> Aug 20 20:29:36 luna kernel:  do_one_initcall+0x58/0x310
> Aug 20 20:29:36 luna kernel:  do_init_module+0x60/0x220
> Aug 20 20:29:36 luna kernel:  init_module_from_file+0x89/0xe0
> Aug 20 20:29:36 luna kernel:  idempotent_init_module+0x121/0x320
> Aug 20 20:29:36 luna kernel:  __x64_sys_finit_module+0x5e/0xb0
> Aug 20 20:29:36 luna kernel:  do_syscall_64+0x82/0x190
> Aug 20 20:29:36 luna kernel:  ? __do_sys_newfstatat+0x3c/0x80
> Aug 20 20:29:36 luna kernel:  ? syscall_exit_to_user_mode+0x72/0x200
> Aug 20 20:29:36 luna kernel:  ? do_syscall_64+0x8e/0x190
> Aug 20 20:29:36 luna kernel:  ? do_sys_openat2+0x9c/0xe0
> Aug 20 20:29:36 luna kernel:  ? syscall_exit_to_user_mode+0x72/0x200
> Aug 20 20:29:36 luna kernel:  ? do_syscall_64+0x8e/0x190
> Aug 20 20:29:36 luna kernel:  ? clear_bhb_loop+0x25/0x80
> Aug 20 20:29:36 luna kernel:  ? clear_bhb_loop+0x25/0x80
> Aug 20 20:29:36 luna kernel:  ? clear_bhb_loop+0x25/0x80
> Aug 20 20:29:36 luna kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Aug 20 20:29:36 luna kernel: RIP: 0033:0x7b1bee2f81fd

Looks like it might be a conflict with
drivers/firmware/sysfb_simplefb.c, which also uses the
"simple-framebuffer" name with a constant ID of 0. It's possible both
drivers should be switched to use PLATFORM_DEVID_AUTO? Or at least one
of them. Or they should use different base names.

I'm not really sure what the best option is (does anyone rely on or care
about the device naming?), and I don't actually use this driver. But
here's an untested diff to try if you'd really like. If you test it,
feel free to submit as a proper patch with my:

Signed-off-by: Brian Norris 

diff --git a/drivers/firmware/google/framebuffer-coreboot.c 
b/drivers/firmware/google/framebuffer-coreboot.c
index daadd71d8ddd..3f1b8f664c3f 100644
--- a/drivers/firmware/google/framebuffer-coreboot.c
+++ b/drivers/firmware/google/framebuffer-coreboot.c
@@ -62,7 +62,8 @@ static int framebuffer_probe(struct coreboot_device *dev)
return -EINVAL;
 
pdev = platform_device_register_resndata(&dev->dev,
-"simple-framebuffer", 0,
+"simple-framebuffer",
+PLATFORM_DEVID_AUTO,
 &res, 1, &pdata,
 sizeof(pdata));
if (IS_ERR(pdev))

Re: [PATCH 1/1] drm/bridge: analogix_dp: add a quirk for Bob panel

2023-02-14 Thread Brian Norris

Hi,

You seem to have sent this twice, perhaps to adjust the To/CC list. I
think I've picked the right one to reply to, but it's usually nice to
use a "v2" notation or otherwise put a comment somewhere in the email.

On Wed, Feb 08, 2023 at 12:44:06PM +0800, Kencp huang wrote:
> From: zain wang 
> 
> Some panels' DP_PSR_STATUS (DPCD 0x2008) may be unstable when we
> enable psr. If we get the unexpected psr state, We need try the
> debounce to ensure the panel was in PSR
> 
> Signed-off-by: zain wang 
> Signed-off-by: Chris Zhong 
> Commit-Ready: Kristian H. Kristensen 

'Commit-Ready' isn't something that makes sense for upstream Linux. The
other 'Tested-by' and 'Reviewed-by' *might* make sense to carry forward,
even though these were from the Chromium Gerrit instance, but they also
applied to a very old and different version of this patch, so probably
not.

I'd suggest starting over with only the Signed-off-by tags.

> Tested-by: Kristian H. Kristensen 
> Reviewed-by: Kristian H. Kristensen 
> Tested-by: Kencp huang 
> Signed-off-by: Kencp huang 
> ---
>  .../gpu/drm/bridge/analogix/analogix_dp_reg.c | 71 +++
>  1 file changed, 42 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c 
> b/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
> index 6a4f20fccf84..7b6e3f8f85b0 100644
> --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
> +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
> @@ -935,25 +935,54 @@ void analogix_dp_enable_psr_crc(struct 
> analogix_dp_device *dp)
>   writel(PSR_VID_CRC_ENABLE, dp->reg_base + ANALOGIX_DP_CRC_CON);
>  }
>  
> -static ssize_t analogix_dp_get_psr_status(struct analogix_dp_device *dp)
> +static int analogix_dp_get_psr_status(struct analogix_dp_device *dp,

Probably could be called 'analogix_dp_wait_psr_status()', since it's no
longer just a "getter" function.

> +   int status)

This would probably make more sense as a 'bool psr_active' or some
similar naming, since it doesn't really represent a "status" field now,
but more of a "am I entering or exiting PSR?" parameter.

>  {
>   ssize_t val;
> - u8 status;
> + u8 reg, store = 0;
> + int cnt = 0;
> +
> + /* About 3ms for a loop */

The commit description explains why you need this polling/debounce loop,
but it's good to document such artifacts in the code too, when they're
as strange as this one. Perhaps a short explanation about the "bouncing"
behavior of some panels here? "Some panels' DP_PSR_STATUS register is
unstable when entering PSR."

Also, I already had a (pre mailing list) question about why this doesn't
use readx_poll_timeout(), so I'll repeat for the record one reason not
to: it's difficult to handle the stateful debouncing aspect with that
macro, and keep the 'store' variable around.

> + while (cnt < 100) {
> + /* Read operation would takes 900us */
> + val = drm_dp_dpcd_readb(&dp->aux, DP_PSR_STATUS, ®);
> + if (val < 0) {
> + dev_err(dp->dev, "PSR_STATUS read failed ret=%zd", val);
> + return val;
> + }
> +
> + /*
> +  * Ensure the PSR_STATE should go to DP_PSR_SINK_ACTIVE_RFB
> +  * from DP_PSR_SINK_ACTIVE_SINK_SYNCED or
> +  * DP_PSR_SINK_ACTIVE_SRC_SYNCED.
> +  * Otherwise, if we get DP_PSR_SINK_ACTIVE_RFB twice in
> +  * succession, it show the Panel is stable PSR enabled state.
> +  */
> + if (status == DP_PSR_SINK_ACTIVE_RFB) {
> + if ((reg == DP_PSR_SINK_ACTIVE_RFB) &&
> + ((store == DP_PSR_SINK_ACTIVE_SINK_SYNCED) ||
> +  (store == DP_PSR_SINK_ACTIVE_SRC_SYNCED) ||
> +  (store == DP_PSR_SINK_ACTIVE_RFB)))
> + return 0;
> + else
> + store = reg;
> + } else {

You dropped the ACTIVE_RESYNC and INACTIVE comments from below. Those
probably should move here.

With those fixed, I think this would be fine:

Reviewed-by: Brian Norris 

> + if ((reg == DP_PSR_SINK_INACTIVE) ||
> + (reg == DP_PSR_SINK_ACTIVE_RESYNC))
> + return 0;
> + }
>  
> - val = drm_dp_dpcd_readb(&dp->aux, DP_PSR_STATUS, &status);
> - if (val < 0) {
> - dev_err(dp->dev, "PSR_STATUS read failed ret=%z

Re: [PATCH] drm: bridge: Use devm_platform_get_and_ioremap_resource()

2023-01-19 Thread Brian Norris

On Thu, Jan 19, 2023 at 03:59:01PM +0800, ye.xingc...@zte.com.cn wrote:
> From: ye xingchen 
> 
> Convert platform_get_resource(), devm_ioremap_resource() to a single
> call to devm_platform_get_and_ioremap_resource(), as this is exactly
> what this function does.
> 
> Signed-off-by: ye xingchen 
> ---
>  drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
> b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> index df9370e0ff23..50f092b316d0 100644
> --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> @@ -1686,7 +1686,6 @@ analogix_dp_probe(struct device *dev, struct 
> analogix_dp_plat_data *plat_data)
>  {
>   struct platform_device *pdev = to_platform_device(dev);
>   struct analogix_dp_device *dp;
> - struct resource *res;
>   unsigned int irq_flags;
>   int ret;
> 
> @@ -1740,9 +1739,7 @@ analogix_dp_probe(struct device *dev, struct 
> analogix_dp_plat_data *plat_data)
> 
>   clk_prepare_enable(dp->clock);
> 
> - res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -
> - dp->reg_base = devm_ioremap_resource(&pdev->dev, res);
> + dp->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, NULL);

Rather than a NULL 3rd argument, couldn't you just use
devm_platform_ioremap_resource()? With that:

Reviewed-by: Brian Norris 


>   if (IS_ERR(dp->reg_base)) {
>   ret = PTR_ERR(dp->reg_base);
>   goto err_disable_clk;
> -- 
> 2.25.1

[PATCH v3 2/2] drm/rockchip: vop: Leave vblank enabled in self-refresh

2023-01-09 Thread Brian Norris

If we disable vblank when entering self-refresh, vblank APIs (like
DRM_IOCTL_WAIT_VBLANK) no longer work. But user space is not aware when
we enter self-refresh, so this appears to be an API violation -- that
DRM_IOCTL_WAIT_VBLANK fails with EINVAL whenever the display is idle and
enters self-refresh.

The downstream driver used by many of these systems never used to
disable vblank for PSR, and in fact, even upstream, we didn't do that
until radically redesigning the state machine in commit 6c836d965bad
("drm/rockchip: Use the helpers for PSR").

Thus, it seems like a reasonable API fix to simply restore that
behavior, and leave vblank enabled.

Note that this appears to potentially unbalance the
drm_crtc_vblank_{off,on}() calls in some cases, but:
(a) drm_crtc_vblank_on() documents this as OK and
(b) if I do the naive balancing, I find state machine issues such that
we're not in sync properly; so it's easier to take advantage of (a).

This issue was exposed by IGT's kms_vblank tests, and reported by
KernelCI. The bug has been around a while (longer than KernelCI
noticed), but was only exposed once self-refresh was bugfixed more
recently, and so KernelCI could properly test it. Some other notes in:

  https://lore.kernel.org/dri-devel/y6ocg9bpnjvim...@google.com/
  Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked 
on rk3399-gru-kevin

== Backporting notes: ==

Marking as 'Fixes' commit 6c836d965bad ("drm/rockchip: Use the helpers
for PSR"), but it probably depends on commit bed030a49f3e
("drm/rockchip: Don't fully disable vop on self refresh") as well.

We also need the previous patch ("drm/atomic: Allow vblank-enabled +
self-refresh "disable""), of course.

v3:
 * no update

v2:
 * skip unnecessary lock/unlock

Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: 
Link: https://lore.kernel.org/dri-devel/y5itf0+yniqa6...@sirena.org.uk/
Reported-by: "kernelci.org bot" 
Signed-off-by: Brian Norris 
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index fa1f4ee6d195..9fea03121247 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -717,13 +717,13 @@ static void vop_crtc_atomic_disable(struct drm_crtc *crtc,
if (crtc->state->self_refresh_active)
rockchip_drm_set_win_enabled(crtc, false);
 
+   if (crtc->state->self_refresh_active)
+   goto out;
+
mutex_lock(&vop->vop_lock);
 
drm_crtc_vblank_off(crtc);
 
-   if (crtc->state->self_refresh_active)
-   goto out;
-
/*
 * Vop standby will take effect at end of current frame,
 * if dsp hold valid irq happen, it means standby complete.
@@ -757,9 +757,9 @@ static void vop_crtc_atomic_disable(struct drm_crtc *crtc,
vop_core_clks_disable(vop);
pm_runtime_put(vop->dev);
 
-out:
mutex_unlock(&vop->vop_lock);
 
+out:
if (crtc->state->event && !crtc->state->active) {
spin_lock_irq(&crtc->dev->event_lock);
drm_crtc_send_vblank_event(crtc, crtc->state->event);
-- 
2.39.0.314.g84b9a713c41-goog

[PATCH v3 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-09 Thread Brian Norris

The self-refresh helper framework overloads "disable" to sometimes mean
"go into self-refresh mode," and this mode activates automatically
(e.g., after some period of unchanging display output). In such cases,
the display pipe is still considered "on", and user-space is not aware
that we went into self-refresh mode. Thus, users may expect that
vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
properly.

However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
vblank enabled.

Add a different expectation: that CRTCs *should* leave vblank enabled
when going into self-refresh.

This patch is preparation for another patch -- "drm/rockchip: vop: Leave
vblank enabled in self-refresh" -- which resolves conflicts between the
above self-refresh behavior and the API tests in IGT's kms_vblank test
module.

== Some alternatives discussed: ==

It's likely that on many display controllers, vblank interrupts will
turn off when the CRTC is disabled, and so in some cases, self-refresh
may not support vblank. To support such cases, we might consider
additions to the generic helpers such that we fire vblank events based
on a timer.

However, there is currently only one driver using the common
self-refresh helpers (i.e., rockchip), and at least as of commit
bed030a49f3e ("drm/rockchip: Don't fully disable vop on self refresh"),
the CRTC hardware is powered enough to continue to generate vblank
interrupts.

So we chose the simpler option of leaving vblank interrupts enabled. We
can reevaluate this decision and perhaps augment the helpers if/when we
gain a second driver that has different requirements.

v3:
 * include discussion summary

v2:
 * add 'ret != 0' warning case for self-refresh
 * describe failing test case and relation to drm/rockchip patch better

Cc:  # dependency for "drm/rockchip: vop: Leave
         # vblank enabled in self-refresh"
Signed-off-by: Brian Norris 
---
 drivers/gpu/drm/drm_atomic_helper.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..a22485e3e924 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1209,7 +1209,16 @@ disable_outputs(struct drm_device *dev, struct 
drm_atomic_state *old_state)
continue;
 
ret = drm_crtc_vblank_get(crtc);
-   WARN_ONCE(ret != -EINVAL, "driver forgot to call 
drm_crtc_vblank_off()\n");
+   /*
+* Self-refresh is not a true "disable"; ensure vblank remains
+* enabled.
+*/
+   if (new_crtc_state->self_refresh_active)
+   WARN_ONCE(ret != 0,
+ "driver disabled vblank in self-refresh\n");
+   else
+   WARN_ONCE(ret != -EINVAL,
+ "driver forgot to call 
drm_crtc_vblank_off()\n");
if (ret == 0)
drm_crtc_vblank_put(crtc);
}
-- 
2.39.0.314.g84b9a713c41-goog

Re: [PATCH v2 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-06 Thread Brian Norris

On Fri, Jan 6, 2023 at 5:23 PM Brian Norris  wrote:
> v2:
>  * add 'ret != 0' warning case for self-refresh
>  * describe failing test case and relation to drm/rockchip patch better

Ugh, there's always something you remember right after you hit send: I
forgot to better summarize some of the other discussion from v1, and
alternatives we didn't entertain. I'll write that up now (not sure
whether in patch 1 or 2) and plan on sending a v3 for next week, in
case there are any other comments I should address at the same time.

Sorry for the noise,
Brian

[PATCH v2 2/2] drm/rockchip: vop: Leave vblank enabled in self-refresh

2023-01-06 Thread Brian Norris

If we disable vblank when entering self-refresh, vblank APIs (like
DRM_IOCTL_WAIT_VBLANK) no longer work. But user space is not aware when
we enter self-refresh, so this appears to be an API violation -- that
DRM_IOCTL_WAIT_VBLANK fails with EINVAL whenever the display is idle and
enters self-refresh.

The downstream driver used by many of these systems never used to
disable vblank for PSR, and in fact, even upstream, we didn't do that
until radically redesigning the state machine in commit 6c836d965bad
("drm/rockchip: Use the helpers for PSR").

Thus, it seems like a reasonable API fix to simply restore that
behavior, and leave vblank enabled.

Note that this appears to potentially unbalance the
drm_crtc_vblank_{off,on}() calls in some cases, but:
(a) drm_crtc_vblank_on() documents this as OK and
(b) if I do the naive balancing, I find state machine issues such that
we're not in sync properly; so it's easier to take advantage of (a).

This issue was exposed by IGT's kms_vblank tests, and reported by
KernelCI.

Backporting notes:
Marking as 'Fixes' commit 6c836d965bad ("drm/rockchip: Use the helpers
for PSR"), but it probably depends on commit bed030a49f3e
("drm/rockchip: Don't fully disable vop on self refresh") as well.

We also need the previous patch ("drm/atomic: Allow vblank-enabled +
self-refresh "disable""), of course.

v2:
 * skip unnecessary lock/unlock

Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: 
Link: https://lore.kernel.org/dri-devel/y5itf0+yniqa6...@sirena.org.uk/
Reported-by: "kernelci.org bot" 
Signed-off-by: Brian Norris 
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index fa1f4ee6d195..9fea03121247 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -717,13 +717,13 @@ static void vop_crtc_atomic_disable(struct drm_crtc *crtc,
if (crtc->state->self_refresh_active)
rockchip_drm_set_win_enabled(crtc, false);
 
+   if (crtc->state->self_refresh_active)
+   goto out;
+
mutex_lock(&vop->vop_lock);
 
drm_crtc_vblank_off(crtc);
 
-   if (crtc->state->self_refresh_active)
-   goto out;
-
/*
 * Vop standby will take effect at end of current frame,
 * if dsp hold valid irq happen, it means standby complete.
@@ -757,9 +757,9 @@ static void vop_crtc_atomic_disable(struct drm_crtc *crtc,
vop_core_clks_disable(vop);
pm_runtime_put(vop->dev);
 
-out:
mutex_unlock(&vop->vop_lock);
 
+out:
if (crtc->state->event && !crtc->state->active) {
spin_lock_irq(&crtc->dev->event_lock);
drm_crtc_send_vblank_event(crtc, crtc->state->event);
-- 
2.39.0.314.g84b9a713c41-goog

[PATCH v2 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-06 Thread Brian Norris

The self-refresh helper framework overloads "disable" to sometimes mean
"go into self-refresh mode," and this mode activates automatically
(e.g., after some period of unchanging display output). In such cases,
the display pipe is still considered "on", and user-space is not aware
that we went into self-refresh mode. Thus, users may expect that
vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
properly.

However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
vblank enabled.

Add a different expectation: that CRTCs *should* leave vblank enabled
when going into self-refresh.

This patch is preparation for another patch -- "drm/rockchip: vop: Leave
vblank enabled in self-refresh" -- which resolves conflicts between the
above self-refresh behavior and the API tests in IGT's kms_vblank test
module.

v2:
 * add 'ret != 0' warning case for self-refresh
 * describe failing test case and relation to drm/rockchip patch better

Cc:  # dependency for "drm/rockchip: vop: Leave
     # vblank enabled in self-refresh"
Signed-off-by: Brian Norris 
---
 drivers/gpu/drm/drm_atomic_helper.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..a22485e3e924 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1209,7 +1209,16 @@ disable_outputs(struct drm_device *dev, struct 
drm_atomic_state *old_state)
continue;
 
ret = drm_crtc_vblank_get(crtc);
-   WARN_ONCE(ret != -EINVAL, "driver forgot to call 
drm_crtc_vblank_off()\n");
+   /*
+* Self-refresh is not a true "disable"; ensure vblank remains
+* enabled.
+*/
+   if (new_crtc_state->self_refresh_active)
+   WARN_ONCE(ret != 0,
+ "driver disabled vblank in self-refresh\n");
+   else
+   WARN_ONCE(ret != -EINVAL,
+ "driver forgot to call 
drm_crtc_vblank_off()\n");
if (ret == 0)
drm_crtc_vblank_put(crtc);
}
-- 
2.39.0.314.g84b9a713c41-goog

Re: [PATCH 2/2] drm/rockchip: vop: Leave vblank enabled in self-refresh

2023-01-06 Thread Brian Norris

On Fri, Jan 06, 2023 at 12:42:54PM +0100, Michel Dänzer wrote:
> On 1/6/23 02:40, Brian Norris wrote:
> > --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > @@ -719,11 +719,11 @@ static void vop_crtc_atomic_disable(struct drm_crtc 
> > *crtc,
> >  
> > mutex_lock(&vop->vop_lock);
> >  
> > -   drm_crtc_vblank_off(crtc);
> > -
> > if (crtc->state->self_refresh_active)
> > goto out;
> >  
> > +   drm_crtc_vblank_off(crtc);
> > +
> > /*
> >  * Vop standby will take effect at end of current frame,
> >  * if dsp hold valid irq happen, it means standby complete.
> 
> The out label immediately unlocks vop->vop_lock again, seems a bit pointless. 
> :)

Oops, of course. Will change that in v2.

> AFAICT the self_refresh_active case should just return immediately,
> the out label would also send any pending atomic commit completion
> event, which seems wrong now that the vblank interrupt is left
> enabled.

If I return immediately and drop that completion, I hit a warning:

[4.423876] WARNING: CPU: 5 PID: 58 at 
drivers/gpu/drm/drm_atomic_helper.c:2460 
drm_atomic_helper_commit_hw_done+0xe0/0xe8
...
[4.424036] Call trace:
[4.424039]  drm_atomic_helper_commit_hw_done+0xe0/0xe8
[4.424045]  drm_atomic_helper_commit_tail_rpm+0x58/0x7c
...

I plan to leave it as-is for v2.

> (I also wonder if drm_crtc_vblank_off doesn't take care of
> that already, at least amdgpu doesn't do this explicitly AFAICT)

I'm not sure.

Brian

Re: [PATCH 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-06 Thread Brian Norris

On Fri, Jan 06, 2023 at 09:30:56PM +0100, Daniel Vetter wrote:
> On Fri, Jan 06, 2023 at 11:33:06AM -0800, Brian Norris wrote:
> > On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
> > > - check that drivers which use self_refresh are not using
> > >   drm_atomic_helper_wait_for_vblanks(), because that would defeat the
> > >   point
> > 
> > I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part
> > of the common drm_atomic_helper_commit_tail*() helpers, and so it's
> > naturally used in many cases (including Rockchip/PSR). And how does it
> > defeat the point?
> 
> Yeah, but that's for backwards compat reasons, the much better function is
> drm_atomic_helper_wait_for_flip_done(). And if you go into self refresh
> that's really the better one.

My knowledge is certainly going to diminish once you talk about
backwards compat for drivers I'm very unfamiliar with. Are you
suggesting I can change the default
drm_atomic_helper_commit_tail{,_rpm}() to use
drm_atomic_helper_wait_for_flip_done()? (Or, just when
self_refresh_active==true?) I can try to read through drivers for
compatibility, but I may be prone to breaking things.

Otherwise, I'd be copy/paste/modifying the generic commit helpers.

> > > - have a drm_crtc_vblank_off/on which take the crtc state, so they can
> > >   look at the self-refresh state
> > 
> > And I suppose you mean this helper variant would kick off the next step
> > (fake vblank timer)?
> 
> Yeah, I figured that's the better way to implement this since it would be
> driver agnostic. But rockchip is still the only driver using the
> self-refresh helpers, so I guess it doesn't really matter.

I've run into enough gotchas with these helpers that I feel like it
might be difficult to ever get a second driver using this. Or at least,
we'd have to learn what requirements they have when we get there. (Well,
maybe you know better, but I certainly don't.)

I'm tempted to just go with what's the simplest for Rockchip now, and
look at some generic timer fallbacks later if the need arises.

> > Also, I still haven't found that fake timer machinery, but maybe I just
> > don't know what I'm looking for.
> 
> I ... didn't find it either. I'm honestly not sure whether this works for
> intel, or whether we do something silly like disable self-refresh when a
> vblank interrupt is pending :-/

Nice to know I'm not the only one, I suppose :)

> I think new proposal from me is to just respin this patch here with our
> discussion all summarized (it's good to record this stuff for the next
> person that comes around), and the WARN_ON adjusted so it also checks that
> vblank interrupts keep working (per the ret value at least, it's not a
> real functional check). And call that good enough.

Sounds good. I'll try to summarize without immortalizing too much of my
ignorance ;)

And thanks for your thoughts.

> Also maybe look into switching from wait_for_vblanks to
> wait_for_flip_done, it's the right thing to do (see kerneldoc, it should
> explain things a bit).

I've read some, and will probably reread a few more times. And I left
one question above.

Brian

Re: [PATCH 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-06 Thread Brian Norris

On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
> Ok I think I was a bit slow here, and it makes sense. Except this now
> means we loose this check, and I'm also not sure whether we really want
> drivers to implement this all.
> 
> What I think we want here is a bit more:
> - for the self-refresh case check that the vblank all still works

You mean, keep the WARN_ONCE(), but invert it to ensure that 'ret == 0'?
I did consider that, but I don't know why I stopped.

> - check that drivers which use self_refresh are not using
>   drm_atomic_helper_wait_for_vblanks(), because that would defeat the
>   point

I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part
of the common drm_atomic_helper_commit_tail*() helpers, and so it's
naturally used in many cases (including Rockchip/PSR). And how does it
defeat the point?

> - have a drm_crtc_vblank_off/on which take the crtc state, so they can
>   look at the self-refresh state

And I suppose you mean this helper variant would kick off the next step
(fake vblank timer)?

> - fake vblanks with hrtimer, because on most hw when you turn off the crtc
>   the vblanks are also turned off, and so your compositor would still
>   hang. The vblank machinery already has all the code to make this happen
>   (and if it's not all, then i915 psr code should have it).

Is a timer better than an interrupt? I'm pretty sure the vblank
interrupts still can fire on Rockchip CRTC (VOP) (see also the other
branch of this thread), so this isn't really necessary. (IGT vblank
tests pass without hanging.) Unless you simply prefer a fake timer for
some reason.

Also, I still haven't found that fake timer machinery, but maybe I just
don't know what I'm looking for.

> - I think kunit tests for this all would be really good, it's a rather
>   complex state machinery between modesets and vblank functionality. You
>   can speed up the kunit tests with some really high refresh rate, which
>   isn't possible on real hw.

Last time I tried my hand at kunit in a subsystem with no prior kunit
tests, I had a miserable time and gave up. At least DRM has a few
already, so maybe this wouldn't be as terrible. Perhaps I can give this
a shot, but there's a chance this will kick things to the back burner
far enough that I simply don't get around to it at all. (So far, I'm
only addressing this because KernelCI complained.)

> I'm also wondering why we've had this code for years and only hit issues
> now?

I'd guess a few reasons:
1. drm_self_refresh_helper_init() is only used by one driver -- Rockchip
2. Rockchip systems are most commonly either Chromebooks, or else
   otherwise cheap embedded things, and may not have displays at all,
   let alone displays with PSR
3. Rockchip Chromebooks shipped with a kernel forked off of the earlier
   PSR support, before everything got refactored (and vblank handling
   regressed) for the self-refresh "helpers". They only upgraded to a
   newer upstream kernel within the last few months.
4. AFAICT, ChromeOS user space doesn't even exercise the vblank-related
   ioctls, so we don't actually notice that this is "broken". I suppose
   it would only be IGT tests that notice.
5. I fixed up various upstream PSR bugs are part of #3 [0],
   along the way I unborked PSR enough that KernelCI finally caught the
   bug. See my explanation in [1] for why the vblank bug was masked, and
   appeared to be a "regression" due to my more recent fixes.

Brian

[0] Combined with point #2: ChromeOS would be the first serious users of
the refactored PSR support. All this was needed to make it actually
usable:

(2021) c4c6ef229593 drm/bridge: analogix_dp: Make PSR-exit block less
(2022) ca871659ec16 drm/bridge: analogix_dp: Support PSR-exit to disable 
transition <--- KernelCI "blamed" this one, because PSR was less broken
(2022) e54a4424925a drm/atomic: Force bridge self-refresh-exit on CRTC 
switch

[1] https://lore.kernel.org/dri-devel/y6ocg9bpnjvim...@google.com/
Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on 
rk3399-gru-kevin

Re: [PATCH 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-06 Thread Brian Norris

On Fri, Jan 06, 2023 at 07:20:40PM +0100, Daniel Vetter wrote:
> On Fri, Jan 06, 2023 at 10:08:53AM -0800, Brian Norris wrote:
> > OK! Then that sounds like it at least ACKs my general idea for this
> > series. (Michel and I poked at a few ideas in the thread at [1] and
> > landed on approx. this solution, or else a fake/timer like you suggest.)
> 
> Yeah once I stopped looking at this the wrong way round it does make sense
> what you're doing. See my other reply, I think with just this series here
> the vblanks still stall out? Or does your hw actually keep generating
> vblank irq with the display off?

I might not be understanding all of the IGT tests that I've run through,
but the display is not actually off -- it's sitting on a still frame.
But yes, IRQs still come, and I see no hangs.

This is also ref'd in patch 2:

  bed030a49f3e drm/rockchip: Don't fully disable vop on self refresh

So, we're not even doing that much to power-down the CRTC/VOP. That's
probably why IRQs are still active, contrary to your expectation?

NB: this is how Rockchip Chromebooks implemented PSR from essentially
day 1.

> > On Fri, Jan 06, 2023 at 06:53:49PM +0100, Daniel Vetter wrote:
> > > We might need a few more helpers. Also, probably more igt, or is this
> > > something igt testing has uncovered? If so, please cite the igt testcase
> > > which hits this.
> > 
> > The current patch only fixes a warning that comes when I try to do the
> > second patch. The second patch is a direct product of an IGT test
> > failure (a few of kms_vblank's subtests), and I linked [1] the KernelCI
> > report there.
> 
> Ah yeah that makes sense. Would be good to cite that in this patch too,
> because I guess the same kms_vblank tests can also hit this warning here?

Well, before this series: no, those tests don't hit this warning. The
warning is only uncovered after patch 2. If I do just patch 2, it's
super-trivial to hit the warning. You just have to turn the display on
and go idle long enough for PSR to activate once. I suppose that can
count as "caught by a test", with a little stretch of the imagination.

At any rate, I'll improve this description to refer precisely to the
"next patch" (as Greg suggested already), and that might involve
describing this test case a little.

Brian

Re: [PATCH 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-06 Thread Brian Norris

Hi Daniel,

On Fri, Jan 06, 2023 at 06:53:49PM +0100, Daniel Vetter wrote:
> On Thu, Jan 05, 2023 at 05:40:17PM -0800, Brian Norris wrote:
> > The self-refresh helper framework overloads "disable" to sometimes mean
> > "go into self-refresh mode," and this mode activates automatically
> > (e.g., after some period of unchanging display output). In such cases,
> > the display pipe is still considered "on", and user-space is not aware
> > that we went into self-refresh mode. Thus, users may expect that
> > vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
> > properly.
> > 
> > However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
> > vblank enabled here.
> > 
> > Add a new exception, such that we allow CRTCs to be "disabled" (with
> > self-refresh active) with vblank interrupts still enabled.
> > 
> > Cc:  # dependency for subsequent patch
> > Signed-off-by: Brian Norris 
> > ---
> > 
> >  drivers/gpu/drm/drm_atomic_helper.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> > b/drivers/gpu/drm/drm_atomic_helper.c
> > index d579fd8f7cb8..7b5eddadebd5 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1207,6 +1207,12 @@ disable_outputs(struct drm_device *dev, struct 
> > drm_atomic_state *old_state)
> >  
> > if (!drm_dev_has_vblank(dev))
> > continue;
> > +   /*
> > +* Self-refresh is not a true "disable"; let vblank remain
> > +* enabled.
> > +*/
> > +   if (new_crtc_state->self_refresh_active)
> > +   continue;
> 
> This very fishy, because we check in crtc_needs_disable whether this
> output should stay on due to self-refresh. Which means you should never
> end up in here.

That's not what crtc_needs_disable() does w.r.t. self-refresh. In fact,
it's the opposite; see, for example, the
|new_state->self_refresh_active| clause. That clause means that if we're
entering self-refresh, we *intend* to disable (i.e., we return 'true').
That's because like I mention above, the self-refresh helpers overload
what "disable" means.

I'll also add my caveat again that I'm a bit new to DRM, so feel free to
continue to correct me if I'm wrong :) Or perhaps Sean Paul could
provide second opinions, as I believe he wrote this stuff.

> And yes vblank better work in self refresh :-) If it doesn't, then you
> need to fake it with a timer, that's at least what i915 has done for
> transparent self-refresh.

OK! Then that sounds like it at least ACKs my general idea for this
series. (Michel and I poked at a few ideas in the thread at [1] and
landed on approx. this solution, or else a fake/timer like you suggest.)

> We might need a few more helpers. Also, probably more igt, or is this
> something igt testing has uncovered? If so, please cite the igt testcase
> which hits this.

The current patch only fixes a warning that comes when I try to do the
second patch. The second patch is a direct product of an IGT test
failure (a few of kms_vblank's subtests), and I linked [1] the KernelCI
report there.

Brian

[1] Link: https://lore.kernel.org/dri-devel/y5itf0+yniqa6...@sirena.org.uk/
Reported-by: "kernelci.org bot"

Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin

2023-01-05 Thread Brian Norris

On Thu, Jan 05, 2023 at 04:59:55PM -0800, Brian Norris wrote:
> On Wed, Jan 04, 2023 at 10:11:46AM +0100, Michel Dänzer wrote:
> > On 1/4/23 03:11, Brian Norris wrote:
> > > On Tue, Jan 03, 2023 at 07:04:00PM +0100, Michel Dänzer wrote:
> > >> On 12/21/22 23:02, Brian Norris wrote:
> > >>> 3. leave vblank enabled even in the presence of PSR
> > > 
> > > I'm leaning toward this.
> > 
> > If this means vblank interrupts will arrive as expected even while in PSR, 
> > that may be the ideal solution indeed.
> 
> Yes. And I think I have a working patchset for this now. It passes all
> the igt-gpu-tools/kms_vblank tests for me now, and I don't think it
> causes any other issues. I'll send it out shortly, after a bit more
> testing.

For the record, that's here:
https://lore.kernel.org/lkml/20230105174001.2.Ic07cba4ab9a7bd3618a9e4258b8f92ea7d10ae5a@changeid/

I didn't CC everyone who got this report. In included a:

  Reported-by: "kernelci.org bot" 

even though it didn't really bisect the right thing. Let me know if
there's a different/better way to give credit.

Brian

[PATCH 2/2] drm/rockchip: vop: Leave vblank enabled in self-refresh

2023-01-05 Thread Brian Norris

If we disable vblank when entering self-refresh, vblank APIs (like
DRM_IOCTL_WAIT_VBLANK) no longer work. But user space is not aware when
we enter self-refresh, so this appears to be an API violation -- that
DRM_IOCTL_WAIT_VBLANK fails with EINVAL whenever the display is idle and
enters self-refresh.

The downstream driver used by many of these systems never used to
disable vblank for PSR, and in fact, even upstream, we didn't do that
until radically redesigning the state machine in commit 6c836d965bad
("drm/rockchip: Use the helpers for PSR").

Thus, it seems like a reasonable API fix to simply restore that
behavior, and leave vblank enabled.

Note that this appears to potentially unbalance the
drm_crtc_vblank_{off,on}() calls in some cases, but:
(a) drm_crtc_vblank_on() documents this as OK and
(b) if I do the naive balancing, I find state machine issues such that
we're not in sync properly; so it's easier to take advantage of (a).

Backporting notes:
Marking as 'Fixes' commit 6c836d965bad ("drm/rockchip: Use the helpers
for PSR"), but it probably depends on commit bed030a49f3e
("drm/rockchip: Don't fully disable vop on self refresh") as well.

We also need the previous patch ("drm/atomic: Allow vblank-enabled +
self-refresh "disable""), of course.

Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: 
Link: https://lore.kernel.org/dri-devel/y5itf0+yniqa6...@sirena.org.uk/
Reported-by: "kernelci.org bot" 
Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index fa1f4ee6d195..c541967114b4 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -719,11 +719,11 @@ static void vop_crtc_atomic_disable(struct drm_crtc *crtc,
 
mutex_lock(&vop->vop_lock);
 
-   drm_crtc_vblank_off(crtc);
-
if (crtc->state->self_refresh_active)
goto out;
 
+   drm_crtc_vblank_off(crtc);
+
/*
 * Vop standby will take effect at end of current frame,
 * if dsp hold valid irq happen, it means standby complete.
-- 
2.39.0.314.g84b9a713c41-goog

[PATCH 1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

2023-01-05 Thread Brian Norris

The self-refresh helper framework overloads "disable" to sometimes mean
"go into self-refresh mode," and this mode activates automatically
(e.g., after some period of unchanging display output). In such cases,
the display pipe is still considered "on", and user-space is not aware
that we went into self-refresh mode. Thus, users may expect that
vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
properly.

However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
vblank enabled here.

Add a new exception, such that we allow CRTCs to be "disabled" (with
self-refresh active) with vblank interrupts still enabled.

Cc:  # dependency for subsequent patch
Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/drm_atomic_helper.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..7b5eddadebd5 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1207,6 +1207,12 @@ disable_outputs(struct drm_device *dev, struct 
drm_atomic_state *old_state)
 
if (!drm_dev_has_vblank(dev))
continue;
+   /*
+* Self-refresh is not a true "disable"; let vblank remain
+* enabled.
+*/
+   if (new_crtc_state->self_refresh_active)
+   continue;
 
ret = drm_crtc_vblank_get(crtc);
WARN_ONCE(ret != -EINVAL, "driver forgot to call 
drm_crtc_vblank_off()\n");
-- 
2.39.0.314.g84b9a713c41-goog

Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin

2023-01-05 Thread Brian Norris

On Wed, Jan 04, 2023 at 10:11:46AM +0100, Michel Dänzer wrote:
> On 1/4/23 03:11, Brian Norris wrote:
> > On Tue, Jan 03, 2023 at 07:04:00PM +0100, Michel Dänzer wrote:
> >> On 12/21/22 23:02, Brian Norris wrote:
> > 
> >>> 3. leave vblank enabled even in the presence of PSR
> > 
> > I'm leaning toward this.
> 
> If this means vblank interrupts will arrive as expected even while in PSR, 
> that may be the ideal solution indeed.

Yes. And I think I have a working patchset for this now. It passes all
the igt-gpu-tools/kms_vblank tests for me now, and I don't think it
causes any other issues. I'll send it out shortly, after a bit more
testing.

Side note: I believe this vblank-disabled issue actually came in as an
upstream regression at commit 6c836d965bad ("drm/rockchip: Use the
helpers for PSR"); before that, we were doing this very differently, and
didn't touch vblank at all for PSR, similar to the "downstream" stuff I
mentioned earlier.

> >> 5. Go/stay out of PSR while vblank interrupts are enabled (probably want 
> >> to make sure vblankoffdelay is set up such that vblank interrupts are 
> >> disabled ASAP)
> > 
> > That seems not extremely nice, to waste power. Based on the earlier
> > downstream implementation (which left vblank interrupts enabled), I'd
> > wager there's a much larger power win from PSR (on the display-transport
> > and memory-bandwidth costs), relative to the power cost of vblank
> > interrupts.
> > 
> > Also, messing with vblankoffdelay sounds likely to trigger the races
> > mentioned in the drm_vblank.c kerneldoc.
> 
> Not sure how likely that is, quite a few drivers are setting 
> dev->vblank_disable_immediate = true.
> 
> With that, vblank interrupts should generally be enabled only while there are 
> screen updates as well[0], in which case PSR shouldn't be possible anyway.
> 
> [0] There may be user space which uses the vblank ioctls even while there are 
> no screen updates though, which would prevent PSR in this case.

OK. I'm just reading docs here; definitely not an expert.

> >>> [1] Or is it? I don't really know the DRM_IOCTL_WAIT_VBLANK ABI
> >>> contract in the presence of PSR, but I believe the upstream PSR
> >>> support has always worked this way. I could imagine
> >>> DRM_IOCTL_WAIT_VBLANK users might not love seeing EINVAL here
> >>> though.
> >>
> >> Yeah, that's pretty likely to cause issues with existing real-world user 
> >> space.
> > 
> > OK. Any hints as to what kind of user space uses this?
> 
> I don't have any specific example, just thinking about how user space could 
> respond to the vblank ioctls returning an error, and it would seem to be 
> generally either of:
> 
> * Just run non-throttled, which might negate any energy savings from PSR
> * Fail to work altogether
> 
> 
> > I'm in part simply curious, but I'm also wondering what the
> > error-handling and reliability (e.g., what if vblanks don't come?)
> > expectations might be in practice.
> 
> If vblank events never come, user space might freeze.

Thanks. If my patchset works out like I expect, this should no longer be
noticeable to user space, so I don't really have to test out your
educated guesses :)

Thanks for the idea-bouncing.

Brian

Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin

2023-01-03 Thread Brian Norris

Hi Michel,

Thanks for your thoughts. I'll give my attempt at weighing my and your
options, with the caveat that I'm still not much of a DRM expert.

On Tue, Jan 03, 2023 at 07:04:00PM +0100, Michel Dänzer wrote:
> On 12/21/22 23:02, Brian Norris wrote:
> > So how to fix this?
> > 
> > 1. ignore it; it's "just" a test failure, IIUC [1]

Probably discarded, per Michel's notes.

> > 2. change the test, to skip this test if the device has PSR

Doesn't seem great, and not just because PSR is not directly detectable
in user space.

> > 3. leave vblank enabled even in the presence of PSR

I'm leaning toward this.

> > 4. otherwise modify the vblank ioctl interface, such that one can "wait"
> >for a vblank that won't come (because the display is in self-refresh
> >/ there are no new frames or vblanks)

That doesn't sound so great of an API, to essentially induce hangs,
pending other activity. (Assuming I understand the DRM APIs here
correctly.)

> FWIW, a couple more alternatives:
> 
> 5. Go/stay out of PSR while vblank interrupts are enabled (probably want to 
> make sure vblankoffdelay is set up such that vblank interrupts are disabled 
> ASAP)

That seems not extremely nice, to waste power. Based on the earlier
downstream implementation (which left vblank interrupts enabled), I'd
wager there's a much larger power win from PSR (on the display-transport
and memory-bandwidth costs), relative to the power cost of vblank
interrupts.

Also, messing with vblankoffdelay sounds likely to trigger the races
mentioned in the drm_vblank.c kerneldoc.

> 6. Use fallback timers for vblank events while in PSR (there might be some 
> infrastructure for this, since some drivers don't have real vblank interrupts)

There's drm_atomic_helper_fake_vblank(), but I don't think that's really
hooked up to a timer. That's potentially promising though.

> > [1] Or is it? I don't really know the DRM_IOCTL_WAIT_VBLANK ABI
> > contract in the presence of PSR, but I believe the upstream PSR
> > support has always worked this way. I could imagine
> > DRM_IOCTL_WAIT_VBLANK users might not love seeing EINVAL here
> > though.
> 
> Yeah, that's pretty likely to cause issues with existing real-world user 
> space.

OK. Any hints as to what kind of user space uses this? A cursory look
doesn't show that any of the ChromeOS user space uses it, which may be
why it was overlooked in the initial PSR development and upstreaming.
I'm in part simply curious, but I'm also wondering what the
error-handling and reliability (e.g., what if vblanks don't come?)
expectations might be in practice.

All in all, it's seeming like maybe option 3 or 6 are best. I'd lean
toward 3, assuming I can hack my way through "driver forgot to call
drm_crtc_vblank_off()" and similar problems, in part because it's ground
that has partially been tread before. I hope that doesn't complicate the
DRM state machine even more though, now that I'll have to better
coordinate the "enabled"/"self_refresh_active" states with "vblank
enabled"...

Thoughts still welcome of course.

Brian

Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin

2022-12-21 Thread Brian Norris

Hi Mark, Sean, (and dri-devel)

On Wed, Dec 14, 2022 at 07:04:37PM -0800, Brian Norris wrote:
> On Tue, Dec 13, 2022 at 04:51:11PM +, Mark Brown wrote:
> > On Tue, Dec 13, 2022 at 05:56:30AM -0800, KernelCI bot wrote:
> > 
> > The KernelCI bisection bot found regressions in at least two KMS tests
> > in the Renesas tree on rk3399-gru-kevin just after the Renesas tree
> > merged up mainline:
> > 
> >igt-kms-rockchip.kms_vblank.pipe-A-wait-forked 
> >igt-kms-rockchip.kms_vblank.pipe-A-query-busy
> > 
> > which it bisected to ca871659ec16 ("drm/bridge: analogix_dp: Support
> > PSR-exit to disable transition").  I'm not *100%* sure I trust the
> > bisection but it sure is suspicous that two separate bisects for related
> > issues landed on the same commit.

...

> > | IGT-Version: 1.26-gf8a4a0b (aarch64) (Linux: 6.1.0 aarch64)
> > | <14>[   35.48] [IGT] drm_read: starting subtest short-buffer-wakeup
> > | Starting subtest: short-buffer-wakeup
> > | 
> > | (| drm_read:350) CRITICAL: Test assertion failure function 
> > generate_event, file ../tests/drm_read.c:65:
> > | (drm_read:350) CRITICAL: <14>[   36.155642] [IGT] drm_read: exiting, 
> > ret=98
> > | Failed assertion: kmstest_get_vblank(fd, pipe, DRM_VBLANK_EVENT)
> > | 
> > | (drm_read:350) CRITICAL: Last errno: 22, Invalid argument
> > | Stack trace:
> > | 
> > |   #0 ../lib/igt_core.c:1933 __igt_fail_assert()
> > |   #1 [+0xd5362770]
> > |   #2 [+0xd536193c]
> > |   #3 [__libc_start_main+0xe8]
> > |   #4 [+0xd5361974]
> > |   #5 [[   36.162851] Console: switching to colour frame buffer 
> > device 300x100>+0xd5361974]
> > | Subtest short-buffer-wakeup failed.

...

> I'll look some more, but this might be a difference of test setup, such
> that my setup has the issue before and after that commit, but your setup
> sees a regression.

I believe this is the case; things are broken both before and after, but
depending on test sequencing, etc., they might have seemed less broken
before.

When we're failing, we're getting EINVAL from drm_vblank_get(). That
essentially means that vblank has been disabled (drm_crtc_vblank_off()).
As it happens, this is exactly what we do when we enter PSR (Panel Self
Refresh).

Now, these test cases (especially 'kms_vblank.pipe-A-wait-forked') seem
to display a static image, and then wait for a bunch of vblank events.
This is exactly the sort of case where we should enter PSR, and so we're
likely to disable vblank events. Thus, if everything is working right,
we will often hit EINVAL in ioctl(DRM_IOCTL_WAIT_VBLANK) ... ->
drm_vblank_get(), and fail the test.

As to why this appears to be a regression: like mentioned in my previous
mail, these tests tend to hose the Analogix PSR state before my patch
set. When PSR is hosed, we tend to stay with PSR disabled, and so
drm_vblank_get() doesn't return EINVAL, and the test is happy. (Never
mind that the display is likely non-functional.) [0]

So how to fix this?

1. ignore it; it's "just" a test failure, IIUC [1]
2. change the test, to skip this test if the device has PSR
3. leave vblank enabled even in the presence of PSR
4. otherwise modify the vblank ioctl interface, such that one can "wait"
   for a vblank that won't come (because the display is in self-refresh
   / there are no new frames or vblanks)

I don't know how to do #2, because this variant of PSR is intentionally
opaque to user space.

For #3: the downstream PSR support that these systems shipped with
initially did not disable vblank in PSR. But that was massively
rewritten/refactored by Sean Paul before it made it upstream, and now it
does. I tried briefly to factor that part out
(drivers/gpu/drm/rockchip/rockchip_drm_vop.c /
drm_crtc_vblank_{on,off}()), but because drm_self_refresh_helper.c
(ab?)uses the commit step to "disable" the CRTC for PSR (even though the
CRTC is not fully disabled), I tend to hit this in
drm_atomic_helper_commit_modeset_disables()->disable_outputs():

ret = drm_crtc_vblank_get(crtc);
WARN_ONCE(ret != -EINVAL, "driver forgot to call 
drm_crtc_vblank_off()\n");

And I hit a few more problems too...

...I'm sure I could hack my way through this somehow, but this is all
sounding like it could use some advice from someone more familiar with
DRM and/or the drm_self_refresh_helper design. I've learned my way
around this a bit by necessity, but I still feel like I don't know all
the right answers here. (*cough*, Sean?)

Brian

[0] I definitely reproduce this part locally, before my patches. I can't
find non-expired CI logs for "passing" runs to be sure that's what the
CI is seeing though.

[1] Or is it? I don't really know the DRM_IOCTL_WAIT_VBLANK ABI
contract in the presence of PSR, but I believe the upstream PSR
support has always worked this way. I could imagine
DRM_IOCTL_WAIT_VBLANK users might not love seeing EINVAL here
though.

Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin

2022-12-14 Thread Brian Norris

On Tue, Dec 13, 2022 at 04:51:11PM +, Mark Brown wrote:
> On Tue, Dec 13, 2022 at 05:56:30AM -0800, KernelCI bot wrote:
> 
> The KernelCI bisection bot found regressions in at least two KMS tests
> in the Renesas tree on rk3399-gru-kevin just after the Renesas tree
> merged up mainline:
> 
>igt-kms-rockchip.kms_vblank.pipe-A-wait-forked 
>igt-kms-rockchip.kms_vblank.pipe-A-query-busy
> 
> which it bisected to ca871659ec16 ("drm/bridge: analogix_dp: Support
> PSR-exit to disable transition").  I'm not *100%* sure I trust the
> bisection but it sure is suspicous that two separate bisects for related
> issues landed on the same commit.
> 
> Below is the full report for the bisect for the first test, the bisect
> for the latter looks identical.  It's got links to full logs for the
> test run and a Reported-by for the bot - I do see some backtraces from
> userspace in the output, the first is:

I think the backtraces are just because IGT calls assert().

> | IGT-Version: 1.26-gf8a4a0b (aarch64) (Linux: 6.1.0 aarch64)
> | <14>[   35.48] [IGT] drm_read: starting subtest short-buffer-wakeup
> | Starting subtest: short-buffer-wakeup
> | 
> | (| drm_read:350) CRITICAL: Test assertion failure function generate_event, 
> file ../tests/drm_read.c:65:
> | (drm_read:350) CRITICAL: <14>[   36.155642] [IGT] drm_read: exiting, ret=98
> | Failed assertion: kmstest_get_vblank(fd, pipe, DRM_VBLANK_EVENT)
> | 
> | (drm_read:350) CRITICAL: Last errno: 22, Invalid argument
> | Stack trace:
> | 
> |   #0 ../lib/igt_core.c:1933 __igt_fail_assert()
> |   #1 [+0xd5362770]
> |   #2 [+0xd536193c]
> |   #3 [__libc_start_main+0xe8]
> |   #4 [+0xd5361974]
> |   #5 [[   36.162851] Console: switching to colour frame buffer 
> device 300x100>+0xd5361974]
> | Subtest short-buffer-wakeup failed.
> 
> Unfortunately we don't have current results from mainline or -next.

Thanks for the notification. I'm running:

  6e516faf0431 drm/panfrost: Job should reference MMU not file_priv

which is allegedly a "good" kernel. And running this:

  ### First kill my display manager, etc.
  ## Then:
  IGT_FORCE_DRIVER=rockchip /usr/libexec/igt-gpu-tools/kms_vblank --run-subtest 
pipe-A-wait-forked

Gives the appended log [1]. If I'm looking right, it seems like it's
failing the same way as the "regression."

I also tested this:

  # the 5.10.x backport, and its parent:
  dbe04e874d4fbd56be64fdfcb29410241b6ad08a^ -- i.e.:
  61297ee0c329 Input: bcm5974 - set missing URB_NO_TRANSFER_DMA_MAP urb flag

and saw the same failures, and I also see the failures I was trying to
avoid in this series (see e54a4424925a ("drm/atomic: Force bridge
self-refresh-exit on CRTC switch")). Perhaps that's because of the
"First kill my display manager, etc." -- because that step means we
might end up switching CRTCs (VOPs) when the test starts, and triggering
the bug.

I'll look some more, but this might be a difference of test setup, such
that my setup has the issue before and after that commit, but your setup
sees a regression.

Small tangent: It's possible this is similar to stuff I had to debug a
while back in this space:

atomictest: Disable CRTCs before test
https://chromium-review.googlesource.com/c/chromiumos/platform/drm-tests/+/3309960
null_platform_test: Disable CRTCs first
https://chromium-review.googlesource.com/c/chromiumos/platform/drm-tests/+/3315478

In that case, there was actually an underlying kernel regression due to:

  846c7dfc1193 drm/atomic: Try to preserve the crtc enabled state in 
drm_atomic_remove_fb, v2.

But our tests were broken too (assuming an initial state that wasn't
guaranteed), so we just fixed the tests.

Anyway, I'll confirm when I get some fresh eyes and can try a few more
things (like neutering the ChromeOS display framework for my tests).

Brian

[1]

IGT-Version: 1.26-gf8a4a0b5 (arm) (Linux: 5.18.0-rc6+ aarch64)
Starting subtest: pipe-A-wait-forked
Beginning pipe-A-wait-forked on pipe A, connector eDP-1
(kms_vblank:2532) CRITICAL: Test assertion failure function vblank_wait, file 
../igt-gpu-tools-1.25/tests/kms_vblank.c:319:
(kms_vblank:2535) CRITICAL: Test assertion failure function vblank_wait, file 
../igt-gpu-tools-1.25/tests/kms_vblank.c:319:
(kms_vblank:2531) CRITICAL: Test assertion failure function vblank_wait, file 
../igt-gpu-tools-1.25/tests/kms_vblank.c:319:
(kms_vblank:2532) CRITICAL: Failed assertion: wait_vblank(fd, &vbl) == 0
(kms_vblank:2534) CRITICAL: Test assertion failure function vblank_wait, file 
../igt-gpu-tools-1.25/tests/kms_vblank.c:319:
(kms_vblank:2535) CRITICAL: Failed assertion: wait_vblank(fd, &vbl) == 0
(kms_vblank:2531) CRITICAL: Failed assertion: wait_vblank(fd, &vbl) == 0
(kms_vblank:2536) CRITICAL: Test assertion failure function vblank_wait, file 
../igt-gpu-tools-1.25/tests/kms_vblank.c:319:
(kms_vblank:2532) CRITICAL: Last errno: 22, Invalid argument
(kms_vblank:2534) CRITICAL: Failed assertion: wait_vblank(fd, &vbl) == 0
(kms_vblank:2536) CRITICAL: Failed assertion: wait_vblank(fd,

Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-16 Thread Brian Norris

Hi Daniele,

On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
> 
> To fix the issue, make sure to always run the cleanup code.
> 
> Reported-by: Tvrtko Ursulin 
> Reported-by: Brian Norris 
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Tvrtko Ursulin 
> Cc: Brian Norris 
> Cc: Alan Previn 
> Cc: John Harrison 
> ---
> 
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.

I *did* reproduce, and with this patch, I no longer reproduce. So:

Tested-by: Brian Norris 

I see this differs very slightly from the draft version (which didn't
work for me):

https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53c...@intel.com/

so presumably that diff is the fix.

Thanks a bunch!

Brian

>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)

Re: [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device

2022-11-08 Thread Brian Norris

On Tue, Nov 08, 2022 at 11:50:04AM -0500, Felix Kuehling wrote:
> While you're making the pmu list per-device, I'd suggest removing adev from
> the pmu entry because it is now redundant. The device is implied by the list
> that the entry is on. Instead, add an adev parameter to
> init_pmu_entry_by_type_and_add. Or you could move the list_add_tail to
> amdgpu_pmu_init and remove "_and_add" from the function name.

Sorry if I'm being naive here, but does that mean trying to navigate the
list pointers to move from a 'pmu_entry' to an 'adev'
(list_first_entry(), etc.)? There are quite a few cases where we're
trying to go between 'pmu_entry' and 'adev'. I guess I could turn that
into a mini helper.

I'll also need to scrounge around a bit to see if I have an amdgpu
system around that actually supports PMU. I realized the one I tested on
doesn't actually hit this code path... and this would be getting a
little less obvious/trivial.

> Other than that, the patch looks good to me.

Thanks for looking!

Brian

[PATCH] drm/rockchip: vop: Quiet always-warning AFBC log

2022-10-31 Thread Brian Norris

The downstream code from which this was derived didn't ever run through
this 'switch' block with non-AFBC formats, but the upstream code does --
we use this function to probe whether a given format is supported.

Demote the warning to eliminate this sort of warning seen on every
boot:

  [drm] unsupported AFBC format[3231564e]

And make it warn more than once, because if we *actually* care to see
what formats we're probing/rejecting and for what reasons, we probably
care about more than just the first message.

Drop the comment, because one of the two *is* commonly reachable.

And lastly, drop the unreachable return; we'd do better to let the
compiler complain if we start hitting this unexpectedly.

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index fa1f4ee6d195..aab77eb6caa3 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -316,13 +316,10 @@ static int vop_convert_afbc_format(uint32_t format)
case DRM_FORMAT_RGB565:
case DRM_FORMAT_BGR565:
return AFBC_FMT_RGB565;
-   /* either of the below should not be reachable */
default:
-   DRM_WARN_ONCE("unsupported AFBC format[%08x]\n", format);
+   DRM_DEBUG_KMS("unsupported AFBC format[%08x]\n", format);
return -EINVAL;
}
-
-   return -EINVAL;
 }
 
 static uint16_t scl_vop_cal_scale(enum scale_mode mode, uint32_t src,
-- 
2.38.1.273.g43a17bfeac-goog

[PATCH 2/2] drm/amdgpu: Set PROBE_PREFER_ASYNCHRONOUS

2022-10-28 Thread Brian Norris

This driver often takes over 200ms to start, so it can improve boot
speed to probe it asynchronously.

I did a short review of the driver, and apart from an issue fixed in the
parent patch ("drm/amdgpu: Move racy global PMU list into device"),
there don't appear to be many cross-device dependencies or racy accesses
to global state, so this should be safe.

This driver was pinpointed as part of a survey of top slowest initcalls
(i.e., are built in, and probing synchronously) on a lab of ChromeOS
systems.

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3c9fecdd6b2f..2d180e48df1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2793,7 +2793,10 @@ static struct pci_driver amdgpu_kms_pci_driver = {
.probe = amdgpu_pci_probe,
.remove = amdgpu_pci_remove,
.shutdown = amdgpu_pci_shutdown,
-   .driver.pm = &amdgpu_pm_ops,
+   .driver = {
+   .pm = &amdgpu_pm_ops,
+   .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+   },
.err_handler = &amdgpu_pci_err_handler,
.dev_groups = amdgpu_sysfs_groups,
 };
-- 
2.38.1.273.g43a17bfeac-goog

[PATCH 1/2] drm/amdgpu: Move racy global PMU list into device

2022-10-28 Thread Brian Norris

If there are multiple amdgpu devices, this list processing can be racy.

We're really treating this like a per-device list, so make that explicit
and remove the global list.

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 12 +---
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0e6ddf05c23c..e968b7f2417c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1063,6 +1063,10 @@ struct amdgpu_device {
struct work_struct  reset_work;
 
booljob_hang;
+
+#if IS_ENABLED(CONFIG_PERF_EVENTS)
+   struct list_head pmu_list;
+#endif
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
index 71ee361d0972..24f2055a2f23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
@@ -23,6 +23,7 @@
 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_pmu.h"
 
@@ -72,9 +73,6 @@ static ssize_t amdgpu_pmu_event_show(struct device *dev,
amdgpu_pmu_attr->event_str, amdgpu_pmu_attr->type);
 }
 
-static LIST_HEAD(amdgpu_pmu_list);
-
-
 struct amdgpu_pmu_attr {
const char *name;
const char *config;
@@ -558,7 +556,7 @@ static int init_pmu_entry_by_type_and_add(struct 
amdgpu_pmu_entry *pmu_entry,
pr_info("Detected AMDGPU %d Perf Events.\n", total_num_events);
 
 
-   list_add_tail(&pmu_entry->entry, &amdgpu_pmu_list);
+   list_add_tail(&pmu_entry->entry, &pmu_entry->adev->pmu_list);
 
return 0;
 err_register:
@@ -579,9 +577,7 @@ void amdgpu_pmu_fini(struct amdgpu_device *adev)
 {
struct amdgpu_pmu_entry *pe, *temp;
 
-   list_for_each_entry_safe(pe, temp, &amdgpu_pmu_list, entry) {
-   if (pe->adev != adev)
-   continue;
+   list_for_each_entry_safe(pe, temp, &adev->pmu_list, entry) {
list_del(&pe->entry);
perf_pmu_unregister(&pe->pmu);
kfree(pe->pmu.attr_groups);
@@ -623,6 +619,8 @@ int amdgpu_pmu_init(struct amdgpu_device *adev)
int ret = 0;
struct amdgpu_pmu_entry *pmu_entry, *pmu_entry_df;
 
+   INIT_LIST_HEAD(&adev->pmu_list);
+
switch (adev->asic_type) {
case CHIP_VEGA20:
pmu_entry_df = create_pmu_entry(adev, AMDGPU_PMU_PERF_TYPE_DF,
-- 
2.38.1.273.g43a17bfeac-goog

[PATCH] drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-10-28 Thread Brian Norris

This driver often takes a good 100ms to start, but in some particularly
bad cases takes more than 1 second.

In surveying risk for this driver, I poked around for cross-device
shared state, which can be a source of race conditions. GVT support
(intel_gvt_devices) seems potentially suspect, but it has an appropriate
mutex, and doesn't seem to care about ordering -- if devices are present
when the kvmgt module loads, they'll get picked up; and if they probe
later than kvmgt, they'll attach then.

Additionally, I see past discussions about this patch [1], which
concluded that there were problems at the time due to the way
hdac_i915.c called request_module("i915") and expected it to complete
probing [2]. Work has since been merged [3] to fix that problem.

This driver was pinpointed as part of a survey of drivers that take more
than 100ms in their initcall (i.e., are built in, and probing
synchronously) on a lab of ChromeOS systems.

[1] [RFC] i915: make the probe asynchronous
https://lore.kernel.org/all/20180604053219.2040-1-feng.t...@intel.com/

[2] https://lore.kernel.org/all/s5hin4d1e3f.wl-ti...@suse.de/

[3] Commit f9b54e1961c7 ("ALSA: hda/i915: Allow delayed i915 audio
component binding")

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/i915/i915_pci.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 38460a0bd7cb..1cb1f87aea86 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1371,7 +1371,10 @@ static struct pci_driver i915_pci_driver = {
.probe = i915_pci_probe,
.remove = i915_pci_remove,
.shutdown = i915_pci_shutdown,
-   .driver.pm = &i915_pm_ops,
+   .driver = {
+   .pm = &i915_pm_ops,
+   .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+   },
 };
 
 int i915_pci_register_driver(void)
-- 
2.38.1.273.g43a17bfeac-goog

[PATCH 2/2] drm/rockchip: dsi: Force synchronous probe

2022-10-19 Thread Brian Norris

We can't safely probe a dual-DSI display asynchronously
(driver_async_probe='*' or driver_async_probe='dw-mipi-dsi-rockchip'
cmdline), because dw_mipi_dsi_rockchip_find_second() pokes one DSI
device's drvdata from the other device without any locking.

Request synchronous probe, at least until this driver learns some
appropriate locking for dual-DSI initialization.

Cc: 
Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index d222c6811207..528ddce144e5 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -1689,5 +1689,11 @@ struct platform_driver dw_mipi_dsi_rockchip_driver = {
.of_match_table = dw_mipi_dsi_rockchip_dt_ids,
.pm = &dw_mipi_dsi_rockchip_pm_ops,
.name   = "dw-mipi-dsi-rockchip",
+   /*
+* For dual-DSI display, one DSI pokes at the other DSI's
+* drvdata in dw_mipi_dsi_rockchip_find_second(). This is not
+* safe for asynchronous probe.
+*/
+   .probe_type = PROBE_FORCE_SYNCHRONOUS,
},
 };
-- 
2.38.0.413.g74048e4d9e-goog

[PATCH 1/2] drm/rockchip: dsi: Clean up 'usage_mode' when failing to attach

2022-10-19 Thread Brian Norris

If we fail to attach the first time (especially: EPROBE_DEFER), we fail
to clean up 'usage_mode', and thus will fail to attach on any subsequent
attempts, with "dsi controller already in use".

Re-set to DW_DSI_USAGE_IDLE on attach failure.

This is especially common to hit when enabling asynchronous probe on a
duel-DSI system (such as RK3399 Gru/Scarlet), such that we're more
likely to fail dw_mipi_dsi_rockchip_find_second() the first time.

Fixes: 71f68fe7f121 ("drm/rockchip: dsi: add ability to work as a phy instead 
of full dsi")
Cc: 
Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index bf6948125b84..d222c6811207 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -1051,23 +1051,31 @@ static int dw_mipi_dsi_rockchip_host_attach(void 
*priv_data,
if (ret) {
DRM_DEV_ERROR(dsi->dev, "Failed to register component: %d\n",
ret);
-   return ret;
+   goto out;
}
 
second = dw_mipi_dsi_rockchip_find_second(dsi);
-   if (IS_ERR(second))
-   return PTR_ERR(second);
+   if (IS_ERR(second)) {
+   ret = PTR_ERR(second);
+   goto out;
+   }
if (second) {
ret = component_add(second, &dw_mipi_dsi_rockchip_ops);
if (ret) {
DRM_DEV_ERROR(second,
  "Failed to register component: %d\n",
  ret);
-   return ret;
+   goto out;
}
}
 
return 0;
+
+out:
+   mutex_lock(&dsi->usage_mutex);
+   dsi->usage_mode = DW_DSI_USAGE_IDLE;
+   mutex_unlock(&dsi->usage_mutex);
+   return ret;
 }
 
 static int dw_mipi_dsi_rockchip_host_detach(void *priv_data,
-- 
2.38.0.413.g74048e4d9e-goog

Re: [PATCH] Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"

2022-09-12 Thread Brian Norris

On Thu, Aug 25, 2022 at 11:06 AM Brian Norris  wrote:
> On Thu, Aug 25, 2022 at 10:37 AM Doug Anderson  wrote:
> > Given that this is _not_ an area that I'm an expert in nor is it an
> > area where the consequences are super easy to analyze, I'm a little
> > hesitant to apply this to drm-misc-next myself. Ideally someone more
> > familiar with the driver would do it. However, if nobody steps up
> > after a few weeks and nobody has yelled about this patch, I'll apply
> > it.

For this particular patch, I'd be interested in whether Zhang Zekun
has any feedback (even a Tested-by?), since they were patching this
function in the first place, which is why I paid attention:

Subject: [PATCH -next] drm/bridge: Add missing clk_disable_unprepare()
in analogix_dp_resume()
https://lore.kernel.org/lkml/20220816064231.60473-1-zhangzeku...@huawei.com/

But in absence of that...it has now been a few weeks :)

I'll also mark this to come back to again in a week or two, in case
somebody is still hoping to wait longer.

Regards,
Brian

Re: [PATCH] Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"

2022-08-25 Thread Brian Norris

Hi,

On Thu, Aug 25, 2022 at 10:37 AM Doug Anderson  wrote:
> Reviewed-by: Douglas Anderson 

Thanks :)

> Given that this is _not_ an area that I'm an expert in nor is it an
> area where the consequences are super easy to analyze, I'm a little
> hesitant to apply this to drm-misc-next myself. Ideally someone more
> familiar with the driver would do it. However, if nobody steps up
> after a few weeks and nobody has yelled about this patch, I'll apply
> it.

I'll take this opportunity to correct Andrzej's email address (my bad;
I need to use the up-to-date MAINTAINERS / .mailmap when generating CC
lists), in case he's one such expert.

Brian

[PATCH] Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"

2022-08-24 Thread Brian Norris

This reverts commit 211f276ed3d96e964d2d1106a198c7f4a4b3f4c0.

For quite some time, core DRM helpers already ensure that any relevant
connectors/CRTCs/etc. are disabled, as well as their associated
components (e.g., bridges) when suspending the system. Thus,
analogix_dp_bridge_{enable,disable}() already get called, which in turn
call drm_panel_{prepare,unprepare}(). This makes these drm_panel_*()
calls redundant.

Besides redundancy, there are a few problems with this handling:

(1) drm_panel_{prepare,unprepare}() are *not* reference-counted APIs and
are not in general designed to be handled by multiple callers --
although some panel drivers have a coarse 'prepared' flag that mitigates
some damage, at least. So at a minimum this is redundant and confusing,
but in some cases, this could be actively harmful.

(2) The error-handling is a bit non-standard. We ignored errors in
suspend(), but handled errors in resume(). And recently, people noticed
that the clk handling is unbalanced in error paths, and getting *that*
right is not actually trivial, given the current way errors are mostly
ignored.

(3) In the particular way analogix_dp_{suspend,resume}() get used (e.g.,
in rockchip_dp_*(), as a late/early callback), we don't necessarily have
a proper PM relationship between the DP/bridge device and the panel
device. So while the DP bridge gets resumed, the panel's parent device
(e.g., platform_device) may still be suspended, and so any prepare()
calls may fail.

So remove the superfluous, possibly-harmful suspend()/resume() handling
of panel state.

Fixes: 211f276ed3d9 ("drm: bridge: analogix/dp: add panel prepare/unprepare in 
suspend/resume time")
Link: https://lore.kernel.org/all/yv2cpbd3picg%2f...@google.com/
Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index 8aadcc0aa90b..df9370e0ff23 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1864,12 +1864,6 @@ EXPORT_SYMBOL_GPL(analogix_dp_remove);
 int analogix_dp_suspend(struct analogix_dp_device *dp)
 {
clk_disable_unprepare(dp->clock);
-
-   if (dp->plat_data->panel) {
-   if (drm_panel_unprepare(dp->plat_data->panel))
-   DRM_ERROR("failed to turnoff the panel\n");
-   }
-
return 0;
 }
 EXPORT_SYMBOL_GPL(analogix_dp_suspend);
@@ -1884,13 +1878,6 @@ int analogix_dp_resume(struct analogix_dp_device *dp)
return ret;
}
 
-   if (dp->plat_data->panel) {
-   if (drm_panel_prepare(dp->plat_data->panel)) {
-   DRM_ERROR("failed to setup the panel\n");
-   return -EBUSY;
-   }
-   }
-
return 0;
 }
 EXPORT_SYMBOL_GPL(analogix_dp_resume);
-- 
2.37.2.609.g9ff673ca1a-goog

Re: [PATCH -next] drm/bridge: Add missing clk_disable_unprepare() in analogix_dp_resume()

2022-08-24 Thread Brian Norris

On Mon, Aug 15, 2022 at 11:46 PM Zhang Zekun  wrote:
>
> Add the missing clk_disable_unprepare() before return from
> analogix_dp_resume() in the error handling case.
>
> Fixes: 211f276ed3d9 ("drm: bridge: analogix/dp: add panel prepare/unprepare 
> in suspend/resume time")
> Signed-off-by: Zhang Zekun 

Reviewed-by: Brian Norris

Re: [PATCH -next] drm/bridge: Add missing clk_disable_unprepare() in analogix_dp_resume()

2022-08-22 Thread Brian Norris

Hi,

On Wed, Aug 17, 2022 at 05:05:25PM -0700, Brian Norris wrote:
> Hmm, actually I'm going to have to retract that now that I've given it
> some more testing locally. I happen to have a system where I commonly
> hit this error case, and I'm thinking commit 211f276ed3d9 is actually
> wrong, and so we shouldn't be "fixing" its error handling -- we should
> be reverting it.

I've submitted that for review here:
https://lore.kernel.org/all/20220822180729.1.I8ac5abe3a4c1c6fd5c061686c6e883c22f69022c@changeid/
[PATCH] Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in 
suspend/resume time"

I'd appreciate your review/testing.

(NB: I failed to honor the .mailmap for Andrzej Hajda.)

> Now separately, I have to figure out why I'm hitting this error case in
> the first place...

FWIW, I captured the reason in point 3 on the above Revert. The
pm_runtime_*() handling in the panel driver fails (-EACCES) because the
bridge driver is resuming before the panel. (The DRM suspend/resume
helpers handle things in the correct order.)

This problem is also resolved by simply reverting.

Brian

Re: [PATCH -next] drm/bridge: Add missing clk_disable_unprepare() in analogix_dp_resume()

2022-08-17 Thread Brian Norris

On Wed, Aug 17, 2022 at 01:34:13PM -0700, Brian Norris wrote:
> On Mon, Aug 15, 2022 at 11:46 PM Zhang Zekun  wrote:
> >
> > Add the missing clk_disable_unprepare() before return from
> > analogix_dp_resume() in the error handling case.
> >
> > Fixes: 211f276ed3d9 ("drm: bridge: analogix/dp: add panel prepare/unprepare 
> > in suspend/resume time")
> > Signed-off-by: Zhang Zekun 
> 
> Reviewed-by: Brian Norris 

Hmm, actually I'm going to have to retract that now that I've given it
some more testing locally. I happen to have a system where I commonly
hit this error case, and I'm thinking commit 211f276ed3d9 is actually
wrong, and so we shouldn't be "fixing" its error handling -- we should
be reverting it.

In particular, drm_panel_prepare()/drm_panel_unprepare() are *not*
reference-counted APIs, and this is already managed by
analogix_dp_bridge_disable(), which is called by the core DRM helpers
during suspend. Thus, analogix_dp_suspend()/analogix_dp_resume() is
serving to be an unwanted *second* client trying to {un,}prepare the
panel.

The panel drivers tend to handle this OK to some extent, as they (e.g.,
panel-edp.c) guard against redundant operations, but *we* don't --
notice that analogix_dp_suspend() ignores drm_panel_unprepare() errors
for one.

Also, I don't believe device management really handles resume() failures
quite right; in the end, this patch ends up un-balancing the clk count
on the RK3399 Gru-Bob systems I'm testing.

(Side note: every other bridge driver seems to ignore
drm_panel_prepare() failures.)

It's possible this was correct (or at least, not terribly broken) back
when it was written, but then, the DRM core frameworks have evolved
since then. Today, I think we do not need to manage the panel state
directly in suspend()/resume().

All in all:

Nacked-by: Brian Norris 
Tested-and-failed-by: Brian Norris 

Now separately, I have to figure out why I'm hitting this error case in
the first place...

Re: [PATCH] drm/rockchip: vop: Don't crash for invalid duplicate_state()

2022-06-24 Thread Brian Norris

On Fri, Jun 24, 2022 at 12:23 AM Heiko Stuebner  wrote:
> The interesting question would be, do we want some fixes tag for it?

I'm not aware of any currently-upstream code that will hit this [1].
I've hit it in out-of-tree code (or, code that I submitted to
dri-devel, but wasn't accepted as-is), and this is the "belt and
braces" part -- the primary fix is that we should avoid calling things
like drm_atomic_get_crtc_state() at inappropriate times.

So, is the "extra safety" check really something that should go to
-stable? (Because let's be honest, everything with a Fixes tag goes
there.) Maybe?

Anyway, if you want to "blame" anything, this commit actually dropped
the safety check:

4e257d9eee23 drm/rockchip: get rid of rockchip_drm_crtc_mode_config

Brian

[1] But I'm not omniscient. So maybe it's good to have anyway.

[PATCH] drm/rockchip: vop: Don't crash for invalid duplicate_state()

2022-06-17 Thread Brian Norris

It's possible for users to try to duplicate the CRTC state even when the
state doesn't exist. drm_atomic_helper_crtc_duplicate_state() (and other
users of __drm_atomic_helper_crtc_duplicate_state()) already guard this
with a WARN_ON() instead of crashing, so let's do that here too.

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index 74562d40f639..daf192881353 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -1570,6 +1570,9 @@ static struct drm_crtc_state 
*vop_crtc_duplicate_state(struct drm_crtc *crtc)
 {
struct rockchip_crtc_state *rockchip_state;
 
+   if (WARN_ON(!crtc->state))
+   return NULL;
+
rockchip_state = kzalloc(sizeof(*rockchip_state), GFP_KERNEL);
if (!rockchip_state)
return NULL;
-- 
2.36.1.476.g0c4daa206d-goog

Re: [PATCH v2 0/2] drm/bridge: analogix_dp: Self-refresh state machine fixes

2022-06-06 Thread Brian Norris

On Mon, Jun 6, 2022 at 1:30 PM Doug Anderson  wrote:
> On Fri, Jun 3, 2022 at 8:17 AM Doug Anderson  wrote:
> > On Fri, Jun 3, 2022 at 8:11 AM Sean Paul  wrote:
> > > Apologies for the delay. Please in future ping on irc/chat if you're
> > > waiting for review from me, my inbox is often neglected.

OK, I'll try to keep that in mind. I can't help myself with the
semi-relevant XKCD though ;)
https://xkcd.com/1254/

> > > The set still looks good to me,
> > >
> > > Reviewed-by: Sean Paul 

Thanks!

> > Unless someone yells, I'll plan to apply both patches to
> > drm-misc-fixes early next week, possibly Monday. Seems like if someone
> > was going to object to these they've had plenty of time up until now.
>
> As promised, I pushed these to drm-misc-fixes:
>
> e54a4424925a drm/atomic: Force bridge self-refresh-exit on CRTC switch
> ca871659ec16 drm/bridge: analogix_dp: Support PSR-exit to disable transition

And thanks, Doug.

Brian

Re: [PATCH v2 0/2] drm/bridge: analogix_dp: Self-refresh state machine fixes

2022-05-23 Thread Brian Norris

On Thu, Mar 10, 2022 at 3:50 PM Brian Norris  wrote:
> On Mon, Feb 28, 2022 at 12:25 PM Brian Norris  
> wrote:

> Ping for review? Sean, perhaps? (You already reviewed this on the
> Chromium tracker.)

Ping

Re: [PATCH v2 0/2] drm/bridge: analogix_dp: Self-refresh state machine fixes

2022-03-10 Thread Brian Norris

On Mon, Feb 28, 2022 at 12:25 PM Brian Norris  wrote:
>
> Hi,
>
> I've been investigating several eDP issues on a Rockchip RK3399 system
> and have two proposed bugfixes. RK3399 has two CRTCs, either of which
> can be used for eDP output. For both fixes, we have bugs due to the
> relationship between the generalized "self refresh helpers" and the
> analogix-dp bridge driver which controls much of the PSR mechanics.
> These bugs are most visible when switching between CRTCs.
>
> I'm not a DRM expert, but I've been poking at a lot of Rockchip display
> drivers recently. I'd love some skeptical eyes, so feel free to ask
> questions if I haven't explained issues well, or the proposals look
> fishy.
>
> Regards,
> Brian

Ping for review? Sean, perhaps? (You already reviewed this on the
Chromium tracker.)

Brian

[PATCH v4 2/2] drm/bridge: analogix_dp: Enable autosuspend

2022-03-01 Thread Brian Norris

DP AUX transactions can consist of many short operations. There's no
need to power things up/down in short intervals.

I pick an arbitrary 100ms; for the systems I'm testing (Rockchip
RK3399), runtime-PM transitions only take a few microseconds.

Signed-off-by: Brian Norris 
---

Changes in v4:
 - call pm_runtime_mark_last_busy() and
   pm_runtime_dont_use_autosuspend()
 - drop excess pm references around drm_get_edid(), now that we grab and
   hold in the dp-aux helper

Changes in v3:
 - New in v3

 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index 16be279aed2c..b248d352f2bd 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1119,9 +1119,7 @@ static int analogix_dp_get_modes(struct drm_connector 
*connector)
return 0;
}
 
-   pm_runtime_get_sync(dp->dev);
edid = drm_get_edid(connector, &dp->aux.ddc);
-   pm_runtime_put(dp->dev);
if (edid) {
drm_connector_update_edid_property(&dp->connector,
   edid);
@@ -1642,7 +1640,8 @@ static ssize_t analogix_dpaux_transfer(struct drm_dp_aux 
*aux,
 
ret = analogix_dp_transfer(dp, msg);
 out:
-   pm_runtime_put(dp->dev);
+   pm_runtime_mark_last_busy(dp->dev);
+   pm_runtime_put_autosuspend(dp->dev);
 
return ret;
 }
@@ -1775,6 +1774,8 @@ int analogix_dp_bind(struct analogix_dp_device *dp, 
struct drm_device *drm_dev)
if (ret)
return ret;
 
+   pm_runtime_use_autosuspend(dp->dev);
+   pm_runtime_set_autosuspend_delay(dp->dev, 100);
pm_runtime_enable(dp->dev);
 
ret = analogix_dp_create_bridge(drm_dev, dp);
@@ -1786,6 +1787,7 @@ int analogix_dp_bind(struct analogix_dp_device *dp, 
struct drm_device *drm_dev)
return 0;
 
 err_disable_pm_runtime:
+   pm_runtime_dont_use_autosuspend(dp->dev);
pm_runtime_disable(dp->dev);
drm_dp_aux_unregister(&dp->aux);
 
@@ -1804,6 +1806,7 @@ void analogix_dp_unbind(struct analogix_dp_device *dp)
}
 
drm_dp_aux_unregister(&dp->aux);
+   pm_runtime_dont_use_autosuspend(dp->dev);
pm_runtime_disable(dp->dev);
 }
 EXPORT_SYMBOL_GPL(analogix_dp_unbind);
-- 
2.35.1.574.g5d30c73bfb-goog

[PATCH v4 1/2] drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX

2022-03-01 Thread Brian Norris

If the display is not enable()d, then we aren't holding a runtime PM
reference here. Thus, it's easy to accidentally cause a hang, if user
space is poking around at /dev/drm_dp_aux0 at the "wrong" time.

Let's get a runtime PM reference, and check that we "see" the panel.
Don't force any panel power-up, etc., because that can be intrusive, and
that's not what other drivers do (see
drivers/gpu/drm/bridge/ti-sn65dsi86.c and
drivers/gpu/drm/bridge/parade-ps8640.c.)

Fixes: 0d97ad03f422 ("drm/bridge: analogix_dp: Remove duplicated code")
Cc: 
Cc: Tomeu Vizoso 
Signed-off-by: Brian Norris 
Reviewed-by: Douglas Anderson 
---

Changes in v4:
- Add Doug's Reviewed-by

Changes in v3:
- Avoid panel power-up; just check for HPD state, and let the rest
  happen "as-is" (e.g., time out, if the caller hasn't prepared things
  properly)

Changes in v2:
- Fix spelling in Subject
- DRM_DEV_ERROR() -> drm_err()
- Propagate errors from un-analogix_dp_prepare_panel()

 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..16be279aed2c 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1632,8 +1632,19 @@ static ssize_t analogix_dpaux_transfer(struct drm_dp_aux 
*aux,
   struct drm_dp_aux_msg *msg)
 {
struct analogix_dp_device *dp = to_dp(aux);
+   int ret;
+
+   pm_runtime_get_sync(dp->dev);
+
+   ret = analogix_dp_detect_hpd(dp);
+   if (ret)
+   goto out;
 
-   return analogix_dp_transfer(dp, msg);
+   ret = analogix_dp_transfer(dp, msg);
+out:
+   pm_runtime_put(dp->dev);
+
+   return ret;
 }
 
 struct analogix_dp_device *
-- 
2.35.1.574.g5d30c73bfb-goog

Re: [PATCH v3 2/2] drm/bridge: analogix_dp: Enable autosuspend

2022-03-01 Thread Brian Norris

On Tue, Feb 22, 2022 at 2:10 PM Doug Anderson  wrote:
> On Thu, Feb 17, 2022 at 2:42 PM Brian Norris  wrote:
> > --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> > +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> > @@ -1121,7 +1121,7 @@ static int analogix_dp_get_modes(struct drm_connector 
> > *connector)
> >
> > pm_runtime_get_sync(dp->dev);
> > edid = drm_get_edid(connector, &dp->aux.ddc);
> > -   pm_runtime_put(dp->dev);
> > +   pm_runtime_put_autosuspend(dp->dev);
>
> So I think you can fully get rid of these ones now and rely on the
> ones in the aux transfer, right?

Yep, good catch.

> > if (edid) {
> > drm_connector_update_edid_property(&dp->connector,
> >edid);
> > @@ -1642,7 +1642,7 @@ static ssize_t analogix_dpaux_transfer(struct 
> > drm_dp_aux *aux,
> >
> > ret = analogix_dp_transfer(dp, msg);
> >  out:
> > -   pm_runtime_put(dp->dev);
> > +   pm_runtime_put_autosuspend(dp->dev);
> >
> > return ret;
> >  }
> > @@ -1775,6 +1775,8 @@ int analogix_dp_bind(struct analogix_dp_device *dp, 
> > struct drm_device *drm_dev)
> > if (ret)
> > return ret;
> >
> > +   pm_runtime_use_autosuspend(dp->dev);
> > +   pm_runtime_set_autosuspend_delay(dp->dev, 100);
>
> It's explicitly listed in the Documentation that you need the
> corresponding pm_runtime_dont_use_autosuspend(). Specifically, it
> says:
>
> > Drivers in ->remove() callback should undo the runtime PM changes done
> > in ->probe(). Usually this means calling pm_runtime_disable(),
> > pm_runtime_dont_use_autosuspend() etc.
>
> Not that it's very common to see anyone actually get it right, but I
> seem to remember running into an issue when I didn't do it. I think
> ti-sn65dsi86 still has it wrong since I found out about this later.
> Need to write a patch up for that... Basically you want to put it
> right before the two calls in your driver to pm_runtime_disable().

Ack.

Speaking of API misfeatures that we missed: I've failed to call
pm_runtime_mark_last_busy(). I'll add that in the next rev, for the
cases where we weren't already calling *_put_sync() (i.e., presumably
we don't really care to wait around for autosuspend).

*gripe* What a silly API. *gripe*

Brian

[PATCH v2 2/2] drm/atomic: Force bridge self-refresh-exit on CRTC switch

2022-02-28 Thread Brian Norris

It's possible to change which CRTC is in use for a given
connector/encoder/bridge while we're in self-refresh without fully
disabling the connector/encoder/bridge along the way. This can confuse
the bridge encoder/bridge, because
(a) it needs to track the SR state (trying to perform "active"
operations while the panel is still in SR can be Bad(TM)); and
(b) it tracks the SR state via the CRTC state (and after the switch, the
previous SR state is lost).

Thus, we need to either somehow carry the self-refresh state over to the
new CRTC, or else force an encoder/bridge self-refresh transition during
such a switch.

I choose the latter, so we disable the encoder (and exit PSR) before
attaching it to the new CRTC (where we can continue to assume a clean
(non-self-refresh) state).

This fixes PSR issues seen on Rockchip RK3399 systems with
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c.

Change in v2:

- Drop "->enable" condition; this could possibly be "->active" to
  reflect the intended hardware state, but it also is a little
  over-specific. We want to make a transition through "disabled" any
  time we're exiting PSR at the same time as a CRTC switch.
  (Thanks Liu Ying)

Cc: Liu Ying 
Cc: 
Fixes: 1452c25b0e60 ("drm: Add helpers to kick off self refresh mode in 
drivers")
Signed-off-by: Brian Norris 
---
 drivers/gpu/drm/drm_atomic_helper.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 9603193d2fa1..987e4b212e9f 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1011,9 +1011,19 @@ crtc_needs_disable(struct drm_crtc_state *old_state,
return drm_atomic_crtc_effectively_active(old_state);
 
/*
-* We need to run through the crtc_funcs->disable() function if the CRTC
-* is currently on, if it's transitioning to self refresh mode, or if
-* it's in self refresh mode and needs to be fully disabled.
+* We need to disable bridge(s) and CRTC if we're transitioning out of
+* self-refresh and changing CRTCs at the same time, because the
+* bridge tracks self-refresh status via CRTC state.
+*/
+   if (old_state->self_refresh_active &&
+   old_state->crtc != new_state->crtc)
+   return true;
+
+   /*
+* We also need to run through the crtc_funcs->disable() function if
+* the CRTC is currently on, if it's transitioning to self refresh
+* mode, or if it's in self refresh mode and needs to be fully
+* disabled.
 */
return old_state->active ||
   (old_state->self_refresh_active && !new_state->active) ||
-- 
2.35.1.574.g5d30c73bfb-goog

[PATCH v2 1/2] drm/bridge: analogix_dp: Support PSR-exit to disable transition

2022-02-28 Thread Brian Norris

Most eDP panel functions only work correctly when the panel is not in
self-refresh. In particular, analogix_dp_bridge_disable() tends to hit
AUX channel errors if the panel is in self-refresh.

Given the above, it appears that so far, this driver assumes that we are
never in self-refresh when it comes time to fully disable the bridge.
Prior to commit 846c7dfc1193 ("drm/atomic: Try to preserve the crtc
enabled state in drm_atomic_remove_fb, v2."), this tended to be true,
because we would automatically disable the pipe when framebuffers were
removed, and so we'd typically disable the bridge shortly after the last
display activity.

However, that is not guaranteed: an idle (self-refresh) display pipe may
be disabled, e.g., when switching CRTCs. We need to exit PSR first.

Stable notes: this is definitely a bugfix, and the bug has likely
existed in some form for quite a while. It may predate the "PSR helpers"
refactor, but the code looked very different before that, and it's
probably not worth rewriting the fix.

Cc: 
Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Signed-off-by: Brian Norris 
---

(no changes since v1)

 .../drm/bridge/analogix/analogix_dp_core.c| 42 +--
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..6ee0f62a7161 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1268,6 +1268,25 @@ static int analogix_dp_bridge_attach(struct drm_bridge 
*bridge,
return 0;
 }
 
+static
+struct drm_crtc *analogix_dp_get_old_crtc(struct analogix_dp_device *dp,
+ struct drm_atomic_state *state)
+{
+   struct drm_encoder *encoder = dp->encoder;
+   struct drm_connector *connector;
+   struct drm_connector_state *conn_state;
+
+   connector = drm_atomic_get_old_connector_for_encoder(state, encoder);
+   if (!connector)
+   return NULL;
+
+   conn_state = drm_atomic_get_old_connector_state(state, connector);
+   if (!conn_state)
+   return NULL;
+
+   return conn_state->crtc;
+}
+
 static
 struct drm_crtc *analogix_dp_get_new_crtc(struct analogix_dp_device *dp,
  struct drm_atomic_state *state)
@@ -1448,14 +1467,16 @@ analogix_dp_bridge_atomic_disable(struct drm_bridge 
*bridge,
 {
struct drm_atomic_state *old_state = old_bridge_state->base.state;
struct analogix_dp_device *dp = bridge->driver_private;
-   struct drm_crtc *crtc;
+   struct drm_crtc *old_crtc, *new_crtc;
+   struct drm_crtc_state *old_crtc_state = NULL;
struct drm_crtc_state *new_crtc_state = NULL;
+   int ret;
 
-   crtc = analogix_dp_get_new_crtc(dp, old_state);
-   if (!crtc)
+   new_crtc = analogix_dp_get_new_crtc(dp, old_state);
+   if (!new_crtc)
goto out;
 
-   new_crtc_state = drm_atomic_get_new_crtc_state(old_state, crtc);
+   new_crtc_state = drm_atomic_get_new_crtc_state(old_state, new_crtc);
if (!new_crtc_state)
goto out;
 
@@ -1464,6 +1485,19 @@ analogix_dp_bridge_atomic_disable(struct drm_bridge 
*bridge,
return;
 
 out:
+   old_crtc = analogix_dp_get_old_crtc(dp, old_state);
+   if (old_crtc) {
+   old_crtc_state = drm_atomic_get_old_crtc_state(old_state,
+  old_crtc);
+
+   /* When moving from PSR to fully disabled, exit PSR first. */
+   if (old_crtc_state && old_crtc_state->self_refresh_active) {
+   ret = analogix_dp_disable_psr(dp);
+   if (ret)
+   DRM_ERROR("Failed to disable psr (%d)\n", ret);
+   }
+   }
+
analogix_dp_bridge_disable(bridge);
 }
 
-- 
2.35.1.574.g5d30c73bfb-goog

[PATCH v2 0/2] drm/bridge: analogix_dp: Self-refresh state machine fixes

2022-02-28 Thread Brian Norris

Hi,

I've been investigating several eDP issues on a Rockchip RK3399 system
and have two proposed bugfixes. RK3399 has two CRTCs, either of which
can be used for eDP output. For both fixes, we have bugs due to the
relationship between the generalized "self refresh helpers" and the
analogix-dp bridge driver which controls much of the PSR mechanics.
These bugs are most visible when switching between CRTCs.

I'm not a DRM expert, but I've been poking at a lot of Rockchip display
drivers recently. I'd love some skeptical eyes, so feel free to ask
questions if I haven't explained issues well, or the proposals look
fishy.

Regards,
Brian

Changes in v2:
- Drop "->enable" condition in crtc_needs_disable(); this could possibly
  be "->active" to reflect the intended hardware state, but it also is a
  little over-specific. We want to make a transition through "disabled"
  any time we're exiting PSR at the same time as a CRTC switch.
  (Thanks Liu Ying)

Brian Norris (2):
  drm/bridge: analogix_dp: Support PSR-exit to disable transition
  drm/atomic: Force bridge self-refresh-exit on CRTC switch

 .../drm/bridge/analogix/analogix_dp_core.c| 42 +--
 drivers/gpu/drm/drm_atomic_helper.c   | 16 +--
 2 files changed, 51 insertions(+), 7 deletions(-)

-- 
2.35.1.574.g5d30c73bfb-goog

Re: [PATCH 2/2] drm/atomic: Force bridge self-refresh-exit on CRTC switch

2022-02-28 Thread Brian Norris

Hi Liu,

On Mon, Feb 28, 2022 at 1:02 AM Liu Ying  wrote:
> On Tue, 2022-02-15 at 15:54 -0800, Brian Norris wrote:
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1011,9 +1011,19 @@ crtc_needs_disable(struct drm_crtc_state *old_state,
> >   return drm_atomic_crtc_effectively_active(old_state);
> >
> >   /*
> > -  * We need to run through the crtc_funcs->disable() function if the 
> > CRTC
> > -  * is currently on, if it's transitioning to self refresh mode, or if
> > -  * it's in self refresh mode and needs to be fully disabled.
> > +  * We need to disable bridge(s) and CRTC if we're transitioning out of
> > +  * self-refresh and changing CRTCs at the same time, because the
> > +  * bridge tracks self-refresh status via CRTC state.
> > +  */
> > + if (old_state->self_refresh_active && new_state->enable &&
> > + old_state->crtc != new_state->crtc)
> > + return true;
>
> I think 'new_state->enable' should be changed to 'new_state->active',
> because 'active' is the one to enable/disable the CRTC while 'enable'
> reflects whether a mode blob is set to CRTC state.  The overall logic
> added above is ok to me. Let's see if others have any comments.

Thanks for the review, and good catch. This actually shows that most
of my development was before commit 69e630016ef4 ("drm/atomic: Check
new_crtc_state->active to determine if CRTC needs disable in self
refresh mode"). In fact, the "state->enable" condition was included
here mostly as a complement to the "!state->enable" condition that was
present previously, and I didn't adapt it properly upon rebase.

In practice, this portion of the condition is not needed at all; we
really want to exit PSR on CRTC-switch regardless of the new-CRTC
state. So rather than change "enable" to "active", I plan to remove it
entirely.

I'll give it some local tests and send v2 eventually.

Thanks,
Brian

[PATCH v3 2/2] drm/bridge: analogix_dp: Enable autosuspend

2022-02-17 Thread Brian Norris

DP AUX transactions can consist of many short operations. There's no
need to power things up/down in short intervals.

I pick an arbitrary 100ms; for the systems I'm testing (Rockchip
RK3399), runtime-PM transitions only take a few microseconds.

Signed-off-by: Brian Norris 
---

Changes in v3:
 - New in v3

 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index 16be279aed2c..d82a4ddf44e7 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1121,7 +1121,7 @@ static int analogix_dp_get_modes(struct drm_connector 
*connector)
 
pm_runtime_get_sync(dp->dev);
edid = drm_get_edid(connector, &dp->aux.ddc);
-   pm_runtime_put(dp->dev);
+   pm_runtime_put_autosuspend(dp->dev);
if (edid) {
drm_connector_update_edid_property(&dp->connector,
   edid);
@@ -1642,7 +1642,7 @@ static ssize_t analogix_dpaux_transfer(struct drm_dp_aux 
*aux,
 
ret = analogix_dp_transfer(dp, msg);
 out:
-   pm_runtime_put(dp->dev);
+   pm_runtime_put_autosuspend(dp->dev);
 
return ret;
 }
@@ -1775,6 +1775,8 @@ int analogix_dp_bind(struct analogix_dp_device *dp, 
struct drm_device *drm_dev)
if (ret)
return ret;
 
+   pm_runtime_use_autosuspend(dp->dev);
+   pm_runtime_set_autosuspend_delay(dp->dev, 100);
pm_runtime_enable(dp->dev);
 
ret = analogix_dp_create_bridge(drm_dev, dp);
-- 
2.35.1.265.g69c8d7142f-goog

[PATCH v3 1/2] drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX

2022-02-17 Thread Brian Norris

If the display is not enable()d, then we aren't holding a runtime PM
reference here. Thus, it's easy to accidentally cause a hang, if user
space is poking around at /dev/drm_dp_aux0 at the "wrong" time.

Let's get a runtime PM reference, and check that we "see" the panel.
Don't force any panel power-up, etc., because that can be intrusive, and
that's not what other drivers do (see
drivers/gpu/drm/bridge/ti-sn65dsi86.c and
drivers/gpu/drm/bridge/parade-ps8640.c.)

Fixes: 0d97ad03f422 ("drm/bridge: analogix_dp: Remove duplicated code")
Cc: 
Cc: Tomeu Vizoso 
Signed-off-by: Brian Norris 
---

Changes in v3:
- Avoid panel power-up; just check for HPD state, and let the rest
  happen "as-is" (e.g., time out, if the caller hasn't prepared things
  properly)

Changes in v2:
- Fix spelling in Subject
- DRM_DEV_ERROR() -> drm_err()
- Propagate errors from un-analogix_dp_prepare_panel()

 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..16be279aed2c 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1632,8 +1632,19 @@ static ssize_t analogix_dpaux_transfer(struct drm_dp_aux 
*aux,
   struct drm_dp_aux_msg *msg)
 {
struct analogix_dp_device *dp = to_dp(aux);
+   int ret;
+
+   pm_runtime_get_sync(dp->dev);
+
+   ret = analogix_dp_detect_hpd(dp);
+   if (ret)
+   goto out;
 
-   return analogix_dp_transfer(dp, msg);
+   ret = analogix_dp_transfer(dp, msg);
+out:
+   pm_runtime_put(dp->dev);
+
+   return ret;
 }
 
 struct analogix_dp_device *
-- 
2.35.1.265.g69c8d7142f-goog

Re: [PATCH v2] drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX

2022-02-15 Thread Brian Norris

On Tue, Feb 15, 2022 at 3:46 PM Doug Anderson  wrote:
> On Tue, Feb 15, 2022 at 2:52 PM Brian Norris  wrote:
> > It still makes me wonder what the point
> > of the /dev/drm_dp_aux interface is though, because it seems like
> > you're pretty much destined to not have reliable operation through
> > that means.
>
> I can't say I have tons of history for those files. I seem to recall
> maybe someone using them to have userspace tweak the embedded
> backlight on some external DP connected panels? I think we also might
> use it in Chrome OS to update the firmware of panels (dunno if
> internal or external) in some cases too? I suspect that it works OK
> for certain situations but it's really not going to work in all
> cases...

Yes, I believe I'm only submitting patches like this because fwupd
apparently likes to indiscriminately whack at dpaux devices:
https://github.com/fwupd/fwupd/tree/main/plugins/synaptics-mst#kernel-dp-aux-interface

That seems like a bad idea.

(We've already disabled that plugin on these systems, but it seems
wise not to leave the stumbling block here for the next time.)

> I suppose this just further proves the point that this is really not a
> great interface to rely on. It's fine for debugging during hardware
> bringup and I guess in limited situations it might be OK, but it's
> really not something we want userspace tweaking with anyway, right? In
> general I expect it's up to the kernel to be controlling peripherals
> on the DP AUX bus. The kernel should have a backlight driver and
> should do the AUX transfers needed. Having userspace in there mucking
> with things is just a bad idea. I mean, userspace also doesn't know
> when a panel has been power cycled and potentially lost any changes
> that they might have written, right?
>
> I sorta suspect that most of the uses of these files are there because
> there wasn't a kernel driver and someone thought that doing it in
> userspace was the way to go?

*shrug* beats me.

Brian

[PATCH 1/2] drm/bridge: analogix_dp: Support PSR-exit to disable transition

2022-02-15 Thread Brian Norris

Most eDP panel functions only work correctly when the panel is not in
self-refresh. In particular, analogix_dp_bridge_disable() tends to hit
AUX channel errors if the panel is in self-refresh.

Given the above, it appears that so far, this driver assumes that we are
never in self-refresh when it comes time to fully disable the bridge.
Prior to commit 846c7dfc1193 ("drm/atomic: Try to preserve the crtc
enabled state in drm_atomic_remove_fb, v2."), this tended to be true,
because we would automatically disable the pipe when framebuffers were
removed, and so we'd typically disable the bridge shortly after the last
display activity.

However, that is not guaranteed: an idle (self-refresh) display pipe may
be disabled, e.g., when switching CRTCs. We need to exit PSR first.

Stable notes: this is definitely a bugfix, and the bug has likely
existed in some form for quite a while. It may predate the "PSR helpers"
refactor, but the code looked very different before that, and it's
probably not worth rewriting the fix.

Cc: 
Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Signed-off-by: Brian Norris 
---

 .../drm/bridge/analogix/analogix_dp_core.c| 42 +--
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..6ee0f62a7161 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1268,6 +1268,25 @@ static int analogix_dp_bridge_attach(struct drm_bridge 
*bridge,
return 0;
 }
 
+static
+struct drm_crtc *analogix_dp_get_old_crtc(struct analogix_dp_device *dp,
+ struct drm_atomic_state *state)
+{
+   struct drm_encoder *encoder = dp->encoder;
+   struct drm_connector *connector;
+   struct drm_connector_state *conn_state;
+
+   connector = drm_atomic_get_old_connector_for_encoder(state, encoder);
+   if (!connector)
+   return NULL;
+
+   conn_state = drm_atomic_get_old_connector_state(state, connector);
+   if (!conn_state)
+   return NULL;
+
+   return conn_state->crtc;
+}
+
 static
 struct drm_crtc *analogix_dp_get_new_crtc(struct analogix_dp_device *dp,
  struct drm_atomic_state *state)
@@ -1448,14 +1467,16 @@ analogix_dp_bridge_atomic_disable(struct drm_bridge 
*bridge,
 {
struct drm_atomic_state *old_state = old_bridge_state->base.state;
struct analogix_dp_device *dp = bridge->driver_private;
-   struct drm_crtc *crtc;
+   struct drm_crtc *old_crtc, *new_crtc;
+   struct drm_crtc_state *old_crtc_state = NULL;
struct drm_crtc_state *new_crtc_state = NULL;
+   int ret;
 
-   crtc = analogix_dp_get_new_crtc(dp, old_state);
-   if (!crtc)
+   new_crtc = analogix_dp_get_new_crtc(dp, old_state);
+   if (!new_crtc)
goto out;
 
-   new_crtc_state = drm_atomic_get_new_crtc_state(old_state, crtc);
+   new_crtc_state = drm_atomic_get_new_crtc_state(old_state, new_crtc);
if (!new_crtc_state)
goto out;
 
@@ -1464,6 +1485,19 @@ analogix_dp_bridge_atomic_disable(struct drm_bridge 
*bridge,
return;
 
 out:
+   old_crtc = analogix_dp_get_old_crtc(dp, old_state);
+   if (old_crtc) {
+   old_crtc_state = drm_atomic_get_old_crtc_state(old_state,
+  old_crtc);
+
+   /* When moving from PSR to fully disabled, exit PSR first. */
+   if (old_crtc_state && old_crtc_state->self_refresh_active) {
+   ret = analogix_dp_disable_psr(dp);
+   if (ret)
+   DRM_ERROR("Failed to disable psr (%d)\n", ret);
+   }
+   }
+
analogix_dp_bridge_disable(bridge);
 }
 
-- 
2.35.1.265.g69c8d7142f-goog

[PATCH 2/2] drm/atomic: Force bridge self-refresh-exit on CRTC switch

2022-02-15 Thread Brian Norris

It's possible to change which CRTC is in use for a given
connector/encoder/bridge while we're in self-refresh without fully
disabling the connector/encoder/bridge along the way. This can confuse
the bridge encoder/bridge, because
(a) it needs to track the SR state (trying to perform "active"
operations while the panel is still in SR can be Bad(TM)); and
(b) it tracks the SR state via the CRTC state (and after the switch, the
previous SR state is lost).

Thus, we need to either somehow carry the self-refresh state over to the
new CRTC, or else force an encoder/bridge self-refresh transition during
such a switch.

I choose the latter, so we disable the encoder (and exit PSR) before
attaching it to the new CRTC (where we can continue to assume a clean
(non-self-refresh) state).

This fixes PSR issues seen on Rockchip RK3399 systems with
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c.

Cc: 
Fixes: 1452c25b0e60 ("drm: Add helpers to kick off self refresh mode in 
drivers")
Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/drm_atomic_helper.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 9603193d2fa1..74161d007894 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1011,9 +1011,19 @@ crtc_needs_disable(struct drm_crtc_state *old_state,
return drm_atomic_crtc_effectively_active(old_state);
 
/*
-* We need to run through the crtc_funcs->disable() function if the CRTC
-* is currently on, if it's transitioning to self refresh mode, or if
-* it's in self refresh mode and needs to be fully disabled.
+* We need to disable bridge(s) and CRTC if we're transitioning out of
+* self-refresh and changing CRTCs at the same time, because the
+* bridge tracks self-refresh status via CRTC state.
+*/
+   if (old_state->self_refresh_active && new_state->enable &&
+   old_state->crtc != new_state->crtc)
+   return true;
+
+   /*
+* We also need to run through the crtc_funcs->disable() function if
+* the CRTC is currently on, if it's transitioning to self refresh
+* mode, or if it's in self refresh mode and needs to be fully
+* disabled.
 */
return old_state->active ||
   (old_state->self_refresh_active && !new_state->active) ||
-- 
2.35.1.265.g69c8d7142f-goog

[PATCH 0/2] drm/bridge: analogix_dp: Self-refresh state machine fixes

2022-02-15 Thread Brian Norris

Hi,

I've been investigating several eDP issues on a Rockchip RK3399 system
and have two proposed bugfixes. RK3399 has two CRTCs, either of which
can be used for eDP output. For both fixes, we have bugs due to the
relationship between the generalized "self refresh helpers" and the
analogix-dp bridge driver which controls much of the PSR mechanics.
These bugs are most visible when switching between CRTCs.

I'm not a DRM expert, but I've been poking at a lot of Rockchip display
drivers recently. I'd love some skeptical eyes, so feel free to ask
questions if I haven't explained issues well, or the proposals look
fishy.

Regards,
Brian


Brian Norris (2):
  drm/bridge: analogix_dp: Support PSR-exit to disable transition
  drm/atomic: Force bridge self-refresh-exit on CRTC switch

 .../drm/bridge/analogix/analogix_dp_core.c| 42 +--
 drivers/gpu/drm/drm_atomic_helper.c   | 16 +--
 2 files changed, 51 insertions(+), 7 deletions(-)

-- 
2.35.1.265.g69c8d7142f-goog

Re: [PATCH v2] drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX

2022-02-15 Thread Brian Norris

On Tue, Feb 15, 2022 at 1:31 PM Doug Anderson  wrote:
>
> Hi,

Hi!

> On Fri, Oct 1, 2021 at 2:50 PM Brian Norris  wrote:
> >
> > If the display is not enable()d, then we aren't holding a runtime PM
> > reference here. Thus, it's easy to accidentally cause a hang, if user
> > space is poking around at /dev/drm_dp_aux0 at the "wrong" time.
> >
> > Let's get the panel and PM state right before trying to talk AUX.

> > diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
> > b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> > index b7d2e4449cfa..6fc46ac93ef8 100644
> > --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> > +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> > @@ -1632,8 +1632,27 @@ static ssize_t analogix_dpaux_transfer(struct 
> > drm_dp_aux *aux,
...
> > +   pm_runtime_get_sync(dp->dev);
> > +   ret = analogix_dp_transfer(dp, msg);
> > +   pm_runtime_put(dp->dev);
>
> I've spent an unfortunate amount of time digging around the DP AUX bus
> recently, so I can at least say that I have some experience and some
> opinions here.

Thanks! Experience is welcome, and opinions too sometimes ;)

> IMO:
>
> 1. Don't power the panel on. If the panel isn't powered on then the DP
> AUX transfer will timeout. Tough nuggies. Think of yourself more like
> an i2c controller and of this as an i2c transfer implementation. The
> i2c controller isn't in charge of powering up the i2c devices on the
> bus. If userspace does an "i2c detect" on an i2c bus and some of the
> devices aren't powered then they won't be found. If you try to
> read/write from a powered off device that won't work either.

I guess this, paired with the driver examples below (ti-sn65dsi86.c,
especially, which specifically throws errors if the panel isn't on),
makes some sense. It's approximately (but more verbosely) what Andrzej
was recommending too, I guess. It still makes me wonder what the point
of the /dev/drm_dp_aux interface is though, because it seems like
you're pretty much destined to not have reliable operation through
that means.

Also note: I found that the AUX bus is really not working properly at
all (even with this patch) in some cases due to self-refresh. Not only
do we need the panel enabled, but we need to not be in self-refresh
mode. Self-refresh is not currently exposed to user space, so user
space has no way of knowing the panel is currently active, aside from
racily inducing artificial display activity.

But if we're OK with "just throw errors" or "just let stuff time out",
then I guess that's not a big deal. My purpose is to avoid hanging the
system, not to make /dev/drm_dp_aux useful.

> 2. In theory if the DP driver can read HPD (I haven't looked through
> the analogix code to see how it handles it) then you can fail an AUX
> transfer right away if HPD isn't asserted instead of timing out. If
> this is hard, it's probably fine to just time out though.

This driver does handle HPD, but it also has overrides because
apparently it doesn't work on some systems. I might see if we can
leverage it, or I might just follow the bridge-enabled state (similar
to ti-sn65dsi86.c's 'comms_enabled').

> 3. Do the "pm_runtime" calls, but enable "autosuspend" with something
> ~1 second autosuspend delay. When using the AUX bus to read an EDID
> the underlying code will call your function 16 times in quick
> succession. If you're powering up and down constantly that'll be a bit
> of a waste.

Does this part really matter? For properly active cases, the bridge
remains enabled, and it holds a runtime PM reference. For "maybe
active" (your "tough nuggies" situation above), you're probably right
that it's inefficient, but does it matter, when it's going to be a
slow timed-out operation anyway? The AUX failure will be much slower
than the PM transition.

I guess I can do this anyway, but frankly, I'll just be copy/pasting
stuff from other drivers, because the runtime PM documentation still
confuses me, and moreso once you involve autosuspend.

> For a reference, you could look at
> `drivers/gpu/drm/bridge/ti-sn65dsi86.c`. Also
> `drivers/gpu/drm/bridge/parade-ps8640.c`

Thanks for these. They look like reasonable patterns to follow.

Brian

[PATCH] drm/rockchip: vop: Correct RK3399 VOP register fields

2022-01-19 Thread Brian Norris

Commit 7707f7227f09 ("drm/rockchip: Add support for afbc") switched up
the rk3399_vop_big[] register windows, but it did so incorrectly.

The biggest problem is in rk3288_win23_data[] vs.
rk3368_win23_data[] .format field:

  RK3288's format: VOP_REG(RK3288_WIN2_CTRL0, 0x7, 1)
  RK3368's format: VOP_REG(RK3368_WIN2_CTRL0, 0x3, 5)

Bits 5:6 (i.e., shift 5, mask 0x3) are correct for RK3399, according to
the TRM.

There are a few other small differences between the 3288 and 3368
definitions that were swapped in commit 7707f7227f09. I reviewed them to
the best of my ability according to the RK3399 TRM and fixed them up.

This fixes IOMMU issues (and display errors) when testing with BG24
color formats.

Fixes: 7707f7227f09 ("drm/rockchip: Add support for afbc")
Cc: Andrzej Pietrasiewicz 
Cc: 
Signed-off-by: Brian Norris 
---
I'd appreciate notes or testing from Andrzej, since I'm not sure how he
tested his original AFBC work.

 drivers/gpu/drm/rockchip/rockchip_vop_reg.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_vop_reg.c 
b/drivers/gpu/drm/rockchip/rockchip_vop_reg.c
index 1f7353f0684a..798b542e5916 100644
--- a/drivers/gpu/drm/rockchip/rockchip_vop_reg.c
+++ b/drivers/gpu/drm/rockchip/rockchip_vop_reg.c
@@ -902,6 +902,7 @@ static const struct vop_win_phy rk3399_win01_data = {
.enable = VOP_REG(RK3288_WIN0_CTRL0, 0x1, 0),
.format = VOP_REG(RK3288_WIN0_CTRL0, 0x7, 1),
.rb_swap = VOP_REG(RK3288_WIN0_CTRL0, 0x1, 12),
+   .x_mir_en = VOP_REG(RK3288_WIN0_CTRL0, 0x1, 21),
.y_mir_en = VOP_REG(RK3288_WIN0_CTRL0, 0x1, 22),
.act_info = VOP_REG(RK3288_WIN0_ACT_INFO, 0x1fff1fff, 0),
.dsp_info = VOP_REG(RK3288_WIN0_DSP_INFO, 0x0fff0fff, 0),
@@ -912,6 +913,7 @@ static const struct vop_win_phy rk3399_win01_data = {
.uv_vir = VOP_REG(RK3288_WIN0_VIR, 0x3fff, 16),
.src_alpha_ctl = VOP_REG(RK3288_WIN0_SRC_ALPHA_CTRL, 0xff, 0),
.dst_alpha_ctl = VOP_REG(RK3288_WIN0_DST_ALPHA_CTRL, 0xff, 0),
+   .channel = VOP_REG(RK3288_WIN0_CTRL2, 0xff, 0),
 };
 
 /*
@@ -922,11 +924,11 @@ static const struct vop_win_phy rk3399_win01_data = {
 static const struct vop_win_data rk3399_vop_win_data[] = {
{ .base = 0x00, .phy = &rk3399_win01_data,
  .type = DRM_PLANE_TYPE_PRIMARY },
-   { .base = 0x40, .phy = &rk3288_win01_data,
+   { .base = 0x40, .phy = &rk3368_win01_data,
  .type = DRM_PLANE_TYPE_OVERLAY },
-   { .base = 0x00, .phy = &rk3288_win23_data,
+   { .base = 0x00, .phy = &rk3368_win23_data,
  .type = DRM_PLANE_TYPE_OVERLAY },
-   { .base = 0x50, .phy = &rk3288_win23_data,
+   { .base = 0x50, .phy = &rk3368_win23_data,
  .type = DRM_PLANE_TYPE_CURSOR },
 };
 
-- 
2.34.1.703.g22d0c6ccf7-goog

Re: [PATCH v2 3/3] ASoC: rk3399_gru_sound: Wire up DP jack detection

2022-01-18 Thread Brian Norris

Hi Chen-Yu,

On Mon, Jan 17, 2022 at 05:01:52PM +0800, Chen-Yu Tsai wrote:
> On Sat, Jan 15, 2022 at 7:03 AM Brian Norris  wrote:
> >
> > Now that the cdn-dp driver supports plug-change callbacks, let's wire it
> > up.
> >
> > Signed-off-by: Brian Norris 
> > ---
> >
> > (no changes since v1)
> >
> >  sound/soc/rockchip/rk3399_gru_sound.c | 20 
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/sound/soc/rockchip/rk3399_gru_sound.c 
> > b/sound/soc/rockchip/rk3399_gru_sound.c
> > index e2d52d8d0ff9..eeef3ed70037 100644
> > --- a/sound/soc/rockchip/rk3399_gru_sound.c
> > +++ b/sound/soc/rockchip/rk3399_gru_sound.c
> > @@ -164,6 +164,25 @@ static int rockchip_sound_da7219_hw_params(struct 
> > snd_pcm_substream *substream,
> > return 0;
> >  }
> >
> > +static struct snd_soc_jack cdn_dp_card_jack;
> > +
> > +static int rockchip_sound_cdndp_init(struct snd_soc_pcm_runtime *rtd)
> > +{
> > +   struct snd_soc_component *component = asoc_rtd_to_codec(rtd, 
> > 0)->component;
> 
> Using snd_soc_card_get_codec_dai() might be a better choice throughout this
> driver. While it will work for the cdn_dp case, because it is the first DAI
> in |rockchip_dais[]|, all the invocations for the other codecs are likely
> returning the wrong DAI.

I'll admit, I'm not very familiar with the ASoC object model, so you may
well be correct that there's something fishy in here. But I did trace
through the objects involved here, and we *are* getting the correct DAI
for both this case and the DA7219 case (preexisting code).

It looks like we actually have a new runtime for each of our static
dai_links:

devm_snd_soc_register_card()
  ...
  for_each_card_prelinks()
snd_soc_add_pcm_runtime()

So I think this is valid to keep as-is.

> For this particular patch it works either way, so
> 
> Reviewed-by: Chen-Yu Tsai 

Thanks for looking!

Brian

[PATCH v2 3/3] ASoC: rk3399_gru_sound: Wire up DP jack detection

2022-01-14 Thread Brian Norris

Now that the cdn-dp driver supports plug-change callbacks, let's wire it
up.

Signed-off-by: Brian Norris 
---

(no changes since v1)

 sound/soc/rockchip/rk3399_gru_sound.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/sound/soc/rockchip/rk3399_gru_sound.c 
b/sound/soc/rockchip/rk3399_gru_sound.c
index e2d52d8d0ff9..eeef3ed70037 100644
--- a/sound/soc/rockchip/rk3399_gru_sound.c
+++ b/sound/soc/rockchip/rk3399_gru_sound.c
@@ -164,6 +164,25 @@ static int rockchip_sound_da7219_hw_params(struct 
snd_pcm_substream *substream,
return 0;
 }
 
+static struct snd_soc_jack cdn_dp_card_jack;
+
+static int rockchip_sound_cdndp_init(struct snd_soc_pcm_runtime *rtd)
+{
+   struct snd_soc_component *component = asoc_rtd_to_codec(rtd, 
0)->component;
+   struct snd_soc_card *card = rtd->card;
+   int ret;
+
+   /* Enable jack detection. */
+   ret = snd_soc_card_jack_new(card, "DP Jack", SND_JACK_LINEOUT,
+   &cdn_dp_card_jack, NULL, 0);
+   if (ret) {
+   dev_err(card->dev, "Can't create DP Jack %d\n", ret);
+   return ret;
+   }
+
+   return snd_soc_component_set_jack(component, &cdn_dp_card_jack, NULL);
+}
+
 static int rockchip_sound_da7219_init(struct snd_soc_pcm_runtime *rtd)
 {
struct snd_soc_component *component = asoc_rtd_to_codec(rtd, 
0)->component;
@@ -315,6 +334,7 @@ static const struct snd_soc_dai_link rockchip_dais[] = {
[DAILINK_CDNDP] = {
.name = "DP",
.stream_name = "DP PCM",
+   .init = rockchip_sound_cdndp_init,
.dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF |
SND_SOC_DAIFMT_CBS_CFS,
SND_SOC_DAILINK_REG(cdndp),
-- 
2.34.1.703.g22d0c6ccf7-goog

[PATCH v2 2/3] drm/rockchip: cdn-dp: Support HDMI codec plug-change callback

2022-01-14 Thread Brian Norris

Some audio servers like to monitor a jack device (perhaps combined with
EDID, for audio-presence info) to determine DP/HDMI audio presence.

Signed-off-by: Brian Norris 
---

(no changes since v1)

 drivers/gpu/drm/rockchip/cdn-dp-core.c | 28 ++
 drivers/gpu/drm/rockchip/cdn-dp-core.h |  4 
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c 
b/drivers/gpu/drm/rockchip/cdn-dp-core.c
index 16497c31d9f9..edd6a1fc46cd 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c
@@ -586,6 +586,13 @@ static bool cdn_dp_check_link_status(struct cdn_dp_device 
*dp)
return drm_dp_channel_eq_ok(link_status, min(port->lanes, sink_lanes));
 }
 
+static void cdn_dp_audio_handle_plugged_change(struct cdn_dp_device *dp,
+  bool plugged)
+{
+   if (dp->codec_dev)
+   dp->plugged_cb(dp->codec_dev, plugged);
+}
+
 static void cdn_dp_encoder_enable(struct drm_encoder *encoder)
 {
struct cdn_dp_device *dp = encoder_to_dp(encoder);
@@ -641,6 +648,9 @@ static void cdn_dp_encoder_enable(struct drm_encoder 
*encoder)
DRM_DEV_ERROR(dp->dev, "Failed to valid video %d\n", ret);
goto out;
}
+
+   cdn_dp_audio_handle_plugged_change(dp, true);
+
 out:
mutex_unlock(&dp->lock);
 }
@@ -651,6 +661,8 @@ static void cdn_dp_encoder_disable(struct drm_encoder 
*encoder)
int ret;
 
mutex_lock(&dp->lock);
+   cdn_dp_audio_handle_plugged_change(dp, false);
+
if (dp->active) {
ret = cdn_dp_disable(dp);
if (ret) {
@@ -846,11 +858,27 @@ static int cdn_dp_audio_get_eld(struct device *dev, void 
*data,
return 0;
 }
 
+static int cdn_dp_audio_hook_plugged_cb(struct device *dev, void *data,
+   hdmi_codec_plugged_cb fn,
+   struct device *codec_dev)
+{
+   struct cdn_dp_device *dp = dev_get_drvdata(dev);
+
+   mutex_lock(&dp->lock);
+   dp->plugged_cb = fn;
+   dp->codec_dev = codec_dev;
+   cdn_dp_audio_handle_plugged_change(dp, dp->connected);
+   mutex_unlock(&dp->lock);
+
+   return 0;
+}
+
 static const struct hdmi_codec_ops audio_codec_ops = {
.hw_params = cdn_dp_audio_hw_params,
.audio_shutdown = cdn_dp_audio_shutdown,
.mute_stream = cdn_dp_audio_mute_stream,
.get_eld = cdn_dp_audio_get_eld,
+   .hook_plugged_cb = cdn_dp_audio_hook_plugged_cb,
.no_capture_mute = 1,
 };
 
diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.h 
b/drivers/gpu/drm/rockchip/cdn-dp-core.h
index 81ac9b658a70..d808a9de45ed 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.h
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "rockchip_drm_drv.h"
 
@@ -101,5 +102,8 @@ struct cdn_dp_device {
 
u8 dpcd[DP_RECEIVER_CAP_SIZE];
bool sink_has_audio;
+
+   hdmi_codec_plugged_cb plugged_cb;
+   struct device *codec_dev;
 };
 #endif  /* _CDN_DP_CORE_H */
-- 
2.34.1.703.g22d0c6ccf7-goog

[PATCH v2 1/3] arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output

2022-01-14 Thread Brian Norris

Commit b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
switched the platform to SPDIF, but we didn't fix up the device tree.

Drop the pinctrl settings, because the 'spdif_bus' pins are either:
 * unused (on kevin, bob), so the settings is ~harmless
 * used by a different function (on scarlet), which causes probe
   failures (!!)

Fixes: b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
Signed-off-by: Brian Norris 
---

Changes in v2:
 - (Un)set pinctrl, because the default assumes we're routing out to
   external pins

 arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
index 45a5ae5d2027..162f08bca0d4 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
@@ -286,7 +286,7 @@ max98357a: max98357a {
 
sound: sound {
compatible = "rockchip,rk3399-gru-sound";
-   rockchip,cpu = <&i2s0 &i2s2>;
+   rockchip,cpu = <&i2s0 &spdif>;
};
 };
 
@@ -437,10 +437,6 @@ &i2s0 {
status = "okay";
 };
 
-&i2s2 {
-   status = "okay";
-};
-
 &io_domains {
status = "okay";
 
@@ -537,6 +533,17 @@ &sdmmc {
vqmmc-supply = <&ppvar_sd_card_io>;
 };
 
+&spdif {
+   status = "okay";
+
+   /*
+* SPDIF is routed internally to DP; we either don't use these pins, or
+* mux them to something else.
+*/
+   /delete-property/ pinctrl-0;
+   /delete-property/ pinctrl-names;
+};
+
 &spi1 {
status = "okay";
 
-- 
2.34.1.703.g22d0c6ccf7-goog

[PATCH v2 0/3] (Re)enable DP/HDMI audio for RK3399 Gru

2022-01-14 Thread Brian Norris

This series fixes DP/HDMI audio for RK3399 Gru systems.

First, there was a regression with the switch to SPDIF. Patch 1 can be
taken separately as a regression fix if desired. But it's not quite so
useful (at least on Chrome OS systems) without the second part.

Second, jack detection was never upstreamed, because the hdmi-codec
dependencies were still being worked out when this platform was first
supported.

Patches cover a few subsystems. Perhaps this is something for arm-soc?

Changes in v2:
 - (Un)set pinctrl, because the default assumes we're routing out to
   external pins

Brian Norris (3):
  arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
  drm/rockchip: cdn-dp: Support HDMI codec plug-change callback
  ASoC: rk3399_gru_sound: Wire up DP jack detection

 arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 17 
 drivers/gpu/drm/rockchip/cdn-dp-core.c   | 28 
 drivers/gpu/drm/rockchip/cdn-dp-core.h   |  4 +++
 sound/soc/rockchip/rk3399_gru_sound.c| 20 ++
 4 files changed, 64 insertions(+), 5 deletions(-)

-- 
2.34.1.703.g22d0c6ccf7-goog

Re: [PATCH 1/3] arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output

2022-01-14 Thread Brian Norris

Sorry to send a self-reply so quickly, but I noticed an error and want
to make sure this doesn't get merged _too_ quickly before I get to
send a revision! See below:

On Fri, Jan 14, 2022 at 12:17 PM Brian Norris  wrote:
>
> Commit b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
> switched the platform to SPDIF, but we didn't fix up the device tree.
>
> Fixes: b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
> Signed-off-by: Brian Norris 
> ---
>
>  arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi 
> b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
> index 45a5ae5d2027..21ec073f4d51 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi

> +&spdif {
> +   status = "okay";

I need to fix up the pinctrl settings here. rk3399.dtsi has a default
that is incorrect. That's OK for several variants (Kevin and Bob,
where the pin is actually unconnected), but it breaks Scarlet (where
the pin in question is actually connected to something else).

I'll send a v2 after waiting a bit, in case there are other comments
worth addressing at the same time.

Brian

> +};

[PATCH 2/3] drm/rockchip: cdn-dp: Support HDMI codec plug-change callback

2022-01-14 Thread Brian Norris

Some audio servers like to monitor a jack device (perhaps combined with
EDID, for audio-presence info) to determine DP/HDMI audio presence.

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/rockchip/cdn-dp-core.c | 28 ++
 drivers/gpu/drm/rockchip/cdn-dp-core.h |  4 
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c 
b/drivers/gpu/drm/rockchip/cdn-dp-core.c
index 16497c31d9f9..edd6a1fc46cd 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c
@@ -586,6 +586,13 @@ static bool cdn_dp_check_link_status(struct cdn_dp_device 
*dp)
return drm_dp_channel_eq_ok(link_status, min(port->lanes, sink_lanes));
 }
 
+static void cdn_dp_audio_handle_plugged_change(struct cdn_dp_device *dp,
+  bool plugged)
+{
+   if (dp->codec_dev)
+   dp->plugged_cb(dp->codec_dev, plugged);
+}
+
 static void cdn_dp_encoder_enable(struct drm_encoder *encoder)
 {
struct cdn_dp_device *dp = encoder_to_dp(encoder);
@@ -641,6 +648,9 @@ static void cdn_dp_encoder_enable(struct drm_encoder 
*encoder)
DRM_DEV_ERROR(dp->dev, "Failed to valid video %d\n", ret);
goto out;
}
+
+   cdn_dp_audio_handle_plugged_change(dp, true);
+
 out:
mutex_unlock(&dp->lock);
 }
@@ -651,6 +661,8 @@ static void cdn_dp_encoder_disable(struct drm_encoder 
*encoder)
int ret;
 
mutex_lock(&dp->lock);
+   cdn_dp_audio_handle_plugged_change(dp, false);
+
if (dp->active) {
ret = cdn_dp_disable(dp);
if (ret) {
@@ -846,11 +858,27 @@ static int cdn_dp_audio_get_eld(struct device *dev, void 
*data,
return 0;
 }
 
+static int cdn_dp_audio_hook_plugged_cb(struct device *dev, void *data,
+   hdmi_codec_plugged_cb fn,
+   struct device *codec_dev)
+{
+   struct cdn_dp_device *dp = dev_get_drvdata(dev);
+
+   mutex_lock(&dp->lock);
+   dp->plugged_cb = fn;
+   dp->codec_dev = codec_dev;
+   cdn_dp_audio_handle_plugged_change(dp, dp->connected);
+   mutex_unlock(&dp->lock);
+
+   return 0;
+}
+
 static const struct hdmi_codec_ops audio_codec_ops = {
.hw_params = cdn_dp_audio_hw_params,
.audio_shutdown = cdn_dp_audio_shutdown,
.mute_stream = cdn_dp_audio_mute_stream,
.get_eld = cdn_dp_audio_get_eld,
+   .hook_plugged_cb = cdn_dp_audio_hook_plugged_cb,
.no_capture_mute = 1,
 };
 
diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.h 
b/drivers/gpu/drm/rockchip/cdn-dp-core.h
index 81ac9b658a70..d808a9de45ed 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.h
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "rockchip_drm_drv.h"
 
@@ -101,5 +102,8 @@ struct cdn_dp_device {
 
u8 dpcd[DP_RECEIVER_CAP_SIZE];
bool sink_has_audio;
+
+   hdmi_codec_plugged_cb plugged_cb;
+   struct device *codec_dev;
 };
 #endif  /* _CDN_DP_CORE_H */
-- 
2.34.1.703.g22d0c6ccf7-goog

[PATCH 3/3] ASoC: rk3399_gru_sound: Wire up DP jack detection

2022-01-14 Thread Brian Norris

Now that the cdn-dp driver supports plug-change callbacks, let's wire it
up.

Signed-off-by: Brian Norris 
---

 sound/soc/rockchip/rk3399_gru_sound.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/sound/soc/rockchip/rk3399_gru_sound.c 
b/sound/soc/rockchip/rk3399_gru_sound.c
index e2d52d8d0ff9..eeef3ed70037 100644
--- a/sound/soc/rockchip/rk3399_gru_sound.c
+++ b/sound/soc/rockchip/rk3399_gru_sound.c
@@ -164,6 +164,25 @@ static int rockchip_sound_da7219_hw_params(struct 
snd_pcm_substream *substream,
return 0;
 }
 
+static struct snd_soc_jack cdn_dp_card_jack;
+
+static int rockchip_sound_cdndp_init(struct snd_soc_pcm_runtime *rtd)
+{
+   struct snd_soc_component *component = asoc_rtd_to_codec(rtd, 
0)->component;
+   struct snd_soc_card *card = rtd->card;
+   int ret;
+
+   /* Enable jack detection. */
+   ret = snd_soc_card_jack_new(card, "DP Jack", SND_JACK_LINEOUT,
+   &cdn_dp_card_jack, NULL, 0);
+   if (ret) {
+   dev_err(card->dev, "Can't create DP Jack %d\n", ret);
+   return ret;
+   }
+
+   return snd_soc_component_set_jack(component, &cdn_dp_card_jack, NULL);
+}
+
 static int rockchip_sound_da7219_init(struct snd_soc_pcm_runtime *rtd)
 {
struct snd_soc_component *component = asoc_rtd_to_codec(rtd, 
0)->component;
@@ -315,6 +334,7 @@ static const struct snd_soc_dai_link rockchip_dais[] = {
[DAILINK_CDNDP] = {
.name = "DP",
.stream_name = "DP PCM",
+   .init = rockchip_sound_cdndp_init,
.dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF |
SND_SOC_DAIFMT_CBS_CFS,
SND_SOC_DAILINK_REG(cdndp),
-- 
2.34.1.703.g22d0c6ccf7-goog

[PATCH 1/3] arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output

2022-01-14 Thread Brian Norris

Commit b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
switched the platform to SPDIF, but we didn't fix up the device tree.

Fixes: b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
Signed-off-by: Brian Norris 
---

 arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
index 45a5ae5d2027..21ec073f4d51 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
@@ -286,7 +286,7 @@ max98357a: max98357a {
 
sound: sound {
compatible = "rockchip,rk3399-gru-sound";
-   rockchip,cpu = <&i2s0 &i2s2>;
+   rockchip,cpu = <&i2s0 &spdif>;
};
 };
 
@@ -437,10 +437,6 @@ &i2s0 {
status = "okay";
 };
 
-&i2s2 {
-   status = "okay";
-};
-
 &io_domains {
status = "okay";
 
@@ -537,6 +533,10 @@ &sdmmc {
vqmmc-supply = <&ppvar_sd_card_io>;
 };
 
+&spdif {
+   status = "okay";
+};
+
 &spi1 {
status = "okay";
 
-- 
2.34.1.703.g22d0c6ccf7-goog

[PATCH 0/3] (Re)enable DP/HDMI audio for RK3399 Gru

2022-01-14 Thread Brian Norris

This series fixes DP/HDMI audio for RK3399 Gru systems.

First, there was a regression with the switch to SPDIF. Patch 1 can be
taken separately as a regression fix if desired. But it's not quite so
useful (at least on Chrome OS systems) without the second part.

Second, jack detection was never upstreamed, because the hdmi-codec
dependencies were still being worked out when this platform was first
supported.

Patches cover a few subsystems. Perhaps this is something for arm-soc?


Brian Norris (3):
  arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
  drm/rockchip: cdn-dp: Support HDMI codec plug-change callback
  ASoC: rk3399_gru_sound: Wire up DP jack detection

 arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 10 +++
 drivers/gpu/drm/rockchip/cdn-dp-core.c   | 28 
 drivers/gpu/drm/rockchip/cdn-dp-core.h   |  4 +++
 sound/soc/rockchip/rk3399_gru_sound.c| 20 ++
 4 files changed, 57 insertions(+), 5 deletions(-)

-- 
2.34.1.703.g22d0c6ccf7-goog

Re: [PATCH v2] drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX

2022-01-11 Thread Brian Norris

Hi Andrzej,

On Tue, Jan 11, 2022 at 5:26 AM Andrzej Hajda  wrote:
> I am not DP specialist so CC-ed people working with DP

Thanks for the review regardless! I'll also not claim to be a DP
specialist -- although I've had to learn my fair share to debug a good
handful of issues on an SoC using this driver.

> On 01.10.2021 23:42, Brian Norris wrote:
> > If the display is not enable()d, then we aren't holding a runtime PM
> > reference here. Thus, it's easy to accidentally cause a hang, if user
> > space is poking around at /dev/drm_dp_aux0 at the "wrong" time.
> >
> > Let's get the panel and PM state right before trying to talk AUX.
> >
> > Fixes: 0d97ad03f422 ("drm/bridge: analogix_dp: Remove duplicated code")
> > Cc: 
> > Cc: Tomeu Vizoso 
> > Signed-off-by: Brian Norris 
>
>
> Few questions/issues here:
>
> 1. If it is just to avoid accidental 'hangs' it would be better to just
> check if the panel is working before transfer, if not, return error
> code. If there is better reason for this pm dance, please provide it  in
> description.

I'm not that familiar with DP-AUX, but I believe it can potentially
provide a variety of useful information (e.g., EDID?) to users without
the display and primary video link being active. So it doesn't sound
like a good idea to me to purposely leave this interface uninitialized
(and emitting errors) even when the user is asking for communication
(via /dev/drm_dp_aux). Do you want me to document what
/dev/drm_dp_aux does, and why someone would use it, in the commit
message?

> 2. Again I see an assumption that panel-prepare enables power for
> something different than video transmission, accidentally it is true for
> most devices, but devices having more fine grained power management will
> break, or at least will be used inefficiently - but maybe in case of dp
> it is OK ???

For this part, I'm less sure -- I wasn't sure what the general needs
are for AUX communication, and whether we need the panel enabled or
not. It seems logical that we need something powered, and I don't know
of anything besides "prepare()" that ensures that for DP panels.

(NB: the key to _my_ problem is the PM runtime reference. It's
absolutely essential that we don't try to utilize the DP hardware
without powering it up. The panel power state is less critical.)

> 3. More general issue - I am not sure if this should not be handled
> uniformly for all drm_dp devices.

I'm not sure what precisely you mean by #3. But FWIW, this is at least
partially documented ("make sure it's been properly enabled"):

/**
 * @transfer: transfers a message representing a single AUX
 * transaction.
 *
 * This is a hardware-specific implementation of how
 * transactions are executed that the drivers must provide.
...
 * Also note that this callback can be called no matter the
 * state @dev is in. Drivers that need that device to be powered
 * to perform this operation will first need to make sure it's
 * been properly enabled.
 */
ssize_t (*transfer)(struct drm_dp_aux *aux,
struct drm_dp_aux_msg *msg);

But maybe the definition of "properly enabled" is what you're unsure
about? (I'm also a little unsure.)

Regards,
Brian

Re: [PATCH v2] drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX

2022-01-05 Thread Brian Norris

(updating Andrzej's email)

On Fri, Oct 1, 2021 at 2:50 PM Brian Norris  wrote:
> If the display is not enable()d, then we aren't holding a runtime PM
> reference here. Thus, it's easy to accidentally cause a hang, if user
> space is poking around at /dev/drm_dp_aux0 at the "wrong" time.
>
> Let's get the panel and PM state right before trying to talk AUX.
>
> Fixes: 0d97ad03f422 ("drm/bridge: analogix_dp: Remove duplicated code")
> Cc: 
> Cc: Tomeu Vizoso 
> Signed-off-by: Brian Norris 
> ---
>
> Changes in v2:
> - Fix spelling in Subject
> - DRM_DEV_ERROR() -> drm_err()
> - Propagate errors from un-analogix_dp_prepare_panel()
>
>  .../drm/bridge/analogix/analogix_dp_core.c| 21 ++-
>  1 file changed, 20 insertions(+), 1 deletion(-)

Ping? Do I need to do anything more here?

Thanks,
Brian

Re: [PATCH v2 1/2] drm/input_helper: Add new input-handling helper

2021-11-30 Thread Brian Norris

Hi Pekka,

On Fri, Nov 19, 2021 at 12:38:41PM +0200, Pekka Paalanen wrote:
> On Thu, 18 Nov 2021 17:46:10 -0800
> Brian Norris  wrote:
> > On Thu, Nov 18, 2021 at 12:39:28PM +0200, Pekka Paalanen wrote:
> > > On Wed, 17 Nov 2021 14:48:40 -0800
> > > Brian Norris  wrote:
> > > If KMS gets a pageflip or modeset in no time after an input event, then
> > > what's the gain. OTOH, if the display server is locking on to vblank,
> > > there might be a delay worth avoiding. But then, is it worth
> > > short-circuiting the wake-up in kernel vs. adding a new ioctl that
> > > userspace could hit to start the warming up process?  
> > 
> > Rob responded to the first part to some extent (there is definitely gain
> > to be had).
> > 
> > To the last part: I wrote a simple debugfs hook to allow user space to
> > force a PSR exit, and then a simple user space program to read input
> > events and smash that debugfs file whenever it sees one. Testing in the
> > same scenarios, this appears to lose less than 100 microseconds versus
> > the in-kernel approach, which is negligible for this use case. (I'm not
> > sure about the other use cases.)
> > 
> > So, this is technically doable in user space.
> 
> This is crucial information I would like you to include in some commit
> message. I think it is very interesting for the reviewers. Maybe also
> copy that in the cover letter.
> 
> In my opinion there is a clear and obvious decision due that
> measurement: Add the new ioctl for userspace to hit, do not try to
> hardcode or upload the wake-up policy into the kernel.

Perhaps.

I'll admit, I'm not eager to go write the fd-passing solutions that
others are designing on the fly. I'm currently torn on whether I'll just
live with this current patch set out-of-tree (or, y'all can decide that
a simple, 99% working solution is better than no solution), because it's
simple; or possibly figuring out how to utilize such an ioctl cleanly
within our display manager. I'm not super hopeful on the latter.

IOW, I'm approximately in line with Doug's thoughts:
https://lore.kernel.org/all/CAD=FV=XARhZoj+0p-doxcbC=4K+NuMc=ur6wqg6kwk-mkpk...@mail.gmail.com/
But then, we're obviously biased.

> > I can't speak to the ease of _actually_ integrating this into even our
> > own Chrome display manager, but I highly doubt it will get integrated
> > into others. I'd posit this should weigh into the relative worth, but
> > otherwise can't really give you an answer there.
> 
> I think such a thing would be very simple to add to any display server.
> They already have hooks for things like resetting idle timeout timers on
> any relevant input event.
> 
> > I'd also note, software-directed PSR is so far designed to be completely
> > opaque to user space. There's no way to disable it; no way to know it's
> > active; and no way to know anything about the parameters it's computing
> > (like average entry/exit delay). Would you suggest a whole set of new
> > IOCTLs for this?
> 
> Just one ioctl on the DRM device: "Hey, wake up!". Because that's what
> your patch does in-kernel, right?

Well, we'd at least want something to advertise that the feature does
something ("is supported") I think, otherwise we're just asking user
space to do useless work.

> If there are use case specific parameters, then how did you intend to
> allow adjusting those in your proposal?

Another commenter mentioned the latency tradeoff -- it's possible that
there are panels/eDP-links that resume fast enough that one doesn't care
to use this ioctl. For an in-kernel solution, one has all the data
available and could use hardware information to make decisions, if
needed. For a user space solution, we won't have any of that, and we'd
have to work to expose that information.

I suppose we could ignore that problem and only expose a minimal UAPI
until we need something more, but it feels like exposing a UAPI for
something is a critical point where one should make sure it's reasonably
descriptive and useful.

> > > How do you know userspace is using this input device at all? If
> > > userspace is not using the input device, then DRM should not be opening
> > > it either, as it must have no effect on anything.
> > > 
> > > If you open an input device that userspace does not use, you also cause
> > > a power consumption regression, because now the input device itself is
> > > active and possibly flooding the kernel with events (e.g. an
> > > accelerometer).  
> > 
> > Well, I don't think accelerometers show up

Re: [PATCH v2 1/2] drm/input_helper: Add new input-handling helper

2021-11-19 Thread Brian Norris

Hi Daniel,

On Fri, Nov 19, 2021 at 11:01:18AM +0100, Daniel Vetter wrote:
> On Thu, Nov 18, 2021 at 11:30:43AM -0800, Brian Norris wrote:
> > On Thu, Nov 18, 2021 at 10:05:11AM +0100, Daniel Vetter wrote:
> > > On Wed, Nov 17, 2021 at 02:48:40PM -0800, Brian Norris wrote:
> > > > --- a/drivers/gpu/drm/Kconfig
> > > > +++ b/drivers/gpu/drm/Kconfig
> > > > @@ -79,9 +79,15 @@ config DRM_DEBUG_SELFTEST
> > > >  
> > > >   If in doubt, say "N".
> > > >  
> > > > +config DRM_INPUT_HELPER
> > > > +   def_bool y
> > > > +   depends on DRM_KMS_HELPER
> > > > +   depends on INPUT
> > > 
> > > Uh please no configs for each thing, it just makes everything more
> > > complex. Do we _really_ need this?
> > 
> > First, it's not a configurable option (a user will never see this nor
> > have to answer Y/N to it); it only serves as an intermediary to express
> > the CONFIG_INPUT dependency (which is necessary) without making
> > DRM_KMS_HELPER fully depend on CONFIG_INPUT. (We should be able to run
> > display stacks without the input subsystem.)
> 
> I'm not so much worried about the user cost, but the maintenance cost.
> Kbuild config complexity is ridiculous, anything that adds even a bit is
> really silly.
> 
> > The closest alternative I can think of with fewer Kconfig symbols is to
> > just use CONFIG_INPUT directly in the code, to decide whether to provide
> > the helpers or else just stub them out. But that has a problem of not
> > properly expressing the =m vs. =y necessity: if, for example,
> > CONFIG_DRM_KMS_HELPER=y and CONFIG_INPUT=m, then we'll have linker
> > issues.
> 
> Usually this is done by providing static inline dummy implementations in
> the headers. That avoids having to sprinkle new Kconfig symbols all over.

Right, I already did that, and I'm not sprinkling
CONFIG_DRM_INPUT_HELPER much. (I do include one around the module
parameter, because it doesn't make much sense to have the module
parameter even exist, if the underlying feature is stubbed out.)

But that doesn't solve the problem in my last sentence, involving
tristates. The "stub inline" approach only works well for boolean
features -- either built-in, or disabled. Once your feature is in a
module, you need to ensure that no built-in code depends on it.

Do you want DRM_KMS_HELPER to unconditionally depend on CONFIG_INPUT? If
so, I can just add a 'select' or 'depend' and drop this intermediate
symbol.
If not, then what do you expect to happen with DRM_KMS_HELPER=y and
CONFIG_INPUT=m?

Brian

Re: [PATCH v2 1/2] drm/input_helper: Add new input-handling helper

2021-11-18 Thread Brian Norris

Hi Pekka,

Thanks for the thoughts and review. I've tried to respond below:

On Thu, Nov 18, 2021 at 12:39:28PM +0200, Pekka Paalanen wrote:
> On Wed, 17 Nov 2021 14:48:40 -0800
> Brian Norris  wrote:
> 
> > A variety of applications have found it useful to listen to
> > user-initiated input events to make decisions within a DRM driver, given
> > that input events are often the first sign that we're going to start
> > doing latency-sensitive activities:
> > 
> >  * Panel self-refresh: software-directed self-refresh (e.g., with
> >Rockchip eDP) is especially latency sensitive. In some cases, it can
> >take 10s of milliseconds for a panel to exit self-refresh, which can
> >be noticeable. Rockchip RK3399 Chrome OS systems have always shipped
> >with an input_handler boost, that preemptively exits self-refresh
> >whenever there is input activity.
> > 
> >  * GPU drivers: on GPU-accelerated desktop systems, we may need to
> >render new frames immediately after user activity. Powering up the
> >GPU can take enough time that it is worthwhile to start this process
> >as soon as there is input activity. Many Chrome OS systems also ship
> >with an input_handler boost that powers up the GPU.
> > 
> > This patch provides a small helper library that abstracts some of the
> > input-subsystem details around picking which devices to listen to, and
> > some other boilerplate. This will be used in the next patch to implement
> > the first bullet: preemptive exit for panel self-refresh.
> > 
> > Bits of this are adapted from code the Android and/or Chrome OS kernels
> > have been carrying for a while.
> > 
> > Signed-off-by: Brian Norris 
> > ---
> 
> Thanks Simon for the CC.
> 
> Hi Brian,
> 
> while this feature in general makes sense and sounds good, to start
> warming up display hardware early when something might start to happen,
> this particular proposal has many problems from UAPI perspective (as it
> has none). Comments below.
> 
> Btw. if PSR is that slow to wake up from, how much do you actually gain
> from this input event watching? I would imagine the improvement to not
> be noticeable.

Patch 2 has details. It's not really about precisely how slow PSR is,
but how much foresight we can gain: in patch 2, I note that with my
particular user space and system, I can start PSR-exit 50ms earlier than
I would otherweise. (FWIW, this measurement is exactly the same it was
with the original version written 4 years ago.)

For how long PSR-exit takes: the measurements I'm able to do (via
ftrace) show that drm_self_refresh_transition() takes between 35 and 55
ms. That's noticeable at 60 fps. And quite conveniently, the input-boost
manages to hide nearly 100% of that latency.

Typical use cases where one notices PSR latency (and where this 35-55ms
matters) involve simply moving a cursor; it's very noticeable when you
have more than a few frames of latency to "get started".

> I think some numbers about how much this feature helps would be really
> good, even if they are quite specific use cases. You also need to
> identify the userspace components, because I think different display
> servers are very different in their reaction speed.

If my email address isn't obvious, I'm testing Chrome OS. I'm frankly
not that familiar with the user space display stack, but for what I
know, it's rather custom, developed within the Chromium project. Others
on CC here could probably give you more detail, if you want specific
answers, besides docs like this:

https://chromium.googlesource.com/chromium/src/+/HEAD/docs/ozone_overview.md

> If KMS gets a pageflip or modeset in no time after an input event, then
> what's the gain. OTOH, if the display server is locking on to vblank,
> there might be a delay worth avoiding. But then, is it worth
> short-circuiting the wake-up in kernel vs. adding a new ioctl that
> userspace could hit to start the warming up process?

Rob responded to the first part to some extent (there is definitely gain
to be had).

To the last part: I wrote a simple debugfs hook to allow user space to
force a PSR exit, and then a simple user space program to read input
events and smash that debugfs file whenever it sees one. Testing in the
same scenarios, this appears to lose less than 100 microseconds versus
the in-kernel approach, which is negligible for this use case. (I'm not
sure about the other use cases.)

So, this is technically doable in user space.

I can't speak to the ease of _actually_ integrating this into even our
own Chrome display manager, but I highly doubt it will get integrated
into others. I'd posit this should weigh into the relative worth, but
otherw

Re: [PATCH v2 1/2] drm/input_helper: Add new input-handling helper

2021-11-18 Thread Brian Norris

Hi Daniel,

Thanks for the review. Lots to address elsewhere, but I can respond
here first:

On Thu, Nov 18, 2021 at 10:05:11AM +0100, Daniel Vetter wrote:
> On Wed, Nov 17, 2021 at 02:48:40PM -0800, Brian Norris wrote:
> > --- a/drivers/gpu/drm/Kconfig
> > +++ b/drivers/gpu/drm/Kconfig
> > @@ -79,9 +79,15 @@ config DRM_DEBUG_SELFTEST
> >  
> >   If in doubt, say "N".
> >  
> > +config DRM_INPUT_HELPER
> > +   def_bool y
> > +   depends on DRM_KMS_HELPER
> > +   depends on INPUT
> 
> Uh please no configs for each thing, it just makes everything more
> complex. Do we _really_ need this?

First, it's not a configurable option (a user will never see this nor
have to answer Y/N to it); it only serves as an intermediary to express
the CONFIG_INPUT dependency (which is necessary) without making
DRM_KMS_HELPER fully depend on CONFIG_INPUT. (We should be able to run
display stacks without the input subsystem.)

The closest alternative I can think of with fewer Kconfig symbols is to
just use CONFIG_INPUT directly in the code, to decide whether to provide
the helpers or else just stub them out. But that has a problem of not
properly expressing the =m vs. =y necessity: if, for example,
CONFIG_DRM_KMS_HELPER=y and CONFIG_INPUT=m, then we'll have linker
issues.

In short, yes, I think we really need this. But I'm not a Kbuild expert.

> > diff --git a/include/drm/drm_input_helper.h b/include/drm/drm_input_helper.h
> > new file mode 100644
> > index ..7904f397b934
> > --- /dev/null
> > +++ b/include/drm/drm_input_helper.h
> > @@ -0,0 +1,41 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2021 Google, Inc.
> > + */
> > +#ifndef __DRM_INPUT_HELPER_H__
> > +#define __DRM_INPUT_HELPER_H__
> > +
> > +#include 
> > +
> > +struct drm_device;
> > +
> > +struct drm_input_handler {
> > +   /*
> > +* Callback to call for input activity. Will be called in an atomic
> > +* context.
> 
> How atomic? Like hardirq, and nasty spinlocks held?

Maybe I should have just cribbed off the  doc:

 * @event: event handler. This method is being called by input core with
 *  interrupts disabled and dev->event_lock spinlock held and so
 *  it may not sleep

I probably don't want to propagate the subsystem details about which
locks, but I guess I can be specific about "interrupts disabled" and
"don't sleep".

> > +*/
> > +   void (*callback)(struct drm_input_handler *handler);
> > +
> > +   struct input_handler handler;
> > +};
> > +
> > +#if defined(CONFIG_DRM_INPUT_HELPER)
> > +
> > +int drm_input_handle_register(struct drm_device *dev,
> > + struct drm_input_handler *handler);
> > +void drm_input_handle_unregister(struct drm_input_handler *handler);
> > +
> > +#else /* !CONFIG_DRM_INPUT_HELPER */
> > +
> > +static inline int drm_input_handle_register(struct drm_device *dev,
> > +   struct drm_input_handler *handler)
> > +{
> > +   return 0;
> > +}
> 
> I guess the reason behind the helper is that you also want to use this in
> drivers or maybe drm/sched?

I think my reasoning is heavily described in both the cover letter and
the commit message. If that's not clear, can you point out which part?
I'd gladly improve it :)

But specifically, see the 2nd bullet from the commit message, which I've
re-quoted down here:

> >  * GPU drivers: on GPU-accelerated desktop systems, we may need to
> >render new frames immediately after user activity. Powering up the
> >GPU can take enough time that it is worthwhile to start this process
> >as soon as there is input activity. Many Chrome OS systems also ship
> >with an input_handler boost that powers up the GPU.

Rob Clark has patches to drm/msm to boost GPU power-up via a similar
helper.

> Anyway I think it looks all reasonable. Definitely need an ack from input
> people

I realized I failed to carry Dmitry's Ack from version 1 [1]. If this
has a v3 in similar form, I'll carry it there.

> that the event list you have is a good choice, I have no idea what
> that all does. Maybe also document that part a bit more.

I'm admittedly not an expert there, and this is actually one reason why
we hoped to make this a library (that nobody wants to keep figuring out
whether all those flags, etc., are really doing the right thing), but
there are comments about what each entry is _trying_ to do. Are you
suggesting more, as in, why "BTN_LEFT + EV_KEY" means "pointer"? Or why
we match certain devices (because they represent likely user activity
that will affect the display pipeline)? Or both? Anyway, I'll give it a
shot, if we keep this.

Brian

[1] https://lore.kernel.org/all/yyw6fwsenmk25...@google.com/

[PATCH v2 1/2] drm/input_helper: Add new input-handling helper

2021-11-17 Thread Brian Norris

A variety of applications have found it useful to listen to
user-initiated input events to make decisions within a DRM driver, given
that input events are often the first sign that we're going to start
doing latency-sensitive activities:

 * Panel self-refresh: software-directed self-refresh (e.g., with
   Rockchip eDP) is especially latency sensitive. In some cases, it can
   take 10s of milliseconds for a panel to exit self-refresh, which can
   be noticeable. Rockchip RK3399 Chrome OS systems have always shipped
   with an input_handler boost, that preemptively exits self-refresh
   whenever there is input activity.

 * GPU drivers: on GPU-accelerated desktop systems, we may need to
   render new frames immediately after user activity. Powering up the
   GPU can take enough time that it is worthwhile to start this process
   as soon as there is input activity. Many Chrome OS systems also ship
   with an input_handler boost that powers up the GPU.

This patch provides a small helper library that abstracts some of the
input-subsystem details around picking which devices to listen to, and
some other boilerplate. This will be used in the next patch to implement
the first bullet: preemptive exit for panel self-refresh.

Bits of this are adapted from code the Android and/or Chrome OS kernels
have been carrying for a while.

Signed-off-by: Brian Norris 
---

Changes in v2:
 - Honor CONFIG_INPUT dependency, via new CONFIG_DRM_INPUT_HELPER
 - Remove void*; users should use container_of()
 - Document the callback context

 drivers/gpu/drm/Kconfig|   6 ++
 drivers/gpu/drm/Makefile   |   2 +
 drivers/gpu/drm/drm_input_helper.c | 143 +
 include/drm/drm_input_helper.h |  41 +
 4 files changed, 192 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_input_helper.c
 create mode 100644 include/drm/drm_input_helper.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index fb144617055b..381476b10a9d 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -79,9 +79,15 @@ config DRM_DEBUG_SELFTEST
 
  If in doubt, say "N".
 
+config DRM_INPUT_HELPER
+   def_bool y
+   depends on DRM_KMS_HELPER
+   depends on INPUT
+
 config DRM_KMS_HELPER
tristate
depends on DRM
+   select DRM_INPUT_HELPER if INPUT
help
  CRTC helpers for KMS drivers.
 
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 1c41156deb5f..9a6494aa45e6 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -56,6 +56,8 @@ drm_kms_helper-y := drm_bridge_connector.o drm_crtc_helper.o 
drm_dp_helper.o \
drm_atomic_state_helper.o drm_damage_helper.o \
drm_format_helper.o drm_self_refresh_helper.o drm_rect.o
 
+drm_kms_helper-$(CONFIG_DRM_INPUT_HELPER) += drm_input_helper.o
+
 drm_kms_helper-$(CONFIG_DRM_PANEL_BRIDGE) += bridge/panel.o
 drm_kms_helper-$(CONFIG_DRM_FBDEV_EMULATION) += drm_fb_helper.o
 drm_kms_helper-$(CONFIG_DRM_KMS_CMA_HELPER) += drm_fb_cma_helper.o
diff --git a/drivers/gpu/drm/drm_input_helper.c 
b/drivers/gpu/drm/drm_input_helper.c
new file mode 100644
index ..470f90865c7c
--- /dev/null
+++ b/drivers/gpu/drm/drm_input_helper.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Google, Inc.
+ */
+#include 
+#include 
+
+#include 
+#include 
+
+/**
+ * DOC: overview
+ *
+ * This helper library provides a thin wrapper around input handles, so that
+ * DRM drivers can easily perform domain-specific actions in response to user
+ * activity. e.g., if someone is moving a mouse, we're likely to want to
+ * display something soon, and we should exit panel self-refresh.
+ */
+
+static void drm_input_event(struct input_handle *handle, unsigned int type,
+   unsigned int code, int value)
+{
+   struct drm_input_handler *handler = handle->handler->private;
+
+   handler->callback(handler);
+}
+
+static int drm_input_connect(struct input_handler *handler,
+struct input_dev *dev,
+const struct input_device_id *id)
+{
+   struct input_handle *handle;
+   int error;
+
+   handle = kzalloc(sizeof(struct input_handle), GFP_KERNEL);
+   if (!handle)
+   return -ENOMEM;
+
+   handle->dev = dev;
+   handle->handler = handler;
+   handle->name = "drm-input-helper";
+
+   error = input_register_handle(handle);
+   if (error)
+   goto err2;
+
+   error = input_open_device(handle);
+   if (error)
+   goto err1;
+
+   return 0;
+
+err1:
+   input_unregister_handle(handle);
+err2:
+   kfree(handle);
+   return error;
+}
+
+static void drm_input_disconnect(struct input_handle *handle)
+{
+   input_close_device(handle);
+   input_unregister_handle(handle);
+   kfree(handle);
+}
+
+s

[PATCH v2 2/2] drm/self_refresh: Disable self-refresh on input events

2021-11-17 Thread Brian Norris

To improve panel self-refresh exit latency, we speculatively start
exiting when we
receive input events. Occasionally, this may lead to false positives,
but most of the time we get a head start on coming out of PSR. Depending
on how userspace takes to produce a new frame in response to the event,
this can completely hide the exit latency.

In local tests on Chrome OS (Rockchip RK3399 eDP), we've found that the
input notifier gives us about a 50ms head start over the
fb-update-initiated exit.

Leverage a new drm_input_helper library to get easy access to
likely-relevant input event callbacks.

Inspired-by: Kristian H. Kristensen 
Signed-off-by: Brian Norris 
---
This was in part picked up from:

  
https://lore.kernel.org/all/20180405095000.9756-25-enric.balle...@collabora.com/
  [PATCH v6 24/30] drm/rockchip: Disable PSR on input events

with significant rewrites/reworks:

 - moved to common drm_input_helper and drm_self_refresh_helper
   implementation
 - track state only through crtc->state->self_refresh_active

Note that I'm relatively unfamiliar with DRM locking expectations, but I
believe access to drm_crtc->state (which helps us track redundant
transitions) is OK under the locking provided by
drm_atomic_get_crtc_state().

Changes in v2:
 - Delay PSR re-entry, when already disabled
 - Allow default configuration via Kconfig and modparam
 - Replace void* with container_of()

 drivers/gpu/drm/Kconfig   | 16 
 drivers/gpu/drm/drm_self_refresh_helper.c | 98 +++
 2 files changed, 100 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 381476b10a9d..698924ed9b6b 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -84,6 +84,22 @@ config DRM_INPUT_HELPER
depends on DRM_KMS_HELPER
depends on INPUT
 
+config DRM_SELF_REFRESH_INPUT_BOOST_DEFAULT
+   bool "Preemptively exit panel self-refresh on input device activity" if 
EXPERT
+   default y
+   depends on DRM_INPUT_HELPER
+   help
+ Allows the generic DRM panel self-refresh helpers to factor in user
+ input activity to preemptively exit panel self-refresh, in order to
+ reduce potentially-visible latency when displaying new display
+ content. This is an optimization which often will do the right thing,
+ but can be disabled for experimentation or similar.
+
+ Saying Y enables the feature by default; this can also be configured
+ by module parameter, drm_kms_helper.self_refresh_input_boost.
+
+ If in doubt, say "Y".
+
 config DRM_KMS_HELPER
tristate
depends on DRM
diff --git a/drivers/gpu/drm/drm_self_refresh_helper.c 
b/drivers/gpu/drm/drm_self_refresh_helper.c
index dd33fec5aabd..ba4881e683b7 100644
--- a/drivers/gpu/drm/drm_self_refresh_helper.c
+++ b/drivers/gpu/drm/drm_self_refresh_helper.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -15,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -58,17 +60,41 @@ DECLARE_EWMA(psr_time, 4, 4)
 struct drm_self_refresh_data {
struct drm_crtc *crtc;
struct delayed_work entry_work;
+   struct work_struct exit_work;
+   struct drm_input_handler input_handler;
+   bool input_handler_registered;
 
struct mutex avg_mutex;
struct ewma_psr_time entry_avg_ms;
struct ewma_psr_time exit_avg_ms;
 };
 
-static void drm_self_refresh_helper_entry_work(struct work_struct *work)
+static bool self_refresh_input_boost =
+   IS_ENABLED(CONFIG_DRM_SELF_REFRESH_INPUT_BOOST_DEFAULT);
+#if defined(CONFIG_DRM_INPUT_HELPER)
+module_param(self_refresh_input_boost, bool, 0644);
+MODULE_PARM_DESC(self_refresh_input_boost,
+"Enable panel self-refresh input boost [default="
+__stringify(CONFIG_DRM_SELF_REFRESH_INPUT_BOOST_DEFAULT) "]");
+#endif /* CONFIG_DRM_INPUT_HELPER */
+
+
+static void drm_self_refresh_reschedule(struct drm_self_refresh_data *sr_data)
+{
+   unsigned int delay;
+
+   mutex_lock(&sr_data->avg_mutex);
+   delay = (ewma_psr_time_read(&sr_data->entry_avg_ms) +
+ewma_psr_time_read(&sr_data->exit_avg_ms)) * 2;
+   mutex_unlock(&sr_data->avg_mutex);
+
+   mod_delayed_work(system_wq, &sr_data->entry_work,
+msecs_to_jiffies(delay));
+}
+
+static void drm_self_refresh_transition(struct drm_self_refresh_data *sr_data,
+   bool enable)
 {
-   struct drm_self_refresh_data *sr_data = container_of(
-   to_delayed_work(work),
-   struct drm_self_refresh_data, entry_work);
struct drm_crtc *crtc = sr_data->crtc;
struct drm_device *dev = crtc->dev;
struct drm_modeset_acquire_c

[PATCH v2 0/2] drm: Support input-boosted panel self-refresh exit

2021-11-17 Thread Brian Norris

A variety of applications have found it useful to listen to
user-initiated input events to make decisions within a DRM driver, given
that input events are often the first sign that we're going to start
doing latency-sensitive activities:

 * Panel self-refresh: software-directed self-refresh (e.g., with
   Rockchip eDP) is especially latency sensitive. In some cases, it can
   take 10s of milliseconds for a panel to exit self-refresh, which can
   be noticeable. Rockchip RK3399 Chrome OS systems have always shipped
   with an input_handler boost, that preemptively exits self-refresh
   whenever there is input activity.

 * GPU drivers: on GPU-accelerated desktop systems, we may need to
   render new frames immediately after user activity. Powering up the
   GPU can take enough time that it is worthwhile to start this process
   as soon as there is input activity. Many Chrome OS systems also ship
   with an input_handler boost that powers up the GPU.

I implement the first bullet in this series, and I also compared with
some out-of-tree patches for the second, to ensure this could be useful
there too.

Past work on upstreaming a variety of Chromebook display patches got
held up for this particular feature, as there was some desire to make it
a bit more generic, for one. See the latest here:

  
https://lore.kernel.org/all/20180405095000.9756-25-enric.balle...@collabora.com/
  [PATCH v6 24/30] drm/rockchip: Disable PSR on input events

I significantly rewrote this to adapt it to the new common
drm_self_refresh_helpers and to add a new drm_input_helper thin library,
so I only carry my own authorship on this series.

Admittedly, this "drm_input_helper" library is barely DRM-specific at
all, except that all display- and GPU-related input-watchers are likely
to want to watch similar device behavior (unlike, say, rfkill or led
input_handler code). The approximate consensus so far seems to be that
(a) this isn't much code; if we need it for other subsystems (like,
cpufreq-boost), it's easy to implement similar logic
(b) input subsystem maintainers think the existing input_handler
abstraction is good enough
So, I keep the thin input helper in drivers/gpu/drm/.

v1: 
https://lore.kernel.org/all/20211103234018.4009771-1-briannor...@chromium.org/

Changes in v2:
 - Honor CONFIG_INPUT dependency, via new CONFIG_DRM_INPUT_HELPER
 - Remove void*; users should use container_of()
 - Document the callback context
 - Delay PSR re-entry, when already disabled
 - Allow default configuration via Kconfig and modparam
 - really CC dri-devel@lists.freedesktop.org (oops!)

Brian Norris (2):
  drm/input_helper: Add new input-handling helper
  drm/self_refresh: Disable self-refresh on input events

 drivers/gpu/drm/Kconfig   |  22 
 drivers/gpu/drm/Makefile  |   2 +
 drivers/gpu/drm/drm_input_helper.c| 143 ++
 drivers/gpu/drm/drm_self_refresh_helper.c |  98 ---
 include/drm/drm_input_helper.h|  41 +++
 5 files changed, 292 insertions(+), 14 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_input_helper.c
 create mode 100644 include/drm/drm_input_helper.h

-- 
2.34.0.rc1.387.gb447b232ab-goog

[PATCH v3] drm/bridge: analogix_dp: Make PSR-exit block less

2021-11-03 Thread Brian Norris

Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
"PSR exit" used non-blocking analogix_dp_send_psr_spd(). The refactor
started using the blocking variant, for a variety of reasons -- quoting
Sean Paul's potentially-faulty memory:

"""
 - To avoid racing a subsequent PSR entry (if exit takes a long time)
 - To avoid racing disable/modeset
 - We're not displaying new content while exiting PSR anyways, so there
   is minimal utility in allowing frames to be submitted
 - We're lying to userspace telling them frames are on the screen when
   we're just dropping them on the floor
"""

However, I'm finding that this blocking transition is causing upwards of
60+ ms of unneeded latency on PSR-exit, to the point that initial cursor
movements when leaving PSR are unbearably jumpy.

It turns out that we need to meet in the middle somewhere: Sean is right
that we were "lying to userspace" with a non-blocking PSR-exit, but the
new blocking behavior is also waiting too long:

According to the eDP specification, the sink device must support PSR
entry transitions from both state 4 (ACTIVE_RESYNC) and state 0
(INACTIVE). It also states that in ACTIVE_RESYNC, "the Sink device must
display the incoming active frames from the Source device with no
visible glitches and/or artifacts."

Thus, for our purposes, we only need to wait for ACTIVE_RESYNC before
moving on; we are ready to display video, and subsequent PSR-entry is
safe.

Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin),
where this saves about 60ms of latency, for PSR-exit that used to
take about 80ms.

Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: 
Cc: Zain Wang 
Cc: Tomasz Figa 
Cc: Heiko Stuebner 
Cc: Sean Paul 
Signed-off-by: Brian Norris 
Reviewed-by: Sean Paul 
---
CC list is partially constructed from the commit message of the Fixed
commit

Changes in v3:
 - Fix the exiting/entering comment (a reviewer noticed off-mailing-list
   that I got it backwards)
 - Add Reviewed-by

Changes in v2:
 - retitled subject (previous: "drm/bridge: analogix_dp: Make
   PSR-disable non-blocking")
 - instead of completely non-blocking, make this "less"-blocking
 - more background (thanks Sean!)
 - more specification details

 drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
index cab6c8b92efd..6a4f20fccf84 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
@@ -998,11 +998,21 @@ int analogix_dp_send_psr_spd(struct analogix_dp_device 
*dp,
if (!blocking)
return 0;
 
+   /*
+* db[1]!=0: entering PSR, wait for fully active remote frame buffer.
+* db[1]==0: exiting PSR, wait for either
+*  (a) ACTIVE_RESYNC - the sink "must display the
+*  incoming active frames from the Source device with no visible
+*  glitches and/or artifacts", even though timings may still be
+*  re-synchronizing; or
+*  (b) INACTIVE - the transition is fully complete.
+*/
ret = readx_poll_timeout(analogix_dp_get_psr_status, dp, psr_status,
psr_status >= 0 &&
((vsc->db[1] && psr_status == DP_PSR_SINK_ACTIVE_RFB) ||
-   (!vsc->db[1] && psr_status == DP_PSR_SINK_INACTIVE)), 1500,
-   DP_TIMEOUT_PSR_LOOP_MS * 1000);
+   (!vsc->db[1] && (psr_status == DP_PSR_SINK_ACTIVE_RESYNC ||
+psr_status == DP_PSR_SINK_INACTIVE))),
+   1500, DP_TIMEOUT_PSR_LOOP_MS * 1000);
if (ret) {
dev_warn(dp->dev, "Failed to apply PSR %d\n", ret);
return ret;
-- 
2.34.0.rc0.344.g81b53c2807-goog

[PATCH v2] drm/bridge: analogix_dp: Make PSR-exit block less

2021-10-29 Thread Brian Norris

Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
"PSR exit" used non-blocking analogix_dp_send_psr_spd(). The refactor
started using the blocking variant, for a variety of reasons -- quoting
Sean Paul's potentially-faulty memory:

"""
 - To avoid racing a subsequent PSR entry (if exit takes a long time)
 - To avoid racing disable/modeset
 - We're not displaying new content while exiting PSR anyways, so there
   is minimal utility in allowing frames to be submitted
 - We're lying to userspace telling them frames are on the screen when
   we're just dropping them on the floor
"""

However, I'm finding that this blocking transition is causing upwards of
60+ ms of unneeded latency on PSR-exit, to the point that initial cursor
movements when leaving PSR are unbearably jumpy.

It turns out that we need to meet in the middle somewhere: Sean is right
that we were "lying to userspace" with a non-blocking PSR-exit, but the
new blocking behavior is also waiting too long:

According to the eDP specification, the sink device must support PSR
entry transitions from both state 4 (ACTIVE_RESYNC) and state 0
(INACTIVE). It also states that in ACTIVE_RESYNC, "the Sink device must
display the incoming active frames from the Source device with no
visible glitches and/or artifacts."

Thus, for our purposes, we only need to wait for ACTIVE_RESYNC before
moving on; we are ready to display video, and subsequent PSR-entry is
safe.

Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin),
where this saves about 60ms of latency, for PSR-exit that used to
take about 80ms.

Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: 
Cc: Zain Wang 
Cc: Tomasz Figa 
Cc: Heiko Stuebner 
Cc: Sean Paul 
Signed-off-by: Brian Norris 
---
CC list is partially constructed from the commit message of the Fixed
commit

Changes in v2:
 - retitled subject (previous: "drm/bridge: analogix_dp: Make
   PSR-disable non-blocking")
 - instead of completely non-blocking, make this "less"-blocking
 - more background (thanks Sean!)
 - more specification details

 drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
index cab6c8b92efd..f8e119e84ae2 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
@@ -998,11 +998,21 @@ int analogix_dp_send_psr_spd(struct analogix_dp_device 
*dp,
if (!blocking)
return 0;
 
+   /*
+* db[1]==0: entering PSR, wait for fully active remote frame buffer.
+* db[1]!=0: exiting PSR, wait for either
+*  (a) ACTIVE_RESYNC - the sink "must display the
+*  incoming active frames from the Source device with no visible
+*  glitches and/or artifacts", even though timings may still be
+*  re-synchronizing; or
+*  (b) INACTIVE - the transition is fully complete.
+*/
ret = readx_poll_timeout(analogix_dp_get_psr_status, dp, psr_status,
psr_status >= 0 &&
((vsc->db[1] && psr_status == DP_PSR_SINK_ACTIVE_RFB) ||
-   (!vsc->db[1] && psr_status == DP_PSR_SINK_INACTIVE)), 1500,
-   DP_TIMEOUT_PSR_LOOP_MS * 1000);
+   (!vsc->db[1] && (psr_status == DP_PSR_SINK_ACTIVE_RESYNC ||
+psr_status == DP_PSR_SINK_INACTIVE))),
+   1500, DP_TIMEOUT_PSR_LOOP_MS * 1000);
if (ret) {
dev_warn(dp->dev, "Failed to apply PSR %d\n", ret);
return ret;
-- 
2.33.1.1089.g2158813163f-goog

Re: [PATCH] drm/bridge: analogix_dp: Make PSR-disable non-blocking

2021-10-29 Thread Brian Norris

On Wed, Oct 20, 2021 at 06:23:35PM -0700, Brian Norris wrote:
> On Wed, Oct 20, 2021 at 5:40 PM Sean Paul  wrote:
> > The actual latency gains from doing this synchronously are minimal since the
> > panel will display new content as soon as it can regardless of whether the
> > kernel is blocking. There is likely a perceptual difference, but that's only
> > because kernel is lying to userspace and skipping frames without consent.
> 
> Hmm, you might well be right about some of the first points (I'm still
> learning the DRM framework), but I'm a bit skeptical that the
> perceptual difference is "only" because we're cheating in some way.
> I'm not doing science here, and it's certainly not a blinded test, but
> I'm nearly certain this patch cuts out approx 50-80% of the cursor lag
> I see without this patch (relative to the current Chrome OS kernel). I
> don't see how cheating would produce a smoother cursor movement --
> we'd still be dropping frames, and the movement would appear jumpy
> somewhere along the way.

Aha, so I think I found {a,the} reason for some disagreement here:
looking at the eDP PSR spec, I see that while the current implementation
is looking for psr_state==DP_PSR_SINK_INACTIVE to signal PSR-exit
completion, the spec shows an intermediate state
(DP_PSR_SINK_ACTIVE_RESYNC == 4), where among other things, "the Sink
device must display the incoming active frames from the Source device
with no visible glitches and/or artifacts."

And it happens that we move to DP_PSR_SINK_ACTIVE_RESYNC somewhat
quickly (on the order of 20-40ms), while the move to
DP_PSR_SINK_INACTIVE is a good chunk longer (approx 60ms more). So
pre-commit-6c836d965bad might have been cheating a little (we'd claim
we're "done" about 20-40ms too early), but post-commit-6c836d965bad
we're waiting about 60ms too long.

I'll send v2 to make this block for DP_PSR_SINK_ACTIVE_RESYNC ||
DP_PSR_SINK_INACTIVE, which gets much or all of the same latency win,
and I'll try to document the reasons, etc., better.

I'll probably also include a patch to drop the 'blocking' parameter,
since it's unused, and gives the wrong idea about this state machine.

> In any case, I'm absolutely certain that mainline Linux performs much
> much worse with PSR than the current CrOS kernel, but there are some
> other potential reasons for that, such as the lack of an
> input-notifier [1].
...
> [1] This got locked up in "controversy":
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/20180405095000.9756-25-enric.balle...@collabora.com/

While I'm here: I also played with this a bit, and I still haven't
gotten all the details right, but I don't believe this alone will get
the latency wins we'd like. We still need something like the above.

Brian

Re: [PATCH] drm/bridge: analogix_dp: Make PSR-disable non-blocking

2021-10-20 Thread Brian Norris

(Dropping Andrzej, because that address keeps bouncing. Does
MAINTAINERS and/or .mailmap need updating?)

Apologies for the double reply here, but I forgot to mention one last
thing for now:

On Wed, Oct 20, 2021 at 5:40 PM Sean Paul  wrote:
>
> On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote:
> > Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
> > "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
> > accidentally (?) set blocking=true.
>
> IIRC this wasn't accidental.

One other tip that made me think it was accidental was that today, the
|blocking| argument to analogix_dp_send_psr_spd() is always true. If
non-blocking support was intentionally dropped, it seemed like you
should have dropped the non-blocking code too. But that's a weak proof
of your intentions :)

Brian

Re: [PATCH] drm/bridge: analogix_dp: Make PSR-disable non-blocking

2021-10-20 Thread Brian Norris

On Wed, Oct 20, 2021 at 5:40 PM Sean Paul  wrote:
> On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote:
> > Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
> > "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
> > accidentally (?) set blocking=true.
>
> IIRC this wasn't accidental.
>
> The reason it became synchronous was:
>  - To avoid racing a subsequent PSR entry (if exit takes a long time)

How did this work pre-commit-6c836d965bad then? I don't see any
provision for avoiding subsequent PSR entry. Or I guess that was
implicitly covered by PSR_FLUSH_TIMEOUT_MS, which means we allowed at
least 100ms between exit/entry each time, which was good enough? And
in the "new" implementation, that turned into a running average that
gets measured on each commit? So we're no longer guaranteed 100ms, and
it's even worse if we cheat the timing measurement?

I'm still not sure that "race" is truly a problem without consulting
some kind of hardware documentation though. It wouldn't surprise me if
these things are cancelable.

>  - To avoid racing disable/modeset
>  - We're not displaying new content while exiting PSR anyways, so there is
>minimal utility in allowing frames to be submitted
>  - We're lying to userspace telling them frames are on the screen when we're
>just dropping them on the floor
>
> The actual latency gains from doing this synchronously are minimal since the
> panel will display new content as soon as it can regardless of whether the
> kernel is blocking. There is likely a perceptual difference, but that's only
> because kernel is lying to userspace and skipping frames without consent.

Hmm, you might well be right about some of the first points (I'm still
learning the DRM framework), but I'm a bit skeptical that the
perceptual difference is "only" because we're cheating in some way.
I'm not doing science here, and it's certainly not a blinded test, but
I'm nearly certain this patch cuts out approx 50-80% of the cursor lag
I see without this patch (relative to the current Chrome OS kernel). I
don't see how cheating would produce a smoother cursor movement --
we'd still be dropping frames, and the movement would appear jumpy
somewhere along the way.

In any case, I'm absolutely certain that mainline Linux performs much
much worse with PSR than the current CrOS kernel, but there are some
other potential reasons for that, such as the lack of an
input-notifier [1].

> Going back to the first line, it's entirely possible my memory is failing
> and this was accidental!

Well either way, thanks for the notes. I'll see if I can get anywhere
on proving/disproving that they are relevant, or if they can be worked
around some other way; or perhaps I can regain the lost performance
some other way. It'll be a few days before I get around to that.

Brian

[1] This got locked up in "controversy":
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20180405095000.9756-25-enric.balle...@collabora.com/

[PATCH] drm/bridge: analogix_dp: Make PSR-disable non-blocking

2021-10-20 Thread Brian Norris

Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
"PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
accidentally (?) set blocking=true.

This can cause upwards of 60-100ms of unneeded latency when exiting
self-refresh, which can cause very noticeable lag when, say, moving a
cursor.

Presumbaly it's OK to let the display finish exiting refresh in parallel
with clocking out the next video frames, so we shouldn't hold up the
atomic_enable() step. This also brings behavior in line with the
downstream ("mainline-derived") variant of the driver currently deployed
to Chrome OS Rockchip systems.

Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin).

Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: 
Cc: Zain Wang 
Cc: Tomasz Figa 
Cc: Heiko Stuebner 
Cc: Sean Paul 
Signed-off-by: Brian Norris 
---
CC list is partially constructed from the commit message of the Fixed
commit

 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..fbe6eb9df310 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1055,7 +1055,7 @@ static int analogix_dp_disable_psr(struct 
analogix_dp_device *dp)
psr_vsc.db[0] = 0;
psr_vsc.db[1] = 0;
 
-   return analogix_dp_send_psr_spd(dp, &psr_vsc, true);
+   return analogix_dp_send_psr_spd(dp, &psr_vsc, false);
 }
 
 /*
-- 
2.33.0.1079.g6e70778dc9-goog

[PATCH v2] MAINTAINERS: Fixup drm-misc website link

2021-10-20 Thread Brian Norris

https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-misc.html
gives HTTP 404, and
https://01.org/linuxgraphics/gfx-docs/maintainer-tools/ redirects to
freedesktop.org now.

Let's save people the pain of figuring that out.

Signed-off-by: Brian Norris 
Reviewed-by: Sean Paul 
---

Changes in v2:
 - Correct the patch description text

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 100d7f93a15b..811d8d3e35fb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6158,7 +6158,7 @@ M:Maarten Lankhorst 

 M: Maxime Ripard 
 M: Thomas Zimmermann 
 S: Maintained
-W: https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-misc.html
+W: https://drm.pages.freedesktop.org/maintainer-tools/drm-misc.html
 T: git git://anongit.freedesktop.org/drm/drm-misc
 F: Documentation/gpu/
 F: drivers/gpu/drm/*
-- 
2.33.0.1079.g6e70778dc9-goog

Re: [PATCH] MAINTAINERS: Fixup drm-misc website link

2021-10-20 Thread Brian Norris

On Tue, Oct 19, 2021 at 5:59 PM Brian Norris  wrote:
>
> https://drm.pages.freedesktop.org/maintainer-tools/drm-misc.html gives
> HTTP 404, and https://drm.pages.freedesktop.org/maintainer-tools/
> redirects to freedesktop.org now.

With a dash of irony, I actually listed the wrong URLs in the
description. (I used the properly-redirected ones, and claimed that
they were the broken ones.)

I'll send a v2, so a maintainer doesn't have to fix that up for me.

Brian

[PATCH] MAINTAINERS: Fixup drm-misc website link

2021-10-19 Thread Brian Norris

https://drm.pages.freedesktop.org/maintainer-tools/drm-misc.html gives
HTTP 404, and https://drm.pages.freedesktop.org/maintainer-tools/
redirects to freedesktop.org now.

Let's save people the pain of figuring that out.

Signed-off-by: Brian Norris 
---

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 100d7f93a15b..811d8d3e35fb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6158,7 +6158,7 @@ M:Maarten Lankhorst 

 M: Maxime Ripard 
 M: Thomas Zimmermann 
 S: Maintained
-W: https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-misc.html
+W: https://drm.pages.freedesktop.org/maintainer-tools/drm-misc.html
 T: git git://anongit.freedesktop.org/drm/drm-misc
 F: Documentation/gpu/
 F: drivers/gpu/drm/*
-- 
2.33.0.1079.g6e70778dc9-goog

[PATCH] drm/rockchip: vop: Add timeout for DSP hold

2021-10-08 Thread Brian Norris

If hardware is malfunctioning (e.g., misconfigured clocks?), we can get
stuck here forever, holding various DRM locks and eventually locking up
the entire system. It's better to complain loudly and move on, than to
lock up the system.

In local tests, this operation takes less than 20ms.

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index ba9e14da41b4..b28ea5d6a3e2 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -726,7 +726,9 @@ static void vop_crtc_atomic_disable(struct drm_crtc *crtc,
 
spin_unlock(&vop->reg_lock);
 
-   wait_for_completion(&vop->dsp_hold_completion);
+   if (!wait_for_completion_timeout(&vop->dsp_hold_completion,
+msecs_to_jiffies(200)))
+   WARN(1, "%s: timed out waiting for DSP hold", crtc->name);
 
vop_dsp_hold_valid_irq_disable(vop);
 
-- 
2.33.0.882.g93a45727a2-goog

[PATCH v2] drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX

2021-10-01 Thread Brian Norris

If the display is not enable()d, then we aren't holding a runtime PM
reference here. Thus, it's easy to accidentally cause a hang, if user
space is poking around at /dev/drm_dp_aux0 at the "wrong" time.

Let's get the panel and PM state right before trying to talk AUX.

Fixes: 0d97ad03f422 ("drm/bridge: analogix_dp: Remove duplicated code")
Cc: 
Cc: Tomeu Vizoso 
Signed-off-by: Brian Norris 
---

Changes in v2:
- Fix spelling in Subject
- DRM_DEV_ERROR() -> drm_err()
- Propagate errors from un-analogix_dp_prepare_panel()

 .../drm/bridge/analogix/analogix_dp_core.c| 21 ++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..6fc46ac93ef8 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1632,8 +1632,27 @@ static ssize_t analogix_dpaux_transfer(struct drm_dp_aux 
*aux,
   struct drm_dp_aux_msg *msg)
 {
struct analogix_dp_device *dp = to_dp(aux);
+   int ret, ret2;
 
-   return analogix_dp_transfer(dp, msg);
+   ret = analogix_dp_prepare_panel(dp, true, false);
+   if (ret) {
+   drm_err(dp->drm_dev, "Failed to prepare panel (%d)\n", ret);
+   return ret;
+   }
+
+   pm_runtime_get_sync(dp->dev);
+   ret = analogix_dp_transfer(dp, msg);
+   pm_runtime_put(dp->dev);
+
+   ret2 = analogix_dp_prepare_panel(dp, false, false);
+   if (ret2) {
+   drm_err(dp->drm_dev, "Failed to unprepare panel (%d)\n", ret2);
+   /* Prefer the analogix_dp_transfer() error, if it exists. */
+   if (!ret)
+   ret = ret2;
+   }
+
+   return ret;
 }
 
 struct analogix_dp_device *
-- 
2.33.0.800.g4c38ced690-goog

Re: [PATCH] drm/brdige: analogix_dp: Grab runtime PM reference for DP-AUX

2021-10-01 Thread Brian Norris

On Fri, Oct 1, 2021 at 1:37 PM Sean Paul  wrote:
> On Wed, Sep 29, 2021 at 02:41:03PM -0700, Brian Norris wrote:
> > --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> > +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> > @@ -1632,8 +1632,23 @@ static ssize_t analogix_dpaux_transfer(struct 
> > drm_dp_aux *aux,
> >  struct drm_dp_aux_msg *msg)
> >  {
> >   struct analogix_dp_device *dp = to_dp(aux);
> > + int ret, ret2;
> >
> > - return analogix_dp_transfer(dp, msg);
> > + ret = analogix_dp_prepare_panel(dp, true, false);
> > + if (ret) {
> > + DRM_DEV_ERROR(dp->dev, "Failed to prepare panel (%d)\n", ret);
>
> s/DRM_DEV_ERROR/drm_err/

Sure. Now that I'm looking a second time, I see the header recommends this.

> > + return ret;
> > + }
> > +
> > + pm_runtime_get_sync(dp->dev);
> > + ret = analogix_dp_transfer(dp, msg);
> > + pm_runtime_put(dp->dev);
> > +
> > + ret2 = analogix_dp_prepare_panel(dp, false, false);
> > + if (ret2)
> > + DRM_DEV_ERROR(dp->dev, "Failed to unprepare panel (%d)\n", 
> > ret2);
>
> What's the reasoning for not propagating unprepare failures? I feel like that
> should be fair game.

I suppose the underlying reason is laziness, sorry. But a related
reason is the we probably should prefer propagating the
analogix_dp_transfer() error, if it's non-zero, rather than the
unprepare error. That's not too hard to do though, even if it's
slightly more awkward.

> > +
> > + return ret;
> >  }
> >
> >  struct analogix_dp_device *

v2 coming.

Regards,
Brian

[PATCH] drm/brdige: analogix_dp: Grab runtime PM reference for DP-AUX

2021-09-29 Thread Brian Norris

If the display is not enable()d, then we aren't holding a runtime PM
reference here. Thus, it's easy to accidentally cause a hang, if user
space is poking around at /dev/drm_dp_aux0 at the "wrong" time.

Let's get the panel and PM state right before trying to talk AUX.

Fixes: 0d97ad03f422 ("drm/bridge: analogix_dp: Remove duplicated code")
Cc: 
Cc: Tomeu Vizoso 
Signed-off-by: Brian Norris 
---

 .../gpu/drm/bridge/analogix/analogix_dp_core.c  | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c 
b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..a1b553904b85 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1632,8 +1632,23 @@ static ssize_t analogix_dpaux_transfer(struct drm_dp_aux 
*aux,
   struct drm_dp_aux_msg *msg)
 {
struct analogix_dp_device *dp = to_dp(aux);
+   int ret, ret2;
 
-   return analogix_dp_transfer(dp, msg);
+   ret = analogix_dp_prepare_panel(dp, true, false);
+   if (ret) {
+   DRM_DEV_ERROR(dp->dev, "Failed to prepare panel (%d)\n", ret);
+   return ret;
+   }
+
+   pm_runtime_get_sync(dp->dev);
+   ret = analogix_dp_transfer(dp, msg);
+   pm_runtime_put(dp->dev);
+
+   ret2 = analogix_dp_prepare_panel(dp, false, false);
+   if (ret2)
+   DRM_DEV_ERROR(dp->dev, "Failed to unprepare panel (%d)\n", 
ret2);
+
+   return ret;
 }
 
 struct analogix_dp_device *
-- 
2.33.0.685.g46640cef36-goog

[PATCH v3 4/4] drm/rockchip: dsi: Disable PLL clock on bind error

2021-09-28 Thread Brian Norris

Fix some error handling here noticed in review of other changes.

Fixes: 2d4f7bdafd70 ("drm/rockchip: dsi: migrate to use dw-mipi-dsi bridge 
driver")
Signed-off-by: Brian Norris 
Reported-by: Chen-Yu Tsai 
Reviewed-by: Chen-Yu Tsai 
---

Changes in v3:
- Add Fixes, Reviewed-by

Changes in v2:
- New

 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index 8ea852880d1c..59c3d8ef6bf9 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -945,7 +945,7 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = clk_prepare_enable(dsi->grf_clk);
if (ret) {
DRM_DEV_ERROR(dsi->dev, "Failed to enable grf_clk: %d\n", ret);
-   goto out_pm_runtime;
+   goto out_pll_clk;
}
 
dw_mipi_dsi_rockchip_config(dsi);
@@ -957,19 +957,21 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = rockchip_dsi_drm_create_encoder(dsi, drm_dev);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to create drm encoder\n");
-   goto out_pm_runtime;
+   goto out_pll_clk;
}
 
ret = dw_mipi_dsi_bind(dsi->dmd, &dsi->encoder);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to bind: %d\n", ret);
-   goto out_pm_runtime;
+   goto out_pll_clk;
}
 
dsi->dsi_bound = true;
 
return 0;
 
+out_pll_clk:
+   clk_disable_unprepare(dsi->pllref_clk);
 out_pm_runtime:
pm_runtime_put(dsi->dev);
if (dsi->slave)
-- 
2.33.0.685.g46640cef36-goog

[PATCH v3 3/4] drm/rockchip: dsi: Fix unbalanced clock on probe error

2021-09-28 Thread Brian Norris

Our probe() function never enabled this clock, so we shouldn't disable
it if we fail to probe the bridge.

Noted by inspection.

Fixes: 2d4f7bdafd70 ("drm/rockchip: dsi: migrate to use dw-mipi-dsi bridge 
driver")
Signed-off-by: Brian Norris 
Reviewed-by: Chen-Yu Tsai 
---

(no changes since v1)

 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index 21c67343cc6c..8ea852880d1c 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -1434,14 +1434,10 @@ static int dw_mipi_dsi_rockchip_probe(struct 
platform_device *pdev)
if (ret != -EPROBE_DEFER)
DRM_DEV_ERROR(dev,
  "Failed to probe dw_mipi_dsi: %d\n", ret);
-   goto err_clkdisable;
+   return ret;
}
 
return 0;
-
-err_clkdisable:
-   clk_disable_unprepare(dsi->pllref_clk);
-   return ret;
 }
 
 static int dw_mipi_dsi_rockchip_remove(struct platform_device *pdev)
-- 
2.33.0.685.g46640cef36-goog

[PATCH v3 1/4] drm/rockchip: dsi: Hold pm-runtime across bind/unbind

2021-09-28 Thread Brian Norris

In commit 43c2de1002d2, we moved most HW configuration to bind(), but we
didn't move the runtime PM management. Therefore, depending on initial
boot state, runtime-PM workqueue delays, and other timing factors, we
may disable our power domain in between the hardware configuration
(bind()) and when we enable the display. This can cause us to lose
hardware state and fail to configure our display. For example:

  dw-mipi-dsi-rockchip ff968000.mipi: failed to write command FIFO
  panel-innolux-p079zca ff96.mipi.0: failed to write command 0

or:

  dw-mipi-dsi-rockchip ff968000.mipi: failed to write command FIFO
  panel-kingdisplay-kd097d04 ff96.mipi.0: failed write init cmds: -110

We should match the runtime PM to the lifetime of the bind()/unbind()
cycle.

Tested on Acer Chrometab 10 (RK3399 Gru-Scarlet), with panel drivers
built either as modules or built-in.

Side notes: it seems one is more likely to see this problem when the
panel driver is built into the kernel. I've also seen this problem
bisect down to commits that simply changed Kconfig dependencies, because
it changed the order in which driver init functions were compiled into
the kernel, and therefore the ordering and timing of built-in device
probe.

Fixes: 43c2de1002d2 ("drm/rockchip: dsi: move all lane config except LCDC mux 
to bind()")
Link: 
https://lore.kernel.org/linux-rockchip/9aedfb528600ecf871885f7293ca4207c84d16c1.ca...@gmail.com/
Reported-by: 
Cc: 
Signed-off-by: Brian Norris 
Tested-by: Nícolas F. R. A. Prado 
---

(no changes since v2)

Changes in v2:
- Clean up pm-runtime state in error cases.
- Correct git hash for Fixes.

 .../gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 37 ++-
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index a2262bee5aa4..45676b23c019 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -773,10 +773,6 @@ static void dw_mipi_dsi_encoder_enable(struct drm_encoder 
*encoder)
if (mux < 0)
return;
 
-   pm_runtime_get_sync(dsi->dev);
-   if (dsi->slave)
-   pm_runtime_get_sync(dsi->slave->dev);
-
/*
 * For the RK3399, the clk of grf must be enabled before writing grf
 * register. And for RK3288 or other soc, this grf_clk must be NULL,
@@ -795,20 +791,10 @@ static void dw_mipi_dsi_encoder_enable(struct drm_encoder 
*encoder)
clk_disable_unprepare(dsi->grf_clk);
 }
 
-static void dw_mipi_dsi_encoder_disable(struct drm_encoder *encoder)
-{
-   struct dw_mipi_dsi_rockchip *dsi = to_dsi(encoder);
-
-   if (dsi->slave)
-   pm_runtime_put(dsi->slave->dev);
-   pm_runtime_put(dsi->dev);
-}
-
 static const struct drm_encoder_helper_funcs
 dw_mipi_dsi_encoder_helper_funcs = {
.atomic_check = dw_mipi_dsi_encoder_atomic_check,
.enable = dw_mipi_dsi_encoder_enable,
-   .disable = dw_mipi_dsi_encoder_disable,
 };
 
 static int rockchip_dsi_drm_create_encoder(struct dw_mipi_dsi_rockchip *dsi,
@@ -938,10 +924,14 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
put_device(second);
}
 
+   pm_runtime_get_sync(dsi->dev);
+   if (dsi->slave)
+   pm_runtime_get_sync(dsi->slave->dev);
+
ret = clk_prepare_enable(dsi->pllref_clk);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to enable pllref_clk: %d\n", ret);
-   return ret;
+   goto out_pm_runtime;
}
 
/*
@@ -953,7 +943,7 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = clk_prepare_enable(dsi->grf_clk);
if (ret) {
DRM_DEV_ERROR(dsi->dev, "Failed to enable grf_clk: %d\n", ret);
-   return ret;
+   goto out_pm_runtime;
}
 
dw_mipi_dsi_rockchip_config(dsi);
@@ -965,16 +955,23 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = rockchip_dsi_drm_create_encoder(dsi, drm_dev);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to create drm encoder\n");
-   return ret;
+   goto out_pm_runtime;
}
 
ret = dw_mipi_dsi_bind(dsi->dmd, &dsi->encoder);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to bind: %d\n", ret);
-   return ret;
+   goto out_pm_runtime;
}
 
return 0;
+
+out_pm_runtime:
+   pm_runtime_put(dsi->dev);
+   if (dsi->slave)
+   pm_runtime_put(dsi->slave->dev);
+
+   return ret;
 }
 
 static void dw_mipi_dsi_rockchip_unbind(struct device *dev,
@@ -989,6 +986,10 @@ static void dw_mipi_dsi_rockchip_unbind(struct device *dev,
dw_mipi_dsi_unbind(dsi->dmd);
 
clk_d

[PATCH v3 2/4] drm/rockchip: dsi: Reconfigure hardware on resume()

2021-09-28 Thread Brian Norris

Since commit 43c2de1002d2, we perform most HW configuration in the
bind() function. This configuration may be lost on suspend/resume, so we
need to call it again. That may lead to errors like this after system
suspend/resume:

  dw-mipi-dsi-rockchip ff968000.mipi: failed to write command FIFO
  panel-kingdisplay-kd097d04 ff96.mipi.0: failed write init cmds: -110

Tested on Acer Chromebook Tab 10 (RK3399 Gru-Scarlet).

Note that early mailing list versions of this driver borrowed Rockchip's
downstream/BSP solution, to do HW configuration in mode_set() (which
*is* called at the appropriate pre-enable() times), but that was
discarded along the way. I've avoided that still, because mode_set()
documentation doesn't suggest this kind of purpose as far as I can tell.

Fixes: 43c2de1002d2 ("drm/rockchip: dsi: move all lane config except LCDC mux 
to bind()")
Cc: 
Signed-off-by: Brian Norris 
---

Changes in v3:
 - New patch, to fix related PM issue discovered after patch 1

 .../gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 37 +++
 1 file changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index 45676b23c019..21c67343cc6c 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -268,6 +268,8 @@ struct dw_mipi_dsi_rockchip {
struct dw_mipi_dsi *dmd;
const struct rockchip_dw_dsi_chip_data *cdata;
struct dw_mipi_dsi_plat_data pdata;
+
+   bool dsi_bound;
 };
 
 struct dphy_pll_parameter_map {
@@ -964,6 +966,8 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
goto out_pm_runtime;
}
 
+   dsi->dsi_bound = true;
+
return 0;
 
 out_pm_runtime:
@@ -983,6 +987,8 @@ static void dw_mipi_dsi_rockchip_unbind(struct device *dev,
if (dsi->is_slave)
return;
 
+   dsi->dsi_bound = false;
+
dw_mipi_dsi_unbind(dsi->dmd);
 
clk_disable_unprepare(dsi->pllref_clk);
@@ -1277,6 +1283,36 @@ static const struct phy_ops dw_mipi_dsi_dphy_ops = {
.exit   = dw_mipi_dsi_dphy_exit,
 };
 
+static int __maybe_unused dw_mipi_dsi_rockchip_resume(struct device *dev)
+{
+   struct dw_mipi_dsi_rockchip *dsi = dev_get_drvdata(dev);
+   int ret;
+
+   /*
+* Re-configure DSI state, if we were previously initialized. We need
+* to do this before rockchip_drm_drv tries to re-enable() any panels.
+*/
+   if (dsi->dsi_bound) {
+   ret = clk_prepare_enable(dsi->grf_clk);
+   if (ret) {
+   DRM_DEV_ERROR(dsi->dev, "Failed to enable grf_clk: 
%d\n", ret);
+   return ret;
+   }
+
+   dw_mipi_dsi_rockchip_config(dsi);
+   if (dsi->slave)
+   dw_mipi_dsi_rockchip_config(dsi->slave);
+
+   clk_disable_unprepare(dsi->grf_clk);
+   }
+
+   return 0;
+}
+
+static const struct dev_pm_ops dw_mipi_dsi_rockchip_pm_ops = {
+   SET_LATE_SYSTEM_SLEEP_PM_OPS(NULL, dw_mipi_dsi_rockchip_resume)
+};
+
 static int dw_mipi_dsi_rockchip_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
@@ -1594,6 +1630,7 @@ struct platform_driver dw_mipi_dsi_rockchip_driver = {
.remove = dw_mipi_dsi_rockchip_remove,
.driver = {
.of_match_table = dw_mipi_dsi_rockchip_dt_ids,
+   .pm = &dw_mipi_dsi_rockchip_pm_ops,
.name   = "dw-mipi-dsi-rockchip",
},
 };
-- 
2.33.0.685.g46640cef36-goog

[PATCH v3 0/4] Fix Rockchip MIPI DSI display init timeouts

2021-09-28 Thread Brian Norris

The Rockchip DSI driver has had a number of bugs over time and has
usually only worked by chance. A number of users have noticed that
things regressed with commit 43c2de1002d2 ("drm/rockchip: dsi: move all
lane config except LCDC mux to bind()"), although it was fixing several
real issues of its own that had been present forever in the upstream
driver (but notably, not in the downstream/BSP driver).

Patch 1 and 2 are the most important fixes here. See their description
for more info.

While I'm at it, fix a few error handling and cleanup issues too.

Changes in v3:
 - New patch, to fix related PM issue discovered after patch 1

Changes in v2:
- Clean up pm-runtime state in error cases.
- Correct git hash for Fixes.

Brian Norris (4):
  drm/rockchip: dsi: Hold pm-runtime across bind/unbind
  drm/rockchip: dsi: Reconfigure hardware on resume()
  drm/rockchip: dsi: Fix unbalanced clock on probe error
  drm/rockchip: dsi: Disable PLL clock on bind error

 .../gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 82 +--
 1 file changed, 59 insertions(+), 23 deletions(-)

-- 
2.33.0.685.g46640cef36-goog

Re: [PATCH v2 3/3] drm/rockchip: dsi: Disable PLL clock on bind error

2021-09-28 Thread Brian Norris

On Mon, Sep 27, 2021 at 9:16 PM Chen-Yu Tsai  wrote:
>
> On Tue, Sep 28, 2021 at 2:00 AM Brian Norris  wrote:
> >
> > Fix some error handling here noticed in review of other changes.
> >
> > Reported-by: Chen-Yu Tsai 
> > Signed-off-by: Brian Norris 
>
> Fixes: 2d4f7bdafd70 ("drm/rockchip: dsi: migrate to use dw-mipi-dsi
> bridge driver")
>
> Otherwise,
>
> Reviewed-by: Chen-Yu Tsai 

I'll add these tags, thanks.

> Additionally, I would move patch 2 and this patch before the first patch,
> as these two fix a commit introduced in v5.0, while the first patch fixes
> something introduced in v5.14. This would make automatic backporting cleaner.

Personally, I prioritize putting user-visible fixes first, and more
cosmetic things (like error handling that we ~never hit) later. Also,
the buggy commit was already backported to all still-supported stable
releases where it was relevant (i.e., 5.4+), so if these get
backported, they all should.

Regards,
Brian

[1] 43c2de1002d2 drm/rockchip: dsi: move all lane config except LCDC
mux to bind()

Re: [PATCH v2 1/3] drm/rockchip: dsi: Hold pm-runtime across bind/unbind

2021-09-27 Thread Brian Norris

On Mon, Sep 27, 2021 at 10:59:42AM -0700, Brian Norris wrote:
> In commit 43c2de1002d2, we moved most HW configuration to bind(), but we
> didn't move the runtime PM management. Therefore, depending on initial
> boot state, runtime-PM workqueue delays, and other timing factors, we
> may disable our power domain in between the hardware configuration
> (bind()) and when we enable the display. This can cause us to lose
> hardware state and fail to configure our display. For example:
> 
>   dw-mipi-dsi-rockchip ff968000.mipi: failed to write command FIFO
>   panel-innolux-p079zca ff96.mipi.0: failed to write command 0
> 
> or:
> 
>   dw-mipi-dsi-rockchip ff968000.mipi: failed to write command FIFO
>   panel-kingdisplay-kd097d04 ff96.mipi.0: failed write init cmds: -110
> 
> We should match the runtime PM to the lifetime of the bind()/unbind()
> cycle.

Hmm, sorry to reply to my own patch so quickly, but after a bit more
testing I'm finding we still have yet another problem here -- that
suspend/resume does not work. For suspend/resume,
drm_mode_config_helper_{suspend,resume}() are expecting to only do
teardown/setup via disable()/enable() -- there is no re-bind() (which
makes sense). But the DSI hardware state may be lost, so the resume-time
enable() may find the panel initialization timing out yet again.

Possible solutions:

(1) I can add PM suspend()/resume() operations just to call
dw_mipi_dsi_rockchip_config().

(2) Switch back to using mode_set() for HW configuration, like the
downstream/BSP driver does (and the initial versions Rockchip and
later Heiko were working on did the same), since that's always
called at the right time before both panel and encoder enable().
That also happens to be where some other DSI drivers [1] do similar
init.

Have we been avoiding (2) just because that doesn't really match the
intended purpose of the callback? I can't find any cleaner callback for
this at the moment, and I'd rather not try to introduce entirely new drm
helper callbacks just for this particularly-unfriendly sequence.

I have a patch written for option (1), and may send a v3 soon to include
that as well (because that's also a regression from the same commit).

Brian

[1] e.g., drivers/gpu/drm/bridge/nwl-dsi.c

Re: [PATCH v2 1/3] drm/rockchip: dsi: Hold pm-runtime across bind/unbind

2021-09-27 Thread Brian Norris

On Mon, Sep 27, 2021 at 12:18 PM Tom Hebb  wrote:
> Reviewed-by: Thomas Hebb 

Thanks!

> Thank you for catching this, and sorry that my original fix broke things.
> There had actually been a report of this breakage from my patch, but I
> missed that email until it had already been merged and then didn't have
> time to follow up on it. Totally my bad.

No worries. It was a 1 step forward, 1 step backward kind of a thing
anyway -- things were broken in many cases before your patch too (with
very similar-looking symptoms) -- so the net result is still good,
having both issues fixed.

I'm not sure how that ideally should have been handled [1], but it's
totally fair to not have time to follow up on everything. At the
worst, we could have reverted things; but again, I'm pretty sure
things were broken just as well without your fix, just with a
different root cause.

Regards,
Brian

[1] Don't accept (or, revert?) your bugfix until what may or may not
have been a regression is fixed? I'm not sure.

Re: [PATCH 1/2] drm/rockchip: dsi: hold pm-runtime across bind/unbind

2021-09-27 Thread Brian Norris

On Mon, Sep 27, 2021 at 12:10 AM Chen-Yu Tsai  wrote:
> On Sat, Sep 25, 2021 at 7:24 AM Brian Norris  wrote:
> > We should match the runtime PM to the lifetime of the bind()/unbind()
> > cycle.
>
> I'm not too familiar with MIPI DSI, but it seems that the subsystem expects
> the DSI link to be always available, and in LPM if power saving is required?
> If so then this change matches that expectation, though we might lose some
> power savings compared to the previous non-conforming behavior.

Yeah, I was a little torn on whether we should care about any possible
lost power savings here, because now we stay runtime-enabled even if
the display is not enable()d. But I'm not aware of a good hook for
handling this kind of a sequence, and I'm not convinced there is much
savings by disabling the power domain in that case.

> > Fixes: 59eb7193bef2 ("drm/rockchip: dsi: move all lane config except LCDC 
> > mux to bind()")
>
> This hash is from some stable branch. The mainline one is:
>
> 43c2de1002d2 drm/rockchip: dsi: move all lane config except LCDC mux to bind()

Oops, good catch. I've been doing too much debugging/development on
5.10.y stable. Fixed in v2.

> The bind function is missing an error cleanup path. We might end up with
> unbalanced runtime PM references. (And also possibly an enabled pllref clk.)
> This is a pre-existing issue though. The code changes here look correct.

In v2, I've performed cleanup for the runtime PM state in this patch,
and added an additional patch to fix the other existing issues you
noted. Thanks.

Brian

[PATCH v2 3/3] drm/rockchip: dsi: Disable PLL clock on bind error

2021-09-27 Thread Brian Norris

Fix some error handling here noticed in review of other changes.

Reported-by: Chen-Yu Tsai 
Signed-off-by: Brian Norris 
---

Changes in v2:
- New

 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index fa4080176719..0ed13d81fe60 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -943,7 +943,7 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = clk_prepare_enable(dsi->grf_clk);
if (ret) {
DRM_DEV_ERROR(dsi->dev, "Failed to enable grf_clk: %d\n", ret);
-   goto out_pm_runtime;
+   goto out_pll_clk;
}
 
dw_mipi_dsi_rockchip_config(dsi);
@@ -955,17 +955,19 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = rockchip_dsi_drm_create_encoder(dsi, drm_dev);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to create drm encoder\n");
-   goto out_pm_runtime;
+   goto out_pll_clk;
}
 
ret = dw_mipi_dsi_bind(dsi->dmd, &dsi->encoder);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to bind: %d\n", ret);
-   goto out_pm_runtime;
+   goto out_pll_clk;
}
 
return 0;
 
+out_pll_clk:
+   clk_disable_unprepare(dsi->pllref_clk);
 out_pm_runtime:
pm_runtime_put(dsi->dev);
if (dsi->slave)
-- 
2.33.0.685.g46640cef36-goog

[PATCH v2 2/3] drm/rockchip: dsi: Fix unbalanced clock on probe error

2021-09-27 Thread Brian Norris

Our probe() function never enabled this clock, so we shouldn't disable
it if we fail to probe the bridge.

Noted by inspection.

Fixes: 2d4f7bdafd70 ("drm/rockchip: dsi: migrate to use dw-mipi-dsi bridge 
driver")
Signed-off-by: Brian Norris 
Reviewed-by: Chen-Yu Tsai 
---

(no changes since v1)

 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index 45676b23c019..fa4080176719 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -1398,14 +1398,10 @@ static int dw_mipi_dsi_rockchip_probe(struct 
platform_device *pdev)
if (ret != -EPROBE_DEFER)
DRM_DEV_ERROR(dev,
  "Failed to probe dw_mipi_dsi: %d\n", ret);
-   goto err_clkdisable;
+   return ret;
}
 
return 0;
-
-err_clkdisable:
-   clk_disable_unprepare(dsi->pllref_clk);
-   return ret;
 }
 
 static int dw_mipi_dsi_rockchip_remove(struct platform_device *pdev)
-- 
2.33.0.685.g46640cef36-goog

[PATCH v2 1/3] drm/rockchip: dsi: Hold pm-runtime across bind/unbind

2021-09-27 Thread Brian Norris

In commit 43c2de1002d2, we moved most HW configuration to bind(), but we
didn't move the runtime PM management. Therefore, depending on initial
boot state, runtime-PM workqueue delays, and other timing factors, we
may disable our power domain in between the hardware configuration
(bind()) and when we enable the display. This can cause us to lose
hardware state and fail to configure our display. For example:

  dw-mipi-dsi-rockchip ff968000.mipi: failed to write command FIFO
  panel-innolux-p079zca ff96.mipi.0: failed to write command 0

or:

  dw-mipi-dsi-rockchip ff968000.mipi: failed to write command FIFO
  panel-kingdisplay-kd097d04 ff96.mipi.0: failed write init cmds: -110

We should match the runtime PM to the lifetime of the bind()/unbind()
cycle.

Tested on Acer Chrometab 10 (RK3399 Gru-Scarlet), with panel drivers
built either as modules or built-in.

Side notes: it seems one is more likely to see this problem when the
panel driver is built into the kernel. I've also seen this problem
bisect down to commits that simply changed Kconfig dependencies, because
it changed the order in which driver init functions were compiled into
the kernel, and therefore the ordering and timing of built-in device
probe.

Fixes: 43c2de1002d2 ("drm/rockchip: dsi: move all lane config except LCDC mux 
to bind()")
Link: 
https://lore.kernel.org/linux-rockchip/9aedfb528600ecf871885f7293ca4207c84d16c1.ca...@gmail.com/
Reported-by: 
Cc: 
Signed-off-by: Brian Norris 
Tested-by: Nícolas F. R. A. Prado 
---

Changes in v2:
- Clean up pm-runtime state in error cases.
- Correct git hash for Fixes.

 .../gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 37 ++-
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index a2262bee5aa4..45676b23c019 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -773,10 +773,6 @@ static void dw_mipi_dsi_encoder_enable(struct drm_encoder 
*encoder)
if (mux < 0)
return;
 
-   pm_runtime_get_sync(dsi->dev);
-   if (dsi->slave)
-   pm_runtime_get_sync(dsi->slave->dev);
-
/*
 * For the RK3399, the clk of grf must be enabled before writing grf
 * register. And for RK3288 or other soc, this grf_clk must be NULL,
@@ -795,20 +791,10 @@ static void dw_mipi_dsi_encoder_enable(struct drm_encoder 
*encoder)
clk_disable_unprepare(dsi->grf_clk);
 }
 
-static void dw_mipi_dsi_encoder_disable(struct drm_encoder *encoder)
-{
-   struct dw_mipi_dsi_rockchip *dsi = to_dsi(encoder);
-
-   if (dsi->slave)
-   pm_runtime_put(dsi->slave->dev);
-   pm_runtime_put(dsi->dev);
-}
-
 static const struct drm_encoder_helper_funcs
 dw_mipi_dsi_encoder_helper_funcs = {
.atomic_check = dw_mipi_dsi_encoder_atomic_check,
.enable = dw_mipi_dsi_encoder_enable,
-   .disable = dw_mipi_dsi_encoder_disable,
 };
 
 static int rockchip_dsi_drm_create_encoder(struct dw_mipi_dsi_rockchip *dsi,
@@ -938,10 +924,14 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
put_device(second);
}
 
+   pm_runtime_get_sync(dsi->dev);
+   if (dsi->slave)
+   pm_runtime_get_sync(dsi->slave->dev);
+
ret = clk_prepare_enable(dsi->pllref_clk);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to enable pllref_clk: %d\n", ret);
-   return ret;
+   goto out_pm_runtime;
}
 
/*
@@ -953,7 +943,7 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = clk_prepare_enable(dsi->grf_clk);
if (ret) {
DRM_DEV_ERROR(dsi->dev, "Failed to enable grf_clk: %d\n", ret);
-   return ret;
+   goto out_pm_runtime;
}
 
dw_mipi_dsi_rockchip_config(dsi);
@@ -965,16 +955,23 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
ret = rockchip_dsi_drm_create_encoder(dsi, drm_dev);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to create drm encoder\n");
-   return ret;
+   goto out_pm_runtime;
}
 
ret = dw_mipi_dsi_bind(dsi->dmd, &dsi->encoder);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to bind: %d\n", ret);
-   return ret;
+   goto out_pm_runtime;
}
 
return 0;
+
+out_pm_runtime:
+   pm_runtime_put(dsi->dev);
+   if (dsi->slave)
+   pm_runtime_put(dsi->slave->dev);
+
+   return ret;
 }
 
 static void dw_mipi_dsi_rockchip_unbind(struct device *dev,
@@ -989,6 +986,10 @@ static void dw_mipi_dsi_rockchip_unbind(struct device *dev,
dw_mipi_dsi_unbind(dsi->dmd);
 
clk_disable_unprepare(dsi

[PATCH v2 0/3] Fix Rockchip MIPI DSI display init timeouts

2021-09-27 Thread Brian Norris

The Rockchip DSI driver has had a number of bugs over time and has
usually only worked by chance. A number of users have noticed that
things regressed with commit 43c2de1002d2 ("drm/rockchip: dsi: move all
lane config except LCDC mux to bind()"), although it was fixing several
real issues of its own that had been present forever in the upstream
driver (but notably, not in the downstream/BSP driver).

Patch 1 is the most important fix here. See its description for more
info.

While I'm at it, fix a few error handling and cleanup issues too.

Changes in v2:
- Clean up pm-runtime state in error cases.
- Correct git hash for Fixes.

Brian Norris (3):
  drm/rockchip: dsi: Hold pm-runtime across bind/unbind
  drm/rockchip: dsi: Fix unbalanced clock on probe error
  drm/rockchip: dsi: Disable PLL clock on bind error

 .../gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 45 +--
 1 file changed, 22 insertions(+), 23 deletions(-)

-- 
2.33.0.685.g46640cef36-goog

1 2 >

1 - 100 of 197 matches

Mail list logo