On Thu, 14 Aug 2025 21:50:06 -0600
Alex Hung <[email protected]> wrote:

> From: Harry Wentland <[email protected]>
> 
> Certain operations require us to preserve values below 0.0 and
> above 1.0 (0x0 and 0xffff respectively in 16 bpc unorm). One
> such operation is a BT709 encoding operation followed by its
> decoding operation, or the reverse.
> 
> We'll use s32 values as intermediate in and outputs of our
> color operations, for the operations where it matters.
> 
> For now this won't apply to LUT operations. We might want to
> update those to work on s32 as well, but it's unclear how
> that should work for unorm LUT definitions. We'll revisit
> that once we add LUT + CTM tests.
> 
> In order to allow for this we'll also invert the nesting of our
> colorop processing loops. We now use the pixel iteration loop
> on the outside and the colorop iteration on the inside.

Hi Alex,

is this an out-dated paragraph in the commit message?

I don't see the patch inverting the nesting of loops.

That statement worried me, because changing the loop structures has
tanked the performance before.


Thanks,
pq

> 
> Reviewed-by: Louis Chauvet <[email protected]>
> Signed-off-by: Alex Hung <[email protected]>
> Signed-off-by: Harry Wentland <[email protected]>
> Reviewed-by: Daniel Stone <[email protected]>
> ---
> v7:
>  - Fix checkpatch warnings
>   - Add a commit messages
>   - Fix code styles by adding and removing spaces (new lines, tabs and so on)
> 
> v6:
>  - use clamp_val instead of manual clamping (Louis Chauvet)
> 
> v4:
>  - Clarify that we're packing 16-bit UNORM into s32, not
>    converting values to a different representation (Pekka)
> 
> v3:
>  - Use new colorop->next pointer
> 
>  drivers/gpu/drm/vkms/vkms_composer.c | 27 +++++++++++++++++++++++++--
>  drivers/gpu/drm/vkms/vkms_drv.h      |  4 ++++
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> b/drivers/gpu/drm/vkms/vkms_composer.c
> index 0f3fcd6a5925..6630dccd68a4 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -128,7 +128,7 @@ static void apply_lut(const struct vkms_crtc_state 
> *crtc_state, struct line_buff
>       }
>  }
>  
> -static void apply_colorop(struct pixel_argb_u16 *pixel, struct drm_colorop 
> *colorop)
> +static void apply_colorop(struct pixel_argb_s32 *pixel, struct drm_colorop 
> *colorop)
>  {
>       struct drm_colorop_state *colorop_state = colorop->state;
>       struct drm_device *dev = colorop->dev;
> @@ -157,9 +157,26 @@ static void apply_colorop(struct pixel_argb_u16 *pixel, 
> struct drm_colorop *colo
>  static void pre_blend_color_transform(const struct vkms_plane_state 
> *plane_state,
>                                     struct line_buffer *output_buffer)
>  {
> +     struct pixel_argb_s32 pixel;
> +
>       for (size_t x = 0; x < output_buffer->n_pixels; x++) {
>               struct drm_colorop *colorop = 
> plane_state->base.base.color_pipeline;
>  
> +             /*
> +              * Some operations, such as applying a BT709 encoding matrix,
> +              * followed by a decoding matrix, require that we preserve
> +              * values above 1.0 and below 0.0 until the end of the pipeline.
> +              *
> +              * Pack the 16-bit UNORM values into s32 to give us head-room to
> +              * avoid clipping until we're at the end of the pipeline. Clip
> +              * intentionally at the end of the pipeline before packing
> +              * UNORM values back into u16.
> +              */
> +             pixel.a = output_buffer->pixels[x].a;
> +             pixel.r = output_buffer->pixels[x].r;
> +             pixel.g = output_buffer->pixels[x].g;
> +             pixel.b = output_buffer->pixels[x].b;
> +
>               while (colorop) {
>                       struct drm_colorop_state *colorop_state;
>  
> @@ -169,10 +186,16 @@ static void pre_blend_color_transform(const struct 
> vkms_plane_state *plane_state
>                               return;
>  
>                       if (!colorop_state->bypass)
> -                             apply_colorop(&output_buffer->pixels[x], 
> colorop);
> +                             apply_colorop(&pixel, colorop);
>  
>                       colorop = colorop->next;
>               }
> +
> +             /* clamp values */
> +             output_buffer->pixels[x].a = clamp_val(pixel.a, 0, 0xffff);
> +             output_buffer->pixels[x].r = clamp_val(pixel.r, 0, 0xffff);
> +             output_buffer->pixels[x].g = clamp_val(pixel.g, 0, 0xffff);
> +             output_buffer->pixels[x].b = clamp_val(pixel.b, 0, 0xffff);
>       }
>  }
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 30941714cd0f..55440ec6db52 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -45,6 +45,10 @@ struct vkms_frame_info {
>       unsigned int rotation;
>  };
>  
> +struct pixel_argb_s32 {
> +     s32 a, r, g, b;
> +};
> +
>  /**
>   * struct pixel_argb_u16 - Internal representation of a pixel color.
>   * @a: Alpha component value, stored in 16 bits, without padding, using

Attachment: pgpKKT8_C8TpK.pgp
Description: OpenPGP digital signature

Reply via email to