Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-10 Thread Pekka Paalanen
On Fri, 10 Nov 2023 11:27:14 +0000
"Shankar, Uma"  wrote:

> > -----Original Message-----
> > From: Pekka Paalanen 
> > Sent: Thursday, November 9, 2023 5:26 PM
> > To: Shankar, Uma 
> > Cc: Joshua Ashton ; Harry Wentland
> > ; dri-devel@lists.freedesktop.org; Sebastian Wick
> > ; Sasha McIntosh ;
> > Abhinav Kumar ; Shashank Sharma
> > ; Xaver Hugl ; Hector
> > Martin ; Liviu Dudau ; Alexander
> > Goins ; Michel Dänzer ; wayland-
> > de...@lists.freedesktop.org; Melissa Wen ; Jonas Ådahl
> > ; Arthur Grillo ; Victoria
> > Brekenfeld ; Sima ; Aleix Pol
> > ; Naseer Ahmed ; Christopher
> > Braga ; Ville Syrjala 
> > 
> > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> > color pipeline is needed
> > 
> > On Thu, 9 Nov 2023 10:17:11 +0000
> > "Shankar, Uma"  wrote:
> >   
> > > > -----Original Message-----
> > > > From: Joshua Ashton 
> > > > Sent: Wednesday, November 8, 2023 7:13 PM
> > > > To: Shankar, Uma ; Harry Wentland
> > > > ; dri-devel@lists.freedesktop.org  
> > 
> > ...
> >   
> > > > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > > prescriptive color pipeline is needed
> > > >
> > > >
> > > >
> > > > On 11/8/23 12:18, Shankar, Uma wrote:  
> > > > >
> > > > >  
> > > > > >> -----Original Message-----
> > > > >> From: Harry Wentland 
> > > > >> Sent: Friday, October 20, 2023 2:51 AM
> > > > >> To: dri-devel@lists.freedesktop.org  
> > 
> > ...
> >   
> > > > >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > > >> prescriptive color pipeline is needed  
> > 
> > ...
> >   
> > > > >> +An example of a drm_colorop object might look like one of these::
> > > > >> +
> > > > >> +/* 1D enumerated curve */
> > > > >> +Color operation 42
> > > > >> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > >> +├─ "BYPASS": bool {true, false}
> > > > >> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > > >
> > > > > Having the fixed function enum for some targeted input/output may
> > > > > not be scalable for all usecases. There are multiple colorspaces
> > > > > and transfer functions possible, so it will not be possible to
> > > > > cover all these by any enum definitions. Also, this will depend on
> > > > > the capabilities of respective hardware from various vendors.
> > > >
> > > > The reason this exists is such that certain HW vendors such as AMD
> > > > have transfer functions implemented in HW. It is important to take
> > > > advantage of these for both precision and power reasons.  
> > >
> > > Issue we see here is that, it will be too usecase and vendor specific.
> > > There will be BT601, BT709, BT2020, SRGB, HDR EOTF and many more. Not
> > > to forget we will need linearization and non-linearization enums for each
> > > of these.
> > 
> > I don't see that as a problem at all. It's not a combinatorial explosion
> > like input/output combinations in a single enum would be.
> > It's always a curve and its inverse at most.
> > 
> > It's KMS properties, not every driver needs to implement every defined enum
> > value but only those values it can and wants to support.
> > Userspace also sees the supported list, it does not need trial and error.
> > 
> > This is the only way to actually use hard-wired curves. The alternative
> > would be for userspace to submit a LUT of some type, and the driver needs
> > to start guessing if it matches one of the hard-wired curves the hardware
> > supports, which is just not feasible.
> > 
> > Hard-wired curves are an addition, not a replacement, to custom curves
> > defined by parameters or various different LUT representations.
> > Many of these hard-wired curves will emerge as is from common use cases.  
> 
> Point taken, we can go with this fixed function cur

RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-10 Thread Shankar, Uma


> -----Original Message-----
> From: Pekka Paalanen 
> Sent: Thursday, November 9, 2023 5:26 PM
> To: Shankar, Uma 
> Cc: Joshua Ashton ; Harry Wentland
> ; dri-devel@lists.freedesktop.org; Sebastian Wick
> ; Sasha McIntosh ;
> Abhinav Kumar ; Shashank Sharma
> ; Xaver Hugl ; Hector
> Martin ; Liviu Dudau ; Alexander
> Goins ; Michel Dänzer ; wayland-
> de...@lists.freedesktop.org; Melissa Wen ; Jonas Ådahl
> ; Arthur Grillo ; Victoria
> Brekenfeld ; Sima ; Aleix Pol
> ; Naseer Ahmed ; Christopher
> Braga ; Ville Syrjala 
> Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> On Thu, 9 Nov 2023 10:17:11 +0000
> "Shankar, Uma"  wrote:
> 
> > > -----Original Message-----
> > > From: Joshua Ashton 
> > > Sent: Wednesday, November 8, 2023 7:13 PM
> > > To: Shankar, Uma ; Harry Wentland
> > > ; dri-devel@lists.freedesktop.org
> 
> ...
> 
> > > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > prescriptive color pipeline is needed
> > >
> > >
> > >
> > > On 11/8/23 12:18, Shankar, Uma wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Harry Wentland 
> > > >> Sent: Friday, October 20, 2023 2:51 AM
> > > >> To: dri-devel@lists.freedesktop.org
> 
> ...
> 
> > > >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > >> prescriptive color pipeline is needed
> 
> ...
> 
> > > >> +An example of a drm_colorop object might look like one of these::
> > > >> +
> > > >> +/* 1D enumerated curve */
> > > >> +Color operation 42
> > > >> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > >> +├─ "BYPASS": bool {true, false}
> > > >> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > >
> > > > Having the fixed function enum for some targeted input/output may
> > > > not be scalable for all usecases. There are multiple colorspaces
> > > > and transfer functions possible, so it will not be possible to
> > > > cover all these by any enum definitions. Also, this will depend on
> > > > the capabilities of respective hardware from various vendors.
> > >
> > > The reason this exists is such that certain HW vendors such as AMD
> > > have transfer functions implemented in HW. It is important to take
> > > advantage of these for both precision and power reasons.
> >
> > Issue we see here is that, it will be too usecase and vendor specific.
> > There will be BT601, BT709, BT2020, SRGB, HDR EOTF and many more. Not
> > to forget we will need linearization and non-linearization enums for each
> > of these.
> 
> I don't see that as a problem at all. It's not a combinatorial explosion like
> input/output combinations in a single enum would be.
> It's always a curve and its inverse at most.
> 
> It's KMS properties, not every driver needs to implement every defined enum
> value but only those values it can and wants to support.
> Userspace also sees the supported list, it does not need trial and error.
> 
> This is the only way to actually use hard-wired curves. The alternative
> would be for userspace to submit a LUT of some type, and the driver needs
> to start guessing if it matches one of the hard-wired curves the hardware
> supports, which is just not feasible.
> 
> Hard-wired curves are an addition, not a replacement, to custom curves defined
> by parameters or various different LUT representations.
> Many of these hard-wired curves will emerge as is from common use cases.

Point taken, we can go with this fixed function curve type as long as it
represents a single mathematical operation, thereby avoiding the combination
nightmare.

However, I just want to make sure that the same thing can be done with
programmable hardware. In the case above, the LUT tables for these curves
need to be hardcoded in the driver for the various platforms (depending on
their capabilities, precision, and the number and distribution of LUTs, etc.).
This is manageable, but the driver will get bloated with all kinds of
hardcoded LUT tables, which could have been easily computed by the compositor
at runtime. The driver cannot compute the tables at runtime due to the
complexity of the floating-point math involved, so hardcod

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-09 Thread Pekka Paalanen
On Thu, 9 Nov 2023 10:17:11 +0000
"Shankar, Uma"  wrote:

> > -----Original Message-----
> > From: Joshua Ashton 
> > Sent: Wednesday, November 8, 2023 7:13 PM
> > To: Shankar, Uma ; Harry Wentland
> > ; dri-devel@lists.freedesktop.org

...

> > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> > color pipeline is needed
> > 
> > 
> > 
> > On 11/8/23 12:18, Shankar, Uma wrote:  
> > >
> > >  
> > >> -----Original Message-----
> > >> From: Harry Wentland 
> > >> Sent: Friday, October 20, 2023 2:51 AM
> > >> To: dri-devel@lists.freedesktop.org

...

> > >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> > >> color pipeline is needed

...

> > >> +An example of a drm_colorop object might look like one of these::
> > >> +
> > >> +/* 1D enumerated curve */
> > >> +Color operation 42
> > >> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > >> +├─ "BYPASS": bool {true, false}
> > >> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > >
> > > Having the fixed function enum for some targeted input/output may not
> > > be scalable for all usecases. There are multiple colorspaces and
> > > transfer functions possible, so it will not be possible to cover all
> > > these by any enum definitions. Also, this will depend on the capabilities
> > > of respective hardware from various vendors.
> > 
> > The reason this exists is such that certain HW vendors such as AMD have
> > transfer functions implemented in HW. It is important to take advantage of
> > these for both precision and power reasons.
> 
> Issue we see here is that, it will be too usecase and vendor specific.
> There will be BT601, BT709, BT2020, SRGB, HDR EOTF and many more. Not to
> forget we will need linearization and non-linearization enums for each of
> these.

I don't see that as a problem at all. It's not a combinatorial
explosion like input/output combinations in a single enum would be.
It's always a curve and its inverse at most.

It's KMS properties, not every driver needs to implement every
defined enum value but only those values it can and wants to support.
Userspace also sees the supported list, it does not need trial and
error.
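
As a rough illustration of the supported-list point, here is a minimal Python
sketch. It is not DRM code: DEFINED_CURVES, advertised() and pick_curve() are
hypothetical names and the numeric values are made up. It only models the idea
that a driver advertises a subset of the defined enum values and userspace
consults that list instead of probing.

```python
# Hypothetical model of an enum property; names and values are illustrative
# only and do not reflect the real KMS uAPI.
DEFINED_CURVES = {
    "sRGB EOTF": 0,
    "sRGB inverse EOTF": 1,
    "PQ EOTF": 2,
    "PQ inverse EOTF": 3,
}


def advertised(driver_curves):
    """Return the enum entries a (made-up) driver exposes, as name -> value."""
    return {name: DEFINED_CURVES[name] for name in driver_curves}


def pick_curve(supported, wanted):
    """Look up a curve in the advertised list; None means 'not supported'."""
    return supported.get(wanted)


# One driver with only the sRGB curves hard-wired, another with only PQ EOTF.
driver_a = advertised(["sRGB EOTF", "sRGB inverse EOTF"])
driver_b = advertised(["PQ EOTF"])
```

Under these assumptions, pick_curve(driver_a, "PQ EOTF") returns None, so a
compositor would know up front to fall back to some other colorop or to
shaders, without any trial commit.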

This is the only way to actually use hard-wired curves. The
alternative would be for userspace to submit a LUT of some type, and
the driver needs to start guessing if it matches one of the hard-wired
curves the hardware supports, which is just not feasible.
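
One way to picture why such guessing is fragile is the following Python
sketch. It is only an illustration under stated assumptions, not an argument
made in this thread: guess_curve(), make_lut() and the candidate table are
hypothetical, the gamma 2.2 curve is just a stand-in for "some other curve",
and the sketch uses floating point, which the thread notes a driver cannot
readily do.

```python
def srgb_eotf(v):
    """sRGB EOTF: encoded value in [0, 1] -> linear value in [0, 1]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def gamma22(v):
    """Pure power-law curve, used here only as a near-miss candidate."""
    return v ** 2.2


def make_lut(curve, size):
    """Sample a curve at 'size' evenly spaced points, as userspace might."""
    return [curve(i / (size - 1)) for i in range(size)]


def guess_curve(lut, candidates, tolerance):
    """Return the names of all candidates within 'tolerance' of every LUT entry."""
    size = len(lut)
    hits = []
    for name, curve in candidates.items():
        worst = max(abs(curve(i / (size - 1)) - y) for i, y in enumerate(lut))
        if worst <= tolerance:
            hits.append(name)
    return hits


CANDIDATES = {"sRGB EOTF": srgb_eotf, "gamma 2.2": gamma22}
```

With a loose tolerance, a gamma 2.2 LUT is reported as matching both
candidates; with a tight one, the result depends on how exactly userspace
sampled and rounded its table. An explicit curve type avoids that ambiguity.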

Hard-wired curves are an addition, not a replacement, to custom
curves defined by parameters or various different LUT representations.
Many of these hard-wired curves will emerge as is from common use cases.

> Also a CTM indication to convert colorspace.

Did someone propose to enumerate matrices? I would not do that, unless
you literally have hard-wired matrices in hardware and cannot do custom
matrices.

> Also, if the underlying hardware block is programmable, it's not limited to
> being used only for colorspace management but can be used for other color
> enhancements as well by a capable client.

Yes, that's why we have other types for curves, the programmable ones.

> Hence, we feel that it is bordering on being descriptive with too many
> possible combinations (not easy to generalize). So, if hardware is
> programmable, let's expose its capability through a blob and be generic.

It's not descriptive though. It's a prescription of a mathematical
function the hardware implements as fixed-function hardware. The
function is a curve. There is no implication that the curve must be
used with specific input or output color spaces.
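
To make "the function is a curve" concrete, here is a minimal Python sketch of
one such curve and its inverse, using the standard sRGB piecewise formulas. The
function names are my own, and this says nothing about how any hardware
implements the curve. Note that nothing in it refers to an input or output
color space.

```python
def srgb_eotf(v):
    """sRGB EOTF: non-linear encoded value in [0, 1] -> linear value in [0, 1]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def srgb_inv_eotf(lin):
    """sRGB inverse EOTF: linear value in [0, 1] -> non-linear encoded value."""
    if lin <= 0.0031308:
        return lin * 12.92
    return 1.055 * lin ** (1 / 2.4) - 0.055
```

The pair corresponds to the "sRGB EOTF" and "sRGB inverse EOTF" entries of the
enum: one curve and its inverse, each usable on its own.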

> For any fixed function hardware where the LUT etc. is stored in ROM and just
> a control/enable bit is provided to the driver, we can define a pipeline with
> a vendor specific color block. This can be identified with a flag (better
> ways can be discussed).

No, there is no need for that. A curve type will do well.

A vendor specific colorop needs vendor specific userspace code to
program *at all*. A generic curve colorop might list some curve types
the userspace does not understand, but also curve types userspace does
understand. The understood curve types can still be used by userspace.
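
A minimal Python sketch of that behaviour, with hypothetical names throughout
("Vendor curve X" is invented, and usable_curves() is not a real API):
userspace keeps the advertised curve types it recognizes and skips the rest.

```python
def usable_curves(advertised, understood):
    """Keep the advertised curve types this userspace knows, in driver order."""
    return [name for name in advertised if name in understood]


# Made-up example: the colorop lists one curve type userspace has never seen.
advertised_types = ["sRGB EOTF", "Vendor curve X", "PQ EOTF"]
understood_types = {"sRGB EOTF", "sRGB inverse EOTF", "PQ EOTF"}
```

Here usable_curves(advertised_types, understood_types) still yields the two
generic curves, whereas a fully vendor specific colorop would leave generic
userspace with nothing it can program.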

> For example, on some of the Intel platforms, we had a fixed function to
> convert colorspaces directly with a bit s

RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-09 Thread Shankar, Uma


> -----Original Message-----
> From: Harry Wentland 
> Sent: Wednesday, November 8, 2023 8:08 PM
> To: Shankar, Uma ; dri-devel@lists.freedesktop.org
> Cc: wayland-de...@lists.freedesktop.org; Ville Syrjala
> ; Pekka Paalanen
> ; Simon Ser ; Melissa
> Wen ; Jonas Ådahl ; Sebastian Wick
> ; Shashank Sharma
> ; Alexander Goins ; Joshua
> Ashton ; Michel Dänzer ; Aleix Pol
> ; Xaver Hugl ; Victoria Brekenfeld
> ; Sima ; Naseer Ahmed
> ; Christopher Braga ;
> Abhinav Kumar ; Arthur Grillo
> ; Hector Martin ; Liviu Dudau
> ; Sasha McIntosh 
> Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> 
> 
> On 2023-11-08 07:18, Shankar, Uma wrote:
> >
> >
> >> -----Original Message-----
> >> From: Harry Wentland 
> >> Sent: Friday, October 20, 2023 2:51 AM
> >> To: dri-devel@lists.freedesktop.org
> >> Cc: wayland-de...@lists.freedesktop.org; Harry Wentland
> >> ; Ville Syrjala
> >> ; Pekka Paalanen
> >> ; Simon Ser ;
> >> Melissa Wen ; Jonas Ådahl ;
> >> Sebastian Wick ; Shashank Sharma
> >> ; Alexander Goins ;
> >> Joshua Ashton ; Michel Dänzer
> >> ; Aleix Pol ; Xaver Hugl
> >> ; Victoria Brekenfeld ;
> >> Sima ; Shankar, Uma ; Naseer
> >> Ahmed ; Christopher Braga
> >> ; Abhinav Kumar ;
> >> Arthur Grillo ; Hector Martin
> >> ; Liviu Dudau ; Sasha McIntosh
> >> 
> >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> >> color pipeline is needed
> >>
> >> v2:
> >>  - Update colorop visualizations to match reality (Sebastian, Alex
> >> Hung)
> >>  - Updated wording (Pekka)
> >>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> >>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >>    section (Pekka)
> >>  - Use PQ EOTF instead of its inverse in Pipeline Programming example
> >> (Melissa)
> >>  - Add "Driver Implementer's Guide" section (Pekka)
> >>  - Add "Driver Forward/Backward Compatibility" section (Sebastian,
> >> Pekka)
> >>
> >> Signed-off-by: Harry Wentland 
> >> Cc: Ville Syrjala 
> >> Cc: Pekka Paalanen 
> >> Cc: Simon Ser 
> >> Cc: Harry Wentland 
> >> Cc: Melissa Wen 
> >> Cc: Jonas Ådahl 
> >> Cc: Sebastian Wick 
> >> Cc: Shashank Sharma 
> >> Cc: Alexander Goins 
> >> Cc: Joshua Ashton 
> >> Cc: Michel Dänzer 
> >> Cc: Aleix Pol 
> >> Cc: Xaver Hugl 
> >> Cc: Victoria Brekenfeld 
> >> Cc: Sima 
> >> Cc: Uma Shankar 
> >> Cc: Naseer Ahmed 
> >> Cc: Christopher Braga 
> >> Cc: Abhinav Kumar 
> >> Cc: Arthur Grillo 
> >> Cc: Hector Martin 
> >> Cc: Liviu Dudau 
> >> Cc: Sasha McIntosh 
> >> ---
> >>  Documentation/gpu/rfc/color_pipeline.rst | 347
> >> +++
> >>  1 file changed, 347 insertions(+)
> >>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> >>
> >> diff --git a/Documentation/gpu/rfc/color_pipeline.rst
> >> b/Documentation/gpu/rfc/color_pipeline.rst
> >> new file mode 100644
> >> index 000000000000..af5f2ea29116
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> >> @@ -0,0 +1,347 @@
> >> +========================
> >> +Linux Color Pipeline API
> >> +========================
> >> +
> >> +What problem are we solving?
> >> +============================
> >> +
> >> +We would like to support pre-, and post-blending complex color
> >> +transformations in display controller hardware in order to allow for
> >> +HW-supported HDR use-cases, as well as to provide support to
> >> +color-managed applications, such as video or image editors.
> >> +
> >> +It is possible to support an HDR output on HW supporting the
> >> +Colorspace and HDR Metadata drm_connector properties, but that
> >> +requires the compositor or application to render and compose the
> >> +content into one final buffer intended for display. Doing so is costly.
> >> +
> >> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices,
> >> +and other operations to support color transformations. These
> >> +operations are often implemented in fixed-function HW and therefore
> >> +much more power efficient than performing similar operations via

RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-09 Thread Shankar, Uma


> -----Original Message-----
> From: Joshua Ashton 
> Sent: Wednesday, November 8, 2023 7:13 PM
> To: Shankar, Uma ; Harry Wentland
> ; dri-devel@lists.freedesktop.org
> Cc: wayland-de...@lists.freedesktop.org; Ville Syrjala
> ; Pekka Paalanen
> ; Simon Ser ; Melissa
> Wen ; Jonas Ådahl ; Sebastian Wick
> ; Shashank Sharma
> ; Alexander Goins ; Michel
> Dänzer ; Aleix Pol ; Xaver Hugl
> ; Victoria Brekenfeld ; Sima
> ; Naseer Ahmed ; Christopher
> Braga ; Abhinav Kumar
> ; Arthur Grillo ; Hector
> Martin ; Liviu Dudau ; Sasha
> McIntosh 
> Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> 
> 
> On 11/8/23 12:18, Shankar, Uma wrote:
> >
> >
> >> -----Original Message-----
> >> From: Harry Wentland 
> >> Sent: Friday, October 20, 2023 2:51 AM
> >> To: dri-devel@lists.freedesktop.org
> >> Cc: wayland-de...@lists.freedesktop.org; Harry Wentland
> >> ; Ville Syrjala
> >> ; Pekka Paalanen
> >> ; Simon Ser ;
> >> Melissa Wen ; Jonas Ådahl ;
> >> Sebastian Wick ; Shashank Sharma
> >> ; Alexander Goins ;
> >> Joshua Ashton ; Michel Dänzer
> >> ; Aleix Pol ; Xaver Hugl
> >> ; Victoria Brekenfeld ;
> >> Sima ; Shankar, Uma ; Naseer
> >> Ahmed ; Christopher Braga
> >> ; Abhinav Kumar ;
> >> Arthur Grillo ; Hector Martin
> >> ; Liviu Dudau ; Sasha McIntosh
> >> 
> >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> >> color pipeline is needed
> >>
> >> v2:
> >>   - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> >>   - Updated wording (Pekka)
> >>   - Change BYPASS wording to make it non-mandatory (Sebastian)
> >>   - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >>     section (Pekka)
> >>   - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> >>   - Add "Driver Implementer's Guide" section (Pekka)
> >>   - Add "Driver Forward/Backward Compatibility" section (Sebastian,
> >> Pekka)
> >>
> >> Signed-off-by: Harry Wentland 
> >> Cc: Ville Syrjala 
> >> Cc: Pekka Paalanen 
> >> Cc: Simon Ser 
> >> Cc: Harry Wentland 
> >> Cc: Melissa Wen 
> >> Cc: Jonas Ådahl 
> >> Cc: Sebastian Wick 
> >> Cc: Shashank Sharma 
> >> Cc: Alexander Goins 
> >> Cc: Joshua Ashton 
> >> Cc: Michel Dänzer 
> >> Cc: Aleix Pol 
> >> Cc: Xaver Hugl 
> >> Cc: Victoria Brekenfeld 
> >> Cc: Sima 
> >> Cc: Uma Shankar 
> >> Cc: Naseer Ahmed 
> >> Cc: Christopher Braga 
> >> Cc: Abhinav Kumar 
> >> Cc: Arthur Grillo 
> >> Cc: Hector Martin 
> >> Cc: Liviu Dudau 
> >> Cc: Sasha McIntosh 
> >> ---
> >>   Documentation/gpu/rfc/color_pipeline.rst | 347 +++
> >>   1 file changed, 347 insertions(+)
> >>   create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> >>
> >> diff --git a/Documentation/gpu/rfc/color_pipeline.rst
> >> b/Documentation/gpu/rfc/color_pipeline.rst
> >> new file mode 100644
> >> index 000000000000..af5f2ea29116
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> >> @@ -0,0 +1,347 @@
> >> +========================
> >> +Linux Color Pipeline API
> >> +========================
> >> +
> >> +What problem are we solving?
> >> +============================
> >> +
> >> +We would like to support pre-, and post-blending complex color
> >> +transformations in display controller hardware in order to allow for
> >> +HW-supported HDR use-cases, as well as to provide support to
> >> +color-managed applications, such as video or image editors.
> >> +
> >> +It is possible to support an HDR output on HW supporting the
> >> +Colorspace and HDR Metadata drm_connector properties, but that
> >> +requires the compositor or application to render and compose the
> >> +content into one final buffer intended for display. Doing so is costly.
> >> +
> >> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices,
> >> +and other operations to support color transformations. These
> >> +operations are often implemented in fixed-function HW and therefore
> >> +much more power efficient than performing similar operations via shaders or

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-08 Thread Harry Wentland



On 2023-11-08 07:18, Shankar, Uma wrote:
> 
> 
>> -----Original Message-----
>> From: Harry Wentland 
>> Sent: Friday, October 20, 2023 2:51 AM
>> To: dri-devel@lists.freedesktop.org
>> Cc: wayland-de...@lists.freedesktop.org; Harry Wentland
>> ; Ville Syrjala ; 
>> Pekka
>> Paalanen ; Simon Ser ;
>> Melissa Wen ; Jonas Ådahl ; Sebastian
>> Wick ; Shashank Sharma
>> ; Alexander Goins ; Joshua
>> Ashton ; Michel Dänzer ; Aleix Pol
>> ; Xaver Hugl ; Victoria Brekenfeld
>> ; Sima ; Shankar, Uma
>> ; Naseer Ahmed ;
>> Christopher Braga ; Abhinav Kumar
>> ; Arthur Grillo ; Hector
>> Martin ; Liviu Dudau ; Sasha
>> McIntosh 
>> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
>> pipeline is needed
>>
>> v2:
>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>  - Updated wording (Pekka)
>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>    section (Pekka)
>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>  - Add "Driver Implementer's Guide" section (Pekka)
>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>
>> Signed-off-by: Harry Wentland 
>> Cc: Ville Syrjala 
>> Cc: Pekka Paalanen 
>> Cc: Simon Ser 
>> Cc: Harry Wentland 
>> Cc: Melissa Wen 
>> Cc: Jonas Ådahl 
>> Cc: Sebastian Wick 
>> Cc: Shashank Sharma 
>> Cc: Alexander Goins 
>> Cc: Joshua Ashton 
>> Cc: Michel Dänzer 
>> Cc: Aleix Pol 
>> Cc: Xaver Hugl 
>> Cc: Victoria Brekenfeld 
>> Cc: Sima 
>> Cc: Uma Shankar 
>> Cc: Naseer Ahmed 
>> Cc: Christopher Braga 
>> Cc: Abhinav Kumar 
>> Cc: Arthur Grillo 
>> Cc: Hector Martin 
>> Cc: Liviu Dudau 
>> Cc: Sasha McIntosh 
>> ---
>>  Documentation/gpu/rfc/color_pipeline.rst | 347 +++
>>  1 file changed, 347 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
>>
>> diff --git a/Documentation/gpu/rfc/color_pipeline.rst
>> b/Documentation/gpu/rfc/color_pipeline.rst
>> new file mode 100644
>> index 000000000000..af5f2ea29116
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/color_pipeline.rst
>> @@ -0,0 +1,347 @@
>> +========================
>> +Linux Color Pipeline API
>> +========================
>> +
>> +What problem are we solving?
>> +============================
>> +
>> +We would like to support pre-, and post-blending complex color
>> +transformations in display controller hardware in order to allow for
>> +HW-supported HDR use-cases, as well as to provide support to
>> +color-managed applications, such as video or image editors.
>> +
>> +It is possible to support an HDR output on HW supporting the Colorspace
>> +and HDR Metadata drm_connector properties, but that requires the
>> +compositor or application to render and compose the content into one
>> +final buffer intended for display. Doing so is costly.
>> +
>> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and
>> +other operations to support color transformations. These operations are
>> +often implemented in fixed-function HW and therefore much more power
>> +efficient than performing similar operations via shaders or CPU.
>> +
>> +We would like to make use of this HW functionality to support complex
>> +color transformations with no, or minimal CPU or shader load.
>> +
>> +
>> +How are other OSes solving this problem?
>> +========================================
>> +
>> +The most widely supported use-cases regard HDR content, whether video
>> +or gaming.
>> +
>> +Most OSes will specify the source content format (color gamut, encoding
>> +transfer function, and other metadata, such as max and average light levels) to a driver.
>> +Drivers will then program their fixed-function HW accordingly to map
>> +from a source content buffer's space to a display's space.
>> +
>> +When fixed-function HW is not available the compositor will assemble a
>> +shader to ask the GPU to perform the transformation from the source
>> +content format to the display's format.
>> +
>> +A compositor's mapping function and a driver's mapping function are
>> +usually entirely separate concepts. On OSes where a HW vendor has no
>> +insight into closed-source compositor code such a vendor will

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-08 Thread Joshua Ashton




On 11/8/23 12:18, Shankar, Uma wrote:




-----Original Message-----
From: Harry Wentland 
Sent: Friday, October 20, 2023 2:51 AM
To: dri-devel@lists.freedesktop.org
Cc: wayland-de...@lists.freedesktop.org; Harry Wentland
; Ville Syrjala ; Pekka
Paalanen ; Simon Ser ;
Melissa Wen ; Jonas Ådahl ; Sebastian
Wick ; Shashank Sharma
; Alexander Goins ; Joshua
Ashton ; Michel Dänzer ; Aleix Pol
; Xaver Hugl ; Victoria Brekenfeld
; Sima ; Shankar, Uma
; Naseer Ahmed ;
Christopher Braga ; Abhinav Kumar
; Arthur Grillo ; Hector
Martin ; Liviu Dudau ; Sasha
McIntosh 
Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
pipeline is needed

v2:
  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
  - Updated wording (Pekka)
  - Change BYPASS wording to make it non-mandatory (Sebastian)
  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
    section (Pekka)
  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
  - Add "Driver Implementer's Guide" section (Pekka)
  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)

Signed-off-by: Harry Wentland 
Cc: Ville Syrjala 
Cc: Pekka Paalanen 
Cc: Simon Ser 
Cc: Harry Wentland 
Cc: Melissa Wen 
Cc: Jonas Ådahl 
Cc: Sebastian Wick 
Cc: Shashank Sharma 
Cc: Alexander Goins 
Cc: Joshua Ashton 
Cc: Michel Dänzer 
Cc: Aleix Pol 
Cc: Xaver Hugl 
Cc: Victoria Brekenfeld 
Cc: Sima 
Cc: Uma Shankar 
Cc: Naseer Ahmed 
Cc: Christopher Braga 
Cc: Abhinav Kumar 
Cc: Arthur Grillo 
Cc: Hector Martin 
Cc: Liviu Dudau 
Cc: Sasha McIntosh 
---
  Documentation/gpu/rfc/color_pipeline.rst | 347 +++
  1 file changed, 347 insertions(+)
  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst

diff --git a/Documentation/gpu/rfc/color_pipeline.rst
b/Documentation/gpu/rfc/color_pipeline.rst
new file mode 100644
index 000000000000..af5f2ea29116
--- /dev/null
+++ b/Documentation/gpu/rfc/color_pipeline.rst
@@ -0,0 +1,347 @@
+========================
+Linux Color Pipeline API
+========================
+
+What problem are we solving?
+============================
+
+We would like to support pre-, and post-blending complex color
+transformations in display controller hardware in order to allow for
+HW-supported HDR use-cases, as well as to provide support to
+color-managed applications, such as video or image editors.
+
+It is possible to support an HDR output on HW supporting the Colorspace
+and HDR Metadata drm_connector properties, but that requires the
+compositor or application to render and compose the content into one
+final buffer intended for display. Doing so is costly.
+
+Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and
+other operations to support color transformations. These operations are
+often implemented in fixed-function HW and therefore much more power
+efficient than performing similar operations via shaders or CPU.
+
+We would like to make use of this HW functionality to support complex
+color transformations with no, or minimal CPU or shader load.
+
+
+How are other OSes solving this problem?
+========================================
+
+The most widely supported use-cases regard HDR content, whether video
+or gaming.
+
+Most OSes will specify the source content format (color gamut, encoding
+transfer function, and other metadata, such as max and average light levels) to a driver.
+Drivers will then program their fixed-function HW accordingly to map
+from a source content buffer's space to a display's space.
+
+When fixed-function HW is not available the compositor will assemble a
+shader to ask the GPU to perform the transformation from the source
+content format to the display's format.
+
+A compositor's mapping function and a driver's mapping function are
+usually entirely separate concepts. On OSes where a HW vendor has no
+insight into closed-source compositor code such a vendor will tune
+their color management code to visually match the compositor's. On
+other OSes, where both mapping functions are open to an implementer they will ensure both mappings match.
+
+This results in mapping algorithm lock-in, meaning that no-one alone
+can experiment with or introduce new mapping algorithms and achieve
+consistent results regardless of which implementation path is taken.
+
+Why is Linux different?
+=======================
+
+Unlike other OSes, where there is one compositor for one or more
+drivers, on Linux we have a many-to-many relationship. Many compositors; many drivers.
+In addition each compositor vendor or community has their own view of
+how color management should be done. This is what makes Linux so beautiful.
+
+This means that a HW vendor can now no longer tune their driver to one
+compositor, as tuning it to one could make it look fairly different
+from another compositor's color mapping.
+
+We need a better solution.
+
+
+Descriptive API
+===============
+
+An API that describes the so

RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-08 Thread Shankar, Uma


> -----Original Message-----
> From: Harry Wentland 
> Sent: Friday, October 20, 2023 2:51 AM
> To: dri-devel@lists.freedesktop.org
> Cc: wayland-de...@lists.freedesktop.org; Harry Wentland
> ; Ville Syrjala ; Pekka
> Paalanen ; Simon Ser ;
> Melissa Wen ; Jonas Ådahl ; Sebastian
> Wick ; Shashank Sharma
> ; Alexander Goins ; Joshua
> Ashton ; Michel Dänzer ; Aleix Pol
> ; Xaver Hugl ; Victoria Brekenfeld
> ; Sima ; Shankar, Uma
> ; Naseer Ahmed ;
> Christopher Braga ; Abhinav Kumar
> ; Arthur Grillo ; Hector
> Martin ; Liviu Dudau ; Sasha
> McIntosh 
> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> v2:
>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>  - Updated wording (Pekka)
>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>    section (Pekka)
>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>  - Add "Driver Implementer's Guide" section (Pekka)
>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> 
> Signed-off-by: Harry Wentland 
> Cc: Ville Syrjala 
> Cc: Pekka Paalanen 
> Cc: Simon Ser 
> Cc: Harry Wentland 
> Cc: Melissa Wen 
> Cc: Jonas Ådahl 
> Cc: Sebastian Wick 
> Cc: Shashank Sharma 
> Cc: Alexander Goins 
> Cc: Joshua Ashton 
> Cc: Michel Dänzer 
> Cc: Aleix Pol 
> Cc: Xaver Hugl 
> Cc: Victoria Brekenfeld 
> Cc: Sima 
> Cc: Uma Shankar 
> Cc: Naseer Ahmed 
> Cc: Christopher Braga 
> Cc: Abhinav Kumar 
> Cc: Arthur Grillo 
> Cc: Hector Martin 
> Cc: Liviu Dudau 
> Cc: Sasha McIntosh 
> ---
>  Documentation/gpu/rfc/color_pipeline.rst | 347 +++
>  1 file changed, 347 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> 
> diff --git a/Documentation/gpu/rfc/color_pipeline.rst
> b/Documentation/gpu/rfc/color_pipeline.rst
> new file mode 100644
> index ..af5f2ea29116
> --- /dev/null
> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> @@ -0,0 +1,347 @@
> +========================
> +Linux Color Pipeline API
> +========================
> +
> +What problem are we solving?
> +============================
> +
> +We would like to support pre-, and post-blending complex color
> +transformations in display controller hardware in order to allow for
> +HW-supported HDR use-cases, as well as to provide support to
> +color-managed applications, such as video or image editors.
> +
> +It is possible to support an HDR output on HW supporting the Colorspace
> +and HDR Metadata drm_connector properties, but that requires the
> +compositor or application to render and compose the content into one
> +final buffer intended for display. Doing so is costly.
> +
> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and
> +other operations to support color transformations. These operations are
> +often implemented in fixed-function HW and therefore much more power
> +efficient than performing similar operations via shaders or CPU.
> +
> +We would like to make use of this HW functionality to support complex
> +color transformations with no, or minimal CPU or shader load.
> +
> +
> +How are other OSes solving this problem?
> +========================================
> +
> +The most widely supported use-cases regard HDR content, whether video
> +or gaming.
> +
> +Most OSes will specify the source content format (color gamut, encoding
> +transfer function, and other metadata, such as max and average light
> +levels) to a driver.
> +Drivers will then program their fixed-function HW accordingly to map
> +from a source content buffer's space to a display's space.
> +
> +When fixed-function HW is not available the compositor will assemble a
> +shader to ask the GPU to perform the transformation from the source
> +content format to the display's format.
> +
> +A compositor's mapping function and a driver's mapping function are
> +usually entirely separate concepts. On OSes where a HW vendor has no
> +insight into closed-source compositor code such a vendor will tune
> +their color management code to visually match the compositor's. On
> +other OSes, where both mapping functions are open to an implementer they
> +will ensure both mappings match.
> +
> +This results in mapping algorithm lock-in, meaning that no-one alone
> +can experiment with or introduce new mapping algorithms and achieve
> +consistent results regardless of which implementation path is taken.
> +
> +Why is Linux different?
> +=======================
> +
> +Unlike other 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-07 Thread Sebastian Wick
On Tue, Nov 07, 2023 at 11:52:11AM -0500, Harry Wentland wrote:
> 
> 
> On 2023-10-26 13:30, Sebastian Wick wrote:
> > On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> >> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> >> Alex Goins  wrote:
> >>
> >>> Thank you Harry and all other contributors for your work on this. Responses
> >>> inline -
> >>>
> >>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> >>>
> >>>> On Fri, 20 Oct 2023 11:23:28 -0400
> >>>> Harry Wentland  wrote:
> >>>>
> >>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
> >>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
> >>>>>> Sebastian Wick  wrote:
> >>>>>>
> >>>>>>> Thanks for continuing to work on this!
> >>>>>>>
> >>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> 
> snip
> 
> >>>>>>
> >>>>>> I think we also need a definition of "informational".
> >>>>>>
> >>>>>> Counter-example 1: a colorop that represents a non-configurable
> >>>>>
> >>>>> Not sure what's "counter" for these examples?
> >>>>>
> >>>>>> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> >>>>>> format. It cannot be set to bypass, it cannot be configured, and it
> >>>>>> will alter color values.
> >>>
> >>> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
> >>> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
> >>> based on the principle that read-only blobs could be used to express some static
> >>> pipeline elements without the need to define a new type, but got mixed opinions.
> >>> I think this demonstrates the principle further, as clients could detect this
> >>> programmatically instead of having to special-case the informational element.
> >>
> > 
> > I'm all for exposing fixed color ops but I suspect that most of those
> > follow some standard and in those cases instead of exposing the matrix
> > values one should prefer to expose a named matrix (e.g. BT.601, BT.709,
> > BT.2020).
> > 
> 
> Agreed.
> 
> > As a general rule: always expose the highest level description. Going
> > from a name to exact values is trivial, going from values to a name is
> > much harder.
> > 
> >> If the blob depends on the pixel format (i.e. the driver automatically
> >> chooses a different blob per pixel format), then I think we would need
> >> to expose all the blobs and how they correspond to pixel formats.
> >> Otherwise ok, I guess.
> >>
> >> However, do we want or need to make a color pipeline or colorop
> >> conditional on pixel formats? For example, if you use a YUV 4:2:0 type
> >> of pixel format, then you must use this pipeline and not any other. Or
> >> floating-point type of pixel format. I did not anticipate this before,
> >> I assumed that all color pipelines and colorops are independent of the
> >> framebuffer pixel format. A specific colorop might have a property that
> >> needs to agree with the framebuffer pixel format, but I didn't expect
> >> further limitations.
> > 
> > We could simply fail commits when the pipeline and pixel format don't
> > work together. We'll probably need some kind of ingress no-op node
> > anyway and maybe could list pixel formats there if required to make it
> > easier to find a working configuration.
> > 
> 
> The problem with failing commits is that user-space has no idea why it
> failed. If this means that userspace falls back to SW composition for
> NV12 and P010 it would avoid HW offloading in one of the most important
> use-cases on AMD HW for power-saving purposes.

Exposing which pixel formats work with a pipeline should be
uncontroversial, and so should be an informative scaler op.

Both can be added without a problem at a later time, so let's not make
any of that mandatory for the first version. One step after the other.

> 
> snip
> 
> >>> Despite being programmable, the LUTs are updated in a manner that is less
> >>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> >>> if there was some way to tag operations according to their performance,
> >>> for example so that clients can prefer a high performance one when they
> >>> intend to do an animated transition? I recall from the XDC HDR workshop
> >>> that this is also an issue with AMD's 3DLUT, where updates can be too
> >>> slow to animate.
> >>
> >> I can certainly see such information being useful, but then we need to
> >> somehow quantize the performance.
> >>
> >> What I was left puzzled about after the XDC workshop is whether it is
> >> possible to pre-load configurations in the background (slow), and then
> >> quickly switch between them. Hardware-wise, I mean.
> > 
> > We could define that pipelines with a lower ID are to be preferred over
> > higher IDs.
> > 
> > The issue is that if programming a pipeline becomes too slow to be
> > useful it probably should just not be made available to user space.
> > 
> > The prepare-commit idea for blob properties would help to 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-07 Thread Harry Wentland



On 2023-11-04 19:01, Christopher Braga wrote:
> Just want to loop back to before we branched off deeper into the programming 
> performance talk
> 
> On 10/26/2023 3:25 PM, Alex Goins wrote:
>> On Thu, 26 Oct 2023, Sebastian Wick wrote:
>>
>>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>>> Alex Goins  wrote:
>>>>
>>>>> Thank you Harry and all other contributors for your work on this. Responses
>>>>> inline -
>>>>>
>>>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>>>
>>>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>>>> Harry Wentland  wrote:
>>>>>>
>>>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>>>> Sebastian Wick  wrote:
>>>>>>>>
>>>>>>>>> Thanks for continuing to work on this!
>>>>>>>>>
>>>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

snip

>>>>> Actually, the current examples in the proposal don't include a multiplier color
>>>>> op, which might be useful. For AMD as above, but also for NVIDIA as the
>>>>> following issue arises:
>>>>>
>>>>> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
>
> If possible, let's declare this as two blocks. One that informatively
> declares the conversion is present, and another for the de-gamma. This will
> help with block-reuse between vendors.
>
>>>>> point to FP16 conversion. In that conversion, what fixed point 0x maps
>>>>> to in floating point varies depending on the source content. If it's SDR
>>>>> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
>>>>> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
>>>>> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
>>>>> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
>>>>> correct?
>>>>
>>>> It would be against the UAPI design principles to tag content as HDR or
>>>> SDR. What you can do instead is to expose a colorop with a multiplier of
>>>> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
>>>> that the input is SDR or HDR to get the expected multiplier. You will
>>>> never know what the content actually is, anyway.
>>
>> Right, I didn't mean to suggest that we should tag content as HDR or SDR in the
>> UAPI, just relating to the end result in the pipe, ultimately it would be
>> determined by the multiplier color op.
>>
> 
> A multiplier could work but we should give OEMs the option to either
> make it "informative" and fixed by the hardware, or fully configurable. With
> the Qualcomm pipeline, how we absorb FP16 pixel buffers, as well as how we
> convert them to fixed point data, actually has a dependency on the desired
> de-gamma and gamma processing. So for an example:
>
> If a source pixel buffer is scRGB encoded FP16 content we would expect input
> pixel content to be up to 7.5, with the IGC output reaching 125 as in the
> NVIDIA case. Likewise gamma 2.2 encoded FP16 content would be 0-1 in and 0-1
> out.
>
> So in the Qualcomm case the expectations are fixed depending on the use case.
>
> It is sounding to me like we would need to be able to declare three things
> here:
> 1. Value range expectations *into* the de-gamma block. A multiplier wouldn't
> work here because it would be more of a clipping operation. I guess we would
> have to add an explicit clamping block as well.
> 2. Value range expectations at the *output* of the de-gamma processing
> block. Also covered by using another multiplier block.
> 3. Value range expectations *into* a gamma processing block. This should be
> covered by declaring a multiplier post-csc, but only assuming CSC output is
> normalized in the desired value range. A clamping block would be preferable
> because it describes what happens when it isn't.
> 

What about adding informational input and output range properties
to colorops? I think Intel's PWL definitions had something like
that, but I'd have to take a look at that again. While I'm not
in favor of defining segmented LUTs at the uAPI the input/output
ranges seem to be something of value.
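
To make the idea a bit more concrete, here is a rough Python sketch of how
userspace might use such informational ranges to sanity-check a pipeline.
Everything in it is hypothetical: the ColorOp class, the in_range/out_range
fields and the example pipeline are invented for illustration and do not
correspond to any existing or proposed property.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[float, float]

@dataclass
class ColorOp:
    # Hypothetical model of a colorop carrying informational value ranges.
    name: str
    in_range: Range
    out_range: Range

def find_range_mismatches(pipeline: List[ColorOp]) -> List[str]:
    """Report adjacent colorops whose output and input ranges disagree."""
    problems = []
    for prev, cur in zip(pipeline, pipeline[1:]):
        if prev.out_range != cur.in_range:
            problems.append(
                f"{prev.name} outputs {prev.out_range}, "
                f"but {cur.name} expects {cur.in_range}"
            )
    return problems

# Made-up pipeline: the PQ curve produces [0.0, 125.0], the last LUT only
# accepts [0.0, 1.0], so a multiplier or clamp would be needed in between.
pipeline = [
    ColorOp("PQ EOTF", (0.0, 1.0), (0.0, 125.0)),
    ColorOp("3x4 matrix", (0.0, 125.0), (0.0, 125.0)),
    ColorOp("1D LUT", (0.0, 1.0), (0.0, 1.0)),
]
for problem in find_range_mismatches(pipeline):
    print(problem)
```

A check like this only tells userspace that a mismatch exists; what the
hardware does with out-of-range values (clip, wrap, extend) would still have
to be described separately.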

> All this is do-able, but it seems like it would require the definition of 
> multiple color pipelines to expose the different limitations for color block 
> configuration combinations. Additionally, would it be easy for user space to 
> find the right pipeline?
> 

I'm also a little concerned that some of these proposals mean we'd
have to expose an inordinate number of color pipelines and color
pipeline selection becomes difficult and error prone.

snip

>>>> Given that elements like various kinds of look-up tables inherently
>>>> assume that the domain is [0.0, 1.0] (because it is a table that
>>>> has a beginning and an end, and the usual convention is that the
>>>> beginning is zero and the end is one), I think it is best to stick to

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-07 Thread Harry Wentland



On 2023-10-26 15:25, Alex Goins wrote:
> On Thu, 26 Oct 2023, Sebastian Wick wrote:
> 
>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>> Alex Goins  wrote:
>>>
>>>> Thank you Harry and all other contributors for your work on this. Responses
>>>> inline -
>>>>
>>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>>
>>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>>> Harry Wentland  wrote:
>>>>>
>>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>>> Sebastian Wick  wrote:
>>>>>>>
>>>>>>>> Thanks for continuing to work on this!
>>>>>>>>
>>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

snip

>>>
>>> If we look at BT.2100, there is no such encoding even mentioned where
>>> 125.0 would correspond to 10k cd/m². That 125.0 convention already has
>>> a built-in assumption what the color spaces are and what the conversion
>>> is aiming to do. IOW, I would say that choice is opinionated from the
>>> start. The multiplier in BT.2100 is always 1.
> 
> Be that as it may, the convention of FP16 125.0 corresponding to 10k nits is
> baked in our hardware, so it's unavoidable at least for NVIDIA pipelines.
> 

Yeah, that's not just NVidia, it's basically the same for AMD. Though I
think we can work without that assumption, but the PQ TF you get from AMD
will map to [0.0, 125.0].

snip

>>
>> We could simply fail commits when the pipeline and pixel format don't
>> work together. We'll probably need some kind of ingress no-op node
>> anyway and maybe could list pixel formats there if required to make it
>> easier to find a working configuration.
> 
> Yeah, we could, but having to figure that out through trial and error would be
> unfortunate. Per above, it might be easiest to just tag pipelines with a pixel
> format instead of trying to include the pixel format conversion as a color op.
> 

Agreed. We've been looking at libliftoff a bit, but one of the problems is
that it does a lot of atomic checks to figure out an optimal HW plane
configuration, and we run out of time budget before we're able to check
all options.

Atomic check failure is really not well suited for this stuff.


>>> "Without the need to define a new type" is something I think we need to
>>> consider case by case. I have a hard time giving a general opinion.
>>>
>>>
>>> Counter-example 2: image size scaling colorop. It might not be
>>> configurable, it is controlled by the plane CRTC_* and SRC_*
>>> properties. You still need to understand what it does, so you can
>>> arrange the scaling to work correctly. (Do not want to scale an image
>>> with PQ-encoded values as Josh demonstrated in XDC.)
>>>
>>
>> IMO the position of the scaling operation is the thing that's important
>> here as the color pipeline won't define scaling properties.

>>>> I agree that blending should ideally be done in linear space, and I remember
>>>> that from Josh's presentation at XDC, but I don't recall the same being said for
>>>> scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
>>>> pipeline that is meant to be in PQ space (more on this below), and that was
>>>> found to achieve better results at HDR/SDR boundaries. Of course, this only
>>>> bolsters the argument that it would be helpful to have an informational "scaler"
>>>> element to understand at which stage scaling takes place.
>>>
>>> Both blending and scaling are fundamentally the same operation: you
>>> have two or more source colors (pixels), and you want to compute a
>>> weighted average of them following what happens in nature, that is,
>>> physics, as that is what humans are used to.
>>>
>>> Both blending and scaling will suffer from the same problems if the
>>> operation is performed on not light-linear values. The result of the
>>> weighted average does not correspond to physics.
>>>
>>> The problem may be hard to observe with natural imagery, but Josh's
>>> example shows it very clearly. Maybe that effect is sometimes useful
>>> for some imagery in some use cases, but it is still an accidental
>>> side-effect. You might get even better results if you don't rely on
>>> accidental side-effects but design a separate operation for the exact
>>> goal you have.
>>>
>>> Mind, by scaling we mean changing image size. Not scaling color values.
>>>
> 
> Fair enough, but it might not always be a choice given the hardware.
> 

I'm thinking of this as an information element, not a programmable.
Some HW could define this as programmable, but I probably wouldn't
on AMD HW.
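
For anyone who wants to see the weighted-average point from the quoted text
in numbers, here is a small Python sketch. It uses the sRGB curve rather than
PQ purely to keep the example short (that substitution is mine, not something
from the thread), and it averages one black and one white pixel, which is
what a 2:1 downscale would do.

```python
def srgb_eotf(v):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def srgb_inverse_eotf(lin):
    """Encode a linear-light value in [0, 1] with the sRGB curve."""
    return lin * 12.92 if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055

black, white = 0.0, 1.0  # encoded values of two neighbouring pixels

# Averaging the encoded values directly, as a scaler working on encoded
# data would do.
encoded_avg = 0.5 * black + 0.5 * white

# Averaging in linear light, then re-encoding for comparison.
linear_avg = 0.5 * srgb_eotf(black) + 0.5 * srgb_eotf(white)
correct_encoded = srgb_inverse_eotf(linear_avg)

print(f"encoded-space average:      {encoded_avg:.3f}")
print(f"light it actually emits:    {srgb_eotf(encoded_avg):.3f}")
print(f"linear-space average:       {linear_avg:.3f}")
print(f"its encoded representation: {correct_encoded:.3f}")
```

The encoded-space average ends up at roughly 21% of the white level in
linear light instead of 50%, which is why knowing where the scaler sits in
the pipeline matters even if the element is purely informational.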

snip

>>>
>>> What I was left puzzled about after the XDC workshop is whether it is
>>> possible to pre-load configurations in the background (slow), and then
>>> quickly switch between them. Hardware-wise, I mean.
> 
> This works fine for our "fast" LUTs, you just point them to a surface in video
> memory and they flip to it. You could keep multiple 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-07 Thread Harry Wentland



On 2023-10-26 13:30, Sebastian Wick wrote:
> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>> Alex Goins  wrote:
>>
>>> Thank you Harry and all other contributors for your work on this. Responses
>>> inline -
>>>
>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>
>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>> Harry Wentland  wrote:
>>>>
>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>> Sebastian Wick  wrote:
>>>>>>
>>>>>>> Thanks for continuing to work on this!
>>>>>>>
>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

snip

>>>>>>
>>>>>> I think we also need a definition of "informational".
>>>>>>
>>>>>> Counter-example 1: a colorop that represents a non-configurable
>>>>>
>>>>> Not sure what's "counter" for these examples?
>>>>>
>>>>>> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
>>>>>> format. It cannot be set to bypass, it cannot be configured, and it
>>>>>> will alter color values.
>>>
>>> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
>>> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
>>> based on the principle that read-only blobs could be used to express some static
>>> pipeline elements without the need to define a new type, but got mixed opinions.
>>> I think this demonstrates the principle further, as clients could detect this
>>> programmatically instead of having to special-case the informational element.
>>
> 
> I'm all for exposing fixed color ops but I suspect that most of those
> follow some standard and in those cases instead of exposing the matrix
> values one should prefer to expose a named matrix (e.g. BT.601, BT.709,
> BT.2020).
> 

Agreed.
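
As a rough illustration of why a named matrix is enough information for
userspace, here is a minimal Python sketch that expands a standard's name
into full-range Y'CbCr-to-R'G'B' coefficients. The luma weights are the
published Kr/Kb values for each standard; the table and function names are
made up for this example and are not part of any proposed uAPI, and
limited-range handling is left out.

```python
# Luma weights (Kr, Kb) as published in the respective ITU-R recommendations.
LUMA_WEIGHTS = {
    "BT.601": (0.299, 0.114),
    "BT.709": (0.2126, 0.0722),
    "BT.2020": (0.2627, 0.0593),
}

def ycbcr_to_rgb_matrix(name):
    """Return the 3x3 full-range Y'CbCr -> R'G'B' matrix for a named standard.

    Hypothetical helper; Cb and Cr are assumed to be centred on zero.
    """
    kr, kb = LUMA_WEIGHTS[name]
    kg = 1.0 - kr - kb
    return [
        [1.0, 0.0, 2.0 * (1.0 - kr)],
        [1.0, -2.0 * kb * (1.0 - kb) / kg, -2.0 * kr * (1.0 - kr) / kg],
        [1.0, 2.0 * (1.0 - kb), 0.0],
    ]

for row in ycbcr_to_rgb_matrix("BT.709"):
    print(["%.4f" % v for v in row])
```

Going the other way would mean comparing a blob against every known matrix
with some tolerance for rounding, which is the harder direction.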

> As a general rule: always expose the highest level description. Going
> from a name to exact values is trivial, going from values to a name is
> much harder.
> 
>> If the blob depends on the pixel format (i.e. the driver automatically
>> chooses a different blob per pixel format), then I think we would need
>> to expose all the blobs and how they correspond to pixel formats.
>> Otherwise ok, I guess.
>>
>> However, do we want or need to make a color pipeline or colorop
>> conditional on pixel formats? For example, if you use a YUV 4:2:0 type
>> of pixel format, then you must use this pipeline and not any other. Or
>> floating-point type of pixel format. I did not anticipate this before,
>> I assumed that all color pipelines and colorops are independent of the
>> framebuffer pixel format. A specific colorop might have a property that
>> needs to agree with the framebuffer pixel format, but I didn't expect
>> further limitations.
> 
> We could simply fail commits when the pipeline and pixel format don't
> work together. We'll probably need some kind of ingress no-op node
> anyway and maybe could list pixel formats there if required to make it
> easier to find a working configuration.
> 

The problem with failing commits is that user-space has no idea why it
failed. If this means that userspace falls back to SW composition for
NV12 and P010 it would avoid HW offloading in one of the most important
use-cases on AMD HW for power-saving purposes.

snip

>>> Despite being programmable, the LUTs are updated in a manner that is less
>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
>>> if there was some way to tag operations according to their performance,
>>> for example so that clients can prefer a high performance one when they
>>> intend to do an animated transition? I recall from the XDC HDR workshop
>>> that this is also an issue with AMD's 3DLUT, where updates can be too
>>> slow to animate.
>>
>> I can certainly see such information being useful, but then we need to
>> somehow quantize the performance.
>>
>> What I was left puzzled about after the XDC workshop is whether it is
>> possible to pre-load configurations in the background (slow), and then
>> quickly switch between them. Hardware-wise, I mean.
> 
> We could define that pipelines with a lower ID are to be preferred over
> higher IDs.
> 
> The issue is that if programming a pipeline becomes too slow to be
> useful it probably should just not be made available to user space.
> 
> The prepare-commit idea for blob properties would help to make the
> pipelines usable again, but until then it's probably a good idea to just
> not expose those pipelines.
> 

It's a bit of a judgment call what's too slow, though. The value of having
a HW colorop might outweigh the cost of the programming time for some
compositors but not for others.

Harry

>>
>>
>> Thanks,
>> pq
> 
> 



Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-07 Thread Harry Wentland



On 2023-10-26 04:57, Pekka Paalanen wrote:
> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> Alex Goins  wrote:
> 
>> Thank you Harry and all other contributors for your work on this. Responses
>> inline -
>>
>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>
>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>> Harry Wentland  wrote:
>>>   
>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>> Sebastian Wick  wrote:
>>>>>
>>>>>> Thanks for continuing to work on this!
>>>>>>
>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
>>>>>>> v2:
>>>>>>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>>>>>>  - Updated wording (Pekka)
>>>>>>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>>>>>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>>>>>>    section (Pekka)
>>>>>>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>>>>>>  - Add "Driver Implementer's Guide" section (Pekka)
>>>>>>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>>>>
>>>>> ...
>>>>>
>>>>>>> +An example of a drm_colorop object might look like one of these::
>>>>>>> +
>>>>>>> +/* 1D enumerated curve */
>>>>>>> +Color operation 42
>>>>>>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
>>>>>>> +├─ "BYPASS": bool {true, false}
>>>>>>> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
>>>>>>> +└─ "NEXT": immutable color operation ID = 43
>>
>> I know these are just examples, but I would also like to suggest the 
>> possibility
>> of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
>> compared to setting an identity in some cases depending on the hardware. See
>> below for more on this, RE: implicit format conversions.
>>
>> Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up 
>> in
>> offline discussions that it would nonetheless be helpful to expose enumerated
>> curves in order to hide the vendor-specific complexities of programming
>> segmented LUTs from clients. In that case, we would simply refer to the
>> enumerated curve when calculating/choosing segmented LUT entries.
> 
> That's a good idea.
> 
>> Another thing that came up in offline discussions is that we could use 
>> multiple
>> color operations to program a single operation in hardware. As I understand 
>> it,
>> AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
>> "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, 
>> but
>> we could combine them into a singular LUT in software, such that you can 
>> combine
>> e.g. segmented PQ EOTF with night light. One caveat is that you will lose
>> precision from the custom LUT where it overlaps with the linear section of 
>> the
>> enumerated curve, but that is unavoidable and shouldn't be an issue in most
>> use-cases.
> 
> Indeed.
> 
>> Actually, the current examples in the proposal don't include a multiplier 
>> color
>> op, which might be useful. For AMD as above, but also for NVIDIA as the
>> following issue arises:
>>
>> As discussed further below, the NVIDIA "degamma" LUT performs an implicit 
>> fixed
>> point to FP16 conversion. In that conversion, what fixed point 0x 
>> maps
>> to in floating point varies depending on the source content. If it's SDR
>> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
>> potential boost multiplier if we want SDR content to be brighter. If it's 
>> HDR PQ
>> content, we want the max value in FP16 to be 125.0 (10,000 nits). My 
>> assumption
>> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
>> correct?
> 
> It would be against the UAPI design principles to tag content as HDR or
> SDR. What you can do instead is to expose a colorop with a multiplier of
> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
> that the input is SDR or HDR to get the expected multiplier. You will
> never know what the content actually is, anyway.
> 
> Of course, if we want to have an arbitrary multiplier colorop that is
> somewhat standard, as in, exposed by many drivers to ease userspace
> development, you can certainly use any combination of your hardware
> features you need to realize the UAPI prescribed mathematical operation.
> 
> Since we are talking about floating-point in hardware, a multiplier
> does not significantly affect precision.
> 
> In order to mathematically define all colorops, I believe it is
> necessary to define all colorops in terms of floating-point values (as
> in math), even if they operate on fixed-point or integer. By this I
> mean that if the input is 8 bpc unsigned integer pixel format for
> instance, 0 raw pixel channel value is mapped to 0.0 and 255 is mapped
> to 1.0, 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-07 Thread Harry Wentland



On 2023-10-25 16:16, Alex Goins wrote:
> Thank you Harry and all other contributors for your work on this. Responses
> inline -
> 

Thanks for your comments on this. Apologies for the late response.
I was focussing on the simpler responses to my patch set first and
left yours for last as it's the most interesting.

> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> 
>> On Fri, 20 Oct 2023 11:23:28 -0400
>> Harry Wentland  wrote:
>>
>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>> Sebastian Wick  wrote:
>>>>
>>>>> Thanks for continuing to work on this!
>>>>>
>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
>>>>>> v2:
>>>>>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>>>>>  - Updated wording (Pekka)
>>>>>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>>>>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>>>>>    section (Pekka)
>>>>>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>>>>>  - Add "Driver Implementer's Guide" section (Pekka)
>>>>>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>>>
>>>> ...
>>>>
>>>>>> +An example of a drm_colorop object might look like one of these::
>>>>>> +
>>>>>> +/* 1D enumerated curve */
>>>>>> +Color operation 42
>>>>>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
>>>>>> +├─ "BYPASS": bool {true, false}
>>>>>> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
>>>>>> +└─ "NEXT": immutable color operation ID = 43
> 
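
To show how a client might consume objects like the one quoted above, here is
a toy Python model that follows the "NEXT" property from one colorop to the
following one. Only colorop 42 and its NEXT value of 43 come from the quoted
example; colorops 43 and 44, the dictionary layout, and the assumption that
NEXT = 0 marks the end of the pipeline are invented for this sketch.

```python
# Hypothetical in-memory model of colorops. Only colorop 42 and its NEXT
# value are taken from the quoted example; the rest is made up.
colorops = {
    42: {"TYPE": "1D enumerated curve", "BYPASS": False,
         "CURVE_1D_TYPE": "PQ EOTF", "NEXT": 43},
    43: {"TYPE": "3x4 matrix", "BYPASS": True, "NEXT": 44},
    44: {"TYPE": "1D LUT", "BYPASS": False, "NEXT": 0},
}

def walk_pipeline(ops, first_id):
    """Yield (id, colorop) pairs by following NEXT until it is 0 (assumed end)."""
    seen = set()
    op_id = first_id
    while op_id != 0:
        if op_id in seen:
            raise ValueError(f"cycle at colorop {op_id}")
        seen.add(op_id)
        yield op_id, ops[op_id]
        op_id = ops[op_id]["NEXT"]

for op_id, op in walk_pipeline(colorops, 42):
    state = "bypassed" if op.get("BYPASS") else "active"
    print(op_id, op["TYPE"], state)
```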
> I know these are just examples, but I would also like to suggest the 
> possibility
> of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> compared to setting an identity in some cases depending on the hardware. See
> below for more on this, RE: implicit format conversions.
> 
> Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up 
> in
> offline discussions that it would nonetheless be helpful to expose enumerated
> curves in order to hide the vendor-specific complexities of programming
> segmented LUTs from clients. In that case, we would simply refer to the
> enumerated curve when calculating/choosing segmented LUT entries.
> 
> Another thing that came up in offline discussions is that we could use 
> multiple
> color operations to program a single operation in hardware. As I understand 
> it,
> AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, 
> but
> we could combine them into a singular LUT in software, such that you can 
> combine
> e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> precision from the custom LUT where it overlaps with the linear section of the
> enumerated curve, but that is unavoidable and shouldn't be an issue in most
> use-cases.
> 

FWIW, for the most part we don't have ROMs followed by custom LUTs. We have
either a ROM-based HW block or a segmented programmable LUT. In the case of the
former we will only expose named transfer functions. In the case of the latter
we expose a named TF, followed by custom LUT and merge them into one segmented
LUT.
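
In case it helps to picture the merge step, here is a minimal Python sketch
of baking "named TF followed by custom LUT" into a single table. It samples
uniformly and interpolates linearly, which is a simplification on my part:
it does not model the segmented layout the hardware uses, and all function
names and the example LUT are made up.

```python
def srgb_eotf(v):
    """Named transfer function used for the example: sRGB EOTF on [0, 1]."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def lut_lookup(lut, x):
    """Sample a uniformly spaced 1D LUT over [0, 1] with linear interpolation."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(lut) - 1)
    i = min(int(pos), len(lut) - 2)
    frac = pos - i
    return lut[i] * (1.0 - frac) + lut[i + 1] * frac

def merge_tf_and_lut(tf, custom_lut, size):
    """Bake 'named TF followed by custom LUT' into one LUT with `size` entries."""
    return [lut_lookup(custom_lut, tf(i / (size - 1))) for i in range(size)]

# Hypothetical 16-entry custom LUT that dims everything to 80%.
custom = [0.8 * i / 15 for i in range(16)]
merged = merge_tf_and_lut(srgb_eotf, custom, 256)
print(merged[0], merged[128], merged[255])
```

With a uniform table like this the precision lost near the dark end of the
curve is easy to see, which is the motivation for segmented LUTs in the
first place.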

> Actually, the current examples in the proposal don't include a multiplier 
> color
> op, which might be useful. For AMD as above, but also for NVIDIA as the
> following issue arises:
> 

The current examples are only examples. A multiplier colorop would make a lot
of sense.

> As discussed further below, the NVIDIA "degamma" LUT performs an implicit 
> fixed
> point to FP16 conversion. In that conversion, what fixed point 0x maps
> to in floating point varies depending on the source content. If it's SDR
> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> potential boost multiplier if we want SDR content to be brighter. If it's HDR 
> PQ
> content, we want the max value in FP16 to be 125.0 (10,000 nits). My 
> assumption
> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> correct?
> 

Our PQ transfer function will also map to [0.0, 125.0] without use of the HDR
multiplier. The HDR multiplier is intended to be used to scale SDR brightness
when the user moves the SDR brightness slider in the OS.
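
For reference, the [0.0, 125.0] range falls straight out of the PQ EOTF once
you adopt the convention mentioned earlier in the thread that 1.0 corresponds
to 80 nits. A small Python sketch, using the SMPTE ST 2084 constants; the
function names and the reference_white_nits parameter are just for
illustration:

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_eotf_nits(signal):
    """Map a PQ-encoded signal in [0, 1] to absolute luminance in cd/m^2."""
    p = signal ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

def pq_eotf_scaled(signal, reference_white_nits=80.0):
    """Same curve, expressed so that 1.0 equals the reference white level."""
    return pq_eotf_nits(signal) / reference_white_nits

print(pq_eotf_scaled(0.0), pq_eotf_scaled(1.0))
```

A driver that instead normalizes the PQ output to [0.0, 1.0] would be
equivalent to calling pq_eotf_scaled() with reference_white_nits=10000.0,
which is why the output range of a named curve needs to be spelled out.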

> From the given enumerated curves, it's not clear how they would map to the
> above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
> value of 125.0? That may work, but it tends towards the "descriptive" notion of

Yes, I think we need to be clear about the output range of a named transfer
function. While AMD and NVidia map PQ to [0.0, 125.0] I could see others map
it to [0.0, 1.0] (and maybe scale sRGB down to 1/125.0 or some other value).

> assuming the source content, which may 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-11-04 Thread Christopher Braga
Just want to loop back to before we branched off deeper into the 
programming performance talk


On 10/26/2023 3:25 PM, Alex Goins wrote:

On Thu, 26 Oct 2023, Sebastian Wick wrote:


On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:

On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
Alex Goins  wrote:


Thank you Harry and all other contributors for your work on this. Responses
inline -

On Mon, 23 Oct 2023, Pekka Paalanen wrote:


On Fri, 20 Oct 2023 11:23:28 -0400
Harry Wentland  wrote:


On 2023-10-20 10:57, Pekka Paalanen wrote:

On Fri, 20 Oct 2023 16:22:56 +0200
Sebastian Wick  wrote:


Thanks for continuing to work on this!

On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

v2:
  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
  - Updated wording (Pekka)
  - Change BYPASS wording to make it non-mandatory (Sebastian)
  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
section (Pekka)
  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
  - Add "Driver Implementer's Guide" section (Pekka)
  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)


...


+An example of a drm_colorop object might look like one of these::
+
+/* 1D enumerated curve */
+Color operation 42
+├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
+├─ "BYPASS": bool {true, false}
+├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
+└─ "NEXT": immutable color operation ID = 43


I know these are just examples, but I would also like to suggest the possibility
of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
compared to setting an identity in some cases depending on the hardware. See
below for more on this, RE: implicit format conversions.

Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
offline discussions that it would nonetheless be helpful to expose enumerated
curves in order to hide the vendor-specific complexities of programming
segmented LUTs from clients. In that case, we would simply refer to the
enumerated curve when calculating/choosing segmented LUT entries.


That's a good idea.


Another thing that came up in offline discussions is that we could use multiple
color operations to program a single operation in hardware. As I understand it,
AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
"HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
we could combine them into a singular LUT in software, such that you can combine
e.g. segmented PQ EOTF with night light. One caveat is that you will lose
precision from the custom LUT where it overlaps with the linear section of the
enumerated curve, but that is unavoidable and shouldn't be an issue in most
use-cases.


Indeed.


Actually, the current examples in the proposal don't include a multiplier color
op, which might be useful. For AMD as above, but also for NVIDIA as the
following issue arises:

As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed


If possible, let's declare this as two blocks. One that informatively 
declares the conversion is present, and another for the de-gamma. This 
will help with block-reuse between vendors.



point to FP16 conversion. In that conversion, what fixed point 0x maps
to in floating point varies depending on the source content. If it's SDR
content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
is that this is also what AMD's "HDR Multiplier" stage is used for, is that
correct?


It would be against the UAPI design principles to tag content as HDR or
SDR. What you can do instead is to expose a colorop with a multiplier of
1.0 or 125.0 to match your hardware behaviour, then tell your hardware
that the input is SDR or HDR to get the expected multiplier. You will
never know what the content actually is, anyway.


Right, I didn't mean to suggest that we should tag content as HDR or SDR in the
UAPI, just relating to the end result in the pipe, ultimately it would be
determined by the multiplier color op.



A multiplier could work, but we should give OEMs the option to 
either make it "informative" and fixed by the hardware, or fully 
configurable. With the Qualcomm pipeline, how we absorb FP16 pixel 
buffers, as well as how we convert them to fixed point data, actually 
depends on the desired de-gamma and gamma processing. For example:


If a source pixel buffer is scRGB encoded FP16 content we would expect 
input pixel content to be up to 7.5, with the IGC output reaching 125 as 
in the NVIDIA case. Likewise gamma 2.2 encoded FP16 content would be 0-1 
in and 0-1 out.



Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-27 Thread Xaver Hugl
I'm afraid that would not be very useful. It indeed depends on the refresh
rate, but also on how close to vblank the compositor does its commits / on
what the latency requirements for the currently shown content are.
When the compositor presents a fullscreen video with frames that are queued
up in advance, needing a full frame to program the atomic commit could be
acceptable, but when the user moves the cursor or plays a game, the
compositor needs to do the commits as close to vblank as possible. Without
a known upper bound on the time that it takes to program the hardware
that's not doable.
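
A minimal sketch of the scheduling decision described above, assuming the
compositor knows the next vblank time and (hypothetically) a worst-case
programming time for the commit. Neither function corresponds to an existing
API, and the microsecond units are only for illustration:

```python
def latest_commit_start_us(next_vblank_us, worst_case_program_us, margin_us=0):
    """Latest time at which a commit can start and still hit the next vblank."""
    return next_vblank_us - worst_case_program_us - margin_us


def can_commit_this_frame(now_us, next_vblank_us, worst_case_program_us):
    """True if a commit started now is expected to land before the next vblank.

    If no upper bound is known (None), the compositor cannot make this
    decision, which is the problem described above.
    """
    if worst_case_program_us is None:
        return False
    return now_us <= latest_commit_start_us(next_vblank_us, worst_case_program_us)
```

At 60 Hz (a period of roughly 16667 us) with a 1000 us bound, a commit started
at 15000 us into the frame would still be expected to make it, while one
started at 16000 us would not.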

On Fri, 27 Oct 2023 at 14:01, Pekka Paalanen <
ppaala...@gmail.com> wrote:

> On Fri, 27 Oct 2023 12:01:32 +0200
> Sebastian Wick  wrote:
>
> > On Fri, Oct 27, 2023 at 10:59:25AM +0200, Michel Dänzer wrote:
> > > On 10/26/23 21:25, Alex Goins wrote:
> > > > On Thu, 26 Oct 2023, Sebastian Wick wrote:
> > > >> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> > > >>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > > >>> Alex Goins  wrote:
> > > >>>
> > > >>>> Despite being programmable, the LUTs are updated in a manner that is less
> > > >>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > > >>>> if there was some way to tag operations according to their performance,
> > > >>>> for example so that clients can prefer a high performance one when they
> > > >>>> intend to do an animated transition? I recall from the XDC HDR workshop
> > > >>>> that this is also an issue with AMD's 3DLUT, where updates can be too
> > > >>>> slow to animate.
> > > >>>
> > > >>> I can certainly see such information being useful, but then we
> need to
> > > >>> somehow quantize the performance.
> > > >
> > > > Right, which wouldn't even necessarily be universal, could depend on
> the given
> > > > host, GPU, etc. It could just be a relative performance indication,
> to give an
> > > > order of preference. That wouldn't tell you if it can or can't be
> animated, but
> > > > when choosing between two LUTs to animate you could prefer the higher
> > > > performance one.
> > > >
> > > >>>
> > > >>> What I was left puzzled about after the XDC workshop is that is it
> > > >>> possible to pre-load configurations in the background (slow), and
> then
> > > >>> quickly switch between them? Hardware-wise I mean.
> > > >
> > > > This works fine for our "fast" LUTs, you just point them to a
> surface in video
> > > > memory and they flip to it. You could keep multiple surfaces around
> and flip
> > > > between them without having to reprogram them in software. We can
> easily do that
> > > > with enumerated curves, populating them when the driver initializes
> instead of
> > > > waiting for the client to request them. You can even point multiple
> hardware
> > > > LUTs to the same video memory surface, if they need the same curve.
> > > >
> > > >>
> > > >> We could define that pipelines with a lower ID are to be preferred
> over
> > > >> higher IDs.
> > > >
> > > > Sure, but this isn't just an issue with a pipeline as a whole, but
> the
> > > > individual elements within it and how to use them in a given context.
> > > >
> > > >>
> > > >> The issue is that if programming a pipeline becomes too slow to be
> > > >> useful it probably should just not be made available to user
> space.
> > > >
> > > > It's not that programming the pipeline is overall too slow. The LUTs
> we have
> > > > that are relatively slow to program are meant to be set
> infrequently, or even
> > > > just once, to allow the scaler and tone mapping operator to operate
> in fixed
> > > > point PQ space. You might still want the tone mapper, so you would
> choose a
> > > > pipeline that includes them, but when it comes to e.g. animating a
> night light,
> > > > you would want to choose a different LUT for that purpose.
> > > >
> > > >>
> > > >> The prepare-commit idea for blob properties would help to make the
> > > >> pipelines usable again, but until then it's probably a good idea to
> just
> > > >> not expose those pipelines.
> > > >
> > > > The prepare-commit idea actually wouldn't work for these LUTs,
> because they are
> > > > programmed using methods instead of pointing them to a surface. I'm
> actually not
> > > > sure how slow it actually is, would need to benchmark it. I think
> not exposing
> > > > them at all would be overkill, since it would mean you can't use the
> preblending
> > > > scaler or tonemapper, and animation isn't necessary for that.
> > > >
> > > > The AMD 3DLUT is another example of a LUT that is slow to update,
> and it would
> > > > obviously be a major loss if that wasn't exposed. There just needs
> to be some
> > > > way for clients to know if they are going to kill performance by
> trying to
> > > > change it every frame.
> > >
> > > Might a first step be to require the ALLOW_MODESET flag to be set when
> changing the values for a colorop which is too slow to be updated per
> refresh cycle?

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-27 Thread Xaver Hugl
On Fri, 27 Oct 2023 at 12:01, Sebastian Wick <
sebastian.w...@redhat.com> wrote:

> On Fri, Oct 27, 2023 at 10:59:25AM +0200, Michel Dänzer wrote:
> > On 10/26/23 21:25, Alex Goins wrote:
> > > On Thu, 26 Oct 2023, Sebastian Wick wrote:
> > >> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> > >>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > >>> Alex Goins  wrote:
> > >>>
> > >>>> Despite being programmable, the LUTs are updated in a manner that is less
> > >>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > >>>> if there was some way to tag operations according to their performance,
> > >>>> for example so that clients can prefer a high performance one when they
> > >>>> intend to do an animated transition? I recall from the XDC HDR workshop
> > >>>> that this is also an issue with AMD's 3DLUT, where updates can be too
> > >>>> slow to animate.
> > >>>
> > >>> I can certainly see such information being useful, but then we need
> to
> > >>> somehow quantize the performance.
> > >
> > > Right, which wouldn't even necessarily be universal, could depend on
> the given
> > > host, GPU, etc. It could just be a relative performance indication, to
> give an
> > > order of preference. That wouldn't tell you if it can or can't be
> animated, but
> > > when choosing between two LUTs to animate you could prefer the higher
> > > performance one.
> > >
> > >>>
> > >>> What I was left puzzled about after the XDC workshop is that is it
> > >>> possible to pre-load configurations in the background (slow), and
> then
> > >>> quickly switch between them? Hardware-wise I mean.
> > >
> > > This works fine for our "fast" LUTs, you just point them to a surface
> in video
> > > memory and they flip to it. You could keep multiple surfaces around
> and flip
> > > between them without having to reprogram them in software. We can
> easily do that
> > > with enumerated curves, populating them when the driver initializes
> instead of
> > > waiting for the client to request them. You can even point multiple
> hardware
> > > LUTs to the same video memory surface, if they need the same curve.
> > >
> > >>
> > >> We could define that pipelines with a lower ID are to be preferred
> over
> > >> higher IDs.
> > >
> > > Sure, but this isn't just an issue with a pipeline as a whole, but the
> > > individual elements within it and how to use them in a given context.
> > >
> > >>
> > >> The issue is that if programming a pipeline becomes too slow to be
> > >> useful it probably should just not be made available to user space.
> > >
> > > It's not that programming the pipeline is overall too slow. The LUTs
> we have
> > > that are relatively slow to program are meant to be set infrequently,
> or even
> > > just once, to allow the scaler and tone mapping operator to operate in
> fixed
> > > point PQ space. You might still want the tone mapper, so you would
> choose a
> > > pipeline that includes them, but when it comes to e.g. animating a
> night light,
> > > you would want to choose a different LUT for that purpose.
> > >
> > >>
> > >> The prepare-commit idea for blob properties would help to make the
> > >> pipelines usable again, but until then it's probably a good idea to
> just
> > >> not expose those pipelines.
> > >
> > > The prepare-commit idea actually wouldn't work for these LUTs, because
> they are
> > > programmed using methods instead of pointing them to a surface. I'm
> actually not
> > > sure how slow it actually is, would need to benchmark it. I think not
> exposing
> > > them at all would be overkill, since it would mean you can't use the
> preblending
> > > scaler or tonemapper, and animation isn't necessary for that.
> > >
> > > The AMD 3DLUT is another example of a LUT that is slow to update, and
> it would
> > > obviously be a major loss if that wasn't exposed. There just needs to
> be some
> > > way for clients to know if they are going to kill performance by
> trying to
> > > change it every frame.
> >
> > Might a first step be to require the ALLOW_MODESET flag to be set when
> changing the values for a colorop which is too slow to be updated per
> refresh cycle?
> >
> > This would tell the compositor: You can use this colorop, but you can't
> change its values on the fly.
>
> I argued before that changing any color op to passthrough should never
> require ALLOW_MODESET and while this is really hard to guarantee from a
> driver perspective I still believe that it's better to not expose any
> feature requiring ALLOW_MODESET or taking too long to program to be
> useful for per-frame changes.
>
> When user space has ways to figure out if going back to a specific state
> (in this case setting everything to bypass) without ALLOW_MODESET we can
> revisit this decision, but until then, let's keep things simple and only
> expose things that work reliably without ALLOW_MODESET and fast enough
> to work for per-frame changes.
>

Knowing an operation is fast 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-27 Thread Pekka Paalanen
On Fri, 27 Oct 2023 12:01:32 +0200
Sebastian Wick  wrote:

> On Fri, Oct 27, 2023 at 10:59:25AM +0200, Michel Dänzer wrote:
> > On 10/26/23 21:25, Alex Goins wrote:  
> > > On Thu, 26 Oct 2023, Sebastian Wick wrote:  
> > >> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:  
> > >>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > >>> Alex Goins  wrote:
> > >>>  
> > >>>> Despite being programmable, the LUTs are updated in a manner that is less
> > >>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > >>>> if there was some way to tag operations according to their performance,
> > >>>> for example so that clients can prefer a high performance one when they
> > >>>> intend to do an animated transition? I recall from the XDC HDR workshop
> > >>>> that this is also an issue with AMD's 3DLUT, where updates can be too
> > >>>> slow to animate.
> > >>>
> > >>> I can certainly see such information being useful, but then we need to
> > >>> somehow quantize the performance.  
> > > 
> > > Right, which wouldn't even necessarily be universal, could depend on the 
> > > given
> > > host, GPU, etc. It could just be a relative performance indication, to 
> > > give an
> > > order of preference. That wouldn't tell you if it can or can't be 
> > > animated, but
> > > when choosing between two LUTs to animate you could prefer the higher
> > > performance one.
> > >   
> > >>>
> > >>> What I was left puzzled about after the XDC workshop is that is it
> > >>> possible to pre-load configurations in the background (slow), and then
> > >>> quickly switch between them? Hardware-wise I mean.  
> > > 
> > > This works fine for our "fast" LUTs, you just point them to a surface in 
> > > video
> > > memory and they flip to it. You could keep multiple surfaces around and 
> > > flip
> > > between them without having to reprogram them in software. We can easily 
> > > do that
> > > with enumerated curves, populating them when the driver initializes 
> > > instead of
> > > waiting for the client to request them. You can even point multiple 
> > > hardware
> > > LUTs to the same video memory surface, if they need the same curve.
> > >   
> > >>
> > >> We could define that pipelines with a lower ID are to be preferred over
> > >> higher IDs.  
> > > 
> > > Sure, but this isn't just an issue with a pipeline as a whole, but the
> > > individual elements within it and how to use them in a given context.
> > >   
> > >>
> > >> The issue is that if programming a pipeline becomes too slow to be
> > >> useful it probably should just not be made available to user space.  
> > > 
> > > It's not that programming the pipeline is overall too slow. The LUTs we 
> > > have
> > > that are relatively slow to program are meant to be set infrequently, or 
> > > even
> > > just once, to allow the scaler and tone mapping operator to operate in 
> > > fixed
> > > point PQ space. You might still want the tone mapper, so you would choose 
> > > a
> > > pipeline that includes them, but when it comes to e.g. animating a night 
> > > light,
> > > you would want to choose a different LUT for that purpose.
> > >   
> > >>
> > >> The prepare-commit idea for blob properties would help to make the
> > >> pipelines usable again, but until then it's probably a good idea to just
> > >> not expose those pipelines.  
> > > 
> > > The prepare-commit idea actually wouldn't work for these LUTs, because 
> > > they are
> > > programmed using methods instead of pointing them to a surface. I'm 
> > > actually not
> > > sure how slow it actually is, would need to benchmark it. I think not 
> > > exposing
> > > them at all would be overkill, since it would mean you can't use the 
> > > preblending
> > > scaler or tonemapper, and animation isn't necessary for that.
> > > 
> > > The AMD 3DLUT is another example of a LUT that is slow to update, and it 
> > > would
> > > obviously be a major loss if that wasn't exposed. There just needs to be 
> > > some
> > > way for clients to know if they are going to kill performance by trying to
> > > change it every frame.  
> > 
> > Might a first step be to require the ALLOW_MODESET flag to be set when 
> > changing the values for a colorop which is too slow to be updated per 
> > refresh cycle?
> > 
> > This would tell the compositor: You can use this colorop, but you can't 
> > change its values on the fly.  
> 
> I argued before that changing any color op to passthrough should never
> require ALLOW_MODESET and while this is really hard to guarantee from a
> driver perspective I still believe that it's better to not expose any
> feature requiring ALLOW_MODESET or taking too long to program to be
> useful for per-frame changes.
> 
> When user space has ways to figure out if going back to a specific state
> (in this case setting everything to bypass) without ALLOW_MODESET we can
> revisit this decision, but until then, let's keep things simple and only
> expose things that 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-27 Thread Sebastian Wick
On Fri, Oct 27, 2023 at 10:59:25AM +0200, Michel Dänzer wrote:
> On 10/26/23 21:25, Alex Goins wrote:
> > On Thu, 26 Oct 2023, Sebastian Wick wrote:
> >> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> >>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> >>> Alex Goins  wrote:
> >>>
> >>>> Despite being programmable, the LUTs are updated in a manner that is less
> >>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> >>>> if there was some way to tag operations according to their performance,
> >>>> for example so that clients can prefer a high performance one when they
> >>>> intend to do an animated transition? I recall from the XDC HDR workshop
> >>>> that this is also an issue with AMD's 3DLUT, where updates can be too
> >>>> slow to animate.
> >>>
> >>> I can certainly see such information being useful, but then we need to
> >>> somehow quantize the performance.
> > 
> > Right, which wouldn't even necessarily be universal, could depend on the 
> > given
> > host, GPU, etc. It could just be a relative performance indication, to give 
> > an
> > order of preference. That wouldn't tell you if it can or can't be animated, 
> > but
> > when choosing between two LUTs to animate you could prefer the higher
> > performance one.
> > 
> >>>
> >>> What I was left puzzled about after the XDC workshop is that is it
> >>> possible to pre-load configurations in the background (slow), and then
> >>> quickly switch between them? Hardware-wise I mean.
> > 
> > This works fine for our "fast" LUTs, you just point them to a surface in 
> > video
> > memory and they flip to it. You could keep multiple surfaces around and flip
> > between them without having to reprogram them in software. We can easily do 
> > that
> > with enumerated curves, populating them when the driver initializes instead 
> > of
> > waiting for the client to request them. You can even point multiple hardware
> > LUTs to the same video memory surface, if they need the same curve.
> > 
> >>
> >> We could define that pipelines with a lower ID are to be preferred over
> >> higher IDs.
> > 
> > Sure, but this isn't just an issue with a pipeline as a whole, but the
> > individual elements within it and how to use them in a given context.
> > 
> >>
> >> The issue is that if programming a pipeline becomes too slow to be
> >> useful it probably should just not be made available to user space.
> > 
> > It's not that programming the pipeline is overall too slow. The LUTs we have
> > that are relatively slow to program are meant to be set infrequently, or 
> > even
> > just once, to allow the scaler and tone mapping operator to operate in fixed
> > point PQ space. You might still want the tone mapper, so you would choose a
> > pipeline that includes them, but when it comes to e.g. animating a night 
> > light,
> > you would want to choose a different LUT for that purpose.
> > 
> >>
> >> The prepare-commit idea for blob properties would help to make the
> >> pipelines usable again, but until then it's probably a good idea to just
> >> not expose those pipelines.
> > 
> > The prepare-commit idea actually wouldn't work for these LUTs, because they 
> > are
> > programmed using methods instead of pointing them to a surface. I'm 
> > actually not
> > sure how slow it actually is, would need to benchmark it. I think not 
> > exposing
> > them at all would be overkill, since it would mean you can't use the 
> > preblending
> > scaler or tonemapper, and animation isn't necessary for that.
> > 
> > The AMD 3DLUT is another example of a LUT that is slow to update, and it 
> > would
> > obviously be a major loss if that wasn't exposed. There just needs to be 
> > some
> > way for clients to know if they are going to kill performance by trying to
> > change it every frame.
> 
> Might a first step be to require the ALLOW_MODESET flag to be set when 
> changing the values for a colorop which is too slow to be updated per refresh 
> cycle?
> 
> This would tell the compositor: You can use this colorop, but you can't 
> change its values on the fly.

I argued before that changing any color op to passthrough should never
require ALLOW_MODESET and while this is really hard to guarantee from a
driver perspective I still believe that it's better to not expose any
feature requiring ALLOW_MODESET or taking too long to program to be
useful for per-frame changes.

When user space has ways to figure out if going back to a specific state
(in this case setting everything to bypass) without ALLOW_MODESET we can
revisit this decision, but until then, let's keep things simple and only
expose things that work reliably without ALLOW_MODESET and fast enough
to work for per-frame changes.

Harry, Pekka: Should we document this? It obviously restricts what can
be exposed but exposing things that can't be used by user space isn't
useful.

> 
> -- 
> Earthling Michel Dänzer|  https://redhat.com
> Libre software 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-27 Thread Michel Dänzer
On 10/26/23 21:25, Alex Goins wrote:
> On Thu, 26 Oct 2023, Sebastian Wick wrote:
>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>> Alex Goins  wrote:
>>>
>>>> Despite being programmable, the LUTs are updated in a manner that is less
>>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
>>>> if there was some way to tag operations according to their performance,
>>>> for example so that clients can prefer a high performance one when they
>>>> intend to do an animated transition? I recall from the XDC HDR workshop
>>>> that this is also an issue with AMD's 3DLUT, where updates can be too
>>>> slow to animate.
>>>
>>> I can certainly see such information being useful, but then we need to
>>> somehow quantize the performance.
> 
> Right, which wouldn't even necessarily be universal, could depend on the given
> host, GPU, etc. It could just be a relative performance indication, to give an
> order of preference. That wouldn't tell you if it can or can't be animated, 
> but
> when choosing between two LUTs to animate you could prefer the higher
> performance one.
> 
>>>
>>> What I was left puzzled about after the XDC workshop is that is it
>>> possible to pre-load configurations in the background (slow), and then
>>> quickly switch between them? Hardware-wise I mean.
> 
> This works fine for our "fast" LUTs, you just point them to a surface in video
> memory and they flip to it. You could keep multiple surfaces around and flip
> between them without having to reprogram them in software. We can easily do 
> that
> with enumerated curves, populating them when the driver initializes instead of
> waiting for the client to request them. You can even point multiple hardware
> LUTs to the same video memory surface, if they need the same curve.
> 
>>
>> We could define that pipelines with a lower ID are to be preferred over
>> higher IDs.
> 
> Sure, but this isn't just an issue with a pipeline as a whole, but the
> individual elements within it and how to use them in a given context.
> 
>>
>> The issue is that if programming a pipeline becomes too slow to be
>> useful it probably should just not be made available to user space.
> 
> It's not that programming the pipeline is overall too slow. The LUTs we have
> that are relatively slow to program are meant to be set infrequently, or even
> just once, to allow the scaler and tone mapping operator to operate in fixed
> point PQ space. You might still want the tone mapper, so you would choose a
> pipeline that includes them, but when it comes to e.g. animating a night 
> light,
> you would want to choose a different LUT for that purpose.
> 
>>
>> The prepare-commit idea for blob properties would help to make the
>> pipelines usable again, but until then it's probably a good idea to just
>> not expose those pipelines.
> 
> The prepare-commit idea actually wouldn't work for these LUTs, because they 
> are
> programmed using methods instead of pointing them to a surface. I'm actually 
> not
> sure how slow it actually is, would need to benchmark it. I think not exposing
> them at all would be overkill, since it would mean you can't use the 
> preblending
> scaler or tonemapper, and animation isn't necessary for that.
> 
> The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> obviously be a major loss if that wasn't exposed. There just needs to be some
> way for clients to know if they are going to kill performance by trying to
> change it every frame.

Might a first step be to require the ALLOW_MODESET flag to be set when changing 
the values for a colorop which is too slow to be updated per refresh cycle?

This would tell the compositor: You can use this colorop, but you can't change 
its values on the fly.
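
Roughly, the check could be modelled like the sketch below. This is a
plain-Python model of the idea, not kernel code: the ColorOp class, its
slow_to_program flag and check_colorop_update() are all hypothetical, and
returning -EINVAL is an assumption about how such a rejection would be
reported:

```python
import errno
from dataclasses import dataclass


@dataclass
class ColorOp:
    name: str
    slow_to_program: bool  # hypothetical driver-side knowledge
    value: object = None


def check_colorop_update(op, new_value, allow_modeset):
    """Return 0 if the update is accepted, or a negative errno if rejected."""
    if new_value == op.value:
        return 0  # nothing changes, nothing to reprogram
    if op.slow_to_program and not allow_modeset:
        # Too slow to update per refresh cycle: require ALLOW_MODESET.
        return -errno.EINVAL
    return 0
```

A compositor that gets the rejection for a per-frame update would then know to
animate through a different colorop instead.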


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-26 Thread Alex Goins
On Thu, 26 Oct 2023, Sebastian Wick wrote:

> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> > On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > Alex Goins  wrote:
> >
> > > Thank you Harry and all other contributors for your work on this. 
> > > Responses
> > > inline -
> > >
> > > On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> > >
> > > > On Fri, 20 Oct 2023 11:23:28 -0400
> > > > Harry Wentland  wrote:
> > > >
> > > > > On 2023-10-20 10:57, Pekka Paalanen wrote:
> > > > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > > > Sebastian Wick  wrote:
> > > > > >
> > > > > >> Thanks for continuing to work on this!
> > > > > >>
> > > > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > > > > >>> v2:
> > > > > >>>  - Update colorop visualizations to match reality (Sebastian, 
> > > > > >>> Alex Hung)
> > > > > >>>  - Updated wording (Pekka)
> > > > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane 
> > > > > >>> Property
> > > > > >>>section (Pekka)
> > > > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming 
> > > > > >>> example (Melissa)
> > > > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > > > >>>  - Add "Driver Forward/Backward Compatibility" section 
> > > > > >>> (Sebastian, Pekka)
> > > > > >
> > > > > > ...
> > > > > >
> > > > > >>> +An example of a drm_colorop object might look like one of these::
> > > > > >>> +
> > > > > >>> +/* 1D enumerated curve */
> > > > > >>> +Color operation 42
> > > > > >>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 
> > > > > >>> matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > > >>> +├─ "BYPASS": bool {true, false}
> > > > > >>> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ 
> > > > > >>> EOTF, PQ inverse EOTF, …}
> > > > > >>> +└─ "NEXT": immutable color operation ID = 43
> > >
> > > I know these are just examples, but I would also like to suggest the possibility
> > > of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> > > compared to setting an identity in some cases depending on the hardware. See
> > > below for more on this, RE: implicit format conversions.
> > >
> > > Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> > > offline discussions that it would nonetheless be helpful to expose enumerated
> > > curves in order to hide the vendor-specific complexities of programming
> > > segmented LUTs from clients. In that case, we would simply refer to the
> > > enumerated curve when calculating/choosing segmented LUT entries.
> >
> > That's a good idea.
> >
> > > Another thing that came up in offline discussions is that we could use multiple
> > > color operations to program a single operation in hardware. As I understand it,
> > > AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> > > "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> > > we could combine them into a singular LUT in software, such that you can combine
> > > e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> > > precision from the custom LUT where it overlaps with the linear section of the
> > > enumerated curve, but that is unavoidable and shouldn't be an issue in most
> > > use-cases.
> >
> > Indeed.
> >
> > > Actually, the current examples in the proposal don't include a multiplier color
> > > op, which might be useful. For AMD as above, but also for NVIDIA as the
> > > following issue arises:
> > >
> > > As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> > > point to FP16 conversion. In that conversion, what fixed point 0x maps
> > > to in floating point varies depending on the source content. If it's SDR
> > > content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> > > potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> > > content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
> > > is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> > > correct?
> >
> > It would be against the UAPI design principles to tag content as HDR or
> > SDR. What you can do instead is to expose a colorop with a multiplier of
> > 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
> > that the input is SDR or HDR to get the expected multiplier. You will
> > never know what the content actually is, anyway.

Right, I didn't mean to suggest that we should tag content as HDR or SDR in the
UAPI. I was only describing the end result in the pipe; ultimately it would be
determined by the multiplier color op.

> >
> > Of course, if we want to have an arbitrary multiplier colorop that is
> > 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-26 Thread Sebastian Wick
On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> Alex Goins  wrote:
> 
> > Thank you Harry and all other contributors for your work on this. Responses
> > inline -
> > 
> > On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> > 
> > > On Fri, 20 Oct 2023 11:23:28 -0400
> > > Harry Wentland  wrote:
> > >   
> > > > On 2023-10-20 10:57, Pekka Paalanen wrote:  
> > > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > > Sebastian Wick  wrote:
> > > > > 
> > > > >> Thanks for continuing to work on this!
> > > > >>
> > > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > > > > >>> v2:
> > > > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > > > >>>  - Updated wording (Pekka)
> > > > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > > > > >>>    section (Pekka)
> > > > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > > > >
> > > > > ...
> > > > >  
> > > > >>> +An example of a drm_colorop object might look like one of these::
> > > > >>> +
> > > > >>> +/* 1D enumerated curve */
> > > > >>> +Color operation 42
> > > > >>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 
> > > > >>> matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > >>> +├─ "BYPASS": bool {true, false}
> > > > >>> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ 
> > > > >>> EOTF, PQ inverse EOTF, …}
> > > > >>> +└─ "NEXT": immutable color operation ID = 43  
> > 
> > I know these are just examples, but I would also like to suggest the possibility
> > of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> > compared to setting an identity in some cases depending on the hardware. See
> > below for more on this, RE: implicit format conversions.
> >
> > Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> > offline discussions that it would nonetheless be helpful to expose enumerated
> > curves in order to hide the vendor-specific complexities of programming
> > segmented LUTs from clients. In that case, we would simply refer to the
> > enumerated curve when calculating/choosing segmented LUT entries.
> 
> That's a good idea.
> 
> > Another thing that came up in offline discussions is that we could use multiple
> > color operations to program a single operation in hardware. As I understand it,
> > AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> > "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> > we could combine them into a singular LUT in software, such that you can combine
> > e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> > precision from the custom LUT where it overlaps with the linear section of the
> > enumerated curve, but that is unavoidable and shouldn't be an issue in most
> > use-cases.
> 
> Indeed.
> 
> > Actually, the current examples in the proposal don't include a multiplier color
> > op, which might be useful. For AMD as above, but also for NVIDIA as the
> > following issue arises:
> >
> > As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> > point to FP16 conversion. In that conversion, what fixed point 0x maps
> > to in floating point varies depending on the source content. If it's SDR
> > content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> > potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> > content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
> > is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> > correct?
> 
> It would be against the UAPI design principles to tag content as HDR or
> SDR. What you can do instead is to expose a colorop with a multiplier of
> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
> that the input is SDR or HDR to get the expected multiplier. You will
> never know what the content actually is, anyway.
> 
> Of course, if we want to have an arbitrary multiplier colorop that is
> somewhat standard, as in, exposed by many drivers to ease userspace
> development, you can certainly use any combination of your hardware
> features you need to realize the UAPI prescribed mathematical operation.
> 
> Since we are talking about floating-point in hardware, a multiplier
> does not significantly affect precision.
> 
> In order to mathematically define all colorops, I believe it is
> necessary to define all colorops in terms of floating-point 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-26 Thread Pekka Paalanen
On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
Alex Goins  wrote:

> Thank you Harry and all other contributors for your work on this. Responses
> inline -
> 
> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> 
> > On Fri, 20 Oct 2023 11:23:28 -0400
> > Harry Wentland  wrote:
> >   
> > > On 2023-10-20 10:57, Pekka Paalanen wrote:  
> > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > Sebastian Wick  wrote:
> > > > 
> > > >> Thanks for continuing to work on this!
> > > >>
> > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > > > >>> v2:
> > > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > > >>>  - Updated wording (Pekka)
> > > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > > > >>>    section (Pekka)
> > > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > > >
> > > > ...
> > > >  
> > > >>> +An example of a drm_colorop object might look like one of these::
> > > >>> +
> > > >>> +/* 1D enumerated curve */
> > > >>> +Color operation 42
> > > >>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 
> > > >>> matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > >>> +├─ "BYPASS": bool {true, false}
> > > >>> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, 
> > > >>> PQ inverse EOTF, …}
> > > >>> +└─ "NEXT": immutable color operation ID = 43  
> 
> I know these are just examples, but I would also like to suggest the possibility
> of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> compared to setting an identity in some cases depending on the hardware. See
> below for more on this, RE: implicit format conversions.
>
> Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> offline discussions that it would nonetheless be helpful to expose enumerated
> curves in order to hide the vendor-specific complexities of programming
> segmented LUTs from clients. In that case, we would simply refer to the
> enumerated curve when calculating/choosing segmented LUT entries.

That's a good idea.

> Another thing that came up in offline discussions is that we could use multiple
> color operations to program a single operation in hardware. As I understand it,
> AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> we could combine them into a singular LUT in software, such that you can combine
> e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> precision from the custom LUT where it overlaps with the linear section of the
> enumerated curve, but that is unavoidable and shouldn't be an issue in most
> use-cases.

Indeed.

> Actually, the current examples in the proposal don't include a multiplier color
> op, which might be useful. For AMD as above, but also for NVIDIA as the
> following issue arises:
>
> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> point to FP16 conversion. In that conversion, what fixed point 0x maps
> to in floating point varies depending on the source content. If it's SDR
> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> correct?

It would be against the UAPI design principles to tag content as HDR or
SDR. What you can do instead is to expose a colorop with a multiplier of
1.0 or 125.0 to match your hardware behaviour, then tell your hardware
that the input is SDR or HDR to get the expected multiplier. You will
never know what the content actually is, anyway.

Of course, if we want to have an arbitrary multiplier colorop that is
somewhat standard, as in, exposed by many drivers to ease userspace
development, you can certainly use any combination of your hardware
features you need to realize the UAPI prescribed mathematical operation.

Since we are talking about floating-point in hardware, a multiplier
does not significantly affect precision.

In order to mathematically define all colorops, I believe it is
necessary to define all colorops in terms of floating-point values (as
in math), even if they operate on fixed-point or integer. By this I
mean that if the input is 8 bpc unsigned integer pixel format for
instance, 0 raw pixel channel value is mapped to 0.0 and 255 is mapped
to 1.0, and the color pipeline starts with [0.0, 1.0], not [0, 255]
domain. We have to 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-25 Thread Alex Goins
Thank you Harry and all other contributors for your work on this. Responses
inline -

On Mon, 23 Oct 2023, Pekka Paalanen wrote:

> On Fri, 20 Oct 2023 11:23:28 -0400
> Harry Wentland  wrote:
> 
> > On 2023-10-20 10:57, Pekka Paalanen wrote:
> > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > Sebastian Wick  wrote:
> > >   
> > >> Thanks for continuing to work on this!
> > >>
> > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:  
> > > >>> v2:
> > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > >>>  - Updated wording (Pekka)
> > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > > >>>    section (Pekka)
> > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > >
> > > ...
> > >
> > >>> +An example of a drm_colorop object might look like one of these::
> > >>> +
> > >>> +/* 1D enumerated curve */
> > >>> +Color operation 42
> > >>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 
> > >>> matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > >>> +├─ "BYPASS": bool {true, false}
> > >>> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, 
> > >>> PQ inverse EOTF, …}
> > >>> +└─ "NEXT": immutable color operation ID = 43

I know these are just examples, but I would also like to suggest the possibility
of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
compared to setting an identity in some cases depending on the hardware. See
below for more on this, RE: implicit format conversions.

Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
offline discussions that it would nonetheless be helpful to expose enumerated
curves in order to hide the vendor-specific complexities of programming
segmented LUTs from clients. In that case, we would simply refer to the
enumerated curve when calculating/choosing segmented LUT entries.

Another thing that came up in offline discussions is that we could use multiple
color operations to program a single operation in hardware. As I understand it,
AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
"HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
we could combine them into a singular LUT in software, such that you can combine
e.g. segmented PQ EOTF with night light. One caveat is that you will lose
precision from the custom LUT where it overlaps with the linear section of the
enumerated curve, but that is unavoidable and shouldn't be an issue in most
use-cases.
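
To make the combining idea concrete, here is a rough Python sketch of folding
an enumerated curve, a custom LUT and a multiplier into one table in software.
It is only an illustration under several assumptions: the sRGB EOTF stands in
for the segmented PQ EOTF, the combined table is uniformly sampled rather than
segmented, and the names srgb_eotf, sample_lut and combine are made up for this
example and do not come from any driver.

```python
def srgb_eotf(v: float) -> float:
    """Piecewise sRGB EOTF for an encoded value in [0.0, 1.0]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def sample_lut(lut: list[float], x: float) -> float:
    """Linearly interpolate a uniformly spaced 1D LUT at x in [0.0, 1.0]."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(lut) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(lut) - 1)
    frac = pos - lo
    return lut[lo] * (1.0 - frac) + lut[hi] * frac


def combine(curve, custom_lut: list[float], multiplier: float, size: int) -> list[float]:
    """Fold curve -> custom LUT -> multiplier into one uniformly sampled LUT."""
    combined = []
    for i in range(size):
        x = i / (size - 1)  # normalized input sample point
        combined.append(sample_lut(custom_lut, curve(x)) * multiplier)
    return combined


# Hypothetical "night light"-style custom LUT: halve every value.
dimmed = [0.5 * (i / 4095) for i in range(4096)]
lut = combine(srgb_eotf, dimmed, 1.0, 1024)
```

The precision caveat shows up here as well: near the linear section of the
curve many input samples land between the same two custom LUT entries, so the
combined table cannot resolve more detail there than the custom LUT holds.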

Actually, the current examples in the proposal don't include a multiplier color
op, which might be useful. For AMD as above, but also for NVIDIA as the
following issue arises:

As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
point to FP16 conversion. In that conversion, what fixed point 0x maps
to in floating point varies depending on the source content. If it's SDR
content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
is that this is also what AMD's "HDR Multiplier" stage is used for, is that
correct?

From the given enumerated curves, it's not clear how they would map to the
above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
value of 125.0? That may work, but it tends towards the "descriptive" notion of
assuming the source content, which may not be accurate in all cases. This is
also an issue for the custom 1D LUT, as the blob will need to be converted to
FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
value be, given that we no longer have any hint as to the source content?

I think a multiplier color op solves all of these issues. Named curves and
custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
be adjusted by the multiplier. For 80 nit SDR content, set it to 1, for 400
nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc. 
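
As a small worked example of the numbers above, the multiplier is just the
intended peak luminance divided by the 80 nits that 1.0 represents. The sketch
below assumes that convention; SDR_REFERENCE_NITS, multiplier_for_peak and
apply_multiplier are names invented for illustration, not part of the proposal.

```python
SDR_REFERENCE_NITS = 80.0  # luminance represented by 1.0, as assumed in the text


def multiplier_for_peak(peak_nits: float) -> float:
    """Multiplier that maps a normalized 1.0 to the given peak luminance."""
    return peak_nits / SDR_REFERENCE_NITS


def apply_multiplier(values: list[float], multiplier: float) -> list[float]:
    """Scale normalized curve or LUT output values by the multiplier."""
    return [v * multiplier for v in values]


sdr = multiplier_for_peak(80.0)          # 80 nit SDR content
bright_sdr = multiplier_for_peak(400.0)  # 400 nit SDR content
hdr_pq = multiplier_for_peak(10000.0)    # HDR PQ content
scaled = apply_multiplier([0.0, 0.5, 1.0], hdr_pq)
```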

> > >>> +
> > >>> +/* custom 4k entry 1D LUT */
> > >>> +Color operation 52
> > >>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 
> > >>> matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> > >>> +├─ "BYPASS": bool {true, false}
> > >>> +├─ "LUT_1D_SIZE": immutable range = 4096
> > >>> +├─ "LUT_1D": blob
> > >>> +└─ "NEXT": immutable color operation ID = 0
> > > 
> > > ...
> > >   
> > >>> +Driver Forward/Backward Compatibility
> > >>> +=
> > >>> +
> > >>> +As this is uAPI drivers can't regress color pipelines that 

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-23 Thread Pekka Paalanen
On Fri, 20 Oct 2023 11:23:28 -0400
Harry Wentland  wrote:

> On 2023-10-20 10:57, Pekka Paalanen wrote:
> > On Fri, 20 Oct 2023 16:22:56 +0200
> > Sebastian Wick  wrote:
> >   
> >> Thanks for continuing to work on this!
> >>
> >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:  
> >>> v2:
> >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> >>>  - Updated wording (Pekka)
> >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >>>    section (Pekka)
> >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> >>>  - Add "Driver Implementer's Guide" section (Pekka)
> >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> >>>  
> > 
> > ...
> >   
> >>> +Driver Forward/Backward Compatibility
> >>> +=====================================
> >>> +
> >>> +As this is uAPI, drivers can't regress color pipelines that have been
> >>> +introduced for a given HW generation. New HW generations are free to
> >>> +abandon color pipelines advertised for previous generations.
> >>> +Nevertheless, it can be beneficial to carry support for existing color
> >>> +pipelines forward as those will likely already have support in DRM
> >>> +clients.
> >>> +
> >>> +Introducing new colorops to a pipeline is fine, as long as they can be
> >>> +disabled or are purely informational. DRM clients implementing support
> >>> +for the pipeline can always skip unknown properties as long as they can
> >>> +be confident that doing so will not cause unexpected results.
> >>> +
> >>> +If a new colorop doesn't fall into one of the above categories
> >>> +(bypassable or informational) the modified pipeline would be unusable
> >>> +for user space. In this case a new pipeline should be defined.
> >>
> >> How can user space detect an informational element? Should we just add a
> >> BYPASS property to informational elements, make it read only and set to
> >> true maybe? Or something more descriptive?  
> > 
> > Read-only BYPASS set to true would be fine by me, I guess.
> >   
> 
> Don't you mean set to false? An informational element will always do
> something, so it can't be bypassed.

Yeah, this is why we need a definition. I understand "informational" to
not change pixel values in any way. Previously I had some weird idea
that scaling doesn't alter color, but of course it may.
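
To illustrate the compatibility rule under discussion, here is a hedged Python
sketch of how a client might decide whether it can use a pipeline containing
colorops it does not know. It assumes my reading above, that "informational"
means the colorop never changes pixel values; the Colorop class, its fields and
pipeline_usable are all hypothetical and not proposed uAPI.

```python
from dataclasses import dataclass


@dataclass
class Colorop:
    type: str
    can_bypass: bool             # a writable BYPASS property is exposed
    informational: bool = False  # hypothetical marker, not defined by the RFC


def pipeline_usable(pipeline: list[Colorop], known_types: set[str]) -> bool:
    """Return True if every colorop is understood, bypassable, or informational."""
    for op in pipeline:
        if op.type in known_types:
            continue  # the client knows how to program this one
        if op.can_bypass or op.informational:
            continue  # unknown, but it can be skipped safely
        return False  # unknown and unavoidable: the pipeline is unusable
    return True


known = {"1D enumerated curve", "1D LUT", "3x4 matrix"}
old_pipeline = [Colorop("1D enumerated curve", True), Colorop("1D LUT", True)]
new_ok = old_pipeline + [Colorop("vendor tone mapper", True)]
new_bad = old_pipeline + [Colorop("fixed YUV to RGB", False)]
```

The new_bad case corresponds to counter-example 1: a colorop that cannot be
bypassed and does alter color values, so an unaware client must reject it.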


> > I think we also need a definition of "informational".
> > 
> > Counter-example 1: a colorop that represents a non-configurable  
> 
> Not sure what's "counter" for these examples?
> 
> > YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> > format. It cannot be set to bypass, it cannot be configured, and it
> > will alter color values.
> > 
> > Counter-example 2: image size scaling colorop. It might not be
> > configurable, it is controlled by the plane CRTC_* and SRC_*
> > properties. You still need to understand what it does, so you can
> > arrange the scaling to work correctly. (Do not want to scale an image
> > with PQ-encoded values as Josh demonstrated in XDC.)
> >   
> 
> IMO the position of the scaling operation is the thing that's important
> here as the color pipeline won't define scaling properties.
> 
> > Counter-example 3: image sampling colorop. Averages FB originated color
> > values to produce a color sample. Again do not want to do this with
> > PQ-encoded values.
> >   
> 
> Wouldn't this only happen during a scaling op?

There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
coordinates can be fractional, which makes nearest vs. bilinear
sampling have a difference even if there is no scaling.

There is also the question of chroma siting with sub-sampled YUV. I
don't know how that actually works, or how it theoretically should work.


Thanks,
pq




Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-20 Thread Harry Wentland



On 2023-10-20 10:57, Pekka Paalanen wrote:
> On Fri, 20 Oct 2023 16:22:56 +0200
> Sebastian Wick  wrote:
> 
>> Thanks for continuing to work on this!
>>
>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
>>> v2:
>>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>>  - Updated wording (Pekka)
>>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>>    section (Pekka)
>>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>>  - Add "Driver Implementer's Guide" section (Pekka)
>>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>>
> 
> ...
> 
>>> +Driver Forward/Backward Compatibility
>>> +=====================================
>>> +
>>> +As this is uAPI, drivers can't regress color pipelines that have been
>>> +introduced for a given HW generation. New HW generations are free to
>>> +abandon color pipelines advertised for previous generations.
>>> +Nevertheless, it can be beneficial to carry support for existing color
>>> +pipelines forward as those will likely already have support in DRM
>>> +clients.
>>> +
>>> +Introducing new colorops to a pipeline is fine, as long as they can be
>>> +disabled or are purely informational. DRM clients implementing support
>>> +for the pipeline can always skip unknown properties as long as they can
>>> +be confident that doing so will not cause unexpected results.
>>> +
>>> +If a new colorop doesn't fall into one of the above categories
>>> +(bypassable or informational) the modified pipeline would be unusable
>>> +for user space. In this case a new pipeline should be defined.  
>>
>> How can user space detect an informational element? Should we just add a
>> BYPASS property to informational elements, make it read only and set to
>> true maybe? Or something more descriptive?
> 
> Read-only BYPASS set to true would be fine by me, I guess.
> 

Don't you mean set to false? An informational element will always do
something, so it can't be bypassed.

> I think we also need a definition of "informational".
> 
> Counter-example 1: a colorop that represents a non-configurable

Not sure what's "counter" for these examples?

> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> format. It cannot be set to bypass, it cannot be configured, and it
> will alter color values.
> 
> Counter-example 2: image size scaling colorop. It might not be
> configurable, it is controlled by the plane CRTC_* and SRC_*
> properties. You still need to understand what it does, so you can
> arrange the scaling to work correctly. (Do not want to scale an image
> with PQ-encoded values as Josh demonstrated in XDC.)
> 

IMO the position of the scaling operation is the thing that's important
here as the color pipeline won't define scaling properties.

> Counter-example 3: image sampling colorop. Averages FB originated color
> values to produce a color sample. Again do not want to do this with
> PQ-encoded values.
> 

Wouldn't this only happen during a scaling op?

Harry

> 
> Thanks,
> pq



Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-20 Thread Pekka Paalanen
On Fri, 20 Oct 2023 16:22:56 +0200
Sebastian Wick  wrote:

> Thanks for continuing to work on this!
> 
> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > v2:
> >  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> >  - Updated wording (Pekka)
> >  - Change BYPASS wording to make it non-mandatory (Sebastian)
> >  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >    section (Pekka)
> >  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> >  - Add "Driver Implementer's Guide" section (Pekka)
> >  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > 

...

> > +Driver Forward/Backward Compatibility
> > +=====================================
> > +
> > +As this is uAPI, drivers can't regress color pipelines that have been
> > +introduced for a given HW generation. New HW generations are free to
> > +abandon color pipelines advertised for previous generations.
> > +Nevertheless, it can be beneficial to carry support for existing color
> > +pipelines forward as those will likely already have support in DRM
> > +clients.
> > +
> > +Introducing new colorops to a pipeline is fine, as long as they can be
> > +disabled or are purely informational. DRM clients implementing support
> > +for the pipeline can always skip unknown properties as long as they can
> > +be confident that doing so will not cause unexpected results.
> > +
> > +If a new colorop doesn't fall into one of the above categories
> > +(bypassable or informational) the modified pipeline would be unusable
> > +for user space. In this case a new pipeline should be defined.  
> 
> How can user space detect an informational element? Should we just add a
> BYPASS property to informational elements, make it read only and set to
> true maybe? Or something more descriptive?

Read-only BYPASS set to true would be fine by me, I guess.

I think we also need a definition of "informational".

Counter-example 1: a colorop that represents a non-configurable
YUV<->RGB conversion. Maybe it determines its operation from FB pixel
format. It cannot be set to bypass, it cannot be configured, and it
will alter color values.

Counter-example 2: image size scaling colorop. It might not be
configurable, it is controlled by the plane CRTC_* and SRC_*
properties. You still need to understand what it does, so you can
arrange the scaling to work correctly. (Do not want to scale an image
with PQ-encoded values as Josh demonstrated in XDC.)

Counter-example 3: image sampling colorop. Averages FB originated color
values to produce a color sample. Again do not want to do this with
PQ-encoded values.
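
A quick numeric sketch of why averaging PQ-encoded values is a problem. It uses
the PQ EOTF constants from SMPTE ST 2084 and averages a black and a peak-white
sample, once on the encoded values and once in linear light. The two-sample
average is a deliberately simplified stand-in for what a scaler or sampler
would do; pq_eotf and the variable names are illustrative only.

```python
# PQ EOTF constants as defined in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(e: float) -> float:
    """Map a PQ-encoded value in [0.0, 1.0] to luminance in nits."""
    p = e ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


a, b = 0.0, 1.0  # PQ-encoded black and peak white

# Averaging the encoded values, then decoding (what we do not want).
avg_encoded_nits = pq_eotf((a + b) / 2)

# Decoding first, then averaging in linear light.
avg_linear_nits = (pq_eotf(a) + pq_eotf(b)) / 2
```

With these inputs the encoded average decodes to roughly 92 nits, while the
linear-light average is 5000 nits, so the position of the sampling step
relative to the EOTF matters a great deal.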


Thanks,
pq




Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-20 Thread Sebastian Wick
Thanks for continuing to work on this!

On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> v2:
>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>  - Updated wording (Pekka)
>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>    section (Pekka)
>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>  - Add "Driver Implementer's Guide" section (Pekka)
>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> 
> Signed-off-by: Harry Wentland 
> Cc: Ville Syrjala 
> Cc: Pekka Paalanen 
> Cc: Simon Ser 
> Cc: Harry Wentland 
> Cc: Melissa Wen 
> Cc: Jonas Ådahl 
> Cc: Sebastian Wick 
> Cc: Shashank Sharma 
> Cc: Alexander Goins 
> Cc: Joshua Ashton 
> Cc: Michel Dänzer 
> Cc: Aleix Pol 
> Cc: Xaver Hugl 
> Cc: Victoria Brekenfeld 
> Cc: Sima 
> Cc: Uma Shankar 
> Cc: Naseer Ahmed 
> Cc: Christopher Braga 
> Cc: Abhinav Kumar 
> Cc: Arthur Grillo 
> Cc: Hector Martin 
> Cc: Liviu Dudau 
> Cc: Sasha McIntosh 
> ---
>  Documentation/gpu/rfc/color_pipeline.rst | 347 +++
>  1 file changed, 347 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> 
> diff --git a/Documentation/gpu/rfc/color_pipeline.rst b/Documentation/gpu/rfc/color_pipeline.rst
> new file mode 100644
> index ..af5f2ea29116
> --- /dev/null
> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> @@ -0,0 +1,347 @@
> +========================
> +Linux Color Pipeline API
> +========================
> +
> +What problem are we solving?
> +============================
> +
> +We would like to support pre- and post-blending complex color
> +transformations in display controller hardware in order to allow for
> +HW-supported HDR use-cases, as well as to provide support to
> +color-managed applications, such as video or image editors.
> +
> +It is possible to support an HDR output on HW supporting the Colorspace
> +and HDR Metadata drm_connector properties, but that requires the
> +compositor or application to render and compose the content into one
> +final buffer intended for display. Doing so is costly.
> +
> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and other
> +operations to support color transformations. These operations are often
> +implemented in fixed-function HW and therefore much more power efficient than
> +performing similar operations via shaders or CPU.
> +
> +We would like to make use of this HW functionality to support complex color
> +transformations with no, or minimal CPU or shader load.
> +
> +
> +How are other OSes solving this problem?
> +========================================
> +
> +The most widely supported use-cases regard HDR content, whether video or
> +gaming.
> +
> +Most OSes will specify the source content format (color gamut, encoding transfer
> +function, and other metadata, such as max and average light levels) to a driver.
> +Drivers will then program their fixed-function HW accordingly to map from a
> +source content buffer's space to a display's space.
> +
> +When fixed-function HW is not available the compositor will assemble a shader to
> +ask the GPU to perform the transformation from the source content format to the
> +display's format.
> +
> +A compositor's mapping function and a driver's mapping function are usually
> +entirely separate concepts. On OSes where a HW vendor has no insight into
> +closed-source compositor code such a vendor will tune their color management
> +code to visually match the compositor's. On other OSes, where both mapping
> +functions are open to an implementer they will ensure both mappings match.
> +
> +This results in mapping algorithm lock-in, meaning that no-one alone can
> +experiment with or introduce new mapping algorithms and achieve
> +consistent results regardless of which implementation path is taken.
> +
> +Why is Linux different?
> +=======================
> +
> +Unlike other OSes, where there is one compositor for one or more drivers, on
> +Linux we have a many-to-many relationship. Many compositors; many drivers.
> +In addition, each compositor vendor or community has their own view of how
> +color management should be done. This is what makes Linux so beautiful.
> +
> +This means that a HW vendor can no longer tune their driver to one
> +compositor, as tuning it to one could make it look fairly different from
> +another compositor's color mapping.
> +
> +We need a better solution.
> +
> +
> +Descriptive API
> +===============
> +
> +An API that describes the source and destination colorspaces is a descriptive
> +API. It describes the input and output color spaces but does not describe
> +how precisely they should be mapped. Such a mapping includes many minute
> +design decisions that can greatly affect the look of the final result.
> +
> +It is not feasible to describe such mapping with enough detail to 

[RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-19 Thread Harry Wentland
v2:
 - Update colorop visualizations to match reality (Sebastian, Alex Hung)
 - Update wording (Pekka)
 - Change BYPASS wording to make it non-mandatory (Sebastian)
 - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
   section (Pekka)
 - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
 - Add "Driver Implementer's Guide" section (Pekka)
 - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)

Signed-off-by: Harry Wentland 
Cc: Ville Syrjala 
Cc: Pekka Paalanen 
Cc: Simon Ser 
Cc: Harry Wentland 
Cc: Melissa Wen 
Cc: Jonas Ådahl 
Cc: Sebastian Wick 
Cc: Shashank Sharma 
Cc: Alexander Goins 
Cc: Joshua Ashton 
Cc: Michel Dänzer 
Cc: Aleix Pol 
Cc: Xaver Hugl 
Cc: Victoria Brekenfeld 
Cc: Sima 
Cc: Uma Shankar 
Cc: Naseer Ahmed 
Cc: Christopher Braga 
Cc: Abhinav Kumar 
Cc: Arthur Grillo 
Cc: Hector Martin 
Cc: Liviu Dudau 
Cc: Sasha McIntosh 
---
 Documentation/gpu/rfc/color_pipeline.rst | 347 +++
 1 file changed, 347 insertions(+)
 create mode 100644 Documentation/gpu/rfc/color_pipeline.rst

diff --git a/Documentation/gpu/rfc/color_pipeline.rst b/Documentation/gpu/rfc/color_pipeline.rst
new file mode 100644
index ..af5f2ea29116
--- /dev/null
+++ b/Documentation/gpu/rfc/color_pipeline.rst
@@ -0,0 +1,347 @@
+========================
+Linux Color Pipeline API
+========================
+
+What problem are we solving?
+============================
+
+We would like to support pre- and post-blending complex color
+transformations in display controller hardware in order to allow for
+HW-supported HDR use-cases, as well as to provide support to
+color-managed applications, such as video or image editors.
+
+It is possible to support an HDR output on HW supporting the Colorspace
+and HDR Metadata drm_connector properties, but that requires the
+compositor or application to render and compose the content into one
+final buffer intended for display. Doing so is costly.
+
+Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and other
+operations to support color transformations. These operations are often
+implemented in fixed-function HW and therefore much more power efficient than
+performing similar operations via shaders or CPU.
+
+We would like to make use of this HW functionality to support complex color
+transformations with no, or minimal, CPU or shader load.
+
+
+How are other OSes solving this problem?
+========================================
+
+The most widely supported use-cases involve HDR content, whether video or
+gaming.
+
+Most OSes will specify the source content format (color gamut, encoding transfer
+function, and other metadata, such as max and average light levels) to a driver.
+Drivers will then program their fixed-function HW accordingly to map from a
+source content buffer's space to a display's space.
+
+When fixed-function HW is not available, the compositor will assemble a shader to
+ask the GPU to perform the transformation from the source content format to the
+display's format.
+
+A compositor's mapping function and a driver's mapping function are usually
+entirely separate concepts. On OSes where a HW vendor has no insight into
+closed-source compositor code, such a vendor will tune their color management
+code to visually match the compositor's. On other OSes, where both mapping
+functions are open to an implementer, they will ensure both mappings match.
+
+This results in mapping algorithm lock-in, meaning that no one alone can
+experiment with or introduce new mapping algorithms and achieve
+consistent results regardless of which implementation path is taken.
+
+Why is Linux different?
+=======================
+
+Unlike other OSes, where there is one compositor for one or more drivers, on
+Linux we have a many-to-many relationship. Many compositors; many drivers.
+In addition, each compositor vendor or community has their own view of how
+color management should be done. This is what makes Linux so beautiful.
+
+This means that a HW vendor can no longer tune their driver to one
+compositor, as tuning it to one could make it look fairly different from
+another compositor's color mapping.
+
+We need a better solution.
+
+
+Descriptive API
+===============
+
+An API that describes the source and destination colorspaces is a descriptive
+API. It describes the input and output color spaces but does not describe
+how precisely they should be mapped. Such a mapping includes many minute
+design decisions that can greatly affect the look of the final result.
+
+It is not feasible to describe such mapping with enough detail to ensure the
+same result from each implementation. In fact, these mappings are a very active
+research area.
+
+
+Prescriptive API
+================
+
+A prescriptive API does not describe the source and destination colorspaces. It
+instead prescribes a recipe for how to manipulate pixel values to arrive at the
+desired outcome.
+
+This recipe is generally an