Re: [RFC] Plane color pipeline KMS uAPI

2023-06-16 Thread Pekka Paalanen
On Thu, 15 Jun 2023 17:44:33 -0400
Christopher Braga  wrote:

> On 6/14/2023 5:00 AM, Pekka Paalanen wrote:
> > On Tue, 13 Jun 2023 12:29:55 -0400
> > Christopher Braga  wrote:
> >   
> >> On 6/13/2023 4:23 AM, Pekka Paalanen wrote:  
> >>> On Mon, 12 Jun 2023 12:56:57 -0400
> >>> Christopher Braga  wrote:
> >>>  
>  On 6/12/2023 5:21 AM, Pekka Paalanen wrote:  
> > On Fri, 9 Jun 2023 19:11:25 -0400
> > Christopher Braga  wrote:
> > 
> >> On 6/9/2023 12:30 PM, Simon Ser wrote:  
> >>> Hi Christopher,
> >>>
> >>> On Friday, June 9th, 2023 at 17:52, Christopher Braga 
> >>>  wrote:
> >>>
> > The new COLOROP objects also expose a number of KMS properties. 
> > Each has a
> > type, a reference to the next COLOROP object in the linked list, 
> > and other
> > type-specific properties. Here is an example for a 1D LUT operation:
> >
> >  Color operation 42
> >  ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >  ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT  
>  The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
>  curves? Will different hardware be allowed to expose a subset of 
>  these
>  enum values?  
> >>>
> >>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum 
> >>> entries.
> >>>
> >  ├─ "lut_size": immutable range = 4096
> >  ├─ "lut_data": blob
> >  └─ "next": immutable color operation ID = 43
> >   
>  Some hardware has per channel 1D LUT values, while others use the 
>  same
>  LUT for all channels.  We will definitely need to expose this in the
>  UAPI in some form.  
> >>>
> >>> Hm, I was assuming per-channel 1D LUTs here, just like the existing 
> >>> GAMMA_LUT/
> >>> DEGAMMA_LUT properties work. If some hardware can't support that, 
> >>> it'll need
> >>> to get exposed as another color operation block.
> >>>
> > To configure this hardware block, user-space can fill a KMS blob 
> > with
> > 4096 u32
> > entries, then set "lut_data" to the blob ID. Other color operation 
> > types
> > might
> > have different properties.
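[As a rough user-space sketch of the blob filling described above — the identity curve and the `fill_identity_lut` helper are illustrative only, not part of the RFC:]

```c
#include <stdint.h>
#include <stddef.h>

#define LUT_SIZE 4096

/*
 * Build an identity 1D LUT as 4096 full-range u32 samples, the layout
 * the RFC example expects in the "lut_data" blob. Entry i encodes
 * i / (LUT_SIZE - 1) scaled across the full u32 range.
 */
static void fill_identity_lut(uint32_t lut[LUT_SIZE])
{
	for (size_t i = 0; i < LUT_SIZE; i++)
		lut[i] = (uint32_t)(((uint64_t)i * 0xFFFFFFFFu) / (LUT_SIZE - 1));
}

/*
 * User space would then wrap the array in a KMS blob, e.g. with
 * drmModeCreatePropertyBlob(fd, lut, sizeof(uint32_t) * LUT_SIZE, &blob_id),
 * and set the colorop's "lut_data" property to blob_id in an atomic commit.
 * The colorop object and its properties are only a proposal at this point.
 */
```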
> >   
>  The bit-depth of the LUT is an important piece of information we 
>  should
>  include by default. Are we assuming that the DRM driver will always
>  reduce the input values to the resolution supported by the pipeline?
>  This could result in differences between the hardware behavior
>  and the shader behavior.
> 
>  Additionally, some pipelines are floating point while others are 
>  fixed.
>  How would user space know if it needs to pack 32 bit integer values 
>  vs
>  32 bit float values?  
> >>>
> >>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use 
> >>> a common
> >>> definition of LUT blob (u16 elements) and it's up to the driver to 
> >>> convert.
> >>>
> >>> Using a very precise format for the uAPI has the nice property of 
> >>> making the
> >>> uAPI much simpler to use. User-space sends high precision data and 
> >>> it's up to
> >>> drivers to map that to whatever the hardware accepts.
> >>>   
> >> Conversion from a larger uint type to a smaller type sounds low effort,
> >> however if a block works in a floating point space things are going to
> >> get messy really quickly. If the block operates in FP16 space and the
> >> interface is 16 bits we are good, but going from 32 bits to FP16 (such
> >> as in the matrix case or 3DLUT) is less than ideal.  
> >
> > Hi Christopher,
> >
> > are you thinking of precision loss, or the overhead of conversion?
> >
> > Conversion from N-bit fixed point to N-bit floating-point is generally
> > lossy, too, and the other direction as well.
> >
> > What exactly would be messy?
> > 
>  Overhead of conversion is the primary concern here. Having to extract
>  and / or calculate the significand + exponent components in the kernel
>  is burdensome and imo a task better suited for user space. This also has
>  to be done every blob set, meaning that if user space is re-using
>  pre-calculated blobs we would be repeating the same conversion
>  operations in kernel space unnecessarily.  
> >>>
> >>> What is burdensome in that calculation? I don't think you would need to
> >>> use any actual floating-point instructions. Logarithm for finding the
> >>> exponent is about finding the highest bit set in an integer and
> >>> everything is conveniently expressed in base-2. Finding significand is
> >>> just masking the integer based on the exponent.

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-15 Thread Christopher Braga




What is burdensome in that calculation? I don't think you would need to
use any actual floating-point instructions. Logarithm for finding the
exponent is about finding the highest bit set in an integer and
everything is conveniently expressed in base-2. Finding significand is
just masking the integer based on the exponent.


What is burdensome in that calculation? I don't think you would need to
use any actual floating-point instructions. Logarithm for finding the
exponent is about finding the highest bit set in an integer and
everything is conveniently expressed in base-2. Finding significand is
just masking the integer based on the exponent.
   

Oh it definitely can be done, but I think this is just a difference of
opinion at this point. At the end of the day we will do it if we have
to, but it is just more optimal if a more agreeable common type is used.


Can you not cache the converted data, keyed by the DRM blob unique
identity vs. the KMS property it is attached to?

If the userspace compositor has N common transforms (ex: standard P3 ->
sRGB matrix), they would likely have N unique blobs. Obviously from the
kernel end we wouldn't want to cache the transform of every blob passed
down through the UAPI.


Hi Christopher,

as long as the blob exists, why not?


Generally because this is an unbounded amount of blobs. I'm not 100% 
sure what the typical behavior is upstream, but in our driver we have 
scenarios where we can have per-frame blob updates (unique per-frame blobs).


Speaking of per-frame blob updates, there is one concern I neglected to 
bring up. Internally we have seen scenarios where frequent blob 
allocation can lead to memory allocation delays of two frames or higher. 
This 

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-14 Thread Pekka Paalanen
On Tue, 13 Jun 2023 12:29:55 -0400
Christopher Braga  wrote:

> On 6/13/2023 4:23 AM, Pekka Paalanen wrote:
> > On Mon, 12 Jun 2023 12:56:57 -0400
> > Christopher Braga  wrote:
> >   
> >> On 6/12/2023 5:21 AM, Pekka Paalanen wrote:  
> >>> On Fri, 9 Jun 2023 19:11:25 -0400
> >>> Christopher Braga  wrote:
> >>>  
>  On 6/9/2023 12:30 PM, Simon Ser wrote:  
> > Hi Christopher,
> >
> > On Friday, June 9th, 2023 at 17:52, Christopher Braga 
> >  wrote:
> > 
> >>> The new COLOROP objects also expose a number of KMS properties. Each 
> >>> has a
> >>> type, a reference to the next COLOROP object in the linked list, and 
> >>> other
> >>> type-specific properties. Here is an example for a 1D LUT operation:
> >>>
> >>> Color operation 42
> >>> ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >>> ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = 
> >>> LUT  
> >> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> >> curves? Will different hardware be allowed to expose a subset of these
> >> enum values?  
> >
> > Yes. Only hardcoded LUTs supported by the HW are exposed as enum 
> > entries.
> > 
> >>> ├─ "lut_size": immutable range = 4096
> >>> ├─ "lut_data": blob
> >>> └─ "next": immutable color operation ID = 43
> >>>
> >> Some hardware has per channel 1D LUT values, while others use the same
> >> LUT for all channels.  We will definitely need to expose this in the
> >> UAPI in some form.  
> >
> > Hm, I was assuming per-channel 1D LUTs here, just like the existing 
> > GAMMA_LUT/
> > DEGAMMA_LUT properties work. If some hardware can't support that, it'll 
> > need
> > to get exposed as another color operation block.
> > 
> >>> To configure this hardware block, user-space can fill a KMS blob with
> >>> 4096 u32
> >>> entries, then set "lut_data" to the blob ID. Other color operation 
> >>> types
> >>> might
> >>> have different properties.
> >>>
> >> The bit-depth of the LUT is an important piece of information we should
> >> include by default. Are we assuming that the DRM driver will always
> >> reduce the input values to the resolution supported by the pipeline?
> >> This could result in differences between the hardware behavior
> >> and the shader behavior.
> >>
> >> Additionally, some pipelines are floating point while others are fixed.
> >> How would user space know if it needs to pack 32 bit integer values vs
> >> 32 bit float values?  
> >
> > Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a 
> > common
> > definition of LUT blob (u16 elements) and it's up to the driver to 
> > convert.
> >
> > Using a very precise format for the uAPI has the nice property of 
> > making the
> > uAPI much simpler to use. User-space sends high precision data and it's 
> > up to
> > drivers to map that to whatever the hardware accepts.
> >
>  Conversion from a larger uint type to a smaller type sounds low effort,
>  however if a block works in a floating point space things are going to
>  get messy really quickly. If the block operates in FP16 space and the
>  interface is 16 bits we are good, but going from 32 bits to FP16 (such
>  as in the matrix case or 3DLUT) is less than ideal.  
> >>>
> >>> Hi Christopher,
> >>>
> >>> are you thinking of precision loss, or the overhead of conversion?
> >>>
> >>> Conversion from N-bit fixed point to N-bit floating-point is generally
> >>> lossy, too, and the other direction as well.
> >>>
> >>> What exactly would be messy?
> >>>  
> >> Overheard of conversion is the primary concern here. Having to extract
> >> and / or calculate the significand + exponent components in the kernel
> >> is burdensome and imo a task better suited for user space. This also has
> >> to be done every blob set, meaning that if user space is re-using
> >> pre-calculated blobs we would be repeating the same conversion
> >> operations in kernel space unnecessarily.  
> > 
> > What is burdensome in that calculation? I don't think you would need to
> > use any actual floating-point instructions. Logarithm for finding the
> > exponent is about finding the highest bit set in an integer and
> > everything is conveniently expressed in base-2. Finding significand is
> > just masking the integer based on the exponent.
> >   
> Oh it definitely can be done, but I think this is just a difference of 
> opinion at this point. At the end of the day we will do it if we have 
> to, but it is just more optimal if a more agreeable common type is used.
> 
> > Can you not cache the converted data, keyed by the DRM blob unique
> > identity vs. the KMS property it is attached to?  
> If the userspace 

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-13 Thread Christopher Braga




On 6/13/2023 4:23 AM, Pekka Paalanen wrote:

What is burdensome in that calculation? I don't think you would need to
use any actual floating-point instructions. Logarithm for finding the
exponent is about finding the highest bit set in an integer and
everything is conveniently expressed in base-2. Finding significand is
just masking the integer based on the exponent.

Oh it definitely can be done, but I think this is just a difference of 
opinion at this point. At the end of the day we will do it if we have 
to, but it is just more optimal if a more agreeable common type is used.



Can you not cache the converted data, keyed by the DRM blob unique
identity vs. the KMS property it is attached to?
If the userspace compositor has N common transforms (ex: standard P3 -> 
sRGB matrix), they would likely have N unique blobs. Obviously from the 
kernel end we wouldn't want to cache the transform of every blob passed 
down through the UAPI.




You can assume that userspace will not be re-creating DRM blobs without
a reason to believe the contents have changed. If the same blob is set
on the same property repeatedly, I would definitely not expect a driver
to convert the data again.
If the blob ID is unchanged there is no issue, since caching the last
result is already common. As you say, blobs are immutable, so no update
is needed. I'd question why the compositor keeps trying to send down the
same blob ID, though.


If a driver does that, it seems like it
should be easy to avoid, though I'm no kernel dev. Even if the
conversion was just a memcpy, I would still posit it needs to be
avoided when the data has obviously not changed. Blobs are immutable.

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-13 Thread Pekka Paalanen
On Mon, 12 Jun 2023 12:56:57 -0400
Christopher Braga  wrote:
> Overhead of conversion is the primary concern here. Having to extract 
> and / or calculate the significand + exponent components in the kernel 
> is burdensome and imo a task better suited for user space. This also has 
> to be done every blob set, meaning that if user space is re-using 
> pre-calculated blobs we would be repeating the same conversion 
> operations in kernel space unnecessarily.

What is burdensome in that calculation? I don't think you would need to
use any actual floating-point instructions. Logarithm for finding the
exponent is about finding the highest bit set in an integer and
everything is conveniently expressed in base-2. Finding significand is
just masking the integer based on the exponent.
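[For concreteness, the integer-only conversion being described might look roughly like this. It is a simplified sketch: it truncates instead of rounding to nearest, flushes FP16 subnormals to zero, and the U0.32 input format and `u032_to_fp16` name are assumptions for illustration. `__builtin_clz` is the GCC/Clang count-leading-zeros intrinsic.]

```c
#include <stdint.h>

/*
 * Integer-only U0.32 fixed point -> IEEE-754 binary16: the highest set
 * bit gives the exponent, masking below it gives the significand.
 * Input x represents the value x / 2^32, i.e. the range [0, 1).
 */
static uint16_t u032_to_fp16(uint32_t x)
{
	int p, e;
	uint16_t exp_field, mantissa;

	if (x == 0)
		return 0;

	p = 31 - __builtin_clz(x);	/* index of highest set bit  */
	e = p - 32;			/* x / 2^32 = 1.m * 2^e      */

	if (e < -14)			/* below FP16 normal range   */
		return 0;		/* flush to zero for brevity */

	exp_field = (uint16_t)(e + 15);			 /* biased exponent   */
	mantissa  = (uint16_t)((x >> (p - 10)) & 0x3FF); /* top 10 frac bits  */
	return (uint16_t)((exp_field << 10) | mantissa);
}
```

A real conversion would add round-to-nearest-even and subnormal handling, but the core is exactly the shift-and-mask described above.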

Can you not cache the converted data, keyed by the DRM blob unique
identity vs. the KMS property it is attached to?

You can assume that userspace will not be re-creating DRM blobs without
a reason to believe the contents have changed. If the same blob is set
on the same property repeatedly, I would definitely not expect a driver
to convert the data again. If a driver does that, it seems like it
should be easy to avoid, though I'm no kernel dev. Even if the
conversion was just a memcpy, I would still posit it needs to be
avoided when the data has obviously not changed. Blobs are immutable.

Userspace having to use hardware-specific number formats would probably
not be well received.
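[A driver-side cache along these lines could be as simple as keying on the blob ID. All names below are hypothetical, not taken from any existing driver, and the "conversion" is a stand-in.]

```c
#include <stdint.h>

#define LUT_ENTRIES 4096

/*
 * Cache of the last converted LUT, keyed by DRM blob ID. Blobs are
 * immutable, so an unchanged ID means the conversion can be skipped.
 */
struct colorop_lut_cache {
	uint32_t blob_id;		/* 0 = nothing cached            */
	uint16_t hw_lut[LUT_ENTRIES];	/* data in the hardware's format */
	unsigned conversions;		/* for demonstration only        */
};

/* stand-in conversion from the uAPI's u32 samples to a hardware u16 */
static void convert_to_hw(const uint32_t *in, uint16_t *out)
{
	for (int i = 0; i < LUT_ENTRIES; i++)
		out[i] = (uint16_t)(in[i] >> 16);
}

static const uint16_t *lut_cache_get(struct colorop_lut_cache *c,
				     uint32_t blob_id, const uint32_t *data)
{
	if (c->blob_id != blob_id) {	/* new blob: convert once */
		convert_to_hw(data, c->hw_lut);
		c->blob_id = blob_id;
		c->conversions++;
	}
	return c->hw_lut;
}
```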

> I agree normalization of the value causing precision loss and rounding 
> we can't avoid.

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-12 Thread Christopher Braga




On 6/12/2023 5:21 AM, Pekka Paalanen wrote:

Hi Christopher,

are you thinking of precision loss, or the overhead of conversion?

Conversion from N-bit fixed point to N-bit floating-point is generally
lossy, too, and the other direction as well.

What exactly would be messy?

Overhead of conversion is the primary concern here. Having to extract 
and / or calculate the significand + exponent components in the kernel 
is burdensome and imo a task better suited for user space. This also has 
to be done every blob set, meaning that if user space is re-using 
pre-calculated blobs we would be repeating the same conversion 
operations in kernel space unnecessarily.


I agree that normalization of the value causing precision loss and 
rounding is something we can't avoid.


We should also consider the fact that float pipelines have been known to 
use the scRGB definition for floating point values 
(https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt). 
In cases like this, where there may be an expected value range in the 
pipeline, how to normalize a larger input becomes a little confusing. 
E.g. does U32 MAX become FP16 MAX or value MAX (i.e. 127)?





Exposing the actual hardware precision is something we've talked about during
the hackfest. It'll probably be useful to some extent, but will require some
discussion to figure out how to design the uAPI. Maybe a simple property is
enough, maybe not (e.g. fully describing the precision of segmented LUTs would
probably be trickier).

I'd rather keep things simple for the first pass, we can always add more
properties for bit depth etc later on.
   

Indicating if a block operates on / with fixed vs float values is
significant enough that I think we should account for this in initial
design. It will have an effect on both the user space value packing +
expected value ranges in the hardware.


What do you mean by "value packing"? Memory layout of the bits forming
a value? Or possible exact values of a specific type?
Both really. If the kernel is provided a U32 value, we need to know if 
this is a U32 value, or a float packed into a U32 container. Likewise as 
mentioned with the scRGB above, float could even adjust the value range 
expectations.



I don't think fixed vs. float is the most important thing. Even fixed
point formats can have different numbers of bits for whole numbers, 
which changes the usable value range and not only precision.

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-12 Thread Pekka Paalanen
On Fri, 9 Jun 2023 19:11:25 -0400
Christopher Braga  wrote:
> Indicating if a block operates on / with fixed vs float values is 
> significant enough that I think we should account for this in the initial 
> design. It will have an effect on both the user-space value packing and 
> the expected value ranges in the hardware.

What do you mean by "value packing"? Memory layout of the bits forming
a value? Or possible exact values of a specific type?

I don't think fixed vs. float is the most important thing. Even fixed
point formats can have different numbers of bits for whole numbers,
which changes the usable value range and not only precision. Userspace
at the very least needs to know the usable value range for the block's
inputs, outputs, and parameters.

When defining the precision for inputs, outputs and parameters, then
fixed- vs. floating-point becomes meaningful in explaining what "N bits
of precision" means.

Then there is the question of variable precision that depends on the
actual block input and parameter values, how to represent that. Worst
case precision might be too pessimistic alone.

> >>> Here is another example with a 3D LUT:
> >>>
> >>>   Color operation 42
> >>>   ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >>>   ├─ "lut_size": immutable range = 33
> >>>   ├─ "lut_data": blob
> >>>   └─ "next": immutable color operation ID = 43
> >>>  
> >> We are going to 

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-09 Thread Christopher Braga




On 6/9/2023 12:30 PM, Simon Ser wrote:

Hi Christopher,

On Friday, June 9th, 2023 at 17:52, Christopher Braga  
wrote:


The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

  Color operation 42
  ├─ "type": enum {Bypass, 1D curve} = 1D curve
  ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT

The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
curves? Will different hardware be allowed to expose a subset of these
enum values?


Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.


  ├─ "lut_size": immutable range = 4096
  ├─ "lut_data": blob
  └─ "next": immutable color operation ID = 43


Some hardware has per channel 1D LUT values, while others use the same
LUT for all channels.  We will definitely need to expose this in the
UAPI in some form.


Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
to get exposed as another color operation block.


To configure this hardware block, user-space can fill a KMS blob with 4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types might
have different properties.


The bit-depth of the LUT is an important piece of information we should
include by default. Are we assuming that the DRM driver will always
reduce the input values to the resolution supported by the pipeline?
This could result in differences between the hardware behavior
and the shader behavior.

Additionally, some pipelines are floating point while others are fixed.
How would user space know if it needs to pack 32 bit integer values vs
32 bit float values?


Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
definition of LUT blob (u16 elements) and it's up to the driver to convert.

Using a very precise format for the uAPI has the nice property of making the
uAPI much simpler to use. User-space sends high precision data and it's up to
drivers to map that to whatever the hardware accepts.

Conversion from a larger uint type to a smaller type sounds low-effort; 
however, if a block works in a floating-point space, things are going to 
get messy really quickly. If the block operates in FP16 space and the 
interface is 16 bits we are good, but going from 32 bits to FP16 (such 
as in the matrix case or 3D LUT) is less than ideal.



Exposing the actual hardware precision is something we've talked about during
the hackfest. It'll probably be useful to some extent, but will require some
discussion to figure out how to design the uAPI. Maybe a simple property is
enough, maybe not (e.g. fully describing the precision of segmented LUTs would
probably be trickier).

I'd rather keep things simple for the first pass, we can always add more
properties for bit depth etc later on.

Indicating if a block operates on / with fixed vs float values is 
significant enough that I think we should account for this in the initial 
design. It will have an effect on both the user-space value packing and 
the expected value ranges in the hardware.



Here is another example with a 3D LUT:

  Color operation 42
  ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
  ├─ "lut_size": immutable range = 33
  ├─ "lut_data": blob
  └─ "next": immutable color operation ID = 43


We are going to need to expose the packing order here to avoid any
programming uncertainty. I don't think we can safely assume all hardware
is equivalent.


The driver can easily change the layout of the matrix and do any conversion
necessary when programming the hardware. We do need to document what layout is
used in the uAPI for sure.


And one last example with a matrix:

  Color operation 42
  ├─ "type": enum {Bypass, Matrix} = Matrix
  ├─ "matrix_data": blob
  └─ "next": immutable color operation ID = 43


It is unclear to me what the default sizing of this matrix is. Any
objections to exposing these details with an additional property?


The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
that wouldn't be enough?


Larger cases do exist, but as you mention those can be resolved with a 
different type. I don't have any issues with the default 'Matrix' 
type being 9 entries.





Dithering logic exists in some pipelines. I think we need a plan to
expose that here as well.


Hm, I'm not too familiar with dithering. Do you think it would make sense to
expose as an additional colorop block? Do you think it would have more
consequences on the design?

I want to reiterate that we don't need to ship all features from day 1. We
just need to come up with a uAPI design on which new features can be built.



Agreed. I don't think this will affect the proposed design so this can 
be figured out once we have a DRM driver impl that 

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-09 Thread Simon Ser
Hi Christopher,

On Friday, June 9th, 2023 at 17:52, Christopher Braga  
wrote:

> > The new COLOROP objects also expose a number of KMS properties. Each has a
> > type, a reference to the next COLOROP object in the linked list, and other
> > type-specific properties. Here is an example for a 1D LUT operation:
> >
> >  Color operation 42
> >  ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >  ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> curves? Will different hardware be allowed to expose a subset of these
> enum values?

Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.

> >  ├─ "lut_size": immutable range = 4096
> >  ├─ "lut_data": blob
> >  └─ "next": immutable color operation ID = 43
> >
> Some hardware has per channel 1D LUT values, while others use the same
> LUT for all channels.  We will definitely need to expose this in the
> UAPI in some form.

Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
to get exposed as another color operation block.

> > To configure this hardware block, user-space can fill a KMS blob with
> > 4096 u32
> > entries, then set "lut_data" to the blob ID. Other color operation types
> > might
> > have different properties.
> >
> The bit-depth of the LUT is an important piece of information we should
> include by default. Are we assuming that the DRM driver will always
> reduce the input values to the resolution supported by the pipeline?
> This could result in differences between the hardware behavior
> and the shader behavior.
> 
> Additionally, some pipelines are floating point while others are fixed.
> How would user space know if it needs to pack 32 bit integer values vs
> 32 bit float values?

Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
definition of LUT blob (u16 elements) and it's up to the driver to convert.

Using a very precise format for the uAPI has the nice property of making the
uAPI much simpler to use. User-space sends high precision data and it's up to
drivers to map that to whatever the hardware accepts.

Exposing the actual hardware precision is something we've talked about during
the hackfest. It'll probably be useful to some extent, but will require some
discussion to figure out how to design the uAPI. Maybe a simple property is
enough, maybe not (e.g. fully describing the precision of segmented LUTs would
probably be trickier).

I'd rather keep things simple for the first pass, we can always add more
properties for bit depth etc later on.

> > Here is another example with a 3D LUT:
> >
> >  Color operation 42
> >  ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >  ├─ "lut_size": immutable range = 33
> >  ├─ "lut_data": blob
> >  └─ "next": immutable color operation ID = 43
> >
> We are going to need to expose the packing order here to avoid any
> programming uncertainty. I don't think we can safely assume all hardware
> is equivalent.

The driver can easily change the layout of the matrix and do any conversion
necessary when programming the hardware. We do need to document what layout is
used in the uAPI for sure.

> > And one last example with a matrix:
> >
> >  Color operation 42
> >  ├─ "type": enum {Bypass, Matrix} = Matrix
> >  ├─ "matrix_data": blob
> >  └─ "next": immutable color operation ID = 43
> >
> It is unclear to me what the default sizing of this matrix is. Any
> objections to exposing these details with an additional property?

The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
that wouldn't be enough?

> Dithering logic exists in some pipelines. I think we need a plan to
> expose that here as well.

Hm, I'm not too familiar with dithering. Do you think it would make sense to
expose as an additional colorop block? Do you think it would have more
consequences on the design?

I want to reiterate that we don't need to ship all features from day 1. We
just need to come up with a uAPI design on which new features can be built.

> > [Simon note: an alternative would be to split the color pipeline into
> > two, by
> > having two plane properties ("color_pipeline_pre_scale" and
> > "color_pipeline_post_scale") instead of a single one. This would be
> > similar to
> > the way we want to split pre-blending and post-blending. This could be less
> > expressive for drivers, there may be hardware where there are dependencies
> > between the pre- and post-scaling pipeline?]
> >
> As others have noted, breaking up the pipeline with immutable blocks
> makes the most sense to me here. This way we don't have to predict ahead
> of time every type of block that maybe affected by pipeline ordering.
> Splitting the pipeline into two properties now means future
> logical splits would require introduction of further plane 

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-09 Thread Christopher Braga

Hi all,

The goal of this RFC is to expose a generic KMS uAPI to configure the color
pipeline before blending, i.e. after a pixel is tapped from a plane's
framebuffer and before it's blended with other planes. With this new uAPI we
aim to reduce the battery life impact of color management and HDR on mobile
devices, to improve performance and to decrease latency by skipping
composition on the 3D engine. This proposal is the result of discussions at
the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
familiar with the AMD, Intel and NVIDIA hardware have participated in the
discussion.

This proposal takes a prescriptive approach instead of a descriptive 
approach.

Drivers describe the available hardware blocks in terms of low-level
mathematical operations, then user-space configures each block. We decided
against a descriptive approach where user-space would provide a high-level
description of the colorspace and other parameters: we want to give more
control and flexibility to user-space, e.g. to be able to replicate exactly the
color pipeline with shaders and switch between shaders and KMS pipelines
seamlessly, and to avoid forcing user-space into a particular color management
policy.


Thanks for posting this Simon! This overview does a great job of
breaking down the proposal. A few questions inline below.


We've decided against mirroring the existing CRTC properties
DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
pipeline can significantly differ between vendors and this approach cannot
accurately abstract all hardware. In particular, the availability, ordering,
and capabilities of hardware blocks differ on each display engine. So we've
decided to go for a highly detailed hardware capability discovery.

This new uAPI should not be in conflict with existing standard KMS properties,
since there are none which control the pre-blending color pipeline at the
moment. It does conflict with any vendor-specific properties like
NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
properties. Drivers will need to either reject atomic commits configuring both
uAPIs, or alternatively we could add a DRM client cap which hides the vendor
properties and shows the new generic properties when enabled.

To use this uAPI, first user-space needs to discover hardware capabilities via
KMS objects and properties, then user-space can configure the hardware via an
atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.

Our proposal introduces a new "color_pipeline" plane property, and a new KMS
object type, "COLOROP" (short for color operation). The "color_pipeline" plane
property is an enum; each enum entry represents a color pipeline supported by
the hardware. The special zero entry indicates that the pipeline is in
"bypass"/"no-op" mode. For instance, the following plane properties describe a
primary plane with 2 supported pipelines but currently configured in bypass
mode:

     Plane 10
     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
     ├─ …
     └─ "color_pipeline": enum {0, 42, 52} = 0

The non-zero entries describe color pipelines as a linked list of COLOROP KMS
objects. The entry value is an object ID pointing to the head of the linked
list (the first operation in the color pipeline).

The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

     Color operation 42
     ├─ "type": enum {Bypass, 1D curve} = 1D curve
     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT

The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
curves? Will different hardware be allowed to expose a subset of these 
enum values?



     ├─ "lut_size": immutable range = 4096
     ├─ "lut_data": blob
     └─ "next": immutable color operation ID = 43

Some hardware has per channel 1D LUT values, while others use the same 
LUT for all channels.  We will definitely need to expose this in the 
UAPI in some form.


To configure this hardware block, user-space can fill a KMS blob with 4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types might
have different properties.


The bit-depth of the LUT is an important piece of information we should
include by default. Are we assuming that the DRM driver will always
reduce the input values to the resolution supported by the pipeline?
This could result in differences between the hardware behavior
and the shader behavior.

Additionally, some pipelines are floating point while others are fixed.
How would user space know if it needs to pack 32-bit integer values vs
32-bit float values?


Here is another example with a 3D LUT:

     Color operation 42
     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
     ├─ "lut_size": immutable range = 33
     ├─ "lut_data": 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-12 Thread Pekka Paalanen
On Thu, 11 May 2023 19:29:27 +
Simon Ser  wrote:

> On Thursday, May 11th, 2023 at 18:56, Joshua Ashton  wrote:
> 
> > When we are talking about being 'prescriptive' in the API, are we
> > outright saying we don't want to support arbitrary 3D LUTs, or are we
> > just offering certain algorithms to be 'executed' for a plane/crtc/etc
> > in the atomic API? I am confused...  
> 
> From a kernel PoV:
> 
> - Prescriptive = here are the available hardware blocks, feel free to
>   configure each as you like
> - Descriptive = give me the source and destination color-spaces and I
>   take care of everything
> 
> This proposal is a prescriptive API. We haven't explored _that_ much
> what a descriptive API would look like; it could probably include some way
> to do Night Light and similar features, but it's not clear how high-level
> they'd be. A descriptive API is inherently more restrictive than
> a prescriptive API.

Right. Just like Jonas said, an arbitrary 3D LUT is a well-defined
mathematical operation with no semantics at all, therefore it is a
prescriptive element. A 3D LUT does not fit well in a descriptive API
design, one would need to jump through lots of hoops to turn it into
something descriptive'ish (like ICC does).

I think Joshua mixed up the definitions of "descriptive" and
"prescriptive".

If Gamescope was using a descriptive KMS UAPI, then it would have very
little or no say in what color operations are done and how.

If Gamescope is using prescriptive KMS UAPI, then Gamescope has to know
exactly what it wants to do, how it wants to achieve that, and map that
to the available mathematical processing blocks.

A descriptive UAPI would mean all color policy is in the kernel. A
prescriptive UAPI means all policy is in userspace.

Wayland uses the opposite design principle of KMS UAPI. Wayland is
descriptive, KMS is prescriptive. This puts the color policy into a
Wayland compositor. If we have a library converting descriptive to
prescriptive, then that library contains a policy.

Going from descriptive to prescriptive is easy, just add policy. Going
from prescriptive to descriptive is practically impossible, because
you'd have to "subtract" any policy that has already been applied, in
order to understand what the starting point was.

Coming back to KMS, the color transformations must be prescriptive, but
then we also need to be able to send descriptive information to video
sinks so that video sinks understand what our pixel values mean.


Thanks,
pq




Re: [RFC] Plane color pipeline KMS uAPI

2023-05-11 Thread Simon Ser
On Friday, May 5th, 2023 at 15:30, Joshua Ashton  wrote:

> > > AMD would expose the following objects and properties:
> > >
> > > Plane 10
> > > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > > └─ "color_pipeline": enum {0, 42} = 0
> > > Color operation 42 (input CSC)
> > > ├─ "type": enum {Bypass, Matrix} = Matrix
> > > ├─ "matrix_data": blob
> > > └─ "next": immutable color operation ID = 43
> > > Color operation 43
> > > ├─ "type": enum {Scaling} = Scaling
> > > └─ "next": immutable color operation ID = 44
> > > Color operation 44 (DeGamma)
> > > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > > ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > > └─ "next": immutable color operation ID = 45
> 
> Some vendors have per-tap degamma and some have a degamma after the sample.
> How do we distinguish that behaviour?
> It is important to know.

Can you elaborate? What is "per-tap" and "sample"? Is the "Scaling" color
operation above not enough to indicate where in the pipeline the hw performs
scaling?

> > > Color operation 45 (gamut remap)
> > > ├─ "type": enum {Bypass, Matrix} = Matrix
> > > ├─ "matrix_data": blob
> > > └─ "next": immutable color operation ID = 46
> > > Color operation 46 (shaper LUT RAM)
> > > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > > ├─ "1d_curve_type": enum {LUT} = LUT
> > > ├─ "lut_size": immutable range = 4096
> > > ├─ "lut_data": blob
> > > └─ "next": immutable color operation ID = 47
> > > Color operation 47 (3D LUT RAM)
> > > ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > > ├─ "lut_size": immutable range = 17
> > > ├─ "lut_data": blob
> > > └─ "next": immutable color operation ID = 48
> > > Color operation 48 (blend gamma)
> > > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > > ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> > > ├─ "lut_size": immutable range = 4096
> > > ├─ "lut_data": blob
> > > └─ "next": immutable color operation ID = 0
> > >
> > > To configure the pipeline for an HDR10 PQ plane (path at the top) and a 
> > > HDR
> > > display, gamescope would perform an atomic commit with the following 
> > > property
> > > values:
> > >
> > > Plane 10
> > > └─ "color_pipeline" = 42
> > > Color operation 42 (input CSC)
> > > └─ "matrix_data" = PQ → scRGB (TF)
> 
> ^
> Not sure what this is.
> We don't use an input CSC before degamma.
> 
> > > Color operation 44 (DeGamma)
> > > └─ "type" = Bypass
> 
> ^
> If we did PQ, this would be PQ -> Linear / 80
> If this was sRGB, it'd be sRGB -> Linear
> If this was scRGB this would be just treating it as it is. So... Linear / 80.
> 
> > > Color operation 45 (gamut remap)
> > > └─ "matrix_data" = scRGB (TF) → PQ
> 
> ^
> This is wrong, we just use this to do scRGB primaries (709) to 2020.
> 
> We then go from scRGB -> PQ to go into our shaper + 3D LUT.
> 
> > > Color operation 46 (shaper LUT RAM)
> > > └─ "lut_data" = PQ → Display native
> 
> ^
> "Display native" is just the response curve of the display.
> In HDR10, this would just be PQ -> PQ
> If we were doing HDR10 on SDR, this would be PQ -> Gamma 2.2 (mapped
> from 0 to display native luminance) [with a potential bit of headroom
> for tonemapping in the 3D LUT]
> For SDR on HDR10 this would be Gamma 2.2 -> PQ (Not intending to start
> an sRGB vs G2.2 argument here! :P)
> 
> > > Color operation 47 (3D LUT RAM)
> > > └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > > Color operation 48 (blend gamma)
> > > └─ "1d_curve_type" = PQ
> 
> ^
> This is wrong, this should be Display Native -> Linearized Display Referred

In the HDR case, isn't this the inverse of PQ?

> > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > electrical values is certainly surprising, so the example here is a
> > bit odd, but I don't think that hurts the intention of demonstration.
> 
> I have done some corrections inline.
> 
> You can see our fully correct color pipeline here:
> https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
> 
> Please let me know if you have any more questions about our color pipeline.

As expected, I got the gamescope part wrong. I'm pretty confident that the
proposed API would still work since the AMD vendor-specific props would just
be exposed as color operation objects. Can you confirm we can make the
gamescope pipeline work with the AMD color pipeline outlined above?


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-11 Thread Simon Ser
On Thursday, May 11th, 2023 at 18:56, Joshua Ashton  wrote:

> When we are talking about being 'prescriptive' in the API, are we
> outright saying we don't want to support arbitrary 3D LUTs, or are we
> just offering certain algorithms to be 'executed' for a plane/crtc/etc
> in the atomic API? I am confused...

From a kernel PoV:

- Prescriptive = here are the available hardware blocks, feel free to
  configure each as you like
- Descriptive = give me the source and destination color-spaces and I
  take care of everything

This proposal is a prescriptive API. We haven't explored _that_ much
what a descriptive API would look like; it could probably include some way
to do Night Light and similar features, but it's not clear how high-level
they'd be. A descriptive API is inherently more restrictive than
a prescriptive API.


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-11 Thread Jonas Ådahl
On Thu, May 11, 2023 at 04:56:47PM +, Joshua Ashton wrote:
> When we are talking about being 'prescriptive' in the API, are we
> outright saying we don't want to support arbitrary 3D LUTs, or are we
> just offering certain algorithms to be 'executed' for a plane/crtc/etc
> in the atomic API? I am confused...

The 'prescriptive' idea that the RFC of this thread proposes *is* a way
to support arbitrary 3D LUTs (and other mathematical operations),
arbitrarily, in a somewhat vendored way, only that it will not be vendor
prefixed hard coded properties with specific positions in the pipeline,
but instead more or less an introspectable pipeline, describing what
kind of LUT's, Matrix multiplication (and in what order) etc a hardware
can do.

The theoretical userspace library would be the one turning descriptive
"please turn this into that" requests into the "prescriptive" color
pipeline operations. It would target general-purpose compositors, but it
wouldn't be mandatory. Doing vendor-specific implementations in gamescope
would be possible; it wouldn't look like the version that exists somewhere
now that uses a bunch of AMD_* properties, it'd look more like the
example Simon had in the initial RFC.


Jonas

> 
> There is so much stuff to do with color, that I don't think a
> prescriptive API in the kernel could ever keep up with the things that
> we want to be pushing from Gamescope/SteamOS. For example, we have so
> many things going on, night mode, SDR gamut widening, HDR/SDR gain,
> the ability to apply 'looks' for eg. invert luma or for retro looks,
> enhanced contrast, tonemapping, inverse tonemapping... We also are
> going to be doing a bunch of stuff with EETFs for handling out of
> range HDR content for scanout.
> 
> Some of what we do is kinda standard, regular "there is a paper on
> this" algorithms, and others are not.
> While yes, it might be very possible to do simple things, once you
> start wanting to do something 'different', that's kinda lock-in.
> 
> Whether this co-exists with arbitrary LUTs (that we definitely want
> for SteamOS) or not:
> I think putting a bunch of math-y stuff like this into the kernel is
> probably the complete wrong approach. Everything would need to be
> fixed point and it would be a huge pain in the butt to deal with on
> that side.
> 
> Maybe this is a "hot take", but IMO, DRM atomic is already waaay too
> much being done in the kernel space. I think making it go even further
> and having it be a prescriptive color API is a complete step in the
> wrong direction.
> 
> There is also the problem of... if there is a bug in the math here or
> we want to add a new feature, if it's kernel side, you are locked in
> to having that bug until the next release on your distro and probably
> years if it's a new feature!
> Updating kernels is much harder for 'enterprise' distros if it is not
> mission critical. Having all of this in userspace is completely fine
> however...
> 
> If you want to make some userspace prescriptive -> descriptive color
> library I am all for that for general case compositors, but I don't
> think I would use something like that in Gamescope.
> That's not to be rude, we are just picky and want freedom to do what
> we want and iterate on it easily.
> 
> I guess this all comes back to my initial point... having some
> userspace to handle stuff that is either kinda or entirely vendor
> specific is the right way of solving this problem :-P
> 
> - Joshie ✨
> 
> On Thu, 11 May 2023 at 09:51, Karol Herbst  wrote:
> >
> > On Wed, May 10, 2023 at 9:59 AM Jonas Ådahl  wrote:
> > >
> > > On Tue, May 09, 2023 at 08:22:30PM +, Simon Ser wrote:
> > > > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie  
> > > > wrote:
> > > >
> > > > > There are also other vendor side effects to having this in userspace.
> > > > >
> > > > > Will the library have a loader?
> > > > > Will it allow proprietary plugins?
> > > > > Will it allow proprietary reimplementations?
> > > > > What will happen when a vendor wants distros to ship their
> > > > > proprietary fork of said library?
> > > > >
> > > > > How would NVIDIA integrate this with their proprietary stack?
> > > >
> > > > Since all color operations exposed by KMS are standard, the library
> > > > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > > > etc.
> > > >
> > >
> > > There might be pipelines/color-ops only exposed by proprietary out of
> > > tree drivers; the operation types and semantics should ideally be
> > > defined upstream, but the code paths would in practice be vendor
> > > specific, potentially without any upstream driver using them. It should
> > > be clear whether an implementation that makes such a pipeline work is in
> > > scope for the upstream library.
> > >
> > > The same applies to the kernel; it must be clear whether pipeline
> > > elements that potentially will only be exposed by out of tree drivers
> > > will be acceptable upstream, at least as documented operations.
> > >
> >

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-11 Thread Joshua Ashton
When we are talking about being 'prescriptive' in the API, are we
outright saying we don't want to support arbitrary 3D LUTs, or are we
just offering certain algorithms to be 'executed' for a plane/crtc/etc
in the atomic API? I am confused...

There is so much stuff to do with color, that I don't think a
prescriptive API in the kernel could ever keep up with the things that
we want to be pushing from Gamescope/SteamOS. For example, we have so
many things going on, night mode, SDR gamut widening, HDR/SDR gain,
the ability to apply 'looks' for eg. invert luma or for retro looks,
enhanced contrast, tonemapping, inverse tonemapping... We also are
going to be doing a bunch of stuff with EETFs for handling out of
range HDR content for scanout.

Some of what we do is kinda standard, regular "there is a paper on
this" algorithms, and others are not.
While yes, it might be very possible to do simple things, once you
start wanting to do something 'different', that's kinda lock-in.

Whether this co-exists with arbitrary LUTs (that we definitely want
for SteamOS) or not:
I think putting a bunch of math-y stuff like this into the kernel is
probably the complete wrong approach. Everything would need to be
fixed point and it would be a huge pain in the butt to deal with on
that side.

Maybe this is a "hot take", but IMO, DRM atomic is already waaay too
much being done in the kernel space. I think making it go even further
and having it be a prescriptive color API is a complete step in the
wrong direction.

There is also the problem of... if there is a bug in the math here or
we want to add a new feature, if it's kernel side, you are locked in
to having that bug until the next release on your distro and probably
years if it's a new feature!
Updating kernels is much harder for 'enterprise' distros if it is not
mission critical. Having all of this in userspace is completely fine
however...

If you want to make some userspace prescriptive -> descriptive color
library I am all for that for general case compositors, but I don't
think I would use something like that in Gamescope.
That's not to be rude, we are just picky and want freedom to do what
we want and iterate on it easily.

I guess this all comes back to my initial point... having some
userspace to handle stuff that is either kinda or entirely vendor
specific is the right way of solving this problem :-P

- Joshie ✨

On Thu, 11 May 2023 at 09:51, Karol Herbst  wrote:
>
> On Wed, May 10, 2023 at 9:59 AM Jonas Ådahl  wrote:
> >
> > On Tue, May 09, 2023 at 08:22:30PM +, Simon Ser wrote:
> > > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie  wrote:
> > >
> > > > There are also other vendor side effects to having this in userspace.
> > > >
> > > > Will the library have a loader?
> > > > Will it allow proprietary plugins?
> > > > Will it allow proprietary reimplementations?
> > > > What will happen when a vendor wants distros to ship their
> > > > proprietary fork of said library?
> > > >
> > > > How would NVIDIA integrate this with their proprietary stack?
> > >
> > > Since all color operations exposed by KMS are standard, the library
> > > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > > etc.
> > >
> >
> > There might be pipelines/color-ops only exposed by proprietary out of
> > tree drivers; the operation types and semantics should ideally be
> > defined upstream, but the code paths would in practice be vendor
> > specific, potentially without any upstream driver using them. It should
> > be clear whether an implementation that makes such a pipeline work is in
> > scope for the upstream library.
> >
> > The same applies to the kernel; it must be clear whether pipeline
> > elements that potentially will only be exposed by out of tree drivers
> > will be acceptable upstream, at least as documented operations.
> >
>
> They aren't. All code in the kernel needs to be used by in-tree
> drivers, otherwise it's fair to delete it. DRM requires any UAPI change
> to have a real open source userspace user.
>
> Nvidia knows this and they went to great lengths to fulfill this
> requirement in the past. They'll manage.
>
> >
> > Jonas
> >
>


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-11 Thread Karol Herbst
On Wed, May 10, 2023 at 9:59 AM Jonas Ådahl  wrote:
>
> On Tue, May 09, 2023 at 08:22:30PM +, Simon Ser wrote:
> > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie  wrote:
> >
> > > There are also other vendor side effects to having this in userspace.
> > >
> > > Will the library have a loader?
> > > Will it allow proprietary plugins?
> > > Will it allow proprietary reimplementations?
> > > What will happen when a vendor wants distros to ship their
> > > proprietary fork of said library?
> > >
> > > How would NVIDIA integrate this with their proprietary stack?
> >
> > Since all color operations exposed by KMS are standard, the library
> > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > etc.
> >
>
> There might be pipelines/color-ops only exposed by proprietary out of
> tree drivers; the operation types and semantics should ideally be
> defined upstream, but the code paths would in practice be vendor
> specific, potentially without any upstream driver using them. It should
> be clear whether an implementation that makes such a pipeline work is in
> scope for the upstream library.
>
> The same applies to the kernel; it must be clear whether pipeline
> elements that potentially will only be exposed by out of tree drivers
> will be acceptable upstream, at least as documented operations.
>

They aren't. All code in the kernel needs to be used by in-tree
drivers, otherwise it's fair to delete it. DRM requires any UAPI change
to have a real open source userspace user.

Nvidia knows this and they went to great lengths to fulfill this
requirement in the past. They'll manage.

>
> Jonas
>



Re: [RFC] Plane color pipeline KMS uAPI

2023-05-10 Thread Pekka Paalanen
On Wed, 10 May 2023 09:59:21 +0200
Jonas Ådahl  wrote:

> On Tue, May 09, 2023 at 08:22:30PM +, Simon Ser wrote:
> > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie  wrote:
> >   
> > > There are also other vendor side effects to having this in userspace.
> > > 
> > > Will the library have a loader?
> > > Will it allow proprietary plugins?
> > > Will it allow proprietary reimplementations?
> > > What will happen when a vendor wants distros to ship their
> > > proprietary fork of said library?
> > > 
> > > How would NVIDIA integrate this with their proprietary stack?  
> > 
> > Since all color operations exposed by KMS are standard, the library
> > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > etc.
> >   
> 
> There might be pipelines/color-ops only exposed by proprietary out of
> tree drivers; the operation types and semantics should ideally be
> defined upstream, but the code paths would in practice be vendor
> specific, potentially without any upstream driver using them. It should
> be clear whether an implementation that makes such a pipeline work is in
> scope for the upstream library.
> 
> The same applies to the kernel; it must be clear whether pipeline
> elements that potentially will only be exposed by out of tree drivers
> will be acceptable upstream, at least as documented operations.

In my opinion, a COLOROP element definition can be accepted in the
upstream kernel documentation only if there is also an upstream driver
implementing it. It does not need to be a "direct" hardware
implementation, it could also be the upstream driver mapping the
COLOROP to whatever hardware block or block chain it has.

For the userspace library I don't know. I am puzzled whether people
want to allow proprietary components or deny them.


Thanks,
pq




Re: [RFC] Plane color pipeline KMS uAPI

2023-05-10 Thread Pekka Paalanen
On Tue, 09 May 2023 20:22:30 +
Simon Ser  wrote:

> On Tuesday, May 9th, 2023 at 21:53, Dave Airlie  wrote:
> 
> > There are also other vendor side effects to having this in userspace.
> > 
> > Will the library have a loader?
> > Will it allow proprietary plugins?
> > Will it allow proprietary reimplementations?
> > What will happen when a vendor wants distros to ship their
> > proprietary fork of said library?
> > 
> > How would NVIDIA integrate this with their proprietary stack?  
> 
> Since all color operations exposed by KMS are standard, the library
> would just be a simple one: no loader, no plugin, no proprietary pieces,
> etc.

Hi,

that's certainly the long term goal, and *if* Linux software can in any
way guide hardware design, then I believe it is an achievable goal. I
understand "standard" as something that is widely implemented in
various hardware rather than only "well-defined and documented and
free to implement in any hardware if its vendor cared".

However, like I mentioned in my other reply to Steven, I expect there
will be a time period when each hardware has custom processing blocks
no other hardware (same or different vendor) has. I might not call them
outright proprietary though, because in order to have them exposed via
UAPI, the mathematical model of the processing block must be documented
with its UAPI. This means there cannot be secrets on what the hardware
does, which means there cannot be a requirement for secret sauce in
userspace either.

I wonder if we can also require new COLOROP elements to be freely
implementable by anyone anywhere in any way one wants? Or do kernel
maintainers just need to NAK proposals for elements that might not be
that free?

Anything that is driver-chosen or automatic can also be proprietary,
because today's KMS UAPI rules do not require documenting how automatic
features work, e.g. the existing YUV-to-RGB conversion. Hardware could
have whatever wild skin tone improvement algorithms hidden in there for
example. In this new proposal, there cannot be undocumented behaviour.

Dave, if we went with a descriptive UAPI model, everything behind it
could be proprietary and secret. That's not open in the least.

On Wed, 10 May 2023 at 00:31, Harry Wentland  wrote:
>
> I am debating whether we need to be serious about a userspace library
> (or maybe a user-mode driver) to provide an abstraction from the
> descriptive to the prescriptive model. HW vendors need a way to provide
> timely support for new HW generations without requiring updates to a
> large number of compositors.  

Drivers can always map old COLOROP elements to new style hardware
blocks if they can achieve the same mathematical operation up to
whatever precision was promised before. I think that should be the main
form of supporting hardware evolution. Then also add new alternative
COLOROP elements that can better utilize the hardware block.

Naturally that means that COLOROP elements must be designed to be
somewhat generic to have a reasonable life time. They cannot be
extremely tightly married to the hardware implementation that might
cease to exist in the very next hardware revision.

Let's say some vendor has a hardware block that does a series of
operations in an optimized fashion, perhaps with hardwired constants.
This is exposed as a custom COLOROP element. The next hardware revision
no longer has this block, but it has a bunch of new blocks that can
produce the exact same result. The driver for this hardware can expose
two different pipelines: one using the old COLOROP element, and another
using a bunch of other COLOROP elements which exposes the new
flexibility of the hardware design better. If userspace chooses the
former pipeline, the driver just programs the bunch of blocks to behave
accordingly. Hopefully the other COLOROP elements will be more standard
than the old element.

Over time, I hope this causes an evolution where hardware implements
only the most standard COLOROP elements, and special-case compound
elements will eventually fall out of use over the decades.
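[Editor's note: the selection logic described above can be sketched as
follows. Pipelines are modeled as lists of COLOROP type names, and
userspace picks the first advertised pipeline it fully understands; the
type names and IDs are illustrative, not the real enum values.]

```python
# A driver advertises several pipelines (linked lists of COLOROP
# elements, modeled here as {head_object_id: [op types]}); userspace
# picks the first one made up entirely of operations it knows how to
# program, else falls back to bypass and GPU composition.

KNOWN_OPS = {"1D curve", "3x3 matrix", "3D LUT"}

def pick_pipeline(pipelines: dict) -> int:
    for head_id, ops in pipelines.items():
        if all(op in KNOWN_OPS for op in ops):
            return head_id
    return 0  # bypass: compose with shaders instead

advertised = {
    42: ["vendor compound block"],             # old hardwired element
    52: ["1D curve", "3x3 matrix", "3D LUT"],  # granular equivalent
}
assert pick_pipeline(advertised) == 52
```

If userspace only understood the old compound element, it would pick
pipeline 42 instead, and the driver would program the new blocks to
reproduce the old behaviour.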


Thanks,
pq




Re: [RFC] Plane color pipeline KMS uAPI

2023-05-10 Thread Jonas Ådahl
On Tue, May 09, 2023 at 08:22:30PM +, Simon Ser wrote:
> On Tuesday, May 9th, 2023 at 21:53, Dave Airlie  wrote:
> 
> > There are also other vendor side effects to having this in userspace.
> > 
> > Will the library have a loader?
> > Will it allow proprietary plugins?
> > Will it allow proprietary reimplementations?
> > What will happen when a vendor wants distros to ship their
> > proprietary fork of said library?
> > 
> > How would NVIDIA integrate this with their proprietary stack?
> 
> Since all color operations exposed by KMS are standard, the library
> would just be a simple one: no loader, no plugin, no proprietary pieces,
> etc.
> 

There might be pipelines/color-ops only exposed by proprietary out of
tree drivers; the operation types and semantics should ideally be
defined upstream, but the code paths would in practice be vendor
specific, potentially without any upstream driver using them. It should
be clear whether an implementation that makes such a pipeline work is in
scope for the upstream library.

The same applies to the kernel; it must be clear whether pipeline
elements that potentially will only be exposed by out of tree drivers
will be acceptable upstream, at least as documented operations.


Jonas



Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Simon Ser
On Tuesday, May 9th, 2023 at 21:53, Dave Airlie  wrote:

> There are also other vendor side effects to having this in userspace.
> 
> Will the library have a loader?
> Will it allow proprietary plugins?
> Will it allow proprietary reimplementations?
> What will happen when a vendor wants distros to ship their
> proprietary fork of said library?
> 
> How would NVIDIA integrate this with their proprietary stack?

Since all color operations exposed by KMS are standard, the library
would just be a simple one: no loader, no plugin, no proprietary pieces,
etc.


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Dave Airlie
On Wed, 10 May 2023 at 00:31, Harry Wentland  wrote:
>
>
>
> On 5/7/23 19:14, Dave Airlie wrote:
> > On Sat, 6 May 2023 at 08:21, Sebastian Wick  
> > wrote:
> >>
> >> On Fri, May 5, 2023 at 10:40 PM Dave Airlie  wrote:
> >>>
> >>> On Fri, 5 May 2023 at 01:23, Simon Ser  wrote:
> 
>  Hi all,
> 
>  The goal of this RFC is to expose a generic KMS uAPI to configure the 
>  color
>  pipeline before blending, ie. after a pixel is tapped from a plane's
>  framebuffer and before it's blended with other planes. With this new 
>  uAPI we
>  aim to reduce the battery life impact of color management and HDR on 
>  mobile
>  devices, to improve performance and to decrease latency by skipping
>  composition on the 3D engine. This proposal is the result of discussions 
>  at
>  the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
>  familiar with the AMD, Intel and NVIDIA hardware have participated in the
>  discussion.
> 
>  This proposal takes a prescriptive approach instead of a descriptive 
>  approach.
>  Drivers describe the available hardware blocks in terms of low-level
>  mathematical operations, then user-space configures each block. We 
>  decided
>  against a descriptive approach where user-space would provide a 
>  high-level
>  description of the colorspace and other parameters: we want to give more
>  control and flexibility to user-space, e.g. to be able to replicate 
>  exactly the
>  color pipeline with shaders and switch between shaders and KMS pipelines
>  seamlessly, and to avoid forcing user-space into a particular color 
>  management
>  policy.
> >>>
> >>> I'm not 100% sold on the prescriptive here, let's see if someone can
> >>> get me over the line with some questions later.
> >>>
> >>> My feeling is color pipeline hw is not a done deal, and that hw
> >>> vendors will be revising/evolving/churning the hw blocks for a while
> >>> longer, as there are no real standards in the area to aim for; all the
> >>> vendors are mostly just doing whatever gets Windows over the line and
> >>> keeps hw engineers happy. So I have some concerns here around forwards
> >>> compatibility and hence the API design.
> >>>
> >>> I guess my main concern is if you expose a bunch of hw blocks and
> >>> someone comes up with a novel new thing, will all existing userspace
> >>> work, without falling back to shaders?
> >>> Do we have minimum guarantees on what hardware blocks have to be
> >>> exposed to build a useable pipeline?
> >>> If a hardware block goes away in a new silicon revision, do I have to
> >>> rewrite my compositor? or will it be expected that the kernel will
> >>> emulate the old pipelines on top of whatever new fancy thing exists.
> >>
> >> I think there are two answers to those questions.
> >
> > These aren't selling me much better :-)
> >>
> >> The first one is that right now KMS already doesn't guarantee that
> >> every property is supported on all hardware. The guarantee we have is
> >> that properties that are supported on a piece of hardware on a
> >> specific kernel will be supported on the same hardware on later
> >> kernels. The color pipeline is no different here. For a specific piece
> >> of hardware a newer kernel might only change the pipelines in a
> >> backwards compatible way and add new pipelines.
> >>
> >> So to answer your question: if some hardware with a novel pipeline
> >> will show up it might not be supported and that's fine. We already
> >> have cases where some hardware does not support the gamma lut property
> >> but only the CSC property and that breaks night light because we never
> >> bothered to write a shader fallback. KMS provides ways to offload work
> >> but a generic user space always has to provide a fallback and this
> >> doesn't change. Hardware specific user space on the other hand will
> >> keep working with the forward compatibility guarantees we want to
> >> provide.
> >
> > In my mind we've screwed up already; that isn't a case to be made for
> > continuing down the same path.
> >
> > The kernel is meant to be a hardware abstraction layer, not just a
> > hardware exposure layer. The kernel shouldn't set policy and there are
> > cases where it can't act as an abstraction layer (like where you need
> > a compiler), but I'm not sold that this case is one of those yet. I'm
> > open to being educated here on why it would be.
> >
>
> Thanks for raising these points. When I started out looking at color
> management I favored the descriptive model. Most other HW vendors
> I've talked to also tell me that they think about descriptive APIs
> since that allows HW vendors to map that to whatever their HW supports.
>
> Sebastian, Pekka, and others managed to change my mind about this
> but I still keep having difficult questions within AMD.
>
> Sebastian, Pekka, and Jonas have already done a good job to describe
> our reasoning behind the 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Melissa Wen
On 05/09, Pekka Paalanen wrote:
> On Tue, 9 May 2023 10:23:49 -0100
> Melissa Wen  wrote:
> 
> > On 05/05, Joshua Ashton wrote:
> > > Some corrections and replies inline.
> > > 
> > > On Fri, 5 May 2023 at 12:42, Pekka Paalanen  wrote:  
> > > >
> > > > On Thu, 04 May 2023 15:22:59 +
> > > > Simon Ser  wrote:
> > > >  
> 
> ...
> 
> > > > > Color operation 47 (3D LUT RAM)
> > > > > └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > > > > Color operation 48 (blend gamma)
> > > > > └─ "1d_curve_type" = PQ  
> > > 
> > > ^
> > > This is wrong, this should be Display Native -> Linearized Display 
> > > Referred  
> > 
> > This is a good point to discuss. I understand for the HDR10 case that we
> > are just setting an enumerated TF (that is PQ for this case - correct me
> > if I got it wrong) but, unlike when we use a user-LUT, we don't know
> > from the API that this enumerated TF value with an empty LUT is used for
> > linearizing/degamma. Perhaps this could come as a pair? Any idea?
> 
> PQ curve is an EOTF, so it's always from electrical to optical.
> 
> Are you asking for something like
> 
> "1d_curve_type" = "PQ EOTF"
> 
> vs.
> 
> "1d_curve_type" = "inverse PQ EOTF"?
> 
> I think that's how it should work. It's not a given that if a
> hardware block can do a curve, it can also do its inverse. They need to
> be advertised explicitly.

Sounds good and clear to me.

Thanks!

Melissa

> 
> 
> Thanks,
> pq
> 
> ps. I picked my nick in the 90s. Any resemblance to Perceptual
> Quantizer is unintended. ;-)

:D

> 
> 
> > > >
> > > > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > > > electrical values is certainly surprising, so the example here is a
> > > > bit odd, but I don't think that hurts the intention of demonstration.  
> > > 
> > > I have done some corrections inline.
> > > 
> > > You can see our fully correct color pipeline here:
> > > https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > > 
> > > Please let me know if you have any more questions about our color 
> > > pipeline.






Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Harry Wentland



On 5/7/23 19:14, Dave Airlie wrote:
> On Sat, 6 May 2023 at 08:21, Sebastian Wick  wrote:
>>
>> On Fri, May 5, 2023 at 10:40 PM Dave Airlie  wrote:
>>>
>>> On Fri, 5 May 2023 at 01:23, Simon Ser  wrote:

 Hi all,

 The goal of this RFC is to expose a generic KMS uAPI to configure the color
 pipeline before blending, ie. after a pixel is tapped from a plane's
 framebuffer and before it's blended with other planes. With this new uAPI 
 we
 aim to reduce the battery life impact of color management and HDR on mobile
 devices, to improve performance and to decrease latency by skipping
 composition on the 3D engine. This proposal is the result of discussions at
 the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
 familiar with the AMD, Intel and NVIDIA hardware have participated in the
 discussion.

 This proposal takes a prescriptive approach instead of a descriptive 
 approach.
 Drivers describe the available hardware blocks in terms of low-level
 mathematical operations, then user-space configures each block. We decided
 against a descriptive approach where user-space would provide a high-level
 description of the colorspace and other parameters: we want to give more
 control and flexibility to user-space, e.g. to be able to replicate 
 exactly the
 color pipeline with shaders and switch between shaders and KMS pipelines
 seamlessly, and to avoid forcing user-space into a particular color 
 management
 policy.
>>>
>>> I'm not 100% sold on the prescriptive here, let's see if someone can
>>> get me over the line with some questions later.
>>>
>>> My feeling is color pipeline hw is not a done deal, and that hw
>>> vendors will be revising/evolving/churning the hw blocks for a while
>>> longer, as there are no real standards in the area to aim for; all the
>>> vendors are mostly just doing whatever gets Windows over the line and
>>> keeps hw engineers happy. So I have some concerns here around forwards
>>> compatibility and hence the API design.
>>>
>>> I guess my main concern is if you expose a bunch of hw blocks and
>>> someone comes up with a novel new thing, will all existing userspace
>>> work, without falling back to shaders?
>>> Do we have minimum guarantees on what hardware blocks have to be
>>> exposed to build a useable pipeline?
>>> If a hardware block goes away in a new silicon revision, do I have to
>>> rewrite my compositor? or will it be expected that the kernel will
>>> emulate the old pipelines on top of whatever new fancy thing exists.
>>
>> I think there are two answers to those questions.
> 
> These aren't selling me much better :-)
>>
>> The first one is that right now KMS already doesn't guarantee that
>> every property is supported on all hardware. The guarantee we have is
>> that properties that are supported on a piece of hardware on a
>> specific kernel will be supported on the same hardware on later
>> kernels. The color pipeline is no different here. For a specific piece
>> of hardware a newer kernel might only change the pipelines in a
>> backwards compatible way and add new pipelines.
>>
>> So to answer your question: if some hardware with a novel pipeline
>> will show up it might not be supported and that's fine. We already
>> have cases where some hardware does not support the gamma lut property
>> but only the CSC property and that breaks night light because we never
>> bothered to write a shader fallback. KMS provides ways to offload work
>> but a generic user space always has to provide a fallback and this
>> doesn't change. Hardware specific user space on the other hand will
>> keep working with the forward compatibility guarantees we want to
>> provide.
> 
> In my mind we've screwed up already; that isn't a case to be made for
> continuing down the same path.
> 
> The kernel is meant to be a hardware abstraction layer, not just a
> hardware exposure layer. The kernel shouldn't set policy and there are
> cases where it can't act as an abstraction layer (like where you need
> a compiler), but I'm not sold that this case is one of those yet. I'm
> open to being educated here on why it would be.
> 

Thanks for raising these points. When I started out looking at color
management I favored the descriptive model. Most other HW vendors
I've talked to also tell me that they think about descriptive APIs
since that allows HW vendors to map that to whatever their HW supports.

Sebastian, Pekka, and others managed to change my mind about this
but I still keep having difficult questions within AMD.

Sebastian, Pekka, and Jonas have already done a good job to describe
our reasoning behind the prescriptive model. It might be helpful to
see how different the results of different tone-mapping operators
can look:

http://helgeseetzen.com/wp-content/uploads/2017/06/HS1.pdf

According to my understanding all other platforms that have HDR now
have a single compositor. At least 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Pekka Paalanen
On Tue, 9 May 2023 10:23:49 -0100
Melissa Wen  wrote:

> On 05/05, Joshua Ashton wrote:
> > Some corrections and replies inline.
> > 
> > On Fri, 5 May 2023 at 12:42, Pekka Paalanen  wrote:  
> > >
> > > On Thu, 04 May 2023 15:22:59 +
> > > Simon Ser  wrote:
> > >  

...

> > > > Color operation 47 (3D LUT RAM)
> > > > └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > > > Color operation 48 (blend gamma)
> > > > └─ "1d_curve_type" = PQ  
> > 
> > ^
> > This is wrong, this should be Display Native -> Linearized Display Referred 
> >  
> 
> This is a good point to discuss. I understand for the HDR10 case that we
> are just setting an enumerated TF (that is PQ for this case - correct me
> if I got it wrong) but, unlike when we use a user-LUT, we don't know
> from the API that this enumerated TF value with an empty LUT is used for
> linearizing/degamma. Perhaps this could come as a pair? Any idea?

PQ curve is an EOTF, so it's always from electrical to optical.

Are you asking for something like

"1d_curve_type" = "PQ EOTF"

vs.

"1d_curve_type" = "inverse PQ EOTF"?

I think that's how it should work. It's not a given that if a
hardware block can do a curve, it can also do its inverse. They need to
be advertised explicitly.
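[Editor's note: to make the distinction concrete, here are the SMPTE
ST 2084 (PQ) EOTF and its inverse as a Python sketch. Values are
normalized: electrical E' in [0, 1], optical Y in [0, 1] where 1.0
corresponds to 10000 cd/m². These really are two distinct operations,
which is why each direction needs its own advertised enum entry.]

```python
# SMPTE ST 2084 constants.
M1 = 2610 / 16384        # ~0.1593
M2 = 2523 / 4096 * 128   # 78.84375
C1 = 3424 / 4096         # 0.8359375
C2 = 2413 / 4096 * 32    # 18.8515625
C3 = 2392 / 4096 * 32    # 18.6875

def pq_eotf(e: float) -> float:
    """Electrical -> optical: what a 'PQ EOTF' element would apply."""
    p = e ** (1 / M2)
    return (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

def pq_inv_eotf(y: float) -> float:
    """Optical -> electrical: the separate 'inverse PQ EOTF' element."""
    p = y ** M1
    return ((C1 + C2 * p) / (1 + C3 * p)) ** M2

assert abs(pq_eotf(1.0) - 1.0) < 1e-12   # full code maps to 10000 nits
assert abs(pq_inv_eotf(pq_eotf(0.5)) - 0.5) < 1e-9
```

A hardware block hardwired for one of these curves cannot be assumed to
evaluate the other, so the advertisement must name the direction.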


Thanks,
pq

ps. I picked my nick in the 90s. Any resemblance to Perceptual
Quantizer is unintended. ;-)


> > >
> > > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > > electrical values is certainly surprising, so the example here is a
> > > bit odd, but I don't think that hurts the intention of demonstration.  
> > 
> > I have done some corrections inline.
> > 
> > You can see our fully correct color pipeline here:
> > https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > 
> > Please let me know if you have any more questions about our color pipeline.




Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Melissa Wen
On 05/05, Joshua Ashton wrote:
> Some corrections and replies inline.
> 
> On Fri, 5 May 2023 at 12:42, Pekka Paalanen  wrote:
> >
> > On Thu, 04 May 2023 15:22:59 +
> > Simon Ser  wrote:
> >
> > > Hi all,
> > >
> > > The goal of this RFC is to expose a generic KMS uAPI to configure the 
> > > color
> > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > framebuffer and before it's blended with other planes. With this new uAPI 
> > > we
> > > aim to reduce the battery life impact of color management and HDR on 
> > > mobile
> > > devices, to improve performance and to decrease latency by skipping
> > > composition on the 3D engine. This proposal is the result of discussions 
> > > at
> > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > discussion.
> >
> > Hi Simon,
> >
> > this is an excellent write-up, thank you!
> >
> > Harry's question about what constitutes UAPI is a good one for danvet.
> >
> > I don't really have much to add here, a couple inline comments. I think
> > this could work.
> >
> > >
> > > This proposal takes a prescriptive approach instead of a descriptive 
> > > approach.
> > > Drivers describe the available hardware blocks in terms of low-level
> > > mathematical operations, then user-space configures each block. We decided
> > > against a descriptive approach where user-space would provide a high-level
> > > description of the colorspace and other parameters: we want to give more
> > > control and flexibility to user-space, e.g. to be able to replicate 
> > > exactly the
> > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > seamlessly, and to avoid forcing user-space into a particular color 
> > > management
> > > policy.
> > >
> > > We've decided against mirroring the existing CRTC properties
> > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > > pipeline can significantly differ between vendors and this approach cannot
> > > accurately abstract all hardware. In particular, the availability, 
> > > ordering and
> > > capabilities of hardware blocks is different on each display engine. So, 
> > > we've
> > > decided to go for a highly detailed hardware capability discovery.
> > >
> > > This new uAPI should not be in conflict with existing standard KMS 
> > > properties,
> > > since there are none which control the pre-blending color pipeline at the
> > > moment. It does conflict with any vendor-specific properties like
> > > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > > properties. Drivers will need to either reject atomic commits configuring 
> > > both
> > > uAPIs, or alternatively we could add a DRM client cap which hides the 
> > > vendor
> > > properties and shows the new generic properties when enabled.
> > >
> > > To use this uAPI, first user-space needs to discover hardware 
> > > capabilities via
> > > KMS objects and properties, then user-space can configure the hardware 
> > > via an
> > > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> > >
> > > Our proposal introduces a new "color_pipeline" plane property, and a new 
> > > KMS
> > > object type, "COLOROP" (short for color operation). The "color_pipeline" 
> > > plane
> > > property is an enum, each enum entry represents a color pipeline 
> > > supported by
> > > the hardware. The special zero entry indicates that the pipeline is in
> > > "bypass"/"no-op" mode. For instance, the following plane properties 
> > > describe a
> > > primary plane with 2 supported pipelines but currently configured in 
> > > bypass
> > > mode:
> > >
> > > Plane 10
> > > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > > ├─ …
> > > └─ "color_pipeline": enum {0, 42, 52} = 0
> > >
> > > The non-zero entries describe color pipelines as a linked list of COLOROP 
> > > KMS
> > > objects. The entry value is an object ID pointing to the head of the 
> > > linked
> > > list (the first operation in the color pipeline).
> > >
> > > The new COLOROP objects also expose a number of KMS properties. Each has a
> > > type, a reference to the next COLOROP object in the linked list, and other
> > > type-specific properties. Here is an example for a 1D LUT operation:
> > >
> > > Color operation 42
> > > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > > ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> > > ├─ "lut_size": immutable range = 4096
> > > ├─ "lut_data": blob
> > > └─ "next": immutable color operation ID = 43
> > >
> > > To configure this hardware block, user-space can fill a KMS blob with 
> > > 4096 u32
> > > entries, then set "lut_data" to the blob ID. Other color operation types 
> > > might
> > > have different properties.
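[Editor's note: the quoted blob-filling step can be sketched in Python.
The 16-bit identity ramp packed into u32 entries is an assumption about
the entry encoding, which the RFC has not pinned down; the blob-creation
ioctl itself (DRM_IOCTL_MODE_CREATEPROPBLOB) takes an opaque byte
buffer, so only the byte layout matters here.]

```python
import struct

LUT_SIZE = 4096  # matches the immutable "lut_size" in the example

def identity_lut_blob(size: int = LUT_SIZE) -> bytes:
    """Pack an identity 1D LUT as little-endian u32 entries.

    Entry encoding (a 0..0xFFFF ramp) is a hypothetical choice for
    illustration; real hardware may define the entries differently.
    """
    entries = [round(i * 0xFFFF / (size - 1)) for i in range(size)]
    return struct.pack(f"<{size}I", *entries)

blob = identity_lut_blob()
assert len(blob) == LUT_SIZE * 4  # 4096 u32 entries
```

Userspace would then hand these bytes to the kernel with
drmModeCreatePropertyBlob() and set "lut_data" to the returned blob ID
in an atomic commit.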
> > >
> > > Here is another example with a 3D LUT:
> > >
> > > Color operation 42
> > > 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Pekka Paalanen
On Mon, 8 May 2023 18:54:09 -0500
Steven Kucharzyk  wrote:

> I'd like to ask if there is a block/flow chart/diagram that has been
> created that represent the elements that are being discussed for this
> RFC? If so, would you be so kind as to point me to it or send it to me?

Hi Steven,

the whole point of the design is that there is no predefined block
diagram or flow chart. It would not fit hardware well, as hardware
generations and vendors do not generally have a common design. Instead,
the idea is to model what the hardware can do, and for that each driver
will create a set of specific pipelines the hardware implements.
Userspace then chooses a pipeline that suits it and populates its
parameters.

As for the elements themselves, we can hopefully define some commonly
available types, but undoubtedly there will be a few hardware-specific
elements as well. Otherwise some piece of special hardware functionality
cannot be used at all.

The job of defining a generic pipeline model and mapping that to actual
hardware elements is left for a userspace library. I expect there will
be multiple pipeline models, more to be introduced over time. Hence
putting that in a userspace library instead of carving it in stone at
the kernel UAPI.
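As a toy illustration of that split (illustrative Python only, not real uAPI or library code): the driver exposes a fixed set of pipelines, each a list of colorops, and the userspace library picks one that can express the processing it needs, setting unused elements to Bypass.

```python
def pick_pipeline(pipelines, wanted):
    """Return the ID of the first pipeline whose ops can express `wanted`
    (an ordered list of colorop types), assuming unused ops can be set to
    Bypass. Returns 0 (the "bypass" enum entry) if none fits."""
    for pid, ops in pipelines.items():
        need = list(wanted)
        for op in ops:
            if need and op["type"] == need[0]:
                need.pop(0)          # this op implements the next wanted stage
            elif not op["bypass"]:
                break                # unwanted op that cannot be bypassed
        else:
            if not need:
                return pid
    return 0

# Hypothetical pipelines as a driver might expose them:
pipelines = {
    42: [{"type": "1D curve", "bypass": True},
         {"type": "3x3 matrix", "bypass": True},
         {"type": "1D curve", "bypass": True}],
    52: [{"type": "3D LUT", "bypass": False}],
}
```

The real discovery step would walk KMS objects and properties instead of dicts, but the matching problem the library solves is the same.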


Next time, please do use reply-to-all, you have again dropped everyone
and other mailing lists from the CC.


Thanks,
pq




Re: [RFC] Plane color pipeline KMS uAPI

2023-05-09 Thread Pekka Paalanen
On Thu, 04 May 2023 15:22:59 +
Simon Ser  wrote:

> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
> 
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly 
> the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.
> 
> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering 
> and
> capabilities of hardware blocks is different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.

Hi,

I have some further ideas which do conflict with some existing KMS
properties. This dives into the color encoding specific side of the
UAPI.

The main idea is to make the color pipeline not specific to RGB. We
might actually be having YCbCr, XYZ, ICtCp and whatnot instead, at
least in the middle of a pipeline. The aim is to avoid the confusion
from statements like "my red channel is actually luma and not red". So
it's purely syntactic. ISTR some people being against saying "R is just
a channel name, it's not necessarily a red component."

Therefore I propose to address the color channels with indices instead:

ch 0, ch 1, ch 2, ch 3

Then we define the mapping between pixel and wire formats and the
indices:

R = ch 0
G = ch 1
B = ch 2
A = ch 3

Y = ch 0
U = ch 1
V = ch 2

If necessary, the following can also be defined:

Z = ch 1
X = ch 2
L = ch 0
M = ch 1
S = ch 2

The Y from YUV and Y from XYZ share the designation for the name's sake
although they are not the same quantity. If YUV is not a well-defined
designation wrt. YCbCr, ICtCp and everything else in the same category,
we can assign Cb, Cr, I, Ct, Cp etc. instead. That might be more clear
anyway even if there is a popular convention.

We can also choose differently, to e.g. match the H.273 mapping where
channels are assigned such that Y=G, Cr=R, Cb=B. H.273 gives mappings
between almost all of these, so if using those makes more sense, then
let's use those. In the end it shouldn't matter too much, since one
does not arbitrarily mix channels from different formats. Special care
needs to be taken when defining COLOROP elements that do not handle all
channels interchangeably. (E.g. a curve set element is mostly
channel-agnostic when it applies the same curve to channels 0-2, but ch
3 is pass-through.)
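The mappings and the alpha-preserving curve element described above can be sketched as follows (illustrative Python; the table mirrors the assignments listed in this mail and is not a settled uAPI definition):

```python
# Per-format mapping from channel names to the proposed channel indices.
CHANNELS = {
    "RGBA": {"R": 0, "G": 1, "B": 2, "A": 3},
    "YUV":  {"Y": 0, "U": 1, "V": 2},
    "XYZ":  {"Y": 0, "Z": 1, "X": 2},
    "LMS":  {"L": 0, "M": 1, "S": 2},
}

def curve_set(pixel, curve):
    """Mostly channel-agnostic curve element: the same curve is applied
    to channels 0-2, while channel 3 (alpha, if present) passes through."""
    return [curve(v) if i < 3 else v for i, v in enumerate(pixel)]
```

Note how the element is defined purely on indices; only the format table gives the channels any color meaning.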

Then, we define COLOROP elements in terms of channel indices. This
removes any implied connection to any specific color coding. Elements
that just do not make sense for arbitrary channel ordering, e.g.
special-case elements or enumerated matrix elements with a specific
purpose, will document the intended usage and the expected channel
mapping.

The main reason to do all this is to ultimately allow e.g. limited
range YCbCr scanout with a fully pass-through pipeline with no implied
conversion to or from RGB.
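For a sense of what such an explicit element would compute, here is the standard 8-bit limited-to-full range expansion as a sketch. The 219/224 scale factors are the usual BT.601/709 ones; exposing this math as a discrete COLOROP (rather than an implicit driver behavior) is the point being made above, and the function shape here is only illustrative.

```python
def limited_to_full_8bit(y, cb, cr):
    """Expand 8-bit limited-range Y'CbCr (Y' 16-235, C 16-240) toward
    full range [0, 255]. One candidate "range expansion" colorop."""
    yf = (y - 16) * 255.0 / 219.0
    cbf = (cb - 128) * 255.0 / 224.0 + 128
    crf = (cr - 128) * 255.0 / 224.0 + 128
    return yf, cbf, crf
```

A pass-through pipeline would simply leave this element in Bypass, so limited-range data survives to the wire untouched.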

This is where some existing KMS properties will conflict: those that
affect how current implicit YUV-RGB conversions are done. These
properties shall be replaced with COLOROP elements in pipelines, so
that they can be controlled explicitly and we can know where they
reside wrt. e.g. 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Harry Wentland



On 5/8/23 05:18, Daniel Vetter wrote:
> On Mon, 8 May 2023 at 10:58, Simon Ser  wrote:
>>
>> On Friday, May 5th, 2023 at 21:53, Daniel Vetter  wrote:
>>
>>> On Fri, May 05, 2023 at 04:06:26PM +, Simon Ser wrote:
 On Friday, May 5th, 2023 at 17:28, Daniel Vetter  wrote:

> Ok no comments from me on the actual color operations and semantics of all
> that, because I have simply nothing to bring to that except confusion :-)
>
> Some higher level thoughts instead:
>
> - I really like that we just go with graph nodes here. I think that was
>   bound to happen sooner or later with kms (we almost got there with
>   writeback, and with hindsight maybe should have).

 I'd really rather not do graphs here. We only need linked lists as 
 Sebastian
 said. Graphs would significantly add more complexity to this proposal, and
 I don't think that's a good idea unless there is a strong use-case.
>>>
>>> You have a graph, because a graph is just nodes + links. I did _not_
>>> propose a full generic graph structure, the link pointer would be in the
>>> class/type specific structure only. Like how we have the plane->crtc or
>>> connector->crtc links already like that (which already _is_ a full blown
>>> graph).
>>
>> I really don't get why a pointer in a struct makes plane->crtc a full-blown
>> graph. There is only a single parent-child link. A plane has a reference to a
>> CRTC, and nothing more.
>>
>> You could say that anything is a graph. Yes, even an isolated struct 
>> somewhere
>> is a graph: one with a single node and no link. But I don't follow what's the
>> point of explaining everything with a graph when we only need a much simpler
>> subset of the concept of graphs?
>>
>> Putting the graph thing aside, what are you suggesting exactly from a 
>> concrete
>> uAPI point-of-view? Introducing a new struct type? Would it be a colorop
>> specific struct, or a more generic one? What would be the fields? Why do you
>> think that's necessary and better than the current proposal?
>>
>> My understanding so far is that you're suggesting introducing something like
>> this at the uAPI level:
>>
>> struct drm_mode_node {
>>         uint32_t id;
>>
>>         uint32_t children_count;
>>         uint32_t *children; // list of child object IDs
>> };
> 
> Already too much I think
> 
> struct drm_mode_node {
>         struct drm_mode_object base;
>         struct drm_private_obj atomic_base;
>         enum drm_mode_node_enum type;
> };
> 

This would be about as much as we would want for a 'node' struct, for
reasons that others already outlined. In short, a good API for a color
pipeline needs to do a good job communicating the constraints. Hence the
"next" pointer needs to live in a colorop struct, whether it's a
drm_private_obj or its own thing.

I'm not quite seeing much benefit with a drm_mode_node other than being
able to have a GET_NODE IOCTL instead of a GET_COLOROP, the former being
able to be re-used for future scenarios that might need a "node." I feel
this adds a layer of confusion to the API.

Harry

> The actual graph links would be in the specific type's state
> structure, like they are for everything else. And the limits would be
> on the property type, we probably need a new DRM_MODE_PROP_OBJECT_ENUM
> to make the new limitations work correctly, since the current
> DRM_MODE_PROP_OBJECT only limits to a specific type of object, not an
> explicit list of drm_mode_object.id.
> 
> You might not even need a node subclass for the state stuff, that
> would directly be a drm_color_op_state that only embeds
> drm_private_state.
> 
> Another uapi difference is that the new kms objects would be of type
> DRM_MODE_OBJECT_NODE, and would always have a "class" property.
> 
>> I don't think this is a good idea for multiple reasons. First, this is
>> overkill: we don't need this complexity, and this complexity will make it 
>> more
>> difficult to reason about the color pipeline. This is a premature 
>> abstraction,
>> one we don't need right now, and one I haven't heard a potential future
>> use-case for. Sure, one can kill an ant with a sledgehammer if they'd like, 
>> but
>> that's not the right tool for the job.
>>
>> Second, this will make user-space miserable. User-space already has a tricky
>> task to achieve to translate its abstract descriptive color pipeline to our
>> proposed simple list of color operations. If we expose a full-blown graph, 
>> then
>> the user-space logic will need to handle arbitrary graphs. This will have a
>> significant cost (on implementation and testing), which we will be paying in
>> terms of time spent and in terms of bugs.
> 
> The color op pipeline would still be linear. I did not ask for a non-linear 
> one.
> 
>> Last, this kind of generic "node" struct is at odds with existing KMS object
>> types. So far, KMS objects are concrete like CRTC, connector, plane, etc.
>> "Node" is abstract. This is inconsistent.
> 
> Yeah I think we should change that.

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Jonas Ådahl
On Mon, May 08, 2023 at 09:14:18AM +1000, Dave Airlie wrote:
> On Sat, 6 May 2023 at 08:21, Sebastian Wick  wrote:
> >
> > On Fri, May 5, 2023 at 10:40 PM Dave Airlie  wrote:
> > >
> > > On Fri, 5 May 2023 at 01:23, Simon Ser  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > The goal of this RFC is to expose a generic KMS uAPI to configure the 
> > > > color
> > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > framebuffer and before it's blended with other planes. With this new 
> > > > uAPI we
> > > > aim to reduce the battery life impact of color management and HDR on 
> > > > mobile
> > > > devices, to improve performance and to decrease latency by skipping
> > > > composition on the 3D engine. This proposal is the result of 
> > > > discussions at
> > > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > > familiar with the AMD, Intel and NVIDIA hardware have participated in 
> > > > the
> > > > discussion.
> > > >
> > > > This proposal takes a prescriptive approach instead of a descriptive 
> > > > approach.
> > > > Drivers describe the available hardware blocks in terms of low-level
> > > > mathematical operations, then user-space configures each block. We 
> > > > decided
> > > > against a descriptive approach where user-space would provide a 
> > > > high-level
> > > > description of the colorspace and other parameters: we want to give more
> > > > control and flexibility to user-space, e.g. to be able to replicate 
> > > > exactly the
> > > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > > seamlessly, and to avoid forcing user-space into a particular color 
> > > > management
> > > > policy.
> > >
> > > I'm not 100% sold on the prescriptive here, let's see if someone can
> > > get me over the line with some questions later.
> > >
> > > My feeling is color pipeline hw is not a done deal, and that hw
> > > vendors will be revising/evolving/churning the hw blocks for a while
> > > longer, as there is no real standards in the area to aim for, all the
> > > vendors are mostly just doing whatever gets Windows over the line and
> > > keeps hw engineers happy. So I have some concerns here around forwards
> > > compatibility and hence the API design.
> > >
> > > I guess my main concern is if you expose a bunch of hw blocks and
> > > someone comes up with a novel new thing, will all existing userspace
> > > work, without falling back to shaders?
> > > Do we have minimum guarantees on what hardware blocks have to be
> > > exposed to build a useable pipeline?
> > > If a hardware block goes away in a new silicon revision, do I have to
> > > rewrite my compositor? or will it be expected that the kernel will
> > > emulate the old pipelines on top of whatever new fancy thing exists.
> >
> > I think there are two answers to those questions.
> 
> These aren't selling me much better :-)
> >
> > The first one is that right now KMS already doesn't guarantee that
> > every property is supported on all hardware. The guarantee we have is
> > that properties that are supported on a piece of hardware on a
> > specific kernel will be supported on the same hardware on later
> > kernels. The color pipeline is no different here. For a specific piece
> > of hardware a newer kernel might only change the pipelines in a
> > backwards compatible way and add new pipelines.
> >
> > So to answer your question: if some hardware with a novel pipeline
> > will show up it might not be supported and that's fine. We already
> > have cases where some hardware does not support the gamma lut property
> > but only the CSC property and that breaks night light because we never
> > bothered to write a shader fallback. KMS provides ways to offload work
> > but a generic user space always has to provide a fallback and this
> > doesn't change. Hardware specific user space on the other hand will
> > keep working with the forward compatibility guarantees we want to
> > provide.
> 
> In my mind we've screwed up already; isn't a case to be made for
> continuing down the same path?
> 
> The kernel is meant to be a hardware abstraction layer, not just a
> hardware exposure layer. The kernel shouldn't set policy and there are
> cases where it can't act as an abstraction layer (like where you need
> a compiler), but I'm not sold that this case is one of those yet. I'm
> open to being educated here on why it would be.

It would still be an abstraction of the hardware, just that the level
of abstraction is a bit "lower" than your intuition currently tells you
we should have. IMO it's not too different from the kernel providing low
level input events describing what the hardware can do and does,
with a rather massive user space library (libinput) turning all of that
low level nonsense into actual useful abstractions.

In this case it's the other way around, the kernel provides vendor
independent knobs that describe what the output hardware can do, and
exactly how it 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Pekka Paalanen
On Mon, 8 May 2023 09:14:18 +1000
Dave Airlie  wrote:

> On Sat, 6 May 2023 at 08:21, Sebastian Wick  wrote:
> >
> > On Fri, May 5, 2023 at 10:40 PM Dave Airlie  wrote:  
> > >
> > > On Fri, 5 May 2023 at 01:23, Simon Ser  wrote:  
> > > >
> > > > Hi all,
> > > >
> > > > The goal of this RFC is to expose a generic KMS uAPI to configure the 
> > > > color
> > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > framebuffer and before it's blended with other planes. With this new 
> > > > uAPI we
> > > > aim to reduce the battery life impact of color management and HDR on 
> > > > mobile
> > > > devices, to improve performance and to decrease latency by skipping
> > > > composition on the 3D engine. This proposal is the result of 
> > > > discussions at
> > > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > > familiar with the AMD, Intel and NVIDIA hardware have participated in 
> > > > the
> > > > discussion.
> > > >
> > > > This proposal takes a prescriptive approach instead of a descriptive 
> > > > approach.
> > > > Drivers describe the available hardware blocks in terms of low-level
> > > > mathematical operations, then user-space configures each block. We 
> > > > decided
> > > > against a descriptive approach where user-space would provide a 
> > > > high-level
> > > > description of the colorspace and other parameters: we want to give more
> > > > control and flexibility to user-space, e.g. to be able to replicate 
> > > > exactly the
> > > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > > seamlessly, and to avoid forcing user-space into a particular color 
> > > > management
> > > > policy.  
> > >
> > > I'm not 100% sold on the prescriptive here, let's see if someone can
> > > get me over the line with some questions later.

Hi Dave,

generic userspace must always be able to fall back to GPU shaders or
something else, when a window suddenly stops being eligible for a KMS
plane. That can happen due to a simple window re-stacking operation for
example, maybe a notification pops up temporarily. Hence, it is highly
desirable to be able to implement the exact same algorithm in shaders
as the display hardware does, in order to not cause visible glitches
on screen.

One way to do that is to have a prescriptive UAPI design. Userspace
decides what algorithms to use for color processing, and the UAPI simply
offers a way to implement those well-defined mathematical operations.
An alternative could be that the UAPI gives userspace back shader
programs that implement the same as what the hardware does, but... ugh.

Choosing the algorithm is policy. Userspace must be in control of
policy, right? Therefore a descriptive UAPI is simply not possible.
There is no single correct algorithm for these things, there are many
flavors, more and less correct, different quality/performance
trade-offs, and even just matters of taste. Sometimes even end user
taste, that might need to be configurable. Applications have built-in
assumptions too, and they vary.

To clarify, a descriptive UAPI is a design where userspace tells the
kernel "my source 1 is sRGB, my source 2 is BT.2100/PQ YCbCr 4:2:0 with
blahblahblah metadata, do whatever to display those on KMS planes
simultaneously". As I mentioned, there is not just one answer to that,
and we should also allow for innovation in the algorithms by everyone,
not just hardware designers.

A prescriptive UAPI is where we communicate mathematical operations
without any semantics. It is inherently free of policy in the kernel.
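The "exact same algorithm in shaders" requirement is why the ops must be defined mathematically. As a sketch (illustrative Python; the real uAPI would have to pin down the exact interpolation and rounding semantics, and linear interpolation here is only an assumption), the same 1D LUT evaluation could run identically in a compositor's shader fallback and in the display hardware:

```python
def lut_1d(x, lut):
    """Evaluate a 1D LUT at x in [0, 1] with linear interpolation between
    adjacent entries -- one candidate well-defined colorop semantic."""
    pos = x * (len(lut) - 1)
    lo = min(int(pos), len(lut) - 2)
    frac = pos - lo
    return lut[lo] * (1.0 - frac) + lut[lo + 1] * frac

# An identity curve: output tracks input within interpolation error.
identity = [i / 4095 for i in range(4096)]
```

If both the KMS pipeline and the shader agree on this definition, a window can move on and off a hardware plane without visible glitches.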

> > >
> > > My feeling is color pipeline hw is not a done deal, and that hw
> > > vendors will be revising/evolving/churning the hw blocks for a while
> > > longer, as there is no real standards in the area to aim for, all the
> > > vendors are mostly just doing whatever gets Windows over the line and
> > > keeps hw engineers happy. So I have some concerns here around forwards
> > > compatibility and hence the API design.
> > >
> > > I guess my main concern is if you expose a bunch of hw blocks and
> > > someone comes up with a novel new thing, will all existing userspace
> > > work, without falling back to shaders?
> > > Do we have minimum guarantees on what hardware blocks have to be
> > > exposed to build a useable pipeline?
> > > If a hardware block goes away in a new silicon revision, do I have to
> > > rewrite my compositor? or will it be expected that the kernel will
> > > emulate the old pipelines on top of whatever new fancy thing exists.  
> >
> > I think there are two answers to those questions.  
> 
> These aren't selling me much better :-)
> >
> > The first one is that right now KMS already doesn't guarantee that
> > every property is supported on all hardware. The guarantee we have is
> > that properties that are supported on a piece of hardware on a
> > specific kernel will be supported on the same hardware on later
> > kernels. The color pipeline is no different here.

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Daniel Vetter
On Mon, 8 May 2023 at 10:58, Simon Ser  wrote:
>
> On Friday, May 5th, 2023 at 21:53, Daniel Vetter  wrote:
>
> > On Fri, May 05, 2023 at 04:06:26PM +, Simon Ser wrote:
> > > On Friday, May 5th, 2023 at 17:28, Daniel Vetter  wrote:
> > >
> > > > Ok no comments from me on the actual color operations and semantics of 
> > > > all
> > > > that, because I have simply nothing to bring to that except confusion 
> > > > :-)
> > > >
> > > > Some higher level thoughts instead:
> > > >
> > > > - I really like that we just go with graph nodes here. I think that was
> > > >   bound to happen sooner or later with kms (we almost got there with
> > > >   writeback, and with hindsight maybe should have).
> > >
> > > I'd really rather not do graphs here. We only need linked lists as 
> > > Sebastian
> > > said. Graphs would significantly add more complexity to this proposal, and
> > > I don't think that's a good idea unless there is a strong use-case.
> >
> > You have a graph, because a graph is just nodes + links. I did _not_
> > propose a full generic graph structure, the link pointer would be in the
> > class/type specific structure only. Like how we have the plane->crtc or
> > connector->crtc links already like that (which already _is_ a full blown
> > graph).
>
> I really don't get why a pointer in a struct makes plane->crtc a full-blown
> graph. There is only a single parent-child link. A plane has a reference to a
> CRTC, and nothing more.
>
> You could say that anything is a graph. Yes, even an isolated struct somewhere
> is a graph: one with a single node and no link. But I don't follow what's the
> point of explaining everything with a graph when we only need a much simpler
> subset of the concept of graphs?
>
> Putting the graph thing aside, what are you suggesting exactly from a concrete
> uAPI point-of-view? Introducing a new struct type? Would it be a colorop
> specific struct, or a more generic one? What would be the fields? Why do you
> think that's necessary and better than the current proposal?
>
> My understanding so far is that you're suggesting introducing something like
> this at the uAPI level:
>
> struct drm_mode_node {
>         uint32_t id;
>
>         uint32_t children_count;
>         uint32_t *children; // list of child object IDs
> };

Already too much I think

struct drm_mode_node {
        struct drm_mode_object base;
        struct drm_private_obj atomic_base;
        enum drm_mode_node_enum type;
};

The actual graph links would be in the specific type's state
structure, like they are for everything else. And the limits would be
on the property type, we probably need a new DRM_MODE_PROP_OBJECT_ENUM
to make the new limitations work correctly, since the current
DRM_MODE_PROP_OBJECT only limits to a specific type of object, not an
explicit list of drm_mode_object.id.

You might not even need a node subclass for the state stuff, that
would directly be a drm_color_op_state that only embeds
drm_private_state.

Another uapi difference is that the new kms objects would be of type
DRM_MODE_OBJECT_NODE, and would always have a "class" property.
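The DRM_MODE_PROP_OBJECT_ENUM idea floated above can be modeled like this (toy Python, purely illustrative: no such property type exists today). Unlike today's DRM_MODE_PROP_OBJECT, which constrains only the object *type*, it would constrain the property to an explicit list of permitted object IDs:

```python
class ObjectEnumProp:
    """A property whose value must be one of an explicit set of KMS
    object IDs, rather than any object of a given type."""
    def __init__(self, allowed_ids):
        self.allowed_ids = frozenset(allowed_ids)

    def validate(self, obj_id):
        return obj_id in self.allowed_ids

# e.g. a plane's "color_pipeline" may only point at pipelines 42 or 52,
# or 0 for bypass -- the enumerated combos Daniel mentions.
prop = ObjectEnumProp({0, 42, 52})
```

This makes the valid link targets self-describing, instead of relying on possible_foo-style bitmask conventions.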

> I don't think this is a good idea for multiple reasons. First, this is
> overkill: we don't need this complexity, and this complexity will make it more
> difficult to reason about the color pipeline. This is a premature abstraction,
> one we don't need right now, and one I heaven't heard a potential future
> use-case for. Sure, one can kill an ant with a sledgehammer if they'd like, 
> but
> that's not the right tool for the job.
>
> Second, this will make user-space miserable. User-space already has a tricky
> task to achieve to translate its abstract descriptive color pipeline to our
> proposed simple list of color operations. If we expose a full-blown graph, 
> then
> the user-space logic will need to handle arbitrary graphs. This will have a
> significant cost (on implementation and testing), which we will be paying in
> terms of time spent and in terms of bugs.

The color op pipeline would still be linear. I did not ask for a non-linear one.

> Last, this kind of generic "node" struct is at odds with existing KMS object
> types. So far, KMS objects are concrete like CRTC, connector, plane, etc.
> "Node" is abstract. This is inconsistent.

Yeah, I think we should change that. That's essentially the
full extent of my proposal. The classes + possible_foo mask approach
just always felt rather brittle to me (and there's plenty of userspace
out there to prove that's the case), going more explicit with the
links with enumerated combos feels better. Plus it should allow
building a bit cleaner interfaces for drivers to construct the correct
graphs, because drivers _also_ rather consistently got the entire
possible_foo mask business wrong.

> Please let me know whether the above is what you have in mind. If not, please
> explain what exactly you mean by "graphs" in terms of uAPI, and please explain
> why we need it and what real-world use-cases it 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Daniel Vetter
On Mon, 8 May 2023 at 10:24, Pekka Paalanen  wrote:
>
> On Fri, 5 May 2023 21:51:41 +0200
> Daniel Vetter  wrote:
>
> > On Fri, May 05, 2023 at 05:57:37PM +0200, Sebastian Wick wrote:
> > > On Fri, May 5, 2023 at 5:28 PM Daniel Vetter  wrote:
> > > >
> > > > On Thu, May 04, 2023 at 03:22:59PM +, Simon Ser wrote:
> > > > > Hi all,
> > > > >
> > > > > The goal of this RFC is to expose a generic KMS uAPI to configure the 
> > > > > color
> > > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > > framebuffer and before it's blended with other planes. With this new 
> > > > > uAPI we
> > > > > aim to reduce the battery life impact of color management and HDR on 
> > > > > mobile
> > > > > devices, to improve performance and to decrease latency by skipping
> > > > > composition on the 3D engine. This proposal is the result of 
> > > > > discussions at
> > > > > the Red Hat HDR hackfest [1] which took place a few days ago. 
> > > > > Engineers
> > > > > familiar with the AMD, Intel and NVIDIA hardware have participated in 
> > > > > the
> > > > > discussion.
> > > > >
> > > > > This proposal takes a prescriptive approach instead of a descriptive 
> > > > > approach.
> > > > > Drivers describe the available hardware blocks in terms of low-level
> > > > > mathematical operations, then user-space configures each block. We 
> > > > > decided
> > > > > against a descriptive approach where user-space would provide a 
> > > > > high-level
> > > > > description of the colorspace and other parameters: we want to give 
> > > > > more
> > > > > control and flexibility to user-space, e.g. to be able to replicate 
> > > > > exactly the
> > > > > color pipeline with shaders and switch between shaders and KMS 
> > > > > pipelines
> > > > > seamlessly, and to avoid forcing user-space into a particular color 
> > > > > management
> > > > > policy.
> > > >
> > > > Ack on the prescriptive approach, but generic imo. Descriptive pretty 
> > > > much
> > > > means you need the shaders at the same api level for fallback purposes,
> > > > and we're not going to have that ever in kms. That would need something
> > > > like hwc in userspace to work.
> > >
> > > Which would be nice to have but that would be forcing a specific color
> > > pipeline on everyone and we explicitly want to avoid that. There are
> > > just too many trade-offs to consider.
> > >
> > > > And not generic in its ultimate consequence would mean we just do a blob
> > > > for a crtc with all the vendor register stuff like adf (android display
> > > > framework) does, because I really don't see a point in trying a
> > > > generic-looking-but-not vendor uapi with each color op/stage split out.
> > > >
> > > > So from very far and pure gut feeling, this seems like a good middle
> > > > ground in the uapi design space we have here.
> > >
> > > Good to hear!
> > >
> > > > > We've decided against mirroring the existing CRTC properties
> > > > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color 
> > > > > management
> > > > > pipeline can significantly differ between vendors and this approach 
> > > > > cannot
> > > > > accurately abstract all hardware. In particular, the availability, 
> > > > > ordering and
> > > > > capabilities of hardware blocks is different on each display engine. 
> > > > > So, we've
> > > > > decided to go for a highly detailed hardware capability discovery.
> > > > >
> > > > > This new uAPI should not be in conflict with existing standard KMS 
> > > > > properties,
> > > > > since there are none which control the pre-blending color pipeline at 
> > > > > the
> > > > > moment. It does conflict with any vendor-specific properties like
> > > > > NV_INPUT_COLORSPACE or the patches on the mailing list adding 
> > > > > AMD-specific
> > > > > properties. Drivers will need to either reject atomic commits 
> > > > > configuring both
> > > > > uAPIs, or alternatively we could add a DRM client cap which hides the 
> > > > > vendor
> > > > > properties and shows the new generic properties when enabled.
> > > > >
> > > > > To use this uAPI, first user-space needs to discover hardware 
> > > > > capabilities via
> > > > > KMS objects and properties, then user-space can configure the 
> > > > > hardware via an
> > > > > atomic commit. This works similarly to the existing KMS uAPI, e.g. 
> > > > > planes.
> > > > >
> > > > > Our proposal introduces a new "color_pipeline" plane property, and a 
> > > > > new KMS
> > > > > object type, "COLOROP" (short for color operation). The 
> > > > > "color_pipeline" plane
> > > > > property is an enum, each enum entry represents a color pipeline 
> > > > > supported by
> > > > > the hardware. The special zero entry indicates that the pipeline is in
> > > > > "bypass"/"no-op" mode. For instance, the following plane properties 
> > > > > describe a
> > > > > primary plane with 2 supported pipelines but currently configured in 
> > > > > bypass
> > > > > mode:
> > > > >
> > > > > 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Simon Ser
On Friday, May 5th, 2023 at 21:53, Daniel Vetter  wrote:

> On Fri, May 05, 2023 at 04:06:26PM +, Simon Ser wrote:
> > On Friday, May 5th, 2023 at 17:28, Daniel Vetter  wrote:
> >
> > > Ok no comments from me on the actual color operations and semantics of all
> > > that, because I have simply nothing to bring to that except confusion :-)
> > >
> > > Some higher level thoughts instead:
> > >
> > > - I really like that we just go with graph nodes here. I think that was
> > >   bound to happen sooner or later with kms (we almost got there with
> > >   writeback, and with hindsight maybe should have).
> >
> > I'd really rather not do graphs here. We only need linked lists as Sebastian
> > said. Graphs would significantly add more complexity to this proposal, and
> > I don't think that's a good idea unless there is a strong use-case.
> 
> You have a graph, because a graph is just nodes + links. I did _not_
> propose a full generic graph structure, the link pointer would be in the
> class/type specific structure only. Like how we have the plane->crtc or
> connector->crtc links already like that (which already _is_ a full-blown
> graph).

I really don't get why a pointer in a struct makes plane->crtc a full-blown
graph. There is only a single parent-child link. A plane has a reference to a
CRTC, and nothing more.

You could say that anything is a graph. Yes, even an isolated struct somewhere
is a graph: one with a single node and no links. But I don't see the point of
explaining everything with a graph when we only need a much simpler subset of
the concept.

Putting the graph thing aside, what are you suggesting exactly from a concrete
uAPI point-of-view? Introducing a new struct type? Would it be a colorop
specific struct, or a more generic one? What would be the fields? Why do you
think that's necessary and better than the current proposal?

My understanding so far is that you're suggesting introducing something like
this at the uAPI level:

struct drm_mode_node {
    uint32_t id;

    uint32_t children_count;
    uint32_t *children; // list of child object IDs
};

I don't think this is a good idea for multiple reasons. First, this is
overkill: we don't need this complexity, and this complexity will make it more
difficult to reason about the color pipeline. This is a premature abstraction,
one we don't need right now, and one I haven't heard a potential future
use-case for. Sure, one can kill an ant with a sledgehammer if they'd like, but
that's not the right tool for the job.

Second, this will make user-space miserable. User-space already has the tricky
task of translating its abstract descriptive color pipeline into our proposed
simple list of color operations. If we expose a full-blown graph, then the
user-space logic will need to handle arbitrary graphs. This will have a
significant cost (in implementation and testing), which we will be paying in
terms of time spent and in terms of bugs.

Last, this kind of generic "node" struct is at odds with existing KMS object
types. So far, KMS objects are concrete like CRTC, connector, plane, etc.
"Node" is abstract. This is inconsistent.

Please let me know whether the above is what you have in mind. If not, please
explain what exactly you mean by "graphs" in terms of uAPI, and please explain
why we need it and what real-world use-cases it would solve.
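For illustration, the list model is trivial for a consumer to walk. The sketch below is only that, a sketch: the in-memory dict layout, the `walk_pipeline()` helper, and the convention that a "next" of 0 terminates the list are assumptions for illustration; only the "type" and "next" property names come from the RFC.

```python
# Hypothetical in-memory model of the RFC's linked-list color pipeline.
# Property names ("type", "next") follow the RFC; the dict layout, the
# helper below, and the 0-terminator convention are illustrative only.
# Colorop IDs mirror the AMD example later in the thread.
COLOROPS = {
    42: {"type": "Matrix", "next": 43},
    43: {"type": "Scaling", "next": 44},
    44: {"type": "1D curve", "next": 45},
    45: {"type": "Matrix", "next": 0},  # assumed: 0 ends the list
}

def walk_pipeline(colorops, first_id):
    """Follow the immutable "next" links from the first colorop."""
    chain = []
    op_id = first_id
    while op_id != 0:
        chain.append((op_id, colorops[op_id]["type"]))
        op_id = colorops[op_id]["next"]
    return chain
```

No cycles, no fan-out, no child arrays: a single forward link per object is all the "graph" that is needed.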


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Pekka Paalanen
On Fri, 5 May 2023 16:04:35 -0500
Steven Kucharzyk  wrote:

> Hi,
> 
> I'm new to this list and probably can't contribute but interested, I
> passed your original posting to a friend and have enclosed his thoughts
> ... old hash or food for thought ??? I ask your forgiveness if you find
> this inappropriate. (am of the ilk * act first, ask for forgiveness
> afterward)  ;-)

Thanks, but please use reply-to-all, it's a bit painful to add back all
the other mailing lists and people.

> 
> Steven
> 
> - start 
> 
> Steven Kucharzyk wrote:
> 
> > Thought you might find this of interest.
> 
> Hi,
>   thanks for sending it to me.
> 
> Unfortunately I don't know enough about the context to say anything
> specific about it.
> 
> The best I can do is state the big picture aims I would
> look for, as someone with a background in display systems
> electronic design, rendering software development
> and Color Science. (I apologize in advance if any of this
> is preaching to the choir!)
> 
> 1) I would make sure that someone with a strong Color Science
> background was consulted in the development of the API.

Where can we find someone like that, who would also not start by saying
we cannot get anything right, or that we cannot change the old software
architecture, and would actually try to understand *our* goals and
limitations as well? Who could commit to long discussions over several
years in a *friendly* manner?

It would take extreme amounts of patience from that person.

> 2) I would be measuring the API against its ability to
> support a "profiling" color management workflow. This workflow
> allows using the full capability of a display, while also allowing
> simultaneous display of multiple sources encoded in any colorspace.
> So the basic architecture is to have a final frame buffer (real
> or virtual) in the native display's colorspace, and use any
> graphics hardware color transform and rendering capability to
> assist with the transformation of data in different source
> colorspaces into the display's native colorspace.
> 
> 3) The third thing I would be looking for, is enough
> standardization that user mode software can be written
> that will get key benefits of what's available in the hardware,
> without needing to be customized to lots of different hardware
> specifics. For instance, I'd make sure that there was a standard display
> frame buffer to display mode that applied per channel curves
> that are specified in a standard way. (i.e. make sure that there
> is an easy to use replacement for XRRCrtcGamma.)
> 
> Any API that is specific to a type or model of graphics card,
> will retard development of color management support to a very large
> degree - the financial and development costs of obtaining, configuring
> and testing against multiple graphic card makes and models puts this
> in the too hard basket for anyone other than a corporation.
> 
> Perhaps little of the above is relevant, if this is a low level API
> that is to be used by other operating system sub-systems such
> as display graphics API's like X11 or Wayland, which will choose
> specific display rendering models and implement them with the hardware
> capabilities that are available.

That is exactly what it is. It is a way to save power and gain
performance when things happen to fit in place just right: what one
needs to do matches what the dedicated color processing hardware blocks
implement.

> From a color management point of view,
> it is the operating system & UI graphics API's that are the ones that
> are desirable to work with, since they are meant to insulate
> applications from hardware details.

Indeed. Anything the display controller hardware cannot do will be
implemented by other means, e.g. on the GPU, by a display server.


Thanks,
pq




Re: [RFC] Plane color pipeline KMS uAPI

2023-05-08 Thread Pekka Paalanen
On Fri, 5 May 2023 21:51:41 +0200
Daniel Vetter  wrote:

> On Fri, May 05, 2023 at 05:57:37PM +0200, Sebastian Wick wrote:
> > On Fri, May 5, 2023 at 5:28 PM Daniel Vetter  wrote:  
> > >
> > > On Thu, May 04, 2023 at 03:22:59PM +, Simon Ser wrote:  
> > > > Hi all,
> > > >
> > > > The goal of this RFC is to expose a generic KMS uAPI to configure the 
> > > > color
> > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > framebuffer and before it's blended with other planes. With this new 
> > > > uAPI we
> > > > aim to reduce the battery life impact of color management and HDR on 
> > > > mobile
> > > > devices, to improve performance and to decrease latency by skipping
> > > > composition on the 3D engine. This proposal is the result of 
> > > > discussions at
> > > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > > familiar with the AMD, Intel and NVIDIA hardware have participated in 
> > > > the
> > > > discussion.
> > > >
> > > > This proposal takes a prescriptive approach instead of a descriptive 
> > > > approach.
> > > > Drivers describe the available hardware blocks in terms of low-level
> > > > mathematical operations, then user-space configures each block. We 
> > > > decided
> > > > against a descriptive approach where user-space would provide a 
> > > > high-level
> > > > description of the colorspace and other parameters: we want to give more
> > > > control and flexibility to user-space, e.g. to be able to replicate 
> > > > exactly the
> > > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > > seamlessly, and to avoid forcing user-space into a particular color 
> > > > management
> > > > policy.  
> > >
> > > Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> > > means you need the shaders at the same api level for fallback purposes,
> > > and we're not going to have that ever in kms. That would need something
> > > like hwc in userspace to work.  
> > 
> > Which would be nice to have but that would be forcing a specific color
> > pipeline on everyone and we explicitly want to avoid that. There are
> > just too many trade-offs to consider.
> >   
> > > And not generic in its ultimate consequence would mean we just do a blob
> > > for a crtc with all the vendor register stuff like adf (android display
> > > framework) does, because I really don't see a point in trying a
> > > generic-looking-but-not vendor uapi with each color op/stage split out.
> > >
> > > So from very far and pure gut feeling, this seems like a good middle
> > > ground in the uapi design space we have here.  
> > 
> > Good to hear!
> >   
> > > > We've decided against mirroring the existing CRTC properties
> > > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > > > pipeline can significantly differ between vendors and this approach 
> > > > cannot
> > > > accurately abstract all hardware. In particular, the availability, 
> > > > ordering and
> > > > capabilities of hardware blocks are different on each display engine. 
> > > > So, we've
> > > > decided to go for a highly detailed hardware capability discovery.
> > > >
> > > > This new uAPI should not be in conflict with existing standard KMS 
> > > > properties,
> > > > since there are none which control the pre-blending color pipeline at 
> > > > the
> > > > moment. It does conflict with any vendor-specific properties like
> > > > NV_INPUT_COLORSPACE or the patches on the mailing list adding 
> > > > AMD-specific
> > > > properties. Drivers will need to either reject atomic commits 
> > > > configuring both
> > > > uAPIs, or alternatively we could add a DRM client cap which hides the 
> > > > vendor
> > > > properties and shows the new generic properties when enabled.
> > > >
> > > > To use this uAPI, first user-space needs to discover hardware 
> > > > capabilities via
> > > > KMS objects and properties, then user-space can configure the hardware 
> > > > via an
> > > > atomic commit. This works similarly to the existing KMS uAPI, e.g. 
> > > > planes.
> > > >
> > > > Our proposal introduces a new "color_pipeline" plane property, and a 
> > > > new KMS
> > > > object type, "COLOROP" (short for color operation). The 
> > > > "color_pipeline" plane
> > > > property is an enum, each enum entry represents a color pipeline 
> > > > supported by
> > > > the hardware. The special zero entry indicates that the pipeline is in
> > > > "bypass"/"no-op" mode. For instance, the following plane properties 
> > > > describe a
> > > > primary plane with 2 supported pipelines but currently configured in 
> > > > bypass
> > > > mode:
> > > >
> > > > Plane 10
> > > > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > > > ├─ …
> > > > └─ "color_pipeline": enum {0, 42, 52} = 0  
> > >
> > > A bit confused, why is this an enum, and not just an immutable prop that
> > > points at the first element? You already can 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-07 Thread Dave Airlie
On Sat, 6 May 2023 at 08:21, Sebastian Wick  wrote:
>
> On Fri, May 5, 2023 at 10:40 PM Dave Airlie  wrote:
> >
> > On Fri, 5 May 2023 at 01:23, Simon Ser  wrote:
> > >
> > > Hi all,
> > >
> > > The goal of this RFC is to expose a generic KMS uAPI to configure the 
> > > color
> > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > framebuffer and before it's blended with other planes. With this new uAPI 
> > > we
> > > aim to reduce the battery life impact of color management and HDR on 
> > > mobile
> > > devices, to improve performance and to decrease latency by skipping
> > > composition on the 3D engine. This proposal is the result of discussions 
> > > at
> > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > discussion.
> > >
> > > This proposal takes a prescriptive approach instead of a descriptive 
> > > approach.
> > > Drivers describe the available hardware blocks in terms of low-level
> > > mathematical operations, then user-space configures each block. We decided
> > > against a descriptive approach where user-space would provide a high-level
> > > description of the colorspace and other parameters: we want to give more
> > > control and flexibility to user-space, e.g. to be able to replicate 
> > > exactly the
> > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > seamlessly, and to avoid forcing user-space into a particular color 
> > > management
> > > policy.
> >
> > I'm not 100% sold on the prescriptive here, let's see if someone can
> > get me over the line with some questions later.
> >
> > My feeling is color pipeline hw is not a done deal, and that hw
> > vendors will be revising/evolving/churning the hw blocks for a while
> > longer, as there is no real standards in the area to aim for, all the
> > vendors are mostly just doing whatever gets Windows over the line and
> > keeps hw engineers happy. So I have some concerns here around forwards
> > compatibility and hence the API design.
> >
> > I guess my main concern is if you expose a bunch of hw blocks and
> > someone comes up with a novel new thing, will all existing userspace
> > work, without falling back to shaders?
> > Do we have minimum guarantees on what hardware blocks have to be
> > exposed to build a usable pipeline?
> > If a hardware block goes away in a new silicon revision, do I have to
> > rewrite my compositor? or will it be expected that the kernel will
> > emulate the old pipelines on top of whatever new fancy thing exists.
>
> I think there are two answers to those questions.

These aren't selling me much better :-)
>
> The first one is that right now KMS already doesn't guarantee that
> every property is supported on all hardware. The guarantee we have is
> that properties that are supported on a piece of hardware on a
> specific kernel will be supported on the same hardware on later
> kernels. The color pipeline is no different here. For a specific piece
> of hardware a newer kernel might only change the pipelines in a
> backwards compatible way and add new pipelines.
>
> So to answer your question: if some hardware with a novel pipeline
> will show up it might not be supported and that's fine. We already
> have cases where some hardware does not support the gamma lut property
> but only the CSC property and that breaks night light because we never
> bothered to write a shader fallback. KMS provides ways to offload work
> but a generic user space always has to provide a fallback and this
> doesn't change. Hardware specific user space on the other hand will
> keep working with the forward compatibility guarantees we want to
> provide.

In my mind we've screwed up already; that isn't a case to be made for
continuing down the same path.

The kernel is meant to be a hardware abstraction layer, not just a
hardware exposure layer. The kernel shouldn't set policy and there are
cases where it can't act as an abstraction layer (like where you need
a compiler), but I'm not sold that this case is one of those yet. I'm
open to being educated here on why it would be.

>
> The second answer is that we want to provide a user space library
> which takes a description of a color pipeline and tries to map that to
> the available KMS color pipelines. If there is a novel color
> operation, adding support in this library would then make it possible
> to offload compatible color pipelines on this new hardware for all
> consumers of the library. Obviously there is no guarantee that
> whatever color pipeline compositors come up with can actually be
> realized on specific hardware but that's just an inherent hardware
> issue.
>

Why does this library need to be in userspace though? If there's a
library making device dependent decisions, why can't we just make
those device dependent decisions in the kernel?

This feels like we are trying to go down the Android HWC road, but we
aren't 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Sebastian Wick
On Fri, May 5, 2023 at 10:40 PM Dave Airlie  wrote:
>
> On Fri, 5 May 2023 at 01:23, Simon Ser  wrote:
> >
> > Hi all,
> >
> > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > pipeline before blending, ie. after a pixel is tapped from a plane's
> > framebuffer and before it's blended with other planes. With this new uAPI we
> > aim to reduce the battery life impact of color management and HDR on mobile
> > devices, to improve performance and to decrease latency by skipping
> > composition on the 3D engine. This proposal is the result of discussions at
> > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > discussion.
> >
> > This proposal takes a prescriptive approach instead of a descriptive 
> > approach.
> > Drivers describe the available hardware blocks in terms of low-level
> > mathematical operations, then user-space configures each block. We decided
> > against a descriptive approach where user-space would provide a high-level
> > description of the colorspace and other parameters: we want to give more
> > control and flexibility to user-space, e.g. to be able to replicate exactly 
> > the
> > color pipeline with shaders and switch between shaders and KMS pipelines
> > seamlessly, and to avoid forcing user-space into a particular color 
> > management
> > policy.
>
> I'm not 100% sold on the prescriptive here, let's see if someone can
> get me over the line with some questions later.
>
> My feeling is color pipeline hw is not a done deal, and that hw
> vendors will be revising/evolving/churning the hw blocks for a while
> longer, as there is no real standards in the area to aim for, all the
> vendors are mostly just doing whatever gets Windows over the line and
> keeps hw engineers happy. So I have some concerns here around forwards
> compatibility and hence the API design.
>
> I guess my main concern is if you expose a bunch of hw blocks and
> someone comes up with a novel new thing, will all existing userspace
> work, without falling back to shaders?
> Do we have minimum guarantees on what hardware blocks have to be
> exposed to build a usable pipeline?
> If a hardware block goes away in a new silicon revision, do I have to
> rewrite my compositor? or will it be expected that the kernel will
> emulate the old pipelines on top of whatever new fancy thing exists.

I think there are two answers to those questions.

The first one is that right now KMS already doesn't guarantee that
every property is supported on all hardware. The guarantee we have is
that properties that are supported on a piece of hardware on a
specific kernel will be supported on the same hardware on later
kernels. The color pipeline is no different here. For a specific piece
of hardware a newer kernel might only change the pipelines in a
backwards compatible way and add new pipelines.

So to answer your question: if some hardware with a novel pipeline
will show up it might not be supported and that's fine. We already
have cases where some hardware does not support the gamma lut property
but only the CSC property and that breaks night light because we never
bothered to write a shader fallback. KMS provides ways to offload work
but a generic user space always has to provide a fallback and this
doesn't change. Hardware specific user space on the other hand will
keep working with the forward compatibility guarantees we want to
provide.

The second answer is that we want to provide a user space library
which takes a description of a color pipeline and tries to map that to
the available KMS color pipelines. If there is a novel color
operation, adding support in this library would then make it possible
to offload compatible color pipelines on this new hardware for all
consumers of the library. Obviously there is no guarantee that
whatever color pipeline compositors come up with can actually be
realized on specific hardware but that's just an inherent hardware
issue.
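To sketch what that library's core matching step might look like (entirely hypothetical: no such library or API exists yet, and modeling hardware pipelines as fixed-order lists of (op_type, bypassable) tuples is an assumption for illustration):

```python
# Hypothetical sketch of the proposed user-space mapping library's core
# step. No such library exists yet; the data model is illustrative.

# Two imaginary hardware pipelines, as a driver might advertise them.
PIPELINES = [
    [("Matrix", True), ("1D curve", True)],
    [("1D curve", True), ("Matrix", True), ("1D curve", True)],
]

def map_pipeline(desired, pipelines):
    """Return the index of a pipeline that can realize `desired`, or None.

    Each hardware op either matches the next desired op, or must be
    bypassable so we can skip it.
    """
    for idx, hw in enumerate(pipelines):
        want = iter(desired)
        need = next(want, None)
        ok = True
        for op_type, bypassable in hw:
            if need == op_type:
                need = next(want, None)
            elif not bypassable:
                ok = False  # mandatory block we cannot use
                break
        if ok and need is None:
            return idx
    return None
```

If `map_pipeline()` returns None, the compositor falls back to shaders, exactly as described above.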

> We are not Android, or even Steam OS on a Steamdeck, we have to be
> able to independently update the kernel for new hardware and not
> require every compositor currently providing HDR to need to support
> new hardware blocks and models at the same time.
>
> Dave.
>



Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Dave Airlie
On Fri, 5 May 2023 at 01:23, Simon Ser  wrote:
>
> Hi all,
>
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
>
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly 
> the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.

I'm not 100% sold on the prescriptive here, let's see if someone can
get me over the line with some questions later.

My feeling is color pipeline hw is not a done deal, and that hw
vendors will be revising/evolving/churning the hw blocks for a while
longer, as there is no real standards in the area to aim for, all the
vendors are mostly just doing whatever gets Windows over the line and
keeps hw engineers happy. So I have some concerns here around forwards
compatibility and hence the API design.

I guess my main concern is if you expose a bunch of hw blocks and
someone comes up with a novel new thing, will all existing userspace
work, without falling back to shaders?
Do we have minimum guarantees on what hardware blocks have to be
exposed to build a usable pipeline?
If a hardware block goes away in a new silicon revision, do I have to
rewrite my compositor? or will it be expected that the kernel will
emulate the old pipelines on top of whatever new fancy thing exists.

We are not Android, or even Steam OS on a Steamdeck, we have to be
able to independently update the kernel for new hardware and not
require every compositor currently providing HDR to need to support
new hardware blocks and models at the same time.

Dave.


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Daniel Vetter
On Fri, May 05, 2023 at 04:06:26PM +, Simon Ser wrote:
> On Friday, May 5th, 2023 at 17:28, Daniel Vetter  wrote:
> 
> > Ok no comments from me on the actual color operations and semantics of all
> > that, because I have simply nothing to bring to that except confusion :-)
> > 
> > Some higher level thoughts instead:
> > 
> > - I really like that we just go with graph nodes here. I think that was
> >   bound to happen sooner or later with kms (we almost got there with
> >   writeback, and with hindsight maybe should have).
> 
> I'd really rather not do graphs here. We only need linked lists as Sebastian
> said. Graphs would significantly add more complexity to this proposal, and
> I don't think that's a good idea unless there is a strong use-case.

You have a graph, because a graph is just nodes + links. I did _not_
propose a full generic graph structure, the link pointer would be in the
class/type specific structure only. Like how we have the plane->crtc or
connector->crtc links already like that (which already _is_ a full-blown
graph).

Maybe explain what exactly you're thinking under "do graphs here" so I
understand what you mean differently than me?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Daniel Vetter
On Fri, May 05, 2023 at 05:57:37PM +0200, Sebastian Wick wrote:
> On Fri, May 5, 2023 at 5:28 PM Daniel Vetter  wrote:
> >
> > On Thu, May 04, 2023 at 03:22:59PM +, Simon Ser wrote:
> > > Hi all,
> > >
> > > The goal of this RFC is to expose a generic KMS uAPI to configure the 
> > > color
> > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > framebuffer and before it's blended with other planes. With this new uAPI 
> > > we
> > > aim to reduce the battery life impact of color management and HDR on 
> > > mobile
> > > devices, to improve performance and to decrease latency by skipping
> > > composition on the 3D engine. This proposal is the result of discussions 
> > > at
> > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > discussion.
> > >
> > > This proposal takes a prescriptive approach instead of a descriptive 
> > > approach.
> > > Drivers describe the available hardware blocks in terms of low-level
> > > mathematical operations, then user-space configures each block. We decided
> > > against a descriptive approach where user-space would provide a high-level
> > > description of the colorspace and other parameters: we want to give more
> > > control and flexibility to user-space, e.g. to be able to replicate 
> > > exactly the
> > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > seamlessly, and to avoid forcing user-space into a particular color 
> > > management
> > > policy.
> >
> > Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> > means you need the shaders at the same api level for fallback purposes,
> > and we're not going to have that ever in kms. That would need something
> > like hwc in userspace to work.
> 
> Which would be nice to have but that would be forcing a specific color
> pipeline on everyone and we explicitly want to avoid that. There are
> just too many trade-offs to consider.
> 
> And not generic in its ultimate consequence would mean we just do a blob
> > for a crtc with all the vendor register stuff like adf (android display
> > framework) does, because I really don't see a point in trying a
> > generic-looking-but-not vendor uapi with each color op/stage split out.
> >
> > So from very far and pure gut feeling, this seems like a good middle
> > ground in the uapi design space we have here.
> 
> Good to hear!
> 
> > > We've decided against mirroring the existing CRTC properties
> > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > > pipeline can significantly differ between vendors and this approach cannot
> > > accurately abstract all hardware. In particular, the availability, 
> > > ordering and
> > > capabilities of hardware blocks are different on each display engine. So, 
> > > we've
> > > decided to go for a highly detailed hardware capability discovery.
> > >
> > > This new uAPI should not be in conflict with existing standard KMS 
> > > properties,
> > > since there are none which control the pre-blending color pipeline at the
> > > moment. It does conflict with any vendor-specific properties like
> > > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > > properties. Drivers will need to either reject atomic commits configuring 
> > > both
> > > uAPIs, or alternatively we could add a DRM client cap which hides the 
> > > vendor
> > > properties and shows the new generic properties when enabled.
> > >
> > > To use this uAPI, first user-space needs to discover hardware 
> > > capabilities via
> > > KMS objects and properties, then user-space can configure the hardware 
> > > via an
> > > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> > >
> > > Our proposal introduces a new "color_pipeline" plane property, and a new 
> > > KMS
> > > object type, "COLOROP" (short for color operation). The "color_pipeline" 
> > > plane
> > > property is an enum, each enum entry represents a color pipeline 
> > > supported by
> > > the hardware. The special zero entry indicates that the pipeline is in
> > > "bypass"/"no-op" mode. For instance, the following plane properties 
> > > describe a
> > > primary plane with 2 supported pipelines but currently configured in 
> > > bypass
> > > mode:
> > >
> > > Plane 10
> > > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > > ├─ …
> > > └─ "color_pipeline": enum {0, 42, 52} = 0
> >
> > A bit confused, why is this an enum, and not just an immutable prop that
> > points at the first element? You already can disable elements with the
> > bypass thing, also bypassing by changing the pointers to the next node in
> > the graph seems a bit confusing and redundant.
> 
> We want to allow multiple pipelines to exist and a plane can choose
> the pipeline by selecting the first element of the pipeline. The enum
> here lists all the possible 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Joshua Ashton




On 5/5/23 15:16, Pekka Paalanen wrote:

On Fri, 5 May 2023 14:30:11 +0100
Joshua Ashton  wrote:


Some corrections and replies inline.

On Fri, 5 May 2023 at 12:42, Pekka Paalanen  wrote:


On Thu, 04 May 2023 15:22:59 +
Simon Ser  wrote:


...


To wrap things up, let's take a real-world example: how would gamescope [2]
configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].

AMD would expose the following objects and properties:

 Plane 10
 ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
 └─ "color_pipeline": enum {0, 42} = 0
 Color operation 42 (input CSC)
 ├─ "type": enum {Bypass, Matrix} = Matrix
 ├─ "matrix_data": blob
 └─ "next": immutable color operation ID = 43
 Color operation 43
 ├─ "type": enum {Scaling} = Scaling
 └─ "next": immutable color operation ID = 44
 Color operation 44 (DeGamma)
 ├─ "type": enum {Bypass, 1D curve} = 1D curve
 ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
 └─ "next": immutable color operation ID = 45


Some vendors have per-tap degamma and some have a degamma after the sample.
How do we distinguish that behaviour?
It is important to know.


...


Btw. ISTR that if you want to do scaling properly with alpha channel,
you need optical values multiplied by alpha. Alpha vs. scaling is just
yet another thing to look into, and TF operations do not work with
pre-mult.


What are your concerns here?


I believe this is exactly the same question as yours about sampling, at
least for up-scaling where sampling the framebuffer interpolates in
some way.

Oh, interpolation mode would fit in the scaling COLOROP...


Having pre-multiplied alpha is fine with a TF: the alpha was
premultiplied in linear, then encoded with the TF by the client.


There are two different ways to pre-multiply: into optical values
(okay), and into electrical values (what everyone actually does, and
what Wayland assumes by default).

What you described is the thing mostly no-one does in GUI graphics.
Even in the web.


Yeah, I have seen this problem many times before in different fields.

There are not many transparent clients that I know of (most of them are 
Gamescope Overlays), but the ones I do know of do actually do the 
premultiply in linear space (mainly because they use sRGB image views 
for their color attachments so it gets handled for them).


From my perspective and experience, we definitely shouldn't do anything 
to try and 'fix' apps doing their premultiply in the wrong space.


I've had to deal with this before in game development on a transparent 
HUD, and my solution and thinking for that was:
It was authored (or "mastered") with this behaviour in mind. So that's 
what we should do.
It felt bad to 'break' the blending on the HUD of that game, but it 
looked better, and it was what was intended before it was 'fixed' in a 
later engine version.


It is still definitely interesting to think about, but I don't think it 
presents a problem at all.

In fact, doing anything would just 'break' the expected behaviour of apps.




If you think of a TF as something relative to a bunch of
reference state or whatever then you might think "oh you can't do
that!", but you really can.
It's really best to just think of it as a mathematical encoding of a
value in all instances that we touch.


True, except when it's false. If you assume that decoding is the exact
mathematical inverse of encoding, then your conclusion follows.

Unfortunately many video standards do not have it so. BT.601, BT.709,
and I forget if BT.2020 (SDR) as well encode with one function and
decode with something that is not the inverse, and it is totally
intentional and necessary mangling of the values to get the expected
result on screen. Someone has called this "implicit color management".

So one needs to be very careful here what the actual characteristics
are.


The only issue is that you lose precision from having pre-multiplied
alpha as it's quantized to fit into the DRM format rather than using
the full range then getting divided by the alpha at blend time.
It doesn't end up being a visible issue ever however in my experience, at 8bpc.


That's true. Wait, why would you divide by alpha for blending?
Blending/interpolation is the only operation where pre-mult is useful.


I mis-spoke, I meant multiply.

- Joshie ✨




Thanks,
pq



Thanks
  - Joshie ✨




Thanks,
pq
  


I hope comparing these properties to the diagrams linked above can help
understand how the uAPI would be used and give an idea of its viability.

Please feel free to provide feedback! It would be especially useful to have
someone familiar with Arm SoCs look at this, to confirm that this proposal
would work there.

Unless there is a show-stopper, we plan to follow up this RFC with
implementations for AMD, Intel, NVIDIA, gamescope, and IGT.

Many thanks to everybody who contributed to the hackfest, on-site or remotely!

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Simon Ser
On Friday, May 5th, 2023 at 17:28, Daniel Vetter  wrote:

> Ok no comments from me on the actual color operations and semantics of all
> that, because I have simply nothing to bring to that except confusion :-)
> 
> Some higher level thoughts instead:
> 
> - I really like that we just go with graph nodes here. I think that was
>   bound to happen sooner or later with kms (we almost got there with
>   writeback, and with hindsight maybe should have).

I'd really rather not do graphs here. We only need linked lists as Sebastian
said. Graphs would significantly add more complexity to this proposal, and
I don't think that's a good idea unless there is a strong use-case.


Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Sebastian Wick
On Fri, May 5, 2023 at 5:28 PM Daniel Vetter  wrote:
>
> On Thu, May 04, 2023 at 03:22:59PM +, Simon Ser wrote:
> > Hi all,
> >
> > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > pipeline before blending, ie. after a pixel is tapped from a plane's
> > framebuffer and before it's blended with other planes. With this new uAPI we
> > aim to reduce the battery life impact of color management and HDR on mobile
> > devices, to improve performance and to decrease latency by skipping
> > composition on the 3D engine. This proposal is the result of discussions at
> > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > discussion.
> >
> > This proposal takes a prescriptive approach instead of a descriptive 
> > approach.
> > Drivers describe the available hardware blocks in terms of low-level
> > mathematical operations, then user-space configures each block. We decided
> > against a descriptive approach where user-space would provide a high-level
> > description of the colorspace and other parameters: we want to give more
> > control and flexibility to user-space, e.g. to be able to replicate exactly 
> > the
> > color pipeline with shaders and switch between shaders and KMS pipelines
> > seamlessly, and to avoid forcing user-space into a particular color 
> > management
> > policy.
>
> Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> means you need the shaders at the same api level for fallback purposes,
> and we're not going to have that ever in kms. That would need something
> like hwc in userspace to work.

Which would be nice to have but that would be forcing a specific color
pipeline on everyone and we explicitly want to avoid that. There are
just too many trade-offs to consider.

> And not generic in its ultimate consequence would mean we just do a blob
> for a crtc with all the vendor register stuff like adf (android display
> framework) does, because I really don't see a point in trying a
> generic-looking-but-not vendor uapi with each color op/stage split out.
>
> So from very far and pure gut feeling, this seems like a good middle
> ground in the uapi design space we have here.

Good to hear!

> > We've decided against mirroring the existing CRTC properties
> > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > pipeline can significantly differ between vendors and this approach cannot
> > accurately abstract all hardware. In particular, the availability, ordering 
> > and
> > capabilities of hardware blocks is different on each display engine. So, 
> > we've
> > decided to go for a highly detailed hardware capability discovery.
> >
> > This new uAPI should not be in conflict with existing standard KMS 
> > properties,
> > since there are none which control the pre-blending color pipeline at the
> > moment. It does conflict with any vendor-specific properties like
> > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > properties. Drivers will need to either reject atomic commits configuring 
> > both
> > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > properties and shows the new generic properties when enabled.
> >
> > To use this uAPI, first user-space needs to discover hardware capabilities 
> > via
> > KMS objects and properties, then user-space can configure the hardware via 
> > an
> > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> >
> > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > object type, "COLOROP" (short for color operation). The "color_pipeline" 
> > plane
> > property is an enum, each enum entry represents a color pipeline supported 
> > by
> > the hardware. The special zero entry indicates that the pipeline is in
> > "bypass"/"no-op" mode. For instance, the following plane properties 
> > describe a
> > primary plane with 2 supported pipelines but currently configured in bypass
> > mode:
> >
> > Plane 10
> > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > ├─ …
> > └─ "color_pipeline": enum {0, 42, 52} = 0
>
> A bit confused, why is this an enum, and not just an immutable prop that
> points at the first element? You already can disable elements with the
> bypass thing, also bypassing by changing the pointers to the next node in
> the graph seems a bit confusing and redundant.

We want to allow multiple pipelines to exist and a plane can choose
the pipeline by selecting the first element of the pipeline. The enum
here lists all the possible pipelines that can be attached to the
surface.

> > The non-zero entries describe color pipelines as a linked list of COLOROP 
> > KMS
> > objects. The entry value is an object ID pointing to the head of the linked
> > list (the first operation in the color pipeline).
> >
> > The new COLOROP objects also expose a number of KMS properties.

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Daniel Vetter
On Thu, May 04, 2023 at 03:22:59PM +, Simon Ser wrote:
> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
> 
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly 
> the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.

Ack on the prescriptive approach, but generic imo. Descriptive pretty much
means you need the shaders at the same api level for fallback purposes,
and we're not going to have that ever in kms. That would need something
like hwc in userspace to work.

And not generic in its ultimate consequence would mean we just do a blob
for a crtc with all the vendor register stuff like adf (android display
framework) does, because I really don't see a point in trying a
generic-looking-but-not vendor uapi with each color op/stage split out.

So from very far and pure gut feeling, this seems like a good middle
ground in the uapi design space we have here.

> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering 
> and
> capabilities of hardware blocks is different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.
> 
> To use this uAPI, first user-space needs to discover hardware capabilities via
> KMS objects and properties, then user-space can configure the hardware via an
> atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> 
> Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> property is an enum, each enum entry represents a color pipeline supported by
> the hardware. The special zero entry indicates that the pipeline is in
> "bypass"/"no-op" mode. For instance, the following plane properties describe a
> primary plane with 2 supported pipelines but currently configured in bypass
> mode:
> 
> Plane 10
> ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> ├─ …
> └─ "color_pipeline": enum {0, 42, 52} = 0

A bit confused, why is this an enum, and not just an immutable prop that
points at the first element? You already can disable elements with the
bypass thing, also bypassing by changing the pointers to the next node in
the graph seems a bit confusing and redundant.

> The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> objects. The entry value is an object ID pointing to the head of the linked
> list (the first operation in the color pipeline).
> 
> The new COLOROP objects also expose a number of KMS properties. Each has a
> type, a reference to the next COLOROP object in the linked list, and other
> type-specific properties. Here is an example for a 1D LUT operation:

Ok no comments from me on the actual color operations and semantics of all
that, because I have simply nothing to bring to that except confusion :-)

Some higher level thoughts instead:

- I really like that we just go with graph nodes here. I think that was
  bound to happen sooner or later with kms (we almost got there with
  writeback, and with hindsight maybe should have).

- Since there's other use-cases for graph nodes (maybe scaler modes, or
  histogram 

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Pekka Paalanen
On Fri, 5 May 2023 14:30:11 +0100
Joshua Ashton  wrote:

> Some corrections and replies inline.
> 
> On Fri, 5 May 2023 at 12:42, Pekka Paalanen  wrote:
> >
> > On Thu, 04 May 2023 15:22:59 +
> > Simon Ser  wrote:

...

> > > To wrap things up, let's take a real-world example: how would gamescope 
> > > [2]
> > > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope 
> > > color
> > > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in 
> > > [4].
> > >
> > > AMD would expose the following objects and properties:
> > >
> > > Plane 10
> > > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > > └─ "color_pipeline": enum {0, 42} = 0
> > > Color operation 42 (input CSC)
> > > ├─ "type": enum {Bypass, Matrix} = Matrix
> > > ├─ "matrix_data": blob
> > > └─ "next": immutable color operation ID = 43
> > > Color operation 43
> > > ├─ "type": enum {Scaling} = Scaling
> > > └─ "next": immutable color operation ID = 44
> > > Color operation 44 (DeGamma)
> > > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > > ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > > └─ "next": immutable color operation ID = 45  
> 
> Some vendors have per-tap degamma and some have a degamma after the sample.
> How do we distinguish that behaviour?
> It is important to know.

...

> > Btw. ISTR that if you want to do scaling properly with alpha channel,
> > you need optical values multiplied by alpha. Alpha vs. scaling is just
> > yet another thing to look into, and TF operations do not work with
> > pre-mult.  
> 
> What are your concerns here?

I believe this is exactly the same question as yours about sampling, at
least for up-scaling where sampling the framebuffer interpolates in
some way.

Oh, interpolation mode would fit in the scaling COLOROP...

> Having pre-multiplied alpha is fine with a TF: the alpha was
> premultiplied in linear, then encoded with the TF by the client.

There are two different ways to pre-multiply: into optical values
(okay), and into electrical values (what everyone actually does, and
what Wayland assumes by default).

What you described is the thing mostly no-one does in GUI graphics.
Even in the web.

> If you think of a TF as something relative to a bunch of
> reference state or whatever then you might think "oh you can't do
> that!", but you really can.
> It's really best to just think of it as a mathematical encoding of a
> value in all instances that we touch.

True, except when it's false. If you assume that decoding is the exact
mathematical inverse of encoding, then your conclusion follows.

Unfortunately many video standards do not have it so. BT.601, BT.709,
and I forget if BT.2020 (SDR) as well encode with one function and
decode with something that is not the inverse, and it is totally
intentional and necessary mangling of the values to get the expected
result on screen. Someone has called this "implicit color management".

So one needs to be very careful here what the actual characteristics
are.

> The only issue is that you lose precision from having pre-multiplied
> alpha as it's quantized to fit into the DRM format rather than using
> the full range then getting divided by the alpha at blend time.
> It doesn't end up being a visible issue ever however in my experience, at 
> 8bpc.

That's true. Wait, why would you divide by alpha for blending?
Blending/interpolation is the only operation where pre-mult is useful.


Thanks,
pq

> 
> Thanks
>  - Joshie ✨
> 
> >
> >
> > Thanks,
> > pq
> >  
> > >
> > > I hope comparing these properties to the diagrams linked above can help
> > > understand how the uAPI would be used and give an idea of its viability.
> > >
> > > Please feel free to provide feedback! It would be especially useful to 
> > > have
> > > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > > would work there.
> > >
> > > Unless there is a show-stopper, we plan to follow up this RFC with
> > > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> > >
> > > Many thanks to everybody who contributed to the hackfest, on-site or 
> > > remotely!
> > > Let's work together to make this happen!
> > >
> > > Simon, on behalf of the hackfest participants
> > >
> > > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > > [2]: https://github.com/ValveSoftware/gamescope
> > > [3]: 
> > > https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg  
> >  





Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Joshua Ashton
Some corrections and replies inline.

On Fri, 5 May 2023 at 12:42, Pekka Paalanen  wrote:
>
> On Thu, 04 May 2023 15:22:59 +
> Simon Ser  wrote:
>
> > Hi all,
> >
> > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > pipeline before blending, ie. after a pixel is tapped from a plane's
> > framebuffer and before it's blended with other planes. With this new uAPI we
> > aim to reduce the battery life impact of color management and HDR on mobile
> > devices, to improve performance and to decrease latency by skipping
> > composition on the 3D engine. This proposal is the result of discussions at
> > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > discussion.
>
> Hi Simon,
>
> this is an excellent write-up, thank you!
>
> Harry's question about what constitutes UAPI is a good one for danvet.
>
> I don't really have much to add here, a couple inline comments. I think
> this could work.
>
> >
> > This proposal takes a prescriptive approach instead of a descriptive 
> > approach.
> > Drivers describe the available hardware blocks in terms of low-level
> > mathematical operations, then user-space configures each block. We decided
> > against a descriptive approach where user-space would provide a high-level
> > description of the colorspace and other parameters: we want to give more
> > control and flexibility to user-space, e.g. to be able to replicate exactly 
> > the
> > color pipeline with shaders and switch between shaders and KMS pipelines
> > seamlessly, and to avoid forcing user-space into a particular color 
> > management
> > policy.
> >
> > We've decided against mirroring the existing CRTC properties
> > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > pipeline can significantly differ between vendors and this approach cannot
> > accurately abstract all hardware. In particular, the availability, ordering 
> > and
> > capabilities of hardware blocks is different on each display engine. So, 
> > we've
> > decided to go for a highly detailed hardware capability discovery.
> >
> > This new uAPI should not be in conflict with existing standard KMS 
> > properties,
> > since there are none which control the pre-blending color pipeline at the
> > moment. It does conflict with any vendor-specific properties like
> > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > properties. Drivers will need to either reject atomic commits configuring 
> > both
> > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > properties and shows the new generic properties when enabled.
> >
> > To use this uAPI, first user-space needs to discover hardware capabilities 
> > via
> > KMS objects and properties, then user-space can configure the hardware via 
> > an
> > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> >
> > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > object type, "COLOROP" (short for color operation). The "color_pipeline" 
> > plane
> > property is an enum, each enum entry represents a color pipeline supported 
> > by
> > the hardware. The special zero entry indicates that the pipeline is in
> > "bypass"/"no-op" mode. For instance, the following plane properties 
> > describe a
> > primary plane with 2 supported pipelines but currently configured in bypass
> > mode:
> >
> > Plane 10
> > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > ├─ …
> > └─ "color_pipeline": enum {0, 42, 52} = 0
> >
> > The non-zero entries describe color pipelines as a linked list of COLOROP 
> > KMS
> > objects. The entry value is an object ID pointing to the head of the linked
> > list (the first operation in the color pipeline).
> >
> > The new COLOROP objects also expose a number of KMS properties. Each has a
> > type, a reference to the next COLOROP object in the linked list, and other
> > type-specific properties. Here is an example for a 1D LUT operation:
> >
> > Color operation 42
> > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> > ├─ "lut_size": immutable range = 4096
> > ├─ "lut_data": blob
> > └─ "next": immutable color operation ID = 43
> >
> > To configure this hardware block, user-space can fill a KMS blob with 4096 
> > u32
> > entries, then set "lut_data" to the blob ID. Other color operation types 
> > might
> > have different properties.
> >
> > Here is another example with a 3D LUT:
> >
> > Color operation 42
> > ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > ├─ "lut_size": immutable range = 33
> > ├─ "lut_data": blob
> > └─ "next": immutable color operation ID = 43
> >
> > And one last example with a matrix:
> >
> > Color operation 42
> > ├─ "type": enum {Bypass, Matrix} = Matrix
> > ├─ "matrix_data": blob
> > └─ "next": immutable color operation ID = 43

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-05 Thread Pekka Paalanen
On Thu, 04 May 2023 15:22:59 +
Simon Ser  wrote:

> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.

Hi Simon,

this is an excellent write-up, thank you!

Harry's question about what constitutes UAPI is a good one for danvet.

I don't really have much to add here, a couple inline comments. I think
this could work.

> 
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly 
> the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.
> 
> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering 
> and
> capabilities of hardware blocks is different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.
> 
> To use this uAPI, first user-space needs to discover hardware capabilities via
> KMS objects and properties, then user-space can configure the hardware via an
> atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> 
> Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> property is an enum, each enum entry represents a color pipeline supported by
> the hardware. The special zero entry indicates that the pipeline is in
> "bypass"/"no-op" mode. For instance, the following plane properties describe a
> primary plane with 2 supported pipelines but currently configured in bypass
> mode:
> 
> Plane 10
> ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> ├─ …
> └─ "color_pipeline": enum {0, 42, 52} = 0
> 
> The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> objects. The entry value is an object ID pointing to the head of the linked
> list (the first operation in the color pipeline).
> 
> The new COLOROP objects also expose a number of KMS properties. Each has a
> type, a reference to the next COLOROP object in the linked list, and other
> type-specific properties. Here is an example for a 1D LUT operation:
> 
> Color operation 42
> ├─ "type": enum {Bypass, 1D curve} = 1D curve
> ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> ├─ "lut_size": immutable range = 4096
> ├─ "lut_data": blob
> └─ "next": immutable color operation ID = 43
> 
> To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> entries, then set "lut_data" to the blob ID. Other color operation types might
> have different properties.
> 
> Here is another example with a 3D LUT:
> 
> Color operation 42
> ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> ├─ "lut_size": immutable range = 33
> ├─ "lut_data": blob
> └─ "next": immutable color operation ID = 43
> 
> And one last example with a matrix:
> 
> Color operation 42
> ├─ "type": enum {Bypass, Matrix} = Matrix
> ├─ "matrix_data": blob
> └─ "next": immutable color operation ID = 43
> 
> [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> blocks which can be bypassed instead.]
> 
> [Jonas note: perhaps a single "data" property for both LUTs and matrices
> would make more sense. And a "size" prop for both 1D and 3D LUTs.]

Re: [RFC] Plane color pipeline KMS uAPI

2023-05-04 Thread Harry Wentland




On 5/4/23 11:22, Simon Ser wrote:

Hi all,

The goal of this RFC is to expose a generic KMS uAPI to configure the color
pipeline before blending, ie. after a pixel is tapped from a plane's
framebuffer and before it's blended with other planes. With this new uAPI we
aim to reduce the battery life impact of color management and HDR on mobile
devices, to improve performance and to decrease latency by skipping
composition on the 3D engine. This proposal is the result of discussions at
the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
familiar with the AMD, Intel and NVIDIA hardware have participated in the
discussion.



Thanks for typing this up. It does a great job describing the vision.


This proposal takes a prescriptive approach instead of a descriptive approach.
Drivers describe the available hardware blocks in terms of low-level
mathematical operations, then user-space configures each block. We decided
against a descriptive approach where user-space would provide a high-level
description of the colorspace and other parameters: we want to give more
control and flexibility to user-space, e.g. to be able to replicate exactly the
color pipeline with shaders and switch between shaders and KMS pipelines
seamlessly, and to avoid forcing user-space into a particular color management
policy.

We've decided against mirroring the existing CRTC properties
DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
pipeline can significantly differ between vendors and this approach cannot
accurately abstract all hardware. In particular, the availability, ordering and
capabilities of hardware blocks is different on each display engine. So, we've
decided to go for a highly detailed hardware capability discovery.

This new uAPI should not be in conflict with existing standard KMS properties,
since there are none which control the pre-blending color pipeline at the
moment. It does conflict with any vendor-specific properties like
NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
properties. Drivers will need to either reject atomic commits configuring both
uAPIs, or alternatively we could add a DRM client cap which hides the vendor
properties and shows the new generic properties when enabled.

To use this uAPI, first user-space needs to discover hardware capabilities via
KMS objects and properties, then user-space can configure the hardware via an
atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.

Our proposal introduces a new "color_pipeline" plane property, and a new KMS
object type, "COLOROP" (short for color operation). The "color_pipeline" plane
property is an enum, each enum entry represents a color pipeline supported by
the hardware. The special zero entry indicates that the pipeline is in
"bypass"/"no-op" mode. For instance, the following plane properties describe a
primary plane with 2 supported pipelines but currently configured in bypass
mode:

 Plane 10
 ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
 ├─ …
 └─ "color_pipeline": enum {0, 42, 52} = 0

The non-zero entries describe color pipelines as a linked list of COLOROP KMS
objects. The entry value is an object ID pointing to the head of the linked
list (the first operation in the color pipeline).

The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

 Color operation 42
 ├─ "type": enum {Bypass, 1D curve} = 1D curve
 ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
 ├─ "lut_size": immutable range = 4096
 ├─ "lut_data": blob
 └─ "next": immutable color operation ID = 43

To configure this hardware block, user-space can fill a KMS blob with 4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types might
have different properties.

Here is another example with a 3D LUT:

 Color operation 42
 ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
 ├─ "lut_size": immutable range = 33
 ├─ "lut_data": blob
 └─ "next": immutable color operation ID = 43

And one last example with a matrix:

 Color operation 42
 ├─ "type": enum {Bypass, Matrix} = Matrix
 ├─ "matrix_data": blob
 └─ "next": immutable color operation ID = 43

[Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
blocks which can be bypassed instead.]

I would favor a "bypass" boolean property.

[Jonas note: perhaps a single "data" property for both LUTs and matrices
would make more sense. And a "size" prop for both 1D and 3D LUTs.]

I concur. We'll probably want to document for which types a property
applies.
If some hardware supports re-ordering operations in the color pipeline, the
driver can expose multiple pipelines with different operation ordering, and
user-space can pick the ordering it prefers by selecting the right pipeline.

[RFC] Plane color pipeline KMS uAPI

2023-05-04 Thread Simon Ser
Hi all,

The goal of this RFC is to expose a generic KMS uAPI to configure the color
pipeline before blending, i.e. after a pixel is tapped from a plane's
framebuffer and before it's blended with other planes. With this new uAPI we
aim to reduce the battery life impact of color management and HDR on mobile
devices, to improve performance and to decrease latency by skipping
composition on the 3D engine. This proposal is the result of discussions at
the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
familiar with the AMD, Intel and NVIDIA hardware have participated in the
discussion.

This proposal takes a prescriptive approach instead of a descriptive approach.
Drivers describe the available hardware blocks in terms of low-level
mathematical operations, then user-space configures each block. We decided
against a descriptive approach where user-space would provide a high-level
description of the colorspace and other parameters: we want to give more
control and flexibility to user-space, e.g. to be able to replicate exactly the
color pipeline with shaders and switch between shaders and KMS pipelines
seamlessly, and to avoid forcing user-space into a particular color management
policy.

We've decided against mirroring the existing CRTC properties
DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
pipeline can significantly differ between vendors and this approach cannot
accurately abstract all hardware. In particular, the availability, ordering and
capabilities of hardware blocks are different on each display engine. So, we've
decided to go for a highly detailed hardware capability discovery.

This new uAPI should not be in conflict with existing standard KMS properties,
since there are none which control the pre-blending color pipeline at the
moment. It does conflict with any vendor-specific properties like
NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
properties. Drivers will need to either reject atomic commits configuring both
uAPIs, or alternatively we could add a DRM client cap which hides the vendor
properties and shows the new generic properties when enabled.

To use this uAPI, first user-space needs to discover hardware capabilities via
KMS objects and properties, then user-space can configure the hardware via an
atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.

Our proposal introduces a new "color_pipeline" plane property, and a new KMS
object type, "COLOROP" (short for color operation). The "color_pipeline" plane
property is an enum; each entry represents a color pipeline supported by
the hardware. The special zero entry indicates that the pipeline is in
"bypass"/"no-op" mode. For instance, the following plane properties describe a
primary plane with 2 supported pipelines but currently configured in bypass
mode:

Plane 10
├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
├─ …
└─ "color_pipeline": enum {0, 42, 52} = 0

The non-zero entries describe color pipelines as a linked list of COLOROP KMS
objects. The entry value is an object ID pointing to the head of the linked
list (the first operation in the color pipeline).

The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

Color operation 42
├─ "type": enum {Bypass, 1D curve} = 1D curve
├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
├─ "lut_size": immutable range = 4096
├─ "lut_data": blob
└─ "next": immutable color operation ID = 43

To configure this hardware block, user-space can fill a KMS blob with 4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types might
have different properties.

Here is another example with a 3D LUT:

Color operation 42
├─ "type": enum {Bypass, 3D LUT} = 3D LUT
├─ "lut_size": immutable range = 33
├─ "lut_data": blob
└─ "next": immutable color operation ID = 43

And one last example with a matrix:

Color operation 42
├─ "type": enum {Bypass, Matrix} = Matrix
├─ "matrix_data": blob
└─ "next": immutable color operation ID = 43

[Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
blocks which can be bypassed instead.]

[Jonas note: perhaps a single "data" property for both LUTs and matrices
would make more sense. And a "size" prop for both 1D and 3D LUTs.]

If some hardware supports re-ordering operations in the color pipeline, the
driver can expose multiple pipelines with different operation ordering, and
user-space can pick the ordering it prefers by selecting the right pipeline.
The same scheme can be used to expose hardware blocks supporting multiple
precision levels.

That's pretty much all there is to it, but as always the devil is in the
details.