Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Marek Olšák
On Fri, Oct 23, 2015 at 7:06 PM, Bas Nieuwenhuizen
 wrote:
> On Fri, Oct 23, 2015 at 4:57 PM, Marek Olšák  wrote:
>> On Fri, Oct 23, 2015 at 1:57 PM, Bas Nieuwenhuizen
>>  wrote:
>>> On Fri, Oct 23, 2015 at 1:52 PM, Marek Olšák  wrote:
 On Fri, Oct 23, 2015 at 1:30 PM, Bas Nieuwenhuizen
  wrote:
> On Fri, Oct 23, 2015 at 12:50 PM, Marek Olšák  wrote:
>> On Fri, Oct 23, 2015 at 12:17 PM, Bas Nieuwenhuizen
>>  wrote:
>>> On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
> b/src/gallium/drivers/radeonsi/si_descriptors.c
> index 5548cba3..a277fa5 100644
> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
> @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct 
> pipe_context *ctx,
> } else {
> samplers->depth_texture_mask &= ~(1 
> << slot);
> }
> -   if (rtex->cmask.size || rtex->fmask.size) {
> +   if (rtex->cmask.size || rtex->fmask.size || 
> rtex->surface.dcc_enabled) {
> samplers->compressed_colortex_mask |= 
> 1 << slot;

 I'd like this flag to be set only when dirty_level_mask is non-zero.
 Setting this for all textures that have DCC is quite expensive in draw
 calls.
>>>
>>> I think this code is incorrect even without considering DCC. If we do
>>> a fast clear on a surface which allocates a cmask and then use that
>>> surface as a texture without calling set_sampler_views in between
>>> (because it was bound before) we get a stale compressed_colortex_mask.
>>>
>>> Some testing shows that this can be triggered using OpenGL, although
>>> the GL_ARB_texture_barrier extension may be needed to make the result
>>> not undefined per the specification.
>>
>> In that case, we should decompress in texture_barrier and not in draw 
>> calls.
>>
>> Marek
>
>
> texture_barrier does not need to be called though, the language
> changes might be needed.
>
> Basically the test is
>
> fbo1, fbo2 framebuffers with 1 color buffer each:
>
> bind fbo2 as texture
> clear fbo1 using shader
> bind fbo1 as texture
> clear fbo2 using shader
> clear fbo1 using clear (which results in cmask being allocated for fbo1)
>>>
> bind fbo2 as texture
> copy fbo2 to fbo1 using copy shader (which wrongly does not decompress 
> fbo1)
>>>
>>> My apologies, these two lines should just be a copy fbo1 to fbo2,
>>> which does need to eleminate the cmask fast clear.
>>
>> That sounds like a texture barrier is required.
>>
>> Marek
>
> I think it valid if even without ARB_texture_barrier as the only place
> where we could have a rendering feedback loop is the clear. The shader
> clears and the copy do not have the same fbo as texture and therefore
> no render feedback loop.
>
> I am not sure if a clear classifies as a GL rendering operation. If it
> is not, we have no render feedback loop. If it is, it is still not a
> render feedback loop as the active fragment and vertex shaders (the
> clear shader) do not contain instructions that sample from that
> texture.

The texture barrier ensures that the previous writes are visible to
the next read of the texture. The previous reads are irrelevant.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Bas Nieuwenhuizen
On Fri, Oct 23, 2015 at 4:57 PM, Marek Olšák  wrote:
> On Fri, Oct 23, 2015 at 1:57 PM, Bas Nieuwenhuizen
>  wrote:
>> On Fri, Oct 23, 2015 at 1:52 PM, Marek Olšák  wrote:
>>> On Fri, Oct 23, 2015 at 1:30 PM, Bas Nieuwenhuizen
>>>  wrote:
 On Fri, Oct 23, 2015 at 12:50 PM, Marek Olšák  wrote:
> On Fri, Oct 23, 2015 at 12:17 PM, Bas Nieuwenhuizen
>  wrote:
>> On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
 diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
 b/src/gallium/drivers/radeonsi/si_descriptors.c
 index 5548cba3..a277fa5 100644
 --- a/src/gallium/drivers/radeonsi/si_descriptors.c
 +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
 @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct 
 pipe_context *ctx,
 } else {
 samplers->depth_texture_mask &= ~(1 << 
 slot);
 }
 -   if (rtex->cmask.size || rtex->fmask.size) {
 +   if (rtex->cmask.size || rtex->fmask.size || 
 rtex->surface.dcc_enabled) {
 samplers->compressed_colortex_mask |= 
 1 << slot;
>>>
>>> I'd like this flag to be set only when dirty_level_mask is non-zero.
>>> Setting this for all textures that have DCC is quite expensive in draw
>>> calls.
>>
>> I think this code is incorrect even without considering DCC. If we do
>> a fast clear on a surface which allocates a cmask and then use that
>> surface as a texture without calling set_sampler_views in between
>> (because it was bound before) we get a stale compressed_colortex_mask.
>>
>> Some testing shows that this can be triggered using OpenGL, although
>> the GL_ARB_texture_barrier extension may be needed to make the result
>> not undefined per the specification.
>
> In that case, we should decompress in texture_barrier and not in draw 
> calls.
>
> Marek


 texture_barrier does not need to be called though, the language
 changes might be needed.

 Basically the test is

 fbo1, fbo2 framebuffers with 1 color buffer each:

 bind fbo2 as texture
 clear fbo1 using shader
 bind fbo1 as texture
 clear fbo2 using shader
 clear fbo1 using clear (which results in cmask being allocated for fbo1)
>>
 bind fbo2 as texture
 copy fbo2 to fbo1 using copy shader (which wrongly does not decompress 
 fbo1)
>>
>> My apologies, these two lines should just be a copy fbo1 to fbo2,
>> which does need to eleminate the cmask fast clear.
>
> That sounds like a texture barrier is required.
>
> Marek

I think it valid if even without ARB_texture_barrier as the only place
where we could have a rendering feedback loop is the clear. The shader
clears and the copy do not have the same fbo as texture and therefore
no render feedback loop.

I am not sure if a clear classifies as a GL rendering operation. If it
is not, we have no render feedback loop. If it is, it is still not a
render feedback loop as the active fragment and vertex shaders (the
clear shader) do not contain instructions that sample from that
texture.

- Bas
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Marek Olšák
On Fri, Oct 23, 2015 at 1:57 PM, Bas Nieuwenhuizen
 wrote:
> On Fri, Oct 23, 2015 at 1:52 PM, Marek Olšák  wrote:
>> On Fri, Oct 23, 2015 at 1:30 PM, Bas Nieuwenhuizen
>>  wrote:
>>> On Fri, Oct 23, 2015 at 12:50 PM, Marek Olšák  wrote:
 On Fri, Oct 23, 2015 at 12:17 PM, Bas Nieuwenhuizen
  wrote:
> On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
>>> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
>>> b/src/gallium/drivers/radeonsi/si_descriptors.c
>>> index 5548cba3..a277fa5 100644
>>> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
>>> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>>> @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct 
>>> pipe_context *ctx,
>>> } else {
>>> samplers->depth_texture_mask &= ~(1 << 
>>> slot);
>>> }
>>> -   if (rtex->cmask.size || rtex->fmask.size) {
>>> +   if (rtex->cmask.size || rtex->fmask.size || 
>>> rtex->surface.dcc_enabled) {
>>> samplers->compressed_colortex_mask |= 1 
>>> << slot;
>>
>> I'd like this flag to be set only when dirty_level_mask is non-zero.
>> Setting this for all textures that have DCC is quite expensive in draw
>> calls.
>
> I think this code is incorrect even without considering DCC. If we do
> a fast clear on a surface which allocates a cmask and then use that
> surface as a texture without calling set_sampler_views in between
> (because it was bound before) we get a stale compressed_colortex_mask.
>
> Some testing shows that this can be triggered using OpenGL, although
> the GL_ARB_texture_barrier extension may be needed to make the result
> not undefined per the specification.

 In that case, we should decompress in texture_barrier and not in draw 
 calls.

 Marek
>>>
>>>
>>> texture_barrier does not need to be called though, the language
>>> changes might be needed.
>>>
>>> Basically the test is
>>>
>>> fbo1, fbo2 framebuffers with 1 color buffer each:
>>>
>>> bind fbo2 as texture
>>> clear fbo1 using shader
>>> bind fbo1 as texture
>>> clear fbo2 using shader
>>> clear fbo1 using clear (which results in cmask being allocated for fbo1)
>
>>> bind fbo2 as texture
>>> copy fbo2 to fbo1 using copy shader (which wrongly does not decompress fbo1)
>
> My apologies, these two lines should just be a copy fbo1 to fbo2,
> which does need to eleminate the cmask fast clear.

That sounds like a texture barrier is required.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Bas Nieuwenhuizen
On Fri, Oct 23, 2015 at 1:52 PM, Marek Olšák  wrote:
> On Fri, Oct 23, 2015 at 1:30 PM, Bas Nieuwenhuizen
>  wrote:
>> On Fri, Oct 23, 2015 at 12:50 PM, Marek Olšák  wrote:
>>> On Fri, Oct 23, 2015 at 12:17 PM, Bas Nieuwenhuizen
>>>  wrote:
 On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
>> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
>> b/src/gallium/drivers/radeonsi/si_descriptors.c
>> index 5548cba3..a277fa5 100644
>> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
>> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>> @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct pipe_context 
>> *ctx,
>> } else {
>> samplers->depth_texture_mask &= ~(1 << 
>> slot);
>> }
>> -   if (rtex->cmask.size || rtex->fmask.size) {
>> +   if (rtex->cmask.size || rtex->fmask.size || 
>> rtex->surface.dcc_enabled) {
>> samplers->compressed_colortex_mask |= 1 
>> << slot;
>
> I'd like this flag to be set only when dirty_level_mask is non-zero.
> Setting this for all textures that have DCC is quite expensive in draw
> calls.

 I think this code is incorrect even without considering DCC. If we do
 a fast clear on a surface which allocates a cmask and then use that
 surface as a texture without calling set_sampler_views in between
 (because it was bound before) we get a stale compressed_colortex_mask.

 Some testing shows that this can be triggered using OpenGL, although
 the GL_ARB_texture_barrier extension may be needed to make the result
 not undefined per the specification.
>>>
>>> In that case, we should decompress in texture_barrier and not in draw calls.
>>>
>>> Marek
>>
>>
>> texture_barrier does not need to be called though, the language
>> changes might be needed.
>>
>> Basically the test is
>>
>> fbo1, fbo2 framebuffers with 1 color buffer each:
>>
>> bind fbo2 as texture
>> clear fbo1 using shader
>> bind fbo1 as texture
>> clear fbo2 using shader
>> clear fbo1 using clear (which results in cmask being allocated for fbo1)

>> bind fbo2 as texture
>> copy fbo2 to fbo1 using copy shader (which wrongly does not decompress fbo1)

My apologies, these two lines should just be a copy fbo1 to fbo2,
which does need to eleminate the cmask fast clear.

- Bas

>> If the clear shader has an unused sampler2D, then we bind fbo1 as a
>> texture before the clear and do not rebind it afterwards.
>>
>> Note that the normal clear is the only one for which we have a
>> rendering feedback loop, and it should not fetch from any texture at
>> all. I don't think we need a texture barrier for that.
>>
>> I would think this would be most easily fixed by recomputing
>> compressed_colortex_mask after a fast clear.
>
> I'm afraid I don't understand the given example. Copying to fbo1
> doesn't need decompression, because fbo1 is a color buffer.
>
> I'm assuming there are no image stores.
>
> Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Marek Olšák
On Fri, Oct 23, 2015 at 1:30 PM, Bas Nieuwenhuizen
 wrote:
> On Fri, Oct 23, 2015 at 12:50 PM, Marek Olšák  wrote:
>> On Fri, Oct 23, 2015 at 12:17 PM, Bas Nieuwenhuizen
>>  wrote:
>>> On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
> b/src/gallium/drivers/radeonsi/si_descriptors.c
> index 5548cba3..a277fa5 100644
> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
> @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct pipe_context 
> *ctx,
> } else {
> samplers->depth_texture_mask &= ~(1 << 
> slot);
> }
> -   if (rtex->cmask.size || rtex->fmask.size) {
> +   if (rtex->cmask.size || rtex->fmask.size || 
> rtex->surface.dcc_enabled) {
> samplers->compressed_colortex_mask |= 1 
> << slot;

 I'd like this flag to be set only when dirty_level_mask is non-zero.
 Setting this for all textures that have DCC is quite expensive in draw
 calls.
>>>
>>> I think this code is incorrect even without considering DCC. If we do
>>> a fast clear on a surface which allocates a cmask and then use that
>>> surface as a texture without calling set_sampler_views in between
>>> (because it was bound before) we get a stale compressed_colortex_mask.
>>>
>>> Some testing shows that this can be triggered using OpenGL, although
>>> the GL_ARB_texture_barrier extension may be needed to make the result
>>> not undefined per the specification.
>>
>> In that case, we should decompress in texture_barrier and not in draw calls.
>>
>> Marek
>
>
> texture_barrier does not need to be called though, the language
> changes might be needed.
>
> Basically the test is
>
> fbo1, fbo2 framebuffers with 1 color buffer each:
>
> bind fbo2 as texture
> clear fbo1 using shader
> bind fbo1 as texture
> clear fbo2 using shader
> clear fbo1 using clear (which results in cmask being allocated for fbo1)
> bind fbo2 as texture
> copy fbo2 to fbo1 using copy shader (which wrongly does not decompress fbo1)
>
> If the clear shader has an unused sampler2D, then we bind fbo1 as a
> texture before the clear and do not rebind it afterwards.
>
> Note that the normal clear is the only one for which we have a
> rendering feedback loop, and it should not fetch from any texture at
> all. I don't think we need a texture barrier for that.
>
> I would think this would be most easily fixed by recomputing
> compressed_colortex_mask after a fast clear.

I'm afraid I don't understand the given example. Copying to fbo1
doesn't need decompression, because fbo1 is a color buffer.

I'm assuming there are no image stores.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Bas Nieuwenhuizen
On Fri, Oct 23, 2015 at 12:50 PM, Marek Olšák  wrote:
> On Fri, Oct 23, 2015 at 12:17 PM, Bas Nieuwenhuizen
>  wrote:
>> On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
 diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
 b/src/gallium/drivers/radeonsi/si_descriptors.c
 index 5548cba3..a277fa5 100644
 --- a/src/gallium/drivers/radeonsi/si_descriptors.c
 +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
 @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct pipe_context 
 *ctx,
 } else {
 samplers->depth_texture_mask &= ~(1 << 
 slot);
 }
 -   if (rtex->cmask.size || rtex->fmask.size) {
 +   if (rtex->cmask.size || rtex->fmask.size || 
 rtex->surface.dcc_enabled) {
 samplers->compressed_colortex_mask |= 1 << 
 slot;
>>>
>>> I'd like this flag to be set only when dirty_level_mask is non-zero.
>>> Setting this for all textures that have DCC is quite expensive in draw
>>> calls.
>>
>> I think this code is incorrect even without considering DCC. If we do
>> a fast clear on a surface which allocates a cmask and then use that
>> surface as a texture without calling set_sampler_views in between
>> (because it was bound before) we get a stale compressed_colortex_mask.
>>
>> Some testing shows that this can be triggered using OpenGL, although
>> the GL_ARB_texture_barrier extension may be needed to make the result
>> not undefined per the specification.
>
> In that case, we should decompress in texture_barrier and not in draw calls.
>
> Marek


texture_barrier does not need to be called though, the language
changes might be needed.

Basically the test is

fbo1, fbo2 framebuffers with 1 color buffer each:

bind fbo2 as texture
clear fbo1 using shader
bind fbo1 as texture
clear fbo2 using shader
clear fbo1 using clear (which results in cmask being allocated for fbo1)
bind fbo2 as texture
copy fbo2 to fbo1 using copy shader (which wrongly does not decompress fbo1)

If the clear shader has an unused sampler2D, then we bind fbo1 as a
texture before the clear and do not rebind it afterwards.

Note that the normal clear is the only one for which we have a
rendering feedback loop, and it should not fetch from any texture at
all. I don't think we need a texture barrier for that.

I would think this would be most easily fixed by recomputing
compressed_colortex_mask after a fast clear.

- Bas
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Marek Olšák
On Fri, Oct 23, 2015 at 12:17 PM, Bas Nieuwenhuizen
 wrote:
> On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
>>> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
>>> b/src/gallium/drivers/radeonsi/si_descriptors.c
>>> index 5548cba3..a277fa5 100644
>>> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
>>> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>>> @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct pipe_context 
>>> *ctx,
>>> } else {
>>> samplers->depth_texture_mask &= ~(1 << 
>>> slot);
>>> }
>>> -   if (rtex->cmask.size || rtex->fmask.size) {
>>> +   if (rtex->cmask.size || rtex->fmask.size || 
>>> rtex->surface.dcc_enabled) {
>>> samplers->compressed_colortex_mask |= 1 << 
>>> slot;
>>
>> I'd like this flag to be set only when dirty_level_mask is non-zero.
>> Setting this for all textures that have DCC is quite expensive in draw
>> calls.
>
> I think this code is incorrect even without considering DCC. If we do
> a fast clear on a surface which allocates a cmask and then use that
> surface as a texture without calling set_sampler_views in between
> (because it was bound before) we get a stale compressed_colortex_mask.
>
> Some testing shows that this can be triggered using OpenGL, although
> the GL_ARB_texture_barrier extension may be needed to make the result
> not undefined per the specification.

In that case, we should decompress in texture_barrier and not in draw calls.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-23 Thread Bas Nieuwenhuizen
On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
>> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
>> b/src/gallium/drivers/radeonsi/si_descriptors.c
>> index 5548cba3..a277fa5 100644
>> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
>> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>> @@ -234,7 +234,7 @@ static void si_set_sampler_views(struct pipe_context 
>> *ctx,
>> } else {
>> samplers->depth_texture_mask &= ~(1 << slot);
>> }
>> -   if (rtex->cmask.size || rtex->fmask.size) {
>> +   if (rtex->cmask.size || rtex->fmask.size || 
>> rtex->surface.dcc_enabled) {
>> samplers->compressed_colortex_mask |= 1 << 
>> slot;
>
> I'd like this flag to be set only when dirty_level_mask is non-zero.
> Setting this for all textures that have DCC is quite expensive in draw
> calls.

I think this code is incorrect even without considering DCC. If we do
a fast clear on a surface which allocates a cmask and then use that
surface as a texture without calling set_sampler_views in between
(because it was bound before) we get a stale compressed_colortex_mask.

Some testing shows that this can be triggered using OpenGL, although
the GL_ARB_texture_barrier extension may be needed to make the result
not undefined per the specification.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-22 Thread Bas Nieuwenhuizen
On Thu, Oct 22, 2015 at 12:12 PM, Marek Olšák  wrote:
> On Wed, Oct 21, 2015 at 12:10 AM, Bas Nieuwenhuizen
>  wrote:
>> Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
>> still works. Furthermore, with DCC compression we can directly clear
>> to a limited set of colors such that we do not need a postprocessing step.
>>
>> Signed-off-by: Bas Nieuwenhuizen 
>> ---
>>  src/gallium/drivers/radeon/r600_texture.c | 105 
>> +++---
>>  src/gallium/drivers/radeonsi/si_blit.c|   4 +-
>>  src/gallium/drivers/radeonsi/si_descriptors.c |   2 +-
>>  3 files changed, 97 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeon/r600_texture.c 
>> b/src/gallium/drivers/radeon/r600_texture.c
>> index 0314049..4391665 100644
>> --- a/src/gallium/drivers/radeon/r600_texture.c
>> +++ b/src/gallium/drivers/radeon/r600_texture.c
>> @@ -1239,6 +1239,79 @@ static void evergreen_set_clear_color(struct 
>> r600_texture *rtex,
>> memcpy(rtex->color_clear_value, &uc, 2 * sizeof(uint32_t));
>>  }
>>
>> +static void vi_get_fast_clear_parameters(enum pipe_format surface_format,
>> +const union pipe_color_union *color,
>> +uint32_t* reset_value,
>> +bool* clear_words_needed)
>> +{
>> +   bool values[4] = {};
>> +   bool main_value = false;
>> +   int i;
>> +   int extra_channel;
>> +   int extra_component = 0;
>> +   const struct util_format_description *desc = 
>> util_format_description(surface_format);
>> +
>> +   *clear_words_needed = true;
>> +   *reset_value = 0x20202020U;
>> +
>> +   /* If we want to clear without needing a fast clear eliminate step, 
>> we can set each channel to
>> +* 0 or 1 (or 0/max for integer formats). We have two sets of flags, 
>> one for the last or first
>> +* channel and one for the rest. We decide on the last or first 
>> channel by r600_translate_colorswap.
>> +*
>> +* Note that in formats as R8G8B8X*, the X8 is the last channel, so 
>> the last channel may not correspond
>> +* to the last enabled component.
>> +*/
>> +
>> +   /* Not sure if it is a coincidence that these are all the 3-channel 
>> color formats. */
>> +   if (surface_format == PIPE_FORMAT_R11G11B10_FLOAT ||
>> +   surface_format == PIPE_FORMAT_B5G6R5_UNORM ||
>> +   surface_format == PIPE_FORMAT_B5G6R5_SRGB) {
>> +   extra_channel = -1;
>> +   } else if (desc->layout == UTIL_FORMAT_LAYOUT_PLAIN) {
>> +   extra_channel = (r600_translate_colorswap(surface_format) <= 
>> 1) ? desc->nr_channels - 1 : 0;
>> +   } else
>> +   return;
>> +
>> +   for (i = 0; i < 4; ++i) {
>> +   int index = desc->swizzle[i] - UTIL_FORMAT_SWIZZLE_X;
>> +
>> +   if (desc->swizzle[i] < UTIL_FORMAT_SWIZZLE_X || 
>> desc->swizzle[i] > UTIL_FORMAT_SWIZZLE_W)
>> +   continue;
>> +
>> +
>> +   if (util_format_is_pure_sint(surface_format)) {
>> +   values[i] = color->i[i] != 0;
>> +   if (color->i[i] != 0 && color->i[i] != (1ULL << 
>> (desc->channel[index].size - 1)) - 1)
>> +   return;
>> +   } else if (util_format_is_pure_uint(surface_format)) {
>> +   values[i] = color->ui[i] != 0U;
>> +   if (color->ui[i] != 0U && color->ui[i] != (1ULL << 
>> desc->channel[index].size) - 1)
>> +   return;
>> +   } else {
>> +   values[i] = color->f[i] != 0.0F;
>> +   if (color->f[i] != 0.0F && color->f[i] != 1.0F)
>> +   return;
>> +   }
>> +
>> +   if (index == extra_channel)
>> +   extra_component = i;
>> +   else
>> +   main_value = values[i];
>> +   }
>> +
>> +   for (int i = 0; i < 4; ++i)
>> +   if (values[i] != main_value && desc->swizzle[i] - 
>> UTIL_FORMAT_SWIZZLE_X != extra_channel &&
>> +   desc->swizzle[i] >= UTIL_FORMAT_SWIZZLE_X && 
>> desc->swizzle[i] <= UTIL_FORMAT_SWIZZLE_W)
>> +   return;
>> +
>> +   *clear_words_needed = false;
>> +   if (main_value)
>> +   *reset_value |= 0x80808080U;
>> +
>> +   if (values[extra_component])
>> +   *reset_value |= 0x40404040U;
>
> Could you please reformat this function and rename things to be more
> readable? "main" is color and "extra" is alpha, right? if yes, they
> should be called color and alpha. Also, 80 characters per line where
> possible.

Extra is not necessarily alpha. I.e. for R8G8 it actually is the green
component.

>
>> +}
>> +
>>  void evergreen_do_fast_color_clear(struct r600_common_context *rctx,
>>  

Re: [Mesa-dev] [PATCH v3 5/7] radeonsi: Implement DCC fast clear.

2015-10-22 Thread Marek Olšák
On Wed, Oct 21, 2015 at 12:10 AM, Bas Nieuwenhuizen
 wrote:
> Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
> still works. Furthermore, with DCC compression we can directly clear
> to a limited set of colors such that we do not need a postprocessing step.
>
> Signed-off-by: Bas Nieuwenhuizen 
> ---
>  src/gallium/drivers/radeon/r600_texture.c | 105 
> +++---
>  src/gallium/drivers/radeonsi/si_blit.c|   4 +-
>  src/gallium/drivers/radeonsi/si_descriptors.c |   2 +-
>  3 files changed, 97 insertions(+), 14 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_texture.c 
> b/src/gallium/drivers/radeon/r600_texture.c
> index 0314049..4391665 100644
> --- a/src/gallium/drivers/radeon/r600_texture.c
> +++ b/src/gallium/drivers/radeon/r600_texture.c
> @@ -1239,6 +1239,79 @@ static void evergreen_set_clear_color(struct 
> r600_texture *rtex,
> memcpy(rtex->color_clear_value, &uc, 2 * sizeof(uint32_t));
>  }
>
> +static void vi_get_fast_clear_parameters(enum pipe_format surface_format,
> +const union pipe_color_union *color,
> +uint32_t* reset_value,
> +bool* clear_words_needed)
> +{
> +   bool values[4] = {};
> +   bool main_value = false;
> +   int i;
> +   int extra_channel;
> +   int extra_component = 0;
> +   const struct util_format_description *desc = 
> util_format_description(surface_format);
> +
> +   *clear_words_needed = true;
> +   *reset_value = 0x20202020U;
> +
> +   /* If we want to clear without needing a fast clear eliminate step, 
> we can set each channel to
> +* 0 or 1 (or 0/max for integer formats). We have two sets of flags, 
> one for the last or first
> +* channel and one for the rest. We decide on the last or first 
> channel by r600_translate_colorswap.
> +*
> +* Note that in formats as R8G8B8X*, the X8 is the last channel, so 
> the last channel may not correspond
> +* to the last enabled component.
> +*/
> +
> +   /* Not sure if it is a coincidence that these are all the 3-channel 
> color formats. */
> +   if (surface_format == PIPE_FORMAT_R11G11B10_FLOAT ||
> +   surface_format == PIPE_FORMAT_B5G6R5_UNORM ||
> +   surface_format == PIPE_FORMAT_B5G6R5_SRGB) {
> +   extra_channel = -1;
> +   } else if (desc->layout == UTIL_FORMAT_LAYOUT_PLAIN) {
> +   extra_channel = (r600_translate_colorswap(surface_format) <= 
> 1) ? desc->nr_channels - 1 : 0;
> +   } else
> +   return;
> +
> +   for (i = 0; i < 4; ++i) {
> +   int index = desc->swizzle[i] - UTIL_FORMAT_SWIZZLE_X;
> +
> +   if (desc->swizzle[i] < UTIL_FORMAT_SWIZZLE_X || 
> desc->swizzle[i] > UTIL_FORMAT_SWIZZLE_W)
> +   continue;
> +
> +
> +   if (util_format_is_pure_sint(surface_format)) {
> +   values[i] = color->i[i] != 0;
> +   if (color->i[i] != 0 && color->i[i] != (1ULL << 
> (desc->channel[index].size - 1)) - 1)
> +   return;
> +   } else if (util_format_is_pure_uint(surface_format)) {
> +   values[i] = color->ui[i] != 0U;
> +   if (color->ui[i] != 0U && color->ui[i] != (1ULL << 
> desc->channel[index].size) - 1)
> +   return;
> +   } else {
> +   values[i] = color->f[i] != 0.0F;
> +   if (color->f[i] != 0.0F && color->f[i] != 1.0F)
> +   return;
> +   }
> +
> +   if (index == extra_channel)
> +   extra_component = i;
> +   else
> +   main_value = values[i];
> +   }
> +
> +   for (int i = 0; i < 4; ++i)
> +   if (values[i] != main_value && desc->swizzle[i] - 
> UTIL_FORMAT_SWIZZLE_X != extra_channel &&
> +   desc->swizzle[i] >= UTIL_FORMAT_SWIZZLE_X && 
> desc->swizzle[i] <= UTIL_FORMAT_SWIZZLE_W)
> +   return;
> +
> +   *clear_words_needed = false;
> +   if (main_value)
> +   *reset_value |= 0x80808080U;
> +
> +   if (values[extra_component])
> +   *reset_value |= 0x40404040U;

Could you please reformat this function and rename things to be more
readable? "main" is color and "extra" is alpha, right? if yes, they
should be called color and alpha. Also, 80 characters per line where
possible.


> +}
> +
>  void evergreen_do_fast_color_clear(struct r600_common_context *rctx,
>struct pipe_framebuffer_state *fb,
>struct r600_atom *fb_state,
> @@ -1292,23 +1365,33 @@ void evergreen_do_fast_color_clear(struct 
> r600_common_context *rctx,
> continue;