Re: [Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

2019-06-07 Thread Jason Ekstrand
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1052

I just kicked it off to Jenkins but it was fine last time I did this a year
ago.

On Fri, May 31, 2019 at 3:55 PM Nanley Chery wrote:

> Thanks for reaching out to the HW team. Given that the internal
> documentation was updated to set the Project field of this restriction
> to HSW:GT3, what do you think about shortening the comment to mention
> that? I'd like to give this an RB as is, but there are a lot of truth
> claims I'd have to verify in order to do so.
>
> -Nanley
>
> On Mon, Dec 3, 2018 at 2:48 PM Jason Ekstrand wrote:
> >
> > I've received confirmation from the HW team that the extra doubling is
> > only needed on Haswell GT3.
> >
> > On Tue, May 15, 2018 at 5:28 PM Jason Ekstrand wrote:
> >>
> >> The data in the commit message is a bit sketchy for Ivybridge.  We don't
> >> run dEQP or any of the CTSs on Ivybridge in CI so all the data we have
> >> is piglit.  On Haswell, piglit didn't catch anything so we don't have
> >> anything to go off of for Ivybridge besides the fact that the restriction
> >> wasn't added until Haswell.

Re: [Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

2019-05-31 Thread Nanley Chery
Thanks for reaching out to the HW team. Given that the internal
documentation was updated to set the Project field of this restriction
to HSW:GT3, what do you think about shortening the comment to mention
that? I'd like to give this an RB as is, but there are a lot of truth
claims I'd have to verify in order to do so.

-Nanley

On Mon, Dec 3, 2018 at 2:48 PM Jason Ekstrand wrote:
>
> I've received confirmation from the HW team that the extra doubling is only
> needed on Haswell GT3.
>
> On Tue, May 15, 2018 at 5:28 PM Jason Ekstrand wrote:
>>
>> The data in the commit message is a bit sketchy for Ivybridge.  We don't
>> run dEQP or any of the CTSs on Ivybridge in CI so all the data we have
>> is piglit.  On Haswell, piglit didn't catch anything so we don't have
>> anything to go off of for Ivybridge besides the fact that the restriction
>> wasn't added until Haswell.

Re: [Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

2018-12-03 Thread Jason Ekstrand
I've received confirmation from the HW team that the extra doubling is only
needed on Haswell GT3.
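
As a rough illustration of what that means for the clear rectangle, here is a minimal standalone sketch. It is not code from the patch: `struct fake_devinfo`, `struct clear_rect`, `needs_double_align()` and `align_clear_rect()` are hypothetical names made up for this example, and the scale-down step that blorp applies afterwards is left out. It only shows the base alignment being doubled on Haswell GT3 and used as-is everywhere else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device description; this is not the real isl_device. */
struct fake_devinfo {
   bool is_haswell;
   int gt;            /* GT level: 1, 2 or 3 */
};

/* Hypothetical rectangle in pixels; x1/y1 are exclusive. */
struct clear_rect {
   uint32_t x0, y0, x1, y1;
};

/* Round v down to a multiple of align (align > 0). */
static uint32_t
round_down_to(uint32_t v, uint32_t align)
{
   return v - (v % align);
}

/* Round v up to a multiple of align (align > 0). */
static uint32_t
round_up_to(uint32_t v, uint32_t align)
{
   return round_down_to(v + align - 1, align);
}

/* Per the HW team's answer, only Haswell GT3 needs the extra doubling. */
static bool
needs_double_align(const struct fake_devinfo *info)
{
   return info->is_haswell && info->gt == 3;
}

/* Grow the rectangle outwards until it meets the required alignment. */
static struct clear_rect
align_clear_rect(const struct fake_devinfo *info, struct clear_rect r,
                 uint32_t x_align, uint32_t y_align)
{
   if (needs_double_align(info)) {
      x_align *= 2;
      y_align *= 2;
   }

   r.x0 = round_down_to(r.x0, x_align);
   r.y0 = round_down_to(r.y0, y_align);
   r.x1 = round_up_to(r.x1, x_align);
   r.y1 = round_up_to(r.y1, y_align);
   return r;
}
```

With a base alignment of 128x64, the rectangle (130, 70)-(300, 200) grows to (128, 64)-(384, 256) without the doubling and to (0, 0)-(512, 256) with it.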

On Tue, May 15, 2018 at 5:28 PM Jason Ekstrand wrote:

> The data in the commit message is a bit sketchy for Ivybridge.  We don't
> run dEQP or any of the CTSs on Ivybridge in CI so all the data we have
> is piglit.  On Haswell, piglit didn't catch anything so we don't have
> anything to go off of for Ivybridge besides the fact that the restriction
> wasn't added until Haswell.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

2018-05-15 Thread Jason Ekstrand
The data in the commit message is a bit sketchy for Ivybridge.  We don't
run dEQP or any of the CTSs on Ivybridge in CI so all the data we have
is piglit.  On Haswell, piglit didn't catch anything so we don't have
anything to go off of for Ivybridge besides the fact that the restriction
wasn't added until Haswell.
---
 src/intel/blorp/blorp_clear.c | 66 ---
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index 832e8ee..618625b 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -235,16 +235,62 @@ get_fast_clear_rect(const struct isl_device *dev,
       x_scaledown = x_align / 2;
       y_scaledown = y_align / 2;
 
-      /* From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel > Pixel
-       * Backend > MCS Buffer for Render Target(s) [DevIVB+] > Table "Color
-       * Clear of Non-MultiSampled Render Target Restrictions":
-       *
-       *   Clear rectangle must be aligned to two times the number of
-       *   pixels in the table shown below due to 16x16 hashing across the
-       *   slice.
-       */
-      x_align *= 2;
-      y_align *= 2;
+      if (ISL_DEV_IS_HASWELL(dev)) {
+         /* The following text was added in the Haswell PRM, "3D Media GPGPU
+          * Engine" >> "MCS Buffer for Render Target(s)" >> Table "Color Clear
+          * of Non-MultiSampled Render Target Restrictions":
+          *
+          *    "Clear rectangle must be aligned to two times the number of
+          *    pixels in the table shown below due to 16X16 hashing across the
+          *    slice."
+          *
+          * It has persisted in the documentation for all platforms up until
+          * Cannonlake and possibly even beyond.  However, we believe that it
+          * is only needed on Haswell.
+          *
+          * There are a couple of possible explanations for this restriction:
+          *
+          * 1) If you assume that the hardware is writing to the CCS as
+          *    bytes, then the x/y_align computed above gives you an alignment
+          *    in the CCS of 8x8 bytes and, if 16x16 is needed for hashing, we
+          *    need to multiply by 2.
+          *
+          * 2) Haswell is a bit unique in that its CCS tiling does not line
+          *    up with Y-tiling on a cache-line granularity.  Instead, it has
+          *    an extra bit of swizzling in bit 9.  Also, bit 6 swizzling
+          *    applies to the CCS on Haswell.  This means that Haswell CCS
+          *    does not match on a cache-line granularity but it does match on
+          *    a 2x2 cache line granularity.
+          *
+          * Clearly, the first explanation seems to follow documentation the
+          * best but they may be related.  In any case, empirical evidence
+          * seems to confirm that it is, indeed, required on Haswell.
+          *
+          * On Broadwell things get a bit stickier.  Broadwell adds support
+          * for mip-mapped CCS with an alignment in the CCS of 256x128.  For a
+          * 32bpb main surface, the above computation will yield an x/y_align
+          * of 128x128 for a Y-tiled main surface and 256x64 for X-tiled.  In
+          * either case, if we double the alignment, we will get an alignment
+          * bigger than horizontal and vertical alignment of the CCS and fast
+          * clears of one LOD may leak into others.
+          *
+          * Starting with Skylake, the image alignment for the CCS is only
+          * 128x64 which is exactly the x/y_align computed above if the main
+          * surface has a 32bpb format.  Also, the "Render Target Resolve"
+          * page in the bspec (not the PRM) says, "The Resolve Rectangle size
+          * is same as Clear Rectangle size from SKL+".  The x/y_align
+          * computed above (without doubling) match the resolve rectangle
+          * calculation perfectly.
+          *
+          * Finally, to confirm all this, a full test run was performed on
+          * Feb. 9, 2018 with this doubling removed and the only platform
+          * which seemed to be affected was Haswell.  The run consisted of
+          * piglit, dEQP, the Vulkan CTS 1.0.2, the OpenGL 4.5 CTS, and the
+          * OpenGL ES 3.2 CTS.
+          */
+         x_align *= 2;
+         y_align *= 2;
+      }
    } else {
       assert(aux_surf->usage == ISL_SURF_USAGE_MCS_BIT);
 
-- 
2.5.0.400.gff86faf
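
The Broadwell and Skylake arithmetic in the comment above can be checked with a small standalone sketch. This is not part of the patch: `struct align2d`, `doubled()` and `exceeds()` are hypothetical helpers invented for the illustration, and the numbers are only the ones the comment quotes for a 32bpb main surface, compared in the same units the comment compares them in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: a width/height alignment pair. */
struct align2d {
   uint32_t w, h;
};

/* The alignment the old code would have used: twice the base value. */
static struct align2d
doubled(struct align2d a)
{
   return (struct align2d) { a.w * 2, a.h * 2 };
}

/* True if the clear alignment is larger than the CCS image alignment in
 * at least one dimension, so an aligned clear could reach past one LOD.
 */
static bool
exceeds(struct align2d clear, struct align2d image)
{
   return clear.w > image.w || clear.h > image.h;
}

/* Values quoted in the patch comment for a 32bpb main surface. */
static const struct align2d bdw_ccs_image    = { 256, 128 };
static const struct align2d bdw_clear_ytiled = { 128, 128 };
static const struct align2d bdw_clear_xtiled = { 256,  64 };
static const struct align2d skl_ccs_image    = { 128,  64 };
static const struct align2d skl_clear        = { 128,  64 };
```

Under those assumptions, `exceeds()` is false for every undoubled value and true for every doubled one. On Broadwell the doubled Y-tiled alignment (256x256) overshoots 256x128 only in height and the doubled X-tiled alignment (512x128) only in width; on Skylake the undoubled 128x64 matches the CCS image alignment exactly.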



[Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

2018-02-20 Thread Jason Ekstrand
The data in the commit message is a bit sketchy for Ivybridge.  We don't
run dEQP or any of the CTSs on Ivybridge in CI so all the data we have
is piglit.  On Haswell, piglit didn't catch anything so we don't have
anything to go off of for Ivybridge besides the fact that the restriction
wasn't added until Haswell.
---
 src/intel/blorp/blorp_clear.c | 66 ---
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index dde116f..36ec185 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -235,16 +235,62 @@ get_fast_clear_rect(const struct isl_device *dev,
       x_scaledown = x_align / 2;
       y_scaledown = y_align / 2;
 
-      /* From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel > Pixel
-       * Backend > MCS Buffer for Render Target(s) [DevIVB+] > Table "Color
-       * Clear of Non-MultiSampled Render Target Restrictions":
-       *
-       *   Clear rectangle must be aligned to two times the number of
-       *   pixels in the table shown below due to 16x16 hashing across the
-       *   slice.
-       */
-      x_align *= 2;
-      y_align *= 2;
+      if (ISL_DEV_IS_HASWELL(dev)) {
+         /* The following text was added in the Haswell PRM, "3D Media GPGPU
+          * Engine" >> "MCS Buffer for Render Target(s)" >> Table "Color Clear
+          * of Non-MultiSampled Render Target Restrictions":
+          *
+          *    "Clear rectangle must be aligned to two times the number of
+          *    pixels in the table shown below due to 16X16 hashing across the
+          *    slice."
+          *
+          * It has persisted in the documentation for all platforms up until
+          * Cannonlake and possibly even beyond.  However, we believe that it
+          * is only needed on Haswell.
+          *
+          * There are a couple of possible explanations for this restriction:
+          *
+          * 1) If you assume that the hardware is writing to the CCS as
+          *    bytes, then the x/y_align computed above gives you an alignment
+          *    in the CCS of 8x8 bytes and, if 16x16 is needed for hashing, we
+          *    need to multiply by 2.
+          *
+          * 2) Haswell is a bit unique in that its CCS tiling does not line
+          *    up with Y-tiling on a cache-line granularity.  Instead, it has
+          *    an extra bit of swizzling in bit 9.  Also, bit 6 swizzling
+          *    applies to the CCS on Haswell.  This means that Haswell CCS
+          *    does not match on a cache-line granularity but it does match on
+          *    a 2x2 cache line granularity.
+          *
+          * Clearly, the first explanation seems to follow documentation the
+          * best but they may be related.  In any case, empirical evidence
+          * seems to confirm that it is, indeed, required on Haswell.
+          *
+          * On Broadwell things get a bit stickier.  Broadwell adds support
+          * for mip-mapped CCS with an alignment in the CCS of 256x128.  For a
+          * 32bpb main surface, the above computation will yield an x/y_align
+          * of 128x128 for a Y-tiled main surface and 256x64 for X-tiled.  In
+          * either case, if we double the alignment, we will get an alignment
+          * bigger than horizontal and vertical alignment of the CCS and fast
+          * clears of one LOD may leak into others.
+          *
+          * Starting with Skylake, the image alignment for the CCS is only
+          * 128x64 which is exactly the x/y_align computed above if the main
+          * surface has a 32bpb format.  Also, the "Render Target Resolve"
+          * page in the bspec (not the PRM) says, "The Resolve Rectangle size
+          * is same as Clear Rectangle size from SKL+".  The x/y_align
+          * computed above (without doubling) match the resolve rectangle
+          * calculation perfectly.
+          *
+          * Finally, to confirm all this, a full test run was performed on
+          * Feb. 9, 2018 with this doubling removed and the only platform
+          * which seemed to be affected was Haswell.  The run consisted of
+          * piglit, dEQP, the Vulkan CTS 1.0.2, the OpenGL 4.5 CTS, and the
+          * OpenGL ES 3.2 CTS.
+          */
+         x_align *= 2;
+         y_align *= 2;
+      }
    } else {
       assert(aux_surf->usage == ISL_SURF_USAGE_MCS_BIT);
 
-- 
2.5.0.400.gff86faf
