Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview

2018-03-10 Thread Jordan Justen
On 2018-03-09 09:51:31, Mark Janes wrote:
> Could this be the reason that BSW systems never reliably passed all unit
> tests?  Up to now, we re-execute each failing test, and mark it as a
> pass if it succeeds a second time.
> 
> I'd like to remove that crutch if possible.

It is possible. We basically had memory corruption happening outside
the scratch buffer. The corruption was happening a bit past the end
of the buffer we had allocated. It can be difficult to predict the
outcome of such corruption. :)
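
As a rough illustration of how far past the end such writes could land, here is a minimal sketch. It is not driver code: the function name and all parameters are hypothetical, and the 1024-byte per-thread size in the usage note is an arbitrary example value.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes by which the range the hardware may address exceeds the range
 * that was actually allocated. All inputs are hypothetical values chosen
 * by the caller; none of them are read from the driver.
 */
static size_t
scratch_overrun_bytes(size_t per_thread_scratch,
                      unsigned allocated_ids,
                      unsigned addressable_ids)
{
   assert(per_thread_scratch > 0);

   /* No overrun if the allocation already covers every addressable ID. */
   if (addressable_ids <= allocated_ids)
      return 0;

   return per_thread_scratch * (size_t)(addressable_ids - allocated_ids);
}
```

With 2 subslices, an allocation sized for 6 EUs * 7 threads (84 IDs) against hardware that addresses 8 EUs * 7 threads (112 IDs) leaves 28 IDs uncovered, so `scratch_overrun_bytes(1024, 84, 112)` is 28 * 1024 bytes.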

-Jordan

> Jordan Justen writes:
> 
> > Ken suggested that we might be underallocating scratch space on HD
> > 400. Allocating scratch space as though there was actually 8 EUs
> > seems to help with a GPU hang seen on synmark CSDof.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
> > Cc: Kenneth Graunke 
> > Cc: Eero Tamminen 
> > Cc: 
> > Signed-off-by: Jordan Justen 
> > ---
> >  src/mesa/drivers/dri/i965/brw_program.c | 44 
> > -
> >  1 file changed, 27 insertions(+), 17 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> > b/src/mesa/drivers/dri/i965/brw_program.c
> > index 527f003977b..c121136c439 100644
> > --- a/src/mesa/drivers/dri/i965/brw_program.c
> > +++ b/src/mesa/drivers/dri/i965/brw_program.c
> > @@ -402,23 +402,33 @@ brw_alloc_stage_scratch(struct brw_context *brw,
> >if (devinfo->gen >= 9)
> >   subslices = 4 * brw->screen->devinfo.num_slices;
> >  
> > -  /* WaCSScratchSize:hsw
> > -   *
> > -   * Haswell's scratch space address calculation appears to be sparse
> > -   * rather than tightly packed.  The Thread ID has bits indicating
> > -   * which subslice, EU within a subslice, and thread within an EU
> > -   * it is.  There's a maximum of two slices and two subslices, so 
> > these
> > -   * can be stored with a single bit.  Even though there are only 10 
> > EUs
> > -   * per subslice, this is stored in 4 bits, so there's an effective
> > -   * maximum value of 16 EUs.  Similarly, although there are only 7
> > -   * threads per EU, this is stored in a 3 bit number, giving an 
> > effective
> > -   * maximum value of 8 threads per EU.
> > -   *
> > -   * This means that we need to use 16 * 8 instead of 10 * 7 for the
> > -   * number of threads per subslice.
> > -   */
> > -  const unsigned scratch_ids_per_subslice =
> > - devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
> > +  unsigned scratch_ids_per_subslice;
> > +  if (devinfo->is_haswell) {
> > + /* WaCSScratchSize:hsw
> > +  *
> > +  * Haswell's scratch space address calculation appears to be 
> > sparse
> > +  * rather than tightly packed. The Thread ID has bits indicating
> > +  * which subslice, EU within a subslice, and thread within an EU 
> > it
> > +  * is. There's a maximum of two slices and two subslices, so these
> > +  * can be stored with a single bit. Even though there are only 10 
> > EUs
> > +  * per subslice, this is stored in 4 bits, so there's an effective
> > +  * maximum value of 16 EUs. Similarly, although there are only 7
> > +  * threads per EU, this is stored in a 3 bit number, giving an
> > +  * effective maximum value of 8 threads per EU.
> > +  *
> > +  * This means that we need to use 16 * 8 instead of 10 * 7 for the
> > +  * number of threads per subslice.
> > +  */
> > + scratch_ids_per_subslice = 16 * 8;
> > +  } else if (devinfo->is_cherryview) {
> > + /* For Cherryview, it appears that the scratch addresses for the 
> > 6 EU
> > +  * devices may still generate compute scratch addresses covering 
> > the
> > +  * same range as 8 EU.
> > +  */
> > + scratch_ids_per_subslice = 8 * 7;
> > +  } else {
> > + scratch_ids_per_subslice = devinfo->max_cs_threads;
> > +  }
> >  
> >thread_count = scratch_ids_per_subslice * subslices;
> >break;
> > -- 
> > 2.16.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview

2018-03-09 Thread Mark Janes
Could this be the reason that BSW systems never reliably passed all unit
tests?  Up to now, we re-execute each failing test, and mark it as a
pass if it succeeds a second time.

I'd like to remove that crutch if possible.
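
The retry policy described above can be sketched roughly as follows. This is an illustration only, not the actual CI code: `test_fn`, `run_with_one_retry`, and `fails_while_positive` are hypothetical names. Unlike the current behaviour, the sketch reports a second-attempt pass separately so that it stays visible.

```c
#include <stdbool.h>

/* Hypothetical test callback: returns true on pass. */
typedef bool (*test_fn)(void *ctx);

enum test_result { TEST_PASS, TEST_FLAKY_PASS, TEST_FAIL };

/* Run a test; on failure run it exactly once more. */
static enum test_result
run_with_one_retry(test_fn fn, void *ctx)
{
   if (fn(ctx))
      return TEST_PASS;

   /* Second attempt: a pass here is what currently gets counted as a pass. */
   return fn(ctx) ? TEST_FLAKY_PASS : TEST_FAIL;
}

/* Example test body: fails until the counter behind ctx reaches zero. */
static bool
fails_while_positive(void *ctx)
{
   int *remaining_failures = ctx;

   if (*remaining_failures > 0) {
      (*remaining_failures)--;
      return false;
   }
   return true;
}
```

Removing the crutch would amount to treating `TEST_FLAKY_PASS` the same as `TEST_FAIL`.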

Jordan Justen writes:

> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs
> seems to help with a GPU hang seen on synmark CSDof.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
> Cc: Kenneth Graunke 
> Cc: Eero Tamminen 
> Cc: 
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 44 
> -
>  1 file changed, 27 insertions(+), 17 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> b/src/mesa/drivers/dri/i965/brw_program.c
> index 527f003977b..c121136c439 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -402,23 +402,33 @@ brw_alloc_stage_scratch(struct brw_context *brw,
>if (devinfo->gen >= 9)
>   subslices = 4 * brw->screen->devinfo.num_slices;
>  
> -  /* WaCSScratchSize:hsw
> -   *
> -   * Haswell's scratch space address calculation appears to be sparse
> -   * rather than tightly packed.  The Thread ID has bits indicating
> -   * which subslice, EU within a subslice, and thread within an EU
> -   * it is.  There's a maximum of two slices and two subslices, so these
> -   * can be stored with a single bit.  Even though there are only 10 EUs
> -   * per subslice, this is stored in 4 bits, so there's an effective
> -   * maximum value of 16 EUs.  Similarly, although there are only 7
> -   * threads per EU, this is stored in a 3 bit number, giving an 
> effective
> -   * maximum value of 8 threads per EU.
> -   *
> -   * This means that we need to use 16 * 8 instead of 10 * 7 for the
> -   * number of threads per subslice.
> -   */
> -  const unsigned scratch_ids_per_subslice =
> - devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
> +  unsigned scratch_ids_per_subslice;
> +  if (devinfo->is_haswell) {
> + /* WaCSScratchSize:hsw
> +  *
> +  * Haswell's scratch space address calculation appears to be sparse
> +  * rather than tightly packed. The Thread ID has bits indicating
> +  * which subslice, EU within a subslice, and thread within an EU it
> +  * is. There's a maximum of two slices and two subslices, so these
> +  * can be stored with a single bit. Even though there are only 10 
> EUs
> +  * per subslice, this is stored in 4 bits, so there's an effective
> +  * maximum value of 16 EUs. Similarly, although there are only 7
> +  * threads per EU, this is stored in a 3 bit number, giving an
> +  * effective maximum value of 8 threads per EU.
> +  *
> +  * This means that we need to use 16 * 8 instead of 10 * 7 for the
> +  * number of threads per subslice.
> +  */
> + scratch_ids_per_subslice = 16 * 8;
> +  } else if (devinfo->is_cherryview) {
> + /* For Cherryview, it appears that the scratch addresses for the 6 
> EU
> +  * devices may still generate compute scratch addresses covering the
> +  * same range as 8 EU.
> +  */
> + scratch_ids_per_subslice = 8 * 7;
> +  } else {
> + scratch_ids_per_subslice = devinfo->max_cs_threads;
> +  }
>  
>thread_count = scratch_ids_per_subslice * subslices;
>break;
> -- 
> 2.16.1
>


Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview

2018-03-07 Thread Jordan Justen
On 2018-03-07 07:41:04, Eero Tamminen wrote:
> Hi,
> 
> Tested SynMark CSDof and GfxBench Aztec Ruins GL & GLES / normal & high 
> versions, which were earlier GPU hanging.  With this patch hangs are gone.
> 
> Tested-by: Eero Tamminen 

Thanks!

> On 07.03.2018 10:16, Jordan Justen wrote:
> > Ken suggested that we might be underallocating scratch space on HD
> > 400. Allocating scratch space as though there was actually 8 EUs
> 
> s/8/18/?
> 

I think you meant 16 rather than 18? I guess we have either 6 EU *per
subslice* (HD 400) or 8 EU per subslice (HD 405). With 2 subslices,
that'd be either 12 or 16 EU.

In my comments and commit message I should add 'per subslice' by the
6/8 EU numbers to make it clearer.
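
To spell out the per-subslice arithmetic, here is a minimal sketch. The helper names are hypothetical, and reading the 7 in the patch's "8 * 7" as threads per EU is an assumption.

```c
#include <assert.h>

enum {
   CHV_SUBSLICES = 2,       /* as discussed in this thread */
   CHV_THREADS_PER_EU = 7   /* assumed meaning of the 7 in "8 * 7" */
};

/* Total EU count for a part with the given number of EUs per subslice. */
static unsigned
chv_total_eus(unsigned eus_per_subslice)
{
   assert(eus_per_subslice > 0);
   return eus_per_subslice * CHV_SUBSLICES;
}

/* Scratch IDs needed if the allocation is sized for eus_per_subslice. */
static unsigned
chv_scratch_ids(unsigned eus_per_subslice)
{
   assert(eus_per_subslice > 0);
   return eus_per_subslice * CHV_THREADS_PER_EU * CHV_SUBSLICES;
}
```

So `chv_total_eus(6)` is 12 and `chv_total_eus(8)` is 16, while sizing for 8 EUs per subslice instead of 6 raises the scratch ID count from 84 to 112.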

-Jordan

> 
> > seems to help with a GPU hang seen on synmark CSDof.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
> > Cc: Kenneth Graunke 
> > Cc: Eero Tamminen 
> > Cc: 
> > Signed-off-by: Jordan Justen 
> > ---
> >   src/mesa/drivers/dri/i965/brw_program.c | 44 
> > -
> >   1 file changed, 27 insertions(+), 17 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> > b/src/mesa/drivers/dri/i965/brw_program.c
> > index 527f003977b..c121136c439 100644
> > --- a/src/mesa/drivers/dri/i965/brw_program.c
> > +++ b/src/mesa/drivers/dri/i965/brw_program.c
> > @@ -402,23 +402,33 @@ brw_alloc_stage_scratch(struct brw_context *brw,
> > if (devinfo->gen >= 9)
> >subslices = 4 * brw->screen->devinfo.num_slices;
> >   
> > -  /* WaCSScratchSize:hsw
> > -   *
> > -   * Haswell's scratch space address calculation appears to be sparse
> > -   * rather than tightly packed.  The Thread ID has bits indicating
> > -   * which subslice, EU within a subslice, and thread within an EU
> > -   * it is.  There's a maximum of two slices and two subslices, so 
> > these
> > -   * can be stored with a single bit.  Even though there are only 10 
> > EUs
> > -   * per subslice, this is stored in 4 bits, so there's an effective
> > -   * maximum value of 16 EUs.  Similarly, although there are only 7
> > -   * threads per EU, this is stored in a 3 bit number, giving an 
> > effective
> > -   * maximum value of 8 threads per EU.
> > -   *
> > -   * This means that we need to use 16 * 8 instead of 10 * 7 for the
> > -   * number of threads per subslice.
> > -   */
> > -  const unsigned scratch_ids_per_subslice =
> > - devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
> > +  unsigned scratch_ids_per_subslice;
> > +  if (devinfo->is_haswell) {
> > + /* WaCSScratchSize:hsw
> > +  *
> > +  * Haswell's scratch space address calculation appears to be 
> > sparse
> > +  * rather than tightly packed. The Thread ID has bits indicating
> > +  * which subslice, EU within a subslice, and thread within an EU 
> > it
> > +  * is. There's a maximum of two slices and two subslices, so these
> > +  * can be stored with a single bit. Even though there are only 10 
> > EUs
> > +  * per subslice, this is stored in 4 bits, so there's an effective
> > +  * maximum value of 16 EUs. Similarly, although there are only 7
> > +  * threads per EU, this is stored in a 3 bit number, giving an
> > +  * effective maximum value of 8 threads per EU.
> > +  *
> > +  * This means that we need to use 16 * 8 instead of 10 * 7 for the
> > +  * number of threads per subslice.
> > +  */
> > + scratch_ids_per_subslice = 16 * 8;
> > +  } else if (devinfo->is_cherryview) {
> > + /* For Cherryview, it appears that the scratch addresses for the 
> > 6 EU
> > +  * devices may still generate compute scratch addresses covering 
> > the
> > +  * same range as 8 EU.
> > +  */
> > + scratch_ids_per_subslice = 8 * 7;
> > +  } else {
> > + scratch_ids_per_subslice = devinfo->max_cs_threads;
> > +  }
> >   
> > thread_count = scratch_ids_per_subslice * subslices;
> > break;
> > 
> 


Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview

2018-03-07 Thread Eero Tamminen

Hi,

Tested SynMark CSDof and GfxBench Aztec Ruins GL & GLES / normal & high 
versions, which were earlier GPU hanging.  With this patch hangs are gone.


Tested-by: Eero Tamminen 


On 07.03.2018 10:16, Jordan Justen wrote:
> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs

s/8/18/?

- Eero

> seems to help with a GPU hang seen on synmark CSDof.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
> Cc: Kenneth Graunke
> Cc: Eero Tamminen
> Cc:
> Signed-off-by: Jordan Justen
> ---
>   src/mesa/drivers/dri/i965/brw_program.c | 44 -
>   1 file changed, 27 insertions(+), 17 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> index 527f003977b..c121136c439 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -402,23 +402,33 @@ brw_alloc_stage_scratch(struct brw_context *brw,
> if (devinfo->gen >= 9)
>    subslices = 4 * brw->screen->devinfo.num_slices;
>
> -  /* WaCSScratchSize:hsw
> -   *
> -   * Haswell's scratch space address calculation appears to be sparse
> -   * rather than tightly packed.  The Thread ID has bits indicating
> -   * which subslice, EU within a subslice, and thread within an EU
> -   * it is.  There's a maximum of two slices and two subslices, so these
> -   * can be stored with a single bit.  Even though there are only 10 EUs
> -   * per subslice, this is stored in 4 bits, so there's an effective
> -   * maximum value of 16 EUs.  Similarly, although there are only 7
> -   * threads per EU, this is stored in a 3 bit number, giving an effective
> -   * maximum value of 8 threads per EU.
> -   *
> -   * This means that we need to use 16 * 8 instead of 10 * 7 for the
> -   * number of threads per subslice.
> -   */
> -  const unsigned scratch_ids_per_subslice =
> - devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
> +  unsigned scratch_ids_per_subslice;
> +  if (devinfo->is_haswell) {
> + /* WaCSScratchSize:hsw
> +  *
> +  * Haswell's scratch space address calculation appears to be sparse
> +  * rather than tightly packed. The Thread ID has bits indicating
> +  * which subslice, EU within a subslice, and thread within an EU it
> +  * is. There's a maximum of two slices and two subslices, so these
> +  * can be stored with a single bit. Even though there are only 10 EUs
> +  * per subslice, this is stored in 4 bits, so there's an effective
> +  * maximum value of 16 EUs. Similarly, although there are only 7
> +  * threads per EU, this is stored in a 3 bit number, giving an
> +  * effective maximum value of 8 threads per EU.
> +  *
> +  * This means that we need to use 16 * 8 instead of 10 * 7 for the
> +  * number of threads per subslice.
> +  */
> + scratch_ids_per_subslice = 16 * 8;
> +  } else if (devinfo->is_cherryview) {
> + /* For Cherryview, it appears that the scratch addresses for the 6 EU
> +  * devices may still generate compute scratch addresses covering the
> +  * same range as 8 EU.
> +  */
> + scratch_ids_per_subslice = 8 * 7;
> +  } else {
> + scratch_ids_per_subslice = devinfo->max_cs_threads;
> +  }
>
> thread_count = scratch_ids_per_subslice * subslices;
> break;





Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview

2018-03-07 Thread Kenneth Graunke
On Wednesday, March 7, 2018 12:16:26 AM PST Jordan Justen wrote:
> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs
> seems to help with a GPU hang seen on synmark CSDof.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
> Cc: Kenneth Graunke 
> Cc: Eero Tamminen 
> Cc: 
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 44 
> -
>  1 file changed, 27 insertions(+), 17 deletions(-)

Patches 1-2 are:
Reviewed-by: Kenneth Graunke 
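
For reference, the 16 * 8 figure in the WaCSScratchSize:hsw comment of the patch follows from the field widths it mentions (4 bits for the EU, 3 bits for the thread). A minimal sketch of that arithmetic is below. The function names are hypothetical, and the packing order (EU index in the upper field, thread index in the lower) is an assumption made for illustration, not something the comment specifies.

```c
#include <assert.h>

/* Number of distinct IDs a sparse (eu, thread) encoding can produce. */
static unsigned
sparse_id_range(unsigned eu_bits, unsigned thread_bits)
{
   assert(eu_bits + thread_bits < 32);
   return (1u << eu_bits) * (1u << thread_bits);
}

/* Hypothetical packing: EU index above the thread index. */
static unsigned
sparse_scratch_id(unsigned eu, unsigned thread, unsigned thread_bits)
{
   assert(thread_bits < 32);
   assert(thread < (1u << thread_bits));
   return (eu << thread_bits) | thread;
}
```

Under that assumed packing, `sparse_id_range(4, 3)` is 16 * 8 = 128, and the last real thread (EU 9, thread 6) gets ID 78, which already falls outside a dense 10 * 7 = 70 entry allocation.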

