Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview
On 2018-03-09 09:51:31, Mark Janes wrote:
> Could this be the reason that BSW systems never reliably passed all unit
> tests? Up to now, we re-execute each failing test, and mark it as a
> pass if it succeeds a second time.
>
> I'd like to remove that crutch if possible.

It is possible. We basically had memory corruption happening outside
the scratch buffer. The corruption was happening a bit past the end of
the buffer we had allocated. It can be difficult to predict the outcome
of such corruption. :)

-Jordan

> Jordan Justen writes:
>
> > Ken suggested that we might be underallocating scratch space on HD
> > 400. Allocating scratch space as though there was actually 8 EUs
> > seems to help with a GPU hang seen on synmark CSDof.
> >
> > [quoted patch snipped]

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
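A rough way to picture the corruption Jordan describes: if the buffer is sized for 6 EUs x 7 threads per subslice, but the hardware generates scratch IDs as though there were 8 EUs, the slots for the highest IDs start past the end of the allocation. The following C sketch uses a simplified, hypothetical model (contiguous fixed-size slots indexed by scratch ID). The function names and the 1 KiB per-thread size are illustrative assumptions, not Mesa code or documented hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: each scratch ID owns one fixed-size slot,
 * and slots are laid out contiguously from offset 0. Real hardware address
 * generation may differ. */
static bool scratch_slot_in_bounds(unsigned id, unsigned long per_thread_bytes,
                                   unsigned long buffer_bytes)
{
   assert(per_thread_bytes > 0);
   unsigned long start = (unsigned long)id * per_thread_bytes;
   /* The whole slot must fit, not just its first byte. */
   return start + per_thread_bytes <= buffer_bytes;
}

/* Count the scratch IDs whose slot would land past the end of a buffer sized
 * for `allocated_eus` EUs per subslice, when the hardware generates IDs as if
 * there were `addressed_eus` EUs per subslice. */
static unsigned count_out_of_bounds_ids(unsigned allocated_eus,
                                        unsigned addressed_eus,
                                        unsigned threads_per_eu,
                                        unsigned subslices,
                                        unsigned long per_thread_bytes)
{
   const unsigned long buffer_bytes = (unsigned long)allocated_eus *
      threads_per_eu * subslices * per_thread_bytes;
   const unsigned id_count = addressed_eus * threads_per_eu * subslices;

   unsigned out_of_bounds = 0;
   for (unsigned id = 0; id < id_count; id++) {
      if (!scratch_slot_in_bounds(id, per_thread_bytes, buffer_bytes))
         out_of_bounds++;
   }
   return out_of_bounds;
}
```

With 2 subslices and 1 KiB per thread, `count_out_of_bounds_ids(6, 8, 7, 2, 1024)` returns 28: the buffer holds 84 slots, but 112 IDs may be generated. Sizing the buffer for 8 EUs, `count_out_of_bounds_ids(8, 8, 7, 2, 1024)`, returns 0.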
Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview
Could this be the reason that BSW systems never reliably passed all unit
tests? Up to now, we re-execute each failing test, and mark it as a
pass if it succeeds a second time.

I'd like to remove that crutch if possible.

Jordan Justen writes:

> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs
> seems to help with a GPU hang seen on synmark CSDof.
>
> [quoted patch snipped]
Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview
On 2018-03-07 07:41:04, Eero Tamminen wrote:
> Hi,
>
> Tested SynMark CSDof and GfxBench Aztec Ruins GL & GLES / normal & high
> versions, which were earlier GPU hanging. With this patch hangs are gone.
>
> Tested-by: Eero Tamminen

Thanks!

> On 07.03.2018 10:16, Jordan Justen wrote:
> > Ken suggested that we might be underallocating scratch space on HD
> > 400. Allocating scratch space as though there was actually 8 EUs
>
> s/8/18/?
>

I think you meant 16 rather than 18?

I guess we have either 6 EU *per subslice* (HD 400) or 8 EU per
subslice (HD 405). With 2 subslices, that'd be either 12 or 16 EU.

In my comments and commit message I should add 'per subslice' by the
6/8 EU numbers to make it clearer.

-Jordan

> > seems to help with a GPU hang seen on synmark CSDof.
> >
> > [quoted patch snipped]
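The per-subslice arithmetic in Jordan's reply can be checked with a small C sketch. The struct and helper names are hypothetical; the EU counts (6 per subslice on HD 400, 8 on HD 405, 2 subslices) come from the thread, and 7 threads per EU is taken from the `8 * 7` used in the patch.

```c
#include <assert.h>

/* Hypothetical descriptor for the two variants discussed in the thread. */
struct chv_variant {
   const char *name;
   unsigned eus_per_subslice;
   unsigned subslices;
};

/* Matches the 7 in the patch's `8 * 7`. */
enum { CHV_THREADS_PER_EU = 7 };

static unsigned total_eus(const struct chv_variant *v)
{
   return v->eus_per_subslice * v->subslices;
}

/* Scratch IDs covered if the allocation is sized from the real EU count. */
static unsigned ids_from_real_eus(const struct chv_variant *v)
{
   /* Both variants discussed have at most 8 EUs per subslice. */
   assert(v->eus_per_subslice <= 8);
   return v->eus_per_subslice * CHV_THREADS_PER_EU * v->subslices;
}

/* Scratch IDs covered by the patch, which always assumes 8 EUs per subslice. */
static unsigned ids_with_patch(const struct chv_variant *v)
{
   const unsigned scratch_ids_per_subslice = 8 * CHV_THREADS_PER_EU;
   return scratch_ids_per_subslice * v->subslices;
}
```

For HD 400 this gives 12 EUs and 84 scratch IDs when sized from the real EU count, versus 112 with the patch. For HD 405 the two counts agree at 112.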
Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview
Hi,

Tested SynMark CSDof and GfxBench Aztec Ruins GL & GLES / normal & high
versions, which were earlier GPU hanging. With this patch hangs are gone.

Tested-by: Eero Tamminen

On 07.03.2018 10:16, Jordan Justen wrote:
> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs

s/8/18/?

	- Eero

> seems to help with a GPU hang seen on synmark CSDof.
>
> [quoted patch snipped]
Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview
On Wednesday, March 7, 2018 12:16:26 AM PST Jordan Justen wrote:
> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs
> seems to help with a GPU hang seen on synmark CSDof.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
> Cc: Kenneth Graunke
> Cc: Eero Tamminen
> Cc:
> Signed-off-by: Jordan Justen
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 44
>  1 file changed, 27 insertions(+), 17 deletions(-)

Patches 1-2 are:

Reviewed-by: Kenneth Graunke
[Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview
Ken suggested that we might be underallocating scratch space on HD
400. Allocating scratch space as though there was actually 8 EUs
seems to help with a GPU hang seen on synmark CSDof.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
Cc: Kenneth Graunke
Cc: Eero Tamminen
Cc:
Signed-off-by: Jordan Justen
---
 src/mesa/drivers/dri/i965/brw_program.c | 44
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index 527f003977b..c121136c439 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -402,23 +402,33 @@ brw_alloc_stage_scratch(struct brw_context *brw,
       if (devinfo->gen >= 9)
          subslices = 4 * brw->screen->devinfo.num_slices;
 
-      /* WaCSScratchSize:hsw
-       *
-       * Haswell's scratch space address calculation appears to be sparse
-       * rather than tightly packed. The Thread ID has bits indicating
-       * which subslice, EU within a subslice, and thread within an EU
-       * it is. There's a maximum of two slices and two subslices, so these
-       * can be stored with a single bit. Even though there are only 10 EUs
-       * per subslice, this is stored in 4 bits, so there's an effective
-       * maximum value of 16 EUs. Similarly, although there are only 7
-       * threads per EU, this is stored in a 3 bit number, giving an effective
-       * maximum value of 8 threads per EU.
-       *
-       * This means that we need to use 16 * 8 instead of 10 * 7 for the
-       * number of threads per subslice.
-       */
-      const unsigned scratch_ids_per_subslice =
-         devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
+      unsigned scratch_ids_per_subslice;
+      if (devinfo->is_haswell) {
+         /* WaCSScratchSize:hsw
+          *
+          * Haswell's scratch space address calculation appears to be sparse
+          * rather than tightly packed. The Thread ID has bits indicating
+          * which subslice, EU within a subslice, and thread within an EU it
+          * is. There's a maximum of two slices and two subslices, so these
+          * can be stored with a single bit. Even though there are only 10 EUs
+          * per subslice, this is stored in 4 bits, so there's an effective
+          * maximum value of 16 EUs. Similarly, although there are only 7
+          * threads per EU, this is stored in a 3 bit number, giving an
+          * effective maximum value of 8 threads per EU.
+          *
+          * This means that we need to use 16 * 8 instead of 10 * 7 for the
+          * number of threads per subslice.
+          */
+         scratch_ids_per_subslice = 16 * 8;
+      } else if (devinfo->is_cherryview) {
+         /* For Cherryview, it appears that the scratch addresses for the 6 EU
+          * devices may still generate compute scratch addresses covering the
+          * same range as 8 EU.
+          */
+         scratch_ids_per_subslice = 8 * 7;
+      } else {
+         scratch_ids_per_subslice = devinfo->max_cs_threads;
+      }
 
       thread_count = scratch_ids_per_subslice * subslices;
       break;
-- 
2.16.1
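The selection logic in the patch can be pulled out into a standalone C function for experimentation. This is a sketch, not the Mesa code: `struct scratch_devinfo` is a hypothetical stand-in that only mirrors the three `devinfo` fields the patch reads, and the Haswell value is written as `(1 << 4) * (1 << 3)` to show how the 4-bit EU field and 3-bit thread field described in the WaCSScratchSize:hsw comment lead to 16 * 8.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device-info fields the patch reads.
 * This is NOT Mesa's device-info struct; it only mirrors the field names. */
struct scratch_devinfo {
   bool is_haswell;
   bool is_cherryview;
   unsigned max_cs_threads;
};

/* Haswell thread-ID field widths, per the WaCSScratchSize:hsw comment:
 * 4 bits for the EU index and 3 bits for the thread index. */
enum { HSW_EU_ID_BITS = 4, HSW_THREAD_ID_BITS = 3 };

static unsigned scratch_ids_per_subslice(const struct scratch_devinfo *devinfo)
{
   if (devinfo->is_haswell) {
      /* Sparse ID space: size it by the field widths,
       * 2^4 EUs * 2^3 threads = 16 * 8 = 128, not 10 * 7 = 70. */
      return (1u << HSW_EU_ID_BITS) * (1u << HSW_THREAD_ID_BITS);
   } else if (devinfo->is_cherryview) {
      /* Allocate as though there were 8 EUs (7 threads each) per subslice,
       * even on parts with only 6 EUs per subslice. */
      return 8 * 7;
   } else {
      return devinfo->max_cs_threads;
   }
}

static unsigned compute_scratch_thread_count(const struct scratch_devinfo *devinfo,
                                             unsigned subslices)
{
   assert(subslices > 0);
   return scratch_ids_per_subslice(devinfo) * subslices;
}
```

With 2 subslices, a Cherryview device gets 8 * 7 * 2 = 112 scratch IDs regardless of whether it has 6 or 8 EUs per subslice.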