Re: [Mesa-dev] [PATCH 3/5] intel/dev: Add devinfo cs_scratch_ids_per_subslice field
On Wednesday, March 7, 2018 12:36:02 PM PST Jordan Justen wrote: > On 2018-03-07 00:47:12, Kenneth Graunke wrote: > > On Wednesday, March 7, 2018 12:16:28 AM PST Jordan Justen wrote: > > > Suggested-by: Kenneth Graunke > > > Signed-off-by: Jordan Justen > > > --- > > > src/intel/dev/gen_device_info.c | 13 - > > > src/intel/dev/gen_device_info.h | 8 > > > 2 files changed, 16 insertions(+), 5 deletions(-) > > > > > > diff --git a/src/intel/dev/gen_device_info.c > > > b/src/intel/dev/gen_device_info.c > > > index 1773009d33c..de018c6a692 100644 > > > --- a/src/intel/dev/gen_device_info.c > > > +++ b/src/intel/dev/gen_device_info.c > > > @@ -284,11 +284,13 @@ static const struct gen_device_info > > > gen_device_info_byt = { > > > }, > > > }; > > > > > > -#define HSW_FEATURES \ > > > - GEN7_FEATURES,\ > > > - .is_haswell = true, \ > > > - .supports_simd16_3src = true, \ > > > - .has_resource_streamer = true > > > +#define HSW_FEATURES \ > > > + GEN7_FEATURES,\ > > > + .is_haswell = true, \ > > > + .supports_simd16_3src = true, \ > > > + .has_resource_streamer = true,\ > > > + /* WaCSScratchSize:hsw */ \ > > > + .cs_scratch_ids_per_subslice = 16 * 8 > > > > > > static const struct gen_device_info gen_device_info_hsw_gt1 = { > > > HSW_FEATURES, .gt = 1, > > > @@ -476,6 +478,7 @@ static const struct gen_device_info > > > gen_device_info_chv = { > > > .max_gs_threads = 80, > > > .max_wm_threads = 128, > > > .max_cs_threads = 6 * 7, > > > + .cs_scratch_ids_per_subslice = 8 * 7, > > > .urb = { > > >.size = 192, > > >.min_entries = { > > > diff --git a/src/intel/dev/gen_device_info.h > > > b/src/intel/dev/gen_device_info.h > > > index b8044d00032..fa6b38f19ec 100644 > > > --- a/src/intel/dev/gen_device_info.h > > > +++ b/src/intel/dev/gen_device_info.h > > > @@ -148,6 +148,14 @@ struct gen_device_info > > > */ > > > unsigned max_cs_threads; > > > > > > + /** > > > +* The range of scratch addresses may differ from the number of EUs > > > +* available for compute programs. This requires allocating a larger > > > +* scratch buffer. For affected hardware, this will be set. If this > > > is not > > > +* set, then max_cs_threads should be used instead. > > > +*/ > > > + unsigned cs_scratch_ids_per_subslice; > > > + > > > struct { > > >/** > > > * Hardware default URB size. > > > > > > > This works, but it's not quite what I had in mind. I was thinking you > > would add the new field, then add this hunk to gen_get_device_info(): > > > >if (devinfo->is_haswell) { > > /* WaCSScratchSize:hsw > >* > >* Haswell's scratch space address calculation appears to be sparse > >* rather than tightly packed. The Thread ID has bits indicating > >* which subslice, EU within a subslice, and thread within an EU it > >* is. There's a maximum of two slices and two subslices, so these > >* can be stored with a single bit. Even though there are only 10 EUs > >* per subslice, this is stored in 4 bits, so there's an effective > >* maximum value of 16 EUs. Similarly, although there are only 7 > >* threads per EU, this is stored in a 3 bit number, giving an > >* effective maximum value of 8 threads per EU. > >* > >* This means that we need to use 16 * 8 instead of 10 * 7 for the > >* number of threads per subslice. > >*/ > > devinfo->cs_scratch_ids_per_subslice = 16 * 8; > >} else if (devinfo->is_cherryview) { > > /* For Cherryview, it appears that the scratch addresses for the 6 EU > >* devices may still generate compute scratch addresses covering the > >* same range as 8 EU. > >*/ > > devinfo->cs_scratch_ids_per_subslice = 8 * 7; > >} else { > > devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads; > >} > > Originally, I had initialized them in the structures, and then had: > >if (devinfo->cs_scratch_ids_per_subslice == 0) > devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads; > > I used it directly, like you mentioned below. > > Then I got concerned about how the kernel params might update > max_cs_threads, and should that cause cs_scratch_ids_per_subslice to > be updated? So, I changed it to be treated as an override. > > But, I could change the code that updates max_cs_threads to also > update cs_scratch_ids_per_subslice. Ugh...thanks for paying more attention than I did. I forgot we update max_cs_threads on Cherryview (and might elsewhere someday - though I kinda hope not). What you did is probably safer. Series is Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___
Re: [Mesa-dev] [PATCH 3/5] intel/dev: Add devinfo cs_scratch_ids_per_subslice field
On 2018-03-07 00:47:12, Kenneth Graunke wrote: > On Wednesday, March 7, 2018 12:16:28 AM PST Jordan Justen wrote: > > Suggested-by: Kenneth Graunke > > Signed-off-by: Jordan Justen > > --- > > src/intel/dev/gen_device_info.c | 13 - > > src/intel/dev/gen_device_info.h | 8 > > 2 files changed, 16 insertions(+), 5 deletions(-) > > > > diff --git a/src/intel/dev/gen_device_info.c > > b/src/intel/dev/gen_device_info.c > > index 1773009d33c..de018c6a692 100644 > > --- a/src/intel/dev/gen_device_info.c > > +++ b/src/intel/dev/gen_device_info.c > > @@ -284,11 +284,13 @@ static const struct gen_device_info > > gen_device_info_byt = { > > }, > > }; > > > > -#define HSW_FEATURES \ > > - GEN7_FEATURES,\ > > - .is_haswell = true, \ > > - .supports_simd16_3src = true, \ > > - .has_resource_streamer = true > > +#define HSW_FEATURES \ > > + GEN7_FEATURES,\ > > + .is_haswell = true, \ > > + .supports_simd16_3src = true, \ > > + .has_resource_streamer = true,\ > > + /* WaCSScratchSize:hsw */ \ > > + .cs_scratch_ids_per_subslice = 16 * 8 > > > > static const struct gen_device_info gen_device_info_hsw_gt1 = { > > HSW_FEATURES, .gt = 1, > > @@ -476,6 +478,7 @@ static const struct gen_device_info gen_device_info_chv > > = { > > .max_gs_threads = 80, > > .max_wm_threads = 128, > > .max_cs_threads = 6 * 7, > > + .cs_scratch_ids_per_subslice = 8 * 7, > > .urb = { > >.size = 192, > >.min_entries = { > > diff --git a/src/intel/dev/gen_device_info.h > > b/src/intel/dev/gen_device_info.h > > index b8044d00032..fa6b38f19ec 100644 > > --- a/src/intel/dev/gen_device_info.h > > +++ b/src/intel/dev/gen_device_info.h > > @@ -148,6 +148,14 @@ struct gen_device_info > > */ > > unsigned max_cs_threads; > > > > + /** > > +* The range of scratch addresses may differ from the number of EUs > > +* available for compute programs. This requires allocating a larger > > +* scratch buffer. For affected hardware, this will be set. If this is > > not > > +* set, then max_cs_threads should be used instead. > > +*/ > > + unsigned cs_scratch_ids_per_subslice; > > + > > struct { > >/** > > * Hardware default URB size. > > > > This works, but it's not quite what I had in mind. I was thinking you > would add the new field, then add this hunk to gen_get_device_info(): > >if (devinfo->is_haswell) { > /* WaCSScratchSize:hsw >* >* Haswell's scratch space address calculation appears to be sparse >* rather than tightly packed. The Thread ID has bits indicating >* which subslice, EU within a subslice, and thread within an EU it >* is. There's a maximum of two slices and two subslices, so these >* can be stored with a single bit. Even though there are only 10 EUs >* per subslice, this is stored in 4 bits, so there's an effective >* maximum value of 16 EUs. Similarly, although there are only 7 >* threads per EU, this is stored in a 3 bit number, giving an >* effective maximum value of 8 threads per EU. >* >* This means that we need to use 16 * 8 instead of 10 * 7 for the >* number of threads per subslice. >*/ > devinfo->cs_scratch_ids_per_subslice = 16 * 8; >} else if (devinfo->is_cherryview) { > /* For Cherryview, it appears that the scratch addresses for the 6 EU >* devices may still generate compute scratch addresses covering the >* same range as 8 EU. >*/ > devinfo->cs_scratch_ids_per_subslice = 8 * 7; >} else { > devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads; >} Originally, I had initialized them in the structures, and then had: if (devinfo->cs_scratch_ids_per_subslice == 0) devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads; I used it directly, like you mentioned below. Then I got concerned about how the kernel params might update max_cs_threads, and should that cause cs_scratch_ids_per_subslice to be updated? So, I changed it to be treated as an override. But, I could change the code that updates max_cs_threads to also update cs_scratch_ids_per_subslice. > This way, the field is always meaningful, and so in patches 4-5 you can > just use devinfo->cs_scratch_ids_per_subslice directly rather than doing > > const unsigned scratch_ids_per_subslice = > devinfo->cs_scratch_ids_per_subslice ? > devinfo->cs_scratch_ids_per_subslice : devinfo->max_cs_threads; > > This also has the side effect of preserving the comment, which I think > is valuable. Well, I guess I thought that the comment might be a bit overwrought. :) I mean, it's good to know that there's a special circumstance, but maybe it's not too bad if you hav
Re: [Mesa-dev] [PATCH 3/5] intel/dev: Add devinfo cs_scratch_ids_per_subslice field
On Wednesday, March 7, 2018 12:16:28 AM PST Jordan Justen wrote: > Suggested-by: Kenneth Graunke > Signed-off-by: Jordan Justen > --- > src/intel/dev/gen_device_info.c | 13 - > src/intel/dev/gen_device_info.h | 8 > 2 files changed, 16 insertions(+), 5 deletions(-) > > diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c > index 1773009d33c..de018c6a692 100644 > --- a/src/intel/dev/gen_device_info.c > +++ b/src/intel/dev/gen_device_info.c > @@ -284,11 +284,13 @@ static const struct gen_device_info gen_device_info_byt > = { > }, > }; > > -#define HSW_FEATURES \ > - GEN7_FEATURES,\ > - .is_haswell = true, \ > - .supports_simd16_3src = true, \ > - .has_resource_streamer = true > +#define HSW_FEATURES \ > + GEN7_FEATURES,\ > + .is_haswell = true, \ > + .supports_simd16_3src = true, \ > + .has_resource_streamer = true,\ > + /* WaCSScratchSize:hsw */ \ > + .cs_scratch_ids_per_subslice = 16 * 8 > > static const struct gen_device_info gen_device_info_hsw_gt1 = { > HSW_FEATURES, .gt = 1, > @@ -476,6 +478,7 @@ static const struct gen_device_info gen_device_info_chv = > { > .max_gs_threads = 80, > .max_wm_threads = 128, > .max_cs_threads = 6 * 7, > + .cs_scratch_ids_per_subslice = 8 * 7, > .urb = { >.size = 192, >.min_entries = { > diff --git a/src/intel/dev/gen_device_info.h b/src/intel/dev/gen_device_info.h > index b8044d00032..fa6b38f19ec 100644 > --- a/src/intel/dev/gen_device_info.h > +++ b/src/intel/dev/gen_device_info.h > @@ -148,6 +148,14 @@ struct gen_device_info > */ > unsigned max_cs_threads; > > + /** > +* The range of scratch addresses may differ from the number of EUs > +* available for compute programs. This requires allocating a larger > +* scratch buffer. For affected hardware, this will be set. If this is not > +* set, then max_cs_threads should be used instead. > +*/ > + unsigned cs_scratch_ids_per_subslice; > + > struct { >/** > * Hardware default URB size. > This works, but it's not quite what I had in mind. I was thinking you would add the new field, then add this hunk to gen_get_device_info(): if (devinfo->is_haswell) { /* WaCSScratchSize:hsw * * Haswell's scratch space address calculation appears to be sparse * rather than tightly packed. The Thread ID has bits indicating * which subslice, EU within a subslice, and thread within an EU it * is. There's a maximum of two slices and two subslices, so these * can be stored with a single bit. Even though there are only 10 EUs * per subslice, this is stored in 4 bits, so there's an effective * maximum value of 16 EUs. Similarly, although there are only 7 * threads per EU, this is stored in a 3 bit number, giving an * effective maximum value of 8 threads per EU. * * This means that we need to use 16 * 8 instead of 10 * 7 for the * number of threads per subslice. */ devinfo->cs_scratch_ids_per_subslice = 16 * 8; } else if (devinfo->is_cherryview) { /* For Cherryview, it appears that the scratch addresses for the 6 EU * devices may still generate compute scratch addresses covering the * same range as 8 EU. */ devinfo->cs_scratch_ids_per_subslice = 8 * 7; } else { devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads; } This way, the field is always meaningful, and so in patches 4-5 you can just use devinfo->cs_scratch_ids_per_subslice directly rather than doing const unsigned scratch_ids_per_subslice = devinfo->cs_scratch_ids_per_subslice ? devinfo->cs_scratch_ids_per_subslice : devinfo->max_cs_threads; This also has the side effect of preserving the comment, which I think is valuable. There's already some precedent for this, as gen_get_device_info() already whacks max_wm_threads precisely to deal with scratch IDs being somewhat sparse. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev