Re: [Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.
Hi Chris, thanks for looking into that! On Saturday, 2 June 2018 09:58:14 CEST Chris Wilson wrote: > Quoting mathias.froehl...@gmx.net (2018-05-17 07:38:26) > > From: Mathias Fröhlich > > > > Avoid looping over all VARYING_SLOT_MAX urb_setup array > > entries from genX_upload_sbe. Prepare an array indirection > > to the active entries of urb_setup already in the compile > > step. On upload only walk the active arrays. > > > > Signed-off-by: Mathias Fröhlich > > --- > > src/intel/compiler/brw_compiler.h | 7 +++ > > src/intel/compiler/brw_fs.cpp | 23 +++ > > src/intel/compiler/brw_fs.h | 2 ++ > > src/intel/compiler/brw_fs_visitor.cpp | 1 + > > src/mesa/drivers/dri/i965/genX_state_upload.c | 7 +++ > > 5 files changed, 36 insertions(+), 4 deletions(-) > > > > diff --git a/src/intel/compiler/brw_compiler.h > > b/src/intel/compiler/brw_compiler.h > > index 8b4e6fe2e2..a9df45e00d 100644 > > --- a/src/intel/compiler/brw_compiler.h > > +++ b/src/intel/compiler/brw_compiler.h > > @@ -743,6 +743,13 @@ struct brw_wm_prog_data { > > * For varying slots that are not used by the FS, the value is -1. > > */ > > int urb_setup[VARYING_SLOT_MAX]; > > + /** > > +* Cache structure into the urb_setup array above that contains the > > +* attribute numbers of active varyings out of urb_setup. > > +* The actual count is stored in urb_setup_attribs_count. > > +*/ > > + int urb_setup_attribs[VARYING_SLOT_MAX]; > > + int urb_setup_attribs_count; > > }; > > > > struct brw_push_const_block { > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > > index b21996c168..6930414067 100644 > > --- a/src/intel/compiler/brw_fs.cpp > > +++ b/src/intel/compiler/brw_fs.cpp > > @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup() > > this->first_non_payload_grf = payload.num_regs + > > prog_data->curb_read_length; > > } > > > > +/* > > + * Build up an array of indices into the urb_setup array that > > + * references the active entries of the urb_setup array. > > + * Used to accelerate walking the active entries of the urb_setup array > > + * on each upload. > > + */ > > +void > > +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data) > > +{ > > + int index = 0; > > + for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) { > > + int input_index = wm_prog_data->urb_setup[attr]; > > + if (input_index < 0) > > + continue; > > + wm_prog_data->urb_setup_attribs[index++] = attr; > > So far the only consumer of this wants the input_index again. > Does that change, or is it worth including both to avoid the trawl? Hmm, I don't know too much about the internal requirements of hardware in this regard. But one property of the current code is that the current code orders the varying slots in urb_setup[] with ascending attribute index. So if we collapse the urb_setup[] and urb_setup_index[] arrays into something like struct { uint8_t attrib; uint8_t index; } urb_setup[VARYING_SLOT_MAX; uint8_t urb_setup_count; do we need to sort that afterwards by attribute before we can use that in genX_upload_sbe? > Is uint8_t (with a STATIC_ASSERT) good enough? Sure, I was catching up with the next declaration beside to stick with the 'surrounding coding style'. That's changed here in a v2 version. We could even reach an even smaller cache footprint by using a single uint64_t and iterate that using u_bit_scan64(). But I received some general headwind lately from someone who did not like these bitmask loops. So I apply the bitmask iteration only in places where the bitmasks really provide a larger improvement than just a smaller cache footprint. What do you think? best Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.
Quoting mathias.froehl...@gmx.net (2018-05-17 07:38:26) > From: Mathias Fröhlich > > Avoid looping over all VARYING_SLOT_MAX urb_setup array > entries from genX_upload_sbe. Prepare an array indirection > to the active entries of urb_setup already in the compile > step. On upload only walk the active arrays. > > Signed-off-by: Mathias Fröhlich > --- > src/intel/compiler/brw_compiler.h | 7 +++ > src/intel/compiler/brw_fs.cpp | 23 +++ > src/intel/compiler/brw_fs.h | 2 ++ > src/intel/compiler/brw_fs_visitor.cpp | 1 + > src/mesa/drivers/dri/i965/genX_state_upload.c | 7 +++ > 5 files changed, 36 insertions(+), 4 deletions(-) > > diff --git a/src/intel/compiler/brw_compiler.h > b/src/intel/compiler/brw_compiler.h > index 8b4e6fe2e2..a9df45e00d 100644 > --- a/src/intel/compiler/brw_compiler.h > +++ b/src/intel/compiler/brw_compiler.h > @@ -743,6 +743,13 @@ struct brw_wm_prog_data { > * For varying slots that are not used by the FS, the value is -1. > */ > int urb_setup[VARYING_SLOT_MAX]; > + /** > +* Cache structure into the urb_setup array above that contains the > +* attribute numbers of active varyings out of urb_setup. > +* The actual count is stored in urb_setup_attribs_count. > +*/ > + int urb_setup_attribs[VARYING_SLOT_MAX]; > + int urb_setup_attribs_count; > }; > > struct brw_push_const_block { > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index b21996c168..6930414067 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup() > this->first_non_payload_grf = payload.num_regs + > prog_data->curb_read_length; > } > > +/* > + * Build up an array of indices into the urb_setup array that > + * references the active entries of the urb_setup array. > + * Used to accelerate walking the active entries of the urb_setup array > + * on each upload. > + */ > +void > +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data) > +{ > + int index = 0; > + for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) { > + int input_index = wm_prog_data->urb_setup[attr]; > + if (input_index < 0) > + continue; > + wm_prog_data->urb_setup_attribs[index++] = attr; So far the only consumer of this wants the input_index again. Does that change, or is it worth including both to avoid the trawl? Is uint8_t (with a STATIC_ASSERT) good enough? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.
From: Mathias FröhlichAvoid looping over all VARYING_SLOT_MAX urb_setup array entries from genX_upload_sbe. Prepare an array indirection to the active entries of urb_setup already in the compile step. On upload only walk the active arrays. Signed-off-by: Mathias Fröhlich --- src/intel/compiler/brw_compiler.h | 7 +++ src/intel/compiler/brw_fs.cpp | 23 +++ src/intel/compiler/brw_fs.h | 2 ++ src/intel/compiler/brw_fs_visitor.cpp | 1 + src/mesa/drivers/dri/i965/genX_state_upload.c | 7 +++ 5 files changed, 36 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h index 8b4e6fe2e2..a9df45e00d 100644 --- a/src/intel/compiler/brw_compiler.h +++ b/src/intel/compiler/brw_compiler.h @@ -743,6 +743,13 @@ struct brw_wm_prog_data { * For varying slots that are not used by the FS, the value is -1. */ int urb_setup[VARYING_SLOT_MAX]; + /** +* Cache structure into the urb_setup array above that contains the +* attribute numbers of active varyings out of urb_setup. +* The actual count is stored in urb_setup_attribs_count. +*/ + int urb_setup_attribs[VARYING_SLOT_MAX]; + int urb_setup_attribs_count; }; struct brw_push_const_block { diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index b21996c168..6930414067 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup() this->first_non_payload_grf = payload.num_regs + prog_data->curb_read_length; } +/* + * Build up an array of indices into the urb_setup array that + * references the active entries of the urb_setup array. + * Used to accelerate walking the active entries of the urb_setup array + * on each upload. + */ +void +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data) +{ + int index = 0; + for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) { + int input_index = wm_prog_data->urb_setup[attr]; + if (input_index < 0) + continue; + wm_prog_data->urb_setup_attribs[index++] = attr; + } + wm_prog_data->urb_setup_attribs_count = index; +} + void fs_visitor::calculate_urb_setup() { @@ -1595,6 +1614,8 @@ fs_visitor::calculate_urb_setup() } prog_data->num_varying_inputs = urb_next; + + brw_compute_urb_setup_index(prog_data); } void @@ -6611,6 +6632,8 @@ gen9_ps_header_only_workaround(struct brw_wm_prog_data *wm_prog_data) wm_prog_data->urb_setup[VARYING_SLOT_LAYER] = 0; wm_prog_data->num_varying_inputs = 1; + + brw_compute_urb_setup_index(wm_prog_data); } bool diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h index e384db809d..4c76b30973 100644 --- a/src/intel/compiler/brw_fs.h +++ b/src/intel/compiler/brw_fs.h @@ -525,4 +525,6 @@ fs_reg setup_imm_df(const brw::fs_builder , enum brw_barycentric_mode brw_barycentric_mode(enum glsl_interp_mode mode, nir_intrinsic_op op); +void brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data); + #endif /* BRW_FS_H */ diff --git a/src/intel/compiler/brw_fs_visitor.cpp b/src/intel/compiler/brw_fs_visitor.cpp index 7a5f6451f2..14805fce61 100644 --- a/src/intel/compiler/brw_fs_visitor.cpp +++ b/src/intel/compiler/brw_fs_visitor.cpp @@ -119,6 +119,7 @@ fs_visitor::emit_dummy_fs() wm_prog_data->num_varying_inputs = devinfo->gen < 6 ? 1 : 0; memset(wm_prog_data->urb_setup, -1, sizeof(wm_prog_data->urb_setup[0]) * VARYING_SLOT_MAX); + brw_compute_urb_setup_index(wm_prog_data); /* We don't have any uniforms. */ stage_prog_data->nr_params = 0; diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c index 6178bfa3f8..cd905d13e5 100644 --- a/src/mesa/drivers/dri/i965/genX_state_upload.c +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c @@ -1104,12 +1104,11 @@ genX(calculate_attr_overrides)(const struct brw_context *brw, * BRW_NEW_PRIMITIVE | BRW_NEW_GS_PROG_DATA | BRW_NEW_TES_PROG_DATA */ bool drawing_points = brw_is_drawing_points(brw); - - for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) { + for (int index = 0; index < wm_prog_data->urb_setup_attribs_count; index++) { + int attr = wm_prog_data->urb_setup_attribs[index]; int input_index = wm_prog_data->urb_setup[attr]; - if (input_index < 0) - continue; + assert(0 <= input_index); /* _NEW_POINT */ bool point_sprite = false; -- 2.14.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev