Re: [Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.

2018-06-12 Thread Mathias Fröhlich
Hi Chris,

thanks for looking into that!

On Saturday, 2 June 2018 09:58:14 CEST Chris Wilson wrote:
> Quoting mathias.froehl...@gmx.net (2018-05-17 07:38:26)
> > From: Mathias Fröhlich 
> > 
> > Avoid looping over all VARYING_SLOT_MAX urb_setup array
> > entries from genX_upload_sbe. Prepare an array indirection
> > to the active entries of urb_setup already in the compile
> > step. On upload only walk the active arrays.
> > 
> > Signed-off-by: Mathias Fröhlich 
> > ---
> >  src/intel/compiler/brw_compiler.h |  7 +++
> >  src/intel/compiler/brw_fs.cpp | 23 +++
> >  src/intel/compiler/brw_fs.h   |  2 ++
> >  src/intel/compiler/brw_fs_visitor.cpp |  1 +
> >  src/mesa/drivers/dri/i965/genX_state_upload.c |  7 +++
> >  5 files changed, 36 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_compiler.h 
> > b/src/intel/compiler/brw_compiler.h
> > index 8b4e6fe2e2..a9df45e00d 100644
> > --- a/src/intel/compiler/brw_compiler.h
> > +++ b/src/intel/compiler/brw_compiler.h
> > @@ -743,6 +743,13 @@ struct brw_wm_prog_data {
> >  * For varying slots that are not used by the FS, the value is -1.
> >  */
> > int urb_setup[VARYING_SLOT_MAX];
> > +   /**
> > +* Cache structure into the urb_setup array above that contains the
> > +* attribute numbers of active varyings out of urb_setup.
> > +* The actual count is stored in urb_setup_attribs_count.
> > +*/
> > +   int urb_setup_attribs[VARYING_SLOT_MAX];
> > +   int urb_setup_attribs_count;
> >  };
> >  
> >  struct brw_push_const_block {
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index b21996c168..6930414067 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup()
> > this->first_non_payload_grf = payload.num_regs + 
> > prog_data->curb_read_length;
> >  }
> >  
> > +/*
> > + * Build up an array of indices into the urb_setup array that
> > + * references the active entries of the urb_setup array.
> > + * Used to accelerate walking the active entries of the urb_setup array
> > + * on each upload.
> > + */
> > +void
> > +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data)
> > +{
> > +   int index = 0;
> > +   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
> > +  int input_index = wm_prog_data->urb_setup[attr];
> > +  if (input_index < 0)
> > + continue;
> > +  wm_prog_data->urb_setup_attribs[index++] = attr;
> 
> So far the only consumer of this wants the input_index again.
> Does that change, or is it worth including both to avoid the trawl?

Hmm, I don't know too much about the internal requirements of hardware in this 
regard.
But one property of the current code is that the current code orders the
varying slots in urb_setup[] with ascending attribute index.
So if we collapse the urb_setup[] and urb_setup_index[] arrays into something 
like

struct {
   uint8_t attrib;
   uint8_t index;
}  urb_setup[VARYING_SLOT_MAX;
uint8_t urb_setup_count;

do we need to sort that afterwards by attribute before we can use that
in genX_upload_sbe?

> Is uint8_t (with a STATIC_ASSERT) good enough?
Sure, I was catching up with the next declaration beside to stick
with the 'surrounding coding style'. That's changed here in a v2 version.

We could even reach an even smaller cache footprint by using a single uint64_t 
and
iterate that using u_bit_scan64(). But I received some general headwind lately 
from
someone who did not like these bitmask loops. So I apply the bitmask iteration 
only in
places where the bitmasks really provide a larger improvement than just a 
smaller
cache footprint.
What do you think?

best

Mathias


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.

2018-06-02 Thread Chris Wilson
Quoting mathias.froehl...@gmx.net (2018-05-17 07:38:26)
> From: Mathias Fröhlich 
> 
> Avoid looping over all VARYING_SLOT_MAX urb_setup array
> entries from genX_upload_sbe. Prepare an array indirection
> to the active entries of urb_setup already in the compile
> step. On upload only walk the active arrays.
> 
> Signed-off-by: Mathias Fröhlich 
> ---
>  src/intel/compiler/brw_compiler.h |  7 +++
>  src/intel/compiler/brw_fs.cpp | 23 +++
>  src/intel/compiler/brw_fs.h   |  2 ++
>  src/intel/compiler/brw_fs_visitor.cpp |  1 +
>  src/mesa/drivers/dri/i965/genX_state_upload.c |  7 +++
>  5 files changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_compiler.h 
> b/src/intel/compiler/brw_compiler.h
> index 8b4e6fe2e2..a9df45e00d 100644
> --- a/src/intel/compiler/brw_compiler.h
> +++ b/src/intel/compiler/brw_compiler.h
> @@ -743,6 +743,13 @@ struct brw_wm_prog_data {
>  * For varying slots that are not used by the FS, the value is -1.
>  */
> int urb_setup[VARYING_SLOT_MAX];
> +   /**
> +* Cache structure into the urb_setup array above that contains the
> +* attribute numbers of active varyings out of urb_setup.
> +* The actual count is stored in urb_setup_attribs_count.
> +*/
> +   int urb_setup_attribs[VARYING_SLOT_MAX];
> +   int urb_setup_attribs_count;
>  };
>  
>  struct brw_push_const_block {
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index b21996c168..6930414067 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup()
> this->first_non_payload_grf = payload.num_regs + 
> prog_data->curb_read_length;
>  }
>  
> +/*
> + * Build up an array of indices into the urb_setup array that
> + * references the active entries of the urb_setup array.
> + * Used to accelerate walking the active entries of the urb_setup array
> + * on each upload.
> + */
> +void
> +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data)
> +{
> +   int index = 0;
> +   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
> +  int input_index = wm_prog_data->urb_setup[attr];
> +  if (input_index < 0)
> + continue;
> +  wm_prog_data->urb_setup_attribs[index++] = attr;

So far the only consumer of this wants the input_index again.
Does that change, or is it worth including both to avoid the trawl?

Is uint8_t (with a STATIC_ASSERT) good enough?
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.

2018-05-17 Thread Mathias . Froehlich
From: Mathias Fröhlich 

Avoid looping over all VARYING_SLOT_MAX urb_setup array
entries from genX_upload_sbe. Prepare an array indirection
to the active entries of urb_setup already in the compile
step. On upload only walk the active arrays.

Signed-off-by: Mathias Fröhlich 
---
 src/intel/compiler/brw_compiler.h |  7 +++
 src/intel/compiler/brw_fs.cpp | 23 +++
 src/intel/compiler/brw_fs.h   |  2 ++
 src/intel/compiler/brw_fs_visitor.cpp |  1 +
 src/mesa/drivers/dri/i965/genX_state_upload.c |  7 +++
 5 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_compiler.h 
b/src/intel/compiler/brw_compiler.h
index 8b4e6fe2e2..a9df45e00d 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -743,6 +743,13 @@ struct brw_wm_prog_data {
 * For varying slots that are not used by the FS, the value is -1.
 */
int urb_setup[VARYING_SLOT_MAX];
+   /**
+* Cache structure into the urb_setup array above that contains the
+* attribute numbers of active varyings out of urb_setup.
+* The actual count is stored in urb_setup_attribs_count.
+*/
+   int urb_setup_attribs[VARYING_SLOT_MAX];
+   int urb_setup_attribs_count;
 };
 
 struct brw_push_const_block {
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index b21996c168..6930414067 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup()
this->first_non_payload_grf = payload.num_regs + 
prog_data->curb_read_length;
 }
 
+/*
+ * Build up an array of indices into the urb_setup array that
+ * references the active entries of the urb_setup array.
+ * Used to accelerate walking the active entries of the urb_setup array
+ * on each upload.
+ */
+void
+brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data)
+{
+   int index = 0;
+   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
+  int input_index = wm_prog_data->urb_setup[attr];
+  if (input_index < 0)
+ continue;
+  wm_prog_data->urb_setup_attribs[index++] = attr;
+   }
+   wm_prog_data->urb_setup_attribs_count = index;
+}
+
 void
 fs_visitor::calculate_urb_setup()
 {
@@ -1595,6 +1614,8 @@ fs_visitor::calculate_urb_setup()
}
 
prog_data->num_varying_inputs = urb_next;
+
+   brw_compute_urb_setup_index(prog_data);
 }
 
 void
@@ -6611,6 +6632,8 @@ gen9_ps_header_only_workaround(struct brw_wm_prog_data 
*wm_prog_data)
 
wm_prog_data->urb_setup[VARYING_SLOT_LAYER] = 0;
wm_prog_data->num_varying_inputs = 1;
+
+   brw_compute_urb_setup_index(wm_prog_data);
 }
 
 bool
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index e384db809d..4c76b30973 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -525,4 +525,6 @@ fs_reg setup_imm_df(const brw::fs_builder ,
 enum brw_barycentric_mode brw_barycentric_mode(enum glsl_interp_mode mode,
nir_intrinsic_op op);
 
+void brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data);
+
 #endif /* BRW_FS_H */
diff --git a/src/intel/compiler/brw_fs_visitor.cpp 
b/src/intel/compiler/brw_fs_visitor.cpp
index 7a5f6451f2..14805fce61 100644
--- a/src/intel/compiler/brw_fs_visitor.cpp
+++ b/src/intel/compiler/brw_fs_visitor.cpp
@@ -119,6 +119,7 @@ fs_visitor::emit_dummy_fs()
wm_prog_data->num_varying_inputs = devinfo->gen < 6 ? 1 : 0;
memset(wm_prog_data->urb_setup, -1,
   sizeof(wm_prog_data->urb_setup[0]) * VARYING_SLOT_MAX);
+   brw_compute_urb_setup_index(wm_prog_data);
 
/* We don't have any uniforms. */
stage_prog_data->nr_params = 0;
diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 6178bfa3f8..cd905d13e5 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -1104,12 +1104,11 @@ genX(calculate_attr_overrides)(const struct brw_context 
*brw,
 * BRW_NEW_PRIMITIVE | BRW_NEW_GS_PROG_DATA | BRW_NEW_TES_PROG_DATA
 */
bool drawing_points = brw_is_drawing_points(brw);
-
-   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
+   for (int index = 0; index < wm_prog_data->urb_setup_attribs_count; index++) 
{
+  int attr = wm_prog_data->urb_setup_attribs[index];
   int input_index = wm_prog_data->urb_setup[attr];
 
-  if (input_index < 0)
- continue;
+  assert(0 <= input_index);
 
   /* _NEW_POINT */
   bool point_sprite = false;
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev