Re: [Mesa-dev] [PATCH 11/11] nir: Copy propagation between blocks

2018-10-15 Thread Jason Ekstrand
Can you please re-send the latest version of this patch?  It's easier to
comment on the ML.

On Mon, Oct 15, 2018 at 12:44 PM Caio Marcelo de Oliveira Filho <
caio.olive...@intel.com> wrote:

> Hi,
>
> > > +   }
> > > +
> > > +   if (new_written) {
> > > +  /* Merge new information to the parent control flow node. */
> > > +  if (written) {
> > > + written->modes |= new_written->modes;
> > > + struct hash_entry *ht_entry;
> > > + hash_table_foreach(new_written->derefs, ht_entry) {
> > > +_mesa_hash_table_insert_pre_hashed(written->derefs,
> > > ht_entry->hash,
> > > +   ht_entry->key,
> > > ht_entry->data);
> > >
> >
> > Do you want to somehow OR masks together?  This is just picking one of
> the
> > two masks.
>
> You are correct.  Fixed.
>
> Turns out the way the local code we are reusing here is structured, we
> don't take much advantage of the fine-grained tracking here.  Added a
> TODO about this.
>
>
>
>
> > >  static void
> > > -copy_entry_remove(struct copy_prop_var_state *state, struct copy_entry
> > > *entry)
> > > +copy_entry_remove(struct util_dynarray *copies,
> > > +  struct copy_entry *entry)
> > >  {
> > > -   list_del(>link);
> > > -   list_add(>link, >copy_free_list);
> > > +   *entry = util_dynarray_pop(copies, struct copy_entry);
> > >
> >
> > It might be worth a quick comment to justify that this works.  It took
> me a
> > minute to figure out that you were re-ordering the array in the process.
>
> Added a function comment describing what this does and stating it is
> safe to use during a reverse iteration.  And also added a comment
> highlighting how this works when it is the last element.
>
> (...)
>
> >
> > > +lookup_entry_and_kill_aliases(struct util_dynarray *copies,
> > > +  nir_deref_instr *deref,
> > > +  unsigned write_mask)
> > >  {
> > > struct copy_entry *entry = NULL;
> > > -   list_for_each_entry_safe(struct copy_entry, iter, >copies,
> > > link) {
> > > +   util_dynarray_foreach_reverse(copies, struct copy_entry, iter) {
> > >
> >
> > Also might be worth commenting why it's safe to remove elements while
> > walking the array.
>
> I think the comments to the copy_entry_remove suffice, but can add it
> here before landing if you prefer.
>
> The latest code is in
>
> https://gitlab.freedesktop.org/cmarcelo/mesa/commits/copy-prop
>
> and all the issues I haven't commented are supposed to be fixed
> according your comment (added new version to each patch that changed).
> Merged the use patches into a single one
>
> intel/nir, freedreno/ir3: Use the separated dead write vars pass
>
> Patches that still need R-b:
>
>nir: Copy propagation between blocks
>nir: Separate dead write removal into its own pass
>
>
> Caio
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/11] nir: Copy propagation between blocks

2018-10-15 Thread Caio Marcelo de Oliveira Filho
Hi,

> > +   }
> > +
> > +   if (new_written) {
> > +  /* Merge new information to the parent control flow node. */
> > +  if (written) {
> > + written->modes |= new_written->modes;
> > + struct hash_entry *ht_entry;
> > + hash_table_foreach(new_written->derefs, ht_entry) {
> > +_mesa_hash_table_insert_pre_hashed(written->derefs,
> > ht_entry->hash,
> > +   ht_entry->key,
> > ht_entry->data);
> >
> 
> Do you want to somehow OR masks together?  This is just picking one of the
> two masks.

You are correct.  Fixed.

Turns out the way the local code we are reusing here is structured, we
don't take much advantage of the fine-grained tracking here.  Added a
TODO about this.




> >  static void
> > -copy_entry_remove(struct copy_prop_var_state *state, struct copy_entry
> > *entry)
> > +copy_entry_remove(struct util_dynarray *copies,
> > +  struct copy_entry *entry)
> >  {
> > -   list_del(>link);
> > -   list_add(>link, >copy_free_list);
> > +   *entry = util_dynarray_pop(copies, struct copy_entry);
> >
> 
> It might be worth a quick comment to justify that this works.  It took me a
> minute to figure out that you were re-ordering the array in the process.

Added a function comment describing what this does and stating it is
safe to use during a reverse iteration.  And also added a comment
highlighting how this works when it is the last element.

(...)

> 
> > +lookup_entry_and_kill_aliases(struct util_dynarray *copies,
> > +  nir_deref_instr *deref,
> > +  unsigned write_mask)
> >  {
> > struct copy_entry *entry = NULL;
> > -   list_for_each_entry_safe(struct copy_entry, iter, >copies,
> > link) {
> > +   util_dynarray_foreach_reverse(copies, struct copy_entry, iter) {
> >
> 
> Also might be worth commenting why it's safe to remove elements while
> walking the array.

I think the comments to the copy_entry_remove suffice, but can add it
here before landing if you prefer.

The latest code is in

https://gitlab.freedesktop.org/cmarcelo/mesa/commits/copy-prop

and all the issues I haven't commented are supposed to be fixed
according your comment (added new version to each patch that changed).
Merged the use patches into a single one

intel/nir, freedreno/ir3: Use the separated dead write vars pass

Patches that still need R-b:

   nir: Copy propagation between blocks
   nir: Separate dead write removal into its own pass


Caio
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/11] nir: Copy propagation between blocks

2018-10-03 Thread Jason Ekstrand
On Sat, Sep 15, 2018 at 12:46 AM Caio Marcelo de Oliveira Filho <
caio.olive...@intel.com> wrote:

> Extend the pass to propagate the copies information along the control
> flow graph.  It performs two walks, first it collects the vars
> that were written inside each node. Then it walks applying the copy
> propagation using a list of copies previously available.  At each node
> the list is invalidated according to results from the first walk.
>
> This approach is simpler than a full data-flow analysis, but covers
> various cases.  If derefs are used for operating on more memory
> resources (e.g. SSBOs), the difference from a regular pass is expected
> to be more visible -- as the SSA copy propagation pass won't apply to
> those.
>
> A full data-flow analysis would handle more scenarios: conditional
> breaks in the control flow and merge equivalent effects from multiple
> branches (e.g. using a phi node to merge the source for writes to the
> same deref).  However, as previous commentary in the code stated, its
> complexity 'rapidly get out of hand'.  The current patch is a good
> intermediate step towards more complex analysis.
>
> The 'copies' linked list was modified to use util_dynarray to make it
> more convenient to clone it (to handle ifs/loops).
>
> Annotated shader-db results for Skylake:
>
> total instructions in shared programs: 15105796 -> 15105451 (<.01%)
> instructions in affected programs: 152293 -> 151948 (-0.23%)
> helped: 96
> HURT: 17
>
> All the HURTs and many HELPs are one instruction.  Looking
> at pass by pass outputs, the copy prop kicks in removing a
> bunch of loads correctly, which ends up altering what other
> other optimizations kick.  In those cases the copies would be
> propagated after lowering to SSA.
>
> In few HELPs we are actually helping doing more than was
> possible previously, e.g. consolidating load_uniforms from
> different blocks.  Most of those are from
> shaders/dolphin/ubershaders/.
>
> total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
> cycles in affected programs: 151461830 -> 151367845 (-0.06%)
> helped: 2933
> HURT: 2950
>
> A lot of noise on both sides.
>
> total loops in shared programs: 4603 -> 4603 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
>
> total spills in shared programs: 11085 -> 11073 (-0.11%)
> spills in affected programs: 23 -> 11 (-52.17%)
> helped: 1
> HURT: 0
>
> The shaders/dolphin/ubershaders/12.shader_test was able to
> pull a couple of loads from inside if statements and reuse
> them.
>
> total fills in shared programs: 23143 -> 23089 (-0.23%)
> fills in affected programs: 2718 -> 2664 (-1.99%)
> helped: 27
> HURT: 0
>
> All from shaders/dolphin/ubershaders/.
>
> LOST:   0
> GAINED: 0
>
> The other generations follow the same overall shape.  The spills and
> fills HURTs are all from the same game.
>
> shader-db results for Broadwell.
>
> total instructions in shared programs: 15402037 -> 15401841 (<.01%)
> instructions in affected programs: 144386 -> 144190 (-0.14%)
> helped: 86
> HURT: 9
>
> total cycles in shared programs: 600912755 -> 600902486 (<.01%)
> cycles in affected programs: 185662820 -> 185652551 (<.01%)
> helped: 2598
> HURT: 3053
>
> total loops in shared programs: 4579 -> 4579 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
>
> total spills in shared programs: 80929 -> 80924 (<.01%)
> spills in affected programs: 720 -> 715 (-0.69%)
> helped: 1
> HURT: 5
>
> total fills in shared programs: 93057 -> 93013 (-0.05%)
> fills in affected programs: 3398 -> 3354 (-1.29%)
> helped: 27
> HURT: 5
>
> LOST:   0
> GAINED: 2
>
> shader-db results for Haswell:
>
> total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
> instructions in affected programs: 44992 -> 43374 (-3.60%)
> helped: 27
> HURT: 69
>
> total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
> cycles in affected programs: 7720673 -> 7687588 (-0.43%)
> helped: 1609
> HURT: 1416
>
> total loops in shared programs: 1830 -> 1830 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
>
> total spills in shared programs: 1988 -> 1692 (-14.89%)
> spills in affected programs: 296 -> 0
> helped: 1
> HURT: 0
>
> total fills in shared programs: 2103 -> 1668 (-20.68%)
> fills in affected programs: 438 -> 3 (-99.32%)
> helped: 4
> HURT: 0
>
> LOST:   0
> GAINED: 1
> ---
>  src/compiler/nir/nir_opt_copy_prop_vars.c | 394 +-
>  1 file changed, 317 insertions(+), 77 deletions(-)
>
> diff --git a/src/compiler/nir/nir_opt_copy_prop_vars.c
> b/src/compiler/nir/nir_opt_copy_prop_vars.c
> index