Re: [Mesa-dev] [PATCH 11/11] nir: Copy propagation between blocks
Can you please re-send the latest version of this patch? It's easier to comment on the ML. On Mon, Oct 15, 2018 at 12:44 PM Caio Marcelo de Oliveira Filho < caio.olive...@intel.com> wrote: > Hi, > > > > + } > > > + > > > + if (new_written) { > > > + /* Merge new information to the parent control flow node. */ > > > + if (written) { > > > + written->modes |= new_written->modes; > > > + struct hash_entry *ht_entry; > > > + hash_table_foreach(new_written->derefs, ht_entry) { > > > +_mesa_hash_table_insert_pre_hashed(written->derefs, > > > ht_entry->hash, > > > + ht_entry->key, > > > ht_entry->data); > > > > > > > Do you want to somehow OR masks together? This is just picking one of > the > > two masks. > > You are correct. Fixed. > > Turns out the way the local code we are reusing here is structured, we > don't take much advantage of the fine-grained tracking here. Added a > TODO about this. > > > > > > > static void > > > -copy_entry_remove(struct copy_prop_var_state *state, struct copy_entry > > > *entry) > > > +copy_entry_remove(struct util_dynarray *copies, > > > + struct copy_entry *entry) > > > { > > > - list_del(>link); > > > - list_add(>link, >copy_free_list); > > > + *entry = util_dynarray_pop(copies, struct copy_entry); > > > > > > > It might be worth a quick comment to justify that this works. It took > me a > > minute to figure out that you were re-ordering the array in the process. > > Added a function comment describing what this does and stating it is > safe to use during a reverse iteration. And also added a comment > highlighting how this works when it is the last element. > > (...) > > > > > > +lookup_entry_and_kill_aliases(struct util_dynarray *copies, > > > + nir_deref_instr *deref, > > > + unsigned write_mask) > > > { > > > struct copy_entry *entry = NULL; > > > - list_for_each_entry_safe(struct copy_entry, iter, >copies, > > > link) { > > > + util_dynarray_foreach_reverse(copies, struct copy_entry, iter) { > > > > > > > Also might be worth commenting why it's safe to remove elements while > > walking the array. > > I think the comments to the copy_entry_remove suffice, but can add it > here before landing if you prefer. > > The latest code is in > > https://gitlab.freedesktop.org/cmarcelo/mesa/commits/copy-prop > > and all the issues I haven't commented are supposed to be fixed > according your comment (added new version to each patch that changed). > Merged the use patches into a single one > > intel/nir, freedreno/ir3: Use the separated dead write vars pass > > Patches that still need R-b: > >nir: Copy propagation between blocks >nir: Separate dead write removal into its own pass > > > Caio > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/11] nir: Copy propagation between blocks
Hi, > > + } > > + > > + if (new_written) { > > + /* Merge new information to the parent control flow node. */ > > + if (written) { > > + written->modes |= new_written->modes; > > + struct hash_entry *ht_entry; > > + hash_table_foreach(new_written->derefs, ht_entry) { > > +_mesa_hash_table_insert_pre_hashed(written->derefs, > > ht_entry->hash, > > + ht_entry->key, > > ht_entry->data); > > > > Do you want to somehow OR masks together? This is just picking one of the > two masks. You are correct. Fixed. Turns out the way the local code we are reusing here is structured, we don't take much advantage of the fine-grained tracking here. Added a TODO about this. > > static void > > -copy_entry_remove(struct copy_prop_var_state *state, struct copy_entry > > *entry) > > +copy_entry_remove(struct util_dynarray *copies, > > + struct copy_entry *entry) > > { > > - list_del(>link); > > - list_add(>link, >copy_free_list); > > + *entry = util_dynarray_pop(copies, struct copy_entry); > > > > It might be worth a quick comment to justify that this works. It took me a > minute to figure out that you were re-ordering the array in the process. Added a function comment describing what this does and stating it is safe to use during a reverse iteration. And also added a comment highlighting how this works when it is the last element. (...) > > > +lookup_entry_and_kill_aliases(struct util_dynarray *copies, > > + nir_deref_instr *deref, > > + unsigned write_mask) > > { > > struct copy_entry *entry = NULL; > > - list_for_each_entry_safe(struct copy_entry, iter, >copies, > > link) { > > + util_dynarray_foreach_reverse(copies, struct copy_entry, iter) { > > > > Also might be worth commenting why it's safe to remove elements while > walking the array. I think the comments to the copy_entry_remove suffice, but can add it here before landing if you prefer. The latest code is in https://gitlab.freedesktop.org/cmarcelo/mesa/commits/copy-prop and all the issues I haven't commented are supposed to be fixed according your comment (added new version to each patch that changed). Merged the use patches into a single one intel/nir, freedreno/ir3: Use the separated dead write vars pass Patches that still need R-b: nir: Copy propagation between blocks nir: Separate dead write removal into its own pass Caio ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/11] nir: Copy propagation between blocks
On Sat, Sep 15, 2018 at 12:46 AM Caio Marcelo de Oliveira Filho < caio.olive...@intel.com> wrote: > Extend the pass to propagate the copies information along the control > flow graph. It performs two walks, first it collects the vars > that were written inside each node. Then it walks applying the copy > propagation using a list of copies previously available. At each node > the list is invalidated according to results from the first walk. > > This approach is simpler than a full data-flow analysis, but covers > various cases. If derefs are used for operating on more memory > resources (e.g. SSBOs), the difference from a regular pass is expected > to be more visible -- as the SSA copy propagation pass won't apply to > those. > > A full data-flow analysis would handle more scenarios: conditional > breaks in the control flow and merge equivalent effects from multiple > branches (e.g. using a phi node to merge the source for writes to the > same deref). However, as previous commentary in the code stated, its > complexity 'rapidly get out of hand'. The current patch is a good > intermediate step towards more complex analysis. > > The 'copies' linked list was modified to use util_dynarray to make it > more convenient to clone it (to handle ifs/loops). > > Annotated shader-db results for Skylake: > > total instructions in shared programs: 15105796 -> 15105451 (<.01%) > instructions in affected programs: 152293 -> 151948 (-0.23%) > helped: 96 > HURT: 17 > > All the HURTs and many HELPs are one instruction. Looking > at pass by pass outputs, the copy prop kicks in removing a > bunch of loads correctly, which ends up altering what other > other optimizations kick. In those cases the copies would be > propagated after lowering to SSA. > > In few HELPs we are actually helping doing more than was > possible previously, e.g. consolidating load_uniforms from > different blocks. Most of those are from > shaders/dolphin/ubershaders/. > > total cycles in shared programs: 566048861 -> 565954876 (-0.02%) > cycles in affected programs: 151461830 -> 151367845 (-0.06%) > helped: 2933 > HURT: 2950 > > A lot of noise on both sides. > > total loops in shared programs: 4603 -> 4603 (0.00%) > loops in affected programs: 0 -> 0 > helped: 0 > HURT: 0 > > total spills in shared programs: 11085 -> 11073 (-0.11%) > spills in affected programs: 23 -> 11 (-52.17%) > helped: 1 > HURT: 0 > > The shaders/dolphin/ubershaders/12.shader_test was able to > pull a couple of loads from inside if statements and reuse > them. > > total fills in shared programs: 23143 -> 23089 (-0.23%) > fills in affected programs: 2718 -> 2664 (-1.99%) > helped: 27 > HURT: 0 > > All from shaders/dolphin/ubershaders/. > > LOST: 0 > GAINED: 0 > > The other generations follow the same overall shape. The spills and > fills HURTs are all from the same game. > > shader-db results for Broadwell. > > total instructions in shared programs: 15402037 -> 15401841 (<.01%) > instructions in affected programs: 144386 -> 144190 (-0.14%) > helped: 86 > HURT: 9 > > total cycles in shared programs: 600912755 -> 600902486 (<.01%) > cycles in affected programs: 185662820 -> 185652551 (<.01%) > helped: 2598 > HURT: 3053 > > total loops in shared programs: 4579 -> 4579 (0.00%) > loops in affected programs: 0 -> 0 > helped: 0 > HURT: 0 > > total spills in shared programs: 80929 -> 80924 (<.01%) > spills in affected programs: 720 -> 715 (-0.69%) > helped: 1 > HURT: 5 > > total fills in shared programs: 93057 -> 93013 (-0.05%) > fills in affected programs: 3398 -> 3354 (-1.29%) > helped: 27 > HURT: 5 > > LOST: 0 > GAINED: 2 > > shader-db results for Haswell: > > total instructions in shared programs: 9231975 -> 9230357 (-0.02%) > instructions in affected programs: 44992 -> 43374 (-3.60%) > helped: 27 > HURT: 69 > > total cycles in shared programs: 87760587 -> 87727502 (-0.04%) > cycles in affected programs: 7720673 -> 7687588 (-0.43%) > helped: 1609 > HURT: 1416 > > total loops in shared programs: 1830 -> 1830 (0.00%) > loops in affected programs: 0 -> 0 > helped: 0 > HURT: 0 > > total spills in shared programs: 1988 -> 1692 (-14.89%) > spills in affected programs: 296 -> 0 > helped: 1 > HURT: 0 > > total fills in shared programs: 2103 -> 1668 (-20.68%) > fills in affected programs: 438 -> 3 (-99.32%) > helped: 4 > HURT: 0 > > LOST: 0 > GAINED: 1 > --- > src/compiler/nir/nir_opt_copy_prop_vars.c | 394 +- > 1 file changed, 317 insertions(+), 77 deletions(-) > > diff --git a/src/compiler/nir/nir_opt_copy_prop_vars.c > b/src/compiler/nir/nir_opt_copy_prop_vars.c > index