Re: [Mesa-dev] [PATCH 1/2] nir: add pass to move load_const
On Wed, Jun 6, 2018 at 8:34 PM, Ian Romanick wrote:
> On 06/06/2018 04:47 PM, Rob Clark wrote:
>> On Wed, Jun 6, 2018 at 6:51 PM, Ian Romanick wrote:
>>> On 06/06/2018 07:43 AM, Rob Clark wrote:
>>>> Run this pass late (after opt loop) to move load_const instructions back
>>>> into the basic blocks which use the result, in cases where a load_const
>>>> is only consumed in a single block.
>>>
>>> If the load_const is used in more than one block, you could move it to
>>> the block that dominates all of the blocks that contain a use.  I suspect
>>> in many cases that will just be the first block, but it's probably worth
>>> trying.
>>
>> hmm, good point.. I was mostly going for the super-obvious wins with
>> no downside cases (ie. low hanging fruit), but I guess the block that
>> dominates all the uses isn't going to be worse than the first block..
>>
>> I can play around w/ the common denominator case.. although in at
>> least some cases I think it might be better to split the load_const.
>> Idk, the low hanging fruit approach cuts register usage by 1/3rd in
>> some of these edge case shaders, which by itself seems like a big
>> enough win, but I guess nir_shader_compiler_options config params are
>> an option for more aggressive approaches that may or may not be a win
>> for different drivers.
>
> True.  We could land this and incrementally improve it later.  If we
> want to do that, I'd suggest
>
> - Put all of our ideas for improving the pass, including the one
>   already documented in the code, in the file header comment.
>
> - Remove the suggestions for future work from the "if consumed in more
>   than one block, ignore." comment.

sgtm, I'll respin tomorrow.. maybe after playing around a bit w/ the
common denominator block thing (since I see some cases that seem like
they would benefit)

>>>> This helps reduce register usage in cases where the backend driver
>>>> cannot lower the load_const to a uniform.
>>>
>>> The trade off is that now you might not be able to hide the latency of
>>> the load.  I tried this on i965, and the results (below) support this
>>> idea a bit.  If you have a structured backend IR, this seems like a
>>> thing better done there... like, if you can't register allocate without
>>> spilling, try moving the load.  But that's really just a scheduling
>>> heuristic.  Hmm...
>>
>> iirc, i965 had some clever load_const lowering passes so all the threads
>> in a warp could share the same load_const (iirc? I might not have
>> that description quite right).. so I wasn't expecting this to be
>> useful to i965 (but if it is, then, bonus!).  For me I can *almost*
>> always lower load_const to uniform, so in the common cases it hasn't
>> been a problem.  The exceptions are cases where an alu instruction
>> consumes >1 const (but those get optimized away) or a const plus
>> uniform, or things like ssbo/image store.  (I noticed the register
>> usage blowing up thing looking at some deqp tests like
>> dEQP-GLES31.functional.ssbo.layout.2_level_array.shared.vec4 ;-))
>
> There are some optimizations for loading constants in the vec4 backend.
> There is some code to lower arrays of constants to uniforms.  Uniforms
> are already shared within the SIMD group.  I don't know of anything
> other than that, but that doesn't mean such a thing doesn't exist.
>
> Also... at what point do you call this pass?  Here's what I did for
> i965... basically called it as late as I thought was possible:
>
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index dfeea73b06a..2209d6142b3 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -795,6 +795,8 @@ brw_postprocess_nir(nir_shader *nir, const struct
> brw_compiler *compiler,
>    if (devinfo->gen <= 5)
>       brw_nir_analyze_boolean_resolves(nir);
>
> +   nir_move_load_const(nir);
> +

fwiw, this is basically where I called it for ir3, it is intended to be
after the opt loop (or really anything that might move load_const back
to the top of the first block).. but tbh I didn't play with it that much
yet, other than seeing that it was a big win for the pedantic cases
(like the deqp test I mentioned).  I think about the only thing I do
after the nir_move_load_const() pass is nir_lower_locals_to_regs +
nir_convert_from_ssa(phi_webs_only=true)

>    nir_sweep(nir);
>
>    if (unlikely(debug_enabled)) {

> Now that the full shader-db run is done, it seems to have some odd
> effects on vec4 shaders, so I'll probably tweak this a bit.
>
>> If latency of the load is an issue, perhaps a configuration option
>> about the minimum size of destination use_block, with the rough theory
>> that if the block is bigger than some threshold the instruction
>> scheduling could probably hide the latency?
>
> I thought about this a bit more... I think the driver's scheduler should
> fix this case.  If it can't hide the latency of a load immediate, the
> scheduler should move it to the earlier block.  That might even
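[Editor's sketch] The "move it to the block that dominates all of the uses" idea discussed above reduces to a lowest-common-ancestor walk in the dominator tree. This is a minimal standalone sketch with a made-up toy block type (NIR's real `nir_block` carries equivalent immediate-dominator metadata, but none of these names are NIR's):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a basic block: just the immediate-dominator link and
 * the block's depth in the dominator tree (entry block has depth 0). */
struct toy_block {
    struct toy_block *idom;
    int depth;
};

/* Nearest block that dominates both a and b: walk the deeper block up
 * until the depths match, then climb both in lockstep until they meet. */
static struct toy_block *
common_dominator(struct toy_block *a, struct toy_block *b)
{
    while (a->depth > b->depth)
        a = a->idom;
    while (b->depth > a->depth)
        b = b->idom;
    while (a != b) {
        a = a->idom;
        b = b->idom;
    }
    return a;
}
```

For a load_const consumed in both arms of an if, this walk lands in the block before the branch, which (as noted above) is never worse than leaving it in the first block.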
Re: [Mesa-dev] [PATCH 1/2] nir: add pass to move load_const
On 06/06/2018 04:47 PM, Rob Clark wrote:
> On Wed, Jun 6, 2018 at 6:51 PM, Ian Romanick wrote:
>> On 06/06/2018 07:43 AM, Rob Clark wrote:
>>> Run this pass late (after opt loop) to move load_const instructions back
>>> into the basic blocks which use the result, in cases where a load_const
>>> is only consumed in a single block.
>>
>> If the load_const is used in more than one block, you could move it to
>> the block that dominates all of the blocks that contain a use.  I suspect
>> in many cases that will just be the first block, but it's probably worth
>> trying.
>
> hmm, good point.. I was mostly going for the super-obvious wins with
> no downside cases (ie. low hanging fruit), but I guess the block that
> dominates all the uses isn't going to be worse than the first block..
>
> I can play around w/ the common denominator case.. although in at
> least some cases I think it might be better to split the load_const.
> Idk, the low hanging fruit approach cuts register usage by 1/3rd in
> some of these edge case shaders, which by itself seems like a big
> enough win, but I guess nir_shader_compiler_options config params are
> an option for more aggressive approaches that may or may not be a win
> for different drivers.

True.  We could land this and incrementally improve it later.  If we
want to do that, I'd suggest

- Put all of our ideas for improving the pass, including the one
  already documented in the code, in the file header comment.

- Remove the suggestions for future work from the "if consumed in more
  than one block, ignore." comment.

>>> This helps reduce register usage in cases where the backend driver
>>> cannot lower the load_const to a uniform.
>>
>> The trade off is that now you might not be able to hide the latency of
>> the load.  I tried this on i965, and the results (below) support this
>> idea a bit.  If you have a structured backend IR, this seems like a
>> thing better done there... like, if you can't register allocate without
>> spilling, try moving the load.  But that's really just a scheduling
>> heuristic.  Hmm...
>
> iirc, i965 had some clever load_const lowering passes so all the threads
> in a warp could share the same load_const (iirc? I might not have
> that description quite right).. so I wasn't expecting this to be
> useful to i965 (but if it is, then, bonus!).  For me I can *almost*
> always lower load_const to uniform, so in the common cases it hasn't
> been a problem.  The exceptions are cases where an alu instruction
> consumes >1 const (but those get optimized away) or a const plus
> uniform, or things like ssbo/image store.  (I noticed the register
> usage blowing up thing looking at some deqp tests like
> dEQP-GLES31.functional.ssbo.layout.2_level_array.shared.vec4 ;-))

There are some optimizations for loading constants in the vec4 backend.
There is some code to lower arrays of constants to uniforms.  Uniforms
are already shared within the SIMD group.  I don't know of anything
other than that, but that doesn't mean such a thing doesn't exist.

Also... at what point do you call this pass?  Here's what I did for
i965... basically called it as late as I thought was possible:

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index dfeea73b06a..2209d6142b3 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -795,6 +795,8 @@ brw_postprocess_nir(nir_shader *nir, const struct
brw_compiler *compiler,
   if (devinfo->gen <= 5)
      brw_nir_analyze_boolean_resolves(nir);

+   nir_move_load_const(nir);
+
   nir_sweep(nir);

   if (unlikely(debug_enabled)) {

Now that the full shader-db run is done, it seems to have some odd
effects on vec4 shaders, so I'll probably tweak this a bit.

> If latency of the load is an issue, perhaps a configuration option
> about the minimum size of destination use_block, with the rough theory
> that if the block is bigger than some threshold the instruction
> scheduling could probably hide the latency?

I thought about this a bit more... I think the driver's scheduler should
fix this case.  If it can't hide the latency of a load immediate, the
scheduler should move it to the earlier block.  That might even help
hide the latency of a compare.  Something like

   cmp.g.f0(16)  null  g2<8,8,1>F  -g9.4<0,1,0>F
   (+f0) if(16)  JIP: 40  UIP: 40
END B0 ->B1 ->B2
START B1 <-B0 (40 cycles)
   mov(16)  g77<1>F  1F

should become

   cmp.g.f0(16)  null  g2<8,8,1>F  -g9.4<0,1,0>F
   mov(16)  g77<1>F  1F
   (+f0) if(16)  JIP: 40  UIP: 40
END B0 ->B1 ->B2
START B1 <-B0 (40 cycles)

> BR,
> -R
>
>> total instructions in shared programs: 14398410 -> 14397918 (<.01%)
>> instructions in affected programs: 134856 -> 134364 (-0.36%)
>> helped: 53
>> HURT: 10
>> helped stats (abs) min: 1 max: 30 x̄: 9.47 x̃: 7
>> helped stats (rel) min: 0.16% max: 2.74% x̄: 0.64% x̃: 0.40%
>> HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
>>
Re: [Mesa-dev] [PATCH 1/2] nir: add pass to move load_const
On Wed, Jun 6, 2018 at 6:51 PM, Ian Romanick wrote:
> On 06/06/2018 07:43 AM, Rob Clark wrote:
>> Run this pass late (after opt loop) to move load_const instructions back
>> into the basic blocks which use the result, in cases where a load_const
>> is only consumed in a single block.
>
> If the load_const is used in more than one block, you could move it to
> the block that dominates all of the blocks that contain a use.  I suspect
> in many cases that will just be the first block, but it's probably worth
> trying.

hmm, good point.. I was mostly going for the super-obvious wins with
no downside cases (ie. low hanging fruit), but I guess the block that
dominates all the uses isn't going to be worse than the first block..

I can play around w/ the common denominator case.. although in at
least some cases I think it might be better to split the load_const.
Idk, the low hanging fruit approach cuts register usage by 1/3rd in
some of these edge case shaders, which by itself seems like a big
enough win, but I guess nir_shader_compiler_options config params are
an option for more aggressive approaches that may or may not be a win
for different drivers.

>> This helps reduce register usage in cases where the backend driver
>> cannot lower the load_const to a uniform.
>
> The trade off is that now you might not be able to hide the latency of
> the load.  I tried this on i965, and the results (below) support this
> idea a bit.  If you have a structured backend IR, this seems like a
> thing better done there... like, if you can't register allocate without
> spilling, try moving the load.  But that's really just a scheduling
> heuristic.  Hmm...

iirc, i965 had some clever load_const lowering passes so all the threads
in a warp could share the same load_const (iirc? I might not have
that description quite right).. so I wasn't expecting this to be
useful to i965 (but if it is, then, bonus!).  For me I can *almost*
always lower load_const to uniform, so in the common cases it hasn't
been a problem.  The exceptions are cases where an alu instruction
consumes >1 const (but those get optimized away) or a const plus
uniform, or things like ssbo/image store.  (I noticed the register
usage blowing up thing looking at some deqp tests like
dEQP-GLES31.functional.ssbo.layout.2_level_array.shared.vec4 ;-))

If latency of the load is an issue, perhaps a configuration option
about the minimum size of destination use_block, with the rough theory
that if the block is bigger than some threshold the instruction
scheduling could probably hide the latency?

BR,
-R

> total instructions in shared programs: 14398410 -> 14397918 (<.01%)
> instructions in affected programs: 134856 -> 134364 (-0.36%)
> helped: 53
> HURT: 10
> helped stats (abs) min: 1 max: 30 x̄: 9.47 x̃: 7
> helped stats (rel) min: 0.16% max: 2.74% x̄: 0.64% x̃: 0.40%
> HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
> HURT stats (rel) min: 0.03% max: 0.21% x̄: 0.05% x̃: 0.03%
> 95% mean confidence interval for instructions value: -9.61 -6.01
> 95% mean confidence interval for instructions %-change: -0.68% -0.38%
> Instructions are helped.
>
> total cycles in shared programs: 532949823 -> 532851352 (-0.02%)
> cycles in affected programs: 260693023 -> 260594552 (-0.04%)
> helped: 122
> HURT: 88
> helped stats (abs) min: 1 max: 24933 x̄: 1080.30 x̃: 20
> helped stats (rel) min: <.01% max: 34.94% x̄: 2.20% x̃: 0.32%
> HURT stats (abs) min: 1 max: 1884 x̄: 378.69 x̃: 29
> HURT stats (rel) min: <.01% max: 10.23% x̄: 0.64% x̃: 0.09%
> 95% mean confidence interval for cycles value: -943.71 5.89
> 95% mean confidence interval for cycles %-change: -1.71% -0.32%
> Inconclusive result (value mean confidence interval includes 0).
>
> total spills in shared programs: 8044 -> 7975 (-0.86%)
> spills in affected programs: 2391 -> 2322 (-2.89%)
> helped: 42
> HURT: 0
>
> total fills in shared programs: 10950 -> 10884 (-0.60%)
> fills in affected programs: 2999 -> 2933 (-2.20%)
> helped: 42
> HURT: 0
>
> LOST: 4
> GAINED: 7
>
>> Signed-off-by: Rob Clark
>> ---
>>  src/compiler/Makefile.sources          |   1 +
>>  src/compiler/nir/meson.build           |   1 +
>>  src/compiler/nir/nir.h                 |   1 +
>>  src/compiler/nir/nir_move_load_const.c | 131 +
>>  4 files changed, 134 insertions(+)
>>  create mode 100644 src/compiler/nir/nir_move_load_const.c
>>
>> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> index 3daa2c51334..cb8a2875a84 100644
>> --- a/src/compiler/Makefile.sources
>> +++ b/src/compiler/Makefile.sources
>> @@ -253,6 +253,7 @@ NIR_FILES = \
>>  nir/nir_lower_wpos_center.c \
>>  nir/nir_lower_wpos_ytransform.c \
>>  nir/nir_metadata.c \
>> + nir/nir_move_load_const.c \
>>  nir/nir_move_vec_src_uses_to_dest.c \
>>  nir/nir_normalize_cubemap_coords.c \
>>  nir/nir_opt_conditional_discard.c \
>> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>> index
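[Editor's sketch] The register-pressure effect Rob describes above, where constants hoisted to the top of the first block stay live across the whole shader, can be modeled with a crude live-range counter. Everything here (types, slot numbers) is invented for illustration, not NIR code:

```c
#include <assert.h>

/* A value is live from its defining instruction slot through its last use. */
struct live_range {
    int def;
    int last_use;
};

/* Peak number of simultaneously live values over instruction slots 0..63:
 * a rough stand-in for register pressure. */
static int
max_pressure(const struct live_range *v, int n)
{
    int peak = 0;
    for (int slot = 0; slot < 64; slot++) {
        int live = 0;
        for (int i = 0; i < n; i++) {
            if (v[i].def <= slot && slot <= v[i].last_use)
                live++;
        }
        if (live > peak)
            peak = live;
    }
    return peak;
}
```

Three constants each consumed once near the end of the shader occupy three registers the whole time when defined at slot 0, but only one at a time once each is sunk next to its single use.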
Re: [Mesa-dev] [PATCH 1/2] nir: add pass to move load_const
On 06/06/2018 03:51 PM, Ian Romanick wrote:
> On 06/06/2018 07:43 AM, Rob Clark wrote:
>> Run this pass late (after opt loop) to move load_const instructions back
>> into the basic blocks which use the result, in cases where a load_const
>> is only consumed in a single block.
>
> If the load_const is used in more than one block, you could move it to
> the block that dominates all of the blocks that contain a use.  I suspect
> in many cases that will just be the first block, but it's probably worth
> trying.
>
>> This helps reduce register usage in cases where the backend driver
>> cannot lower the load_const to a uniform.
>
> The trade off is that now you might not be able to hide the latency of
> the load.  I tried this on i965, and the results (below) support this
> idea a bit.  If you have a structured backend IR, this seems like a
> thing better done there... like, if you can't register allocate without
> spilling, try moving the load.  But that's really just a scheduling
> heuristic.  Hmm...
>
> total instructions in shared programs: 14398410 -> 14397918 (<.01%)
> instructions in affected programs: 134856 -> 134364 (-0.36%)
> helped: 53
> HURT: 10
> helped stats (abs) min: 1 max: 30 x̄: 9.47 x̃: 7
> helped stats (rel) min: 0.16% max: 2.74% x̄: 0.64% x̃: 0.40%
> HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
> HURT stats (rel) min: 0.03% max: 0.21% x̄: 0.05% x̃: 0.03%
> 95% mean confidence interval for instructions value: -9.61 -6.01
> 95% mean confidence interval for instructions %-change: -0.68% -0.38%
> Instructions are helped.
>
> total cycles in shared programs: 532949823 -> 532851352 (-0.02%)
> cycles in affected programs: 260693023 -> 260594552 (-0.04%)
> helped: 122
> HURT: 88
> helped stats (abs) min: 1 max: 24933 x̄: 1080.30 x̃: 20
> helped stats (rel) min: <.01% max: 34.94% x̄: 2.20% x̃: 0.32%
> HURT stats (abs) min: 1 max: 1884 x̄: 378.69 x̃: 29
> HURT stats (rel) min: <.01% max: 10.23% x̄: 0.64% x̃: 0.09%
> 95% mean confidence interval for cycles value: -943.71 5.89
> 95% mean confidence interval for cycles %-change: -1.71% -0.32%
> Inconclusive result (value mean confidence interval includes 0).
>
> total spills in shared programs: 8044 -> 7975 (-0.86%)
> spills in affected programs: 2391 -> 2322 (-2.89%)
> helped: 42
> HURT: 0
>
> total fills in shared programs: 10950 -> 10884 (-0.60%)
> fills in affected programs: 2999 -> 2933 (-2.20%)
> helped: 42
> HURT: 0
>
> LOST: 4
> GAINED: 7

This lost result is *really* misleading.  The 4 lost shaders are
compute shaders that went from SIMD8 to SIMD16.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: add pass to move load_const
On 06/06/2018 07:43 AM, Rob Clark wrote:
> Run this pass late (after opt loop) to move load_const instructions back
> into the basic blocks which use the result, in cases where a load_const
> is only consumed in a single block.

If the load_const is used in more than one block, you could move it to
the block that dominates all of the blocks that contain a use.  I suspect
in many cases that will just be the first block, but it's probably worth
trying.

> This helps reduce register usage in cases where the backend driver
> cannot lower the load_const to a uniform.

The trade off is that now you might not be able to hide the latency of
the load.  I tried this on i965, and the results (below) support this
idea a bit.  If you have a structured backend IR, this seems like a
thing better done there... like, if you can't register allocate without
spilling, try moving the load.  But that's really just a scheduling
heuristic.  Hmm...

total instructions in shared programs: 14398410 -> 14397918 (<.01%)
instructions in affected programs: 134856 -> 134364 (-0.36%)
helped: 53
HURT: 10
helped stats (abs) min: 1 max: 30 x̄: 9.47 x̃: 7
helped stats (rel) min: 0.16% max: 2.74% x̄: 0.64% x̃: 0.40%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.03% max: 0.21% x̄: 0.05% x̃: 0.03%
95% mean confidence interval for instructions value: -9.61 -6.01
95% mean confidence interval for instructions %-change: -0.68% -0.38%
Instructions are helped.

total cycles in shared programs: 532949823 -> 532851352 (-0.02%)
cycles in affected programs: 260693023 -> 260594552 (-0.04%)
helped: 122
HURT: 88
helped stats (abs) min: 1 max: 24933 x̄: 1080.30 x̃: 20
helped stats (rel) min: <.01% max: 34.94% x̄: 2.20% x̃: 0.32%
HURT stats (abs) min: 1 max: 1884 x̄: 378.69 x̃: 29
HURT stats (rel) min: <.01% max: 10.23% x̄: 0.64% x̃: 0.09%
95% mean confidence interval for cycles value: -943.71 5.89
95% mean confidence interval for cycles %-change: -1.71% -0.32%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8044 -> 7975 (-0.86%)
spills in affected programs: 2391 -> 2322 (-2.89%)
helped: 42
HURT: 0

total fills in shared programs: 10950 -> 10884 (-0.60%)
fills in affected programs: 2999 -> 2933 (-2.20%)
helped: 42
HURT: 0

LOST: 4
GAINED: 7

> Signed-off-by: Rob Clark
> ---
>  src/compiler/Makefile.sources          |   1 +
>  src/compiler/nir/meson.build           |   1 +
>  src/compiler/nir/nir.h                 |   1 +
>  src/compiler/nir/nir_move_load_const.c | 131 +
>  4 files changed, 134 insertions(+)
>  create mode 100644 src/compiler/nir/nir_move_load_const.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 3daa2c51334..cb8a2875a84 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -253,6 +253,7 @@ NIR_FILES = \
>  nir/nir_lower_wpos_center.c \
>  nir/nir_lower_wpos_ytransform.c \
>  nir/nir_metadata.c \
> + nir/nir_move_load_const.c \
>  nir/nir_move_vec_src_uses_to_dest.c \
>  nir/nir_normalize_cubemap_coords.c \
>  nir/nir_opt_conditional_discard.c \
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 3fec363691d..373a47fd155 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -144,6 +144,7 @@ files_libnir = files(
>    'nir_lower_wpos_ytransform.c',
>    'nir_lower_bit_size.c',
>    'nir_metadata.c',
> +  'nir_move_load_const.c',
>    'nir_move_vec_src_uses_to_dest.c',
>    'nir_normalize_cubemap_coords.c',
>    'nir_opt_conditional_discard.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 5a1f79515ad..073ab4e82ea 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2612,6 +2612,7 @@ bool nir_remove_dead_variables(nir_shader *shader,
> nir_variable_mode modes);
> bool nir_lower_constant_initializers(nir_shader *shader,
> nir_variable_mode modes);
>
> +bool nir_move_load_const(nir_shader *shader);
> bool nir_move_vec_src_uses_to_dest(nir_shader *shader);
> bool nir_lower_vec_to_movs(nir_shader *shader);
> void nir_lower_alpha_test(nir_shader *shader, enum compare_func func,
> diff --git a/src/compiler/nir/nir_move_load_const.c
> b/src/compiler/nir/nir_move_load_const.c
> new file mode 100644
> index 000..c0750035fbb
> --- /dev/null
> +++ b/src/compiler/nir/nir_move_load_const.c
> @@ -0,0 +1,131 @@
> +/*
> + * Copyright © 2018 Red Hat
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject
[Mesa-dev] [PATCH 1/2] nir: add pass to move load_const
Run this pass late (after opt loop) to move load_const instructions back
into the basic blocks which use the result, in cases where a load_const
is only consumed in a single block.

This helps reduce register usage in cases where the backend driver
cannot lower the load_const to a uniform.

Signed-off-by: Rob Clark
---
 src/compiler/Makefile.sources          |   1 +
 src/compiler/nir/meson.build           |   1 +
 src/compiler/nir/nir.h                 |   1 +
 src/compiler/nir/nir_move_load_const.c | 131 +
 4 files changed, 134 insertions(+)
 create mode 100644 src/compiler/nir/nir_move_load_const.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index 3daa2c51334..cb8a2875a84 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -253,6 +253,7 @@ NIR_FILES = \
 nir/nir_lower_wpos_center.c \
 nir/nir_lower_wpos_ytransform.c \
 nir/nir_metadata.c \
+ nir/nir_move_load_const.c \
 nir/nir_move_vec_src_uses_to_dest.c \
 nir/nir_normalize_cubemap_coords.c \
 nir/nir_opt_conditional_discard.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index 3fec363691d..373a47fd155 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -144,6 +144,7 @@ files_libnir = files(
   'nir_lower_wpos_ytransform.c',
   'nir_lower_bit_size.c',
   'nir_metadata.c',
+  'nir_move_load_const.c',
   'nir_move_vec_src_uses_to_dest.c',
   'nir_normalize_cubemap_coords.c',
   'nir_opt_conditional_discard.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 5a1f79515ad..073ab4e82ea 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2612,6 +2612,7 @@ bool nir_remove_dead_variables(nir_shader *shader,
nir_variable_mode modes);
bool nir_lower_constant_initializers(nir_shader *shader,
nir_variable_mode modes);

+bool nir_move_load_const(nir_shader *shader);
bool nir_move_vec_src_uses_to_dest(nir_shader *shader);
bool nir_lower_vec_to_movs(nir_shader *shader);
void nir_lower_alpha_test(nir_shader *shader, enum compare_func func,
diff --git a/src/compiler/nir/nir_move_load_const.c b/src/compiler/nir/nir_move_load_const.c
new file mode 100644
index 000..c0750035fbb
--- /dev/null
+++ b/src/compiler/nir/nir_move_load_const.c
@@ -0,0 +1,131 @@
+/*
+ * Copyright © 2018 Red Hat
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Rob Clark <robdcl...@gmail.com>
+ *
+ */
+
+#include "nir.h"
+
+
+/*
+ * A simple pass that moves load_const's into consuming block if
+ * they are only consumed in a single block, to try to counter-
+ * act nir's tendency to move all load_const to the top of the
+ * first block.
+ */
+
+/* iterate a ssa def's use's and if all uses are within a single
+ * block, return it, otherwise return null.
+ */
+static nir_block *
+get_single_block(nir_ssa_def *def)
+{
+   nir_block *b = NULL;
+
+   /* hmm, probably ignore if-uses: */
+   if (!list_empty(&def->if_uses))
+      return NULL;
+
+   nir_foreach_use(use, def) {
+      nir_instr *instr = use->parent_instr;
+
+      /*
+       * Kind of an ugly special-case, but phi instructions
+       * need to appear first in the block, so by definition
+       * we can't move a load_immed into a block where it is
+       * consumed by a phi instruction.  We could conceivably
+       * move it into a dominator block.
+       */
+      if (instr->type == nir_instr_type_phi)
+         return NULL;
+
+      nir_block *use_block = instr->block;
+
+      if (!b) {
+         b = use_block;
+      } else if (b != use_block) {
+         return NULL;
+      }
+   }
+
+   return b;
+}
+
+bool
+nir_move_load_const(nir_shader *shader)
+{
+   bool progress = false;
+
+   nir_foreach_function(function, shader) {
+      if (!function->impl)
+         continue;
+
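[Editor's sketch] The patch is truncated here; the remainder presumably walks the load_const instructions and re-inserts each one into the single block that get_single_block() reports. A standalone toy model of that sinking decision, with all types invented for illustration (none of them NIR's), might look like:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy instruction: which block it currently lives in, whether it is a
 * constant load, and the single block consuming its result (-1 when the
 * uses span multiple blocks, mirroring get_single_block() returning NULL). */
struct toy_instr {
    bool is_load_const;
    int block;
    int single_use_block;
};

/* Sink each constant load into its lone consuming block and report
 * whether anything moved, the way NIR passes report progress. */
static bool
sink_load_consts(struct toy_instr *instrs, int n)
{
    bool progress = false;
    for (int i = 0; i < n; i++) {
        if (!instrs[i].is_load_const)
            continue;
        int target = instrs[i].single_use_block;
        if (target < 0 || target == instrs[i].block)
            continue;
        instrs[i].block = target;  /* in NIR: remove and re-insert the instr */
        progress = true;
    }
    return progress;
}
```

A second run over the same program moves nothing, which is the fixed-point behavior a progress-returning pass is expected to have.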