Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+
On Fri, 2016-09-16 at 12:17 +1000, Timothy Arceri wrote: > On Thu, 2016-09-15 at 21:22 -0400, Connor Abbott wrote: > > > > On Thu, Sep 15, 2016 at 9:06 PM, Timothy Arceri > > wrote: > > > > > > > > > On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote: > > > > > > > > > > > > This seems a little dubious... why restrict it to gen7+? > > > > > > Because if we don't switch to using nir_lower_indirect_derefs() > > > then > > > lower_variable_index_to_cond_assign() makes a big mess before we > > > get to > > > unrolling. I was getting piglit regressions when switching to for > > > gen6 > > > and below, unfortunately I have no hardware to debug it on. > > > > Ok, probably should be in the commit message. > > I've taken another look. The regression is below, this seems to > happen > because Gen6 can't handle sampler indirects and requires loops that > contain them to be unrolled (this seems like a bug the loop can't > always be unrolled). It seems this was failing in my series 2 because > the test has an interations of 32 and my limit for unrolling was < 32 > I > changed it to li->trip_count > max_iter in series 3. > > Test: piglit.spec.!opengl 2_0.max-samplers > Status: fail > Platform/arch: > snb/m64, g965/m64, ilk/m64, g45/m64 > > > > > > > > > > > > > > > > > > > > > > > > > > And why only > > > > scalar? This pass assumes SSA, but so do many other passes in > > > > core > > > > NIR > > > > that we also run in nir_optimize(), so that shouldn't be a > > > > problem. > > > > > > I need to retest to give you the exact reason but there was a > > > lowering > > > pass that is not called for the vector backend that meant we > > > still > > > had > > > to deal with nir registers. > > > > That doesn't quite sound right... we never generate registers in > > the > > frontend (except if they're immediately lowered away), and we don't > > lower to registers until pretty late in the process. > > I could very well be recalling the problem incorrectly. I've just > enabled it for all stages on my ivy bridge and run shader-db without > crashing so it's possible this was cause by a bug I've now resolved. > > I'll enable it and push it to jenkins to be sure. If everything > checks > out ok I'll send some new patches to enable nir unrolling everywhere. It regressed some piglit tests. The problem was the loop analysis pass was getting called when it shouldn't as thomas mistakenly used a decimal value rather than a hex value. nir_metadata_loop_analysis = 0x16, So we call would call it after we have converted from ssa. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri > > > > wrote: > > > > > > > > > > > > > > > > > > > > --- > > > > > src/compiler/glsl/glsl_parser_extras.cpp | 12 + > > > > > src/mesa/drivers/dri/i965/brw_compiler.c | 42 > > > > > +++- > > > > > src/mesa/drivers/dri/i965/brw_nir.c | 23 +- > > > > > -- > > > > > - > > > > > 3 files changed, 55 insertions(+), 22 deletions(-) > > > > > > > > > > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp > > > > > b/src/compiler/glsl/glsl_parser_extras.cpp > > > > > index 436ddd0..a5c926a 100644 > > > > > --- a/src/compiler/glsl/glsl_parser_extras.cpp > > > > > +++ b/src/compiler/glsl/glsl_parser_extras.cpp > > > > > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, > > > > > bool > > > > > linked, > > > > > OPT(optimize_split_arrays, ir, linked); > > > > > OPT(optimize_redundant_jumps, ir); > > > > > > > > > > - loop_state *ls = analyze_loop_variables(ir); > > > > > - if (ls->loop_found) { > > > > > - OPT(set_loop_controls, ir, ls); > > > > > - OPT(unroll_loops, ir, ls, options); > > > > > + if (options->MaxUnrollIterations != 0) { > > > > > + loop_state *ls = analyze_loop_variables(ir); > > > > > + if (ls->loop_found) { > > > > > + OPT(set_loop_controls, ir, ls); > > > > > + OPT(unroll_loops, ir, ls, options); > > > > > + } > > > > > + delete ls; > > > > > } > > > > > - delete ls; > > > > > > > > > > #undef OPT > > > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c > > > > > b/src/mesa/drivers/dri/i965/brw_compiler.c > > > > > index 86b1eaa..523b554 100644 > > > > > --- a/src/mesa/drivers/dri/i965/brw_compiler.c > > > > > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c > > > > > @@ -43,18 +43,28 @@ > > > > > .use_interpolated_input_intrinsics = > > > > > true, \ > > > > > .vertex_id_zero_based = true > > > > > > > > > > +#define > > > > > COMMON_SCALAR_OPTIONS > > > > > \ > > > > > + .lower_pack_half_2x16 = > > > > > true, \ > > > > > + .lower_pack_snorm_2x16 = > > > > > true, \ > > > > > + .lower_pack_snorm_4x8 = > > > > > true,
Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+
On Thu, 2016-09-15 at 21:22 -0400, Connor Abbott wrote: > On Thu, Sep 15, 2016 at 9:06 PM, Timothy Arceri > wrote: > > > > On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote: > > > > > > This seems a little dubious... why restrict it to gen7+? > > > > Because if we don't switch to using nir_lower_indirect_derefs() > > then > > lower_variable_index_to_cond_assign() makes a big mess before we > > get to > > unrolling. I was getting piglit regressions when switching to for > > gen6 > > and below, unfortunately I have no hardware to debug it on. > > Ok, probably should be in the commit message. I've taken another look. The regression is below, this seems to happen because Gen6 can't handle sampler indirects and requires loops that contain them to be unrolled (this seems like a bug the loop can't always be unrolled). It seems this was failing in my series 2 because the test has an interations of 32 and my limit for unrolling was < 32 I changed it to li->trip_count > max_iter in series 3. Test: piglit.spec.!opengl 2_0.max-samplers Status: fail Platform/arch: snb/m64, g965/m64, ilk/m64, g45/m64 > > > > > > > > > > > And why only > > > scalar? This pass assumes SSA, but so do many other passes in > > > core > > > NIR > > > that we also run in nir_optimize(), so that shouldn't be a > > > problem. > > > > I need to retest to give you the exact reason but there was a > > lowering > > pass that is not called for the vector backend that meant we still > > had > > to deal with nir registers. > > That doesn't quite sound right... we never generate registers in the > frontend (except if they're immediately lowered away), and we don't > lower to registers until pretty late in the process. I could very well be recalling the problem incorrectly. I've just enabled it for all stages on my ivy bridge and run shader-db without crashing so it's possible this was cause by a bug I've now resolved. I'll enable it and push it to jenkins to be sure. If everything checks out ok I'll send some new patches to enable nir unrolling everywhere. > > > > > > > > > > > > > > > > On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri > > > wrote: > > > > > > > > > > > > --- > > > > src/compiler/glsl/glsl_parser_extras.cpp | 12 + > > > > src/mesa/drivers/dri/i965/brw_compiler.c | 42 > > > > +++- > > > > src/mesa/drivers/dri/i965/brw_nir.c | 23 +--- > > > > - > > > > 3 files changed, 55 insertions(+), 22 deletions(-) > > > > > > > > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp > > > > b/src/compiler/glsl/glsl_parser_extras.cpp > > > > index 436ddd0..a5c926a 100644 > > > > --- a/src/compiler/glsl/glsl_parser_extras.cpp > > > > +++ b/src/compiler/glsl/glsl_parser_extras.cpp > > > > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, > > > > bool > > > > linked, > > > > OPT(optimize_split_arrays, ir, linked); > > > > OPT(optimize_redundant_jumps, ir); > > > > > > > > - loop_state *ls = analyze_loop_variables(ir); > > > > - if (ls->loop_found) { > > > > - OPT(set_loop_controls, ir, ls); > > > > - OPT(unroll_loops, ir, ls, options); > > > > + if (options->MaxUnrollIterations != 0) { > > > > + loop_state *ls = analyze_loop_variables(ir); > > > > + if (ls->loop_found) { > > > > + OPT(set_loop_controls, ir, ls); > > > > + OPT(unroll_loops, ir, ls, options); > > > > + } > > > > + delete ls; > > > > } > > > > - delete ls; > > > > > > > > #undef OPT > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c > > > > b/src/mesa/drivers/dri/i965/brw_compiler.c > > > > index 86b1eaa..523b554 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_compiler.c > > > > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c > > > > @@ -43,18 +43,28 @@ > > > > .use_interpolated_input_intrinsics = > > > > true, \ > > > > .vertex_id_zero_based = true > > > > > > > > +#define > > > > COMMON_SCALAR_OPTIONS > > > > \ > > > > + .lower_pack_half_2x16 = > > > > true, \ > > > > + .lower_pack_snorm_2x16 = > > > > true, \ > > > > + .lower_pack_snorm_4x8 = > > > > true, \ > > > > + .lower_pack_unorm_2x16 = > > > > true, \ > > > > + .lower_pack_unorm_4x8 = > > > > true, \ > > > > + .lower_unpack_half_2x16 = > > > > true,\ > > > > + .lower_unpack_snorm_2x16 = > > > > true, \ > > > > + .lower_unpack_snorm_4x8 = > > > > true,\ > > > > + .lower_unpack_unorm_2x16 = > > > > true, \ > > > > + .lower_unpack_unorm_4x8 = > > > > true
Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+
On Thu, Sep 15, 2016 at 9:06 PM, Timothy Arceri wrote: > On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote: >> This seems a little dubious... why restrict it to gen7+? > > Because if we don't switch to using nir_lower_indirect_derefs() then > lower_variable_index_to_cond_assign() makes a big mess before we get to > unrolling. I was getting piglit regressions when switching to for gen6 > and below, unfortunately I have no hardware to debug it on. Ok, probably should be in the commit message. > >> And why only >> scalar? This pass assumes SSA, but so do many other passes in core >> NIR >> that we also run in nir_optimize(), so that shouldn't be a problem. > > I need to retest to give you the exact reason but there was a lowering > pass that is not called for the vector backend that meant we still had > to deal with nir registers. That doesn't quite sound right... we never generate registers in the frontend (except if they're immediately lowered away), and we don't lower to registers until pretty late in the process. > > >> >> On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri >> wrote: >> > >> > --- >> > src/compiler/glsl/glsl_parser_extras.cpp | 12 + >> > src/mesa/drivers/dri/i965/brw_compiler.c | 42 >> > +++- >> > src/mesa/drivers/dri/i965/brw_nir.c | 23 + >> > 3 files changed, 55 insertions(+), 22 deletions(-) >> > >> > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp >> > b/src/compiler/glsl/glsl_parser_extras.cpp >> > index 436ddd0..a5c926a 100644 >> > --- a/src/compiler/glsl/glsl_parser_extras.cpp >> > +++ b/src/compiler/glsl/glsl_parser_extras.cpp >> > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool >> > linked, >> > OPT(optimize_split_arrays, ir, linked); >> > OPT(optimize_redundant_jumps, ir); >> > >> > - loop_state *ls = analyze_loop_variables(ir); >> > - if (ls->loop_found) { >> > - OPT(set_loop_controls, ir, ls); >> > - OPT(unroll_loops, ir, ls, options); >> > + if (options->MaxUnrollIterations != 0) { >> > + loop_state *ls = analyze_loop_variables(ir); >> > + if (ls->loop_found) { >> > + OPT(set_loop_controls, ir, ls); >> > + OPT(unroll_loops, ir, ls, options); >> > + } >> > + delete ls; >> > } >> > - delete ls; >> > >> > #undef OPT >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c >> > b/src/mesa/drivers/dri/i965/brw_compiler.c >> > index 86b1eaa..523b554 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_compiler.c >> > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c >> > @@ -43,18 +43,28 @@ >> > .use_interpolated_input_intrinsics = >> > true, \ >> > .vertex_id_zero_based = true >> > >> > +#define >> > COMMON_SCALAR_OPTIONS >> >\ >> > + .lower_pack_half_2x16 = >> > true, \ >> > + .lower_pack_snorm_2x16 = >> > true, \ >> > + .lower_pack_snorm_4x8 = >> > true, \ >> > + .lower_pack_unorm_2x16 = >> > true, \ >> > + .lower_pack_unorm_4x8 = >> > true, \ >> > + .lower_unpack_half_2x16 = >> > true,\ >> > + .lower_unpack_snorm_2x16 = >> > true, \ >> > + .lower_unpack_snorm_4x8 = >> > true,\ >> > + .lower_unpack_unorm_2x16 = >> > true, \ >> > + .lower_unpack_unorm_4x8 = >> > true \ >> > + >> > static const struct nir_shader_compiler_options scalar_nir_options >> > = { >> > COMMON_OPTIONS, >> > - .lower_pack_half_2x16 = true, >> > - .lower_pack_snorm_2x16 = true, >> > - .lower_pack_snorm_4x8 = true, >> > - .lower_pack_unorm_2x16 = true, >> > - .lower_pack_unorm_4x8 = true, >> > - .lower_unpack_half_2x16 = true, >> > - .lower_unpack_snorm_2x16 = true, >> > - .lower_unpack_snorm_4x8 = true, >> > - .lower_unpack_unorm_2x16 = true, >> > - .lower_unpack_unorm_4x8 = true, >> > + COMMON_SCALAR_OPTIONS, >> > + .max_unroll_iterations = 0, >> > +}; >> > + >> > +static const struct nir_shader_compiler_options >> > scalar_nir_options_gen7 = { >> > + COMMON_OPTIONS, >> > + COMMON_SCALAR_OPTIONS, >> > + .max_unroll_iterations = 32, >> > }; >> > >> > static const struct nir_shader_compiler_options vector_nir_options >> > = { >> > @@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options >> > vector_nir_options = { >> > .lower_unpack_unorm_2x16 = true, >> > .lower_extract_byte = true, >> > .lower_extract_word = true, >> > + .max_unroll_iterations = 0, >> > }; >> > >> > static const struct nir_shader_compiler_options >> > vector_nir_options_gen6 = { >> > @@ -92,6 +103,7 @@ static const struct nir_sha
Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+
On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote: > This seems a little dubious... why restrict it to gen7+? Because if we don't switch to using nir_lower_indirect_derefs() then lower_variable_index_to_cond_assign() makes a big mess before we get to unrolling. I was getting piglit regressions when switching to for gen6 and below, unfortunately I have no hardware to debug it on. > And why only > scalar? This pass assumes SSA, but so do many other passes in core > NIR > that we also run in nir_optimize(), so that shouldn't be a problem. I need to retest to give you the exact reason but there was a lowering pass that is not called for the vector backend that meant we still had to deal with nir registers. > > On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri > wrote: > > > > --- > > src/compiler/glsl/glsl_parser_extras.cpp | 12 + > > src/mesa/drivers/dri/i965/brw_compiler.c | 42 > > +++- > > src/mesa/drivers/dri/i965/brw_nir.c | 23 + > > 3 files changed, 55 insertions(+), 22 deletions(-) > > > > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp > > b/src/compiler/glsl/glsl_parser_extras.cpp > > index 436ddd0..a5c926a 100644 > > --- a/src/compiler/glsl/glsl_parser_extras.cpp > > +++ b/src/compiler/glsl/glsl_parser_extras.cpp > > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool > > linked, > > OPT(optimize_split_arrays, ir, linked); > > OPT(optimize_redundant_jumps, ir); > > > > - loop_state *ls = analyze_loop_variables(ir); > > - if (ls->loop_found) { > > - OPT(set_loop_controls, ir, ls); > > - OPT(unroll_loops, ir, ls, options); > > + if (options->MaxUnrollIterations != 0) { > > + loop_state *ls = analyze_loop_variables(ir); > > + if (ls->loop_found) { > > + OPT(set_loop_controls, ir, ls); > > + OPT(unroll_loops, ir, ls, options); > > + } > > + delete ls; > > } > > - delete ls; > > > > #undef OPT > > > > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c > > b/src/mesa/drivers/dri/i965/brw_compiler.c > > index 86b1eaa..523b554 100644 > > --- a/src/mesa/drivers/dri/i965/brw_compiler.c > > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c > > @@ -43,18 +43,28 @@ > > .use_interpolated_input_intrinsics = > > true, \ > > .vertex_id_zero_based = true > > > > +#define > > COMMON_SCALAR_OPTIONS > > \ > > + .lower_pack_half_2x16 = > > true, \ > > + .lower_pack_snorm_2x16 = > > true, \ > > + .lower_pack_snorm_4x8 = > > true, \ > > + .lower_pack_unorm_2x16 = > > true, \ > > + .lower_pack_unorm_4x8 = > > true, \ > > + .lower_unpack_half_2x16 = > > true,\ > > + .lower_unpack_snorm_2x16 = > > true, \ > > + .lower_unpack_snorm_4x8 = > > true,\ > > + .lower_unpack_unorm_2x16 = > > true, \ > > + .lower_unpack_unorm_4x8 = > > true \ > > + > > static const struct nir_shader_compiler_options scalar_nir_options > > = { > > COMMON_OPTIONS, > > - .lower_pack_half_2x16 = true, > > - .lower_pack_snorm_2x16 = true, > > - .lower_pack_snorm_4x8 = true, > > - .lower_pack_unorm_2x16 = true, > > - .lower_pack_unorm_4x8 = true, > > - .lower_unpack_half_2x16 = true, > > - .lower_unpack_snorm_2x16 = true, > > - .lower_unpack_snorm_4x8 = true, > > - .lower_unpack_unorm_2x16 = true, > > - .lower_unpack_unorm_4x8 = true, > > + COMMON_SCALAR_OPTIONS, > > + .max_unroll_iterations = 0, > > +}; > > + > > +static const struct nir_shader_compiler_options > > scalar_nir_options_gen7 = { > > + COMMON_OPTIONS, > > + COMMON_SCALAR_OPTIONS, > > + .max_unroll_iterations = 32, > > }; > > > > static const struct nir_shader_compiler_options vector_nir_options > > = { > > @@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options > > vector_nir_options = { > > .lower_unpack_unorm_2x16 = true, > > .lower_extract_byte = true, > > .lower_extract_word = true, > > + .max_unroll_iterations = 0, > > }; > > > > static const struct nir_shader_compiler_options > > vector_nir_options_gen6 = { > > @@ -92,6 +103,7 @@ static const struct nir_shader_compiler_options > > vector_nir_options_gen6 = { > > .lower_unpack_unorm_2x16 = true, > > .lower_extract_byte = true, > > .lower_extract_word = true, > > + .max_unroll_iterations = 0, > > }; > > > > struct brw_compiler * > > @@ -119,7 +131,6 @@ brw_compiler_create(void *mem_ctx, const struct > > gen_device_info *devinfo) > > > > /* We want the GL
Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+
This seems a little dubious... why restrict it to gen7+? And why only scalar? This pass assumes SSA, but so do many other passes in core NIR that we also run in nir_optimize(), so that shouldn't be a problem. On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri wrote: > --- > src/compiler/glsl/glsl_parser_extras.cpp | 12 + > src/mesa/drivers/dri/i965/brw_compiler.c | 42 > +++- > src/mesa/drivers/dri/i965/brw_nir.c | 23 + > 3 files changed, 55 insertions(+), 22 deletions(-) > > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp > b/src/compiler/glsl/glsl_parser_extras.cpp > index 436ddd0..a5c926a 100644 > --- a/src/compiler/glsl/glsl_parser_extras.cpp > +++ b/src/compiler/glsl/glsl_parser_extras.cpp > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool linked, > OPT(optimize_split_arrays, ir, linked); > OPT(optimize_redundant_jumps, ir); > > - loop_state *ls = analyze_loop_variables(ir); > - if (ls->loop_found) { > - OPT(set_loop_controls, ir, ls); > - OPT(unroll_loops, ir, ls, options); > + if (options->MaxUnrollIterations != 0) { > + loop_state *ls = analyze_loop_variables(ir); > + if (ls->loop_found) { > + OPT(set_loop_controls, ir, ls); > + OPT(unroll_loops, ir, ls, options); > + } > + delete ls; > } > - delete ls; > > #undef OPT > > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c > b/src/mesa/drivers/dri/i965/brw_compiler.c > index 86b1eaa..523b554 100644 > --- a/src/mesa/drivers/dri/i965/brw_compiler.c > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c > @@ -43,18 +43,28 @@ > .use_interpolated_input_intrinsics = true, > \ > .vertex_id_zero_based = true > > +#define COMMON_SCALAR_OPTIONS > \ > + .lower_pack_half_2x16 = true, > \ > + .lower_pack_snorm_2x16 = true, > \ > + .lower_pack_snorm_4x8 = true, > \ > + .lower_pack_unorm_2x16 = true, > \ > + .lower_pack_unorm_4x8 = true, > \ > + .lower_unpack_half_2x16 = true, > \ > + .lower_unpack_snorm_2x16 = true, > \ > + .lower_unpack_snorm_4x8 = true, > \ > + .lower_unpack_unorm_2x16 = true, > \ > + .lower_unpack_unorm_4x8 = true > \ > + > static const struct nir_shader_compiler_options scalar_nir_options = { > COMMON_OPTIONS, > - .lower_pack_half_2x16 = true, > - .lower_pack_snorm_2x16 = true, > - .lower_pack_snorm_4x8 = true, > - .lower_pack_unorm_2x16 = true, > - .lower_pack_unorm_4x8 = true, > - .lower_unpack_half_2x16 = true, > - .lower_unpack_snorm_2x16 = true, > - .lower_unpack_snorm_4x8 = true, > - .lower_unpack_unorm_2x16 = true, > - .lower_unpack_unorm_4x8 = true, > + COMMON_SCALAR_OPTIONS, > + .max_unroll_iterations = 0, > +}; > + > +static const struct nir_shader_compiler_options scalar_nir_options_gen7 = { > + COMMON_OPTIONS, > + COMMON_SCALAR_OPTIONS, > + .max_unroll_iterations = 32, > }; > > static const struct nir_shader_compiler_options vector_nir_options = { > @@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options > vector_nir_options = { > .lower_unpack_unorm_2x16 = true, > .lower_extract_byte = true, > .lower_extract_word = true, > + .max_unroll_iterations = 0, > }; > > static const struct nir_shader_compiler_options vector_nir_options_gen6 = { > @@ -92,6 +103,7 @@ static const struct nir_shader_compiler_options > vector_nir_options_gen6 = { > .lower_unpack_unorm_2x16 = true, > .lower_extract_byte = true, > .lower_extract_word = true, > + .max_unroll_iterations = 0, > }; > > struct brw_compiler * > @@ -119,7 +131,6 @@ brw_compiler_create(void *mem_ctx, const struct > gen_device_info *devinfo) > > /* We want the GLSL compiler to emit code that uses condition codes */ > for (int i = 0; i < MESA_SHADER_STAGES; i++) { > - compiler->glsl_compiler_options[i].MaxUnrollIterations = 32; >compiler->glsl_compiler_options[i].MaxIfDepth = > devinfo->gen < 6 ? 16 : UINT_MAX; > > @@ -140,8 +151,15 @@ brw_compiler_create(void *mem_ctx, const struct > gen_device_info *devinfo) > compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true; > >if (is_scalar) { > - compiler->glsl_compiler_options[i].NirOptions = &scalar_nir_options; > + if (devinfo->gen > 6) { > +compiler->glsl_compiler_options[i].MaxUnrollIterations = 0; > + } else { > +compiler->glsl_compiler_options[i].MaxUnrollIte
[Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+
--- src/compiler/glsl/glsl_parser_extras.cpp | 12 + src/mesa/drivers/dri/i965/brw_compiler.c | 42 +++- src/mesa/drivers/dri/i965/brw_nir.c | 23 + 3 files changed, 55 insertions(+), 22 deletions(-) diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp index 436ddd0..a5c926a 100644 --- a/src/compiler/glsl/glsl_parser_extras.cpp +++ b/src/compiler/glsl/glsl_parser_extras.cpp @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool linked, OPT(optimize_split_arrays, ir, linked); OPT(optimize_redundant_jumps, ir); - loop_state *ls = analyze_loop_variables(ir); - if (ls->loop_found) { - OPT(set_loop_controls, ir, ls); - OPT(unroll_loops, ir, ls, options); + if (options->MaxUnrollIterations != 0) { + loop_state *ls = analyze_loop_variables(ir); + if (ls->loop_found) { + OPT(set_loop_controls, ir, ls); + OPT(unroll_loops, ir, ls, options); + } + delete ls; } - delete ls; #undef OPT diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c b/src/mesa/drivers/dri/i965/brw_compiler.c index 86b1eaa..523b554 100644 --- a/src/mesa/drivers/dri/i965/brw_compiler.c +++ b/src/mesa/drivers/dri/i965/brw_compiler.c @@ -43,18 +43,28 @@ .use_interpolated_input_intrinsics = true, \ .vertex_id_zero_based = true +#define COMMON_SCALAR_OPTIONS \ + .lower_pack_half_2x16 = true, \ + .lower_pack_snorm_2x16 = true, \ + .lower_pack_snorm_4x8 = true, \ + .lower_pack_unorm_2x16 = true, \ + .lower_pack_unorm_4x8 = true, \ + .lower_unpack_half_2x16 = true,\ + .lower_unpack_snorm_2x16 = true, \ + .lower_unpack_snorm_4x8 = true,\ + .lower_unpack_unorm_2x16 = true, \ + .lower_unpack_unorm_4x8 = true \ + static const struct nir_shader_compiler_options scalar_nir_options = { COMMON_OPTIONS, - .lower_pack_half_2x16 = true, - .lower_pack_snorm_2x16 = true, - .lower_pack_snorm_4x8 = true, - .lower_pack_unorm_2x16 = true, - .lower_pack_unorm_4x8 = true, - .lower_unpack_half_2x16 = true, - .lower_unpack_snorm_2x16 = true, - .lower_unpack_snorm_4x8 = true, - .lower_unpack_unorm_2x16 = true, - .lower_unpack_unorm_4x8 = true, + COMMON_SCALAR_OPTIONS, + .max_unroll_iterations = 0, +}; + +static const struct nir_shader_compiler_options scalar_nir_options_gen7 = { + COMMON_OPTIONS, + COMMON_SCALAR_OPTIONS, + .max_unroll_iterations = 32, }; static const struct nir_shader_compiler_options vector_nir_options = { @@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options vector_nir_options = { .lower_unpack_unorm_2x16 = true, .lower_extract_byte = true, .lower_extract_word = true, + .max_unroll_iterations = 0, }; static const struct nir_shader_compiler_options vector_nir_options_gen6 = { @@ -92,6 +103,7 @@ static const struct nir_shader_compiler_options vector_nir_options_gen6 = { .lower_unpack_unorm_2x16 = true, .lower_extract_byte = true, .lower_extract_word = true, + .max_unroll_iterations = 0, }; struct brw_compiler * @@ -119,7 +131,6 @@ brw_compiler_create(void *mem_ctx, const struct gen_device_info *devinfo) /* We want the GLSL compiler to emit code that uses condition codes */ for (int i = 0; i < MESA_SHADER_STAGES; i++) { - compiler->glsl_compiler_options[i].MaxUnrollIterations = 32; compiler->glsl_compiler_options[i].MaxIfDepth = devinfo->gen < 6 ? 16 : UINT_MAX; @@ -140,8 +151,15 @@ brw_compiler_create(void *mem_ctx, const struct gen_device_info *devinfo) compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true; if (is_scalar) { - compiler->glsl_compiler_options[i].NirOptions = &scalar_nir_options; + if (devinfo->gen > 6) { +compiler->glsl_compiler_options[i].MaxUnrollIterations = 0; + } else { +compiler->glsl_compiler_options[i].MaxUnrollIterations = 32; + } + compiler->glsl_compiler_options[i].NirOptions = +devinfo->gen < 7 ? &scalar_nir_options : &scalar_nir_options_gen7; } else { + compiler->glsl_compiler_options[i].MaxUnrollIterations = 32; compiler->glsl_compiler_options[i].NirOptions = devinfo->gen < 6 ? &vector_nir_options : &vector_nir_options_gen6; } diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index 0c15b55..15abb77