Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+

2016-09-15 Thread Timothy Arceri
On Fri, 2016-09-16 at 12:17 +1000, Timothy Arceri wrote:
> On Thu, 2016-09-15 at 21:22 -0400, Connor Abbott wrote:
> > 
> > On Thu, Sep 15, 2016 at 9:06 PM, Timothy Arceri
> >  wrote:
> > > 
> > > 
> > > On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote:
> > > > 
> > > > 
> > > > This seems a little dubious... why restrict it to gen7+?
> > > 
> > > Because if we don't switch to using nir_lower_indirect_derefs()
> > > then
> > > lower_variable_index_to_cond_assign() makes a big mess before we
> > > get to
> > > unrolling. I was getting piglit regressions when switching to for
> > > gen6
> > > and below, unfortunately I have no hardware to debug it on.
> > 
> > Ok, probably should be in the commit message.
> 
> I've taken another look. The regression is below, this seems to
> happen
> because Gen6 can't handle sampler indirects and requires loops that
> contain them to be unrolled (this seems like a bug the loop can't
> always be unrolled). It seems this was failing in my series 2 because
> the test has an interations of 32 and my limit for unrolling was < 32
> I
> changed it to li->trip_count > max_iter in series 3.
> 
> Test: piglit.spec.!opengl 2_0.max-samplers
> Status: fail
> Platform/arch:
>   snb/m64, g965/m64, ilk/m64, g45/m64
> 
> > 
> > 
> > > 
> > > 
> > > 
> > > > 
> > > > 
> > > >  And why only
> > > > scalar? This pass assumes SSA, but so do many other passes in
> > > > core
> > > > NIR
> > > > that we also run in nir_optimize(), so that shouldn't be a
> > > > problem.
> > > 
> > > I need to retest to give you the exact reason but there was a
> > > lowering
> > > pass that is not called for the vector backend that meant we
> > > still
> > > had
> > > to deal with nir registers.
> > 
> > That doesn't quite sound right... we never generate registers in
> > the
> > frontend (except if they're immediately lowered away), and we don't
> > lower to registers until pretty late in the process.
> 
> I could very well be recalling the problem incorrectly. I've just
> enabled it for all stages on my ivy bridge and run shader-db without
> crashing so it's possible this was cause by a bug I've now resolved. 
> 
> I'll enable it and push it to jenkins to be sure. If everything
> checks
> out ok I'll send some new patches to enable nir unrolling everywhere.

It regressed some piglit tests. The problem was the loop analysis pass
was getting called when it shouldn't as thomas mistakenly used a
decimal value rather than a hex value.

nir_metadata_loop_analysis = 0x16,

So we call would call it after we have converted from ssa.


>  
> > 
> > 
> > > 
> > > 
> > > 
> > > 
> > > > 
> > > > 
> > > > 
> > > > On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri
> > > >  wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > ---
> > > > >  src/compiler/glsl/glsl_parser_extras.cpp | 12 +
> > > > >  src/mesa/drivers/dri/i965/brw_compiler.c | 42
> > > > > +++-
> > > > >  src/mesa/drivers/dri/i965/brw_nir.c  | 23 +-
> > > > > --
> > > > > -
> > > > >  3 files changed, 55 insertions(+), 22 deletions(-)
> > > > > 
> > > > > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
> > > > > b/src/compiler/glsl/glsl_parser_extras.cpp
> > > > > index 436ddd0..a5c926a 100644
> > > > > --- a/src/compiler/glsl/glsl_parser_extras.cpp
> > > > > +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> > > > > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir,
> > > > > bool
> > > > > linked,
> > > > > OPT(optimize_split_arrays, ir, linked);
> > > > > OPT(optimize_redundant_jumps, ir);
> > > > > 
> > > > > -   loop_state *ls = analyze_loop_variables(ir);
> > > > > -   if (ls->loop_found) {
> > > > > -  OPT(set_loop_controls, ir, ls);
> > > > > -  OPT(unroll_loops, ir, ls, options);
> > > > > +   if (options->MaxUnrollIterations != 0) {
> > > > > +  loop_state *ls = analyze_loop_variables(ir);
> > > > > +  if (ls->loop_found) {
> > > > > + OPT(set_loop_controls, ir, ls);
> > > > > + OPT(unroll_loops, ir, ls, options);
> > > > > +  }
> > > > > +  delete ls;
> > > > > }
> > > > > -   delete ls;
> > > > > 
> > > > >  #undef OPT
> > > > > 
> > > > > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > > b/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > > index 86b1eaa..523b554 100644
> > > > > --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > > @@ -43,18 +43,28 @@
> > > > > .use_interpolated_input_intrinsics =
> > > > > true, \
> > > > > .vertex_id_zero_based = true
> > > > > 
> > > > > +#define
> > > > > COMMON_SCALAR_OPTIONS
> > > > >    \
> > > > > +   .lower_pack_half_2x16 =
> > > > > true,  \
> > > > > +   .lower_pack_snorm_2x16 =
> > > > > true, \
> > > > > +   .lower_pack_snorm_4x8 =
> > > > > true, 

Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+

2016-09-15 Thread Timothy Arceri
On Thu, 2016-09-15 at 21:22 -0400, Connor Abbott wrote:
> On Thu, Sep 15, 2016 at 9:06 PM, Timothy Arceri
>  wrote:
> > 
> > On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote:
> > > 
> > > This seems a little dubious... why restrict it to gen7+?
> > 
> > Because if we don't switch to using nir_lower_indirect_derefs()
> > then
> > lower_variable_index_to_cond_assign() makes a big mess before we
> > get to
> > unrolling. I was getting piglit regressions when switching to for
> > gen6
> > and below, unfortunately I have no hardware to debug it on.
> 
> Ok, probably should be in the commit message.

I've taken another look. The regression is below, this seems to happen
because Gen6 can't handle sampler indirects and requires loops that
contain them to be unrolled (this seems like a bug the loop can't
always be unrolled). It seems this was failing in my series 2 because
the test has an interations of 32 and my limit for unrolling was < 32 I
changed it to li->trip_count > max_iter in series 3.

Test: piglit.spec.!opengl 2_0.max-samplers
Status: fail
Platform/arch:
snb/m64, g965/m64, ilk/m64, g45/m64

> 
> > 
> > 
> > > 
> > >  And why only
> > > scalar? This pass assumes SSA, but so do many other passes in
> > > core
> > > NIR
> > > that we also run in nir_optimize(), so that shouldn't be a
> > > problem.
> > 
> > I need to retest to give you the exact reason but there was a
> > lowering
> > pass that is not called for the vector backend that meant we still
> > had
> > to deal with nir registers.
> 
> That doesn't quite sound right... we never generate registers in the
> frontend (except if they're immediately lowered away), and we don't
> lower to registers until pretty late in the process.

I could very well be recalling the problem incorrectly. I've just
enabled it for all stages on my ivy bridge and run shader-db without
crashing so it's possible this was cause by a bug I've now resolved. 

I'll enable it and push it to jenkins to be sure. If everything checks
out ok I'll send some new patches to enable nir unrolling everywhere.
 
> 
> > 
> > 
> > 
> > > 
> > > 
> > > On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri
> > >  wrote:
> > > > 
> > > > 
> > > > ---
> > > >  src/compiler/glsl/glsl_parser_extras.cpp | 12 +
> > > >  src/mesa/drivers/dri/i965/brw_compiler.c | 42
> > > > +++-
> > > >  src/mesa/drivers/dri/i965/brw_nir.c  | 23 +---
> > > > -
> > > >  3 files changed, 55 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
> > > > b/src/compiler/glsl/glsl_parser_extras.cpp
> > > > index 436ddd0..a5c926a 100644
> > > > --- a/src/compiler/glsl/glsl_parser_extras.cpp
> > > > +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> > > > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir,
> > > > bool
> > > > linked,
> > > > OPT(optimize_split_arrays, ir, linked);
> > > > OPT(optimize_redundant_jumps, ir);
> > > > 
> > > > -   loop_state *ls = analyze_loop_variables(ir);
> > > > -   if (ls->loop_found) {
> > > > -  OPT(set_loop_controls, ir, ls);
> > > > -  OPT(unroll_loops, ir, ls, options);
> > > > +   if (options->MaxUnrollIterations != 0) {
> > > > +  loop_state *ls = analyze_loop_variables(ir);
> > > > +  if (ls->loop_found) {
> > > > + OPT(set_loop_controls, ir, ls);
> > > > + OPT(unroll_loops, ir, ls, options);
> > > > +  }
> > > > +  delete ls;
> > > > }
> > > > -   delete ls;
> > > > 
> > > >  #undef OPT
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > b/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > index 86b1eaa..523b554 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> > > > @@ -43,18 +43,28 @@
> > > > .use_interpolated_input_intrinsics =
> > > > true, \
> > > > .vertex_id_zero_based = true
> > > > 
> > > > +#define
> > > > COMMON_SCALAR_OPTIONS
> > > >    \
> > > > +   .lower_pack_half_2x16 =
> > > > true,  \
> > > > +   .lower_pack_snorm_2x16 =
> > > > true, \
> > > > +   .lower_pack_snorm_4x8 =
> > > > true,  \
> > > > +   .lower_pack_unorm_2x16 =
> > > > true, \
> > > > +   .lower_pack_unorm_4x8 =
> > > > true,  \
> > > > +   .lower_unpack_half_2x16 =
> > > > true,\
> > > > +   .lower_unpack_snorm_2x16 =
> > > > true,   \
> > > > +   .lower_unpack_snorm_4x8 =
> > > > true,\
> > > > +   .lower_unpack_unorm_2x16 =
> > > > true,   \
> > > > +   .lower_unpack_unorm_4x8 =
> > > > true   

Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+

2016-09-15 Thread Connor Abbott
On Thu, Sep 15, 2016 at 9:06 PM, Timothy Arceri
 wrote:
> On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote:
>> This seems a little dubious... why restrict it to gen7+?
>
> Because if we don't switch to using nir_lower_indirect_derefs() then
> lower_variable_index_to_cond_assign() makes a big mess before we get to
> unrolling. I was getting piglit regressions when switching to for gen6
> and below, unfortunately I have no hardware to debug it on.

Ok, probably should be in the commit message.

>
>>  And why only
>> scalar? This pass assumes SSA, but so do many other passes in core
>> NIR
>> that we also run in nir_optimize(), so that shouldn't be a problem.
>
> I need to retest to give you the exact reason but there was a lowering
> pass that is not called for the vector backend that meant we still had
> to deal with nir registers.

That doesn't quite sound right... we never generate registers in the
frontend (except if they're immediately lowered away), and we don't
lower to registers until pretty late in the process.

>
>
>>
>> On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri
>>  wrote:
>> >
>> > ---
>> >  src/compiler/glsl/glsl_parser_extras.cpp | 12 +
>> >  src/mesa/drivers/dri/i965/brw_compiler.c | 42
>> > +++-
>> >  src/mesa/drivers/dri/i965/brw_nir.c  | 23 +
>> >  3 files changed, 55 insertions(+), 22 deletions(-)
>> >
>> > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
>> > b/src/compiler/glsl/glsl_parser_extras.cpp
>> > index 436ddd0..a5c926a 100644
>> > --- a/src/compiler/glsl/glsl_parser_extras.cpp
>> > +++ b/src/compiler/glsl/glsl_parser_extras.cpp
>> > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool
>> > linked,
>> > OPT(optimize_split_arrays, ir, linked);
>> > OPT(optimize_redundant_jumps, ir);
>> >
>> > -   loop_state *ls = analyze_loop_variables(ir);
>> > -   if (ls->loop_found) {
>> > -  OPT(set_loop_controls, ir, ls);
>> > -  OPT(unroll_loops, ir, ls, options);
>> > +   if (options->MaxUnrollIterations != 0) {
>> > +  loop_state *ls = analyze_loop_variables(ir);
>> > +  if (ls->loop_found) {
>> > + OPT(set_loop_controls, ir, ls);
>> > + OPT(unroll_loops, ir, ls, options);
>> > +  }
>> > +  delete ls;
>> > }
>> > -   delete ls;
>> >
>> >  #undef OPT
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c
>> > b/src/mesa/drivers/dri/i965/brw_compiler.c
>> > index 86b1eaa..523b554 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_compiler.c
>> > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
>> > @@ -43,18 +43,28 @@
>> > .use_interpolated_input_intrinsics =
>> > true, \
>> > .vertex_id_zero_based = true
>> >
>> > +#define
>> > COMMON_SCALAR_OPTIONS
>> >\
>> > +   .lower_pack_half_2x16 =
>> > true,  \
>> > +   .lower_pack_snorm_2x16 =
>> > true, \
>> > +   .lower_pack_snorm_4x8 =
>> > true,  \
>> > +   .lower_pack_unorm_2x16 =
>> > true, \
>> > +   .lower_pack_unorm_4x8 =
>> > true,  \
>> > +   .lower_unpack_half_2x16 =
>> > true,\
>> > +   .lower_unpack_snorm_2x16 =
>> > true,   \
>> > +   .lower_unpack_snorm_4x8 =
>> > true,\
>> > +   .lower_unpack_unorm_2x16 =
>> > true,   \
>> > +   .lower_unpack_unorm_4x8 =
>> > true \
>> > +
>> >  static const struct nir_shader_compiler_options scalar_nir_options
>> > = {
>> > COMMON_OPTIONS,
>> > -   .lower_pack_half_2x16 = true,
>> > -   .lower_pack_snorm_2x16 = true,
>> > -   .lower_pack_snorm_4x8 = true,
>> > -   .lower_pack_unorm_2x16 = true,
>> > -   .lower_pack_unorm_4x8 = true,
>> > -   .lower_unpack_half_2x16 = true,
>> > -   .lower_unpack_snorm_2x16 = true,
>> > -   .lower_unpack_snorm_4x8 = true,
>> > -   .lower_unpack_unorm_2x16 = true,
>> > -   .lower_unpack_unorm_4x8 = true,
>> > +   COMMON_SCALAR_OPTIONS,
>> > +   .max_unroll_iterations = 0,
>> > +};
>> > +
>> > +static const struct nir_shader_compiler_options
>> > scalar_nir_options_gen7 = {
>> > +   COMMON_OPTIONS,
>> > +   COMMON_SCALAR_OPTIONS,
>> > +   .max_unroll_iterations = 32,
>> >  };
>> >
>> >  static const struct nir_shader_compiler_options vector_nir_options
>> > = {
>> > @@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options
>> > vector_nir_options = {
>> > .lower_unpack_unorm_2x16 = true,
>> > .lower_extract_byte = true,
>> > .lower_extract_word = true,
>> > +   .max_unroll_iterations = 0,
>> >  };
>> >
>> >  static const struct nir_shader_compiler_options
>> > vector_nir_options_gen6 = {
>> > @@ -92,6 +103,7 @@ static const struct nir_sha

Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+

2016-09-15 Thread Timothy Arceri
On Thu, 2016-09-15 at 19:49 -0400, Connor Abbott wrote:
> This seems a little dubious... why restrict it to gen7+?

Because if we don't switch to using nir_lower_indirect_derefs() then
lower_variable_index_to_cond_assign() makes a big mess before we get to
unrolling. I was getting piglit regressions when switching to for gen6
and below, unfortunately I have no hardware to debug it on. 

>  And why only
> scalar? This pass assumes SSA, but so do many other passes in core
> NIR
> that we also run in nir_optimize(), so that shouldn't be a problem.

I need to retest to give you the exact reason but there was a lowering
pass that is not called for the vector backend that meant we still had
to deal with nir registers.


> 
> On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri
>  wrote:
> > 
> > ---
> >  src/compiler/glsl/glsl_parser_extras.cpp | 12 +
> >  src/mesa/drivers/dri/i965/brw_compiler.c | 42
> > +++-
> >  src/mesa/drivers/dri/i965/brw_nir.c  | 23 +
> >  3 files changed, 55 insertions(+), 22 deletions(-)
> > 
> > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
> > b/src/compiler/glsl/glsl_parser_extras.cpp
> > index 436ddd0..a5c926a 100644
> > --- a/src/compiler/glsl/glsl_parser_extras.cpp
> > +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> > @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool
> > linked,
> > OPT(optimize_split_arrays, ir, linked);
> > OPT(optimize_redundant_jumps, ir);
> > 
> > -   loop_state *ls = analyze_loop_variables(ir);
> > -   if (ls->loop_found) {
> > -  OPT(set_loop_controls, ir, ls);
> > -  OPT(unroll_loops, ir, ls, options);
> > +   if (options->MaxUnrollIterations != 0) {
> > +  loop_state *ls = analyze_loop_variables(ir);
> > +  if (ls->loop_found) {
> > + OPT(set_loop_controls, ir, ls);
> > + OPT(unroll_loops, ir, ls, options);
> > +  }
> > +  delete ls;
> > }
> > -   delete ls;
> > 
> >  #undef OPT
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c
> > b/src/mesa/drivers/dri/i965/brw_compiler.c
> > index 86b1eaa..523b554 100644
> > --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> > @@ -43,18 +43,28 @@
> > .use_interpolated_input_intrinsics =
> > true, \
> > .vertex_id_zero_based = true
> > 
> > +#define
> > COMMON_SCALAR_OPTIONS  
> >    \
> > +   .lower_pack_half_2x16 =
> > true,  \
> > +   .lower_pack_snorm_2x16 =
> > true, \
> > +   .lower_pack_snorm_4x8 =
> > true,  \
> > +   .lower_pack_unorm_2x16 =
> > true, \
> > +   .lower_pack_unorm_4x8 =
> > true,  \
> > +   .lower_unpack_half_2x16 =
> > true,\
> > +   .lower_unpack_snorm_2x16 =
> > true,   \
> > +   .lower_unpack_snorm_4x8 =
> > true,\
> > +   .lower_unpack_unorm_2x16 =
> > true,   \
> > +   .lower_unpack_unorm_4x8 =
> > true \
> > +
> >  static const struct nir_shader_compiler_options scalar_nir_options
> > = {
> > COMMON_OPTIONS,
> > -   .lower_pack_half_2x16 = true,
> > -   .lower_pack_snorm_2x16 = true,
> > -   .lower_pack_snorm_4x8 = true,
> > -   .lower_pack_unorm_2x16 = true,
> > -   .lower_pack_unorm_4x8 = true,
> > -   .lower_unpack_half_2x16 = true,
> > -   .lower_unpack_snorm_2x16 = true,
> > -   .lower_unpack_snorm_4x8 = true,
> > -   .lower_unpack_unorm_2x16 = true,
> > -   .lower_unpack_unorm_4x8 = true,
> > +   COMMON_SCALAR_OPTIONS,
> > +   .max_unroll_iterations = 0,
> > +};
> > +
> > +static const struct nir_shader_compiler_options
> > scalar_nir_options_gen7 = {
> > +   COMMON_OPTIONS,
> > +   COMMON_SCALAR_OPTIONS,
> > +   .max_unroll_iterations = 32,
> >  };
> > 
> >  static const struct nir_shader_compiler_options vector_nir_options
> > = {
> > @@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options
> > vector_nir_options = {
> > .lower_unpack_unorm_2x16 = true,
> > .lower_extract_byte = true,
> > .lower_extract_word = true,
> > +   .max_unroll_iterations = 0,
> >  };
> > 
> >  static const struct nir_shader_compiler_options
> > vector_nir_options_gen6 = {
> > @@ -92,6 +103,7 @@ static const struct nir_shader_compiler_options
> > vector_nir_options_gen6 = {
> > .lower_unpack_unorm_2x16 = true,
> > .lower_extract_byte = true,
> > .lower_extract_word = true,
> > +   .max_unroll_iterations = 0,
> >  };
> > 
> >  struct brw_compiler *
> > @@ -119,7 +131,6 @@ brw_compiler_create(void *mem_ctx, const struct
> > gen_device_info *devinfo)
> > 
> > /* We want the GL

Re: [Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+

2016-09-15 Thread Connor Abbott
This seems a little dubious... why restrict it to gen7+? And why only
scalar? This pass assumes SSA, but so do many other passes in core NIR
that we also run in nir_optimize(), so that shouldn't be a problem.

On Thu, Sep 15, 2016 at 3:03 AM, Timothy Arceri
 wrote:
> ---
>  src/compiler/glsl/glsl_parser_extras.cpp | 12 +
>  src/mesa/drivers/dri/i965/brw_compiler.c | 42 
> +++-
>  src/mesa/drivers/dri/i965/brw_nir.c  | 23 +
>  3 files changed, 55 insertions(+), 22 deletions(-)
>
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
> b/src/compiler/glsl/glsl_parser_extras.cpp
> index 436ddd0..a5c926a 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool linked,
> OPT(optimize_split_arrays, ir, linked);
> OPT(optimize_redundant_jumps, ir);
>
> -   loop_state *ls = analyze_loop_variables(ir);
> -   if (ls->loop_found) {
> -  OPT(set_loop_controls, ir, ls);
> -  OPT(unroll_loops, ir, ls, options);
> +   if (options->MaxUnrollIterations != 0) {
> +  loop_state *ls = analyze_loop_variables(ir);
> +  if (ls->loop_found) {
> + OPT(set_loop_controls, ir, ls);
> + OPT(unroll_loops, ir, ls, options);
> +  }
> +  delete ls;
> }
> -   delete ls;
>
>  #undef OPT
>
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
> b/src/mesa/drivers/dri/i965/brw_compiler.c
> index 86b1eaa..523b554 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> @@ -43,18 +43,28 @@
> .use_interpolated_input_intrinsics = true,
>  \
> .vertex_id_zero_based = true
>
> +#define COMMON_SCALAR_OPTIONS
>  \
> +   .lower_pack_half_2x16 = true, 
>  \
> +   .lower_pack_snorm_2x16 = true,
>  \
> +   .lower_pack_snorm_4x8 = true, 
>  \
> +   .lower_pack_unorm_2x16 = true,
>  \
> +   .lower_pack_unorm_4x8 = true, 
>  \
> +   .lower_unpack_half_2x16 = true,   
>  \
> +   .lower_unpack_snorm_2x16 = true,  
>  \
> +   .lower_unpack_snorm_4x8 = true,   
>  \
> +   .lower_unpack_unorm_2x16 = true,  
>  \
> +   .lower_unpack_unorm_4x8 = true
>  \
> +
>  static const struct nir_shader_compiler_options scalar_nir_options = {
> COMMON_OPTIONS,
> -   .lower_pack_half_2x16 = true,
> -   .lower_pack_snorm_2x16 = true,
> -   .lower_pack_snorm_4x8 = true,
> -   .lower_pack_unorm_2x16 = true,
> -   .lower_pack_unorm_4x8 = true,
> -   .lower_unpack_half_2x16 = true,
> -   .lower_unpack_snorm_2x16 = true,
> -   .lower_unpack_snorm_4x8 = true,
> -   .lower_unpack_unorm_2x16 = true,
> -   .lower_unpack_unorm_4x8 = true,
> +   COMMON_SCALAR_OPTIONS,
> +   .max_unroll_iterations = 0,
> +};
> +
> +static const struct nir_shader_compiler_options scalar_nir_options_gen7 = {
> +   COMMON_OPTIONS,
> +   COMMON_SCALAR_OPTIONS,
> +   .max_unroll_iterations = 32,
>  };
>
>  static const struct nir_shader_compiler_options vector_nir_options = {
> @@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options 
> vector_nir_options = {
> .lower_unpack_unorm_2x16 = true,
> .lower_extract_byte = true,
> .lower_extract_word = true,
> +   .max_unroll_iterations = 0,
>  };
>
>  static const struct nir_shader_compiler_options vector_nir_options_gen6 = {
> @@ -92,6 +103,7 @@ static const struct nir_shader_compiler_options 
> vector_nir_options_gen6 = {
> .lower_unpack_unorm_2x16 = true,
> .lower_extract_byte = true,
> .lower_extract_word = true,
> +   .max_unroll_iterations = 0,
>  };
>
>  struct brw_compiler *
> @@ -119,7 +131,6 @@ brw_compiler_create(void *mem_ctx, const struct 
> gen_device_info *devinfo)
>
> /* We want the GLSL compiler to emit code that uses condition codes */
> for (int i = 0; i < MESA_SHADER_STAGES; i++) {
> -  compiler->glsl_compiler_options[i].MaxUnrollIterations = 32;
>compiler->glsl_compiler_options[i].MaxIfDepth =
>   devinfo->gen < 6 ? 16 : UINT_MAX;
>
> @@ -140,8 +151,15 @@ brw_compiler_create(void *mem_ctx, const struct 
> gen_device_info *devinfo)
>   compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true;
>
>if (is_scalar) {
> - compiler->glsl_compiler_options[i].NirOptions = &scalar_nir_options;
> + if (devinfo->gen > 6) {
> +compiler->glsl_compiler_options[i].MaxUnrollIterations = 0;
> + } else {
> +compiler->glsl_compiler_options[i].MaxUnrollIte

[Mesa-dev] [PATCH 10/10] i965: use nir unrolling for scalar backend Gen7+

2016-09-15 Thread Timothy Arceri
---
 src/compiler/glsl/glsl_parser_extras.cpp | 12 +
 src/mesa/drivers/dri/i965/brw_compiler.c | 42 +++-
 src/mesa/drivers/dri/i965/brw_nir.c  | 23 +
 3 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
b/src/compiler/glsl/glsl_parser_extras.cpp
index 436ddd0..a5c926a 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -2057,12 +2057,14 @@ do_common_optimization(exec_list *ir, bool linked,
OPT(optimize_split_arrays, ir, linked);
OPT(optimize_redundant_jumps, ir);
 
-   loop_state *ls = analyze_loop_variables(ir);
-   if (ls->loop_found) {
-  OPT(set_loop_controls, ir, ls);
-  OPT(unroll_loops, ir, ls, options);
+   if (options->MaxUnrollIterations != 0) {
+  loop_state *ls = analyze_loop_variables(ir);
+  if (ls->loop_found) {
+ OPT(set_loop_controls, ir, ls);
+ OPT(unroll_loops, ir, ls, options);
+  }
+  delete ls;
}
-   delete ls;
 
 #undef OPT
 
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
b/src/mesa/drivers/dri/i965/brw_compiler.c
index 86b1eaa..523b554 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -43,18 +43,28 @@
.use_interpolated_input_intrinsics = true, \
.vertex_id_zero_based = true
 
+#define COMMON_SCALAR_OPTIONS \
+   .lower_pack_half_2x16 = true,  \
+   .lower_pack_snorm_2x16 = true, \
+   .lower_pack_snorm_4x8 = true,  \
+   .lower_pack_unorm_2x16 = true, \
+   .lower_pack_unorm_4x8 = true,  \
+   .lower_unpack_half_2x16 = true,\
+   .lower_unpack_snorm_2x16 = true,   \
+   .lower_unpack_snorm_4x8 = true,\
+   .lower_unpack_unorm_2x16 = true,   \
+   .lower_unpack_unorm_4x8 = true \
+
 static const struct nir_shader_compiler_options scalar_nir_options = {
COMMON_OPTIONS,
-   .lower_pack_half_2x16 = true,
-   .lower_pack_snorm_2x16 = true,
-   .lower_pack_snorm_4x8 = true,
-   .lower_pack_unorm_2x16 = true,
-   .lower_pack_unorm_4x8 = true,
-   .lower_unpack_half_2x16 = true,
-   .lower_unpack_snorm_2x16 = true,
-   .lower_unpack_snorm_4x8 = true,
-   .lower_unpack_unorm_2x16 = true,
-   .lower_unpack_unorm_4x8 = true,
+   COMMON_SCALAR_OPTIONS,
+   .max_unroll_iterations = 0,
+};
+
+static const struct nir_shader_compiler_options scalar_nir_options_gen7 = {
+   COMMON_OPTIONS,
+   COMMON_SCALAR_OPTIONS,
+   .max_unroll_iterations = 32,
 };
 
 static const struct nir_shader_compiler_options vector_nir_options = {
@@ -75,6 +85,7 @@ static const struct nir_shader_compiler_options 
vector_nir_options = {
.lower_unpack_unorm_2x16 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
+   .max_unroll_iterations = 0,
 };
 
 static const struct nir_shader_compiler_options vector_nir_options_gen6 = {
@@ -92,6 +103,7 @@ static const struct nir_shader_compiler_options 
vector_nir_options_gen6 = {
.lower_unpack_unorm_2x16 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
+   .max_unroll_iterations = 0,
 };
 
 struct brw_compiler *
@@ -119,7 +131,6 @@ brw_compiler_create(void *mem_ctx, const struct 
gen_device_info *devinfo)
 
/* We want the GLSL compiler to emit code that uses condition codes */
for (int i = 0; i < MESA_SHADER_STAGES; i++) {
-  compiler->glsl_compiler_options[i].MaxUnrollIterations = 32;
   compiler->glsl_compiler_options[i].MaxIfDepth =
  devinfo->gen < 6 ? 16 : UINT_MAX;
 
@@ -140,8 +151,15 @@ brw_compiler_create(void *mem_ctx, const struct 
gen_device_info *devinfo)
  compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true;
 
   if (is_scalar) {
- compiler->glsl_compiler_options[i].NirOptions = &scalar_nir_options;
+ if (devinfo->gen > 6) {
+compiler->glsl_compiler_options[i].MaxUnrollIterations = 0;
+ } else {
+compiler->glsl_compiler_options[i].MaxUnrollIterations = 32;
+ }
+ compiler->glsl_compiler_options[i].NirOptions =
+devinfo->gen < 7 ? &scalar_nir_options : &scalar_nir_options_gen7;
   } else {
+ compiler->glsl_compiler_options[i].MaxUnrollIterations = 32;
  compiler->glsl_compiler_options[i].NirOptions =
 devinfo->gen < 6 ? &vector_nir_options : &vector_nir_options_gen6;
   }
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 0c15b55..15abb77