Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-15 Thread Matt Turner
On Wed, Oct 14, 2015 at 12:51 AM, Alejandro Piñeiro
 wrote:
> On 13/10/15 23:36, Matt Turner wrote:
>> The good news is that, unless I've done something wrong, this doesn't
>> affect any shaders in shader-db on ILK or HSW (I only tested those
>> two, but I expect the results are the same everywhere). Apparently
>> this is a pretty rare case.
>
> Are you sure? I have made a run adding your condition, and now comparing
> master vs having the optimization I get this:
> total instructions in shared programs: 6240631 -> 6240471 (-0.00%)
> instructions in affected programs: 18965 -> 18805 (-0.84%)
> helped:160
> HURT:  0
>
> That is a really small gain. Or put in other way, if we compare the
> conditions I have on the original patches vs adding the condition you
> are proposing, I get this:
>
> total instructions in shared programs: 6223900 -> 6240471 (0.27%)
> instructions in affected programs: 477537 -> 494108 (3.47%)
> helped:0
> HURT:  3047

Strange, I must have made a mistake in my shader-db results. Your
results certainly make more sense.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-14 Thread Alejandro Piñeiro
On 14/10/15 10:15, Francisco Jerez wrote:
> Alejandro Piñeiro  writes:
>
>> On 13/10/15 23:36, Matt Turner wrote:
>>> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  
>>> wrote:
 On 13/10/15 03:10, Matt Turner wrote:
> Looks like this is causing an intermittent failure on HSW in our
> Jenkins system (but I'm not able to reproduce locally) and a
> consistent failure on G45. I'll investigate that.
 Ok, will hold on, and meanwhile I will try to reproduce the problem on
 HSW. Unfortunately I don't have any G45 available, so I will not be able
 to help on that. Thanks for taking a look there.
>>> Okay, I think I see what's going on:
>>>
>>> --- good2015-10-13 13:34:40.753357253 -0700
>>> +++ bad 2015-10-13 13:36:06.114290094 -0700
>>> -and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 
>>> };
>>> -cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 
>>> };
>>> +and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 
>>> };
>>>
>>> We're propagating a .nz conditional mod from a CMP with a null dest
>>> (that has a writemask of .xyzw) to an AND that has a writemask of only
>>> .x, so only the .x channel of the flag is now being updated.
>> That is really common. And that is the idea behind the first patch of
>> the series that changes how if's are emitted, and allowing to optimize
>> if scan_inst->writemask is WRITEMASK_X. I will use the piglit test
>> glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this.
>>
>> If we use current master, we get the following assembly:
>> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
>> align16 1Q compacted };
>> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
>> align16 1Q };
>> 3: (+f0) if(8) JIP: 6  UIP: 6  {
>> align16 1Q };
>>
>> If we just simplify out the mov.nz at #2, the test would fail, because
>> as in the example you mentioned, null has a XYZW writemask, but g19 a
>> writemask X. So this optimized version, f0 only have x updated, and #3
>> is doing a normal if, that uses all the channels.
>>
>> But right now the if just needs to check against one component, the x.
>> So that is the reason (among others) that my patch 1 changes how if's
>> are emitted. So, using both my patches 1 and 2, the assembly outcome
>> would be the following:
>> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
>> align16 1Q compacted };
>> 2: (+f0.x) if(8)   JIP: 6  UIP: 6  {
>> align16 1Q };
>>
>> It is true that now #1 only updates the x channel. But the if at #2 only
>> use that channel now, so the test still passes.
>>
>>> I think for now the thing to do is add
>>>
>>>(inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
>> This would bail out basically all if's.
>>
> I agree with Matt that this check is necessary, otherwise
> cmod_propagation will replace the pair of instruction with another
> instruction which doesn't have equivalent semantics.  

Ok.

> If you still want
> cases like the one you pasted below involving an if conditional that
> only reads the x components to be simplified, I suggest you have a look
> at dead code elimination and fix it so that writemask bits of an
> instruction that writes a flag register (e.g. the "mov.nz.f0") are
> turned off for components which are determined to be dead, kind of like
> is done already for the vec4 components of GRFs.
>
> With that change the assembly from your example would look like this
> after DCE:
>
> | 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD { 
> align16 1Q compacted };
> | 2: mov.nz.f0(8)null.x  g20<4,4,1>.xD { 
> align16 1Q };
> | 3: (+f0.x) if(8)   JIP: 6  UIP: 6{ 
> align16 1Q };
>
> Which cmod propagation would be allowed to simplify further.

Ok, thanks, that sounds like a good idea. Will work on that, and send a
new patch to this series.

>
>> So guessing about what is happening here, as I mentioned, we are
>> allowing to pass that condition if scan_inst->dst.writemask equal to
>> WRITEMASK_X assuming that they come from an if, so we don't need to
>> update all f0 channels. But as Jason mentioned in our off-list chat,
>> that would not be the case if we are dealing with a sel. So something
>> like this:
>> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
>> align16 1Q compacted };
>> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
>> align16 1Q };
>> 3: (+f0.x) if(8) JIP: 6  UIP: 6  {
>> align16 1Q };
>>
>> #2 can be optimized out. But if we replace that if for an sel at #3, we
>> can't.
>>
>> If that is the case, in addition to check scan_inst and inst, we would
>> need to check in which kind of instruction the flag register is used later.
>>
>> But as I said, Im just guessing until I see the full fai

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-14 Thread Francisco Jerez
Alejandro Piñeiro  writes:

> On 13/10/15 23:36, Matt Turner wrote:
>> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  
>> wrote:
>>> On 13/10/15 03:10, Matt Turner wrote:
 Looks like this is causing an intermittent failure on HSW in our
 Jenkins system (but I'm not able to reproduce locally) and a
 consistent failure on G45. I'll investigate that.
>>> Ok, will hold on, and meanwhile I will try to reproduce the problem on
>>> HSW. Unfortunately I don't have any G45 available, so I will not be able
>>> to help on that. Thanks for taking a look there.
>> Okay, I think I see what's going on:
>>
>> --- good2015-10-13 13:34:40.753357253 -0700
>> +++ bad 2015-10-13 13:36:06.114290094 -0700
>> -and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 };
>> -cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 };
>> +and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 };
>>
>> We're propagating a .nz conditional mod from a CMP with a null dest
>> (that has a writemask of .xyzw) to an AND that has a writemask of only
>> .x, so only the .x channel of the flag is now being updated.
>
> That is really common. And that is the idea behind the first patch of
> the series that changes how if's are emitted, and allowing to optimize
> if scan_inst->writemask is WRITEMASK_X. I will use the piglit test
> glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this.
>
> If we use current master, we get the following assembly:
> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
> align16 1Q };
> 3: (+f0) if(8) JIP: 6  UIP: 6  {
> align16 1Q };
>
> If we just simplify out the mov.nz at #2, the test would fail, because
> as in the example you mentioned, null has a XYZW writemask, but g19 a
> writemask X. So this optimized version, f0 only have x updated, and #3
> is doing a normal if, that uses all the channels.
>
> But right now the if just needs to check against one component, the x.
> So that is the reason (among others) that my patch 1 changes how if's
> are emitted. So, using both my patches 1 and 2, the assembly outcome
> would be the following:
> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: (+f0.x) if(8)   JIP: 6  UIP: 6  {
> align16 1Q };
>
> It is true that now #1 only updates the x channel. But the if at #2 only
> use that channel now, so the test still passes.
>
>>
>> I think for now the thing to do is add
>>
>>(inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
>
> This would bail out basically all if's.
>
I agree with Matt that this check is necessary, otherwise
cmod_propagation will replace the pair of instruction with another
instruction which doesn't have equivalent semantics.  If you still want
cases like the one you pasted below involving an if conditional that
only reads the x components to be simplified, I suggest you have a look
at dead code elimination and fix it so that writemask bits of an
instruction that writes a flag register (e.g. the "mov.nz.f0") are
turned off for components which are determined to be dead, kind of like
is done already for the vec4 components of GRFs.

With that change the assembly from your example would look like this
after DCE:

| 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD { 
align16 1Q compacted };
| 2: mov.nz.f0(8)null.x  g20<4,4,1>.xD { 
align16 1Q };
| 3: (+f0.x) if(8)   JIP: 6  UIP: 6{ 
align16 1Q };

Which cmod propagation would be allowed to simplify further.

> So guessing about what is happening here, as I mentioned, we are
> allowing to pass that condition if scan_inst->dst.writemask equal to
> WRITEMASK_X assuming that they come from an if, so we don't need to
> update all f0 channels. But as Jason mentioned in our off-list chat,
> that would not be the case if we are dealing with a sel. So something
> like this:
> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
> align16 1Q };
> 3: (+f0.x) if(8) JIP: 6  UIP: 6  {
> align16 1Q };
>
> #2 can be optimized out. But if we replace that if for an sel at #3, we
> can't.
>
> If that is the case, in addition to check scan_inst and inst, we would
> need to check in which kind of instruction the flag register is used later.
>
> But as I said, Im just guessing until I see the full failing test.
>
>> to the list of conditions under which we bail out. That is, if the
>> instruction we want to propagate the cmod onto writes fewer channels,
>> we can't do the optimization.
>>
>> With that, the block should look like:
>>
>>  if ((scan_inst->predicate && scan

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-14 Thread Alejandro Piñeiro
On 13/10/15 23:36, Matt Turner wrote:
> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  
> wrote:
>> On 13/10/15 03:10, Matt Turner wrote:
>>> Looks like this is causing an intermittent failure on HSW in our
>>> Jenkins system (but I'm not able to reproduce locally) and a
>>> consistent failure on G45. I'll investigate that.
>> Ok, will hold on, and meanwhile I will try to reproduce the problem on
>> HSW. Unfortunately I don't have any G45 available, so I will not be able
>> to help on that. Thanks for taking a look there.
> Okay, I think I see what's going on:
>
> --- good2015-10-13 13:34:40.753357253 -0700
> +++ bad 2015-10-13 13:36:06.114290094 -0700
> -and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 };
> -cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 };
> +and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 };
>
> We're propagating a .nz conditional mod from a CMP with a null dest
> (that has a writemask of .xyzw) to an AND that has a writemask of only
> .x, so only the .x channel of the flag is now being updated.

That is really common. And that is the idea behind the first patch of
the series that changes how if's are emitted, and allowing to optimize
if scan_inst->writemask is WRITEMASK_X. I will use the piglit test
glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this.

If we use current master, we get the following assembly:
1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
align16 1Q compacted };
2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
align16 1Q };
3: (+f0) if(8) JIP: 6  UIP: 6  {
align16 1Q };

If we just simplify out the mov.nz at #2, the test would fail, because
as in the example you mentioned, null has a XYZW writemask, but g19 a
writemask X. So this optimized version, f0 only have x updated, and #3
is doing a normal if, that uses all the channels.

But right now the if just needs to check against one component, the x.
So that is the reason (among others) that my patch 1 changes how if's
are emitted. So, using both my patches 1 and 2, the assembly outcome
would be the following:
1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
align16 1Q compacted };
2: (+f0.x) if(8)   JIP: 6  UIP: 6  {
align16 1Q };

It is true that now #1 only updates the x channel. But the if at #2 only
use that channel now, so the test still passes.

>
> I think for now the thing to do is add
>
>(inst->dst.writemask & ~scan_inst->dst.writemask) != 0)

This would bail out basically all if's.

So guessing about what is happening here, as I mentioned, we are
allowing to pass that condition if scan_inst->dst.writemask equal to
WRITEMASK_X assuming that they come from an if, so we don't need to
update all f0 channels. But as Jason mentioned in our off-list chat,
that would not be the case if we are dealing with a sel. So something
like this:
1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
align16 1Q compacted };
2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
align16 1Q };
3: (+f0.x) if(8) JIP: 6  UIP: 6  {
align16 1Q };

#2 can be optimized out. But if we replace that if for an sel at #3, we
can't.

If that is the case, in addition to check scan_inst and inst, we would
need to check in which kind of instruction the flag register is used later.

But as I said, Im just guessing until I see the full failing test.

> to the list of conditions under which we bail out. That is, if the
> instruction we want to propagate the cmod onto writes fewer channels,
> we can't do the optimization.
>
> With that, the block should look like:
>
>  if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
>  scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>  (scan_inst->dst.writemask != WRITEMASK_X &&
>   scan_inst->dst.writemask != WRITEMASK_XYZW) ||
>  (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>   inst->src[0].swizzle != BRW_SWIZZLE_XYZW) ||
>  (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
> break;
>
> The good news is that, unless I've done something wrong, this doesn't
> affect any shaders in shader-db on ILK or HSW (I only tested those
> two, but I expect the results are the same everywhere). Apparently
> this is a pretty rare case.

Are you sure? I have made a run adding your condition, and now comparing
master vs having the optimization I get this:
total instructions in shared programs: 6240631 -> 6240471 (-0.00%)
instructions in affected programs: 18965 -> 18805 (-0.84%)
helped:160
HURT:  0

That is a really small gain. Or put in other way, if we compare the
conditions I have on the original patches vs adding the condition you
are proposing, I get this:

total inst

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-13 Thread Matt Turner
On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  wrote:
> On 13/10/15 03:10, Matt Turner wrote:
>> Looks like this is causing an intermittent failure on HSW in our
>> Jenkins system (but I'm not able to reproduce locally) and a
>> consistent failure on G45. I'll investigate that.
>
> Ok, will hold on, and meanwhile I will try to reproduce the problem on
> HSW. Unfortunately I don't have any G45 available, so I will not be able
> to help on that. Thanks for taking a look there.

Okay, I think I see what's going on:

--- good2015-10-13 13:34:40.753357253 -0700
+++ bad 2015-10-13 13:36:06.114290094 -0700
-and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 };
-cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 };
+and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 };

We're propagating a .nz conditional mod from a CMP with a null dest
(that has a writemask of .xyzw) to an AND that has a writemask of only
.x, so only the .x channel of the flag is now being updated.

I think for now the thing to do is add

   (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)

to the list of conditions under which we bail out. That is, if the
instruction we want to propagate the cmod onto writes fewer channels,
we can't do the optimization.

With that, the block should look like:

 if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
 scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
 (scan_inst->dst.writemask != WRITEMASK_X &&
  scan_inst->dst.writemask != WRITEMASK_XYZW) ||
 (scan_inst->dst.writemask == WRITEMASK_XYZW &&
  inst->src[0].swizzle != BRW_SWIZZLE_XYZW) ||
 (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
break;

The good news is that, unless I've done something wrong, this doesn't
affect any shaders in shader-db on ILK or HSW (I only tested those
two, but I expect the results are the same everywhere). Apparently
this is a pretty rare case.

With that change, my R-b still stands, though we should have a unit
test for this case as well in 3/5.

Thanks!
Matt
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-13 Thread Alejandro Piñeiro


On 13/10/15 03:10, Matt Turner wrote:
> On Mon, Oct 12, 2015 at 4:25 PM, Matt Turner  wrote:
>> On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  
>> wrote:
>>> vec4 port of fs_cmod_propagation.
>>>
>>> Shader-db results:
>>> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
>>> instructions in affected programs: 498213 -> 481456 (-3.36%)
>>> helped:3082
>>> HURT:  0
>>> ---
>>>
>>> The final outcome is really similar to fs_brw_cmod_propagation. In
>>> fact the only difference is that on fs we have this:
>>>  if (scan_inst->overwrites_reg(inst->src[0])) {
>>> if (scan_inst->is_partial_write() ||
>>> scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>>>break;
>>>
>>> And on vec4 (this commit) we have this:
>>>  if (inst->src[0].in_range(scan_inst->dst,
>>>scan_inst->regs_written)) {
>>>
>>> if ((scan_inst->predicate && scan_inst->opcode != 
>>> BRW_OPCODE_SEL) ||
>>> scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>>> (scan_inst->dst.writemask != WRITEMASK_X && 
>>> scan_inst->dst.writemask != WRITEMASK_XYZW))
>>>break;
>>>
>>> if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>>> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>>>break;
>>> }
>>>
>>> So at some point I thought about refactoring it and having one common,
>>> like with opt_predicated_break, but that one was possible with just
>>> backend_instructions, while here we would need to deal with
>>> vec4_instructions and fs_inst, that could be somewhat messy, so
>>> I'm leaving this as it is.
>>>
>>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>>>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>>>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 
>>> +
>>>  4 files changed, 166 insertions(+)
>>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>>> b/src/mesa/drivers/dri/i965/Makefile.sources
>>> index 81ef628..c1836d6 100644
>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>>> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
>>> brw_util.c \
>>> brw_util.h \
>>> brw_vec4_builder.h \
>>> +   brw_vec4_cmod_propagation.cpp \
>>> brw_vec4_copy_propagation.cpp \
>>> brw_vec4.cpp \
>>> brw_vec4_cse.cpp \
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index e966b96..55e381b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>>>OPT(dead_code_eliminate);
>>>OPT(dead_control_flow_eliminate, this);
>>>OPT(opt_copy_propagation);
>>> +  OPT(opt_cmod_propagation);
>>>OPT(opt_cse);
>>>OPT(opt_algebraic);
>>>OPT(opt_register_coalesce);
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.h
>>> index 5e3500c..3c1711d 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>>> @@ -149,6 +149,7 @@ public:
>>> int var_range_start(unsigned v, unsigned n) const;
>>> int var_range_end(unsigned v, unsigned n) const;
>>> bool virtual_grf_interferes(int a, int b);
>>> +   bool opt_cmod_propagation();
>>> bool opt_copy_propagation(bool do_constant_prop = true);
>>> bool opt_cse_local(bblock_t *block);
>>> bool opt_cse();
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>> new file mode 100644
>>> index 000..7e39d2b
>>> --- /dev/null
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>> @@ -0,0 +1,163 @@
>>> +/*
>>> + * Copyright © 2015 Intel Corporation
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the 
>>> next
>>> + * paragraph) shall be included in all copies or substantial portions of 
>>> the
>>> + * Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>>> OR
>>> + 

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-12 Thread Matt Turner
On Mon, Oct 12, 2015 at 4:25 PM, Matt Turner  wrote:
> On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  
> wrote:
>> vec4 port of fs_cmod_propagation.
>>
>> Shader-db results:
>> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
>> instructions in affected programs: 498213 -> 481456 (-3.36%)
>> helped:3082
>> HURT:  0
>> ---
>>
>> The final outcome is really similar to fs_brw_cmod_propagation. In
>> fact the only difference is that on fs we have this:
>>  if (scan_inst->overwrites_reg(inst->src[0])) {
>> if (scan_inst->is_partial_write() ||
>> scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>>break;
>>
>> And on vec4 (this commit) we have this:
>>  if (inst->src[0].in_range(scan_inst->dst,
>>scan_inst->regs_written)) {
>>
>> if ((scan_inst->predicate && scan_inst->opcode != 
>> BRW_OPCODE_SEL) ||
>> scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>> (scan_inst->dst.writemask != WRITEMASK_X && 
>> scan_inst->dst.writemask != WRITEMASK_XYZW))
>>break;
>>
>> if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>>break;
>> }
>>
>> So at some point I thought about refactoring it and having one common,
>> like with opt_predicated_break, but that one was possible with just
>> backend_instructions, while here we would need to deal with
>> vec4_instructions and fs_inst, that could be somewhat messy, so
>> I'm leaving this as it is.
>>
>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 
>> +
>>  4 files changed, 166 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 81ef628..c1836d6 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
>> brw_util.c \
>> brw_util.h \
>> brw_vec4_builder.h \
>> +   brw_vec4_cmod_propagation.cpp \
>> brw_vec4_copy_propagation.cpp \
>> brw_vec4.cpp \
>> brw_vec4_cse.cpp \
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index e966b96..55e381b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>>OPT(dead_code_eliminate);
>>OPT(dead_control_flow_eliminate, this);
>>OPT(opt_copy_propagation);
>> +  OPT(opt_cmod_propagation);
>>OPT(opt_cse);
>>OPT(opt_algebraic);
>>OPT(opt_register_coalesce);
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
>> b/src/mesa/drivers/dri/i965/brw_vec4.h
>> index 5e3500c..3c1711d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>> @@ -149,6 +149,7 @@ public:
>> int var_range_start(unsigned v, unsigned n) const;
>> int var_range_end(unsigned v, unsigned n) const;
>> bool virtual_grf_interferes(int a, int b);
>> +   bool opt_cmod_propagation();
>> bool opt_copy_propagation(bool do_constant_prop = true);
>> bool opt_cse_local(bblock_t *block);
>> bool opt_cse();
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>> new file mode 100644
>> index 000..7e39d2b
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>> @@ -0,0 +1,163 @@
>> +/*
>> + * Copyright © 2015 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUT

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-12 Thread Matt Turner
On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  wrote:
> vec4 port of fs_cmod_propagation.
>
> Shader-db results:
> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
> instructions in affected programs: 498213 -> 481456 (-3.36%)
> helped:3082
> HURT:  0
> ---
>
> The final outcome is really similar to fs_brw_cmod_propagation. In
> fact the only difference is that on fs we have this:
>  if (scan_inst->overwrites_reg(inst->src[0])) {
> if (scan_inst->is_partial_write() ||
> scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>break;
>
> And on vec4 (this commit) we have this:
>  if (inst->src[0].in_range(scan_inst->dst,
>scan_inst->regs_written)) {
>
> if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) 
> ||
> scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
> (scan_inst->dst.writemask != WRITEMASK_X && 
> scan_inst->dst.writemask != WRITEMASK_XYZW))
>break;
>
> if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>break;
> }
>
> So at some point I thought about refactoring it and having one common,
> like with opt_predicated_break, but that one was possible with just
> backend_instructions, while here we would need to deal with
> vec4_instructions and fs_inst, that could be somewhat messy, so
> I'm leaving this as it is.
>
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 
> +
>  4 files changed, 166 insertions(+)
>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
> b/src/mesa/drivers/dri/i965/Makefile.sources
> index 81ef628..c1836d6 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
> brw_util.c \
> brw_util.h \
> brw_vec4_builder.h \
> +   brw_vec4_cmod_propagation.cpp \
> brw_vec4_copy_propagation.cpp \
> brw_vec4.cpp \
> brw_vec4_cse.cpp \
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index e966b96..55e381b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>OPT(dead_code_eliminate);
>OPT(dead_control_flow_eliminate, this);
>OPT(opt_copy_propagation);
> +  OPT(opt_cmod_propagation);
>OPT(opt_cse);
>OPT(opt_algebraic);
>OPT(opt_register_coalesce);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
> b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 5e3500c..3c1711d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -149,6 +149,7 @@ public:
> int var_range_start(unsigned v, unsigned n) const;
> int var_range_end(unsigned v, unsigned n) const;
> bool virtual_grf_interferes(int a, int b);
> +   bool opt_cmod_propagation();
> bool opt_copy_propagation(bool do_constant_prop = true);
> bool opt_cse_local(bblock_t *block);
> bool opt_cse();
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
> new file mode 100644
> index 000..7e39d2b
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNE

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-10 Thread Alejandro Piñeiro
On 10/10/15 16:54, Jason Ekstrand wrote:
> On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  
> wrote:
>> vec4 port of fs_cmod_propagation.
>>
>> Shader-db results:
>> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
>> instructions in affected programs: 498213 -> 481456 (-3.36%)
>> helped:3082
>> HURT:  0
> Would you mind cherry-picking this back onto
> 4e0a8e0a50c9ac91cb7a70b92b8d9c6fcc02b7aa (the commit right before we
> made NIR non-optional) and get some GLSL IR vs. NIR vec4-only numbers
> with this patch?  I'd like to know what it does to that delta as well.

FWIW, the previous shader-db numbers were done without grepping for
vec4. Matt mentioned that he preferred that way. As asked, the numbers
for this email will be vec4-only numbers (so grepping for vec4).

So, the shader-db numbers of IR vs NIR at that reference commit are the
following:
  total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
  instructions in affected programs: 1660279 -> 1460468 (-12.03%)
  helped:14668
  HURT:  1369

And IR vs NIR numbers cherry-picking the optimization are the following:
  total instructions in shared programs: 1845902 -> 1631459 (-11.62%)
  instructions in affected programs: 1663398 -> 1448955 (-12.89%)
  helped:14863
  HURT:  1203

FWIW, we need to take into account that this commit is also helping IR.
The shader-db numbers of IR reference vs IR cherry picking are the
following:
  total instructions in shared programs: 1848027 -> 1845902 (-0.11%)
  instructions in affected programs: 195042 -> 192917 (-1.09%)
  helped:1033
  HURT:  0

And for that reason, probably it is worth to compare IR at the reference
versus NIR results cherry-picking, that are the following:
  total instructions in shared programs: 1848027 -> 1631459 (-11.72%)
  instructions in affected programs: 1672237 -> 1455669 (-12.95%)
  helped:14955
  HURT:  1194

>
> Thanks!
> --Jason

You are welcome. Thanks for the patch reviewing.

Best regards

>
>> ---
>>
>> The final outcome is really similar to fs_brw_cmod_propagation. In
>> fact the only difference is that on fs we have this:
>>  if (scan_inst->overwrites_reg(inst->src[0])) {
>> if (scan_inst->is_partial_write() ||
>> scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>>break;
>>
>> And on vec4 (this commit) we have this:
>>  if (inst->src[0].in_range(scan_inst->dst,
>>scan_inst->regs_written)) {
>>
>> if ((scan_inst->predicate && scan_inst->opcode != 
>> BRW_OPCODE_SEL) ||
>> scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>> (scan_inst->dst.writemask != WRITEMASK_X && 
>> scan_inst->dst.writemask != WRITEMASK_XYZW))
>>break;
>>
>> if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>>break;
>> }
>>
>> So at some point I thought about refactoring it and having one common,
>> like with opt_predicated_break, but that one was possible with just
>> backend_instructions, while here we would need to deal with
>> vec4_instructions and fs_inst, that could be somewhat messy, so
>> I'm leaving this as it is.
>>
>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 
>> +
>>  4 files changed, 166 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 81ef628..c1836d6 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
>> brw_util.c \
>> brw_util.h \
>> brw_vec4_builder.h \
>> +   brw_vec4_cmod_propagation.cpp \
>> brw_vec4_copy_propagation.cpp \
>> brw_vec4.cpp \
>> brw_vec4_cse.cpp \
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index e966b96..55e381b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>>OPT(dead_code_eliminate);
>>OPT(dead_control_flow_eliminate, this);
>>OPT(opt_copy_propagation);
>> +  OPT(opt_cmod_propagation);
>>OPT(opt_cse);
>>OPT(

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-10 Thread Jason Ekstrand
On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  wrote:
> vec4 port of fs_cmod_propagation.
>
> Shader-db results:
> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
> instructions in affected programs: 498213 -> 481456 (-3.36%)
> helped:3082
> HURT:  0

Would you mind cherry-picking this back onto
4e0a8e0a50c9ac91cb7a70b92b8d9c6fcc02b7aa (the commit right before we
made NIR non-optional) and get some GLSL IR vs. NIR vec4-only numbers
with this patch?  I'd like to know what it does to that delta as well.

Thanks!
--Jason

> ---
>
> The final outcome is really similar to fs_brw_cmod_propagation. In
> fact the only difference is that on fs we have this:
>  if (scan_inst->overwrites_reg(inst->src[0])) {
> if (scan_inst->is_partial_write() ||
> scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>break;
>
> And on vec4 (this commit) we have this:
>  if (inst->src[0].in_range(scan_inst->dst,
>scan_inst->regs_written)) {
>
> if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) 
> ||
> scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
> (scan_inst->dst.writemask != WRITEMASK_X && 
> scan_inst->dst.writemask != WRITEMASK_XYZW))
>break;
>
> if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>break;
> }
>
> So at some point I thought about refactoring it and having one common,
> like with opt_predicated_break, but that one was possible with just
> backend_instructions, while here we would need to deal with
> vec4_instructions and fs_inst, that could be somewhat messy, so
> I'm leaving this as it is.
>
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 
> +
>  4 files changed, 166 insertions(+)
>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
> b/src/mesa/drivers/dri/i965/Makefile.sources
> index 81ef628..c1836d6 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
> brw_util.c \
> brw_util.h \
> brw_vec4_builder.h \
> +   brw_vec4_cmod_propagation.cpp \
> brw_vec4_copy_propagation.cpp \
> brw_vec4.cpp \
> brw_vec4_cse.cpp \
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index e966b96..55e381b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>OPT(dead_code_eliminate);
>OPT(dead_control_flow_eliminate, this);
>OPT(opt_copy_propagation);
> +  OPT(opt_cmod_propagation);
>OPT(opt_cse);
>OPT(opt_algebraic);
>OPT(opt_register_coalesce);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
> b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 5e3500c..3c1711d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -149,6 +149,7 @@ public:
> int var_range_start(unsigned v, unsigned n) const;
> int var_range_end(unsigned v, unsigned n) const;
> bool virtual_grf_interferes(int a, int b);
> +   bool opt_cmod_propagation();
> bool opt_copy_propagation(bool do_constant_prop = true);
> bool opt_cse_local(bblock_t *block);
> bool opt_cse();
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
> new file mode 100644
> index 000..7e39d2b
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABI

[Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-10 Thread Alejandro Piñeiro
vec4 port of fs_cmod_propagation.

Shader-db results:
total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
instructions in affected programs: 498213 -> 481456 (-3.36%)
helped:3082
HURT:  0
---

The final outcome is really similar to fs_brw_cmod_propagation. In
fact the only difference is that on fs we have this:
 if (scan_inst->overwrites_reg(inst->src[0])) {
if (scan_inst->is_partial_write() ||
scan_inst->dst.reg_offset != inst->src[0].reg_offset)
   break;

And on vec4 (this commit) we have this:
 if (inst->src[0].in_range(scan_inst->dst,
   scan_inst->regs_written)) {

if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
(scan_inst->dst.writemask != WRITEMASK_X && 
scan_inst->dst.writemask != WRITEMASK_XYZW))
   break;

if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
   break;
}

So at some point I thought about refactoring it and having one common,
like with opt_predicated_break, but that one was possible with just
backend_instructions, while here we would need to deal with
vec4_instructions and fs_inst, that could be somewhat messy, so
I'm leaving this as it is.

 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
 src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
 .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 +
 4 files changed, 166 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 81ef628..c1836d6 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -56,6 +56,7 @@ i965_compiler_FILES = \
brw_util.c \
brw_util.h \
brw_vec4_builder.h \
+   brw_vec4_cmod_propagation.cpp \
brw_vec4_copy_propagation.cpp \
brw_vec4.cpp \
brw_vec4_cse.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e966b96..55e381b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1867,6 +1867,7 @@ vec4_visitor::run()
   OPT(dead_code_eliminate);
   OPT(dead_control_flow_eliminate, this);
   OPT(opt_copy_propagation);
+  OPT(opt_cmod_propagation);
   OPT(opt_cse);
   OPT(opt_algebraic);
   OPT(opt_register_coalesce);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 5e3500c..3c1711d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -149,6 +149,7 @@ public:
int var_range_start(unsigned v, unsigned n) const;
int var_range_end(unsigned v, unsigned n) const;
bool virtual_grf_interferes(int a, int b);
+   bool opt_cmod_propagation();
bool opt_copy_propagation(bool do_constant_prop = true);
bool opt_cse_local(bblock_t *block);
bool opt_cse();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
new file mode 100644
index 000..7e39d2b
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
@@ -0,0 +1,163 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Alejandro Piñeiro Iglesias 
+ *
+ * Based on brw_fs_cmod_propagation.cpp
+ */
+
+/** @file brw_vec4_cmod_propagation.cpp
+ *
+ * Really similar to brw_fs_cmod_propagation but adap