Re: [Mesa-dev] [PATCH] glsl: optimize copy_propagation_elements pass

2016-09-30 Thread Tapani Pälli



On 09/30/2016 03:18 AM, Ian Romanick wrote:

On 09/29/2016 12:17 AM, Tapani Pälli wrote:


On 09/28/2016 06:14 PM, Ian Romanick wrote:

On 09/16/2016 06:21 PM, Tapani Pälli wrote:

Changes make copy_propagation_elements pass faster, reducing link
time spent in test case of bug 94477. Does not fix the actual issue
but brings down the total time. No regressions seen in CI.


How does this affect the time of a full shader-db run?


Almost none at all, this is the open-source shaders (100 runs):

Difference at 95.0% confidence
0.0312 +/- 0.00502746
1.72566% +/- 0.278068%
(Student's t, pooled s = 0.0181375)

(testing with DOTA-2 shaders gave very similar result)

My assumption is that this really helps only the most pathological cases
like in the bug where list size becomes enormous (thousands of entries).
With just few entries, list is 'fast enough' to walk through anyway (?)

BTW Eric was proposing to just remove this pass. However when testing
what happens on removal I noticed there's functional failures
(arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform starts to
fail), so it seems we are currently dependent on this pass.


There are a bunch of bits of this that are confusing to me.  I think
some high-level explanation about which hash tables the acp_ref can be
in, which lists it can be in, and how they relate would help.  I've
pointed out a couple of the confusing bits below.


Signed-off-by: Tapani Pälli 
---

For performance measurements, Martina reported in the bug 8x speedup
to the test case shader link time when using this patch together with
commit 2cd02e30d2e1677762d34f1831b8e609970ef0f3

 .../glsl/opt_copy_propagation_elements.cpp | 187
-
 1 file changed, 145 insertions(+), 42 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp
b/src/compiler/glsl/opt_copy_propagation_elements.cpp
index e4237cc..1c5060a 100644
--- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
+++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
@@ -46,6 +46,7 @@
 #include "ir_basic_block.h"
 #include "ir_optimization.h"
 #include "compiler/glsl_types.h"
+#include "util/hash_table.h"

 static bool debug = false;

@@ -76,6 +77,18 @@ public:
int swizzle[4];
 };

+/* Class that refers to acp_entry in another exec_list. Used
+ * when making removals based on rhs.
+ */
+class acp_ref : public exec_node


This pattern is called a box, so maybe acp_box would be a better name.
I'm not too hung up on it.

With this change, can the acp_entry itself still be in a list?


The idea here is a class that only refers to a acp_entry but does not
take any ownership .. so it's really just a list of pointers. I'm OK
with renaming it.


If only a boxed acp_entry can be in a list, then acp_entry doesn't need
to derive from exec_node.  That's why I was asking.  I looked at the
rest of the code again, and I now see that the acp_entry is in the lhs list.

Correct me if I'm wrong, but an entry will effectively be in two lists
at all times: the lhs list and the rhs list.

Assuming that previous assumption is correct, I might suggest a
different structure that makes it all less confusing.  Don't make
acp_entry drive from exec_node.  Instead, embed two acp_ref (or whatever
it ends up being called) nodes in the acp_entry:

acp_ref lhs_node;
acp_ref rhs_node;

When adding an entry to the lhs list, use

lhs_list->push_tail(&entry->lhs_node);

Similar for rhs list:

rhs_list->push_tail(&entry->rhs_node);

Walk the lists like:

   foreach_in_list_safe(acp_ref, ref, rhs_list) {
  acp_entry *entry = ref->entry;

  ...
   }

I think this would be a lot more clear because both lists are handled in
the same way.  It also avoids the overhead of allocating the boxes.


TBH I'm not sure at this point if this will make end implementation 
simpler or more complex but I'll give this change a try :)



+{
+public:
+   acp_ref(acp_entry *e)
+   {
+  entry = e;
+   }
+   acp_entry *entry;
+};

 class kill_entry : public exec_node
 {
@@ -98,14 +111,42 @@ public:
   this->killed_all = false;
   this->mem_ctx = ralloc_context(NULL);
   this->shader_mem_ctx = NULL;
-  this->acp = new(mem_ctx) exec_list;
   this->kills = new(mem_ctx) exec_list;
+
+  create_acp();
}
~ir_copy_propagation_elements_visitor()
{
   ralloc_free(mem_ctx);
}

+   void create_acp()
+   {
+  lhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+  rhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+   }
+
+   void destroy_acp()
+   {
+  _mesa_hash_table_destroy(lhs_ht, NULL);
+  _mesa_hash_table_destroy(rhs_ht, NULL);
+   }
+
+   void populate_acp(hash_table *lhs, hash_table *rhs)
+   {
+  struct hash_entry *entry;
+  hash_table_foreach(lhs, entry)
+  {


Opening { on the

Re: [Mesa-dev] [PATCH] glsl: optimize copy_propagation_elements pass

2016-09-29 Thread Ian Romanick
On 09/29/2016 12:17 AM, Tapani Pälli wrote:
> 
> On 09/28/2016 06:14 PM, Ian Romanick wrote:
>> On 09/16/2016 06:21 PM, Tapani Pälli wrote:
>>> Changes make copy_propagation_elements pass faster, reducing link
>>> time spent in test case of bug 94477. Does not fix the actual issue
>>> but brings down the total time. No regressions seen in CI.
>>
>> How does this affect the time of a full shader-db run?
> 
> Almost none at all, this is the open-source shaders (100 runs):
> 
> Difference at 95.0% confidence
> 0.0312 +/- 0.00502746
> 1.72566% +/- 0.278068%
> (Student's t, pooled s = 0.0181375)
> 
> (testing with DOTA-2 shaders gave very similar result)
> 
> My assumption is that this really helps only the most pathological cases
> like in the bug where list size becomes enormous (thousands of entries).
> With just few entries, list is 'fast enough' to walk through anyway (?)
> 
> BTW Eric was proposing to just remove this pass. However when testing
> what happens on removal I noticed there's functional failures
> (arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform starts to
> fail), so it seems we are currently dependent on this pass.
> 
>> There are a bunch of bits of this that are confusing to me.  I think
>> some high-level explanation about which hash tables the acp_ref can be
>> in, which lists it can be in, and how they relate would help.  I've
>> pointed out a couple of the confusing bits below.
>>
>>> Signed-off-by: Tapani Pälli 
>>> ---
>>>
>>> For performance measurements, Martina reported in the bug 8x speedup
>>> to the test case shader link time when using this patch together with
>>> commit 2cd02e30d2e1677762d34f1831b8e609970ef0f3
>>>
>>>  .../glsl/opt_copy_propagation_elements.cpp | 187
>>> -
>>>  1 file changed, 145 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> b/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> index e4237cc..1c5060a 100644
>>> --- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> +++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> @@ -46,6 +46,7 @@
>>>  #include "ir_basic_block.h"
>>>  #include "ir_optimization.h"
>>>  #include "compiler/glsl_types.h"
>>> +#include "util/hash_table.h"
>>>
>>>  static bool debug = false;
>>>
>>> @@ -76,6 +77,18 @@ public:
>>> int swizzle[4];
>>>  };
>>>
>>> +/* Class that refers to acp_entry in another exec_list. Used
>>> + * when making removals based on rhs.
>>> + */
>>> +class acp_ref : public exec_node
>>
>> This pattern is called a box, so maybe acp_box would be a better name.
>> I'm not too hung up on it.
>>
>> With this change, can the acp_entry itself still be in a list?
> 
> The idea here is a class that only refers to a acp_entry but does not
> take any ownership .. so it's really just a list of pointers. I'm OK
> with renaming it.

If only a boxed acp_entry can be in a list, then acp_entry doesn't need
to derive from exec_node.  That's why I was asking.  I looked at the
rest of the code again, and I now see that the acp_entry is in the lhs list.

Correct me if I'm wrong, but an entry will effectively be in two lists
at all times: the lhs list and the rhs list.

Assuming that previous assumption is correct, I might suggest a
different structure that makes it all less confusing.  Don't make
acp_entry drive from exec_node.  Instead, embed two acp_ref (or whatever
it ends up being called) nodes in the acp_entry:

acp_ref lhs_node;
acp_ref rhs_node;

When adding an entry to the lhs list, use

lhs_list->push_tail(&entry->lhs_node);

Similar for rhs list:

rhs_list->push_tail(&entry->rhs_node);

Walk the lists like:

   foreach_in_list_safe(acp_ref, ref, rhs_list) {
  acp_entry *entry = ref->entry;

  ...
   }

I think this would be a lot more clear because both lists are handled in
the same way.  It also avoids the overhead of allocating the boxes.

>>> +{
>>> +public:
>>> +   acp_ref(acp_entry *e)
>>> +   {
>>> +  entry = e;
>>> +   }
>>> +   acp_entry *entry;
>>> +};
>>>
>>>  class kill_entry : public exec_node
>>>  {
>>> @@ -98,14 +111,42 @@ public:
>>>this->killed_all = false;
>>>this->mem_ctx = ralloc_context(NULL);
>>>this->shader_mem_ctx = NULL;
>>> -  this->acp = new(mem_ctx) exec_list;
>>>this->kills = new(mem_ctx) exec_list;
>>> +
>>> +  create_acp();
>>> }
>>> ~ir_copy_propagation_elements_visitor()
>>> {
>>>ralloc_free(mem_ctx);
>>> }
>>>
>>> +   void create_acp()
>>> +   {
>>> +  lhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
>>> +   _mesa_key_pointer_equal);
>>> +  rhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
>>> +   _mesa_key_pointer_equal);
>>> +   }
>>> +
>>> +   void destroy_acp()
>>> +   {
>>> +  _mesa_hash_table_destroy(lhs_ht, NULL);
>>> +  _mesa_hash_table_destroy(rhs_ht, 

Re: [Mesa-dev] [PATCH] glsl: optimize copy_propagation_elements pass

2016-09-29 Thread Tapani Pälli


On 09/28/2016 06:14 PM, Ian Romanick wrote:

On 09/16/2016 06:21 PM, Tapani Pälli wrote:

Changes make copy_propagation_elements pass faster, reducing link
time spent in test case of bug 94477. Does not fix the actual issue
but brings down the total time. No regressions seen in CI.


How does this affect the time of a full shader-db run?


Almost none at all, this is the open-source shaders (100 runs):

Difference at 95.0% confidence
0.0312 +/- 0.00502746
1.72566% +/- 0.278068%
(Student's t, pooled s = 0.0181375)

(testing with DOTA-2 shaders gave very similar result)

My assumption is that this really helps only the most pathological cases 
like in the bug where list size becomes enormous (thousands of entries). 
With just few entries, list is 'fast enough' to walk through anyway (?)


BTW Eric was proposing to just remove this pass. However when testing 
what happens on removal I noticed there's functional failures 
(arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform starts to 
fail), so it seems we are currently dependent on this pass.



There are a bunch of bits of this that are confusing to me.  I think
some high-level explanation about which hash tables the acp_ref can be
in, which lists it can be in, and how they relate would help.  I've
pointed out a couple of the confusing bits below.


Signed-off-by: Tapani Pälli 
---

For performance measurements, Martina reported in the bug 8x speedup
to the test case shader link time when using this patch together with
commit 2cd02e30d2e1677762d34f1831b8e609970ef0f3

 .../glsl/opt_copy_propagation_elements.cpp | 187 -
 1 file changed, 145 insertions(+), 42 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp 
b/src/compiler/glsl/opt_copy_propagation_elements.cpp
index e4237cc..1c5060a 100644
--- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
+++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
@@ -46,6 +46,7 @@
 #include "ir_basic_block.h"
 #include "ir_optimization.h"
 #include "compiler/glsl_types.h"
+#include "util/hash_table.h"

 static bool debug = false;

@@ -76,6 +77,18 @@ public:
int swizzle[4];
 };

+/* Class that refers to acp_entry in another exec_list. Used
+ * when making removals based on rhs.
+ */
+class acp_ref : public exec_node


This pattern is called a box, so maybe acp_box would be a better name.
I'm not too hung up on it.

With this change, can the acp_entry itself still be in a list?


The idea here is a class that only refers to a acp_entry but does not 
take any ownership .. so it's really just a list of pointers. I'm OK 
with renaming it.



+{
+public:
+   acp_ref(acp_entry *e)
+   {
+  entry = e;
+   }
+   acp_entry *entry;
+};

 class kill_entry : public exec_node
 {
@@ -98,14 +111,42 @@ public:
   this->killed_all = false;
   this->mem_ctx = ralloc_context(NULL);
   this->shader_mem_ctx = NULL;
-  this->acp = new(mem_ctx) exec_list;
   this->kills = new(mem_ctx) exec_list;
+
+  create_acp();
}
~ir_copy_propagation_elements_visitor()
{
   ralloc_free(mem_ctx);
}

+   void create_acp()
+   {
+  lhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+  rhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+   }
+
+   void destroy_acp()
+   {
+  _mesa_hash_table_destroy(lhs_ht, NULL);
+  _mesa_hash_table_destroy(rhs_ht, NULL);
+   }
+
+   void populate_acp(hash_table *lhs, hash_table *rhs)
+   {
+  struct hash_entry *entry;
+  hash_table_foreach(lhs, entry)
+  {


Opening { on the hash_table_foreach line.


+ _mesa_hash_table_insert(lhs_ht, entry->key, entry->data);
+  }


Blank line here.


+  hash_table_foreach(rhs, entry)
+  {


Opening { on the hash_table_foreach line.


+ _mesa_hash_table_insert(rhs_ht, entry->key, entry->data);
+  }
+   }


thanks, will fix these


+
void handle_loop(ir_loop *, bool keep_acp);
virtual ir_visitor_status visit_enter(class ir_loop *);
virtual ir_visitor_status visit_enter(class ir_function_signature *);
@@ -120,8 +161,10 @@ public:
void kill(kill_entry *k);
void handle_if_block(exec_list *instructions);

-   /** List of acp_entry: The available copies to propagate */
-   exec_list *acp;
+   /** Hash of acp_entry: The available copies to propagate */
+   hash_table *lhs_ht;
+   hash_table *rhs_ht;
+
/**
 * List of kill_entry: The variables whose values were killed in this
 * block.
@@ -147,23 +190,29 @@ 
ir_copy_propagation_elements_visitor::visit_enter(ir_function_signature *ir)
 * block.  Any instructions at global scope will be shuffled into
 * main() at link time, so they're irrelevant to us.
 */
-   exec_list *orig_acp = this->acp;
exec_list *orig_kills = this->kills;
bool orig_killed_all =

Re: [Mesa-dev] [PATCH] glsl: optimize copy_propagation_elements pass

2016-09-28 Thread Ian Romanick
On 09/16/2016 06:21 PM, Tapani Pälli wrote:
> Changes make copy_propagation_elements pass faster, reducing link
> time spent in test case of bug 94477. Does not fix the actual issue
> but brings down the total time. No regressions seen in CI.

How does this affect the time of a full shader-db run?

There are a bunch of bits of this that are confusing to me.  I think
some high-level explanation about which hash tables the acp_ref can be
in, which lists it can be in, and how they relate would help.  I've
pointed out a couple of the confusing bits below.

> Signed-off-by: Tapani Pälli 
> ---
> 
> For performance measurements, Martina reported in the bug 8x speedup
> to the test case shader link time when using this patch together with
> commit 2cd02e30d2e1677762d34f1831b8e609970ef0f3
> 
>  .../glsl/opt_copy_propagation_elements.cpp | 187 
> -
>  1 file changed, 145 insertions(+), 42 deletions(-)
> 
> diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp 
> b/src/compiler/glsl/opt_copy_propagation_elements.cpp
> index e4237cc..1c5060a 100644
> --- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
> +++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
> @@ -46,6 +46,7 @@
>  #include "ir_basic_block.h"
>  #include "ir_optimization.h"
>  #include "compiler/glsl_types.h"
> +#include "util/hash_table.h"
>  
>  static bool debug = false;
>  
> @@ -76,6 +77,18 @@ public:
> int swizzle[4];
>  };
>  
> +/* Class that refers to acp_entry in another exec_list. Used
> + * when making removals based on rhs.
> + */
> +class acp_ref : public exec_node

This pattern is called a box, so maybe acp_box would be a better name.
I'm not too hung up on it.

With this change, can the acp_entry itself still be in a list?

> +{
> +public:
> +   acp_ref(acp_entry *e)
> +   {
> +  entry = e;
> +   }
> +   acp_entry *entry;
> +};
>  
>  class kill_entry : public exec_node
>  {
> @@ -98,14 +111,42 @@ public:
>this->killed_all = false;
>this->mem_ctx = ralloc_context(NULL);
>this->shader_mem_ctx = NULL;
> -  this->acp = new(mem_ctx) exec_list;
>this->kills = new(mem_ctx) exec_list;
> +
> +  create_acp();
> }
> ~ir_copy_propagation_elements_visitor()
> {
>ralloc_free(mem_ctx);
> }
>  
> +   void create_acp()
> +   {
> +  lhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
> +   _mesa_key_pointer_equal);
> +  rhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
> +   _mesa_key_pointer_equal);
> +   }
> +
> +   void destroy_acp()
> +   {
> +  _mesa_hash_table_destroy(lhs_ht, NULL);
> +  _mesa_hash_table_destroy(rhs_ht, NULL);
> +   }
> +
> +   void populate_acp(hash_table *lhs, hash_table *rhs)
> +   {
> +  struct hash_entry *entry;
> +  hash_table_foreach(lhs, entry)
> +  {

Opening { on the hash_table_foreach line.

> + _mesa_hash_table_insert(lhs_ht, entry->key, entry->data);
> +  }

Blank line here.

> +  hash_table_foreach(rhs, entry)
> +  {

Opening { on the hash_table_foreach line.

> + _mesa_hash_table_insert(rhs_ht, entry->key, entry->data);
> +  }
> +   }
> +
> void handle_loop(ir_loop *, bool keep_acp);
> virtual ir_visitor_status visit_enter(class ir_loop *);
> virtual ir_visitor_status visit_enter(class ir_function_signature *);
> @@ -120,8 +161,10 @@ public:
> void kill(kill_entry *k);
> void handle_if_block(exec_list *instructions);
>  
> -   /** List of acp_entry: The available copies to propagate */
> -   exec_list *acp;
> +   /** Hash of acp_entry: The available copies to propagate */
> +   hash_table *lhs_ht;
> +   hash_table *rhs_ht;
> +
> /**
>  * List of kill_entry: The variables whose values were killed in this
>  * block.
> @@ -147,23 +190,29 @@ 
> ir_copy_propagation_elements_visitor::visit_enter(ir_function_signature *ir)
>  * block.  Any instructions at global scope will be shuffled into
>  * main() at link time, so they're irrelevant to us.
>  */
> -   exec_list *orig_acp = this->acp;
> exec_list *orig_kills = this->kills;
> bool orig_killed_all = this->killed_all;
>  
> -   this->acp = new(mem_ctx) exec_list;
> +   hash_table *orig_lhs_ht = lhs_ht;
> +   hash_table *orig_rhs_ht = rhs_ht;
> +
> this->kills = new(mem_ctx) exec_list;

Orthogonal to this patch, there is some efficiency to be gained by not
doing this extra memory allocation.  Doing

  this->kills.move_nodes_to(&orig_kills);

here, and

  assert(this->kills.is_empty());
  orig_kills.move_nodes_to(&this->kills);

below is sufficient.  You'll also have to convert kills from exec_list*
to just exec_list.  One nice thing about that is it will just delete
code. :)

> this->killed_all = false;
>  
> +   create_acp();
> +
> visit_list_elements(this, &ir->body);
>  
> -   ralloc_free(this->acp);
> ralloc_free(this->ki

Re: [Mesa-dev] [PATCH] glsl: optimize copy_propagation_elements pass

2016-09-21 Thread Tapani Pälli

Gentle ping to Eric ..

On 09/16/2016 06:21 PM, Tapani Pälli wrote:

Changes make copy_propagation_elements pass faster, reducing link
time spent in test case of bug 94477. Does not fix the actual issue
but brings down the total time. No regressions seen in CI.

Signed-off-by: Tapani Pälli 
---

For performance measurements, Martina reported in the bug 8x speedup
to the test case shader link time when using this patch together with
commit 2cd02e30d2e1677762d34f1831b8e609970ef0f3

  .../glsl/opt_copy_propagation_elements.cpp | 187 -
  1 file changed, 145 insertions(+), 42 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp 
b/src/compiler/glsl/opt_copy_propagation_elements.cpp
index e4237cc..1c5060a 100644
--- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
+++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
@@ -46,6 +46,7 @@
  #include "ir_basic_block.h"
  #include "ir_optimization.h"
  #include "compiler/glsl_types.h"
+#include "util/hash_table.h"
  
  static bool debug = false;
  
@@ -76,6 +77,18 @@ public:

 int swizzle[4];
  };
  
+/* Class that refers to acp_entry in another exec_list. Used

+ * when making removals based on rhs.
+ */
+class acp_ref : public exec_node
+{
+public:
+   acp_ref(acp_entry *e)
+   {
+  entry = e;
+   }
+   acp_entry *entry;
+};
  
  class kill_entry : public exec_node

  {
@@ -98,14 +111,42 @@ public:
this->killed_all = false;
this->mem_ctx = ralloc_context(NULL);
this->shader_mem_ctx = NULL;
-  this->acp = new(mem_ctx) exec_list;
this->kills = new(mem_ctx) exec_list;
+
+  create_acp();
 }
 ~ir_copy_propagation_elements_visitor()
 {
ralloc_free(mem_ctx);
 }
  
+   void create_acp()

+   {
+  lhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+  rhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+   }
+
+   void destroy_acp()
+   {
+  _mesa_hash_table_destroy(lhs_ht, NULL);
+  _mesa_hash_table_destroy(rhs_ht, NULL);
+   }
+
+   void populate_acp(hash_table *lhs, hash_table *rhs)
+   {
+  struct hash_entry *entry;
+  hash_table_foreach(lhs, entry)
+  {
+ _mesa_hash_table_insert(lhs_ht, entry->key, entry->data);
+  }
+  hash_table_foreach(rhs, entry)
+  {
+ _mesa_hash_table_insert(rhs_ht, entry->key, entry->data);
+  }
+   }
+
 void handle_loop(ir_loop *, bool keep_acp);
 virtual ir_visitor_status visit_enter(class ir_loop *);
 virtual ir_visitor_status visit_enter(class ir_function_signature *);
@@ -120,8 +161,10 @@ public:
 void kill(kill_entry *k);
 void handle_if_block(exec_list *instructions);
  
-   /** List of acp_entry: The available copies to propagate */

-   exec_list *acp;
+   /** Hash of acp_entry: The available copies to propagate */
+   hash_table *lhs_ht;
+   hash_table *rhs_ht;
+
 /**
  * List of kill_entry: The variables whose values were killed in this
  * block.
@@ -147,23 +190,29 @@ 
ir_copy_propagation_elements_visitor::visit_enter(ir_function_signature *ir)
  * block.  Any instructions at global scope will be shuffled into
  * main() at link time, so they're irrelevant to us.
  */
-   exec_list *orig_acp = this->acp;
 exec_list *orig_kills = this->kills;
 bool orig_killed_all = this->killed_all;
  
-   this->acp = new(mem_ctx) exec_list;

+   hash_table *orig_lhs_ht = lhs_ht;
+   hash_table *orig_rhs_ht = rhs_ht;
+
 this->kills = new(mem_ctx) exec_list;
 this->killed_all = false;
  
+   create_acp();

+
 visit_list_elements(this, &ir->body);
  
-   ralloc_free(this->acp);

 ralloc_free(this->kills);
  
+   destroy_acp();

+
 this->kills = orig_kills;
-   this->acp = orig_acp;
 this->killed_all = orig_killed_all;
  
+   lhs_ht = orig_lhs_ht;

+   rhs_ht = orig_rhs_ht;
+
 return visit_continue_with_parent;
  }
  
@@ -249,17 +298,19 @@ ir_copy_propagation_elements_visitor::handle_rvalue(ir_rvalue **ir)

 /* Try to find ACP entries covering swizzle_chan[], hoping they're
  * the same source variable.
  */
-   foreach_in_list(acp_entry, entry, this->acp) {
-  if (var == entry->lhs) {
-for (int c = 0; c < chans; c++) {
-   if (entry->write_mask & (1 << swizzle_chan[c])) {
-  source[c] = entry->rhs;
-  source_chan[c] = entry->swizzle[swizzle_chan[c]];
+   hash_entry *ht_entry = _mesa_hash_table_search(lhs_ht, var);
+   if (ht_entry) {
+  exec_list *ht_list = (exec_list *) ht_entry->data;
+  foreach_in_list(acp_entry, entry, ht_list) {
+ for (int c = 0; c < chans; c++) {
+if (entry->write_mask & (1 << swizzle_chan[c])) {
+   source[c] = entry->rhs;
+   source_chan[c] = entry->swizzle[swizzle_chan[c]];
  
 if (sour