Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-28 Thread Jan Ziak
I would like to mention that the response Ian posted was/is a nice read.

I am not sure what to write in response. Let me just note that adding
formalisms to Mesa would lead to both higher performance and higher
safety. If you do not want C++ in Mesa, maybe it would be acceptable
for Mesa developers to generate C code by processing a higher-level
description of OpenGL calls and state transitions.

https://en.wikipedia.org/wiki/Formalism_(philosophy_of_mathematics)
https://en.wikipedia.org/wiki/Automatic_programming#Generative_programming

Jan

On Wed, Oct 19, 2016 at 12:21 AM, Ian Romanick  wrote:
> On 10/18/2016 10:12 AM, Jan Ziak wrote:
>>> Regarding C++ templates, the compiler doesn't use them. If u_vector
>>> (Dave Airlie?) provides the same functionality as your array, I
>>> suggest we use u_vector instead.
>>
>> Let me repeat what you just wrote, because it is unbelievable: You are
>> advising the use of non-templated collection types in C++ code.
>
> Are you able to find any templates anywhere in the GLSL compiler?  I
> don't think his statement was ambiguous.
>
>>> If you can't use u_vector, you should
>>> ask for approval from GLSL compiler leads (e.g. Ian Romanick or
>>> Kenneth Graunke) to use C++ templates.
>>
>> - You are talking about coding rules some Mesa developers agreed upon
>> and didn't bother writing down for other developers to read
>
> It was mostly written down, but it's not documented in the code base.
> It seems impossible to even get current, de facto practices documented.
> It's one of the few things in Mesa that really does get bike shedded.
>
> Before the current GLSL compiler, there was no C++ in Mesa at all.
> While developing the compiler, I found that I was re-implementing
> numerous C++ features by hand in C.  It felt pretty insane.  Why am I
> filling out all of these virtual function tables by hand?
>
> At the same time, I also observed that almost 100% of shipping,
> production-quality compilers were implemented using C++.  The single
> exception was GCC.  The need for GCC to bootstrap on minimal, sometimes
> dire, C compilers was the one thing keeping C++ out of the GCC code
> base.  It wasn't even that long ago that core parts of GCC had to
> support pre-C89 compilers.  As far as I am aware, they have since
> started using C++ too.  Who am I to be so bold as to declare that
> everyone shipping a C compiler is wrong?
>
> In light of that, I opened a discussion about using C++ in the compiler.
>
> Especially at that time (2008-ish), nobody working on Mesa was
> particularly skilled at C++.  I had used it some, and, in the mid-90's,
> had some really, really bad experiences with the implementations and
> side-effects of various language features.  I still have nightmares
> about trying to use templates in GCC 2.4.2.  There are quite a few C++
> features that are really easy to misuse.  There are also a lot of
> subtleties in the language that very few people really understand.
>
> I don't mean this in a pejorative way, but there was and continues to be
> a lot of FUD around C++.  I think a lot of this comes from the "Old
> Woman Who Swallowed a Fly" nature of solving C++ development problems.
> You have a problem.  The only way to solve that problem is to use
> another language feature that you may or may not understand how to use
> safely.  You use that feature to solve your problem.  Use of that
> feature presents a new problem.  The only way to solve the new problem
> is to use yet another language feature that you may or may not
> understand how to use safely.  Pretty soon nobody knows how anything in
> the code works.
>
> After quite a bit of discussion on the mesa-dev list, on #dri-devel, and
> face-to-face at XDC, we decided to use C++ with some restrictions.  The
> main restriction was that C++ would be limited to the GLSL compiler
> stack.  The other restrictions were roughly similar to the embedded C++
> subset.
>
> - No exceptions.
>
> - No RTTI.
>
> - No multiple inheritance.
>
> - No operator overloading.  It could be argued that our use of
>   placement new deviates from this.  In the previous metaphor, I
>   think this was either the spider or the bird.
>
> - No templates.
>
> There are other restrictions (e.g., no STL) that come as natural
> consequences of these.
>
> Our goal was that any existing Mesa developer should be able to read any
> piece of new C++ code and know what it was doing.
>
> I feel like, due to our collective ignorance about the language, we may
> have been slightly too restrictive.  It seems like we could have used
> templates in some very, very restricted ways to enable things like
> iterators that would have saved typing, encouraged refactoring, and made
> the code more understandable.  Instead we have a proliferation of
> foreach macros (or callbacks), and every data structure is a linked
> list.  It's difficult to say whether it would have made things strictly
> better or led us 

Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-19 Thread Eero Tamminen

Hi,

On 18.10.2016 20:12, Jan Ziak wrote:
[...]

Never profile with -O0 or disabled function inlining.


Seriously?


Nobody's going to take seriously optimization results taken from 
non-optimized builds.




Mesa uses -g -O2
with --enable-debug, so that's what you should use too. Don't use any
other -O* variants.


What if I find a case where -O2 prevents me from easily seeing
information necessary to optimize the source code?


I've never had that as a problem after the ARM call unwinding was solved 
in GCC and profiling tools decade ago (Valgrind even has some patches 
for that from the work we were doing at my previous employer).


You need to know what compiler optimizations do, to better understand 
the shown data, but tools can nowadays e.g. show inlined code correctly. 
 In general, if you have problems with optimized builds, either your 
tools or your builds are broken.


(C++ does make things a  bit more difficult because there's *much* more 
inlining happening with compiler optimizations on C++ code.



(Rest of the mail is general comments on profiling, not so much aimed 
for you or Marek, I assume you both already know that stuff.)



The only profiling tools reporting correct results are perf and
sysprof.


Perf uses sampling and reports averages.  While perf varies the sampling 
rate, sampling can still misrepresent some things (small frequently 
called things), and averages aren't good for everything.


That's why one should *also* use something like Valgrind which doesn't 
miss things (although it cannot accurately estimate how much time they 
take), so that you can see all call chains & call counts.


This isn't about latency, but for that good Intel PT based tool would be 
most correct.  Like the data provided by ARM ETM interface, it's very 
awkward to use though (GBs of data to process, tools not open source etc).




I used perf on Metro 2033 Redux and saw do_dead_code() there. Then I
used callgrind to see some more code.


(both use the same mechanism) If you don't enable dwarf in
perf (also sysprof can't use dwarf), you have to build Mesa with
-fno-omit-frame-pointer to see call trees. The only reason you would
want to enable dwarf-based call trees is when you want to see libc
calls. Otherwise, they won't be displayed or counted as part of call
trees. For Mesa developers who do profiling often,
-fno-omit-frame-pointer should be your default.



Callgrind counts calls (that one you can trust), but the reported time
is incorrect,


Callgrind reports number of instructions, not time.

Cachegrind can provide estimates for how much time is taken, but as you 
mentioned, it's not very reliable (while one can specify similar cache 
sizes as the target machine has, the cache model is inaccurate, and I 
don't think it counts SIMD code correctly).


Both report this data only for the user-space process, not for the work 
the process requests from the kernel.


[...]

- Eero

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Michael Schellenberger Costa

Hi Jan,


On 18.10.2016 00:07, Jan Ziak wrote:

This patch replaces the ir_variable_refcount_entry's linked-list
with an array-list.

The array-list has local storage which does not require ANY additional
allocations if the list has small number of elements. The size of this
storage is configurable for each variable.

Benchmark results for "./run -1 shaders" from shader-db[1]:

- The total number of executed instructions goes down from 64.184 to 63.797
   giga-instructions when Mesa is compiled with "gcc -O0 ..."
- In the call tree starting at function do_dead_code():
   - the number of calls to malloc() is reduced by about 10%
   - the number of calls to free() is reduced by about 30%

[1] git://anongit.freedesktop.org/mesa/shader-db

Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0...@gmail.com>
---
  src/compiler/glsl/ir_variable_refcount.cpp |  14 +--
  src/compiler/glsl/ir_variable_refcount.h   |   8 +-
  src/compiler/glsl/opt_dead_code.cpp|  19 ++--
  src/util/fast_list.h   | 167 +
  4 files changed, 176 insertions(+), 32 deletions(-)

diff --git a/src/compiler/glsl/ir_variable_refcount.cpp 
b/src/compiler/glsl/ir_variable_refcount.cpp
index 8306be1..94d6edc 100644
--- a/src/compiler/glsl/ir_variable_refcount.cpp
+++ b/src/compiler/glsl/ir_variable_refcount.cpp
@@ -46,15 +46,6 @@ static void
  free_entry(struct hash_entry *entry)
  {
 ir_variable_refcount_entry *ivre = (ir_variable_refcount_entry *) 
entry->data;
-
-   /* Free assignment list */
-   exec_node *n;
-   while ((n = ivre->assign_list.pop_head()) != NULL) {
-  struct assignment_entry *assignment_entry =
- exec_node_data(struct assignment_entry, n, link);
-  free(assignment_entry);
-   }
-
 delete ivre;
  }
  
@@ -142,10 +133,7 @@ ir_variable_refcount_visitor::visit_leave(ir_assignment *ir)

 */
assert(entry->referenced_count >= entry->assigned_count);
if (entry->referenced_count == entry->assigned_count) {
- struct assignment_entry *assignment_entry =
-(struct assignment_entry *)calloc(1, sizeof(*assignment_entry));
- assignment_entry->assign = ir;
- entry->assign_list.push_head(_entry->link);
+ entry->assign_list.add(ir);
}
 }
  
diff --git a/src/compiler/glsl/ir_variable_refcount.h b/src/compiler/glsl/ir_variable_refcount.h

index 08a11c0..c3ec5fe 100644
--- a/src/compiler/glsl/ir_variable_refcount.h
+++ b/src/compiler/glsl/ir_variable_refcount.h
@@ -32,11 +32,7 @@
  #include "ir.h"
  #include "ir_visitor.h"
  #include "compiler/glsl_types.h"
-
-struct assignment_entry {
-   exec_node link;
-   ir_assignment *assign;
-};
+#include "util/fast_list.h"
  
  class ir_variable_refcount_entry

  {
@@ -50,7 +46,7 @@ public:
  * This is intended to be used for dead code optimisation and may
  * not be a complete list.
  */
-   exec_list assign_list;
+   arraylist assign_list;
  
 /** Number of times the variable is referenced, including assignments. */

 unsigned referenced_count;
diff --git a/src/compiler/glsl/opt_dead_code.cpp 
b/src/compiler/glsl/opt_dead_code.cpp
index 75e668a..06e8c3d 100644
--- a/src/compiler/glsl/opt_dead_code.cpp
+++ b/src/compiler/glsl/opt_dead_code.cpp
@@ -52,7 +52,7 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
  
 struct hash_entry *e;

 hash_table_foreach(v.ht, e) {
-  ir_variable_refcount_entry *entry = (ir_variable_refcount_entry 
*)e->data;
+  ir_variable_refcount_entry *const entry = (ir_variable_refcount_entry 
*)e->data;
  
/* Since each assignment is a reference, the refereneced count must be

 * greater than or equal to the assignment count.  If they are equal,
@@ -89,7 +89,7 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
if (entry->var->data.always_active_io)
   continue;
  
-  if (!entry->assign_list.is_empty()) {

+  if (!entry->assign_list.empty()) {
 /* Remove all the dead assignments to the variable we found.
  * Don't do so if it's a shader or function output, though.
  */
@@ -98,26 +98,19 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
   entry->var->data.mode != ir_var_shader_out &&
   entry->var->data.mode != ir_var_shader_storage) {
  
-while (!entry->assign_list.is_empty()) {

-   struct assignment_entry *assignment_entry =
-  exec_node_data(struct assignment_entry,
- entry->assign_list.get_head_raw(), link);
-
-  assignment_entry->assign->remove();
-
+for(ir_assignment *assign : entry->assign_list) {
The original code separates control flow instructions as for or while 
with a space before the brace, aka "for (...". This applies for all the 
code.

+  assign->remove();
   if (debug) 

Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Ian Romanick
On 10/18/2016 10:12 AM, Jan Ziak wrote:
>> Regarding C++ templates, the compiler doesn't use them. If u_vector
>> (Dave Airlie?) provides the same functionality as your array, I
>> suggest we use u_vector instead.
> 
> Let me repeat what you just wrote, because it is unbelievable: You are
> advising the use of non-templated collection types in C++ code.

Are you able to find any templates anywhere in the GLSL compiler?  I
don't think his statement was ambiguous.

>> If you can't use u_vector, you should
>> ask for approval from GLSL compiler leads (e.g. Ian Romanick or
>> Kenneth Graunke) to use C++ templates.
> 
> - You are talking about coding rules some Mesa developers agreed upon
> and didn't bother writing down for other developers to read

It was mostly written down, but it's not documented in the code base.
It seems impossible to even get current, de facto practices documented.
It's one of the few things in Mesa that really does get bike shedded.

Before the current GLSL compiler, there was no C++ in Mesa at all.
While developing the compiler, I found that I was re-implementing
numerous C++ features by hand in C.  It felt pretty insane.  Why am I
filling out all of these virtual function tables by hand?

At the same time, I also observed that almost 100% of shipping,
production-quality compilers were implemented using C++.  The single
exception was GCC.  The need for GCC to bootstrap on minimal, sometimes
dire, C compilers was the one thing keeping C++ out of the GCC code
base.  It wasn't even that long ago that core parts of GCC had to
support pre-C89 compilers.  As far as I am aware, they have since
started using C++ too.  Who am I to be so bold as to declare that
everyone shipping a C compiler is wrong?

In light of that, I opened a discussion about using C++ in the compiler.

Especially at that time (2008-ish), nobody working on Mesa was
particularly skilled at C++.  I had used it some, and, in the mid-90's,
had some really, really bad experiences with the implementations and
side-effects of various language features.  I still have nightmares
about trying to use templates in GCC 2.4.2.  There are quite a few C++
features that are really easy to misuse.  There are also a lot of
subtleties in the language that very few people really understand.

I don't mean this in a pejorative way, but there was and continues to be
a lot of FUD around C++.  I think a lot of this comes from the "Old
Woman Who Swallowed a Fly" nature of solving C++ development problems.
You have a problem.  The only way to solve that problem is to use
another language feature that you may or may not understand how to use
safely.  You use that feature to solve your problem.  Use of that
feature presents a new problem.  The only way to solve the new problem
is to use yet another language feature that you may or may not
understand how to use safely.  Pretty soon nobody knows how anything in
the code works.

After quite a bit of discussion on the mesa-dev list, on #dri-devel, and
face-to-face at XDC, we decided to use C++ with some restrictions.  The
main restriction was that C++ would be limited to the GLSL compiler
stack.  The other restrictions were roughly similar to the embedded C++
subset.

- No exceptions.

- No RTTI.

- No multiple inheritance.

- No operator overloading.  It could be argued that our use of
  placement new deviates from this.  In the previous metaphor, I
  think this was either the spider or the bird.

- No templates.

There are other restrictions (e.g., no STL) that come as natural
consequences of these.

Our goal was that any existing Mesa developer should be able to read any
piece of new C++ code and know what it was doing.

I feel like, due to our collective ignorance about the language, we may
have been slightly too restrictive.  It seems like we could have used
templates in some very, very restricted ways to enable things like
iterators that would have saved typing, encouraged refactoring, and made
the code more understandable.  Instead we have a proliferation of
foreach macros (or callbacks), and every data structure is a linked
list.  It's difficult to say whether it would have made things strictly
better or led us to swallow a bird, a cat, a dog...

I also feel like that ship has sailed.  When NIR was implemented using
pure C, going so far as to re-invent constructors using macros, the
chances of using more C++ faded substantially.  If, and that's a really,
really big if, additional C++ were to be used, it would have to be
preceded by patches to docs/devinfo.html that documented:

- What features were to be used.

- Why use of those features benefit the code base.  Specifically,
  why use of the new feature is substantially better than a
  different implementation that does not use the feature.

- Any restrictions on the use of those features.

Such a discussion may produce additional alternatives.

> - I am not willing to use u_vector in C++ code


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
On Tue, Oct 18, 2016 at 8:04 PM, Marek Olšák  wrote:

> On Tue, Oct 18, 2016 at 7:12 PM, Jan Ziak <0xe2.0x9a.0...@gmail.com>
> wrote:
> >> Regarding C++ templates, the compiler doesn't use them. If u_vector
> >> (Dave Airlie?) provides the same functionality as your array, I
> >> suggest we use u_vector instead.
> >
> > Let me repeat what you just wrote, because it is unbelievable: You are
> > advising the use of non-templated collection types in C++ code.
>
> Absolutely.
>

I don't believe what my own eyes are seeing.


> > If it isn't merged by Thursday (2016-oct-20) I will mark it as
> > rejected (rejected based on personal rather than scientific grounds).
>
> Relax. Things tend to move slowly when people are on conferences,
> vacations, or just busy with corporate stuff they have to deal with
> every day etc. and you can't predict those.
>
> Marek
>

Ok. Let's relax.

Jan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
Perf stat results for shader-db:

This is measured on an AMD Kaveri CPU.

gcc-6.2.0 -fno-omit-frame-pointer -g -O2

 Unpatched:

$ cd shader-db
$ ../run-upstream perfstat-u --repeat=5 -- ./run -1 shaders >/dev/null

 Performance counter stats for './run -1 shaders' (5 runs):

  13689.962374  task-clock (msec) #1.000 CPUs utilized
   ( +-  0.29% )
   138  context-switches  #0.010 K/sec
   ( +- 17.82% )
 6  cpu-migrations#0.000 K/sec
   ( +- 13.36% )
78,559  page-faults   #0.006 M/sec
   ( +-  0.24% )
53,578,642,861  cycles:u  #3.914 GHz
   ( +-  0.29% )
44,813,859,985  instructions:u#0.84  insn per cycle
  ( +-  0.01% )
 1,069,586,875  cache-references:u#   78.129 M/sec
   ( +-  0.65% )
51,295,256  cache-misses:u#4.796 % of all cache
refs  ( +-  0.56% )
 9,508,996,305  branches:u#  694.596 M/sec
   ( +-  0.01% )
   453,237,236  branch-misses:u   #4.77% of all
branches  ( +-  0.84% )

  13.692494394 seconds time elapsed
 ( +-  0.29% )

 Patched:

$ cd shader-db
$ ../run-upstream-patched perfstat-u --repeat=5 -- ./run -1 shaders
>/dev/null

 Performance counter stats for './run -1 shaders' (5 runs):

  13602.106171  task-clock (msec) #1.000 CPUs utilized
   ( +-  0.14% )
86  context-switches  #0.006 K/sec
   ( +- 13.95% )
 6  cpu-migrations#0.000 K/sec
   ( +- 26.35% )
78,271  page-faults   #0.006 M/sec
   ( +-  0.82% )
53,299,046,681  cycles:u  #3.918 GHz
   ( +-  0.13% )
44,577,707,063  instructions:u#0.84  insn per cycle
  ( +-  0.01% )
 1,078,158,307  cache-references:u#   79.264 M/sec
   ( +-  0.70% )
51,521,287  cache-misses:u#4.779 % of all cache
refs  ( +-  1.03% )
 9,459,962,609  branches:u#  695.478 M/sec
   ( +-  0.01% )
   456,593,871  branch-misses:u   #4.83% of all
branches  ( +-  0.27% )

  13.603795247 seconds time elapsed
 ( +-  0.14% )
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Marek Olšák
On Tue, Oct 18, 2016 at 7:12 PM, Jan Ziak <0xe2.0x9a.0...@gmail.com> wrote:
>> Regarding C++ templates, the compiler doesn't use them. If u_vector
>> (Dave Airlie?) provides the same functionality as your array, I
>> suggest we use u_vector instead.
>
> Let me repeat what you just wrote, because it is unbelievable: You are
> advising the use of non-templated collection types in C++ code.

Absolutely.

>
>> If you can't use u_vector, you should
>> ask for approval from GLSL compiler leads (e.g. Ian Romanick or
>> Kenneth Graunke) to use C++ templates.
>
> - You are talking about coding rules some Mesa developers agreed upon
> and didn't bother writing down for other developers to read
>
> - I am not willing to use u_vector in C++ code
>
>> I'll repeat some stuff about profiling here but also explain my perspective.
>
> So far (which may be a year or so), there is no indication that you
> are better at optimizing code than me.

Good one.

>
>> Never profile with -O0 or disabled function inlining.
>
> Seriously?

Absolutely.

>
>> Mesa uses -g -O2
>> with --enable-debug, so that's what you should use too. Don't use any
>> other -O* variants.
>
> What if I find a case where -O2 prevents me from easily seeing
> information necessary to optimize the source code?

There are several ways to get useful data from optimized code (using
the frame pointer, using dwarf, etc.) -O0 is too distorted.

>
>> The only profiling tools reporting correct results are perf and
>> sysprof.
>
> I used perf on Metro 2033 Redux and saw do_dead_code() there. Then I
> used callgrind to see some more code.

I recommend building Mesa with the frame pointer enabled, or enabling
dwarf in perf. Otherwise you won't see call trees.

>
>> (both use the same mechanism) If you don't enable dwarf in
>> perf (also sysprof can't use dwarf), you have to build Mesa with
>> -fno-omit-frame-pointer to see call trees. The only reason you would
>> want to enable dwarf-based call trees is when you want to see libc
>> calls. Otherwise, they won't be displayed or counted as part of call
>> trees. For Mesa developers who do profiling often,
>> -fno-omit-frame-pointer should be your default.
>
>> Callgrind counts calls (that one you can trust), but the reported time
>> is incorrect,
>
> Are you nuts? You cannot be seriously be assuming that I didn't know about 
> that.
>
>> because it uses its own virtual model of a CPU. Avoid it
>> if you want to measure time spent in functions.
>
> I will *NOT* avoid callgrind because I know how to use it to optimize code.

I didn't suggest avoiding callgrind in all cases.

>
>>Marek
>
> As usual, I would like to notify reviewers of this path that I
> am not willing to wait months to learn whether the code will be merged
> or rejected.
>
> If it isn't merged by Thursday (2016-oct-20) I will mark it as
> rejected (rejected based on personal rather than scientific grounds).

Relax. Things tend to move slowly when people are on conferences,
vacations, or just busy with corporate stuff they have to deal with
every day etc. and you can't predict those.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
> Regarding C++ templates, the compiler doesn't use them. If u_vector
> (Dave Airlie?) provides the same functionality as your array, I
> suggest we use u_vector instead.

Let me repeat what you just wrote, because it is unbelievable: You are
advising the use of non-templated collection types in C++ code.

> If you can't use u_vector, you should
> ask for approval from GLSL compiler leads (e.g. Ian Romanick or
> Kenneth Graunke) to use C++ templates.

- You are talking about coding rules some Mesa developers agreed upon
and didn't bother writing down for other developers to read

- I am not willing to use u_vector in C++ code

> I'll repeat some stuff about profiling here but also explain my perspective.

So far (which may be a year or so), there is no indication that you
are better at optimizing code than me.

> Never profile with -O0 or disabled function inlining.

Seriously?

> Mesa uses -g -O2
> with --enable-debug, so that's what you should use too. Don't use any
> other -O* variants.

What if I find a case where -O2 prevents me from easily seeing
information necessary to optimize the source code?

> The only profiling tools reporting correct results are perf and
> sysprof.

I used perf on Metro 2033 Redux and saw do_dead_code() there. Then I
used callgrind to see some more code.

> (both use the same mechanism) If you don't enable dwarf in
> perf (also sysprof can't use dwarf), you have to build Mesa with
> -fno-omit-frame-pointer to see call trees. The only reason you would
> want to enable dwarf-based call trees is when you want to see libc
> calls. Otherwise, they won't be displayed or counted as part of call
> trees. For Mesa developers who do profiling often,
> -fno-omit-frame-pointer should be your default.

> Callgrind counts calls (that one you can trust), but the reported time
> is incorrect,

Are you nuts? You cannot be seriously be assuming that I didn't know about that.

> because it uses its own virtual model of a CPU. Avoid it
> if you want to measure time spent in functions.

I will *NOT* avoid callgrind because I know how to use it to optimize code.

>Marek

As usual, I would like to notify reviewers of this path that I
am not willing to wait months to learn whether the code will be merged
or rejected.

If it isn't merged by Thursday (2016-oct-20) I will mark it as
rejected (rejected based on personal rather than scientific grounds).

Jan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Marek Olšák
On Tue, Oct 18, 2016 at 3:55 PM, Eero Tamminen
 wrote:
> Hi,
>
> On 18.10.2016 16:25, Jan Ziak wrote:
>>
>> On Tue, Oct 18, 2016 at 3:12 PM, Nicolai Hähnle 
>> wrote:
>>>
>>> On 18.10.2016 15:07, Jan Ziak wrote:

 On Tue Oct 18 09:29:59 UTC 2016, Eero Tamminen wrote:
>
> On 18.10.2016 01:07, Jan Ziak wrote:
>>
>> - The total number of executed instructions goes down from 64.184 to
>> 63.797
>>   giga-instructions when Mesa is compiled with "gcc -O0 ..."
>
>
> Please don't do performance related decisions based on data from
> compiling code with optimizations disabled.  Use -O2 or -O3 (or even
> better, check both).


 Options -O2 and -O3 interfere with profiling tools.

 I will try using -Og the next time.
>>>
>>>
>>> Just stop and use proper profiling tools like perf that can work with
>>> optimized tools.
>
>
> Valgrind/callgrind/cachegrind works also fine with optimized binaries.
>
> All profiling tools lie, at least a bit. It's better to know their strengths
> and weaknesses so that one knows which ones complement each other. Perf is
> e.g. good at finding hotspots, Valgrind (callgrind) is more reliable in
> telling how they get called.
>
> One may also needs GCC version from this decade.  Really old GCC versions
> didn't inlude all debug info needed for debugging optimized binaries.

Regarding C++ templates, the compiler doesn't use them. If u_vector
(Dave Airlie?) provides the same functionality as your array, I
suggest we use u_vector instead. If you can't use u_vector, you should
ask for approval from GLSL compiler leads (e.g. Ian Romanick or
Kenneth Graunke) to use C++ templates.


I'll repeat some stuff about profiling here but also explain my perspective.

Never profile with -O0 or disabled function inlining. Mesa uses -g -O2
with --enable-debug, so that's what you should use too. Don't use any
other -O* variants.

The only profiling tools reporting correct results are perf and
sysprof. (both use the same mechanism) If you don't enable dwarf in
perf (also sysprof can't use dwarf), you have to build Mesa with
-fno-omit-frame-pointer to see call trees. The only reason you would
want to enable dwarf-based call trees is when you want to see libc
calls. Otherwise, they won't be displayed or counted as part of call
trees. For Mesa developers who do profiling often,
-fno-omit-frame-pointer should be your default.

Callgrind counts calls (that one you can trust), but the reported time
is incorrect, because it uses its own virtual model of a CPU. Avoid it
if you want to measure time spent in functions.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Eero Tamminen

Hi,

On 18.10.2016 16:25, Jan Ziak wrote:

On Tue, Oct 18, 2016 at 3:12 PM, Nicolai Hähnle  wrote:

On 18.10.2016 15:07, Jan Ziak wrote:

On Tue Oct 18 09:29:59 UTC 2016, Eero Tamminen wrote:

On 18.10.2016 01:07, Jan Ziak wrote:

- The total number of executed instructions goes down from 64.184 to
63.797
  giga-instructions when Mesa is compiled with "gcc -O0 ..."


Please don't do performance related decisions based on data from
compiling code with optimizations disabled.  Use -O2 or -O3 (or even
better, check both).


Options -O2 and -O3 interfere with profiling tools.

I will try using -Og the next time.


Just stop and use proper profiling tools like perf that can work with
optimized tools.


Valgrind/callgrind/cachegrind works also fine with optimized binaries.

All profiling tools lie, at least a bit. It's better to know their 
strengths and weaknesses so that one knows which ones complement each 
other. Perf is e.g. good at finding hotspots, Valgrind (callgrind) is 
more reliable in telling how they get called.


One may also needs GCC version from this decade.  Really old GCC 
versions didn't inlude all debug info needed for debugging optimized 
binaries.




You may have to compile with -fno-omit-frame-pointer, but
anything more than that is going to distort your measurement results so much
that they're worthless.

>

The function from -O0, -O1, -O2 to number-of-instructions (or to
number-of-cycles) is in the vast majority of cases a monotonically
decreasing function.

>
> Are you saying that this rule does not hold in the case of this patch?

-O1 puts variables into registers instead of using them directly from 
stack like -O0 does.  -O3 adds things like inlining and aliasing 
analysis.  Only higher optimization levels do dead code analysis. 
Whether these increase or decrease code side, depends on the code.



- Eero

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
Hi Michael,

thanks for the suggestions about code formatting. I formatted the
whole fast_list.h file in the Netbeans editor and uploaded a new
revision of the patch.

Jan

On Tue, Oct 18, 2016 at 1:42 PM, Michael Schellenberger Costa
 wrote:
> Hi Jan,
>
> On 18.10.2016 00:07, Jan Ziak wrote:
>>
>> This patch replaces the ir_variable_refcount_entry's linked-list
>> with an array-list.
>>
>> The array-list has local storage which does not require ANY additional
>> allocations if the list has small number of elements. The size of this
>> storage is configurable for each variable.
>>
>> Benchmark results for "./run -1 shaders" from shader-db[1]:
>>
>> - The total number of executed instructions goes down from 64.184 to
>> 63.797
>>giga-instructions when Mesa is compiled with "gcc -O0 ..."
>> - In the call tree starting at function do_dead_code():
>>- the number of calls to malloc() is reduced by about 10%
>>- the number of calls to free() is reduced by about 30%
>>
>> [1] git://anongit.freedesktop.org/mesa/shader-db
>>
>> Signed-off-by: Jan Ziak (http://atom-symbol.net)
>> <0xe2.0x9a.0...@gmail.com>
>> ---
>>   src/compiler/glsl/ir_variable_refcount.cpp |  14 +--
>>   src/compiler/glsl/ir_variable_refcount.h   |   8 +-
>>   src/compiler/glsl/opt_dead_code.cpp|  19 ++--
>>   src/util/fast_list.h   | 167
>> +
>>   4 files changed, 176 insertions(+), 32 deletions(-)
>>
>> diff --git a/src/compiler/glsl/ir_variable_refcount.cpp
>> b/src/compiler/glsl/ir_variable_refcount.cpp
>> index 8306be1..94d6edc 100644
>> --- a/src/compiler/glsl/ir_variable_refcount.cpp
>> +++ b/src/compiler/glsl/ir_variable_refcount.cpp
>> @@ -46,15 +46,6 @@ static void
>>   free_entry(struct hash_entry *entry)
>>   {
>>  ir_variable_refcount_entry *ivre = (ir_variable_refcount_entry *)
>> entry->data;
>> -
>> -   /* Free assignment list */
>> -   exec_node *n;
>> -   while ((n = ivre->assign_list.pop_head()) != NULL) {
>> -  struct assignment_entry *assignment_entry =
>> - exec_node_data(struct assignment_entry, n, link);
>> -  free(assignment_entry);
>> -   }
>> -
>>  delete ivre;
>>   }
>>   @@ -142,10 +133,7 @@
>> ir_variable_refcount_visitor::visit_leave(ir_assignment *ir)
>>  */
>> assert(entry->referenced_count >= entry->assigned_count);
>> if (entry->referenced_count == entry->assigned_count) {
>> - struct assignment_entry *assignment_entry =
>> -(struct assignment_entry *)calloc(1,
>> sizeof(*assignment_entry));
>> - assignment_entry->assign = ir;
>> - entry->assign_list.push_head(_entry->link);
>> + entry->assign_list.add(ir);
>> }
>>  }
>>   diff --git a/src/compiler/glsl/ir_variable_refcount.h
>> b/src/compiler/glsl/ir_variable_refcount.h
>> index 08a11c0..c3ec5fe 100644
>> --- a/src/compiler/glsl/ir_variable_refcount.h
>> +++ b/src/compiler/glsl/ir_variable_refcount.h
>> @@ -32,11 +32,7 @@
>>   #include "ir.h"
>>   #include "ir_visitor.h"
>>   #include "compiler/glsl_types.h"
>> -
>> -struct assignment_entry {
>> -   exec_node link;
>> -   ir_assignment *assign;
>> -};
>> +#include "util/fast_list.h"
>> class ir_variable_refcount_entry
>>   {
>> @@ -50,7 +46,7 @@ public:
>>   * This is intended to be used for dead code optimisation and may
>>   * not be a complete list.
>>   */
>> -   exec_list assign_list;
>> +   arraylist assign_list;
>>/** Number of times the variable is referenced, including
>> assignments. */
>>  unsigned referenced_count;
>> diff --git a/src/compiler/glsl/opt_dead_code.cpp
>> b/src/compiler/glsl/opt_dead_code.cpp
>> index 75e668a..06e8c3d 100644
>> --- a/src/compiler/glsl/opt_dead_code.cpp
>> +++ b/src/compiler/glsl/opt_dead_code.cpp
>> @@ -52,7 +52,7 @@ do_dead_code(exec_list *instructions, bool
>> uniform_locations_assigned)
>>struct hash_entry *e;
>>  hash_table_foreach(v.ht, e) {
>> -  ir_variable_refcount_entry *entry = (ir_variable_refcount_entry
>> *)e->data;
>> +  ir_variable_refcount_entry *const entry =
>> (ir_variable_refcount_entry *)e->data;
>>   /* Since each assignment is a reference, the refereneced count
>> must be
>>  * greater than or equal to the assignment count.  If they are
>> equal,
>> @@ -89,7 +89,7 @@ do_dead_code(exec_list *instructions, bool
>> uniform_locations_assigned)
>> if (entry->var->data.always_active_io)
>>continue;
>>   -  if (!entry->assign_list.is_empty()) {
>> +  if (!entry->assign_list.empty()) {
>>  /* Remove all the dead assignments to the variable we found.
>>   * Don't do so if it's a shader or function output, though.
>>   */
>> @@ -98,26 +98,19 @@ do_dead_code(exec_list *instructions, bool
>> uniform_locations_assigned)
>>entry->var->data.mode != ir_var_shader_out &&
>>

Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
On Tue, Oct 18, 2016 at 3:32 PM, Emil Velikov  wrote:
> On 17 October 2016 at 23:07, Jan Ziak <0xe2.0x9a.0...@gmail.com> wrote:
>
>> -   exec_list assign_list;
>> +   arraylist assign_list;
>>
> Just an FYI - when people started using C++ for glsl there was an
> agreement that templating is not to be used.

In my opinion, avoiding templates for the very basic datatypes (such
as lists, sets and maps) in a language supporting templates is highly
unusual and counter-productive.

I can agree with Mesa not wanting to use templates in other cases.

> Not sure if it still holds true, nowadays.
> Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Emil Velikov
On 17 October 2016 at 23:07, Jan Ziak <0xe2.0x9a.0...@gmail.com> wrote:

> -   exec_list assign_list;
> +   arraylist assign_list;
>
Just an FYI - when people started using C++ for glsl there was an
agreement that templating is not to be used.

Not sure if it still holds true, nowadays.
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
This patch replaces the ir_variable_refcount_entry's linked-list
with an array-list.

The array-list has local storage which does not require ANY additional
allocations if the list has small number of elements. The size of this
storage is configurable for each variable.

Benchmark results for "./run -1 shaders" from shader-db[1]:

- The total number of executed instructions goes down from 64.184 to 63.797
  giga-instructions when Mesa is compiled with "gcc -O0 ..."
- In the call tree starting at function do_dead_code():
  - the number of calls to malloc() is reduced by about 10%
  - the number of calls to free() is reduced by about 30%

The layout of the source code in the new file fast_list.h has been
formatted by NetBeans.

[1] git://anongit.freedesktop.org/mesa/shader-db

Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0...@gmail.com>
---
 src/compiler/glsl/ir_variable_refcount.cpp |  14 +-
 src/compiler/glsl/ir_variable_refcount.h   |   8 +-
 src/compiler/glsl/opt_dead_code.cpp|  19 +--
 src/util/fast_list.h   | 215 +
 4 files changed, 224 insertions(+), 32 deletions(-)

diff --git a/src/compiler/glsl/ir_variable_refcount.cpp 
b/src/compiler/glsl/ir_variable_refcount.cpp
index 8306be1..94d6edc 100644
--- a/src/compiler/glsl/ir_variable_refcount.cpp
+++ b/src/compiler/glsl/ir_variable_refcount.cpp
@@ -46,15 +46,6 @@ static void
 free_entry(struct hash_entry *entry)
 {
ir_variable_refcount_entry *ivre = (ir_variable_refcount_entry *) 
entry->data;
-
-   /* Free assignment list */
-   exec_node *n;
-   while ((n = ivre->assign_list.pop_head()) != NULL) {
-  struct assignment_entry *assignment_entry =
- exec_node_data(struct assignment_entry, n, link);
-  free(assignment_entry);
-   }
-
delete ivre;
 }
 
@@ -142,10 +133,7 @@ ir_variable_refcount_visitor::visit_leave(ir_assignment 
*ir)
*/
   assert(entry->referenced_count >= entry->assigned_count);
   if (entry->referenced_count == entry->assigned_count) {
- struct assignment_entry *assignment_entry =
-(struct assignment_entry *)calloc(1, sizeof(*assignment_entry));
- assignment_entry->assign = ir;
- entry->assign_list.push_head(_entry->link);
+ entry->assign_list.add(ir);
   }
}
 
diff --git a/src/compiler/glsl/ir_variable_refcount.h 
b/src/compiler/glsl/ir_variable_refcount.h
index 08a11c0..c3ec5fe 100644
--- a/src/compiler/glsl/ir_variable_refcount.h
+++ b/src/compiler/glsl/ir_variable_refcount.h
@@ -32,11 +32,7 @@
 #include "ir.h"
 #include "ir_visitor.h"
 #include "compiler/glsl_types.h"
-
-struct assignment_entry {
-   exec_node link;
-   ir_assignment *assign;
-};
+#include "util/fast_list.h"
 
 class ir_variable_refcount_entry
 {
@@ -50,7 +46,7 @@ public:
 * This is intended to be used for dead code optimisation and may
 * not be a complete list.
 */
-   exec_list assign_list;
+   arraylist assign_list;
 
/** Number of times the variable is referenced, including assignments. */
unsigned referenced_count;
diff --git a/src/compiler/glsl/opt_dead_code.cpp 
b/src/compiler/glsl/opt_dead_code.cpp
index 75e668a..8e395697 100644
--- a/src/compiler/glsl/opt_dead_code.cpp
+++ b/src/compiler/glsl/opt_dead_code.cpp
@@ -52,7 +52,7 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
 
struct hash_entry *e;
hash_table_foreach(v.ht, e) {
-  ir_variable_refcount_entry *entry = (ir_variable_refcount_entry 
*)e->data;
+  ir_variable_refcount_entry *const entry = (ir_variable_refcount_entry 
*)e->data;
 
   /* Since each assignment is a reference, the refereneced count must be
* greater than or equal to the assignment count.  If they are equal,
@@ -89,7 +89,7 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
   if (entry->var->data.always_active_io)
  continue;
 
-  if (!entry->assign_list.is_empty()) {
+  if (!entry->assign_list.empty()) {
 /* Remove all the dead assignments to the variable we found.
  * Don't do so if it's a shader or function output, though.
  */
@@ -98,26 +98,19 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
  entry->var->data.mode != ir_var_shader_out &&
  entry->var->data.mode != ir_var_shader_storage) {
 
-while (!entry->assign_list.is_empty()) {
-   struct assignment_entry *assignment_entry =
-  exec_node_data(struct assignment_entry,
- entry->assign_list.get_head_raw(), link);
-
-  assignment_entry->assign->remove();
-
+for (ir_assignment *assign : entry->assign_list) {
+  assign->remove();
   if (debug) {
  printf("Removed assignment to %s@%p\n",
 entry->var->name, (void *) entry->var);
}
-
-  

Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
On Tue, Oct 18, 2016 at 3:12 PM, Nicolai Hähnle  wrote:
> On 18.10.2016 15:07, Jan Ziak wrote:
>>
>> On Tue Oct 18 09:29:59 UTC 2016, Eero Tamminen wrote:
>>>
>>> On 18.10.2016 01:07, Jan Ziak wrote:

 - The total number of executed instructions goes down from 64.184 to
 63.797
   giga-instructions when Mesa is compiled with "gcc -O0 ..."
>>>
>>>
>>> Please don't do performance related decisions based on data from
>>> compiling code with optimizations disabled.  Use -O2 or -O3 (or even
>>> better, check both).
>>
>>
>> Options -O2 and -O3 interfere with profiling tools.
>
>>
>>
>> I will try using -Og the next time.
>
>
> Just stop and use proper profiling tools like perf that can work with
> optimized tools. You may have to compile with -fno-omit-frame-pointer, but
> anything more than that is going to distort your measurement results so much
> that they're worthless.
>
> Nicolai

The function from -O0, -O1, -O2 to number-of-instructions (or to
number-of-cycles) is in the vast majority of cases a monotonically
decreasing function.

Are you saying that this rule does not hold in the case of this patch?

Jan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Nicolai Hähnle

On 18.10.2016 15:07, Jan Ziak wrote:

On Tue Oct 18 09:29:59 UTC 2016, Eero Tamminen wrote:

On 18.10.2016 01:07, Jan Ziak wrote:

- The total number of executed instructions goes down from 64.184 to 63.797
  giga-instructions when Mesa is compiled with "gcc -O0 ..."


Please don't do performance related decisions based on data from
compiling code with optimizations disabled.  Use -O2 or -O3 (or even
better, check both).


Options -O2 and -O3 interfere with profiling tools.

>

I will try using -Og the next time.


Just stop and use proper profiling tools like perf that can work with 
optimized tools. You may have to compile with -fno-omit-frame-pointer, 
but anything more than that is going to distort your measurement results 
so much that they're worthless.


Nicolai


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Jan Ziak
On Tue Oct 18 09:29:59 UTC 2016, Eero Tamminen wrote:
> On 18.10.2016 01:07, Jan Ziak wrote:
>> - The total number of executed instructions goes down from 64.184 to 63.797
>>   giga-instructions when Mesa is compiled with "gcc -O0 ..."
>
>Please don't do performance related decisions based on data from
>compiling code with optimizations disabled.  Use -O2 or -O3 (or even
>better, check both).

Options -O2 and -O3 interfere with profiling tools.

I will try using -Og the next time.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-18 Thread Eero Tamminen

Hi,

On 18.10.2016 01:07, Jan Ziak wrote:

This patch replaces the ir_variable_refcount_entry's linked-list
with an array-list.

The array-list has local storage which does not require ANY additional
allocations if the list has small number of elements. The size of this
storage is configurable for each variable.

Benchmark results for "./run -1 shaders" from shader-db[1]:

- The total number of executed instructions goes down from 64.184 to 63.797
  giga-instructions when Mesa is compiled with "gcc -O0 ..."


Please don't do performance related decisions based on data from 
compiling code with optimizations disabled.  Use -O2 or -O3 (or even 
better, check both).




- In the call tree starting at function do_dead_code():
  - the number of calls to malloc() is reduced by about 10%
  - the number of calls to free() is reduced by about 30%

[1] git://anongit.freedesktop.org/mesa/shader-db



- Eero

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-17 Thread Dave Airlie
On 18 October 2016 at 08:45, Thomas Helland  wrote:
>
> 18. okt. 2016 00.07 skrev "Jan Ziak" <0xe2.0x9a.0...@gmail.com>:
>>
>> This patch replaces the ir_variable_refcount_entry's linked-list
>> with an array-list.
>>
>> The array-list has local storage which does not require ANY additional
>> allocations if the list has small number of elements. The size of this
>> storage is configurable for each variable.
>>
>> Benchmark results for "./run -1 shaders" from shader-db[1]:
>>
>> - The total number of executed instructions goes down from 64.184 to
>> 63.797
>>   giga-instructions when Mesa is compiled with "gcc -O0 ..."
>> - In the call tree starting at function do_dead_code():
>>   - the number of calls to malloc() is reduced by about 10%
>>   - the number of calls to free() is reduced by about 30%
>>
>> [1] git://anongit.freedesktop.org/mesa/shader-db
>
> Nice find. I would not be surprised if there are more cases of
> inappropriate data structures being used, causing overhead.
> I can't quite tell, as Gmail tends to mangle whitespace stuff,
> but it looks like there might be some style issues with
> not everything following the three-space indent, no tabs
> policy that mesa tries to stick to.
>
> On the subject of this patch, some thoughts that came to mind:
> I think it would be preferred if the implementation was in C.
> Using this in NIR, which tries to be C only, might be a possibility?
> I also seem to recall that I reviewed some patches that
> Jason wrote back a good while ago for NIR that implemented a
> dynamically resizing array of sorts that might be of interest.

Not sure but maybe look at the u_vector code I've extracted from the anv driver
and posted a few days ago.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-17 Thread
On Tue, Oct 18, 2016 at 12:45 AM, Thomas Helland
 wrote:
> I can't quite tell, as Gmail tends to mangle whitespace stuff,
> but it looks like there might be some style issues with
> not everything following the three-space indent, no tabs
> policy that mesa tries to stick to.

I followed the 3-space indent and no tabs rule in my source code.

The original (unpatched) file opt_dead_code.cpp needs some reformatting.

> On the subject of this patch, some thoughts that came to mind:
> I think it would be preferred if the implementation was in C.
> Using this in NIR, which tries to be C only, might be a possibility?

In the context of glsl, I would like to stick to C++. (Besides, if it
was up to my decision, I would convert many Mesa files from C to C++
because the latter is a slightly more productive programming language)

Mesa lacks a C++ implementation of a high-performance list backed an
array. I tried to use std::vector prior to implementing my own
arraylist, but std::vector's code is so complex that the C++
compiler has issues optimizing it.

The new arraylist type is very simple, so that the C++ compiler
can compile it quickly and can optimize it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-17 Thread Thomas Helland
18. okt. 2016 00.07 skrev "Jan Ziak" <0xe2.0x9a.0...@gmail.com>:
>
> This patch replaces the ir_variable_refcount_entry's linked-list
> with an array-list.
>
> The array-list has local storage which does not require ANY additional
> allocations if the list has small number of elements. The size of this
> storage is configurable for each variable.
>
> Benchmark results for "./run -1 shaders" from shader-db[1]:
>
> - The total number of executed instructions goes down from 64.184 to
63.797
>   giga-instructions when Mesa is compiled with "gcc -O0 ..."
> - In the call tree starting at function do_dead_code():
>   - the number of calls to malloc() is reduced by about 10%
>   - the number of calls to free() is reduced by about 30%
>
> [1] git://anongit.freedesktop.org/mesa/shader-db

Nice find. I would not be surprised if there are more cases of
inappropriate data structures being used, causing overhead.
I can't quite tell, as Gmail tends to mangle whitespace stuff,
but it looks like there might be some style issues with
not everything following the three-space indent, no tabs
policy that mesa tries to stick to.

On the subject of this patch, some thoughts that came to mind:
I think it would be preferred if the implementation was in C.
Using this in NIR, which tries to be C only, might be a possibility?
I also seem to recall that I reviewed some patches that
Jason wrote back a good while ago for NIR that implemented a
dynamically resizing array of sorts that might be of interest.

Just some random thoughts.
-Thomas

>
> Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0...@gmail.com
>
> ---
>  src/compiler/glsl/ir_variable_refcount.cpp |  14 +--
>  src/compiler/glsl/ir_variable_refcount.h   |   8 +-
>  src/compiler/glsl/opt_dead_code.cpp|  19 ++--
>  src/util/fast_list.h   | 167
+
>  4 files changed, 176 insertions(+), 32 deletions(-)
>
> diff --git a/src/compiler/glsl/ir_variable_refcount.cpp
b/src/compiler/glsl/ir_variable_refcount.cpp
> index 8306be1..94d6edc 100644
> --- a/src/compiler/glsl/ir_variable_refcount.cpp
> +++ b/src/compiler/glsl/ir_variable_refcount.cpp
> @@ -46,15 +46,6 @@ static void
>  free_entry(struct hash_entry *entry)
>  {
> ir_variable_refcount_entry *ivre = (ir_variable_refcount_entry *)
entry->data;
> -
> -   /* Free assignment list */
> -   exec_node *n;
> -   while ((n = ivre->assign_list.pop_head()) != NULL) {
> -  struct assignment_entry *assignment_entry =
> - exec_node_data(struct assignment_entry, n, link);
> -  free(assignment_entry);
> -   }
> -
> delete ivre;
>  }
>
> @@ -142,10 +133,7 @@
ir_variable_refcount_visitor::visit_leave(ir_assignment *ir)
> */
>assert(entry->referenced_count >= entry->assigned_count);
>if (entry->referenced_count == entry->assigned_count) {
> - struct assignment_entry *assignment_entry =
> -(struct assignment_entry *)calloc(1,
sizeof(*assignment_entry));
> - assignment_entry->assign = ir;
> - entry->assign_list.push_head(_entry->link);
> + entry->assign_list.add(ir);
>}
> }
>
> diff --git a/src/compiler/glsl/ir_variable_refcount.h
b/src/compiler/glsl/ir_variable_refcount.h
> index 08a11c0..c3ec5fe 100644
> --- a/src/compiler/glsl/ir_variable_refcount.h
> +++ b/src/compiler/glsl/ir_variable_refcount.h
> @@ -32,11 +32,7 @@
>  #include "ir.h"
>  #include "ir_visitor.h"
>  #include "compiler/glsl_types.h"
> -
> -struct assignment_entry {
> -   exec_node link;
> -   ir_assignment *assign;
> -};
> +#include "util/fast_list.h"
>
>  class ir_variable_refcount_entry
>  {
> @@ -50,7 +46,7 @@ public:
>  * This is intended to be used for dead code optimisation and may
>  * not be a complete list.
>  */
> -   exec_list assign_list;
> +   arraylist assign_list;
>
> /** Number of times the variable is referenced, including
assignments. */
> unsigned referenced_count;
> diff --git a/src/compiler/glsl/opt_dead_code.cpp
b/src/compiler/glsl/opt_dead_code.cpp
> index 75e668a..06e8c3d 100644
> --- a/src/compiler/glsl/opt_dead_code.cpp
> +++ b/src/compiler/glsl/opt_dead_code.cpp
> @@ -52,7 +52,7 @@ do_dead_code(exec_list *instructions, bool
uniform_locations_assigned)
>
> struct hash_entry *e;
> hash_table_foreach(v.ht, e) {
> -  ir_variable_refcount_entry *entry = (ir_variable_refcount_entry
*)e->data;
> +  ir_variable_refcount_entry *const entry =
(ir_variable_refcount_entry *)e->data;
>
>/* Since each assignment is a reference, the refereneced count
must be
> * greater than or equal to the assignment count.  If they are
equal,
> @@ -89,7 +89,7 @@ do_dead_code(exec_list *instructions, bool
uniform_locations_assigned)
>if (entry->var->data.always_active_io)
>   continue;
>
> -  if (!entry->assign_list.is_empty()) {
> +  if (!entry->assign_list.empty()) {
>  /* Remove all the dead assignments to the 

[Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

2016-10-17 Thread Jan Ziak
This patch replaces the ir_variable_refcount_entry's linked-list
with an array-list.

The array-list has local storage which does not require ANY additional
allocations if the list has small number of elements. The size of this
storage is configurable for each variable.

Benchmark results for "./run -1 shaders" from shader-db[1]:

- The total number of executed instructions goes down from 64.184 to 63.797
  giga-instructions when Mesa is compiled with "gcc -O0 ..."
- In the call tree starting at function do_dead_code():
  - the number of calls to malloc() is reduced by about 10%
  - the number of calls to free() is reduced by about 30%

[1] git://anongit.freedesktop.org/mesa/shader-db

Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0...@gmail.com>
---
 src/compiler/glsl/ir_variable_refcount.cpp |  14 +--
 src/compiler/glsl/ir_variable_refcount.h   |   8 +-
 src/compiler/glsl/opt_dead_code.cpp|  19 ++--
 src/util/fast_list.h   | 167 +
 4 files changed, 176 insertions(+), 32 deletions(-)

diff --git a/src/compiler/glsl/ir_variable_refcount.cpp 
b/src/compiler/glsl/ir_variable_refcount.cpp
index 8306be1..94d6edc 100644
--- a/src/compiler/glsl/ir_variable_refcount.cpp
+++ b/src/compiler/glsl/ir_variable_refcount.cpp
@@ -46,15 +46,6 @@ static void
 free_entry(struct hash_entry *entry)
 {
ir_variable_refcount_entry *ivre = (ir_variable_refcount_entry *) 
entry->data;
-
-   /* Free assignment list */
-   exec_node *n;
-   while ((n = ivre->assign_list.pop_head()) != NULL) {
-  struct assignment_entry *assignment_entry =
- exec_node_data(struct assignment_entry, n, link);
-  free(assignment_entry);
-   }
-
delete ivre;
 }
 
@@ -142,10 +133,7 @@ ir_variable_refcount_visitor::visit_leave(ir_assignment 
*ir)
*/
   assert(entry->referenced_count >= entry->assigned_count);
   if (entry->referenced_count == entry->assigned_count) {
- struct assignment_entry *assignment_entry =
-(struct assignment_entry *)calloc(1, sizeof(*assignment_entry));
- assignment_entry->assign = ir;
- entry->assign_list.push_head(_entry->link);
+ entry->assign_list.add(ir);
   }
}
 
diff --git a/src/compiler/glsl/ir_variable_refcount.h 
b/src/compiler/glsl/ir_variable_refcount.h
index 08a11c0..c3ec5fe 100644
--- a/src/compiler/glsl/ir_variable_refcount.h
+++ b/src/compiler/glsl/ir_variable_refcount.h
@@ -32,11 +32,7 @@
 #include "ir.h"
 #include "ir_visitor.h"
 #include "compiler/glsl_types.h"
-
-struct assignment_entry {
-   exec_node link;
-   ir_assignment *assign;
-};
+#include "util/fast_list.h"
 
 class ir_variable_refcount_entry
 {
@@ -50,7 +46,7 @@ public:
 * This is intended to be used for dead code optimisation and may
 * not be a complete list.
 */
-   exec_list assign_list;
+   arraylist assign_list;
 
/** Number of times the variable is referenced, including assignments. */
unsigned referenced_count;
diff --git a/src/compiler/glsl/opt_dead_code.cpp 
b/src/compiler/glsl/opt_dead_code.cpp
index 75e668a..06e8c3d 100644
--- a/src/compiler/glsl/opt_dead_code.cpp
+++ b/src/compiler/glsl/opt_dead_code.cpp
@@ -52,7 +52,7 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
 
struct hash_entry *e;
hash_table_foreach(v.ht, e) {
-  ir_variable_refcount_entry *entry = (ir_variable_refcount_entry 
*)e->data;
+  ir_variable_refcount_entry *const entry = (ir_variable_refcount_entry 
*)e->data;
 
   /* Since each assignment is a reference, the refereneced count must be
* greater than or equal to the assignment count.  If they are equal,
@@ -89,7 +89,7 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
   if (entry->var->data.always_active_io)
  continue;
 
-  if (!entry->assign_list.is_empty()) {
+  if (!entry->assign_list.empty()) {
 /* Remove all the dead assignments to the variable we found.
  * Don't do so if it's a shader or function output, though.
  */
@@ -98,26 +98,19 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
  entry->var->data.mode != ir_var_shader_out &&
  entry->var->data.mode != ir_var_shader_storage) {
 
-while (!entry->assign_list.is_empty()) {
-   struct assignment_entry *assignment_entry =
-  exec_node_data(struct assignment_entry,
- entry->assign_list.get_head_raw(), link);
-
-  assignment_entry->assign->remove();
-
+for(ir_assignment *assign : entry->assign_list) {
+  assign->remove();
   if (debug) {
  printf("Removed assignment to %s@%p\n",
 entry->var->name, (void *) entry->var);
}
-
-   assignment_entry->link.remove();
-   free(assignment_entry);
 }
+