[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97285

Matthew Dawson  changed:

   What|Removed |Added

 Status|NEEDINFO|REOPENED

--- Comment #4 from Matthew Dawson  ---
I can confirm seeing the same behaviour in TF2 on a 7970.  I suspect this is a
driver bug, and not a game bug, as the lighting sometimes works after I play
the game for a bit (but never at the start).  The lighting may break later
during the same session.  In the couple games where it fixed itself, it was
usually around the time this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=93649 occurred, though I don't
know if that is significant.

I independently bisected out the same commit as well.

I'll try to get an apitrace, see if that shows anything interesting.

Also, as another point of data, CS:GO seems to have no issues.  Would an
apitrace of it be useful?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Moving amdgpu/addrlib into a git submodule

2016-08-10 Thread Mathias Fröhlich
Hi,

On Wednesday, 10 August 2016 15:19:06 CEST Nicolai Hähnle wrote:
> On 09.08.2016 22:12, Dave Airlie wrote:
> >>
> >> tbh, git submodules are more annoying than they need to be, and I'm
> >> not really terribly excited to use that for something that is a build
> >> dependency.  Maybe just move it into libdrm instead?
> >>
> >
> > I've only had to use git submodules once with spice project, and it
> > was a nightmare. It makes packaging etc a real pita.
> 
> It would be nice to have something more concrete to go on, like what are 
> the problems _really_ :)
> 
> 
> > Alternatives are something like a fetch external sources script,
> > that does git submodules but does it better, you'll see Vulkan-CTS
> > etc use something like that, it would have to be integrated with
> > the build system a bit better though.
> 
> What pain do submodules give you that wouldn't be even _worse_ with a 
> separate script for fetching external sources?
> 
> 
> Look, there are a bunch of different use cases here. Distro packagers; 
> Mesa contributors and bleeding-edge users; and us (which is slightly 
> separate because we're not _just_ Mesa contributors but have parts of 
> the stack elsewhere, which is why this is coming up in the first place).
> 
> Most of the complaints seem to come from distro packaging, though I have 
> yet to understand what the exact problems are.
> 
> For Mesa contributors and users, I'd really like to avoid the hassle of 
> having to get another repository manually.
> 
> For us, in addition to avoiding the same hassles there's the question of 
> maintainability, and perhaps (with a huge load of optimism) getting the 
> closed source folks a bit more involved in the fact that there's another 
> world out here as well.
> 
> On those axes, I've seen one alternative suggestion that makes sense to 
> me, and that is subtree-merges. I have to play with them a bit, but the 
> initial impression is good. They seem like a more git-like solution. 
> That would also be something new for Mesa, though, since it would mean 
> more exceptions to the otherwise linear history.

For subtree-merge:
There is a pair of visualisation teams that used to work for the same company 
until 
they both got sold to individual vendors but still wanted to share their basic 
framwork 
code. Those two applications done by these teams are somewhere beyond - may be 
far beyond 1e6 loc each. The shared part is at least 2e5 loc, probably more. I 
do not 
recall the exact numbers. The shared part is communicated via a common upstream 
repository managed by git-subtree. There happens branching merging, rebasing, 
the 
usual git things. The git version that is used is ancient, the one from redhat 
6.
The git repository is based on an import from svn with non standard branch 
names in 
svn and that svn used to be imported from cvs. All together more than 10 years 
of 
history.
Observed problems: It needs special care to be handled. There were not so nice 
things 
that happened like disconnected git object trees. After installing a git hook 
in the 
subtree managed upstream repository that refuses such a push, disconnected 
object 
trees never happened again. Also checking out commits from the subtree side of 
a 
subtree merge gives interresting results. Well, plausible once you think about 
it, but 
probably surprising for the average git user. You will see a completely 
different 
filesystem structure, the one of the subtree you have.
The success side: A hand full years sharing > 2e5 loc where normal development 
happens. Probably some orders of magnitude more changes than I observed with 
addrlib since its creation. It basically does what you expect when you think 
about the 
problem to be solved. So, beside the problems there has been loads of expected 
and 
good behavior. Without subtree merge that undertaken would have been much more 
difficult to infeasible.

I do not think that I can give a recomendation. The basic constraints like the 
age of git 
used and the history of the repositry may introduce problems that you never 
observe 
with something simpler.

best
Mathias
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97231] GL_DEPTH_CLAMP doesn't clamp to the far plane

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97231

--- Comment #16 from Roland Scheidegger  ---
(In reply to Jules Blok from comment #15)
> Created attachment 125671 [details]
> apitrace file llvmpipe
> 
> Finally got an apitrace running on llvmpipe by using a VM. Sorry for the
> delay.

Thanks, works much better (although in turn it doesn't work on nvidia blob here
- looks like it doesn't like the offsets used for glBindBufferRange()
there...).

I might be able to look into in some days...

Although I really believe this to be a llvmpipe problem - if other mesa drivers
are affected by it too they probably have their own, different bugs with it.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97285

--- Comment #3 from Kenneth Graunke  ---
It might also be worth testing with and without multisampling...

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97285

--- Comment #2 from Kenneth Graunke  ---
On i965, both Dota 2 (Source 2) and Left 4 Dead 2 (Source 1) appear to be
working.  I took screenshots from the same point in a Dota 2 tournament replay
both with and without the sRGB patches, and they look identical.

So perhaps I just broke Gallium or radeonsi drivers?  Sorry about that :(

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl/tests: fix segfault in uniform initializer test

2016-08-10 Thread Aaron Watry
On Wed, Aug 10, 2016 at 7:34 PM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> Cause by 549222f5
>

s/Cause/Caused/ ?

Fixes the make check failure for me (haven't done a full piglit run, just
build/check/install test). Thanks for the quick turnaround.

--Aaron


>
> Cc: Aaron Watry 
> ---
>  src/compiler/glsl/tests/set_uniform_initializer_tests.cpp | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp
> b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp
> index a36ffdc..9d41017 100644
> --- a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp
> +++ b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp
> @@ -24,6 +24,7 @@
>  #include "main/compiler.h"
>  #include "main/mtypes.h"
>  #include "main/macros.h"
> +#include "program/hash_table.h"
>  #include "util/ralloc.h"
>  #include "uniform_initializer_utils.h"
>
> @@ -108,6 +109,7 @@ establish_uniform_storage(struct gl_shader_program
> *prog, unsigned num_storage,
>+ type->components()));
> const unsigned red_zone_components = total_components -
> data_components;
>
> +   prog->UniformHash = new string_to_uint_map;
> prog->UniformStorage = rzalloc_array(prog, struct gl_uniform_storage,
> num_storage);
> prog->NumUniformStorage = num_storage;
> @@ -128,6 +130,9 @@ establish_uniform_storage(struct gl_shader_program
> *prog, unsigned num_storage,
>  data_components,
>  red_zone_components);
>
> +   prog->UniformHash->put(index_to_set,
> +  prog->UniformStorage[index_to_set].name);
> +
> for (unsigned i = 0; i < num_storage; i++) {
>if (i == index_to_set)
>  continue;
> --
> 2.7.4
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl/tests: fix segfault in uniform initializer test

2016-08-10 Thread Timothy Arceri
Cause by 549222f5

Cc: Aaron Watry 
---
 src/compiler/glsl/tests/set_uniform_initializer_tests.cpp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp 
b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp
index a36ffdc..9d41017 100644
--- a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp
+++ b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp
@@ -24,6 +24,7 @@
 #include "main/compiler.h"
 #include "main/mtypes.h"
 #include "main/macros.h"
+#include "program/hash_table.h"
 #include "util/ralloc.h"
 #include "uniform_initializer_utils.h"
 
@@ -108,6 +109,7 @@ establish_uniform_storage(struct gl_shader_program *prog, 
unsigned num_storage,
   + type->components()));
const unsigned red_zone_components = total_components - data_components;
 
+   prog->UniformHash = new string_to_uint_map;
prog->UniformStorage = rzalloc_array(prog, struct gl_uniform_storage,
num_storage);
prog->NumUniformStorage = num_storage;
@@ -128,6 +130,9 @@ establish_uniform_storage(struct gl_shader_program *prog, 
unsigned num_storage,
 data_components,
 red_zone_components);
 
+   prog->UniformHash->put(index_to_set,
+  prog->UniformStorage[index_to_set].name);
+
for (unsigned i = 0; i < num_storage; i++) {
   if (i == index_to_set)
 continue;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Bisected: Make check is failing as of 549222f5

2016-08-10 Thread Aaron Watry
Hello,

I just got home and did a pull/build, and make check is failing as of a
very recent commit today...

${MESA_ROOT}/bin/test-driver: line 107:  5863 Segmentation fault  (core
dumped) "$@" > $log_file 2>&1
FAIL: glsl/tests/uniform-initializer-test
PASS: glsl/glcpp/tests/glcpp-test
^Cmake[6]: *** Deleting file 'glsl/glcpp/tests/glcpp-test-cr-lf.log'
Makefile:2442: recipe for target 'glsl/glcpp/tests/glcpp-test-cr-lf.log'
failed
make[6]: *** [glsl/glcpp/tests/glcpp-test-cr-lf.log] Error 130
Makefile:2414: recipe for target 'check-TESTS' failed
make[5]: *** [check-TESTS] Interrupt
Makefile:2546: recipe for target 'check-am' failed
make[4]: *** [check-am] Interrupt
Makefile:2549: recipe for target 'check' failed
make[3]: *** [check] Interrupt
Makefile:711: recipe for target 'check-recursive' failed
make[2]: *** [check-recursive] Interrupt
Makefile:860: recipe for target 'check' failed
make[1]: *** [check] Interrupt
Makefile:651: recipe for target 'check-recursive' failed
make: *** [check-recursive] Interrupt

me@hostname:${MESA_ROOT$ git bisect bad
549222f5f8ef4616f5e6ddeb5c29ea6446684e5e is the first bad commit
commit 549222f5f8ef4616f5e6ddeb5c29ea6446684e5e
Author: Timothy Arceri 
Date:   Wed Jul 27 15:20:44 2016 +1000

glsl: use UniformHash to find storage location

There is no need to be looping over all the uniforms.

Reviewed-by: Eric Anholt 

:04 04 86cc93603122405d5f03163d2abed6bac788e086
7567379ba933b6089ca1a34f8ba4c1fe8da04729 Msrc
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97285

Ian Romanick  changed:

   What|Removed |Added

 Status|NEW |NEEDINFO
 CC||i...@freedesktop.org,
   ||kenn...@whitecape.org

--- Comment #1 from Ian Romanick  ---
This will probably affect i965 too... do you have an apitrace or something at
the locations in question?  I know Ken tried some Source engine games before
pushing the series.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Ian Romanick
On 08/10/2016 12:50 PM, Eric Anholt wrote:
> Connor Abbott  writes:
> 
>> On Wed, Aug 10, 2016 at 1:53 PM, Eric Anholt  wrote:
>>> Kenneth Graunke  writes:
>>>
 On Haswell (GL 3.3):

 total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
 instructions in affected programs: 856541 -> 851642 (-0.57%)
 helped: 3157
 HURT: 113
 LOST:   7
 GAINED: 15

 On Broadwell (GL 4.4):

 total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
 instructions in affected programs: 1055693 -> 1049855 (-0.55%)
 helped: 3900
 HURT: 176
 LOST:   1
 GAINED: 18

 Signed-off-by: Kenneth Graunke 
 ---
  src/compiler/nir/nir_opt_algebraic.py | 4 
  1 file changed, 4 insertions(+)

 diff --git a/src/compiler/nir/nir_opt_algebraic.py 
 b/src/compiler/nir/nir_opt_algebraic.py
 index 1cf614c..4e9896f 100644
 --- a/src/compiler/nir/nir_opt_algebraic.py
 +++ b/src/compiler/nir/nir_opt_algebraic.py
 @@ -251,6 +251,10 @@ optimizations = [
 (('ieq', 'a@bool', False), ('inot', 'a')),
 (('bcsel', a, True, False), ('ine', a, 0)),
 (('bcsel', a, False, True), ('ieq', a, 0)),
 +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
 +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
 +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
 +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
>>>
>>> Isn't bcsel's first arg guaranteed to be 0 or ~0?  I thought that was
>>> how nir's bcsel (and b2f) worked.  If not, it would be good to see this
>>> documented.
>>
>> Yes, any ALU operation that takes a boolean input can expect it to be
>> 0 or ~0. If that isn't true, then something else in the compiler has
>> messed up. So I think we can replace 'a != 0' with 'a' and 'a == 0'
>> with '!a'.
> 
> Yeah.  So, I like the contents of this series a lot, but it would be
> nice to see some of the @bool annotations dropped.

Right... because ('b2f', 'a@bool') is redundant.  I agree.

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev




signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 02/13] glsl: remove dead builtins before assigning varying locations

2016-08-10 Thread Timothy Arceri
On Wed, 2016-08-10 at 00:01 -0700, Eric Anholt wrote:
> Timothy Arceri  writes:
> 
> > 
> > Builtins already have locations assigned so this shouldn't
> > changing anything. We want to call it earlier so we can tranform
> 
>   "change"
> 
> Other than that, 1-4 are:
> 
> Reviewed-by: Eric Anholt 
> 
> and I'll see if I can get to some more tomorrow.

Great. Thanks for taking a look :)

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97285

Bug ID: 97285
   Summary: Darkness in Dota 2 after Patch "Make Gallium's
BlitFramebuffer follow the GL 4.4 sRGB rules"
   Product: Mesa
   Version: unspecified
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: edmo...@eriadon.com
QA Contact: mesa-dev@lists.freedesktop.org

After updating to mesa master (6575ebdc), Dota 2 colors are much more darker.

>From the IRC log 2016-08-10 it seems haagch already bisected it:

15:53 #radeon: < haagch> anyway, am I the only one seeing some opengl games
being far too dark, like dota 2 and portal 2? 
16:03 #radeon: < nha> haagch, since when? maybe since those sRGB changes? 
16:38 #radeon: < haagch> nha: you were right,
3190c7ee9727161d627f107c2e7f8ec3a11941c1 is the first bad commit 

I can confirm that reverting patch 3190c7ee9727161d627f107c2e7f8ec3a11941c1
takes away the darkness in Dota 2.

My configuration:
OpenGL renderer string: Gallium 0.4 on AMD CYPRESS (DRM 2.46.0 / 4.8.0-rc1,
LLVM 3.8.1)
OpenGL core profile version string: 4.1 (Core Profile) Mesa 12.1.0-devel
(git-796d760)
OpenGL core profile shading language version string: 4.10

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/dri2: dri2_make_current: Release previous context's display

2016-08-10 Thread Nicolas Boichat
On Wed, Aug 10, 2016 at 9:44 AM, Michel Dänzer  wrote:
> On 10/08/16 03:00 PM, Nicolas Boichat wrote:
>> eglMakeCurrent can also be used to change the active display. In that
>> case, we need to decrement ref_count of the previous display (possibly
>> destroying it), and increment it on the next display.
>>
>> Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so
>> we only need to test if old_ctx is non-NULL.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214
>> Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
>> Cc: "12.0" 
>> Reported-by: Alexandr Zelinsky 
>> Tested-by: Alexandr Zelinsky 
>> Signed-off-by: Nicolas Boichat 
>> ---
>>  src/egl/drivers/dri2/egl_dri2.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/egl/drivers/dri2/egl_dri2.c 
>> b/src/egl/drivers/dri2/egl_dri2.c
>> index 3205a36..701e42a 100644
>> --- a/src/egl/drivers/dri2/egl_dri2.c
>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>> @@ -1285,8 +1285,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
>> _EGLSurface *dsurf,
>>
>>if (!unbind)
>>   dri2_dpy->ref_count++;
>> -  if (old_dsurf || old_rsurf || old_ctx)
>> - dri2_display_release(disp);
>> +  if (old_ctx) {
>> + EGLDisplay old_disp = 
>> _eglGetDisplayHandle(old_ctx->Resource.Display);
>> + dri2_display_release(old_disp);
>> +  }
>
> Unfortunately, this change breaks the piglit test "spec@egl
> 1.4@eglterminate then unbind context", because old_ctx != NULL but
> old_ctx->Resource.Display == NULL. Modifying the test to
>
>   if (old_ctx && old_ctx->Resource.Display) {
>
> fixes the regression and doesn't seem to cause any other problems.

This is probably wrong as the display will leak (it definitely should
be freed after calling eglTerminate + eglMakeCurrent(NULL)).

I think I know where the problem is (the call to
_eglReleaseDisplayResources happens too early), I'm not sure what's
the best solution. I'll have a look.

> Alexandr, does the patch still fix your problem with that modification?
>
>
> Nicolas, this regression is also reproducible with
> LIBGL_ALWAYS_SOFTWARE=1 . Please get used to testing your changes like
> that and only send out changes for review which don't cause any regressions.

Managed to build piglit, and run it using a locally built mesa. Are
the commands below what you would use?

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MESA_DIR/lib
export LIBGL_DRIVERS_PATH=$MESA_DIR/lib/gallium
export EGL_DRIVERS_PATH=$MESA_DIR/lib
export EGL_LOG_LEVEL=debug
export LIBGL_ALWAYS_SOFTWARE=1
./piglit run -p x11_egl -t "spec@egl.*" all results/master

Best,

Nicolas
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Connor Abbott
On Wed, Aug 10, 2016 at 2:24 PM, Kenneth Graunke  wrote:
> On Wednesday, August 10, 2016 10:02:12 AM PDT Erik Faye-Lund wrote:
>> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke  
>> wrote:
>> > On Haswell (GL 3.3):
>> >
>> > total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
>> > instructions in affected programs: 856541 -> 851642 (-0.57%)
>> > helped: 3157
>> > HURT: 113
>> > LOST:   7
>> > GAINED: 15
>> >
>> > On Broadwell (GL 4.4):
>> >
>> > total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
>> > instructions in affected programs: 1055693 -> 1049855 (-0.55%)
>> > helped: 3900
>> > HURT: 176
>> > LOST:   1
>> > GAINED: 18
>> >
>> > Signed-off-by: Kenneth Graunke 
>> > ---
>> >  src/compiler/nir/nir_opt_algebraic.py | 4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>> > b/src/compiler/nir/nir_opt_algebraic.py
>> > index 1cf614c..4e9896f 100644
>> > --- a/src/compiler/nir/nir_opt_algebraic.py
>> > +++ b/src/compiler/nir/nir_opt_algebraic.py
>> > @@ -251,6 +251,10 @@ optimizations = [
>> > (('ieq', 'a@bool', False), ('inot', 'a')),
>> > (('bcsel', a, True, False), ('ine', a, 0)),
>> > (('bcsel', a, False, True), ('ieq', a, 0)),
>> > +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
>> > +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
>> > +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
>> > +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
>> > (('bcsel', True, b, c), b),
>> > (('bcsel', False, b, c), c),
>> > # The result of this should be hit by constant propagation and, in the
>>
>> Same as the previous patch, this smells like intel-isms. Hardware that
>> has native bcsel with support for two inline immediates will do better
>> without.
>
> It definitely feels a little strange replacing a single bcsel with
> a fneg/b2f/ine, as that's three operations instead of one.
>
> I expect the ine to go away - assuming 'a' is a properly formatted
> boolean (0 or 0x), "ine a 0" will just become 'a'.  ieq would
> turn into inot.

I made the point in another place in this thread, but you don't have
to assume -- things that take booleans as input should only ever get 0
or ~0. Anything else would be a bug in the compiler. Now that I have a
little more free time before school starts, I've been thinking about
making booleans have a bit-width of 1 (i.e. making them logical) and
then only lowering to their concrete representation very late, if at
all. That would keep this confusion from happening, in addition to
bringing a lot of other benefits, and it should be a lot easier now
that we have bit widths in NIR.

> If the boolean was a comparison, the inot could be
> folded in - i.e. inot(flt(a,b)) -> fge(a,b).  Or, some GPUs can handle
> boolean negation as a source modifier, so it might be free there too.
>
> Floating point negation can usually be done as a source modifier.
>
> For reference, here's a shader snippet from Goat Simulator which
> prompted me to write this optimization:
>
> const vec4 LocalConst1 = vec4(0.25, -0.25, 0.00, 1.00);
>
> void main()
> {
> ...
> InstrHelpTemp.r = ( ( Temporary1.r >= 0.0 ) ? LocalConst1.b : LocalConst1.a );
> ...
> }
>
> which could be turned into
>
> InstrHelpTemp.r = float(Temporary1.r < 0.0);
>
> which seems arguably better, regardless of hardware.
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 00/16] radeonsi: improve handling of temporary arrays

2016-08-10 Thread Marek Olšák
On Wed, Aug 10, 2016 at 9:23 PM, Nicolai Hähnle  wrote:
> Hi,
>
> this is a respin of the series which scans the shader's TGSI to determine
> which channels of an array are actually written to. Most of the st/mesa
> changes have become unnecessary. Most of the radeon-specific part stays
> the same.
>
> For one F1 2015 shader, it reduces the scratch size from 132096 to 26624
> bytes, which is bound to be much nicer on the texture cache.

This has been bugging me... is there something we can do to move
temporary arrays to registers?

F1 2015 is the only game that doesn't "spill VGPRs", yet has the
highest scratch usage per shader. (without this series)

If a shader uses 32 VGPRs and a *ton* of scratch space, you know
something is wrong.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] aubinator: Fix the tool to correctly decode the DWords

2016-08-10 Thread Matt Turner
On Tue, Aug 9, 2016 at 4:52 PM, Sirisha Gandikota
 wrote:
> From: Sirisha Gandikota 
>
> Several fixes have been added as part of this as listed below:
>
> 1) Fix the mask and add disassembler handling for STATE_DS, STATE_HS
> as the mask returned wrong values of the fields.
>
> 2) Fix the GEN_TYPE_ADDRESS/GEN_TYPE_OFFSET decoding - the address/
> offset were handled the same way as the other fields and that gives
> the wrong values for the address/offset.
>
> 3) Decode nested/recurssive structures - Many packets contain nested
> structures, ex: 3DSATE_SO_BUFFER, STATE_BASE_ADDRESS, etc contain MOC
> structures. Previously, the aubinator printed 1 if there was a MOC
> structure. Now we decode the entire structure and print out its fields.
>
> 4) Print out the DWord address along with its hex value - For a better
> clarity of information, it is helpful to print both the address and
> hex value of the DWord along with the DWord count. Since the DWord0
> contains the instruction code and the instruction length, it is
> unnecessary to print the decoded values for DWord0. This information
> is already available from the DWord hex value.
>
> 5) Decode the  and the corresponding fields in the group- The
>  tag can have fields of several types including structures. A
> group can contain one or more number of fields and this has be correctly
> decoded. Previously, aubinator did not decode the groups or the
> fields/structures inside them. Now we decode the  in the
> instructions and structures where the fields in it repeat for any number
> of times specified.
>
> Signed-off-by: Sirisha Gandikota 
> ---
>  src/intel/tools/aubinator.c | 115 
> +---
>  src/intel/tools/decoder.c   |  95 +---
>  src/intel/tools/decoder.h   |  39 +++
>  3 files changed, 192 insertions(+), 57 deletions(-)
>
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 99d67a1..d2f9d13 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -84,17 +84,80 @@ valid_offset(uint32_t offset)
>  }
>
>  static void
> +print_dword_val(struct gen_field_iterator *iter, uint64_t offset, int 
> *dword_num)
> +{
> +   struct gen_field *f;
> +   union {
> +  uint32_t dw;
> +  float f;
> +   } v;
> +
> +   f = iter->group->fields[iter->i-1];

Spaces around operators. Occurs many places.

> +   v.dw = iter->p[f->start / 32];
> +
> +   if (*dword_num != (f->start/32)) {
> +  printf("0x%08lx:  0x%08x : Dword %d\n",(offset+4*(f->start/32)), v.dw, 
> f->start/32);
> +  *dword_num = (f->start/32);
> +   }
> +}
> +
> +static char *
> +print_iterator_values(struct gen_field_iterator *iter, int *idx)
> +{
> +char *token = NULL;
> +if (strstr(iter->value,"struct") == NULL) {
> +printf("%s: %s\n", iter->name, iter->value);
> +} else {
> +token = strtok(iter->value, " ");
> +if (token != NULL) {
> +token = strtok(NULL, " ");
> +*idx = atoi(strtok(NULL, ">"));
> +} else {
> +token = NULL;
> +}
> +printf("%s:\n", iter->name, token);
> +}
> +return token;
> +

Extra new line.

> +}
> +
> +static void
>  decode_structure(struct gen_spec *spec, struct gen_group *strct, const 
> uint32_t *p)
>  {
> struct gen_field_iterator iter;
> +   char *token = NULL;
> +   int idx = 0, dword_num = 0;
> +   uint64_t offset = 0;
> +
> +   if (option_print_offsets)
> +  offset = (void *) p - gtt;
> +   else
> +  offset = 0;
>
> gen_field_iterator_init(, strct, p);
> while (gen_field_iterator_next()) {
> -  printf("%s: %s\n", iter.name, iter.value);
> +  idx = 0;
> +  print_dword_val(, offset, _num);
> +  token = print_iterator_values(, );
> +  if (token != NULL) {
> + struct gen_group *struct_val = gen_spec_find_struct(spec, token);
> + decode_structure(spec, struct_val, [idx]);
> + token = NULL;
> +  }
> }
>  }
>
>  static void
> +handle_struct_decode(struct gen_spec *spec, char *struct_name, uint32_t *p)
> +{
> +if (struct_name == NULL)
> +return;
> +struct gen_group *struct_val = gen_spec_find_struct(spec, struct_name);
> +decode_structure(spec, struct_val, p);
> +

Extra newline.

> +}
> +
> +static void
>  dump_binding_table(struct gen_spec *spec, uint32_t offset)
>  {
> uint32_t *pointers, i;
> @@ -248,7 +311,8 @@ handle_media_interface_descriptor_load(struct gen_spec 
> *spec, uint32_t *p)
>  }
>
>  /* Heuristic to determine whether a uint32_t is probably actually a float
> - * (http://stackoverflow.com/a/2953466) */
> + * (http://stackoverflow.com/a/2953466)
> + */
>
>  static bool
>  probably_float(uint32_t bits)
> @@ -256,15 +320,15 @@ probably_float(uint32_t bits)
> int exp = ((bits & 0x7f80U) >> 23) - 127;
> uint32_t mant = 

Re: [Mesa-dev] [PATCH 1/2] aubinator: Add a new tool called Aubinator to the src/intel/tools folder.

2016-08-10 Thread Matt Turner
On Tue, Aug 9, 2016 at 4:52 PM, Sirisha Gandikota
 wrote:
> From: Kristian Høgsberg Kristensen 
>
> The Aubinator tool is designed to help the driver developers in debugging
> the driver functionality by decoding the data in the .aub files.
> Primary Authors of this tool are Damien Lespiau 
> and Kristian Høgsberg Kristensen .
>
> Signed-off-by: Sirisha Gandikota 
> ---
>  configure.ac |1 +
>  src/Makefile.am  |4 +
>  src/intel/tools/Makefile.am  |   65 +++
>  src/intel/tools/aubinator.c  | 1039 
> ++
>  src/intel/tools/decoder.c|  520 +
>  src/intel/tools/decoder.h|   57 +++
>  src/intel/tools/disasm.c |  109 +
>  src/intel/tools/gen_disasm.h |   35 ++
>  src/intel/tools/intel_aub.h  |  154 +++
>  9 files changed, 1984 insertions(+)
>  create mode 100644 src/intel/tools/Makefile.am
>  create mode 100644 src/intel/tools/aubinator.c
>  create mode 100644 src/intel/tools/decoder.c
>  create mode 100644 src/intel/tools/decoder.h
>  create mode 100644 src/intel/tools/disasm.c
>  create mode 100644 src/intel/tools/gen_disasm.h
>  create mode 100644 src/intel/tools/intel_aub.h
>
> diff --git a/configure.ac b/configure.ac
> index aea5890..acc8356 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2742,6 +2742,7 @@ AC_CONFIG_FILES([Makefile
> src/intel/genxml/Makefile
> src/intel/isl/Makefile
> src/intel/vulkan/Makefile
> +   src/intel/tools/Makefile

This list is alphabetical. tools before vulkan.

> src/loader/Makefile
> src/mapi/Makefile
> src/mapi/es1api/glesv1_cm.pc
> diff --git a/src/Makefile.am b/src/Makefile.am
> index d4e34b4..190ad08 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -88,6 +88,10 @@ if HAVE_INTEL_VULKAN
>  SUBDIRS += intel/vulkan
>  endif
>
> +if HAVE_INTEL_DRIVERS
> +SUBDIRS += intel/tools
> +endif
> +
>  if HAVE_GALLIUM
>  SUBDIRS += gallium
>  endif
> diff --git a/src/intel/tools/Makefile.am b/src/intel/tools/Makefile.am
> new file mode 100644
> index 000..cf5477f
> --- /dev/null
> +++ b/src/intel/tools/Makefile.am
> @@ -0,0 +1,65 @@
> +# Copyright © 2016 Intel Corporation
> +#
> +# Permission is hereby granted, free of charge, to any person obtaining a
> +# copy of this software and associated documentation files (the "Software"),
> +# to deal in the Software without restriction, including without limitation
> +# the rights to use, copy, modify, merge, publish, distribute, sublicense,
> +# and/or sell copies of the Software, and to permit persons to whom the
> +# Software is furnished to do so, subject to the following conditions:
> +#
> +# The above copyright notice and this permission notice (including the next
> +# paragraph) shall be included in all copies or substantial portions of the
> +# Software.
> +#
> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> +# IN THE SOFTWARE.
> +
> +CLEANFILES = \
> +   aubinator-aubinator.o \
> +   aubinator-decoder.o \
> +   disasm.lo \
> +   libdisasm.la
> +
> +noinst_LTLIBRARIES = libdisasm.la
> +
> +AM_CPPFLAGS = \
> +   $(INTEL_CFLAGS) \
> +   $(VALGRIND_CFLAGS) \
> +   $(DEFINES) \
> +   -I$(top_srcdir)/include \
> +   -I$(top_builddir)/src \
> +   -I$(top_srcdir)/src \
> +   -I$(top_builddir)/src/compiler \
> +   -I$(top_srcdir)/src/compiler \
> +   -I$(top_builddir)/src/compiler/nir \
> +   -I$(top_srcdir)/src/mapi \
> +   -I$(top_srcdir)/src/mesa \
> +   -I$(top_srcdir)/src/mesa/drivers/dri/common \
> +   -I$(top_srcdir)/src/mesa/drivers/dri/i965 \
> +   -I$(top_srcdir)/src/gallium/auxiliary \
> +   -I$(top_srcdir)/src/gallium/include \
> +   -I$(top_builddir)/src/intel \
> +   -I$(top_srcdir)/src/intel
> +
> +libdisasm_la_SOURCES = \
> +   gen_disasm.h \
> +   disasm.c
> +libdisasm_la_LIBADD = 
> $(top_builddir)/src/mesa/drivers/dri/i965/libi965_compiler.la \
> +   $(top_builddir)/src/mesa/libmesa.la \
> +   $(PER_GEN_LIBS) \
> +   $(PTHREAD_LIBS) \
> +   $(DLOPEN_LIBS) \
> +   -lm
> +
> +
> +bin_PROGRAMS = aubinator

Point of discussion: do we actually want to install this to the users' systems?

> +
> +aubinator_SOURCES = intel_aub.h decoder.c decoder.h aubinator.c
> +aubinator_CFLAGS =  $(EXPAT_CFLAGS) 

Re: [Mesa-dev] [PATCH 2/2] aubinator: Fix the tool to correctly decode the DWords

2016-08-10 Thread Jason Ekstrand
On Tue, Aug 9, 2016 at 4:52 PM, Sirisha Gandikota <
sirisha.gandik...@intel.com> wrote:

> From: Sirisha Gandikota 
>
> Several fixes have been added as part of this as listed below:
>
> 1) Fix the mask and add disassembler handling for STATE_DS, STATE_HS
> as the mask returned wrong values of the fields.
>
> 2) Fix the GEN_TYPE_ADDRESS/GEN_TYPE_OFFSET decoding - the address/
> offset were handled the same way as the other fields and that gives
> the wrong values for the address/offset.
>
> 3) Decode nested/recurssive structures - Many packets contain nested
> structures, ex: 3DSATE_SO_BUFFER, STATE_BASE_ADDRESS, etc contain MOC
> structures. Previously, the aubinator printed 1 if there was a MOC
> structure. Now we decode the entire structure and print out its fields.
>
> 4) Print out the DWord address along with its hex value - For a better
> clarity of information, it is helpful to print both the address and
> hex value of the DWord along with the DWord count. Since the DWord0
> contains the instruction code and the instruction length, it is
> unnecessary to print the decoded values for DWord0. This information
> is already available from the DWord hex value.
>
> 5) Decode the  and the corresponding fields in the group- The
>  tag can have fields of several types including structures. A
> group can contain one or more number of fields and this has be correctly
> decoded. Previously, aubinator did not decode the groups or the
> fields/structures inside them. Now we decode the  in the
> instructions and structures where the fields in it repeat for any number
> of times specified.
>

One comment for now:  Usually, when making a bunch of changes like this, we
try and split each functional change into its own patch.  That way it's
more clear what's going on.  This is brand new code, so a couple giant
patches may be ok, but having it split up would be nicer. :)  Git has some
tools that make this less painful than it might look and I'm available to
help if you'd like.

I'll try and get around to actually playing with it soon.
--Jason


>
> Signed-off-by: Sirisha Gandikota 
> ---
>  src/intel/tools/aubinator.c | 115 ++
> +++---
>  src/intel/tools/decoder.c   |  95 +---
>  src/intel/tools/decoder.h   |  39 +++
>  3 files changed, 192 insertions(+), 57 deletions(-)
>
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 99d67a1..d2f9d13 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -84,17 +84,80 @@ valid_offset(uint32_t offset)
>  }
>
>  static void
> +print_dword_val(struct gen_field_iterator *iter, uint64_t offset, int
> *dword_num)
> +{
> +   struct gen_field *f;
> +   union {
> +  uint32_t dw;
> +  float f;
> +   } v;
> +
> +   f = iter->group->fields[iter->i-1];
> +   v.dw = iter->p[f->start / 32];
> +
> +   if (*dword_num != (f->start/32)) {
> +  printf("0x%08lx:  0x%08x : Dword %d\n",(offset+4*(f->start/32)),
> v.dw, f->start/32);
> +  *dword_num = (f->start/32);
> +   }
> +}
> +
> +static char *
> +print_iterator_values(struct gen_field_iterator *iter, int *idx)
> +{
> +char *token = NULL;
> +if (strstr(iter->value,"struct") == NULL) {
> +printf("%s: %s\n", iter->name, iter->value);
> +} else {
> +token = strtok(iter->value, " ");
> +if (token != NULL) {
> +token = strtok(NULL, " ");
> +*idx = atoi(strtok(NULL, ">"));
> +} else {
> +token = NULL;
> +}
> +printf("%s:\n", iter->name, token);
> +}
> +return token;
> +
> +}
> +
> +static void
>  decode_structure(struct gen_spec *spec, struct gen_group *strct, const
> uint32_t *p)
>  {
> struct gen_field_iterator iter;
> +   char *token = NULL;
> +   int idx = 0, dword_num = 0;
> +   uint64_t offset = 0;
> +
> +   if (option_print_offsets)
> +  offset = (void *) p - gtt;
> +   else
> +  offset = 0;
>
> gen_field_iterator_init(, strct, p);
> while (gen_field_iterator_next()) {
> -  printf("%s: %s\n", iter.name, iter.value);
> +  idx = 0;
> +  print_dword_val(, offset, _num);
> +  token = print_iterator_values(, );
> +  if (token != NULL) {
> + struct gen_group *struct_val = gen_spec_find_struct(spec, token);
> + decode_structure(spec, struct_val, [idx]);
> + token = NULL;
> +  }
> }
>  }
>
>  static void
> +handle_struct_decode(struct gen_spec *spec, char *struct_name, uint32_t
> *p)
> +{
> +if (struct_name == NULL)
> +return;
> +struct gen_group *struct_val = gen_spec_find_struct(spec,
> struct_name);
> +decode_structure(spec, struct_val, p);
> +
> +}
> +
> +static void
>  dump_binding_table(struct gen_spec *spec, uint32_t offset)
>  {
> uint32_t *pointers, i;
> @@ -248,7 +311,8 @@ handle_media_interface_descriptor_load(struct
> 

Re: [Mesa-dev] [PATCH 4/6] nir: Introduce a nir_opt_move_comparisons() pass.

2016-08-10 Thread Jason Ekstrand
On Wed, Aug 10, 2016 at 11:14 AM, Ian Romanick  wrote:

> On 08/09/2016 07:30 PM, Kenneth Graunke wrote:
> > This tries to move comparisons (a common source of boolean values)
> > closer to their first use.  For GPUs which use condition codes,
> > this can eliminate a lot of temporary booleans and comparisons
> > which reload the condition code register based on a boolean.
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/compiler/Makefile.sources   |   1 +
> >  src/compiler/nir/nir.h  |   2 +
> >  src/compiler/nir/nir_opt_move_comparisons.c | 173
> 
> >  3 files changed, 176 insertions(+)
> >  create mode 100644 src/compiler/nir/nir_opt_move_comparisons.c
> >
> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.
> sources
> > index 0ff9b23..008a101 100644
> > --- a/src/compiler/Makefile.sources
> > +++ b/src/compiler/Makefile.sources
> > @@ -226,6 +226,7 @@ NIR_FILES = \
> >   nir/nir_opt_gcm.c \
> >   nir/nir_opt_global_to_local.c \
> >   nir/nir_opt_peephole_select.c \
> > + nir/nir_opt_move_comparisons.c \
> >   nir/nir_opt_remove_phis.c \
> >   nir/nir_opt_undef.c \
> >   nir/nir_phi_builder.c \
> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > index 9ce5be2..79511a7 100644
> > --- a/src/compiler/nir/nir.h
> > +++ b/src/compiler/nir/nir.h
> > @@ -2574,6 +2574,8 @@ bool nir_opt_dead_cf(nir_shader *shader);
> >
> >  void nir_opt_gcm(nir_shader *shader);
> >
> > +bool nir_opt_move_comparisons(nir_shader *shader);
> > +
> >  bool nir_opt_peephole_select(nir_shader *shader);
> >
> >  bool nir_opt_remove_phis(nir_shader *shader);
> > diff --git a/src/compiler/nir/nir_opt_move_comparisons.c
> b/src/compiler/nir/nir_opt_move_comparisons.c
> > new file mode 100644
> > index 000..74927c9
> > --- /dev/null
> > +++ b/src/compiler/nir/nir_opt_move_comparisons.c
> > @@ -0,0 +1,173 @@
> > +/*
> > + * Copyright © 2016 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> obtaining a
> > + * copy of this software and associated documentation files (the
> "Software"),
> > + * to deal in the Software without restriction, including without
> limitation
> > + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the
> next
> > + * paragraph) shall be included in all copies or substantial portions
> of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> > + * IN THE SOFTWARE.
> > + */
> > +
> > +#include "nir.h"
> > +
> > +/**
> > + * \file nir_opt_move_comparisons.c
> > + *
> > + * This pass moves ALU comparison operations just before their first
> use.
> > + *
> > + * It only moves instructions within a single basic block; cross-block
> > + * movement is left to global code motion.
> > + *
> > + * Many GPUs generate condition codes for comparisons, and use
> predication
> > + * for conditional selects and control flow.  In a sequence such as:
> > + *
> > + * vec1 32 ssa_1 = flt a b
> > + * 
> > + * vec1 32 ssa_2 = bcsel ssa_1 c d
> > + *
> > + * the backend would likely do the comparison, producing condition
> codes,
> > + * then save those to a boolean value.  The intervening operations might
> > + * trash the condition codes.  Then, in order to do the bcsel, it would
> > + * need to re-populate the condition code register based on the boolean.
> > + *
> > + * By moving the comparison just before the bcsel, the condition codes
> could
> > + * be used directly.  This eliminates the need to reload them from the
> boolean
> > + * (generally eliminating an instruction).  It may also eliminate the
> need to
> > + * create a boolean value altogether (unless it's used elsewhere),
> which could
> > + * lower register pressure.
> > + */
> > +
> > +static bool
> > +is_comparison(nir_op op)
> > +{
> > +   switch (op) {
> > +   case nir_op_flt:
> > +   case nir_op_fge:
> > +   case nir_op_feq:
> > +   case nir_op_fne:
> > +   case nir_op_ilt:
> > +   case nir_op_ult:
> > +   case nir_op_ige:
> > +   case nir_op_uge:
> > +   case nir_op_ieq:
> > +   case nir_op_ine:
> > +  return true;
> > +   default:
> > +  return false;
> > +   }
> > +}
> > +
> > +static bool
> 

Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Eric Anholt
Connor Abbott  writes:

> On Wed, Aug 10, 2016 at 1:53 PM, Eric Anholt  wrote:
>> Kenneth Graunke  writes:
>>
>>> On Haswell (GL 3.3):
>>>
>>> total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
>>> instructions in affected programs: 856541 -> 851642 (-0.57%)
>>> helped: 3157
>>> HURT: 113
>>> LOST:   7
>>> GAINED: 15
>>>
>>> On Broadwell (GL 4.4):
>>>
>>> total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
>>> instructions in affected programs: 1055693 -> 1049855 (-0.55%)
>>> helped: 3900
>>> HURT: 176
>>> LOST:   1
>>> GAINED: 18
>>>
>>> Signed-off-by: Kenneth Graunke 
>>> ---
>>>  src/compiler/nir/nir_opt_algebraic.py | 4 
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>>> b/src/compiler/nir/nir_opt_algebraic.py
>>> index 1cf614c..4e9896f 100644
>>> --- a/src/compiler/nir/nir_opt_algebraic.py
>>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>>> @@ -251,6 +251,10 @@ optimizations = [
>>> (('ieq', 'a@bool', False), ('inot', 'a')),
>>> (('bcsel', a, True, False), ('ine', a, 0)),
>>> (('bcsel', a, False, True), ('ieq', a, 0)),
>>> +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
>>> +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
>>> +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
>>> +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
>>
>> Isn't bcsel's first arg guaranteed to be 0 or ~0?  I thought that was
>> how nir's bcsel (and b2f) worked.  If not, it would be good to see this
>> documented.
>
> Yes, any ALU operation that takes a boolean input can expect it to be
> 0 or ~0. If that isn't true, then something else in the compiler has
> messed up. So I think we can replace 'a != 0' with 'a' and 'a == 0'
> with '!a'.

Yeah.  So, I like the contents of this series a lot, but it would be
nice to see some of the @bool annotations dropped.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swrast: fix active attribs with atifragshader

2016-08-10 Thread Miklós Máté

On 08/04/2016 06:51 PM, Brian Paul wrote:

On 08/04/2016 10:40 AM, Miklós Máté wrote:

On 07/14/2016 10:43 PM, Miklós Máté wrote:

On 07/07/2016 07:11 PM, Miklós Máté wrote:

On 06/26/2016 09:48 PM, Miklós Máté wrote:

Only include the ones that can be used by the shader.

This fixes texture coordinates, which were completely wrong,
because WPOS was included in the list of attribs. It also
increases performance noticeably.

Signed-off-by: Miklós Máté 
---
  src/mesa/swrast/s_context.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/swrast/s_context.c 
b/src/mesa/swrast/s_context.c

index 0a5fc7e..a63179c 100644
--- a/src/mesa/swrast/s_context.c
+++ b/src/mesa/swrast/s_context.c
@@ -504,7 +504,8 @@ _swrast_update_active_attribs(struct gl_context
*ctx)
attribsMask &= ~VARYING_BIT_POS; /* WPOS is always handled
specially */
 }
 else if (ctx->ATIFragmentShader._Enabled) {
-  attribsMask = ~0;  /* XXX fix me */
+  attribsMask = VARYING_BIT_COL0 | VARYING_BIT_COL1 |
+VARYING_BIT_FOGC | VARYING_BITS_TEX_ANY;
 }
 else {
/* fixed function */


ping?


ping?


Can anyone please review this?


I think few people know swrast and this particular extension.

Looks OK to me though.

Reviewed-by: Brian Paul 

Thanks for the review. If there are no objections, I need somebody to 
commit this for me, because I have no access.


MM

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/7] gallium/auxiliary: Add u_bitcast.h header.

2016-08-10 Thread Matt Turner
On Thu, Jul 28, 2016 at 9:55 PM, Connor Abbott  wrote:
> On Thu, Jul 28, 2016 at 7:58 PM, Roland Scheidegger  
> wrote:
>> Am 29.07.2016 um 00:35 schrieb Matt Turner:
>>> ---
>>>  src/gallium/auxiliary/Makefile.sources |  1 +
>>>  src/gallium/auxiliary/util/u_bitcast.h | 57 
>>> ++
>>>  2 files changed, 58 insertions(+)
>>>  create mode 100644 src/gallium/auxiliary/util/u_bitcast.h
>>>
>>> diff --git a/src/gallium/auxiliary/Makefile.sources 
>>> b/src/gallium/auxiliary/Makefile.sources
>>> index e0311bf..26f7eee 100644
>>> --- a/src/gallium/auxiliary/Makefile.sources
>>> +++ b/src/gallium/auxiliary/Makefile.sources
>>> @@ -175,6 +175,7 @@ C_SOURCES := \
>>>   translate/translate_generic.c \
>>>   translate/translate_sse.c \
>>>   util/dbghelp.h \
>>> + util/u_bitcast.h \
>>>   util/u_bitmask.c \
>>>   util/u_bitmask.h \
>>>   util/u_blend.h \
>>> diff --git a/src/gallium/auxiliary/util/u_bitcast.h 
>>> b/src/gallium/auxiliary/util/u_bitcast.h
>>> new file mode 100644
>>> index 000..b1f9938
>>> --- /dev/null
>>> +++ b/src/gallium/auxiliary/util/u_bitcast.h
>>> @@ -0,0 +1,57 @@
>>> +/**
>>> + *
>>> + * Copyright © 2016 Intel Corporation
>>> + * All Rights Reserved.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the
>>> + * "Software"), to deal in the Software without restriction, including
>>> + * without limitation the rights to use, copy, modify, merge, publish,
>>> + * distribute, sub license, and/or sell copies of the Software, and to
>>> + * permit persons to whom the Software is furnished to do so, subject to
>>> + * the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the
>>> + * next paragraph) shall be included in all copies or substantial portions
>>> + * of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
>>> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
>>> + * IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
>>> + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
>>> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>>> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + 
>>> **/
>>> +
>>> +#ifndef U_BITCAST_H_
>>> +#define U_BITCAST_H_
>>> +
>>> +#include 
>>> +
>>> +#ifdef __cplusplus
>>> +extern "C" {
>>> +#endif
>>> +
>>> +static inline unsigned
>>> +u_bitcast_f2u(float f)
>>> +{
>>> +   unsigned u;
>>> +   memcpy(, , sizeof(u));
>>> +   return u;
>>> +}
>>> +
>>> +static inline float
>>> +u_bitcast_u2f(unsigned u)
>>> +{
>>> +   float f;
>>> +   memcpy(, , sizeof(f));
>>> +   return f;
>>> +}
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>> +
>>> +#endif /* U_BITCAST_H_ */
>>>
>>
>> I think traditionally we've used the union fi from u_math.h to do this.
>> That said, I don't have anything against using an inline memcpy helper
>> function instead - both should hopefully get optimized away...
>
> Unfortunately, that won't work with the strict aliasing rules either
> since it's undefined behavior to reinterpret a union by dereferencing
> more than member. That is, once you assign to one field of a union,
> you can read from/write to that field only. You can, however,
> reinterpret anything as a char * and dereference, hence why memcopy
> etc. aren't undefined.

Sorry for the late reply, but I thought it was important to correct
this common misunderstanding (Even John Regehr gets it wrong! [1]).

Since at least C99, the union trick is fine. C99 Technical Corrigendum
3 [2] says:

"""
If the member used to access the contents of a union object is not the
same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called "type punning"). This might be a
trap representation.
"""

It seems, but I have not confirmed, that it is not legal in C++. I
cannot find a reference now, but I remember reading a C89
clarification that explained that it was only "unspecified" because of
the "trap representation" wording and was intended to have the same
semantics as C99.

Regardless, memcpy generates optimal code and is legal in C and C++.
Hence my choice in this series. (Aside: Hence Why Must Die [3])

[1] http://blog.regehr.org/archives/959
[2] http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1235.pdf
[3] http://public.wsu.edu/~brians/errors/hencewhy.html
___
mesa-dev 

[Mesa-dev] [PATCH 06/16] gallium/radeon: more descriptive names for LLVM temporaries in debug builds

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Reviewed-by: Tom Stellard 
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 7b96a58..22ff18e 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -31,20 +31,21 @@
 #include "gallivm/lp_bld_init.h"
 #include "gallivm/lp_bld_intr.h"
 #include "gallivm/lp_bld_misc.h"
 #include "gallivm/lp_bld_swizzle.h"
 #include "tgsi/tgsi_info.h"
 #include "tgsi/tgsi_parse.h"
 #include "util/u_math.h"
 #include "util/u_memory.h"
 #include "util/u_debug.h"
 
+#include 
 #include 
 #include 
 
 LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
  enum tgsi_opcode_type type)
 {
LLVMContextRef ctx = bld_base->base.gallivm->context;
 
switch (type) {
case TGSI_TYPE_UNSIGNED:
@@ -421,20 +422,21 @@ static void emit_declaration(struct lp_build_tgsi_context 
*bld_base,
 ctx->soa.addr[idx][chan] = 
si_build_alloca_undef(
>gallivm,
ctx->soa.bld_base.uint_bld.elem_type, 
"");
}
}
break;
}
 
case TGSI_FILE_TEMPORARY:
{
+   char name[16] = "";
LLVMValueRef array_alloca = NULL;
unsigned decl_size;
first = decl->Range.First;
last = decl->Range.Last;
decl_size = 4 * ((last - first) + 1);
if (decl->Declaration.Array) {
unsigned id = decl->Array.ArrayID - 1;
if (!ctx->arrays) {
int size = 
bld_base->info->array_max[TGSI_FILE_TEMPORARY];
ctx->arrays = CALLOC(size, 
sizeof(ctx->arrays[0]));
@@ -458,34 +460,42 @@ static void emit_declaration(struct lp_build_tgsi_context 
*bld_base,
ctx->arrays[id].alloca = array_alloca;
}
}
 
if (!ctx->temps_count) {
ctx->temps_count = 
bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
ctx->temps = MALLOC(TGSI_NUM_CHANNELS * 
ctx->temps_count * sizeof(LLVMValueRef));
}
if (!array_alloca) {
for (i = 0; i < decl_size; ++i) {
+#ifdef DEBUG
+   snprintf(name, sizeof(name), "TEMP%d.%c",
+first + i / 4, "xyzw"[i % 4]);
+#endif
ctx->temps[first * TGSI_NUM_CHANNELS + i] =

si_build_alloca_undef(bld_base->base.gallivm,
  
bld_base->base.vec_type,
- "temp");
+ name);
}
} else {
LLVMValueRef idxs[2] = {
bld_base->uint_bld.zero,
NULL
};
for (i = 0; i < decl_size; ++i) {
+#ifdef DEBUG
+   snprintf(name, sizeof(name), "TEMP%d.%c",
+first + i / 4, "xyzw"[i % 4]);
+#endif
idxs[1] = 
lp_build_const_int32(bld_base->base.gallivm, i);
ctx->temps[first * TGSI_NUM_CHANNELS + i] =
-   LLVMBuildGEP(builder, array_alloca, 
idxs, 2, "temp");
+   LLVMBuildGEP(builder, array_alloca, 
idxs, 2, name);
}
}
break;
}
case TGSI_FILE_INPUT:
{
unsigned idx;
for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
if (ctx->load_input)
ctx->load_input(ctx, idx, decl);
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/16] gallium/radeon: allocate temps array info in radeon_llvm_context_init

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Also, prepare for using tgsi_array_info.

This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.
---
 src/gallium/drivers/radeon/radeon_llvm.h   | 11 ++--
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 66 +-
 src/gallium/drivers/radeonsi/si_shader.c   |  6 +-
 3 files changed, 47 insertions(+), 36 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index 13f3336..6086dd6 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -43,25 +43,20 @@ struct radeon_llvm_branch {
LLVMBasicBlockRef if_block;
LLVMBasicBlockRef else_block;
unsigned has_else;
 };
 
 struct radeon_llvm_loop {
LLVMBasicBlockRef loop_block;
LLVMBasicBlockRef endloop_block;
 };
 
-struct radeon_llvm_array {
-   struct tgsi_declaration_range range;
-   LLVMValueRef alloca;
-};
-
 struct radeon_llvm_context {
struct lp_build_tgsi_soa_context soa;
 
/*=== Front end configuration ===*/
 
/* Instructions that are not described by any of the TGSI opcodes. */
 
/** This function is responsible for initilizing the inputs array and 
will be
  * called once for each input declared in the TGSI shader.
  */
@@ -94,21 +89,22 @@ struct radeon_llvm_context {
/*=== Private Members ===*/
 
struct radeon_llvm_branch *branch;
struct radeon_llvm_loop *loop;
 
unsigned branch_depth;
unsigned branch_depth_max;
unsigned loop_depth;
unsigned loop_depth_max;
 
-   struct radeon_llvm_array *arrays;
+   struct tgsi_array_info *temp_arrays;
+   LLVMValueRef *temp_array_allocas;
 
LLVMValueRef main_fn;
LLVMTypeRef return_type;
 
unsigned fpmath_md_kind;
LLVMValueRef fpmath_md_2p5_ulp;
 
struct gallivm_state gallivm;
 };
 
@@ -117,21 +113,22 @@ LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context 
*bld_base,
 
 LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type, LLVMValueRef value);
 
 void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context 
*bld_base,
  struct lp_build_emit_data *emit_data,
  LLVMValueRef *coords_arg,
  LLVMValueRef *derivs_arg);
 
 void radeon_llvm_context_init(struct radeon_llvm_context *ctx,
-  const char *triple);
+  const char *triple,
+ const struct tgsi_shader_info *info);
 
 void radeon_llvm_create_func(struct radeon_llvm_context *ctx,
 LLVMTypeRef *return_types, unsigned 
num_return_elems,
 LLVMTypeRef *ParamTypes, unsigned ParamCount);
 
 void radeon_llvm_dispose(struct radeon_llvm_context *ctx);
 
 unsigned radeon_llvm_reg_index_soa(unsigned index, unsigned chan);
 
 void radeon_llvm_finalize_module(struct radeon_llvm_context *ctx);
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index d8ab5b0..2521023 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -108,54 +108,54 @@ static LLVMValueRef emit_swizzle(struct 
lp_build_tgsi_context *bld_base,
return LLVMBuildShuffleVector(bld_base->base.gallivm->builder,
  value,
  LLVMGetUndef(LLVMTypeOf(value)),
  LLVMConstVector(swizzles, 4), "");
 }
 
 /**
  * Return the description of the array covering the given temporary register
  * index.
  */
-static const struct radeon_llvm_array *
-get_temp_array(struct lp_build_tgsi_context *bld_base,
-  unsigned reg_index,
-  const struct tgsi_ind_register *reg)
+static unsigned
+get_temp_array_id(struct lp_build_tgsi_context *bld_base,
+ unsigned reg_index,
+ const struct tgsi_ind_register *reg)
 {
struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
unsigned num_arrays = 
ctx->soa.bld_base.info->array_max[TGSI_FILE_TEMPORARY];
unsigned i;
 
if (reg && reg->ArrayID > 0 && reg->ArrayID <= num_arrays)
-   return >arrays[reg->ArrayID - 1];
+   return reg->ArrayID;
 
for (i = 0; i < num_arrays; i++) {
-   const struct radeon_llvm_array *array = >arrays[i];
+   const struct tgsi_array_info *array = >temp_arrays[i];
 
if (reg_index >= array->range.First && reg_index <= 
array->range.Last)
-   return array;
+   return i 

[Mesa-dev] [PATCH 16/16] gallium/radeon: protect against out of bounds temporary array accesses

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

They can lead to VM faults and worse, which goes against the GL robustness
promises.
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 7cdf228..88c7b3c 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -232,20 +232,35 @@ get_pointer_into_array(struct radeon_llvm_context *ctx,
if (!alloca)
return NULL;
 
array = >temp_arrays[array_id - 1];
 
if (!(array->writemask & (1 << swizzle)))
return ctx->undef_alloca;
 
index = emit_array_index(>soa, reg_indirect,
 reg_index - ctx->temp_arrays[array_id - 
1].range.First);
+
+   /* Ensure that the index is within a valid range, to guard against
+* VM faults and overwriting critical data (e.g. spilled resource
+* descriptors).
+*
+* TODO It should be possible to avoid the additional instructions
+* if LLVM is changed so that it guarantuees:
+* 1. the scratch space descriptor isolates the current wave (this
+*could even save the scratch offset SGPR at the cost of an
+*additional SALU instruction)
+* 2. the memory for allocas must be allocated at the _end_ of the
+*scratch space (after spilled registers)
+*/
+   index = radeon_llvm_bound_index(ctx, index, array->range.Last - 
array->range.First + 1);
+
index = LLVMBuildMul(
builder, index,
lp_build_const_int32(gallivm, util_bitcount(array->writemask)),
"");
index = LLVMBuildAdd(
builder, index,
lp_build_const_int32(
gallivm,
util_bitcount(array->writemask & ((1 << swizzle) - 1))),
"");
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 13/16] gallium/radeon: use tgsi_scan_arrays for temp arrays

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeon/radeon_llvm.h| 3 ++-
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 9 ++---
 src/gallium/drivers/radeonsi/si_shader.c| 3 ++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index 6086dd6..4ed2c97 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -114,21 +114,22 @@ LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context 
*bld_base,
 LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type, LLVMValueRef value);
 
 void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context 
*bld_base,
  struct lp_build_emit_data *emit_data,
  LLVMValueRef *coords_arg,
  LLVMValueRef *derivs_arg);
 
 void radeon_llvm_context_init(struct radeon_llvm_context *ctx,
   const char *triple,
- const struct tgsi_shader_info *info);
+ const struct tgsi_shader_info *info,
+ const struct tgsi_token *tokens);
 
 void radeon_llvm_create_func(struct radeon_llvm_context *ctx,
 LLVMTypeRef *return_types, unsigned 
num_return_elems,
 LLVMTypeRef *ParamTypes, unsigned ParamCount);
 
 void radeon_llvm_dispose(struct radeon_llvm_context *ctx);
 
 unsigned radeon_llvm_reg_index_soa(unsigned index, unsigned chan);
 
 void radeon_llvm_finalize_module(struct radeon_llvm_context *ctx);
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 2521023..dac0594 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -478,22 +478,20 @@ static void emit_declaration(struct lp_build_tgsi_context 
*bld_base,
{
char name[16] = "";
LLVMValueRef array_alloca = NULL;
unsigned decl_size;
first = decl->Range.First;
last = decl->Range.Last;
decl_size = 4 * ((last - first) + 1);
if (decl->Declaration.Array) {
unsigned id = decl->Array.ArrayID - 1;
 
-   ctx->temp_arrays[id].range = decl->Range;
-
/* If the array has more than 16 elements, store it
 * in memory using an alloca that spans the entire
 * array.
 *
 * Otherwise, store each array element individually.
 * We will then generate vectors (per-channel, up to
 * <4 x float>) for indirect addressing.
 *
 * Note that 16 is the number of vector elements that
 * LLVM will store in a register, so theoretically an
@@ -1723,21 +1721,22 @@ static void emit_rsq(const struct lp_build_tgsi_action 
*action,
LLVMValueRef sqrt =
lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_SQRT,
 emit_data->args[0]);
 
emit_data->output[emit_data->chan] =
lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_DIV,
  bld_base->base.one, sqrt);
 }
 
 void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char 
*triple,
- const struct tgsi_shader_info *info)
+ const struct tgsi_shader_info *info,
+ const struct tgsi_token *tokens)
 {
struct lp_type type;
 
/* Initialize the gallivm object:
 * We are only using the module, context, and builder fields of this 
struct.
 * This should be enough for us to be able to pass our gallivm struct 
to the
 * helper functions in the gallivm module.
 */
memset(>gallivm, 0, sizeof (ctx->gallivm));
memset(>soa, 0, sizeof(ctx->soa));
@@ -1749,20 +1748,24 @@ void radeon_llvm_context_init(struct 
radeon_llvm_context *ctx, const char *tripl
 
struct lp_build_tgsi_context *bld_base = >soa.bld_base;
 
bld_base->info = info;
 
if (info && info->array_max[TGSI_FILE_TEMPORARY] > 0) {
int size = info->array_max[TGSI_FILE_TEMPORARY];
 
ctx->temp_arrays = CALLOC(size, sizeof(ctx->temp_arrays[0]));
ctx->temp_array_allocas = CALLOC(size, 
sizeof(ctx->temp_array_allocas[0]));
+
+   if (tokens)
+   tgsi_scan_arrays(tokens, TGSI_FILE_TEMPORARY, size,
+

[Mesa-dev] [PATCH 14/16] gallium/radeon: reduce alloca of temporaries based on usagemask

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

v2: take actual writemasks into account
---
 src/gallium/drivers/radeon/radeon_llvm.h   |  2 +
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 62 ++
 2 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index 4ed2c97..0276ef3 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -92,20 +92,22 @@ struct radeon_llvm_context {
struct radeon_llvm_loop *loop;
 
unsigned branch_depth;
unsigned branch_depth_max;
unsigned loop_depth;
unsigned loop_depth_max;
 
struct tgsi_array_info *temp_arrays;
LLVMValueRef *temp_array_allocas;
 
+   LLVMValueRef undef_alloca;
+
LLVMValueRef main_fn;
LLVMTypeRef return_type;
 
unsigned fpmath_md_kind;
LLVMValueRef fpmath_md_2p5_ulp;
 
struct gallivm_state gallivm;
 };
 
 LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index dac0594..dd7d60b 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -178,41 +178,55 @@ emit_array_index(struct lp_build_tgsi_soa_context *bld,
  * must be used.
  */
 static LLVMValueRef
 get_pointer_into_array(struct radeon_llvm_context *ctx,
   unsigned file,
   unsigned swizzle,
   unsigned reg_index,
   const struct tgsi_ind_register *reg_indirect)
 {
unsigned array_id;
+   struct tgsi_array_info *array;
struct gallivm_state *gallivm = ctx->soa.bld_base.base.gallivm;
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef idxs[2];
LLVMValueRef index;
LLVMValueRef alloca;
 
if (file != TGSI_FILE_TEMPORARY)
return NULL;
 
array_id = get_temp_array_id(>soa.bld_base, reg_index, 
reg_indirect);
if (!array_id)
return NULL;
 
alloca = ctx->temp_array_allocas[array_id - 1];
if (!alloca)
return NULL;
 
+   array = >temp_arrays[array_id - 1];
+
+   if (!(array->writemask & (1 << swizzle)))
+   return ctx->undef_alloca;
+
index = emit_array_index(>soa, reg_indirect,
 reg_index - ctx->temp_arrays[array_id - 
1].range.First);
-   index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, 
TGSI_NUM_CHANNELS), "");
-   index = LLVMBuildAdd(builder, index, lp_build_const_int32(gallivm, 
swizzle), "");
+   index = LLVMBuildMul(
+   builder, index,
+   lp_build_const_int32(gallivm, util_bitcount(array->writemask)),
+   "");
+   index = LLVMBuildAdd(
+   builder, index,
+   lp_build_const_int32(
+   gallivm,
+   util_bitcount(array->writemask & ((1 << swizzle) - 1))),
+   "");
idxs[0] = ctx->soa.bld_base.uint_bld.zero;
idxs[1] = index;
return LLVMBuildGEP(builder, alloca, idxs, 2, "");
 }
 
 LLVMValueRef
 radeon_llvm_emit_fetch_64bit(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type,
 LLVMValueRef ptr,
 LLVMValueRef ptr2)
@@ -472,48 +486,56 @@ static void emit_declaration(struct lp_build_tgsi_context 
*bld_base,
}
}
break;
}
 
case TGSI_FILE_TEMPORARY:
{
char name[16] = "";
LLVMValueRef array_alloca = NULL;
unsigned decl_size;
+   unsigned writemask = decl->Declaration.UsageMask;
first = decl->Range.First;
last = decl->Range.Last;
decl_size = 4 * ((last - first) + 1);
+
if (decl->Declaration.Array) {
unsigned id = decl->Array.ArrayID - 1;
+   unsigned array_size;
+
+   writemask &= ctx->temp_arrays[id].writemask;
+   ctx->temp_arrays[id].writemask = writemask;
+   array_size = ((last - first) + 1) * 
util_bitcount(writemask);
 
/* If the array has more than 16 elements, store it
 * in memory using an alloca that spans the entire
 * array.
 *
 * Otherwise, store each array element individually.
 * We will then generate vectors (per-channel, up to
-* <4 x float>) for indirect addressing.
+* <16 x float> if the usagemask 

[Mesa-dev] [PATCH 09/16] gallium/radeon: pass indirect register info into get_alloca_for_array

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

To have the same signature as get_array_range.
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 994c7da..531a8fe 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -151,28 +151,29 @@ get_array_range(struct lp_build_tgsi_context *bld_base,
return array->range;
}
 
range.First = 0;
range.Last = bld_base->info->file_max[File];
return range;
 }
 
 static LLVMValueRef get_alloca_for_array(struct lp_build_tgsi_context 
*bld_base,
 unsigned file,
-unsigned index)
+unsigned index,
+const struct tgsi_ind_register *reg)
 {
const struct radeon_llvm_array *array;
 
if (file != TGSI_FILE_TEMPORARY)
return NULL;
 
-   array = get_temp_array(bld_base, index, NULL);
+   array = get_temp_array(bld_base, index, reg);
if (!array)
return NULL;
 
return array->alloca;
 }
 
 static LLVMValueRef
 emit_array_index(struct lp_build_tgsi_soa_context *bld,
 const struct tgsi_ind_register *reg,
 unsigned offset)
@@ -240,21 +241,21 @@ load_value_from_array(struct lp_build_tgsi_context 
*bld_base,
  enum tgsi_opcode_type type,
  unsigned swizzle,
  unsigned reg_index,
  const struct tgsi_ind_register *reg_indirect)
 {
struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
struct gallivm_state *gallivm = bld_base->base.gallivm;
LLVMBuilderRef builder = gallivm->builder;
struct tgsi_declaration_range range = get_array_range(bld_base, file, 
reg_index, reg_indirect);
LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - 
range.First);
-   LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index);
+   LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index, 
reg_indirect);
LLVMValueRef ptr, val, indices[2];
 
if (!array) {
/* Handle the case where the array is stored as a vector. */
return LLVMBuildExtractElement(builder,
emit_array_fetch(bld_base, file, type, range, 
swizzle),
index, "");
}
 
index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, 
TGSI_NUM_CHANNELS), "");
@@ -280,21 +281,21 @@ store_value_to_array(struct lp_build_tgsi_context 
*bld_base,
 unsigned file,
 unsigned chan_index,
 unsigned reg_index,
 const struct tgsi_ind_register *reg_indirect)
 {
struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
struct gallivm_state *gallivm = bld_base->base.gallivm;
LLVMBuilderRef builder = gallivm->builder;
struct tgsi_declaration_range range = get_array_range(bld_base, file, 
reg_index, reg_indirect);
LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - 
range.First);
-   LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index);
+   LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index, 
reg_indirect);
 
if (array) {
LLVMValueRef indices[2];
index = LLVMBuildMul(builder, index, 
lp_build_const_int32(gallivm, TGSI_NUM_CHANNELS), "");
index = LLVMBuildAdd(builder, index, 
lp_build_const_int32(gallivm, chan_index), "");
indices[0] = bld_base->uint_bld.zero;
indices[1] = index;
LLVMValueRef pointer = LLVMBuildGEP(builder, array, indices, 2, 
"");
LLVMBuildStore(builder, value, pointer);
return NULL;
@@ -617,21 +618,21 @@ void radeon_llvm_emit_store(struct lp_build_tgsi_context 
*bld_base,
 
if (reg->Register.Indirect) {
struct tgsi_declaration_range range = 
get_array_range(bld_base,
reg->Register.File, reg->Register.Index, 
>Indirect);
 
unsigned i, size = range.Last - range.First + 1;
unsigned file = reg->Register.File;
unsigned reg_index = reg->Register.Index;
LLVMValueRef array = store_value_to_array(bld_base, 
value, file, chan_index,
  reg_index, 
>Indirect);
-   if (get_alloca_for_array(bld_base, file, reg_index)) {
+ 

[Mesa-dev] [PATCH 10/16] gallium/radeon: extract common getelementptr logic into get_pointer_into_array

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 105 +
 1 file changed, 66 insertions(+), 39 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 531a8fe..87fc07e 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -180,20 +180,55 @@ emit_array_index(struct lp_build_tgsi_soa_context *bld,
 {
struct gallivm_state *gallivm = bld->bld_base.base.gallivm;
 
if (!reg) {
return lp_build_const_int32(gallivm, offset);
}
LLVMValueRef addr = LLVMBuildLoad(gallivm->builder, 
bld->addr[reg->Index][reg->Swizzle], "");
return LLVMBuildAdd(gallivm->builder, addr, 
lp_build_const_int32(gallivm, offset), "");
 }
 
+/**
+ * For indirect registers, construct a pointer directly to the requested
+ * element using getelementptr if possible.
+ *
+ * Returns NULL if the insertelement/extractelement fallback for array access
+ * must be used.
+ */
+static LLVMValueRef
+get_pointer_into_array(struct radeon_llvm_context *ctx,
+  unsigned file,
+  unsigned swizzle,
+  unsigned reg_index,
+  const struct tgsi_ind_register *reg_indirect)
+{
+   const struct radeon_llvm_array *array;
+   struct gallivm_state *gallivm = ctx->soa.bld_base.base.gallivm;
+   LLVMBuilderRef builder = gallivm->builder;
+   LLVMValueRef idxs[2];
+   LLVMValueRef index;
+
+   if (file != TGSI_FILE_TEMPORARY)
+   return NULL;
+
+   array = get_temp_array(>soa.bld_base, reg_index, reg_indirect);
+   if (!array || !array->alloca)
+   return NULL;
+
+   index = emit_array_index(>soa, reg_indirect, reg_index - 
array->range.First);
+   index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, 
TGSI_NUM_CHANNELS), "");
+   index = LLVMBuildAdd(builder, index, lp_build_const_int32(gallivm, 
swizzle), "");
+   idxs[0] = ctx->soa.bld_base.uint_bld.zero;
+   idxs[1] = index;
+   return LLVMBuildGEP(builder, array->alloca, idxs, 2, "");
+}
+
 LLVMValueRef
 radeon_llvm_emit_fetch_64bit(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type,
 LLVMValueRef ptr,
 LLVMValueRef ptr2)
 {
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMValueRef result;
 
result = 
LLVMGetUndef(LLVMVectorType(LLVMIntTypeInContext(bld_base->base.gallivm->context,
 32), bld_base->base.type.length * 2));
@@ -236,80 +271,72 @@ emit_array_fetch(struct lp_build_tgsi_context *bld_base,
 }
 
 static LLVMValueRef
 load_value_from_array(struct lp_build_tgsi_context *bld_base,
  unsigned file,
  enum tgsi_opcode_type type,
  unsigned swizzle,
  unsigned reg_index,
  const struct tgsi_ind_register *reg_indirect)
 {
+   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
struct gallivm_state *gallivm = bld_base->base.gallivm;
LLVMBuilderRef builder = gallivm->builder;
-   struct tgsi_declaration_range range = get_array_range(bld_base, file, 
reg_index, reg_indirect);
-   LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - 
range.First);
-   LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index, 
reg_indirect);
-   LLVMValueRef ptr, val, indices[2];
-
-   if (!array) {
-   /* Handle the case where the array is stored as a vector. */
-   return LLVMBuildExtractElement(builder,
-   emit_array_fetch(bld_base, file, type, range, 
swizzle),
-   index, "");
-   }
+   LLVMValueRef ptr;
 
-   index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, 
TGSI_NUM_CHANNELS), "");
-   index = LLVMBuildAdd(builder, index, lp_build_const_int32(gallivm, 
swizzle), "");
-   indices[0] = bld_base->uint_bld.zero;
-   indices[1] = index;
-   ptr = LLVMBuildGEP(builder, array, indices, 2, "");
-   val = LLVMBuildLoad(builder, ptr, "");
-   if (tgsi_type_is_64bit(type)) {
-   LLVMValueRef ptr_hi, val_hi;
-   indices[0] = lp_build_const_int32(gallivm, 1);
-   ptr_hi = LLVMBuildGEP(builder, ptr, indices, 1, "");
-   val_hi = LLVMBuildLoad(builder, ptr_hi, "");
-   val = radeon_llvm_emit_fetch_64bit(bld_base, type, val, val_hi);
+   ptr = get_pointer_into_array(ctx, file, swizzle, reg_index, 
reg_indirect);
+   if (ptr) {
+   LLVMValueRef val = LLVMBuildLoad(builder, ptr, "");
+   if 

[Mesa-dev] [PATCH 05/16] gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard 
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index e084248..7b96a58 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -624,30 +624,23 @@ void radeon_llvm_emit_store(struct lp_build_tgsi_context 
*bld_base,
} else {
switch(reg->Register.File) {
case TGSI_FILE_OUTPUT:
temp_ptr = 
bld->outputs[reg->Register.Index][chan_index];
if (tgsi_type_is_64bit(dtype))
temp_ptr2 = 
bld->outputs[reg->Register.Index][chan_index + 1];
break;
 
case TGSI_FILE_TEMPORARY:
{
-   LLVMValueRef array;
if (reg->Register.Index >= ctx->temps_count)
continue;
-   array = get_alloca_for_array(bld_base, 
reg->Register.File, reg->Register.Index);
 
-   if (array) {
-   store_value_to_array(bld_base, value, 
reg->Register.File, chan_index, reg->Register.Index,
-   NULL);
-   continue;
-   }
temp_ptr = ctx->temps[ TGSI_NUM_CHANNELS * 
reg->Register.Index + chan_index];
if (tgsi_type_is_64bit(dtype))
temp_ptr2 = ctx->temps[ 
TGSI_NUM_CHANNELS * reg->Register.Index + chan_index + 1];
 
break;
}
default:
return;
}
if (!tgsi_type_is_64bit(dtype))
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 15/16] gallium/radeon: add radeon_llvm_bound_index for bounds checking

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeon/radeon_llvm.h   |  4 +++
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 29 ++
 src/gallium/drivers/radeonsi/si_shader.c   | 19 +-
 3 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index 0276ef3..da5b7f5 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -109,20 +109,24 @@ struct radeon_llvm_context {
 
struct gallivm_state gallivm;
 };
 
 LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
  enum tgsi_opcode_type type);
 
 LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type, LLVMValueRef value);
 
+LLVMValueRef radeon_llvm_bound_index(struct radeon_llvm_context *ctx,
+LLVMValueRef index,
+unsigned num);
+
 void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context 
*bld_base,
  struct lp_build_emit_data *emit_data,
  LLVMValueRef *coords_arg,
  LLVMValueRef *derivs_arg);
 
 void radeon_llvm_context_init(struct radeon_llvm_context *ctx,
   const char *triple,
  const struct tgsi_shader_info *info,
  const struct tgsi_token *tokens);
 
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index dd7d60b..7cdf228 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -66,20 +66,49 @@ LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base,
 {
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMTypeRef dst_type = tgsi2llvmtype(bld_base, type);
 
if (dst_type)
return LLVMBuildBitCast(builder, value, dst_type, "");
else
return value;
 }
 
+/**
+ * Return a value that is equal to the given i32 \p index if it lies in [0,num)
+ * or an undefined value in the same interval otherwise.
+ */
+LLVMValueRef radeon_llvm_bound_index(struct radeon_llvm_context *ctx,
+LLVMValueRef index,
+unsigned num)
+{
+   struct gallivm_state *gallivm = >gallivm;
+   LLVMBuilderRef builder = gallivm->builder;
+   LLVMValueRef c_max = lp_build_const_int32(gallivm, num - 1);
+   LLVMValueRef cc;
+
+   if (util_is_power_of_two(num)) {
+   index = LLVMBuildAnd(builder, index, c_max, "");
+   } else {
+   /* In theory, this MAX pattern should result in code that is
+* as good as the bit-wise AND above.
+*
+* In practice, LLVM generates worse code (at the time of
+* writing), because its value tracking is not strong enough.
+*/
+   cc = LLVMBuildICmp(builder, LLVMIntULE, index, c_max, "");
+   index = LLVMBuildSelect(builder, cc, index, c_max, "");
+   }
+
+   return index;
+}
+
 static struct radeon_llvm_loop *get_current_loop(struct radeon_llvm_context 
*ctx)
 {
return ctx->loop_depth > 0 ? ctx->loop + (ctx->loop_depth - 1) : NULL;
 }
 
 static struct radeon_llvm_branch *get_current_branch(struct 
radeon_llvm_context *ctx)
 {
return ctx->branch_depth > 0 ?
ctx->branch + (ctx->branch_depth - 1) : NULL;
 }
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 06b5c9c..a5b566e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -558,47 +558,30 @@ static LLVMValueRef get_indirect_index(struct 
si_shader_context *ctx,
 }
 
 /**
  * Like get_indirect_index, but restricts the return value to a (possibly
  * undefined) value inside [0..num).
  */
 static LLVMValueRef get_bounded_indirect_index(struct si_shader_context *ctx,
   const struct tgsi_ind_register 
*ind,
   int rel_index, unsigned num)
 {
-   struct gallivm_state *gallivm = >radeon_bld.gallivm;
-   LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef result = get_indirect_index(ctx, ind, rel_index);
-   LLVMValueRef c_max = LLVMConstInt(ctx->i32, num - 1, 0);
-   LLVMValueRef cc;
 
/* LLVM 3.8: If indirect resource indexing is used:
 * - SI & CIK hang
 * - VI crashes
 */
if (HAVE_LLVM <= 0x0308)
return LLVMGetUndef(ctx->i32);
 
-   if (util_is_power_of_two(num)) {
- 

[Mesa-dev] [PATCH 11/16] gallium/radeon: always do the full store in store_value_to_array

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Doing the write-back of the temporary vector in radeon_llvm_emit_store makes
no sense.

This also allows us to get rid of get_alloca_for_array.
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 77 --
 1 file changed, 28 insertions(+), 49 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 87fc07e..d8ab5b0 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -149,37 +149,20 @@ get_array_range(struct lp_build_tgsi_context *bld_base,
get_temp_array(bld_base, reg_index, reg);
if (array)
return array->range;
}
 
range.First = 0;
range.Last = bld_base->info->file_max[File];
return range;
 }
 
-static LLVMValueRef get_alloca_for_array(struct lp_build_tgsi_context 
*bld_base,
-unsigned file,
-unsigned index,
-const struct tgsi_ind_register *reg)
-{
-   const struct radeon_llvm_array *array;
-
-   if (file != TGSI_FILE_TEMPORARY)
-   return NULL;
-
-   array = get_temp_array(bld_base, index, reg);
-   if (!array)
-   return NULL;
-
-   return array->alloca;
-}
-
 static LLVMValueRef
 emit_array_index(struct lp_build_tgsi_soa_context *bld,
 const struct tgsi_ind_register *reg,
 unsigned offset)
 {
struct gallivm_state *gallivm = bld->bld_base.base.gallivm;
 
if (!reg) {
return lp_build_const_int32(gallivm, offset);
}
@@ -299,44 +282,67 @@ load_value_from_array(struct lp_build_tgsi_context 
*bld_base,
struct tgsi_declaration_range range =
get_array_range(bld_base, file, reg_index, 
reg_indirect);
LLVMValueRef index =
emit_array_index(bld, reg_indirect, reg_index - 
range.First);
LLVMValueRef array =
emit_array_fetch(bld_base, file, type, range, swizzle);
return LLVMBuildExtractElement(builder, array, index, "");
}
 }
 
-static LLVMValueRef
+static void
 store_value_to_array(struct lp_build_tgsi_context *bld_base,
 LLVMValueRef value,
 unsigned file,
 unsigned chan_index,
 unsigned reg_index,
 const struct tgsi_ind_register *reg_indirect)
 {
struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
struct gallivm_state *gallivm = bld_base->base.gallivm;
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef ptr;
 
ptr = get_pointer_into_array(ctx, file, chan_index, reg_index, 
reg_indirect);
if (ptr) {
LLVMBuildStore(builder, value, ptr);
-   return NULL;
} else {
+   unsigned i, size;
struct tgsi_declaration_range range = get_array_range(bld_base, 
file, reg_index, reg_indirect);
LLVMValueRef index = emit_array_index(bld, reg_indirect, 
reg_index - range.First);
LLVMValueRef array =
emit_array_fetch(bld_base, file, TGSI_TYPE_FLOAT, 
range, chan_index);
-   return LLVMBuildInsertElement(builder, array, value, index, "");
+   LLVMValueRef temp_ptr;
+
+   array = LLVMBuildInsertElement(builder, array, value, index, 
"");
+
+   size = range.Last - range.First + 1;
+   for (i = 0; i < size; ++i) {
+   switch(file) {
+   case TGSI_FILE_OUTPUT:
+   temp_ptr = bld->outputs[i + 
range.First][chan_index];
+   break;
+
+   case TGSI_FILE_TEMPORARY:
+   if (range.First + i >= ctx->temps_count)
+   continue;
+   temp_ptr = ctx->temps[(i + range.First) * 
TGSI_NUM_CHANNELS + chan_index];
+   break;
+
+   default:
+   continue;
+   }
+   value = LLVMBuildExtractElement(builder, array,
+   lp_build_const_int32(gallivm, i), "");
+   LLVMBuildStore(builder, value, temp_ptr);
+   }
}
 }
 
 LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type,
unsigned 

[Mesa-dev] [PATCH 07/16] gallium/radeon: clarify the comment on the array alloca heuristic

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 29 ++
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 22ff18e..e4bfa74 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -437,33 +437,42 @@ static void emit_declaration(struct lp_build_tgsi_context 
*bld_base,
decl_size = 4 * ((last - first) + 1);
if (decl->Declaration.Array) {
unsigned id = decl->Array.ArrayID - 1;
if (!ctx->arrays) {
int size = 
bld_base->info->array_max[TGSI_FILE_TEMPORARY];
ctx->arrays = CALLOC(size, 
sizeof(ctx->arrays[0]));
}
 
ctx->arrays[id].range = decl->Range;
 
-   /* If the array is more than 16 elements (each element
-* is 32-bits), then store it in a vector.  Storing the
-* array in a vector will causes the compiler to store
-* the array in registers and access it using indirect
-* addressing.  16 is number of vector elements that
-* LLVM will store in a register.
-* FIXME: We shouldn't need to do this.  LLVM should be
-* smart enough to promote allocas int registers when
-* profitable.
+   /* If the array has more than 16 elements, store it
+* in memory using an alloca that spans the entire
+* array.
+*
+* Otherwise, store each array element individually.
+* We will then generate vectors (per-channel, up to
+* <4 x float>) for indirect addressing.
+*
+* Note that 16 is the number of vector elements that
+* LLVM will store in a register, so theoretically an
+* array with up to 4 * 16 = 64 elements could be
+* handled this way, but whether that's a good idea
+* depends on VGPR register pressure elsewhere.
+*
+* FIXME: We shouldn't need to have the non-alloca
+* code path for arrays. LLVM should be smart enough to
+* promote allocas into registers when profitable.
 */
if (decl_size > 16) {
array_alloca = LLVMBuildAlloca(builder,
-   LLVMArrayType(bld_base->base.vec_type, 
decl_size),"array");
+   LLVMArrayType(bld_base->base.vec_type,
+ decl_size), "array");
ctx->arrays[id].alloca = array_alloca;
}
}
 
if (!ctx->temps_count) {
ctx->temps_count = 
bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
ctx->temps = MALLOC(TGSI_NUM_CHANNELS * 
ctx->temps_count * sizeof(LLVMValueRef));
}
if (!array_alloca) {
for (i = 0; i < decl_size; ++i) {
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/16] gallium/radeon: extract common lookup code into get_temp_array function

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 73 --
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index e4bfa74..994c7da 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -104,72 +104,79 @@ static LLVMValueRef emit_swizzle(struct 
lp_build_tgsi_context *bld_base,
swizzles[1] = LLVMConstInt(i32t, swizzle_y, 0);
swizzles[2] = LLVMConstInt(i32t, swizzle_z, 0);
swizzles[3] = LLVMConstInt(i32t, swizzle_w, 0);
 
return LLVMBuildShuffleVector(bld_base->base.gallivm->builder,
  value,
  LLVMGetUndef(LLVMTypeOf(value)),
  LLVMConstVector(swizzles, 4), "");
 }
 
+/**
+ * Return the description of the array covering the given temporary register
+ * index.
+ */
+static const struct radeon_llvm_array *
+get_temp_array(struct lp_build_tgsi_context *bld_base,
+  unsigned reg_index,
+  const struct tgsi_ind_register *reg)
+{
+   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
+   unsigned num_arrays = 
ctx->soa.bld_base.info->array_max[TGSI_FILE_TEMPORARY];
+   unsigned i;
+
+   if (reg && reg->ArrayID > 0 && reg->ArrayID <= num_arrays)
+   return >arrays[reg->ArrayID - 1];
+
+   for (i = 0; i < num_arrays; i++) {
+   const struct radeon_llvm_array *array = >arrays[i];
+
+   if (reg_index >= array->range.First && reg_index <= 
array->range.Last)
+   return array;
+   }
+
+   return NULL;
+}
+
 static struct tgsi_declaration_range
 get_array_range(struct lp_build_tgsi_context *bld_base,
unsigned File, unsigned reg_index,
const struct tgsi_ind_register *reg)
 {
-   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
+   struct tgsi_declaration_range range;
 
-   if (!reg) {
-   unsigned i;
-   unsigned num_arrays = 
bld_base->info->array_max[TGSI_FILE_TEMPORARY];
-   for (i = 0; i < num_arrays; i++) {
-   const struct tgsi_declaration_range *range =
-   >arrays[i].range;
-
-   if (reg_index >= range->First && reg_index <= 
range->Last) {
-   return ctx->arrays[i].range;
-   }
-   }
+   if (File == TGSI_FILE_TEMPORARY) {
+   const struct radeon_llvm_array *array =
+   get_temp_array(bld_base, reg_index, reg);
+   if (array)
+   return array->range;
}
 
-   if (File != TGSI_FILE_TEMPORARY || !reg || reg->ArrayID == 0 ||
-   reg->ArrayID > bld_base->info->array_max[TGSI_FILE_TEMPORARY]) {
-   struct tgsi_declaration_range range;
-   range.First = 0;
-   range.Last = bld_base->info->file_max[File];
-   return range;
-   }
-
-   return ctx->arrays[reg->ArrayID - 1].range;
+   range.First = 0;
+   range.Last = bld_base->info->file_max[File];
+   return range;
 }
 
 static LLVMValueRef get_alloca_for_array(struct lp_build_tgsi_context 
*bld_base,
 unsigned file,
 unsigned index)
 {
-   unsigned i;
-   unsigned num_arrays;
-   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
+   const struct radeon_llvm_array *array;
 
if (file != TGSI_FILE_TEMPORARY)
return NULL;
 
-   num_arrays = bld_base->info->array_max[TGSI_FILE_TEMPORARY];
-   for (i = 0; i < num_arrays; i++) {
-   const struct tgsi_declaration_range *range =
-   >arrays[i].range;
+   array = get_temp_array(bld_base, index, NULL);
+   if (!array)
+   return NULL;
 
-   if (index >= range->First && index <= range->Last) {
-   return ctx->arrays[i].alloca;
-   }
-   }
-   return NULL;
+   return array->alloca;
 }
 
 static LLVMValueRef
 emit_array_index(struct lp_build_tgsi_soa_context *bld,
 const struct tgsi_ind_register *reg,
 unsigned offset)
 {
struct gallivm_state *gallivm = bld->bld_base.base.gallivm;
 
if (!reg) {
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/16] gallium/radeon: clean up emit_declaration for temporaries

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.

Reviewed-by: Tom Stellard 
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 27 ++
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index d75311e..41f24d3 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -408,81 +408,90 @@ static LLVMValueRef si_build_alloca_undef(struct 
gallivm_state *gallivm,
LLVMValueRef ptr = lp_build_alloca(gallivm, type, name);
LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr);
return ptr;
 }
 
 static void emit_declaration(struct lp_build_tgsi_context *bld_base,
 const struct tgsi_full_declaration *decl)
 {
struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
-   unsigned first, last, i, idx;
+   unsigned first, last, i;
switch(decl->Declaration.File) {
case TGSI_FILE_ADDRESS:
{
 unsigned idx;
for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
unsigned chan;
for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
 ctx->soa.addr[idx][chan] = 
si_build_alloca_undef(
>gallivm,
ctx->soa.bld_base.uint_bld.elem_type, 
"");
}
}
break;
}
 
case TGSI_FILE_TEMPORARY:
{
+   LLVMValueRef array_alloca = NULL;
unsigned decl_size;
first = decl->Range.First;
last = decl->Range.Last;
decl_size = 4 * ((last - first) + 1);
if (decl->Declaration.Array) {
unsigned id = decl->Array.ArrayID - 1;
if (!ctx->arrays) {
int size = 
bld_base->info->array_max[TGSI_FILE_TEMPORARY];
ctx->arrays = CALLOC(size, 
sizeof(ctx->arrays[0]));
-   for (i = 0; i < size; ++i) {
-   assert(!ctx->arrays[i].alloca);}
}
 
ctx->arrays[id].range = decl->Range;
 
/* If the array is more than 16 elements (each element
 * is 32-bits), then store it in a vector.  Storing the
 * array in a vector will causes the compiler to store
 * the array in registers and access it using indirect
 * addressing.  16 is number of vector elements that
 * LLVM will store in a register.
 * FIXME: We shouldn't need to do this.  LLVM should be
 * smart enough to promote allocas int registers when
 * profitable.
 */
if (decl_size > 16) {
-   ctx->arrays[id].alloca = 
LLVMBuildAlloca(builder,
+   array_alloca = LLVMBuildAlloca(builder,
LLVMArrayType(bld_base->base.vec_type, 
decl_size),"array");
+   ctx->arrays[id].alloca = array_alloca;
}
}
-   first = decl->Range.First;
-   last = decl->Range.Last;
+
if (!ctx->temps_count) {
ctx->temps_count = 
bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
ctx->temps = MALLOC(TGSI_NUM_CHANNELS * 
ctx->temps_count * sizeof(LLVMValueRef));
}
-   for (idx = first; idx <= last; idx++) {
-   for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
-   ctx->temps[idx * TGSI_NUM_CHANNELS + i] =
+   if (!array_alloca) {
+   for (i = 0; i < decl_size; ++i) {
+   ctx->temps[first * TGSI_NUM_CHANNELS + i] =

si_build_alloca_undef(bld_base->base.gallivm,
  
bld_base->base.vec_type,
  "temp");
}
+   } else {
+   LLVMValueRef idxs[2] = {
+   bld_base->uint_bld.zero,
+   NULL
+   };
+   for (i = 0; i < decl_size; ++i) {
+ 

[Mesa-dev] [PATCH v2 00/16] radeonsi: improve handling of temporary arrays

2016-08-10 Thread Nicolai Hähnle
Hi,

this is a respin of the series which scans the shader's TGSI to determine
which channels of an array are actually written to. Most of the st/mesa
changes have become unnecessary. Most of the radeon-specific part stays
the same.

For one F1 2015 shader, it reduces the scratch size from 132096 to 26624
bytes, which is bound to be much nicer on the texture cache.

Please review!

Thanks,
Nicolai

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/16] gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard 
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 41f24d3..e084248 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -352,25 +352,20 @@ LLVMValueRef radeon_llvm_emit_fetch(struct 
lp_build_tgsi_context *bld_base,
case TGSI_FILE_TEMPORARY:
if (reg->Register.Index >= ctx->temps_count)
return LLVMGetUndef(tgsi2llvmtype(bld_base, type));
ptr = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + 
swizzle];
if (tgsi_type_is_64bit(type)) {
ptr2 = ctx->temps[reg->Register.Index * 
TGSI_NUM_CHANNELS + swizzle + 1];
return radeon_llvm_emit_fetch_64bit(bld_base, type,
 LLVMBuildLoad(builder, ptr, 
""),
 LLVMBuildLoad(builder, ptr2, 
""));
}
-   LLVMValueRef array = get_alloca_for_array(bld_base, 
reg->Register.File, reg->Register.Index);
-   if (array) {
-   return bitcast(bld_base, type, 
load_value_from_array(bld_base, reg->Register.File, type,
-   swizzle, reg->Register.Index, NULL));
-   }
result = LLVMBuildLoad(builder, ptr, "");
break;
 
case TGSI_FILE_OUTPUT:
ptr = lp_get_output_ptr(bld, reg->Register.Index, swizzle);
if (tgsi_type_is_64bit(type)) {
ptr2 = lp_get_output_ptr(bld, reg->Register.Index, 
swizzle + 1);
return radeon_llvm_emit_fetch_64bit(bld_base, type,
 LLVMBuildLoad(builder, ptr, 
""),
 LLVMBuildLoad(builder, ptr2, 
""));
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/16] st_glsl_to_tgsi: use calloc the way it's meant to be used

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 362559f..89e5c4d 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -6031,21 +6031,21 @@ st_translate_program(
   goto out;
}
 
t->procType = procType;
t->inputMapping = inputMapping;
t->outputMapping = outputMapping;
t->ureg = ureg;
t->num_temp_arrays = program->next_array;
if (t->num_temp_arrays)
   t->arrays = (struct ureg_dst*)
-  calloc(1, sizeof(t->arrays[0]) * t->num_temp_arrays);
+  calloc(t->num_temp_arrays, sizeof(t->arrays[0]));
 
/*
 * Declare input attributes.
 */
switch (procType) {
case PIPE_SHADER_FRAGMENT:
   for (i = 0; i < numInputs; i++) {
  unsigned array_id = 0;
  unsigned array_size;
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/16] tgsi/scan: add tgsi_scan_arrays

2016-08-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/auxiliary/tgsi/tgsi_scan.c | 76 ++
 src/gallium/auxiliary/tgsi/tgsi_scan.h | 17 
 2 files changed, 93 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
b/src/gallium/auxiliary/tgsi/tgsi_scan.c
index 98d86fc..0167e22 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
@@ -628,20 +628,96 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
   info->file_max[TGSI_FILE_INPUT] =
 MAX2(info->file_max[TGSI_FILE_INPUT], num_verts - 1);
   for (j = 0; j < num_verts; ++j) {
  info->file_mask[TGSI_FILE_INPUT] |= (1 << j);
   }
}
 
tgsi_parse_free();
 }
 
+/**
+ * Collect information about the arrays of a given register file.
+ *
+ * @param tokens TGSI shader
+ * @param file the register file to scan through
+ * @param max_array_id number of entries in @p arrays; should be equal to the
+ * highest array id, i.e. 
tgsi_shader_info::array_max[file].
+ * @param arrays info for array of each ID will be written to arrays[ID - 1].
+ */
+void
+tgsi_scan_arrays(const struct tgsi_token *tokens,
+ unsigned file,
+ unsigned max_array_id,
+ struct tgsi_array_info *arrays)
+{
+   struct tgsi_parse_context parse;
+
+   if (tgsi_parse_init(, tokens) != TGSI_PARSE_OK) {
+  debug_printf("tgsi_parse_init() failed in tgsi_scan_arrays()!\n");
+  return;
+   }
+
+   memset(arrays, 0, sizeof(arrays[0]) * max_array_id);
+
+   while (!tgsi_parse_end_of_tokens()) {
+  struct tgsi_full_instruction *inst;
+
+  tgsi_parse_token();
+
+  if (parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_DECLARATION) {
+ struct tgsi_full_declaration *decl = 
+
+ if (decl->Declaration.Array && decl->Declaration.File == file &&
+ decl->Array.ArrayID > 0 && decl->Array.ArrayID <= max_array_id) {
+struct tgsi_array_info *array = [decl->Array.ArrayID - 1];
+assert(!array->declared);
+array->declared = true;
+array->range = decl->Range;
+ }
+  }
+
+  if (parse.FullToken.Token.Type != TGSI_TOKEN_TYPE_INSTRUCTION)
+ continue;
+
+  inst = 
+  for (unsigned i = 0; i < inst->Instruction.NumDstRegs; i++) {
+ const struct tgsi_full_dst_register *dst = >Dst[i];
+ if (dst->Register.File != file)
+continue;
+
+ if (dst->Register.Indirect) {
+if (dst->Indirect.ArrayID > 0 &&
+dst->Indirect.ArrayID <= max_array_id) {
+   arrays[dst->Indirect.ArrayID - 1].writemask |= 
dst->Register.WriteMask;
+} else {
+   /* Indirect writes without an ArrayID can write anywhere. */
+   for (unsigned j = 0; j < max_array_id; ++j)
+  arrays[j].writemask |= dst->Register.WriteMask;
+}
+ } else {
+/* Check whether the write falls into any of the arrays anyway. */
+for (unsigned j = 0; j < max_array_id; ++j) {
+   struct tgsi_array_info *array = [j];
+   if (array->declared &&
+   dst->Register.Index >= array->range.First &&
+   dst->Register.Index <= array->range.Last)
+  array->writemask |= dst->Register.WriteMask;
+}
+ }
+  }
+   }
+
+   tgsi_parse_free();
+
+   return;
+}
 
 
 /**
  * Check if the given shader is a "passthrough" shader consisting of only
  * MOV instructions of the form:  MOV OUT[n], IN[n]
  *  
  */
 boolean
 tgsi_is_passthrough_shader(const struct tgsi_token *tokens)
 {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h 
b/src/gallium/auxiliary/tgsi/tgsi_scan.h
index f7eefa4..30d0146 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
@@ -142,23 +142,40 @@ struct tgsi_shader_info
unsigned indirect_files_written;
 
unsigned properties[TGSI_PROPERTY_COUNT]; /* index with TGSI_PROPERTY_ */
 
/**
 * Max nesting limit of loops/if's
 */
unsigned max_depth;
 };
 
+struct tgsi_array_info
+{
+   /** Whether an array with this ID was declared. */
+   bool declared;
+
+   /** The OR of all writemasks used to write to this array. */
+   ubyte writemask;
+
+   /** The range with which the array was declared. */
+   struct tgsi_declaration_range range;
+};
+
 extern void
 tgsi_scan_shader(const struct tgsi_token *tokens,
  struct tgsi_shader_info *info);
 
+void
+tgsi_scan_arrays(const struct tgsi_token *tokens,
+ unsigned file,
+ unsigned max_array_id,
+ struct tgsi_array_info *arrays);
 
 extern boolean
 tgsi_is_passthrough_shader(const struct tgsi_token *tokens);
 
 #ifdef __cplusplus
 } // extern "C"
 #endif
 
 #endif /* TGSI_SCAN_H */
-- 
2.7.4

___

Re: [Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0

2016-08-10 Thread Marek Olšák
On Wed, Aug 10, 2016 at 8:40 PM, Miklós Máté  wrote:
> Thanks. I forgot to add that this should also be applied to the 12.0 stable
> branch.

OK. I've sent it to mesa-stable for inclusion into 12.0.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0

2016-08-10 Thread Marek Olšák
From: Miklós Máté 

The temporaries are allocated dynamically.

Signed-off-by: Miklós Máté 
Signed-off-by: Marek Olšák 
Cc: 12.0 

---
 src/mesa/state_tracker/st_atifs_to_tgsi.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c 
b/src/mesa/state_tracker/st_atifs_to_tgsi.c
index 66f442a..ae08796 100644
--- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
+++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
@@ -691,6 +691,7 @@ transform_inst:
   struct tgsi_full_instruction inst;
   unsigned i;
   int fogc_index = -1;
+  int reg0_index = current_inst->Src[0].Register.Index;
 
   /* find FOGC input */
   for (i = 0; i < ctx->info.num_inputs; i++) {
@@ -804,11 +805,11 @@ transform_inst:
   inst.Instruction.Opcode = TGSI_OPCODE_LRP;
   inst.Instruction.NumDstRegs = 1;
   inst.Dst[0].Register.File  = TGSI_FILE_TEMPORARY;
-  inst.Dst[0].Register.Index = 0;
+  inst.Dst[0].Register.Index = reg0_index;
   inst.Dst[0].Register.WriteMask = TGSI_WRITEMASK_XYZW;
   inst.Instruction.NumSrcRegs = 3;
   SET_SRC(, 0, TGSI_FILE_TEMPORARY, ctx->fog_factor_temp, X, X, X, Y);
-  SET_SRC(, 1, TGSI_FILE_TEMPORARY, 0, X, Y, Z, W);
+  SET_SRC(, 1, TGSI_FILE_TEMPORARY, reg0_index, X, Y, Z, W);
   SET_SRC(, 2, TGSI_FILE_CONSTANT, MAX_NUM_FRAGMENT_CONSTANTS_ATI + 
1, X, Y, Z, W);
   tctx->emit_instruction(tctx, );
}
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0

2016-08-10 Thread Miklós Máté
Thanks. I forgot to add that this should also be applied to the 12.0 
stable branch.


MM

On 08/10/2016 03:07 PM, Marek Olšák wrote:

Pushed, thanks.

Marek

On Mon, Aug 8, 2016 at 12:48 AM, Miklós Máté  wrote:

The temporaries are allocated dynamically.

Signed-off-by: Miklós Máté 
---
  src/mesa/state_tracker/st_atifs_to_tgsi.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c 
b/src/mesa/state_tracker/st_atifs_to_tgsi.c
index 66f442a..ae08796 100644
--- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
+++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
@@ -691,6 +691,7 @@ transform_inst:
struct tgsi_full_instruction inst;
unsigned i;
int fogc_index = -1;
+  int reg0_index = current_inst->Src[0].Register.Index;

/* find FOGC input */
for (i = 0; i < ctx->info.num_inputs; i++) {
@@ -804,11 +805,11 @@ transform_inst:
inst.Instruction.Opcode = TGSI_OPCODE_LRP;
inst.Instruction.NumDstRegs = 1;
inst.Dst[0].Register.File  = TGSI_FILE_TEMPORARY;
-  inst.Dst[0].Register.Index = 0;
+  inst.Dst[0].Register.Index = reg0_index;
inst.Dst[0].Register.WriteMask = TGSI_WRITEMASK_XYZW;
inst.Instruction.NumSrcRegs = 3;
SET_SRC(, 0, TGSI_FILE_TEMPORARY, ctx->fog_factor_temp, X, X, X, 
Y);
-  SET_SRC(, 1, TGSI_FILE_TEMPORARY, 0, X, Y, Z, W);
+  SET_SRC(, 1, TGSI_FILE_TEMPORARY, reg0_index, X, Y, Z, W);
SET_SRC(, 2, TGSI_FILE_CONSTANT, MAX_NUM_FRAGMENT_CONSTANTS_ATI + 
1, X, Y, Z, W);
tctx->emit_instruction(tctx, );
 }
--
2.8.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Ian Romanick
On 08/10/2016 11:24 AM, Kenneth Graunke wrote:
> On Wednesday, August 10, 2016 10:02:12 AM PDT Erik Faye-Lund wrote:
>> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke  
>> wrote:
>>> On Haswell (GL 3.3):
>>>
>>> total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
>>> instructions in affected programs: 856541 -> 851642 (-0.57%)
>>> helped: 3157
>>> HURT: 113
>>> LOST:   7
>>> GAINED: 15
>>>
>>> On Broadwell (GL 4.4):
>>>
>>> total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
>>> instructions in affected programs: 1055693 -> 1049855 (-0.55%)
>>> helped: 3900
>>> HURT: 176
>>> LOST:   1
>>> GAINED: 18
>>>
>>> Signed-off-by: Kenneth Graunke 
>>> ---
>>>  src/compiler/nir/nir_opt_algebraic.py | 4 
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>>> b/src/compiler/nir/nir_opt_algebraic.py
>>> index 1cf614c..4e9896f 100644
>>> --- a/src/compiler/nir/nir_opt_algebraic.py
>>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>>> @@ -251,6 +251,10 @@ optimizations = [
>>> (('ieq', 'a@bool', False), ('inot', 'a')),
>>> (('bcsel', a, True, False), ('ine', a, 0)),
>>> (('bcsel', a, False, True), ('ieq', a, 0)),
>>> +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
>>> +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
>>> +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
>>> +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
>>> (('bcsel', True, b, c), b),
>>> (('bcsel', False, b, c), c),
>>> # The result of this should be hit by constant propagation and, in the
>>
>> Same as the previous patch, this smells like intel-isms. Hardware that
>> has native bcsel with support for two inline immediates will do better
>> without.
> 
> It definitely feels a little strange replacing a single bcsel with
> a fneg/b2f/ine, as that's three operations instead of one.
> 
> I expect the ine to go away - assuming 'a' is a properly formatted
> boolean (0 or 0x), "ine a 0" will just become 'a'.  ieq would
> turn into inot.  If the boolean was a comparison, the inot could be
> folded in - i.e. inot(flt(a,b)) -> fge(a,b).  Or, some GPUs can handle
> boolean negation as a source modifier, so it might be free there too.
> 
> Floating point negation can usually be done as a source modifier.
> 
> For reference, here's a shader snippet from Goat Simulator which
> prompted me to write this optimization:
> 
> const vec4 LocalConst1 = vec4(0.25, -0.25, 0.00, 1.00);
> 
> void main()
> {
> ...
> InstrHelpTemp.r = ( ( Temporary1.r >= 0.0 ) ? LocalConst1.b : LocalConst1.a );
> ...
> }
> 
> which could be turned into
> 
> InstrHelpTemp.r = float(Temporary1.r < 0.0);
> 
> which seems arguably better, regardless of hardware.

Yeah... I remember looking at that very shader.  I seem to recall that
InstrHelpTemp.r is then compared with 0.0 a little later to generate
another Boolean value. :)

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 




signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.

2016-08-10 Thread Kenneth Graunke
On Wednesday, August 10, 2016 10:53:17 AM PDT Ian Romanick wrote:
> My most "recent" work in this area is in my appropriately named
> assume-CMS-in-precompile branch.  I should just rename all my branches
> idr-got-distracted-#.

Thanks, I tried to look for your recent work on this, but in addition
to the name, that branch wasn't in your repository yesterday...


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Kenneth Graunke
On Wednesday, August 10, 2016 10:02:12 AM PDT Erik Faye-Lund wrote:
> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke  
> wrote:
> > On Haswell (GL 3.3):
> >
> > total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
> > instructions in affected programs: 856541 -> 851642 (-0.57%)
> > helped: 3157
> > HURT: 113
> > LOST:   7
> > GAINED: 15
> >
> > On Broadwell (GL 4.4):
> >
> > total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
> > instructions in affected programs: 1055693 -> 1049855 (-0.55%)
> > helped: 3900
> > HURT: 176
> > LOST:   1
> > GAINED: 18
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/compiler/nir/nir_opt_algebraic.py | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> > b/src/compiler/nir/nir_opt_algebraic.py
> > index 1cf614c..4e9896f 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -251,6 +251,10 @@ optimizations = [
> > (('ieq', 'a@bool', False), ('inot', 'a')),
> > (('bcsel', a, True, False), ('ine', a, 0)),
> > (('bcsel', a, False, True), ('ieq', a, 0)),
> > +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
> > +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
> > +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
> > +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
> > (('bcsel', True, b, c), b),
> > (('bcsel', False, b, c), c),
> > # The result of this should be hit by constant propagation and, in the
> 
> Same as the previous patch, this smells like intel-isms. Hardware that
> has native bcsel with support for two inline immediates will do better
> without.

It definitely feels a little strange replacing a single bcsel with
a fneg/b2f/ine, as that's three operations instead of one.

I expect the ine to go away - assuming 'a' is a properly formatted
boolean (0 or 0x), "ine a 0" will just become 'a'.  ieq would
turn into inot.  If the boolean was a comparison, the inot could be
folded in - i.e. inot(flt(a,b)) -> fge(a,b).  Or, some GPUs can handle
boolean negation as a source modifier, so it might be free there too.

Floating point negation can usually be done as a source modifier.

For reference, here's a shader snippet from Goat Simulator which
prompted me to write this optimization:

const vec4 LocalConst1 = vec4(0.25, -0.25, 0.00, 1.00);

void main()
{
...
InstrHelpTemp.r = ( ( Temporary1.r >= 0.0 ) ? LocalConst1.b : LocalConst1.a );
...
}

which could be turned into

InstrHelpTemp.r = float(Temporary1.r < 0.0);

which seems arguably better, regardless of hardware.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6] nir: Turn -(b2f(a) + b2f(b) >= 0 into !(a || b).

2016-08-10 Thread Patrick Baggett
> >
> > For now, this patch is
> >
> > Reviewed-by: Ian Romanick 
>

I had a hard time parsing the title: "Turn -(b2f(a) + b2f(b) >= 0 into
!(a || b)"  at first, until I saw the replacement instructions. You're
missing a ')' on the commit line. :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/6] nir: Introduce a nir_opt_move_comparisons() pass.

2016-08-10 Thread Ian Romanick
On 08/09/2016 07:30 PM, Kenneth Graunke wrote:
> This tries to move comparisons (a common source of boolean values)
> closer to their first use.  For GPUs which use condition codes,
> this can eliminate a lot of temporary booleans and comparisons
> which reload the condition code register based on a boolean.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/Makefile.sources   |   1 +
>  src/compiler/nir/nir.h  |   2 +
>  src/compiler/nir/nir_opt_move_comparisons.c | 173 
> 
>  3 files changed, 176 insertions(+)
>  create mode 100644 src/compiler/nir/nir_opt_move_comparisons.c
> 
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 0ff9b23..008a101 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -226,6 +226,7 @@ NIR_FILES = \
>   nir/nir_opt_gcm.c \
>   nir/nir_opt_global_to_local.c \
>   nir/nir_opt_peephole_select.c \
> + nir/nir_opt_move_comparisons.c \
>   nir/nir_opt_remove_phis.c \
>   nir/nir_opt_undef.c \
>   nir/nir_phi_builder.c \
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 9ce5be2..79511a7 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2574,6 +2574,8 @@ bool nir_opt_dead_cf(nir_shader *shader);
>  
>  void nir_opt_gcm(nir_shader *shader);
>  
> +bool nir_opt_move_comparisons(nir_shader *shader);
> +
>  bool nir_opt_peephole_select(nir_shader *shader);
>  
>  bool nir_opt_remove_phis(nir_shader *shader);
> diff --git a/src/compiler/nir/nir_opt_move_comparisons.c 
> b/src/compiler/nir/nir_opt_move_comparisons.c
> new file mode 100644
> index 000..74927c9
> --- /dev/null
> +++ b/src/compiler/nir/nir_opt_move_comparisons.c
> @@ -0,0 +1,173 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "nir.h"
> +
> +/**
> + * \file nir_opt_move_comparisons.c
> + *
> + * This pass moves ALU comparison operations just before their first use.
> + *
> + * It only moves instructions within a single basic block; cross-block
> + * movement is left to global code motion.
> + *
> + * Many GPUs generate condition codes for comparisons, and use predication
> + * for conditional selects and control flow.  In a sequence such as:
> + *
> + * vec1 32 ssa_1 = flt a b
> + * 
> + * vec1 32 ssa_2 = bcsel ssa_1 c d
> + *
> + * the backend would likely do the comparison, producing condition codes,
> + * then save those to a boolean value.  The intervening operations might
> + * trash the condition codes.  Then, in order to do the bcsel, it would
> + * need to re-populate the condition code register based on the boolean.
> + *
> + * By moving the comparison just before the bcsel, the condition codes could
> + * be used directly.  This eliminates the need to reload them from the 
> boolean
> + * (generally eliminating an instruction).  It may also eliminate the need to
> + * create a boolean value altogether (unless it's used elsewhere), which 
> could
> + * lower register pressure.
> + */
> +
> +static bool
> +is_comparison(nir_op op)
> +{
> +   switch (op) {
> +   case nir_op_flt:
> +   case nir_op_fge:
> +   case nir_op_feq:
> +   case nir_op_fne:
> +   case nir_op_ilt:
> +   case nir_op_ult:
> +   case nir_op_ige:
> +   case nir_op_uge:
> +   case nir_op_ieq:
> +   case nir_op_ine:
> +  return true;
> +   default:
> +  return false;
> +   }
> +}
> +
> +static bool
> +move_comparison_source(nir_src *src, nir_block *block, struct exec_node 
> *before)
> +{
> +   if (src->is_ssa && src->ssa->parent_instr->block == block &&
> +   src->ssa->parent_instr->type == nir_instr_type_alu &&
> +   is_comparison(nir_instr_as_alu(src->ssa->parent_instr)->op)) {
> +
> +  struct exec_node *src_node = 

Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Connor Abbott
On Wed, Aug 10, 2016 at 1:53 PM, Eric Anholt  wrote:
> Kenneth Graunke  writes:
>
>> On Haswell (GL 3.3):
>>
>> total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
>> instructions in affected programs: 856541 -> 851642 (-0.57%)
>> helped: 3157
>> HURT: 113
>> LOST:   7
>> GAINED: 15
>>
>> On Broadwell (GL 4.4):
>>
>> total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
>> instructions in affected programs: 1055693 -> 1049855 (-0.55%)
>> helped: 3900
>> HURT: 176
>> LOST:   1
>> GAINED: 18
>>
>> Signed-off-by: Kenneth Graunke 
>> ---
>>  src/compiler/nir/nir_opt_algebraic.py | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index 1cf614c..4e9896f 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -251,6 +251,10 @@ optimizations = [
>> (('ieq', 'a@bool', False), ('inot', 'a')),
>> (('bcsel', a, True, False), ('ine', a, 0)),
>> (('bcsel', a, False, True), ('ieq', a, 0)),
>> +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
>> +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
>> +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
>> +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
>
> Isn't bcsel's first arg guaranteed to be 0 or ~0?  I thought that was
> how nir's bcsel (and b2f) worked.  If not, it would be good to see this
> documented.

Yes, any ALU operation that takes a boolean input can expect it to be
0 or ~0. If that isn't true, then something else in the compiler has
messed up. So I think we can replace 'a != 0' with 'a' and 'a == 0'
with '!a'.

>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6] nir: Turn -(b2f(a) + b2f(b) >= 0 into !(a || b).

2016-08-10 Thread Kenneth Graunke
On Wednesday, August 10, 2016 10:58:49 AM PDT Ian Romanick wrote:
> On 08/09/2016 07:30 PM, Kenneth Graunke wrote:
> > On Haswell (GL 3.3):
> > 
> > total instructions in shared programs: 6211485 -> 6211427 (-0.00%)
> > instructions in affected programs: 16260 -> 16202 (-0.36%)
> > helped: 25
> > HURT: 37
> > 
> > On Broadwell (GL 4.4):
> > 
> > total instructions in shared programs: 11640288 -> 11640218 (-0.00%)
> > instructions in affected programs: 16313 -> 16243 (-0.43%)
> > helped: 27
> > HURT: 37
> 
> Interesting... I had a similar version of this that worked on arbitrary
> length sequences of (b2f(a_0) + .. b2f(a_n)), but I didn't get any
> benefit.  I also had tests for a couple other relational operators with
> 0 that didn't see benefit.  Maybe one of us should try to revive my old,
> more generic series at some point.
> 
> For now, this patch is
> 
> Reviewed-by: Ian Romanick 

I believe I first saw this sequence after my bcsel -1.0 -0.0 optimization.

I also imported over 11,000 more shaders recently - probably after
you last did this experiment.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/6] i965: Use the nir_move_comparisons pass.

2016-08-10 Thread Ian Romanick
On 08/09/2016 07:30 PM, Kenneth Graunke wrote:
> On Haswell (GL 3.3):
> 
> total instructions in shared programs: 6211427 -> 6210079 (-0.02%)
> instructions in affected programs: 219356 -> 218008 (-0.61%)
> helped: 567
> HURT: 132
> 
> No spill/fill changes.
> 
> LOST:   0
> GAINED: 4
> 
> On Broadwell (GL 4.4):
> 
> total instructions in shared programs: 11640218 -> 11632136 (-0.07%)
> instructions in affected programs: 542528 -> 534446 (-1.49%)
> helped: 1661
> HURT: 143
> 
> total spills in shared programs: 2922 -> 2932 (0.34%)
> spills in affected programs: 420 -> 430 (2.38%)
> helped: 1
> HURT: 1
> 
> total fills in shared programs: 4389 -> 4390 (0.02%)
> fills in affected programs: 967 -> 968 (0.10%)
> helped: 1
> HURT: 1

I would have thought that this would help more.  I wonder if it's being
held back by the poor way we handle || and &&.  We always implement
these as actual instructions, but we could use predication.  I had
looked into this a very little bit in days just before NIR, and it
looked promising.  Once NIR landed, I couldn't tell the difference
between logical-or and bitwise-or, and I abandoned the effort.

I also had problems dealing with the single condition code register that
we use.  Long sequences like (a && b || c && d) really want to use all
the condition code registers.

> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_nir.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
> b/src/mesa/drivers/dri/i965/brw_nir.c
> index db2056d..b02c404 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.c
> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> @@ -487,6 +487,7 @@ brw_postprocess_nir(nir_shader *nir,
> OPT_V(nir_lower_to_source_mods);
> OPT(nir_copy_prop);
> OPT(nir_opt_dce);
> +   OPT(nir_opt_move_comparisons);
>  
> OPT(nir_lower_locals_to_regs);
>  
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: Add support for scalarizing load_interpolated_input.

2016-08-10 Thread Eric Anholt
Kenneth Graunke  writes:

> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_lower_io_to_scalar.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/nir/nir_lower_io_to_scalar.c 
> b/src/compiler/nir/nir_lower_io_to_scalar.c
> index f2345d5..8be75e4 100644
> --- a/src/compiler/nir/nir_lower_io_to_scalar.c
> +++ b/src/compiler/nir/nir_lower_io_to_scalar.c
> @@ -48,8 +48,10 @@ lower_load_input_to_scalar(nir_builder *b, 
> nir_intrinsic_instr *intr)
>  
>nir_intrinsic_set_base(chan_intr, nir_intrinsic_base(intr));
>nir_intrinsic_set_component(chan_intr, nir_intrinsic_component(intr) + 
> i);
> -  /* offset */
> -  chan_intr->src[0] = intr->src[0];
> +
> +  /* offset and possibly barycentric coordinates */
> +  for (int i = 0; i < nir_intrinsic_infos[intr->intrinsic].num_srcs; i++)
> + chan_intr->src[i] = intr->src[i];
>  
>nir_builder_instr_insert(b, _intr->instr);
>  
> @@ -112,6 +114,7 @@ nir_lower_io_to_scalar(nir_shader *shader, 
> nir_variable_mode mask)
>  
> switch (intr->intrinsic) {
> case nir_intrinsic_load_input:
> +   case nir_intrinsic_load_interpolated_input:
>if (mask & nir_var_shader_in)
>   lower_load_input_to_scalar(, intr);
>break;

Don't you also need:

if (intr->intrinsic == nir_intrinsic_load_interpolated_input)
nir_intrinsic_set_interp_mode(chan_intr,
  nir_intrinsic_interp_mode(intr));


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Ian Romanick
On 08/09/2016 07:30 PM, Kenneth Graunke wrote:
> On Haswell (GL 3.3):
> 
> total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
> instructions in affected programs: 856541 -> 851642 (-0.57%)
> helped: 3157
> HURT: 113
> LOST:   7
> GAINED: 15
> 
> On Broadwell (GL 4.4):
> 
> total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
> instructions in affected programs: 1055693 -> 1049855 (-0.55%)
> helped: 3900
> HURT: 176
> LOST:   1
> GAINED: 18
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 1cf614c..4e9896f 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -251,6 +251,10 @@ optimizations = [
> (('ieq', 'a@bool', False), ('inot', 'a')),
> (('bcsel', a, True, False), ('ine', a, 0)),
> (('bcsel', a, False, True), ('ieq', a, 0)),
> +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
> +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
> +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
> +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,

One of the other things I was experimenting with, that had mixed
results, was rearranging larger sequences like  x * bcsel(b, 1.0, 0.0).
There were a few different ways to handle it, and the optimal way was
dependent on the way the Boolean value was generated and on the way the
result was used.  This dovetailed into my work in removing spurious b2f
(and f2i) expressions.

> (('bcsel', True, b, c), b),
> (('bcsel', False, b, c), c),
> # The result of this should be hit by constant propagation and, in the
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6] nir: Turn -(b2f(a) + b2f(b) >= 0 into !(a || b).

2016-08-10 Thread Ian Romanick
On 08/09/2016 07:30 PM, Kenneth Graunke wrote:
> On Haswell (GL 3.3):
> 
> total instructions in shared programs: 6211485 -> 6211427 (-0.00%)
> instructions in affected programs: 16260 -> 16202 (-0.36%)
> helped: 25
> HURT: 37
> 
> On Broadwell (GL 4.4):
> 
> total instructions in shared programs: 11640288 -> 11640218 (-0.00%)
> instructions in affected programs: 16313 -> 16243 (-0.43%)
> helped: 27
> HURT: 37

Interesting... I had a similar version of this that worked on arbitrary
length sequences of (b2f(a_0) + .. b2f(a_n)), but I didn't get any
benefit.  I also had tests for a couple other relational operators with
0 that didn't see benefit.  Maybe one of us should try to revive my old,
more generic series at some point.

For now, this patch is

Reviewed-by: Ian Romanick 

> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 4e9896f..ef87d4d 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -135,6 +135,9 @@ optimizations = [
> # inot(a)
> (('fge', 0.0, ('b2f', a)), ('inot', a)),
>  
> +   # -(b2f(a) + b2f(b)) >= 0 becomes !(a || b)
> +   (('fge', ('fneg', ('fadd', ('b2f', 'a@bool'), ('b2f', 'b@bool'))), 0.0), 
> ('inot', ('ior', a, b))),
> +
> # 0.0 < fabs(a)
> # fabs(a) > 0.0
> # fabs(a) != 0.0 because fabs(a) must be >= 0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.

2016-08-10 Thread Ian Romanick
My most "recent" work in this area is in my appropriately named
assume-CMS-in-precompile branch.  I should just rename all my branches
idr-got-distracted-#.

On 08/09/2016 07:30 PM, Kenneth Graunke wrote:
> On Haswell (GL 3.3):
> 
> total instructions in shared programs: 6211678 -> 6211584 (-0.00%)
> instructions in affected programs: 18968 -> 18874 (-0.50%)
> helped: 62
> HURT: 0
> LOST:   0
> GAINED: 4
> 
> On Broadwell (GL 4.4):
> 
> total instructions in shared programs: 11638930 -> 11638181 (-0.01%)
> instructions in affected programs: 84768 -> 84019 (-0.88%)
> helped: 505
> HURT: 45
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 0f0896b..1cf614c 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -180,6 +180,7 @@ optimizations = [
> (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))),
> (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a, b))),
> (('iand', 'a@bool', 1.0), ('b2f', a)),
> +   (('ineg', ('b2i', 'a@bool')), a),

I think I might like a comment that this works because Boolean true and
false are always ~0 and 0 in NIR.  With that,

Reviewed-by: Ian Romanick 

> (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF.
> (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI KILL_IF.
> # Comparison with the same args.  Note that these are not done for
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Eric Anholt
Kenneth Graunke  writes:

> On Haswell (GL 3.3):
>
> total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
> instructions in affected programs: 856541 -> 851642 (-0.57%)
> helped: 3157
> HURT: 113
> LOST:   7
> GAINED: 15
>
> On Broadwell (GL 4.4):
>
> total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
> instructions in affected programs: 1055693 -> 1049855 (-0.55%)
> helped: 3900
> HURT: 176
> LOST:   1
> GAINED: 18
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 1cf614c..4e9896f 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -251,6 +251,10 @@ optimizations = [
> (('ieq', 'a@bool', False), ('inot', 'a')),
> (('bcsel', a, True, False), ('ine', a, 0)),
> (('bcsel', a, False, True), ('ieq', a, 0)),
> +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
> +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
> +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
> +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,

Isn't bcsel's first arg guaranteed to be 0 or ~0?  I thought that was
how nir's bcsel (and b2f) worked.  If not, it would be good to see this
documented.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] *** Aubinator tool for Intel Gen platforms ***

2016-08-10 Thread Nanley Chery
On Wed, Aug 10, 2016 at 09:43:24AM -0700, Nanley Chery wrote:
> On Tue, Aug 09, 2016 at 04:52:12PM -0700, Sirisha Gandikota wrote:
> > From: Sirisha Gandikota 
> > 
> > This is a patch series for adding the aubinator tool to the codebase.
> > 
> > The aubinator tool is designed to help the driver developers to debug
> > the driver functionality by decoding the data in the .aub files. This
> > tool is for Intel Gen platforms and has been tested for Gen7.5, Gen8 
> > and Gen9 platforms.
> > 
> > Kristian Høgsberg Kristensen (1):
> >   aubinator: Add a new tool called Aubinator to the src/intel/tools
> > folder.
> 
> Hi Sirisha,
> 
> It's great to see this tool on the list. I seems that the first patch in
> the series is missing however. Could you send it out?

Sorry for the noise. My Gmail marked Patch 1/2 as spam for some odd reason.

- Nanley

> 
> Thanks,
> Nanley
> 
> > 
> > Sirisha Gandikota (1):
> >   aubinator: Fix the tool to correctly decode the DWords
> > 
> >  configure.ac |1 +
> >  src/Makefile.am  |4 +
> >  src/intel/tools/Makefile.am  |   65 +++
> >  src/intel/tools/aubinator.c  | 1140 
> > ++
> >  src/intel/tools/decoder.c|  515 +++
> >  src/intel/tools/decoder.h|   96 
> >  src/intel/tools/disasm.c |  109 
> >  src/intel/tools/gen_disasm.h |   35 ++
> >  src/intel/tools/intel_aub.h  |  154 ++
> >  9 files changed, 2119 insertions(+)
> >  create mode 100644 src/intel/tools/Makefile.am
> >  create mode 100644 src/intel/tools/aubinator.c
> >  create mode 100644 src/intel/tools/decoder.c
> >  create mode 100644 src/intel/tools/decoder.h
> >  create mode 100644 src/intel/tools/disasm.c
> >  create mode 100644 src/intel/tools/gen_disasm.h
> >  create mode 100644 src/intel/tools/intel_aub.h
> > 
> > -- 
> > 2.7.4
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] anv/blit2d: Add a format parameter to bind_dst and create_iview

2016-08-10 Thread Nanley Chery
On Wed, Aug 10, 2016 at 10:12:17AM -0700, Nanley Chery wrote:
> On Tue, Aug 02, 2016 at 10:00:29AM -0700, Jason Ekstrand wrote:
> > Signed-off-by: Jasosn Ekstrand 
> > Cc: "12.0" 

I just noticed a bug in this patch. With that fixed the Reviewed-by
will hold. Please see the note inline.

> > ---
> >  src/intel/vulkan/anv_meta_blit2d.c | 15 ++-
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/anv_meta_blit2d.c 
> > b/src/intel/vulkan/anv_meta_blit2d.c
> > index 9d4c2fc..30bc6ed 100644
> > --- a/src/intel/vulkan/anv_meta_blit2d.c
> > +++ b/src/intel/vulkan/anv_meta_blit2d.c
> > @@ -99,6 +99,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer,
> >   VkImageUsageFlags usage,
> >   uint32_t width,
> >   uint32_t height,
> > + VkFormat format,
> >   VkImage *img,
> >   struct anv_image_view *iview)
> >  {
> > @@ -106,8 +107,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer,
> >.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
> >.imageType = VK_IMAGE_TYPE_2D,
> >/* W-tiled images must be stencil-formatted. */
> > -  .format = surf->tiling == ISL_TILING_W ?
> > -VK_FORMAT_S8_UINT : vk_format_for_size(surf->bs),
> > +  .format = format,
> >.extent = {
> >   .width = width,
> >   .height = height,
> > @@ -190,6 +190,8 @@ blit2d_bind_src(struct anv_cmd_buffer *cmd_buffer,
> >  
> >create_iview(cmd_buffer, src, offset, usage,
> > rect->src_x + rect->width, rect->src_y + rect->height,
> > +   src->tiling == ISL_TILING_W ?
> > +  VK_FORMAT_S8_UINT : vk_format_for_size(src->bs),
> > >image, >iview);
> >  
> >anv_CreateDescriptorPool(vk_device,
> > @@ -339,10 +341,11 @@ blit2d_bind_dst(struct anv_cmd_buffer *cmd_buffer,
> >  uint64_t offset,
> >  uint32_t width,
> >  uint32_t height,
> > +enum isl_format format,

This should be:

VkFormat format,

- Nanley

> >  struct blit2d_dst_temps *tmp)
> >  {
> > create_iview(cmd_buffer, dst, offset, 
> > VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
> > -width, height, >image, >iview);
> > +width, height, format, >image, >iview);
> >  
> > anv_CreateFramebuffer(anv_device_to_handle(cmd_buffer->device),
> >&(VkFramebufferCreateInfo) {
> > @@ -417,7 +420,8 @@ anv_meta_blit2d_normal_dst(struct anv_cmd_buffer 
> > *cmd_buffer,
> >  
> >struct blit2d_dst_temps dst_temps;
> >blit2d_bind_dst(cmd_buffer, dst, offset, rects[r].dst_x + 
> > rects[r].width,
> > -  rects[r].dst_y + rects[r].height, _temps);
> > +  rects[r].dst_y + rects[r].height,
> > +  vk_format_for_size(dst->bs), _temps);
> >  
> >struct blit_vb_data {
> >   float pos[2];
> > @@ -555,7 +559,8 @@ anv_meta_blit2d_w_tiled_dst(struct anv_cmd_buffer 
> > *cmd_buffer,
> >};
> >  
> >struct blit2d_dst_temps dst_temps;
> > -  blit2d_bind_dst(cmd_buffer, _Y, offset, xmax_Y, ymax_Y, 
> > _temps);
> > +  blit2d_bind_dst(cmd_buffer, _Y, offset, xmax_Y, ymax_Y,
> > +  vk_format_for_size(dst->bs), _temps);
> 
> We know the format block size here, so I think we should replace dst->bs with
> 1. There actually isn't any other usage of dst->bs (other than the assert) for
> this reason.
> 
> With that change this patch is,
> Reviewed-by: Nanley Chery 
> 
> >  
> >struct blit_vb_header {
> >   struct anv_vue_header vue;
> > -- 
> > 2.5.0.400.gff86faf
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] anv/blit2d: Add a format parameter to bind_dst and create_iview

2016-08-10 Thread Nanley Chery
On Tue, Aug 02, 2016 at 10:00:29AM -0700, Jason Ekstrand wrote:
> Signed-off-by: Jasosn Ekstrand 
> Cc: "12.0" 
> ---
>  src/intel/vulkan/anv_meta_blit2d.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_meta_blit2d.c 
> b/src/intel/vulkan/anv_meta_blit2d.c
> index 9d4c2fc..30bc6ed 100644
> --- a/src/intel/vulkan/anv_meta_blit2d.c
> +++ b/src/intel/vulkan/anv_meta_blit2d.c
> @@ -99,6 +99,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer,
>   VkImageUsageFlags usage,
>   uint32_t width,
>   uint32_t height,
> + VkFormat format,
>   VkImage *img,
>   struct anv_image_view *iview)
>  {
> @@ -106,8 +107,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer,
>.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
>.imageType = VK_IMAGE_TYPE_2D,
>/* W-tiled images must be stencil-formatted. */
> -  .format = surf->tiling == ISL_TILING_W ?
> -VK_FORMAT_S8_UINT : vk_format_for_size(surf->bs),
> +  .format = format,
>.extent = {
>   .width = width,
>   .height = height,
> @@ -190,6 +190,8 @@ blit2d_bind_src(struct anv_cmd_buffer *cmd_buffer,
>  
>create_iview(cmd_buffer, src, offset, usage,
> rect->src_x + rect->width, rect->src_y + rect->height,
> +   src->tiling == ISL_TILING_W ?
> +  VK_FORMAT_S8_UINT : vk_format_for_size(src->bs),
> >image, >iview);
>  
>anv_CreateDescriptorPool(vk_device,
> @@ -339,10 +341,11 @@ blit2d_bind_dst(struct anv_cmd_buffer *cmd_buffer,
>  uint64_t offset,
>  uint32_t width,
>  uint32_t height,
> +enum isl_format format,
>  struct blit2d_dst_temps *tmp)
>  {
> create_iview(cmd_buffer, dst, offset, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
> -width, height, >image, >iview);
> +width, height, format, >image, >iview);
>  
> anv_CreateFramebuffer(anv_device_to_handle(cmd_buffer->device),
>&(VkFramebufferCreateInfo) {
> @@ -417,7 +420,8 @@ anv_meta_blit2d_normal_dst(struct anv_cmd_buffer 
> *cmd_buffer,
>  
>struct blit2d_dst_temps dst_temps;
>blit2d_bind_dst(cmd_buffer, dst, offset, rects[r].dst_x + 
> rects[r].width,
> -  rects[r].dst_y + rects[r].height, _temps);
> +  rects[r].dst_y + rects[r].height,
> +  vk_format_for_size(dst->bs), _temps);
>  
>struct blit_vb_data {
>   float pos[2];
> @@ -555,7 +559,8 @@ anv_meta_blit2d_w_tiled_dst(struct anv_cmd_buffer 
> *cmd_buffer,
>};
>  
>struct blit2d_dst_temps dst_temps;
> -  blit2d_bind_dst(cmd_buffer, _Y, offset, xmax_Y, ymax_Y, 
> _temps);
> +  blit2d_bind_dst(cmd_buffer, _Y, offset, xmax_Y, ymax_Y,
> +  vk_format_for_size(dst->bs), _temps);

We know the format block size here, so I think we should replace dst->bs with
1. There actually isn't any other usage of dst->bs (other than the assert) for
this reason.

With that change this patch is,
Reviewed-by: Nanley Chery 

>  
>struct blit_vb_header {
>   struct anv_vue_header vue;
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/8] st/mesa: determine states used or affected by shaders at compile time

2016-08-10 Thread Marek Olšák
On Wed, Aug 10, 2016 at 4:34 PM, Nicolai Hähnle  wrote:
> On 10.08.2016 13:05, Marek Olšák wrote:
>>
>> On Tue, Aug 9, 2016 at 12:56 PM, Nicolai Hähnle 
>> wrote:
>>>
>>> On 07.08.2016 03:12, Marek Olšák wrote:


 From: Marek Olšák 

 At compile time, each shader determines which ST_NEW flags should be set
 at shader bind time.

 This just sets the new field for all shaders. The next commit will use
 it.
 ---
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 175
 -
  src/mesa/state_tracker/st_program.c|  37 +-
  src/mesa/state_tracker/st_program.h|   6 +
  3 files changed, 215 insertions(+), 3 deletions(-)

 diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 index 362559f..fd14766 100644
 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 @@ -,31 +,202 @@ get_mesa_program_tgsi(struct gl_context *ctx,

  static struct gl_program *
  get_mesa_program(struct gl_context *ctx,
   struct gl_shader_program *shader_program,
   struct gl_linked_shader *shader)
  {
 struct pipe_screen *pscreen = ctx->st->pipe->screen;
 unsigned ptarget = st_shader_stage_to_ptarget(shader->Stage);
 enum pipe_shader_ir preferred_ir = (enum pipe_shader_ir)
pscreen->get_shader_param(pscreen, ptarget,
 PIPE_SHADER_CAP_PREFERRED_IR);
 +   struct gl_program *prog = NULL;
 +
 if (preferred_ir == PIPE_SHADER_IR_NIR) {
/* TODO only for GLSL VS/FS for now: */
switch (shader->Stage) {
case MESA_SHADER_VERTEX:
case MESA_SHADER_FRAGMENT:
 - return st_nir_get_mesa_program(ctx, shader_program, shader);
 + prog = st_nir_get_mesa_program(ctx, shader_program, shader);
default:
   break;
}
 +   } else {
 +  prog = get_mesa_program_tgsi(ctx, shader_program, shader);
 +   }
 +
 +   if (prog) {
 +  uint64_t *states;
 +
 +  /* This determines which states will be updated when the shader
 is
 +   * bound.
 +   */
 +  switch (shader->Stage) {
 +  case MESA_SHADER_VERTEX:
 + states = &((struct st_vertex_program*)prog)->affected_states;
 +
 + *states = ST_NEW_VS_STATE |
 +   ST_NEW_RASTERIZER |
 +   ST_NEW_VERTEX_ARRAYS;
 +
 + if (prog->Parameters->NumParameters)
 +*states |= ST_NEW_VS_CONSTANTS;
 +
 + if (shader->num_samplers)
 +*states |= ST_NEW_VS_SAMPLER_VIEWS |
 +   ST_NEW_RENDER_SAMPLERS;
 +
 + if (shader->NumImages)
 +*states |= ST_NEW_VS_IMAGES;
 +
 + if (shader->NumUniformBlocks)
 +*states |= ST_NEW_VS_UBOS;
 +
 + if (shader->NumShaderStorageBlocks)
 +*states |= ST_NEW_VS_SSBOS;
 +
 + if (shader->NumAtomicBuffers)
 +*states |= ST_NEW_VS_ATOMICS;
>>>
>>>
>>>
>>> I'm not overly fond of the code duplication here. Perhaps these could all
>>> be
>>> expressed relative to a stage-specific base flag?
>>
>>
>> Not really. It shouldn't rely on the order the flags are declared. I
>> can't make it better than this:
>
>
> I don't know why we can't just say we should be able to rely on the order,
> but the alternative below is indeed not really any better.

We could rely on the order, but as with system value enumerations,
we'd need a lot of assertions to prevent a breakage if somebody
changes the order and we don't notice.

I'll still amend the diff, because it makes the code a little shorter.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] *** Aubinator tool for Intel Gen platforms ***

2016-08-10 Thread Nanley Chery
On Tue, Aug 09, 2016 at 04:52:12PM -0700, Sirisha Gandikota wrote:
> From: Sirisha Gandikota 
> 
> This is a patch series for adding the aubinator tool to the codebase.
> 
> The aubinator tool is designed to help the driver developers to debug
> the driver functionality by decoding the data in the .aub files. This
> tool is for Intel Gen platforms and has been tested for Gen7.5, Gen8 
> and Gen9 platforms.
> 
> Kristian Høgsberg Kristensen (1):
>   aubinator: Add a new tool called Aubinator to the src/intel/tools
> folder.

Hi Sirisha,

It's great to see this tool on the list. I seems that the first patch in
the series is missing however. Could you send it out?

Thanks,
Nanley

> 
> Sirisha Gandikota (1):
>   aubinator: Fix the tool to correctly decode the DWords
> 
>  configure.ac |1 +
>  src/Makefile.am  |4 +
>  src/intel/tools/Makefile.am  |   65 +++
>  src/intel/tools/aubinator.c  | 1140 
> ++
>  src/intel/tools/decoder.c|  515 +++
>  src/intel/tools/decoder.h|   96 
>  src/intel/tools/disasm.c |  109 
>  src/intel/tools/gen_disasm.h |   35 ++
>  src/intel/tools/intel_aub.h  |  154 ++
>  9 files changed, 2119 insertions(+)
>  create mode 100644 src/intel/tools/Makefile.am
>  create mode 100644 src/intel/tools/aubinator.c
>  create mode 100644 src/intel/tools/decoder.c
>  create mode 100644 src/intel/tools/decoder.h
>  create mode 100644 src/intel/tools/disasm.c
>  create mode 100644 src/intel/tools/gen_disasm.h
>  create mode 100644 src/intel/tools/intel_aub.h
> 
> -- 
> 2.7.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Erik Faye-Lund
On Wed, Aug 10, 2016 at 5:48 PM, Jason Ekstrand  wrote:
> On Aug 10, 2016 1:02 AM, "Erik Faye-Lund"  wrote:
>>
>> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke 
>> wrote:
>> > On Haswell (GL 3.3):
>> >
>> > total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
>> > instructions in affected programs: 856541 -> 851642 (-0.57%)
>> > helped: 3157
>> > HURT: 113
>> > LOST:   7
>> > GAINED: 15
>> >
>> > On Broadwell (GL 4.4):
>> >
>> > total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
>> > instructions in affected programs: 1055693 -> 1049855 (-0.55%)
>> > helped: 3900
>> > HURT: 176
>> > LOST:   1
>> > GAINED: 18
>> >
>> > Signed-off-by: Kenneth Graunke 
>> > ---
>> >  src/compiler/nir/nir_opt_algebraic.py | 4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> > b/src/compiler/nir/nir_opt_algebraic.py
>> > index 1cf614c..4e9896f 100644
>> > --- a/src/compiler/nir/nir_opt_algebraic.py
>> > +++ b/src/compiler/nir/nir_opt_algebraic.py
>> > @@ -251,6 +251,10 @@ optimizations = [
>> > (('ieq', 'a@bool', False), ('inot', 'a')),
>> > (('bcsel', a, True, False), ('ine', a, 0)),
>> > (('bcsel', a, False, True), ('ieq', a, 0)),
>> > +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
>> > +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
>> > +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
>> > +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
>> > (('bcsel', True, b, c), b),
>> > (('bcsel', False, b, c), c),
>> > # The result of this should be hit by constant propagation and, in
>> > the
>>
>> Same as the previous patch, this smells like intel-isms. Hardware that
>> has native bcsel with support for two inline immediates will do better
>> without.
>
> Why? If you're back-end handles b2f *worse* then bcsel of two components,
> then it's broken.   Also, we have a lot of optimization in NIR to help cut
> through "fake booleans" where shaders use 0.0 and 1.0 and math operations
> instead of actual booleans.  Recognising hand-coded b2f suddenly enables
> more of these optimization paths to run on the shader.  That 0.57% isn't all
> just immediates being removed.

You're right. I don't know what I was thinking.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.

2016-08-10 Thread Erik Faye-Lund
On Wed, Aug 10, 2016 at 5:40 PM, Jason Ekstrand  wrote:
> On Aug 10, 2016 1:00 AM, "Erik Faye-Lund"  wrote:
>>
>> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke 
>> wrote:
>> > On Haswell (GL 3.3):
>> >
>> > total instructions in shared programs: 6211678 -> 6211584 (-0.00%)
>> > instructions in affected programs: 18968 -> 18874 (-0.50%)
>> > helped: 62
>> > HURT: 0
>> > LOST:   0
>> > GAINED: 4
>> >
>> > On Broadwell (GL 4.4):
>> >
>> > total instructions in shared programs: 11638930 -> 11638181 (-0.01%)
>> > instructions in affected programs: 84768 -> 84019 (-0.88%)
>> > helped: 505
>> > HURT: 45
>> >
>> > Signed-off-by: Kenneth Graunke 
>> > ---
>> >  src/compiler/nir/nir_opt_algebraic.py | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> > b/src/compiler/nir/nir_opt_algebraic.py
>> > index 0f0896b..1cf614c 100644
>> > --- a/src/compiler/nir/nir_opt_algebraic.py
>> > +++ b/src/compiler/nir/nir_opt_algebraic.py
>> > @@ -180,6 +180,7 @@ optimizations = [
>> > (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))),
>> > (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a,
>> > b))),
>> > (('iand', 'a@bool', 1.0), ('b2f', a)),
>> > +   (('ineg', ('b2i', 'a@bool')), a),
>> > (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF.
>> > (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI
>> > KILL_IF.
>> > # Comparison with the same args.  Note that these are not done for
>>
>> Are you sure this shouldn't be conditional? This being an improvement
>> sounds like an intel-ism, and not something that applies to most
>> hardware...
>
> Why? You're replacing two instructions with zero. That should help everyone.

My bad, I somehow read this as something entirely different. Sorry for
the noise, looks good to me.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/19] gallium/radeon: more descriptive names for LLVM temporaries in debug builds

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:40PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
This is a great idea.

Reviewed-by: Tom Stellard 

> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 7b96a58..22ff18e 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -31,20 +31,21 @@
>  #include "gallivm/lp_bld_init.h"
>  #include "gallivm/lp_bld_intr.h"
>  #include "gallivm/lp_bld_misc.h"
>  #include "gallivm/lp_bld_swizzle.h"
>  #include "tgsi/tgsi_info.h"
>  #include "tgsi/tgsi_parse.h"
>  #include "util/u_math.h"
>  #include "util/u_memory.h"
>  #include "util/u_debug.h"
>  
> +#include 
>  #include 
>  #include 
>  
>  LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
> enum tgsi_opcode_type type)
>  {
>   LLVMContextRef ctx = bld_base->base.gallivm->context;
>  
>   switch (type) {
>   case TGSI_TYPE_UNSIGNED:
> @@ -421,20 +422,21 @@ static void emit_declaration(struct 
> lp_build_tgsi_context *bld_base,
>ctx->soa.addr[idx][chan] = 
> si_build_alloca_undef(
>   >gallivm,
>   ctx->soa.bld_base.uint_bld.elem_type, 
> "");
>   }
>   }
>   break;
>   }
>  
>   case TGSI_FILE_TEMPORARY:
>   {
> + char name[16] = "";
>   LLVMValueRef array_alloca = NULL;
>   unsigned decl_size;
>   first = decl->Range.First;
>   last = decl->Range.Last;
>   decl_size = 4 * ((last - first) + 1);
>   if (decl->Declaration.Array) {
>   unsigned id = decl->Array.ArrayID - 1;
>   if (!ctx->arrays) {
>   int size = 
> bld_base->info->array_max[TGSI_FILE_TEMPORARY];
>   ctx->arrays = CALLOC(size, 
> sizeof(ctx->arrays[0]));
> @@ -458,34 +460,42 @@ static void emit_declaration(struct 
> lp_build_tgsi_context *bld_base,
>   ctx->arrays[id].alloca = array_alloca;
>   }
>   }
>  
>   if (!ctx->temps_count) {
>   ctx->temps_count = 
> bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
>   ctx->temps = MALLOC(TGSI_NUM_CHANNELS * 
> ctx->temps_count * sizeof(LLVMValueRef));
>   }
>   if (!array_alloca) {
>   for (i = 0; i < decl_size; ++i) {
> +#ifdef DEBUG
> + snprintf(name, sizeof(name), "TEMP%d.%c",
> +  first + i / 4, "xyzw"[i % 4]);
> +#endif
>   ctx->temps[first * TGSI_NUM_CHANNELS + i] =
>   
> si_build_alloca_undef(bld_base->base.gallivm,
> 
> bld_base->base.vec_type,
> -   "temp");
> +   name);
>   }
>   } else {
>   LLVMValueRef idxs[2] = {
>   bld_base->uint_bld.zero,
>   NULL
>   };
>   for (i = 0; i < decl_size; ++i) {
> +#ifdef DEBUG
> + snprintf(name, sizeof(name), "TEMP%d.%c",
> +  first + i / 4, "xyzw"[i % 4]);
> +#endif
>   idxs[1] = 
> lp_build_const_int32(bld_base->base.gallivm, i);
>   ctx->temps[first * TGSI_NUM_CHANNELS + i] =
> - LLVMBuildGEP(builder, array_alloca, 
> idxs, 2, "temp");
> + LLVMBuildGEP(builder, array_alloca, 
> idxs, 2, name);
>   }
>   }
>   break;
>   }
>   case TGSI_FILE_INPUT:
>   {
>   unsigned idx;
>   for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
>   if (ctx->load_input)
>   ctx->load_input(ctx, idx, decl);
> -- 
> 2.7.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/19] gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:38PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard 
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 41f24d3..e084248 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -352,25 +352,20 @@ LLVMValueRef radeon_llvm_emit_fetch(struct 
> lp_build_tgsi_context *bld_base,
>   case TGSI_FILE_TEMPORARY:
>   if (reg->Register.Index >= ctx->temps_count)
>   return LLVMGetUndef(tgsi2llvmtype(bld_base, type));
>   ptr = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + 
> swizzle];
>   if (tgsi_type_is_64bit(type)) {
>   ptr2 = ctx->temps[reg->Register.Index * 
> TGSI_NUM_CHANNELS + swizzle + 1];
>   return radeon_llvm_emit_fetch_64bit(bld_base, type,
>LLVMBuildLoad(builder, ptr, 
> ""),
>LLVMBuildLoad(builder, ptr2, 
> ""));
>   }
> - LLVMValueRef array = get_alloca_for_array(bld_base, 
> reg->Register.File, reg->Register.Index);
> - if (array) {
> - return bitcast(bld_base, type, 
> load_value_from_array(bld_base, reg->Register.File, type,
> - swizzle, reg->Register.Index, NULL));
> - }
>   result = LLVMBuildLoad(builder, ptr, "");
>   break;
>  
>   case TGSI_FILE_OUTPUT:
>   ptr = lp_get_output_ptr(bld, reg->Register.Index, swizzle);
>   if (tgsi_type_is_64bit(type)) {
>   ptr2 = lp_get_output_ptr(bld, reg->Register.Index, 
> swizzle + 1);
>   return radeon_llvm_emit_fetch_64bit(bld_base, type,
>LLVMBuildLoad(builder, ptr, 
> ""),
>LLVMBuildLoad(builder, ptr2, 
> ""));
> -- 
> 2.7.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/19] gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:39PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard 
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index e084248..7b96a58 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -624,30 +624,23 @@ void radeon_llvm_emit_store(struct 
> lp_build_tgsi_context *bld_base,
>   } else {
>   switch(reg->Register.File) {
>   case TGSI_FILE_OUTPUT:
>   temp_ptr = 
> bld->outputs[reg->Register.Index][chan_index];
>   if (tgsi_type_is_64bit(dtype))
>   temp_ptr2 = 
> bld->outputs[reg->Register.Index][chan_index + 1];
>   break;
>  
>   case TGSI_FILE_TEMPORARY:
>   {
> - LLVMValueRef array;
>   if (reg->Register.Index >= ctx->temps_count)
>   continue;
> - array = get_alloca_for_array(bld_base, 
> reg->Register.File, reg->Register.Index);
>  
> - if (array) {
> - store_value_to_array(bld_base, value, 
> reg->Register.File, chan_index, reg->Register.Index,
> - NULL);
> - continue;
> - }
>   temp_ptr = ctx->temps[ TGSI_NUM_CHANNELS * 
> reg->Register.Index + chan_index];
>   if (tgsi_type_is_64bit(dtype))
>   temp_ptr2 = ctx->temps[ 
> TGSI_NUM_CHANNELS * reg->Register.Index + chan_index + 1];
>  
>   break;
>   }
>   default:
>   return;
>   }
>   if (!tgsi_type_is_64bit(dtype))
> -- 
> 2.7.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/19] gallium/radeon: clean up emit_declaration for temporaries

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:37PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> In the alloca'd array case, no longer create redundant and unused allocas
> for the individual elements; create getelementptrs instead.

Reviewed-by: Tom Stellard 
> ---
>  .../drivers/radeon/radeon_setup_tgsi_llvm.c| 27 
> ++
>  1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index d75311e..41f24d3 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -408,81 +408,90 @@ static LLVMValueRef si_build_alloca_undef(struct 
> gallivm_state *gallivm,
>   LLVMValueRef ptr = lp_build_alloca(gallivm, type, name);
>   LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr);
>   return ptr;
>  }
>  
>  static void emit_declaration(struct lp_build_tgsi_context *bld_base,
>const struct tgsi_full_declaration *decl)
>  {
>   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
>   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
> - unsigned first, last, i, idx;
> + unsigned first, last, i;
>   switch(decl->Declaration.File) {
>   case TGSI_FILE_ADDRESS:
>   {
>unsigned idx;
>   for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
>   unsigned chan;
>   for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
>ctx->soa.addr[idx][chan] = 
> si_build_alloca_undef(
>   >gallivm,
>   ctx->soa.bld_base.uint_bld.elem_type, 
> "");
>   }
>   }
>   break;
>   }
>  
>   case TGSI_FILE_TEMPORARY:
>   {
> + LLVMValueRef array_alloca = NULL;
>   unsigned decl_size;
>   first = decl->Range.First;
>   last = decl->Range.Last;
>   decl_size = 4 * ((last - first) + 1);
>   if (decl->Declaration.Array) {
>   unsigned id = decl->Array.ArrayID - 1;
>   if (!ctx->arrays) {
>   int size = 
> bld_base->info->array_max[TGSI_FILE_TEMPORARY];
>   ctx->arrays = CALLOC(size, 
> sizeof(ctx->arrays[0]));
> - for (i = 0; i < size; ++i) {
> - assert(!ctx->arrays[i].alloca);}
>   }
>  
>   ctx->arrays[id].range = decl->Range;
>  
>   /* If the array is more than 16 elements (each element
>* is 32-bits), then store it in a vector.  Storing the
>* array in a vector will causes the compiler to store
>* the array in registers and access it using indirect
>* addressing.  16 is number of vector elements that
>* LLVM will store in a register.
>* FIXME: We shouldn't need to do this.  LLVM should be
>* smart enough to promote allocas int registers when
>* profitable.
>*/
>   if (decl_size > 16) {
> - ctx->arrays[id].alloca = 
> LLVMBuildAlloca(builder,
> + array_alloca = LLVMBuildAlloca(builder,
>   LLVMArrayType(bld_base->base.vec_type, 
> decl_size),"array");
> + ctx->arrays[id].alloca = array_alloca;
>   }
>   }
> - first = decl->Range.First;
> - last = decl->Range.Last;
> +
>   if (!ctx->temps_count) {
>   ctx->temps_count = 
> bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
>   ctx->temps = MALLOC(TGSI_NUM_CHANNELS * 
> ctx->temps_count * sizeof(LLVMValueRef));
>   }
> - for (idx = first; idx <= last; idx++) {
> - for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
> - ctx->temps[idx * TGSI_NUM_CHANNELS + i] =
> + if (!array_alloca) {
> + for (i = 0; i < decl_size; ++i) {
> + ctx->temps[first * TGSI_NUM_CHANNELS + i] =
>   
> si_build_alloca_undef(bld_base->base.gallivm,
> 
> bld_base->base.vec_type,
> "temp");
>   }
> + } else {
> + LLVMValueRef idxs[2] = {
> + 

Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Jason Ekstrand
On Aug 10, 2016 1:02 AM, "Erik Faye-Lund"  wrote:
>
> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke 
wrote:
> > On Haswell (GL 3.3):
> >
> > total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
> > instructions in affected programs: 856541 -> 851642 (-0.57%)
> > helped: 3157
> > HURT: 113
> > LOST:   7
> > GAINED: 15
> >
> > On Broadwell (GL 4.4):
> >
> > total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
> > instructions in affected programs: 1055693 -> 1049855 (-0.55%)
> > helped: 3900
> > HURT: 176
> > LOST:   1
> > GAINED: 18
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/compiler/nir/nir_opt_algebraic.py | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py
b/src/compiler/nir/nir_opt_algebraic.py
> > index 1cf614c..4e9896f 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -251,6 +251,10 @@ optimizations = [
> > (('ieq', 'a@bool', False), ('inot', 'a')),
> > (('bcsel', a, True, False), ('ine', a, 0)),
> > (('bcsel', a, False, True), ('ieq', a, 0)),
> > +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
> > +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
> > +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
> > +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
> > (('bcsel', True, b, c), b),
> > (('bcsel', False, b, c), c),
> > # The result of this should be hit by constant propagation and, in
the
>
> Same as the previous patch, this smells like intel-isms. Hardware that
> has native bcsel with support for two inline immediates will do better
> without.

Why? If you're back-end handles b2f *worse* then bcsel of two components,
then it's broken.   Also, we have a lot of optimization in NIR to help cut
through "fake booleans" where shaders use 0.0 and 1.0 and math operations
instead of actual booleans.  Recognising hand-coded b2f suddenly enables
more of these optimization paths to run on the shader.  That 0.57% isn't
all just immediates being removed.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.

2016-08-10 Thread Jason Ekstrand
On Aug 10, 2016 1:00 AM, "Erik Faye-Lund"  wrote:
>
> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke 
wrote:
> > On Haswell (GL 3.3):
> >
> > total instructions in shared programs: 6211678 -> 6211584 (-0.00%)
> > instructions in affected programs: 18968 -> 18874 (-0.50%)
> > helped: 62
> > HURT: 0
> > LOST:   0
> > GAINED: 4
> >
> > On Broadwell (GL 4.4):
> >
> > total instructions in shared programs: 11638930 -> 11638181 (-0.01%)
> > instructions in affected programs: 84768 -> 84019 (-0.88%)
> > helped: 505
> > HURT: 45
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/compiler/nir/nir_opt_algebraic.py | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py
b/src/compiler/nir/nir_opt_algebraic.py
> > index 0f0896b..1cf614c 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -180,6 +180,7 @@ optimizations = [
> > (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))),
> > (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a,
b))),
> > (('iand', 'a@bool', 1.0), ('b2f', a)),
> > +   (('ineg', ('b2i', 'a@bool')), a),
> > (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF.
> > (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI
KILL_IF.
> > # Comparison with the same args.  Note that these are not done for
>
> Are you sure this shouldn't be conditional? This being an improvement
> sounds like an intel-ism, and not something that applies to most
> hardware...

Why? You're replacing two instructions with zero. That should help everyone.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/19] gallium, radeonsi: temporary array improvements

2016-08-10 Thread Nicolai Hähnle
FYI, I think I'm going to drop most of the gallium part of this series 
and replace it by a pass over the resulting TGSI to look at which 
components of the arrays are actually being used.


Nicolai

On 09.08.2016 12:36, Nicolai Hähnle wrote:

Hi,

this series was originally motivated by fixing a VM fault and ended up
growing a bit larger :-)

The goal of patches 1-7 is to change st/mesa so that it sets the UsageMask
field in temporary array declarations. This ends up being helpful for
lowering float and vecN arrays with N <= 3.

The remaining patches are radeon (really radeonsi) specific. They begin
with a bunch of cleanups, and then do two things: first, when alloca is
used for arrays, make use of the UsageMask to allocate smaller arrays
when possible. Second, add explicit bounds checks when accessing those
arrays to prevent VM faults -- those temporary array accesses are not
protected by limits in buffer descriptors.

Note that the radeon part of the series exposes some pre-existing LLVM
bugs in Piglit, at least one of which has already been encountered
elsewhere, see http://reviews.llvm.org/D22556 and
http://reviews.llvm.org/D23303.

Please review!

Thanks,
Nicolai


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 96550] 12.0.0-rc3: mesa_dri_drivers.so linking fails with: relocation R_X86_64_32S against `V4F_COUNT' can not be used when making a shared object

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=96550

--- Comment #1 from war...@o2.pl ---
FYI:

ppl from Phoronix forums located the cause of the problem: gen_matypes.c is
passed to the compiler in order to generate assembly code in text form
(compiler switch: -S), but with -flto GCC doesn't output any code nor data
because that is postponed to link time. In such case gcc -S -flto generates a
file with empty .text and .data sections.

This isn't a GCC bug. Passing -ffat-lto-objects solves issue for me so maybe if
compile passes -S together with -flto then the config should imply
-ffat-lto-objects ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97231] GL_DEPTH_CLAMP doesn't clamp to the far plane

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97231

Jules Blok  changed:

   What|Removed |Added

 Attachment #125650|0   |1
is obsolete||

--- Comment #15 from Jules Blok  ---
Created attachment 125671
  --> https://bugs.freedesktop.org/attachment.cgi?id=125671=edit
apitrace file llvmpipe

Finally got an apitrace running on llvmpipe by using a VM. Sorry for the delay.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] RadeonSI: GPU hang fix + 3 small changes

2016-08-10 Thread Nicolai Hähnle

On 10.08.2016 01:34, Marek Olšák wrote:

Hi,

The first patch is pretty important and fixes the root cause of Overlord and 
Witcher 2 hangs. The GLSL revert can be reverted after that.


I wonder what else this might fix. Anyway, nice catch! The series is

Reviewed-by: Nicolai Hähnle 



Please review.

Marek

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/8] st/mesa: determine states used or affected by shaders at compile time

2016-08-10 Thread Nicolai Hähnle

On 10.08.2016 13:05, Marek Olšák wrote:

On Tue, Aug 9, 2016 at 12:56 PM, Nicolai Hähnle  wrote:

On 07.08.2016 03:12, Marek Olšák wrote:


From: Marek Olšák 

At compile time, each shader determines which ST_NEW flags should be set
at shader bind time.

This just sets the new field for all shaders. The next commit will use it.
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 175
-
 src/mesa/state_tracker/st_program.c|  37 +-
 src/mesa/state_tracker/st_program.h|   6 +
 3 files changed, 215 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 362559f..fd14766 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -,31 +,202 @@ get_mesa_program_tgsi(struct gl_context *ctx,

 static struct gl_program *
 get_mesa_program(struct gl_context *ctx,
  struct gl_shader_program *shader_program,
  struct gl_linked_shader *shader)
 {
struct pipe_screen *pscreen = ctx->st->pipe->screen;
unsigned ptarget = st_shader_stage_to_ptarget(shader->Stage);
enum pipe_shader_ir preferred_ir = (enum pipe_shader_ir)
   pscreen->get_shader_param(pscreen, ptarget,
PIPE_SHADER_CAP_PREFERRED_IR);
+   struct gl_program *prog = NULL;
+
if (preferred_ir == PIPE_SHADER_IR_NIR) {
   /* TODO only for GLSL VS/FS for now: */
   switch (shader->Stage) {
   case MESA_SHADER_VERTEX:
   case MESA_SHADER_FRAGMENT:
- return st_nir_get_mesa_program(ctx, shader_program, shader);
+ prog = st_nir_get_mesa_program(ctx, shader_program, shader);
   default:
  break;
   }
+   } else {
+  prog = get_mesa_program_tgsi(ctx, shader_program, shader);
+   }
+
+   if (prog) {
+  uint64_t *states;
+
+  /* This determines which states will be updated when the shader is
+   * bound.
+   */
+  switch (shader->Stage) {
+  case MESA_SHADER_VERTEX:
+ states = &((struct st_vertex_program*)prog)->affected_states;
+
+ *states = ST_NEW_VS_STATE |
+   ST_NEW_RASTERIZER |
+   ST_NEW_VERTEX_ARRAYS;
+
+ if (prog->Parameters->NumParameters)
+*states |= ST_NEW_VS_CONSTANTS;
+
+ if (shader->num_samplers)
+*states |= ST_NEW_VS_SAMPLER_VIEWS |
+   ST_NEW_RENDER_SAMPLERS;
+
+ if (shader->NumImages)
+*states |= ST_NEW_VS_IMAGES;
+
+ if (shader->NumUniformBlocks)
+*states |= ST_NEW_VS_UBOS;
+
+ if (shader->NumShaderStorageBlocks)
+*states |= ST_NEW_VS_SSBOS;
+
+ if (shader->NumAtomicBuffers)
+*states |= ST_NEW_VS_ATOMICS;



I'm not overly fond of the code duplication here. Perhaps these could all be
expressed relative to a stage-specific base flag?


Not really. It shouldn't rely on the order the flags are declared. I
can't make it better than this:


I don't know why we can't just say we should be able to rely on the 
order, but the alternative below is indeed not really any better.


Cheers,
Nicolai


diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index fd14766..3e5b322 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -6664,6 +6664,37 @@ get_mesa_program_tgsi(struct gl_context *ctx,
return prog;
 }

+static void
+set_affected_state_flags(uint64_t *states,
+ struct gl_program *prog,
+ struct gl_linked_shader *shader,
+ uint64_t new_constants,
+ uint64_t new_sampler_views,
+ uint64_t new_samplers,
+ uint64_t new_images,
+ uint64_t new_ubos,
+ uint64_t new_ssbos,
+ uint64_t new_atomics)
+{
+   if (prog->Parameters->NumParameters)
+  *states |= new_constants;
+
+   if (shader->num_samplers)
+  *states |= new_sampler_views | new_samplers;
+
+   if (shader->NumImages)
+  *states |= new_images;
+
+   if (shader->NumUniformBlocks)
+  *states |= new_ubos;
+
+   if (shader->NumShaderStorageBlocks)
+  *states |= new_ssbos;
+
+   if (shader->NumAtomicBuffers)
+  *states |= new_atomics;
+}
+
 static struct gl_program *
 get_mesa_program(struct gl_context *ctx,
  struct gl_shader_program *shader_program,
@@ -6702,24 +6733,14 @@ get_mesa_program(struct gl_context *ctx,
ST_NEW_RASTERIZER |
ST_NEW_VERTEX_ARRAYS;

- if (prog->Parameters->NumParameters)
-*states |= ST_NEW_VS_CONSTANTS;
-
- if (shader->num_samplers)
-*states |= ST_NEW_VS_SAMPLER_VIEWS |
-   ST_NEW_RENDER_SAMPLERS;
-
- if 

Re: [Mesa-dev] [PATCH] [rfc] glsl: allow invariant on fragment shader outputs.

2016-08-10 Thread Andres Gomez
Hi,

On Mon, 2016-05-23 at 14:18 +1000, Dave Airlie wrote:
> From: Dave Airlie 
> 
> A CTS test manages to generate this:
> GL45-CTS.shading_language_420pack.qualifier_order
> 
> I cannot find definitive evidence in the spec that it isn't
> allowed. The specs mentions some things can't be used on
> fragment shader outputs, but never specifies invariant as
> one of them. Chris found GLSL1.20 was more explicit about this
> but the language was changed/removed in GLSL 1.30.
> 
> This might be a spec bug, I'm not sure.

Specs are ambiguous, so I think it can be considered a bug in the
specs, since they are not that explicit to ban the qualifier in the FS
outputs.

However, my understanding is that using the qualifier is not valid and,
hence, this patch should be dropped.

From page 27 (page 33 of the PDF) of the GLSL 1.20 spec:

  " Only variables output from a vertex shader can be candidates for
    invariance."

But this later changes to:

From page 37 (page 43 of the PDF) of the GLSL 1.30 spec:

  " Only variables output from a shader can be candidates for
    invariance."

However, there is little sense on allowing this qualifier for the FS
outputs and, actually, we can also read, in this direction:

From page 37 (page 43 of the PDF) of the GLSL 1.30 spec:

  " Initially, by default, all output variables are allowed to be
    variant. To force all output variables to be invariant, use the
    pragma

      #pragma STDGL invariant(all)

    before all declarations in a shader. If this pragma is used after
    the declaration of any variables or functions, then the set of
    outputs that behave as invariant is undefined. It is an error to
    use this pragma in a fragment shader."

Then, I think it is safe to assume that the individual use of
"invariant" is also forbidden in FS outputs.

However, would the conclusion be that FS outputs are actually allowed
to be qualified as invariant, this patch LGTM, then.

-- 

Br,

Andres
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Moving amdgpu/addrlib into a git submodule

2016-08-10 Thread Nicolai Hähnle

On 10.08.2016 06:22, Edward O'Callaghan wrote:

Well taking a step back here for a second. My initial thoughts are -
what, conceptually, is the primary user of addrlib from an extremely
high level view?

If the case is mesa3d then perhaps mesa should be treated as the
upstream from which other users pull.


Mesa really doesn't make logical sense as a dependency for the HSA stack.



If the case is that it is indeed
truly agnostic/general then perhaps deferring to the packaging
expertise/infrastructure of the distro's (aka pkgconfig, etc..) is
perhaps the best mechanism.  In the latter case, addrlib would be
released with versioning which should be fairly maintainable as it is a
slowly moving target right?


Please no. Those things are a fragile mess. I appreciate it the thought, 
but there's a reason why major version upgrades of shared libraries are 
feared.


Nicolai





Just my two cent, hope its helpful!
Kind Regards,
Edward.

On 08/10/2016 01:39 AM, Christian König wrote:

Am 09.08.2016 um 15:47 schrieb Nicolai Hähnle:

Hi everybody,

addrlib is the addressing and alignment calculator which is used by
radeonsi. It's developed (and also used) internally at AMD, and so far
we've had one open source copy living in the Mesa repository at
src/gallium/winsys/amdgpu/drm/addrlib.

The question of using addrlib in non-Mesa parts of our open-source
stack has come up, in particular in relation to compute. We'd
obviously like to share the code rather than having multiple copies
flying around. Since the interface of addrlib is slow-moving but
unstable, shared linking is not an option.

I think the best way forward is to create a dedicated repository for
addrlib which is then integrated into Mesa as a git submodule.

The point of this email is to gather feedback from the Mesa community
on this plan, which is explicitly:

(0) Create an addrlib repository, say amd/addrlib on fd.o.
(1) Add it as a git submodule to the Mesa repository.
(2) Fix up whatever aspects of the build system that may be affected
(perhaps for building source tarballs).
(3) Go back to mostly ignoring addrlib, except for trying to get
better at syncing with the internal closed-source version.

From initial experiments, the impact on users interested in radeon is
that they will have to run `git submodule init` and then occasionally
`git submodule update`. Users who do not build radeonsi should be able
to ignore the change completely.

There are alternatives. For example, ROCm uses Google's repo tool
already. But for Mesa, git submodule looks like a lightweight, well
supported and overall conservative option that everybody should
already have installed. If there are good arguments for something
else, let's hear them!

Another point: if we proceed with this plan, I think we should
consider moving addrlib into src/amd/addrlib. There are two reasons:
First, transitioning to a submodule *without* changing the directory
is probably more fragile, i.e. what happens when you switch between
checkouts before and after the transition. Second, if/when radv ends
up being merged into Mesa master, it makes sense to have addrlib there
anyway.

Thoughts? Complaints? Praise?


Well using git submodule is a possibility and we had rather good
experience with that in GStreamer.

But it would remove one major argument to beating the addrlib guys
towards a stable interface and/or proper library version handling.

Moving it into libdrm is clearly not an option because then you would
need to use versioning for the whole libdrm_amdgpu library which we
don't want.

Christian.


Nicolai
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Moving amdgpu/addrlib into a git submodule

2016-08-10 Thread Nicolai Hähnle

On 09.08.2016 22:12, Dave Airlie wrote:


tbh, git submodules are more annoying than they need to be, and I'm
not really terribly excited to use that for something that is a build
dependency.  Maybe just move it into libdrm instead?



I've only had to use git submodules once with spice project, and it
was a nightmare. It makes packaging etc a real pita.


It would be nice to have something more concrete to go on, like what are 
the problems _really_ :)




Alternatives are something like a fetch external sources script,
that does git submodules but does it better, you'll see Vulkan-CTS
etc use something like that, it would have to be integrated with
the build system a bit better though.


What pain do submodules give you that wouldn't be even _worse_ with a 
separate script for fetching external sources?



Look, there are a bunch of different use cases here. Distro packagers; 
Mesa contributors and bleeding-edge users; and us (which is slightly 
separate because we're not _just_ Mesa contributors but have parts of 
the stack elsewhere, which is why this is coming up in the first place).


Most of the complaints seem to come from distro packaging, though I have 
yet to understand what the exact problems are.


For Mesa contributors and users, I'd really like to avoid the hassle of 
having to get another repository manually.


For us, in addition to avoiding the same hassles there's the question of 
maintainability, and perhaps (with a huge load of optimism) getting the 
closed source folks a bit more involved in the fact that there's another 
world out here as well.


On those axes, I've seen one alternative suggestion that makes sense to 
me, and that is subtree-merges. I have to play with them a bit, but the 
initial impression is good. They seem like a more git-like solution. 
That would also be something new for Mesa, though, since it would mean 
more exceptions to the otherwise linear history.


Nicolai
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0

2016-08-10 Thread Marek Olšák
Pushed, thanks.

Marek

On Mon, Aug 8, 2016 at 12:48 AM, Miklós Máté  wrote:
> The temporaries are allocated dynamically.
>
> Signed-off-by: Miklós Máté 
> ---
>  src/mesa/state_tracker/st_atifs_to_tgsi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c 
> b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> index 66f442a..ae08796 100644
> --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
> +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> @@ -691,6 +691,7 @@ transform_inst:
>struct tgsi_full_instruction inst;
>unsigned i;
>int fogc_index = -1;
> +  int reg0_index = current_inst->Src[0].Register.Index;
>
>/* find FOGC input */
>for (i = 0; i < ctx->info.num_inputs; i++) {
> @@ -804,11 +805,11 @@ transform_inst:
>inst.Instruction.Opcode = TGSI_OPCODE_LRP;
>inst.Instruction.NumDstRegs = 1;
>inst.Dst[0].Register.File  = TGSI_FILE_TEMPORARY;
> -  inst.Dst[0].Register.Index = 0;
> +  inst.Dst[0].Register.Index = reg0_index;
>inst.Dst[0].Register.WriteMask = TGSI_WRITEMASK_XYZW;
>inst.Instruction.NumSrcRegs = 3;
>SET_SRC(, 0, TGSI_FILE_TEMPORARY, ctx->fog_factor_temp, X, X, X, 
> Y);
> -  SET_SRC(, 1, TGSI_FILE_TEMPORARY, 0, X, Y, Z, W);
> +  SET_SRC(, 1, TGSI_FILE_TEMPORARY, reg0_index, X, Y, Z, W);
>SET_SRC(, 2, TGSI_FILE_CONSTANT, MAX_NUM_FRAGMENT_CONSTANTS_ATI + 
> 1, X, Y, Z, W);
>tctx->emit_instruction(tctx, );
> }
> --
> 2.8.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] gallium/radeon: use lp_build_alloca_undef

2016-08-10 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Tue, Aug 9, 2016 at 12:38 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> Avoid building all those store 0 / store undef instrucction pairs that
> end up getting removed anyway.
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 15 +++
>  1 file changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 6a010d5..b419add 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -483,43 +483,34 @@ static LLVMValueRef fetch_system_value(struct 
> lp_build_tgsi_context *bld_base,
> struct gallivm_state *gallivm = bld_base->base.gallivm;
>
> LLVMValueRef cval = ctx->system_values[reg->Register.Index];
> if (LLVMGetTypeKind(LLVMTypeOf(cval)) == LLVMVectorTypeKind) {
> cval = LLVMBuildExtractElement(gallivm->builder, cval,
>lp_build_const_int32(gallivm, 
> swizzle), "");
> }
> return bitcast(bld_base, type, cval);
>  }
>
> -static LLVMValueRef si_build_alloca_undef(struct gallivm_state *gallivm,
> - LLVMTypeRef type,
> - const char *name)
> -{
> -   LLVMValueRef ptr = lp_build_alloca(gallivm, type, name);
> -   LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr);
> -   return ptr;
> -}
> -
>  static void emit_declaration(struct lp_build_tgsi_context *bld_base,
>  const struct tgsi_full_declaration *decl)
>  {
> struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
> LLVMBuilderRef builder = bld_base->base.gallivm->builder;
> unsigned first, last, i;
> switch(decl->Declaration.File) {
> case TGSI_FILE_ADDRESS:
> {
>  unsigned idx;
> for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) 
> {
> unsigned chan;
> for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
> -ctx->soa.addr[idx][chan] = 
> si_build_alloca_undef(
> +ctx->soa.addr[idx][chan] = 
> lp_build_alloca_undef(
> >gallivm,
> ctx->soa.bld_base.uint_bld.elem_type, 
> "");
> }
> }
> break;
> }
>
> case TGSI_FILE_TEMPORARY:
> {
> char name[16] = "";
> @@ -574,21 +565,21 @@ static void emit_declaration(struct 
> lp_build_tgsi_context *bld_base,
> ctx->temps_count = 
> bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
> ctx->temps = MALLOC(TGSI_NUM_CHANNELS * 
> ctx->temps_count * sizeof(LLVMValueRef));
> }
> if (!array_alloca) {
> for (i = 0; i < decl_size; ++i) {
>  #ifdef DEBUG
> snprintf(name, sizeof(name), "TEMP%d.%c",
>  first + i / 4, "xyzw"[i % 4]);
>  #endif
> ctx->temps[first * TGSI_NUM_CHANNELS + i] =
> -   
> si_build_alloca_undef(bld_base->base.gallivm,
> +   
> lp_build_alloca_undef(bld_base->base.gallivm,
>   
> bld_base->base.vec_type,
>   name);
> }
> } else {
> LLVMValueRef idxs[2] = {
> bld_base->uint_bld.zero,
> NULL
> };
> LLVMValueRef undef = NULL;
> unsigned j = 0;
> @@ -633,21 +624,21 @@ static void emit_declaration(struct 
> lp_build_tgsi_context *bld_base,
> }
> break;
>
> case TGSI_FILE_OUTPUT:
> {
> unsigned idx;
> for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) 
> {
> unsigned chan;
> assert(idx < RADEON_LLVM_MAX_OUTPUTS);
> for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
> -   ctx->soa.outputs[idx][chan] = 
> si_build_alloca_undef(
> +   ctx->soa.outputs[idx][chan] = 
> lp_build_alloca_undef(
> >gallivm,
> ctx->soa.bld_base.base.elem_type, "");
> }
> }
> break;
>  

Re: [Mesa-dev] [PATCH] st/nine: Fix invalid attempt to use indirect draws.

2016-08-10 Thread Marek Olšák
Pushed.

Marek

On Wed, Aug 10, 2016 at 1:16 PM, trevor.davenp...@gmail.com
 wrote:
> I don't have commit access so someone else will need to push this out.
>
> Trevor
>
>
> On Aug 10, 2016 2:56 AM, "Marek Olšák"  wrote:
>>
>> Reviewed-by: Marek Olšák 
>>
>> Marek
>>
>> On Wed, Aug 10, 2016 at 5:28 AM, Trevor Davenport
>>  wrote:
>> > Since commit 6d7177f01b231e9fe79a558c28d2b562a218d7ea, radeonsi
>> > would take a different path if info->indirect_params was not
>> > initialized properly.  Nine was not initializating this field.
>> > ---
>> >  src/gallium/state_trackers/nine/device9.c | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git src/gallium/state_trackers/nine/device9.c
>> > src/gallium/state_trackers/nine/device9.c
>> > index d233304..3f6577c 100644
>> > --- src/gallium/state_trackers/nine/device9.c
>> > +++ src/gallium/state_trackers/nine/device9.c
>> > @@ -2935,6 +2935,7 @@ init_draw_info(struct pipe_draw_info *info,
>> >  info->restart_index = 0;
>> >  info->count_from_stream_output = NULL;
>> >  info->indirect = NULL;
>> > +info->indirect_params = NULL;
>> >  }
>> >
>> >  HRESULT NINE_WINAPI
>> > --
>> > 2.7.4
>> >
>> > ___
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] RadeonSI: GPU hang fix + 3 small changes

2016-08-10 Thread Edward O'Callaghan
This series is,

Reviewed-by: Edward O'Callaghan 

On 08/10/2016 09:34 AM, Marek Olšák wrote:
> Hi,
> 
> The first patch is pretty important and fixes the root cause of Overlord and 
> Witcher 2 hangs. The GLSL revert can be reverted after that.
> 
> Please review.
> 
> Marek
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/nine: Fix invalid attempt to use indirect draws.

2016-08-10 Thread trevor.davenp...@gmail.com
I don't have commit access so someone else will need to push this out.

Trevor

On Aug 10, 2016 2:56 AM, "Marek Olšák"  wrote:

> Reviewed-by: Marek Olšák 
>
> Marek
>
> On Wed, Aug 10, 2016 at 5:28 AM, Trevor Davenport
>  wrote:
> > Since commit 6d7177f01b231e9fe79a558c28d2b562a218d7ea, radeonsi
> > would take a different path if info->indirect_params was not
> > initialized properly.  Nine was not initializating this field.
> > ---
> >  src/gallium/state_trackers/nine/device9.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git src/gallium/state_trackers/nine/device9.c
> src/gallium/state_trackers/nine/device9.c
> > index d233304..3f6577c 100644
> > --- src/gallium/state_trackers/nine/device9.c
> > +++ src/gallium/state_trackers/nine/device9.c
> > @@ -2935,6 +2935,7 @@ init_draw_info(struct pipe_draw_info *info,
> >  info->restart_index = 0;
> >  info->count_from_stream_output = NULL;
> >  info->indirect = NULL;
> > +info->indirect_params = NULL;
> >  }
> >
> >  HRESULT NINE_WINAPI
> > --
> > 2.7.4
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/8] st/mesa: determine states used or affected by shaders at compile time

2016-08-10 Thread Marek Olšák
On Tue, Aug 9, 2016 at 12:56 PM, Nicolai Hähnle  wrote:
> On 07.08.2016 03:12, Marek Olšák wrote:
>>
>> From: Marek Olšák 
>>
>> At compile time, each shader determines which ST_NEW flags should be set
>> at shader bind time.
>>
>> This just sets the new field for all shaders. The next commit will use it.
>> ---
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 175
>> -
>>  src/mesa/state_tracker/st_program.c|  37 +-
>>  src/mesa/state_tracker/st_program.h|   6 +
>>  3 files changed, 215 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index 362559f..fd14766 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -,31 +,202 @@ get_mesa_program_tgsi(struct gl_context *ctx,
>>
>>  static struct gl_program *
>>  get_mesa_program(struct gl_context *ctx,
>>   struct gl_shader_program *shader_program,
>>   struct gl_linked_shader *shader)
>>  {
>> struct pipe_screen *pscreen = ctx->st->pipe->screen;
>> unsigned ptarget = st_shader_stage_to_ptarget(shader->Stage);
>> enum pipe_shader_ir preferred_ir = (enum pipe_shader_ir)
>>pscreen->get_shader_param(pscreen, ptarget,
>> PIPE_SHADER_CAP_PREFERRED_IR);
>> +   struct gl_program *prog = NULL;
>> +
>> if (preferred_ir == PIPE_SHADER_IR_NIR) {
>>/* TODO only for GLSL VS/FS for now: */
>>switch (shader->Stage) {
>>case MESA_SHADER_VERTEX:
>>case MESA_SHADER_FRAGMENT:
>> - return st_nir_get_mesa_program(ctx, shader_program, shader);
>> + prog = st_nir_get_mesa_program(ctx, shader_program, shader);
>>default:
>>   break;
>>}
>> +   } else {
>> +  prog = get_mesa_program_tgsi(ctx, shader_program, shader);
>> +   }
>> +
>> +   if (prog) {
>> +  uint64_t *states;
>> +
>> +  /* This determines which states will be updated when the shader is
>> +   * bound.
>> +   */
>> +  switch (shader->Stage) {
>> +  case MESA_SHADER_VERTEX:
>> + states = &((struct st_vertex_program*)prog)->affected_states;
>> +
>> + *states = ST_NEW_VS_STATE |
>> +   ST_NEW_RASTERIZER |
>> +   ST_NEW_VERTEX_ARRAYS;
>> +
>> + if (prog->Parameters->NumParameters)
>> +*states |= ST_NEW_VS_CONSTANTS;
>> +
>> + if (shader->num_samplers)
>> +*states |= ST_NEW_VS_SAMPLER_VIEWS |
>> +   ST_NEW_RENDER_SAMPLERS;
>> +
>> + if (shader->NumImages)
>> +*states |= ST_NEW_VS_IMAGES;
>> +
>> + if (shader->NumUniformBlocks)
>> +*states |= ST_NEW_VS_UBOS;
>> +
>> + if (shader->NumShaderStorageBlocks)
>> +*states |= ST_NEW_VS_SSBOS;
>> +
>> + if (shader->NumAtomicBuffers)
>> +*states |= ST_NEW_VS_ATOMICS;
>
>
> I'm not overly fond of the code duplication here. Perhaps these could all be
> expressed relative to a stage-specific base flag?

Not really. It shouldn't rely on the order the flags are declared. I
can't make it better than this:

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index fd14766..3e5b322 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -6664,6 +6664,37 @@ get_mesa_program_tgsi(struct gl_context *ctx,
return prog;
 }

+static void
+set_affected_state_flags(uint64_t *states,
+ struct gl_program *prog,
+ struct gl_linked_shader *shader,
+ uint64_t new_constants,
+ uint64_t new_sampler_views,
+ uint64_t new_samplers,
+ uint64_t new_images,
+ uint64_t new_ubos,
+ uint64_t new_ssbos,
+ uint64_t new_atomics)
+{
+   if (prog->Parameters->NumParameters)
+  *states |= new_constants;
+
+   if (shader->num_samplers)
+  *states |= new_sampler_views | new_samplers;
+
+   if (shader->NumImages)
+  *states |= new_images;
+
+   if (shader->NumUniformBlocks)
+  *states |= new_ubos;
+
+   if (shader->NumShaderStorageBlocks)
+  *states |= new_ssbos;
+
+   if (shader->NumAtomicBuffers)
+  *states |= new_atomics;
+}
+
 static struct gl_program *
 get_mesa_program(struct gl_context *ctx,
  struct gl_shader_program *shader_program,
@@ -6702,24 +6733,14 @@ get_mesa_program(struct gl_context *ctx,
ST_NEW_RASTERIZER |
ST_NEW_VERTEX_ARRAYS;

- if (prog->Parameters->NumParameters)
-*states |= ST_NEW_VS_CONSTANTS;
-
- if (shader->num_samplers)
-*states |= ST_NEW_VS_SAMPLER_VIEWS |
-   

Re: [Mesa-dev] [PATCH 0/8] More state optimizations for st/mesa

2016-08-10 Thread Marek Olšák
On Tue, Aug 9, 2016 at 12:58 PM, Nicolai Hähnle  wrote:
> I sent comments on patches 4 & 6. Apart from that, the series is
>
> Reviewed-by: Nicolai Hähnle 
>
> On 07.08.2016 03:12, Marek Olšák wrote:
>>
>> PS: In order to make reviewing easier, all my patches have 10 lines
>> of contexts instead of 3. That will be the default for all my work
>> from now on.
>
>
> I like that idea, I've made the same change to my git config.

BTW, it breaks the Patchwork patch-matching when doing git-push, but
let's consider that a patchwork bug.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/nine: Fix invalid attempt to use indirect draws.

2016-08-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Wed, Aug 10, 2016 at 5:28 AM, Trevor Davenport
 wrote:
> Since commit 6d7177f01b231e9fe79a558c28d2b562a218d7ea, radeonsi
> would take a different path if info->indirect_params was not
> initialized properly.  Nine was not initializating this field.
> ---
>  src/gallium/state_trackers/nine/device9.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git src/gallium/state_trackers/nine/device9.c 
> src/gallium/state_trackers/nine/device9.c
> index d233304..3f6577c 100644
> --- src/gallium/state_trackers/nine/device9.c
> +++ src/gallium/state_trackers/nine/device9.c
> @@ -2935,6 +2935,7 @@ init_draw_info(struct pipe_draw_info *info,
>  info->restart_index = 0;
>  info->count_from_stream_output = NULL;
>  info->indirect = NULL;
> +info->indirect_params = NULL;
>  }
>
>  HRESULT NINE_WINAPI
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.

2016-08-10 Thread Erik Faye-Lund
On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke  wrote:
> On Haswell (GL 3.3):
>
> total instructions in shared programs: 6208759 -> 6203860 (-0.08%)
> instructions in affected programs: 856541 -> 851642 (-0.57%)
> helped: 3157
> HURT: 113
> LOST:   7
> GAINED: 15
>
> On Broadwell (GL 4.4):
>
> total instructions in shared programs: 11637854 -> 11632016 (-0.05%)
> instructions in affected programs: 1055693 -> 1049855 (-0.55%)
> helped: 3900
> HURT: 176
> LOST:   1
> GAINED: 18
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 1cf614c..4e9896f 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -251,6 +251,10 @@ optimizations = [
> (('ieq', 'a@bool', False), ('inot', 'a')),
> (('bcsel', a, True, False), ('ine', a, 0)),
> (('bcsel', a, False, True), ('ieq', a, 0)),
> +   (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))),
> +   (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))),
> +   (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0,
> +   (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0,
> (('bcsel', True, b, c), b),
> (('bcsel', False, b, c), c),
> # The result of this should be hit by constant propagation and, in the

Same as the previous patch, this smells like intel-isms. Hardware that
has native bcsel with support for two inline immediates will do better
without.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.

2016-08-10 Thread Erik Faye-Lund
On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke  wrote:
> On Haswell (GL 3.3):
>
> total instructions in shared programs: 6211678 -> 6211584 (-0.00%)
> instructions in affected programs: 18968 -> 18874 (-0.50%)
> helped: 62
> HURT: 0
> LOST:   0
> GAINED: 4
>
> On Broadwell (GL 4.4):
>
> total instructions in shared programs: 11638930 -> 11638181 (-0.01%)
> instructions in affected programs: 84768 -> 84019 (-0.88%)
> helped: 505
> HURT: 45
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 0f0896b..1cf614c 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -180,6 +180,7 @@ optimizations = [
> (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))),
> (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a, b))),
> (('iand', 'a@bool', 1.0), ('b2f', a)),
> +   (('ineg', ('b2i', 'a@bool')), a),
> (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF.
> (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI KILL_IF.
> # Comparison with the same args.  Note that these are not done for

Are you sure this shouldn't be conditional? This being an improvement
sounds like an intel-ism, and not something that applies to most
hardware...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/dri2: dri2_make_current: Release previous context's display

2016-08-10 Thread Michel Dänzer
On 10/08/16 03:00 PM, Nicolas Boichat wrote:
> eglMakeCurrent can also be used to change the active display. In that
> case, we need to decrement ref_count of the previous display (possibly
> destroying it), and increment it on the next display.
> 
> Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so
> we only need to test if old_ctx is non-NULL.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214
> Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
> Cc: "12.0" 
> Reported-by: Alexandr Zelinsky 
> Tested-by: Alexandr Zelinsky 
> Signed-off-by: Nicolas Boichat 
> ---
>  src/egl/drivers/dri2/egl_dri2.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index 3205a36..701e42a 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -1285,8 +1285,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
> _EGLSurface *dsurf,
>  
>if (!unbind)
>   dri2_dpy->ref_count++;
> -  if (old_dsurf || old_rsurf || old_ctx)
> - dri2_display_release(disp);
> +  if (old_ctx) {
> + EGLDisplay old_disp = 
> _eglGetDisplayHandle(old_ctx->Resource.Display);
> + dri2_display_release(old_disp);
> +  }

Unfortunately, this change breaks the piglit test "spec@egl
1.4@eglterminate then unbind context", because old_ctx != NULL but
old_ctx->Resource.Display == NULL. Modifying the test to

  if (old_ctx && old_ctx->Resource.Display) {

fixes the regression and doesn't seem to cause any other problems.
Alexandr, does the patch still fix your problem with that modification?


Nicolas, this regression is also reproducible with
LIBGL_ALWAYS_SOFTWARE=1 . Please get used to testing your changes like
that and only send out changes for review which don't cause any regressions.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir: Add support for scalarizing load_interpolated_input.

2016-08-10 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke 
---
 src/compiler/nir/nir_lower_io_to_scalar.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_lower_io_to_scalar.c 
b/src/compiler/nir/nir_lower_io_to_scalar.c
index f2345d5..8be75e4 100644
--- a/src/compiler/nir/nir_lower_io_to_scalar.c
+++ b/src/compiler/nir/nir_lower_io_to_scalar.c
@@ -48,8 +48,10 @@ lower_load_input_to_scalar(nir_builder *b, 
nir_intrinsic_instr *intr)
 
   nir_intrinsic_set_base(chan_intr, nir_intrinsic_base(intr));
   nir_intrinsic_set_component(chan_intr, nir_intrinsic_component(intr) + 
i);
-  /* offset */
-  chan_intr->src[0] = intr->src[0];
+
+  /* offset and possibly barycentric coordinates */
+  for (int i = 0; i < nir_intrinsic_infos[intr->intrinsic].num_srcs; i++)
+ chan_intr->src[i] = intr->src[i];
 
   nir_builder_instr_insert(b, _intr->instr);
 
@@ -112,6 +114,7 @@ nir_lower_io_to_scalar(nir_shader *shader, 
nir_variable_mode mask)
 
switch (intr->intrinsic) {
case nir_intrinsic_load_input:
+   case nir_intrinsic_load_interpolated_input:
   if (mask & nir_var_shader_in)
  lower_load_input_to_scalar(, intr);
   break;
-- 
2.9.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: Add an IO scalarizing pass using the intrinsic's first_component.

2016-08-10 Thread Kenneth Graunke
On Friday, August 5, 2016 4:27:03 PM PDT Eric Anholt wrote:
> vc4 wants to have per-scalar IO load/stores so that dead code elimination
> can happen on a more granular basis, which it has been doing in the
> backend using a multiplication by 4 of the intrinsic's driver_location.
> We can represent it properly in the NIR using the first_component field,
> though.
> ---
> 
> Next part of the vc4 GTN series.  I can't do uniforms because they
> don't have a first-component field.
> 
> NIR developers -- is this the right usage of the first-component
> field?

It looks right to me.

> Also, do I need to be splitting up the variable declarations
> if I do this?

Leaving them as-is should be fine.  Variables are pretty meaningless
after nir_lower_io().  i965 basically doesn't use them at all
anymore...with my FS input rework, everything's in the intrinsics.

> There's a check during shader printing that the
> fractional component of the variable matches the component field of
> the intrinsic, even if the intrinsics's component value would lie
> within the vector variable.

Not seeing that code offhand...my guess is that it's trying to print
the name of the variable being loaded.  It's probably just looking for
one with location_frac == component, instead of checking whether

   location_frac <= component < location_frac + vector_elements

That could probably be spiffed up for nicer pretty printing, but
it shouldn't actually affect any functionality.

I think it'd be great to handle load_interpolated_input here too;
I'll send a follow-up patch to do that.  Feel free to review/squash
as you see fit.

Reviewed-by: Kenneth Graunke 

> 
>  src/compiler/Makefile.sources |   1 +
>  src/compiler/nir/nir.h|   1 +
>  src/compiler/nir/nir_lower_io_to_scalar.c | 129 
> ++
>  3 files changed, 131 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_io_to_scalar.c
> 
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 0ff9b23e4c38..04016d32db56 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -201,6 +201,7 @@ NIR_FILES = \
>   nir/nir_lower_indirect_derefs.c \
>   nir/nir_lower_io.c \
>   nir/nir_lower_io_to_temporaries.c \
> + nir/nir_lower_io_to_scalar.c \
>   nir/nir_lower_io_types.c \
>   nir/nir_lower_passthrough_edgeflags.c \
>   nir/nir_lower_phis_to_scalar.c \
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 9ce5be2176d9..379a5f8e022e 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2398,6 +2398,7 @@ void nir_lower_alu_to_scalar(nir_shader *shader);
>  void nir_lower_load_const_to_scalar(nir_shader *shader);
>  
>  void nir_lower_phis_to_scalar(nir_shader *shader);
> +void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
>  
>  void nir_lower_samplers(nir_shader *shader,
>  const struct gl_shader_program *shader_program);
> diff --git a/src/compiler/nir/nir_lower_io_to_scalar.c 
> b/src/compiler/nir/nir_lower_io_to_scalar.c
> new file mode 100644
> index ..f2345d58acfc
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_io_to_scalar.c
> @@ -0,0 +1,129 @@
> +/*
> + * Copyright © 2016 Broadcom
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "nir.h"
> +#include "nir_builder.h"
> +
> +/** @file nir_lower_io_to_scalar.c
> + *
> + * Replaces nir_load_input/nir_store_output operations with num_components !=
> + * 1 with individual per-channel operations.
> + */
> +
> +static void
> +lower_load_input_to_scalar(nir_builder *b, nir_intrinsic_instr *intr)
> +{
> +   b->cursor = nir_before_instr(>instr);
> +
> +   assert(intr->dest.is_ssa);
> +
> +   nir_ssa_def *loads[4];
> +
> +   for (unsigned i = 0; i < 

Re: [Mesa-dev] [PATCH 02/13] glsl: remove dead builtins before assigning varying locations

2016-08-10 Thread Eric Anholt
Timothy Arceri  writes:

> Builtins already have locations assigned so this shouldn't
> changing anything. We want to call it earlier so we can tranform

  "change"

Other than that, 1-4 are:

Reviewed-by: Eric Anholt 

and I'll see if I can get to some more tomorrow.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: look for frag data bindings with [0] tacked onto the end for arrays

2016-08-10 Thread Eric Anholt
Ilia Mirkin  writes:

> The GL spec is very unclear on this point. Apparently this is discussed
> without resolution in the closed Khronos bugtracker at
> https://cvs.khronos.org/bugzilla/show_bug.cgi?id=7829 . The
> recommendation is to allow dropping the [0] for looking up the bindings.

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97214] X not running with error "Failed to make EGL context current"

2016-08-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97214

Nicolas Boichat  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #12 from Nicolas Boichat  ---
Patch on list here https://patchwork.freedesktop.org/patch/104229/, thanks for
testing!

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] egl/dri2: dri2_make_current: Release previous context's display

2016-08-10 Thread Nicolas Boichat
eglMakeCurrent can also be used to change the active display. In that
case, we need to decrement ref_count of the previous display (possibly
destroying it), and increment it on the next display.

Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so
we only need to test if old_ctx is non-NULL.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" 
Reported-by: Alexandr Zelinsky 
Tested-by: Alexandr Zelinsky 
Signed-off-by: Nicolas Boichat 
---
 src/egl/drivers/dri2/egl_dri2.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 3205a36..701e42a 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -1285,8 +1285,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *dsurf,
 
   if (!unbind)
  dri2_dpy->ref_count++;
-  if (old_dsurf || old_rsurf || old_ctx)
- dri2_display_release(disp);
+  if (old_ctx) {
+ EGLDisplay old_disp = _eglGetDisplayHandle(old_ctx->Resource.Display);
+ dri2_display_release(old_disp);
+  }
 
   return EGL_TRUE;
} else {
-- 
2.8.0.rc3.226.g39d4020

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev