[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"
https://bugs.freedesktop.org/show_bug.cgi?id=97285 Matthew Dawson changed: What|Removed |Added CC||matt...@mjdsystems.ca --- Comment #5 from Matthew Dawson --- apitrace with screen darkening can be found at https://stuff.mjdsystems.ca/tf2_srgb_issue.trace. Running it against mesa 11.2.2 shows no lightening issues (other rendering issues do appear). I couldn't test the trace against the broken commit, but I assume it works. sha1sum: ac9273151e55e6c25038687134b06d1636cd9e79 -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] aubinator: Fix the tool to correctly decode the DWords
On Wednesday, August 10, 2016 2:22:29 PM PDT Jason Ekstrand wrote: > On Tue, Aug 9, 2016 at 4:52 PM, Sirisha Gandikota < > sirisha.gandik...@intel.com> wrote: > > > From: Sirisha Gandikota > > > > Several fixes have been added as part of this as listed below: > > > > 1) Fix the mask and add disassembler handling for STATE_DS, STATE_HS > > as the mask returned wrong values of the fields. > > > > 2) Fix the GEN_TYPE_ADDRESS/GEN_TYPE_OFFSET decoding - the address/ > > offset were handled the same way as the other fields and that gives > > the wrong values for the address/offset. > > > > 3) Decode nested/recurssive structures - Many packets contain nested > > structures, ex: 3DSATE_SO_BUFFER, STATE_BASE_ADDRESS, etc contain MOC > > structures. Previously, the aubinator printed 1 if there was a MOC > > structure. Now we decode the entire structure and print out its fields. > > > > 4) Print out the DWord address along with its hex value - For a better > > clarity of information, it is helpful to print both the address and > > hex value of the DWord along with the DWord count. Since the DWord0 > > contains the instruction code and the instruction length, it is > > unnecessary to print the decoded values for DWord0. This information > > is already available from the DWord hex value. > > > > 5) Decode the and the corresponding fields in the group- The > > tag can have fields of several types including structures. A > > group can contain one or more number of fields and this has be correctly > > decoded. Previously, aubinator did not decode the groups or the > > fields/structures inside them. Now we decode the in the > > instructions and structures where the fields in it repeat for any number > > of times specified. > > > > One comment for now: Usually, when making a bunch of changes like this, we > try and split each functional change into its own patch. That way it's > more clear what's going on. This is brand new code, so a couple giant > patches may be ok, but having it split up would be nicer. :) Git has some > tools that make this less painful than it might look and I'm available to > help if you'd like. > > I'll try and get around to actually playing with it soon. > --Jason Sirisha asked me whether she should split things up, or squash everything together. I suggested that she may as well squash everything together for the initial import (but obviously any future changes should follow the normal procedure). My reasoning was: - There was already one giant commit from Kristian importing most of the tool (patch 1 here). I didn't think it was worth time trying to split that out into an artificial history. And...when developing a tool from scratch...what parts go where? - I figured reviewing the tool as a whole would be easier than a series of patches that gave an incomplete picture. - There's nothing to bisect across or anything that might regress. But I did think it was important to preserve authorship information, so both Kristian and Sirisha got proper credit for their work. Anyway, that's why it is the way it is. Feel free to blame me for it :) signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"
https://bugs.freedesktop.org/show_bug.cgi?id=97285 Matthew Dawson changed: What|Removed |Added Status|NEEDINFO|REOPENED --- Comment #4 from Matthew Dawson --- I can confirm seeing the same behaviour in TF2 on a 7970. I suspect this is a driver bug, and not a game bug, as the lighting sometimes works after I play the game for a bit (but never at the start). The lighting may break later during the same session. In the couple games where it fixed itself, it was usually around the time this bug: https://bugs.freedesktop.org/show_bug.cgi?id=93649 occurred, though I don't know if that is significant. I independently bisected out the same commit as well. I'll try to get an apitrace, see if that shows anything interesting. Also, as another point of data, CS:GO seems to have no issues. Would an apitrace of it be useful? -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Moving amdgpu/addrlib into a git submodule
Hi, On Wednesday, 10 August 2016 15:19:06 CEST Nicolai Hähnle wrote: > On 09.08.2016 22:12, Dave Airlie wrote: > >> > >> tbh, git submodules are more annoying than they need to be, and I'm > >> not really terribly excited to use that for something that is a build > >> dependency. Maybe just move it into libdrm instead? > >> > > > > I've only had to use git submodules once with spice project, and it > > was a nightmare. It makes packaging etc a real pita. > > It would be nice to have something more concrete to go on, like what are > the problems _really_ :) > > > > Alternatives are something like a fetch external sources script, > > that does git submodules but does it better, you'll see Vulkan-CTS > > etc use something like that, it would have to be integrated with > > the build system a bit better though. > > What pain do submodules give you that wouldn't be even _worse_ with a > separate script for fetching external sources? > > > Look, there are a bunch of different use cases here. Distro packagers; > Mesa contributors and bleeding-edge users; and us (which is slightly > separate because we're not _just_ Mesa contributors but have parts of > the stack elsewhere, which is why this is coming up in the first place). > > Most of the complaints seem to come from distro packaging, though I have > yet to understand what the exact problems are. > > For Mesa contributors and users, I'd really like to avoid the hassle of > having to get another repository manually. > > For us, in addition to avoiding the same hassles there's the question of > maintainability, and perhaps (with a huge load of optimism) getting the > closed source folks a bit more involved in the fact that there's another > world out here as well. > > On those axes, I've seen one alternative suggestion that makes sense to > me, and that is subtree-merges. I have to play with them a bit, but the > initial impression is good. They seem like a more git-like solution. > That would also be something new for Mesa, though, since it would mean > more exceptions to the otherwise linear history. For subtree-merge: There is a pair of visualisation teams that used to work for the same company until they both got sold to individual vendors but still wanted to share their basic framwork code. Those two applications done by these teams are somewhere beyond - may be far beyond 1e6 loc each. The shared part is at least 2e5 loc, probably more. I do not recall the exact numbers. The shared part is communicated via a common upstream repository managed by git-subtree. There happens branching merging, rebasing, the usual git things. The git version that is used is ancient, the one from redhat 6. The git repository is based on an import from svn with non standard branch names in svn and that svn used to be imported from cvs. All together more than 10 years of history. Observed problems: It needs special care to be handled. There were not so nice things that happened like disconnected git object trees. After installing a git hook in the subtree managed upstream repository that refuses such a push, disconnected object trees never happened again. Also checking out commits from the subtree side of a subtree merge gives interresting results. Well, plausible once you think about it, but probably surprising for the average git user. You will see a completely different filesystem structure, the one of the subtree you have. The success side: A hand full years sharing > 2e5 loc where normal development happens. Probably some orders of magnitude more changes than I observed with addrlib since its creation. It basically does what you expect when you think about the problem to be solved. So, beside the problems there has been loads of expected and good behavior. Without subtree merge that undertaken would have been much more difficult to infeasible. I do not think that I can give a recomendation. The basic constraints like the age of git used and the history of the repositry may introduce problems that you never observe with something simpler. best Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 97231] GL_DEPTH_CLAMP doesn't clamp to the far plane
https://bugs.freedesktop.org/show_bug.cgi?id=97231 --- Comment #16 from Roland Scheidegger --- (In reply to Jules Blok from comment #15) > Created attachment 125671 [details] > apitrace file llvmpipe > > Finally got an apitrace running on llvmpipe by using a VM. Sorry for the > delay. Thanks, works much better (although in turn it doesn't work on nvidia blob here - looks like it doesn't like the offsets used for glBindBufferRange() there...). I might be able to look into in some days... Although I really believe this to be a llvmpipe problem - if other mesa drivers are affected by it too they probably have their own, different bugs with it. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"
https://bugs.freedesktop.org/show_bug.cgi?id=97285 --- Comment #3 from Kenneth Graunke --- It might also be worth testing with and without multisampling... -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"
https://bugs.freedesktop.org/show_bug.cgi?id=97285 --- Comment #2 from Kenneth Graunke --- On i965, both Dota 2 (Source 2) and Left 4 Dead 2 (Source 1) appear to be working. I took screenshots from the same point in a Dota 2 tournament replay both with and without the sRGB patches, and they look identical. So perhaps I just broke Gallium or radeonsi drivers? Sorry about that :( -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl/tests: fix segfault in uniform initializer test
On Wed, Aug 10, 2016 at 7:34 PM, Timothy Arceri < timothy.arc...@collabora.com> wrote: > Cause by 549222f5 > s/Cause/Caused/ ? Fixes the make check failure for me (haven't done a full piglit run, just build/check/install test). Thanks for the quick turnaround. --Aaron > > Cc: Aaron Watry > --- > src/compiler/glsl/tests/set_uniform_initializer_tests.cpp | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp > b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp > index a36ffdc..9d41017 100644 > --- a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp > +++ b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp > @@ -24,6 +24,7 @@ > #include "main/compiler.h" > #include "main/mtypes.h" > #include "main/macros.h" > +#include "program/hash_table.h" > #include "util/ralloc.h" > #include "uniform_initializer_utils.h" > > @@ -108,6 +109,7 @@ establish_uniform_storage(struct gl_shader_program > *prog, unsigned num_storage, >+ type->components())); > const unsigned red_zone_components = total_components - > data_components; > > + prog->UniformHash = new string_to_uint_map; > prog->UniformStorage = rzalloc_array(prog, struct gl_uniform_storage, > num_storage); > prog->NumUniformStorage = num_storage; > @@ -128,6 +130,9 @@ establish_uniform_storage(struct gl_shader_program > *prog, unsigned num_storage, > data_components, > red_zone_components); > > + prog->UniformHash->put(index_to_set, > + prog->UniformStorage[index_to_set].name); > + > for (unsigned i = 0; i < num_storage; i++) { >if (i == index_to_set) > continue; > -- > 2.7.4 > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl/tests: fix segfault in uniform initializer test
Cause by 549222f5 Cc: Aaron Watry --- src/compiler/glsl/tests/set_uniform_initializer_tests.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp index a36ffdc..9d41017 100644 --- a/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp +++ b/src/compiler/glsl/tests/set_uniform_initializer_tests.cpp @@ -24,6 +24,7 @@ #include "main/compiler.h" #include "main/mtypes.h" #include "main/macros.h" +#include "program/hash_table.h" #include "util/ralloc.h" #include "uniform_initializer_utils.h" @@ -108,6 +109,7 @@ establish_uniform_storage(struct gl_shader_program *prog, unsigned num_storage, + type->components())); const unsigned red_zone_components = total_components - data_components; + prog->UniformHash = new string_to_uint_map; prog->UniformStorage = rzalloc_array(prog, struct gl_uniform_storage, num_storage); prog->NumUniformStorage = num_storage; @@ -128,6 +130,9 @@ establish_uniform_storage(struct gl_shader_program *prog, unsigned num_storage, data_components, red_zone_components); + prog->UniformHash->put(index_to_set, + prog->UniformStorage[index_to_set].name); + for (unsigned i = 0; i < num_storage; i++) { if (i == index_to_set) continue; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Bisected: Make check is failing as of 549222f5
Hello, I just got home and did a pull/build, and make check is failing as of a very recent commit today... ${MESA_ROOT}/bin/test-driver: line 107: 5863 Segmentation fault (core dumped) "$@" > $log_file 2>&1 FAIL: glsl/tests/uniform-initializer-test PASS: glsl/glcpp/tests/glcpp-test ^Cmake[6]: *** Deleting file 'glsl/glcpp/tests/glcpp-test-cr-lf.log' Makefile:2442: recipe for target 'glsl/glcpp/tests/glcpp-test-cr-lf.log' failed make[6]: *** [glsl/glcpp/tests/glcpp-test-cr-lf.log] Error 130 Makefile:2414: recipe for target 'check-TESTS' failed make[5]: *** [check-TESTS] Interrupt Makefile:2546: recipe for target 'check-am' failed make[4]: *** [check-am] Interrupt Makefile:2549: recipe for target 'check' failed make[3]: *** [check] Interrupt Makefile:711: recipe for target 'check-recursive' failed make[2]: *** [check-recursive] Interrupt Makefile:860: recipe for target 'check' failed make[1]: *** [check] Interrupt Makefile:651: recipe for target 'check-recursive' failed make: *** [check-recursive] Interrupt me@hostname:${MESA_ROOT$ git bisect bad 549222f5f8ef4616f5e6ddeb5c29ea6446684e5e is the first bad commit commit 549222f5f8ef4616f5e6ddeb5c29ea6446684e5e Author: Timothy Arceri Date: Wed Jul 27 15:20:44 2016 +1000 glsl: use UniformHash to find storage location There is no need to be looping over all the uniforms. Reviewed-by: Eric Anholt :04 04 86cc93603122405d5f03163d2abed6bac788e086 7567379ba933b6089ca1a34f8ba4c1fe8da04729 Msrc ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"
https://bugs.freedesktop.org/show_bug.cgi?id=97285 Ian Romanick changed: What|Removed |Added Status|NEW |NEEDINFO CC||i...@freedesktop.org, ||kenn...@whitecape.org --- Comment #1 from Ian Romanick --- This will probably affect i965 too... do you have an apitrace or something at the locations in question? I know Ken tried some Source engine games before pushing the series. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On 08/10/2016 12:50 PM, Eric Anholt wrote: > Connor Abbott writes: > >> On Wed, Aug 10, 2016 at 1:53 PM, Eric Anholt wrote: >>> Kenneth Graunke writes: >>> On Haswell (GL 3.3): total instructions in shared programs: 6208759 -> 6203860 (-0.08%) instructions in affected programs: 856541 -> 851642 (-0.57%) helped: 3157 HURT: 113 LOST: 7 GAINED: 15 On Broadwell (GL 4.4): total instructions in shared programs: 11637854 -> 11632016 (-0.05%) instructions in affected programs: 1055693 -> 1049855 (-0.55%) helped: 3900 HURT: 176 LOST: 1 GAINED: 18 Signed-off-by: Kenneth Graunke --- src/compiler/nir/nir_opt_algebraic.py | 4 1 file changed, 4 insertions(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 1cf614c..4e9896f 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -251,6 +251,10 @@ optimizations = [ (('ieq', 'a@bool', False), ('inot', 'a')), (('bcsel', a, True, False), ('ine', a, 0)), (('bcsel', a, False, True), ('ieq', a, 0)), + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, >>> >>> Isn't bcsel's first arg guaranteed to be 0 or ~0? I thought that was >>> how nir's bcsel (and b2f) worked. If not, it would be good to see this >>> documented. >> >> Yes, any ALU operation that takes a boolean input can expect it to be >> 0 or ~0. If that isn't true, then something else in the compiler has >> messed up. So I think we can replace 'a != 0' with 'a' and 'a == 0' >> with '!a'. > > Yeah. So, I like the contents of this series a lot, but it would be > nice to see some of the @bool annotations dropped. Right... because ('b2f', 'a@bool') is redundant. I agree. > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/13] glsl: remove dead builtins before assigning varying locations
On Wed, 2016-08-10 at 00:01 -0700, Eric Anholt wrote: > Timothy Arceri writes: > > > > > Builtins already have locations assigned so this shouldn't > > changing anything. We want to call it earlier so we can tranform > > "change" > > Other than that, 1-4 are: > > Reviewed-by: Eric Anholt > > and I'll see if I can get to some more tomorrow. Great. Thanks for taking a look :) > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 97285] Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules"
https://bugs.freedesktop.org/show_bug.cgi?id=97285 Bug ID: 97285 Summary: Darkness in Dota 2 after Patch "Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules" Product: Mesa Version: unspecified Hardware: Other OS: All Status: NEW Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: edmo...@eriadon.com QA Contact: mesa-dev@lists.freedesktop.org After updating to mesa master (6575ebdc), Dota 2 colors are much more darker. >From the IRC log 2016-08-10 it seems haagch already bisected it: 15:53 #radeon: < haagch> anyway, am I the only one seeing some opengl games being far too dark, like dota 2 and portal 2? 16:03 #radeon: < nha> haagch, since when? maybe since those sRGB changes? 16:38 #radeon: < haagch> nha: you were right, 3190c7ee9727161d627f107c2e7f8ec3a11941c1 is the first bad commit I can confirm that reverting patch 3190c7ee9727161d627f107c2e7f8ec3a11941c1 takes away the darkness in Dota 2. My configuration: OpenGL renderer string: Gallium 0.4 on AMD CYPRESS (DRM 2.46.0 / 4.8.0-rc1, LLVM 3.8.1) OpenGL core profile version string: 4.1 (Core Profile) Mesa 12.1.0-devel (git-796d760) OpenGL core profile shading language version string: 4.10 -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl/dri2: dri2_make_current: Release previous context's display
On Wed, Aug 10, 2016 at 9:44 AM, Michel Dänzer wrote: > On 10/08/16 03:00 PM, Nicolas Boichat wrote: >> eglMakeCurrent can also be used to change the active display. In that >> case, we need to decrement ref_count of the previous display (possibly >> destroying it), and increment it on the next display. >> >> Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so >> we only need to test if old_ctx is non-NULL. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214 >> Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display) >> Cc: "12.0" >> Reported-by: Alexandr Zelinsky >> Tested-by: Alexandr Zelinsky >> Signed-off-by: Nicolas Boichat >> --- >> src/egl/drivers/dri2/egl_dri2.c | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/src/egl/drivers/dri2/egl_dri2.c >> b/src/egl/drivers/dri2/egl_dri2.c >> index 3205a36..701e42a 100644 >> --- a/src/egl/drivers/dri2/egl_dri2.c >> +++ b/src/egl/drivers/dri2/egl_dri2.c >> @@ -1285,8 +1285,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, >> _EGLSurface *dsurf, >> >>if (!unbind) >> dri2_dpy->ref_count++; >> - if (old_dsurf || old_rsurf || old_ctx) >> - dri2_display_release(disp); >> + if (old_ctx) { >> + EGLDisplay old_disp = >> _eglGetDisplayHandle(old_ctx->Resource.Display); >> + dri2_display_release(old_disp); >> + } > > Unfortunately, this change breaks the piglit test "spec@egl > 1.4@eglterminate then unbind context", because old_ctx != NULL but > old_ctx->Resource.Display == NULL. Modifying the test to > > if (old_ctx && old_ctx->Resource.Display) { > > fixes the regression and doesn't seem to cause any other problems. This is probably wrong as the display will leak (it definitely should be freed after calling eglTerminate + eglMakeCurrent(NULL)). I think I know where the problem is (the call to _eglReleaseDisplayResources happens too early), I'm not sure what's the best solution. I'll have a look. > Alexandr, does the patch still fix your problem with that modification? > > > Nicolas, this regression is also reproducible with > LIBGL_ALWAYS_SOFTWARE=1 . Please get used to testing your changes like > that and only send out changes for review which don't cause any regressions. Managed to build piglit, and run it using a locally built mesa. Are the commands below what you would use? export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MESA_DIR/lib export LIBGL_DRIVERS_PATH=$MESA_DIR/lib/gallium export EGL_DRIVERS_PATH=$MESA_DIR/lib export EGL_LOG_LEVEL=debug export LIBGL_ALWAYS_SOFTWARE=1 ./piglit run -p x11_egl -t "spec@egl.*" all results/master Best, Nicolas ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On Wed, Aug 10, 2016 at 2:24 PM, Kenneth Graunke wrote: > On Wednesday, August 10, 2016 10:02:12 AM PDT Erik Faye-Lund wrote: >> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke >> wrote: >> > On Haswell (GL 3.3): >> > >> > total instructions in shared programs: 6208759 -> 6203860 (-0.08%) >> > instructions in affected programs: 856541 -> 851642 (-0.57%) >> > helped: 3157 >> > HURT: 113 >> > LOST: 7 >> > GAINED: 15 >> > >> > On Broadwell (GL 4.4): >> > >> > total instructions in shared programs: 11637854 -> 11632016 (-0.05%) >> > instructions in affected programs: 1055693 -> 1049855 (-0.55%) >> > helped: 3900 >> > HURT: 176 >> > LOST: 1 >> > GAINED: 18 >> > >> > Signed-off-by: Kenneth Graunke >> > --- >> > src/compiler/nir/nir_opt_algebraic.py | 4 >> > 1 file changed, 4 insertions(+) >> > >> > diff --git a/src/compiler/nir/nir_opt_algebraic.py >> > b/src/compiler/nir/nir_opt_algebraic.py >> > index 1cf614c..4e9896f 100644 >> > --- a/src/compiler/nir/nir_opt_algebraic.py >> > +++ b/src/compiler/nir/nir_opt_algebraic.py >> > @@ -251,6 +251,10 @@ optimizations = [ >> > (('ieq', 'a@bool', False), ('inot', 'a')), >> > (('bcsel', a, True, False), ('ine', a, 0)), >> > (('bcsel', a, False, True), ('ieq', a, 0)), >> > + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), >> > + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), >> > + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, >> > + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, >> > (('bcsel', True, b, c), b), >> > (('bcsel', False, b, c), c), >> > # The result of this should be hit by constant propagation and, in the >> >> Same as the previous patch, this smells like intel-isms. Hardware that >> has native bcsel with support for two inline immediates will do better >> without. > > It definitely feels a little strange replacing a single bcsel with > a fneg/b2f/ine, as that's three operations instead of one. > > I expect the ine to go away - assuming 'a' is a properly formatted > boolean (0 or 0x), "ine a 0" will just become 'a'. ieq would > turn into inot. I made the point in another place in this thread, but you don't have to assume -- things that take booleans as input should only ever get 0 or ~0. Anything else would be a bug in the compiler. Now that I have a little more free time before school starts, I've been thinking about making booleans have a bit-width of 1 (i.e. making them logical) and then only lowering to their concrete representation very late, if at all. That would keep this confusion from happening, in addition to bringing a lot of other benefits, and it should be a lot easier now that we have bit widths in NIR. > If the boolean was a comparison, the inot could be > folded in - i.e. inot(flt(a,b)) -> fge(a,b). Or, some GPUs can handle > boolean negation as a source modifier, so it might be free there too. > > Floating point negation can usually be done as a source modifier. > > For reference, here's a shader snippet from Goat Simulator which > prompted me to write this optimization: > > const vec4 LocalConst1 = vec4(0.25, -0.25, 0.00, 1.00); > > void main() > { > ... > InstrHelpTemp.r = ( ( Temporary1.r >= 0.0 ) ? LocalConst1.b : LocalConst1.a ); > ... > } > > which could be turned into > > InstrHelpTemp.r = float(Temporary1.r < 0.0); > > which seems arguably better, regardless of hardware. > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 00/16] radeonsi: improve handling of temporary arrays
On Wed, Aug 10, 2016 at 9:23 PM, Nicolai Hähnle wrote: > Hi, > > this is a respin of the series which scans the shader's TGSI to determine > which channels of an array are actually written to. Most of the st/mesa > changes have become unnecessary. Most of the radeon-specific part stays > the same. > > For one F1 2015 shader, it reduces the scratch size from 132096 to 26624 > bytes, which is bound to be much nicer on the texture cache. This has been bugging me... is there something we can do to move temporary arrays to registers? F1 2015 is the only game that doesn't "spill VGPRs", yet has the highest scratch usage per shader. (without this series) If a shader uses 32 VGPRs and a *ton* of scratch space, you know something is wrong. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] aubinator: Fix the tool to correctly decode the DWords
On Tue, Aug 9, 2016 at 4:52 PM, Sirisha Gandikota wrote: > From: Sirisha Gandikota > > Several fixes have been added as part of this as listed below: > > 1) Fix the mask and add disassembler handling for STATE_DS, STATE_HS > as the mask returned wrong values of the fields. > > 2) Fix the GEN_TYPE_ADDRESS/GEN_TYPE_OFFSET decoding - the address/ > offset were handled the same way as the other fields and that gives > the wrong values for the address/offset. > > 3) Decode nested/recurssive structures - Many packets contain nested > structures, ex: 3DSATE_SO_BUFFER, STATE_BASE_ADDRESS, etc contain MOC > structures. Previously, the aubinator printed 1 if there was a MOC > structure. Now we decode the entire structure and print out its fields. > > 4) Print out the DWord address along with its hex value - For a better > clarity of information, it is helpful to print both the address and > hex value of the DWord along with the DWord count. Since the DWord0 > contains the instruction code and the instruction length, it is > unnecessary to print the decoded values for DWord0. This information > is already available from the DWord hex value. > > 5) Decode the and the corresponding fields in the group- The > tag can have fields of several types including structures. A > group can contain one or more number of fields and this has be correctly > decoded. Previously, aubinator did not decode the groups or the > fields/structures inside them. Now we decode the in the > instructions and structures where the fields in it repeat for any number > of times specified. > > Signed-off-by: Sirisha Gandikota > --- > src/intel/tools/aubinator.c | 115 > +--- > src/intel/tools/decoder.c | 95 +--- > src/intel/tools/decoder.h | 39 +++ > 3 files changed, 192 insertions(+), 57 deletions(-) > > diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c > index 99d67a1..d2f9d13 100644 > --- a/src/intel/tools/aubinator.c > +++ b/src/intel/tools/aubinator.c > @@ -84,17 +84,80 @@ valid_offset(uint32_t offset) > } > > static void > +print_dword_val(struct gen_field_iterator *iter, uint64_t offset, int > *dword_num) > +{ > + struct gen_field *f; > + union { > + uint32_t dw; > + float f; > + } v; > + > + f = iter->group->fields[iter->i-1]; Spaces around operators. Occurs many places. > + v.dw = iter->p[f->start / 32]; > + > + if (*dword_num != (f->start/32)) { > + printf("0x%08lx: 0x%08x : Dword %d\n",(offset+4*(f->start/32)), v.dw, > f->start/32); > + *dword_num = (f->start/32); > + } > +} > + > +static char * > +print_iterator_values(struct gen_field_iterator *iter, int *idx) > +{ > +char *token = NULL; > +if (strstr(iter->value,"struct") == NULL) { > +printf("%s: %s\n", iter->name, iter->value); > +} else { > +token = strtok(iter->value, " "); > +if (token != NULL) { > +token = strtok(NULL, " "); > +*idx = atoi(strtok(NULL, ">")); > +} else { > +token = NULL; > +} > +printf("%s:\n", iter->name, token); > +} > +return token; > + Extra new line. > +} > + > +static void > decode_structure(struct gen_spec *spec, struct gen_group *strct, const > uint32_t *p) > { > struct gen_field_iterator iter; > + char *token = NULL; > + int idx = 0, dword_num = 0; > + uint64_t offset = 0; > + > + if (option_print_offsets) > + offset = (void *) p - gtt; > + else > + offset = 0; > > gen_field_iterator_init(&iter, strct, p); > while (gen_field_iterator_next(&iter)) { > - printf("%s: %s\n", iter.name, iter.value); > + idx = 0; > + print_dword_val(&iter, offset, &dword_num); > + token = print_iterator_values(&iter, &idx); > + if (token != NULL) { > + struct gen_group *struct_val = gen_spec_find_struct(spec, token); > + decode_structure(spec, struct_val, &p[idx]); > + token = NULL; > + } > } > } > > static void > +handle_struct_decode(struct gen_spec *spec, char *struct_name, uint32_t *p) > +{ > +if (struct_name == NULL) > +return; > +struct gen_group *struct_val = gen_spec_find_struct(spec, struct_name); > +decode_structure(spec, struct_val, p); > + Extra newline. > +} > + > +static void > dump_binding_table(struct gen_spec *spec, uint32_t offset) > { > uint32_t *pointers, i; > @@ -248,7 +311,8 @@ handle_media_interface_descriptor_load(struct gen_spec > *spec, uint32_t *p) > } > > /* Heuristic to determine whether a uint32_t is probably actually a float > - * (http://stackoverflow.com/a/2953466) */ > + * (http://stackoverflow.com/a/2953466) > + */ > > static bool > probably_float(uint32_t bits) > @@ -256,15 +320,15 @@ probably_float(uint32_t bits) > int exp = ((bits & 0x7f80U) >> 23) - 127; > uint32_t mant = bits & 0x007f; > > - // +- 0.0 > + /* +- 0.0 */
Re: [Mesa-dev] [PATCH 1/2] aubinator: Add a new tool called Aubinator to the src/intel/tools folder.
On Tue, Aug 9, 2016 at 4:52 PM, Sirisha Gandikota wrote: > From: Kristian Høgsberg Kristensen > > The Aubinator tool is designed to help the driver developers in debugging > the driver functionality by decoding the data in the .aub files. > Primary Authors of this tool are Damien Lespiau > and Kristian Høgsberg Kristensen . > > Signed-off-by: Sirisha Gandikota > --- > configure.ac |1 + > src/Makefile.am |4 + > src/intel/tools/Makefile.am | 65 +++ > src/intel/tools/aubinator.c | 1039 > ++ > src/intel/tools/decoder.c| 520 + > src/intel/tools/decoder.h| 57 +++ > src/intel/tools/disasm.c | 109 + > src/intel/tools/gen_disasm.h | 35 ++ > src/intel/tools/intel_aub.h | 154 +++ > 9 files changed, 1984 insertions(+) > create mode 100644 src/intel/tools/Makefile.am > create mode 100644 src/intel/tools/aubinator.c > create mode 100644 src/intel/tools/decoder.c > create mode 100644 src/intel/tools/decoder.h > create mode 100644 src/intel/tools/disasm.c > create mode 100644 src/intel/tools/gen_disasm.h > create mode 100644 src/intel/tools/intel_aub.h > > diff --git a/configure.ac b/configure.ac > index aea5890..acc8356 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -2742,6 +2742,7 @@ AC_CONFIG_FILES([Makefile > src/intel/genxml/Makefile > src/intel/isl/Makefile > src/intel/vulkan/Makefile > + src/intel/tools/Makefile This list is alphabetical. tools before vulkan. > src/loader/Makefile > src/mapi/Makefile > src/mapi/es1api/glesv1_cm.pc > diff --git a/src/Makefile.am b/src/Makefile.am > index d4e34b4..190ad08 100644 > --- a/src/Makefile.am > +++ b/src/Makefile.am > @@ -88,6 +88,10 @@ if HAVE_INTEL_VULKAN > SUBDIRS += intel/vulkan > endif > > +if HAVE_INTEL_DRIVERS > +SUBDIRS += intel/tools > +endif > + > if HAVE_GALLIUM > SUBDIRS += gallium > endif > diff --git a/src/intel/tools/Makefile.am b/src/intel/tools/Makefile.am > new file mode 100644 > index 000..cf5477f > --- /dev/null > +++ b/src/intel/tools/Makefile.am > @@ -0,0 +1,65 @@ > +# Copyright © 2016 Intel Corporation > +# > +# Permission is hereby granted, free of charge, to any person obtaining a > +# copy of this software and associated documentation files (the "Software"), > +# to deal in the Software without restriction, including without limitation > +# the rights to use, copy, modify, merge, publish, distribute, sublicense, > +# and/or sell copies of the Software, and to permit persons to whom the > +# Software is furnished to do so, subject to the following conditions: > +# > +# The above copyright notice and this permission notice (including the next > +# paragraph) shall be included in all copies or substantial portions of the > +# Software. > +# > +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > +# IN THE SOFTWARE. > + > +CLEANFILES = \ > + aubinator-aubinator.o \ > + aubinator-decoder.o \ > + disasm.lo \ > + libdisasm.la > + > +noinst_LTLIBRARIES = libdisasm.la > + > +AM_CPPFLAGS = \ > + $(INTEL_CFLAGS) \ > + $(VALGRIND_CFLAGS) \ > + $(DEFINES) \ > + -I$(top_srcdir)/include \ > + -I$(top_builddir)/src \ > + -I$(top_srcdir)/src \ > + -I$(top_builddir)/src/compiler \ > + -I$(top_srcdir)/src/compiler \ > + -I$(top_builddir)/src/compiler/nir \ > + -I$(top_srcdir)/src/mapi \ > + -I$(top_srcdir)/src/mesa \ > + -I$(top_srcdir)/src/mesa/drivers/dri/common \ > + -I$(top_srcdir)/src/mesa/drivers/dri/i965 \ > + -I$(top_srcdir)/src/gallium/auxiliary \ > + -I$(top_srcdir)/src/gallium/include \ > + -I$(top_builddir)/src/intel \ > + -I$(top_srcdir)/src/intel > + > +libdisasm_la_SOURCES = \ > + gen_disasm.h \ > + disasm.c > +libdisasm_la_LIBADD = > $(top_builddir)/src/mesa/drivers/dri/i965/libi965_compiler.la \ > + $(top_builddir)/src/mesa/libmesa.la \ > + $(PER_GEN_LIBS) \ > + $(PTHREAD_LIBS) \ > + $(DLOPEN_LIBS) \ > + -lm > + > + > +bin_PROGRAMS = aubinator Point of discussion: do we actually want to install this to the users' systems? > + > +aubinator_SOURCES = intel_aub.h decoder.c decoder.h aubinator.c > +aubinator_CFLAGS = $(EXPAT_CFLAGS) $(AM_CFLAGS) -I$(top_srcdir)/src > -I$(top_srcdir)/include > + > +aubinator_LDADD = $(EXPAT_LIBS) libdisasm.la > diff --git a
Re: [Mesa-dev] [PATCH 2/2] aubinator: Fix the tool to correctly decode the DWords
On Tue, Aug 9, 2016 at 4:52 PM, Sirisha Gandikota < sirisha.gandik...@intel.com> wrote: > From: Sirisha Gandikota > > Several fixes have been added as part of this as listed below: > > 1) Fix the mask and add disassembler handling for STATE_DS, STATE_HS > as the mask returned wrong values of the fields. > > 2) Fix the GEN_TYPE_ADDRESS/GEN_TYPE_OFFSET decoding - the address/ > offset were handled the same way as the other fields and that gives > the wrong values for the address/offset. > > 3) Decode nested/recurssive structures - Many packets contain nested > structures, ex: 3DSATE_SO_BUFFER, STATE_BASE_ADDRESS, etc contain MOC > structures. Previously, the aubinator printed 1 if there was a MOC > structure. Now we decode the entire structure and print out its fields. > > 4) Print out the DWord address along with its hex value - For a better > clarity of information, it is helpful to print both the address and > hex value of the DWord along with the DWord count. Since the DWord0 > contains the instruction code and the instruction length, it is > unnecessary to print the decoded values for DWord0. This information > is already available from the DWord hex value. > > 5) Decode the and the corresponding fields in the group- The > tag can have fields of several types including structures. A > group can contain one or more number of fields and this has be correctly > decoded. Previously, aubinator did not decode the groups or the > fields/structures inside them. Now we decode the in the > instructions and structures where the fields in it repeat for any number > of times specified. > One comment for now: Usually, when making a bunch of changes like this, we try and split each functional change into its own patch. That way it's more clear what's going on. This is brand new code, so a couple giant patches may be ok, but having it split up would be nicer. :) Git has some tools that make this less painful than it might look and I'm available to help if you'd like. I'll try and get around to actually playing with it soon. --Jason > > Signed-off-by: Sirisha Gandikota > --- > src/intel/tools/aubinator.c | 115 ++ > +++--- > src/intel/tools/decoder.c | 95 +--- > src/intel/tools/decoder.h | 39 +++ > 3 files changed, 192 insertions(+), 57 deletions(-) > > diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c > index 99d67a1..d2f9d13 100644 > --- a/src/intel/tools/aubinator.c > +++ b/src/intel/tools/aubinator.c > @@ -84,17 +84,80 @@ valid_offset(uint32_t offset) > } > > static void > +print_dword_val(struct gen_field_iterator *iter, uint64_t offset, int > *dword_num) > +{ > + struct gen_field *f; > + union { > + uint32_t dw; > + float f; > + } v; > + > + f = iter->group->fields[iter->i-1]; > + v.dw = iter->p[f->start / 32]; > + > + if (*dword_num != (f->start/32)) { > + printf("0x%08lx: 0x%08x : Dword %d\n",(offset+4*(f->start/32)), > v.dw, f->start/32); > + *dword_num = (f->start/32); > + } > +} > + > +static char * > +print_iterator_values(struct gen_field_iterator *iter, int *idx) > +{ > +char *token = NULL; > +if (strstr(iter->value,"struct") == NULL) { > +printf("%s: %s\n", iter->name, iter->value); > +} else { > +token = strtok(iter->value, " "); > +if (token != NULL) { > +token = strtok(NULL, " "); > +*idx = atoi(strtok(NULL, ">")); > +} else { > +token = NULL; > +} > +printf("%s:\n", iter->name, token); > +} > +return token; > + > +} > + > +static void > decode_structure(struct gen_spec *spec, struct gen_group *strct, const > uint32_t *p) > { > struct gen_field_iterator iter; > + char *token = NULL; > + int idx = 0, dword_num = 0; > + uint64_t offset = 0; > + > + if (option_print_offsets) > + offset = (void *) p - gtt; > + else > + offset = 0; > > gen_field_iterator_init(&iter, strct, p); > while (gen_field_iterator_next(&iter)) { > - printf("%s: %s\n", iter.name, iter.value); > + idx = 0; > + print_dword_val(&iter, offset, &dword_num); > + token = print_iterator_values(&iter, &idx); > + if (token != NULL) { > + struct gen_group *struct_val = gen_spec_find_struct(spec, token); > + decode_structure(spec, struct_val, &p[idx]); > + token = NULL; > + } > } > } > > static void > +handle_struct_decode(struct gen_spec *spec, char *struct_name, uint32_t > *p) > +{ > +if (struct_name == NULL) > +return; > +struct gen_group *struct_val = gen_spec_find_struct(spec, > struct_name); > +decode_structure(spec, struct_val, p); > + > +} > + > +static void > dump_binding_table(struct gen_spec *spec, uint32_t offset) > { > uint32_t *pointers, i; > @@ -248,7 +311,8 @@ handle_media_interface_descriptor_load(struct > gen_spec *spec, uint32_t *p) > }
Re: [Mesa-dev] [PATCH 4/6] nir: Introduce a nir_opt_move_comparisons() pass.
On Wed, Aug 10, 2016 at 11:14 AM, Ian Romanick wrote: > On 08/09/2016 07:30 PM, Kenneth Graunke wrote: > > This tries to move comparisons (a common source of boolean values) > > closer to their first use. For GPUs which use condition codes, > > this can eliminate a lot of temporary booleans and comparisons > > which reload the condition code register based on a boolean. > > > > Signed-off-by: Kenneth Graunke > > --- > > src/compiler/Makefile.sources | 1 + > > src/compiler/nir/nir.h | 2 + > > src/compiler/nir/nir_opt_move_comparisons.c | 173 > > > 3 files changed, 176 insertions(+) > > create mode 100644 src/compiler/nir/nir_opt_move_comparisons.c > > > > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile. > sources > > index 0ff9b23..008a101 100644 > > --- a/src/compiler/Makefile.sources > > +++ b/src/compiler/Makefile.sources > > @@ -226,6 +226,7 @@ NIR_FILES = \ > > nir/nir_opt_gcm.c \ > > nir/nir_opt_global_to_local.c \ > > nir/nir_opt_peephole_select.c \ > > + nir/nir_opt_move_comparisons.c \ > > nir/nir_opt_remove_phis.c \ > > nir/nir_opt_undef.c \ > > nir/nir_phi_builder.c \ > > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > > index 9ce5be2..79511a7 100644 > > --- a/src/compiler/nir/nir.h > > +++ b/src/compiler/nir/nir.h > > @@ -2574,6 +2574,8 @@ bool nir_opt_dead_cf(nir_shader *shader); > > > > void nir_opt_gcm(nir_shader *shader); > > > > +bool nir_opt_move_comparisons(nir_shader *shader); > > + > > bool nir_opt_peephole_select(nir_shader *shader); > > > > bool nir_opt_remove_phis(nir_shader *shader); > > diff --git a/src/compiler/nir/nir_opt_move_comparisons.c > b/src/compiler/nir/nir_opt_move_comparisons.c > > new file mode 100644 > > index 000..74927c9 > > --- /dev/null > > +++ b/src/compiler/nir/nir_opt_move_comparisons.c > > @@ -0,0 +1,173 @@ > > +/* > > + * Copyright © 2016 Intel Corporation > > + * > > + * Permission is hereby granted, free of charge, to any person > obtaining a > > + * copy of this software and associated documentation files (the > "Software"), > > + * to deal in the Software without restriction, including without > limitation > > + * the rights to use, copy, modify, merge, publish, distribute, > sublicense, > > + * and/or sell copies of the Software, and to permit persons to whom the > > + * Software is furnished to do so, subject to the following conditions: > > + * > > + * The above copyright notice and this permission notice (including the > next > > + * paragraph) shall be included in all copies or substantial portions > of the > > + * Software. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > EXPRESS OR > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > MERCHANTABILITY, > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT > SHALL > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > OTHER > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > ARISING > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > > + * IN THE SOFTWARE. > > + */ > > + > > +#include "nir.h" > > + > > +/** > > + * \file nir_opt_move_comparisons.c > > + * > > + * This pass moves ALU comparison operations just before their first > use. > > + * > > + * It only moves instructions within a single basic block; cross-block > > + * movement is left to global code motion. > > + * > > + * Many GPUs generate condition codes for comparisons, and use > predication > > + * for conditional selects and control flow. In a sequence such as: > > + * > > + * vec1 32 ssa_1 = flt a b > > + * > > + * vec1 32 ssa_2 = bcsel ssa_1 c d > > + * > > + * the backend would likely do the comparison, producing condition > codes, > > + * then save those to a boolean value. The intervening operations might > > + * trash the condition codes. Then, in order to do the bcsel, it would > > + * need to re-populate the condition code register based on the boolean. > > + * > > + * By moving the comparison just before the bcsel, the condition codes > could > > + * be used directly. This eliminates the need to reload them from the > boolean > > + * (generally eliminating an instruction). It may also eliminate the > need to > > + * create a boolean value altogether (unless it's used elsewhere), > which could > > + * lower register pressure. > > + */ > > + > > +static bool > > +is_comparison(nir_op op) > > +{ > > + switch (op) { > > + case nir_op_flt: > > + case nir_op_fge: > > + case nir_op_feq: > > + case nir_op_fne: > > + case nir_op_ilt: > > + case nir_op_ult: > > + case nir_op_ige: > > + case nir_op_uge: > > + case nir_op_ieq: > > + case nir_op_ine: > > + return true; > > + default: > > + return false; > > + } > > +} > > + > > +static bool > > +move_comparison_source(nir_src *src, nir_b
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
Connor Abbott writes: > On Wed, Aug 10, 2016 at 1:53 PM, Eric Anholt wrote: >> Kenneth Graunke writes: >> >>> On Haswell (GL 3.3): >>> >>> total instructions in shared programs: 6208759 -> 6203860 (-0.08%) >>> instructions in affected programs: 856541 -> 851642 (-0.57%) >>> helped: 3157 >>> HURT: 113 >>> LOST: 7 >>> GAINED: 15 >>> >>> On Broadwell (GL 4.4): >>> >>> total instructions in shared programs: 11637854 -> 11632016 (-0.05%) >>> instructions in affected programs: 1055693 -> 1049855 (-0.55%) >>> helped: 3900 >>> HURT: 176 >>> LOST: 1 >>> GAINED: 18 >>> >>> Signed-off-by: Kenneth Graunke >>> --- >>> src/compiler/nir/nir_opt_algebraic.py | 4 >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/src/compiler/nir/nir_opt_algebraic.py >>> b/src/compiler/nir/nir_opt_algebraic.py >>> index 1cf614c..4e9896f 100644 >>> --- a/src/compiler/nir/nir_opt_algebraic.py >>> +++ b/src/compiler/nir/nir_opt_algebraic.py >>> @@ -251,6 +251,10 @@ optimizations = [ >>> (('ieq', 'a@bool', False), ('inot', 'a')), >>> (('bcsel', a, True, False), ('ine', a, 0)), >>> (('bcsel', a, False, True), ('ieq', a, 0)), >>> + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), >>> + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), >>> + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, >>> + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, >> >> Isn't bcsel's first arg guaranteed to be 0 or ~0? I thought that was >> how nir's bcsel (and b2f) worked. If not, it would be good to see this >> documented. > > Yes, any ALU operation that takes a boolean input can expect it to be > 0 or ~0. If that isn't true, then something else in the compiler has > messed up. So I think we can replace 'a != 0' with 'a' and 'a == 0' > with '!a'. Yeah. So, I like the contents of this series a lot, but it would be nice to see some of the @bool annotations dropped. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swrast: fix active attribs with atifragshader
On 08/04/2016 06:51 PM, Brian Paul wrote: On 08/04/2016 10:40 AM, Miklós Máté wrote: On 07/14/2016 10:43 PM, Miklós Máté wrote: On 07/07/2016 07:11 PM, Miklós Máté wrote: On 06/26/2016 09:48 PM, Miklós Máté wrote: Only include the ones that can be used by the shader. This fixes texture coordinates, which were completely wrong, because WPOS was included in the list of attribs. It also increases performance noticeably. Signed-off-by: Miklós Máté --- src/mesa/swrast/s_context.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/swrast/s_context.c b/src/mesa/swrast/s_context.c index 0a5fc7e..a63179c 100644 --- a/src/mesa/swrast/s_context.c +++ b/src/mesa/swrast/s_context.c @@ -504,7 +504,8 @@ _swrast_update_active_attribs(struct gl_context *ctx) attribsMask &= ~VARYING_BIT_POS; /* WPOS is always handled specially */ } else if (ctx->ATIFragmentShader._Enabled) { - attribsMask = ~0; /* XXX fix me */ + attribsMask = VARYING_BIT_COL0 | VARYING_BIT_COL1 | +VARYING_BIT_FOGC | VARYING_BITS_TEX_ANY; } else { /* fixed function */ ping? ping? Can anyone please review this? I think few people know swrast and this particular extension. Looks OK to me though. Reviewed-by: Brian Paul Thanks for the review. If there are no objections, I need somebody to commit this for me, because I have no access. MM ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] gallium/auxiliary: Add u_bitcast.h header.
On Thu, Jul 28, 2016 at 9:55 PM, Connor Abbott wrote: > On Thu, Jul 28, 2016 at 7:58 PM, Roland Scheidegger > wrote: >> Am 29.07.2016 um 00:35 schrieb Matt Turner: >>> --- >>> src/gallium/auxiliary/Makefile.sources | 1 + >>> src/gallium/auxiliary/util/u_bitcast.h | 57 >>> ++ >>> 2 files changed, 58 insertions(+) >>> create mode 100644 src/gallium/auxiliary/util/u_bitcast.h >>> >>> diff --git a/src/gallium/auxiliary/Makefile.sources >>> b/src/gallium/auxiliary/Makefile.sources >>> index e0311bf..26f7eee 100644 >>> --- a/src/gallium/auxiliary/Makefile.sources >>> +++ b/src/gallium/auxiliary/Makefile.sources >>> @@ -175,6 +175,7 @@ C_SOURCES := \ >>> translate/translate_generic.c \ >>> translate/translate_sse.c \ >>> util/dbghelp.h \ >>> + util/u_bitcast.h \ >>> util/u_bitmask.c \ >>> util/u_bitmask.h \ >>> util/u_blend.h \ >>> diff --git a/src/gallium/auxiliary/util/u_bitcast.h >>> b/src/gallium/auxiliary/util/u_bitcast.h >>> new file mode 100644 >>> index 000..b1f9938 >>> --- /dev/null >>> +++ b/src/gallium/auxiliary/util/u_bitcast.h >>> @@ -0,0 +1,57 @@ >>> +/** >>> + * >>> + * Copyright © 2016 Intel Corporation >>> + * All Rights Reserved. >>> + * >>> + * Permission is hereby granted, free of charge, to any person obtaining a >>> + * copy of this software and associated documentation files (the >>> + * "Software"), to deal in the Software without restriction, including >>> + * without limitation the rights to use, copy, modify, merge, publish, >>> + * distribute, sub license, and/or sell copies of the Software, and to >>> + * permit persons to whom the Software is furnished to do so, subject to >>> + * the following conditions: >>> + * >>> + * The above copyright notice and this permission notice (including the >>> + * next paragraph) shall be included in all copies or substantial portions >>> + * of the Software. >>> + * >>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS >>> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. >>> + * IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR >>> + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, >>> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE >>> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. >>> + * >>> + >>> **/ >>> + >>> +#ifndef U_BITCAST_H_ >>> +#define U_BITCAST_H_ >>> + >>> +#include >>> + >>> +#ifdef __cplusplus >>> +extern "C" { >>> +#endif >>> + >>> +static inline unsigned >>> +u_bitcast_f2u(float f) >>> +{ >>> + unsigned u; >>> + memcpy(&u, &f, sizeof(u)); >>> + return u; >>> +} >>> + >>> +static inline float >>> +u_bitcast_u2f(unsigned u) >>> +{ >>> + float f; >>> + memcpy(&f, &u, sizeof(f)); >>> + return f; >>> +} >>> + >>> +#ifdef __cplusplus >>> +} >>> +#endif >>> + >>> +#endif /* U_BITCAST_H_ */ >>> >> >> I think traditionally we've used the union fi from u_math.h to do this. >> That said, I don't have anything against using an inline memcpy helper >> function instead - both should hopefully get optimized away... > > Unfortunately, that won't work with the strict aliasing rules either > since it's undefined behavior to reinterpret a union by dereferencing > more than member. That is, once you assign to one field of a union, > you can read from/write to that field only. You can, however, > reinterpret anything as a char * and dereference, hence why memcopy > etc. aren't undefined. Sorry for the late reply, but I thought it was important to correct this common misunderstanding (Even John Regehr gets it wrong! [1]). Since at least C99, the union trick is fine. C99 Technical Corrigendum 3 [2] says: """ If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation. """ It seems, but I have not confirmed, that it is not legal in C++. I cannot find a reference now, but I remember reading a C89 clarification that explained that it was only "unspecified" because of the "trap representation" wording and was intended to have the same semantics as C99. Regardless, memcpy generates optimal code and is legal in C and C++. Hence my choice in this series. (Aside: Hence Why Must Die [3]) [1] http://blog.regehr.org/archives/959 [2] http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1235.pdf [3] http://public.wsu.edu/~brians/errors/hencewhy.html ___ mesa-dev mailing list mesa-dev@lists.freede
[Mesa-dev] [PATCH 06/16] gallium/radeon: more descriptive names for LLVM temporaries in debug builds
From: Nicolai Hähnle Reviewed-by: Tom Stellard --- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 7b96a58..22ff18e 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -31,20 +31,21 @@ #include "gallivm/lp_bld_init.h" #include "gallivm/lp_bld_intr.h" #include "gallivm/lp_bld_misc.h" #include "gallivm/lp_bld_swizzle.h" #include "tgsi/tgsi_info.h" #include "tgsi/tgsi_parse.h" #include "util/u_math.h" #include "util/u_memory.h" #include "util/u_debug.h" +#include #include #include LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type) { LLVMContextRef ctx = bld_base->base.gallivm->context; switch (type) { case TGSI_TYPE_UNSIGNED: @@ -421,20 +422,21 @@ static void emit_declaration(struct lp_build_tgsi_context *bld_base, ctx->soa.addr[idx][chan] = si_build_alloca_undef( &ctx->gallivm, ctx->soa.bld_base.uint_bld.elem_type, ""); } } break; } case TGSI_FILE_TEMPORARY: { + char name[16] = ""; LLVMValueRef array_alloca = NULL; unsigned decl_size; first = decl->Range.First; last = decl->Range.Last; decl_size = 4 * ((last - first) + 1); if (decl->Declaration.Array) { unsigned id = decl->Array.ArrayID - 1; if (!ctx->arrays) { int size = bld_base->info->array_max[TGSI_FILE_TEMPORARY]; ctx->arrays = CALLOC(size, sizeof(ctx->arrays[0])); @@ -458,34 +460,42 @@ static void emit_declaration(struct lp_build_tgsi_context *bld_base, ctx->arrays[id].alloca = array_alloca; } } if (!ctx->temps_count) { ctx->temps_count = bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1; ctx->temps = MALLOC(TGSI_NUM_CHANNELS * ctx->temps_count * sizeof(LLVMValueRef)); } if (!array_alloca) { for (i = 0; i < decl_size; ++i) { +#ifdef DEBUG + snprintf(name, sizeof(name), "TEMP%d.%c", +first + i / 4, "xyzw"[i % 4]); +#endif ctx->temps[first * TGSI_NUM_CHANNELS + i] = si_build_alloca_undef(bld_base->base.gallivm, bld_base->base.vec_type, - "temp"); + name); } } else { LLVMValueRef idxs[2] = { bld_base->uint_bld.zero, NULL }; for (i = 0; i < decl_size; ++i) { +#ifdef DEBUG + snprintf(name, sizeof(name), "TEMP%d.%c", +first + i / 4, "xyzw"[i % 4]); +#endif idxs[1] = lp_build_const_int32(bld_base->base.gallivm, i); ctx->temps[first * TGSI_NUM_CHANNELS + i] = - LLVMBuildGEP(builder, array_alloca, idxs, 2, "temp"); + LLVMBuildGEP(builder, array_alloca, idxs, 2, name); } } break; } case TGSI_FILE_INPUT: { unsigned idx; for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) { if (ctx->load_input) ctx->load_input(ctx, idx, decl); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/16] gallium/radeon: allocate temps array info in radeon_llvm_context_init
From: Nicolai Hähnle Also, prepare for using tgsi_array_info. This also opens the door for properly handling allocation failures, but I'm leaving that for a separate change. --- src/gallium/drivers/radeon/radeon_llvm.h | 11 ++-- .../drivers/radeon/radeon_setup_tgsi_llvm.c| 66 +- src/gallium/drivers/radeonsi/si_shader.c | 6 +- 3 files changed, 47 insertions(+), 36 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h index 13f3336..6086dd6 100644 --- a/src/gallium/drivers/radeon/radeon_llvm.h +++ b/src/gallium/drivers/radeon/radeon_llvm.h @@ -43,25 +43,20 @@ struct radeon_llvm_branch { LLVMBasicBlockRef if_block; LLVMBasicBlockRef else_block; unsigned has_else; }; struct radeon_llvm_loop { LLVMBasicBlockRef loop_block; LLVMBasicBlockRef endloop_block; }; -struct radeon_llvm_array { - struct tgsi_declaration_range range; - LLVMValueRef alloca; -}; - struct radeon_llvm_context { struct lp_build_tgsi_soa_context soa; /*=== Front end configuration ===*/ /* Instructions that are not described by any of the TGSI opcodes. */ /** This function is responsible for initilizing the inputs array and will be * called once for each input declared in the TGSI shader. */ @@ -94,21 +89,22 @@ struct radeon_llvm_context { /*=== Private Members ===*/ struct radeon_llvm_branch *branch; struct radeon_llvm_loop *loop; unsigned branch_depth; unsigned branch_depth_max; unsigned loop_depth; unsigned loop_depth_max; - struct radeon_llvm_array *arrays; + struct tgsi_array_info *temp_arrays; + LLVMValueRef *temp_array_allocas; LLVMValueRef main_fn; LLVMTypeRef return_type; unsigned fpmath_md_kind; LLVMValueRef fpmath_md_2p5_ulp; struct gallivm_state gallivm; }; @@ -117,21 +113,22 @@ LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base, LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type, LLVMValueRef value); void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context *bld_base, struct lp_build_emit_data *emit_data, LLVMValueRef *coords_arg, LLVMValueRef *derivs_arg); void radeon_llvm_context_init(struct radeon_llvm_context *ctx, - const char *triple); + const char *triple, + const struct tgsi_shader_info *info); void radeon_llvm_create_func(struct radeon_llvm_context *ctx, LLVMTypeRef *return_types, unsigned num_return_elems, LLVMTypeRef *ParamTypes, unsigned ParamCount); void radeon_llvm_dispose(struct radeon_llvm_context *ctx); unsigned radeon_llvm_reg_index_soa(unsigned index, unsigned chan); void radeon_llvm_finalize_module(struct radeon_llvm_context *ctx); diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index d8ab5b0..2521023 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -108,54 +108,54 @@ static LLVMValueRef emit_swizzle(struct lp_build_tgsi_context *bld_base, return LLVMBuildShuffleVector(bld_base->base.gallivm->builder, value, LLVMGetUndef(LLVMTypeOf(value)), LLVMConstVector(swizzles, 4), ""); } /** * Return the description of the array covering the given temporary register * index. */ -static const struct radeon_llvm_array * -get_temp_array(struct lp_build_tgsi_context *bld_base, - unsigned reg_index, - const struct tgsi_ind_register *reg) +static unsigned +get_temp_array_id(struct lp_build_tgsi_context *bld_base, + unsigned reg_index, + const struct tgsi_ind_register *reg) { struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); unsigned num_arrays = ctx->soa.bld_base.info->array_max[TGSI_FILE_TEMPORARY]; unsigned i; if (reg && reg->ArrayID > 0 && reg->ArrayID <= num_arrays) - return &ctx->arrays[reg->ArrayID - 1]; + return reg->ArrayID; for (i = 0; i < num_arrays; i++) { - const struct radeon_llvm_array *array = &ctx->arrays[i]; + const struct tgsi_array_info *array = &ctx->temp_arrays[i]; if (reg_index >= array->range.First && reg_index <= array->range.Last) - return array; + return i + 1;
[Mesa-dev] [PATCH 16/16] gallium/radeon: protect against out of bounds temporary array accesses
From: Nicolai Hähnle They can lead to VM faults and worse, which goes against the GL robustness promises. --- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 7cdf228..88c7b3c 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -232,20 +232,35 @@ get_pointer_into_array(struct radeon_llvm_context *ctx, if (!alloca) return NULL; array = &ctx->temp_arrays[array_id - 1]; if (!(array->writemask & (1 << swizzle))) return ctx->undef_alloca; index = emit_array_index(&ctx->soa, reg_indirect, reg_index - ctx->temp_arrays[array_id - 1].range.First); + + /* Ensure that the index is within a valid range, to guard against +* VM faults and overwriting critical data (e.g. spilled resource +* descriptors). +* +* TODO It should be possible to avoid the additional instructions +* if LLVM is changed so that it guarantuees: +* 1. the scratch space descriptor isolates the current wave (this +*could even save the scratch offset SGPR at the cost of an +*additional SALU instruction) +* 2. the memory for allocas must be allocated at the _end_ of the +*scratch space (after spilled registers) +*/ + index = radeon_llvm_bound_index(ctx, index, array->range.Last - array->range.First + 1); + index = LLVMBuildMul( builder, index, lp_build_const_int32(gallivm, util_bitcount(array->writemask)), ""); index = LLVMBuildAdd( builder, index, lp_build_const_int32( gallivm, util_bitcount(array->writemask & ((1 << swizzle) - 1))), ""); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/16] gallium/radeon: pass indirect register info into get_alloca_for_array
From: Nicolai Hähnle To have the same signature as get_array_range. --- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 994c7da..531a8fe 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -151,28 +151,29 @@ get_array_range(struct lp_build_tgsi_context *bld_base, return array->range; } range.First = 0; range.Last = bld_base->info->file_max[File]; return range; } static LLVMValueRef get_alloca_for_array(struct lp_build_tgsi_context *bld_base, unsigned file, -unsigned index) +unsigned index, +const struct tgsi_ind_register *reg) { const struct radeon_llvm_array *array; if (file != TGSI_FILE_TEMPORARY) return NULL; - array = get_temp_array(bld_base, index, NULL); + array = get_temp_array(bld_base, index, reg); if (!array) return NULL; return array->alloca; } static LLVMValueRef emit_array_index(struct lp_build_tgsi_soa_context *bld, const struct tgsi_ind_register *reg, unsigned offset) @@ -240,21 +241,21 @@ load_value_from_array(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type, unsigned swizzle, unsigned reg_index, const struct tgsi_ind_register *reg_indirect) { struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base); struct gallivm_state *gallivm = bld_base->base.gallivm; LLVMBuilderRef builder = gallivm->builder; struct tgsi_declaration_range range = get_array_range(bld_base, file, reg_index, reg_indirect); LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - range.First); - LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index); + LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index, reg_indirect); LLVMValueRef ptr, val, indices[2]; if (!array) { /* Handle the case where the array is stored as a vector. */ return LLVMBuildExtractElement(builder, emit_array_fetch(bld_base, file, type, range, swizzle), index, ""); } index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, TGSI_NUM_CHANNELS), ""); @@ -280,21 +281,21 @@ store_value_to_array(struct lp_build_tgsi_context *bld_base, unsigned file, unsigned chan_index, unsigned reg_index, const struct tgsi_ind_register *reg_indirect) { struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base); struct gallivm_state *gallivm = bld_base->base.gallivm; LLVMBuilderRef builder = gallivm->builder; struct tgsi_declaration_range range = get_array_range(bld_base, file, reg_index, reg_indirect); LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - range.First); - LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index); + LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index, reg_indirect); if (array) { LLVMValueRef indices[2]; index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, TGSI_NUM_CHANNELS), ""); index = LLVMBuildAdd(builder, index, lp_build_const_int32(gallivm, chan_index), ""); indices[0] = bld_base->uint_bld.zero; indices[1] = index; LLVMValueRef pointer = LLVMBuildGEP(builder, array, indices, 2, ""); LLVMBuildStore(builder, value, pointer); return NULL; @@ -617,21 +618,21 @@ void radeon_llvm_emit_store(struct lp_build_tgsi_context *bld_base, if (reg->Register.Indirect) { struct tgsi_declaration_range range = get_array_range(bld_base, reg->Register.File, reg->Register.Index, ®->Indirect); unsigned i, size = range.Last - range.First + 1; unsigned file = reg->Register.File; unsigned reg_index = reg->Register.Index; LLVMValueRef array = store_value_to_array(bld_base, value, file, chan_index, reg_index, ®->Indirect); - if (get_alloca_for_array(bld_base, file, reg_index)) { + if (get_all
[Mesa-dev] [PATCH 13/16] gallium/radeon: use tgsi_scan_arrays for temp arrays
From: Nicolai Hähnle --- src/gallium/drivers/radeon/radeon_llvm.h| 3 ++- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 9 ++--- src/gallium/drivers/radeonsi/si_shader.c| 3 ++- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h index 6086dd6..4ed2c97 100644 --- a/src/gallium/drivers/radeon/radeon_llvm.h +++ b/src/gallium/drivers/radeon/radeon_llvm.h @@ -114,21 +114,22 @@ LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base, LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type, LLVMValueRef value); void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context *bld_base, struct lp_build_emit_data *emit_data, LLVMValueRef *coords_arg, LLVMValueRef *derivs_arg); void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char *triple, - const struct tgsi_shader_info *info); + const struct tgsi_shader_info *info, + const struct tgsi_token *tokens); void radeon_llvm_create_func(struct radeon_llvm_context *ctx, LLVMTypeRef *return_types, unsigned num_return_elems, LLVMTypeRef *ParamTypes, unsigned ParamCount); void radeon_llvm_dispose(struct radeon_llvm_context *ctx); unsigned radeon_llvm_reg_index_soa(unsigned index, unsigned chan); void radeon_llvm_finalize_module(struct radeon_llvm_context *ctx); diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 2521023..dac0594 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -478,22 +478,20 @@ static void emit_declaration(struct lp_build_tgsi_context *bld_base, { char name[16] = ""; LLVMValueRef array_alloca = NULL; unsigned decl_size; first = decl->Range.First; last = decl->Range.Last; decl_size = 4 * ((last - first) + 1); if (decl->Declaration.Array) { unsigned id = decl->Array.ArrayID - 1; - ctx->temp_arrays[id].range = decl->Range; - /* If the array has more than 16 elements, store it * in memory using an alloca that spans the entire * array. * * Otherwise, store each array element individually. * We will then generate vectors (per-channel, up to * <4 x float>) for indirect addressing. * * Note that 16 is the number of vector elements that * LLVM will store in a register, so theoretically an @@ -1723,21 +1721,22 @@ static void emit_rsq(const struct lp_build_tgsi_action *action, LLVMValueRef sqrt = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_SQRT, emit_data->args[0]); emit_data->output[emit_data->chan] = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_DIV, bld_base->base.one, sqrt); } void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char *triple, - const struct tgsi_shader_info *info) + const struct tgsi_shader_info *info, + const struct tgsi_token *tokens) { struct lp_type type; /* Initialize the gallivm object: * We are only using the module, context, and builder fields of this struct. * This should be enough for us to be able to pass our gallivm struct to the * helper functions in the gallivm module. */ memset(&ctx->gallivm, 0, sizeof (ctx->gallivm)); memset(&ctx->soa, 0, sizeof(ctx->soa)); @@ -1749,20 +1748,24 @@ void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char *tripl struct lp_build_tgsi_context *bld_base = &ctx->soa.bld_base; bld_base->info = info; if (info && info->array_max[TGSI_FILE_TEMPORARY] > 0) { int size = info->array_max[TGSI_FILE_TEMPORARY]; ctx->temp_arrays = CALLOC(size, sizeof(ctx->temp_arrays[0])); ctx->temp_array_allocas = CALLOC(size, sizeof(ctx->temp_array_allocas[0])); + + if (tokens) + tgsi_scan_arrays(tokens, TGSI_FILE_TEMPORARY, size, +ctx->temp_arrays);
[Mesa-dev] [PATCH 14/16] gallium/radeon: reduce alloca of temporaries based on usagemask
From: Nicolai Hähnle v2: take actual writemasks into account --- src/gallium/drivers/radeon/radeon_llvm.h | 2 + .../drivers/radeon/radeon_setup_tgsi_llvm.c| 62 ++ 2 files changed, 54 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h index 4ed2c97..0276ef3 100644 --- a/src/gallium/drivers/radeon/radeon_llvm.h +++ b/src/gallium/drivers/radeon/radeon_llvm.h @@ -92,20 +92,22 @@ struct radeon_llvm_context { struct radeon_llvm_loop *loop; unsigned branch_depth; unsigned branch_depth_max; unsigned loop_depth; unsigned loop_depth_max; struct tgsi_array_info *temp_arrays; LLVMValueRef *temp_array_allocas; + LLVMValueRef undef_alloca; + LLVMValueRef main_fn; LLVMTypeRef return_type; unsigned fpmath_md_kind; LLVMValueRef fpmath_md_2p5_ulp; struct gallivm_state gallivm; }; LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base, diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index dac0594..dd7d60b 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -178,41 +178,55 @@ emit_array_index(struct lp_build_tgsi_soa_context *bld, * must be used. */ static LLVMValueRef get_pointer_into_array(struct radeon_llvm_context *ctx, unsigned file, unsigned swizzle, unsigned reg_index, const struct tgsi_ind_register *reg_indirect) { unsigned array_id; + struct tgsi_array_info *array; struct gallivm_state *gallivm = ctx->soa.bld_base.base.gallivm; LLVMBuilderRef builder = gallivm->builder; LLVMValueRef idxs[2]; LLVMValueRef index; LLVMValueRef alloca; if (file != TGSI_FILE_TEMPORARY) return NULL; array_id = get_temp_array_id(&ctx->soa.bld_base, reg_index, reg_indirect); if (!array_id) return NULL; alloca = ctx->temp_array_allocas[array_id - 1]; if (!alloca) return NULL; + array = &ctx->temp_arrays[array_id - 1]; + + if (!(array->writemask & (1 << swizzle))) + return ctx->undef_alloca; + index = emit_array_index(&ctx->soa, reg_indirect, reg_index - ctx->temp_arrays[array_id - 1].range.First); - index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, TGSI_NUM_CHANNELS), ""); - index = LLVMBuildAdd(builder, index, lp_build_const_int32(gallivm, swizzle), ""); + index = LLVMBuildMul( + builder, index, + lp_build_const_int32(gallivm, util_bitcount(array->writemask)), + ""); + index = LLVMBuildAdd( + builder, index, + lp_build_const_int32( + gallivm, + util_bitcount(array->writemask & ((1 << swizzle) - 1))), + ""); idxs[0] = ctx->soa.bld_base.uint_bld.zero; idxs[1] = index; return LLVMBuildGEP(builder, alloca, idxs, 2, ""); } LLVMValueRef radeon_llvm_emit_fetch_64bit(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type, LLVMValueRef ptr, LLVMValueRef ptr2) @@ -472,48 +486,56 @@ static void emit_declaration(struct lp_build_tgsi_context *bld_base, } } break; } case TGSI_FILE_TEMPORARY: { char name[16] = ""; LLVMValueRef array_alloca = NULL; unsigned decl_size; + unsigned writemask = decl->Declaration.UsageMask; first = decl->Range.First; last = decl->Range.Last; decl_size = 4 * ((last - first) + 1); + if (decl->Declaration.Array) { unsigned id = decl->Array.ArrayID - 1; + unsigned array_size; + + writemask &= ctx->temp_arrays[id].writemask; + ctx->temp_arrays[id].writemask = writemask; + array_size = ((last - first) + 1) * util_bitcount(writemask); /* If the array has more than 16 elements, store it * in memory using an alloca that spans the entire * array. * * Otherwise, store each array element individually. * We will then generate vectors (per-channel, up to -* <4 x float>) for indirect addressing. +* <16 x float> if the usagemask is a single
[Mesa-dev] [PATCH 05/16] gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing
From: Nicolai Hähnle We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard --- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index e084248..7b96a58 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -624,30 +624,23 @@ void radeon_llvm_emit_store(struct lp_build_tgsi_context *bld_base, } else { switch(reg->Register.File) { case TGSI_FILE_OUTPUT: temp_ptr = bld->outputs[reg->Register.Index][chan_index]; if (tgsi_type_is_64bit(dtype)) temp_ptr2 = bld->outputs[reg->Register.Index][chan_index + 1]; break; case TGSI_FILE_TEMPORARY: { - LLVMValueRef array; if (reg->Register.Index >= ctx->temps_count) continue; - array = get_alloca_for_array(bld_base, reg->Register.File, reg->Register.Index); - if (array) { - store_value_to_array(bld_base, value, reg->Register.File, chan_index, reg->Register.Index, - NULL); - continue; - } temp_ptr = ctx->temps[ TGSI_NUM_CHANNELS * reg->Register.Index + chan_index]; if (tgsi_type_is_64bit(dtype)) temp_ptr2 = ctx->temps[ TGSI_NUM_CHANNELS * reg->Register.Index + chan_index + 1]; break; } default: return; } if (!tgsi_type_is_64bit(dtype)) -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/16] gallium/radeon: extract common getelementptr logic into get_pointer_into_array
From: Nicolai Hähnle --- .../drivers/radeon/radeon_setup_tgsi_llvm.c| 105 + 1 file changed, 66 insertions(+), 39 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 531a8fe..87fc07e 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -180,20 +180,55 @@ emit_array_index(struct lp_build_tgsi_soa_context *bld, { struct gallivm_state *gallivm = bld->bld_base.base.gallivm; if (!reg) { return lp_build_const_int32(gallivm, offset); } LLVMValueRef addr = LLVMBuildLoad(gallivm->builder, bld->addr[reg->Index][reg->Swizzle], ""); return LLVMBuildAdd(gallivm->builder, addr, lp_build_const_int32(gallivm, offset), ""); } +/** + * For indirect registers, construct a pointer directly to the requested + * element using getelementptr if possible. + * + * Returns NULL if the insertelement/extractelement fallback for array access + * must be used. + */ +static LLVMValueRef +get_pointer_into_array(struct radeon_llvm_context *ctx, + unsigned file, + unsigned swizzle, + unsigned reg_index, + const struct tgsi_ind_register *reg_indirect) +{ + const struct radeon_llvm_array *array; + struct gallivm_state *gallivm = ctx->soa.bld_base.base.gallivm; + LLVMBuilderRef builder = gallivm->builder; + LLVMValueRef idxs[2]; + LLVMValueRef index; + + if (file != TGSI_FILE_TEMPORARY) + return NULL; + + array = get_temp_array(&ctx->soa.bld_base, reg_index, reg_indirect); + if (!array || !array->alloca) + return NULL; + + index = emit_array_index(&ctx->soa, reg_indirect, reg_index - array->range.First); + index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, TGSI_NUM_CHANNELS), ""); + index = LLVMBuildAdd(builder, index, lp_build_const_int32(gallivm, swizzle), ""); + idxs[0] = ctx->soa.bld_base.uint_bld.zero; + idxs[1] = index; + return LLVMBuildGEP(builder, array->alloca, idxs, 2, ""); +} + LLVMValueRef radeon_llvm_emit_fetch_64bit(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type, LLVMValueRef ptr, LLVMValueRef ptr2) { LLVMBuilderRef builder = bld_base->base.gallivm->builder; LLVMValueRef result; result = LLVMGetUndef(LLVMVectorType(LLVMIntTypeInContext(bld_base->base.gallivm->context, 32), bld_base->base.type.length * 2)); @@ -236,80 +271,72 @@ emit_array_fetch(struct lp_build_tgsi_context *bld_base, } static LLVMValueRef load_value_from_array(struct lp_build_tgsi_context *bld_base, unsigned file, enum tgsi_opcode_type type, unsigned swizzle, unsigned reg_index, const struct tgsi_ind_register *reg_indirect) { + struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base); struct gallivm_state *gallivm = bld_base->base.gallivm; LLVMBuilderRef builder = gallivm->builder; - struct tgsi_declaration_range range = get_array_range(bld_base, file, reg_index, reg_indirect); - LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - range.First); - LLVMValueRef array = get_alloca_for_array(bld_base, file, reg_index, reg_indirect); - LLVMValueRef ptr, val, indices[2]; - - if (!array) { - /* Handle the case where the array is stored as a vector. */ - return LLVMBuildExtractElement(builder, - emit_array_fetch(bld_base, file, type, range, swizzle), - index, ""); - } + LLVMValueRef ptr; - index = LLVMBuildMul(builder, index, lp_build_const_int32(gallivm, TGSI_NUM_CHANNELS), ""); - index = LLVMBuildAdd(builder, index, lp_build_const_int32(gallivm, swizzle), ""); - indices[0] = bld_base->uint_bld.zero; - indices[1] = index; - ptr = LLVMBuildGEP(builder, array, indices, 2, ""); - val = LLVMBuildLoad(builder, ptr, ""); - if (tgsi_type_is_64bit(type)) { - LLVMValueRef ptr_hi, val_hi; - indices[0] = lp_build_const_int32(gallivm, 1); - ptr_hi = LLVMBuildGEP(builder, ptr, indices, 1, ""); - val_hi = LLVMBuildLoad(builder, ptr_hi, ""); - val = radeon_llvm_emit_fetch_64bit(bld_base, type, val, val_hi); + ptr = get_pointer_into_array(ctx, file, swizzle, reg_index, reg_indirect); + if (ptr) { + LLVMValueRef val = LLVMBuildLoad(builder, ptr, ""); + if (tgsi_type_is_64
[Mesa-dev] [PATCH 15/16] gallium/radeon: add radeon_llvm_bound_index for bounds checking
From: Nicolai Hähnle --- src/gallium/drivers/radeon/radeon_llvm.h | 4 +++ .../drivers/radeon/radeon_setup_tgsi_llvm.c| 29 ++ src/gallium/drivers/radeonsi/si_shader.c | 19 +- 3 files changed, 34 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h index 0276ef3..da5b7f5 100644 --- a/src/gallium/drivers/radeon/radeon_llvm.h +++ b/src/gallium/drivers/radeon/radeon_llvm.h @@ -109,20 +109,24 @@ struct radeon_llvm_context { struct gallivm_state gallivm; }; LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type); LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base, enum tgsi_opcode_type type, LLVMValueRef value); +LLVMValueRef radeon_llvm_bound_index(struct radeon_llvm_context *ctx, +LLVMValueRef index, +unsigned num); + void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context *bld_base, struct lp_build_emit_data *emit_data, LLVMValueRef *coords_arg, LLVMValueRef *derivs_arg); void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char *triple, const struct tgsi_shader_info *info, const struct tgsi_token *tokens); diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index dd7d60b..7cdf228 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -66,20 +66,49 @@ LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base, { LLVMBuilderRef builder = bld_base->base.gallivm->builder; LLVMTypeRef dst_type = tgsi2llvmtype(bld_base, type); if (dst_type) return LLVMBuildBitCast(builder, value, dst_type, ""); else return value; } +/** + * Return a value that is equal to the given i32 \p index if it lies in [0,num) + * or an undefined value in the same interval otherwise. + */ +LLVMValueRef radeon_llvm_bound_index(struct radeon_llvm_context *ctx, +LLVMValueRef index, +unsigned num) +{ + struct gallivm_state *gallivm = &ctx->gallivm; + LLVMBuilderRef builder = gallivm->builder; + LLVMValueRef c_max = lp_build_const_int32(gallivm, num - 1); + LLVMValueRef cc; + + if (util_is_power_of_two(num)) { + index = LLVMBuildAnd(builder, index, c_max, ""); + } else { + /* In theory, this MAX pattern should result in code that is +* as good as the bit-wise AND above. +* +* In practice, LLVM generates worse code (at the time of +* writing), because its value tracking is not strong enough. +*/ + cc = LLVMBuildICmp(builder, LLVMIntULE, index, c_max, ""); + index = LLVMBuildSelect(builder, cc, index, c_max, ""); + } + + return index; +} + static struct radeon_llvm_loop *get_current_loop(struct radeon_llvm_context *ctx) { return ctx->loop_depth > 0 ? ctx->loop + (ctx->loop_depth - 1) : NULL; } static struct radeon_llvm_branch *get_current_branch(struct radeon_llvm_context *ctx) { return ctx->branch_depth > 0 ? ctx->branch + (ctx->branch_depth - 1) : NULL; } diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 06b5c9c..a5b566e 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -558,47 +558,30 @@ static LLVMValueRef get_indirect_index(struct si_shader_context *ctx, } /** * Like get_indirect_index, but restricts the return value to a (possibly * undefined) value inside [0..num). */ static LLVMValueRef get_bounded_indirect_index(struct si_shader_context *ctx, const struct tgsi_ind_register *ind, int rel_index, unsigned num) { - struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm; - LLVMBuilderRef builder = gallivm->builder; LLVMValueRef result = get_indirect_index(ctx, ind, rel_index); - LLVMValueRef c_max = LLVMConstInt(ctx->i32, num - 1, 0); - LLVMValueRef cc; /* LLVM 3.8: If indirect resource indexing is used: * - SI & CIK hang * - VI crashes */ if (HAVE_LLVM <= 0x0308) return LLVMGetUndef(ctx->i32); - if (util_is_power_of_two(num)) { - result =
[Mesa-dev] [PATCH 11/16] gallium/radeon: always do the full store in store_value_to_array
From: Nicolai Hähnle Doing the write-back of the temporary vector in radeon_llvm_emit_store makes no sense. This also allows us to get rid of get_alloca_for_array. --- .../drivers/radeon/radeon_setup_tgsi_llvm.c| 77 -- 1 file changed, 28 insertions(+), 49 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 87fc07e..d8ab5b0 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -149,37 +149,20 @@ get_array_range(struct lp_build_tgsi_context *bld_base, get_temp_array(bld_base, reg_index, reg); if (array) return array->range; } range.First = 0; range.Last = bld_base->info->file_max[File]; return range; } -static LLVMValueRef get_alloca_for_array(struct lp_build_tgsi_context *bld_base, -unsigned file, -unsigned index, -const struct tgsi_ind_register *reg) -{ - const struct radeon_llvm_array *array; - - if (file != TGSI_FILE_TEMPORARY) - return NULL; - - array = get_temp_array(bld_base, index, reg); - if (!array) - return NULL; - - return array->alloca; -} - static LLVMValueRef emit_array_index(struct lp_build_tgsi_soa_context *bld, const struct tgsi_ind_register *reg, unsigned offset) { struct gallivm_state *gallivm = bld->bld_base.base.gallivm; if (!reg) { return lp_build_const_int32(gallivm, offset); } @@ -299,44 +282,67 @@ load_value_from_array(struct lp_build_tgsi_context *bld_base, struct tgsi_declaration_range range = get_array_range(bld_base, file, reg_index, reg_indirect); LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - range.First); LLVMValueRef array = emit_array_fetch(bld_base, file, type, range, swizzle); return LLVMBuildExtractElement(builder, array, index, ""); } } -static LLVMValueRef +static void store_value_to_array(struct lp_build_tgsi_context *bld_base, LLVMValueRef value, unsigned file, unsigned chan_index, unsigned reg_index, const struct tgsi_ind_register *reg_indirect) { struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base); struct gallivm_state *gallivm = bld_base->base.gallivm; LLVMBuilderRef builder = gallivm->builder; LLVMValueRef ptr; ptr = get_pointer_into_array(ctx, file, chan_index, reg_index, reg_indirect); if (ptr) { LLVMBuildStore(builder, value, ptr); - return NULL; } else { + unsigned i, size; struct tgsi_declaration_range range = get_array_range(bld_base, file, reg_index, reg_indirect); LLVMValueRef index = emit_array_index(bld, reg_indirect, reg_index - range.First); LLVMValueRef array = emit_array_fetch(bld_base, file, TGSI_TYPE_FLOAT, range, chan_index); - return LLVMBuildInsertElement(builder, array, value, index, ""); + LLVMValueRef temp_ptr; + + array = LLVMBuildInsertElement(builder, array, value, index, ""); + + size = range.Last - range.First + 1; + for (i = 0; i < size; ++i) { + switch(file) { + case TGSI_FILE_OUTPUT: + temp_ptr = bld->outputs[i + range.First][chan_index]; + break; + + case TGSI_FILE_TEMPORARY: + if (range.First + i >= ctx->temps_count) + continue; + temp_ptr = ctx->temps[(i + range.First) * TGSI_NUM_CHANNELS + chan_index]; + break; + + default: + continue; + } + value = LLVMBuildExtractElement(builder, array, + lp_build_const_int32(gallivm, i), ""); + LLVMBuildStore(builder, value, temp_ptr); + } } } LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base, const struct tgsi_full_src_register *reg, enum tgsi_opcode_type type, unsigned swizzle) { struct
[Mesa-dev] [PATCH 07/16] gallium/radeon: clarify the comment on the array alloca heuristic
From: Nicolai Hähnle --- .../drivers/radeon/radeon_setup_tgsi_llvm.c| 29 ++ 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 22ff18e..e4bfa74 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -437,33 +437,42 @@ static void emit_declaration(struct lp_build_tgsi_context *bld_base, decl_size = 4 * ((last - first) + 1); if (decl->Declaration.Array) { unsigned id = decl->Array.ArrayID - 1; if (!ctx->arrays) { int size = bld_base->info->array_max[TGSI_FILE_TEMPORARY]; ctx->arrays = CALLOC(size, sizeof(ctx->arrays[0])); } ctx->arrays[id].range = decl->Range; - /* If the array is more than 16 elements (each element -* is 32-bits), then store it in a vector. Storing the -* array in a vector will causes the compiler to store -* the array in registers and access it using indirect -* addressing. 16 is number of vector elements that -* LLVM will store in a register. -* FIXME: We shouldn't need to do this. LLVM should be -* smart enough to promote allocas int registers when -* profitable. + /* If the array has more than 16 elements, store it +* in memory using an alloca that spans the entire +* array. +* +* Otherwise, store each array element individually. +* We will then generate vectors (per-channel, up to +* <4 x float>) for indirect addressing. +* +* Note that 16 is the number of vector elements that +* LLVM will store in a register, so theoretically an +* array with up to 4 * 16 = 64 elements could be +* handled this way, but whether that's a good idea +* depends on VGPR register pressure elsewhere. +* +* FIXME: We shouldn't need to have the non-alloca +* code path for arrays. LLVM should be smart enough to +* promote allocas into registers when profitable. */ if (decl_size > 16) { array_alloca = LLVMBuildAlloca(builder, - LLVMArrayType(bld_base->base.vec_type, decl_size),"array"); + LLVMArrayType(bld_base->base.vec_type, + decl_size), "array"); ctx->arrays[id].alloca = array_alloca; } } if (!ctx->temps_count) { ctx->temps_count = bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1; ctx->temps = MALLOC(TGSI_NUM_CHANNELS * ctx->temps_count * sizeof(LLVMValueRef)); } if (!array_alloca) { for (i = 0; i < decl_size; ++i) { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/16] gallium/radeon: extract common lookup code into get_temp_array function
From: Nicolai Hähnle --- .../drivers/radeon/radeon_setup_tgsi_llvm.c| 73 -- 1 file changed, 40 insertions(+), 33 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index e4bfa74..994c7da 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -104,72 +104,79 @@ static LLVMValueRef emit_swizzle(struct lp_build_tgsi_context *bld_base, swizzles[1] = LLVMConstInt(i32t, swizzle_y, 0); swizzles[2] = LLVMConstInt(i32t, swizzle_z, 0); swizzles[3] = LLVMConstInt(i32t, swizzle_w, 0); return LLVMBuildShuffleVector(bld_base->base.gallivm->builder, value, LLVMGetUndef(LLVMTypeOf(value)), LLVMConstVector(swizzles, 4), ""); } +/** + * Return the description of the array covering the given temporary register + * index. + */ +static const struct radeon_llvm_array * +get_temp_array(struct lp_build_tgsi_context *bld_base, + unsigned reg_index, + const struct tgsi_ind_register *reg) +{ + struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); + unsigned num_arrays = ctx->soa.bld_base.info->array_max[TGSI_FILE_TEMPORARY]; + unsigned i; + + if (reg && reg->ArrayID > 0 && reg->ArrayID <= num_arrays) + return &ctx->arrays[reg->ArrayID - 1]; + + for (i = 0; i < num_arrays; i++) { + const struct radeon_llvm_array *array = &ctx->arrays[i]; + + if (reg_index >= array->range.First && reg_index <= array->range.Last) + return array; + } + + return NULL; +} + static struct tgsi_declaration_range get_array_range(struct lp_build_tgsi_context *bld_base, unsigned File, unsigned reg_index, const struct tgsi_ind_register *reg) { - struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); + struct tgsi_declaration_range range; - if (!reg) { - unsigned i; - unsigned num_arrays = bld_base->info->array_max[TGSI_FILE_TEMPORARY]; - for (i = 0; i < num_arrays; i++) { - const struct tgsi_declaration_range *range = - &ctx->arrays[i].range; - - if (reg_index >= range->First && reg_index <= range->Last) { - return ctx->arrays[i].range; - } - } + if (File == TGSI_FILE_TEMPORARY) { + const struct radeon_llvm_array *array = + get_temp_array(bld_base, reg_index, reg); + if (array) + return array->range; } - if (File != TGSI_FILE_TEMPORARY || !reg || reg->ArrayID == 0 || - reg->ArrayID > bld_base->info->array_max[TGSI_FILE_TEMPORARY]) { - struct tgsi_declaration_range range; - range.First = 0; - range.Last = bld_base->info->file_max[File]; - return range; - } - - return ctx->arrays[reg->ArrayID - 1].range; + range.First = 0; + range.Last = bld_base->info->file_max[File]; + return range; } static LLVMValueRef get_alloca_for_array(struct lp_build_tgsi_context *bld_base, unsigned file, unsigned index) { - unsigned i; - unsigned num_arrays; - struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); + const struct radeon_llvm_array *array; if (file != TGSI_FILE_TEMPORARY) return NULL; - num_arrays = bld_base->info->array_max[TGSI_FILE_TEMPORARY]; - for (i = 0; i < num_arrays; i++) { - const struct tgsi_declaration_range *range = - &ctx->arrays[i].range; + array = get_temp_array(bld_base, index, NULL); + if (!array) + return NULL; - if (index >= range->First && index <= range->Last) { - return ctx->arrays[i].alloca; - } - } - return NULL; + return array->alloca; } static LLVMValueRef emit_array_index(struct lp_build_tgsi_soa_context *bld, const struct tgsi_ind_register *reg, unsigned offset) { struct gallivm_state *gallivm = bld->bld_base.base.gallivm; if (!reg) { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/16] gallium/radeon: clean up emit_declaration for temporaries
From: Nicolai Hähnle In the alloca'd array case, no longer create redundant and unused allocas for the individual elements; create getelementptrs instead. Reviewed-by: Tom Stellard --- .../drivers/radeon/radeon_setup_tgsi_llvm.c| 27 ++ 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index d75311e..41f24d3 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -408,81 +408,90 @@ static LLVMValueRef si_build_alloca_undef(struct gallivm_state *gallivm, LLVMValueRef ptr = lp_build_alloca(gallivm, type, name); LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr); return ptr; } static void emit_declaration(struct lp_build_tgsi_context *bld_base, const struct tgsi_full_declaration *decl) { struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); LLVMBuilderRef builder = bld_base->base.gallivm->builder; - unsigned first, last, i, idx; + unsigned first, last, i; switch(decl->Declaration.File) { case TGSI_FILE_ADDRESS: { unsigned idx; for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) { unsigned chan; for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) { ctx->soa.addr[idx][chan] = si_build_alloca_undef( &ctx->gallivm, ctx->soa.bld_base.uint_bld.elem_type, ""); } } break; } case TGSI_FILE_TEMPORARY: { + LLVMValueRef array_alloca = NULL; unsigned decl_size; first = decl->Range.First; last = decl->Range.Last; decl_size = 4 * ((last - first) + 1); if (decl->Declaration.Array) { unsigned id = decl->Array.ArrayID - 1; if (!ctx->arrays) { int size = bld_base->info->array_max[TGSI_FILE_TEMPORARY]; ctx->arrays = CALLOC(size, sizeof(ctx->arrays[0])); - for (i = 0; i < size; ++i) { - assert(!ctx->arrays[i].alloca);} } ctx->arrays[id].range = decl->Range; /* If the array is more than 16 elements (each element * is 32-bits), then store it in a vector. Storing the * array in a vector will causes the compiler to store * the array in registers and access it using indirect * addressing. 16 is number of vector elements that * LLVM will store in a register. * FIXME: We shouldn't need to do this. LLVM should be * smart enough to promote allocas int registers when * profitable. */ if (decl_size > 16) { - ctx->arrays[id].alloca = LLVMBuildAlloca(builder, + array_alloca = LLVMBuildAlloca(builder, LLVMArrayType(bld_base->base.vec_type, decl_size),"array"); + ctx->arrays[id].alloca = array_alloca; } } - first = decl->Range.First; - last = decl->Range.Last; + if (!ctx->temps_count) { ctx->temps_count = bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1; ctx->temps = MALLOC(TGSI_NUM_CHANNELS * ctx->temps_count * sizeof(LLVMValueRef)); } - for (idx = first; idx <= last; idx++) { - for (i = 0; i < TGSI_NUM_CHANNELS; i++) { - ctx->temps[idx * TGSI_NUM_CHANNELS + i] = + if (!array_alloca) { + for (i = 0; i < decl_size; ++i) { + ctx->temps[first * TGSI_NUM_CHANNELS + i] = si_build_alloca_undef(bld_base->base.gallivm, bld_base->base.vec_type, "temp"); } + } else { + LLVMValueRef idxs[2] = { + bld_base->uint_bld.zero, + NULL + }; + for (i = 0; i < decl_size; ++i) { + idxs[1] = lp_build_con
[Mesa-dev] [PATCH v2 00/16] radeonsi: improve handling of temporary arrays
Hi, this is a respin of the series which scans the shader's TGSI to determine which channels of an array are actually written to. Most of the st/mesa changes have become unnecessary. Most of the radeon-specific part stays the same. For one F1 2015 shader, it reduces the scratch size from 132096 to 26624 bytes, which is bound to be much nicer on the texture cache. Please review! Thanks, Nicolai ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/16] gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing
From: Nicolai Hähnle We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard --- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 5 - 1 file changed, 5 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 41f24d3..e084248 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -352,25 +352,20 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base, case TGSI_FILE_TEMPORARY: if (reg->Register.Index >= ctx->temps_count) return LLVMGetUndef(tgsi2llvmtype(bld_base, type)); ptr = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + swizzle]; if (tgsi_type_is_64bit(type)) { ptr2 = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + swizzle + 1]; return radeon_llvm_emit_fetch_64bit(bld_base, type, LLVMBuildLoad(builder, ptr, ""), LLVMBuildLoad(builder, ptr2, "")); } - LLVMValueRef array = get_alloca_for_array(bld_base, reg->Register.File, reg->Register.Index); - if (array) { - return bitcast(bld_base, type, load_value_from_array(bld_base, reg->Register.File, type, - swizzle, reg->Register.Index, NULL)); - } result = LLVMBuildLoad(builder, ptr, ""); break; case TGSI_FILE_OUTPUT: ptr = lp_get_output_ptr(bld, reg->Register.Index, swizzle); if (tgsi_type_is_64bit(type)) { ptr2 = lp_get_output_ptr(bld, reg->Register.Index, swizzle + 1); return radeon_llvm_emit_fetch_64bit(bld_base, type, LLVMBuildLoad(builder, ptr, ""), LLVMBuildLoad(builder, ptr2, "")); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/16] st_glsl_to_tgsi: use calloc the way it's meant to be used
From: Nicolai Hähnle --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 362559f..89e5c4d 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -6031,21 +6031,21 @@ st_translate_program( goto out; } t->procType = procType; t->inputMapping = inputMapping; t->outputMapping = outputMapping; t->ureg = ureg; t->num_temp_arrays = program->next_array; if (t->num_temp_arrays) t->arrays = (struct ureg_dst*) - calloc(1, sizeof(t->arrays[0]) * t->num_temp_arrays); + calloc(t->num_temp_arrays, sizeof(t->arrays[0])); /* * Declare input attributes. */ switch (procType) { case PIPE_SHADER_FRAGMENT: for (i = 0; i < numInputs; i++) { unsigned array_id = 0; unsigned array_size; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/16] tgsi/scan: add tgsi_scan_arrays
From: Nicolai Hähnle --- src/gallium/auxiliary/tgsi/tgsi_scan.c | 76 ++ src/gallium/auxiliary/tgsi/tgsi_scan.h | 17 2 files changed, 93 insertions(+) diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c b/src/gallium/auxiliary/tgsi/tgsi_scan.c index 98d86fc..0167e22 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_scan.c +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c @@ -628,20 +628,96 @@ tgsi_scan_shader(const struct tgsi_token *tokens, info->file_max[TGSI_FILE_INPUT] = MAX2(info->file_max[TGSI_FILE_INPUT], num_verts - 1); for (j = 0; j < num_verts; ++j) { info->file_mask[TGSI_FILE_INPUT] |= (1 << j); } } tgsi_parse_free(&parse); } +/** + * Collect information about the arrays of a given register file. + * + * @param tokens TGSI shader + * @param file the register file to scan through + * @param max_array_id number of entries in @p arrays; should be equal to the + * highest array id, i.e. tgsi_shader_info::array_max[file]. + * @param arrays info for array of each ID will be written to arrays[ID - 1]. + */ +void +tgsi_scan_arrays(const struct tgsi_token *tokens, + unsigned file, + unsigned max_array_id, + struct tgsi_array_info *arrays) +{ + struct tgsi_parse_context parse; + + if (tgsi_parse_init(&parse, tokens) != TGSI_PARSE_OK) { + debug_printf("tgsi_parse_init() failed in tgsi_scan_arrays()!\n"); + return; + } + + memset(arrays, 0, sizeof(arrays[0]) * max_array_id); + + while (!tgsi_parse_end_of_tokens(&parse)) { + struct tgsi_full_instruction *inst; + + tgsi_parse_token(&parse); + + if (parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_DECLARATION) { + struct tgsi_full_declaration *decl = &parse.FullToken.FullDeclaration; + + if (decl->Declaration.Array && decl->Declaration.File == file && + decl->Array.ArrayID > 0 && decl->Array.ArrayID <= max_array_id) { +struct tgsi_array_info *array = &arrays[decl->Array.ArrayID - 1]; +assert(!array->declared); +array->declared = true; +array->range = decl->Range; + } + } + + if (parse.FullToken.Token.Type != TGSI_TOKEN_TYPE_INSTRUCTION) + continue; + + inst = &parse.FullToken.FullInstruction; + for (unsigned i = 0; i < inst->Instruction.NumDstRegs; i++) { + const struct tgsi_full_dst_register *dst = &inst->Dst[i]; + if (dst->Register.File != file) +continue; + + if (dst->Register.Indirect) { +if (dst->Indirect.ArrayID > 0 && +dst->Indirect.ArrayID <= max_array_id) { + arrays[dst->Indirect.ArrayID - 1].writemask |= dst->Register.WriteMask; +} else { + /* Indirect writes without an ArrayID can write anywhere. */ + for (unsigned j = 0; j < max_array_id; ++j) + arrays[j].writemask |= dst->Register.WriteMask; +} + } else { +/* Check whether the write falls into any of the arrays anyway. */ +for (unsigned j = 0; j < max_array_id; ++j) { + struct tgsi_array_info *array = &arrays[j]; + if (array->declared && + dst->Register.Index >= array->range.First && + dst->Register.Index <= array->range.Last) + array->writemask |= dst->Register.WriteMask; +} + } + } + } + + tgsi_parse_free(&parse); + + return; +} /** * Check if the given shader is a "passthrough" shader consisting of only * MOV instructions of the form: MOV OUT[n], IN[n] * */ boolean tgsi_is_passthrough_shader(const struct tgsi_token *tokens) { diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h b/src/gallium/auxiliary/tgsi/tgsi_scan.h index f7eefa4..30d0146 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_scan.h +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h @@ -142,23 +142,40 @@ struct tgsi_shader_info unsigned indirect_files_written; unsigned properties[TGSI_PROPERTY_COUNT]; /* index with TGSI_PROPERTY_ */ /** * Max nesting limit of loops/if's */ unsigned max_depth; }; +struct tgsi_array_info +{ + /** Whether an array with this ID was declared. */ + bool declared; + + /** The OR of all writemasks used to write to this array. */ + ubyte writemask; + + /** The range with which the array was declared. */ + struct tgsi_declaration_range range; +}; + extern void tgsi_scan_shader(const struct tgsi_token *tokens, struct tgsi_shader_info *info); +void +tgsi_scan_arrays(const struct tgsi_token *tokens, + unsigned file, + unsigned max_array_id, + struct tgsi_array_info *arrays); extern boolean tgsi_is_passthrough_shader(const struct tgsi_token *tokens); #ifdef __cplusplus } // extern "C" #endif
Re: [Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0
On Wed, Aug 10, 2016 at 8:40 PM, Miklós Máté wrote: > Thanks. I forgot to add that this should also be applied to the 12.0 stable > branch. OK. I've sent it to mesa-stable for inclusion into 12.0. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0
From: Miklós Máté The temporaries are allocated dynamically. Signed-off-by: Miklós Máté Signed-off-by: Marek Olšák Cc: 12.0 --- src/mesa/state_tracker/st_atifs_to_tgsi.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c index 66f442a..ae08796 100644 --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c @@ -691,6 +691,7 @@ transform_inst: struct tgsi_full_instruction inst; unsigned i; int fogc_index = -1; + int reg0_index = current_inst->Src[0].Register.Index; /* find FOGC input */ for (i = 0; i < ctx->info.num_inputs; i++) { @@ -804,11 +805,11 @@ transform_inst: inst.Instruction.Opcode = TGSI_OPCODE_LRP; inst.Instruction.NumDstRegs = 1; inst.Dst[0].Register.File = TGSI_FILE_TEMPORARY; - inst.Dst[0].Register.Index = 0; + inst.Dst[0].Register.Index = reg0_index; inst.Dst[0].Register.WriteMask = TGSI_WRITEMASK_XYZW; inst.Instruction.NumSrcRegs = 3; SET_SRC(&inst, 0, TGSI_FILE_TEMPORARY, ctx->fog_factor_temp, X, X, X, Y); - SET_SRC(&inst, 1, TGSI_FILE_TEMPORARY, 0, X, Y, Z, W); + SET_SRC(&inst, 1, TGSI_FILE_TEMPORARY, reg0_index, X, Y, Z, W); SET_SRC(&inst, 2, TGSI_FILE_CONSTANT, MAX_NUM_FRAGMENT_CONSTANTS_ATI + 1, X, Y, Z, W); tctx->emit_instruction(tctx, &inst); } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0
Thanks. I forgot to add that this should also be applied to the 12.0 stable branch. MM On 08/10/2016 03:07 PM, Marek Olšák wrote: Pushed, thanks. Marek On Mon, Aug 8, 2016 at 12:48 AM, Miklós Máté wrote: The temporaries are allocated dynamically. Signed-off-by: Miklós Máté --- src/mesa/state_tracker/st_atifs_to_tgsi.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c index 66f442a..ae08796 100644 --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c @@ -691,6 +691,7 @@ transform_inst: struct tgsi_full_instruction inst; unsigned i; int fogc_index = -1; + int reg0_index = current_inst->Src[0].Register.Index; /* find FOGC input */ for (i = 0; i < ctx->info.num_inputs; i++) { @@ -804,11 +805,11 @@ transform_inst: inst.Instruction.Opcode = TGSI_OPCODE_LRP; inst.Instruction.NumDstRegs = 1; inst.Dst[0].Register.File = TGSI_FILE_TEMPORARY; - inst.Dst[0].Register.Index = 0; + inst.Dst[0].Register.Index = reg0_index; inst.Dst[0].Register.WriteMask = TGSI_WRITEMASK_XYZW; inst.Instruction.NumSrcRegs = 3; SET_SRC(&inst, 0, TGSI_FILE_TEMPORARY, ctx->fog_factor_temp, X, X, X, Y); - SET_SRC(&inst, 1, TGSI_FILE_TEMPORARY, 0, X, Y, Z, W); + SET_SRC(&inst, 1, TGSI_FILE_TEMPORARY, reg0_index, X, Y, Z, W); SET_SRC(&inst, 2, TGSI_FILE_CONSTANT, MAX_NUM_FRAGMENT_CONSTANTS_ATI + 1, X, Y, Z, W); tctx->emit_instruction(tctx, &inst); } -- 2.8.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On 08/10/2016 11:24 AM, Kenneth Graunke wrote: > On Wednesday, August 10, 2016 10:02:12 AM PDT Erik Faye-Lund wrote: >> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke >> wrote: >>> On Haswell (GL 3.3): >>> >>> total instructions in shared programs: 6208759 -> 6203860 (-0.08%) >>> instructions in affected programs: 856541 -> 851642 (-0.57%) >>> helped: 3157 >>> HURT: 113 >>> LOST: 7 >>> GAINED: 15 >>> >>> On Broadwell (GL 4.4): >>> >>> total instructions in shared programs: 11637854 -> 11632016 (-0.05%) >>> instructions in affected programs: 1055693 -> 1049855 (-0.55%) >>> helped: 3900 >>> HURT: 176 >>> LOST: 1 >>> GAINED: 18 >>> >>> Signed-off-by: Kenneth Graunke >>> --- >>> src/compiler/nir/nir_opt_algebraic.py | 4 >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/src/compiler/nir/nir_opt_algebraic.py >>> b/src/compiler/nir/nir_opt_algebraic.py >>> index 1cf614c..4e9896f 100644 >>> --- a/src/compiler/nir/nir_opt_algebraic.py >>> +++ b/src/compiler/nir/nir_opt_algebraic.py >>> @@ -251,6 +251,10 @@ optimizations = [ >>> (('ieq', 'a@bool', False), ('inot', 'a')), >>> (('bcsel', a, True, False), ('ine', a, 0)), >>> (('bcsel', a, False, True), ('ieq', a, 0)), >>> + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), >>> + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), >>> + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, >>> + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, >>> (('bcsel', True, b, c), b), >>> (('bcsel', False, b, c), c), >>> # The result of this should be hit by constant propagation and, in the >> >> Same as the previous patch, this smells like intel-isms. Hardware that >> has native bcsel with support for two inline immediates will do better >> without. > > It definitely feels a little strange replacing a single bcsel with > a fneg/b2f/ine, as that's three operations instead of one. > > I expect the ine to go away - assuming 'a' is a properly formatted > boolean (0 or 0x), "ine a 0" will just become 'a'. ieq would > turn into inot. If the boolean was a comparison, the inot could be > folded in - i.e. inot(flt(a,b)) -> fge(a,b). Or, some GPUs can handle > boolean negation as a source modifier, so it might be free there too. > > Floating point negation can usually be done as a source modifier. > > For reference, here's a shader snippet from Goat Simulator which > prompted me to write this optimization: > > const vec4 LocalConst1 = vec4(0.25, -0.25, 0.00, 1.00); > > void main() > { > ... > InstrHelpTemp.r = ( ( Temporary1.r >= 0.0 ) ? LocalConst1.b : LocalConst1.a ); > ... > } > > which could be turned into > > InstrHelpTemp.r = float(Temporary1.r < 0.0); > > which seems arguably better, regardless of hardware. Yeah... I remember looking at that very shader. I seem to recall that InstrHelpTemp.r is then compared with 0.0 a little later to generate another Boolean value. :) > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.
On Wednesday, August 10, 2016 10:53:17 AM PDT Ian Romanick wrote: > My most "recent" work in this area is in my appropriately named > assume-CMS-in-precompile branch. I should just rename all my branches > idr-got-distracted-#. Thanks, I tried to look for your recent work on this, but in addition to the name, that branch wasn't in your repository yesterday... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On Wednesday, August 10, 2016 10:02:12 AM PDT Erik Faye-Lund wrote: > On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke > wrote: > > On Haswell (GL 3.3): > > > > total instructions in shared programs: 6208759 -> 6203860 (-0.08%) > > instructions in affected programs: 856541 -> 851642 (-0.57%) > > helped: 3157 > > HURT: 113 > > LOST: 7 > > GAINED: 15 > > > > On Broadwell (GL 4.4): > > > > total instructions in shared programs: 11637854 -> 11632016 (-0.05%) > > instructions in affected programs: 1055693 -> 1049855 (-0.55%) > > helped: 3900 > > HURT: 176 > > LOST: 1 > > GAINED: 18 > > > > Signed-off-by: Kenneth Graunke > > --- > > src/compiler/nir/nir_opt_algebraic.py | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > > b/src/compiler/nir/nir_opt_algebraic.py > > index 1cf614c..4e9896f 100644 > > --- a/src/compiler/nir/nir_opt_algebraic.py > > +++ b/src/compiler/nir/nir_opt_algebraic.py > > @@ -251,6 +251,10 @@ optimizations = [ > > (('ieq', 'a@bool', False), ('inot', 'a')), > > (('bcsel', a, True, False), ('ine', a, 0)), > > (('bcsel', a, False, True), ('ieq', a, 0)), > > + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), > > + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), > > + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, > > + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, > > (('bcsel', True, b, c), b), > > (('bcsel', False, b, c), c), > > # The result of this should be hit by constant propagation and, in the > > Same as the previous patch, this smells like intel-isms. Hardware that > has native bcsel with support for two inline immediates will do better > without. It definitely feels a little strange replacing a single bcsel with a fneg/b2f/ine, as that's three operations instead of one. I expect the ine to go away - assuming 'a' is a properly formatted boolean (0 or 0x), "ine a 0" will just become 'a'. ieq would turn into inot. If the boolean was a comparison, the inot could be folded in - i.e. inot(flt(a,b)) -> fge(a,b). Or, some GPUs can handle boolean negation as a source modifier, so it might be free there too. Floating point negation can usually be done as a source modifier. For reference, here's a shader snippet from Goat Simulator which prompted me to write this optimization: const vec4 LocalConst1 = vec4(0.25, -0.25, 0.00, 1.00); void main() { ... InstrHelpTemp.r = ( ( Temporary1.r >= 0.0 ) ? LocalConst1.b : LocalConst1.a ); ... } which could be turned into InstrHelpTemp.r = float(Temporary1.r < 0.0); which seems arguably better, regardless of hardware. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/6] nir: Turn -(b2f(a) + b2f(b) >= 0 into !(a || b).
> > > > For now, this patch is > > > > Reviewed-by: Ian Romanick > I had a hard time parsing the title: "Turn -(b2f(a) + b2f(b) >= 0 into !(a || b)" at first, until I saw the replacement instructions. You're missing a ')' on the commit line. :) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/6] nir: Introduce a nir_opt_move_comparisons() pass.
On 08/09/2016 07:30 PM, Kenneth Graunke wrote: > This tries to move comparisons (a common source of boolean values) > closer to their first use. For GPUs which use condition codes, > this can eliminate a lot of temporary booleans and comparisons > which reload the condition code register based on a boolean. > > Signed-off-by: Kenneth Graunke > --- > src/compiler/Makefile.sources | 1 + > src/compiler/nir/nir.h | 2 + > src/compiler/nir/nir_opt_move_comparisons.c | 173 > > 3 files changed, 176 insertions(+) > create mode 100644 src/compiler/nir/nir_opt_move_comparisons.c > > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources > index 0ff9b23..008a101 100644 > --- a/src/compiler/Makefile.sources > +++ b/src/compiler/Makefile.sources > @@ -226,6 +226,7 @@ NIR_FILES = \ > nir/nir_opt_gcm.c \ > nir/nir_opt_global_to_local.c \ > nir/nir_opt_peephole_select.c \ > + nir/nir_opt_move_comparisons.c \ > nir/nir_opt_remove_phis.c \ > nir/nir_opt_undef.c \ > nir/nir_phi_builder.c \ > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index 9ce5be2..79511a7 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -2574,6 +2574,8 @@ bool nir_opt_dead_cf(nir_shader *shader); > > void nir_opt_gcm(nir_shader *shader); > > +bool nir_opt_move_comparisons(nir_shader *shader); > + > bool nir_opt_peephole_select(nir_shader *shader); > > bool nir_opt_remove_phis(nir_shader *shader); > diff --git a/src/compiler/nir/nir_opt_move_comparisons.c > b/src/compiler/nir/nir_opt_move_comparisons.c > new file mode 100644 > index 000..74927c9 > --- /dev/null > +++ b/src/compiler/nir/nir_opt_move_comparisons.c > @@ -0,0 +1,173 @@ > +/* > + * Copyright © 2016 Intel Corporation > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#include "nir.h" > + > +/** > + * \file nir_opt_move_comparisons.c > + * > + * This pass moves ALU comparison operations just before their first use. > + * > + * It only moves instructions within a single basic block; cross-block > + * movement is left to global code motion. > + * > + * Many GPUs generate condition codes for comparisons, and use predication > + * for conditional selects and control flow. In a sequence such as: > + * > + * vec1 32 ssa_1 = flt a b > + * > + * vec1 32 ssa_2 = bcsel ssa_1 c d > + * > + * the backend would likely do the comparison, producing condition codes, > + * then save those to a boolean value. The intervening operations might > + * trash the condition codes. Then, in order to do the bcsel, it would > + * need to re-populate the condition code register based on the boolean. > + * > + * By moving the comparison just before the bcsel, the condition codes could > + * be used directly. This eliminates the need to reload them from the > boolean > + * (generally eliminating an instruction). It may also eliminate the need to > + * create a boolean value altogether (unless it's used elsewhere), which > could > + * lower register pressure. > + */ > + > +static bool > +is_comparison(nir_op op) > +{ > + switch (op) { > + case nir_op_flt: > + case nir_op_fge: > + case nir_op_feq: > + case nir_op_fne: > + case nir_op_ilt: > + case nir_op_ult: > + case nir_op_ige: > + case nir_op_uge: > + case nir_op_ieq: > + case nir_op_ine: > + return true; > + default: > + return false; > + } > +} > + > +static bool > +move_comparison_source(nir_src *src, nir_block *block, struct exec_node > *before) > +{ > + if (src->is_ssa && src->ssa->parent_instr->block == block && > + src->ssa->parent_instr->type == nir_instr_type_alu && > + is_comparison(nir_instr_as_alu(src->ssa->parent_instr)->op)) { > + > + struct exec_node *src_node = &src->ssa->parent_instr
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On Wed, Aug 10, 2016 at 1:53 PM, Eric Anholt wrote: > Kenneth Graunke writes: > >> On Haswell (GL 3.3): >> >> total instructions in shared programs: 6208759 -> 6203860 (-0.08%) >> instructions in affected programs: 856541 -> 851642 (-0.57%) >> helped: 3157 >> HURT: 113 >> LOST: 7 >> GAINED: 15 >> >> On Broadwell (GL 4.4): >> >> total instructions in shared programs: 11637854 -> 11632016 (-0.05%) >> instructions in affected programs: 1055693 -> 1049855 (-0.55%) >> helped: 3900 >> HURT: 176 >> LOST: 1 >> GAINED: 18 >> >> Signed-off-by: Kenneth Graunke >> --- >> src/compiler/nir/nir_opt_algebraic.py | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py >> b/src/compiler/nir/nir_opt_algebraic.py >> index 1cf614c..4e9896f 100644 >> --- a/src/compiler/nir/nir_opt_algebraic.py >> +++ b/src/compiler/nir/nir_opt_algebraic.py >> @@ -251,6 +251,10 @@ optimizations = [ >> (('ieq', 'a@bool', False), ('inot', 'a')), >> (('bcsel', a, True, False), ('ine', a, 0)), >> (('bcsel', a, False, True), ('ieq', a, 0)), >> + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), >> + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), >> + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, >> + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, > > Isn't bcsel's first arg guaranteed to be 0 or ~0? I thought that was > how nir's bcsel (and b2f) worked. If not, it would be good to see this > documented. Yes, any ALU operation that takes a boolean input can expect it to be 0 or ~0. If that isn't true, then something else in the compiler has messed up. So I think we can replace 'a != 0' with 'a' and 'a == 0' with '!a'. > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/6] nir: Turn -(b2f(a) + b2f(b) >= 0 into !(a || b).
On Wednesday, August 10, 2016 10:58:49 AM PDT Ian Romanick wrote: > On 08/09/2016 07:30 PM, Kenneth Graunke wrote: > > On Haswell (GL 3.3): > > > > total instructions in shared programs: 6211485 -> 6211427 (-0.00%) > > instructions in affected programs: 16260 -> 16202 (-0.36%) > > helped: 25 > > HURT: 37 > > > > On Broadwell (GL 4.4): > > > > total instructions in shared programs: 11640288 -> 11640218 (-0.00%) > > instructions in affected programs: 16313 -> 16243 (-0.43%) > > helped: 27 > > HURT: 37 > > Interesting... I had a similar version of this that worked on arbitrary > length sequences of (b2f(a_0) + .. b2f(a_n)), but I didn't get any > benefit. I also had tests for a couple other relational operators with > 0 that didn't see benefit. Maybe one of us should try to revive my old, > more generic series at some point. > > For now, this patch is > > Reviewed-by: Ian Romanick I believe I first saw this sequence after my bcsel -1.0 -0.0 optimization. I also imported over 11,000 more shaders recently - probably after you last did this experiment. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/6] i965: Use the nir_move_comparisons pass.
On 08/09/2016 07:30 PM, Kenneth Graunke wrote: > On Haswell (GL 3.3): > > total instructions in shared programs: 6211427 -> 6210079 (-0.02%) > instructions in affected programs: 219356 -> 218008 (-0.61%) > helped: 567 > HURT: 132 > > No spill/fill changes. > > LOST: 0 > GAINED: 4 > > On Broadwell (GL 4.4): > > total instructions in shared programs: 11640218 -> 11632136 (-0.07%) > instructions in affected programs: 542528 -> 534446 (-1.49%) > helped: 1661 > HURT: 143 > > total spills in shared programs: 2922 -> 2932 (0.34%) > spills in affected programs: 420 -> 430 (2.38%) > helped: 1 > HURT: 1 > > total fills in shared programs: 4389 -> 4390 (0.02%) > fills in affected programs: 967 -> 968 (0.10%) > helped: 1 > HURT: 1 I would have thought that this would help more. I wonder if it's being held back by the poor way we handle || and &&. We always implement these as actual instructions, but we could use predication. I had looked into this a very little bit in days just before NIR, and it looked promising. Once NIR landed, I couldn't tell the difference between logical-or and bitwise-or, and I abandoned the effort. I also had problems dealing with the single condition code register that we use. Long sequences like (a && b || c && d) really want to use all the condition code registers. > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_nir.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c > b/src/mesa/drivers/dri/i965/brw_nir.c > index db2056d..b02c404 100644 > --- a/src/mesa/drivers/dri/i965/brw_nir.c > +++ b/src/mesa/drivers/dri/i965/brw_nir.c > @@ -487,6 +487,7 @@ brw_postprocess_nir(nir_shader *nir, > OPT_V(nir_lower_to_source_mods); > OPT(nir_copy_prop); > OPT(nir_opt_dce); > + OPT(nir_opt_move_comparisons); > > OPT(nir_lower_locals_to_regs); > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add support for scalarizing load_interpolated_input.
Kenneth Graunke writes: > Signed-off-by: Kenneth Graunke > --- > src/compiler/nir/nir_lower_io_to_scalar.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/src/compiler/nir/nir_lower_io_to_scalar.c > b/src/compiler/nir/nir_lower_io_to_scalar.c > index f2345d5..8be75e4 100644 > --- a/src/compiler/nir/nir_lower_io_to_scalar.c > +++ b/src/compiler/nir/nir_lower_io_to_scalar.c > @@ -48,8 +48,10 @@ lower_load_input_to_scalar(nir_builder *b, > nir_intrinsic_instr *intr) > >nir_intrinsic_set_base(chan_intr, nir_intrinsic_base(intr)); >nir_intrinsic_set_component(chan_intr, nir_intrinsic_component(intr) + > i); > - /* offset */ > - chan_intr->src[0] = intr->src[0]; > + > + /* offset and possibly barycentric coordinates */ > + for (int i = 0; i < nir_intrinsic_infos[intr->intrinsic].num_srcs; i++) > + chan_intr->src[i] = intr->src[i]; > >nir_builder_instr_insert(b, &chan_intr->instr); > > @@ -112,6 +114,7 @@ nir_lower_io_to_scalar(nir_shader *shader, > nir_variable_mode mask) > > switch (intr->intrinsic) { > case nir_intrinsic_load_input: > + case nir_intrinsic_load_interpolated_input: >if (mask & nir_var_shader_in) > lower_load_input_to_scalar(&b, intr); >break; Don't you also need: if (intr->intrinsic == nir_intrinsic_load_interpolated_input) nir_intrinsic_set_interp_mode(chan_intr, nir_intrinsic_interp_mode(intr)); signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On 08/09/2016 07:30 PM, Kenneth Graunke wrote: > On Haswell (GL 3.3): > > total instructions in shared programs: 6208759 -> 6203860 (-0.08%) > instructions in affected programs: 856541 -> 851642 (-0.57%) > helped: 3157 > HURT: 113 > LOST: 7 > GAINED: 15 > > On Broadwell (GL 4.4): > > total instructions in shared programs: 11637854 -> 11632016 (-0.05%) > instructions in affected programs: 1055693 -> 1049855 (-0.55%) > helped: 3900 > HURT: 176 > LOST: 1 > GAINED: 18 > > Signed-off-by: Kenneth Graunke > --- > src/compiler/nir/nir_opt_algebraic.py | 4 > 1 file changed, 4 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index 1cf614c..4e9896f 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -251,6 +251,10 @@ optimizations = [ > (('ieq', 'a@bool', False), ('inot', 'a')), > (('bcsel', a, True, False), ('ine', a, 0)), > (('bcsel', a, False, True), ('ieq', a, 0)), > + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), > + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), > + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, > + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, One of the other things I was experimenting with, that had mixed results, was rearranging larger sequences like x * bcsel(b, 1.0, 0.0). There were a few different ways to handle it, and the optimal way was dependent on the way the Boolean value was generated and on the way the result was used. This dovetailed into my work in removing spurious b2f (and f2i) expressions. > (('bcsel', True, b, c), b), > (('bcsel', False, b, c), c), > # The result of this should be hit by constant propagation and, in the > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/6] nir: Turn -(b2f(a) + b2f(b) >= 0 into !(a || b).
On 08/09/2016 07:30 PM, Kenneth Graunke wrote: > On Haswell (GL 3.3): > > total instructions in shared programs: 6211485 -> 6211427 (-0.00%) > instructions in affected programs: 16260 -> 16202 (-0.36%) > helped: 25 > HURT: 37 > > On Broadwell (GL 4.4): > > total instructions in shared programs: 11640288 -> 11640218 (-0.00%) > instructions in affected programs: 16313 -> 16243 (-0.43%) > helped: 27 > HURT: 37 Interesting... I had a similar version of this that worked on arbitrary length sequences of (b2f(a_0) + .. b2f(a_n)), but I didn't get any benefit. I also had tests for a couple other relational operators with 0 that didn't see benefit. Maybe one of us should try to revive my old, more generic series at some point. For now, this patch is Reviewed-by: Ian Romanick > Signed-off-by: Kenneth Graunke > --- > src/compiler/nir/nir_opt_algebraic.py | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index 4e9896f..ef87d4d 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -135,6 +135,9 @@ optimizations = [ > # inot(a) > (('fge', 0.0, ('b2f', a)), ('inot', a)), > > + # -(b2f(a) + b2f(b)) >= 0 becomes !(a || b) > + (('fge', ('fneg', ('fadd', ('b2f', 'a@bool'), ('b2f', 'b@bool'))), 0.0), > ('inot', ('ior', a, b))), > + > # 0.0 < fabs(a) > # fabs(a) > 0.0 > # fabs(a) != 0.0 because fabs(a) must be >= 0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.
My most "recent" work in this area is in my appropriately named assume-CMS-in-precompile branch. I should just rename all my branches idr-got-distracted-#. On 08/09/2016 07:30 PM, Kenneth Graunke wrote: > On Haswell (GL 3.3): > > total instructions in shared programs: 6211678 -> 6211584 (-0.00%) > instructions in affected programs: 18968 -> 18874 (-0.50%) > helped: 62 > HURT: 0 > LOST: 0 > GAINED: 4 > > On Broadwell (GL 4.4): > > total instructions in shared programs: 11638930 -> 11638181 (-0.01%) > instructions in affected programs: 84768 -> 84019 (-0.88%) > helped: 505 > HURT: 45 > > Signed-off-by: Kenneth Graunke > --- > src/compiler/nir/nir_opt_algebraic.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index 0f0896b..1cf614c 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -180,6 +180,7 @@ optimizations = [ > (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))), > (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a, b))), > (('iand', 'a@bool', 1.0), ('b2f', a)), > + (('ineg', ('b2i', 'a@bool')), a), I think I might like a comment that this works because Boolean true and false are always ~0 and 0 in NIR. With that, Reviewed-by: Ian Romanick > (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF. > (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI KILL_IF. > # Comparison with the same args. Note that these are not done for > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
Kenneth Graunke writes: > On Haswell (GL 3.3): > > total instructions in shared programs: 6208759 -> 6203860 (-0.08%) > instructions in affected programs: 856541 -> 851642 (-0.57%) > helped: 3157 > HURT: 113 > LOST: 7 > GAINED: 15 > > On Broadwell (GL 4.4): > > total instructions in shared programs: 11637854 -> 11632016 (-0.05%) > instructions in affected programs: 1055693 -> 1049855 (-0.55%) > helped: 3900 > HURT: 176 > LOST: 1 > GAINED: 18 > > Signed-off-by: Kenneth Graunke > --- > src/compiler/nir/nir_opt_algebraic.py | 4 > 1 file changed, 4 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index 1cf614c..4e9896f 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -251,6 +251,10 @@ optimizations = [ > (('ieq', 'a@bool', False), ('inot', 'a')), > (('bcsel', a, True, False), ('ine', a, 0)), > (('bcsel', a, False, True), ('ieq', a, 0)), > + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), > + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), > + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, > + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, Isn't bcsel's first arg guaranteed to be 0 or ~0? I thought that was how nir's bcsel (and b2f) worked. If not, it would be good to see this documented. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] *** Aubinator tool for Intel Gen platforms ***
On Wed, Aug 10, 2016 at 09:43:24AM -0700, Nanley Chery wrote: > On Tue, Aug 09, 2016 at 04:52:12PM -0700, Sirisha Gandikota wrote: > > From: Sirisha Gandikota > > > > This is a patch series for adding the aubinator tool to the codebase. > > > > The aubinator tool is designed to help the driver developers to debug > > the driver functionality by decoding the data in the .aub files. This > > tool is for Intel Gen platforms and has been tested for Gen7.5, Gen8 > > and Gen9 platforms. > > > > Kristian Høgsberg Kristensen (1): > > aubinator: Add a new tool called Aubinator to the src/intel/tools > > folder. > > Hi Sirisha, > > It's great to see this tool on the list. I seems that the first patch in > the series is missing however. Could you send it out? Sorry for the noise. My Gmail marked Patch 1/2 as spam for some odd reason. - Nanley > > Thanks, > Nanley > > > > > Sirisha Gandikota (1): > > aubinator: Fix the tool to correctly decode the DWords > > > > configure.ac |1 + > > src/Makefile.am |4 + > > src/intel/tools/Makefile.am | 65 +++ > > src/intel/tools/aubinator.c | 1140 > > ++ > > src/intel/tools/decoder.c| 515 +++ > > src/intel/tools/decoder.h| 96 > > src/intel/tools/disasm.c | 109 > > src/intel/tools/gen_disasm.h | 35 ++ > > src/intel/tools/intel_aub.h | 154 ++ > > 9 files changed, 2119 insertions(+) > > create mode 100644 src/intel/tools/Makefile.am > > create mode 100644 src/intel/tools/aubinator.c > > create mode 100644 src/intel/tools/decoder.c > > create mode 100644 src/intel/tools/decoder.h > > create mode 100644 src/intel/tools/disasm.c > > create mode 100644 src/intel/tools/gen_disasm.h > > create mode 100644 src/intel/tools/intel_aub.h > > > > -- > > 2.7.4 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] anv/blit2d: Add a format parameter to bind_dst and create_iview
On Wed, Aug 10, 2016 at 10:12:17AM -0700, Nanley Chery wrote: > On Tue, Aug 02, 2016 at 10:00:29AM -0700, Jason Ekstrand wrote: > > Signed-off-by: Jasosn Ekstrand > > Cc: "12.0" I just noticed a bug in this patch. With that fixed the Reviewed-by will hold. Please see the note inline. > > --- > > src/intel/vulkan/anv_meta_blit2d.c | 15 ++- > > 1 file changed, 10 insertions(+), 5 deletions(-) > > > > diff --git a/src/intel/vulkan/anv_meta_blit2d.c > > b/src/intel/vulkan/anv_meta_blit2d.c > > index 9d4c2fc..30bc6ed 100644 > > --- a/src/intel/vulkan/anv_meta_blit2d.c > > +++ b/src/intel/vulkan/anv_meta_blit2d.c > > @@ -99,6 +99,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer, > > VkImageUsageFlags usage, > > uint32_t width, > > uint32_t height, > > + VkFormat format, > > VkImage *img, > > struct anv_image_view *iview) > > { > > @@ -106,8 +107,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer, > >.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO, > >.imageType = VK_IMAGE_TYPE_2D, > >/* W-tiled images must be stencil-formatted. */ > > - .format = surf->tiling == ISL_TILING_W ? > > -VK_FORMAT_S8_UINT : vk_format_for_size(surf->bs), > > + .format = format, > >.extent = { > > .width = width, > > .height = height, > > @@ -190,6 +190,8 @@ blit2d_bind_src(struct anv_cmd_buffer *cmd_buffer, > > > >create_iview(cmd_buffer, src, offset, usage, > > rect->src_x + rect->width, rect->src_y + rect->height, > > + src->tiling == ISL_TILING_W ? > > + VK_FORMAT_S8_UINT : vk_format_for_size(src->bs), > > &tmp->image, &tmp->iview); > > > >anv_CreateDescriptorPool(vk_device, > > @@ -339,10 +341,11 @@ blit2d_bind_dst(struct anv_cmd_buffer *cmd_buffer, > > uint64_t offset, > > uint32_t width, > > uint32_t height, > > +enum isl_format format, This should be: VkFormat format, - Nanley > > struct blit2d_dst_temps *tmp) > > { > > create_iview(cmd_buffer, dst, offset, > > VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT, > > -width, height, &tmp->image, &tmp->iview); > > +width, height, format, &tmp->image, &tmp->iview); > > > > anv_CreateFramebuffer(anv_device_to_handle(cmd_buffer->device), > >&(VkFramebufferCreateInfo) { > > @@ -417,7 +420,8 @@ anv_meta_blit2d_normal_dst(struct anv_cmd_buffer > > *cmd_buffer, > > > >struct blit2d_dst_temps dst_temps; > >blit2d_bind_dst(cmd_buffer, dst, offset, rects[r].dst_x + > > rects[r].width, > > - rects[r].dst_y + rects[r].height, &dst_temps); > > + rects[r].dst_y + rects[r].height, > > + vk_format_for_size(dst->bs), &dst_temps); > > > >struct blit_vb_data { > > float pos[2]; > > @@ -555,7 +559,8 @@ anv_meta_blit2d_w_tiled_dst(struct anv_cmd_buffer > > *cmd_buffer, > >}; > > > >struct blit2d_dst_temps dst_temps; > > - blit2d_bind_dst(cmd_buffer, &dst_Y, offset, xmax_Y, ymax_Y, > > &dst_temps); > > + blit2d_bind_dst(cmd_buffer, &dst_Y, offset, xmax_Y, ymax_Y, > > + vk_format_for_size(dst->bs), &dst_temps); > > We know the format block size here, so I think we should replace dst->bs with > 1. There actually isn't any other usage of dst->bs (other than the assert) for > this reason. > > With that change this patch is, > Reviewed-by: Nanley Chery > > > > >struct blit_vb_header { > > struct anv_vue_header vue; > > -- > > 2.5.0.400.gff86faf > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] anv/blit2d: Add a format parameter to bind_dst and create_iview
On Tue, Aug 02, 2016 at 10:00:29AM -0700, Jason Ekstrand wrote: > Signed-off-by: Jasosn Ekstrand > Cc: "12.0" > --- > src/intel/vulkan/anv_meta_blit2d.c | 15 ++- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/src/intel/vulkan/anv_meta_blit2d.c > b/src/intel/vulkan/anv_meta_blit2d.c > index 9d4c2fc..30bc6ed 100644 > --- a/src/intel/vulkan/anv_meta_blit2d.c > +++ b/src/intel/vulkan/anv_meta_blit2d.c > @@ -99,6 +99,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer, > VkImageUsageFlags usage, > uint32_t width, > uint32_t height, > + VkFormat format, > VkImage *img, > struct anv_image_view *iview) > { > @@ -106,8 +107,7 @@ create_iview(struct anv_cmd_buffer *cmd_buffer, >.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO, >.imageType = VK_IMAGE_TYPE_2D, >/* W-tiled images must be stencil-formatted. */ > - .format = surf->tiling == ISL_TILING_W ? > -VK_FORMAT_S8_UINT : vk_format_for_size(surf->bs), > + .format = format, >.extent = { > .width = width, > .height = height, > @@ -190,6 +190,8 @@ blit2d_bind_src(struct anv_cmd_buffer *cmd_buffer, > >create_iview(cmd_buffer, src, offset, usage, > rect->src_x + rect->width, rect->src_y + rect->height, > + src->tiling == ISL_TILING_W ? > + VK_FORMAT_S8_UINT : vk_format_for_size(src->bs), > &tmp->image, &tmp->iview); > >anv_CreateDescriptorPool(vk_device, > @@ -339,10 +341,11 @@ blit2d_bind_dst(struct anv_cmd_buffer *cmd_buffer, > uint64_t offset, > uint32_t width, > uint32_t height, > +enum isl_format format, > struct blit2d_dst_temps *tmp) > { > create_iview(cmd_buffer, dst, offset, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT, > -width, height, &tmp->image, &tmp->iview); > +width, height, format, &tmp->image, &tmp->iview); > > anv_CreateFramebuffer(anv_device_to_handle(cmd_buffer->device), >&(VkFramebufferCreateInfo) { > @@ -417,7 +420,8 @@ anv_meta_blit2d_normal_dst(struct anv_cmd_buffer > *cmd_buffer, > >struct blit2d_dst_temps dst_temps; >blit2d_bind_dst(cmd_buffer, dst, offset, rects[r].dst_x + > rects[r].width, > - rects[r].dst_y + rects[r].height, &dst_temps); > + rects[r].dst_y + rects[r].height, > + vk_format_for_size(dst->bs), &dst_temps); > >struct blit_vb_data { > float pos[2]; > @@ -555,7 +559,8 @@ anv_meta_blit2d_w_tiled_dst(struct anv_cmd_buffer > *cmd_buffer, >}; > >struct blit2d_dst_temps dst_temps; > - blit2d_bind_dst(cmd_buffer, &dst_Y, offset, xmax_Y, ymax_Y, > &dst_temps); > + blit2d_bind_dst(cmd_buffer, &dst_Y, offset, xmax_Y, ymax_Y, > + vk_format_for_size(dst->bs), &dst_temps); We know the format block size here, so I think we should replace dst->bs with 1. There actually isn't any other usage of dst->bs (other than the assert) for this reason. With that change this patch is, Reviewed-by: Nanley Chery > >struct blit_vb_header { > struct anv_vue_header vue; > -- > 2.5.0.400.gff86faf > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/8] st/mesa: determine states used or affected by shaders at compile time
On Wed, Aug 10, 2016 at 4:34 PM, Nicolai Hähnle wrote: > On 10.08.2016 13:05, Marek Olšák wrote: >> >> On Tue, Aug 9, 2016 at 12:56 PM, Nicolai Hähnle >> wrote: >>> >>> On 07.08.2016 03:12, Marek Olšák wrote: From: Marek Olšák At compile time, each shader determines which ST_NEW flags should be set at shader bind time. This just sets the new field for all shaders. The next commit will use it. --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 175 - src/mesa/state_tracker/st_program.c| 37 +- src/mesa/state_tracker/st_program.h| 6 + 3 files changed, 215 insertions(+), 3 deletions(-) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 362559f..fd14766 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -,31 +,202 @@ get_mesa_program_tgsi(struct gl_context *ctx, static struct gl_program * get_mesa_program(struct gl_context *ctx, struct gl_shader_program *shader_program, struct gl_linked_shader *shader) { struct pipe_screen *pscreen = ctx->st->pipe->screen; unsigned ptarget = st_shader_stage_to_ptarget(shader->Stage); enum pipe_shader_ir preferred_ir = (enum pipe_shader_ir) pscreen->get_shader_param(pscreen, ptarget, PIPE_SHADER_CAP_PREFERRED_IR); + struct gl_program *prog = NULL; + if (preferred_ir == PIPE_SHADER_IR_NIR) { /* TODO only for GLSL VS/FS for now: */ switch (shader->Stage) { case MESA_SHADER_VERTEX: case MESA_SHADER_FRAGMENT: - return st_nir_get_mesa_program(ctx, shader_program, shader); + prog = st_nir_get_mesa_program(ctx, shader_program, shader); default: break; } + } else { + prog = get_mesa_program_tgsi(ctx, shader_program, shader); + } + + if (prog) { + uint64_t *states; + + /* This determines which states will be updated when the shader is + * bound. + */ + switch (shader->Stage) { + case MESA_SHADER_VERTEX: + states = &((struct st_vertex_program*)prog)->affected_states; + + *states = ST_NEW_VS_STATE | + ST_NEW_RASTERIZER | + ST_NEW_VERTEX_ARRAYS; + + if (prog->Parameters->NumParameters) +*states |= ST_NEW_VS_CONSTANTS; + + if (shader->num_samplers) +*states |= ST_NEW_VS_SAMPLER_VIEWS | + ST_NEW_RENDER_SAMPLERS; + + if (shader->NumImages) +*states |= ST_NEW_VS_IMAGES; + + if (shader->NumUniformBlocks) +*states |= ST_NEW_VS_UBOS; + + if (shader->NumShaderStorageBlocks) +*states |= ST_NEW_VS_SSBOS; + + if (shader->NumAtomicBuffers) +*states |= ST_NEW_VS_ATOMICS; >>> >>> >>> >>> I'm not overly fond of the code duplication here. Perhaps these could all >>> be >>> expressed relative to a stage-specific base flag? >> >> >> Not really. It shouldn't rely on the order the flags are declared. I >> can't make it better than this: > > > I don't know why we can't just say we should be able to rely on the order, > but the alternative below is indeed not really any better. We could rely on the order, but as with system value enumerations, we'd need a lot of assertions to prevent a breakage if somebody changes the order and we don't notice. I'll still amend the diff, because it makes the code a little shorter. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] *** Aubinator tool for Intel Gen platforms ***
On Tue, Aug 09, 2016 at 04:52:12PM -0700, Sirisha Gandikota wrote: > From: Sirisha Gandikota > > This is a patch series for adding the aubinator tool to the codebase. > > The aubinator tool is designed to help the driver developers to debug > the driver functionality by decoding the data in the .aub files. This > tool is for Intel Gen platforms and has been tested for Gen7.5, Gen8 > and Gen9 platforms. > > Kristian Høgsberg Kristensen (1): > aubinator: Add a new tool called Aubinator to the src/intel/tools > folder. Hi Sirisha, It's great to see this tool on the list. I seems that the first patch in the series is missing however. Could you send it out? Thanks, Nanley > > Sirisha Gandikota (1): > aubinator: Fix the tool to correctly decode the DWords > > configure.ac |1 + > src/Makefile.am |4 + > src/intel/tools/Makefile.am | 65 +++ > src/intel/tools/aubinator.c | 1140 > ++ > src/intel/tools/decoder.c| 515 +++ > src/intel/tools/decoder.h| 96 > src/intel/tools/disasm.c | 109 > src/intel/tools/gen_disasm.h | 35 ++ > src/intel/tools/intel_aub.h | 154 ++ > 9 files changed, 2119 insertions(+) > create mode 100644 src/intel/tools/Makefile.am > create mode 100644 src/intel/tools/aubinator.c > create mode 100644 src/intel/tools/decoder.c > create mode 100644 src/intel/tools/decoder.h > create mode 100644 src/intel/tools/disasm.c > create mode 100644 src/intel/tools/gen_disasm.h > create mode 100644 src/intel/tools/intel_aub.h > > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On Wed, Aug 10, 2016 at 5:48 PM, Jason Ekstrand wrote: > On Aug 10, 2016 1:02 AM, "Erik Faye-Lund" wrote: >> >> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke >> wrote: >> > On Haswell (GL 3.3): >> > >> > total instructions in shared programs: 6208759 -> 6203860 (-0.08%) >> > instructions in affected programs: 856541 -> 851642 (-0.57%) >> > helped: 3157 >> > HURT: 113 >> > LOST: 7 >> > GAINED: 15 >> > >> > On Broadwell (GL 4.4): >> > >> > total instructions in shared programs: 11637854 -> 11632016 (-0.05%) >> > instructions in affected programs: 1055693 -> 1049855 (-0.55%) >> > helped: 3900 >> > HURT: 176 >> > LOST: 1 >> > GAINED: 18 >> > >> > Signed-off-by: Kenneth Graunke >> > --- >> > src/compiler/nir/nir_opt_algebraic.py | 4 >> > 1 file changed, 4 insertions(+) >> > >> > diff --git a/src/compiler/nir/nir_opt_algebraic.py >> > b/src/compiler/nir/nir_opt_algebraic.py >> > index 1cf614c..4e9896f 100644 >> > --- a/src/compiler/nir/nir_opt_algebraic.py >> > +++ b/src/compiler/nir/nir_opt_algebraic.py >> > @@ -251,6 +251,10 @@ optimizations = [ >> > (('ieq', 'a@bool', False), ('inot', 'a')), >> > (('bcsel', a, True, False), ('ine', a, 0)), >> > (('bcsel', a, False, True), ('ieq', a, 0)), >> > + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), >> > + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), >> > + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, >> > + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, >> > (('bcsel', True, b, c), b), >> > (('bcsel', False, b, c), c), >> > # The result of this should be hit by constant propagation and, in >> > the >> >> Same as the previous patch, this smells like intel-isms. Hardware that >> has native bcsel with support for two inline immediates will do better >> without. > > Why? If you're back-end handles b2f *worse* then bcsel of two components, > then it's broken. Also, we have a lot of optimization in NIR to help cut > through "fake booleans" where shaders use 0.0 and 1.0 and math operations > instead of actual booleans. Recognising hand-coded b2f suddenly enables > more of these optimization paths to run on the shader. That 0.57% isn't all > just immediates being removed. You're right. I don't know what I was thinking. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.
On Wed, Aug 10, 2016 at 5:40 PM, Jason Ekstrand wrote: > On Aug 10, 2016 1:00 AM, "Erik Faye-Lund" wrote: >> >> On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke >> wrote: >> > On Haswell (GL 3.3): >> > >> > total instructions in shared programs: 6211678 -> 6211584 (-0.00%) >> > instructions in affected programs: 18968 -> 18874 (-0.50%) >> > helped: 62 >> > HURT: 0 >> > LOST: 0 >> > GAINED: 4 >> > >> > On Broadwell (GL 4.4): >> > >> > total instructions in shared programs: 11638930 -> 11638181 (-0.01%) >> > instructions in affected programs: 84768 -> 84019 (-0.88%) >> > helped: 505 >> > HURT: 45 >> > >> > Signed-off-by: Kenneth Graunke >> > --- >> > src/compiler/nir/nir_opt_algebraic.py | 1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git a/src/compiler/nir/nir_opt_algebraic.py >> > b/src/compiler/nir/nir_opt_algebraic.py >> > index 0f0896b..1cf614c 100644 >> > --- a/src/compiler/nir/nir_opt_algebraic.py >> > +++ b/src/compiler/nir/nir_opt_algebraic.py >> > @@ -180,6 +180,7 @@ optimizations = [ >> > (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))), >> > (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a, >> > b))), >> > (('iand', 'a@bool', 1.0), ('b2f', a)), >> > + (('ineg', ('b2i', 'a@bool')), a), >> > (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF. >> > (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI >> > KILL_IF. >> > # Comparison with the same args. Note that these are not done for >> >> Are you sure this shouldn't be conditional? This being an improvement >> sounds like an intel-ism, and not something that applies to most >> hardware... > > Why? You're replacing two instructions with zero. That should help everyone. My bad, I somehow read this as something entirely different. Sorry for the noise, looks good to me. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/19] gallium/radeon: more descriptive names for LLVM temporaries in debug builds
On Tue, Aug 09, 2016 at 12:36:40PM +0200, Nicolai Hähnle wrote: > From: Nicolai Hähnle > This is a great idea. Reviewed-by: Tom Stellard > --- > src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 14 -- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > index 7b96a58..22ff18e 100644 > --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > @@ -31,20 +31,21 @@ > #include "gallivm/lp_bld_init.h" > #include "gallivm/lp_bld_intr.h" > #include "gallivm/lp_bld_misc.h" > #include "gallivm/lp_bld_swizzle.h" > #include "tgsi/tgsi_info.h" > #include "tgsi/tgsi_parse.h" > #include "util/u_math.h" > #include "util/u_memory.h" > #include "util/u_debug.h" > > +#include > #include > #include > > LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base, > enum tgsi_opcode_type type) > { > LLVMContextRef ctx = bld_base->base.gallivm->context; > > switch (type) { > case TGSI_TYPE_UNSIGNED: > @@ -421,20 +422,21 @@ static void emit_declaration(struct > lp_build_tgsi_context *bld_base, >ctx->soa.addr[idx][chan] = > si_build_alloca_undef( > &ctx->gallivm, > ctx->soa.bld_base.uint_bld.elem_type, > ""); > } > } > break; > } > > case TGSI_FILE_TEMPORARY: > { > + char name[16] = ""; > LLVMValueRef array_alloca = NULL; > unsigned decl_size; > first = decl->Range.First; > last = decl->Range.Last; > decl_size = 4 * ((last - first) + 1); > if (decl->Declaration.Array) { > unsigned id = decl->Array.ArrayID - 1; > if (!ctx->arrays) { > int size = > bld_base->info->array_max[TGSI_FILE_TEMPORARY]; > ctx->arrays = CALLOC(size, > sizeof(ctx->arrays[0])); > @@ -458,34 +460,42 @@ static void emit_declaration(struct > lp_build_tgsi_context *bld_base, > ctx->arrays[id].alloca = array_alloca; > } > } > > if (!ctx->temps_count) { > ctx->temps_count = > bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1; > ctx->temps = MALLOC(TGSI_NUM_CHANNELS * > ctx->temps_count * sizeof(LLVMValueRef)); > } > if (!array_alloca) { > for (i = 0; i < decl_size; ++i) { > +#ifdef DEBUG > + snprintf(name, sizeof(name), "TEMP%d.%c", > + first + i / 4, "xyzw"[i % 4]); > +#endif > ctx->temps[first * TGSI_NUM_CHANNELS + i] = > > si_build_alloca_undef(bld_base->base.gallivm, > > bld_base->base.vec_type, > - "temp"); > + name); > } > } else { > LLVMValueRef idxs[2] = { > bld_base->uint_bld.zero, > NULL > }; > for (i = 0; i < decl_size; ++i) { > +#ifdef DEBUG > + snprintf(name, sizeof(name), "TEMP%d.%c", > + first + i / 4, "xyzw"[i % 4]); > +#endif > idxs[1] = > lp_build_const_int32(bld_base->base.gallivm, i); > ctx->temps[first * TGSI_NUM_CHANNELS + i] = > - LLVMBuildGEP(builder, array_alloca, > idxs, 2, "temp"); > + LLVMBuildGEP(builder, array_alloca, > idxs, 2, name); > } > } > break; > } > case TGSI_FILE_INPUT: > { > unsigned idx; > for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) { > if (ctx->load_input) > ctx->load_input(ctx, idx, decl); > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/19] gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing
On Tue, Aug 09, 2016 at 12:36:38PM +0200, Nicolai Hähnle wrote: > From: Nicolai Hähnle > > We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard > --- > src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > index 41f24d3..e084248 100644 > --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > @@ -352,25 +352,20 @@ LLVMValueRef radeon_llvm_emit_fetch(struct > lp_build_tgsi_context *bld_base, > case TGSI_FILE_TEMPORARY: > if (reg->Register.Index >= ctx->temps_count) > return LLVMGetUndef(tgsi2llvmtype(bld_base, type)); > ptr = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + > swizzle]; > if (tgsi_type_is_64bit(type)) { > ptr2 = ctx->temps[reg->Register.Index * > TGSI_NUM_CHANNELS + swizzle + 1]; > return radeon_llvm_emit_fetch_64bit(bld_base, type, >LLVMBuildLoad(builder, ptr, > ""), >LLVMBuildLoad(builder, ptr2, > "")); > } > - LLVMValueRef array = get_alloca_for_array(bld_base, > reg->Register.File, reg->Register.Index); > - if (array) { > - return bitcast(bld_base, type, > load_value_from_array(bld_base, reg->Register.File, type, > - swizzle, reg->Register.Index, NULL)); > - } > result = LLVMBuildLoad(builder, ptr, ""); > break; > > case TGSI_FILE_OUTPUT: > ptr = lp_get_output_ptr(bld, reg->Register.Index, swizzle); > if (tgsi_type_is_64bit(type)) { > ptr2 = lp_get_output_ptr(bld, reg->Register.Index, > swizzle + 1); > return radeon_llvm_emit_fetch_64bit(bld_base, type, >LLVMBuildLoad(builder, ptr, > ""), >LLVMBuildLoad(builder, ptr2, > "")); > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/19] gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing
On Tue, Aug 09, 2016 at 12:36:39PM +0200, Nicolai Hähnle wrote: > From: Nicolai Hähnle > > We can use the pointer stored in the temps array directly. Reviewed-by: Tom Stellard > --- > src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 7 --- > 1 file changed, 7 deletions(-) > > diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > index e084248..7b96a58 100644 > --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > @@ -624,30 +624,23 @@ void radeon_llvm_emit_store(struct > lp_build_tgsi_context *bld_base, > } else { > switch(reg->Register.File) { > case TGSI_FILE_OUTPUT: > temp_ptr = > bld->outputs[reg->Register.Index][chan_index]; > if (tgsi_type_is_64bit(dtype)) > temp_ptr2 = > bld->outputs[reg->Register.Index][chan_index + 1]; > break; > > case TGSI_FILE_TEMPORARY: > { > - LLVMValueRef array; > if (reg->Register.Index >= ctx->temps_count) > continue; > - array = get_alloca_for_array(bld_base, > reg->Register.File, reg->Register.Index); > > - if (array) { > - store_value_to_array(bld_base, value, > reg->Register.File, chan_index, reg->Register.Index, > - NULL); > - continue; > - } > temp_ptr = ctx->temps[ TGSI_NUM_CHANNELS * > reg->Register.Index + chan_index]; > if (tgsi_type_is_64bit(dtype)) > temp_ptr2 = ctx->temps[ > TGSI_NUM_CHANNELS * reg->Register.Index + chan_index + 1]; > > break; > } > default: > return; > } > if (!tgsi_type_is_64bit(dtype)) > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/19] gallium/radeon: clean up emit_declaration for temporaries
On Tue, Aug 09, 2016 at 12:36:37PM +0200, Nicolai Hähnle wrote: > From: Nicolai Hähnle > > In the alloca'd array case, no longer create redundant and unused allocas > for the individual elements; create getelementptrs instead. Reviewed-by: Tom Stellard > --- > .../drivers/radeon/radeon_setup_tgsi_llvm.c| 27 > ++ > 1 file changed, 18 insertions(+), 9 deletions(-) > > diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > index d75311e..41f24d3 100644 > --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > @@ -408,81 +408,90 @@ static LLVMValueRef si_build_alloca_undef(struct > gallivm_state *gallivm, > LLVMValueRef ptr = lp_build_alloca(gallivm, type, name); > LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr); > return ptr; > } > > static void emit_declaration(struct lp_build_tgsi_context *bld_base, >const struct tgsi_full_declaration *decl) > { > struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); > LLVMBuilderRef builder = bld_base->base.gallivm->builder; > - unsigned first, last, i, idx; > + unsigned first, last, i; > switch(decl->Declaration.File) { > case TGSI_FILE_ADDRESS: > { >unsigned idx; > for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) { > unsigned chan; > for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) { >ctx->soa.addr[idx][chan] = > si_build_alloca_undef( > &ctx->gallivm, > ctx->soa.bld_base.uint_bld.elem_type, > ""); > } > } > break; > } > > case TGSI_FILE_TEMPORARY: > { > + LLVMValueRef array_alloca = NULL; > unsigned decl_size; > first = decl->Range.First; > last = decl->Range.Last; > decl_size = 4 * ((last - first) + 1); > if (decl->Declaration.Array) { > unsigned id = decl->Array.ArrayID - 1; > if (!ctx->arrays) { > int size = > bld_base->info->array_max[TGSI_FILE_TEMPORARY]; > ctx->arrays = CALLOC(size, > sizeof(ctx->arrays[0])); > - for (i = 0; i < size; ++i) { > - assert(!ctx->arrays[i].alloca);} > } > > ctx->arrays[id].range = decl->Range; > > /* If the array is more than 16 elements (each element >* is 32-bits), then store it in a vector. Storing the >* array in a vector will causes the compiler to store >* the array in registers and access it using indirect >* addressing. 16 is number of vector elements that >* LLVM will store in a register. >* FIXME: We shouldn't need to do this. LLVM should be >* smart enough to promote allocas int registers when >* profitable. >*/ > if (decl_size > 16) { > - ctx->arrays[id].alloca = > LLVMBuildAlloca(builder, > + array_alloca = LLVMBuildAlloca(builder, > LLVMArrayType(bld_base->base.vec_type, > decl_size),"array"); > + ctx->arrays[id].alloca = array_alloca; > } > } > - first = decl->Range.First; > - last = decl->Range.Last; > + > if (!ctx->temps_count) { > ctx->temps_count = > bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1; > ctx->temps = MALLOC(TGSI_NUM_CHANNELS * > ctx->temps_count * sizeof(LLVMValueRef)); > } > - for (idx = first; idx <= last; idx++) { > - for (i = 0; i < TGSI_NUM_CHANNELS; i++) { > - ctx->temps[idx * TGSI_NUM_CHANNELS + i] = > + if (!array_alloca) { > + for (i = 0; i < decl_size; ++i) { > + ctx->temps[first * TGSI_NUM_CHANNELS + i] = > > si_build_alloca_undef(bld_base->base.gallivm, > > bld_base->base.vec_type, > "temp"); > } > + } else { > + LLVMValueRef idxs[2] = { > + bld_base->uint_bld.zero, > + NULL > +
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On Aug 10, 2016 1:02 AM, "Erik Faye-Lund" wrote: > > On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke wrote: > > On Haswell (GL 3.3): > > > > total instructions in shared programs: 6208759 -> 6203860 (-0.08%) > > instructions in affected programs: 856541 -> 851642 (-0.57%) > > helped: 3157 > > HURT: 113 > > LOST: 7 > > GAINED: 15 > > > > On Broadwell (GL 4.4): > > > > total instructions in shared programs: 11637854 -> 11632016 (-0.05%) > > instructions in affected programs: 1055693 -> 1049855 (-0.55%) > > helped: 3900 > > HURT: 176 > > LOST: 1 > > GAINED: 18 > > > > Signed-off-by: Kenneth Graunke > > --- > > src/compiler/nir/nir_opt_algebraic.py | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py > > index 1cf614c..4e9896f 100644 > > --- a/src/compiler/nir/nir_opt_algebraic.py > > +++ b/src/compiler/nir/nir_opt_algebraic.py > > @@ -251,6 +251,10 @@ optimizations = [ > > (('ieq', 'a@bool', False), ('inot', 'a')), > > (('bcsel', a, True, False), ('ine', a, 0)), > > (('bcsel', a, False, True), ('ieq', a, 0)), > > + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), > > + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), > > + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, > > + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, > > (('bcsel', True, b, c), b), > > (('bcsel', False, b, c), c), > > # The result of this should be hit by constant propagation and, in the > > Same as the previous patch, this smells like intel-isms. Hardware that > has native bcsel with support for two inline immediates will do better > without. Why? If you're back-end handles b2f *worse* then bcsel of two components, then it's broken. Also, we have a lot of optimization in NIR to help cut through "fake booleans" where shaders use 0.0 and 1.0 and math operations instead of actual booleans. Recognising hand-coded b2f suddenly enables more of these optimization paths to run on the shader. That 0.57% isn't all just immediates being removed. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.
On Aug 10, 2016 1:00 AM, "Erik Faye-Lund" wrote: > > On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke wrote: > > On Haswell (GL 3.3): > > > > total instructions in shared programs: 6211678 -> 6211584 (-0.00%) > > instructions in affected programs: 18968 -> 18874 (-0.50%) > > helped: 62 > > HURT: 0 > > LOST: 0 > > GAINED: 4 > > > > On Broadwell (GL 4.4): > > > > total instructions in shared programs: 11638930 -> 11638181 (-0.01%) > > instructions in affected programs: 84768 -> 84019 (-0.88%) > > helped: 505 > > HURT: 45 > > > > Signed-off-by: Kenneth Graunke > > --- > > src/compiler/nir/nir_opt_algebraic.py | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py > > index 0f0896b..1cf614c 100644 > > --- a/src/compiler/nir/nir_opt_algebraic.py > > +++ b/src/compiler/nir/nir_opt_algebraic.py > > @@ -180,6 +180,7 @@ optimizations = [ > > (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))), > > (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a, b))), > > (('iand', 'a@bool', 1.0), ('b2f', a)), > > + (('ineg', ('b2i', 'a@bool')), a), > > (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF. > > (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI KILL_IF. > > # Comparison with the same args. Note that these are not done for > > Are you sure this shouldn't be conditional? This being an improvement > sounds like an intel-ism, and not something that applies to most > hardware... Why? You're replacing two instructions with zero. That should help everyone. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/19] gallium, radeonsi: temporary array improvements
FYI, I think I'm going to drop most of the gallium part of this series and replace it by a pass over the resulting TGSI to look at which components of the arrays are actually being used. Nicolai On 09.08.2016 12:36, Nicolai Hähnle wrote: Hi, this series was originally motivated by fixing a VM fault and ended up growing a bit larger :-) The goal of patches 1-7 is to change st/mesa so that it sets the UsageMask field in temporary array declarations. This ends up being helpful for lowering float and vecN arrays with N <= 3. The remaining patches are radeon (really radeonsi) specific. They begin with a bunch of cleanups, and then do two things: first, when alloca is used for arrays, make use of the UsageMask to allocate smaller arrays when possible. Second, add explicit bounds checks when accessing those arrays to prevent VM faults -- those temporary array accesses are not protected by limits in buffer descriptors. Note that the radeon part of the series exposes some pre-existing LLVM bugs in Piglit, at least one of which has already been encountered elsewhere, see http://reviews.llvm.org/D22556 and http://reviews.llvm.org/D23303. Please review! Thanks, Nicolai ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 96550] 12.0.0-rc3: mesa_dri_drivers.so linking fails with: relocation R_X86_64_32S against `V4F_COUNT' can not be used when making a shared object
https://bugs.freedesktop.org/show_bug.cgi?id=96550 --- Comment #1 from war...@o2.pl --- FYI: ppl from Phoronix forums located the cause of the problem: gen_matypes.c is passed to the compiler in order to generate assembly code in text form (compiler switch: -S), but with -flto GCC doesn't output any code nor data because that is postponed to link time. In such case gcc -S -flto generates a file with empty .text and .data sections. This isn't a GCC bug. Passing -ffat-lto-objects solves issue for me so maybe if compile passes -S together with -flto then the config should imply -ffat-lto-objects ? -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 97231] GL_DEPTH_CLAMP doesn't clamp to the far plane
https://bugs.freedesktop.org/show_bug.cgi?id=97231 Jules Blok changed: What|Removed |Added Attachment #125650|0 |1 is obsolete|| --- Comment #15 from Jules Blok --- Created attachment 125671 --> https://bugs.freedesktop.org/attachment.cgi?id=125671&action=edit apitrace file llvmpipe Finally got an apitrace running on llvmpipe by using a VM. Sorry for the delay. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] RadeonSI: GPU hang fix + 3 small changes
On 10.08.2016 01:34, Marek Olšák wrote: Hi, The first patch is pretty important and fixes the root cause of Overlord and Witcher 2 hangs. The GLSL revert can be reverted after that. I wonder what else this might fix. Anyway, nice catch! The series is Reviewed-by: Nicolai Hähnle Please review. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/8] st/mesa: determine states used or affected by shaders at compile time
On 10.08.2016 13:05, Marek Olšák wrote: On Tue, Aug 9, 2016 at 12:56 PM, Nicolai Hähnle wrote: On 07.08.2016 03:12, Marek Olšák wrote: From: Marek Olšák At compile time, each shader determines which ST_NEW flags should be set at shader bind time. This just sets the new field for all shaders. The next commit will use it. --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 175 - src/mesa/state_tracker/st_program.c| 37 +- src/mesa/state_tracker/st_program.h| 6 + 3 files changed, 215 insertions(+), 3 deletions(-) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 362559f..fd14766 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -,31 +,202 @@ get_mesa_program_tgsi(struct gl_context *ctx, static struct gl_program * get_mesa_program(struct gl_context *ctx, struct gl_shader_program *shader_program, struct gl_linked_shader *shader) { struct pipe_screen *pscreen = ctx->st->pipe->screen; unsigned ptarget = st_shader_stage_to_ptarget(shader->Stage); enum pipe_shader_ir preferred_ir = (enum pipe_shader_ir) pscreen->get_shader_param(pscreen, ptarget, PIPE_SHADER_CAP_PREFERRED_IR); + struct gl_program *prog = NULL; + if (preferred_ir == PIPE_SHADER_IR_NIR) { /* TODO only for GLSL VS/FS for now: */ switch (shader->Stage) { case MESA_SHADER_VERTEX: case MESA_SHADER_FRAGMENT: - return st_nir_get_mesa_program(ctx, shader_program, shader); + prog = st_nir_get_mesa_program(ctx, shader_program, shader); default: break; } + } else { + prog = get_mesa_program_tgsi(ctx, shader_program, shader); + } + + if (prog) { + uint64_t *states; + + /* This determines which states will be updated when the shader is + * bound. + */ + switch (shader->Stage) { + case MESA_SHADER_VERTEX: + states = &((struct st_vertex_program*)prog)->affected_states; + + *states = ST_NEW_VS_STATE | + ST_NEW_RASTERIZER | + ST_NEW_VERTEX_ARRAYS; + + if (prog->Parameters->NumParameters) +*states |= ST_NEW_VS_CONSTANTS; + + if (shader->num_samplers) +*states |= ST_NEW_VS_SAMPLER_VIEWS | + ST_NEW_RENDER_SAMPLERS; + + if (shader->NumImages) +*states |= ST_NEW_VS_IMAGES; + + if (shader->NumUniformBlocks) +*states |= ST_NEW_VS_UBOS; + + if (shader->NumShaderStorageBlocks) +*states |= ST_NEW_VS_SSBOS; + + if (shader->NumAtomicBuffers) +*states |= ST_NEW_VS_ATOMICS; I'm not overly fond of the code duplication here. Perhaps these could all be expressed relative to a stage-specific base flag? Not really. It shouldn't rely on the order the flags are declared. I can't make it better than this: I don't know why we can't just say we should be able to rely on the order, but the alternative below is indeed not really any better. Cheers, Nicolai diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index fd14766..3e5b322 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -6664,6 +6664,37 @@ get_mesa_program_tgsi(struct gl_context *ctx, return prog; } +static void +set_affected_state_flags(uint64_t *states, + struct gl_program *prog, + struct gl_linked_shader *shader, + uint64_t new_constants, + uint64_t new_sampler_views, + uint64_t new_samplers, + uint64_t new_images, + uint64_t new_ubos, + uint64_t new_ssbos, + uint64_t new_atomics) +{ + if (prog->Parameters->NumParameters) + *states |= new_constants; + + if (shader->num_samplers) + *states |= new_sampler_views | new_samplers; + + if (shader->NumImages) + *states |= new_images; + + if (shader->NumUniformBlocks) + *states |= new_ubos; + + if (shader->NumShaderStorageBlocks) + *states |= new_ssbos; + + if (shader->NumAtomicBuffers) + *states |= new_atomics; +} + static struct gl_program * get_mesa_program(struct gl_context *ctx, struct gl_shader_program *shader_program, @@ -6702,24 +6733,14 @@ get_mesa_program(struct gl_context *ctx, ST_NEW_RASTERIZER | ST_NEW_VERTEX_ARRAYS; - if (prog->Parameters->NumParameters) -*states |= ST_NEW_VS_CONSTANTS; - - if (shader->num_samplers) -*states |= ST_NEW_VS_SAMPLER_VIEWS | - ST_NEW_RENDER_SAMPLERS; - - if (shader->NumImages) -*states |
Re: [Mesa-dev] [PATCH] [rfc] glsl: allow invariant on fragment shader outputs.
Hi, On Mon, 2016-05-23 at 14:18 +1000, Dave Airlie wrote: > From: Dave Airlie > > A CTS test manages to generate this: > GL45-CTS.shading_language_420pack.qualifier_order > > I cannot find definitive evidence in the spec that it isn't > allowed. The specs mentions some things can't be used on > fragment shader outputs, but never specifies invariant as > one of them. Chris found GLSL1.20 was more explicit about this > but the language was changed/removed in GLSL 1.30. > > This might be a spec bug, I'm not sure. Specs are ambiguous, so I think it can be considered a bug in the specs, since they are not that explicit to ban the qualifier in the FS outputs. However, my understanding is that using the qualifier is not valid and, hence, this patch should be dropped. From page 27 (page 33 of the PDF) of the GLSL 1.20 spec: " Only variables output from a vertex shader can be candidates for invariance." But this later changes to: From page 37 (page 43 of the PDF) of the GLSL 1.30 spec: " Only variables output from a shader can be candidates for invariance." However, there is little sense on allowing this qualifier for the FS outputs and, actually, we can also read, in this direction: From page 37 (page 43 of the PDF) of the GLSL 1.30 spec: " Initially, by default, all output variables are allowed to be variant. To force all output variables to be invariant, use the pragma #pragma STDGL invariant(all) before all declarations in a shader. If this pragma is used after the declaration of any variables or functions, then the set of outputs that behave as invariant is undefined. It is an error to use this pragma in a fragment shader." Then, I think it is safe to assume that the individual use of "invariant" is also forbidden in FS outputs. However, would the conclusion be that FS outputs are actually allowed to be qualified as invariant, this patch LGTM, then. -- Br, Andres ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Moving amdgpu/addrlib into a git submodule
On 10.08.2016 06:22, Edward O'Callaghan wrote: Well taking a step back here for a second. My initial thoughts are - what, conceptually, is the primary user of addrlib from an extremely high level view? If the case is mesa3d then perhaps mesa should be treated as the upstream from which other users pull. Mesa really doesn't make logical sense as a dependency for the HSA stack. If the case is that it is indeed truly agnostic/general then perhaps deferring to the packaging expertise/infrastructure of the distro's (aka pkgconfig, etc..) is perhaps the best mechanism. In the latter case, addrlib would be released with versioning which should be fairly maintainable as it is a slowly moving target right? Please no. Those things are a fragile mess. I appreciate it the thought, but there's a reason why major version upgrades of shared libraries are feared. Nicolai Just my two cent, hope its helpful! Kind Regards, Edward. On 08/10/2016 01:39 AM, Christian König wrote: Am 09.08.2016 um 15:47 schrieb Nicolai Hähnle: Hi everybody, addrlib is the addressing and alignment calculator which is used by radeonsi. It's developed (and also used) internally at AMD, and so far we've had one open source copy living in the Mesa repository at src/gallium/winsys/amdgpu/drm/addrlib. The question of using addrlib in non-Mesa parts of our open-source stack has come up, in particular in relation to compute. We'd obviously like to share the code rather than having multiple copies flying around. Since the interface of addrlib is slow-moving but unstable, shared linking is not an option. I think the best way forward is to create a dedicated repository for addrlib which is then integrated into Mesa as a git submodule. The point of this email is to gather feedback from the Mesa community on this plan, which is explicitly: (0) Create an addrlib repository, say amd/addrlib on fd.o. (1) Add it as a git submodule to the Mesa repository. (2) Fix up whatever aspects of the build system that may be affected (perhaps for building source tarballs). (3) Go back to mostly ignoring addrlib, except for trying to get better at syncing with the internal closed-source version. From initial experiments, the impact on users interested in radeon is that they will have to run `git submodule init` and then occasionally `git submodule update`. Users who do not build radeonsi should be able to ignore the change completely. There are alternatives. For example, ROCm uses Google's repo tool already. But for Mesa, git submodule looks like a lightweight, well supported and overall conservative option that everybody should already have installed. If there are good arguments for something else, let's hear them! Another point: if we proceed with this plan, I think we should consider moving addrlib into src/amd/addrlib. There are two reasons: First, transitioning to a submodule *without* changing the directory is probably more fragile, i.e. what happens when you switch between checkouts before and after the transition. Second, if/when radv ends up being merged into Mesa master, it makes sense to have addrlib there anyway. Thoughts? Complaints? Praise? Well using git submodule is a possibility and we had rather good experience with that in GStreamer. But it would remove one major argument to beating the addrlib guys towards a stable interface and/or proper library version handling. Moving it into libdrm is clearly not an option because then you would need to use versioning for the whole libdrm_amdgpu library which we don't want. Christian. Nicolai ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Moving amdgpu/addrlib into a git submodule
On 09.08.2016 22:12, Dave Airlie wrote: tbh, git submodules are more annoying than they need to be, and I'm not really terribly excited to use that for something that is a build dependency. Maybe just move it into libdrm instead? I've only had to use git submodules once with spice project, and it was a nightmare. It makes packaging etc a real pita. It would be nice to have something more concrete to go on, like what are the problems _really_ :) Alternatives are something like a fetch external sources script, that does git submodules but does it better, you'll see Vulkan-CTS etc use something like that, it would have to be integrated with the build system a bit better though. What pain do submodules give you that wouldn't be even _worse_ with a separate script for fetching external sources? Look, there are a bunch of different use cases here. Distro packagers; Mesa contributors and bleeding-edge users; and us (which is slightly separate because we're not _just_ Mesa contributors but have parts of the stack elsewhere, which is why this is coming up in the first place). Most of the complaints seem to come from distro packaging, though I have yet to understand what the exact problems are. For Mesa contributors and users, I'd really like to avoid the hassle of having to get another repository manually. For us, in addition to avoiding the same hassles there's the question of maintainability, and perhaps (with a huge load of optimism) getting the closed source folks a bit more involved in the fact that there's another world out here as well. On those axes, I've seen one alternative suggestion that makes sense to me, and that is subtree-merges. I have to play with them a bit, but the initial impression is good. They seem like a more git-like solution. That would also be something new for Mesa, though, since it would mean more exceptions to the otherwise linear history. Nicolai ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: in ATI fs don't assume TEMP0=REG0
Pushed, thanks. Marek On Mon, Aug 8, 2016 at 12:48 AM, Miklós Máté wrote: > The temporaries are allocated dynamically. > > Signed-off-by: Miklós Máté > --- > src/mesa/state_tracker/st_atifs_to_tgsi.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c > b/src/mesa/state_tracker/st_atifs_to_tgsi.c > index 66f442a..ae08796 100644 > --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c > +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c > @@ -691,6 +691,7 @@ transform_inst: >struct tgsi_full_instruction inst; >unsigned i; >int fogc_index = -1; > + int reg0_index = current_inst->Src[0].Register.Index; > >/* find FOGC input */ >for (i = 0; i < ctx->info.num_inputs; i++) { > @@ -804,11 +805,11 @@ transform_inst: >inst.Instruction.Opcode = TGSI_OPCODE_LRP; >inst.Instruction.NumDstRegs = 1; >inst.Dst[0].Register.File = TGSI_FILE_TEMPORARY; > - inst.Dst[0].Register.Index = 0; > + inst.Dst[0].Register.Index = reg0_index; >inst.Dst[0].Register.WriteMask = TGSI_WRITEMASK_XYZW; >inst.Instruction.NumSrcRegs = 3; >SET_SRC(&inst, 0, TGSI_FILE_TEMPORARY, ctx->fog_factor_temp, X, X, X, > Y); > - SET_SRC(&inst, 1, TGSI_FILE_TEMPORARY, 0, X, Y, Z, W); > + SET_SRC(&inst, 1, TGSI_FILE_TEMPORARY, reg0_index, X, Y, Z, W); >SET_SRC(&inst, 2, TGSI_FILE_CONSTANT, MAX_NUM_FRAGMENT_CONSTANTS_ATI + > 1, X, Y, Z, W); >tctx->emit_instruction(tctx, &inst); > } > -- > 2.8.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] gallium/radeon: use lp_build_alloca_undef
For the series: Reviewed-by: Marek Olšák Marek On Tue, Aug 9, 2016 at 12:38 PM, Nicolai Hähnle wrote: > From: Nicolai Hähnle > > Avoid building all those store 0 / store undef instrucction pairs that > end up getting removed anyway. > --- > src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 15 +++ > 1 file changed, 3 insertions(+), 12 deletions(-) > > diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > index 6a010d5..b419add 100644 > --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c > @@ -483,43 +483,34 @@ static LLVMValueRef fetch_system_value(struct > lp_build_tgsi_context *bld_base, > struct gallivm_state *gallivm = bld_base->base.gallivm; > > LLVMValueRef cval = ctx->system_values[reg->Register.Index]; > if (LLVMGetTypeKind(LLVMTypeOf(cval)) == LLVMVectorTypeKind) { > cval = LLVMBuildExtractElement(gallivm->builder, cval, >lp_build_const_int32(gallivm, > swizzle), ""); > } > return bitcast(bld_base, type, cval); > } > > -static LLVMValueRef si_build_alloca_undef(struct gallivm_state *gallivm, > - LLVMTypeRef type, > - const char *name) > -{ > - LLVMValueRef ptr = lp_build_alloca(gallivm, type, name); > - LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr); > - return ptr; > -} > - > static void emit_declaration(struct lp_build_tgsi_context *bld_base, > const struct tgsi_full_declaration *decl) > { > struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base); > LLVMBuilderRef builder = bld_base->base.gallivm->builder; > unsigned first, last, i; > switch(decl->Declaration.File) { > case TGSI_FILE_ADDRESS: > { > unsigned idx; > for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) > { > unsigned chan; > for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) { > -ctx->soa.addr[idx][chan] = > si_build_alloca_undef( > +ctx->soa.addr[idx][chan] = > lp_build_alloca_undef( > &ctx->gallivm, > ctx->soa.bld_base.uint_bld.elem_type, > ""); > } > } > break; > } > > case TGSI_FILE_TEMPORARY: > { > char name[16] = ""; > @@ -574,21 +565,21 @@ static void emit_declaration(struct > lp_build_tgsi_context *bld_base, > ctx->temps_count = > bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1; > ctx->temps = MALLOC(TGSI_NUM_CHANNELS * > ctx->temps_count * sizeof(LLVMValueRef)); > } > if (!array_alloca) { > for (i = 0; i < decl_size; ++i) { > #ifdef DEBUG > snprintf(name, sizeof(name), "TEMP%d.%c", > first + i / 4, "xyzw"[i % 4]); > #endif > ctx->temps[first * TGSI_NUM_CHANNELS + i] = > - > si_build_alloca_undef(bld_base->base.gallivm, > + > lp_build_alloca_undef(bld_base->base.gallivm, > > bld_base->base.vec_type, > name); > } > } else { > LLVMValueRef idxs[2] = { > bld_base->uint_bld.zero, > NULL > }; > LLVMValueRef undef = NULL; > unsigned j = 0; > @@ -633,21 +624,21 @@ static void emit_declaration(struct > lp_build_tgsi_context *bld_base, > } > break; > > case TGSI_FILE_OUTPUT: > { > unsigned idx; > for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) > { > unsigned chan; > assert(idx < RADEON_LLVM_MAX_OUTPUTS); > for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) { > - ctx->soa.outputs[idx][chan] = > si_build_alloca_undef( > + ctx->soa.outputs[idx][chan] = > lp_build_alloca_undef( > &ctx->gallivm, > ctx->soa.bld_base.base.elem_type, ""); > } > } > break; > } > > case TGSI_FILE_MEMORY: >
Re: [Mesa-dev] [PATCH] st/nine: Fix invalid attempt to use indirect draws.
Pushed. Marek On Wed, Aug 10, 2016 at 1:16 PM, trevor.davenp...@gmail.com wrote: > I don't have commit access so someone else will need to push this out. > > Trevor > > > On Aug 10, 2016 2:56 AM, "Marek Olšák" wrote: >> >> Reviewed-by: Marek Olšák >> >> Marek >> >> On Wed, Aug 10, 2016 at 5:28 AM, Trevor Davenport >> wrote: >> > Since commit 6d7177f01b231e9fe79a558c28d2b562a218d7ea, radeonsi >> > would take a different path if info->indirect_params was not >> > initialized properly. Nine was not initializating this field. >> > --- >> > src/gallium/state_trackers/nine/device9.c | 1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git src/gallium/state_trackers/nine/device9.c >> > src/gallium/state_trackers/nine/device9.c >> > index d233304..3f6577c 100644 >> > --- src/gallium/state_trackers/nine/device9.c >> > +++ src/gallium/state_trackers/nine/device9.c >> > @@ -2935,6 +2935,7 @@ init_draw_info(struct pipe_draw_info *info, >> > info->restart_index = 0; >> > info->count_from_stream_output = NULL; >> > info->indirect = NULL; >> > +info->indirect_params = NULL; >> > } >> > >> > HRESULT NINE_WINAPI >> > -- >> > 2.7.4 >> > >> > ___ >> > mesa-dev mailing list >> > mesa-dev@lists.freedesktop.org >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] RadeonSI: GPU hang fix + 3 small changes
This series is, Reviewed-by: Edward O'Callaghan On 08/10/2016 09:34 AM, Marek Olšák wrote: > Hi, > > The first patch is pretty important and fixes the root cause of Overlord and > Witcher 2 hangs. The GLSL revert can be reverted after that. > > Please review. > > Marek > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/nine: Fix invalid attempt to use indirect draws.
I don't have commit access so someone else will need to push this out. Trevor On Aug 10, 2016 2:56 AM, "Marek Olšák" wrote: > Reviewed-by: Marek Olšák > > Marek > > On Wed, Aug 10, 2016 at 5:28 AM, Trevor Davenport > wrote: > > Since commit 6d7177f01b231e9fe79a558c28d2b562a218d7ea, radeonsi > > would take a different path if info->indirect_params was not > > initialized properly. Nine was not initializating this field. > > --- > > src/gallium/state_trackers/nine/device9.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git src/gallium/state_trackers/nine/device9.c > src/gallium/state_trackers/nine/device9.c > > index d233304..3f6577c 100644 > > --- src/gallium/state_trackers/nine/device9.c > > +++ src/gallium/state_trackers/nine/device9.c > > @@ -2935,6 +2935,7 @@ init_draw_info(struct pipe_draw_info *info, > > info->restart_index = 0; > > info->count_from_stream_output = NULL; > > info->indirect = NULL; > > +info->indirect_params = NULL; > > } > > > > HRESULT NINE_WINAPI > > -- > > 2.7.4 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/8] st/mesa: determine states used or affected by shaders at compile time
On Tue, Aug 9, 2016 at 12:56 PM, Nicolai Hähnle wrote: > On 07.08.2016 03:12, Marek Olšák wrote: >> >> From: Marek Olšák >> >> At compile time, each shader determines which ST_NEW flags should be set >> at shader bind time. >> >> This just sets the new field for all shaders. The next commit will use it. >> --- >> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 175 >> - >> src/mesa/state_tracker/st_program.c| 37 +- >> src/mesa/state_tracker/st_program.h| 6 + >> 3 files changed, 215 insertions(+), 3 deletions(-) >> >> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> index 362559f..fd14766 100644 >> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> @@ -,31 +,202 @@ get_mesa_program_tgsi(struct gl_context *ctx, >> >> static struct gl_program * >> get_mesa_program(struct gl_context *ctx, >> struct gl_shader_program *shader_program, >> struct gl_linked_shader *shader) >> { >> struct pipe_screen *pscreen = ctx->st->pipe->screen; >> unsigned ptarget = st_shader_stage_to_ptarget(shader->Stage); >> enum pipe_shader_ir preferred_ir = (enum pipe_shader_ir) >>pscreen->get_shader_param(pscreen, ptarget, >> PIPE_SHADER_CAP_PREFERRED_IR); >> + struct gl_program *prog = NULL; >> + >> if (preferred_ir == PIPE_SHADER_IR_NIR) { >>/* TODO only for GLSL VS/FS for now: */ >>switch (shader->Stage) { >>case MESA_SHADER_VERTEX: >>case MESA_SHADER_FRAGMENT: >> - return st_nir_get_mesa_program(ctx, shader_program, shader); >> + prog = st_nir_get_mesa_program(ctx, shader_program, shader); >>default: >> break; >>} >> + } else { >> + prog = get_mesa_program_tgsi(ctx, shader_program, shader); >> + } >> + >> + if (prog) { >> + uint64_t *states; >> + >> + /* This determines which states will be updated when the shader is >> + * bound. >> + */ >> + switch (shader->Stage) { >> + case MESA_SHADER_VERTEX: >> + states = &((struct st_vertex_program*)prog)->affected_states; >> + >> + *states = ST_NEW_VS_STATE | >> + ST_NEW_RASTERIZER | >> + ST_NEW_VERTEX_ARRAYS; >> + >> + if (prog->Parameters->NumParameters) >> +*states |= ST_NEW_VS_CONSTANTS; >> + >> + if (shader->num_samplers) >> +*states |= ST_NEW_VS_SAMPLER_VIEWS | >> + ST_NEW_RENDER_SAMPLERS; >> + >> + if (shader->NumImages) >> +*states |= ST_NEW_VS_IMAGES; >> + >> + if (shader->NumUniformBlocks) >> +*states |= ST_NEW_VS_UBOS; >> + >> + if (shader->NumShaderStorageBlocks) >> +*states |= ST_NEW_VS_SSBOS; >> + >> + if (shader->NumAtomicBuffers) >> +*states |= ST_NEW_VS_ATOMICS; > > > I'm not overly fond of the code duplication here. Perhaps these could all be > expressed relative to a stage-specific base flag? Not really. It shouldn't rely on the order the flags are declared. I can't make it better than this: diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index fd14766..3e5b322 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -6664,6 +6664,37 @@ get_mesa_program_tgsi(struct gl_context *ctx, return prog; } +static void +set_affected_state_flags(uint64_t *states, + struct gl_program *prog, + struct gl_linked_shader *shader, + uint64_t new_constants, + uint64_t new_sampler_views, + uint64_t new_samplers, + uint64_t new_images, + uint64_t new_ubos, + uint64_t new_ssbos, + uint64_t new_atomics) +{ + if (prog->Parameters->NumParameters) + *states |= new_constants; + + if (shader->num_samplers) + *states |= new_sampler_views | new_samplers; + + if (shader->NumImages) + *states |= new_images; + + if (shader->NumUniformBlocks) + *states |= new_ubos; + + if (shader->NumShaderStorageBlocks) + *states |= new_ssbos; + + if (shader->NumAtomicBuffers) + *states |= new_atomics; +} + static struct gl_program * get_mesa_program(struct gl_context *ctx, struct gl_shader_program *shader_program, @@ -6702,24 +6733,14 @@ get_mesa_program(struct gl_context *ctx, ST_NEW_RASTERIZER | ST_NEW_VERTEX_ARRAYS; - if (prog->Parameters->NumParameters) -*states |= ST_NEW_VS_CONSTANTS; - - if (shader->num_samplers) -*states |= ST_NEW_VS_SAMPLER_VIEWS | - ST_NEW_RENDER_SAMPLERS; - - if (sha
Re: [Mesa-dev] [PATCH 0/8] More state optimizations for st/mesa
On Tue, Aug 9, 2016 at 12:58 PM, Nicolai Hähnle wrote: > I sent comments on patches 4 & 6. Apart from that, the series is > > Reviewed-by: Nicolai Hähnle > > On 07.08.2016 03:12, Marek Olšák wrote: >> >> PS: In order to make reviewing easier, all my patches have 10 lines >> of contexts instead of 3. That will be the default for all my work >> from now on. > > > I like that idea, I've made the same change to my git config. BTW, it breaks the Patchwork patch-matching when doing git-push, but let's consider that a patchwork bug. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/nine: Fix invalid attempt to use indirect draws.
Reviewed-by: Marek Olšák Marek On Wed, Aug 10, 2016 at 5:28 AM, Trevor Davenport wrote: > Since commit 6d7177f01b231e9fe79a558c28d2b562a218d7ea, radeonsi > would take a different path if info->indirect_params was not > initialized properly. Nine was not initializating this field. > --- > src/gallium/state_trackers/nine/device9.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git src/gallium/state_trackers/nine/device9.c > src/gallium/state_trackers/nine/device9.c > index d233304..3f6577c 100644 > --- src/gallium/state_trackers/nine/device9.c > +++ src/gallium/state_trackers/nine/device9.c > @@ -2935,6 +2935,7 @@ init_draw_info(struct pipe_draw_info *info, > info->restart_index = 0; > info->count_from_stream_output = NULL; > info->indirect = NULL; > +info->indirect_params = NULL; > } > > HRESULT NINE_WINAPI > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke wrote: > On Haswell (GL 3.3): > > total instructions in shared programs: 6208759 -> 6203860 (-0.08%) > instructions in affected programs: 856541 -> 851642 (-0.57%) > helped: 3157 > HURT: 113 > LOST: 7 > GAINED: 15 > > On Broadwell (GL 4.4): > > total instructions in shared programs: 11637854 -> 11632016 (-0.05%) > instructions in affected programs: 1055693 -> 1049855 (-0.55%) > helped: 3900 > HURT: 176 > LOST: 1 > GAINED: 18 > > Signed-off-by: Kenneth Graunke > --- > src/compiler/nir/nir_opt_algebraic.py | 4 > 1 file changed, 4 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index 1cf614c..4e9896f 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -251,6 +251,10 @@ optimizations = [ > (('ieq', 'a@bool', False), ('inot', 'a')), > (('bcsel', a, True, False), ('ine', a, 0)), > (('bcsel', a, False, True), ('ieq', a, 0)), > + (('bcsel@32', a, 1.0, 0.0), ('b2f', ('ine', a, 0))), > + (('bcsel@32', a, 0.0, 1.0), ('b2f', ('ieq', a, 0))), > + (('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', ('ine', a, 0, > + (('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('ieq', a, 0, > (('bcsel', True, b, c), b), > (('bcsel', False, b, c), c), > # The result of this should be hit by constant propagation and, in the Same as the previous patch, this smells like intel-isms. Hardware that has native bcsel with support for two inline immediates will do better without. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] nir: Convert ineg(b2i(a)) to a if it's a boolean.
On Wed, Aug 10, 2016 at 4:30 AM, Kenneth Graunke wrote: > On Haswell (GL 3.3): > > total instructions in shared programs: 6211678 -> 6211584 (-0.00%) > instructions in affected programs: 18968 -> 18874 (-0.50%) > helped: 62 > HURT: 0 > LOST: 0 > GAINED: 4 > > On Broadwell (GL 4.4): > > total instructions in shared programs: 11638930 -> 11638181 (-0.01%) > instructions in affected programs: 84768 -> 84019 (-0.88%) > helped: 505 > HURT: 45 > > Signed-off-by: Kenneth Graunke > --- > src/compiler/nir/nir_opt_algebraic.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index 0f0896b..1cf614c 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -180,6 +180,7 @@ optimizations = [ > (('fmul', ('b2f', a), ('b2f', b)), ('b2f', ('iand', a, b))), > (('fsat', ('fadd', ('b2f', a), ('b2f', b))), ('b2f', ('ior', a, b))), > (('iand', 'a@bool', 1.0), ('b2f', a)), > + (('ineg', ('b2i', 'a@bool')), a), > (('flt', ('fneg', ('b2f', a)), 0), a), # Generated by TGSI KILL_IF. > (('flt', ('fsub', 0.0, ('b2f', a)), 0), a), # Generated by TGSI KILL_IF. > # Comparison with the same args. Note that these are not done for Are you sure this shouldn't be conditional? This being an improvement sounds like an intel-ism, and not something that applies to most hardware... ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl/dri2: dri2_make_current: Release previous context's display
On 10/08/16 03:00 PM, Nicolas Boichat wrote: > eglMakeCurrent can also be used to change the active display. In that > case, we need to decrement ref_count of the previous display (possibly > destroying it), and increment it on the next display. > > Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so > we only need to test if old_ctx is non-NULL. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214 > Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display) > Cc: "12.0" > Reported-by: Alexandr Zelinsky > Tested-by: Alexandr Zelinsky > Signed-off-by: Nicolas Boichat > --- > src/egl/drivers/dri2/egl_dri2.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c > index 3205a36..701e42a 100644 > --- a/src/egl/drivers/dri2/egl_dri2.c > +++ b/src/egl/drivers/dri2/egl_dri2.c > @@ -1285,8 +1285,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, > _EGLSurface *dsurf, > >if (!unbind) > dri2_dpy->ref_count++; > - if (old_dsurf || old_rsurf || old_ctx) > - dri2_display_release(disp); > + if (old_ctx) { > + EGLDisplay old_disp = > _eglGetDisplayHandle(old_ctx->Resource.Display); > + dri2_display_release(old_disp); > + } Unfortunately, this change breaks the piglit test "spec@egl 1.4@eglterminate then unbind context", because old_ctx != NULL but old_ctx->Resource.Display == NULL. Modifying the test to if (old_ctx && old_ctx->Resource.Display) { fixes the regression and doesn't seem to cause any other problems. Alexandr, does the patch still fix your problem with that modification? Nicolas, this regression is also reproducible with LIBGL_ALWAYS_SOFTWARE=1 . Please get used to testing your changes like that and only send out changes for review which don't cause any regressions. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir: Add support for scalarizing load_interpolated_input.
Signed-off-by: Kenneth Graunke --- src/compiler/nir/nir_lower_io_to_scalar.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/compiler/nir/nir_lower_io_to_scalar.c b/src/compiler/nir/nir_lower_io_to_scalar.c index f2345d5..8be75e4 100644 --- a/src/compiler/nir/nir_lower_io_to_scalar.c +++ b/src/compiler/nir/nir_lower_io_to_scalar.c @@ -48,8 +48,10 @@ lower_load_input_to_scalar(nir_builder *b, nir_intrinsic_instr *intr) nir_intrinsic_set_base(chan_intr, nir_intrinsic_base(intr)); nir_intrinsic_set_component(chan_intr, nir_intrinsic_component(intr) + i); - /* offset */ - chan_intr->src[0] = intr->src[0]; + + /* offset and possibly barycentric coordinates */ + for (int i = 0; i < nir_intrinsic_infos[intr->intrinsic].num_srcs; i++) + chan_intr->src[i] = intr->src[i]; nir_builder_instr_insert(b, &chan_intr->instr); @@ -112,6 +114,7 @@ nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask) switch (intr->intrinsic) { case nir_intrinsic_load_input: + case nir_intrinsic_load_interpolated_input: if (mask & nir_var_shader_in) lower_load_input_to_scalar(&b, intr); break; -- 2.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add an IO scalarizing pass using the intrinsic's first_component.
On Friday, August 5, 2016 4:27:03 PM PDT Eric Anholt wrote: > vc4 wants to have per-scalar IO load/stores so that dead code elimination > can happen on a more granular basis, which it has been doing in the > backend using a multiplication by 4 of the intrinsic's driver_location. > We can represent it properly in the NIR using the first_component field, > though. > --- > > Next part of the vc4 GTN series. I can't do uniforms because they > don't have a first-component field. > > NIR developers -- is this the right usage of the first-component > field? It looks right to me. > Also, do I need to be splitting up the variable declarations > if I do this? Leaving them as-is should be fine. Variables are pretty meaningless after nir_lower_io(). i965 basically doesn't use them at all anymore...with my FS input rework, everything's in the intrinsics. > There's a check during shader printing that the > fractional component of the variable matches the component field of > the intrinsic, even if the intrinsics's component value would lie > within the vector variable. Not seeing that code offhand...my guess is that it's trying to print the name of the variable being loaded. It's probably just looking for one with location_frac == component, instead of checking whether location_frac <= component < location_frac + vector_elements That could probably be spiffed up for nicer pretty printing, but it shouldn't actually affect any functionality. I think it'd be great to handle load_interpolated_input here too; I'll send a follow-up patch to do that. Feel free to review/squash as you see fit. Reviewed-by: Kenneth Graunke > > src/compiler/Makefile.sources | 1 + > src/compiler/nir/nir.h| 1 + > src/compiler/nir/nir_lower_io_to_scalar.c | 129 > ++ > 3 files changed, 131 insertions(+) > create mode 100644 src/compiler/nir/nir_lower_io_to_scalar.c > > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources > index 0ff9b23e4c38..04016d32db56 100644 > --- a/src/compiler/Makefile.sources > +++ b/src/compiler/Makefile.sources > @@ -201,6 +201,7 @@ NIR_FILES = \ > nir/nir_lower_indirect_derefs.c \ > nir/nir_lower_io.c \ > nir/nir_lower_io_to_temporaries.c \ > + nir/nir_lower_io_to_scalar.c \ > nir/nir_lower_io_types.c \ > nir/nir_lower_passthrough_edgeflags.c \ > nir/nir_lower_phis_to_scalar.c \ > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index 9ce5be2176d9..379a5f8e022e 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -2398,6 +2398,7 @@ void nir_lower_alu_to_scalar(nir_shader *shader); > void nir_lower_load_const_to_scalar(nir_shader *shader); > > void nir_lower_phis_to_scalar(nir_shader *shader); > +void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask); > > void nir_lower_samplers(nir_shader *shader, > const struct gl_shader_program *shader_program); > diff --git a/src/compiler/nir/nir_lower_io_to_scalar.c > b/src/compiler/nir/nir_lower_io_to_scalar.c > new file mode 100644 > index ..f2345d58acfc > --- /dev/null > +++ b/src/compiler/nir/nir_lower_io_to_scalar.c > @@ -0,0 +1,129 @@ > +/* > + * Copyright © 2016 Broadcom > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#include "nir.h" > +#include "nir_builder.h" > + > +/** @file nir_lower_io_to_scalar.c > + * > + * Replaces nir_load_input/nir_store_output operations with num_components != > + * 1 with individual per-channel operations. > + */ > + > +static void > +lower_load_input_to_scalar(nir_builder *b, nir_intrinsic_instr *intr) > +{ > + b->cursor = nir_before_instr(&intr->instr); > + > + assert(intr->dest.is_ssa); > + > + nir_ssa_def *loads[4]; > + > + for (unsigned i = 0; i < intr->num_components; i++) {
Re: [Mesa-dev] [PATCH 02/13] glsl: remove dead builtins before assigning varying locations
Timothy Arceri writes: > Builtins already have locations assigned so this shouldn't > changing anything. We want to call it earlier so we can tranform "change" Other than that, 1-4 are: Reviewed-by: Eric Anholt and I'll see if I can get to some more tomorrow. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev