Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Tuesday, March 22, 2016 3:33:35 PM PDT Jason Ekstrand wrote:
> This is mostly a re-send of a patch series I've had floating around in one form or another for quite some time. It's basically the same except that the original version was missing a work-around for Sandy Bridge. For a while, I wasn't really pushing to get it merged because I couldn't demonstrate any actual performance benefit from pushing arrays. However, with the Vulkan API, the concept of push constants is directly exposed to the user and we really need to be able to indirect on them. This series makes the FS backend 100% ready for indirect push constants; vec4 will take a little more work.
>
> It's worth noting that we've been carrying these patches around in our Vulkan driver for probably 3 or 4 months now and it's working great.
>
> For those that prefer to review on a branch:
>
> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
>
> I think Kristian has mostly reviewed these patches. However, he never sent any R-Bs to the list. I'd also like Ken or Matt to look at it from a design perspective.
>

Sorry, I ran out of time to review these today.
Here's what I have:

> Jason Ekstrand (15):
>   i965/fs: Add support for doing MOV_INDIRECT on uniforms
>   i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions
>   i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr
>   i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
>   nir: Add another index to load_uniform to specify the range read

The above are
Reviewed-by: Kenneth Graunke

>   i965/fs: Use MOV_INDIRECT for all indirect uniform loads

Acked-by: Kenneth Graunke

>   i965/fs: Get rid of reladdr

Reviewed-by: Kenneth Graunke

>   i965/fs: Stop relying on param_size in assign_constant_locations

Acked-by: Kenneth Graunke

>   i965/fs: Get rid of the param_size array
>   i965/vec4: Inline get_pull_constant_offset

Reviewed-by: Kenneth Graunke

>   i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants

Acked-by: Kenneth Graunke

>   i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD
>   i965/vec4: Get rid of the uniform_size array
>   i965/fs: Rename demote_pull_constants to lower_constant_loads

Reviewed-by: Kenneth Graunke

>   i965/fs: Push small uniform arrays

Acked-by: Kenneth Graunke

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Thu, Apr 7, 2016 at 10:01 PM, Matt Turner wrote:
> On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand wrote:
> > This is mostly a re-send of a patch series I've had floating around in one form or another for quite some time. It's basically the same except that the original version was missing a work-around for Sandy Bridge. For a while, I wasn't really pushing to get it merged because I couldn't demonstrate any actual performance benefit from pushing arrays. However, with the Vulkan API, the concept of push constants is directly exposed to the user and we really need to be able to indirect on them. This series makes the FS backend 100% ready for indirect push constants; vec4 will take a little more work.
> >
> > It's worth noting that we've been carrying these patches around in our Vulkan driver for probably 3 or 4 months now and it's working great.
> >
> > For those that prefer to review on a branch:
> >
> > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> >
> > I think Kristian has mostly reviewed these patches. However, he never sent any R-Bs to the list. I'd also like Ken or Matt to look at it from a design perspective.
>
> I don't know what I think. I'm sympathetic to Curro's argument, but in the absence of more data it's hard to judge anything really. I'm not at all sympathetic to
>
> """
> Do I have a proof-of-concept in code, no. However, I've run through it in my head and I have a pretty good idea what it would look like. You are free to go off and do it if you don't believe me, but I don't really want to hold things up while you do.
> """
>
> That's what... An Appeal to Your Brain? :)
>
> I don't know how to proceed on that front if no one is willing or interested in trying to implement it using reladdr.
>
> I ran shader-db.
>
> total instructions in shared programs: 7113290 -> 7161760 (0.68%)
> instructions in affected programs: 866011 -> 914481 (5.60%)
> helped: 0
> HURT: 7180
>
> total cycles in shared programs: 64705926 -> 64776118 (0.11%)
> cycles in affected programs: 4951554 -> 5021746 (1.42%)
> helped: 1605
> HURT: 5204
>
> of which the overwhelming majority is vertex shaders (why? this series is i965/fs). FS changes are just
>
> instructions in affected programs: 13550 -> 14132 (4.30%)
> helped: 0
> HURT: 50
>
> but I'm having a hard time finding shaders that actually use the address register.
>
> What's going on with the shader-db regressions?
>

As I figured, the shader-db issues are primarily a problem with inconsistent use of D and UD. This patch fixes that problem:

https://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/i965-uniforms-v3&id=0dbf9c8ee415b19073efe92fa586fddf22b725e6

I've pushed a version of the series with that patch to fd.o here:

https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/i965-uniforms-v3

There is still a small shader-db delta in vec4 which comes from changing the algorithm we use to place pull constants in the buffer. The original algorithm walks over the instructions and places uniforms in the pull constant buffer as it finds instructions that use them. The new algorithm places them in order of their uniform number, which is effectively the order in which they are declared in the shader. There are a number of shaders from Valve games (left4dead, portal 2, tf2) that seem to have a single uniform array they use *a lot* and others that they use less frequently. The one they use most frequently happens to also be used first, so it gets placed first in the buffer with the old algorithm and later with the new one.
The only reason this makes any difference whatsoever is that whatever uniform gets placed first in the buffer is at offset 0, so you don't need to add a constant offset to the array offset and the address calculation has one less instruction. In these Valve games, that uniform is used so much more often than the rest that the extra instruction shows up in the shader-db results.
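For illustration, here is a minimal sketch of the two placement orders described above. It is written in Python rather than the driver's C++ and is not code from the series: the uniform names, the sizes, and the cost model (one extra ADD per access to an array whose base offset is non-zero) are all assumptions chosen to mirror the explanation.

```python
from typing import Dict, List


def place_in_first_use_order(uses: List[str], sizes: Dict[str, int]) -> Dict[str, int]:
    """Old scheme as described: walk the accesses, append each uniform when first seen."""
    offsets: Dict[str, int] = {}
    next_offset = 0
    for name in uses:
        if name not in offsets:
            offsets[name] = next_offset
            next_offset += sizes[name]
    return offsets


def place_in_declaration_order(declared: List[str], sizes: Dict[str, int]) -> Dict[str, int]:
    """New scheme as described: append uniforms in uniform-number (declaration) order."""
    offsets: Dict[str, int] = {}
    next_offset = 0
    for name in declared:
        offsets[name] = next_offset
        next_offset += sizes[name]
    return offsets


def extra_adds(uses: List[str], offsets: Dict[str, int]) -> int:
    """Assumed cost model: one ADD per access whose array base offset is non-zero."""
    return sum(1 for name in uses if offsets[name] != 0)


# Hypothetical shader: the heavily used array is declared second but used first.
declared = ["rarely_used", "hot_array"]
sizes = {"rarely_used": 4, "hot_array": 64}
uses = ["hot_array", "hot_array", "rarely_used", "hot_array", "hot_array"]

old_layout = place_in_first_use_order(uses, sizes)
new_layout = place_in_declaration_order(declared, sizes)

print("first-use order:  ", old_layout, "extra ADDs:", extra_adds(uses, old_layout))
print("declaration order:", new_layout, "extra ADDs:", extra_adds(uses, new_layout))
```

In this made-up shader the hot array sits at offset 0 under first-use ordering and at a non-zero offset under declaration ordering, so every access to it pays the extra ADD under the new scheme.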
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Apr 7, 2016 11:08 PM, "Jason Ekstrand" wrote:
>
> On Apr 7, 2016 10:01 PM, "Matt Turner" wrote:
> >
> > On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand wrote:
> > > This is mostly a re-send of a patch series I've had floating around in one form or another for quite some time. It's basically the same except that the original version was missing a work-around for Sandy Bridge. For a while, I wasn't really pushing to get it merged because I couldn't demonstrate any actual performance benefit from pushing arrays. However, with the Vulkan API, the concept of push constants is directly exposed to the user and we really need to be able to indirect on them. This series makes the FS backend 100% ready for indirect push constants; vec4 will take a little more work.
> > >
> > > It's worth noting that we've been carrying these patches around in our Vulkan driver for probably 3 or 4 months now and it's working great.
> > >
> > > For those that prefer to review on a branch:
> > >
> > > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> > >
> > > I think Kristian has mostly reviewed these patches. However, he never sent any R-Bs to the list. I'd also like Ken or Matt to look at it from a design perspective.
> >
> > I don't know what I think. I'm sympathetic to Curro's argument, but in the absence of more data it's hard to judge anything really. I'm not at all sympathetic to
> >
> > """
> > Do I have a proof-of-concept in code, no. However, I've run through it in my head and I have a pretty good idea what it would look like. You are free to go off and do it if you don't believe me, but I don't really want to hold things up while you do.
> > """
> >
> > That's what... An Appeal to Your Brain? :)
>
> Sort-of... It was more a remark of frustration at the (perceived) implication that I hadn't thought about it or at the very least hadn't given it a fair shake.
> In a bit more detail, here are some of my thoughts on reladdr and an ADDR file in no particular order:
>
> a) Not a single FS optimization pass handles it. Yes, "if you see reladdr, bail" is a valid (if suboptimal) strategy 90% of the time. However, anything that computes any sort of kill set now needs a recursive algorithm to walk register sources. We do handle this in NIR and it's not terrible but it does come with nontrivial pain and retrofitting it isn't necessarily going to be quick-and-easy. Curro's response of "use-def chains will fix this", while probably accurate, doesn't solve the immediate problem while these patches have been on the list for 6 months.
>
> b) The hardware doesn't do reladdr. It has an address register with substantial restrictions. Eventually, we would need to lower to something that writes the address register and have an indirect source type that consumes it. If you end up with two indirect sources, we have to emit a move for one of them. Where do we do that lowering? Do we do it in the generator or as a pass?
>
> c) If we handle it all in the generator, we have no ability to schedule it at all. It also makes the generator far more complex.
>
> d) If we handle it in a lowering pass, what does that pass produce? Do we expose the ADDR file and try to do RA on it or do we treat it as a fixed thing like flag? In either case, we need to add extra logic to at least the scheduler if not other places to add this whole new concept.
>
> e) If we allow indirect sources of any sort, how do we carry range information around post-RA? Pre-RA we can theoretically just say if you indirect you touch the whole thing. Post-RA, you either have to carry that information around per-instruction or you have to assume that any instruction that uses an indirect source could be reading anything in the entire GRF and it becomes almost a complete scheduling barrier.
>
> Those are the thoughts that pop to the top. I could come up with more if you'd like.
>
> So, yes, using reladdr or an ADDR file would be possible but it would involve substantial IR surgery. What's the benefit? You can put the relative source directly in the instruction that uses it and maybe do an address calculation directly to the address register instead of having to move it there. The approach I've taken, on the other hand, neatly side-steps all of the issues listed above. This comes at the cost of a few extra instructions (which you probably have to spend anyway on gen7). I think that trade-off is worthwhile.

There are some other things I found to be very nice about the approach. One is that we got to stop carrying around these arrays of extra uniform size information. They're the only substantial per-backend bit of uniform setup that we do and they're only used for knowing how big the indirected uniforms are. Their existence has bugged me ever since we first got NIR going. They're not too annoying in GL but setting them up from Vulkan was going to get painfu
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Apr 7, 2016 10:01 PM, "Matt Turner" wrote:
>
> On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand wrote:
> > This is mostly a re-send of a patch series I've had floating around in one form or another for quite some time. It's basically the same except that the original version was missing a work-around for Sandy Bridge. For a while, I wasn't really pushing to get it merged because I couldn't demonstrate any actual performance benefit from pushing arrays. However, with the Vulkan API, the concept of push constants is directly exposed to the user and we really need to be able to indirect on them. This series makes the FS backend 100% ready for indirect push constants; vec4 will take a little more work.
> >
> > It's worth noting that we've been carrying these patches around in our Vulkan driver for probably 3 or 4 months now and it's working great.
> >
> > For those that prefer to review on a branch:
> >
> > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> >
> > I think Kristian has mostly reviewed these patches. However, he never sent any R-Bs to the list. I'd also like Ken or Matt to look at it from a design perspective.
>
> I don't know what I think. I'm sympathetic to Curro's argument, but in the absence of more data it's hard to judge anything really. I'm not at all sympathetic to
>
> """
> Do I have a proof-of-concept in code, no. However, I've run through it in my head and I have a pretty good idea what it would look like. You are free to go off and do it if you don't believe me, but I don't really want to hold things up while you do.
> """
>
> That's what... An Appeal to Your Brain? :)

Sort-of... It was more a remark of frustration at the (perceived) implication that I hadn't thought about it or at the very least hadn't given it a fair shake.
In a bit more detail, here are some of my thoughts on reladdr and an ADDR file in no particular order:

a) Not a single FS optimization pass handles it. Yes, "if you see reladdr, bail" is a valid (if suboptimal) strategy 90% of the time. However, anything that computes any sort of kill set now needs a recursive algorithm to walk register sources. We do handle this in NIR and it's not terrible but it does come with nontrivial pain and retrofitting it isn't necessarily going to be quick-and-easy. Curro's response of "use-def chains will fix this", while probably accurate, doesn't solve the immediate problem while these patches have been on the list for 6 months.

b) The hardware doesn't do reladdr. It has an address register with substantial restrictions. Eventually, we would need to lower to something that writes the address register and have an indirect source type that consumes it. If you end up with two indirect sources, we have to emit a move for one of them. Where do we do that lowering? Do we do it in the generator or as a pass?

c) If we handle it all in the generator, we have no ability to schedule it at all. It also makes the generator far more complex.

d) If we handle it in a lowering pass, what does that pass produce? Do we expose the ADDR file and try to do RA on it or do we treat it as a fixed thing like flag? In either case, we need to add extra logic to at least the scheduler if not other places to add this whole new concept.

e) If we allow indirect sources of any sort, how do we carry range information around post-RA? Pre-RA we can theoretically just say if you indirect you touch the whole thing. Post-RA, you either have to carry that information around per-instruction or you have to assume that any instruction that uses an indirect source could be reading anything in the entire GRF and it becomes almost a complete scheduling barrier.

Those are the thoughts that pop to the top. I could come up with more if you'd like.
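To make point (e) concrete, here is a rough Python sketch of why carrying an explicit range on an indirect read matters for later analysis. This is an illustration only, not the regs_read() from brw_fs.cpp: the `IndirectRead` type, its field names, and the constants (32-byte registers, a 128-register GRF) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set

REG_SIZE = 32    # bytes per register (assumed for this sketch)
GRF_REGS = 128   # registers in the GRF (assumed for this sketch)


@dataclass
class IndirectRead:
    """Hypothetical model of an indirect read from the register file."""
    base_reg: int                # first register of the region being indexed
    base_offset: int             # byte offset inside base_reg (the "subnr")
    range_bytes: Optional[int]   # bytes that may be touched, None if unknown


def regs_read(inst: IndirectRead) -> Set[int]:
    """Return the set of registers the instruction might read."""
    if inst.range_bytes is None:
        # No range information: assume anything in the GRF could be read,
        # which makes the instruction close to a scheduling barrier.
        return set(range(GRF_REGS))
    first_byte = inst.base_reg * REG_SIZE + inst.base_offset
    last_byte = first_byte + inst.range_bytes - 1
    return set(range(first_byte // REG_SIZE, last_byte // REG_SIZE + 1))


aligned = regs_read(IndirectRead(base_reg=10, base_offset=0, range_bytes=64))
shifted = regs_read(IndirectRead(base_reg=10, base_offset=16, range_bytes=64))
unknown = regs_read(IndirectRead(base_reg=10, base_offset=0, range_bytes=None))

print(sorted(aligned), sorted(shifted), len(unknown))
```

With a known range the read is confined to a couple of registers, and a non-zero sub-register offset can push it across one more register boundary. Without a range, the only safe answer is the whole file.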
So, yes, using reladdr or an ADDR file would be possible but it would involve substantial IR surgery. What's the benefit? You can put the relative source directly in the instruction that uses it and maybe do an address calculation directly to the address register instead of having to move it there. The approach I've taken, on the other hand, neatly side-steps all of the issues listed above. This comes at the cost of a few extra instructions (which you probably have to spend anyway on gen7). I think that trade-off is worthwhile.

--Jason

> I don't know how to proceed on that front if no one is willing or interested in trying to implement it using reladdr.
>
> I ran shader-db.
>
> total instructions in shared programs: 7113290 -> 7161760 (0.68%)
> instructions in affected programs: 866011 -> 914481 (5.60%)
> helped: 0
> HURT: 7180
>
> total cycles in shared programs: 64705926 -> 64776118 (0.11%)
> cycles in affected programs: 4951554 -> 5021746 (1.42%)
> helped: 1605
> HURT: 5204
>
> of which the overwhelming majority is vertex shaders (why? this series is i965/fs). FS changes are just
>
> instructions in affected programs:
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand wrote:
> This is mostly a re-send of a patch series I've had floating around in one form or another for quite some time. It's basically the same except that the original version was missing a work-around for Sandy Bridge. For a while, I wasn't really pushing to get it merged because I couldn't demonstrate any actual performance benefit from pushing arrays. However, with the Vulkan API, the concept of push constants is directly exposed to the user and we really need to be able to indirect on them. This series makes the FS backend 100% ready for indirect push constants; vec4 will take a little more work.
>
> It's worth noting that we've been carrying these patches around in our Vulkan driver for probably 3 or 4 months now and it's working great.
>
> For those that prefer to review on a branch:
>
> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
>
> I think Kristian has mostly reviewed these patches. However, he never sent any R-Bs to the list. I'd also like Ken or Matt to look at it from a design perspective.

I don't know what I think. I'm sympathetic to Curro's argument, but in the absence of more data it's hard to judge anything really. I'm not at all sympathetic to

"""
Do I have a proof-of-concept in code, no. However, I've run through it in my head and I have a pretty good idea what it would look like. You are free to go off and do it if you don't believe me, but I don't really want to hold things up while you do.
"""

That's what... An Appeal to Your Brain? :)

I don't know how to proceed on that front if no one is willing or interested in trying to implement it using reladdr.

I ran shader-db.

total instructions in shared programs: 7113290 -> 7161760 (0.68%)
instructions in affected programs: 866011 -> 914481 (5.60%)
helped: 0
HURT: 7180

total cycles in shared programs: 64705926 -> 64776118 (0.11%)
cycles in affected programs: 4951554 -> 5021746 (1.42%)
helped: 1605
HURT: 5204

of which the overwhelming majority is vertex shaders (why? this series is i965/fs). FS changes are just

instructions in affected programs: 13550 -> 14132 (4.30%)
helped: 0
HURT: 50

but I'm having a hard time finding shaders that actually use the address register.

What's going on with the shader-db regressions?
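As a quick sanity check on the figures above, the percentages can be recomputed from the before/after counts. This is only a sketch of the arithmetic in Python, not the shader-db report script; the helper name and the tuple layout are made up for the example.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (label, before, after, percentage quoted in the message)
reported = [
    ("total instructions in shared programs", 7113290, 7161760, 0.68),
    ("instructions in affected programs", 866011, 914481, 5.60),
    ("total cycles in shared programs", 64705926, 64776118, 0.11),
    ("cycles in affected programs", 4951554, 5021746, 1.42),
    ("FS instructions in affected programs", 13550, 14132, 4.30),
]

for label, before, after, quoted in reported:
    computed = percent_change(before, after)
    print(f"{label}: {after - before:+d} ({computed:.2f}%, quoted {quoted:.2f}%)")

# Average cost per hurt shader, using the HURT counts from the message.
instruction_delta = 914481 - 866011
print(f"all stages: {instruction_delta / 7180:.2f} extra instructions per hurt shader")
print(f"FS only:    {(14132 - 13550) / 50:.2f} extra instructions per hurt shader")
```

The absolute instruction delta is the same in the "shared" and "affected" rows, as expected, since only affected programs change. Only the denominator differs.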
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand wrote:
> This is mostly a re-send of a patch series I've had floating around in one form or another for quite some time. It's basically the same except that the original version was missing a work-around for Sandy Bridge. For a while, I wasn't really pushing to get it merged because I couldn't demonstrate any actual performance benefit from pushing arrays. However, with the Vulkan API, the concept of push constants is directly exposed to the user and we really need to be able to indirect on them. This series makes the FS backend 100% ready for indirect push constants; vec4 will take a little more work.
>
> It's worth noting that we've been carrying these patches around in our Vulkan driver for probably 3 or 4 months now and it's working great.
>
> For those that prefer to review on a branch:
>
> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
>
> I think Kristian has mostly reviewed these patches. However, he never sent any R-Bs to the list. I'd also like Ken or Matt to look at it from a design perspective.
>

I just confirmed with Kristian via SMS that he has, indeed, reviewed it.

> Jason Ekstrand (15):
>   i965/fs: Add support for doing MOV_INDIRECT on uniforms
>   i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions
>   i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr
>   i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
>   nir: Add another index to load_uniform to specify the range read
>   i965/fs: Use MOV_INDIRECT for all indirect uniform loads
>   i965/fs: Get rid of reladdr
>   i965/fs: Stop relying on param_size in assign_constant_locations
>   i965/fs: Get rid of the param_size array
>   i965/vec4: Inline get_pull_constant_offset
>   i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants
>   i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD
>   i965/vec4: Get rid of the uniform_size array
>   i965/fs: Rename demote_pull_constants to lower_constant_loads
>   i965/fs: Push small uniform arrays
>
>  src/compiler/nir/nir.h                            |   7 +
>  src/compiler/nir/nir_intrinsics.h                 |   6 +-
>  src/compiler/nir/nir_lower_io.c                   |   5 +
>  src/compiler/nir/nir_print.c                      |   1 +
>  src/mesa/drivers/dri/i965/brw_fs.cpp              | 189 +-
>  src/mesa/drivers/dri/i965/brw_fs.h                |   4 +-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp    |  68 ++--
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp          |  63 +---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp      |   3 -
>  src/mesa/drivers/dri/i965/brw_ir_fs.h             |   5 +-
>  src/mesa/drivers/dri/i965/brw_vec4.cpp            |  10 +-
>  src/mesa/drivers/dri/i965/brw_vec4.h              |   7 +-
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp        |  19 +--
>  src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp        |   2 -
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp    | 130 ++-
>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |   1 -
>  16 files changed, 292 insertions(+), 228 deletions(-)
>
> --
> 2.5.0.400.gff86faf
[Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
This is mostly a re-send of a patch series I've had floating around in one form or another for quite some time. It's basically the same except that the original version was missing a work-around for Sandy Bridge. For a while, I wasn't really pushing to get it merged because I couldn't demonstrate any actual performance benefit from pushing arrays. However, with the Vulkan API, the concept of push constants is directly exposed to the user and we really need to be able to indirect on them. This series makes the FS backend 100% ready for indirect push constants; vec4 will take a little more work.

It's worth noting that we've been carrying these patches around in our Vulkan driver for probably 3 or 4 months now and it's working great.

For those that prefer to review on a branch:

https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms

I think Kristian has mostly reviewed these patches. However, he never sent any R-Bs to the list. I'd also like Ken or Matt to look at it from a design perspective.

Jason Ekstrand (15):
  i965/fs: Add support for doing MOV_INDIRECT on uniforms
  i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions
  i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr
  i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
  nir: Add another index to load_uniform to specify the range read
  i965/fs: Use MOV_INDIRECT for all indirect uniform loads
  i965/fs: Get rid of reladdr
  i965/fs: Stop relying on param_size in assign_constant_locations
  i965/fs: Get rid of the param_size array
  i965/vec4: Inline get_pull_constant_offset
  i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants
  i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD
  i965/vec4: Get rid of the uniform_size array
  i965/fs: Rename demote_pull_constants to lower_constant_loads
  i965/fs: Push small uniform arrays

 src/compiler/nir/nir.h                            |   7 +
 src/compiler/nir/nir_intrinsics.h                 |   6 +-
 src/compiler/nir/nir_lower_io.c                   |   5 +
 src/compiler/nir/nir_print.c                      |   1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp              | 189 +-
 src/mesa/drivers/dri/i965/brw_fs.h                |   4 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp    |  68 ++--
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp          |  63 +---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp      |   3 -
 src/mesa/drivers/dri/i965/brw_ir_fs.h             |   5 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp            |  10 +-
 src/mesa/drivers/dri/i965/brw_vec4.h              |   7 +-
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp        |  19 +--
 src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp        |   2 -
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp    | 130 ++-
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |   1 -
 16 files changed, 292 insertions(+), 228 deletions(-)

--
2.5.0.400.gff86faf