Re: [Mesa-dev] [HACK] i965/fs: Fix ordering of src0 alpha and oMask in the framebuffer write payload.
Jason Ekstrand ja...@jlekstrand.net writes: On Jul 9, 2015 7:57 AM, Francisco Jerez curroje...@riseup.net wrote: We were passing src0 alpha and oMask in reverse order. There seems to be no good way to pass them in the correct order to the new-style LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while oMask is not. Just split src0 alpha in fixed-width registers and pass them to LOAD_PAYLOAD as if they were part of the header as work-around for now. Bah... I came across this when I did the LOAD_PAYLOAD rework but thought it was only theoretical. I wasn't very familiar with what omask actually did and, since piglit didn't hit it, I wasn't sure if it was a real problem or not. I probably should have done more digging and written a piglit test at the time. My bad. One solution that I proposed at the time was to turn header_size into header_mask in the obvious way. We can still use 8 bits because we should never have a header source higher than 8. So your idea is to have one bit per source indicating whether it's header-like or per-channel? I don't think that extends to instructions other than LOAD_PAYLOAD (e.g. FB_WRITE) where the same source is at the same time header and payload. One bit per 32B register would extend easily but it would be rather ugly to deal with if you want to keep your code SIMD width-invariant. I think if you go with the per-source flag you'll want it to be in its own subclass of fs_inst. With its own subclass you could even have an array of per-source sizes determining the number of registers read for each source, which would be rather nice for the visitor (no need to split vectors into components while passing them to LOAD_PAYLOAD). Still I think the most elegant solution would be to simply get rid of the header/payload distinction by using force_writemask_all and, if it proves to be necessary, fix the optimizer to get rid of redundant force_writemask_all flags where it doesn't do it already. Thoughts? --Jason I've written a piglit test that demonstrates the problem by using gl_SampleMask from a fragment shader with multiple color outputs [1]. [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 94d6a58..304ae74 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, length++; } + if (src0_alpha.file != BAD_FILE color0.file != BAD_FILE) { + /* Neat, we need to chop the src0 alpha component and pass it as part of + * the header even though it has per-channel semantics, because the next + * optional field is header-like and LOAD_PAYLOAD requires all such + * fields to form a contiguous segment at the beginning of the message. + */ + for (unsigned i = 0; i exec_size / 8; i++) { + setup_color_payload(sources[length], src0_alpha, 1, 8, + use_2nd_half || i == 1); + length++; + } + } + prog_data-uses_omask = prog-OutputsWritten BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); if (prog_data-uses_omask) { @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, offset(this-outputs[0], bld, 3), 1, exec_size, false); length += 4; - } else if (color1.file == BAD_FILE) { - if (src0_alpha.file != BAD_FILE) { - setup_color_payload(sources[length], src0_alpha, 1, exec_size, false); - length++; - } - - setup_color_payload(sources[length], color0, components, - exec_size, use_2nd_half); - length += 4; } else { setup_color_payload(sources[length], color0, components, exec_size, use_2nd_half); length += 4; + + } + + if (color1.file != BAD_FILE) { setup_color_payload(sources[length], color1, components, exec_size, use_2nd_half); length += 4; -- 2.4.3 signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [HACK] i965/fs: Fix ordering of src0 alpha and oMask in the framebuffer write payload.
On Fri, Jul 10, 2015 at 5:25 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Jul 9, 2015 7:57 AM, Francisco Jerez curroje...@riseup.net wrote: We were passing src0 alpha and oMask in reverse order. There seems to be no good way to pass them in the correct order to the new-style LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while oMask is not. Just split src0 alpha in fixed-width registers and pass them to LOAD_PAYLOAD as if they were part of the header as work-around for now. Bah... I came across this when I did the LOAD_PAYLOAD rework but thought it was only theoretical. I wasn't very familiar with what omask actually did and, since piglit didn't hit it, I wasn't sure if it was a real problem or not. I probably should have done more digging and written a piglit test at the time. My bad. One solution that I proposed at the time was to turn header_size into header_mask in the obvious way. We can still use 8 bits because we should never have a header source higher than 8. So your idea is to have one bit per source indicating whether it's header-like or per-channel? I don't think that extends to instructions other than LOAD_PAYLOAD (e.g. FB_WRITE) where the same source is at the same time header and payload. You're right, it doesn't. We really shouldn't be conflating them. We should have header_mask and header_present be different fields. Maybe use a union to save space, but they should have different semantic meaning and different names. We should probably also have a compr4_mask and get rid of the hackery there. One bit per 32B register would extend easily but it would be rather ugly to deal with if you want to keep your code SIMD width-invariant. I think if you go with the per-source flag you'll want it to be in its own subclass of fs_inst. With its own subclass you could even have an array of per-source sizes determining the number of registers read for each source, which would be rather nice for the visitor (no need to split vectors into components while passing them to LOAD_PAYLOAD). Still I think the most elegant solution would be to simply get rid of the header/payload distinction by using force_writemask_all and, if it proves to be necessary, fix the optimizer to get rid of redundant force_writemask_all flags where it doesn't do it already. I really don't think that's a good long-term or short-term solution. How badly are you blocking on this? I don't really have a lot of extra time to work on this at the moment but can carve some out if needed. --jason Thoughts? --Jason I've written a piglit test that demonstrates the problem by using gl_SampleMask from a fragment shader with multiple color outputs [1]. [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 94d6a58..304ae74 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, length++; } + if (src0_alpha.file != BAD_FILE color0.file != BAD_FILE) { + /* Neat, we need to chop the src0 alpha component and pass it as part of + * the header even though it has per-channel semantics, because the next + * optional field is header-like and LOAD_PAYLOAD requires all such + * fields to form a contiguous segment at the beginning of the message. + */ + for (unsigned i = 0; i exec_size / 8; i++) { + setup_color_payload(sources[length], src0_alpha, 1, 8, + use_2nd_half || i == 1); + length++; + } + } + prog_data-uses_omask = prog-OutputsWritten BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); if (prog_data-uses_omask) { @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, offset(this-outputs[0], bld, 3), 1, exec_size, false); length += 4; - } else if (color1.file == BAD_FILE) { - if (src0_alpha.file != BAD_FILE) { - setup_color_payload(sources[length], src0_alpha, 1, exec_size, false); - length++; - } - - setup_color_payload(sources[length], color0, components, - exec_size, use_2nd_half); - length += 4; } else { setup_color_payload(sources[length], color0, components, exec_size, use_2nd_half); length += 4; + + } + + if (color1.file != BAD_FILE) { setup_color_payload(sources[length], color1, components, exec_size, use_2nd_half); length += 4; --
Re: [Mesa-dev] [HACK] i965/fs: Fix ordering of src0 alpha and oMask in the framebuffer write payload.
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Jul 10, 2015 at 5:25 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Jul 9, 2015 7:57 AM, Francisco Jerez curroje...@riseup.net wrote: We were passing src0 alpha and oMask in reverse order. There seems to be no good way to pass them in the correct order to the new-style LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while oMask is not. Just split src0 alpha in fixed-width registers and pass them to LOAD_PAYLOAD as if they were part of the header as work-around for now. Bah... I came across this when I did the LOAD_PAYLOAD rework but thought it was only theoretical. I wasn't very familiar with what omask actually did and, since piglit didn't hit it, I wasn't sure if it was a real problem or not. I probably should have done more digging and written a piglit test at the time. My bad. One solution that I proposed at the time was to turn header_size into header_mask in the obvious way. We can still use 8 bits because we should never have a header source higher than 8. So your idea is to have one bit per source indicating whether it's header-like or per-channel? I don't think that extends to instructions other than LOAD_PAYLOAD (e.g. FB_WRITE) where the same source is at the same time header and payload. You're right, it doesn't. We really shouldn't be conflating them. We should have header_mask and header_present be different fields. Maybe use a union to save space, but they should have different semantic meaning and different names. We should probably also have a compr4_mask and get rid of the hackery there. One bit per 32B register would extend easily but it would be rather ugly to deal with if you want to keep your code SIMD width-invariant. I think if you go with the per-source flag you'll want it to be in its own subclass of fs_inst. With its own subclass you could even have an array of per-source sizes determining the number of registers read for each source, which would be rather nice for the visitor (no need to split vectors into components while passing them to LOAD_PAYLOAD). Still I think the most elegant solution would be to simply get rid of the header/payload distinction by using force_writemask_all and, if it proves to be necessary, fix the optimizer to get rid of redundant force_writemask_all flags where it doesn't do it already. I really don't think that's a good long-term or short-term solution. How badly are you blocking on this? I don't really have a lot of extra time to work on this at the moment but can carve some out if needed. I'm not blocking on this at all, feel free to fix it however you like, or just go with this hack for the moment if you have higher priority stuff to work on right now, I honestly don't care. --jason Thoughts? --Jason I've written a piglit test that demonstrates the problem by using gl_SampleMask from a fragment shader with multiple color outputs [1]. [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 94d6a58..304ae74 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, length++; } + if (src0_alpha.file != BAD_FILE color0.file != BAD_FILE) { + /* Neat, we need to chop the src0 alpha component and pass it as part of + * the header even though it has per-channel semantics, because the next + * optional field is header-like and LOAD_PAYLOAD requires all such + * fields to form a contiguous segment at the beginning of the message. + */ + for (unsigned i = 0; i exec_size / 8; i++) { + setup_color_payload(sources[length], src0_alpha, 1, 8, + use_2nd_half || i == 1); + length++; + } + } + prog_data-uses_omask = prog-OutputsWritten BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); if (prog_data-uses_omask) { @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, offset(this-outputs[0], bld, 3), 1, exec_size, false); length += 4; - } else if (color1.file == BAD_FILE) { - if (src0_alpha.file != BAD_FILE) { - setup_color_payload(sources[length], src0_alpha, 1, exec_size, false); - length++; - } - - setup_color_payload(sources[length], color0, components, - exec_size, use_2nd_half); - length += 4; } else { setup_color_payload(sources[length], color0, components,
Re: [Mesa-dev] [HACK] i965/fs: Fix ordering of src0 alpha and oMask in the framebuffer write payload.
On Fri, Jul 10, 2015 at 9:53 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Jul 10, 2015 at 5:25 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Jul 9, 2015 7:57 AM, Francisco Jerez curroje...@riseup.net wrote: We were passing src0 alpha and oMask in reverse order. There seems to be no good way to pass them in the correct order to the new-style LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while oMask is not. Just split src0 alpha in fixed-width registers and pass them to LOAD_PAYLOAD as if they were part of the header as work-around for now. Bah... I came across this when I did the LOAD_PAYLOAD rework but thought it was only theoretical. I wasn't very familiar with what omask actually did and, since piglit didn't hit it, I wasn't sure if it was a real problem or not. I probably should have done more digging and written a piglit test at the time. My bad. One solution that I proposed at the time was to turn header_size into header_mask in the obvious way. We can still use 8 bits because we should never have a header source higher than 8. So your idea is to have one bit per source indicating whether it's header-like or per-channel? I don't think that extends to instructions other than LOAD_PAYLOAD (e.g. FB_WRITE) where the same source is at the same time header and payload. You're right, it doesn't. We really shouldn't be conflating them. We should have header_mask and header_present be different fields. Maybe use a union to save space, but they should have different semantic meaning and different names. We should probably also have a compr4_mask and get rid of the hackery there. One bit per 32B register would extend easily but it would be rather ugly to deal with if you want to keep your code SIMD width-invariant. I think if you go with the per-source flag you'll want it to be in its own subclass of fs_inst. With its own subclass you could even have an array of per-source sizes determining the number of registers read for each source, which would be rather nice for the visitor (no need to split vectors into components while passing them to LOAD_PAYLOAD). Still I think the most elegant solution would be to simply get rid of the header/payload distinction by using force_writemask_all and, if it proves to be necessary, fix the optimizer to get rid of redundant force_writemask_all flags where it doesn't do it already. I really don't think that's a good long-term or short-term solution. How badly are you blocking on this? I don't really have a lot of extra time to work on this at the moment but can carve some out if needed. I'm not blocking on this at all, feel free to fix it however you like, or just go with this hack for the moment if you have higher priority stuff to work on right now, I honestly don't care. That's good to hear. I'll try and take a look at this in a couple of weeks. Thanks for bringing it up and writing the piglit test! --Jason --jason Thoughts? --Jason I've written a piglit test that demonstrates the problem by using gl_SampleMask from a fragment shader with multiple color outputs [1]. [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 94d6a58..304ae74 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, length++; } + if (src0_alpha.file != BAD_FILE color0.file != BAD_FILE) { + /* Neat, we need to chop the src0 alpha component and pass it as part of + * the header even though it has per-channel semantics, because the next + * optional field is header-like and LOAD_PAYLOAD requires all such + * fields to form a contiguous segment at the beginning of the message. + */ + for (unsigned i = 0; i exec_size / 8; i++) { + setup_color_payload(sources[length], src0_alpha, 1, 8, + use_2nd_half || i == 1); + length++; + } + } + prog_data-uses_omask = prog-OutputsWritten BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); if (prog_data-uses_omask) { @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, offset(this-outputs[0], bld, 3), 1, exec_size, false); length += 4; - } else if (color1.file == BAD_FILE) { - if (src0_alpha.file != BAD_FILE) { - setup_color_payload(sources[length], src0_alpha, 1, exec_size, false); - length++; - } - -
[Mesa-dev] [HACK] i965/fs: Fix ordering of src0 alpha and oMask in the framebuffer write payload.
We were passing src0 alpha and oMask in reverse order. There seems to be no good way to pass them in the correct order to the new-style LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while oMask is not. Just split src0 alpha in fixed-width registers and pass them to LOAD_PAYLOAD as if they were part of the header as work-around for now. I've written a piglit test that demonstrates the problem by using gl_SampleMask from a fragment shader with multiple color outputs [1]. [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 94d6a58..304ae74 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, length++; } + if (src0_alpha.file != BAD_FILE color0.file != BAD_FILE) { + /* Neat, we need to chop the src0 alpha component and pass it as part of + * the header even though it has per-channel semantics, because the next + * optional field is header-like and LOAD_PAYLOAD requires all such + * fields to form a contiguous segment at the beginning of the message. + */ + for (unsigned i = 0; i exec_size / 8; i++) { + setup_color_payload(sources[length], src0_alpha, 1, 8, + use_2nd_half || i == 1); + length++; + } + } + prog_data-uses_omask = prog-OutputsWritten BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); if (prog_data-uses_omask) { @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, offset(this-outputs[0], bld, 3), 1, exec_size, false); length += 4; - } else if (color1.file == BAD_FILE) { - if (src0_alpha.file != BAD_FILE) { - setup_color_payload(sources[length], src0_alpha, 1, exec_size, false); - length++; - } - - setup_color_payload(sources[length], color0, components, - exec_size, use_2nd_half); - length += 4; } else { setup_color_payload(sources[length], color0, components, exec_size, use_2nd_half); length += 4; + + } + + if (color1.file != BAD_FILE) { setup_color_payload(sources[length], color1, components, exec_size, use_2nd_half); length += 4; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [HACK] i965/fs: Fix ordering of src0 alpha and oMask in the framebuffer write payload.
On Jul 9, 2015 7:57 AM, Francisco Jerez curroje...@riseup.net wrote: We were passing src0 alpha and oMask in reverse order. There seems to be no good way to pass them in the correct order to the new-style LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while oMask is not. Just split src0 alpha in fixed-width registers and pass them to LOAD_PAYLOAD as if they were part of the header as work-around for now. Bah... I came across this when I did the LOAD_PAYLOAD rework but thought it was only theoretical. I wasn't very familiar with what omask actually did and, since piglit didn't hit it, I wasn't sure if it was a real problem or not. I probably should have done more digging and written a piglit test at the time. My bad. One solution that I proposed at the time was to turn header_size into header_mask in the obvious way. We can still use 8 bits because we should never have a header source higher than 8. Thoughts? --Jason I've written a piglit test that demonstrates the problem by using gl_SampleMask from a fragment shader with multiple color outputs [1]. [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 94d6a58..304ae74 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, length++; } + if (src0_alpha.file != BAD_FILE color0.file != BAD_FILE) { + /* Neat, we need to chop the src0 alpha component and pass it as part of + * the header even though it has per-channel semantics, because the next + * optional field is header-like and LOAD_PAYLOAD requires all such + * fields to form a contiguous segment at the beginning of the message. + */ + for (unsigned i = 0; i exec_size / 8; i++) { + setup_color_payload(sources[length], src0_alpha, 1, 8, + use_2nd_half || i == 1); + length++; + } + } + prog_data-uses_omask = prog-OutputsWritten BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); if (prog_data-uses_omask) { @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, offset(this-outputs[0], bld, 3), 1, exec_size, false); length += 4; - } else if (color1.file == BAD_FILE) { - if (src0_alpha.file != BAD_FILE) { - setup_color_payload(sources[length], src0_alpha, 1, exec_size, false); - length++; - } - - setup_color_payload(sources[length], color0, components, - exec_size, use_2nd_half); - length += 4; } else { setup_color_payload(sources[length], color0, components, exec_size, use_2nd_half); length += 4; + + } + + if (color1.file != BAD_FILE) { setup_color_payload(sources[length], color1, components, exec_size, use_2nd_half); length += 4; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev