Re: [Mesa-dev] [PATCH 1/5] mesa: add ARB_derivative_control extension bit
On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
> ---
>  src/mesa/main/extensions.c | 1 +
>  src/mesa/main/mtypes.h     | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> index 8658ca8..3dcb199 100644
> --- a/src/mesa/main/extensions.c
> +++ b/src/mesa/main/extensions.c
> @@ -101,6 +101,7 @@ static const struct extension extension_table[] = {
>     { "GL_ARB_depth_buffer_float", o(ARB_depth_buffer_float), GL, 2008 },
>     { "GL_ARB_depth_clamp", o(ARB_depth_clamp), GL, 2003 },
>     { "GL_ARB_depth_texture", o(ARB_depth_texture), GLL, 2001 },
> +   { "GL_ARB_derivative_control", o(ARB_derivative_control), GLC, 2014 },

No reason to be core-only that I can see. With s/GLC/GL/ this is

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/5] glsl: add ARB_derivative control support
Reviewed-by: Matt Turner matts...@gmail.com
Re: [Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support
On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> I left all the variants as separate operations in the glsl ir. However
> for gallium I only added the fine version, as it seems like DDX can do
> pretty much whatever it wants. I was on the fence about adding coarse
> versions as well and then using the FragmentShaderDerivative hint to
> select one or the other in the glsl -> tgsi conversion.
>
> In the case of nv50/nvc0, doing the fine version is pretty much the
> only (easy) way of doing derivatives. I haven't traced the blob to see
> how it handles things yet. In any case, on nv50/nvc0 all this is
> completely moot, at least for now. Curious about what the situation
> with other hardware is.

i965 already implements coarse and fine derivatives, selectable by the
derivatives hint, coarse default. The calculation of the derivative
itself isn't faster for coarse derivatives, but it was discovered that
if all of the samples of a sample_d are from the same LOD, it's a bunch
faster on Haswell at least. See commit 848c0e72. And with coarse
derivatives they are.

Maybe other hardware has similar optimizations?

> Also, the extension spec claims to require GLSL 4.00, which seems a
> little extreme. Instead I restrict it to core contexts. Let me know if
> I should change this.

Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0
requirement in the spec. I'm not sure if we have a way to arbitrarily
limit an extension to being exposed under certain GLSL versions... ?
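For readers unfamiliar with the coarse/fine distinction discussed above, here is a minimal C++ sketch of how the two flavours of dFdx could differ on a single 2x2 pixel quad. It is not Mesa or hardware code. The `Quad` type, the pixel ordering, and the choice of the top row for the coarse result are assumptions made only for illustration; implementations are free to pick the coarse value differently.

```cpp
#include <array>
#include <cassert>

// One 2x2 pixel quad, ordered: top-left, top-right, bottom-left, bottom-right.
// (Hypothetical layout chosen for this sketch.)
using Quad = std::array<float, 4>;

// Fine dFdx: every row of the quad uses its own horizontal difference.
Quad ddx_fine(const Quad &v)
{
   const float top = v[1] - v[0];
   const float bottom = v[3] - v[2];
   return {top, top, bottom, bottom};
}

// Coarse dFdx (one possible choice): the whole quad shares a single
// horizontal difference, here the one taken from the top row.
Quad ddx_coarse(const Quad &v)
{
   const float top = v[1] - v[0];
   return {top, top, top, top};
}
```

With the quad `{0, 1, 10, 13}`, the fine variant yields a different value for the bottom row than for the top row, while the coarse variant yields one value for all four pixels. That uniformity is what makes every sample of a sample_d land on the same LOD in the i965 case described above.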
Re: [Mesa-dev] [PATCH 1/5] mesa: add ARB_derivative_control extension bit
On Wed, Aug 13, 2014 at 11:44 PM, Matt Turner matts...@gmail.com wrote:
> On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
>> ---
>>  src/mesa/main/extensions.c | 1 +
>>  src/mesa/main/mtypes.h     | 1 +
>>  2 files changed, 2 insertions(+)
>>
>> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
>> index 8658ca8..3dcb199 100644
>> --- a/src/mesa/main/extensions.c
>> +++ b/src/mesa/main/extensions.c
>> @@ -101,6 +101,7 @@ static const struct extension extension_table[] = {
>>     { "GL_ARB_depth_buffer_float", o(ARB_depth_buffer_float), GL, 2008 },
>>     { "GL_ARB_depth_clamp", o(ARB_depth_clamp), GL, 2003 },
>>     { "GL_ARB_depth_texture", o(ARB_depth_texture), GLL, 2001 },
>> +   { "GL_ARB_derivative_control", o(ARB_derivative_control), GLC, 2014 },
>
> No reason to be core-only that I can see.

I guess we can just leave it up to the drivers to turn on the extension
if GLSL >= 4.00? Seems ugly. Also seems like a pretty arbitrary
requirement.
Re: [Mesa-dev] [PATCH 1/9] glsl: Optimize min/max expression trees
On 14.08.2014 04:33, Ian Romanick wrote:
> On 07/29/2014 02:36 AM, Petri Latvala wrote:
>> Add an optimization pass that drops min/max expression operands that
>> can be proven to not contribute to the final result. The algorithm is
>> similar to alpha-beta pruning on a minmax search, from the field of
>> AI.
>>
>> This optimization pass can optimize min/max expressions where operands
>> are min/max expressions. Such code can appear in shaders by itself, or
>> as the result of clamp() or AMD_shader_trinary_minmax functions.
>>
>> This optimization pass improves the generated code for piglit's
>> AMD_shader_trinary_minmax tests as follows:
>>
>> total instructions in shared programs: 75 -> 67 (-10.67%)
>> instructions in affected programs:     60 -> 52 (-13.33%)
>> GAINED: 0
>> LOST:   0
>>
>> All tests (max3, min3, mid3) improved.
>
> And I assume no piglit regressions? Also... have you tried this in
> combination with Abdiel's related work on saturates?

Petteri,

What is your plan on this particular pass? I have a similar patch that
drops the min/max expression but using a different approach. Do you want
to push for this particular optimization or do you want to take over the
series?
Re: [Mesa-dev] [PATCH 1/9] glsl: Optimize min/max expression trees
On Tue, Jul 29, 2014 at 2:36 AM, Petri Latvala petri.latv...@intel.com wrote:
> Add an optimization pass that drops min/max expression operands that
> can be proven to not contribute to the final result. The algorithm is
> similar to alpha-beta pruning on a minmax search, from the field of
> AI.
>
> This optimization pass can optimize min/max expressions where operands
> are min/max expressions. Such code can appear in shaders by itself, or
> as the result of clamp() or AMD_shader_trinary_minmax functions.
>
> This optimization pass improves the generated code for piglit's
> AMD_shader_trinary_minmax tests as follows:
>
> total instructions in shared programs: 75 -> 67 (-10.67%)
> instructions in affected programs:     60 -> 52 (-13.33%)
> GAINED: 0
> LOST:   0
>
> All tests (max3, min3, mid3) improved.
>
> A full shader-db run:
>
> total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
> instructions in affected programs:     1188 -> 1160 (-2.36%)
> GAINED: 0
> LOST:   0
>
> Improvements happen in Guacamelee and Serious Sam 3. One shader from
> Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
> dropping of a (constant float (0.0)) operand, which was compiled to a
> saturate modifier.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
> Signed-off-by: Petri Latvala petri.latv...@intel.com
> ---
>  src/glsl/Makefile.sources       |   1 +
>  src/glsl/glsl_parser_extras.cpp |   1 +
>  src/glsl/ir_optimization.h      |   1 +
>  src/glsl/opt_minmax.cpp         | 395
>  4 files changed, 398 insertions(+)
>  create mode 100644 src/glsl/opt_minmax.cpp
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index b54eae7..1ee80a3 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -95,6 +95,7 @@ LIBGLSL_FILES = \
>     $(GLSL_SRCDIR)/opt_flip_matrices.cpp \
>     $(GLSL_SRCDIR)/opt_function_inlining.cpp \
>     $(GLSL_SRCDIR)/opt_if_simplification.cpp \
> +   $(GLSL_SRCDIR)/opt_minmax.cpp \
>     $(GLSL_SRCDIR)/opt_noop_swizzle.cpp \
>     $(GLSL_SRCDIR)/opt_rebalance_tree.cpp \
>     $(GLSL_SRCDIR)/opt_redundant_jumps.cpp \
> diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
> index 890123a..9f57ef3 100644
> --- a/src/glsl/glsl_parser_extras.cpp
> +++ b/src/glsl/glsl_parser_extras.cpp
> @@ -1561,6 +1561,7 @@ do_common_optimization(exec_list *ir, bool linked,
>     else
>        progress = do_constant_variable_unlinked(ir) || progress;
>     progress = do_constant_folding(ir) || progress;
> +   progress = do_minmax_prune(ir) || progress;
>     progress = do_cse(ir) || progress;
>     progress = do_rebalance_tree(ir) || progress;
>     progress = do_algebraic(ir, native_integers, options) || progress;
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index b83c225..9d22585 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -98,6 +98,7 @@ bool opt_flatten_nested_if_blocks(exec_list *instructions);
>  bool do_discard_simplification(exec_list *instructions);
>  bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 0);
>  bool do_mat_op_to_vec(exec_list *instructions);
> +bool do_minmax_prune(exec_list *instructions);
>  bool do_noop_swizzle(exec_list *instructions);
>  bool do_structure_splitting(exec_list *instructions);
>  bool do_swizzle_swizzle(exec_list *instructions);
> diff --git a/src/glsl/opt_minmax.cpp b/src/glsl/opt_minmax.cpp
> new file mode 100644
> index 0000000..5656059
> --- /dev/null
> +++ b/src/glsl/opt_minmax.cpp
> @@ -0,0 +1,395 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +/**
> + * \file opt_minmax.cpp
> + *
> + * Drop operands from an expression tree of only
Re: [Mesa-dev] [PATCH] egl_dri2: fix EXT_image_dma_buf_import fds
On Wed, 13 Aug 2014 19:46:40 +0300 Pohjolainen, Topi topi.pohjolai...@intel.com wrote:
> On Fri, Aug 08, 2014 at 05:28:59PM +0300, Pekka Paalanen wrote:
>> From: Pekka Paalanen pekka.paala...@collabora.co.uk
>>
>> The EGL_EXT_image_dma_buf_import specification was revised (according
>> to its revision history) on Dec 5th, 2013, for EGL to not take
>> ownership of the file descriptors.
>>
>> Do not close the file descriptors passed in to eglCreateImageKHR with
>> EGL_LINUX_DMA_BUF_EXT target.
>>
>> It is assumed that the drivers, which ultimately process the file
>> descriptors, do not close or modify them in any way either. This
>> avoids the need to dup(), as it seems we would only need to just
>> close the dup'd file descriptors right after.
>>
>> Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk
>
> I wrote the current logic based on the older version, and at least to
> me this is the right thing to do. Thanks for fixing it as well as
> taking care of the piglit test.
>
> Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
>
> I would be happier though if someone else gave his/her approval as
> well.

Thank you, I have added your R-b, and will wait some more. I think I
want the piglit patch landed first before I try to push this, anyway.
Thanks for the piglit review too, I sent a new version with your R-b and
the comment fix.

- pq

>> ---
>> Hi,
>>
>> the corresponding Piglit fix has already been sent to the piglit
>> mailing list. Both this and that need to be applied to not regress
>> Mesa's piglit run by one test
>> (ext_image_dma_buf_import-ownership_transfer).
>>
>> This patch fixes my test case on heavily modified Weston.
>>
>> Thanks,
>> pq
>> ---
>>  src/egl/drivers/dri2/egl_dri2.c | 37 ++---
>>  1 file changed, 6 insertions(+), 31 deletions(-)
>>
>> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
>> index 5602ec3..cd85fd3 100644
>> --- a/src/egl/drivers/dri2/egl_dri2.c
>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>> @@ -1678,36 +1678,13 @@ dri2_check_dma_buf_format(const _EGLImageAttribs *attrs)
>>  /**
>>   * The spec says:
>>   *
>> - * "If eglCreateImageKHR is successful for a EGL_LINUX_DMA_BUF_EXT target,
>> - *  the EGL takes ownership of the file descriptor and is responsible for
>> - *  closing it, which it may do at any time while the EGLDisplay is
>> - *  initialized."
>> + * "If eglCreateImageKHR is successful for a EGL_LINUX_DMA_BUF_EXT target, the
>> + *  EGL will take a reference to the dma_buf(s) which it will release at any
>> + *  time while the EGLDisplay is initialized. It is the responsibility of the
>> + *  application to close the dma_buf file descriptors."
>> + *
>> + * Therefore we must never close or otherwise modify the file descriptors.
>>   */
>> -static void
>> -dri2_take_dma_buf_ownership(const int *fds, unsigned num_fds)
>> -{
>> -   int already_closed[num_fds];
>> -   unsigned num_closed = 0;
>> -   unsigned i, j;
>> -
>> -   for (i = 0; i < num_fds; ++i) {
>> -      /**
>> -       * The same file descriptor can be referenced multiple times in case more
>> -       * than one plane is found in the same buffer, just with a different
>> -       * offset.
>> -       */
>> -      for (j = 0; j < num_closed; ++j) {
>> -         if (already_closed[j] == fds[i])
>> -            break;
>> -      }
>> -
>> -      if (j == num_closed) {
>> -         close(fds[i]);
>> -         already_closed[num_closed++] = fds[i];
>> -      }
>> -   }
>> -}
>> -
>>  static _EGLImage *
>>  dri2_create_image_dma_buf(_EGLDisplay *disp, _EGLContext *ctx,
>>                            EGLClientBuffer buffer, const EGLint *attr_list)
>> @@ -1770,8 +1747,6 @@ dri2_create_image_dma_buf(_EGLDisplay *disp, _EGLContext *ctx,
>>        return EGL_NO_IMAGE_KHR;
>>
>>     res = dri2_create_image_from_dri(disp, dri_image);
>> -   if (res)
>> -      dri2_take_dma_buf_ownership(fds, num_fds);
>>
>>     return res;
>>  }
>> --
>> 1.8.5.5
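The ownership rule in the revised spec text quoted above (the importer keeps its own reference, and the application remains responsible for closing its file descriptors) can be illustrated with a small POSIX sketch. This is not EGL or Mesa code. `import_buffer` is a hypothetical stand-in for an importer such as eglCreateImageKHR, a pipe stands in for a dma_buf, and `dup()` is used only to model "taking a reference"; the patch itself notes that Mesa does not need to `dup()` at all.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical importer: keeps its own reference to the underlying object
// (modelled here with dup()) and never closes or modifies the caller's fd.
int import_buffer(int caller_fd)
{
   return dup(caller_fd);
}

// Returns true if fd currently refers to an open file description.
bool fd_is_open(int fd)
{
   return fcntl(fd, F_GETFD) != -1;
}

// Walks through the contract: after a successful import the caller's fd is
// still open, the caller closes it, and the importer's reference survives.
bool ownership_demo()
{
   int fds[2];
   if (pipe(fds) != 0)
      return false;

   const int imported = import_buffer(fds[0]);
   const bool caller_fd_untouched = fd_is_open(fds[0]);

   close(fds[0]); // closing is the application's job under the revised spec

   const bool importer_ref_alive = imported != -1 && fd_is_open(imported);

   if (imported != -1)
      close(imported);
   close(fds[1]);
   return caller_fd_untouched && importer_ref_alive;
}
```

Under the older spec wording, the importer would have closed `fds[0]` itself, and the application's own `close()` would then have been a double close.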
Re: [Mesa-dev] [PATCH] squash! glsl: Optimize min/max expression trees
On Wed, Aug 13, 2014 at 9:04 PM, Matt Turner matts...@gmail.com wrote:
> ---
> I'd squash this in at minimum. The changes are
>
>  - Whitespace
>  - Removal of unnecessary destructor
>  - Renaming one and two to a and b (one->value.u[c0] < two->value.u[c0]...)
>  - continue -> break
>  - assert(!"...") -> unreachable
>  - Not doing assignments in if conditionals
>  - Marking swizzle_if_required as static
>
> I also think less_all_components should just return an enum like
> { MIXED, EQUAL, LESS, GREATER }, rather than setting a variable in
> the class. It, as well as smaller/larger_constant, can then be static
> functions outside of the visitor.

I agree. Also, I realized that in the only place where we care about the
valid variable,

Another thing I'd like to see is to change minmax_range to call things
low and high instead of range[0] and range[1]. This helps readability,
and the tricks with indirect addressing that having an array lets you do
are things we really shouldn't be doing anyways because it's hard to
follow.

As I mentioned before, swizzle_if_required() should probably use the
ir_builder swizzle helpers.

I'm still not convinced that the algorithm is the best way to go about
it. Right now, AFAICT, we do something like:

- Pass in a base range, which is what the min's and max's above us in
  the tree will clamp the value we return to
- Get the ranges for each subexpression (this is a recursive call)
- Check and see if each operand is unnecessary (i.e. its range is
  strictly greater than the base range or strictly greater than the
  other argument for mins, the other way around for max's)

As another thing, the logic for this part could be made a *lot* clearer
by rearranging the code and commenting. I'd do something like:

bool is_redundant = false; /* whether this operand will never affect the
                              final value of the min-max tree */

if (is_min) {
   /* if this operand will always be greater than the other one, it's
      redundant */
   if (limit[i].low > limit[1 - i].high)
      is_redundant = true;

   /* if this operand is always greater than baserange, then even if
      it's smaller than the other one it'll get clamped so it's
      redundant */
   if (limit[i].low > baserange.high)
      is_redundant = true;
} else {
   ... the exact same logic mirrored ...
}

- Recurse into the subexpressions, computing the new baserange.

What I think we should do instead is change prune_expression() to also
return the range for the expression (it's now returning two things, so
one would have to be passed via a class variable), so it would look
like:

- Pass in the base range
- If this is a constant, return ourself and the range with low == high
- Recurse into both subexpressions, setting both the range (limits[i])
  and the new subexpression
- If one of the subexpressions is redundant, return the other
  subexpression and its range
- Otherwise, return ourself and the combination of the ranges

This will allow us to do the recursion only once, instead of once in
get_range() and once in prune_expression(), which will make things
simpler and faster.

I think the algorithm itself looks correct.

>  src/glsl/opt_minmax.cpp | 145 +---
>  1 file changed, 63 insertions(+), 82 deletions(-)
>
> diff --git a/src/glsl/opt_minmax.cpp b/src/glsl/opt_minmax.cpp
> index 5656059..b987386 100644
> --- a/src/glsl/opt_minmax.cpp
> +++ b/src/glsl/opt_minmax.cpp
> @@ -37,12 +37,10 @@
>  #include "glsl_types.h"
>  #include "main/macros.h"
>
> -namespace
> -{
> -class minmax_range
> -{
> -public:
> +namespace {
>
> +class minmax_range {
> +public:
>     minmax_range(ir_constant *low = NULL, ir_constant *high = NULL)
>     {
>        range[0] = low;
> @@ -60,60 +58,45 @@ public:
>  class ir_minmax_visitor : public ir_rvalue_enter_visitor {
>  public:
>     ir_minmax_visitor()
> -      : progress(false)
> -      , valid(true)
> -   {
> -   }
> -
> -   virtual ~ir_minmax_visitor()
> +      : progress(false), valid(true)
>     {
>     }
>
> -   bool
> -   less_all_components(ir_constant *one, ir_constant *two);
> -
> -   ir_constant *
> -   smaller_constant(ir_constant *one, ir_constant *two);
> -
> -   ir_constant *
> -   larger_constant(ir_constant *one, ir_constant *two);
> +   bool less_all_components(ir_constant *a, ir_constant *b);
> +   ir_constant *smaller_constant(ir_constant *a, ir_constant *b);
> +   ir_constant *larger_constant(ir_constant *a, ir_constant *b);
>
> -   minmax_range
> -   combine_range(minmax_range r0, minmax_range r1, bool ismin);
> +   minmax_range combine_range(minmax_range r0, minmax_range r1, bool ismin);
>
> -   minmax_range
> -   range_intersection(minmax_range r0, minmax_range r1);
> +   minmax_range range_intersection(minmax_range r0, minmax_range r1);
>
> -   minmax_range
> -   get_range(ir_rvalue *rval);
> +   minmax_range get_range(ir_rvalue *rval);
>
> -   ir_rvalue *
> -   prune_expression(ir_expression *expr, minmax_range baserange);
> +   ir_rvalue *prune_expression(ir_expression *expr, minmax_range baserange);
>
> -   void
> -   handle_rvalue(ir_rvalue
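The redundancy test and the single-recursion idea described above can be sketched roughly as follows. This is a standalone C++ illustration on scalar floats, not the Mesa pass: `Expr`, `Range`, `Pruned`, `prune`, and the builder helpers are hypothetical names. One simplification is assumed here that the email does not specify: a child's base range is narrowed only by a constant sibling, and NaNs are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <memory>
#include <string>

const float INF = std::numeric_limits<float>::infinity();

struct Range { float low, high; };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
   enum Kind { CONSTANT, VARIABLE, MIN, MAX };
   Kind kind = CONSTANT;
   float value = 0.0f;  // CONSTANT only
   std::string name;    // VARIABLE only
   ExprPtr ops[2];      // MIN / MAX only
};

ExprPtr constant(float v) { auto e = std::make_shared<Expr>(); e->value = v; return e; }
ExprPtr variable(const std::string &n)
{
   auto e = std::make_shared<Expr>(); e->kind = Expr::VARIABLE; e->name = n; return e;
}
ExprPtr binary(Expr::Kind k, ExprPtr a, ExprPtr b)
{
   auto e = std::make_shared<Expr>(); e->kind = k; e->ops[0] = a; e->ops[1] = b; return e;
}
ExprPtr minimum(ExprPtr a, ExprPtr b) { return binary(Expr::MIN, a, b); }
ExprPtr maximum(ExprPtr a, ExprPtr b) { return binary(Expr::MAX, a, b); }

struct Pruned { ExprPtr expr; Range range; };

// An operand of a min (max) never matters if it always lies above (below)
// its sibling, or above (below) everything the ancestors let through.
static bool is_redundant(bool is_min, Range self, Range other, Range base)
{
   if (is_min)
      return self.low > other.high || self.low > base.high;
   return self.high < other.low || self.high < base.low;
}

// Single recursion: returns the pruned expression together with its range.
Pruned prune(const ExprPtr &e, Range base = Range{-INF, INF})
{
   if (e->kind == Expr::CONSTANT)
      return {e, Range{e->value, e->value}};
   if (e->kind == Expr::VARIABLE)
      return {e, Range{-INF, INF}};

   const bool is_min = e->kind == Expr::MIN;
   Pruned sub[2];
   for (int i = 0; i < 2; i++) {
      // Assumed simplification: only a constant sibling narrows the base.
      Range child_base = base;
      const Expr &sibling = *e->ops[1 - i];
      if (sibling.kind == Expr::CONSTANT) {
         if (is_min)
            child_base.high = std::min(base.high, sibling.value);
         else
            child_base.low = std::max(base.low, sibling.value);
      }
      sub[i] = prune(e->ops[i], child_base);
   }

   for (int i = 0; i < 2; i++) {
      if (is_redundant(is_min, sub[i].range, sub[1 - i].range, base))
         return sub[1 - i];
   }

   const Range a = sub[0].range, b = sub[1].range;
   const Range combined = is_min
      ? Range{std::min(a.low, b.low), std::min(a.high, b.high)}
      : Range{std::max(a.low, b.low), std::max(a.high, b.high)};
   return {binary(e->kind, sub[0].expr, sub[1].expr), combined};
}

// Prints an expression, e.g. "min(max(x, y), 1.000000)".
std::string show(const ExprPtr &e)
{
   if (e->kind == Expr::CONSTANT) return std::to_string(e->value);
   if (e->kind == Expr::VARIABLE) return e->name;
   const std::string op = e->kind == Expr::MIN ? "min" : "max";
   return op + "(" + show(e->ops[0]) + ", " + show(e->ops[1]) + ")";
}
```

For example, `min(max(x, min(y, 5.0)), 1.0)` prunes to `min(max(x, y), 1.0)`, because the outer clamp at 1.0 makes the inner 5.0 irrelevant, and `min(max(x, 2.0), 1.0)` collapses to the constant 1.0.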
[Mesa-dev] [PATCH] glsl: Fixed vectorize pass vs. texture lookups
Attached patch fixes GLSL vectorization optimization going wrong on some
texture lookups, see https://bugs.freedesktop.org/show_bug.cgi?id=82574

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info

From 9c592e2d0216e1b17f303be3ae1505b209abd5b3 Mon Sep 17 00:00:00 2001
From: Aras Pranckevicius a...@unity3d.com
Date: Wed, 13 Aug 2014 20:40:05 +0300
Subject: [PATCH] glsl: Fixed vectorize pass vs. texture lookups

https://bugs.freedesktop.org/show_bug.cgi?id=82574
---
 src/glsl/opt_vectorize.cpp | 13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/src/glsl/opt_vectorize.cpp b/src/glsl/opt_vectorize.cpp
index 826de5f..aa24043 100644
--- a/src/glsl/opt_vectorize.cpp
+++ b/src/glsl/opt_vectorize.cpp
@@ -86,6 +86,7 @@ public:
    virtual ir_visitor_status visit_enter(ir_expression *);
    virtual ir_visitor_status visit_enter(ir_if *);
    virtual ir_visitor_status visit_enter(ir_loop *);
+   virtual ir_visitor_status visit_enter(ir_texture *);

    virtual ir_visitor_status visit_leave(ir_assignment *);

@@ -354,6 +355,18 @@ ir_vectorize_visitor::visit_enter(ir_loop *ir)
 }

 /**
+ * Upon entering an ir_texture, remove the current assignment from
+ * further consideration. Vectorizing multiple texture lookups into one
+ * is wrong.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_texture *)
+{
+   this->current_assignment = NULL;
+   return visit_continue_with_parent;
+}
+
+/**
  * Upon leaving an ir_assignment, save a pointer to it in ::assignment[] if
  * the swizzle mask(s) found were appropriate. Also save a pointer in
  * ::last_assignment so that we can compare future assignments with it.
--
1.8.4.2
[Mesa-dev] [PATCH 0/5] Enable ARB_derivative_control for i965/Gen7+
Since i965 already had derivative control via hints driconf, this was too trivial to pass up.
[Mesa-dev] [PATCH 2/5] i965/vec4: Assert that fine/coarse derivative ops don't appear
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4
 1 file changed, 4 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 1b46850..5a13094 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1348,7 +1348,11 @@ vec4_visitor::visit(ir_expression *ir)
       break;

    case ir_unop_dFdx:
+   case ir_unop_dFdx_coarse:
+   case ir_unop_dFdx_fine:
    case ir_unop_dFdy:
+   case ir_unop_dFdy_coarse:
+   case ir_unop_dFdy_fine:
       unreachable("derivatives not valid in vertex shader");

    case ir_unop_bitfield_reverse:
--
2.0.4
[Mesa-dev] [PATCH 3/5] i965/fs: Support fine/coarse derivative opcodes
The quality level (fine/coarse/dont-care) is plumbed through to the
generator as a constant in src1.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_defines.h            |  6 ++
 src/mesa/drivers/dri/i965/brw_fs.h                 |  4 ++--
 .../dri/i965/brw_fs_channel_expressions.cpp        |  4
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp     | 24 --
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       | 16 +--
 5 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 3564041..1322ed2 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1004,6 +1004,12 @@ enum opcode {
    GS_OPCODE_GET_INSTANCE_ID,
 };

+enum brw_derivative_quality {
+   BRW_DERIVATIVE_BY_HINT = 0,
+   BRW_DERIVATIVE_FINE = 1,
+   BRW_DERIVATIVE_COARSE = 2,
+};
+
 enum brw_urb_write_flags {
    BRW_URB_WRITE_NO_FLAGS = 0,

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 5cad504..a838e74 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -604,9 +604,9 @@ private:
    void generate_math_g45(fs_inst *inst,
                           struct brw_reg dst,
                           struct brw_reg src);
-   void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
+   void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src, struct brw_reg quality);
    void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
-                     bool negate_value);
+                     struct brw_reg quality, bool negate_value);
    void generate_scratch_write(fs_inst *inst, struct brw_reg src);
    void generate_scratch_read(fs_inst *inst, struct brw_reg dst);
    void generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
index 4113f47..d98b7eb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
@@ -237,7 +237,11 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment *ir)
    case ir_unop_sin_reduced:
    case ir_unop_cos_reduced:
    case ir_unop_dFdx:
+   case ir_unop_dFdx_coarse:
+   case ir_unop_dFdx_fine:
    case ir_unop_dFdy:
+   case ir_unop_dFdy_coarse:
+   case ir_unop_dFdy_fine:
    case ir_unop_bitfield_reverse:
    case ir_unop_bit_count:
    case ir_unop_find_msb:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 1190f1f..6efd41c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -644,11 +644,17 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
  * appropriate swizzling.
  */
 void
-fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
+fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
+                           struct brw_reg quality)
 {
    unsigned vstride, width;
+   assert(quality.file == BRW_IMMEDIATE_VALUE);
+   assert(quality.type == BRW_REGISTER_TYPE_D);

-   if (key->high_quality_derivatives) {
+   int quality_value = quality.dw1.d;
+
+   if (quality_value == BRW_DERIVATIVE_FINE ||
+       (key->high_quality_derivatives && quality_value != BRW_DERIVATIVE_COARSE)) {
       /* produce accurate derivatives */
       vstride = BRW_VERTICAL_STRIDE_2;
       width = BRW_WIDTH_2;
@@ -680,9 +686,15 @@ fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src
  */
 void
 fs_generator::generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
-                           bool negate_value)
+                           struct brw_reg quality, bool negate_value)
 {
-   if (key->high_quality_derivatives) {
+   assert(quality.file == BRW_IMMEDIATE_VALUE);
+   assert(quality.type == BRW_REGISTER_TYPE_D);
+
+   int quality_value = quality.dw1.d;
+
+   if (quality_value == BRW_DERIVATIVE_FINE ||
+       (key->high_quality_derivatives && quality_value != BRW_DERIVATIVE_COARSE)) {
       /* From the Ivy Bridge PRM, volume 4 part 3, section 3.3.9 (Register
        * Region Restrictions):
        *
@@ -1655,14 +1667,14 @@ fs_generator::generate_code(exec_list *instructions)
          generate_tex(inst, dst, src[0], src[1]);
          break;
       case FS_OPCODE_DDX:
-         generate_ddx(inst, dst, src[0]);
+         generate_ddx(inst, dst, src[0], src[1]);
          break;
       case FS_OPCODE_DDY:
          /* Make sure fp->UsesDFdy flag got set (otherwise there's no
           * guarantee that key->render_to_fbo is set).
           */
          assert(fp->UsesDFdy);
-         generate_ddy(inst, dst, src[0], key->render_to_fbo);
+
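The condition used in both generate_ddx and generate_ddy above boils down to a small decision: an explicit fine or coarse request wins, and only the "by hint" case falls back to the driconf hint. A standalone C++ sketch of that decision follows. The enum mirrors the one added by the patch, while `use_fine_derivatives` and its `high_quality_hint` parameter are hypothetical names standing in for the inline condition and `key->high_quality_derivatives`.

```cpp
#include <cassert>

// Mirrors the enum added to brw_defines.h by the patch.
enum brw_derivative_quality {
   BRW_DERIVATIVE_BY_HINT = 0,
   BRW_DERIVATIVE_FINE = 1,
   BRW_DERIVATIVE_COARSE = 2,
};

// Hypothetical helper: true when the accurate (fine) code path should be
// taken. Explicit FINE always wins, explicit COARSE always loses, and
// BY_HINT defers to the high-quality-derivatives hint.
bool use_fine_derivatives(brw_derivative_quality quality, bool high_quality_hint)
{
   return quality == BRW_DERIVATIVE_FINE ||
          (high_quality_hint && quality != BRW_DERIVATIVE_COARSE);
}
```

Factoring the test into one helper like this would also avoid repeating the same two-line condition in both generators, though the patch as posted keeps it inline.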
[Mesa-dev] [PATCH 5/5] docs: Mark off ARB_derivative_control for i965.
Also update 10.3 relnotes to match, and note nv50/nvc0 support there.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 docs/GL3.txt            | 2 +-
 docs/relnotes/10.3.html | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 0631c72..1c3567e 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -188,7 +188,7 @@ GL 4.5, GLSL 4.50:
   GL_ARB_clip_control                                  not started
   GL_ARB_conditional_render_inverted                   not started
   GL_ARB_cull_distance                                 not started
-  GL_ARB_derivative_control                            DONE (nv50, nvc0)
+  GL_ARB_derivative_control                            DONE (i965, nv50, nvc0)
   GL_ARB_direct_state_access                           not started
   GL_ARB_get_texture_sub_image                         started (Brian Paul)
   GL_ARB_shader_texture_image_samples                  not started
diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
index a297106..3c33150 100644
--- a/docs/relnotes/10.3.html
+++ b/docs/relnotes/10.3.html
@@ -46,6 +46,7 @@ Note: some of the new features are only available with certain drivers.
 <ul>
 <li>GL_ARB_ES3_compatibility on nv50, nvc0, r600, radeonsi, softpipe, llvmpipe</li>
 <li>GL_ARB_compressed_texture_pixel_storage on all drivers</li>
+<li>GL_ARB_derivative_control on i965, nv50, nvc0</li>
 <li>GL_ARB_draw_indirect on nvc0, radeonsi</li>
 <li>GL_ARB_explicit_uniform_location (all drivers that support GLSL)</li>
 <li>GL_ARB_multi_draw_indirect on nvc0, radeonsi</li>
--
2.0.4
[Mesa-dev] [PATCH 1/5] glsl: Mark program as using dFdy if coarse/fine variant is used
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/glsl/ir_set_program_inouts.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ir_set_program_inouts.cpp b/src/glsl/ir_set_program_inouts.cpp
index 5163eb2..97ead75 100644
--- a/src/glsl/ir_set_program_inouts.cpp
+++ b/src/glsl/ir_set_program_inouts.cpp
@@ -306,7 +306,9 @@ ir_visitor_status
 ir_set_program_inouts_visitor::visit_enter(ir_expression *ir)
 {
    if (this->shader_stage == MESA_SHADER_FRAGMENT &&
-       ir->operation == ir_unop_dFdy) {
+       (ir->operation == ir_unop_dFdy ||
+        ir->operation == ir_unop_dFdy_coarse ||
+        ir->operation == ir_unop_dFdy_fine)) {
       gl_fragment_program *fprog = (gl_fragment_program *) prog;
       fprog->UsesDFdy = true;
    }
--
2.0.4
[Mesa-dev] [PATCH 4/5] i965: Enable ARB_derivative_control on Gen7+.
The extension says GL 4.0 is required. We'll meet the spirit of that
restriction by enabling on just those generations which will soon
support GL 4.0 (Gen7+), although it's technically supportable on all
generations.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index e134cd9..c672044 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -305,6 +305,7 @@ intelInitExtensions(struct gl_context *ctx)
       }

       ctx->Extensions.ARB_texture_compression_bptc = true;
+      ctx->Extensions.ARB_derivative_control = true;
    }

    if (brw->gen >= 8) {
--
2.0.4
Re: [Mesa-dev] [PATCH 2/5] glsl: add ARB_derivative control support
Reviewed-by: Chris Forbes chr...@ijw.co.nz

On Thu, Aug 14, 2014 at 4:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 src/glsl/builtin_functions.cpp  | 48 +
 src/glsl/glcpp/glcpp-parse.y    |  3 +++
 src/glsl/glsl_parser_extras.cpp |  1 +
 src/glsl/glsl_parser_extras.h   |  2 ++
 src/glsl/ir.h                   |  4 ++++
 src/glsl/ir_validate.cpp        |  4 ++++
 6 files changed, 62 insertions(+)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 185fe98..c882ec8 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -318,6 +318,14 @@ fs_oes_derivatives(const _mesa_glsl_parse_state *state)
 }

 static bool
+fs_derivative_control(const _mesa_glsl_parse_state *state)
+{
+   return state->stage == MESA_SHADER_FRAGMENT &&
+          (state->is_version(450, 0) ||
+           state->ARB_derivative_control_enable);
+}
+
+static bool
 tex1d_lod(const _mesa_glsl_parse_state *state)
 {
    return !state->es_shader && lod_exists_in_stage(state);
@@ -618,6 +626,12 @@ private:
    B1(dFdx);
    B1(dFdy);
    B1(fwidth);
+   B1(dFdxCoarse);
+   B1(dFdyCoarse);
+   B1(fwidthCoarse);
+   B1(dFdxFine);
+   B1(dFdyFine);
+   B1(fwidthFine);
    B1(noise1);
    B1(noise2);
    B1(noise3);
@@ -2148,6 +2162,12 @@ builtin_builder::create_builtins()
    F(dFdx)
    F(dFdy)
    F(fwidth)
+   F(dFdxCoarse)
+   F(dFdyCoarse)
+   F(fwidthCoarse)
+   F(dFdxFine)
+   F(dFdyFine)
+   F(fwidthFine)
    F(noise1)
    F(noise2)
    F(noise3)
@@ -4010,7 +4030,11 @@ builtin_builder::_textureQueryLevels(const glsl_type *sampler_type)
 }

 UNOP(dFdx, ir_unop_dFdx, fs_oes_derivatives)
+UNOP(dFdxCoarse, ir_unop_dFdx_coarse, fs_derivative_control)
+UNOP(dFdxFine, ir_unop_dFdx_fine, fs_derivative_control)
 UNOP(dFdy, ir_unop_dFdy, fs_oes_derivatives)
+UNOP(dFdyCoarse, ir_unop_dFdy_coarse, fs_derivative_control)
+UNOP(dFdyFine, ir_unop_dFdy_fine, fs_derivative_control)

 ir_function_signature *
 builtin_builder::_fwidth(const glsl_type *type)
@@ -4024,6 +4048,30 @@ builtin_builder::_fwidth(const glsl_type *type)
 }

 ir_function_signature *
+builtin_builder::_fwidthCoarse(const glsl_type *type)
+{
+   ir_variable *p = in_var(type, "p");
+   MAKE_SIG(type, fs_derivative_control, 1, p);
+
+   body.emit(ret(add(abs(expr(ir_unop_dFdx_coarse, p)),
+                     abs(expr(ir_unop_dFdy_coarse, p)))));
+
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_fwidthFine(const glsl_type *type)
+{
+   ir_variable *p = in_var(type, "p");
+   MAKE_SIG(type, fs_derivative_control, 1, p);
+
+   body.emit(ret(add(abs(expr(ir_unop_dFdx_fine, p)),
+                     abs(expr(ir_unop_dFdy_fine, p)))));
+
+   return sig;
+}
+
+ir_function_signature *
 builtin_builder::_noise1(const glsl_type *type)
 {
    return unop(v110, ir_unop_noise, glsl_type::float_type, type);
diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index a616973..f1119eb 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -2469,6 +2469,9 @@ _glcpp_parser_handle_version_declaration(glcpp_parser_t *parser, intmax_t versio

          if (extensions->ARB_shader_image_load_store)
             add_builtin_define(parser, "GL_ARB_shader_image_load_store", 1);
+
+         if (extensions->ARB_derivative_control)
+            add_builtin_define(parser, "GL_ARB_derivative_control", 1);
       }
    }

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index ad91c46..490c3c8 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -514,6 +514,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
    EXT(ARB_arrays_of_arrays, true, false, ARB_arrays_of_arrays),
    EXT(ARB_compute_shader, true, false, ARB_compute_shader),
    EXT(ARB_conservative_depth, true, false, ARB_conservative_depth),
+   EXT(ARB_derivative_control, true, false, ARB_derivative_control),
    EXT(ARB_draw_buffers, true, false, dummy_true),
    EXT(ARB_draw_instanced, true, false, ARB_draw_instanced),
    EXT(ARB_explicit_attrib_location, true, false, ARB_explicit_attrib_location),
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index ce66e2f..c8b9478 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -393,6 +393,8 @@ struct _mesa_glsl_parse_state {
    bool ARB_compute_shader_warn;
    bool ARB_conservative_depth_enable;
    bool ARB_conservative_depth_warn;
+   bool ARB_derivative_control_enable;
+   bool ARB_derivative_control_warn;
    bool ARB_draw_buffers_enable;
    bool ARB_draw_buffers_warn;
    bool
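For anyone following the new builtins, here is a minimal CPU-side sketch of how coarse and fine derivatives can differ within one 2x2 quad. It is a toy model and not Mesa code: the `Quad` layout and the choice of which pixel pair the coarse variant samples are assumptions made for illustration (the extension leaves that choice to the implementation). The `fwidth*` helpers follow the abs(dFdx) + abs(dFdy) form used in the patch.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Toy model (assumption): one 2x2 fragment quad, values laid out as
//   q[0] q[1]   top row
//   q[2] q[3]   bottom row
using Quad = std::array<float, 4>;

// Fine: each pixel uses the neighbour difference of its own row.
float dFdxFine(const Quad &q, int pixel)
{
   assert(pixel >= 0 && pixel < 4);
   return pixel < 2 ? q[1] - q[0] : q[3] - q[2];
}

// Fine: each pixel uses the neighbour difference of its own column.
float dFdyFine(const Quad &q, int pixel)
{
   assert(pixel >= 0 && pixel < 4);
   return pixel % 2 == 0 ? q[2] - q[0] : q[3] - q[1];
}

// Coarse: a single difference shared by all four pixels. Picking the top
// row and the left column is an assumption of this sketch.
float dFdxCoarse(const Quad &q) { return q[1] - q[0]; }
float dFdyCoarse(const Quad &q) { return q[2] - q[0]; }

// Same shape as _fwidthFine / _fwidthCoarse: abs(dFdx) + abs(dFdy).
float fwidthFine(const Quad &q, int pixel)
{
   return std::fabs(dFdxFine(q, pixel)) + std::fabs(dFdyFine(q, pixel));
}

float fwidthCoarse(const Quad &q)
{
   return std::fabs(dFdxCoarse(q)) + std::fabs(dFdyCoarse(q));
}
```

With a quad whose values are not a linear ramp, the fine result varies per pixel while the coarse result is the same for the whole quad.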
[Mesa-dev] [PATCH 2/2] vl/compositor: set the scissor before clearing the render target
From: Christian König christian.koe...@amd.com

Otherwise we clear areas that shouldn't be cleared.

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/auxiliary/vl/vl_compositor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/vl/vl_compositor.c b/src/gallium/auxiliary/vl/vl_compositor.c
index 839fd27..6bd1a88 100644
--- a/src/gallium/auxiliary/vl/vl_compositor.c
+++ b/src/gallium/auxiliary/vl/vl_compositor.c
@@ -1060,6 +1060,7 @@ vl_compositor_render(struct vl_compositor_state *s,
       s->scissor.maxx = dst_surface->width;
       s->scissor.maxy = dst_surface->height;
    }
+   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);

    gen_vertex_data(c, s, dirty_area);

@@ -1072,7 +1073,6 @@ vl_compositor_render(struct vl_compositor_state *s,
       dirty_area->x1 = dirty_area->y1 = MIN_DIRTY;
    }

-   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);
    c->pipe->set_framebuffer_state(c->pipe, &c->fb_state);
    c->pipe->bind_vs_state(c->pipe, c->vs);
    c->pipe->set_vertex_buffers(c->pipe, 0, 1, &c->vertex_buf);
-- 
1.9.1
[Mesa-dev] [PATCH 1/2] st/vdpau: fix vlVdpOutputSurfaceRender(Output|Bitmap)Surface
From: Christian König christian.koe...@amd.com Correctly handle that the source_surface is only optional. Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/state_trackers/vdpau/device.c| 43 +++- src/gallium/state_trackers/vdpau/output.c| 42 +++ src/gallium/state_trackers/vdpau/vdpau_private.h | 1 + 3 files changed, 71 insertions(+), 15 deletions(-) diff --git a/src/gallium/state_trackers/vdpau/device.c b/src/gallium/state_trackers/vdpau/device.c index 9c5ec60..efc1fde 100644 --- a/src/gallium/state_trackers/vdpau/device.c +++ b/src/gallium/state_trackers/vdpau/device.c @@ -42,6 +42,8 @@ vdp_imp_device_create_x11(Display *display, int screen, VdpDevice *device, VdpGetProcAddress **get_proc_address) { struct pipe_screen *pscreen; + struct pipe_resource *res, res_tmpl; + struct pipe_sampler_view sv_tmpl; vlVdpDevice *dev = NULL; VdpStatus ret; @@ -79,6 +81,43 @@ vdp_imp_device_create_x11(Display *display, int screen, VdpDevice *device, goto no_context; } + memset(res_tmpl, 0, sizeof(res_tmpl)); + + res_tmpl.target = PIPE_TEXTURE_2D; + res_tmpl.format = PIPE_FORMAT_R8G8B8A8_UNORM; + res_tmpl.width0 = 1; + res_tmpl.height0 = 1; + res_tmpl.depth0 = 1; + res_tmpl.array_size = 1; + res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW; + res_tmpl.usage = PIPE_USAGE_DEFAULT; + + if (!CheckSurfaceParams(pscreen, res_tmpl)) { + ret = VDP_STATUS_NO_IMPLEMENTATION; + goto no_resource; + } + + res = pscreen-resource_create(pscreen, res_tmpl); + if (!res) { + ret = VDP_STATUS_RESOURCES; + goto no_resource; + } + + memset(sv_tmpl, 0, sizeof(sv_tmpl)); + u_sampler_view_default_template(sv_tmpl, res, res-format); + + sv_tmpl.swizzle_r = PIPE_SWIZZLE_ONE; + sv_tmpl.swizzle_g = PIPE_SWIZZLE_ONE; + sv_tmpl.swizzle_b = PIPE_SWIZZLE_ONE; + sv_tmpl.swizzle_a = PIPE_SWIZZLE_ONE; + + dev-dummy_sv = dev-context-create_sampler_view(dev-context, res, sv_tmpl); + pipe_resource_reference(res, NULL); + if (!dev-dummy_sv) { + ret = VDP_STATUS_RESOURCES; + goto no_resource; + } + *device = 
vlAddDataHTAB(dev); if (*device == 0) { ret = VDP_STATUS_ERROR; @@ -93,8 +132,9 @@ vdp_imp_device_create_x11(Display *display, int screen, VdpDevice *device, return VDP_STATUS_OK; no_handle: + pipe_sampler_view_reference(dev-dummy_sv, NULL); +no_resource: dev-context-destroy(dev-context); - /* Destroy vscreen */ no_context: vl_screen_destroy(dev-vscreen); no_vscreen: @@ -185,6 +225,7 @@ vlVdpDeviceFree(vlVdpDevice *dev) { pipe_mutex_destroy(dev-mutex); vl_compositor_cleanup(dev-compositor); + pipe_sampler_view_reference(dev-dummy_sv, NULL); dev-context-destroy(dev-context); vl_screen_destroy(dev-vscreen); FREE(dev); diff --git a/src/gallium/state_trackers/vdpau/output.c b/src/gallium/state_trackers/vdpau/output.c index caae50f..3248f76 100644 --- a/src/gallium/state_trackers/vdpau/output.c +++ b/src/gallium/state_trackers/vdpau/output.c @@ -624,9 +624,9 @@ vlVdpOutputSurfaceRenderOutputSurface(VdpOutputSurface destination_surface, uint32_t flags) { vlVdpOutputSurface *dst_vlsurface; - vlVdpOutputSurface *src_vlsurface; struct pipe_context *context; + struct pipe_sampler_view *src_sv; struct vl_compositor *compositor; struct vl_compositor_state *cstate; @@ -639,12 +639,19 @@ vlVdpOutputSurfaceRenderOutputSurface(VdpOutputSurface destination_surface, if (!dst_vlsurface) return VDP_STATUS_INVALID_HANDLE; - src_vlsurface = vlGetDataHTAB(source_surface); - if (!src_vlsurface) - return VDP_STATUS_INVALID_HANDLE; + if (source_surface == VDP_INVALID_HANDLE) { + src_sv = dst_vlsurface-device-dummy_sv; + + } else { + vlVdpOutputSurface *src_vlsurface = vlGetDataHTAB(source_surface); + if (!src_vlsurface) + return VDP_STATUS_INVALID_HANDLE; - if (dst_vlsurface-device != src_vlsurface-device) - return VDP_STATUS_HANDLE_DEVICE_MISMATCH; + if (dst_vlsurface-device != src_vlsurface-device) + return VDP_STATUS_HANDLE_DEVICE_MISMATCH; + + src_sv = src_vlsurface-sampler_view; + } pipe_mutex_lock(dst_vlsurface-device-mutex); vlVdpResolveDelayedRendering(dst_vlsurface-device, NULL, 
NULL); @@ -657,7 +664,7 @@ vlVdpOutputSurfaceRenderOutputSurface(VdpOutputSurface destination_surface, vl_compositor_clear_layers(cstate); vl_compositor_set_layer_blend(cstate, 0, blend, false); - vl_compositor_set_rgba_layer(cstate, compositor, 0, src_vlsurface-sampler_view, + vl_compositor_set_rgba_layer(cstate, compositor, 0, src_sv, RectToPipe(source_rect, src_rect), NULL, ColorsToPipe(colors, flags, vlcolors)); STATIC_ASSERT(VL_COMPOSITOR_ROTATE_0 == VDP_OUTPUT_SURFACE_RENDER_ROTATE_0);
[Mesa-dev] [PATCH 08/11] glsl: enable/disable certain lowering passes for doubles
We want to restrict some lowering passes to floats only, and enable others for doubles.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/lower_instructions.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp
index 176070c..eced619 100644
--- a/src/glsl/lower_instructions.cpp
+++ b/src/glsl/lower_instructions.cpp
@@ -290,7 +290,7 @@ lower_instructions_visitor::mod_to_fract(ir_expression *ir)
    /* Don't generate new IR that would need to be lowered in an additional
     * pass.
     */
-   if (lowering(DIV_TO_MUL_RCP))
+   if (lowering(DIV_TO_MUL_RCP) && ir->type->is_float())
       div_to_mul_rcp(div_expr);

    ir_rvalue *expr = new(ir) ir_expression(ir_unop_fract,
@@ -511,7 +511,7 @@ lower_instructions_visitor::visit_leave(ir_expression *ir)
       break;

    case ir_binop_mod:
-      if (lowering(MOD_TO_FRACT) && ir->type->is_float())
+      if (lowering(MOD_TO_FRACT) && (ir->type->is_float() || ir->type->is_double()))
          mod_to_fract(ir);
       break;

@@ -526,7 +526,7 @@ lower_instructions_visitor::visit_leave(ir_expression *ir)
       break;

    case ir_binop_ldexp:
-      if (lowering(LDEXP_TO_ARITH))
+      if (lowering(LDEXP_TO_ARITH) && ir->type->is_float())
          ldexp_to_arith(ir);
       break;

-- 
1.9.3
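As a reminder of what the MOD_TO_FRACT pass that this patch enables for doubles computes, here is a small standalone sketch. It assumes the rewrite is mod(x, y) = y * fract(x / y) with the GLSL definition of fract; the function names are made up for the example and are not Mesa's.

```cpp
#include <cassert>
#include <cmath>

// GLSL-style fract: v - floor(v), always in [0, 1).
double glsl_fract(double v) { return v - std::floor(v); }

// Assumed shape of the MOD_TO_FRACT rewrite: mod(x, y) == y * fract(x / y).
// The division is kept as a real division here, matching the patch, which
// skips DIV_TO_MUL_RCP for non-float types.
double mod_via_fract(double x, double y)
{
   assert(y != 0.0);
   return y * glsl_fract(x / y);
}
```

For example, mod_via_fract(7.5, 2.0) evaluates 2.0 * fract(3.75), and a negative x wraps into [0, y) as GLSL mod() does.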
[Mesa-dev] [PATCH 07/11] glsl: add double support
This adds the guts of the fp64 implementation to the GLSL compiler. - builtin double types - double constant support - lexer parsing for double types (lf, LF) - enforcing flat on double fs inputs - double operations (d2f,f2d, pack/unpack, frexp - in 2 parts) - ir builder bits. - double constant expression handling Signed-off-by: Dave Airlie airl...@redhat.com --- src/glsl/ast.h | 2 + src/glsl/ast_function.cpp | 36 ++ src/glsl/ast_to_hir.cpp| 28 - src/glsl/builtin_type_macros.h | 16 +++ src/glsl/builtin_types.cpp | 30 + src/glsl/glsl_lexer.ll | 42 ++- src/glsl/glsl_parser.yy| 33 +- src/glsl/glsl_parser_extras.cpp| 4 + src/glsl/glsl_types.cpp| 74 ++-- src/glsl/glsl_types.h | 18 ++- src/glsl/ir.cpp| 90 +- src/glsl/ir.h | 17 +++ src/glsl/ir_builder.cpp| 11 ++ src/glsl/ir_builder.h | 3 + src/glsl/ir_clone.cpp | 1 + src/glsl/ir_constant_expression.cpp| 207 - src/glsl/ir_print_visitor.cpp | 11 ++ src/glsl/ir_set_program_inouts.cpp | 24 +++- src/glsl/ir_validate.cpp | 45 ++- src/glsl/link_uniform_initializers.cpp | 4 + src/glsl/link_uniforms.cpp | 2 + src/glsl/link_varyings.cpp | 3 +- src/mesa/program/ir_to_mesa.cpp| 6 + 23 files changed, 641 insertions(+), 66 deletions(-) diff --git a/src/glsl/ast.h b/src/glsl/ast.h index 15bf086..99274ed 100644 --- a/src/glsl/ast.h +++ b/src/glsl/ast.h @@ -189,6 +189,7 @@ enum ast_operators { ast_uint_constant, ast_float_constant, ast_bool_constant, + ast_double_constant, ast_sequence, ast_aggregate @@ -236,6 +237,7 @@ public: float float_constant; unsigned uint_constant; int bool_constant; + double double_constant; } primary_expression; diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index 39c7bee..6169ae6 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -570,6 +570,10 @@ convert_component(ir_rvalue *src, const glsl_type *desired_type) result = new(ctx) ir_expression(ir_unop_i2u, new(ctx) ir_expression(ir_unop_b2i, src)); break; + case GLSL_TYPE_DOUBLE: +result = new(ctx) 
ir_expression(ir_unop_f2u, + new(ctx) ir_expression(ir_unop_d2f, src)); +break; } break; case GLSL_TYPE_INT: @@ -583,6 +587,10 @@ convert_component(ir_rvalue *src, const glsl_type *desired_type) case GLSL_TYPE_BOOL: result = new(ctx) ir_expression(ir_unop_b2i, src); break; + case GLSL_TYPE_DOUBLE: +result = new(ctx) ir_expression(ir_unop_f2i, + new(ctx) ir_expression(ir_unop_d2f, src)); +break; } break; case GLSL_TYPE_FLOAT: @@ -596,6 +604,9 @@ convert_component(ir_rvalue *src, const glsl_type *desired_type) case GLSL_TYPE_BOOL: result = new(ctx) ir_expression(ir_unop_b2f, desired_type, src, NULL); break; + case GLSL_TYPE_DOUBLE: +result = new(ctx) ir_expression(ir_unop_d2f, desired_type, src, NULL); +break; } break; case GLSL_TYPE_BOOL: @@ -610,8 +621,30 @@ convert_component(ir_rvalue *src, const glsl_type *desired_type) case GLSL_TYPE_FLOAT: result = new(ctx) ir_expression(ir_unop_f2b, desired_type, src, NULL); break; + case GLSL_TYPE_DOUBLE: +result = new(ctx) ir_expression(ir_unop_f2b, + new(ctx) ir_expression(ir_unop_d2f, src)); +break; } break; + case GLSL_TYPE_DOUBLE: + switch (b) { + case GLSL_TYPE_INT: + result = new(ctx) ir_expression(ir_unop_f2d, + new(ctx) ir_expression(ir_unop_i2f, src)); +break; + case GLSL_TYPE_UINT: + result = new(ctx) ir_expression(ir_unop_f2d, + new(ctx) ir_expression(ir_unop_u2f, src)); +break; + case GLSL_TYPE_BOOL: + result = new(ctx) ir_expression(ir_unop_f2d, + new(ctx) ir_expression(ir_unop_b2f, src)); +break; + case GLSL_TYPE_FLOAT: +result = new(ctx) ir_expression(ir_unop_f2d, desired_type, src, NULL); +break; + } } assert(result != NULL); @@ -1009,6 +1042,9 @@ emit_inline_vector_constructor(const glsl_type *type, case GLSL_TYPE_FLOAT: data.f[i + base_component] = c-get_float_component(i); break; + case GLSL_TYPE_DOUBLE: + data.d[i + base_component] = c-get_double_component(i); + break; case GLSL_TYPE_BOOL: data.b[i + base_component] =
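The GLSL_TYPE_DOUBLE cases in convert_component() above route double to uint/int/bool through ir_unop_d2f first. A minimal sketch of what a two-step route like that implies for integers that need more than 24 significant bits, assuming d2f and f2u behave like the C++ casts below (that mapping is an assumption of this example, not something the patch states):

```cpp
#include <cassert>
#include <cstdint>

// Assumption: d2f followed by f2u acts like these two casts.
std::uint32_t double_to_uint_via_float(double d)
{
   const float f = static_cast<float>(d);          // d2f
   assert(f >= 0.0f && f < 4294967296.0f);         // keep the cast defined
   return static_cast<std::uint32_t>(f);           // f2u
}

// Direct conversion, for comparison.
std::uint32_t double_to_uint_direct(double d)
{
   assert(d >= 0.0 && d < 4294967296.0);
   return static_cast<std::uint32_t>(d);
}
```

Small values come out the same either way; a value such as 16777217.0 (2^24 + 1) is exactly representable as a double but not as a float, so the two routes can disagree.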
[Mesa-dev] [PATCH 09/11] glsl/lower_instructions: add double lowering passes
This lowers double dot product and lrp to fma.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/lower_instructions.cpp | 83 +
 1 file changed, 83 insertions(+)

diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp
index eced619..f737556 100644
--- a/src/glsl/lower_instructions.cpp
+++ b/src/glsl/lower_instructions.cpp
@@ -107,6 +107,7 @@
  */

 #include "main/core.h" /* for M_LOG2E */
+#include "program/prog_instruction.h" /* for swizzle */
 #include "glsl_types.h"
 #include "ir.h"
 #include "ir_builder.h"
@@ -139,6 +140,8 @@ private:
    void ldexp_to_arith(ir_expression *);
    void carry_to_arith(ir_expression *);
    void borrow_to_arith(ir_expression *);
+   void double_dot_to_fma(ir_expression *);
+   void double_lrp(ir_expression *);
 };

 } /* anonymous namespace */
@@ -484,10 +487,90 @@ lower_instructions_visitor::borrow_to_arith(ir_expression *ir)
    this->progress = true;
 }

+void
+lower_instructions_visitor::double_dot_to_fma(ir_expression *ir)
+{
+   ir_variable *temp = new(ir) ir_variable(ir->operands[0]->type->get_base_type(), "dot_res",
+                                           ir_var_temporary);
+   this->base_ir->insert_before(temp);
+
+   int nc = ir->operands[0]->type->components();
+   for (int i = nc - 1; i >= 1; i--) {
+      ir_assignment *assig;
+      if (i == (nc - 1)) {
+         assig = assign(temp, mul(swizzle(ir->operands[0]->clone(ir, NULL), i, 1),
+                                  swizzle(ir->operands[1]->clone(ir, NULL), i, 1)));
+      } else {
+         assig = assign(temp, fma(swizzle(ir->operands[0]->clone(ir, NULL), i, 1),
+                                  swizzle(ir->operands[1]->clone(ir, NULL), i, 1),
+                                  temp));
+      }
+      this->base_ir->insert_before(assig);
+   }
+
+   ir->operation = ir_triop_fma;
+   ir->operands[0] = swizzle(ir->operands[0], 0, 1);
+   ir->operands[1] = swizzle(ir->operands[1], 0, 1);
+   ir->operands[2] = new(ir) ir_dereference_variable(temp);
+
+   this->progress = true;
+
+}
+
+void
+lower_instructions_visitor::double_lrp(ir_expression *ir)
+{
+   ir_assignment *assig;
+   ir_constant *one = new(ir) ir_constant(1.0, ir->operands[2]->type->vector_elements);
+   ir_variable *temp = new(ir) ir_variable(ir->operands[0]->type, "lrp_res",
+                                           ir_var_temporary);
+   ir_variable *t2 = new(ir) ir_variable(ir->operands[0]->type, "aval",
+                                         ir_var_temporary);
+   int swizval;
+   this->base_ir->insert_before(temp);
+   this->base_ir->insert_before(t2);
+
+   assig = assign(temp, mul(sub(one, ir->operands[2]), ir->operands[0]));
+   this->base_ir->insert_before(assig);
+
+   switch (ir->operands[2]->type->vector_elements) {
+   case 1:
+      swizval = SWIZZLE_;
+      break;
+   case 2:
+      swizval = MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_X, SWIZZLE_X);
+      break;
+   case 3:
+      swizval = MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_X);
+      break;
+   case 4:
+   default:
+      swizval = SWIZZLE_XYZW;
+      break;
+   }
+   assig = assign(t2, swizzle(ir->operands[2], swizval, ir->operands[0]->type->vector_elements));
+   this->base_ir->insert_before(assig);
+
+   ir->operation = ir_triop_fma;
+   ir->operands[0] = new(ir) ir_dereference_variable(t2);
+   ir->operands[1] = ir->operands[1];
+   ir->operands[2] = new(ir) ir_dereference_variable(temp);
+
+   this->progress = true;
+}
+
 ir_visitor_status
 lower_instructions_visitor::visit_leave(ir_expression *ir)
 {
    switch (ir->operation) {
+   case ir_binop_dot:
+      if (ir->operands[0]->type->is_double())
+         double_dot_to_fma(ir);
+      break;
+   case ir_triop_lrp:
+      if (ir->operands[0]->type->is_double())
+         double_lrp(ir);
+      break;
    case ir_binop_sub:
       if (lowering(SUB_TO_ADD_NEG))
          sub_to_add_neg(ir);
-- 
1.9.3
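The two lowerings in this patch can be sketched on plain doubles as follows. This is only a scalar illustration of the arithmetic (a mul on the last component, then an fma chain down to component 0 for dot; one mul plus one fma for lrp); the function names and the use of std::vector are choices made for the example and are not part of the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// dot(a, b): multiply the last component pair, then fold the remaining
// components in with fma, finishing on component 0.
double dot_via_fma(const std::vector<double> &a, const std::vector<double> &b)
{
   assert(!a.empty() && a.size() == b.size());
   const std::size_t nc = a.size();
   double acc = a[nc - 1] * b[nc - 1];
   // Visits indices nc - 2 down to 0; does nothing when nc == 1.
   for (std::size_t i = nc - 1; i-- > 0; )
      acc = std::fma(a[i], b[i], acc);
   return acc;
}

// lrp(x, y, a) = x * (1 - a) + y * a, written as one mul and one fma.
double lrp_via_fma(double x, double y, double a)
{
   const double lrp_res = (1.0 - a) * x;   // the temporary the patch names "lrp_res"
   return std::fma(a, y, lrp_res);
}
```

For a vec3, dot_via_fma issues one multiply and two fmas, which is the same instruction count the lowered IR ends up with.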
[Mesa-dev] [RFC] initial ARB_gpu_shader_fp64 posting
This is just the mesa and glsl compiler portions of the ARB_gpu_shader_fp64 extension that I've been slowly iterating on over the past few months.

It is all in http://cgit.freedesktop.org/~airlied/mesa/log/?h=arb_gpu_shader_fp64-submit but sits underneath the gallium + softpipe + mesa/st development, all of which needs further cleaning and docs.

The biggest bits of this are the builtin generator, constant expression handling and uniform interfaces.

I suspect there are chunks in some patches that might need to be in others, and the uniform patches are probably not very well explained, mostly because I can't remember exactly why I did what I did in a few places.

Dave.
[Mesa-dev] [PATCH 05/11] mesa: add double uniform support.
From: Dave Airlie airl...@redhat.com This adds support for the new uniform interfaces from ARB_gpu_shader_fp64. Signed-off-by: Dave Airlie airl...@redhat.com --- src/mesa/main/uniform_query.cpp | 50 + src/mesa/main/uniforms.c | 91 +++ src/mesa/main/uniforms.h | 3 +- src/mesa/program/ir_to_mesa.cpp | 17 +++- src/mesa/program/prog_parameter.c | 16 --- 5 files changed, 143 insertions(+), 34 deletions(-) diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp index 7e630e6..d7024cb 100644 --- a/src/mesa/main/uniform_query.cpp +++ b/src/mesa/main/uniform_query.cpp @@ -449,6 +449,9 @@ log_uniform(const void *values, enum glsl_base_type basicType, case GLSL_TYPE_FLOAT: printf(%g , v[i].f); break; + case GLSL_TYPE_DOUBLE: +printf(%g , *(double* )v[i * 2].f); +break; default: assert(!Should not get here.); break; @@ -509,11 +512,11 @@ _mesa_propagate_uniforms_to_driver_storage(struct gl_uniform_storage *uni, */ const unsigned components = MAX2(1, uni-type-vector_elements); const unsigned vectors = MAX2(1, uni-type-matrix_columns); - + const int dmul = uni-type-base_type == GLSL_TYPE_DOUBLE ? 2 : 1; /* Store the data in the driver's requested type in the driver's storage * areas. 
*/ - unsigned src_vector_byte_stride = components * 4; + unsigned src_vector_byte_stride = components * 4 * dmul; for (i = 0; i uni-num_driver_storage; i++) { struct gl_uniform_driver_storage *const store = uni-driver_storage[i]; @@ -612,6 +615,7 @@ _mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg, unsigned components; unsigned src_components; enum glsl_base_type basicType; + int size_mul = 1; struct gl_uniform_storage *const uni = validate_uniform_parameters(ctx, shProg, location, count, @@ -670,6 +674,26 @@ _mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg, basicType = GLSL_TYPE_INT; src_components = 4; break; + case GL_DOUBLE: + basicType = GLSL_TYPE_DOUBLE; + src_components = 1; + size_mul = 2; + break; + case GL_DOUBLE_VEC2: + basicType = GLSL_TYPE_DOUBLE; + src_components = 2; + size_mul = 2; + break; + case GL_DOUBLE_VEC3: + basicType = GLSL_TYPE_DOUBLE; + src_components = 3; + size_mul = 2; + break; + case GL_DOUBLE_VEC4: + basicType = GLSL_TYPE_DOUBLE; + src_components = 4; + size_mul = 2; + break; case GL_BOOL: case GL_BOOL_VEC2: case GL_BOOL_VEC3: @@ -683,6 +707,15 @@ _mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg, case GL_FLOAT_MAT4x2: case GL_FLOAT_MAT4x3: case GL_FLOAT_MAT4: + case GL_DOUBLE_MAT2: + case GL_DOUBLE_MAT2x3: + case GL_DOUBLE_MAT2x4: + case GL_DOUBLE_MAT3x2: + case GL_DOUBLE_MAT3: + case GL_DOUBLE_MAT3x4: + case GL_DOUBLE_MAT4x2: + case GL_DOUBLE_MAT4x3: + case GL_DOUBLE_MAT4: default: _mesa_problem(NULL, Invalid type in %s, __func__); return; @@ -789,7 +822,7 @@ _mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg, */ if (!uni-type-is_boolean()) { memcpy(uni-storage[components * offset], values, -sizeof(uni-storage[0]) * components * count); +sizeof(uni-storage[0]) * components * count * size_mul); } else { const union gl_constant_value *src = (const union gl_constant_value *) values; @@ -892,13 +925,14 @@ extern C void _mesa_uniform_matrix(struct 
gl_context *ctx, struct gl_shader_program *shProg, GLuint cols, GLuint rows, GLint location, GLsizei count, - GLboolean transpose, const GLfloat *values) + GLboolean transpose, + const GLvoid *values, GLenum type) { unsigned offset; unsigned vectors; unsigned components; unsigned elements; - + int size_mul = mesa_type_is_double(type) ? 2 : 1; struct gl_uniform_storage *const uni = validate_uniform_parameters(ctx, shProg, location, count, offset, glUniformMatrix, false); @@ -936,7 +970,7 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct gl_shader_program *shProg, } if (ctx-_Shader-Flags GLSL_UNIFORMS) { - log_uniform(values, GLSL_TYPE_FLOAT, components, vectors, count, + log_uniform(values, uni-type-base_type, components, vectors, count, bool(transpose), shProg, location, uni); } @@ -963,11 +997,11 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct gl_shader_program *shProg, if (!transpose) { memcpy(uni-storage[elements * offset], values, -sizeof(uni-storage[0]) * elements * count); +sizeof(uni-storage[0]) * elements * count * size_mul);
[Mesa-dev] [PATCH 03/11] mesa: add mesa_type_is_double helper function
This is a helper that returns whether a type is based on a double.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/mesa/program/prog_parameter.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/mesa/program/prog_parameter.h b/src/mesa/program/prog_parameter.h
index 6b3b3c2..9ee0f5e 100644
--- a/src/mesa/program/prog_parameter.h
+++ b/src/mesa/program/prog_parameter.h
@@ -151,6 +151,28 @@ _mesa_lookup_parameter_constant(const struct gl_program_parameter_list *list,
                                 const gl_constant_value v[], GLuint vSize,
                                 GLint *posOut, GLuint *swizzleOut);

+static INLINE GLboolean mesa_type_is_double(int dataType)
+{
+   switch (dataType) {
+   case GL_DOUBLE:
+   case GL_DOUBLE_VEC2:
+   case GL_DOUBLE_VEC3:
+   case GL_DOUBLE_VEC4:
+   case GL_DOUBLE_MAT2:
+   case GL_DOUBLE_MAT2x3:
+   case GL_DOUBLE_MAT2x4:
+   case GL_DOUBLE_MAT3:
+   case GL_DOUBLE_MAT3x2:
+   case GL_DOUBLE_MAT3x4:
+   case GL_DOUBLE_MAT4:
+   case GL_DOUBLE_MAT4x2:
+   case GL_DOUBLE_MAT4x3:
+      return GL_TRUE;
+   default:
+      return GL_FALSE;
+   }
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.9.3
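A rough sketch of how a helper like this is meant to be used: storage is counted in 4-byte slots, and a double-based type takes two slots per component. The enum and function names below are hypothetical stand-ins so the example compiles on its own; they are not the real GL_* tokens or Mesa functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the GL type tokens (not the real GL_* values).
enum class UniformType { Float, FloatVec4, Double, DoubleVec2, DoubleVec4 };

// Same idea as mesa_type_is_double(), over the stand-in enum.
bool type_is_double(UniformType t)
{
   switch (t) {
   case UniformType::Double:
   case UniformType::DoubleVec2:
   case UniformType::DoubleVec4:
      return true;
   default:
      return false;
   }
}

// Number of components in one vector of the given type.
unsigned components_of(UniformType t)
{
   switch (t) {
   case UniformType::Float:
   case UniformType::Double:
      return 1;
   case UniformType::DoubleVec2:
      return 2;
   case UniformType::FloatVec4:
   case UniformType::DoubleVec4:
      return 4;
   }
   assert(!"unhandled type");
   return 0;
}

// Bytes per vector: components * 4 bytes per slot * 2 slots for doubles.
std::size_t vector_byte_stride(UniformType t)
{
   const unsigned size_mul = type_is_double(t) ? 2 : 1;
   return static_cast<std::size_t>(components_of(t)) * 4u * size_mul;
}
```

So a dvec2 occupies as many bytes as a vec4, and a dvec4 twice that.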
[Mesa-dev] [PATCH 01/11] glapi: add ARB_gpu_shader_fp64
From: Dave Airlie airl...@redhat.com Just add the xml file covering this extension, and dummy interface files in mesa, and fix up sanity tests. Signed-off-by: Dave Airlie airl...@redhat.com --- src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml | 143 + src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 2 + src/mesa/main/tests/dispatch_sanity.cpp| 36 src/mesa/main/uniforms.c | 95 +++ src/mesa/main/uniforms.h | 43 + 6 files changed, 302 insertions(+), 18 deletions(-) create mode 100644 src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml diff --git a/src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml b/src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml new file mode 100644 index 000..4f860ef --- /dev/null +++ b/src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml @@ -0,0 +1,143 @@ +?xml version=1.0? +!DOCTYPE OpenGLAPI SYSTEM gl_API.dtd + +OpenGLAPI + +category name=GL_ARB_gpu_shader_fp64 number=89 + +function name=Uniform1d offset=assign +param name=location type=GLint/ +param name=x type=GLdouble/ +/function + +function name=Uniform2d offset=assign +param name=location type=GLint/ +param name=x type=GLdouble/ +param name=y type=GLdouble/ +/function + +function name=Uniform3d offset=assign +param name=location type=GLint/ +param name=x type=GLdouble/ +param name=y type=GLdouble/ +param name=z type=GLdouble/ +/function + +function name=Uniform4d offset=assign +param name=location type=GLint/ +param name=x type=GLdouble/ +param name=y type=GLdouble/ +param name=z type=GLdouble/ +param name=w type=GLdouble/ +/function + +function name=Uniform1dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=value type=const GLdouble */ +/function + +function name=Uniform2dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=value type=const GLdouble */ +/function + +function name=Uniform3dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=value type=const GLdouble */ +/function + 
+function name=Uniform4dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix2dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix3dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix4dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix2x3dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix2x4dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix3x2dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix3x4dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix4x2dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + +function name=UniformMatrix4x3dv offset=assign +param name=location type=GLint/ +param name=count type=GLsizei/ +param name=transpose type=GLboolean/ +param name=value type=const GLdouble */ +/function + 
+function name=GetUniformdv offset=assign +param name=program type=GLuint/ +param name=location type=GLint/ +param name=params type=GLdouble */ +/function + +enum name=DOUBLE_VEC2
[Mesa-dev] [PATCH 04/11] glsl: add double type
From: Dave Airlie airl...@redhat.com

This just adds a placeholder for the GLSL_TYPE_DOUBLE. This causes a lot of warnings about unchecked type in switch statements - fix them later.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/glsl_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
index d545533..e00a3e0 100644
--- a/src/glsl/glsl_types.h
+++ b/src/glsl/glsl_types.h
@@ -51,6 +51,7 @@ enum glsl_base_type {
    GLSL_TYPE_UINT = 0,
    GLSL_TYPE_INT,
    GLSL_TYPE_FLOAT,
+   GLSL_TYPE_DOUBLE,
    GLSL_TYPE_BOOL,
    GLSL_TYPE_SAMPLER,
    GLSL_TYPE_IMAGE,
-- 
1.9.3
[Mesa-dev] [PATCH 06/11] glsl: add ARB_gpu_shader_fp64 to the glsl extensions.
From: Dave Airlie airl...@redhat.com

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/glsl_parser_extras.cpp     | 1 +
 src/glsl/glsl_parser_extras.h       | 2 ++
 src/glsl/standalone_scaffolding.cpp | 1 +
 3 files changed, 4 insertions(+)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index ad91c46..53fbb25 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -521,6 +521,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
    EXT(ARB_fragment_coord_conventions, true, false, ARB_fragment_coord_conventions),
    EXT(ARB_fragment_layer_viewport, true, false, ARB_fragment_layer_viewport),
    EXT(ARB_gpu_shader5, true, false, ARB_gpu_shader5),
+   EXT(ARB_gpu_shader_fp64, true, false, ARB_gpu_shader_fp64),
    EXT(ARB_sample_shading, true, false, ARB_sample_shading),
    EXT(ARB_separate_shader_objects, true, false, dummy_true),
    EXT(ARB_shader_atomic_counters, true, false, ARB_shader_atomic_counters),
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index ce66e2f..6f5c0b1 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -407,6 +407,8 @@ struct _mesa_glsl_parse_state {
    bool ARB_fragment_layer_viewport_warn;
    bool ARB_gpu_shader5_enable;
    bool ARB_gpu_shader5_warn;
+   bool ARB_gpu_shader_fp64_enable;
+   bool ARB_gpu_shader_fp64_warn;
    bool ARB_sample_shading_enable;
    bool ARB_sample_shading_warn;
    bool ARB_separate_shader_objects_enable;
diff --git a/src/glsl/standalone_scaffolding.cpp b/src/glsl/standalone_scaffolding.cpp
index 2b76dd1..63e3cde 100644
--- a/src/glsl/standalone_scaffolding.cpp
+++ b/src/glsl/standalone_scaffolding.cpp
@@ -100,6 +100,7 @@ void initialize_context_to_defaults(struct gl_context *ctx, gl_api api)
    ctx->Extensions.ARB_fragment_coord_conventions = true;
    ctx->Extensions.ARB_fragment_layer_viewport = true;
    ctx->Extensions.ARB_gpu_shader5 = true;
+   ctx->Extensions.ARB_gpu_shader_fp64 = true;
    ctx->Extensions.ARB_sample_shading = true;
    ctx->Extensions.ARB_shader_bit_encoding = true;
    ctx->Extensions.ARB_shader_stencil_export = true;
-- 
1.9.3
[Mesa-dev] [PATCH 11/11] glsl: lower double optional passes
These lowering passes are optional for the backend to request, currently the TGSI softpipe backend most likely the r600g backend would want to use these passes as is. They aim to hit the gallium opcodes from the standard rounding/truncation functions. Signed-off-by: Dave Airlie airl...@redhat.com --- src/glsl/ir_optimization.h | 1 + src/glsl/lower_instructions.cpp | 209 2 files changed, 210 insertions(+) diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h index b83c225..72ac3a9 100644 --- a/src/glsl/ir_optimization.h +++ b/src/glsl/ir_optimization.h @@ -40,6 +40,7 @@ #define LDEXP_TO_ARITH 0x100 #define CARRY_TO_ARITH 0x200 #define BORROW_TO_ARITH0x400 +#define DOPS_TO_DFRAC 0x800 /** * \see class lower_packing_builtins_visitor diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp index f737556..6da144e 100644 --- a/src/glsl/lower_instructions.cpp +++ b/src/glsl/lower_instructions.cpp @@ -41,6 +41,7 @@ * - BITFIELD_INSERT_TO_BFM_BFI * - CARRY_TO_ARITH * - BORROW_TO_ARITH + * - DOPS_TO_DFRAC * * SUB_TO_ADD_NEG: * --- @@ -104,6 +105,9 @@ * * Converts ir_borrow into (x y). * + * DOPS_TO_DFRAC: + * -- + * Converts double trunc, ceil, floor, round to fract */ #include main/core.h /* for M_LOG2E */ @@ -142,6 +146,11 @@ private: void borrow_to_arith(ir_expression *); void double_dot_to_fma(ir_expression *); void double_lrp(ir_expression *); + void dceil_to_dfrac(ir_expression *); + void dfloor_to_dfrac(ir_expression *); + void dround_even_to_dfrac(ir_expression *); + void dtrunc_to_dfrac(ir_expression *); + void dsign_to_csel(ir_expression *); }; } /* anonymous namespace */ @@ -559,6 +568,182 @@ lower_instructions_visitor::double_lrp(ir_expression *ir) this-progress = true; } +void +lower_instructions_visitor::dceil_to_dfrac(ir_expression *ir) +{ + /* +* frtemp = frac(x); +* temp = sub(x, frtemp); +* result = temp + ((frtemp != 0.0) ? 
1.0 : 0.0); +*/ + ir_instruction i = *base_ir; + ir_constant *zero = new(ir) ir_constant(0.0, ir-operands[0]-type-vector_elements); + ir_constant *one = new(ir) ir_constant(1.0, ir-operands[0]-type-vector_elements); + ir_variable *frtemp = new(ir) ir_variable(ir-operands[0]-type, frtemp, + ir_var_temporary); + ir_variable *temp = new(ir) ir_variable(ir-operands[0]-type, temp, + ir_var_temporary); + ir_variable *t2 = new(ir) ir_variable(ir-operands[0]-type, t2, + ir_var_temporary); + + i.insert_before(frtemp); + i.insert_before(assign(frtemp, fract(ir-operands[0]))); + + i.insert_before(temp); + i.insert_before(assign(temp, sub(ir-operands[0]-clone(ir, NULL), frtemp))); + + i.insert_before(t2); + i.insert_before(assign(t2, csel(nequal(frtemp, zero), one, zero-clone(ir, NULL; + ir-operation = ir_binop_add; + ir-operands[0] = new(ir) ir_dereference_variable(temp); + ir-operands[1] = new(ir) ir_dereference_variable(t2); +} + +void +lower_instructions_visitor::dfloor_to_dfrac(ir_expression *ir) +{ + /* +* frtemp = frac(x); +* result = sub(x, frtemp); +*/ + ir_instruction i = *base_ir; + ir_variable *frtemp = new(ir) ir_variable(ir-operands[0]-type, frtemp, + ir_var_temporary); + + i.insert_before(frtemp); + i.insert_before(assign(frtemp, fract(ir-operands[0]-clone(ir, NULL; + + ir-operation = ir_binop_sub; + ir-operands[1] = new(ir) ir_dereference_variable(frtemp); +} +void +lower_instructions_visitor::dround_even_to_dfrac(ir_expression *ir) +{ + /* +* insane but works +* temp = x + 0.5; +* frtemp = frac(temp); +* t2 = sub(temp, frtemp); +* if (frac(x) == 0.5) +* result = frac(t2 * 0.5) == 0 ? 
t2 : t2 - 1; +* else +* result = t2; + +*/ + const unsigned vec_elem = ir->type->vector_elements; + const glsl_type *bvec = glsl_type::get_instance(GLSL_TYPE_BOOL, vec_elem, 1); + ir_instruction &i = *base_ir; + ir_variable *frtemp = new(ir) ir_variable(ir->operands[0]->type, "frtemp", + ir_var_temporary); + ir_variable *temp = new(ir) ir_variable(ir->operands[0]->type, "temp", + ir_var_temporary); + ir_variable *t2 = new(ir) ir_variable(ir->operands[0]->type, "t2", + ir_var_temporary); + ir_variable *t3 = new(ir) ir_variable(bvec, "t3", + ir_var_temporary); + ir_variable *t4 = new(ir) ir_variable(bvec, "t4", + ir_var_temporary); + ir_variable *t5 = new(ir) ir_variable(ir->operands[0]->type, "t5", + ir_var_temporary); + ir_constant *p5 =
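The pseudocode comments in the lowering patch above can be sanity-checked on the CPU. What follows is only a rough sketch in plain C++, not the GLSL IR code from the patch: the helper names (`fract_d`, `floor_via_fract`, `ceil_via_fract`, `round_even_via_fract`) are made up for illustration, and `fract(x)` is assumed to mean `x - floor(x)` as in GLSL.

```cpp
#include <cassert>
#include <cmath>

// GLSL-style fract: x - floor(x). Assumption: this is what the backend's
// fract opcode computes; std::floor is used here only to model it.
double fract_d(double x) { return x - std::floor(x); }

// floor(x) = x - fract(x)
double floor_via_fract(double x) { return x - fract_d(x); }

// ceil(x) = (x - fract(x)) + (fract(x) != 0 ? 1 : 0)
double ceil_via_fract(double x)
{
    const double fr = fract_d(x);
    return (x - fr) + (fr != 0.0 ? 1.0 : 0.0);
}

// roundEven(x): round half up first, then step back by one when x sat
// exactly on a .5 boundary and the rounded value came out odd.
double round_even_via_fract(double x)
{
    const double temp = x + 0.5;
    const double t2 = temp - fract_d(temp);   // floor(x + 0.5)
    if (fract_d(x) == 0.5)
        return fract_d(t2 * 0.5) == 0.0 ? t2 : t2 - 1.0;
    return t2;
}
```

The half-way test relies on `fract(t2 * 0.5)` being zero exactly when `t2` is even, which is why the patch comment calls the trick "insane but works".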
[Mesa-dev] [PATCH 10/11] glsl: implement double builtin functions
This implements the bulk of the builtin functions for fp64 support. Signed-off-by: Dave Airlie airl...@redhat.com --- src/glsl/builtin_functions.cpp | 751 +++-- 1 file changed, 492 insertions(+), 259 deletions(-) diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp index 185fe98..b190fcd 100644 --- a/src/glsl/builtin_functions.cpp +++ b/src/glsl/builtin_functions.cpp @@ -373,6 +373,12 @@ gs_streams(const _mesa_glsl_parse_state *state) return gpu_shader5(state) && gs_only(state); } +static bool +fp64(const _mesa_glsl_parse_state *state) +{ + return state->is_version(400, 0) || state->ARB_gpu_shader_fp64_enable; +} + /** @} */ /**/ @@ -428,6 +434,7 @@ private: ir_constant *imm(float f, unsigned vector_elements=1); ir_constant *imm(int i, unsigned vector_elements=1); ir_constant *imm(unsigned u, unsigned vector_elements=1); + ir_constant *imm(double d, unsigned vector_elements=1); ir_constant *imm(const glsl_type *type, const ir_constant_data &); ir_dereference_variable *var_ref(ir_variable *var); ir_dereference_array *array_ref(ir_variable *var, int i); @@ -517,29 +524,29 @@ private: B1(log) B1(exp2) B1(log2) - B1(sqrt) - B1(inversesqrt) - B1(abs) - B1(sign) - B1(floor) - B1(trunc) - B1(round) - B1(roundEven) - B1(ceil) - B1(fract) + BA1(sqrt) + BA1(inversesqrt) + BA1(abs) + BA1(sign) + BA1(floor) + BA1(trunc) + BA1(round) + BA1(roundEven) + BA1(ceil) + BA1(fract) B2(mod) - B1(modf) + BA1(modf) BA2(min) BA2(max) BA2(clamp) - B2(mix_lrp) + BA2(mix_lrp) ir_function_signature *_mix_sel(builtin_available_predicate avail, const glsl_type *val_type, const glsl_type *blend_type); - B2(step) - B2(smoothstep) - B1(isnan) - B1(isinf) + BA2(step) + BA2(smoothstep) + BA1(isnan) + BA1(isinf) B1(floatBitsToInt) B1(floatBitsToUint) B1(intBitsToFloat) @@ -554,24 +561,27 @@ private: ir_function_signature *_unpackSnorm4x8(builtin_available_predicate avail); ir_function_signature *_packHalf2x16(builtin_available_predicate avail); ir_function_signature 
*_unpackHalf2x16(builtin_available_predicate avail); - B1(length) - B1(distance); - B1(dot); - B1(cross); - B1(normalize); + ir_function_signature *_packDouble2x32(builtin_available_predicate avail); + ir_function_signature *_unpackDouble2x32(builtin_available_predicate avail); + + BA1(length) + BA1(distance); + BA1(dot); + BA1(cross); + BA1(normalize); B0(ftransform); - B1(faceforward); - B1(reflect); - B1(refract); - B1(matrixCompMult); - B1(outerProduct); - B0(determinant_mat2); - B0(determinant_mat3); - B0(determinant_mat4); - B0(inverse_mat2); - B0(inverse_mat3); - B0(inverse_mat4); - B1(transpose); + BA1(faceforward); + BA1(reflect); + BA1(refract); + BA1(matrixCompMult); + BA1(outerProduct); + BA1(determinant_mat2); + BA1(determinant_mat3); + BA1(determinant_mat4); + BA1(inverse_mat2); + BA1(inverse_mat3); + BA1(inverse_mat4); + BA1(transpose); BA1(lessThan); BA1(lessThanEqual); BA1(greaterThan); @@ -629,9 +639,10 @@ private: B1(bitCount) B1(findLSB) B1(findMSB) - B1(fma) + BA1(fma) B2(ldexp) B2(frexp) + B2(dfrexp) B1(uaddCarry) B1(usubBorrow) B1(mulExtended) @@ -800,6 +811,42 @@ builtin_builder::create_builtins() _##NAME(glsl_type::vec4_type), \ NULL); +#define FD(NAME) \ + add_function(#NAME, \ +_##NAME(always_available, glsl_type::float_type), \ +_##NAME(always_available, glsl_type::vec2_type), \ +_##NAME(always_available, glsl_type::vec3_type), \ +_##NAME(always_available, glsl_type::vec4_type), \ +_##NAME(fp64, glsl_type::double_type), \ +_##NAME(fp64, glsl_type::dvec2_type),\ +_##NAME(fp64, glsl_type::dvec3_type), \ +_##NAME(fp64, glsl_type::dvec4_type), \ +NULL); + +#define FD130(NAME) \ + add_function(#NAME, \ +_##NAME(v130, glsl_type::float_type), \ +_##NAME(v130, glsl_type::vec2_type), \ +_##NAME(v130, glsl_type::vec3_type), \ +_##NAME(v130, glsl_type::vec4_type), \ +_##NAME(fp64, glsl_type::double_type), \ +_##NAME(fp64, glsl_type::dvec2_type),\ +_##NAME(fp64, glsl_type::dvec3_type), \ +_##NAME(fp64, glsl_type::dvec4_type), \ +NULL); + +#define 
FDGS5(NAME) \ +
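The FD()/FD130() macros above pair each overload with an availability predicate, so the double overloads only become visible when `fp64()` passes. A minimal sketch of that idea follows, under loud assumptions: `parse_state`, `signature` and `count_visible` are hypothetical stand-ins, and `is_version` here takes a single argument, unlike the two-argument call in the patch.

```cpp
#include <cassert>

// Hypothetical, stripped-down stand-in for _mesa_glsl_parse_state.
struct parse_state {
    unsigned language_version;        // e.g. 330 or 400
    bool ARB_gpu_shader_fp64_enable;  // set when the shader enables the extension

    bool is_version(unsigned required_glsl) const
    {
        return language_version >= required_glsl;
    }
};

// Availability predicate type, following the shape used by the builtin builder.
using builtin_available_predicate = bool (*)(const parse_state *);

bool always_available(const parse_state *) { return true; }

// Same condition as fp64() in the patch: GLSL 4.00 or the extension enabled.
bool fp64(const parse_state *state)
{
    return state->is_version(400) || state->ARB_gpu_shader_fp64_enable;
}

// One overload of a builtin: a predicate plus the name of its operand type.
struct signature {
    builtin_available_predicate avail;
    const char *type;
};

// Count how many overloads a shader in the given state is allowed to call.
int count_visible(const signature *sigs, int n, const parse_state *state)
{
    int visible = 0;
    for (int i = 0; i < n; i++)
        if (sigs[i].avail(state))
            visible++;
    return visible;
}
```

With a table holding one float and one double overload of `sqrt`, a GLSL 3.30 shader without the extension would see one signature, while a 4.00 shader or one with the extension enabled would see both.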
[Mesa-dev] [PATCH 02/11] mesa: add ARB_gpu_shader_fp64 extension info
From: Dave Airlie airl...@redhat.com This just adds the entries to extensions.c and mtypes.h. Signed-off-by: Dave Airlie airl...@redhat.com --- src/mesa/main/extensions.c | 1 + src/mesa/main/mtypes.h | 1 + 2 files changed, 2 insertions(+) diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index 4f322d0..1445a9d 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -117,6 +117,7 @@ static const struct extension extension_table[] = { { "GL_ARB_framebuffer_sRGB", o(EXT_framebuffer_sRGB), GL, 1998 }, { "GL_ARB_get_program_binary", o(dummy_true), GL, 2010 }, { "GL_ARB_gpu_shader5", o(ARB_gpu_shader5), GL, 2010 }, + { "GL_ARB_gpu_shader_fp64", o(ARB_gpu_shader_fp64), GL, 2010 }, { "GL_ARB_half_float_pixel", o(dummy_true), GL, 2003 }, { "GL_ARB_half_float_vertex", o(ARB_half_float_vertex), GL, 2008 }, { "GL_ARB_instanced_arrays", o(ARB_instanced_arrays), GL, 2008 }, diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 742ce3e..121f2ea 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -3572,6 +3572,7 @@ struct gl_extensions GLboolean ARB_explicit_uniform_location; GLboolean ARB_geometry_shader4; GLboolean ARB_gpu_shader5; + GLboolean ARB_gpu_shader_fp64; GLboolean ARB_half_float_vertex; GLboolean ARB_instanced_arrays; GLboolean ARB_internalformat_query; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
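For readers wondering why the patch touches both files: the table entry and the `gl_extensions` field go together, because the table stores where the flag lives inside the struct. Below is a small sketch of that pairing, assuming `o(x)` expands to an `offsetof` on the extensions struct; the `_sketch` names and `extension_enabled` are hypothetical and much simpler than the real Mesa code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical miniature of struct gl_extensions: one boolean per extension.
struct gl_extensions_sketch {
    unsigned char ARB_gpu_shader5;
    unsigned char ARB_gpu_shader_fp64;
};

// Table entry: GL name string plus the byte offset of the matching flag.
struct extension_entry {
    const char *name;
    std::size_t offset;
};

// Assumed to mirror the o(x) shorthand used in extension_table[].
#define o(x) offsetof(gl_extensions_sketch, x)

const extension_entry extension_table_sketch[] = {
    { "GL_ARB_gpu_shader5",     o(ARB_gpu_shader5) },
    { "GL_ARB_gpu_shader_fp64", o(ARB_gpu_shader_fp64) },
};

#undef o

// Return true when `name` is in the table and its flag is set in `ext`.
bool extension_enabled(const gl_extensions_sketch &ext, const char *name)
{
    const unsigned char *base = reinterpret_cast<const unsigned char *>(&ext);
    for (const extension_entry &e : extension_table_sketch)
        if (std::strcmp(e.name, name) == 0)
            return base[e.offset] != 0;
    return false;
}
```

A driver would then only have to set `ARB_gpu_shader_fp64` in its extensions struct for the string to be advertised.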
Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support
On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/auxiliary/tgsi/tgsi_info.c | 3 +++ src/gallium/auxiliary/tgsi/tgsi_util.c | 2 ++ src/gallium/docs/source/screen.rst | 2 ++ src/gallium/docs/source/tgsi.rst | 12 ++-- src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_shader_tokens.h | 5 - 19 files changed, 35 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index e24348f..35f9747 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -235,6 +235,9 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 1, 0, 0, 0, 0, OTHR, INTERP_CENTROID, TGSI_OPCODE_INTERP_CENTROID }, { 1, 2, 0, 0, 0, 0, OTHR, INTERP_SAMPLE, TGSI_OPCODE_INTERP_SAMPLE }, { 1, 2, 0, 0, 0, 0, OTHR, INTERP_OFFSET, TGSI_OPCODE_INTERP_OFFSET }, + + { 1, 1, 0, 0, 0, 0, COMP, DDX_FINE, TGSI_OPCODE_DDX_FINE }, + { 1, 1, 0, 0, 0, 0, COMP, DDY_FINE, TGSI_OPCODE_DDY_FINE }, It would be nice to fill in some of the unused slots, e.g. 79 and 80. 
Other than that: Reviewed-by: Marek Olšák marek.ol...@amd.com Marek }; const struct tgsi_opcode_info * diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c index e48159c..e1cba95 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_util.c +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c @@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct tgsi_full_instruction *inst, case TGSI_OPCODE_USNE: case TGSI_OPCODE_IMUL_HI: case TGSI_OPCODE_UMUL_HI: + case TGSI_OPCODE_DDX_FINE: + case TGSI_OPCODE_DDY_FINE: /* Channel-wise operations */ read_mask = write_mask; break; diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 814e3ae..6fecc15 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -213,6 +213,8 @@ The integer capabilities: * ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw arguments { count, instance_count, start, index_bias } from a PIPE_BUFFER resource. See pipe_draw_info. +* ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE``: Whether the fragment shader supports + the FINE versions of DDX/DDY. .. _pipe_capf: diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index ac0ea54..7d5918f 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -433,7 +433,11 @@ This instruction replicates its result. dst = \cos{src.x} -.. opcode:: DDX - Derivative Relative To X +.. opcode:: DDX, DDX_FINE - Derivative Relative To X + +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is +advertised. When it is, the fine version guarantees one derivative per row +while DDX is allowed to be the same for the entire 2x2 quad. .. math:: @@ -446,7 +450,11 @@ This instruction replicates its result. dst.w = partialx(src.w) -.. opcode:: DDY - Derivative Relative To Y +.. 
opcode:: DDY, DDY_FINE - Derivative Relative To Y + +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is +advertised. When it is, the fine version guarantees one derivative per column +while DDY is allowed to be the same for the entire 2x2 quad. .. math:: diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index de69b14..b156d8b 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -216,6 +216,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_GATHER_OFFSETS: case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION: case PIPE_CAP_DRAW_INDIRECT: + case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: return 0; /* Stream output. */ diff --git a/src/gallium/drivers/i915/i915_screen.c
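The tgsi.rst wording above (one derivative per row or column for the fine variants, one value allowed for the whole 2x2 quad otherwise) can be illustrated with a toy model. This is only a sketch: the `quad` struct and the four functions are invented here, and the coarse functions show just one choice a driver would be permitted to make.

```cpp
#include <cassert>

// One value per fragment of a 2x2 quad, indexed v[row][column].
struct quad {
    float v[2][2];
};

// Fine: each row gets its own horizontal difference.
float ddx_fine(const quad &q, int row) { return q.v[row][1] - q.v[row][0]; }

// Fine: each column gets its own vertical difference.
float ddy_fine(const quad &q, int col) { return q.v[1][col] - q.v[0][col]; }

// Coarse (one permitted choice): reuse the top row's difference for
// every fragment in the quad, whatever row it is in.
float ddx_coarse(const quad &q, int /*row*/) { return q.v[0][1] - q.v[0][0]; }

// Coarse (one permitted choice): reuse the left column's difference.
float ddy_coarse(const quad &q, int /*col*/) { return q.v[1][0] - q.v[0][0]; }
```

For a quad whose bottom row changes faster than its top row, the fine X derivative differs between rows while the coarse one reports the same value for all four fragments.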
Re: [Mesa-dev] [PATCH 4/5] mesa/st: add support for emitting fine derivative opcodes
Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/mesa/state_tracker/st_extensions.c | 3 ++- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 9 - 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c index eace321..24e886c 100644 --- a/src/mesa/state_tracker/st_extensions.c +++ b/src/mesa/state_tracker/st_extensions.c @@ -458,7 +458,8 @@ void st_init_extensions(struct pipe_screen *screen, { o(ARB_texture_multisample), PIPE_CAP_TEXTURE_MULTISAMPLE }, { o(ARB_texture_query_lod),PIPE_CAP_TEXTURE_QUERY_LOD }, { o(ARB_sample_shading), PIPE_CAP_SAMPLE_SHADING }, - { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT } + { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT }, + { o(ARB_derivative_control), PIPE_CAP_TGSI_FS_FINE_DERIVATIVE }, }; /* Required: render target and sampler support */ diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 4898166..84bdc4f 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -1462,9 +1462,15 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir) break; case ir_unop_dFdx: + case ir_unop_dFdx_coarse: emit(ir, TGSI_OPCODE_DDX, result_dst, op[0]); break; + case ir_unop_dFdx_fine: + emit(ir, TGSI_OPCODE_DDX_FINE, result_dst, op[0]); + break; case ir_unop_dFdy: + case ir_unop_dFdy_coarse: + case ir_unop_dFdy_fine: { /* The X component contains 1 or -1 depending on whether the framebuffer * is a FBO or the window system buffer, respectively. @@ -1485,7 +1491,8 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir) st_src_reg temp = get_temp(glsl_type::vec4_type); emit(ir, TGSI_OPCODE_MUL, st_dst_reg(temp), transform_y, op[0]); - emit(ir, TGSI_OPCODE_DDY, result_dst, temp); + emit(ir, ir-operation == ir_unop_dFdy_fine ? 
+ TGSI_OPCODE_DDY_FINE : TGSI_OPCODE_DDY, result_dst, temp); break; } -- 1.8.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
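As I read the state tracker hunk above, plain and coarse derivatives share the existing opcodes and only the fine variants get new ones. A sketch of just that selection is shown here, with hypothetical enums standing in for the GLSL IR and TGSI names; the Y-flip multiply that the real dFdy path performs is left out.

```cpp
#include <cassert>

// Hypothetical stand-ins for the GLSL IR and TGSI enums named in the patch.
enum class ir_op { dFdx, dFdx_coarse, dFdx_fine, dFdy, dFdy_coarse, dFdy_fine };
enum class tgsi_op { DDX, DDX_FINE, DDY, DDY_FINE };

// Pick the TGSI opcode for a derivative expression.
tgsi_op select_derivative_opcode(ir_op op)
{
    switch (op) {
    case ir_op::dFdx:
    case ir_op::dFdx_coarse:
        return tgsi_op::DDX;        // coarse is allowed to alias plain DDX
    case ir_op::dFdx_fine:
        return tgsi_op::DDX_FINE;
    case ir_op::dFdy:
    case ir_op::dFdy_coarse:
        return tgsi_op::DDY;
    case ir_op::dFdy_fine:
        return tgsi_op::DDY_FINE;
    }
    return tgsi_op::DDX;            // not reached for valid enum values
}
```

Mapping coarse onto the plain opcode matches the cover letter's remark that DDX "can do pretty much whatever it wants".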
Re: [Mesa-dev] [PATCH 5/5] nv50, nvc0: add support for fine derivatives
Are you gonna update the release notes too? Marek On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote: The quadop-based method we currently use on all chipsets already provides the fine version of the derivatives. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- docs/GL3.txt | 2 +- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4 src/gallium/drivers/nouveau/nv50/nv50_screen.c| 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c| 2 +- 4 files changed, 7 insertions(+), 3 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 89529fe..0a40e23 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -189,7 +189,7 @@ GL 4.5, GLSL 4.50: GL_ARB_clip_control not started GL_ARB_conditional_render_inverted not started GL_ARB_cull_distance not started - GL_ARB_derivative_controlnot started + GL_ARB_derivative_controlDONE (nv50, nvc0) GL_ARB_direct_state_access not started GL_ARB_get_texture_sub_image started (Brian Paul) GL_ARB_shader_texture_image_samples not started diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 14b6d68..456efcb 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -531,7 +531,9 @@ static nv50_ir::operation translateOpcode(uint opcode) NV50_IR_OPCODE_CASE(COS, COS); NV50_IR_OPCODE_CASE(DDX, DFDX); + NV50_IR_OPCODE_CASE(DDX_FINE, DFDX); NV50_IR_OPCODE_CASE(DDY, DFDY); + NV50_IR_OPCODE_CASE(DDY_FINE, DFDY); NV50_IR_OPCODE_CASE(KILL, DISCARD); NV50_IR_OPCODE_CASE(SEQ, SET); @@ -2327,6 +2329,8 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn) case TGSI_OPCODE_NOT: case TGSI_OPCODE_DDX: case TGSI_OPCODE_DDY: + case TGSI_OPCODE_DDX_FINE: + case TGSI_OPCODE_DDY_FINE: FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) mkOp1(op, dstTy, dst0[c], fetchSrc(0, c)); break; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 34cca3d..8a9a40e 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -169,6 +169,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_USER_VERTEX_BUFFERS: case PIPE_CAP_TEXTURE_MULTISAMPLE: case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER: + case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: return 1; case PIPE_CAP_SEAMLESS_CUBE_MAP: return 1; /* class_3d >= NVA0_3D_CLASS; */ @@ -200,7 +201,6 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION: case PIPE_CAP_COMPUTE: case PIPE_CAP_DRAW_INDIRECT: - case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: return 0; } diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index 17aee63..c6d9b91 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -167,6 +167,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_SAMPLE_SHADING: case PIPE_CAP_TEXTURE_GATHER_OFFSETS: case PIPE_CAP_TEXTURE_GATHER_SM5: + case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: return 1; case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: return (class_3d >= NVE4_3D_CLASS) ? 1 : 0; @@ -184,7 +185,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT: case PIPE_CAP_FAKE_SW_MSAA: case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION: - case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: return 0; } -- 1.8.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/37] i965/gs: Use single dispatch mode as fallback to dual object mode when possible.
Currently, when a geometry shader can't use dual object mode, we fall back to dual instance mode. However, when invocations == 1, single dispatch mode is more performant and equally efficient in terms of register pressure. Single dispatch mode requires that the driver can handle interleaving of registers, but this is already supported (dual instance mode has the same requirement). --- src/mesa/drivers/dri/i965/brw_context.h | 8 --- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 26 +++ src/mesa/drivers/dri/i965/gen7_gs_state.c | 4 +--- src/mesa/drivers/dri/i965/gen8_gs_state.c | 4 +--- 4 files changed, 20 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 1bbcf46..7439da1 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -587,10 +587,12 @@ struct brw_gs_prog_data int invocations; /** -* True if the thread should be dispatched in DUAL_INSTANCE mode, false if -* it should be dispatched in DUAL_OBJECT mode. +* Dispatch mode, can be any of: +* GEN7_GS_DISPATCH_MODE_DUAL_OBJECT +* GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE +* GEN7_GS_DISPATCH_MODE_SINGLE */ - bool dual_instanced_dispatch; + int dispatch_mode; }; /** Number of texture sampler units */ diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp index b7995ad..c2a4892 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp @@ -101,10 +101,11 @@ vec4_gs_visitor::setup_payload() { int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES]; - /* If we are in dual instanced mode, then attributes are going to be -* interleaved, so one register contains two attribute slots. + /* If we are in dual instanced or single mode, then attributes are going +* to be interleaved, so one register contains two attribute slots. 
*/ - int attributes_per_reg = c->prog_data.dual_instanced_dispatch ? 2 : 1; + int attributes_per_reg = + c->prog_data.dispatch_mode == GEN7_GS_DISPATCH_MODE_DUAL_OBJECT ? 1 : 2; /* If a geometry shader tries to read from an input that wasn't written by * the vertex shader, that produces undefined results, but it shouldn't @@ -129,8 +130,7 @@ vec4_gs_visitor::setup_payload() reg = setup_varying_inputs(reg, attribute_map, attributes_per_reg); - lower_attributes_to_hw_regs(attribute_map, - c->prog_data.dual_instanced_dispatch); + lower_attributes_to_hw_regs(attribute_map, attributes_per_reg > 1); this->first_non_payload_grf = reg; } @@ -640,7 +640,7 @@ brw_gs_emit(struct brw_context *brw, */ if (c->prog_data.invocations <= 1 && likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) { - c->prog_data.dual_instanced_dispatch = false; + c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT; vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */); if (v.run()) { @@ -652,15 +652,15 @@ brw_gs_emit(struct brw_context *brw, /* Either we failed to compile in DUAL_OBJECT mode (probably because it * would have required spilling) or DUAL_OBJECT mode is disabled. So fall -* back to DUAL_INSTANCED mode, which consumes fewer registers. +* back to DUAL_INSTANCED or SINGLE mode, which consumes fewer registers. * -* FIXME: In an ideal world we'd fall back to SINGLE mode, which would -* allow us to interleave general purpose registers (resulting in even less -* likelihood of spilling). But at the moment, the vec4 generator and -* visitor classes don't have the infrastructure to interleave general -* purpose registers, so DUAL_INSTANCED is the best we can do. +* SINGLE mode is more performant when invocations == 1 and DUAL_INSTANCE +* mode is more performant when invocations > 1. 
*/ - c->prog_data.dual_instanced_dispatch = true; + if (c->prog_data.invocations <= 1) + c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_SINGLE; + else + c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE; vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */); if (!v.run()) { diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c b/src/mesa/drivers/dri/i965/gen7_gs_state.c index 93f48f6..b3b4ee6 100644 --- a/src/mesa/drivers/dri/i965/gen7_gs_state.c +++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c @@ -145,9 +145,7 @@ upload_gs_state(struct brw_context *brw) GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) | ((brw->gs.prog_data->invocations - 1) << GEN7_GS_INSTANCE_CONTROL_SHIFT) | - (brw->gs.prog_data->dual_instanced_dispatch ? - GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE : - GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) | + brw->gs.prog_data->dispatch_mode |
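The fallback order described in this patch can be condensed into a small decision function. The following is a sketch under stated assumptions rather than the driver code: `gs_dispatch_mode` and both functions are hypothetical, `dual_object_allowed` stands in for the DEBUG_NO_DUAL_OBJECT_GS check, and `dual_object_compiles` stands in for the first `v.run()` (the no-spills attempt) succeeding.

```cpp
#include <cassert>

// Hypothetical enum standing in for the GEN7_GS_DISPATCH_MODE_* values.
enum class gs_dispatch_mode { dual_object, dual_instance, single };

// Try DUAL_OBJECT first; otherwise fall back to SINGLE for one invocation
// and to DUAL_INSTANCE when the shader is instanced.
gs_dispatch_mode choose_dispatch_mode(int invocations,
                                      bool dual_object_allowed,
                                      bool dual_object_compiles)
{
    if (invocations <= 1 && dual_object_allowed && dual_object_compiles)
        return gs_dispatch_mode::dual_object;

    return invocations <= 1 ? gs_dispatch_mode::single
                            : gs_dispatch_mode::dual_instance;
}

// Both fallback modes interleave attributes, two slots per register.
int attributes_per_reg(gs_dispatch_mode mode)
{
    return mode == gs_dispatch_mode::dual_object ? 1 : 2;
}
```

Storing the mode as a value rather than a bool is what lets the state upload code OR `dispatch_mode` straight into the packet.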
[Mesa-dev] [PATCH 02/37] i965/gen6/gs: refactor gen6_gs_state
From: Samuel Iglesias Gonsalvez sigles...@igalia.com Currently, gen6 only uses geometry shaders for transform feedback, so the state we emit is not suitable to accommodate general purpose, user-provided geometry shaders. This patch paves the way to add this support and the needed 3DSTATE_GS packet modifications for it. Previous code that emitted state to implement transform feedback in gen6 goes to upload_gs_state_for_tf(). Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com --- src/mesa/drivers/dri/i965/gen6_gs_state.c | 105 ++ 1 file changed, 94 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c b/src/mesa/drivers/dri/i965/gen6_gs_state.c index 9648fb7..e132959 100644 --- a/src/mesa/drivers/dri/i965/gen6_gs_state.c +++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c @@ -31,7 +31,7 @@ #include "intel_batchbuffer.h" static void -upload_gs_state(struct brw_context *brw) +upload_gs_state_for_tf(struct brw_context *brw) { /* Disable all the constant buffers. 
*/ BEGIN_BATCH(5); @@ -49,11 +49,11 @@ upload_gs_state(struct brw_context *brw) OUT_BATCH(GEN6_GS_SPF_MODE | GEN6_GS_VECTOR_MASK_ENABLE); OUT_BATCH(0); /* no scratch space */ OUT_BATCH((2 GEN6_GS_DISPATCH_START_GRF_SHIFT) | - (brw-ff_gs.prog_data-urb_read_length GEN6_GS_URB_READ_LENGTH_SHIFT)); +(brw-ff_gs.prog_data-urb_read_length GEN6_GS_URB_READ_LENGTH_SHIFT)); OUT_BATCH(((brw-max_gs_threads - 1) GEN6_GS_MAX_THREADS_SHIFT) | - GEN6_GS_STATISTICS_ENABLE | - GEN6_GS_SO_STATISTICS_ENABLE | - GEN6_GS_RENDERING_ENABLE); +GEN6_GS_STATISTICS_ENABLE | +GEN6_GS_SO_STATISTICS_ENABLE | +GEN6_GS_RENDERING_ENABLE); OUT_BATCH(GEN6_GS_SVBI_PAYLOAD_ENABLE | GEN6_GS_SVBI_POSTINCREMENT_ENABLE | (brw-ff_gs.prog_data-svbi_postincrement_value @@ -65,24 +65,107 @@ upload_gs_state(struct brw_context *brw) OUT_BATCH(_3DSTATE_GS 16 | (7 - 2)); OUT_BATCH(0); /* prog_bo */ OUT_BATCH((0 GEN6_GS_SAMPLER_COUNT_SHIFT) | - (0 GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT)); +(0 GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT)); OUT_BATCH(0); /* scratch space base offset */ OUT_BATCH((1 GEN6_GS_DISPATCH_START_GRF_SHIFT) | - (0 GEN6_GS_URB_READ_LENGTH_SHIFT) | - (0 GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT)); +(0 GEN6_GS_URB_READ_LENGTH_SHIFT) | +(0 GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT)); OUT_BATCH((0 GEN6_GS_MAX_THREADS_SHIFT) | - GEN6_GS_STATISTICS_ENABLE | - GEN6_GS_RENDERING_ENABLE); +GEN6_GS_STATISTICS_ENABLE | +GEN6_GS_RENDERING_ENABLE); + OUT_BATCH(0); + ADVANCE_BATCH(); + } +} + +static void +upload_gs_state(struct brw_context *brw) +{ + /* BRW_NEW_GEOMETRY_PROGRAM */ + bool active = brw-geometry_program; + /* CACHE_NEW_GS_PROG */ + const struct brw_vec4_prog_data *prog_data = brw-gs.prog_data-base; + const struct brw_stage_state *stage_state = brw-gs.base; + + if (active) { + /* FIXME: enable constant buffers */ + BEGIN_BATCH(5); + OUT_BATCH(_3DSTATE_CONSTANT_GS 16 | (5 - 2)); + OUT_BATCH(0); + OUT_BATCH(0); OUT_BATCH(0); + OUT_BATCH(0); + ADVANCE_BATCH(); + + BEGIN_BATCH(7); + OUT_BATCH(_3DSTATE_GS 16 | (7 
- 2)); + OUT_BATCH(stage_state-prog_offset); + + /* GEN6_GS_SPF_MODE and GEN6_GS_VECTOR_MASK_ENABLE are enabled as it + * was previously done for gen6. + * + * TODO: test with both disabled to see if the HW is behaving + * as expected, like in gen7. + */ + OUT_BATCH(GEN6_GS_SPF_MODE | GEN6_GS_VECTOR_MASK_ENABLE | +((ALIGN(stage_state-sampler_count, 4)/4) + GEN6_GS_SAMPLER_COUNT_SHIFT) | +((prog_data-base.binding_table.size_bytes / 4) + GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT)); + + if (prog_data-total_scratch) { + OUT_RELOC(stage_state-scratch_bo, + I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, + ffs(prog_data-total_scratch) - 11); + } else { + OUT_BATCH(0); /* no scratch space */ + } + + OUT_BATCH((prog_data-urb_read_length + GEN6_GS_URB_READ_LENGTH_SHIFT) | +(0 GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT) | +(prog_data-base.dispatch_grf_start_reg + GEN6_GS_DISPATCH_START_GRF_SHIFT)); + + OUT_BATCH(((brw-max_gs_threads - 1) GEN6_GS_MAX_THREADS_SHIFT) | +GEN6_GS_STATISTICS_ENABLE | +GEN6_GS_SO_STATISTICS_ENABLE | +GEN6_GS_RENDERING_ENABLE); + + /* FIXME: Enable SVBI payload only when TF is enable in SNB for + * user-provided GS. +
[Mesa-dev] [PATCH 27/37] i965/gen6/gs: Add an additional parameter to the FF_SYNC opcode.
From: Samuel Iglesias Gonsalvez sigles...@igalia.com We will use this parameter in later patches to provide information relevant to transform feedback that needs to be set as part of the FF_SYNC message. Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com --- src/mesa/drivers/dri/i965/brw_defines.h | 4 src/mesa/drivers/dri/i965/brw_vec4.h | 3 ++- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 16 +--- src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp| 3 ++- 4 files changed, 21 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 6e8b998..b0d6d9f 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1030,6 +1030,10 @@ enum opcode { * FF_SYNC operation. * * - src1 is the number of primitives written. +* +* - src2 is the value to hold in M0.0: number of SO vertices to write +* and number of SO primitives needed. Its value will be overwritten +* with the SVBI values if transform feedback is enabled. 
*/ GS_OPCODE_FF_SYNC, diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 763cb23..58a5aac 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -679,7 +679,8 @@ private: struct brw_reg src2); void generate_gs_ff_sync(struct brw_reg dst, struct brw_reg src0, -struct brw_reg src1); +struct brw_reg src1, +struct brw_reg src2); void generate_gs_set_primitive_id(struct brw_reg dst); void generate_oword_dual_block_offsets(struct brw_reg m1, struct brw_reg index); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index d4554f5..c69b305 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -734,7 +734,8 @@ vec4_generator::generate_gs_ff_sync_set_primitives(struct brw_reg dst, void vec4_generator::generate_gs_ff_sync(struct brw_reg dst, struct brw_reg src0, -struct brw_reg src1) +struct brw_reg src1, +struct brw_reg src2) { /* We use dst to setup the ff_sync header, so we expect it to be * initialized to R0 by the caller. 
Here we overwrite dword 0 (cleared @@ -744,7 +745,7 @@ vec4_generator::generate_gs_ff_sync(struct brw_reg dst, brw_push_insn_state(p); brw_set_default_mask_control(p, BRW_MASK_DISABLE); brw_set_default_access_mode(p, BRW_ALIGN_1); - brw_MOV(p, get_element_ud(dst, 0), brw_imm_ud(0)); + brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src2, 0)); brw_MOV(p, get_element_ud(dst, 1), get_element_ud(src1, 0)); brw_set_default_access_mode(p, BRW_ALIGN_16); brw_pop_insn_state(p); @@ -763,6 +764,15 @@ vec4_generator::generate_gs_ff_sync(struct brw_reg dst, brw_set_default_access_mode(p, BRW_ALIGN_1); brw_set_default_mask_control(p, BRW_MASK_DISABLE); brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src0, 0)); + + /* src2 is not an immediate when we use transform feedback */ + if (src2.file != BRW_IMMEDIATE_VALUE) { + brw_MOV(p, suboffset(vec1(src2), 0), suboffset(vec1(src0), 1)); + brw_MOV(p, suboffset(vec1(src2), 1), suboffset(vec1(src0), 2)); + brw_MOV(p, suboffset(vec1(src2), 2), suboffset(vec1(src0), 3)); + brw_MOV(p, suboffset(vec1(src2), 3), suboffset(vec1(src0), 4)); + } + brw_set_default_access_mode(p, BRW_ALIGN_16); brw_pop_insn_state(p); } @@ -1374,7 +1384,7 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction, break; case GS_OPCODE_FF_SYNC: - generate_gs_ff_sync(dst, src[0], src[1]); + generate_gs_ff_sync(dst, src[0], src[1], src[2]); break; case GS_OPCODE_FF_SYNC_SET_PRIMITIVES: diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp index b45c381..c1cfe75 100644 --- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp @@ -331,7 +331,8 @@ gen6_gs_visitor::emit_thread_end() { this-current_annotation = gen6 thread end: ff_sync; emit(GS_OPCODE_FF_SYNC, - dst_reg(MRF, base_mrf), this-temp, this-prim_count); + dst_reg(MRF, base_mrf), this-temp, this-prim_count, + brw_imm_ud(0u)); /* Loop over all buffered vertices and emit URB write messages */ 
this->current_annotation = "gen6 thread end: urb writes init"; -- 1.9.1 ___ mesa-dev mailing list
[Mesa-dev] [PATCH 05/37] i965/gen6/gs: Setup constant push buffers for gen6 geometry shaders.
---
 src/mesa/drivers/dri/i965/brw_state.h        |  1 +
 src/mesa/drivers/dri/i965/brw_state_upload.c |  1 +
 src/mesa/drivers/dri/i965/gen6_gs_state.c    | 59 ++--
 3 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index abead18..95dc411 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -101,6 +101,7 @@ extern const struct brw_tracked_state gen6_clip_vp;
 extern const struct brw_tracked_state gen6_color_calc_state;
 extern const struct brw_tracked_state gen6_depth_stencil_state;
 extern const struct brw_tracked_state gen6_gs_state;
+extern const struct brw_tracked_state gen6_gs_push_constants;
 extern const struct brw_tracked_state gen6_gs_binding_table;
 extern const struct brw_tracked_state gen6_multisample_state;
 extern const struct brw_tracked_state gen6_renderbuffer_surfaces;
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 086956d..0481790 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -128,6 +128,7 @@ static const struct brw_tracked_state *gen6_atoms[] =
    &gen6_depth_stencil_state, /* must do before cc unit */

    &gen6_vs_push_constants, /* Before vs_state */
+   &gen6_gs_push_constants, /* Before gs_state */
    &gen6_wm_push_constants, /* Before wm_state */

    /* Surface state setup. Must come before the VS/WM unit. The binding
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index e132959..987b7d2 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -31,17 +31,36 @@
 #include "intel_batchbuffer.h"

 static void
-upload_gs_state_for_tf(struct brw_context *brw)
+gen6_upload_gs_push_constants(struct brw_context *brw)
 {
-   /* Disable all the constant buffers. */
-   BEGIN_BATCH(5);
-   OUT_BATCH(_3DSTATE_CONSTANT_GS << 16 | (5 - 2));
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-   ADVANCE_BATCH();
+   /* BRW_NEW_GEOMETRY_PROGRAM */
+   const struct brw_geometry_program *gp =
+      (struct brw_geometry_program *) brw->geometry_program;
+
+   if (gp) {
+      /* CACHE_NEW_GS_PROG */
+      struct brw_stage_state *stage_state = &brw->gs.base;
+      struct brw_stage_prog_data *prog_data = &brw->gs.prog_data->base.base;
+
+      gen6_upload_push_constants(brw, &gp->program.Base, prog_data,
+                                 stage_state, AUB_TRACE_VS_CONSTANTS);
+   }
+}

+const struct brw_tracked_state gen6_gs_push_constants = {
+   .dirty = {
+      .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
+      .brw   = (BRW_NEW_BATCH |
+                BRW_NEW_GEOMETRY_PROGRAM |
+                BRW_NEW_PUSH_CONSTANT_ALLOCATION),
+      .cache = CACHE_NEW_GS_PROG,
+   },
+   .emit = gen6_upload_gs_push_constants,
+};
+
+static void
+upload_gs_state_for_tf(struct brw_context *brw)
+{
    if (brw->ff_gs.prog_active) {
       BEGIN_BATCH(7);
       OUT_BATCH(_3DSTATE_GS << 16 | (7 - 2));
@@ -87,8 +106,8 @@ upload_gs_state(struct brw_context *brw)
    const struct brw_vec4_prog_data *prog_data = &brw->gs.prog_data->base;
    const struct brw_stage_state *stage_state = &brw->gs.base;

-   if (active) {
-      /* FIXME: enable constant buffers */
+   if (!active || stage_state->push_const_size == 0) {
+      /* Disable the push constant buffers. */
       BEGIN_BATCH(5);
       OUT_BATCH(_3DSTATE_CONSTANT_GS << 16 | (5 - 2));
       OUT_BATCH(0);
@@ -96,7 +115,23 @@ upload_gs_state(struct brw_context *brw)
       OUT_BATCH(0);
       OUT_BATCH(0);
       ADVANCE_BATCH();
+   } else {
+      BEGIN_BATCH(5);
+      OUT_BATCH(_3DSTATE_CONSTANT_GS << 16 |
+                GEN6_CONSTANT_BUFFER_0_ENABLE |
+                (5 - 2));
+      /* Pointer to the GS constant buffer. Covered by the set of
+       * state flags from gen6_upload_vs_constants
+       */
+      OUT_BATCH(stage_state->push_const_offset +
+                stage_state->push_const_size - 1);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
+   }

+   if (active) {
       BEGIN_BATCH(7);
       OUT_BATCH(_3DSTATE_GS << 16 | (7 - 2));
       OUT_BATCH(stage_state->prog_offset);
@@ -163,7 +198,7 @@ upload_gs_state(struct brw_context *brw)

 const struct brw_tracked_state gen6_gs_state = {
    .dirty = {
-      .mesa  = _NEW_TRANSFORM,
+      .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
       .brw   = BRW_NEW_CONTEXT | BRW_NEW_PUSH_CONSTANT_ALLOCATION,
       .cache = (CACHE_NEW_GS_PROG | CACHE_NEW_FF_GS_PROG)
    },
-- 
1.9.1
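For readers following the enabled branch of 3DSTATE_CONSTANT_GS in this patch, here is a minimal sketch of how the pointer dword is formed (offset plus size minus one) and of the condition that selects the "disable" packet. The helper names are hypothetical and not part of Mesa. The sketch also assumes that the push constant offset is 32-byte aligned, so its low five bits can carry the read length minus one, and that the size is counted in 256-bit registers; neither assumption is stated in the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Mesa.
 * Assumption: push_const_offset is 32-byte aligned, so the low five
 * bits are free to hold (read length - 1), and size_regs counts
 * 256-bit registers. */
static uint32_t
sketch_gs_constant_dword(uint32_t push_const_offset, uint32_t size_regs)
{
   assert((push_const_offset & 0x1f) == 0);
   assert(size_regs >= 1 && size_regs <= 32);

   /* Same arithmetic as the OUT_BATCH() in the patch. */
   return push_const_offset + size_regs - 1;
}

/* Mirrors the condition the patch uses to emit the all-zero packet. */
static int
sketch_gs_constants_disabled(int active, uint32_t size_regs)
{
   return !active || size_regs == 0;
}
```

With an offset of 0x40 and two registers of constants the dword would be 0x41; a shader with no push constants takes the disable path even when the GS is active.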
[Mesa-dev] [PATCH 04/37] i965/gen6/gs: Set brw-gs.enabled to FALSE in gen6_blorp_emit_gs_disable()
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/gen6_blorp.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index 1cab8b7..34b4331 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -575,6 +575,7 @@ gen6_blorp_emit_gs_disable(struct brw_context *brw,
    OUT_BATCH(0);
    OUT_BATCH(0);
    ADVANCE_BATCH();
+   brw->gs.enabled = false;
 }
-- 
1.9.1
[Mesa-dev] [PATCH 06/37] i965/gs: Reuse gen6 constant push buffers setup code in gen7+.
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

The code required for gen6 and gen7+ is almost the same, so reuse it.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_state_upload.c |  4 ++--
 src/mesa/drivers/dri/i965/gen6_gs_state.c    |  6 -
 src/mesa/drivers/dri/i965/gen7_gs_state.c    | 33
 3 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 0481790..a52a8f4 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -197,7 +197,7 @@ static const struct brw_tracked_state *gen7_atoms[] =
    &gen6_depth_stencil_state, /* must do before cc unit */

    &gen6_vs_push_constants, /* Before vs_state */
-   &gen7_gs_push_constants, /* Before gs_state */
+   &gen6_gs_push_constants, /* Before gs_state */
    &gen6_wm_push_constants, /* Before wm_surfaces and constant_buffer */

    /* Surface state setup. Must come before the VS/WM unit. The binding
@@ -271,7 +271,7 @@ static const struct brw_tracked_state *gen8_atoms[] =
    &gen6_color_calc_state,

    &gen6_vs_push_constants, /* Before vs_state */
-   &gen7_gs_push_constants, /* Before gs_state */
+   &gen6_gs_push_constants, /* Before gs_state */
    &gen6_wm_push_constants, /* Before wm_surfaces and constant_buffer */

    /* Surface state setup. Must come before the VS/WM unit. The binding
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index 987b7d2..e3256e2 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -33,18 +33,22 @@
 static void
 gen6_upload_gs_push_constants(struct brw_context *brw)
 {
+   struct brw_stage_state *stage_state = &brw->gs.base;
+
    /* BRW_NEW_GEOMETRY_PROGRAM */
    const struct brw_geometry_program *gp =
       (struct brw_geometry_program *) brw->geometry_program;

    if (gp) {
       /* CACHE_NEW_GS_PROG */
-      struct brw_stage_state *stage_state = &brw->gs.base;
       struct brw_stage_prog_data *prog_data = &brw->gs.prog_data->base.base;

       gen6_upload_push_constants(brw, &gp->program.Base, prog_data,
                                  stage_state, AUB_TRACE_VS_CONSTANTS);
    }
+
+   if (brw->gen >= 7)
+      gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
 }

 const struct brw_tracked_state gen6_gs_push_constants = {
diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c b/src/mesa/drivers/dri/i965/gen7_gs_state.c
index b3b4ee6..2a9955f 100644
--- a/src/mesa/drivers/dri/i965/gen7_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c
@@ -26,39 +26,6 @@
 #include "brw_defines.h"
 #include "intel_batchbuffer.h"

-
-static void
-gen7_upload_gs_push_constants(struct brw_context *brw)
-{
-   const struct brw_stage_state *stage_state = &brw->gs.base;
-   /* BRW_NEW_GEOMETRY_PROGRAM */
-   const struct brw_geometry_program *gp =
-      (struct brw_geometry_program *) brw->geometry_program;
-
-   if (gp) {
-      /* CACHE_NEW_GS_PROG */
-      const struct brw_stage_prog_data *prog_data = &brw->gs.prog_data->base.base;
-      struct brw_stage_state *stage_state = &brw->gs.base;
-
-      gen6_upload_push_constants(brw, &gp->program.Base, prog_data,
-                                 stage_state, AUB_TRACE_VS_CONSTANTS);
-   }
-
-   gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
-}
-
-const struct brw_tracked_state gen7_gs_push_constants = {
-   .dirty = {
-      .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
-      .brw   = (BRW_NEW_BATCH |
-                BRW_NEW_GEOMETRY_PROGRAM |
-                BRW_NEW_PUSH_CONSTANT_ALLOCATION),
-      .cache = CACHE_NEW_GS_PROG,
-   },
-   .emit = gen7_upload_gs_push_constants,
-};
-
-
 static void
 upload_gs_state(struct brw_context *brw)
 {
-- 
1.9.1
[Mesa-dev] [PATCH 09/37] i965/gen6/gs: Add instruction URB flags to geometry shaders EOT message.
Gen6 seems to require that EOT messages include the complete flag too, or else the GPU hangs. We will add this flag to the instruction when we emit the thread end opcode.
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 8ef0c34..9cb47b2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -462,7 +462,7 @@ vec4_generator::generate_gs_thread_end(vec4_instruction *inst)
                  brw_null_reg(), /* dest */
                  inst->base_mrf, /* starting mrf reg nr */
                  src,
-                 BRW_URB_WRITE_EOT,
+                 BRW_URB_WRITE_EOT | inst->urb_write_flags,
                  brw->gen >= 8 ? 2 : 1,/* message len */
                  0, /* response len */
                  0, /* urb destination offset */
-- 
1.9.1
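The one-line change in this patch amounts to OR-ing the per-instruction flags into the EOT flag. A minimal sketch of that combination follows. The enum and its bit values are hypothetical stand-ins chosen for illustration; the real enum brw_urb_write_flags is defined in brw_defines.h and its values may differ.

```c
#include <assert.h>

/* Hypothetical flag values for illustration only; these are NOT the
 * values of the real enum brw_urb_write_flags in brw_defines.h. */
enum sketch_urb_write_flags {
   SKETCH_URB_WRITE_NO_FLAGS = 0,
   SKETCH_URB_WRITE_EOT      = 1 << 0,
   SKETCH_URB_WRITE_COMPLETE = 1 << 1,
};

/* EOT is always set for the thread end message; whatever flags were
 * attached to the instruction (e.g. "complete" on gen6) are OR'ed in. */
static unsigned
sketch_thread_end_flags(unsigned inst_urb_write_flags)
{
   return SKETCH_URB_WRITE_EOT | inst_urb_write_flags;
}
```

An instruction carrying no extra flags yields the old behaviour (EOT only), so generations that do not need the complete flag would be unaffected under this sketch.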
[Mesa-dev] [PATCH 03/37] i965/gen6/gs: use brw_gs_prog atom instead of brw_ff_gs_prog
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This is needed to support user-provided geometry shaders, since the brw_ff_gs_prog atom in gen6 only takes care of implementing transform feedback for vertex shaders. If there is no user-provided geometry shader the implementation falls back to the original code.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_gs.c           |  4
 src/mesa/drivers/dri/i965/brw_gs.h           |  1 +
 src/mesa/drivers/dri/i965/brw_state_upload.c |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c      | 11 ++-
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c
index fbd728f..c0c4c13 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -243,6 +243,10 @@ brw_upload_ff_gs_prog(struct brw_context *brw)
    }
 }

+void gen6_brw_upload_ff_gs_prog(struct brw_context *brw)
+{
+   brw_upload_ff_gs_prog(brw);
+}

 const struct brw_tracked_state brw_ff_gs_prog = {
    .dirty = {
diff --git a/src/mesa/drivers/dri/i965/brw_gs.h b/src/mesa/drivers/dri/i965/brw_gs.h
index f8f430c..a538948 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.h
+++ b/src/mesa/drivers/dri/i965/brw_gs.h
@@ -110,5 +110,6 @@ void brw_ff_gs_lines(struct brw_ff_gs_compile *c);
 void gen6_sol_program(struct brw_ff_gs_compile *c,
                       struct brw_ff_gs_prog_key *key,
                       unsigned num_verts, bool check_edge_flag);
+void gen6_brw_upload_ff_gs_prog(struct brw_context *brw);

 #endif
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 3a452c3..086956d 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -108,7 +108,7 @@ static const struct brw_tracked_state *gen4_atoms[] =
 static const struct brw_tracked_state *gen6_atoms[] =
 {
    &brw_vs_prog, /* must do before state base address */
-   &brw_ff_gs_prog, /* must do before state base address */
+   &brw_gs_prog, /* must do before state base address */
    &brw_wm_prog, /* must do before state base address */

    &gen6_clip_vp,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 6428291..2d9e8c2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -31,6 +31,7 @@
 #include "brw_context.h"
 #include "brw_vec4_gs_visitor.h"
 #include "brw_state.h"
+#include "brw_gs.h"


 static bool
@@ -270,6 +271,12 @@ brw_upload_gs_prog(struct brw_context *brw)
       (struct brw_geometry_program *) brw->geometry_program;

    if (gp == NULL) {
+      if (brw->gen == 6) {
+         if (brw->state.dirty.brw & BRW_NEW_TRANSFORM_FEEDBACK)
+            gen6_brw_upload_ff_gs_prog(brw);
+         return;
+      }
+
       /* No geometry shader. Vertex data just passes straight through. */
       if (brw->state.dirty.brw & BRW_NEW_VUE_MAP_VS) {
          brw->vue_map_geom_out = brw->vue_map_vs;
@@ -325,7 +332,9 @@ brw_upload_gs_prog(struct brw_context *brw)
 const struct brw_tracked_state brw_gs_prog = {
    .dirty = {
       .mesa  = (_NEW_LIGHT | _NEW_BUFFERS | _NEW_TEXTURE),
-      .brw   = BRW_NEW_GEOMETRY_PROGRAM | BRW_NEW_VUE_MAP_VS,
+      .brw   = (BRW_NEW_GEOMETRY_PROGRAM |
+                BRW_NEW_VUE_MAP_VS |
+                BRW_NEW_TRANSFORM_FEEDBACK),
    },
    .emit = brw_upload_gs_prog
 };
-- 
1.9.1
[Mesa-dev] [PATCH 33/37] i965/gen6/gs: Enable transform feedback support in geometry shaders
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_vec4_gs.c   |  6 ++
 src/mesa/drivers/dri/i965/gen6_gs_state.c | 13 +
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index f735cf3..53b0a2f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -105,6 +105,12 @@ do_gs_prog(struct brw_context *brw,
    } else {
       /* There are no control data bits in gen6. */
       c.control_data_bits_per_vertex = 0;
+
+      /* If it is using transform feedback, enable it */
+      if (prog->TransformFeedback.NumVarying)
+         c.prog_data.gen6_xfb_enabled = true;
+      else
+         c.prog_data.gen6_xfb_enabled = false;
    }
    c.control_data_header_size_bits =
       gp->program.VerticesOut * c.control_data_bits_per_vertex;
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index e3256e2..f2eed19 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -171,10 +171,7 @@ upload_gs_state(struct brw_context *brw)
                 GEN6_GS_SO_STATISTICS_ENABLE |
                 GEN6_GS_RENDERING_ENABLE);

-      /* FIXME: Enable SVBI payload only when TF is enable in SNB for
-       * user-provided GS.
-       */
-      if (0) {
+      if (brw->gs.prog_data->gen6_xfb_enabled) {
          /* GEN6_GS_REORDER is equivalent to GEN7_GS_REORDER_TRAILING
           * in gen7. SNB and IVB specs are the same regarding the reordering of
           * TRISTRIP/TRISTRIP_REV vertices and triangle orientation, so we do
@@ -183,9 +180,6 @@ upload_gs_state(struct brw_context *brw)
           */
          OUT_BATCH(GEN6_GS_REORDER |
                    GEN6_GS_SVBI_PAYLOAD_ENABLE |
-                   GEN6_GS_SVBI_POSTINCREMENT_ENABLE |
-                   /* FIXME: prog_data->svbi_postincrement_value instead of 0 */
-                   (0 << GEN6_GS_SVBI_POSTINCREMENT_VALUE_SHIFT) |
                    GEN6_GS_ENABLE);
       } else {
          OUT_BATCH(GEN6_GS_REORDER | GEN6_GS_ENABLE);
@@ -203,7 +197,10 @@ upload_gs_state(struct brw_context *brw)
 const struct brw_tracked_state gen6_gs_state = {
    .dirty = {
       .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
-      .brw   = BRW_NEW_CONTEXT | BRW_NEW_PUSH_CONSTANT_ALLOCATION,
+      .brw   = (BRW_NEW_CONTEXT |
+                BRW_NEW_PUSH_CONSTANT_ALLOCATION |
+                BRW_NEW_GEOMETRY_PROGRAM |
+                BRW_NEW_BATCH),
       .cache = (CACHE_NEW_GS_PROG | CACHE_NEW_FF_GS_PROG)
    },
    .emit = upload_gs_state,
-- 
1.9.1
[Mesa-dev] [PATCH 24/37] i965/gen6/gs: implement GS_OPCODE_SVB_WRITE opcode
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This opcode will be used when sending SVB WRITE messages to save transform feedback outputs into Streamed Vertex Buffers.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_defines.h          | 12 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp         |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h             |  7
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 41
 4 files changed, 62 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index b30a095..83011d6 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1040,6 +1040,18 @@ enum opcode {
     * - dst is the GRF where PrimitiveID information will be moved.
     */
    GS_OPCODE_SET_PRIMITIVE_ID,
+
+   /**
+    * Write transform feedback data to the SVB by sending a SVB WRITE message.
+    * Used in gen6.
+    *
+    * - dst is the MRF register containing the message header.
+    *
+    * - src0 is the register where the vertex data is going to be copied from.
+    *
+    * - src1 is the destination register when write commit occurs.
+    */
+   GS_OPCODE_SVB_WRITE,
 };

 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index fc3146c..8698b75 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -536,6 +536,8 @@ brw_instruction_name(enum opcode op)
       return "ff_sync";
    case GS_OPCODE_SET_PRIMITIVE_ID:
       return "set_primitive_id";
+   case GS_OPCODE_SVB_WRITE:
+      return "gs_svb_write";

    default:
       /* Yes, this leaks. It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 6e0da6d..e8456ce 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -220,6 +220,9 @@ public:
    enum brw_urb_write_flags urb_write_flags;
    bool header_present;

+   unsigned sol_binding; /** gen6: SOL binding table index */
+   bool sol_final_write; /** gen6: send commit message */
+
    bool is_send_from_grf();
    bool can_reswizzle_dst(int dst_writemask, int swizzle, int swizzle_mask);
    void reswizzle_dst(int dst_writemask, int swizzle);
@@ -657,6 +660,10 @@ private:
                             struct brw_reg src1);
    void generate_gs_set_vertex_count(struct brw_reg dst,
                                      struct brw_reg src);
+   void generate_gs_svb_write(vec4_instruction *inst,
+                              struct brw_reg dst,
+                              struct brw_reg src0,
+                              struct brw_reg src1);
    void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src);
    void generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src);
    void generate_gs_prepare_channel_masks(struct brw_reg dst);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 8293f60..1728790 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -536,6 +536,44 @@ vec4_generator::generate_gs_set_vertex_count(struct brw_reg dst,
 }

 void
+vec4_generator::generate_gs_svb_write(vec4_instruction *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg src0,
+                                      struct brw_reg src1)
+{
+   int binding = inst->sol_binding;
+   bool final_write = inst->sol_final_write;
+
+   brw_push_insn_state(p);
+   /* Copy Vertex data into M0.x */
+   brw_MOV(p, stride(dst, 4, 4, 1),
+           stride(retype(src0, BRW_REGISTER_TYPE_UD), 4, 4, 1));
+
+   /* Send SVB Write */
+   brw_svb_write(p,
+                 final_write ? src1 : brw_null_reg(), /* dest == src1 */
+                 1, /* msg_reg_nr */
+                 dst, /* src0 == previous dst */
+                 SURF_INDEX_GEN6_SOL_BINDING(binding), /* binding_table_index */
+                 final_write); /* send_commit_msg */
+
+   /* Finally, wait for the write commit to occur so that we can proceed to
+    * other things safely.
+    *
+    * From the Sandybridge PRM, Volume 4, Part 1, Section 3.3:
+    *
+    *    The write commit does not modify the destination register, but
+    *    merely clears the dependency associated with the destination
+    *    register. Thus, a simple “mov” instruction using the register as a
+    *    source is sufficient to wait for the write commit to occur.
+    */
+   if (final_write) {
+      brw_MOV(p, src1, src1);
+   }
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_set_dword_2_immed(struct brw_reg dst,
                                               struct brw_reg src)
 {
@@ -1272,6 +1310,9 @@
[Mesa-dev] [PATCH 34/37] i965/gen6/gs: upload ubo and pull constants surfaces.
Uniforms declared as uniform blocks are stored in ubo surfaces and need to be pulled from the geometry shader program so make sure we upload them first and do the same for pull constants.

This fixes all piglit tests that use uniform blocks:
bin/shader_runner tests/spec/glsl-1.50/uniform_buffer/gs-*
---
 src/mesa/drivers/dri/i965/brw_state_upload.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index b0d78ab..af19a4c 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -136,6 +136,8 @@ static const struct brw_tracked_state *gen6_atoms[] =
     */
    &brw_vs_pull_constants,
    &brw_vs_ubo_surfaces,
+   &brw_gs_pull_constants,
+   &brw_gs_ubo_surfaces,
    &brw_wm_pull_constants,
    &brw_wm_ubo_surfaces,
    &gen6_renderbuffer_surfaces,
-- 
1.9.1
[Mesa-dev] [PATCH 00/37] Geometry shader support in Sandy Bridge
Hi,

this series brings support for geometry shaders in Sandy Bridge (gen6) and is combined work from Samuel and myself. A few notes:

1.- Some patches have been based on original work by Ilia Mirkin, specifically the idea of using arrays to buffer the output of the GS, subclassing the vec4_gs_visitor for gen6 and generalizing emit_urb_slot().

2.- Geometry shaders were already being used in gen6 to implement transform feedback support for vertex shaders. We have not changed this. These patches focus on adding support for user-provided geometry shaders and transform feedback support for the geometry shader stage. In the future it probably makes sense to merge transform feedback support for the vertex shader stage into our implementation so there is only one code path for geometry shaders in gen6, but it is probably better to tackle that later, once we have merged this work.

3.- On Ivy Bridge there are no piglit regressions.

4.- On Sandy Bridge we get these results after enabling OpenGL 3.2 and GLSL 1.50 (*1):

crash: +0
fail: +15 (*2)
pass: +3265
skip: -3280

(*1) Including Jordan's patches from the series "Gen6 render surface state changes", since these are required to enable layered rendering in geometry shaders. The numbers were obtained by comparing master with Jordan's patches on top (OpenGL 3.1, GLSL 1.40) against master with these and Jordan's patches on top (OpenGL 3.2, GLSL 1.50).

(*2) These are mostly tests that either fail in Ivy Bridge too, are GS variants of tests that also fail for the VS/FS stages, or relate to other aspects of OpenGL 3.2 that are not related to geometry shaders.

5.- With these patches, the following piglit test hangs:

bin/glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_STRIP_ADJACENCY

This problem seems to be unrelated to our implementation, since the hang happens only for that primitive type, only when using glDrawElements() (so glDrawArrays works fine), and only in specific cases where the list of indices provided includes repeated indices with a certain pattern. Actually, this test hangs even if we have a geometry shader that does nothing (i.e. an empty main function), where the code we generate is trivial and works with any other primitive type. Based on this, I conclude that this is a problem originating somewhere else, I think probably a hardware bug. Because of this, piglit runs with these patches should exclude this test by including -x primitive-id-restart. The offending piglit test can be trivially reworked to avoid repeating indices in the call to glDrawElements() too. I'll discuss this issue further in another thread so we can decide what to do about this problem.

I'll be on holidays for the next two weeks, starting tomorrow, but Samuel will be around from Tuesday next week so he can start acting on the review feedback we get.

A quick summary of the patches:

- Patch 1 is actually about gen7, but since gen6's dispatch mode for geometry shaders is equivalent to gen7's SINGLE mode it makes sense to do this first.
- Patches 2-4 refactor 3DSTATE_GS to accommodate the code path for user-provided geometry shaders while keeping the original code that handles TF support in vertex shaders.
- Patches 5-13 implement generator opcodes, configure state packets and handle required URB space.
- Patches 14-15 generalize emit_urb_slot() so we can reuse that code.
- Patches 16-19 are the gen6 geometry shader visitor implementation.
- Patches 20-21 implement gl_PrimitiveIDIn.
- Patch 22 makes sure we compute the right VUE map for user-provided GS.
- Patch 23 enables texture related functions in the GS stage.
- Patches 24-33 mostly implement transform feedback.
- Patch 34 handles uploading of ubo and pull constant surfaces.
- Patch 35 makes gen6 use this implementation of geometry shaders.
- Patches 36-37 enable GLSL 1.5 and OpenGL 3.2 in gen6.

Iago Toral Quiroga (23):
  i965/gs: Use single dispatch mode as fallback to dual object mode when possible.
  i965/gen6/gs: Setup constant push buffers for gen6 geometry shaders.
  i965/gen6/gs: Implement GS_OPCODE_FF_SYNC.
  i965/gen6/gs: Implement GS_OPCODE_URB_WRITE_ALLOCATE.
  i965/gen6/gs: Add instruction URB flags to geometry shaders EOT message.
  i965/gen6/gs: Compute URB entry size for user-provided geometry shaders.
  i965/gen6/gs: Enable URB space for user-provided geometry shaders.
  i965/gen6/gs: Upload binding table for user-provided geometry shaders.
  i965/gen6/gs: Implement GS_OPCODE_SET_DWORD_2.
  i965: Provide means to create registers of a given size.
  i965: Generalize emit_urb_slot() to emit to any dst_reg.
  i965/gen6/gs: Add initial implementation for a gen6 geometry shader visitor.
  i965/gen6/gs: Implement geometry shaders for outputs other than points.
  i965/gen6/gs: Make sure we complete the last primitive.
  i965/gen6/gs: Handle the case where a geometry shader emits no output.
  i965/gen6/gs: Implement GS_OPCODE_SET_PRIMITIVE_ID.
  i965/gen6/gs:
[Mesa-dev] [PATCH 13/37] i965/gen6/gs: Implement GS_OPCODE_SET_DWORD_2.
We have GS_OPCODE_SET_DWORD_2_IMMED but this requires its source argument to be an immediate. In gen6 we need to set dword 2 of the URB write message header from values stored in a separate register, so we need something more flexible.
---
 src/mesa/drivers/dri/i965/brw_defines.h          |  8
 src/mesa/drivers/dri/i965/brw_shader.cpp         |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h             |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 15 +++
 4 files changed, 26 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index a2b40fb..f6bdaeb 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -979,6 +979,14 @@ enum opcode {
    GS_OPCODE_SET_DWORD_2_IMMED,

    /**
+    * Same as above but can take the DWORD 2 value from any general purpose
+    * register, not necessarily an immediate. Used by geometry shaders in gen6
+    * which need to set DWORD 2 of the URB write message header with vertex
+    * flags that we have buffered in a separate register.
+    */
+   GS_OPCODE_SET_DWORD_2,
+
+   /**
     * Prepare the dst register for storage in the Channel Mask fields of a
     * URB_WRITE message header.
     *
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 69d16a7..b927601 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -524,6 +524,8 @@ brw_instruction_name(enum opcode op)
       return "set_vertex_count";
    case GS_OPCODE_SET_DWORD_2_IMMED:
       return "set_dword_2_immed";
+   case GS_OPCODE_SET_DWORD_2:
+      return "set_dword_2";
    case GS_OPCODE_PREPARE_CHANNEL_MASKS:
       return "prepare_channel_masks";
    case GS_OPCODE_SET_CHANNEL_MASKS:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index c1daf54..5403f5a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -657,6 +657,7 @@ private:
    void generate_gs_set_vertex_count(struct brw_reg dst,
                                      struct brw_reg src);
    void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src);
+   void generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src);
    void generate_gs_prepare_channel_masks(struct brw_reg dst);
    void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src);
    void generate_gs_get_instance_id(struct brw_reg dst);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 9cb47b2..2bf2b67 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -550,6 +550,17 @@ vec4_generator::generate_gs_set_dword_2_immed(struct brw_reg dst,
 }

 void
+vec4_generator::generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src)
+{
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, suboffset(vec1(dst), 2), suboffset(vec1(src), 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_prepare_channel_masks(struct brw_reg dst)
 {
    /* We want to left shift just DWORD 4 (the x component belonging to the
@@ -1252,6 +1263,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
       generate_gs_set_dword_2_immed(dst, src[0]);
       break;

+   case GS_OPCODE_SET_DWORD_2:
+      generate_gs_set_dword_2(dst, src[0]);
+      break;
+
    case GS_OPCODE_PREPARE_CHANNEL_MASKS:
       generate_gs_prepare_channel_masks(dst);
       break;
-- 
1.9.1
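To make the data movement of GS_OPCODE_SET_DWORD_2 concrete, here is a small CPU-side sketch of what the single MOV in the generator does: dword 0 of the source register is copied into dword 2 of the destination, and nothing else in the destination changes. It models a register as eight dwords; the function and macro names are hypothetical and this is not how the generator itself is written.

```c
#include <assert.h>
#include <stdint.h>

/* One 256-bit register viewed as eight dwords (modelling assumption). */
#define SKETCH_REG_DWORDS 8

/* CPU-side model of the MOV emitted for GS_OPCODE_SET_DWORD_2:
 * copy src dword 0 into dst dword 2, leave every other dword alone. */
static void
sketch_set_dword_2(uint32_t dst[SKETCH_REG_DWORDS],
                   const uint32_t src[SKETCH_REG_DWORDS])
{
   dst[2] = src[0];
}
```

The immediate variant would differ only in where the value comes from: a constant encoded in the instruction rather than a register the shader computed at run time.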
[Mesa-dev] [PATCH 07/37] i965/gen6/gs: Implement GS_OPCODE_FF_SYNC.
This implements the FF_SYNC message required in gen6 geometry shaders to get the initial URB handle.
---
 src/mesa/drivers/dri/i965/brw_defines.h          | 14 +
 src/mesa/drivers/dri/i965/brw_shader.cpp         |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h             |  3 ++
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 40
 4 files changed, 59 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 3564041..125d728 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1002,6 +1002,20 @@ enum opcode {
     * - dst is the GRF for gl_InvocationID.
     */
    GS_OPCODE_GET_INSTANCE_ID,
+
+   /**
+    * Send a FF_SYNC message to allocate initial URB handles (gen6).
+    *
+    * - dst will hold the newly allocated VUE handle. It is expected to be
+    *   be initialized so that it can be used to as the FF_SYNC message header
+    *   (that is, it won't do an implied move from R0).
+    *
+    * - src0 is a temporary that will be used as writeback register for the
+    *   FF_SYNC operation.
+    *
+    * - src1 is the number of primitives written.
+    */
+   GS_OPCODE_FF_SYNC,
 };

 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 0033135..5749061 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -528,6 +528,8 @@ brw_instruction_name(enum opcode op)
       return "set_channel_masks";
    case GS_OPCODE_GET_INSTANCE_ID:
       return "get_instance_id";
+   case GS_OPCODE_FF_SYNC:
+      return "ff_sync";

    default:
       /* Yes, this leaks. It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 67132c0..72fabdd 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -659,6 +659,9 @@ private:
    void generate_gs_prepare_channel_masks(struct brw_reg dst);
    void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src);
    void generate_gs_get_instance_id(struct brw_reg dst);
+   void generate_gs_ff_sync(struct brw_reg dst,
+                            struct brw_reg src0,
+                            struct brw_reg src1);
    void generate_oword_dual_block_offsets(struct brw_reg m1,
                                           struct brw_reg index);
    void generate_scratch_write(vec4_instruction *inst,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index c63b47a..05f4892 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -621,6 +621,42 @@ vec4_generator::generate_gs_get_instance_id(struct brw_reg dst)
 }

 void
+vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
+                                    struct brw_reg src0,
+                                    struct brw_reg src1)
+{
+   /* We use dst to setup the ff_sync header, so we expect it to be
+    * initialized to R0 by the caller. Here we overwrite dword 0 (cleared
+    * for now since we are not doing transform feedback) and dword 1
+    * (to hold the number of primitives written).
+    */
+   brw_push_insn_state(p);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_MOV(p, get_element_ud(dst, 0), brw_imm_ud(0));
+   brw_MOV(p, get_element_ud(dst, 1), get_element_ud(src1, 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+
+   /* Write allocated URB handle to temporary passed in src0 */
+   brw_ff_sync(p,
+               src0,
+               0,
+               dst,
+               1, /* allocate */
+               1, /* response length */
+               0 /* eot */);
+
+   /* Now put allocated urb handle in dst.0 */
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src0, 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_oword_dual_block_offsets(struct brw_reg m1,
                                                   struct brw_reg index)
 {
@@ -1198,6 +1234,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
       generate_gs_get_instance_id(dst);
       break;

+   case GS_OPCODE_FF_SYNC:
+      generate_gs_ff_sync(dst, src[0], src[1]);
+      break;
+
    case SHADER_OPCODE_SHADER_TIME_ADD:
       brw_shader_time_add(p, src[0],
                           prog_data->base.binding_table.shader_time_start);
-- 
1.9.1
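The three steps of generate_gs_ff_sync() (header setup, message with writeback into a temporary, handle copied back into dst) can be followed with a small CPU-side model. This is only a sketch under stated assumptions: registers are modelled as eight dwords, the function name is hypothetical, and the handle returned by the hardware is passed in as a plain parameter because the real FF_SYNC message cannot be reproduced on the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* CPU-side model of the register traffic in generate_gs_ff_sync().
 * handle_from_hw stands in for the value the FF_SYNC message would
 * write back; it is an assumption of this sketch, not a Mesa API. */
static void
sketch_gs_ff_sync(uint32_t dst[8], uint32_t src0[8],
                  const uint32_t src1[8], uint32_t handle_from_hw)
{
   /* Step 1: build the header in dst. Dword 0 is cleared, dword 1
    * holds the number of primitives written. */
   dst[0] = 0;
   dst[1] = src1[0];

   /* Step 2: the message writes the allocated URB handle into the
    * temporary register (src0). */
   src0[0] = handle_from_hw;

   /* Step 3: copy the allocated handle into dword 0 of dst. */
   dst[0] = src0[0];
}
```

After the call, dst carries the handle in dword 0 and the primitive count in dword 1, while the remaining dwords keep whatever the caller initialized them to.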
[Mesa-dev] [PATCH 17/37] i965/gen6/gs: Implement geometry shaders for outputs other than points.
--- src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 72 --- src/mesa/drivers/dri/i965/gen6_gs_visitor.h | 2 + 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp index b78c55e..5123bd7 100644 --- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp @@ -79,6 +79,21 @@ gen6_gs_visitor::emit_prolog() * and URB_WRITE messages. */ this-temp = src_reg(this, glsl_type::uint_type); + + /* This will be used to know when we are processing the first vertex of +* a primitive. We will set this to URB_WRITE_PRIM_START only when we know +* that we are processing the first vertex in the primitive and to zero +* otherwise. This way we can use its value directly in the URB write +* headers. +*/ + this-first_vertex = src_reg(this, glsl_type::uint_type); + emit(MOV(dst_reg(this-first_vertex), URB_WRITE_PRIM_START)); + + /* The FF_SYNC message requires to know the number of primitives generated, +* so keep a counter for this. +*/ + this-prim_count = src_reg(this, glsl_type::uint_type); + emit(MOV(dst_reg(this-prim_count), 0u)); } void @@ -109,18 +124,26 @@ gen6_gs_visitor::visit(ir_emit_vertex *) this-vertex_output_offset, 1u)); } - /* Now buffer flags for this vertex (we only support point output - * for now). - */ + /* Now buffer flags for this vertex */ dst_reg dst(this-vertex_output); dst.reladdr = ralloc(mem_ctx, src_reg); memcpy(dst.reladdr, this-vertex_output_offset, sizeof(src_reg)); - /* If we are outputting points, then every vertex has PrimStart and - * PrimEnd set. - */ if (c-gp-program.OutputType == GL_POINTS) { + /* If we are outputting points, then every vertex has PrimStart and + * PrimEnd set. 
+ */ emit(MOV(dst, (_3DPRIM_POINTLIST URB_WRITE_PRIM_TYPE_SHIFT) | URB_WRITE_PRIM_START | URB_WRITE_PRIM_END)); + emit(ADD(dst_reg(this-prim_count), this-prim_count, 1u)); + } else { + /* Otherwise, we can only set the PrimStart flag, which we have stored + * in the first_vertex register. We will have to wait until we execute + * EndPrimitive() or we end the thread to set the PrimEnd flag on a + * vertex. + */ + emit(OR(dst, this-first_vertex, + (c-prog_data.output_topology URB_WRITE_PRIM_TYPE_SHIFT))); + emit(MOV(dst_reg(this-first_vertex), 0u)); } emit(ADD(dst_reg(this-vertex_output_offset), this-vertex_output_offset, 1u)); @@ -140,6 +163,41 @@ gen6_gs_visitor::visit(ir_end_primitive *) */ if (c-gp-program.OutputType == GL_POINTS) return; + + /* Otheriwse we know that the last vertex we have processed was the last +* vertex in the primitive and we need to set its PrimEnd flag, so do this +* unless we haven't emitted that vertex at all. +* +* Notice that we have already incremented vertex_count when we processed +* the last emit_vertex, so we need to take that into account in the +* comparison below (hence the num_output_vertices + 1 in the comparison +* below). +*/ + unsigned num_output_vertices = c-gp-program.VerticesOut; + emit(CMP(dst_null_d(), this-vertex_count, src_reg(num_output_vertices + 1), +BRW_CONDITIONAL_L)); + emit(IF(BRW_PREDICATE_NORMAL)); + { + /* vertex_output_offset is already pointing at the first entry of the + * next vertex. So subtract 1 to modify the flags for the previous + * vertex. + */ + src_reg offset(this, glsl_type::uint_type); + emit(ADD(dst_reg(offset), this-vertex_output_offset, brw_imm_d(-1))); + + src_reg dst(this-vertex_output); + dst.reladdr = ralloc(mem_ctx, src_reg); + memcpy(dst.reladdr, offset, sizeof(src_reg)); + + emit(OR(dst_reg(dst), dst, URB_WRITE_PRIM_END)); + emit(ADD(dst_reg(this-prim_count), this-prim_count, 1u)); + + /* Set the first vertex flag to indicate that the next vertex will start + * a primitive. 
+ */ + emit(MOV(dst_reg(this-first_vertex), URB_WRITE_PRIM_START)); + } + emit(BRW_OPCODE_ENDIF); } void @@ -234,7 +292,7 @@ gen6_gs_visitor::emit_thread_end() /* Issue the FF_SYNC message and obtain the initial VUE handle. */ this-current_annotation = gen6 thread end: ff_sync; emit(GS_OPCODE_FF_SYNC, -dst_reg(MRF, base_mrf), this-temp, this-vertex_count); +dst_reg(MRF, base_mrf), this-temp, this-prim_count); /* Loop over all buffered vertices and emit URB write messages */ this-current_annotation = gen6 thread end: urb writes init; diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.h b/src/mesa/drivers/dri/i965/gen6_gs_visitor.h index 6dd3a19..68fe88d 100644 ---
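The PrimStart/PrimEnd bookkeeping in this patch can be illustrated outside the driver with a small host-side simulation. This is only a sketch: the `gs_sim` structure and the `sim_*` functions are hypothetical names, the flag values are assumed for illustration (check brw_defines.h for the real `URB_WRITE_PRIM_*` definitions), and the zero-vertex guard in `sim_end_primitive()` is a simplification that replaces the patch's comparison against `num_output_vertices + 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only; the real ones live in brw_defines.h. */
#define PRIM_END        0x1u
#define PRIM_START      0x2u
#define PRIM_TYPE_SHIFT 2

#define MAX_VERTS 16

/* Hypothetical stand-in for the state kept by gen6_gs_visitor. */
struct gs_sim {
   uint32_t flags[MAX_VERTS]; /* buffered per-vertex URB write flags */
   unsigned vertex_count;     /* vertices emitted so far */
   unsigned prim_count;       /* primitives completed so far */
   uint32_t first_vertex;     /* PRIM_START if the next vertex starts a primitive */
   uint32_t topology;         /* output primitive type */
};

static void
sim_init(struct gs_sim *s, uint32_t topology)
{
   s->vertex_count = 0;
   s->prim_count = 0;
   s->first_vertex = PRIM_START; /* as set up in emit_prolog() */
   s->topology = topology;
}

/* EmitVertex() for non-point output: only PrimStart can be known here. */
static void
sim_emit_vertex(struct gs_sim *s)
{
   assert(s->vertex_count < MAX_VERTS);
   s->flags[s->vertex_count++] =
      s->first_vertex | (s->topology << PRIM_TYPE_SHIFT);
   s->first_vertex = 0;
}

/* EndPrimitive(): go back and set PrimEnd on the previously emitted vertex. */
static void
sim_end_primitive(struct gs_sim *s)
{
   if (s->vertex_count == 0)
      return; /* simplification: nothing has been emitted yet */
   s->flags[s->vertex_count - 1] |= PRIM_END;
   s->prim_count++;
   s->first_vertex = PRIM_START; /* next vertex starts a new primitive */
}
```

Emitting three vertices, ending the primitive and emitting a fourth vertex leaves PrimStart on vertices 0 and 3, PrimEnd on vertex 2, and a primitive count of 1, which is the value the FF_SYNC message needs in place of the vertex count.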
[Mesa-dev] [PATCH 36/37] i965/gen6: enable GLSL 1.50
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index e134cd9..9875b7c 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -246,7 +246,7 @@ intelInitExtensions(struct gl_context *ctx)
    if (brw->gen >= 7)
       ctx->Const.GLSLVersion = 330;
    else if (brw->gen >= 6)
-      ctx->Const.GLSLVersion = 140;
+      ctx->Const.GLSLVersion = 150;
    else
       ctx->Const.GLSLVersion = 120;
    _mesa_override_glsl_version(&ctx->Const);
--
1.9.1
[Mesa-dev] [PATCH 26/37] i965/gen6/gs: implement GS_OPCODE_FF_SYNC_SET_PRIMITIVES opcode
From: Samuel Iglesias Gonsalvez sigles...@igalia.com This opcode will be used when filling FF_SYNC header before emitting vertices and their data. Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com --- src/mesa/drivers/dri/i965/brw_defines.h | 15 +++ src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_vec4.h | 4 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 24 4 files changed, 45 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 7095c39..6e8b998 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1061,6 +1061,21 @@ enum opcode { * - src is the register that holds the destination indices value. */ GS_OPCODE_SVB_SET_DST_INDEX, + + /** +* Prepare Mx.0 subregister for being used in the FF_SYNC message header. +* Used in gen6 for transform feedback. +* +* - dst will hold the register with the final Mx.0 value. +* +* - src0 has the number of vertices emitted in SO (NumSOVertsToWrite) +* +* - src1 has the number of needed primitives for SO (NumSOPrimsNeeded) +* +* - src2 is the value to hold in M0: number of SO vertices to write +* and number of SO primitives needed. +*/ + GS_OPCODE_FF_SYNC_SET_PRIMITIVES, }; enum brw_urb_write_flags { diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index bf625a5..7328fdc 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -540,6 +540,8 @@ brw_instruction_name(enum opcode op) return gs_svb_write; case GS_OPCODE_SVB_SET_DST_INDEX: return gs_svb_set_dst_index; + case GS_OPCODE_FF_SYNC_SET_PRIMITIVES: + return gs_ff_sync_set_primitives; default: /* Yes, this leaks. 
It's in debug code, it should never occur, and if diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index ea3967d..763cb23 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -673,6 +673,10 @@ private: void generate_gs_prepare_channel_masks(struct brw_reg dst); void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src); void generate_gs_get_instance_id(struct brw_reg dst); + void generate_gs_ff_sync_set_primitives(struct brw_reg dst, + struct brw_reg src0, + struct brw_reg src1, + struct brw_reg src2); void generate_gs_ff_sync(struct brw_reg dst, struct brw_reg src0, struct brw_reg src1); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index d914a52..d4554f5 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -712,6 +712,26 @@ vec4_generator::generate_gs_get_instance_id(struct brw_reg dst) } void +vec4_generator::generate_gs_ff_sync_set_primitives(struct brw_reg dst, + struct brw_reg src0, + struct brw_reg src1, + struct brw_reg src2) +{ + brw_push_insn_state(p); + brw_set_default_access_mode(p, BRW_ALIGN_1); + /* Save src0 data in 16:31 bits of dst.0 */ + brw_AND(p, suboffset(vec1(dst), 0), suboffset(vec1(src0), 0), brw_imm_ud(0xu)); + brw_SHL(p, suboffset(vec1(dst), 0), suboffset(vec1(dst), 0), brw_imm_ud(16)); + /* Save src1 data in 0:15 bits of dst.0 */ + brw_AND(p, suboffset(vec1(src2), 0), suboffset(vec1(src1), 0), brw_imm_ud(0xu)); + brw_OR(p, suboffset(vec1(dst), 0), + suboffset(vec1(dst), 0), + suboffset(vec1(src2), 0)); + brw_set_default_access_mode(p, BRW_ALIGN_16); + brw_pop_insn_state(p); +} + +void vec4_generator::generate_gs_ff_sync(struct brw_reg dst, struct brw_reg src0, struct brw_reg src1) @@ -1357,6 +1377,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction, generate_gs_ff_sync(dst, src[0], 
src[1]); break; + case GS_OPCODE_FF_SYNC_SET_PRIMITIVES: + generate_gs_ff_sync_set_primitives(dst, src[0], src[1], src[2]); + break; + case GS_OPCODE_SET_PRIMITIVE_ID: generate_gs_set_primitive_id(dst); break; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
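The AND/SHL/AND/OR sequence in generate_gs_ff_sync_set_primitives() packs two 16-bit counters into one header dword. A minimal host-side sketch of the same arithmetic follows. The helper name `pack_ff_sync_m0_0` is hypothetical, and the `0xffff` mask is an assumption inferred from the "16:31" and "0:15" comments, since the immediate is garbled in the archived patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Mesa: mirrors the generated instructions.
 * src0 corresponds to NumSOVertsToWrite, src1 to NumSOPrimsNeeded.
 */
static uint32_t
pack_ff_sync_m0_0(uint32_t num_so_verts_to_write, uint32_t num_so_prims_needed)
{
   /* AND + SHL: keep the low 16 bits of src0 and move them to bits 16:31. */
   uint32_t dw0 = (num_so_verts_to_write & 0xffffu) << 16;

   /* AND + OR: keep the low 16 bits of src1 and merge them into bits 0:15. */
   dw0 |= num_so_prims_needed & 0xffffu;

   assert((dw0 >> 16) == (num_so_verts_to_write & 0xffffu));
   return dw0;
}
```

For instance, three vertices to write and one primitive needed give `0x00030001`. In the generator the second AND writes its result to src2, which is why the opcode takes a third source as a scratch register.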
[Mesa-dev] [PATCH 30/37] i965/gen6/gs: Buffer PSIZ/flags vertex data in gen6_gs_visitor
From: Samuel Iglesias Gonsalvez sigles...@igalia.com Since geometry shaders can alter the value of varyings packed in the first output VUE slot (PSIZ), we need to buffer it together with all the other vertex data so we can emit the right value for each vertex when we do the URB writes. This fixes the following piglit test in gen6: tests/spec/glsl-1.50/execution/redeclare-pervertex-out-subset-gs.shader_test Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com --- src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 79 ++- 1 file changed, 41 insertions(+), 38 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp index b8eaa58..fca7536 100644 --- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp @@ -178,16 +178,33 @@ gen6_gs_visitor::visit(ir_emit_vertex *) /* Buffer all output slots for this vertex in vertex_output */ for (int slot = 0; slot prog_data-vue_map.num_slots; ++slot) { - /* We will handle PSIZ for each vertex at thread end time since it - * is not computed by the GS algorithm and requires specific handling. - */ int varying = prog_data-vue_map.slot_to_varying[slot]; if (varying != VARYING_SLOT_PSIZ) { dst_reg dst(this-vertex_output); dst.reladdr = ralloc(mem_ctx, src_reg); memcpy(dst.reladdr, this-vertex_output_offset, sizeof(src_reg)); emit_urb_slot(dst, varying); + } else { +/* The PSIZ slot can pack multiple varyings in different channels + * and emit_urb_slot() will produce a MOV instruction for each of + * them. Since we are writing to an array, that will translate to + * possibly multiple MOV instructions with an array destination and + * each will generate a scratch write with the same offset into + * scratch space (thus, each one overwriting the previous). This is + * not what we want. What we will do instead is emit PSIZ to a + * a regular temporary register, then move that resgister into the + * array. 
This way we only have one instruction with an array + * destination and we only produce a single scratch write. + */ +dst_reg tmp = dst_reg(src_reg(this, glsl_type::uvec4_type)); +emit_urb_slot(tmp, varying); +dst_reg dst(this-vertex_output); +dst.reladdr = ralloc(mem_ctx, src_reg); +memcpy(dst.reladdr, this-vertex_output_offset, sizeof(src_reg)); +vec4_instruction *inst = emit(MOV(dst, src_reg(tmp))); +inst-force_writemask_all = true; } + emit(ADD(dst_reg(this-vertex_output_offset), this-vertex_output_offset, 1u)); } @@ -427,17 +444,12 @@ gen6_gs_visitor::emit_thread_end() memcpy(data.reladdr, this-vertex_output_offset, sizeof(src_reg)); - if (varying == VARYING_SLOT_PSIZ) { - /* We did not buffer PSIZ, emit it directly here */ - emit_urb_slot(dst_reg(MRF, mrf), varying); - } else { - /* Copy this slot to the appropriate message register */ - dst_reg reg = dst_reg(MRF, mrf); - reg.type = output_reg[varying].type; - data.type = reg.type; - vec4_instruction *inst = emit(MOV(reg, data)); - inst-force_writemask_all = true; - } + /* Copy this slot to the appropriate message register */ + dst_reg reg = dst_reg(MRF, mrf); + reg.type = output_reg[varying].type; + data.type = reg.type; + vec4_instruction *inst = emit(MOV(reg, data)); + inst-force_writemask_all = true; mrf++; emit(ADD(dst_reg(this-vertex_output_offset), @@ -585,22 +597,19 @@ gen6_gs_visitor::xfb_buffer_output() /* Buffer all TF outputs for this vertex in xfb_output */ for (int binding = 0; binding prog_data-num_transform_feedback_bindings; binding++) { - /* We will handle PSIZ for each vertex at thread end time since it - * is not computed by the GS algorithm and requires specific handling. 
- */ unsigned varying = prog_data-transform_feedback_bindings[binding]; - if (varying != VARYING_SLOT_PSIZ) { - dst_reg dst(this-xfb_output); - dst.reladdr = ralloc(mem_ctx, src_reg); - memcpy(dst.reladdr, this-xfb_output_offset, sizeof(src_reg)); - dst.type = output_reg[varying].type; + dst_reg dst(this-xfb_output); + dst.reladdr = ralloc(mem_ctx, src_reg); + memcpy(dst.reladdr, this-xfb_output_offset, sizeof(src_reg)); + dst.type = output_reg[varying].type; + +
[Mesa-dev] [PATCH 11/37] i965/gen6/gs: Enable URB space for user-provided geometry shaders.
---
 src/mesa/drivers/dri/i965/gen6_urb.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_urb.c b/src/mesa/drivers/dri/i965/gen6_urb.c
index b694f5d..7af1f37 100644
--- a/src/mesa/drivers/dri/i965/gen6_urb.c
+++ b/src/mesa/drivers/dri/i965/gen6_urb.c
@@ -52,19 +52,29 @@ gen6_upload_urb( struct brw_context *brw )
    int nr_vs_entries, nr_gs_entries;
    int total_urb_size = brw->urb.size * 1024; /* in bytes */

+   bool gs_present = brw->ff_gs.prog_active || brw->geometry_program;
+
    /* CACHE_NEW_VS_PROG */
    unsigned vs_size = MAX2(brw->vs.prog_data->base.urb_entry_size, 1);

-   /* We use the same VUE layout for VS outputs and GS outputs (as it's what
-    * the SF and Clipper expect), so we can simply make the GS URB entry size
-    * the same as for the VS.  This may technically be too large in cases
-    * where we have few vertex attributes and a lot of varyings, since the VS
-    * size is determined by the larger of the two.  For now, it's safe.
+   /* Whe using GS to do transform feedback only we use the same VUE layout for
+    * VS outputs and GS outputs (as it's what the SF and Clipper expect), so we
+    * can simply make the GS URB entry size the same as for the VS.  This may
+    * technically be too large in cases where we have few vertex attributes and
+    * a lot of varyings, since the VS size is determined by the larger of the
+    * two.  For now, it's safe.
+    *
+    * For user-provided GS the assumption above does not hold since the GS
+    * outputs can be different from the VS outputs.
     */
    unsigned gs_size = vs_size;
+   if (brw->geometry_program) {
+      gs_size = brw->gs.prog_data->base.urb_entry_size;
+      assert(gs_size >= 1);
+   }

    /* Calculate how many entries fit in each stage's section of the URB */
-   if (brw->ff_gs.prog_active) {
+   if (gs_present) {
       nr_vs_entries = (total_urb_size/2) / (vs_size * 128);
       nr_gs_entries = (total_urb_size/2) / (gs_size * 128);
    } else {
@@ -109,16 +119,16 @@ gen6_upload_urb( struct brw_context *brw )
     * doesn't exist on Gen6).  So for now we just do a full pipeline flush as
     * a workaround.
     */
-   if (brw->urb.gen6_gs_previously_active && !brw->ff_gs.prog_active)
+   if (brw->urb.gen6_gs_previously_active && !gs_present)
       intel_batchbuffer_emit_mi_flush(brw);
-   brw->urb.gen6_gs_previously_active = brw->ff_gs.prog_active;
+   brw->urb.gen6_gs_previously_active = gs_present;
 }

 const struct brw_tracked_state gen6_urb = {
    .dirty = {
       .mesa = 0,
-      .brw = BRW_NEW_CONTEXT,
-      .cache = (CACHE_NEW_VS_PROG | CACHE_NEW_FF_GS_PROG),
+      .brw = (BRW_NEW_CONTEXT | BRW_NEW_GEOMETRY_PROGRAM),
+      .cache = (CACHE_NEW_VS_PROG | CACHE_NEW_GS_PROG | CACHE_NEW_FF_GS_PROG),
    },
    .emit = gen6_upload_urb,
 };
--
1.9.1
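The entry-count arithmetic that this patch touches can be checked in isolation. The sketch below is not the driver code: `struct urb_split` and `split_urb()` are hypothetical names, the no-GS branch is an assumption because the hunk is cut off at `} else {`, and any later clamping to hardware limits is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type for illustration. */
struct urb_split {
   int nr_vs_entries;
   int nr_gs_entries;
};

/* urb_size_kb: URB size in KB (brw->urb.size in the patch).
 * vs_size, gs_size: per-entry sizes; the patch multiplies them by 128 to
 * get bytes.
 */
static struct urb_split
split_urb(int urb_size_kb, unsigned vs_size, unsigned gs_size, bool gs_present)
{
   struct urb_split s;
   int total_urb_size = urb_size_kb * 1024; /* in bytes */

   assert(vs_size >= 1 && gs_size >= 1);

   if (gs_present) {
      /* Each stage gets half of the URB, sized with its own entry size. */
      s.nr_vs_entries = (total_urb_size / 2) / (int)(vs_size * 128);
      s.nr_gs_entries = (total_urb_size / 2) / (int)(gs_size * 128);
   } else {
      /* Assumed: without a GS the VS gets the whole URB. */
      s.nr_vs_entries = total_urb_size / (int)(vs_size * 128);
      s.nr_gs_entries = 0;
   }
   return s;
}
```

With a 64 KB URB, `vs_size = 1` and a user-provided GS whose entries are twice as large (`gs_size = 2`), the VS half holds 256 entries and the GS half only 128. Reusing `vs_size` for the GS would have reported 256 entries for the GS half, more than actually fit, which is the situation the new `gs_size` assignment avoids.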
[Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/intel_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index ea0fc58..83101a5 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1273,7 +1273,7 @@ set_max_gl_versions(struct intel_screen *screen)
       psp->max_gl_es2_version = 30;
       break;
    case 6:
-      psp->max_gl_core_version = 31;
+      psp->max_gl_core_version = 32;
       psp->max_gl_compat_version = 30;
       psp->max_gl_es1_version = 11;
       psp->max_gl_es2_version = 30;
--
1.9.1
[Mesa-dev] [PATCH 25/37] i965/gen6/gs: implement GS_OPCODE_SVB_SET_DST_INDEX opcode
From: Samuel Iglesias Gonsalvez sigles...@igalia.com This opcode generates code to copy the specified destination index into subregister 5 of the MRF message header. Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com --- src/mesa/drivers/dri/i965/brw_defines.h | 9 + src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_vec4.h | 4 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 20 4 files changed, 35 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 83011d6..7095c39 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1052,6 +1052,15 @@ enum opcode { * - src1 is the destination register when write commit occurs. */ GS_OPCODE_SVB_WRITE, + + /** +* Set destination index in the SVB write message payload (M0.5). Used +* in gen6 for transform feedback. +* +* - dst is the header to save the destination indices for SVB WRITE. +* - src is the register that holds the destination indices value. +*/ + GS_OPCODE_SVB_SET_DST_INDEX, }; enum brw_urb_write_flags { diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 8698b75..bf625a5 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -538,6 +538,8 @@ brw_instruction_name(enum opcode op) return set_primitive_id; case GS_OPCODE_SVB_WRITE: return gs_svb_write; + case GS_OPCODE_SVB_SET_DST_INDEX: + return gs_svb_set_dst_index; default: /* Yes, this leaks. 
It's in debug code, it should never occur, and if diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index e8456ce..ea3967d 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -222,6 +222,7 @@ public: unsigned sol_binding; /** gen6: SOL binding table index */ bool sol_final_write; /** gen6: send commit message */ + unsigned sol_vertex; /** gen6: used for setting dst index in SVB header */ bool is_send_from_grf(); bool can_reswizzle_dst(int dst_writemask, int swizzle, int swizzle_mask); @@ -664,6 +665,9 @@ private: struct brw_reg dst, struct brw_reg src0, struct brw_reg src1); + void generate_gs_svb_set_destination_index(vec4_instruction *inst, + struct brw_reg dst, + struct brw_reg src); void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src); void generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src); void generate_gs_prepare_channel_masks(struct brw_reg dst); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index 1728790..d914a52 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -574,6 +574,22 @@ vec4_generator::generate_gs_svb_write(vec4_instruction *inst, } void +vec4_generator::generate_gs_svb_set_destination_index(vec4_instruction *inst, + struct brw_reg dst, + struct brw_reg src) +{ + + int vertex = inst-sol_vertex; + brw_push_insn_state(p); + brw_set_default_access_mode(p, BRW_ALIGN_1); + brw_set_default_mask_control(p, BRW_MASK_DISABLE); + brw_MOV(p, get_element_ud(dst, 5), + get_element_ud(src, vertex)); + brw_set_default_access_mode(p, BRW_ALIGN_16); + brw_pop_insn_state(p); +} + +void vec4_generator::generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src) { @@ -1313,6 +1329,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction, case GS_OPCODE_SVB_WRITE: 
generate_gs_svb_write(inst, dst, src[0], src[1]); + case GS_OPCODE_SVB_SET_DST_INDEX: + generate_gs_svb_set_destination_index(inst, dst, src[0]); + break; + case GS_OPCODE_SET_DWORD_2_IMMED: generate_gs_set_dword_2_immed(dst, src[0]); break; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 22/37] i965/gen6/gs: Assign geometry shader VUE map properly.
So far in gen6 we only used geometry shaders to implement transform feedback in vertex shaders, so we assumed that the VUE map for the geometry shader stage was always the same as for the vertex shader stage. This is no longer true now that we support user provided geometry shaders in gen6 too.
---
 src/mesa/drivers/dri/i965/brw_vec4_gs.c | 12 ++++++------
 src/mesa/drivers/dri/i965/brw_vs.c      |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index a445174..f735cf3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -296,18 +296,18 @@ brw_upload_gs_prog(struct brw_context *brw)
       (struct brw_geometry_program *) brw->geometry_program;

    if (gp == NULL) {
-      if (brw->gen == 6) {
-         if (brw->state.dirty.brw & BRW_NEW_TRANSFORM_FEEDBACK)
-            gen6_brw_upload_ff_gs_prog(brw);
-         return;
-      }
-
       /* No geometry shader.  Vertex data just passes straight through. */
       if (brw->state.dirty.brw & BRW_NEW_VUE_MAP_VS) {
          brw->vue_map_geom_out = brw->vue_map_vs;
          brw->state.dirty.brw |= BRW_NEW_VUE_MAP_GEOM_OUT;
       }

+      if (brw->gen == 6 &&
+          (brw->state.dirty.brw & BRW_NEW_TRANSFORM_FEEDBACK)) {
+         gen6_brw_upload_ff_gs_prog(brw);
+         return;
+      }
+
       /* Other state atoms had better not try to access prog_data, since
        * there's no GS program.
        */
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index 19b1d3b..3ea7681 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -495,7 +495,7 @@ static void brw_upload_vs_prog(struct brw_context *brw)
               sizeof(brw->vue_map_geom_out)) != 0) {
       brw->vue_map_vs = brw->vs.prog_data->base.vue_map;
       brw->state.dirty.brw |= BRW_NEW_VUE_MAP_VS;
-      if (brw->gen < 7) {
+      if (brw->gen < 6) {
          /* No geometry shader support, so the VS VUE map is the VUE map for
           * the output of the geometry portion of the pipeline.
           */
--
1.9.1
[Mesa-dev] [PATCH 15/37] i965: Generalize emit_urb_slot() to emit to any dst_reg.
In gen7+ we emit vertices as they come, however in gen6 geometry shaders we have to buffer vertex data for all vertices and then emit it all in one go at the end. To achieve this we need to generalize emit_urb_slot() to store vertex data in general purpose registers and not only MRF registers. --- src/mesa/drivers/dri/i965/brw_vec4.h | 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 30 +++--- 2 files changed, 20 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index d95b58d..ad3a77f 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -532,10 +532,10 @@ public: void swizzle_result(ir_texture *ir, src_reg orig_val, uint32_t sampler); void emit_ndc_computation(); - void emit_psiz_and_flags(struct brw_reg reg); + void emit_psiz_and_flags(dst_reg reg); void emit_clip_distances(dst_reg reg, int offset); void emit_generic_urb_slot(dst_reg reg, int varying); - void emit_urb_slot(int mrf, int varying); + void emit_urb_slot(dst_reg reg, int varying); void emit_shader_time_begin(); void emit_shader_time_end(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index e1fbcbc..d6ace29 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -2806,7 +2806,7 @@ vec4_visitor::emit_ndc_computation() } void -vec4_visitor::emit_psiz_and_flags(struct brw_reg reg) +vec4_visitor::emit_psiz_and_flags(dst_reg reg) { if (brw-gen 6 ((prog_data-vue_map.slots_valid VARYING_BIT_PSIZ) || @@ -2866,16 +2866,21 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg reg) } else { emit(MOV(retype(reg, BRW_REGISTER_TYPE_D), src_reg(0))); if (prog_data-vue_map.slots_valid VARYING_BIT_PSIZ) { - emit(MOV(brw_writemask(reg, WRITEMASK_W), - src_reg(output_reg[VARYING_SLOT_PSIZ]))); + dst_reg reg_w = reg; + reg_w.writemask = WRITEMASK_W; + emit(MOV(reg_w, 
src_reg(output_reg[VARYING_SLOT_PSIZ]))); } if (prog_data-vue_map.slots_valid VARYING_BIT_LAYER) { - emit(MOV(retype(brw_writemask(reg, WRITEMASK_Y), BRW_REGISTER_TYPE_D), - src_reg(output_reg[VARYING_SLOT_LAYER]))); + dst_reg reg_y = reg; + reg_y.writemask = WRITEMASK_Y; + reg_y.type = BRW_REGISTER_TYPE_D; + emit(MOV(reg_y, src_reg(output_reg[VARYING_SLOT_LAYER]))); } if (prog_data-vue_map.slots_valid VARYING_BIT_VIEWPORT) { - emit(MOV(retype(brw_writemask(reg, WRITEMASK_Z), BRW_REGISTER_TYPE_D), - src_reg(output_reg[VARYING_SLOT_VIEWPORT]))); + dst_reg reg_z = reg; + reg_z.writemask = WRITEMASK_Z; + reg_z.type = BRW_REGISTER_TYPE_D; + emit(MOV(reg_z, src_reg(output_reg[VARYING_SLOT_VIEWPORT]))); } } } @@ -2928,18 +2933,18 @@ vec4_visitor::emit_generic_urb_slot(dst_reg reg, int varying) } void -vec4_visitor::emit_urb_slot(int mrf, int varying) +vec4_visitor::emit_urb_slot(dst_reg reg, int varying) { - struct brw_reg hw_reg = brw_message_reg(mrf); - dst_reg reg = dst_reg(MRF, mrf); reg.type = BRW_REGISTER_TYPE_F; switch (varying) { case VARYING_SLOT_PSIZ: + { /* PSIZ is always in slot 0, and is coupled with other flags. */ current_annotation = indices, point width, clip flags; - emit_psiz_and_flags(hw_reg); + emit_psiz_and_flags(reg); break; + } case BRW_VARYING_SLOT_NDC: current_annotation = NDC; emit(MOV(reg, src_reg(output_reg[BRW_VARYING_SLOT_NDC]))); @@ -3047,7 +3052,8 @@ vec4_visitor::emit_vertex() mrf = base_mrf + 1; for (; slot prog_data-vue_map.num_slots; ++slot) { - emit_urb_slot(mrf++, prog_data-vue_map.slot_to_varying[slot]); + emit_urb_slot(dst_reg(MRF, mrf++), + prog_data-vue_map.slot_to_varying[slot]); /* If this was max_usable_mrf, we can't fit anything more into this * URB WRITE. -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 20/37] i965/gen6/gs: Implement GS_OPCODE_SET_PRIMITIVE_ID.
In gen6 the geometry shader payload includes the PrimitiveID information in r0.1. When the shader code uses gl_PrimitiveIDIn we will have to move this to a separate hardware register where we can map this attribute. This opcode takes the selected destination register and moves r0.1 there.
---
 src/mesa/drivers/dri/i965/brw_defines.h          |  8 ++++++++
 src/mesa/drivers/dri/i965/brw_shader.cpp         |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h             |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 17 +++++++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index f6bdaeb..b30a095 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1032,6 +1032,14 @@ enum opcode {
     * - src1 is the number of primitives written.
     */
    GS_OPCODE_FF_SYNC,
+
+   /**
+    * Move r0.1 (which holds PrimitiveID information in gen6) to a separate
+    * register.
+    *
+    * - dst is the GRF where PrimitiveID information will be moved.
+    */
+   GS_OPCODE_SET_PRIMITIVE_ID,
 };

 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index b927601..fc3146c 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -534,6 +534,8 @@ brw_instruction_name(enum opcode op)
       return "get_instance_id";
    case GS_OPCODE_FF_SYNC:
       return "ff_sync";
+   case GS_OPCODE_SET_PRIMITIVE_ID:
+      return "set_primitive_id";

    default:
       /* Yes, this leaks.  It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index ad3a77f..6e0da6d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -665,6 +665,7 @@ private:
    void generate_gs_ff_sync(struct brw_reg dst,
                             struct brw_reg src0,
                             struct brw_reg src1);
+   void generate_gs_set_primitive_id(struct brw_reg dst);
    void generate_oword_dual_block_offsets(struct brw_reg m1,
                                           struct brw_reg index);
    void generate_scratch_write(vec4_instruction *inst,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 2bf2b67..8293f60 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -694,6 +694,19 @@ vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
 }

 void
+vec4_generator::generate_gs_set_primitive_id(struct brw_reg dst)
+{
+   /* In gen6, PrimitiveID is delivered in R0.1 of the payload */
+   struct brw_reg src = brw_vec8_grf(0, 0);
+   brw_push_insn_state(p);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src, 1));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_oword_dual_block_offsets(struct brw_reg m1,
                                                   struct brw_reg index)
 {
@@ -1283,6 +1296,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
       generate_gs_ff_sync(dst, src[0], src[1]);
       break;

+   case GS_OPCODE_SET_PRIMITIVE_ID:
+      generate_gs_set_primitive_id(dst);
+      break;
+
    case SHADER_OPCODE_SHADER_TIME_ADD:
       brw_shader_time_add(p, src[0],
                           prog_data->base.binding_table.shader_time_start);
--
1.9.1
[Mesa-dev] [PATCH 31/37] i965/gen6/gs: Avoid buffering transform feedback varyings twice.
Currently we buffer transform feedback varyings separately. This patch
makes it so that we reuse the values we have already buffered for all the
output varyings of the geometry shader instead.
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 181 --
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |   8 +-
 2 files changed, 83 insertions(+), 106 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index fca7536..8b7b8fd 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -98,30 +98,6 @@ gen6_gs_visitor::emit_prolog()
    emit(MOV(dst_reg(this->prim_count), 0u));
 
    if (c->prog_data.gen6_xfb_enabled) {
-      const struct gl_transform_feedback_info *linked_xfb_info =
-         &this->shader_prog->LinkedTransformFeedback;
-
-      /* Gen6 geometry shaders are required to ask for Streamed Vertex Buffer
-       * Indices values via FF_SYNC message, when Transform Feedback is
-       * enabled.
-       *
-       * To achieve this we buffer the Transform feedback outputs for each
-       * emitted vertex in xfb_output during operation. Then, when we have
-       * processed the last vertex (that is, at thread end time), we know all
-       * the required data for the FF_SYNC message header in order to receive
-       * the SVBI in the writeback.
-       *
-       * For each emitted vertex, xfb_output will hold
-       * num_transform_feedback_bindings data items plus one, which will
-       * indicate the end of the primitive. Next vertex's data comes right
-       * after.
-       */
-      this->xfb_output = src_reg(this,
-                                 glsl_type::uint_type,
-                                 linked_xfb_info->NumOutputs *
-                                 c->gp->program.VerticesOut);
-      this->xfb_output_offset = src_reg(this, glsl_type::uint_type);
-      emit(MOV(dst_reg(this->xfb_output_offset), src_reg(0u)));
       /* Create a virtual register to hold destination indices in SOL */
       this->destination_indices = src_reg(this, glsl_type::uvec4_type);
       /* Create a virtual register to hold temporal values in SOL */
@@ -134,6 +110,8 @@ gen6_gs_visitor::emit_prolog()
       this->max_svbi = src_reg(this, glsl_type::uvec4_type);
       emit(MOV(dst_reg(this->max_svbi),
                src_reg(retype(brw_vec1_grf(1, 4), BRW_REGISTER_TYPE_UD))));
+
+      xfb_setup();
    }
 
    /* PrimitveID is delivered in r0.1 of the thread payload. If the program
@@ -173,9 +151,6 @@ gen6_gs_visitor::visit(ir_emit_vertex *)
             BRW_CONDITIONAL_L));
    emit(IF(BRW_PREDICATE_NORMAL));
    {
-      if (c->prog_data.gen6_xfb_enabled)
-         xfb_buffer_output();
-
       /* Buffer all output slots for this vertex in vertex_output */
       for (int slot = 0; slot < prog_data->vue_map.num_slots; ++slot) {
          int varying = prog_data->vue_map.slot_to_varying[slot];
@@ -557,7 +532,7 @@ gen6_gs_visitor::setup_payload()
 }
 
 void
-gen6_gs_visitor::xfb_buffer_output()
+gen6_gs_visitor::xfb_setup()
 {
    static const unsigned swizzle_for_offset[4] = {
       BRW_SWIZZLE4(0, 1, 2, 3),
@@ -569,48 +544,27 @@ gen6_gs_visitor::xfb_buffer_output()
    struct brw_gs_prog_data *prog_data =
       (struct brw_gs_prog_data *) &c->prog_data;
 
-   if (!prog_data->num_transform_feedback_bindings) {
-      const struct gl_transform_feedback_info *linked_xfb_info =
-         &this->shader_prog->LinkedTransformFeedback;
-      int i;
-
-      /* Make sure that the VUE slots won't overflow the unsigned chars in
-       * prog_data->transform_feedback_bindings[].
-       */
-      STATIC_ASSERT(BRW_VARYING_SLOT_COUNT <= 256);
-
-      /* Make sure that we don't need more binding table entries than we've
-       * set aside for use in transform feedback.  (We shouldn't, since we
-       * set aside enough binding table entries to have one per component).
-       */
-      assert(linked_xfb_info->NumOutputs <= BRW_MAX_SOL_BINDINGS);
-
-      prog_data->num_transform_feedback_bindings = linked_xfb_info->NumOutputs;
-      for (i = 0; i < prog_data->num_transform_feedback_bindings; i++) {
-         prog_data->transform_feedback_bindings[i] =
-            linked_xfb_info->Outputs[i].OutputRegister;
-         prog_data->transform_feedback_swizzles[i] =
-            swizzle_for_offset[linked_xfb_info->Outputs[i].ComponentOffset];
-      }
-   }
-
-   /* Buffer all TF outputs for this vertex in xfb_output */
-   for (int binding = 0; binding < prog_data->num_transform_feedback_bindings;
-        binding++) {
-      unsigned varying =
-         prog_data->transform_feedback_bindings[binding];
-      dst_reg dst(this->xfb_output);
-      dst.reladdr = ralloc(mem_ctx, src_reg);
-      memcpy(dst.reladdr, &this->xfb_output_offset, sizeof(src_reg));
-      dst.type = output_reg[varying].type;
+   const struct gl_transform_feedback_info *linked_xfb_info =
+      &this->shader_prog->LinkedTransformFeedback;
[Mesa-dev] [PATCH 28/37] i965/gen6/gs: implement transform feedback support in gen6_gs_visitor
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This takes care of generating code required to handle transform feedback.
Notice that transform feedback isn't enabled yet, since that requires
additional setups in other parts of the code that will come in later
patches.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_context.h       | 113 ++
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 309 +-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  14 ++
 3 files changed, 391 insertions(+), 45 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 7439da1..3418b76 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -553,48 +553,6 @@ struct brw_vs_prog_data {
    bool uses_vertexid;
 };
 
-
-/* Note: brw_gs_prog_data_compare() must be updated when adding fields to
- * this struct!
- */
-struct brw_gs_prog_data
-{
-   struct brw_vec4_prog_data base;
-
-   /**
-    * Size of an output vertex, measured in HWORDS (32 bytes).
-    */
-   unsigned output_vertex_size_hwords;
-
-   unsigned output_topology;
-
-   /**
-    * Size of the control data (cut bits or StreamID bits), in hwords (32
-    * bytes).  0 if there is no control data.
-    */
-   unsigned control_data_header_size_hwords;
-
-   /**
-    * Format of the control data (either GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID
-    * if the control data is StreamID bits, or
-    * GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT if the control data is cut bits).
-    * Ignored if control_data_header_size is 0.
-    */
-   unsigned control_data_format;
-
-   bool include_primitive_id;
-
-   int invocations;
-
-   /**
-    * Dispatch mode, can be any of:
-    * GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
-    * GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
-    * GEN7_GS_DISPATCH_MODE_SINGLE
-    */
-   int dispatch_mode;
-};
-
 /** Number of texture sampler units */
 #define BRW_MAX_TEX_UNIT 32
 
@@ -641,6 +599,77 @@ struct brw_gs_prog_data
 #define SURF_INDEX_GEN6_SOL_BINDING(t) (t)
 #define BRW_MAX_GEN6_GS_SURFACES SURF_INDEX_GEN6_SOL_BINDING(BRW_MAX_SOL_BINDINGS)
 
+/* Note: brw_gs_prog_data_compare() must be updated when adding fields to
+ * this struct!
+ */
+struct brw_gs_prog_data
+{
+   struct brw_vec4_prog_data base;
+
+   /**
+    * Size of an output vertex, measured in HWORDS (32 bytes).
+    */
+   unsigned output_vertex_size_hwords;
+
+   unsigned output_topology;
+
+   /**
+    * Size of the control data (cut bits or StreamID bits), in hwords (32
+    * bytes).  0 if there is no control data.
+    */
+   unsigned control_data_header_size_hwords;
+
+   /**
+    * Format of the control data (either GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID
+    * if the control data is StreamID bits, or
+    * GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT if the control data is cut bits).
+    * Ignored if control_data_header_size is 0.
+    */
+   unsigned control_data_format;
+
+   bool include_primitive_id;
+
+   int invocations;
+
+   /**
+    * Dispatch mode, can be any of:
+    * GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
+    * GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
+    * GEN7_GS_DISPATCH_MODE_SINGLE
+    */
+   int dispatch_mode;
+
+   /**
+    * Gen6 transform feedback enabled flag.
+    */
+   bool gen6_xfb_enabled;
+
+   /**
+    * Gen6: Provoking vertex convention for odd-numbered triangles
+    * in tristrips.
+    */
+   GLuint pv_first:1;
+
+   /**
+    * Gen6: Number of varyings that are output to transform feedback.
+    */
+   GLuint num_transform_feedback_bindings:7; /* 0-BRW_MAX_SOL_BINDINGS */
+
+   /**
+    * Gen6: Map from the index of a transform feedback binding table entry to the
+    * gl_varying_slot that should be streamed out through that binding table
+    * entry.
+    */
+   unsigned char transform_feedback_bindings[BRW_MAX_SOL_BINDINGS];
+
+   /**
+    * Gen6: Map from the index of a transform feedback binding table entry to the
+    * swizzles that should be used when streaming out data through that
+    * binding table entry.
+    */
+   unsigned char transform_feedback_swizzles[BRW_MAX_SOL_BINDINGS];
+};
+
 /**
  * Stride in bytes between shader_time entries.
  *
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index c1cfe75..b8eaa58 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -97,6 +97,45 @@ gen6_gs_visitor::emit_prolog()
    this->prim_count = src_reg(this, glsl_type::uint_type);
    emit(MOV(dst_reg(this->prim_count), 0u));
 
+   if (c->prog_data.gen6_xfb_enabled) {
+      const struct gl_transform_feedback_info *linked_xfb_info =
+         &this->shader_prog->LinkedTransformFeedback;
+
+      /* Gen6 geometry shaders are required to ask for Streamed Vertex Buffer
+       * Indices values via FF_SYNC message, when Transform Feedback is
+       * enabled.
+
[Mesa-dev] [PATCH 16/37] i965/gen6/gs: Add initial implementation for a gen6 geometry shader visitor.
Geometry shaders in gen6 are significantly different from gen7+, so it is
better to have them implemented in a different file rather than adding gen6
branching paths all over brw_vec4_gs_visitor.cpp.

This commit adds an initial implementation that only handles point output,
which is the simplest case.
---
 src/mesa/drivers/dri/i965/Makefile.sources      |   1 +
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h |   2 +-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp   | 345
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h     |  67 +
 4 files changed, 414 insertions(+), 1 deletion(-)
 create mode 100644 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
 create mode 100644 src/mesa/drivers/dri/i965/gen6_gs_visitor.h

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index 3fb647b..deada5f 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -121,6 +121,7 @@ i965_FILES = \
 	gen6_clip_state.c \
 	gen6_depthstencil.c \
 	gen6_gs_state.c \
+	gen6_gs_visitor.cpp \
 	gen6_multisample_state.c \
 	gen6_queryobj.c \
 	gen6_sampler_state.c \
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
index 0be7559..8bf11fa 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
@@ -95,7 +95,7 @@ protected:
    virtual void visit(ir_emit_vertex *);
    virtual void visit(ir_end_primitive *);
 
-private:
+protected:
    int setup_varying_inputs(int payload_reg, int *attribute_map,
                             int attributes_per_reg);
    void emit_control_data_bits();
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
new file mode 100644
index 000..b78c55e
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -0,0 +1,345 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * This code is based on original work by Ilia Mirkin.
+ */
+
+/**
+ * \file gen6_gs_visitor.cpp
+ *
+ * Gen6 geometry shader implementation
+ */
+
+#include "gen6_gs_visitor.h"
+
+namespace brw {
+
+void
+gen6_gs_visitor::emit_prolog()
+{
+   vec4_gs_visitor::emit_prolog();
+
+   /* Gen6 geometry shaders require to allocate an initial VUE handle via
+    * FF_SYNC message, however the documentation remarks that only one thread
+    * can write to the URB simultaneously and the FF_SYNC message provides the
+    * synchronization mechanism for this, so using this message effectively
+    * stalls the thread until it is its turn to write to the URB. Because of
+    * this, the best way to implement geometry shader algorithms in gen6 is to
+    * execute the algorithm before the FF_SYNC message to maximize parallelism.
+    *
+    * To achieve this we buffer the geometry shader outputs for each emitted
+    * vertex in vertex_output during operation. Then, when we have processed
+    * the last vertex (that is, at thread end time), we send the FF_SYNC
+    * message to allocate the initial VUE handle and write all buffered vertex
+    * data to the URB in one go.
+    *
+    * For each emitted vertex, vertex_output will hold vue_map.num_slots
+    * data items plus one additional item to hold required flags
+    * (PrimType, PrimStart, PrimEnd, as expected by the URB_WRITE message)
+    * which come right after the data items for that vertex. Vertex data and
+    * flags for the next vertex come right after the data items and flags for
+    * the previous vertex.
+    */
+   this->current_annotation = "gen6 prolog";
+   this->vertex_output = src_reg(this,
+                                 glsl_type::uint_type,
+                                 (prog_data->vue_map.num_slots + 1) *
+
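
The vertex_output layout described in the emit_prolog() comment above can be sketched as plain index arithmetic. This is only an illustration under assumptions: the struct name, its methods, and the max_vertices parameter are hypothetical and do not appear in the patch, and the second factor of the allocation size is cut off in the quoted diff, so the total below is a guess at its intent.

```cpp
#include <cassert>

// Hypothetical model of the buffering layout from the prolog comment:
// each emitted vertex occupies num_slots data items followed by one
// flags item (PrimType, PrimStart, PrimEnd).
struct vertex_output_layout {
   int num_slots;   // stands in for vue_map.num_slots

   // Items reserved per vertex: the data slots plus the flags item.
   int stride() const { return num_slots + 1; }

   // Index of data slot `slot` for the given vertex.
   int slot_index(int vertex, int slot) const
   {
      assert(slot >= 0 && slot < num_slots);
      return vertex * stride() + slot;
   }

   // Index of the flags item, right after that vertex's data items.
   int flags_index(int vertex) const
   {
      return vertex * stride() + num_slots;
   }

   // Assumed total buffer size for max_vertices emitted vertices.
   int total_items(int max_vertices) const
   {
      return stride() * max_vertices;
   }
};
```

With num_slots = 4, vertex 0 uses items 0..3 for data and item 4 for flags, and vertex 1 starts at item 5.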
[Mesa-dev] [PATCH 08/37] i965/gen6/gs: Implement GS_OPCODE_URB_WRITE_ALLOCATE.
Gen6 geometry shaders need to allocate URB handles for each new vertex they
emit after the first (the URB handle for the first vertex is obtained via
the FF_SYNC message).

This opcode adds the URB allocation mechanism to regular URB writes.
---
 src/mesa/drivers/dri/i965/brw_defines.h          |  8 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp         |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.cpp           |  1 +
 src/mesa/drivers/dri/i965/brw_vec4.h             |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 30
 5 files changed, 42 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 125d728..60b3846 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -929,6 +929,14 @@ enum opcode {
    GS_OPCODE_URB_WRITE,
 
    /**
+    * Write geometry shader output data to the URB and request a new URB
+    * handle (gen6).
+    *
+    * This opcode doesn't do an implied move from R0 to the first MRF.
+    */
+   GS_OPCODE_URB_WRITE_ALLOCATE,
+
+   /**
     * Terminate the geometry shader thread by doing an empty URB write.
     *
     * This opcode doesn't do an implied move from R0 to the first MRF.  This
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 5749061..69d16a7 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -514,6 +514,8 @@ brw_instruction_name(enum opcode op)
 
    case GS_OPCODE_URB_WRITE:
       return "gs_urb_write";
+   case GS_OPCODE_URB_WRITE_ALLOCATE:
+      return "gs_urb_write_allocate";
    case GS_OPCODE_THREAD_END:
       return "gs_thread_end";
    case GS_OPCODE_SET_WRITE_OFFSET:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b572b61..e413a05 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -274,6 +274,7 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
    case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
       return 3;
    case GS_OPCODE_URB_WRITE:
+   case GS_OPCODE_URB_WRITE_ALLOCATE:
    case GS_OPCODE_THREAD_END:
       return 0;
    case SHADER_OPCODE_SHADER_TIME_ADD:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 72fabdd..c1daf54 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -649,6 +649,7 @@ private:
 
    void generate_vs_urb_write(vec4_instruction *inst);
    void generate_gs_urb_write(vec4_instruction *inst);
+   void generate_gs_urb_write_allocate(vec4_instruction *inst);
    void generate_gs_thread_end(vec4_instruction *inst);
    void generate_gs_set_write_offset(struct brw_reg dst,
                                      struct brw_reg src0,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 05f4892..8ef0c34 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -429,6 +429,32 @@ vec4_generator::generate_gs_urb_write(vec4_instruction *inst)
 }
 
 void
+vec4_generator::generate_gs_urb_write_allocate(vec4_instruction *inst)
+{
+   struct brw_reg src = brw_message_reg(inst->base_mrf);
+
+   /* We pass the temporary passed in src0 as the writeback register */
+   brw_urb_WRITE(p,
+                 inst->get_src(this->prog_data, 0), /* dest */
+                 inst->base_mrf, /* starting mrf reg nr */
+                 src,
+                 BRW_URB_WRITE_ALLOCATE_COMPLETE,
+                 inst->mlen,
+                 1, /* response len */
+                 inst->offset, /* urb destination offset */
+                 BRW_URB_SWIZZLE_INTERLEAVE);
+
+   /* Now put allocated urb handle in dst.0 */
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, get_element_ud(inst->get_dst(), 0),
+           get_element_ud(inst->get_src(this->prog_data, 0), 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_thread_end(vec4_instruction *inst)
 {
    struct brw_reg src = brw_message_reg(inst->base_mrf);
@@ -1206,6 +1232,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
       generate_gs_urb_write(inst);
       break;
 
+   case GS_OPCODE_URB_WRITE_ALLOCATE:
+      generate_gs_urb_write_allocate(inst);
+      break;
+
    case GS_OPCODE_THREAD_END:
       generate_gs_thread_end(inst);
       break;
-- 
1.9.1
[Mesa-dev] [PATCH 29/37] i965/gen6/gs: Setup SOL surfaces for user-provided geometry shaders
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Update gen6_gs_binding_table and gen6_sol_surface to use user-provided
geometry program information when present. This is necessary to implement
transform feedback support.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_context.h |   2 +-
 src/mesa/drivers/dri/i965/gen6_sol.c    | 119 ++--
 2 files changed, 82 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 3418b76..82f32af 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -914,7 +914,7 @@ struct brw_stage_state
    uint32_t push_const_offset; /* Offset in the batchbuffer */
    int push_const_size; /* in 256-bit register increments */
 
-   /* Binding table: pointers to SURFACE_STATE entries. */
+   /** Binding table: pointers to SURFACE_STATE entries. */
    uint32_t bind_bo_offset;
    uint32_t surf_offset[BRW_MAX_SURFACES];
 
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c
index e1c1b3c..d21a010 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -41,13 +41,21 @@ gen6_update_sol_surfaces(struct brw_context *brw)
    /* BRW_NEW_TRANSFORM_FEEDBACK */
    struct gl_transform_feedback_object *xfb_obj =
       ctx->TransformFeedback.CurrentObject;
-   /* BRW_NEW_VERTEX_PROGRAM */
-   const struct gl_shader_program *shaderprog =
-      ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
-   const struct gl_transform_feedback_info *linked_xfb_info =
-      &shaderprog->LinkedTransformFeedback;
+   const struct gl_shader_program *shaderprog;
+   const struct gl_transform_feedback_info *linked_xfb_info;
    int i;
 
+   if (brw->geometry_program) {
+      /* BRW_NEW_GEOMETRY_PROGRAM */
+      shaderprog =
+         ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY];
+   } else {
+      /* BRW_NEW_VERTEX_PROGRAM */
+      shaderprog =
+         ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
+   }
+   linked_xfb_info = &shaderprog->LinkedTransformFeedback;
+
    for (i = 0; i < BRW_MAX_SOL_BINDINGS; ++i) {
       const int surf_index = SURF_INDEX_GEN6_SOL_BINDING(i);
       if (_mesa_is_xfb_active_and_unpaused(ctx) &&
@@ -56,12 +64,24 @@ gen6_update_sol_surfaces(struct brw_context *brw)
          unsigned buffer_offset =
             xfb_obj->Offset[buffer] / 4 +
             linked_xfb_info->Outputs[i].DstOffset;
-         brw_update_sol_surface(
-            brw, xfb_obj->Buffers[buffer], &brw->ff_gs.surf_offset[surf_index],
-            linked_xfb_info->Outputs[i].NumComponents,
-            linked_xfb_info->BufferStride[buffer], buffer_offset);
+         if (brw->geometry_program) {
+            brw_update_sol_surface(
+                  brw, xfb_obj->Buffers[buffer],
+                  &brw->gs.base.surf_offset[surf_index],
+                  linked_xfb_info->Outputs[i].NumComponents,
+                  linked_xfb_info->BufferStride[buffer], buffer_offset);
+         } else {
+            brw_update_sol_surface(
+                  brw, xfb_obj->Buffers[buffer],
+                  &brw->ff_gs.surf_offset[surf_index],
+                  linked_xfb_info->Outputs[i].NumComponents,
+                  linked_xfb_info->BufferStride[buffer], buffer_offset);
+         }
       } else {
-         brw->ff_gs.surf_offset[surf_index] = 0;
+         if (!brw->geometry_program)
+            brw->ff_gs.surf_offset[surf_index] = 0;
+         else
+            brw->gs.base.surf_offset[surf_index] = 0;
       }
    }
 
@@ -73,6 +93,7 @@ const struct brw_tracked_state gen6_sol_surface = {
       .mesa = 0,
       .brw = (BRW_NEW_BATCH |
               BRW_NEW_VERTEX_PROGRAM |
+              BRW_NEW_GEOMETRY_PROGRAM |
               BRW_NEW_TRANSFORM_FEEDBACK),
       .cache = 0
    },
@@ -86,38 +107,50 @@ const struct brw_tracked_state gen6_sol_surface = {
 static void
 brw_gs_upload_binding_table(struct brw_context *brw)
 {
-   struct gl_context *ctx = &brw->ctx;
-   /* BRW_NEW_VERTEX_PROGRAM */
-   const struct gl_shader_program *shaderprog =
-      ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
-   bool has_surfaces = false;
    uint32_t *bind;
 
-   if (shaderprog) {
-      const struct gl_transform_feedback_info *linked_xfb_info =
-         &shaderprog->LinkedTransformFeedback;
-      /* Currently we only ever upload surfaces for SOL. */
-      has_surfaces = linked_xfb_info->NumOutputs != 0;
-   }
+   if (!brw->geometry_program) {
+      struct gl_context *ctx = &brw->ctx;
+      /* BRW_NEW_VERTEX_PROGRAM */
+      const struct gl_shader_program *shaderprog =
+         ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
+      bool has_surfaces = false;
+
+      if (shaderprog) {
+         const struct gl_transform_feedback_info *linked_xfb_info =
+            &shaderprog->LinkedTransformFeedback;
+         /* Currently we only ever upload surfaces for SOL. */
+         has_surfaces =
[Mesa-dev] [PATCH 35/37] i965/gen6/gs: Use a specific implementation of geometry shaders for gen6.
In gen6 we will use the geometry shader implementation from
gen6_gs_visitor.cpp and keep the implementation in
brw_vec4_gs_visitor.cpp for gen7+. Notice that gen6_gs_visitor inherits
from brw_vec4_gs_visitor, so it is not a completely separate implementation
of geometry shaders.

Also, gen6 does not support multiple dispatch modes. Its default operation
mode is equivalent to gen7's SINGLE mode, so select that in gen6 for
consistency.
---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 56 ++-
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index c2a4892..d9f658e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -28,6 +28,7 @@
  */
 
 #include "brw_vec4_gs_visitor.h"
+#include "gen6_gs_visitor.h"
 
 const unsigned MAX_GS_INPUT_VERTICES = 6;
 
@@ -634,19 +635,21 @@ brw_gs_emit(struct brw_context *brw,
       brw_dump_ir(brw, "geometry", prog, &shader->base, NULL);
    }
 
-   /* Compile the geometry shader in DUAL_OBJECT dispatch mode, if we can do
-    * so without spilling. If the GS invocations count > 1, then we can't use
-    * dual object mode.
-    */
-   if (c->prog_data.invocations <= 1 &&
-       likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
-      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
-
-      vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
-      if (v.run()) {
-         return generate_assembly(brw, prog, &c->gp->program.Base,
-                                  &c->prog_data.base, mem_ctx, &v.instructions,
-                                  final_assembly_size);
+   if (brw->gen >= 7) {
+      /* Compile the geometry shader in DUAL_OBJECT dispatch mode, if we can do
+       * so without spilling. If the GS invocations count > 1, then we can't use
+       * dual object mode.
+       */
+      if (c->prog_data.invocations <= 1 &&
+          likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
+         c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
+
+         vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
+         if (v.run()) {
+            return generate_assembly(brw, prog, &c->gp->program.Base,
+                                     &c->prog_data.base, mem_ctx,
+                                     &v.instructions, final_assembly_size);
+         }
       }
    }
 
@@ -655,22 +658,33 @@ brw_gs_emit(struct brw_context *brw,
     * back to DUAL_INSTANCED or SINGLE mode, which consumes fewer registers.
     *
     * SINGLE mode is more performant when invocations == 1 and DUAL_INSTANCE
-    * mode is more performant when invocations > 1.
+    * mode is more performant when invocations > 1. Gen6 only supports
+    * SINGLE mode.
     */
-   if (c->prog_data.invocations <= 1)
+   if (c->prog_data.invocations <= 1 || brw->gen < 7)
       c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_SINGLE;
    else
       c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE;
 
-   vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */);
-   if (!v.run()) {
+   vec4_gs_visitor *gs = NULL;
+   const unsigned *ret = NULL;
+
+   if (brw->gen >= 7)
+      gs = new vec4_gs_visitor(brw, c, prog, mem_ctx, false /* no_spills */);
+   else
+      gs = new gen6_gs_visitor(brw, c, prog, mem_ctx, false /* no_spills */);
+
+   if (!gs->run()) {
       prog->LinkStatus = false;
-      ralloc_strcat(&prog->InfoLog, v.fail_msg);
-      return NULL;
+      ralloc_strcat(&prog->InfoLog, gs->fail_msg);
+   } else {
+      ret = generate_assembly(brw, prog, &c->gp->program.Base,
+                              &c->prog_data.base, mem_ctx, &gs->instructions,
+                              final_assembly_size);
    }
 
-   return generate_assembly(brw, prog, &c->gp->program.Base, &c->prog_data.base,
-                            mem_ctx, &v.instructions, final_assembly_size);
+   delete gs;
+   return ret;
 }
 
 
-- 
1.9.1
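
The dispatch-mode selection in this patch can be restated as two small pure functions. This is a sketch for illustration only, not the driver code: the enum and function names are hypothetical, and the debug flag is reduced to a plain bool standing in for the INTEL_DEBUG check.

```cpp
#include <cassert>

// Hypothetical stand-ins for the GEN7_GS_DISPATCH_MODE_* values.
enum gs_dispatch_mode { MODE_DUAL_OBJECT, MODE_DUAL_INSTANCE, MODE_SINGLE };

// DUAL_OBJECT is only attempted on gen7+, with at most one invocation,
// and when the debug override (assumed to be a bool here) is not set.
bool may_try_dual_object(int gen, int invocations, bool no_dual_object_debug)
{
   assert(gen >= 6);
   return gen >= 7 && invocations <= 1 && !no_dual_object_debug;
}

// Mode used when DUAL_OBJECT is skipped or fails without spilling:
// gen6 always gets SINGLE, gen7+ picks by invocation count.
gs_dispatch_mode fallback_dispatch_mode(int gen, int invocations)
{
   assert(gen >= 6);
   if (invocations <= 1 || gen < 7)
      return MODE_SINGLE;
   return MODE_DUAL_INSTANCE;
}
```

Under these assumptions a gen6 shader with several invocations still ends up in SINGLE mode, while the same shader on gen7 would use DUAL_INSTANCE.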
[Mesa-dev] [PATCH 19/37] i965/gen6/gs: Handle the case where a geometry shader emits no output.
In gen6 we need to end the thread differently depending on whether we have
emitted at least one vertex or not. In case we did, the EOT message must
always include the COMPLETE flag or else the GPU hangs. If we have not
produced any output, however, we can't use the COMPLETE flag.

This would lead us to end the program with an ENDIF opcode, which we want
to avoid (and actually is not permitted since it hits an assertion), so
instead we always request a new VUE handle every time we do a URB WRITE,
even for the last vertex we emit. This way, whether we have emitted at
least one vertex or none at all, we always finish the thread without
writing to the URB, which works in both cases by setting the COMPLETE and
UNUSED flags in the EOT message.
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 237 +-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |   3 +-
 2 files changed, 118 insertions(+), 122 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 252e585..4a440eb 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -166,7 +166,7 @@ gen6_gs_visitor::visit(ir_end_primitive *)
 
    /* Otheriwse we know that the last vertex we have processed was the last
     * vertex in the primitive and we need to set its PrimEnd flag, so do this
-    * unless we haven't emitted that vertex at all.
+    * unless we haven't emitted that vertex at all (vertex_count != 0).
     *
     * Notice that we have already incremented vertex_count when we processed
     * the last emit_vertex, so we need to take that into account in the
@@ -176,6 +176,10 @@ gen6_gs_visitor::visit(ir_end_primitive *)
    unsigned num_output_vertices = c->gp->program.VerticesOut;
    emit(CMP(dst_null_d(), this->vertex_count, src_reg(num_output_vertices + 1),
             BRW_CONDITIONAL_L));
+   vec4_instruction *inst = emit(CMP(dst_null_d(),
+                                     this->vertex_count, 0u,
+                                     BRW_CONDITIONAL_NEQ));
+   inst->predicate = BRW_PREDICATE_NORMAL;
    emit(IF(BRW_PREDICATE_NORMAL));
    {
       /* vertex_output_offset is already pointing at the first entry of the
@@ -224,47 +228,40 @@ gen6_gs_visitor::emit_urb_write_header(int mrf)
 }
 
 void
-gen6_gs_visitor::emit_urb_write_opcode(bool complete, src_reg vertex,
-                                       int base_mrf, int mlen, int urb_offset)
+gen6_gs_visitor::emit_urb_write_opcode(bool complete, int base_mrf,
+                                       int last_mrf, int urb_offset)
 {
    vec4_instruction *inst = NULL;
 
-   /* If the vertex is not complete we don't have to do anything special */
    if (!complete) {
+      /* If the vertex is not complete we don't have to do anything special */
       inst = emit(GS_OPCODE_URB_WRITE);
       inst->urb_write_flags = BRW_URB_WRITE_NO_FLAGS;
-      inst->base_mrf = base_mrf;
-      inst->mlen = mlen;
-      inst->offset = urb_offset;
-      return;
-   }
-
-   /* Otherwise, if this is not the last vertex we are going to write,
-    * we have to request a new VUE handle for the next vertex.
-    *
-    * Notice that the vertex parameter has been pre-incremented in
-    * emit_thread_end() to make this comparison easier.
-    */
-   emit(CMP(dst_null_d(), vertex, this->vertex_count, BRW_CONDITIONAL_L));
-   emit(IF(BRW_PREDICATE_NORMAL));
-   {
+   } else {
+      /* Otherwise we always request to allocate a new VUE handle. If this is
+       * the last write before the EOT message and the new handle never gets
+       * used it will be dereferenced when we send the EOT message. This is
+       * necessary to avoid different setups for the EOT message (one for the
+       * case when there is no output and another for the case when there is)
+       * which would require to end the program with an IF/ELSE/ENDIF block,
+       * something we do not want.
+       */
       inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
       inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
-      inst->base_mrf = base_mrf;
-      inst->mlen = mlen;
-      inst->offset = urb_offset;
       inst->dst = dst_reg(MRF, base_mrf);
       inst->src[0] = this->temp;
    }
-   emit(BRW_OPCODE_ELSE);
-   {
-      inst = emit(GS_OPCODE_URB_WRITE);
-      inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
-      inst->base_mrf = base_mrf;
-      inst->mlen = mlen;
-      inst->offset = urb_offset;
-   }
-   emit(BRW_OPCODE_ENDIF);
+
+   inst->base_mrf = base_mrf;
+   /* URB data written (does not include the message header reg) must
+    * be a multiple of 256 bits, or 2 VS registers.  See vol5c.5,
+    * section 5.4.3.2.2: URB_INTERLEAVED.
+    */
+   int mlen = last_mrf - base_mrf;
+   if ((mlen % 2) != 1)
+      mlen++;
+   inst->mlen = mlen;
+   inst->offset = urb_offset;
 }
 
 void
@@ -303,112 +300,112 @@ gen6_gs_visitor::emit_thread_end()
    int max_usable_mrf =
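
The message-length padding in emit_urb_write_opcode() above can be checked in isolation. The helper below is a hypothetical extraction of that arithmetic, not a function from the patch; it assumes, as the code comment says, that mlen counts one header register plus the data registers, so mlen has to be odd for the data part to be a multiple of two registers.

```cpp
#include <cassert>

// Hypothetical helper mirroring the mlen computation in the patch.
// base_mrf is the first message register (the header); last_mrf is one
// past the final register that was filled.
int urb_write_mlen(int base_mrf, int last_mrf)
{
   assert(last_mrf > base_mrf);

   int mlen = last_mrf - base_mrf;   // header register + data registers
   if ((mlen % 2) != 1)              // data part (mlen - 1) must be even
      mlen++;                        // pad with one extra register
   return mlen;
}
```

For example, a header plus three data registers (mlen 4) is padded to 5, while a header plus two data registers (mlen 3) is left unchanged.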
[Mesa-dev] [PATCH 12/37] i965/gen6/gs: Upload binding table for user-provided geometry shaders.
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 30a54ef..709cb9c 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -215,7 +215,10 @@ gen6_upload_binding_table_pointers(struct brw_context *brw)
              GEN6_BINDING_TABLE_MODIFY_PS |
              (4 - 2));
    OUT_BATCH(brw->vs.base.bind_bo_offset); /* vs */
-   OUT_BATCH(brw->ff_gs.bind_bo_offset); /* gs */
+   if (brw->ff_gs.prog_active)
+      OUT_BATCH(brw->ff_gs.bind_bo_offset); /* gs */
+   else
+      OUT_BATCH(brw->gs.base.bind_bo_offset); /* gs */
    OUT_BATCH(brw->wm.base.bind_bo_offset); /* wm/ps */
    ADVANCE_BATCH();
 }
-- 
1.9.1
[Mesa-dev] [PATCH 14/37] i965: Provide means to create registers of a given size.
Implemented by Ilia Mirkin imir...@alum.mit.edu.
---
 src/mesa/drivers/dri/i965/brw_vec4.h           |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 5403f5a..d95b58d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -107,6 +107,7 @@ public:
    bool equals(const src_reg &r) const;
 
    src_reg(class vec4_visitor *v, const struct glsl_type *type);
+   src_reg(class vec4_visitor *v, const struct glsl_type *type, int size);
 
    explicit src_reg(dst_reg reg);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 1b46850..e1fbcbc 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -639,6 +639,20 @@ src_reg::src_reg(class vec4_visitor *v, const struct glsl_type *type)
    this->type = brw_type_for_base_type(type);
 }
 
+src_reg::src_reg(class vec4_visitor *v, const struct glsl_type *type, int size)
+{
+   assert(size > 0);
+
+   init();
+
+   this->file = GRF;
+   this->reg = v->virtual_grf_alloc(type_size(type) * size);
+
+   this->swizzle = BRW_SWIZZLE_NOOP;
+
+   this->type = brw_type_for_base_type(type);
+}
+
 dst_reg::dst_reg(class vec4_visitor *v, const struct glsl_type *type)
 {
    init();
-- 
1.9.1
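
The sizing rule used by the new constructor (type_size(type) * size registers in a single allocation) can be illustrated with a toy allocator. Everything in this sketch is hypothetical: toy_grf_allocator and alloc_array_reg are made-up names that only model the arithmetic, not the real virtual_grf_alloc() bookkeeping.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the visitor's virtual GRF bookkeeping: it only records
// how many registers each allocation asked for.
struct toy_grf_allocator {
   std::vector<int> sizes;

   // Returns the index of the newly allocated virtual register.
   int alloc(int size)
   {
      sizes.push_back(size);
      return static_cast<int>(sizes.size()) - 1;
   }
};

// Mirrors the constructor's sizing: `count` elements, each occupying
// `type_size` registers, allocated as one contiguous virtual register.
int alloc_array_reg(toy_grf_allocator &a, int type_size, int count)
{
   assert(count > 0);   // same precondition as the patch's assert
   return a.alloc(type_size * count);
}
```

A caller that needs room for 10 single-register values would get one virtual register spanning 10 registers, which is the kind of buffer the gen6 visitor later addresses indirectly.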
[Mesa-dev] [PATCH 32/37] i965/gen6/gs: Fix binding table clash between TF surfaces and textures.
For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the
binding table for transform feedback surfaces. However, vec4_visitor will set
up the binding table so that textures use the same space in the binding table.
This is done when calling assign_common_binding_table_offsets(0) as part of its
run() method.

To fix this clash we add a virtual method to the vec4_visitor hierarchy to
assign the binding table offsets, so that we can change this behavior
specifically for gen6 geometry shaders by mapping textures right after the
first BRW_MAX_SOL_BINDINGS entries.

Also, when there is no user-provided geometry shader, we only need to upload
the binding table if we have transform feedback. In the case of a
user-provided geometry shader, however, we can't look only at transform
feedback to make that decision.

This fixes multiple piglit tests for textureSize() and texelFetch() when these
functions are called from a geometry shader in gen6, like these:

bin/textureSize gs sampler2D -fbo -auto
bin/texelFetch gs usampler2D -fbo -auto
---
 src/mesa/drivers/dri/i965/brw_context.h       |  8 ++-
 src/mesa/drivers/dri/i965/brw_vec4.cpp        |  8 ++-
 src/mesa/drivers/dri/i965/brw_vec4.h          |  1 +
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp |  9 +++
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  1 +
 src/mesa/drivers/dri/i965/gen6_sol.c          | 80 ++-
 6 files changed, 78 insertions(+), 29 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 82f32af..aad7033 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -597,7 +597,6 @@ struct brw_vs_prog_data {
                                       2 /* shader time, pull constants */)
 
 #define SURF_INDEX_GEN6_SOL_BINDING(t) (t)
-#define BRW_MAX_GEN6_GS_SURFACES SURF_INDEX_GEN6_SOL_BINDING(BRW_MAX_SOL_BINDINGS)
 
 /* Note: brw_gs_prog_data_compare() must be updated when adding fields to
  * this struct!
@@ -1240,7 +1239,12 @@ struct brw_context
       uint32_t state_offset;
 
       uint32_t bind_bo_offset;
-      uint32_t surf_offset[BRW_MAX_GEN6_GS_SURFACES];
+      /**
+       * Surface offsets for the binding table. We only need surfaces to
+       * implement transform feedback so BRW_MAX_SOL_BINDINGS is all that we
+       * need in this case.
+       */
+      uint32_t surf_offset[BRW_MAX_SOL_BINDINGS];
    } ff_gs;
 
    struct {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e413a05..5307861 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1529,6 +1529,12 @@ vec4_vs_visitor::setup_payload(void)
    this->first_non_payload_grf = reg;
 }
 
+void
+vec4_visitor::assign_binding_table_offsets()
+{
+   assign_common_binding_table_offsets(0);
+}
+
 src_reg
 vec4_visitor::get_timestamp()
 {
@@ -1628,7 +1634,7 @@ vec4_visitor::run()
    if (INTEL_DEBUG & DEBUG_SHADER_TIME)
       emit_shader_time_begin();
 
-   assign_common_binding_table_offsets(0);
+   assign_binding_table_offsets();
 
    emit_prolog();
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 58a5aac..531ec68 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -589,6 +589,7 @@ protected:
    void setup_payload_interference(struct ra_graph *g, int first_payload_node,
                                    int reg_node_count);
    virtual dst_reg *make_reg_for_system_value(ir_variable *ir) = 0;
+   virtual void assign_binding_table_offsets();
    virtual void setup_payload() = 0;
    virtual void emit_prolog() = 0;
    virtual void emit_program_code() = 0;
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 8b7b8fd..8285efb 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -36,6 +36,15 @@ const unsigned MAX_GS_INPUT_VERTICES = 6;
 namespace brw {
 
 void
+gen6_gs_visitor::assign_binding_table_offsets()
+{
+   /* In gen6 we reserve the first BRW_MAX_SOL_BINDINGS entries for transform
+    * feedback surfaces.
+    */
+   assign_common_binding_table_offsets(BRW_MAX_SOL_BINDINGS);
+}
+
+void
 gen6_gs_visitor::emit_prolog()
 {
    vec4_gs_visitor::emit_prolog();
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.h b/src/mesa/drivers/dri/i965/gen6_gs_visitor.h
index db65f81..3a67fe4 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.h
@@ -43,6 +43,7 @@ public:
       vec4_gs_visitor(brw, c, prog, mem_ctx, no_spills) {}
 
 protected:
+   virtual void assign_binding_table_offsets();
    virtual void emit_prolog();
    virtual void emit_thread_end();
    virtual void visit(ir_emit_vertex *);
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c
index
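The approach in this patch (a virtual hook called from run(), overridden for gen6 GS so that textures start after the reserved transform feedback entries) can be sketched in isolation roughly as follows. This is only an illustration: Visitor, Gen6GsVisitor, assign_common and kMaxSolBindings are hypothetical names, and the value 64 is an assumption standing in for whatever the driver defines as BRW_MAX_SOL_BINDINGS.

```cpp
#include <cassert>

// Assumed value for illustration; the driver defines the real
// BRW_MAX_SOL_BINDINGS.
constexpr int kMaxSolBindings = 64;

class Visitor {
public:
    virtual ~Visitor() = default;

    // Mirrors run(): offsets are assigned through the virtual hook.
    void run() { assign_binding_table_offsets(); }

    int texture_start = -1;  // first binding table entry used by textures

protected:
    // Default behavior: textures start at entry 0.
    virtual void assign_binding_table_offsets() { assign_common(0); }

    void assign_common(int next_entry) {
        assert(next_entry >= 0);
        texture_start = next_entry;
    }
};

class Gen6GsVisitor : public Visitor {
protected:
    // Gen6 GS: skip the entries reserved for transform feedback surfaces.
    void assign_binding_table_offsets() override {
        assign_common(kMaxSolBindings);
    }
};
```

With this shape only the gen6 GS visitor changes its starting offset, and every other visitor keeps the existing behavior.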
[Mesa-dev] [PATCH 10/37] i965/gen6/gs: Compute URB entry size for user-provided geometry shaders.
---
 src/mesa/drivers/dri/i965/brw_defines.h |  8 ++-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c | 87 +
 2 files changed, 62 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 60b3846..a2b40fb 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1576,10 +1576,14 @@ enum brw_message_target {
 # define GEN7_URB_ENTRY_SIZE_SHIFT                      16
 # define GEN7_URB_STARTING_ADDRESS_SHIFT                25
 
-/* GS URB Entry Allocation Size is a U9-1 field, so the maximum gs_size
+/* Gen7 GS URB Entry Allocation Size is a U9-1 field, so the maximum gs_size
  * is 2^9, or 512. It's counted in multiples of 64 bytes.
  */
-#define GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES                (512*64)
+#define GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES        (512*64)
+/* Gen6 GS URB Entry Allocation Size is defined as a number of 1024-bit
+ * (128 bytes) URB rows and the maximum allowed value is 5 rows.
+ */
+#define GEN6_MAX_GS_URB_ENTRY_SIZE_BYTES        (5*128)
 
 #define _3DSTATE_PUSH_CONSTANT_ALLOC_VS         0x7912 /* GEN7+ */
 #define _3DSTATE_PUSH_CONSTANT_ALLOC_GS         0x7915 /* GEN7+ */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 2d9e8c2..a445174 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -75,31 +75,36 @@ do_gs_prog(struct brw_context *brw,
     */
    c.prog_data.base.base.nr_params = ALIGN(param_count, 4) / 4 + gs->num_samplers;
 
-   if (gp->program.OutputType == GL_POINTS) {
-      /* When the output type is points, the geometry shader may output data
-       * to multiple streams, and EndPrimitive() has no effect. So we
-       * configure the hardware to interpret the control data as stream ID.
-       */
-      c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID;
-
-      /* We only have to emit control bits if we are using streams */
-      if (prog->Geom.UsesStreams)
-         c.control_data_bits_per_vertex = 2;
-      else
-         c.control_data_bits_per_vertex = 0;
+   if (brw->gen >= 7) {
+      if (gp->program.OutputType == GL_POINTS) {
+         /* When the output type is points, the geometry shader may output data
+          * to multiple streams, and EndPrimitive() has no effect. So we
+          * configure the hardware to interpret the control data as stream ID.
+          */
+         c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID;
+
+         /* We only have to emit control bits if we are using streams */
+         if (prog->Geom.UsesStreams)
+            c.control_data_bits_per_vertex = 2;
+         else
+            c.control_data_bits_per_vertex = 0;
+      } else {
+         /* When the output type is triangle_strip or line_strip, EndPrimitive()
+          * may be used to terminate the current strip and start a new one
+          * (similar to primitive restart), and outputting data to multiple
+          * streams is not supported. So we configure the hardware to interpret
+          * the control data as EndPrimitive information (a.k.a. cut bits).
+          */
+         c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT;
+
+         /* We only need to output control data if the shader actually calls
+          * EndPrimitive().
+          */
+         c.control_data_bits_per_vertex = gp->program.UsesEndPrimitive ? 1 : 0;
+      }
    } else {
-      /* When the output type is triangle_strip or line_strip, EndPrimitive()
-       * may be used to terminate the current strip and start a new one
-       * (similar to primitive restart), and outputting data to multiple
-       * streams is not supported. So we configure the hardware to interpret
-       * the control data as EndPrimitive information (a.k.a. cut bits).
-       */
-      c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT;
-
-      /* We only need to output control data if the shader actually calls
-       * EndPrimitive().
-       */
-      c.control_data_bits_per_vertex = gp->program.UsesEndPrimitive ? 1 : 0;
+      /* There are no control data bits in gen6. */
+      c.control_data_bits_per_vertex = 0;
    }
    c.control_data_header_size_bits =
       gp->program.VerticesOut * c.control_data_bits_per_vertex;
@@ -170,7 +175,8 @@ do_gs_prog(struct brw_context *brw,
     *
     */
    unsigned output_vertex_size_bytes = c.prog_data.base.vue_map.num_slots * 16;
-   assert(output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
+   assert(brw->gen == 6 ||
+          output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
    c.prog_data.output_vertex_size_hwords =
       ALIGN(output_vertex_size_bytes, 32) / 32;
 
@@ -200,10 +206,20 @@ do_gs_prog(struct brw_context *brw,
     * the above figures are all worst-case, and most of them
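To make the two URB limits from the brw_defines.h hunk concrete, here is a small sketch of the arithmetic. The two constants come from the patch comments (512 units of 64 bytes on gen7, 5 rows of 128 bytes on gen6); the helper names max_gs_urb_entry_size_bytes and urb_entry_units are hypothetical, and the round-up helper is an assumption about how one would convert a byte size into allocation units, not the driver's actual code.

```cpp
#include <cassert>

// Limits quoted in the patch comments.
constexpr unsigned kGen7MaxGsUrbEntrySizeBytes = 512 * 64;  // 64-byte units
constexpr unsigned kGen6MaxGsUrbEntrySizeBytes = 5 * 128;   // 1024-bit rows

// Hypothetical helper: maximum GS URB entry size for a hardware generation.
unsigned max_gs_urb_entry_size_bytes(int gen) {
    assert(gen >= 6);
    return gen >= 7 ? kGen7MaxGsUrbEntrySizeBytes
                    : kGen6MaxGsUrbEntrySizeBytes;
}

// Hypothetical helper: number of allocation units needed, rounded up.
unsigned urb_entry_units(unsigned size_bytes, int gen) {
    assert(size_bytes <= max_gs_urb_entry_size_bytes(gen));
    const unsigned unit = gen >= 7 ? 64 : 128;
    return (size_bytes + unit - 1) / unit;
}
```

The gen6 limit works out to 640 bytes, against 32768 bytes on gen7, which is why the gen6 path needs its own bound.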
[Mesa-dev] [PATCH 18/37] i965/gen6/gs: Make sure we complete the last primitive.
Just in case the GS algorithm does not call EndPrimitive() for the last
primitive produced. This is relevant only for non-point outputs, since for
point outputs we already set the PrimEnd flag on each vertex we emit.
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 5123bd7..252e585 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -270,6 +270,19 @@ gen6_gs_visitor::emit_urb_write_opcode(bool complete, src_reg vertex,
 void
 gen6_gs_visitor::emit_thread_end()
 {
+   /* Make sure the current primitive is ended: we know it is not ended when
+    * first_vertex is not zero. This is only relevant for outputs other than
+    * points because in the point case we set PrimEnd on all vertices.
+    */
+   if (c->gp->program.OutputType != GL_POINTS) {
+      emit(CMP(dst_null_d(), this->first_vertex, 0u, BRW_CONDITIONAL_Z));
+      emit(IF(BRW_PREDICATE_NORMAL));
+      {
+         visit((ir_end_primitive *) NULL);
+      }
+      emit(BRW_OPCODE_ENDIF);
+   }
+
    /* Here we have to:
     * 1) Emit an FF_SYNC messsage to obtain an initial VUE handle.
     * 2) Loop over all buffered vertex data and write it to corresponding
--
1.9.1
[Mesa-dev] [PATCH 23/37] i965/gen6/gs: Enable texture units and upload sampler state.
---
 src/mesa/drivers/dri/i965/brw_context.c        | 2 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c   | 1 +
 src/mesa/drivers/dri/i965/gen6_sampler_state.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index bf2aedb..bc0e1dd 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -300,7 +300,7 @@ brw_initialize_context_constants(struct brw_context *brw)
       MIN2(ctx->Const.MaxTextureCoordUnits,
            ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits);
    ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits = max_samplers;
-   if (brw->gen >= 7)
+   if (brw->gen >= 6)
       ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = max_samplers;
    else
       ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index a52a8f4..b0d78ab 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -147,6 +147,7 @@ static const struct brw_tracked_state *gen6_atoms[] =
 
    &brw_fs_samplers,
    &brw_vs_samplers,
+   &brw_gs_samplers,
    &gen6_sampler_state,
    &gen6_multisample_state,
 
diff --git a/src/mesa/drivers/dri/i965/gen6_sampler_state.c b/src/mesa/drivers/dri/i965/gen6_sampler_state.c
index 981e98f..9c6c508 100644
--- a/src/mesa/drivers/dri/i965/gen6_sampler_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sampler_state.c
@@ -40,7 +40,7 @@ upload_sampler_state_pointers(struct brw_context *brw)
              PS_SAMPLER_STATE_CHANGE |
              (4 - 2));
    OUT_BATCH(brw->vs.base.sampler_offset); /* VS */
-   OUT_BATCH(0); /* GS */
+   OUT_BATCH(brw->gs.base.sampler_offset); /* GS */
    OUT_BATCH(brw->wm.base.sampler_offset);
    ADVANCE_BATCH();
 }
--
1.9.1
[Mesa-dev] [PATCH 21/37] i965/gen6/gs: Implement support for gl_PrimitiveIdIn.
For this we will need to move PrimitiveID information, delivered in the thread
payload in r0.1, to a separate register (we use GS_OPCODE_SET_PRIMITIVE_ID for
this), then map the corresponding varying slot to that register in the
setup_payload() method.

Notice that we cannot use a virtual register as the destination for the
PrimitiveID because we need to map all input attributes to hardware registers
in setup_payload(), which happens before virtual registers are mapped to
hardware registers. We could work around that issue if we were able to compute
the first non-payload register in emit_prolog() and move the PrimitiveID
information to that register, but we can't because at that point we still
don't know the final number of uniforms that will be included in the payload.

So, what we do is place the PrimitiveID information in r1, which is always
delivered as part of the payload but is only populated with data relevant for
transform feedback when we set GEN6_GS_SVBI_PAYLOAD_ENABLE in the 3DSTATE_GS
state packet. When we implement transform feedback, we will make sure to move
the value of r1 to another register before we overwrite it with the
PrimitiveID.
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 69 ++-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  2 +
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 4a440eb..b45c381 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -31,6 +31,8 @@
 
 #include "gen6_gs_visitor.h"
 
+const unsigned MAX_GS_INPUT_VERTICES = 6;
+
 namespace brw {
 
 void
@@ -38,6 +40,7 @@ gen6_gs_visitor::emit_prolog()
 {
    vec4_gs_visitor::emit_prolog();
 
+   this->current_annotation = "gen6 prolog";
    /* Gen6 geometry shaders require to allocate an initial VUE handle via
     * FF_SYNC message, however the documentation remarks that only one thread
     * can write to the URB simultaneously and the FF_SYNC message provides the
@@ -59,7 +62,6 @@ gen6_gs_visitor::emit_prolog()
     * flags for the next vertex come right after the data items and flags for
     * the previous vertex.
     */
-   this->current_annotation = "gen6 prolog";
    this->vertex_output = src_reg(this,
                                  glsl_type::uint_type,
                                  (prog_data->vue_map.num_slots + 1) *
@@ -94,6 +96,30 @@ gen6_gs_visitor::emit_prolog()
     */
    this->prim_count = src_reg(this, glsl_type::uint_type);
    emit(MOV(dst_reg(this->prim_count), 0u));
+
+   /* PrimitiveID is delivered in r0.1 of the thread payload. If the program
+    * needs it we have to move it to a separate register where we can map
+    * the attribute.
+    *
+    * Notice that we cannot use a virtual register for this, because we need to
+    * map all input attributes to hardware registers in setup_payload(),
+    * which happens before virtual registers are mapped to hardware registers.
+    * We could work around that issue if we were able to compute the first
+    * non-payload register here and move the PrimitiveID information to that
+    * register, but we can't because at this point we don't know the final
+    * number of uniforms that will be included in the payload.
+    *
+    * So, what we do is to place PrimitiveID information in r1, which is always
+    * delivered as part of the payload, but it's only populated with data
+    * relevant for transform feedback when we set GEN6_GS_SVBI_PAYLOAD_ENABLE
+    * in the 3DSTATE_GS state packet. That information can be obtained by other
+    * means though, so we can safely use r1 for this purpose.
+    */
+   if (c->prog_data.include_primitive_id) {
+      this->primitive_id =
+         src_reg(retype(brw_vec8_grf(1, 0), BRW_REGISTER_TYPE_UD));
+      emit(GS_OPCODE_SET_PRIMITIVE_ID, dst_reg(this->primitive_id));
+   }
 }
 
 void
@@ -410,4 +436,45 @@ gen6_gs_visitor::emit_thread_end()
    inst->mlen = 1;
 }
 
+void
+gen6_gs_visitor::setup_payload()
+{
+   int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES];
+
+   /* Attributes are going to be interleaved, so one register contains two
+    * attribute slots.
+    */
+   int attributes_per_reg = 2;
+
+   /* If a geometry shader tries to read from an input that wasn't written by
+    * the vertex shader, that produces undefined results, but it shouldn't
+    * crash anything. So initialize attribute_map to zeros--that ensures that
+    * these undefined results are read from r0.
+    */
+   memset(attribute_map, 0, sizeof(attribute_map));
+
+   int reg = 0;
+
+   /* The payload always contains important data in r0. */
+   reg++;
+
+   /* r1 is always part of the payload and it holds information relevant
+    * for transform feedback when we set the GEN6_GS_SVBI_PAYLOAD_ENABLE bit in
+    * the 3DSTATE_GS packet. We will overwrite it with the PrimitiveID
+    * information (and move the original
Re: [Mesa-dev] [PATCH 5/5] nv50, nvc0: add support for fine derivatives
I've included an appropriate release notes change including nv50 and nvc0 in
my i965 follow-up series, on the assumption that these two will land more or
less together.

On Thu, Aug 14, 2014 at 11:00 PM, Marek Olšák mar...@gmail.com wrote:
> Are you gonna update the release notes too?
>
> Marek
>
> On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> The quadop-based method we currently use on all chipsets already
>> provides the fine version of the derivatives.
>>
>> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
>> ---
>>  docs/GL3.txt                                              | 2 +-
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4 ++++
>>  src/gallium/drivers/nouveau/nv50/nv50_screen.c            | 2 +-
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c            | 2 +-
>>  4 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/GL3.txt b/docs/GL3.txt
>> index 89529fe..0a40e23 100644
>> --- a/docs/GL3.txt
>> +++ b/docs/GL3.txt
>> @@ -189,7 +189,7 @@ GL 4.5, GLSL 4.50:
>>    GL_ARB_clip_control                                  not started
>>    GL_ARB_conditional_render_inverted                   not started
>>    GL_ARB_cull_distance                                 not started
>> -  GL_ARB_derivative_control                            not started
>> +  GL_ARB_derivative_control                            DONE (nv50, nvc0)
>>    GL_ARB_direct_state_access                           not started
>>    GL_ARB_get_texture_sub_image                         started (Brian Paul)
>>    GL_ARB_shader_texture_image_samples                  not started
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> index 14b6d68..456efcb 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> @@ -531,7 +531,9 @@ static nv50_ir::operation translateOpcode(uint opcode)
>>     NV50_IR_OPCODE_CASE(COS, COS);
>>
>>     NV50_IR_OPCODE_CASE(DDX, DFDX);
>> +   NV50_IR_OPCODE_CASE(DDX_FINE, DFDX);
>>     NV50_IR_OPCODE_CASE(DDY, DFDY);
>> +   NV50_IR_OPCODE_CASE(DDY_FINE, DFDY);
>>     NV50_IR_OPCODE_CASE(KILL, DISCARD);
>>
>>     NV50_IR_OPCODE_CASE(SEQ, SET);
>> @@ -2327,6 +2329,8 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
>>     case TGSI_OPCODE_NOT:
>>     case TGSI_OPCODE_DDX:
>>     case TGSI_OPCODE_DDY:
>> +   case TGSI_OPCODE_DDX_FINE:
>> +   case TGSI_OPCODE_DDY_FINE:
>>        FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
>>           mkOp1(op, dstTy, dst0[c], fetchSrc(0, c));
>>        break;
>> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
>> index 34cca3d..8a9a40e 100644
>> --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
>> +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
>> @@ -169,6 +169,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>     case PIPE_CAP_USER_VERTEX_BUFFERS:
>>     case PIPE_CAP_TEXTURE_MULTISAMPLE:
>>     case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
>> +   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
>>        return 1;
>>     case PIPE_CAP_SEAMLESS_CUBE_MAP:
>>        return 1; /* class_3d >= NVA0_3D_CLASS; */
>> @@ -200,7 +201,6 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>     case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION:
>>     case PIPE_CAP_COMPUTE:
>>     case PIPE_CAP_DRAW_INDIRECT:
>> -   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
>>        return 0;
>>     }
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index 17aee63..c6d9b91 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -167,6 +167,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>     case PIPE_CAP_SAMPLE_SHADING:
>>     case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
>>     case PIPE_CAP_TEXTURE_GATHER_SM5:
>> +   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
>>        return 1;
>>     case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
>>        return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
>> @@ -184,7 +185,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>     case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
>>     case PIPE_CAP_FAKE_SW_MSAA:
>>     case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION:
>> -   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
>>        return 0;
>>     }
>>
>> --
>> 1.8.5.5
Re: [Mesa-dev] [RFC] initial ARB_gpu_shader_fp64 posting
Hi;

On 08/14/2014 01:52 PM, Dave Airlie wrote:
> This is just the mesa and glsl compiler portions of the
> ARB_gpu_shader_fp64 extension that I've been slowly iterating over
> the past few months.
>
> All in
> http://cgit.freedesktop.org/~airlied/mesa/log/?h=arb_gpu_shader_fp64-submit
> but underneath the gallium + softpipe + mesa/st development, which
> all need further cleaning and docs.

I have some fixes/changes to this which I rebased on top of your latest
tree; these are available here:

http://cgit.freedesktop.org/~tpalli/mesa/log/?h=fp64_fixes

Notably, the last one (i965 changes) is very experimental and should probably
be ignored for now; the others should be useful and fix the fp64 tests I've
been sending to Piglit.

I introduced 'i2d' and 'u2d'. I'm not sure if this is wanted, but it makes
implicit conversions in ast_to_hir.cpp cleaner; the other option would be to
refactor the implicit conversions code a bit. Let me know your thoughts, I
can go for the refactor if these are not wanted.

Thanks;

> The biggest bits of this are the builtin generator, constant expression
> handling and uniform interfaces.
>
> I suspect there are chunks in some patches that might need to be in other,
> and the uniform patches are probably not very well explained, mostly
> because I can't remember why exactly I did what I did in a few places.
>
> Dave.

// Tapani
Re: [Mesa-dev] SandyBridge not handling GL_TRIANGLE_STRIP_ADJACENCY with repeating vertex indices correctly
On Tue, 2014-07-29 at 10:12 +0200, Iago Toral Quiroga wrote:
> Hi,
>
> running the piglit tests on my implementation of geometry shaders for
> Sandy Bridge produces a GPU hang for the following test:
>
> ./glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_STRIP_ADJACENCY ffs
>
> That test checks primitive restarts but the hang seems to be unrelated
> to that, since it also happens when primitive restart is not enabled.
>
> The problem, which only affects GL_TRIANGLE_STRIP_ADJACENCY and no other
> primitive type (with or without adjacency), is in this loop that the
> test uses to set up the indices for the vertices:
>
> elements = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_READ_WRITE);
> num_elements = 0;
> for (i = 1; i <= LONGEST_INPUT_SEQUENCE; i++) {
>    for (j = 0; j < i; j++) {
>       /* Every element that isn't the primitive
>        * restart index can just be element 0, since
>        * we don't care about the actual vertex data.
>        */
>       elements[num_elements++] = 0;
>    }
>    elements[num_elements++] = prim_restart_index;
> }
> glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);
>
> Setting all elements to the same index (0 in this case) is the one thing
> that causes the hang for GL_TRIANGLE_STRIP_ADJACENCY. A simple change
> like this removes the hang:
>
> -   elements[num_elements++] = 0;
> +   elements[num_elements++] = j != prim_restart_index ? j : j + 1;
>
> Skimming through the docs I have not seen any references to this being a
> known problem. In fact, I don't see any references to
> GL_TRIANGLE_STRIP_ADJACENCY being special in any way and it seems that
> this is not a problem in IvyBridge, since the test runs correctly there.
>
> Does this sound like a hardware bug specific to SandyBridge's handling
> of GL_TRIANGLE_STRIP_ADJACENCY or is there something else I should check
> before arriving at that conclusion?
>
> If it is a hardware bug I guess we want a workaround for it, at least to
> prevent the hang, but I am not sure what would be the best option here.
> I think the only option for the driver would be to explore the list of
> indices provided when this primitive type is used and, when we hit this
> scenario (I'd have to test how many repeating indices we need for it to
> hang), error out and not execute the drawing command... any other
> suggestions?

This is what I found so far:

1. The problem is specific to glDrawElements. glDrawArrays works well even if
   all the vertices used have the same coordinates. To me this suggests that
   the problem should not be in our implementation of GS, since using
   glDrawArrays or glDrawElements is handled elsewhere and should be
   transparent to the implementation of the GS stage.

2. The problem does not happen in all situations, only when we repeat values
   in the indices we use with glDrawElements. In particular, I found that the
   pattern that leads to the hang seems to be:
   - There are only 8 indices and all of them are the same.
   - There are more than 8 indices and there is at least one subset of 9
     consecutive indices where at least 8 indices are the same (they do not
     need to be consecutive within the group of 9).

3. The problem is specific to GL_TRIANGLE_STRIP_ADJACENCY. It does not hang
   for any other primitive. In fact, other primitives work well and produce
   the expected results. I have not seen specific requirements for this
   primitive type in the docs that could justify something like this. Even
   GL_TRIANGLE_STRIP_ADJACENCY seems to work well except when there are
   repeating vertices with that specific pattern in glDrawElements.

4. The problem seems to be independent of the code we generate in the GS
   stage, although this should not be surprising considering 1). In
   particular, the hang persists even in the case of an empty main() function
   in the geometry shader (where we generate trivial code that of course works
   for any other primitive type).

Based on this my conclusion is that this is very likely a hardware issue.
That, or some very obscure problem in the implementation of the index buffer
in gen6 that I have not seen and that only affects GL_TRIANGLE_STRIP_ADJACENCY
for some reason.

At this point I'd like to hear suggestions for things we could try next to
confirm whether this is a hardware problem or a driver problem, or, if we
agree that this is enough evidence that this must be a hardware problem, how
we can limit its impact, starting, probably, by rewriting the piglit test so
that we don't alter its purpose but avoid the hang on gen6. We should also
discuss if there is a way to work around this problem so that at least
developers running into it (as unlikely as that may be) don't hang their
systems.

I am going to be on holidays starting tomorrow and will have difficult and
limited Internet access for the most part, but Samuel (in the CC) will be
available next week to try any suggestions you may have.

Iago
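For anyone who wants to experiment with a driver-side or test-side check, the index pattern from point 2 can be written down roughly as below. This is only a sketch of the empirically observed pattern, not a confirmed description of the hardware behavior; the function names window_has_repeats and matches_reported_hang_pattern are hypothetical, and the primitive restart index is not treated specially.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if some value occurs at least `min_repeats` times within
// indices[begin, begin + len). The caller guarantees the range is valid.
bool window_has_repeats(const std::vector<std::uint32_t> &indices,
                        std::size_t begin, std::size_t len,
                        std::size_t min_repeats) {
    const auto first = indices.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last = first + static_cast<std::ptrdiff_t>(len);
    for (auto it = first; it != last; ++it) {
        const auto n = static_cast<std::size_t>(std::count(first, last, *it));
        if (n >= min_repeats)
            return true;
    }
    return false;
}

// Hypothetical check for the pattern reported above: exactly 8 indices that
// are all equal, or more than 8 indices with a run of 9 consecutive entries
// in which at least 8 share the same value.
bool matches_reported_hang_pattern(const std::vector<std::uint32_t> &indices) {
    const std::size_t n = indices.size();
    if (n == 8)
        return window_has_repeats(indices, 0, 8, 8);
    for (std::size_t start = 0; start + 9 <= n; start++) {
        if (window_has_repeats(indices, start, 9, 8))
            return true;
    }
    return false;
}
```

A check like this would only be a starting point for rewriting the piglit test or for prototyping a workaround; the exact thresholds would need to be confirmed on hardware.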
[Mesa-dev] [Bug 81680] [r600g] Firefox crashes with hardware acceleration turned on
https://bugs.freedesktop.org/show_bug.cgi?id=81680

--- Comment #28 from Eugene ken20...@ukr.net ---
(In reply to comment #27)
> (In reply to comment #17)
> > Program received signal SIGSEGV, Segmentation fault.
> > PatchJump (label=..., jump=...) at
>
> When it says 'PatchJump (label=..., jump=...) at [...]', it's not a crash but
> normal JavaScript JIT operation. Run 'continue' in that case.
>
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x in ?? ()
>
> Only when it says '0x in ?? ()' is it the crash you're looking for. Run 'bt
> full' in that case and attach the output here.

I already said that bt / bt full gives nothing:

Program received signal SIGSEGV, Segmentation fault.
0x7fffe09f1d89 in ?? ()
(gdb) bt full
#0  0x7fffe09f1d89 in ?? ()
No symbol table info available.
#1  0x0500 in ?? ()
No symbol table info available.
#2  0x7fffb86d3900 in ?? ()
No symbol table info available.
#3  0x0003 in ?? ()
No symbol table info available.
#4  0xfffbb86d0f00 in ?? ()
No symbol table info available.
#5  0xfffadd4a9720 in ?? ()
No symbol table info available.

Any suggestions?
[Mesa-dev] [Bug 79629] [dri3] piglit glx_GLX_ARB_create_context_current_with_no_framebuffer fails
https://bugs.freedesktop.org/show_bug.cgi?id=79629

Eero Tamminen eero.t.tammi...@intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eero.t.tammi...@intel.com
Re: [Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2
Hi Mike,

I don't really know, probably someone from Intel should confirm this.

Iago

On Thu, 2014-08-14 at 14:14 +0100, Mike Lothian wrote:
> Isn't everything already added for GL 3.3?
>
> On 14 Aug 2014 12:13, Iago Toral Quiroga ito...@igalia.com wrote:
>> From: Samuel Iglesias Gonsalvez sigles...@igalia.com
>>
>> Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
>> ---
>>  src/mesa/drivers/dri/i965/intel_screen.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
>> index ea0fc58..83101a5 100644
>> --- a/src/mesa/drivers/dri/i965/intel_screen.c
>> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
>> @@ -1273,7 +1273,7 @@ set_max_gl_versions(struct intel_screen *screen)
>>        psp->max_gl_es2_version = 30;
>>        break;
>>     case 6:
>> -      psp->max_gl_core_version = 31;
>> +      psp->max_gl_core_version = 32;
>>        psp->max_gl_compat_version = 30;
>>        psp->max_gl_es1_version = 11;
>>        psp->max_gl_es2_version = 30;
>> --
>> 1.9.1
Re: [Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2
I think everything that's required for GL 3.3 has already been added. Can we
jump directly there?

On 14 Aug 2014 12:13, Iago Toral Quiroga ito...@igalia.com wrote:
> From: Samuel Iglesias Gonsalvez sigles...@igalia.com
>
> Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
> index ea0fc58..83101a5 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -1273,7 +1273,7 @@ set_max_gl_versions(struct intel_screen *screen)
>        psp->max_gl_es2_version = 30;
>        break;
>     case 6:
> -      psp->max_gl_core_version = 31;
> +      psp->max_gl_core_version = 32;
>        psp->max_gl_compat_version = 30;
>        psp->max_gl_es1_version = 11;
>        psp->max_gl_es2_version = 30;
> --
> 1.9.1
Re: [Mesa-dev] [PATCH 36/37] i965/gen6: enable GLSL 1.50
We can probably just change this to check for gen >= 6 and expose 3.30.

On 14 Aug 2014 12:13, Iago Toral Quiroga ito...@igalia.com wrote:
> From: Samuel Iglesias Gonsalvez sigles...@igalia.com
>
> Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
> ---
>  src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
> index e134cd9..9875b7c 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -246,7 +246,7 @@ intelInitExtensions(struct gl_context *ctx)
>     if (brw->gen >= 7)
>        ctx->Const.GLSLVersion = 330;
>     else if (brw->gen >= 6)
> -      ctx->Const.GLSLVersion = 140;
> +      ctx->Const.GLSLVersion = 150;
>     else
>        ctx->Const.GLSLVersion = 120;
>     _mesa_override_glsl_version(&ctx->Const);
> --
> 1.9.1
Re: [Mesa-dev] [PATCH 1/1] configure.ac: Fix build with git-svn llvm version string
On Wed, Aug 13, 2014 at 04:46:56PM -0400, Jan Vesely wrote:
> Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
> ---
> My llvm-config --version is 3.6.0git-svn-r215564-cd35a3b3
> This patch assumes that the interesting part consists of only digits and dots.
>
>  configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 4ff87eb..dc5117e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1697,7 +1697,7 @@ if test x$enable_gallium_llvm = xyes; then
>      fi
>
>      if test x$LLVM_CONFIG != xno; then
> -        LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/svn.*//g'`
> +        LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/[[^0-9.]].*//g'`

As long as we are changing this, I think it would be simpler to use grep:

`$LLVM_CONFIG --version | grep -o '^[[0-9.]]\+'`

-Tom

>          LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
>          LLVM_BINDIR=`$LLVM_CONFIG --bindir`
>          LLVM_CPPFLAGS=`strip_unwanted_llvm_flags $LLVM_CONFIG --cppflags`
> --
> 1.9.3
Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support
Reviewed-by: Roland Scheidegger srol...@vmware.com llvmpipe also already does the fine version. A coarse version (which we indeed do when used implicitly for sampling though with some other changes) might be minimally simpler though not even sure (might save a shuffle instruction somewhere), but probably not worth it (plus, d3d10 sm4 had deriv_rtx and sm5 deriv_rtx_coarse/deriv_rtx_fine but the sm4 versions correspond to the fine versions so this was required). Roland Am 14.08.2014 06:52, schrieb Ilia Mirkin: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/auxiliary/tgsi/tgsi_info.c | 3 +++ src/gallium/auxiliary/tgsi/tgsi_util.c | 2 ++ src/gallium/docs/source/screen.rst | 2 ++ src/gallium/docs/source/tgsi.rst | 12 ++-- src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_shader_tokens.h | 5 - 19 files changed, 35 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index e24348f..35f9747 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -235,6 +235,9 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 1, 0, 0, 0, 0, OTHR, INTERP_CENTROID, TGSI_OPCODE_INTERP_CENTROID }, { 1, 2, 0, 0, 0, 0, OTHR, INTERP_SAMPLE, TGSI_OPCODE_INTERP_SAMPLE }, { 1, 2, 0, 0, 0, 0, OTHR, INTERP_OFFSET, 
TGSI_OPCODE_INTERP_OFFSET }, + + { 1, 1, 0, 0, 0, 0, COMP, DDX_FINE, TGSI_OPCODE_DDX_FINE }, + { 1, 1, 0, 0, 0, 0, COMP, DDY_FINE, TGSI_OPCODE_DDY_FINE }, }; const struct tgsi_opcode_info * diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c index e48159c..e1cba95 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_util.c +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c @@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct tgsi_full_instruction *inst, case TGSI_OPCODE_USNE: case TGSI_OPCODE_IMUL_HI: case TGSI_OPCODE_UMUL_HI: + case TGSI_OPCODE_DDX_FINE: + case TGSI_OPCODE_DDY_FINE: /* Channel-wise operations */ read_mask = write_mask; break; diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 814e3ae..6fecc15 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -213,6 +213,8 @@ The integer capabilities: * ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw arguments { count, instance_count, start, index_bias } from a PIPE_BUFFER resource. See pipe_draw_info. +* ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE``: Whether the fragment shader supports + the FINE versions of DDX/DDY. .. _pipe_capf: diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index ac0ea54..7d5918f 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -433,7 +433,11 @@ This instruction replicates its result. dst = \cos{src.x} -.. opcode:: DDX - Derivative Relative To X +.. opcode:: DDX, DDX_FINE - Derivative Relative To X + +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is +advertised. When it is, the fine version guarantees one derivative per row +while DDX is allowed to be the same for the entire 2x2 quad. .. math:: @@ -446,7 +450,11 @@ This instruction replicates its result. dst.w = partialx(src.w) -.. opcode:: DDY - Derivative Relative To Y +.. 
opcode:: DDY, DDY_FINE - Derivative Relative To Y + +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is +advertised. When it is, the fine version guarantees one derivative per column +while DDY is allowed to be the same for the entire 2x2 quad. .. math:: diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index de69b14..b156d8b 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -216,6 +216,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap
[Mesa-dev] [PATCH] i965/blorp_clear: Use memcpy instead of assignment to copy clear value
Hi,

After looking at the problem in bug 81150 I was wondering if we have the same problem when using glClear with integer values. Sure enough, I can trigger a similar bug on 32-bit builds with optimisations using a piglit test which I've posted here:

http://lists.freedesktop.org/archives/piglit/2014-August/012144.html

- Neil

--- >8 --- (use git am --scissors to automatically chop here)

Similar to the problem described in 2c50212b14da27de4e3, if we copy the clear value through a regular assignment via a floating point value, then an integer clear value that happens to contain a signalling NaN bit pattern gets converted to a quiet NaN when stored via the x87 floating-point registers. This would corrupt the integer value. Instead we should use a memcpy to ensure the exact bit representation is preserved.

This bug can be triggered on 32-bit builds with optimisations by using an integer clear color with a value like 0x7f817f81.
---
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index ffbcd1a..8db0837 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -202,12 +202,7 @@ brw_blorp_clear_params::brw_blorp_clear_params(struct brw_context *brw,
       y1 = rb->Height - fb->_Ymin;
    }
 
-   float *push_consts = (float *)&wm_push_consts;
-
-   push_consts[0] = ctx->Color.ClearColor.f[0];
-   push_consts[1] = ctx->Color.ClearColor.f[1];
-   push_consts[2] = ctx->Color.ClearColor.f[2];
-   push_consts[3] = ctx->Color.ClearColor.f[3];
+   memcpy(&wm_push_consts.dst_x0, ctx->Color.ClearColor.f, sizeof(float) * 4);
 
    use_wm_prog = true;
 
@@ -250,7 +245,7 @@ brw_blorp_clear_params::brw_blorp_clear_params(struct brw_context *brw,
    if (irb->mt->fast_clear_state != INTEL_FAST_CLEAR_STATE_NO_MCS &&
        !partial_clear && wm_prog_key.use_simd16_replicated_data &&
        is_color_fast_clear_compatible(brw, format, &ctx->Color.ClearColor)) {
-      memset(push_consts, 0xff, 4*sizeof(float));
+      memset(&wm_push_consts, 0xff, 4*sizeof(float));
       fast_clear_op = GEN7_FAST_CLEAR_OP_FAST_CLEAR;
 
       /* Figure out what the clear rectangle needs to be aligned to, and how
-- 
1.9.3
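For anyone following along, the idea behind the memcpy can be sketched in isolation. This is only an illustration, not the driver code: the `clear_color` union and the `copy_clear_color_bits` helper are hypothetical names made up for the example. The point is that the four channels are copied as raw bytes, so an integer bit pattern such as 0x7f817f81 never passes through a floating-point register where a signalling NaN could be quieted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the clear-color storage: the same 16 bytes
 * viewed either as four floats or as four unsigned integers. */
union clear_color {
   float f[4];
   uint32_t ui[4];
};

/* Copy all four channels as raw bytes. No channel is ever loaded as a
 * float value, so the exact bit representation is preserved even when
 * the bits happen to encode a signalling NaN. */
static void
copy_clear_color_bits(void *dst, const union clear_color *src)
{
   assert(dst != NULL && src != NULL);
   memcpy(dst, src->f, sizeof(float) * 4);
}
```

Whether a plain float assignment actually alters the bits depends on the target and the compiler (the report above is about 32-bit x87 builds with optimisations), so the sketch only shows the safe path rather than trying to reproduce the corruption.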
[Mesa-dev] [PATCH] egl/main: use separate LIBEGL_C_FILES and LIBEGL_H_FILES to fix SCons build
The linker was trying to process .h files and failing. --- src/egl/main/Makefile.am |3 ++- src/egl/main/Makefile.sources | 36 2 files changed, 22 insertions(+), 17 deletions(-) diff --git a/src/egl/main/Makefile.am b/src/egl/main/Makefile.am index 6746bcc..06f6a05 100644 --- a/src/egl/main/Makefile.am +++ b/src/egl/main/Makefile.am @@ -34,7 +34,8 @@ AM_CFLAGS = \ lib_LTLIBRARIES = libEGL.la libEGL_la_SOURCES = \ - ${LIBEGL_C_FILES} + ${LIBEGL_C_FILES} \ + ${LIBEGL_H_FILES} libEGL_la_LIBADD = \ $(EGL_LIB_DEPS) diff --git a/src/egl/main/Makefile.sources b/src/egl/main/Makefile.sources index 6a917e2..3573004 100644 --- a/src/egl/main/Makefile.sources +++ b/src/egl/main/Makefile.sources @@ -1,38 +1,42 @@ LIBEGL_C_FILES := \ eglapi.c \ - eglapi.h \ eglarray.c \ + eglconfig.c \ + eglcontext.c \ + eglcurrent.c \ + egldisplay.c \ + egldriver.c \ + eglfallbacks.c \ + eglglobals.c \ + eglimage.c \ + egllog.c \ + eglmisc.c \ + eglmode.c \ + eglscreen.c \ + eglstring.c \ + eglsurface.c \ + eglsync.c + + +LIBEGL_H_FILES := \ + eglapi.h \ eglarray.h \ eglcompiler.h \ - eglconfig.c \ eglconfig.h \ - eglcontext.c \ eglcontext.h \ - eglcurrent.c \ eglcurrent.h \ egldefines.h \ - egldisplay.c \ egldisplay.h \ - egldriver.c \ egldriver.h \ - eglfallbacks.c \ - eglglobals.c \ eglglobals.h \ - eglimage.c \ eglimage.h \ - egllog.c \ egllog.h \ - eglmisc.c \ eglmisc.h \ - eglmode.c \ eglmode.h \ eglmutex.h \ - eglscreen.c \ eglscreen.h \ - eglstring.c \ eglstring.h \ - eglsurface.c \ eglsurface.h \ - eglsync.c \ eglsync.h \ egltypedefs.h + -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support
I guess a question is whether we should even bother with the fine version at all then? Just map everything to DDX/DDY... Although I guess if llvmpipe does the coarse version sometimes, at least the fine version is warranted. On Thu, Aug 14, 2014 at 10:12 AM, Roland Scheidegger srol...@vmware.com wrote: Reviewed-by: Roland Scheidegger srol...@vmware.com llvmpipe also already does the fine version. A coarse version (which we indeed do when used implicitly for sampling though with some other changes) might be minimally simpler though not even sure (might save a shuffle instruction somewhere), but probably not worth it (plus, d3d10 sm4 had deriv_rtx and sm5 deriv_rtx_coarse/deriv_rtx_fine but the sm4 versions correspond to the fine versions so this was required). Roland Am 14.08.2014 06:52, schrieb Ilia Mirkin: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/auxiliary/tgsi/tgsi_info.c | 3 +++ src/gallium/auxiliary/tgsi/tgsi_util.c | 2 ++ src/gallium/docs/source/screen.rst | 2 ++ src/gallium/docs/source/tgsi.rst | 12 ++-- src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_shader_tokens.h | 5 - 19 files changed, 35 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index e24348f..35f9747 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ 
b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -235,6 +235,9 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 1, 0, 0, 0, 0, OTHR, INTERP_CENTROID, TGSI_OPCODE_INTERP_CENTROID }, { 1, 2, 0, 0, 0, 0, OTHR, INTERP_SAMPLE, TGSI_OPCODE_INTERP_SAMPLE }, { 1, 2, 0, 0, 0, 0, OTHR, INTERP_OFFSET, TGSI_OPCODE_INTERP_OFFSET }, + + { 1, 1, 0, 0, 0, 0, COMP, DDX_FINE, TGSI_OPCODE_DDX_FINE }, + { 1, 1, 0, 0, 0, 0, COMP, DDY_FINE, TGSI_OPCODE_DDY_FINE }, }; const struct tgsi_opcode_info * diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c index e48159c..e1cba95 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_util.c +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c @@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct tgsi_full_instruction *inst, case TGSI_OPCODE_USNE: case TGSI_OPCODE_IMUL_HI: case TGSI_OPCODE_UMUL_HI: + case TGSI_OPCODE_DDX_FINE: + case TGSI_OPCODE_DDY_FINE: /* Channel-wise operations */ read_mask = write_mask; break; diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 814e3ae..6fecc15 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -213,6 +213,8 @@ The integer capabilities: * ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw arguments { count, instance_count, start, index_bias } from a PIPE_BUFFER resource. See pipe_draw_info. +* ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE``: Whether the fragment shader supports + the FINE versions of DDX/DDY. .. _pipe_capf: diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index ac0ea54..7d5918f 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -433,7 +433,11 @@ This instruction replicates its result. dst = \cos{src.x} -.. opcode:: DDX - Derivative Relative To X +.. 
opcode:: DDX, DDX_FINE - Derivative Relative To X + +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is +advertised. When it is, the fine version guarantees one derivative per row +while DDX is allowed to be the same for the entire 2x2 quad. .. math:: @@ -446,7 +450,11 @@ This instruction replicates its result. dst.w = partialx(src.w) -.. opcode:: DDY - Derivative Relative To Y +.. opcode:: DDY, DDY_FINE - Derivative Relative To Y + +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is +advertised. When it is, the fine version guarantees one derivative per column +while DDY is allowed to be the same for the entire 2x2 quad. .. math:: diff --git
Re: [Mesa-dev] [PATCH] egl/main: use separate LIBEGL_C_FILES and LIBEGL_H_FILES to fix SCons build
On 14/08/14 15:38, Brian Paul wrote: The linker was trying to process .h files and failing. Hi Brian, what linker do you have in mind ? Is it the same issue as reported here [1] ? If so I've just pushed Jose's patch which explicitly handles scons. commit d4a1f3fd270001b2fb0684dc981340391df8fb64 Author: Jose Fonseca jfons...@vmware.com Date: Wed Aug 13 20:33:35 2014 +0100 scons: do not include headers from the sources lists -Emil [1] https://bugs.freedesktop.org/show_bug.cgi?id=82534 --- src/egl/main/Makefile.am |3 ++- src/egl/main/Makefile.sources | 36 2 files changed, 22 insertions(+), 17 deletions(-) diff --git a/src/egl/main/Makefile.am b/src/egl/main/Makefile.am index 6746bcc..06f6a05 100644 --- a/src/egl/main/Makefile.am +++ b/src/egl/main/Makefile.am @@ -34,7 +34,8 @@ AM_CFLAGS = \ lib_LTLIBRARIES = libEGL.la libEGL_la_SOURCES = \ - ${LIBEGL_C_FILES} + ${LIBEGL_C_FILES} \ + ${LIBEGL_H_FILES} libEGL_la_LIBADD = \ $(EGL_LIB_DEPS) diff --git a/src/egl/main/Makefile.sources b/src/egl/main/Makefile.sources index 6a917e2..3573004 100644 --- a/src/egl/main/Makefile.sources +++ b/src/egl/main/Makefile.sources @@ -1,38 +1,42 @@ LIBEGL_C_FILES := \ eglapi.c \ - eglapi.h \ eglarray.c \ + eglconfig.c \ + eglcontext.c \ + eglcurrent.c \ + egldisplay.c \ + egldriver.c \ + eglfallbacks.c \ + eglglobals.c \ + eglimage.c \ + egllog.c \ + eglmisc.c \ + eglmode.c \ + eglscreen.c \ + eglstring.c \ + eglsurface.c \ + eglsync.c + + +LIBEGL_H_FILES := \ + eglapi.h \ eglarray.h \ eglcompiler.h \ - eglconfig.c \ eglconfig.h \ - eglcontext.c \ eglcontext.h \ - eglcurrent.c \ eglcurrent.h \ egldefines.h \ - egldisplay.c \ egldisplay.h \ - egldriver.c \ egldriver.h \ - eglfallbacks.c \ - eglglobals.c \ eglglobals.h \ - eglimage.c \ eglimage.h \ - egllog.c \ egllog.h \ - eglmisc.c \ eglmisc.h \ - eglmode.c \ eglmode.h \ eglmutex.h \ - eglscreen.c \ eglscreen.h \ - eglstring.c \ eglstring.h \ - eglsurface.c \ eglsurface.h \ - eglsync.c \ eglsync.h \ egltypedefs.h + ___ mesa-dev mailing 
list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 82534] src\egl\main\eglapi.h : fatal error LNK1107: invalid or corrupt file: cannot read at 0x2E02
https://bugs.freedesktop.org/show_bug.cgi?id=82534

Emil Velikov emil.l.veli...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Emil Velikov emil.l.veli...@gmail.com ---
Pushed to master

commit d4a1f3fd270001b2fb0684dc981340391df8fb64
Author: Jose Fonseca jfons...@vmware.com
Date:   Wed Aug 13 20:33:35 2014 +0100

    scons: do not include headers from the sources lists

-- 
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH] egl/main: use separate LIBEGL_C_FILES and LIBEGL_H_FILES to fix SCons build
I guess I missed that patch/discussion. That fixes things for me too. -Brian On 08/14/2014 08:49 AM, Emil Velikov wrote: On 14/08/14 15:38, Brian Paul wrote: The linker was trying to process .h files and failing. Hi Brian, what linker do you have in mind ? Is it the same issue as reported here [1] ? If so I've just pushed Jose's patch which explicitly handles scons. commit d4a1f3fd270001b2fb0684dc981340391df8fb64 Author: Jose Fonseca jfons...@vmware.com Date: Wed Aug 13 20:33:35 2014 +0100 scons: do not include headers from the sources lists -Emil [1] https://urldefense.proofpoint.com/v1/url?u=https://bugs.freedesktop.org/show_bug.cgi?id%3D82534k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=lGQMzzTgII0I7jefp2FHq7WtZ%2BTLs8wadB%2BiIj9xpBY%3D%0Am=toHOIPIotrjbrpC6XOIQWRAIUxSJLYHsUM%2Bq8nUooWI%3D%0As=470d545554f902da2fca8e17cf4def93576c3b65d0a2b0c80f73a4256a2f2d9b --- src/egl/main/Makefile.am |3 ++- src/egl/main/Makefile.sources | 36 2 files changed, 22 insertions(+), 17 deletions(-) diff --git a/src/egl/main/Makefile.am b/src/egl/main/Makefile.am index 6746bcc..06f6a05 100644 --- a/src/egl/main/Makefile.am +++ b/src/egl/main/Makefile.am @@ -34,7 +34,8 @@ AM_CFLAGS = \ lib_LTLIBRARIES = libEGL.la libEGL_la_SOURCES = \ - ${LIBEGL_C_FILES} + ${LIBEGL_C_FILES} \ + ${LIBEGL_H_FILES} libEGL_la_LIBADD = \ $(EGL_LIB_DEPS) diff --git a/src/egl/main/Makefile.sources b/src/egl/main/Makefile.sources index 6a917e2..3573004 100644 --- a/src/egl/main/Makefile.sources +++ b/src/egl/main/Makefile.sources @@ -1,38 +1,42 @@ LIBEGL_C_FILES := \ eglapi.c \ - eglapi.h \ eglarray.c \ + eglconfig.c \ + eglcontext.c \ + eglcurrent.c \ + egldisplay.c \ + egldriver.c \ + eglfallbacks.c \ + eglglobals.c \ + eglimage.c \ + egllog.c \ + eglmisc.c \ + eglmode.c \ + eglscreen.c \ + eglstring.c \ + eglsurface.c \ + eglsync.c + + +LIBEGL_H_FILES := \ + eglapi.h \ eglarray.h \ eglcompiler.h \ - eglconfig.c \ eglconfig.h \ - eglcontext.c \ eglcontext.h \ - eglcurrent.c \ eglcurrent.h \ egldefines.h \ - 
egldisplay.c \ egldisplay.h \ - egldriver.c \ egldriver.h \ - eglfallbacks.c \ - eglglobals.c \ eglglobals.h \ - eglimage.c \ eglimage.h \ - egllog.c \ egllog.h \ - eglmisc.c \ eglmisc.h \ - eglmode.c \ eglmode.h \ eglmutex.h \ - eglscreen.c \ eglscreen.h \ - eglstring.c \ eglstring.h \ - eglsurface.c \ eglsurface.h \ - eglsync.c \ eglsync.h \ egltypedefs.h + ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 82536] u_current.h:72: undefined reference to `__imp__glapi_Dispatch'
https://bugs.freedesktop.org/show_bug.cgi?id=82536

Emil Velikov emil.l.veli...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Emil Velikov emil.l.veli...@gmail.com ---
The offending commit has been reverted.

commit 957a28e63c8a205d01c48cb8fa03c3c1abe4b499
Author: Emil Velikov emil.l.veli...@gmail.com
Date:   Wed Aug 13 17:55:39 2014 +0100

    Revert "configure: Fix --enable-XX-bit flags by moving LT_INIT where it should"

    This reverts commit 2af28040d639dddbb7c258981a00eaf3dfcbcf03.

-- 
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 82546] [regression] libOSMesa build failure
https://bugs.freedesktop.org/show_bug.cgi?id=82546

Emil Velikov emil.l.veli...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Emil Velikov emil.l.veli...@gmail.com ---
The offending commit has been reverted.

commit 957a28e63c8a205d01c48cb8fa03c3c1abe4b499
Author: Emil Velikov emil.l.veli...@gmail.com
Date:   Wed Aug 13 17:55:39 2014 +0100

    Revert "configure: Fix --enable-XX-bit flags by moving LT_INIT where it should"

    This reverts commit 2af28040d639dddbb7c258981a00eaf3dfcbcf03.

-- 
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 82539] vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory
https://bugs.freedesktop.org/show_bug.cgi?id=82539

--- Comment #8 from Emil Velikov emil.l.veli...@gmail.com ---
The revert is already in master, so a fetch/rebase test should suffice.

Thank you

-- 
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 50754] Building 32 bit mesa on 64 bit OS fails since change for automake
https://bugs.freedesktop.org/show_bug.cgi?id=50754

--- Comment #29 from Emil Velikov emil.l.veli...@gmail.com ---
Hello gents,

While this patch looks correct at first sight, it caused quite a few issues with other parts of mesa. As such I've reverted it, removed the hacky --enable-32,64-bit options, and documented (docs/autoconf.html) a reasonable approach towards multilib/cross-compile builds.

-- 
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH 1/1] configure.ac: Fix build with git-svn llvm version string
On Thu, 2014-08-14 at 06:35 -0700, Tom Stellard wrote:
> On Wed, Aug 13, 2014 at 04:46:56PM -0400, Jan Vesely wrote:
> > Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
> > ---
> > My llvm-config --version is 3.6.0git-svn-r215564-cd35a3b3
> > This patch assumes that the interesting part consists of only digits and dots.
> >
> >  configure.ac | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/configure.ac b/configure.ac
> > index 4ff87eb..dc5117e 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -1697,7 +1697,7 @@ if test x$enable_gallium_llvm = xyes; then
> >      fi
> >
> >      if test x$LLVM_CONFIG != xno; then
> > -        LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/svn.*//g'`
> > +        LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/[[^0-9.]].*//g'`
>
> As long as we are changing this, I think it would be simpler to use grep:
>
> `$LLVM_CONFIG --version | grep -o '^[[0-9.]]\+'`

I agree. I didn't know about grep -o. It fixes my issue.

Reviewed-and-tested-by: Jan Vesely jan.ves...@rutgers.edu

jan

> -Tom
>
> >          LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
> >          LLVM_BINDIR=`$LLVM_CONFIG --bindir`
> >          LLVM_CPPFLAGS=`strip_unwanted_llvm_flags $LLVM_CONFIG --cppflags`
> > --
> > 1.9.3

-- 
Jan Vesely jan.ves...@rutgers.edu
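To make the matching rule discussed in this thread concrete, here is a small C sketch of what "keep only the leading digits and dots" means for a version string. It is an illustration only and has nothing to do with how configure.ac itself is implemented; `version_prefix_len` is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the leading run of characters that are digits or
 * dots. Everything from the first other character onwards (for example
 * the "git-svn-..." suffix) is left out of the version. */
static size_t
version_prefix_len(const char *version)
{
   assert(version != NULL);
   return strspn(version, "0123456789.");
}
```

For the string quoted above, 3.6.0git-svn-r215564-cd35a3b3, the prefix is the first five characters, i.e. 3.6.0. A plain release string such as 3.5 is kept whole.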
Re: [Mesa-dev] [PATCH 2/2] vl/compositor: set the scissor before clearing the render target
Series is Reviewed-by: Ilia Mirkin imir...@alum.mit.edu

On Thu, Aug 14, 2014 at 5:59 AM, Christian König deathsim...@vodafone.de wrote:
> From: Christian König christian.koe...@amd.com
>
> Otherwise we clear areas that shouldn't be cleared.
>
> Signed-off-by: Christian König christian.koe...@amd.com
> ---
>  src/gallium/auxiliary/vl/vl_compositor.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/auxiliary/vl/vl_compositor.c b/src/gallium/auxiliary/vl/vl_compositor.c
> index 839fd27..6bd1a88 100644
> --- a/src/gallium/auxiliary/vl/vl_compositor.c
> +++ b/src/gallium/auxiliary/vl/vl_compositor.c
> @@ -1060,6 +1060,7 @@ vl_compositor_render(struct vl_compositor_state *s,
>        s->scissor.maxx = dst_surface->width;
>        s->scissor.maxy = dst_surface->height;
>     }
> +   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);
>
>     gen_vertex_data(c, s, dirty_area);
>
> @@ -1072,7 +1073,6 @@ vl_compositor_render(struct vl_compositor_state *s,
>        dirty_area->x1 = dirty_area->y1 = MIN_DIRTY;
>     }
>
> -   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);
>     c->pipe->set_framebuffer_state(c->pipe, &c->fb_state);
>     c->pipe->bind_vs_state(c->pipe, c->vs);
>     c->pipe->set_vertex_buffers(c->pipe, 0, 1, &c->vertex_buf);
> --
> 1.9.1
Re: [Mesa-dev] [PATCH 0/5] Enable ARB_derivative_control for i965/Gen7+
Nice. Series is

Reviewed-by: Matt Turner matts...@gmail.com
Re: [Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2
On Thu, Aug 14, 2014 at 4:12 AM, Iago Toral Quiroga ito...@igalia.com wrote:
> From: Samuel Iglesias Gonsalvez sigles...@igalia.com
>
> Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
> ---

I'd squash the last two patches together.

I think it's likely we can go to GL 3.3 on Sandybridge, but we'd probably like to take a look at the piglit results first, so this patch that increases it to 3.2 seems fine.

Both of these (squashed together) are

Reviewed-by: Matt Turner matts...@gmail.com
Re: [Mesa-dev] [PATCH] i965/blorp_clear: Use memcpy instead of assignment to copy clear value
Reviewed-by: Matt Turner matts...@gmail.com
Re: [Mesa-dev] [PATCH] egl_dri2: fix EXT_image_dma_buf_import fds
On Thu, Aug 14, 2014 at 12:24 AM, Pekka Paalanen ppaala...@gmail.com wrote:
> On Wed, 13 Aug 2014 19:46:40 +0300
> Pohjolainen, Topi topi.pohjolai...@intel.com wrote:
>
> > On Fri, Aug 08, 2014 at 05:28:59PM +0300, Pekka Paalanen wrote:
> > > From: Pekka Paalanen pekka.paala...@collabora.co.uk
> > >
> > > The EGL_EXT_image_dma_buf_import specification was revised (according to its revision history) on Dec 5th, 2013, for EGL to not take ownership of the file descriptors.
> > >
> > > Do not close the file descriptors passed in to eglCreateImageKHR with EGL_LINUX_DMA_BUF_EXT target.
> > >
> > > It is assumed, that the drivers, which ultimately process the file descriptors, do not close or modify them in any way either. This avoids the need to dup(), as it seems we would only need to just close the dup'd file descriptors right after.
> > >
> > > Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk
> >
> > I wrote the current logic based on the older version, and at least to me this is the right thing to do. Thanks for fixing it as well as taking care of the piglit test.
> >
> > Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
> >
> > I would be happier though if someone else gave his/her approval as well.
>
> Thank you, I have added your R-b, and will wait some more. I think I want the piglit patch landed first before I try to push this, anyway.
>
> Thanks for the piglit review too, I sent a new version with your R-b and the comment fix.

The plan is to make the 10.3 branch tomorrow, so don't wait too long. :)
[Mesa-dev] [PATCH] i965/gen8: Allow 16k viewport when blitting stencil
From: Topi Pohjolainen topi.pohjolai...@gmail.com

Fixes gles3 conformance tests:

  framebuffer_blit_functionality_negative_height_blit
  framebuffer_blit_functionality_negative_width_blit
  framebuffer_blit_functionality_negative_dimensions_blit
  framebuffer_blit_functionality_magnifying_blit
  framebuffer_blit_functionality_multisampled_to_singlesampled_blit

Signed-off-by: Topi Pohjolainen topi.pohjolai...@gmail.com
---
 src/mesa/drivers/dri/i965/gen8_viewport_state.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
index 9c89532..eda9aad 100644
--- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
@@ -94,6 +94,13 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw)
       float gbx = maximum_guardband_extent / ctx->ViewportArray[i].Width;
       float gby = maximum_guardband_extent / ctx->ViewportArray[i].Height;
 
+      /**
+       * Stencil blits require W-tiled to be treated as Y-tiled needing in
+       * turn width to be programmed twice the original.
+       */
+      if (brw->meta_in_progress)
+         gbx *= 2;
+
       /* _NEW_VIEWPORT: Guardband Clipping */
       vp[8]  = -gbx; /* x-min */
       vp[9]  =  gbx; /* x-max */
-- 
1.9.1
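The arithmetic in this patch is small enough to sketch on its own. The function below is a stand-alone illustration, not the driver code: `guardband_x` is a made-up name, the maximum extent is passed in as a parameter because its value is not shown in the hunk, and `meta_in_progress` simply mirrors the flag the patch tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Compute the X guardband extent relative to the viewport width.
 * When a meta operation (such as the stencil blit) is in progress the
 * extent is doubled, matching the "gbx *= 2" in the patch. */
static float
guardband_x(float maximum_guardband_extent, float viewport_width,
            bool meta_in_progress)
{
   assert(viewport_width > 0.0f);

   float gbx = maximum_guardband_extent / viewport_width;
   if (meta_in_progress)
      gbx *= 2;
   return gbx;
}
```

With an assumed extent of 32768 and a 16384-wide viewport this gives 2.0 normally and 4.0 while a meta operation is in progress; the x-min and x-max entries are then -gbx and gbx as in the hunk.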
Re: [Mesa-dev] [PATCH v2 4/5] mesa: add ARB_texture_barrier support
Any chance this can get reviewed before the 10.3 cutoff tomorrow? I copied one of the existing nv_texture_barrier piglits and made use of glTextureBarrier() instead, and it still passed.

On Mon, Aug 11, 2014 at 4:01 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

This extension is identical to NV_texture_barrier. Alias glTextureBarrier to the existing glTextureBarrierNV and use the existing NV_texture_barrier extension bit.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
v1 -> v2:
 - Add the actual extension string
 - Remove separate (and missing dlist bits) TextureBarrier implementation in favor of aliasing approach.

 src/mapi/glapi/gen/ARB_texture_barrier.xml | 13 +
 src/mapi/glapi/gen/Makefile.am | 1 +
 src/mapi/glapi/gen/gl_API.xml | 4
 src/mesa/main/extensions.c | 1 +
 4 files changed, 19 insertions(+)
 create mode 100644 src/mapi/glapi/gen/ARB_texture_barrier.xml

diff --git a/src/mapi/glapi/gen/ARB_texture_barrier.xml b/src/mapi/glapi/gen/ARB_texture_barrier.xml
new file mode 100644
index 000..7119732
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_texture_barrier.xml
@@ -0,0 +1,13 @@
+<?xml version="1.0"?>
+<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
+
+<!-- Note: no GLX protocol info yet. -->
+
+
+<OpenGLAPI>
+
+<category name="GL_ARB_texture_barrier" number="167">
+<function name="TextureBarrier" alias="TextureBarrierNV" />
+</category>
+
+</OpenGLAPI>
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 212731f..2cc2752 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -144,6 +144,7 @@ API_XML = \
 	ARB_shader_atomic_counters.xml \
 	ARB_shader_image_load_store.xml \
 	ARB_sync.xml \
+	ARB_texture_barrier.xml \
 	ARB_texture_buffer_object.xml \
 	ARB_texture_buffer_range.xml \
 	ARB_texture_compression_rgtc.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index e011509..ccf3b9a 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8364,6 +8364,10 @@

 <xi:include href="ARB_multi_bind.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

+<!-- ARB extensions 148 - 166 -->
+
+<xi:include href="ARB_texture_barrier.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <!-- Non-ARB extensions sorted by extension number. -->

 <category name="GL_EXT_blend_color" number="2">
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 9ac8377..311f6ce 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -151,6 +151,7 @@ static const struct extension extension_table[] = {
    { "GL_ARB_shadow", o(ARB_shadow), GLL, 2001 },
    { "GL_ARB_stencil_texturing", o(ARB_stencil_texturing), GL, 2012 },
    { "GL_ARB_sync", o(ARB_sync), GL, 2003 },
+   { "GL_ARB_texture_barrier", o(NV_texture_barrier), GL, 2014 },
    { "GL_ARB_texture_border_clamp", o(ARB_texture_border_clamp), GLL, 2000 },
    { "GL_ARB_texture_buffer_object", o(ARB_texture_buffer_object), GLC, 2008 },
    { "GL_ARB_texture_buffer_object_rgb32", o(ARB_texture_buffer_object_rgb32), GLC, 2009 },
--
1.8.5.5
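For readers unfamiliar with the "reuse the existing extension bit" approach in the patch above, the following is a simplified C sketch of how two extension strings can be backed by a single enable flag. The names ext_bits, ext_entry, ext_table and ext_supported are hypothetical stand-ins for illustration only; Mesa's real table carries more fields (API mask, year) and uses its own types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down analogue of a per-context extension flag struct. */
struct ext_bits {
   bool ARB_sync;
   bool NV_texture_barrier;
};

/* Each table entry maps an extension string to the offset of its flag. */
struct ext_entry {
   const char *name;
   size_t offset;
};

#define o(x) offsetof(struct ext_bits, x)

/* Two strings share one flag: enabling the NV bit advertises both names. */
static const struct ext_entry ext_table[] = {
   { "GL_ARB_sync",            o(ARB_sync) },
   { "GL_ARB_texture_barrier", o(NV_texture_barrier) },
   { "GL_NV_texture_barrier",  o(NV_texture_barrier) },
};

/* Look up an extension by name and report whether its flag is set. */
static bool
ext_supported(const struct ext_bits *bits, const char *name)
{
   for (size_t i = 0; i < sizeof(ext_table) / sizeof(ext_table[0]); i++) {
      if (strcmp(ext_table[i].name, name) == 0)
         return *(const bool *)((const char *)bits + ext_table[i].offset);
   }
   return false; /* unknown extension string */
}
```

The point of the sketch is that a driver only needs to set one flag; no separate ARB_texture_barrier field or entry point implementation is required, which is what the aliasing in the XML achieves on the dispatch side.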
Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support
On 14.08.2014 16:39, Ilia Mirkin wrote:

I guess a question is whether we should even bother with the fine version at all then? Just map everything to DDX/DDY... Although I guess if llvmpipe does the coarse version sometimes, at least the fine version is warranted.

I think it's nice to have both versions. llvmpipe only does the coarse version for its internal use. If a shader does a ddx and ddy and then uses the values for a texture instruction with explicit derivatives, a slower sampling path is used, one which can handle different mip levels within a quad (though this currently depends a lot on debug vars such as no_quad_lod). The problem is that even if you did a coarse_ddx, we would still fall back to that slower path anyway, because (unlike intel hw, where what really matters is whether the actual lod values differ) we won't detect that there is in fact just one lod per quad, so right now there would not really be a benefit. Obviously, if you do the derivative calculations as part of the sampling itself, this is not a problem.

FWIW the slow path isn't actually all THAT much more complicated than the per-quad lod path: strides, mip image offsets etc. need to be looked up per pixel rather than per quad, plus some slowness comes from the fact that stupid sse/avx doesn't have a true vector shift (only avx2 does)... There's also the fact that the tex filter may differ per pixel too (with different min/mag filters), though since we do texture sampling for multiple quads at once (in some cases at least, with avx), this needs to be handled in any case. I suspect hw being slower with different effective lods per pixel has similar reasons: there's just more work to be done.

Roland

On Thu, Aug 14, 2014 at 10:12 AM, Roland Scheidegger srol...@vmware.com wrote:

Reviewed-by: Roland Scheidegger srol...@vmware.com

llvmpipe also already does the fine version.
A coarse version (which we indeed do when derivatives are used implicitly for sampling, though with some other changes) might be minimally simpler, though I'm not even sure (it might save a shuffle instruction somewhere), but it's probably not worth it (plus, d3d10 sm4 had deriv_rtx and sm5 had deriv_rtx_coarse/deriv_rtx_fine, but the sm4 versions correspond to the fine versions, so this was required).

Roland

On 14.08.2014 06:52, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 src/gallium/auxiliary/tgsi/tgsi_info.c | 3 +++
 src/gallium/auxiliary/tgsi/tgsi_util.c | 2 ++
 src/gallium/docs/source/screen.rst | 2 ++
 src/gallium/docs/source/tgsi.rst | 12 ++--
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c | 1 +
 src/gallium/drivers/ilo/ilo_screen.c | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
 src/gallium/drivers/r300/r300_screen.c | 1 +
 src/gallium/drivers/r600/r600_pipe.c | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c | 1 +
 src/gallium/drivers/softpipe/sp_screen.c | 1 +
 src/gallium/drivers/svga/svga_screen.c | 1 +
 src/gallium/drivers/vc4/vc4_screen.c | 1 +
 src/gallium/include/pipe/p_defines.h | 1 +
 src/gallium/include/pipe/p_shader_tokens.h | 5 -
 19 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
index e24348f..35f9747 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -235,6 +235,9 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
    { 1, 1, 0, 0, 0, 0, OTHR, "INTERP_CENTROID", TGSI_OPCODE_INTERP_CENTROID },
    { 1, 2, 0, 0, 0, 0, OTHR, "INTERP_SAMPLE", TGSI_OPCODE_INTERP_SAMPLE },
    { 1, 2, 0, 0, 0, 0, OTHR, "INTERP_OFFSET", TGSI_OPCODE_INTERP_OFFSET },
+
+   { 1, 1, 0, 0, 0, 0, COMP, "DDX_FINE", TGSI_OPCODE_DDX_FINE },
+   { 1, 1, 0, 0, 0, 0, COMP, "DDY_FINE", TGSI_OPCODE_DDY_FINE },
 };

 const struct tgsi_opcode_info *
diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c
index e48159c..e1cba95 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_util.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
@@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct tgsi_full_instruction *inst,
    case TGSI_OPCODE_USNE:
    case TGSI_OPCODE_IMUL_HI:
    case TGSI_OPCODE_UMUL_HI:
+   case TGSI_OPCODE_DDX_FINE:
+   case TGSI_OPCODE_DDY_FINE:
       /* Channel-wise operations */
       read_mask = write_mask;
       break;
diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index 814e3ae..6fecc15 100644
---