[Mesa-dev] [Bug 84145] UE4: Realistic Rendering Demo render blue
https://bugs.freedesktop.org/show_bug.cgi?id=84145 Ilia Mirkin imir...@alum.mit.edu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #21 from Ilia Mirkin imir...@alum.mit.edu --- I've pushed this out. Thanks for bisecting and testing! commit 9d2e298dd4159651323cac54dbc43527e7fd6d16 Author: Ilia Mirkin imir...@alum.mit.edu Date: Wed Sep 24 00:58:07 2014 -0400 mesa/st: NumLayers is only valid for array textures -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 84355] New: texture2DProjLod and textureCubeLod are not supported when using GLES.
https://bugs.freedesktop.org/show_bug.cgi?id=84355

          Priority: medium
            Bug ID: 84355
          Assignee: mesa-dev@lists.freedesktop.org
           Summary: texture2DProjLod and textureCubeLod are not supported when using GLES.
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: kondapallykalyancontrib...@gmail.com
          Hardware: Other
            Status: NEW
           Version: 10.2
         Component: Mesa core
           Product: Mesa

Created attachment 106901
  --> https://bugs.freedesktop.org/attachment.cgi?id=106901&action=edit
patch.

According to the GLES spec (i.e. 1.0 and above), textureCubeLod and texture2DProjLod are built-in functions. We seem to disable support for these functions with GLES. The following WebGL conformance test fails when running the Chromium web browser with Wayland (https://github.com/01org/ozone-wayland).

Test case:
https://www.khronos.org/registry/webgl/sdk/tests/conformance/glsl/samplers/glsl-function-texture2dprojlod.html

Attached is a patch which fixes this.
Re: [Mesa-dev] [PATCH 2/4] i965: Issue performance warnings on MapBufferRange stalls.
On Friday, August 29, 2014 11:10:48 PM Kenneth Graunke wrote:
> This is easy: we just need to use brw_map_bo instead of mapping it
> directly.
>
> Signed-off-by: Kenneth Graunke kenn...@whitecape.org
> ---
>  src/mesa/drivers/dri/i965/intel_buffer_objects.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> index 96dacde..fb806dc 100644
> --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> @@ -421,8 +421,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>                                 intel_obj->map_extra[index], alignment);
>        if (brw->has_llc) {
> -         drm_intel_bo_map(intel_obj->range_map_bo[index],
> -                          (access & GL_MAP_WRITE_BIT) != 0);
> +         brw_bo_map(brw, intel_obj->range_map_bo[index],
> +                    (access & GL_MAP_WRITE_BIT) != 0, "range-map");
>        } else {
>           drm_intel_gem_bo_map_gtt(intel_obj->range_map_bo[index]);
>        }
> @@ -438,7 +438,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>        drm_intel_gem_bo_map_gtt(intel_obj->buffer);
>        intel_bufferobj_mark_inactive(intel_obj);
>     } else {
> -      drm_intel_bo_map(intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0);
> +      brw_bo_map(brw, intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0,
> +                 "MapBufferRange");
>        intel_bufferobj_mark_inactive(intel_obj);
>     }

It's been a month and patches 2-4 haven't received any review. Could someone take a look?

Thanks,
--Ken
[Mesa-dev] [Bug 84355] texture2DProjLod and textureCubeLod are not supported when using GLES.
https://bugs.freedesktop.org/show_bug.cgi?id=84355 kalyank kondapallykalyancontrib...@gmail.com changed: What|Removed |Added Hardware|Other |x86 (IA32) -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 84355] texture2DProjLod and textureCubeLod are not supported when using GLES.
https://bugs.freedesktop.org/show_bug.cgi?id=84355 Kenneth Graunke kenn...@whitecape.org changed: What|Removed |Added Assignee|mesa-dev@lists.freedesktop. |i...@freedesktop.org |org | QA Contact||intel-3d-bugs@lists.freedes ||ktop.org Component|Mesa core |glsl-compiler --- Comment #1 from Kenneth Graunke kenn...@whitecape.org --- Hi Kalyan, Please use git-send-email to send patches to mesa-dev. Thanks! -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] configure.ac: Compute LLVM_VERSION_PATCH using llvm-config
On Thu, Sep 25, 2014 at 12:55:40PM -0700, Tom Stellard wrote:
> This is the only guaranteed way to get the patch level for llvm, since
> the define cannot always be found in config.h depending on the version
> of llvm or the build system used.
>
> CC: mesa-sta...@lists.freedesktop.org

Reviewed-by: Jonathan Gray j...@jsg.id.au

> ---
>  configure.ac | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index bad1528..a097a5c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1704,11 +1704,10 @@ if test "x$enable_gallium_llvm" = xyes; then
>          AC_COMPUTE_INT([LLVM_VERSION_MINOR], [LLVM_VERSION_MINOR],
>              [#include "${LLVM_INCLUDEDIR}/llvm/Config/llvm-config.h"])
>
> -        dnl In LLVM 3.4.1 patch level was defined in config.h and not
> -        dnl llvm-config.h
> -        AC_COMPUTE_INT([LLVM_VERSION_PATCH], [LLVM_VERSION_PATCH],
> -            [#include "${LLVM_INCLUDEDIR}/llvm/Config/config.h"],
> -            LLVM_VERSION_PATCH=0) dnl Default if LLVM_VERSION_PATCH not found
> +        LLVM_VERSION_PATCH=`echo $LLVM_VERSION | cut -d. -f3 | egrep -o '^[[0-9]]+'`
> +        if test -z "$LLVM_VERSION_PATCH"; then
> +            LLVM_VERSION_PATCH=0
> +        fi
>
>          if test -n "${LLVM_VERSION_MAJOR}"; then
>              LLVM_VERSION_INT="${LLVM_VERSION_MAJOR}0${LLVM_VERSION_MINOR}"
> --
> 1.8.3.1
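For readers who do not read shell pipelines easily, the extraction that the new configure.ac lines perform (take the third dot-separated field of the version string, keep its leading digits, fall back to 0) can be sketched in C roughly as follows. The function name llvm_patch_level is hypothetical and exists only for this illustration; it is not part of Mesa, LLVM, or Autoconf, and it only approximates the behaviour of cut and egrep for well-formed version strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): return the patch level, i.e.
 * the leading digits of the third dot-separated field of a version
 * string, or 0 when that field is missing or does not start with a digit.
 */
static int llvm_patch_level(const char *version)
{
   assert(version != NULL);

   /* Skip past the second '.', roughly what `cut -d. -f3` selects. */
   int dots = 0;
   const char *p = version;
   while (*p != '\0' && dots < 2) {
      if (*p == '.')
         dots++;
      p++;
   }
   if (dots < 2)
      return 0; /* no third field: same default as the patch */

   /* Keep only a leading run of digits, like egrep -o '^[0-9]+'. */
   int patch = 0;
   while (isdigit((unsigned char)*p)) {
      patch = patch * 10 + (*p - '0');
      p++;
   }
   return patch;
}
```

With this sketch, "3.4.1" yields 1, "3.5" yields 0, and a suffixed string such as "3.4.2svn" yields 2, which matches the intent described in the commit message.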
Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL
How about assuming for each CS that it can use the compute ring and as soon as we submit a PM4 command that can only be executed on the graphics ring note that this CS needs to be executed on the graphics ring? Just an idea, Christian. Am 25.09.2014 um 21:02 schrieb Tom Stellard: On Mon, Sep 22, 2014 at 09:48:43PM +0200, Marek Olšák wrote: No, we cannot detect compute-only contexts yet. We need to add a new parameter to pipe_context::context_create which says that a context is compute-only. That should be OpenCL but not OpenGL. Also, some code paths like resource_copy_region use the graphics engine for copying, which cannot be used with compute rings and must be implemented with either DMA or compute-based blits. DMA isn't flexible enough, so some additional work for compute-based blits might be needed. We can also use the graphics ring for copying only and the compute ring for compute stuff. If possible, I think I would prefer continuing to use the graphic ring for blits and only submit compute specific packets to the compute ring. I'm a little concerned that adding a compute-flag to context create might make it harder to share code between compute and graphics, which I think is important. What are the downsides of using both rings at once? Will we need to add synchronization code for the two rings? I think the last time I looked into doing this, the biggest problem was that fences were submitted via the graphics ring even though they were meant for jobs on the compute ring. Is there are good solution to this? -Tom Marek On Mon, Sep 22, 2014 at 8:03 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote: On Monday 22 September 2014, 12:16:13, Alex Deucher wrote: On Sat, Sep 20, 2014 at 6:11 AM, Marek Olšák mar...@gmail.com wrote: From: Marek Olšák marek.ol...@amd.com Looks good. Tom should probably take a look as well. 
As a further improvement, it would be nice to be able to use the compute rings for compute rather than gfx, but I'm not sure how much additional effort it would take to clean that up. This is completely untested but now that we can detect compute contexts something like the attached patches might be sufficient... Reviewed-by: Alex Deucher alexander.deuc...@amd.com --- src/gallium/drivers/radeonsi/si_compute.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 4b2662d..3ad9182 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -168,6 +168,7 @@ static void si_launch_grid( uint32_t pc, const void *input) { struct si_context *sctx = (struct si_context*)ctx; + struct radeon_winsys_cs *cs = sctx-b.rings.gfx.cs; struct si_compute *program = sctx-cs_shader_state.program; struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state); struct r600_resource *input_buffer = program-input_buffer; @@ -184,8 +185,11 @@ static void si_launch_grid( unsigned lds_blocks; unsigned num_waves_for_scratch; + radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0) | PKT3_SHADER_TYPE_S(1)); + radeon_emit(cs, 0x8000); + radeon_emit(cs, 0x8000); + pm4-compute_pkt = true; - si_cmd_context_control(pm4); si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE); si_pm4_cmd_add(pm4, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH) | -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
Re: [Mesa-dev] [PATCH 2/6] st/va: skeleton VAAPI state tracker
Hi Leo,

On 25/09/14 15:21, Liu, Leo wrote:
> Hi Gwenole and Emil,
>
[...]
> the reason for $(LIBVA_LIBS) is for xcb lib, from configure.ac
>
> +PKG_CHECK_MODULES([LIBVA], [libva >= 0.35.0 x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
>
> I will separate them, and remove libva for link.

I had completely forgotten that the patch that splits them out did not land. The easiest/shortest thing you can do is (based on vdpau)

PKG_CHECK_MODULES([LIBVA], [libva >= 0.35.0 x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED],
                  [LIBVA_LIBS=`$PKG_CONFIG --libs "x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED"`])

Pardon for the bashing,
Emil

> Thanks,
> Leo
[Mesa-dev] [PATCH v2] glsl: Optimize min/max expression trees
Original patch by Petri Latvala petri.latv...@intel.com:

Add an optimization pass that drops min/max expression operands that can be proven to not contribute to the final result. The algorithm is similar to alpha-beta pruning on a minimax search, from the field of AI.

This optimization pass can optimize min/max expressions where operands are min/max expressions. Such code can appear in shaders by itself, or as the result of clamp() or AMD_shader_trinary_minmax functions.

This optimization pass improves the generated code for piglit's AMD_shader_trinary_minmax tests as follows:

total instructions in shared programs: 75 -> 67 (-10.67%)
instructions in affected programs:     60 -> 52 (-13.33%)
GAINED: 0
LOST:   0

All tests (max3, min3, mid3) improved.

A full shader-db run:

total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
instructions in affected programs:     1188 -> 1160 (-2.36%)
GAINED: 0
LOST:   0

Improvements happen in Guacamelee and Serious Sam 3. One shader from Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of dropping of a (constant float (0.0)) operand, which was compiled to a saturate modifier.

Version 2 by Iago Toral Quiroga ito...@igalia.com:

Changes from review feedback:
- Squashed various cosmetic changes sent by Matt Turner.
- Make less_all_components return an enum rather than setting a class member. (Suggested by Matt Turner). Also, renamed it to compare_components.
- Make less_all_components, smaller_constant and larger_constant static. (Suggested by Matt Turner)
- Change minmax_range to call its limits low and high instead of range[0] and range[1]. (Suggested by Connor Abbott).
- Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by Connor Abbott).
- Make the logic clearer by rearranging the code and adding comments. (Suggested by Connor Abbott).
- Added comment to explain why we need to recurse twice. (Suggested by Connor Abbott).
- If we cannot prune an expression, do not return early. Instead, attempt to prune its children. (Suggested by Connor Abbott).

Other changes:
- Instead of having a global valid visitor member, let the various functions that can determine this status return a boolean and check its value to decide what to do in each case. This is more flexible and allows recursing into children of parents that could not be pruned due to invalid ranges (so it is related to the last bullet in the review feedback).
- Make sure we always check if a range is valid before working with it. Since any use of get_range, combine_range or range_intersection can invalidate a range, we should check for this situation every time we use any of these functions.

No piglit regressions observed with Version 2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
---

Version 2 also passes all unit tests sent by Petri in the original series.

 src/glsl/Makefile.sources       |   1 +
 src/glsl/glsl_parser_extras.cpp |   1 +
 src/glsl/ir_optimization.h      |   1 +
 src/glsl/opt_minmax.cpp         | 457
 4 files changed, 460 insertions(+)
 create mode 100644 src/glsl/opt_minmax.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index cb8d5a6..1c08697 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -95,6 +95,7 @@ LIBGLSL_FILES = \
    $(GLSL_SRCDIR)/opt_flip_matrices.cpp \
    $(GLSL_SRCDIR)/opt_function_inlining.cpp \
    $(GLSL_SRCDIR)/opt_if_simplification.cpp \
+   $(GLSL_SRCDIR)/opt_minmax.cpp \
    $(GLSL_SRCDIR)/opt_noop_swizzle.cpp \
    $(GLSL_SRCDIR)/opt_rebalance_tree.cpp \
    $(GLSL_SRCDIR)/opt_redundant_jumps.cpp \
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 490c3c8..ae19ce4 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1586,6 +1586,7 @@ do_common_optimization(exec_list *ir, bool linked,
    else
       progress = do_constant_variable_unlinked(ir) || progress;
    progress = do_constant_folding(ir) || progress;
+   progress = do_minmax_prune(ir) || progress;
    progress = do_cse(ir) || progress;
    progress = do_rebalance_tree(ir) || progress;
    progress = do_algebraic(ir, native_integers, options) || progress;
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index 369dcd1..8fbd992 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -99,6 +99,7 @@ bool opt_flatten_nested_if_blocks(exec_list *instructions);
 bool do_discard_simplification(exec_list *instructions);
 bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 0);
 bool do_mat_op_to_vec(exec_list *instructions);
+bool do_minmax_prune(exec_list *instructions);
 bool do_noop_swizzle(exec_list *instructions);
 bool do_structure_splitting(exec_list
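
To make the idea behind the pass concrete, here is a much-simplified sketch in C of range-based pruning on scalar min/max trees. It is not the code in opt_minmax.cpp: the types and names (struct expr, expr_range, prune_minmax) are hypothetical, it handles only scalar float constants, and it compares sibling ranges bottom-up instead of propagating a bounding range downwards as alpha-beta pruning does. It only shows why an operand whose range can never win the comparison can be dropped.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical, simplified expression tree (illustration only). */
enum kind { K_CONST, K_VAR, K_MIN, K_MAX };

struct expr {
   enum kind kind;
   float value;        /* used by K_CONST only */
   struct expr *a, *b; /* used by K_MIN and K_MAX only */
};

struct range { float low, high; };

static float lesser(float x, float y)  { return x < y ? x : y; }
static float greater(float x, float y) { return x > y ? x : y; }

/* Conservative range of values an expression can take. */
static struct range expr_range(const struct expr *e)
{
   struct range r = { -INFINITY, INFINITY }; /* unknown variable */
   struct range ra, rb;

   switch (e->kind) {
   case K_CONST:
      r.low = r.high = e->value;
      break;
   case K_VAR:
      break;
   case K_MIN:
      ra = expr_range(e->a);
      rb = expr_range(e->b);
      r.low = lesser(ra.low, rb.low);
      r.high = lesser(ra.high, rb.high);
      break;
   case K_MAX:
      ra = expr_range(e->a);
      rb = expr_range(e->b);
      r.low = greater(ra.low, rb.low);
      r.high = greater(ra.high, rb.high);
      break;
   }
   return r;
}

/* Drop operands that provably never contribute to the result. */
static struct expr *prune_minmax(struct expr *e)
{
   assert(e != NULL);
   if (e->kind != K_MIN && e->kind != K_MAX)
      return e;

   /* Simplify the children first, then compare their ranges. */
   e->a = prune_minmax(e->a);
   e->b = prune_minmax(e->b);
   struct range ra = expr_range(e->a);
   struct range rb = expr_range(e->b);

   if (e->kind == K_MIN) {
      if (ra.high <= rb.low) return e->a; /* a is never larger than b */
      if (rb.high <= ra.low) return e->b; /* b is never larger than a */
   } else {
      if (ra.low >= rb.high) return e->a; /* a is never smaller than b */
      if (rb.low >= ra.high) return e->b; /* b is never smaller than a */
   }
   return e;
}
```

For example, min(max(x, 0.5), 0.2) collapses to the constant 0.2 because max(x, 0.5) is at least 0.5, while min(x, 1.0) is left alone because nothing is known about x.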
Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL
On Thu, Sep 25, 2014 at 3:02 PM, Tom Stellard t...@stellard.net wrote: On Mon, Sep 22, 2014 at 09:48:43PM +0200, Marek Olšák wrote: No, we cannot detect compute-only contexts yet. We need to add a new parameter to pipe_context::context_create which says that a context is compute-only. That should be OpenCL but not OpenGL. Also, some code paths like resource_copy_region use the graphics engine for copying, which cannot be used with compute rings and must be implemented with either DMA or compute-based blits. DMA isn't flexible enough, so some additional work for compute-based blits might be needed. We can also use the graphics ring for copying only and the compute ring for compute stuff. If possible, I think I would prefer continuing to use the graphic ring for blits and only submit compute specific packets to the compute ring. I'm a little concerned that adding a compute-flag to context create might make it harder to share code between compute and graphics, which I think is important. What are the downsides of using both rings at once? Will we need to add synchronization code for the two rings? I think the last time I looked into doing this, the biggest problem was that fences were submitted via the graphics ring even though they were meant for jobs on the compute ring. Is there are good solution to this? It would be nice to not have any dependencies on the gfx ring. That way compute jobs can run on the compute rings without requiring the gfx ring which should avoid any latency issues with desktop gfx jobs. Alex -Tom Marek On Mon, Sep 22, 2014 at 8:03 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote: On Monday 22 September 2014, 12:16:13, Alex Deucher wrote: On Sat, Sep 20, 2014 at 6:11 AM, Marek Olšák mar...@gmail.com wrote: From: Marek Olšák marek.ol...@amd.com Looks good. Tom should probably take a look as well. 
As a further improvement, it would be nice to be able to use the compute rings for compute rather than gfx, but I'm not sure how much additional effort it would take to clean that up. This is completely untested but now that we can detect compute contexts something like the attached patches might be sufficient... Reviewed-by: Alex Deucher alexander.deuc...@amd.com --- src/gallium/drivers/radeonsi/si_compute.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 4b2662d..3ad9182 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -168,6 +168,7 @@ static void si_launch_grid( uint32_t pc, const void *input) { struct si_context *sctx = (struct si_context*)ctx; + struct radeon_winsys_cs *cs = sctx-b.rings.gfx.cs; struct si_compute *program = sctx-cs_shader_state.program; struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state); struct r600_resource *input_buffer = program-input_buffer; @@ -184,8 +185,11 @@ static void si_launch_grid( unsigned lds_blocks; unsigned num_waves_for_scratch; + radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0) | PKT3_SHADER_TYPE_S(1)); + radeon_emit(cs, 0x8000); + radeon_emit(cs, 0x8000); + pm4-compute_pkt = true; - si_cmd_context_control(pm4); si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE); si_pm4_cmd_add(pm4, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH) | -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
Re: [Mesa-dev] Mesa (master): glsl: Make sure fields after small structs have correct padding
Okay... I screwed up this morning. I pushed a set of four patches without adding Jordan's Reviewed-by. Realizing the error, I quickly added the R-b to each commit and force-pushed the changes. If you pushed something in the intervening 2 minutes, it got lost.

On 09/26/2014 08:00 AM, Ian Romanick wrote:
> Module: Mesa
> Branch: master
> Commit: 8e01c66da6c780601f941aa5b9939962c219fdbd
> URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8e01c66da6c780601f941aa5b9939962c219fdbd
>
> Author: Ian Romanick ian.d.roman...@intel.com
> Date:   Mon Sep 8 12:23:39 2014 -0700
>
> glsl: Make sure fields after small structs have correct padding
>
> Previously the linker would correctly calculate the layout, but the
> lower_ubo_reference pass would not apply correct alignment to fields
> following small (less than 16-byte) nested structures.
>
> Signed-off-by: Ian Romanick ian.d.roman...@intel.com
> Reviewed-by: Jordan Justen jordan.l.jus...@intel.com
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83533
> Cc: mesa-sta...@lists.freedesktop.org
>
> ---
>
>  src/glsl/lower_ubo_reference.cpp | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
> index 3cdfc04..4ae1aac 100644
> --- a/src/glsl/lower_ubo_reference.cpp
> +++ b/src/glsl/lower_ubo_reference.cpp
> @@ -327,6 +327,15 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
>           const glsl_type *struct_type = deref_record->record->type;
>           unsigned intra_struct_offset = 0;
>
> +         /* glsl_type::std140_base_alignment doesn't grok interfaces.  Use
> +          * 16-bytes for the alignment because that is the general minimum of
> +          * std140.
> +          */
> +         const unsigned struct_alignment = struct_type->is_interface()
> +            ? 16
> +            : struct_type->std140_base_alignment(row_major);
> +
> +
>           for (unsigned int i = 0; i < struct_type->length; i++) {
>              const glsl_type *type = struct_type->fields.structure[i].type;
>
> @@ -346,6 +355,19 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
>                         deref_record->field) == 0)
>                 break;
>              intra_struct_offset += type->std140_size(field_row_major);
> +
> +            /* If the field just examined was itself a structure, apply rule
> +             * #9:
> +             *
> +             * "The structure may have padding at the end; the base offset
> +             * of the member following the sub-structure is rounded up to
> +             * the next multiple of the base alignment of the structure."
> +             */
> +            if (type->without_array()->is_record()) {
> +               intra_struct_offset = glsl_align(intra_struct_offset,
> +                                                struct_alignment);
> +
> +            }
>           }
>
>           const_offset += intra_struct_offset;
>
> ___
> mesa-commit mailing list
> mesa-com...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-commit
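
The effect of the added rule #9 handling is easiest to see with numbers. The following C sketch walks a list of fields the way the loop in the patch does and rounds the running offset up after a nested structure. The struct field type, the align_up and member_offset names, and the sizes used in the example are all made up for illustration; they are not Mesa's glsl_type API, and align_up only assumes that glsl_align rounds up to the next multiple of the alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field description (illustration only). */
struct field {
   unsigned align; /* base alignment of the field, in bytes */
   unsigned size;  /* size of the field, in bytes */
   int is_struct;  /* non-zero if the field is a nested structure */
};

/* Round value up to the next multiple of alignment. */
static unsigned align_up(unsigned value, unsigned alignment)
{
   assert(alignment != 0);
   return (value + alignment - 1) / alignment * alignment;
}

/* Offset of fields[target] inside its parent, applying rule #9 after
 * every nested structure that precedes it.
 */
static unsigned member_offset(const struct field *fields, size_t count,
                              size_t target, unsigned struct_alignment)
{
   assert(fields != NULL && target < count);

   unsigned offset = 0;
   for (size_t i = 0; i < count; i++) {
      offset = align_up(offset, fields[i].align);
      if (i == target)
         break;
      offset += fields[i].size;
      /* Rule #9: pad after a sub-structure before placing the next member. */
      if (fields[i].is_struct)
         offset = align_up(offset, struct_alignment);
   }
   return offset;
}
```

Assuming a nested structure reported as 4 bytes with 16-byte alignment followed by a float, the float lands at offset 16 with the padding step and at offset 4 without it, which is the mismatch with the linker's layout that the commit message describes.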
Re: [Mesa-dev] [PATCH 2/4] i965: Issue performance warnings on MapBufferRange stalls.
On Fri, Aug 29, 2014 at 11:10:48PM -0700, Kenneth Graunke wrote:
> This is easy: we just need to use brw_map_bo instead of mapping it
> directly.
>
> Signed-off-by: Kenneth Graunke kenn...@whitecape.org

Reviewed-by: Kristian Høgsberg k...@bitplanet.net

> ---
>  src/mesa/drivers/dri/i965/intel_buffer_objects.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> index 96dacde..fb806dc 100644
> --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> @@ -421,8 +421,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>                                 intel_obj->map_extra[index], alignment);
>        if (brw->has_llc) {
> -         drm_intel_bo_map(intel_obj->range_map_bo[index],
> -                          (access & GL_MAP_WRITE_BIT) != 0);
> +         brw_bo_map(brw, intel_obj->range_map_bo[index],
> +                    (access & GL_MAP_WRITE_BIT) != 0, "range-map");
>        } else {
>           drm_intel_gem_bo_map_gtt(intel_obj->range_map_bo[index]);
>        }
> @@ -438,7 +438,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>        drm_intel_gem_bo_map_gtt(intel_obj->buffer);
>        intel_bufferobj_mark_inactive(intel_obj);
>     } else {
> -      drm_intel_bo_map(intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0);
> +      brw_bo_map(brw, intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0,
> +                 "MapBufferRange");
>        intel_bufferobj_mark_inactive(intel_obj);
>     }
> --
> 2.1.0
Re: [Mesa-dev] [PATCH 3/4] i965: Issue performance warnings for program cache related stalls.
On Fri, Aug 29, 2014 at 11:10:49PM -0700, Kenneth Graunke wrote: We don't really want extra buffer copying or stalls when mapping, so it'd be nice to know when it's happening. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Reviewed-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_state_cache.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c index b0986ea..b9bb0fc 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -175,7 +175,7 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size) /* Copy any existing data that needs to be saved. */ if (cache-next_offset != 0) { - drm_intel_bo_map(cache-bo, false); + brw_bo_map(brw, cache-bo, false, program cache); drm_intel_bo_subdata(new_bo, 0, cache-next_offset, cache-bo-virtual); drm_intel_bo_unmap(cache-bo); } @@ -200,6 +200,7 @@ brw_try_upload_using_copy(struct brw_cache *cache, const void *data, const void *aux) { + struct brw_context *brw = cache-brw; int i; struct brw_cache_item *item; @@ -221,7 +222,7 @@ brw_try_upload_using_copy(struct brw_cache *cache, continue; } - drm_intel_bo_map(cache-bo, false); + brw_bo_map(brw, cache-bo, false, program cache); ret = memcmp(cache-bo-virtual + item-offset, data, item-size); drm_intel_bo_unmap(cache-bo); if (ret) @@ -241,6 +242,8 @@ brw_upload_item_data(struct brw_cache *cache, struct brw_cache_item *item, const void *data) { + struct brw_context *brw = cache-brw; + /* Allocate space in the cache BO for our new program. */ if (cache-next_offset + item-size cache-bo-size) { uint32_t new_size = cache-bo-size * 2; @@ -255,6 +258,7 @@ brw_upload_item_data(struct brw_cache *cache, * recreate it. 
*/ if (cache-bo_used_by_gpu) { + perf_debug(Copying busy program cache buffer.\n); brw_cache_new_bo(cache, cache-bo-size); } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] i965: Issue performance warnings for program cache related stalls.
On Fri, Sep 26, 2014 at 08:36:39AM -0700, Kristian Høgsberg wrote:
> On Fri, Aug 29, 2014 at 11:10:49PM -0700, Kenneth Graunke wrote:
> > We don't really want extra buffer copying or stalls when mapping, so
> > it'd be nice to know when it's happening.
> >
> > Signed-off-by: Kenneth Graunke kenn...@whitecape.org
>
> Reviewed-by: Kristian Høgsberg k...@bitplanet.net

This warns if the program cache is currently being read by the GPU (which is expected), but a read-read access (as used here) does not incur a stall.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
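
Chris's point can be illustrated with a small hazard check in C. This is only a model of the reasoning in his reply, under the assumption that a synchronized CPU mapping has to wait for outstanding GPU writes in every case, and for outstanding GPU reads only when the CPU intends to write. The struct bo_usage type and the map_would_stall function are hypothetical and are not libdrm or i965 interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of how the GPU is currently using a buffer. */
struct bo_usage {
   bool gpu_reading;
   bool gpu_writing;
};

/* Would a synchronized CPU mapping have to wait for the GPU? */
static bool map_would_stall(const struct bo_usage *bo, bool write_access)
{
   assert(bo != NULL);

   /* The GPU's writes must land before the CPU reads or writes. */
   if (bo->gpu_writing)
      return true;

   /* GPU reads only conflict with CPU writes; read-read is harmless. */
   return write_access && bo->gpu_reading;
}
```

Under this model, mapping the program cache read-only while the GPU is merely reading it reports no stall, so a warning keyed on "buffer is busy" alone would be a false positive in that case.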
[Mesa-dev] [PATCH] glsl: improve accuracy of atan()
Our current atan()-approximation is pretty inaccurate at 1.0, so let's try to improve the situation by doing a direct approximation without going through asin.

This new implementation uses an 11th degree polynomial to approximate atan in the [-1..1] range, and the following identity to reduce the entire range to [-1..1]:

atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)

This range-reduction idea is taken from the paper "Fast computation of Arctangent Functions for Embedded Applications: A Comparative Analysis" (Ukil et al. 2011).

The polynomial that approximates atan(x) is:

x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444

This polynomial was found with the following GNU Octave script:

x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)

The polyfitc function is not built-in, but too long to include here. It can be downloaded from the following URL:

http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m

This fixes the following piglit test:
shaders/glsl-const-folding-01

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/builtin_functions.cpp | 65 +++---
 1 file changed, 55 insertions(+), 10 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 9be7f6d..c126b60 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -442,6 +442,7 @@ private:
    ir_swizzle *matrix_elt(ir_variable *var, int col, int row);

    ir_expression *asin_expr(ir_variable *x);
+   void do_atan(ir_factory &body, const glsl_type *type, ir_variable *res, operand y_over_x);

    /**
     * Call function \param f with parameters specified as the linked
@@ -2684,11 +2685,7 @@ builtin_builder::_atan2(const glsl_type *type)
       ir_factory outer_then(&outer_if->then_instructions, mem_ctx);

       /* Then...call atan(y/x) */
-      ir_variable *y_over_x = outer_then.make_temp(glsl_type::float_type, "y_over_x");
-      outer_then.emit(assign(y_over_x, div(y, x)));
-      outer_then.emit(assign(r, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x),
-                                                      imm(1.0f))))));
-      outer_then.emit(assign(r, asin_expr(r)));
+      do_atan(body, glsl_type::float_type, r, div(y, x));

       /* ...and fix it up: */
       ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f)));
@@ -2711,17 +2708,65 @@ builtin_builder::_atan2(const glsl_type *type)
    return sig;
 }

+void
+builtin_builder::do_atan(ir_factory &body, const glsl_type *type, ir_variable *res, operand y_over_x)
+{
+   /*
+    * range-reduction, first step:
+    *
+    *      / y_over_x         if |y_over_x| <= 1.0;
+    * x = <
+    *      \ 1.0 / y_over_x   otherwise
+    */
+   ir_variable *x = body.make_temp(type, "atan_x");
+   body.emit(assign(x, div(min2(abs(y_over_x),
+                                imm(1.0f)),
+                           max2(abs(y_over_x),
+                                imm(1.0f)))));
+
+   /*
+    * approximate atan by evaluating polynomial:
+    *
+    * x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
+    * x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
+    * x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
+    */
+   ir_variable *tmp = body.make_temp(type, "atan_tmp");
+   body.emit(assign(tmp, mul(x, x)));
+   body.emit(assign(tmp, mul(add(mul(sub(mul(add(mul(sub(mul(add(mul(imm(-0.0121323213173444f),
+                                                                     tmp),
+                                                                 imm(0.0536813784310406f)),
+                                                             tmp),
+                                                         imm(0.1173503194786851f)),
+                                                     tmp),
+                                                 imm(0.1938924977115610f)),
+                                             tmp),
+                                         imm(0.3326756418091246f)),
+                                     tmp),
+                                 imm(0.9999793128310355f)),
+                             x)));
+
+   /* range-reduction fixup */
+   body.emit(assign(tmp, add(tmp,
+                             mul(b2f(greater(abs(y_over_x),
+                                             imm(1.0f, type->components()))),
+                                 add(mul(tmp,
+                                         imm(-2.0f)),
+                                     imm(M_PI_2f))))));
+
+   /* sign fixup */
+   body.emit(assign(res, mul(tmp, sign(y_over_x))));
+}
+
 ir_function_signature *
 builtin_builder::_atan(const glsl_type *type)
 {
    ir_variable *y_over_x = in_var(type, "y_over_x");
    MAKE_SIG(type, always_available, 1,
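
The IR-builder calls above are hard to read, so here is the same computation written as an ordinary scalar C function: range reduction to [0, 1], the odd polynomial evaluated in Horner form on x^2, the fixup for inputs with magnitude above 1, and the sign fixup. The function name approx_atan is hypothetical. The leading coefficient is taken to be 0.9999793128310355; that is an assumption on my part based on the fact that atan(x) is close to x near zero and that the six coefficients then sum to roughly pi/4 at x = 1.

```c
#include <assert.h>

/* Scalar sketch of the approximation (illustration only, not Mesa code). */
static float approx_atan(float v)
{
   assert(v == v); /* NaN input is out of scope for this sketch */

   const float half_pi = 1.57079632679489661923f;
   const float a = v < 0.0f ? -v : v;

   /* Range reduction: x = min(|v|, 1) / max(|v|, 1), so x is in [0, 1]. */
   const float x = (a < 1.0f ? a : 1.0f) / (a > 1.0f ? a : 1.0f);

   /* Odd 11th-degree polynomial, evaluated in Horner form on t = x^2. */
   const float t = x * x;
   float p = -0.0121323213173444f;
   p = p * t + 0.0536813784310406f;
   p = p * t - 0.1173503194786851f;
   p = p * t + 0.1938924977115610f;
   p = p * t - 0.3326756418091246f;
   p = p * t + 0.9999793128310355f;
   p = p * x;

   /* Range-reduction fixup: atan(a) = pi/2 - atan(1/a) for a > 1. */
   if (a > 1.0f)
      p = half_pi - p;

   /* Sign fixup: atan is an odd function. */
   return v < 0.0f ? -p : p;
}
```

The patch computes the fixup without a branch, as tmp + b2f(|y_over_x| > 1) * (-2 * tmp + pi/2), which is the same value as the if statement in this sketch.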
Re: [Mesa-dev] [PATCH 4/4] i965: Use unsynchronized maps for the program cache on LLC platforms.
On Fri, Aug 29, 2014 at 11:10:50PM -0700, Kenneth Graunke wrote:
> There's no reason to stall on pwrite - the CPU always appends to the
> buffer and never modifies existing contents, and the GPU never writes it.
> Further, the CPU always appends new data before submitting a batch that
> requires it.
>
> This code predates the unsynchronized mapping feature, so we simply
> didn't have the option when it was written.
>
> Ideally, we would do this for non-LLC platforms too, but unsynchronized
> mapping support only exists for LLC systems.
>
> Saves repeated 0.001ms stalls on program upload.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_state_cache.c | 34 +++--
>  1 file changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
> index b9bb0fc..1d2d32f 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> @@ -172,14 +172,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
>     drm_intel_bo *new_bo;
>
>     new_bo = drm_intel_bo_alloc(brw->bufmgr, "program cache", new_size, 64);
> +   if (brw->has_llc)
> +      drm_intel_gem_bo_map_unsynchronized(new_bo);
>
>     /* Copy any existing data that needs to be saved. */
>     if (cache->next_offset != 0) {
> -      brw_bo_map(brw, cache->bo, false, "program cache");
> -      drm_intel_bo_subdata(new_bo, 0, cache->next_offset, cache->bo->virtual);
> -      drm_intel_bo_unmap(cache->bo);
> +      if (brw->has_llc) {
> +         memcpy(new_bo->virtual, cache->bo->virtual, cache->next_offset);

Move the drm_intel_gem_bo_map_unsynchronized() and drm_intel_bo_unmap()
calls into this block so they bracket the memcpy as for the subdata case
below?

Other than that,

Reviewed-by: Kristian Høgsberg <k...@bitplanet.net>

> +      } else {
> +         brw_bo_map(brw, cache->bo, false, "program cache");
> +         drm_intel_bo_subdata(new_bo, 0, cache->next_offset,
> +                              cache->bo->virtual);
> +         drm_intel_bo_unmap(cache->bo);
> +      }
>     }
>
> +   if (brw->has_llc)
> +      drm_intel_bo_unmap(cache->bo);
>     drm_intel_bo_unreference(cache->bo);
>     cache->bo = new_bo;
>     cache->bo_used_by_gpu = false;
> @@ -222,9 +231,11 @@ brw_try_upload_using_copy(struct brw_cache *cache,
>           continue;
>        }
>
> -      brw_bo_map(brw, cache->bo, false, "program cache");
> +      if (!brw->has_llc)
> +         brw_bo_map(brw, cache->bo, false, "program cache");
>        ret = memcmp(cache->bo->virtual + item->offset, data, item->size);
> -      drm_intel_bo_unmap(cache->bo);
> +      if (!brw->has_llc)
> +         drm_intel_bo_unmap(cache->bo);
>        if (ret)
>           continue;
>
> @@ -257,7 +268,7 @@ brw_upload_item_data(struct brw_cache *cache,
>     /* If we would block on writing to an in-use program BO, just
>      * recreate it.
>      */
> -   if (cache->bo_used_by_gpu) {
> +   if (!brw->has_llc && cache->bo_used_by_gpu) {
>        perf_debug("Copying busy program cache buffer.\n");
>        brw_cache_new_bo(cache, cache->bo->size);
>     }
> @@ -280,6 +291,7 @@ brw_upload_cache(struct brw_cache *cache,
>                   uint32_t *out_offset,
>                   void *out_aux)
>  {
> +   struct brw_context *brw = cache->brw;
>     struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
>     GLuint hash;
>     void *tmp;
> @@ -320,7 +332,11 @@ brw_upload_cache(struct brw_cache *cache,
>     cache->n_items++;
>
>     /* Copy data to the buffer */
> -   drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
> +   if (brw->has_llc) {
> +      memcpy((char *) cache->bo->virtual + item->offset, data, data_size);
> +   } else {
> +      drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
> +   }
>
>     *out_offset = item->offset;
>     *(void **)out_aux = (void *)((char *)item->key + item->key_size);
> @@ -342,6 +358,8 @@ brw_init_caches(struct brw_context *brw)
>     cache->bo = drm_intel_bo_alloc(brw->bufmgr,
>                                    "program cache",
>                                    4096, 64);
> +   if (brw->has_llc)
> +      drm_intel_gem_bo_map_unsynchronized(cache->bo);
>
>     cache->aux_compare[BRW_VS_PROG] = brw_vs_prog_data_compare;
>     cache->aux_compare[BRW_GS_PROG] = brw_gs_prog_data_compare;
> @@ -408,6 +426,8 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache *cache)
>
>     DBG("%s\n", __FUNCTION__);
>
> +   if (brw->has_llc)
> +      drm_intel_bo_unmap(cache->bo);
>     drm_intel_bo_unreference(cache->bo);
>     cache->bo = NULL;
>     brw_clear_cache(brw, cache);
> --
> 2.1.0
Re: [Mesa-dev] [PATCH 2/4] i965: Issue performance warnings on MapBufferRange stalls.
On Fri, Sep 26, 2014 at 12:38 AM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On Friday, August 29, 2014 11:10:48 PM Kenneth Graunke wrote:
>> This is easy: we just need to use brw_bo_map instead of mapping it
>> directly.
>>
>> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
>> ---
>>  src/mesa/drivers/dri/i965/intel_buffer_objects.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
>> index 96dacde..fb806dc 100644
>> --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
>> +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
>> @@ -421,8 +421,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>>                                    intel_obj->map_extra[index],
>>                                    alignment);
>>        if (brw->has_llc) {
>> -         drm_intel_bo_map(intel_obj->range_map_bo[index],
>> -                          (access & GL_MAP_WRITE_BIT) != 0);
>> +         brw_bo_map(brw, intel_obj->range_map_bo[index],
>> +                    (access & GL_MAP_WRITE_BIT) != 0, "range-map");
>>        } else {
>>           drm_intel_gem_bo_map_gtt(intel_obj->range_map_bo[index]);
>>        }
>> @@ -438,7 +438,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>>        drm_intel_gem_bo_map_gtt(intel_obj->buffer);
>>        intel_bufferobj_mark_inactive(intel_obj);
>>     } else {
>> -      drm_intel_bo_map(intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0);
>> +      brw_bo_map(brw, intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0,
>> +                 "MapBufferRange");
>>        intel_bufferobj_mark_inactive(intel_obj);
>>     }
>
> It's been a month and patches 2-4 haven't received any review. Could
> someone take a look?

Sorry, I saw them go by but didn't review. I like 4/4 a lot.

Kristian

> Thanks,
> --Ken
Re: [Mesa-dev] [PATCH v2] glsl: Optimize min/max expression trees
On Fri, Sep 26, 2014 at 9:02 AM, Iago Toral Quiroga <ito...@igalia.com> wrote:
> Original patch by Petri Latvala <petri.latv...@intel.com>:
>
> Add an optimization pass that drops min/max expression operands that
> can be proven to not contribute to the final result. The algorithm is
> similar to alpha-beta pruning on a minimax search, from the field of AI.
>
> This optimization pass can optimize min/max expressions where operands
> are min/max expressions. Such code can appear in shaders by itself, or
> as the result of clamp() or AMD_shader_trinary_minmax functions.
>
> This optimization pass improves the generated code for piglit's
> AMD_shader_trinary_minmax tests as follows:
>
> total instructions in shared programs: 75 -> 67 (-10.67%)
> instructions in affected programs:     60 -> 52 (-13.33%)
> GAINED:                                0
> LOST:                                  0
>
> All tests (max3, min3, mid3) improved.
>
> A full shader-db run:
>
> total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
> instructions in affected programs:     1188 -> 1160 (-2.36%)
> GAINED:                                0
> LOST:                                  0
>
> Improvements happen in Guacamelee and Serious Sam 3. One shader from
> Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
> dropping of a (constant float (0.0)) operand, which was compiled to a
> saturate modifier.
>
> Version 2 by Iago Toral Quiroga <ito...@igalia.com>:
>
> Changes from review feedback:
> - Squashed various cosmetic changes sent by Matt Turner.
> - Make less_all_components return an enum rather than setting a class
>   member. (Suggested by Matt Turner). Also, renamed it to
>   compare_components.
> - Make less_all_components, smaller_constant and larger_constant static.
>   (Suggested by Matt Turner)
> - Change minmax_range to call its limits low and high instead of
>   range[0] and range[1]. (Suggested by Connor Abbott).
> - Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
>   Connor Abbott).
> - Make the logic clearer by rearranging the code and commenting.
>   (Suggested by Connor Abbott).
> - Added comment to explain why we need to recurse twice. (Suggested by
>   Connor Abbott).
> - If we cannot prune an expression, do not return early. Instead, attempt
>   to prune its children. (Suggested by Connor Abbott).
>
> Other changes:
> - Instead of having a global valid visitor member, let the various
>   functions that can determine this status return a boolean and check its
>   value to decide what to do in each case. This is more flexible and
>   allows recursing into children of parents that could not be pruned due
>   to invalid ranges (so related to the last bullet in the review
>   feedback).
> - Make sure we always check if a range is valid before working with it.
>   Since any use of get_range, combine_range or range_intersection can
>   invalidate a range, we should check for this situation every time we
>   use any of these functions.
>
> No piglit regressions observed with Version 2.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
> ---
> Version 2 also passes all unit tests sent by Petri in the original series.
>
>  src/glsl/Makefile.sources       |   1 +
>  src/glsl/glsl_parser_extras.cpp |   1 +
>  src/glsl/ir_optimization.h      |   1 +
>  src/glsl/opt_minmax.cpp         | 457
>  4 files changed, 460 insertions(+)
>  create mode 100644 src/glsl/opt_minmax.cpp
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index cb8d5a6..1c08697 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -95,6 +95,7 @@ LIBGLSL_FILES = \
>         $(GLSL_SRCDIR)/opt_flip_matrices.cpp \
>         $(GLSL_SRCDIR)/opt_function_inlining.cpp \
>         $(GLSL_SRCDIR)/opt_if_simplification.cpp \
> +       $(GLSL_SRCDIR)/opt_minmax.cpp \
>         $(GLSL_SRCDIR)/opt_noop_swizzle.cpp \
>         $(GLSL_SRCDIR)/opt_rebalance_tree.cpp \
>         $(GLSL_SRCDIR)/opt_redundant_jumps.cpp \
> diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
> index 490c3c8..ae19ce4 100644
> --- a/src/glsl/glsl_parser_extras.cpp
> +++ b/src/glsl/glsl_parser_extras.cpp
> @@ -1586,6 +1586,7 @@ do_common_optimization(exec_list *ir, bool linked,
>     else
>        progress = do_constant_variable_unlinked(ir) || progress;
>     progress = do_constant_folding(ir) || progress;
> +   progress = do_minmax_prune(ir) || progress;
>     progress = do_cse(ir) || progress;
>     progress = do_rebalance_tree(ir) || progress;
>     progress = do_algebraic(ir, native_integers, options) || progress;
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index 369dcd1..8fbd992 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -99,6 +99,7 @@ bool opt_flatten_nested_if_blocks(exec_list *instructions);
>  bool do_discard_simplification(exec_list *instructions);
>  bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 0);
>  bool
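The pruning idea in the commit message can be illustrated with a small standalone sketch. This is not the code in opt_minmax.cpp: it works on a toy scalar expression tree, and every name in it (Expr, Range, range_of, prune, simplify, show) is hypothetical. It tracks a [low, high] range per operand plus a "base" range inherited from enclosing min/max nodes, and drops an operand once its range shows it cannot affect the result.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    enum Kind { Var, Const, Min, Max };
    Kind kind = Var;
    std::string name;    // Var only
    float value = 0.0f;  // Const only
    ExprPtr a, b;        // Min/Max only
};

struct Range { float low, high; };
static const float kInf = std::numeric_limits<float>::infinity();

ExprPtr var(const std::string &name) {
    auto e = std::make_shared<Expr>(); e->name = name; return e;
}
ExprPtr cst(float v) {
    auto e = std::make_shared<Expr>(); e->kind = Expr::Const; e->value = v; return e;
}
ExprPtr node(Expr::Kind k, ExprPtr a, ExprPtr b) {
    auto e = std::make_shared<Expr>();
    e->kind = k; e->a = std::move(a); e->b = std::move(b);
    return e;
}

// Conservative value range of an expression; variables are unbounded.
Range range_of(const ExprPtr &e) {
    if (e->kind == Expr::Const) return {e->value, e->value};
    if (e->kind == Expr::Var) return {-kInf, kInf};
    Range x = range_of(e->a), y = range_of(e->b);
    if (e->kind == Expr::Min)
        return {std::min(x.low, y.low), std::min(x.high, y.high)};
    return {std::max(x.low, y.low), std::max(x.high, y.high)};
}

// `base` is the range outside of which the enclosing expression no longer
// cares about this node's exact value.
ExprPtr prune(const ExprPtr &e, Range base) {
    if ((e->kind != Expr::Min && e->kind != Expr::Max) || base.low > base.high)
        return e;
    const bool is_max = e->kind == Expr::Max;
    const Range ra = range_of(e->a), rb = range_of(e->b);

    // An operand of max() is useless if it can never exceed the other
    // operand or the lower bound from the context; mirrored for min().
    auto useless = [&](Range self, Range other) {
        return is_max ? self.high <= std::max(other.low, base.low)
                      : self.low >= std::min(other.high, base.high);
    };
    if (useless(ra, rb)) return prune(e->b, base);
    if (useless(rb, ra)) return prune(e->a, base);

    auto narrowed = [&](Range other) {
        Range r = base;
        if (is_max) r.low = std::max(r.low, other.low);
        else        r.high = std::min(r.high, other.high);
        return r;
    };
    // Prune one child first, then use the range of the pruned result when
    // narrowing the context for the other child.
    ExprPtr a = prune(e->a, narrowed(rb));
    ExprPtr b = prune(e->b, narrowed(range_of(a)));
    auto out = std::make_shared<Expr>(*e);
    out->a = a; out->b = b;
    return out;
}

ExprPtr simplify(const ExprPtr &e) { return prune(e, Range{-kInf, kInf}); }

std::string show(const ExprPtr &e) {
    std::ostringstream os;
    if (e->kind == Expr::Var) os << e->name;
    else if (e->kind == Expr::Const) os << e->value;
    else os << (e->kind == Expr::Min ? "min(" : "max(")
            << show(e->a) << ", " << show(e->b) << ")";
    return os.str();
}
```

With these helpers, max(max(x, 1), 2) simplifies to max(x, 2), min(max(x, 4), 1) collapses to the constant 1, and a clamp-style min(max(x, 0), 1) is left alone. Vector operands, swizzles and invalid ranges, which the review feedback above discusses, are deliberately left out of this sketch.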
[Mesa-dev] [PATCH v2 40/41] i965/fs: Use the GRF for FB writes on gen >= 7
On gen 7, the MRF was removed and we gained the ability to do send
instructions directly from the GRF. This commit enables that functionality
for FB writes.

v2: Make handling of components more sane.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp         |   4 +
 src/mesa/drivers/dri/i965/brw_fs.h           |   1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 167 +--
 src/mesa/drivers/dri/i965/brw_shader.cpp     |   1 +
 4 files changed, 136 insertions(+), 37 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b43032b..143b590 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -514,6 +514,8 @@ fs_inst::is_send_from_grf() const
       return true;
    case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
       return src[1].file == GRF;
+   case FS_OPCODE_FB_WRITE:
+      return src[0].file == GRF;
    default:
       if (is_tex())
          return src[0].file == GRF;
@@ -917,6 +919,8 @@ fs_inst::regs_read(fs_visitor *v, int arg) const
 {
    if (is_tex() && arg == 0 && src[0].file == GRF) {
       return mlen;
+   } else if (opcode == FS_OPCODE_FB_WRITE && arg == 0) {
+      return mlen;
    } else if (opcode == SHADER_OPCODE_UNTYPED_ATOMIC && arg == 0) {
       return mlen;
    } else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_READ && arg == 0) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 7500e8e..a91bf9f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -521,6 +521,7 @@ public:
                  fs_reg dst, fs_reg src0, fs_reg src1, fs_reg one);
 
    void emit_color_write(fs_reg color, int index, int first_color_mrf);
+   int setup_color_payload(fs_reg *dst, fs_reg color, unsigned components);
    void emit_alpha_test();
    fs_inst *emit_single_fb_write(fs_reg color1, fs_reg color2,
                                  fs_reg src0_alpha, unsigned components);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8e38315..e72fb62 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -3005,6 +3005,82 @@ fs_visitor::emit_color_write(fs_reg color, int index, int first_color_mrf)
    }
 }
 
+int
+fs_visitor::setup_color_payload(fs_reg *dst, fs_reg color, unsigned components)
+{
+   fs_inst *inst;
+
+   if (color.file == BAD_FILE) {
+      return 4 * (dispatch_width / 8);
+   }
+
+   uint8_t colors_enabled;
+   if (components == 0) {
+      /* We want to write one component to the alpha channel */
+      colors_enabled = 0x8;
+   } else {
+      /* Enable the first components-many channels */
+      colors_enabled = (1 << components) - 1;
+   }
+
+   if (dispatch_width == 8 || brw->gen >= 6) {
+      /* SIMD8 write looks like:
+       * m + 0: r0
+       * m + 1: r1
+       * m + 2: g0
+       * m + 3: g1
+       *
+       * gen6 SIMD16 DP write looks like:
+       * m + 0: r0
+       * m + 1: r1
+       * m + 2: g0
+       * m + 3: g1
+       * m + 4: b0
+       * m + 5: b1
+       * m + 6: a0
+       * m + 7: a1
+       */
+      int len = 0;
+      for (unsigned i = 0; i < 4; ++i) {
+         if (colors_enabled & (1 << i)) {
+            dst[len] = fs_reg(GRF, virtual_grf_alloc(color.width / 8),
+                              color.type, color.width);
+            inst = emit(MOV(dst[len], offset(color, i)));
+            inst->saturate = key->clamp_fragment_color;
+         } else if (color.width == 16) {
+            /* We need two BAD_FILE slots for a 16-wide color */
+            len++;
+         }
+         len++;
+      }
+      return len;
+   } else {
+      /* pre-gen6 SIMD16 single source DP write looks like:
+       * m + 0: r0
+       * m + 1: g0
+       * m + 2: b0
+       * m + 3: a0
+       * m + 4: r1
+       * m + 5: g1
+       * m + 6: b1
+       * m + 7: a1
+       */
+      for (unsigned i = 0; i < 4; ++i) {
+         if (colors_enabled & (1 << i)) {
+            dst[i] = fs_reg(GRF, virtual_grf_alloc(1), color.type);
+            inst = emit(MOV(dst[i], half(offset(color, i), 0)));
+            inst->saturate = key->clamp_fragment_color;
+
+            dst[i + 4] = fs_reg(GRF, virtual_grf_alloc(1), color.type);
+            inst = emit(MOV(dst[i + 4], half(offset(color, i), 1)));
+            inst->saturate = key->clamp_fragment_color;
+            inst->force_sechalf = true;
+         }
+      }
+      return 8;
+   }
+}
+
 static enum brw_conditional_mod
 cond_for_alpha_func(GLenum func)
 {
@@ -3063,12 +3139,13 @@ fs_visitor::emit_single_fb_write(fs_reg color0, fs_reg color1,
 {
    this->current_annotation = "FB write header";
    bool header_present = true;
+   int reg_size = dispatch_width / 8;
+
    /* We can potentially have a message length of up to 15, so we have to set
     * base_mrf to either 0 or 1 in order to fit in m0..m15.
     */
-   int base_mrf = 1;
-   int nr = base_mrf;
-
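The channel mask and slot counting in setup_color_payload() are easy to check in isolation. The following standalone sketch mirrors only the SIMD8 / gen6+ branch of the patch; the function names colors_enabled_mask and payload_slots are hypothetical, and register allocation and MOV emission are left out.

```cpp
#include <cassert>
#include <cstdint>

// Bit i set means component i (r, g, b, a) is copied into the payload.
uint8_t colors_enabled_mask(unsigned components)
{
    assert(components <= 4);
    if (components == 0)
        return 0x8;  // write a single component to the alpha channel
    return static_cast<uint8_t>((1u << components) - 1u);
}

// Number of payload slots used by the SIMD8 / gen6+ layout: every component
// takes one slot, and a disabled component of a 16-wide color takes a second
// placeholder slot.
int payload_slots(unsigned components, unsigned color_width)
{
    const uint8_t mask = colors_enabled_mask(components);
    int len = 0;
    for (unsigned i = 0; i < 4; ++i) {
        if (!(mask & (1u << i)) && color_width == 16)
            len++;
        len++;
    }
    return len;
}
```

For example, a two-component 16-wide color yields mask 0x3 and six slots: one each for r and g, two placeholders each for b and a.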
[Mesa-dev] [PATCH 42/41] i965: Fix widths on gen5 math instructions.
This commit uses a 16-wide MRF instead of a hardware register when setting
up math instructions and properly sets the base_mrf on the second emitted
instruction.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp           | 2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 143b590..af9736b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1648,7 +1648,7 @@ fs_visitor::emit_math(enum opcode opcode, fs_reg dst, fs_reg src0, fs_reg src1)
       fs_reg &op0 = is_int_div ? src1 : src0;
       fs_reg &op1 = is_int_div ? src0 : src1;
 
-      emit(BRW_OPCODE_MOV, fs_reg(MRF, base_mrf + 1, op1.type), op1);
+      emit(MOV(fs_reg(MRF, base_mrf + 1, op1.type, dispatch_width), op1));
       inst = emit(opcode, dst, op0, reg_null_f);
 
       inst->base_mrf = base_mrf;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 59c7e7c..485c050 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -346,7 +346,7 @@ fs_generator::generate_math_gen4(fs_inst *inst,
       brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
       gen4_math(p, firsthalf(dst),
                 op,
-                inst->base_mrf + 1, firsthalf(src),
+                inst->base_mrf, firsthalf(src),
                 BRW_MATH_DATA_VECTOR,
                 BRW_MATH_PRECISION_FULL);
       brw_set_default_compression_control(p, BRW_COMPRESSION_2NDHALF);
-- 
2.1.0
[Mesa-dev] [PATCH 06.5/41] SQUASH: i965/fs: Always 2-align registers SIMD16 for gen <= 5
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 61 ++-
 1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 567f8e2..8d96906 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -117,7 +117,21 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)
    /* Compute the total number of registers across all classes. */
    int ra_reg_count = 0;
    for (int i = 0; i < class_count; i++) {
-      ra_reg_count += base_reg_count - (class_sizes[i] - 1);
+      if (devinfo->gen <= 5 && reg_width == 2) {
+         /* From the GM5 PRM:
+          *
+          * In order to reduce the hardware complexity, the following
+          * rules and restrictions apply to the compressed instruction:
+          * ...
+          * * Operand Alignment Rule: With the exceptions listed below, a
+          *   source/destination operand in general should be aligned to
+          *   even 256-bit physical register with a region size equal to
+          *   two 256-bit physical register
+          */
+         ra_reg_count += (base_reg_count - (class_sizes[i] - 1)) / 2;
+      } else {
+         ra_reg_count += base_reg_count - (class_sizes[i] - 1);
+      }
    }
 
    uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
@@ -134,27 +148,48 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)
    int pairs_base_reg = 0;
    int pairs_reg_count = 0;
    for (int i = 0; i < class_count; i++) {
-      int class_reg_count = base_reg_count - (class_sizes[i] - 1);
+      int class_reg_count;
+      if (devinfo->gen <= 5 && reg_width == 2) {
+         class_reg_count = (base_reg_count - (class_sizes[i] - 1)) / 2;
+      } else {
+         class_reg_count = base_reg_count - (class_sizes[i] - 1);
+      }
       classes[i] = ra_alloc_reg_class(regs);
 
       /* Save this off for the aligned pair class at the end. */
       if (class_sizes[i] == 2) {
-	 pairs_base_reg = reg;
-	 pairs_reg_count = class_reg_count;
+         pairs_base_reg = reg;
+         pairs_reg_count = class_reg_count;
       }
 
-      for (int j = 0; j < class_reg_count; j++) {
-	 ra_class_add_reg(regs, classes[i], reg);
+      if (devinfo->gen <= 5 && reg_width == 2) {
+         for (int j = 0; j < class_reg_count; j++) {
+            ra_class_add_reg(regs, classes[i], reg);
 
-	 ra_reg_to_grf[reg] = j;
+            ra_reg_to_grf[reg] = j * 2;
 
-	 for (int base_reg = j;
-	      base_reg < j + class_sizes[i];
-	      base_reg++) {
-	    ra_add_transitive_reg_conflict(regs, base_reg, reg);
-	 }
+            for (int base_reg = j * 2;
+                 base_reg < j * 2 + class_sizes[i];
+                 base_reg++) {
+               ra_add_transitive_reg_conflict(regs, base_reg, reg);
+            }
 
-	 reg++;
+            reg++;
+         }
+      } else {
+         for (int j = 0; j < class_reg_count; j++) {
+            ra_class_add_reg(regs, classes[i], reg);
+
+            ra_reg_to_grf[reg] = j;
+
+            for (int base_reg = j;
+                 base_reg < j + class_sizes[i];
+                 base_reg++) {
+               ra_add_transitive_reg_conflict(regs, base_reg, reg);
+            }
+
+            reg++;
+         }
       }
    }
    assert(reg == ra_reg_count);
-- 
2.1.0
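The effect of the 2-alignment on register counts can be seen with a small standalone sketch. It only reproduces the arithmetic from the patch (the halved class_reg_count and the j * 2 mapping to the first GRF); the function names class_reg_count and first_grfs, and the two_aligned flag standing in for the gen <= 5 SIMD16 condition, are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Number of allocatable slots for a class of `class_size` contiguous
// registers out of `base_reg_count`; halved when slots must start on an
// even register.
int class_reg_count(int base_reg_count, int class_size, bool two_aligned)
{
    const int count = base_reg_count - (class_size - 1);
    return two_aligned ? count / 2 : count;
}

// First hardware register of every slot in one class, i.e. the values the
// patch stores in ra_reg_to_grf[] for that class.
std::vector<int> first_grfs(int base_reg_count, int class_size, bool two_aligned)
{
    std::vector<int> grf;
    const int n = class_reg_count(base_reg_count, class_size, two_aligned);
    for (int j = 0; j < n; ++j)
        grf.push_back(two_aligned ? j * 2 : j);
    return grf;
}
```

With 16 base registers and a class size of 2, the unaligned case has 15 slots starting at registers 0..14, while the aligned case has 7 slots starting at 0, 2, ..., 12.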
[Mesa-dev] [PATCH 10.5/41] SQUASH: i965/fs: Properly set writemasks in LOAD_PAYLOAD
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 56 +++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 444cc32..4d97594 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2865,10 +2865,44 @@ fs_visitor::lower_load_payload()
 {
    bool progress = false;
 
+   int vgrf_to_reg[virtual_grf_count];
+   int reg_count = 16; /* Leave room for MRF */
+   for (int i = 0; i < virtual_grf_count; ++i) {
+      vgrf_to_reg[i] = reg_count;
+      reg_count += virtual_grf_sizes[i];
+   }
+
+   struct {
+      bool written:1; /* Whether this register has ever been written */
+      bool force_writemask_all:1;
+      bool force_sechalf:1;
+   } metadata[reg_count];
+   memset(metadata, 0, sizeof(metadata));
+
    calculate_cfg();
 
    foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
+      int dst_reg;
+      if (inst->dst.file == MRF) {
+         dst_reg = inst->dst.reg;
+      } else if (inst->dst.file == GRF) {
+         dst_reg = vgrf_to_reg[inst->dst.reg];
+      }
+
+      if (inst->dst.file == MRF || inst->dst.file == GRF) {
+         bool force_sechalf = inst->force_sechalf;
+         bool toggle_sechalf = inst->dst.width == 16 &&
+                               type_sz(inst->dst.type) == 4;
+         for (int i = 0; i < inst->regs_written; ++i) {
+            metadata[dst_reg + i].written = true;
+            metadata[dst_reg + i].force_sechalf = force_sechalf;
+            metadata[dst_reg + i].force_writemask_all = inst->force_writemask_all;
+            force_sechalf = (toggle_sechalf != force_sechalf);
+         }
+      }
+
       if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
+         assert(inst->dst.file == MRF || inst->dst.file == GRF);
          fs_reg dst = inst->dst;
 
          for (int i = 0; i < inst->sources; i++) {
@@ -2879,7 +2913,27 @@ fs_visitor::lower_load_payload()
                /* Do nothing but otherwise increment as normal */
             } else {
                fs_inst *mov = MOV(dst, inst->src[i]);
-               mov->force_writemask_all = true;
+               if (inst->src[i].file == GRF) {
+                  int src_reg = vgrf_to_reg[inst->src[i].reg] +
+                                inst->src[i].reg_offset;
+                  mov->force_sechalf = metadata[src_reg].force_sechalf;
+                  mov->force_writemask_all = metadata[src_reg].force_writemask_all;
+                  metadata[dst_reg] = metadata[src_reg];
+                  if (dst.width * type_sz(dst.type) > 32) {
+                     assert((!metadata[src_reg].written ||
+                             !metadata[src_reg].force_sechalf) &&
+                            (!metadata[src_reg + 1].written ||
+                             metadata[src_reg + 1].force_sechalf));
+                     metadata[dst_reg + 1] = metadata[src_reg + 1];
+                  }
+               } else {
+                  metadata[dst_reg].force_writemask_all = false;
+                  metadata[dst_reg].force_sechalf = false;
+                  if (dst.width == 16) {
+                     metadata[dst_reg + 1].force_writemask_all = false;
+                     metadata[dst_reg + 1].force_sechalf = true;
+                  }
+               }
                inst->insert_before(block, mov);
             }
 
-- 
2.1.0
[Mesa-dev] [PATCH 14/12] i965/fs: Copy propagate partial reads.
This commit reworks copy propagation a bit to support propagating the
copying of partial registers. This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat the one component of the output to 8 or 16-wide. This allows us to
eliminate the copy and simply use the one component of the register.

Shader DB results:

total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs:     66112 -> 65603 (-0.77%)
GAINED:                                0
LOST:                                  0

Signed-off-by: Jason Ekstrand <jason.ekstr...@intel.com>
---
 src/mesa/drivers/dri/i965/brw_fs.h                 |  1 +
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 83 --
 2 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 50b5fc1..9b63114 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -337,6 +337,7 @@ public:
    bool opt_cse_local(bblock_t *block);
    bool opt_copy_propagate();
    bool try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry);
+   bool try_constant_propagate(fs_inst *inst, acp_entry *entry);
    bool opt_copy_propagate_local(void *mem_ctx, bblock_t *block,
                                  exec_list *acp);
    void opt_drop_redundant_mov_to_flags();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index e5816df..a97dc04 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -277,24 +277,30 @@ is_logic_op(enum opcode opcode)
 bool
 fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
 {
+   if (inst->src[arg].file != GRF)
+      return false;
+
    if (entry->src.file == IMM)
       return false;
+   assert(entry->src.file == GRF || entry->src.file == UNIFORM);
 
    if (entry->opcode == SHADER_OPCODE_LOAD_PAYLOAD &&
        inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD)
       return false;
 
-   /* Bail if inst is reading more than entry is writing. */
-   if ((inst->regs_read(this, arg) * inst->src[arg].stride *
-        type_sz(inst->src[arg].type)) > type_sz(entry->dst.type))
+   assert(entry->dst.file == GRF);
+   if (inst->src[arg].reg != entry->dst.reg)
       return false;
 
-   if (inst->src[arg].file != entry->dst.file ||
-       inst->src[arg].reg != entry->dst.reg ||
-       inst->src[arg].reg_offset != entry->dst.reg_offset ||
-       inst->src[arg].subreg_offset != entry->dst.subreg_offset) {
+   /* Bail if inst is reading a range that isn't contained in the range
+    * that entry is writing.
+    */
+   int reg_size = dispatch_width * sizeof(float);
+   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
+       (inst->src[arg].reg_offset * reg_size + inst->src[arg].subreg_offset +
+        inst->regs_read(this, arg) * inst->src[arg].stride * reg_size) >
+       (entry->dst.reg_offset + 1) * reg_size)
       return false;
-   }
 
    /* See resolve_ud_negate() and comment in brw_fs_emit.cpp. */
    if (inst->conditional_mod &&
@@ -361,11 +367,39 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
 
    inst->src[arg].file = entry->src.file;
    inst->src[arg].reg = entry->src.reg;
-   inst->src[arg].reg_offset = entry->src.reg_offset;
-   inst->src[arg].subreg_offset = entry->src.subreg_offset;
    inst->src[arg].stride *= entry->src.stride;
    inst->saturate = inst->saturate || entry->saturate;
 
+   switch (entry->src.file) {
+   case BAD_FILE:
+   case HW_REG:
+   case UNIFORM:
+      inst->src[arg].reg_offset = entry->src.reg_offset;
+      inst->src[arg].subreg_offset = entry->src.subreg_offset;
+      break;
+   case GRF:
+      {
+         /* In this case, we have to deal with mapping parts of vgrfs to
+          * other parts of vgrfs so we have to do some reg_offset magic.
+          */
+
+         /* Compute the offset of inst->src[arg] relative to inst->dst */
+         assert(entry->dst.subreg_offset == 0);
+         int rel_offset = inst->src[arg].reg_offset - entry->dst.reg_offset;
+         int rel_suboffset = inst->src[arg].subreg_offset;
+
+         /* Compute the final register offset (in bytes) */
+         int offset = entry->src.reg_offset * reg_size + entry->src.subreg_offset;
+         offset += rel_offset * reg_size + rel_suboffset;
+         inst->src[arg].reg_offset = offset / reg_size;
+         inst->src[arg].subreg_offset = offset % reg_size;
+      }
+      break;
+   default:
+      unreachable("Invalid register file");
+      break;
+   }
+
    if (!inst->src[arg].abs) {
       inst->src[arg].abs = entry->src.abs;
       inst->src[arg].negate ^= entry->src.negate;
@@ -375,9 +409,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
 }
 
 
-static bool
-try_constant_propagate(struct brw_context *brw, fs_inst *inst,
-
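The reg_offset arithmetic in the GRF case can be tried out with a standalone sketch. It copies only the offset computation from the hunk; the RegRef struct and the remap_partial_read function are hypothetical stand-ins for the relevant fs_reg fields, and the containment check that precedes this step in the patch is not repeated here.

```cpp
#include <cassert>

// Hypothetical stand-in for the reg_offset / subreg_offset pair of an fs_reg.
struct RegRef {
    int reg_offset;     // offset in whole registers
    int subreg_offset;  // additional offset in bytes
};

// Given a read `inst_src` of the copy's destination, return where the same
// data lives in the copy's source.
RegRef remap_partial_read(RegRef inst_src, RegRef entry_dst, RegRef entry_src,
                          int dispatch_width)
{
    const int reg_size = dispatch_width * static_cast<int>(sizeof(float));
    assert(entry_dst.subreg_offset == 0);
    assert(inst_src.reg_offset >= entry_dst.reg_offset);

    // Offset of the read relative to the start of the copy's destination.
    const int rel_offset = inst_src.reg_offset - entry_dst.reg_offset;
    const int rel_suboffset = inst_src.subreg_offset;

    // Final offset in bytes, split back into register and sub-register parts.
    int offset = entry_src.reg_offset * reg_size + entry_src.subreg_offset;
    offset += rel_offset * reg_size + rel_suboffset;
    return RegRef{offset / reg_size, offset % reg_size};
}
```

For an 8-wide dispatch (32-byte registers), a read at byte 20 of the destination, whose copy source starts at register 2 plus 16 bytes, lands at byte 100 of the source, i.e. register offset 3 and sub-register offset 4.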
[Mesa-dev] [PATCH 39.2/41] i965/fs: Handle COMPR4 in LOAD_PAYLOAD
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 15 +++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs.h   | 22 +++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 97b21e3..b43032b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2988,6 +2988,21 @@ fs_visitor::lower_load_payload()
 
             if (inst->src[i].file == BAD_FILE) {
                /* Do nothing but otherwise increment as normal */
+            } else if (dst.file == MRF &&
+                       dst.width == 8 &&
+                       brw->has_compr4 &&
+                       i + 4 < inst->sources &&
+                       inst->src[i + 4].equals(horiz_offset(inst->src[i], 8))) {
+               fs_reg compr4_dst = dst;
+               compr4_dst.reg += BRW_MRF_COMPR4;
+               compr4_dst.width = 16;
+               fs_reg compr4_src = inst->src[i];
+               compr4_src.width = 16;
+               fs_inst *mov = MOV(compr4_dst, compr4_src);
+               mov->force_writemask_all = true;
+               inst->insert_before(block, mov);
+               /* Mark i+4 as BAD_FILE so we don't emit a MOV for it */
+               inst->src[i + 4].file = BAD_FILE;
             } else {
                fs_inst *mov = MOV(dst, inst->src[i]);
                if (inst->src[i].file == GRF) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 14bbac2..7500e8e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -143,6 +143,26 @@ byte_offset(fs_reg reg, unsigned delta)
 }
 
 static inline fs_reg
+horiz_offset(fs_reg reg, unsigned delta)
+{
+   switch (reg.file) {
+   case BAD_FILE:
+   case UNIFORM:
+   case IMM:
+      /* These only have a single component that is implicitly splatted.  A
+       * horizontal offset should be a harmless no-op.
+       */
+      break;
+   case GRF:
+   case MRF:
+      return byte_offset(reg, delta * reg.stride * type_sz(reg.type));
+   default:
+      assert(delta == 0);
+   }
+   return reg;
+}
+
+static inline fs_reg
 offset(fs_reg reg, unsigned delta)
 {
    assert(reg.stride > 0);
@@ -183,7 +203,7 @@ half(fs_reg reg, unsigned idx)
    assert(idx == 0 || (reg.file != HW_REG && reg.file != IMM));
    assert(reg.width == 16);
    reg.width = 8;
-   return byte_offset(reg, 8 * idx * reg.stride * type_sz(reg.type));
+   return horiz_offset(reg, 8 * idx);
 }
 
 static const fs_reg reg_undef;
-- 
2.1.0
[Mesa-dev] [PATCH 45/63] i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*
Now that we have execution sizes, we can use that instead of the dispatch
width. This way it also works for 8-wide instructions in SIMD16.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp                  | 10 +++++-----
 src/mesa/drivers/dri/i965/brw_fs.h                    |  4 ++--
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 551bc2b..ffbfdbd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -354,7 +354,7 @@ fs_visitor::LOAD_PAYLOAD(const fs_reg &dst, fs_reg *src, int sources)
        * dealing with whole registers.  If this ever changes, we can deal
        * with it later.
        */
-      int size = src[i].effective_width(this) * type_sz(src[i].type);
+      int size = src[i].effective_width(inst) * type_sz(src[i].type);
       assert(size % 32 == 0);
       inst->regs_written += (size + 31) / 32;
    }
@@ -583,7 +583,7 @@ fs_reg::equals(const fs_reg &r) const
 }
 
 uint8_t
-fs_reg::effective_width(const fs_visitor *v) const
+fs_reg::effective_width(const fs_inst *inst) const
 {
    switch (this->file) {
    case BAD_FILE:
@@ -591,10 +591,10 @@ fs_reg::effective_width(const fs_visitor *v) const
    case UNIFORM:
    case IMM:
       assert(this->width == 1);
-      return v->dispatch_width;
+      return inst->exec_size;
    case GRF:
    case HW_REG:
-      assert(this->width > 1 && this->width <= v->dispatch_width);
+      assert(this->width > 1 && this->width <= inst->exec_size);
       assert(this->width % 8 == 0);
       return this->width;
    case MRF:
@@ -2994,7 +2994,7 @@ fs_visitor::lower_load_payload()
          fs_reg dst = inst->dst;
 
          for (int i = 0; i < inst->sources; i++) {
-            dst.width = inst->src[i].effective_width(this);
+            dst.width = inst->src[i].effective_width(inst);
             dst.type = inst->src[i].type;
 
             if (inst->src[i].file == BAD_FILE) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 4ffbec8..c282b5e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -62,7 +62,7 @@ namespace brw {
    class fs_live_variables;
 }
 
-class fs_visitor;
+class fs_inst;
 
 class fs_reg : public backend_reg {
 public:
@@ -110,7 +110,7 @@ public:
     * effectively take on the width of the instruction in which they are
     * used.
     */
-   uint8_t effective_width(const fs_visitor *v) const;
+   uint8_t effective_width(const fs_inst *inst) const;
 
    /** Register region horizontal stride */
    uint8_t stride;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index aafc49b..73a196d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -640,13 +640,13 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
                  inst->dst.file == GRF) {
          int offset = 0;
          for (int i = 0; i < inst->sources; i++) {
-            int regs_written = ((inst->src[i].effective_width(this) *
+            int regs_written = ((inst->src[i].effective_width(inst) *
                                  type_sz(inst->src[i].type)) + 31) / 32;
             if (inst->src[i].file == GRF) {
                acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
                entry->dst = inst->dst;
                entry->dst.reg_offset = offset;
-               entry->dst.width = inst->src[i].effective_width(this);
+               entry->dst.width = inst->src[i].effective_width(inst);
                entry->src = inst->src[i];
                entry->regs_written = regs_written;
                entry->opcode = inst->opcode;
-- 
2.1.0
[Mesa-dev] [PATCH 39.1/41] i965/fs: Constant propagate into LOAD_PAYLOAD
---
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 7dfed6e..6b7ec79 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -456,6 +456,7 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
 
       switch (inst->opcode) {
       case BRW_OPCODE_MOV:
+      case SHADER_OPCODE_LOAD_PAYLOAD:
          inst->src[i] = val;
          progress = true;
          break;
-- 
2.1.0
[Mesa-dev] [PATCH v2 41/41] SQUASH: i965/fs: Force a high register for the final FB write
v2: Renamed the array for the range mappings and added a comment.
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 34 ++-
 src/mesa/drivers/dri/i965/intel_screen.h          | 10 +++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 246d27c..477efe1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -113,6 +113,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)
       class_sizes[class_count++] = 8;
    }

+   memset(screen->wm_reg_sets[index].class_to_ra_reg_range, 0,
+          sizeof(screen->wm_reg_sets[index].class_to_ra_reg_range));
+   int *class_to_ra_reg_range = screen->wm_reg_sets[index].class_to_ra_reg_range;
+
    /* Compute the total number of registers across all classes. */
    int ra_reg_count = 0;
    for (int i = 0; i < class_count; i++) {
@@ -131,6 +135,14 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)
       } else {
          ra_reg_count += base_reg_count - (class_sizes[i] - 1);
       }
+      /* Mark the last register. We'll fill in the beginnings later. */
+      class_to_ra_reg_range[class_sizes[i]] = ra_reg_count;
+   }
+
+   /* Fill out the rest of the range markers */
+   for (int i = 1; i < 17; ++i) {
+      if (class_to_ra_reg_range[i] == 0)
+         class_to_ra_reg_range[i] = class_to_ra_reg_range[i-1];
    }

    uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
@@ -504,9 +516,29 @@ fs_visitor::assign_regs(bool allow_spilling)
    }

    setup_payload_interference(g, payload_node_count, first_payload_node);
-   if (brw->gen >= 7)
+   if (brw->gen >= 7) {
       setup_mrf_hack_interference(g, first_mrf_hack_node);

+      foreach_in_list(fs_inst, inst, instructions) {
+         /* When we do send-from-GRF for FB writes, we need to ensure that
+          * the last write instruction sends from a high register. This is
+          * because the vertex fetcher wants to start filling the low
+          * payload registers while the pixel data port is still working on
+          * writing out the memory. If we don't do this, we get rendering
+          * artifacts.
+          *
+          * We could just do something high. Instead, we just pick the
+          * highest register that works.
+          */
+         if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
+            int size = virtual_grf_sizes[inst->src[0].reg];
+            int reg = screen->wm_reg_sets[rsi].class_to_ra_reg_range[size] - 1;
+            ra_set_node_reg(g, inst->src[0].reg, reg);
+            break;
+         }
+      }
+   }
+
    if (dispatch_width > 8) {
       /* In 16-wide dispatch we have an issue where a compressed
        * instruction is actually two instructions executed simultaneiously.
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h
index 945f6f5..88a84a2 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -90,6 +90,16 @@ struct intel_screen
       int classes[16];

       /**
+       * Mapping from classes to ra_reg ranges. Each of the per-size
+       * classes corresponds to a range of ra_reg nodes. This array stores
+       * those ranges in the form of first ra_reg in each class and the
+       * total number of ra_reg elements in the last array element. This
+       * way the range of the i'th class is given by:
+       * [ class_to_ra_reg_range[i], class_to_ra_reg_range[i+1] )
+       */
+      int class_to_ra_reg_range[17];
+
+      /**
        * Mapping for register-allocated objects in *regs to the first
        * GRF for that object.
        */
-- 
2.1.0
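The range bookkeeping in the patch above can be tried in isolation. The following is only a sketch under stated assumptions: `fill_class_ranges`, `highest_reg_for_size`, `MAX_CLASS_SIZE` and the `regs_per_class` input are hypothetical names, not part of Mesa, and the per-class register counts are passed in rather than derived from `base_reg_count`. It mirrors what the patch's two loops do: record the running total at index `class_sizes[i]`, then let every size without its own class inherit the previous entry.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bound; the patch declares the array with 17 entries. */
#define MAX_CLASS_SIZE 16

/* Record the running ra_reg total at index class_sizes[i], then propagate
 * each recorded value forward into the sizes that have no class of their own.
 * regs_per_class[] is an illustrative input, not a Mesa structure. */
static void
fill_class_ranges(const int *class_sizes, const int *regs_per_class,
                  int class_count, int range[MAX_CLASS_SIZE + 1])
{
   int ra_reg_count = 0;

   memset(range, 0, sizeof(int) * (MAX_CLASS_SIZE + 1));

   for (int i = 0; i < class_count; i++) {
      assert(class_sizes[i] >= 1 && class_sizes[i] <= MAX_CLASS_SIZE);
      ra_reg_count += regs_per_class[i];
      /* Mark the end of this class, as the first loop in the patch does. */
      range[class_sizes[i]] = ra_reg_count;
   }

   /* Fill the gaps, as the second loop in the patch does. */
   for (int i = 1; i <= MAX_CLASS_SIZE; i++) {
      if (range[i] == 0)
         range[i] = range[i - 1];
   }
}

/* The lookup used for the final FB write: last ra_reg of the given size. */
static int
highest_reg_for_size(const int range[MAX_CLASS_SIZE + 1], int size)
{
   assert(size >= 1 && size <= MAX_CLASS_SIZE);
   return range[size] - 1;
}
```

With classes of size 1, 2 and 4 holding 10, 9 and 7 registers, the entries for sizes 1 to 4 come out as 10, 19, 19 and 26, so a size-2 payload would be pinned to ra_reg 18.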
[Mesa-dev] [PATCH v2 17/41] SQUASH: i965/fs: Properly handle widths in copy propagation
v2: Account for register ranges due to the rebase on top of the patch to
    propagate subsets of copied registers
---
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 40 ++
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index cfb17bf..01113f3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -42,6 +42,7 @@ namespace { /* avoid conflict with opt_copy_propagation_elements */
 struct acp_entry : public exec_node {
    fs_reg dst;
    fs_reg src;
+   uint8_t regs_written;
    enum opcode opcode;
    bool saturate;
 };
@@ -295,11 +296,10 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    /* Bail if inst is reading a range that isn't contained in the range
     * that entry is writing.
     */
-   int reg_size = dispatch_width * sizeof(float);
    if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
-       (inst->src[arg].reg_offset * reg_size + inst->src[arg].subreg_offset +
-        inst->regs_read(this, arg) * inst->src[arg].stride * reg_size) >
-       (entry->dst.reg_offset + 1) * reg_size)
+       (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
+        inst->regs_read(this, arg) * inst->src[arg].stride * 32) >
+       (entry->dst.reg_offset + entry->regs_written) * 32)
       return false;

    /* See resolve_ud_negate() and comment in brw_fs_emit.cpp. */
@@ -371,16 +371,25 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    inst->saturate = inst->saturate || entry->saturate;

    switch (entry->src.file) {
+   case UNIFORM:
+      assert(entry->src.width == 1);
    case BAD_FILE:
    case HW_REG:
-   case UNIFORM:
+      inst->src[arg].width = entry->src.width;
       inst->src[arg].reg_offset = entry->src.reg_offset;
       inst->src[arg].subreg_offset = entry->src.subreg_offset;
       break;
    case GRF:
       {
-         /* In this case, we have to deal with mapping parts of vgrfs to
-          * other parts of vgrfs so we have to do some reg_offset magic.
+         assert(entry->src.width % inst->src[arg].width == 0);
+         /* In this case, we'll just leave the width alone. The source
+          * register could have different widths depending on how it is
+          * being used. For instance, if only half of the register was
+          * used then we want to preserve that and continue to only use
+          * half.
+          *
+          * Also, we have to deal with mapping parts of vgrfs to other
+          * parts of vgrfs so we have to do some reg_offset magic.
           */

          /* Compute the offset of inst->src[arg] relative to inst->dst */
@@ -389,10 +398,10 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
          int rel_suboffset = inst->src[arg].subreg_offset;

          /* Compute the final register offset (in bytes) */
-         int offset = entry->src.reg_offset * reg_size + entry->src.subreg_offset;
-         offset += rel_offset * reg_size + rel_suboffset;
-         inst->src[arg].reg_offset = offset / reg_size;
-         inst->src[arg].subreg_offset = offset % reg_size;
+         int offset = entry->src.reg_offset * 32 + entry->src.subreg_offset;
+         offset += rel_offset * 32 + rel_suboffset;
+         inst->src[arg].reg_offset = offset / 32;
+         inst->src[arg].subreg_offset = offset % 32;
       }
       break;
    default:
@@ -429,11 +438,10 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
       /* Bail if inst is reading a range that isn't contained in the range
        * that entry is writing.
        */
-      int reg_size = dispatch_width * sizeof(float);
       if (inst->src[i].reg_offset < entry->dst.reg_offset ||
-          (inst->src[i].reg_offset * reg_size + inst->src[i].subreg_offset +
-           inst->regs_read(this, i) * inst->src[i].stride * reg_size) >
-          (entry->dst.reg_offset + 1) * reg_size)
+          (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
+           inst->regs_read(this, i) * inst->src[i].stride * 32) >
+          (entry->dst.reg_offset + entry->regs_written) * 32)
          continue;

       /* Don't bother with cases that should have been taken care of by the
@@ -623,6 +631,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
          acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
          entry->dst = inst->dst;
          entry->src = inst->src[0];
+         entry->regs_written = inst->regs_written;
          entry->opcode = inst->opcode;
          entry->saturate = inst->saturate;
          acp[entry->dst.reg % ACP_HASH_SIZE].push_tail(entry);
@@ -638,6 +647,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
                entry->dst.reg_offset = offset;
                entry->dst.width = inst->src[i].effective_width(this);
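The containment test that this patch rewrites is easier to follow with plain integers. The sketch below is an illustration, not Mesa code: `read_is_contained` and `REG_SIZE` are hypothetical names, and the fields of `fs_reg` and `acp_entry` are flattened into scalar parameters. It applies the same arithmetic as the new condition, working in bytes with 32-byte registers and using `regs_written` as the width of the written range.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes per hardware register; the patch hard-codes this as 32. */
#define REG_SIZE 32

/* Return true when the bytes read by a source lie inside the bytes written
 * by a copy entry. Parameter names are illustrative stand-ins for the
 * fs_reg / acp_entry fields used in the patch. */
static bool
read_is_contained(int src_reg_offset, int src_subreg_offset,
                  int regs_read, int stride,
                  int dst_reg_offset, int regs_written)
{
   assert(regs_read >= 0 && regs_written >= 0 && stride >= 0);

   /* The read starts before the written range. */
   if (src_reg_offset < dst_reg_offset)
      return false;

   int read_end = src_reg_offset * REG_SIZE + src_subreg_offset +
                  regs_read * stride * REG_SIZE;
   int write_end = (dst_reg_offset + regs_written) * REG_SIZE;

   /* The read must not run past the end of the written range. */
   return read_end <= write_end;
}
```

For example, a one-register read at register offset 1 is contained in an entry that wrote two registers starting at offset 0, but not in one that wrote only a single register.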
Re: [Mesa-dev] [PATCH 0.1/2] mesa: Add new variables in gl_context to store sample number layout
On Tue, Sep 23, 2014 at 5:38 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
> Variables are used in later patches to implement
> EXT_framebuffer_multisample_blit_scaled extension.
>
> Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
> ---
>  src/mesa/main/mtypes.h | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 0d50be8..1cb3461 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3608,6 +3608,15 @@ struct gl_constants
>     GLint MaxDepthTextureSamples;
>     GLint MaxIntegerSamples;
>
> +   /**
> +    * Layout of sample numbers in a rectangular grid roughly corresponding
> +    * to real sample locations within a pixel. Used by
> +    * GL_EXT_texture_multisample_blit_scaled implementation.
> +    */
> +   GLchar* sample_map_2x;
> +   GLchar* sample_map_4x;
> +   GLchar* sample_map_8x;

I think this would be better:

uint8_t SampleMap2x[2];

Using a string here seems confusing. The meta code can use asprintf to
build the string.

The CamelCase name seems to follow the convention of this structure.
uint8_t doesn't follow the convention of the structure. :) (But, Ian
seems to often try to move us away from GL types when not API facing.)

Do you think the comment could be improved to help drivers understand
the purpose of the constants? The comment in the 0.2 patch was pretty
clear, but it is i965 specific.

If you agree to my suggestions, then you should probably send out all
4 patches as a series.

-Jordan

> +
>     /** GL_ARB_shader_atomic_counters */
>     GLuint MaxAtomicBufferBindings;
>     GLuint MaxAtomicBufferSize;
> --
> 1.9.3
Re: [Mesa-dev] [PATCH 0.1/2] mesa: Add new variables in gl_context to store sample number layout
On Fri, Sep 26, 2014 at 12:50 PM, Jordan Justen jljus...@gmail.com wrote:
> On Tue, Sep 23, 2014 at 5:38 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
>> Variables are used in later patches to implement
>> EXT_framebuffer_multisample_blit_scaled extension.
>>
>> Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
>> ---
>>  src/mesa/main/mtypes.h | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>> index 0d50be8..1cb3461 100644
>> --- a/src/mesa/main/mtypes.h
>> +++ b/src/mesa/main/mtypes.h
>> @@ -3608,6 +3608,15 @@ struct gl_constants
>>     GLint MaxDepthTextureSamples;
>>     GLint MaxIntegerSamples;
>>
>> +   /**
>> +    * Layout of sample numbers in a rectangular grid roughly corresponding
>> +    * to real sample locations within a pixel. Used by
>> +    * GL_EXT_texture_multisample_blit_scaled implementation.
>> +    */
>> +   GLchar* sample_map_2x;
>> +   GLchar* sample_map_4x;
>> +   GLchar* sample_map_8x;
>
> I think this would be better:
>
> uint8_t SampleMap2x[2];
>
> Using a string here seems confusing. The meta code can use asprintf to
> build the string.

Yes, I had this thought earlier but wasn't sure. Will fix it now.

> The CamelCase name seems to follow the convention of this structure.
> uint8_t doesn't follow the convention of the structure. :) (But, Ian
> seems to often try to move us away from GL types when not API facing.)
>
> Do you think the comment could be improved to help drivers understand
> the purpose of the constants? The comment in the 0.2 patch was pretty
> clear, but it is i965 specific.
>
> If you agree to my suggestions, then you should probably send out all
> 4 patches as a series.

I agree. I'll soon send out the series. Thanks.

> -Jordan
>
>> +
>>     /** GL_ARB_shader_atomic_counters */
>>     GLuint MaxAtomicBufferBindings;
>>     GLuint MaxAtomicBufferSize;
>> --
>> 1.9.3
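The suggestion discussed in this thread, storing the sample layout as a small `uint8_t` array and formatting it into text only where a string is needed, could look roughly like the sketch below. It is an assumption-laden illustration: `format_sample_map` is a hypothetical helper, the map values used with it are placeholders rather than any real hardware layout, and it uses standard `snprintf` into a caller-supplied buffer instead of the `asprintf` mentioned in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a sample map as "a, b, c" into buf.
 * Returns the string length, or -1 if buf is too small. */
static int
format_sample_map(const uint8_t *map, int count, char *buf, size_t size)
{
   size_t used = 0;

   assert(map != NULL && buf != NULL && size > 0);
   buf[0] = '\0';

   for (int i = 0; i < count; i++) {
      /* Separate entries with ", " after the first one. */
      int n = snprintf(buf + used, size - used, "%s%u",
                       i ? ", " : "", (unsigned) map[i]);
      if (n < 0 || (size_t) n >= size - used)
         return -1;
      used += (size_t) n;
   }

   return (int) used;
}
```

Keeping the constant as numbers leaves the formatting decision to the consumer; a driver would only fill in the array, and the code that generates shader text would call something like the helper above.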
[Mesa-dev] [PATCH v2 3/6] st/va: implement vlVa(Create|Destroy|Query|Get)Config
From: Christian König christian.koe...@amd.com

This patch is for application to query configuration, such as profiles,
entrypoints, and attributes

v2: fix missing profile with query

Signed-off-by: Michael Varga michael.va...@amd.com
Signed-off-by: Christian König christian.koe...@amd.com
Signed-off-by: Leo Liu leo@amd.com
---
 src/gallium/state_trackers/va/config.c     | 78 --
 src/gallium/state_trackers/va/context.c    |  2 +-
 src/gallium/state_trackers/va/va_private.h | 68 ++
 3 files changed, 143 insertions(+), 5 deletions(-)

diff --git a/src/gallium/state_trackers/va/config.c b/src/gallium/state_trackers/va/config.c
index d548780..cfb0b25 100644
--- a/src/gallium/state_trackers/va/config.c
+++ b/src/gallium/state_trackers/va/config.c
@@ -26,16 +26,32 @@
  *
  **/

+#include "pipe/p_screen.h"
+
+#include "vl/vl_winsys.h"
+
 #include "va_private.h"

 VAStatus
 vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_profiles)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_video_profile p;
+   VAProfile vap;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

    *num_profiles = 0;

+   pscreen = VL_VA_PSCREEN(ctx);
+   for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH; ++p)
+      if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED)) {
+         vap = PipeToProfile(p);
+         if (vap != VAProfileNone)
+            profile_list[(*num_profiles)++] = vap;
+      }
+
    return VA_STATUS_SUCCESS;
 }

@@ -43,11 +59,24 @@ VAStatus
 vlVaQueryConfigEntrypoints(VADriverContextP ctx, VAProfile profile,
                            VAEntrypoint *entrypoint_list, int *num_entrypoints)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_video_profile p;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

    *num_entrypoints = 0;

+   p = ProfileToPipe(profile);
+   if (p == PIPE_VIDEO_PROFILE_UNKNOWN)
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   pscreen = VL_VA_PSCREEN(ctx);
+   if (!pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED))
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   entrypoint_list[(*num_entrypoints)++] = VAEntrypointVLD;
+
    return VA_STATUS_SUCCESS;
 }

@@ -55,20 +84,54 @@ VAStatus
 vlVaGetConfigAttributes(VADriverContextP ctx, VAProfile profile, VAEntrypoint entrypoint,
                         VAConfigAttrib *attrib_list, int num_attribs)
 {
+   int i;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   for (i = 0; i < num_attribs; ++i) {
+      unsigned int value;
+      switch (attrib_list[i].type) {
+      case VAConfigAttribRTFormat:
+         value = VA_RT_FORMAT_YUV420;
+         break;
+      case VAConfigAttribRateControl:
+         value = VA_RC_NONE;
+         break;
+      default:
+         value = VA_ATTRIB_NOT_SUPPORTED;
+         break;
+      }
+      attrib_list[i].value = value;
+   }
+
+   return VA_STATUS_SUCCESS;
 }

 VAStatus
 vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, VAEntrypoint entrypoint,
                  VAConfigAttrib *attrib_list, int num_attribs, VAConfigID *config_id)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_video_profile p;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   p = ProfileToPipe(profile);
+   if (p == PIPE_VIDEO_PROFILE_UNKNOWN)
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   pscreen = VL_VA_PSCREEN(ctx);
+   if (!pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED))
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   if (entrypoint != VAEntrypointVLD)
+      return VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+
+   *config_id = p;
+
+   return VA_STATUS_SUCCESS;
 }

 VAStatus
@@ -77,7 +140,7 @@ vlVaDestroyConfig(VADriverContextP ctx, VAConfigID config_id)
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   return VA_STATUS_SUCCESS;
 }

 VAStatus
@@ -87,5 +150,12 @@ vlVaQueryConfigAttributes(VADriverContextP ctx, VAConfigID config_id, VAProfile
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   *profile = PipeToProfile(config_id);
+   *entrypoint = VAEntrypointVLD;
+
+   *num_attribs = 1;
+   attrib_list[0].type = VAConfigAttribRTFormat;
+   attrib_list[0].value = VA_RT_FORMAT_YUV420;
+
+   return VA_STATUS_SUCCESS;
 }
diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
index 71651aa..048c3f2 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -104,7 +104,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
[Mesa-dev] [PATCH v2 6/6] st/va: implement vlVa(Query|Create|Get|Put|Destroy)Image
This patch implements functions for images support, which basically
supports copy data between video surface and user buffers, in this
case supports SW decode, and other video output

v2: fix buffer size for odd-sized image case
    expose I420 format as well

Signed-off-by: Leo Liu leo@amd.com
---
 src/gallium/state_trackers/va/context.c    |   2 +-
 src/gallium/state_trackers/va/image.c      | 254 -
 src/gallium/state_trackers/va/va_private.h |  22 +++
 3 files changed, 269 insertions(+), 9 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
index 1819ec5..ae87d3b 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -121,7 +121,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
    ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - PIPE_VIDEO_PROFILE_UNKNOWN;
    ctx->max_entrypoints = 1;
    ctx->max_attributes = 1;
-   ctx->max_image_formats = 1;
+   ctx->max_image_formats = VL_VA_MAX_IMAGE_FORMATS;
    ctx->max_subpic_formats = 1;
    ctx->max_display_attributes = 1;
    ctx->str_vendor = "mesa gallium vaapi";
diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c
index 8aaa29c..d3c9f20 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -26,18 +26,66 @@
  *
  **/

+#include "pipe/p_screen.h"
+
+#include "util/u_memory.h"
+#include "util/u_handle_table.h"
+#include "util/u_surface.h"
+#include "util/u_video.h"
+
+#include "vl/vl_winsys.h"
+
 #include "va_private.h"

+static const VAImageFormat formats[VL_VA_MAX_IMAGE_FORMATS] =
+{
+   {VA_FOURCC('N','V','1','2')},
+   {VA_FOURCC('I','4','2','0')},
+   {VA_FOURCC('Y','V','1','2')},
+   {VA_FOURCC('Y','U','Y','V')},
+   {VA_FOURCC('U','Y','V','Y')},
+};
+
+static void
+vlVaVideoSurfaceSize(vlVaSurface *p_surf, int component,
+                     unsigned *width, unsigned *height)
+{
+   *width = p_surf->templat.width;
+   *height = p_surf->templat.height;
+
+   if (component > 0) {
+      if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_420) {
+         *width /= 2;
+         *height /= 2;
+      } else if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_422)
+         *width /= 2;
+   }
+   if (p_surf->templat.interlaced)
+      *height /= 2;
+}
+
 VAStatus
 vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num_formats)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_format format;
+   int i;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

    if (!(format_list && num_formats))
-      return VA_STATUS_ERROR_UNKNOWN;
+      return VA_STATUS_ERROR_INVALID_PARAMETER;

    *num_formats = 0;
+   pscreen = VL_VA_PSCREEN(ctx);
+   for (i = 0; i < VL_VA_MAX_IMAGE_FORMATS; ++i) {
+      format = YCbCrToPipe(formats[i].fourcc);
+      if (pscreen->is_video_format_supported(pscreen, format,
+          PIPE_VIDEO_PROFILE_UNKNOWN,
+          PIPE_VIDEO_ENTRYPOINT_BITSTREAM))
+         format_list[(*num_formats)++] = formats[i];
+   }

    return VA_STATUS_SUCCESS;
 }
@@ -45,16 +93,61 @@ vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num
 VAStatus
 vlVaCreateImage(VADriverContextP ctx, VAImageFormat *format, int width, int height, VAImage *image)
 {
+   vlVaDriver *drv;
+   int w, h;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;

-   if(!format)
-      return VA_STATUS_ERROR_UNKNOWN;
+   if (!(format && image && width && height))
+      return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+   drv = VL_VA_DRIVER(ctx);

-   if (!(width && height))
+   image->image_id = handle_table_add(drv->htab, image);
+   image->format = *format;
+   image->width = width;
+   image->height = height;
+   w = align(width, 2);
+   h = align(width, 2);
+
+   switch (format->fourcc) {
+   case VA_FOURCC('N','V','1','2'):
+      image->num_planes = 2;
+      image->pitches[0] = w;
+      image->offsets[0] = 0;
+      image->pitches[1] = w;
+      image->offsets[1] = w * h;
+      image->data_size = w * h * 3 / 2;
+      break;
+
+   case VA_FOURCC('I','4','2','0'):
+   case VA_FOURCC('Y','V','1','2'):
+      image->num_planes = 3;
+      image->pitches[0] = w;
+      image->offsets[0] = 0;
+      image->pitches[1] = w / 2;
+      image->offsets[1] = w * h;
+      image->pitches[2] = w / 2;
+      image->offsets[2] = w * h * 5 / 4;
+      image->data_size = w * h * 3 / 2;
+      break;
+
+   case VA_FOURCC('U','Y','V','Y'):
+   case VA_FOURCC('Y','U','Y','V'):
+      image->num_planes = 1;
+      image->pitches[0] = w * 4;
+      image->offsets[0] = 0;
+      image->data_size = w * h * 4;
+      break;
+
+   default:
       return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT;
+   }

-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   return vlVaCreateBuffer(ctx, 0, VAImageBufferType,
+                           align(image->data_size, 16),
+
[Mesa-dev] [PATCH v3 2/6] st/va: skeleton VAAPI state tracker
From: Christian König christian.koe...@amd.com

This patch adds a skeleton VA-API state tracker,
which is filled with life in the subsequent patches.

v2: fixes in configure.ac and va state_tracker Makefile.am
v3: configure.ac:
      generate a macro for link to xcb
      auto-detecting VA version
      rebase with upstream changes
    state-trackers/va/Makefile.am:
      pass symbol for auto-detecting VA version
    targets/va/Makefile.am
      rebase with omx/Makefile.am
    use macro VA_DRIVER_INIT_FUNC for auto-detect

Signed-off-by: Christian König christian.koe...@amd.com
Signed-off-by: Leo Liu leo@amd.com
---
 configure.ac                                   |  34 ++
 src/gallium/Makefile.am                        |   4 +
 src/gallium/state_trackers/va/Makefile.am      |  37 ++
 src/gallium/state_trackers/va/Makefile.sources |  10 ++
 src/gallium/state_trackers/va/buffer.c         |  87 ++
 src/gallium/state_trackers/va/config.c         |  91 +++
 src/gallium/state_trackers/va/context.c        | 151 +
 src/gallium/state_trackers/va/display.c        |  61 ++
 src/gallium/state_trackers/va/image.c          | 106 +
 src/gallium/state_trackers/va/picture.c        |  56 +
 src/gallium/state_trackers/va/subpicture.c     | 115 +++
 src/gallium/state_trackers/va/surface.c        | 111 ++
 src/gallium/state_trackers/va/va_private.h     | 116 +++
 src/gallium/targets/va/Makefile.am             |  58 ++
 src/gallium/targets/va/target.c                |   1 +
 15 files changed, 1038 insertions(+)
 create mode 100644 src/gallium/state_trackers/va/Makefile.am
 create mode 100644 src/gallium/state_trackers/va/Makefile.sources
 create mode 100644 src/gallium/state_trackers/va/buffer.c
 create mode 100644 src/gallium/state_trackers/va/config.c
 create mode 100644 src/gallium/state_trackers/va/context.c
 create mode 100644 src/gallium/state_trackers/va/display.c
 create mode 100644 src/gallium/state_trackers/va/image.c
 create mode 100644 src/gallium/state_trackers/va/picture.c
 create mode 100644 src/gallium/state_trackers/va/subpicture.c
 create mode 100644 src/gallium/state_trackers/va/surface.c
 create mode 100644 src/gallium/state_trackers/va/va_private.h
 create mode 100644 src/gallium/targets/va/Makefile.am
 create mode 100644 src/gallium/targets/va/target.c

diff --git a/configure.ac b/configure.ac
index 52f8a52..9cd7f4b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -673,6 +673,11 @@ AC_ARG_ENABLE([omx],
          [enable OpenMAX library @<:@default=no@:>@])],
    [enable_omx=$enableval],
    [enable_omx=no])
+AC_ARG_ENABLE([va],
+   [AS_HELP_STRING([--enable-va],
+         [enable va library @<:@default=auto@:>@])],
+   [enable_va=$enableval],
+   [enable_va=auto])
 AC_ARG_ENABLE([opencl],
    [AS_HELP_STRING([--enable-opencl],
          [enable OpenCL library @<:@default=no@:>@])],
@@ -744,6 +749,7 @@ if test x$enable_opengl = xno -a \
         x$enable_xvmc = xno -a \
         x$enable_vdpau = xno -a \
         x$enable_omx = xno -a \
+        x$enable_va = xno -a \
         x$enable_opencl = xno; then
     AC_MSG_ERROR([at least one API should be enabled])
 fi
@@ -1404,6 +1410,10 @@ if test -n $with_gallium_drivers -a x$with_gallium_drivers != xswrast; then
     if test x$enable_omx = xauto; then
         PKG_CHECK_EXISTS([libomxil-bellagio], [enable_omx=yes], [enable_omx=no])
     fi
+
+    if test x$enable_va = xauto; then
+        PKG_CHECK_EXISTS([libva], [enable_va=yes], [enable_va=no])
+    fi
 fi

 if test x$enable_xvmc = xyes; then
@@ -1425,6 +1435,16 @@ if test x$enable_omx = xyes; then
 fi
 AM_CONDITIONAL(HAVE_ST_OMX, test x$enable_omx = xyes)

+if test x$enable_va = xyes; then
+    PKG_CHECK_MODULES([VA], [libva >= 0.35.0 x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED],
+                      [VA_LIBS=`$PKG_CONFIG --libs x11-xcb xcb-dri2`])
+    VA_DRIVER_INIT_FUNC=`$PKG_CONFIG --modversion libva|sed -n 's/\(.*\)\.\(.*\)\..*$/__vaDriverInit_\1_\2/p'`
+    AC_SUBST([VA_DRIVER_INIT_FUNC])
+    GALLIUM_STATE_TRACKERS_DIRS="$GALLIUM_STATE_TRACKERS_DIRS va"
+    enable_gallium_loader=$enable_shared_pipe_drivers
+fi
+AM_CONDITIONAL(HAVE_ST_VA, test x$enable_va = xyes)
+
 dnl
 dnl OpenCL configuration
 dnl
@@ -1796,6 +1816,15 @@ AC_ARG_WITH([omx-libdir],
     [OMX_LIB_INSTALL_DIR=$OMX_LIB_INSTALL_DIR_DEFAULT])
 AC_SUBST([OMX_LIB_INSTALL_DIR])

+dnl Directory for VA libs
+
+AC_ARG_WITH([va-libdir],
+    [AS_HELP_STRING([--with-va-libdir=DIR],
+        [directory for the VA libraries @<:@default=`pkg-config libva --variable=driverdir`@:>@])],
+    [VA_LIB_INSTALL_DIR=$withval],
+    [VA_LIB_INSTALL_DIR=`pkg-config libva --variable=driverdir`])
+AC_SUBST([VA_LIB_INSTALL_DIR])
+
 dnl Directory for OpenCL libs
 AC_ARG_WITH([opencl-libdir],
     [AS_HELP_STRING([--with-opencl-libdir=DIR],
@@ -1829,6 +1858,9 @@ gallium_require_drm_loader() {
     fi
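The `sed` expression in the configure.ac hunk above turns the libva version reported by pkg-config into the driver entry-point symbol name, keeping the major and minor components and dropping the third. A minimal C sketch of the same mapping follows; `va_driver_init_name` is a hypothetical helper written for illustration, and it assumes a purely numeric three-part version string such as "0.35.0".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "__vaDriverInit_<major>_<minor>" from a
 * "major.minor.micro" version string. Returns the name length, or -1 if
 * the version does not parse or buf is too small. */
static int
va_driver_init_name(const char *version, char *buf, size_t size)
{
   unsigned major, minor, micro;

   assert(version != NULL && buf != NULL);

   /* Like the sed pattern, require both dots to be present. */
   if (sscanf(version, "%u.%u.%u", &major, &minor, &micro) != 3)
      return -1;

   int n = snprintf(buf, size, "__vaDriverInit_%u_%u", major, minor);
   if (n < 0 || (size_t) n >= size)
      return -1;

   return n;
}
```

For the minimum version the patch checks for, 0.35.0, this produces `__vaDriverInit_0_35`, which is the value the build would substitute for `VA_DRIVER_INIT_FUNC`.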
Re: [Mesa-dev] [PATCH v2 6/6] st/va: implement vlVa(Query|Create|Get|Put|Destroy)Image
On Fri, Sep 26, 2014 at 4:30 PM, Leo Liu leo@amd.com wrote:
> This patch implements functions for images support, which basically
> supports copy data between video surface and user buffers, in this
> case supports SW decode, and other video output
>
> v2: fix buffer size for odd-sized image case
>     expose I420 format as well
>
> Signed-off-by: Leo Liu leo@amd.com
> ---
>  src/gallium/state_trackers/va/context.c    |   2 +-
>  src/gallium/state_trackers/va/image.c      | 254 -
>  src/gallium/state_trackers/va/va_private.h |  22 +++
>  3 files changed, 269 insertions(+), 9 deletions(-)
>
> diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
> index 1819ec5..ae87d3b 100644
> --- a/src/gallium/state_trackers/va/context.c
> +++ b/src/gallium/state_trackers/va/context.c
> @@ -121,7 +121,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
>     ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - PIPE_VIDEO_PROFILE_UNKNOWN;
>     ctx->max_entrypoints = 1;
>     ctx->max_attributes = 1;
> -   ctx->max_image_formats = 1;
> +   ctx->max_image_formats = VL_VA_MAX_IMAGE_FORMATS;
>     ctx->max_subpic_formats = 1;
>     ctx->max_display_attributes = 1;
>     ctx->str_vendor = "mesa gallium vaapi";
> diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c
> index 8aaa29c..d3c9f20 100644
> --- a/src/gallium/state_trackers/va/image.c
> +++ b/src/gallium/state_trackers/va/image.c
> @@ -26,18 +26,66 @@
>   *
>   **/
>
> +#include "pipe/p_screen.h"
> +
> +#include "util/u_memory.h"
> +#include "util/u_handle_table.h"
> +#include "util/u_surface.h"
> +#include "util/u_video.h"
> +
> +#include "vl/vl_winsys.h"
> +
>  #include "va_private.h"
>
> +static const VAImageFormat formats[VL_VA_MAX_IMAGE_FORMATS] =
> +{
> +   {VA_FOURCC('N','V','1','2')},
> +   {VA_FOURCC('I','4','2','0')},
> +   {VA_FOURCC('Y','V','1','2')},
> +   {VA_FOURCC('Y','U','Y','V')},
> +   {VA_FOURCC('U','Y','V','Y')},
> +};
> +
> +static void
> +vlVaVideoSurfaceSize(vlVaSurface *p_surf, int component,
> +                     unsigned *width, unsigned *height)
> +{
> +   *width = p_surf->templat.width;
> +   *height = p_surf->templat.height;
> +
> +   if (component > 0) {
> +      if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_420) {
> +         *width /= 2;
> +         *height /= 2;
> +      } else if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_422)
> +         *width /= 2;
> +   }
> +   if (p_surf->templat.interlaced)
> +      *height /= 2;
> +}
> +
>  VAStatus
>  vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num_formats)
>  {
> +   struct pipe_screen *pscreen;
> +   enum pipe_format format;
> +   int i;
> +
>     if (!ctx)
>        return VA_STATUS_ERROR_INVALID_CONTEXT;
>
>     if (!(format_list && num_formats))
> -      return VA_STATUS_ERROR_UNKNOWN;
> +      return VA_STATUS_ERROR_INVALID_PARAMETER;
>
>     *num_formats = 0;
> +   pscreen = VL_VA_PSCREEN(ctx);
> +   for (i = 0; i < VL_VA_MAX_IMAGE_FORMATS; ++i) {
> +      format = YCbCrToPipe(formats[i].fourcc);
> +      if (pscreen->is_video_format_supported(pscreen, format,
> +          PIPE_VIDEO_PROFILE_UNKNOWN,
> +          PIPE_VIDEO_ENTRYPOINT_BITSTREAM))
> +         format_list[(*num_formats)++] = formats[i];
> +   }
>
>     return VA_STATUS_SUCCESS;
>  }
> @@ -45,16 +93,61 @@ vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num
>  VAStatus
>  vlVaCreateImage(VADriverContextP ctx, VAImageFormat *format, int width, int height, VAImage *image)
>  {
> +   vlVaDriver *drv;
> +   int w, h;
> +
>     if (!ctx)
>        return VA_STATUS_ERROR_INVALID_CONTEXT;
>
> -   if(!format)
> -      return VA_STATUS_ERROR_UNKNOWN;
> +   if (!(format && image && width && height))
> +      return VA_STATUS_ERROR_INVALID_PARAMETER;
> +
> +   drv = VL_VA_DRIVER(ctx);
>
> -   if (!(width && height))
> +   image->image_id = handle_table_add(drv->htab, image);
> +   image->format = *format;
> +   image->width = width;
> +   image->height = height;
> +   w = align(width, 2);
> +   h = align(width, 2);
> +
> +   switch (format->fourcc) {
> +   case VA_FOURCC('N','V','1','2'):
> +      image->num_planes = 2;
> +      image->pitches[0] = w;
> +      image->offsets[0] = 0;
> +      image->pitches[1] = w;
> +      image->offsets[1] = w * h;
> +      image->data_size = w * h * 3 / 2;
> +      break;
> +
> +   case VA_FOURCC('I','4','2','0'):
> +   case VA_FOURCC('Y','V','1','2'):
> +      image->num_planes = 3;
> +      image->pitches[0] = w;
> +      image->offsets[0] = 0;
> +      image->pitches[1] = w / 2;
> +      image->offsets[1] = w * h;
> +      image->pitches[2] = w / 2;
> +      image->offsets[2] = w * h * 5 / 4;
> +      image->data_size = w * h * 3 / 2;
> +      break;
> +
> +   case VA_FOURCC('U','Y','V','Y'):
> +   case VA_FOURCC('Y','U','Y','V'):
> +      image->num_planes = 1;
> +      image->pitches[0] = w * 4;
> +      image->offsets[0] = 0;
> +      image->data_size = w * h * 4;

Is this right? YUYV/UYVY stores 2 pixels in 4
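The review question above concerns how many bytes a packed 4:2:2 image needs. The sketch below works through the plane layouts for the three families of formats in the patch, under explicit assumptions: `compute_layout`, `struct plane_layout` and the `LAYOUT_*` names are hypothetical, dimensions are rounded up to even values, the height is derived from the height argument, and packed YUYV/UYVY is sized at 2 bytes per pixel (4 bytes for every 2 pixels), which is the standard definition of those formats. It is not presented as the code that was eventually merged.

```c
#include <assert.h>

/* Hypothetical format families, standing in for the fourcc values. */
enum layout { LAYOUT_NV12, LAYOUT_I420, LAYOUT_YUYV };

struct plane_layout {
   int num_planes;
   unsigned pitches[3];
   unsigned offsets[3];
   unsigned data_size;
};

/* Round up to the next even value. */
static unsigned
align2(unsigned v)
{
   return (v + 1u) & ~1u;
}

static struct plane_layout
compute_layout(enum layout fmt, unsigned width, unsigned height)
{
   struct plane_layout l = {0};
   unsigned w = align2(width);
   unsigned h = align2(height);

   assert(width > 0 && height > 0);

   switch (fmt) {
   case LAYOUT_NV12:
      /* Full-size Y plane, then interleaved CbCr at half height. */
      l.num_planes = 2;
      l.pitches[0] = w;
      l.pitches[1] = w;
      l.offsets[1] = w * h;
      l.data_size = w * h * 3 / 2;
      break;
   case LAYOUT_I420:
      /* Full-size Y plane, then two quarter-size chroma planes. */
      l.num_planes = 3;
      l.pitches[0] = w;
      l.pitches[1] = w / 2;
      l.pitches[2] = w / 2;
      l.offsets[1] = w * h;
      l.offsets[2] = w * h * 5 / 4;
      l.data_size = w * h * 3 / 2;
      break;
   case LAYOUT_YUYV:
      /* Packed 4:2:2: two pixels share four bytes, so 2 bytes per pixel. */
      l.num_planes = 1;
      l.pitches[0] = w * 2;
      l.data_size = w * h * 2;
      break;
   }

   return l;
}
```

For a 5x3 image this rounds to 6x4, giving 36 bytes for NV12 and I420 and 48 bytes for the packed formats, half of what a 4-bytes-per-pixel pitch would allocate.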
[Mesa-dev] [PATCH 2/3] driconf: Update Spanish translation
---
 src/mesa/drivers/dri/common/xmlpool/es.po | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/common/xmlpool/es.po b/src/mesa/drivers/dri/common/xmlpool/es.po
index 1733b76..a68c329 100644
--- a/src/mesa/drivers/dri/common/xmlpool/es.po
+++ b/src/mesa/drivers/dri/common/xmlpool/es.po
@@ -10,7 +10,7 @@ msgstr ""
 "Project-Id-Version: es\n"
 "Report-Msgid-Bugs-To: \n"
 "POT-Creation-Date: 2014-09-25 22:29-0600\n"
-"PO-Revision-Date: 2014-01-15 10:34-0700\n"
+"PO-Revision-Date: 2014-09-26 14:22-0700\n"
 "Last-Translator: Alex Henrie alexhenri...@gmail.com\n"
 "Language-Team: Spanish e...@li.org\n"
 "Language: es\n"
@@ -18,7 +18,7 @@ msgstr ""
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
 "Plural-Forms: nplurals=2; plural=(n != 1);\n"
-"X-Generator: Poedit 1.5.4\n"
+"X-Generator: Poedit 1.6.9\n"

 #: t_options.h:56
 msgid "Debugging"
@@ -72,7 +72,7 @@ msgstr ""

 #: t_options.h:110
 msgid "Allow GLSL #extension directives in the middle of shaders"
-msgstr ""
+msgstr "Permite directivas #extension GLSL en medio de los shaders"

 #: t_options.h:120
 msgid "Image Quality"
@@ -309,8 +309,8 @@ msgstr "Crear todos los visuales con buffer de profundidad"

 #: t_options.h:337
 msgid "Initialization"
-msgstr ""
+msgstr "Inicialización"

 #: t_options.h:341
 msgid "Define the graphic device to use if possible"
-msgstr ""
+msgstr "Define el dispositivo de gráficos que usar si es posible"
-- 
2.1.1
[Mesa-dev] [PATCH 1/3] driconf: Synchronize po files
--- src/mesa/drivers/dri/common/xmlpool/ca.po | 119 -- src/mesa/drivers/dri/common/xmlpool/de.po | 118 - src/mesa/drivers/dri/common/xmlpool/es.po | 118 - src/mesa/drivers/dri/common/xmlpool/fr.po | 118 - src/mesa/drivers/dri/common/xmlpool/nl.po | 118 - src/mesa/drivers/dri/common/xmlpool/sv.po | 118 - 6 files changed, 390 insertions(+), 319 deletions(-) diff --git a/src/mesa/drivers/dri/common/xmlpool/ca.po b/src/mesa/drivers/dri/common/xmlpool/ca.po index c0cf7f6..1db9703 100644 --- a/src/mesa/drivers/dri/common/xmlpool/ca.po +++ b/src/mesa/drivers/dri/common/xmlpool/ca.po @@ -21,12 +21,11 @@ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS # IN THE SOFTWARE. - msgid msgstr Project-Id-Version: Mesa 10.1.0-devel\n Report-Msgid-Bugs-To: \n -POT-Creation-Date: 2014-01-13 22:30-0700\n +POT-Creation-Date: 2014-09-25 22:29-0600\n PO-Revision-Date: 2014-01-15 10:37-0700\n Last-Translator: Alex Henrie alexhenri...@gmail.com\n Language-Team: Catalan c...@li.org\n @@ -87,108 +86,112 @@ msgstr Força una versió GLSL per defecte en els shaders als quals falta una línia #version explícita -#: t_options.h:115 +#: t_options.h:110 +msgid Allow GLSL #extension directives in the middle of shaders +msgstr + +#: t_options.h:120 msgid Image Quality msgstr Qualitat d'Imatge -#: t_options.h:128 +#: t_options.h:133 msgid Texture color depth msgstr Profunditat de color de textura -#: t_options.h:129 +#: t_options.h:134 msgid Prefer frame buffer color depth msgstr Prefereix profunditat de color del framebuffer -#: t_options.h:130 +#: t_options.h:135 msgid Prefer 32 bits per texel msgstr Prefereix 32 bits per texel -#: t_options.h:131 +#: t_options.h:136 msgid Prefer 16 bits per texel msgstr Prefereix 16 bits per texel -#: t_options.h:132 +#: t_options.h:137 msgid Force 16 bits per texel msgstr Força 16 bits per texel -#: t_options.h:138 +#: t_options.h:143 msgid Initial maximum value 
for anisotropic texture filtering msgstr Valor màxim inicial per a la filtració de textura anisòtropa -#: t_options.h:143 +#: t_options.h:148 msgid Forbid negative texture LOD bias msgstr Prohibeix una parcialitat negativa del Nivell de Detalle (LOD) de les textures -#: t_options.h:148 +#: t_options.h:153 msgid Enable S3TC texture compression even if software support is not available msgstr Habilitar la compressió de textures S3TC encara que el suport de programari no estigui disponible -#: t_options.h:155 +#: t_options.h:160 msgid Initial color reduction method msgstr Mètode inicial de reducció de color -#: t_options.h:156 +#: t_options.h:161 msgid Round colors msgstr Colors arrodonits -#: t_options.h:157 +#: t_options.h:162 msgid Dither colors msgstr Colors tramats -#: t_options.h:165 +#: t_options.h:170 msgid Color rounding method msgstr Mètode d'arrodoniment de color -#: t_options.h:166 +#: t_options.h:171 msgid Round color components downward msgstr Arrondeix els components de color a baix -#: t_options.h:167 +#: t_options.h:172 msgid Round to nearest color msgstr Arrondeix al color més proper -#: t_options.h:176 +#: t_options.h:181 msgid Color dithering method msgstr Mètode de tramat de color -#: t_options.h:177 +#: t_options.h:182 msgid Horizontal error diffusion msgstr Difusió d'error horitzontal -#: t_options.h:178 +#: t_options.h:183 msgid Horizontal error diffusion, reset error at line start msgstr Difusió d'error horitzontal, reinicia l'error a l'inici de la línia -#: t_options.h:179 +#: t_options.h:184 msgid Ordered 2D color dithering msgstr Tramat de color 2D ordenat -#: t_options.h:185 +#: t_options.h:190 msgid Floating point depth buffer msgstr Buffer de profunditat de punt flotant -#: t_options.h:190 +#: t_options.h:195 msgid A post-processing filter to cel-shade the output msgstr Un filtre de postprocessament per a aplicar cel shading a la sortida -#: t_options.h:195 +#: t_options.h:200 msgid A post-processing filter to remove the red channel 
msgstr Un filtre de postprocessament per a treure el canal vermell -#: t_options.h:200 +#: t_options.h:205 msgid A post-processing filter to remove the green channel msgstr Un filtre de postprocessament per a treure el canal verd -#: t_options.h:205 +#: t_options.h:210 msgid A post-processing filter to remove the blue channel msgstr Un filtre de postprocessament per a treure el canal blau -#: t_options.h:210 +#: t_options.h:215 msgid Morphological anti-aliasing based on Jimenez\\' MLAA. 0 to disable, 8 for default quality @@ -196,7 +199,7 @@ msgstr Antialiàsing morfològic basat en el MLAA de Jimenez. 0 per deshabilitar, 8 per qualitat per defecte -#: t_options.h:215 +#:
[Mesa-dev] [PATCH 3/3] driconf: Correct and update Catalan translation
--- src/mesa/drivers/dri/common/xmlpool/ca.po | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/common/xmlpool/ca.po b/src/mesa/drivers/dri/common/xmlpool/ca.po index 1db9703..23e9f42 100644 --- a/src/mesa/drivers/dri/common/xmlpool/ca.po +++ b/src/mesa/drivers/dri/common/xmlpool/ca.po @@ -26,14 +26,14 @@ msgstr Project-Id-Version: Mesa 10.1.0-devel\n Report-Msgid-Bugs-To: \n POT-Creation-Date: 2014-09-25 22:29-0600\n -PO-Revision-Date: 2014-01-15 10:37-0700\n +PO-Revision-Date: 2014-09-26 14:43-0700\n Last-Translator: Alex Henrie alexhenri...@gmail.com\n Language-Team: Catalan c...@li.org\n Language: ca\n MIME-Version: 1.0\n Content-Type: text/plain; charset=UTF-8\n Content-Transfer-Encoding: 8bit\n -X-Generator: Poedit 1.5.4\n +X-Generator: Poedit 1.6.9\n #: t_options.h:56 msgid Debugging @@ -72,8 +72,8 @@ msgstr Deshabilita la barreja de font dual #: t_options.h:95 msgid Disable backslash-based line continuations in GLSL source msgstr -Deshabilitar les continuacions de línia basades en barra invertida en la -font GLSL +Deshabilita les continuacions de línia basades en barra invertida en la font +GLSL #: t_options.h:100 msgid Disable GL_ARB_shader_bit_encoding @@ -88,7 +88,7 @@ msgstr #: t_options.h:110 msgid Allow GLSL #extension directives in the middle of shaders -msgstr +msgstr Permet les directives #extension GLSL en el mitjà dels shaders #: t_options.h:120 msgid Image Quality @@ -128,7 +128,7 @@ msgstr msgid Enable S3TC texture compression even if software support is not available msgstr -Habilitar la compressió de textures S3TC encara que el suport de programari +Habilita la compressió de textures S3TC encara que el suport de programari no estigui disponible #: t_options.h:160 @@ -325,8 +325,8 @@ msgstr Crea tots els visuals amb buffer de profunditat #: t_options.h:337 msgid Initialization -msgstr +msgstr Inicialització #: t_options.h:341 msgid Define the graphic device to use if possible -msgstr +msgstr Defineix el 
dispositiu de gràfics que usar si és possible -- 2.1.1
[Mesa-dev] [PATCH 1/2] mesa: remove last DJGPP remains
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/mapi/glapi/gen/gl_x86_asm.py | 2 +- src/mesa/main/dlopen.h| 7 --- src/mesa/main/texcompress_s3tc.c | 2 -- src/mesa/x86/assyntax.h | 6 +++--- src/mesa/x86/read_rgba_span_x86.S | 4 ++-- 5 files changed, 6 insertions(+), 15 deletions(-) diff --git a/src/mapi/glapi/gen/gl_x86_asm.py b/src/mapi/glapi/gen/gl_x86_asm.py index 919bbc0..d87d0bd 100644 --- a/src/mapi/glapi/gen/gl_x86_asm.py +++ b/src/mapi/glapi/gen/gl_x86_asm.py @@ -72,7 +72,7 @@ class PrintGenericStubs(gl_XML.gl_print_base): print '' print '#define GL_OFFSET(x) CODEPTR(REGOFF(4 * x, EAX))' print '' -print '#if defined(GNU_ASSEMBLER) !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__)' +print '#if defined(GNU_ASSEMBLER) !defined(__MINGW32__) !defined(__APPLE__)' print '#define GLOBL_FN(x) GLOBL x ; .type x, @function' print '#else' print '#define GLOBL_FN(x) GLOBL x' diff --git a/src/mesa/main/dlopen.h b/src/mesa/main/dlopen.h index 55a56f0..3754ec1 100644 --- a/src/mesa/main/dlopen.h +++ b/src/mesa/main/dlopen.h @@ -73,13 +73,6 @@ _mesa_dlsym(void *handle, const char *fname) } u; #if defined(__blrts) u.v = NULL; -#elif defined(__DJGPP__) - /* need '_' prefix on symbol names */ - char fname2[1000]; - fname2[0] = '_'; - strncpy(fname2 + 1, fname, 998); - fname2[999] = 0; - u.v = dlsym(handle, fname2); #elif defined(HAVE_DLOPEN) u.v = dlsym(handle, fname); #elif defined(__MINGW32__) diff --git a/src/mesa/main/texcompress_s3tc.c b/src/mesa/main/texcompress_s3tc.c index 5b275ef..254f84e 100644 --- a/src/mesa/main/texcompress_s3tc.c +++ b/src/mesa/main/texcompress_s3tc.c @@ -51,8 +51,6 @@ #define DXTN_LIBNAME dxtn.dll #define RTLD_LAZY 0 #define RTLD_GLOBAL 0 -#elif defined(__DJGPP__) -#define DXTN_LIBNAME dxtn.dxe #else #define DXTN_LIBNAME libtxc_dxtn.so #endif diff --git a/src/mesa/x86/assyntax.h b/src/mesa/x86/assyntax.h index fa7d92e..67867bd 100644 --- a/src/mesa/x86/assyntax.h +++ b/src/mesa/x86/assyntax.h @@ -255,7 +255,7 @@ #endif /* 
ACK_ASSEMBLER */ -#if defined(__QNX__) || defined(Lynx) || (defined(SYSV) || defined(SVR4)) !defined(ACK_ASSEMBLER) || defined(__ELF__) || defined(__GNU__) || defined(__GNUC__) !defined(__DJGPP__) !defined(__MINGW32__) +#if defined(__QNX__) || defined(Lynx) || (defined(SYSV) || defined(SVR4)) !defined(ACK_ASSEMBLER) || defined(__ELF__) || defined(__GNU__) || defined(__GNUC__) !defined(__MINGW32__) #define GLNAME(a) a #else #define GLNAME(a) CONCAT(_,a) @@ -1727,9 +1727,9 @@ * If we build with gcc's -fvisibility=hidden flag, we'll need to change * the symbol visibility mode to 'default'. */ -#if defined(GNU_ASSEMBLER) !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) +#if defined(GNU_ASSEMBLER) !defined(__MINGW32__) !defined(__APPLE__) # define HIDDEN(x) .hidden x -#elif defined(__GNUC__) !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) +#elif defined(__GNUC__) !defined(__MINGW32__) !defined(__APPLE__) # pragma GCC visibility push(default) # define HIDDEN(x) .hidden x #else diff --git a/src/mesa/x86/read_rgba_span_x86.S b/src/mesa/x86/read_rgba_span_x86.S index 8177299..5def1f8 100644 --- a/src/mesa/x86/read_rgba_span_x86.S +++ b/src/mesa/x86/read_rgba_span_x86.S @@ -31,7 +31,7 @@ */ .file read_rgba_span_x86.S -#if !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) /* this one cries for assyntax.h */ +#if !defined(__MINGW32__) !defined(__APPLE__) /* this one cries for assyntax.h */ /* Kevin F. Quinn 2nd July 2006 * Replaced data segment constants with text-segment instructions. */ @@ -671,7 +671,7 @@ _generic_read_RGBA_span_RGB565_MMX: emms #endif ret -#endif /* !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) */ +#endif /* !defined(__MINGW32__) !defined(__APPLE__) */ #if defined (__ELF__) defined (__linux__) .section .note.GNU-stack,,%progbits -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] Remove Bluegene/L wrappers
Added back in 2009, with osmesa/GLU in mind. Unlikely to be working any more since the removal of the static makefiles. Cc: Brian Paul bri...@vmware.com Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/mesa/main/compiler.h | 2 +- src/mesa/main/dlopen.h | 12 +++- 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/src/mesa/main/compiler.h b/src/mesa/main/compiler.h index 185c911..34671dc 100644 --- a/src/mesa/main/compiler.h +++ b/src/mesa/main/compiler.h @@ -150,7 +150,7 @@ extern C { #elif defined(__APPLE__) #include CoreFoundation/CFByteOrder.h #define CPU_TO_LE32( x ) CFSwapInt32HostToLittle( x ) -#elif (defined(_AIX) || defined(__blrts)) +#elif (defined(_AIX)) static inline GLuint CPU_TO_LE32(GLuint x) { return (((x 0x00ff) 24) | diff --git a/src/mesa/main/dlopen.h b/src/mesa/main/dlopen.h index 3754ec1..1e77849 100644 --- a/src/mesa/main/dlopen.h +++ b/src/mesa/main/dlopen.h @@ -47,9 +47,7 @@ typedef void (*GenericFunc)(void); static inline void * _mesa_dlopen(const char *libname, int flags) { -#if defined(__blrts) - return NULL; -#elif defined(HAVE_DLOPEN) +#if defined(HAVE_DLOPEN) flags = RTLD_LAZY | RTLD_GLOBAL; /* Overriding flags at this time */ return dlopen(libname, flags); #elif defined(__MINGW32__) @@ -71,9 +69,7 @@ _mesa_dlsym(void *handle, const char *fname) void *v; GenericFunc f; } u; -#if defined(__blrts) - u.v = NULL; -#elif defined(HAVE_DLOPEN) +#if defined(HAVE_DLOPEN) u.v = dlsym(handle, fname); #elif defined(__MINGW32__) u.v = (void *) GetProcAddress(handle, fname); @@ -89,9 +85,7 @@ _mesa_dlsym(void *handle, const char *fname) static inline void _mesa_dlclose(void *handle) { -#if defined(__blrts) - (void) handle; -#elif defined(HAVE_DLOPEN) +#if defined(HAVE_DLOPEN) dlclose(handle); #elif defined(__MINGW32__) FreeLibrary(handle); -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965: Use unsynchronized maps for the program cache on LLC platforms.
On Friday, September 26, 2014 09:22:31 AM Kristian Høgsberg wrote: On Fri, Aug 29, 2014 at 11:10:50PM -0700, Kenneth Graunke wrote: There's no reason to stall on pwrite - the CPU always appends to the buffer and never modifies existing contents, and the GPU never writes it. Further, the CPU always appends new data before submitting a batch that requires it. This code predates the unsynchronized mapping feature, so we simply didn't have the option when it was written. Ideally, we would do this for non-LLC platforms too, but unsynchronized mapping support only exists for LLC systems. Saves repeated 0.001ms stalls on program upload. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_state_cache.c | 34 +++-- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c index b9bb0fc..1d2d32f 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -172,14 +172,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size) drm_intel_bo *new_bo; new_bo = drm_intel_bo_alloc(brw-bufmgr, program cache, new_size, 64); + if (brw-has_llc) + drm_intel_gem_bo_map_unsynchronized(new_bo); /* Copy any existing data that needs to be saved. */ if (cache-next_offset != 0) { - brw_bo_map(brw, cache-bo, false, program cache); - drm_intel_bo_subdata(new_bo, 0, cache-next_offset, cache-bo-virtual); - drm_intel_bo_unmap(cache-bo); + if (brw-has_llc) { + memcpy(new_bo-virtual, cache-bo-virtual, cache-next_offset); Move the drm_intel_gem_bo_map_unsynchronized() and drm_intel_bo_unmap() calls into this block so they bracket the memcpy as for the subdata case below? Other than that, Reviewed-by: Kristian Høgsberg k...@bitplanet.net That won't work---the point is to map new_bo, and leave it mapped...and unmap the old BO before throwing it away. 
If I moved the map call into the if (cache-next_offset != 0) block, then the initial mapping would never occur. + } else { + brw_bo_map(brw, cache-bo, false, program cache); + drm_intel_bo_subdata(new_bo, 0, cache-next_offset, + cache-bo-virtual); + drm_intel_bo_unmap(cache-bo); + } } + if (brw-has_llc) + drm_intel_bo_unmap(cache-bo); drm_intel_bo_unreference(cache-bo); cache-bo = new_bo; cache-bo_used_by_gpu = false; @@ -222,9 +231,11 @@ brw_try_upload_using_copy(struct brw_cache *cache, continue; } -brw_bo_map(brw, cache-bo, false, program cache); + if (!brw-has_llc) +brw_bo_map(brw, cache-bo, false, program cache); ret = memcmp(cache-bo-virtual + item-offset, data, item-size); -drm_intel_bo_unmap(cache-bo); + if (!brw-has_llc) +drm_intel_bo_unmap(cache-bo); if (ret) continue; @@ -257,7 +268,7 @@ brw_upload_item_data(struct brw_cache *cache, /* If we would block on writing to an in-use program BO, just * recreate it. */ - if (cache-bo_used_by_gpu) { + if (!brw-has_llc cache-bo_used_by_gpu) { perf_debug(Copying busy program cache buffer.\n); brw_cache_new_bo(cache, cache-bo-size); } @@ -280,6 +291,7 @@ brw_upload_cache(struct brw_cache *cache, uint32_t *out_offset, void *out_aux) { + struct brw_context *brw = cache-brw; struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item); GLuint hash; void *tmp; @@ -320,7 +332,11 @@ brw_upload_cache(struct brw_cache *cache, cache-n_items++; /* Copy data to the buffer */ - drm_intel_bo_subdata(cache-bo, item-offset, data_size, data); + if (brw-has_llc) { + memcpy((char *) cache-bo-virtual + item-offset, data, data_size); + } else { + drm_intel_bo_subdata(cache-bo, item-offset, data_size, data); + } *out_offset = item-offset; *(void **)out_aux = (void *)((char *)item-key + item-key_size); @@ -342,6 +358,8 @@ brw_init_caches(struct brw_context *brw) cache-bo = drm_intel_bo_alloc(brw-bufmgr, program cache, 4096, 64); + if (brw-has_llc) + drm_intel_gem_bo_map_unsynchronized(cache-bo); cache-aux_compare[BRW_VS_PROG] = 
brw_vs_prog_data_compare; cache-aux_compare[BRW_GS_PROG] = brw_gs_prog_data_compare; @@ -408,6 +426,8 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache *cache) DBG(%s\n, __FUNCTION__); + if (brw-has_llc) + drm_intel_bo_unmap(cache-bo); drm_intel_bo_unreference(cache-bo); cache-bo = NULL; brw_clear_cache(brw, cache);
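The append-only behaviour this patch relies on can be modelled without any GPU involved. What follows is only a minimal sketch in plain C: heap memory stands in for the buffer object, and every name in it (`example_cache`, `example_cache_append`, and so on) is hypothetical rather than taken from Mesa or libdrm. It shows why existing contents never need to be rewritten: new data always lands at `next_offset`, and growing the cache means copying the used prefix into a fresh allocation before dropping the old one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the program cache.  Heap memory plays the role
 * of a persistently mapped buffer object; nothing here talks to the GPU. */
struct example_cache {
   unsigned char *map;    /* storage that stays "mapped" for its lifetime */
   uint32_t size;         /* total bytes available */
   uint32_t next_offset;  /* bytes below this offset are never modified */
};

static int
example_cache_init(struct example_cache *c, uint32_t size)
{
   assert(size > 0);
   c->map = malloc(size);
   c->size = size;
   c->next_offset = 0;
   return c->map != NULL;
}

/* Grow by allocating new storage, copying the used prefix across, and only
 * then releasing the old storage. */
static int
example_cache_grow(struct example_cache *c, uint32_t new_size)
{
   unsigned char *new_map = malloc(new_size);
   if (!new_map)
      return 0;

   memcpy(new_map, c->map, c->next_offset);
   free(c->map);
   c->map = new_map;
   c->size = new_size;
   return 1;
}

/* Append data and return its offset, or UINT32_MAX if allocation fails.
 * Size overflow is not handled in this sketch. */
static uint32_t
example_cache_append(struct example_cache *c, const void *data, uint32_t len)
{
   while (len > c->size - c->next_offset) {
      if (!example_cache_grow(c, c->size * 2))
         return UINT32_MAX;
   }

   uint32_t offset = c->next_offset;
   memcpy(c->map + offset, data, len);
   c->next_offset += len;
   return offset;
}
```

In the real driver the synchronization question is whether the GPU is still reading the bytes below `next_offset`; since those bytes are never rewritten, an unsynchronized CPU write above that offset does not race with it.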
Re: [Mesa-dev] [PATCH 4/5] i965/fs: Don't invalidate live intervals in saturate propagation.
Patches 2-4 are
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

I'll have to think more about patch 1

On Mon, Sep 8, 2014 at 12:21 PM, Matt Turner matts...@gmail.com wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
> index 6f7fb6c..347a78e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
> @@ -95,8 +95,7 @@ fs_visitor::opt_saturate_propagation()
>        progress = opt_saturate_propagation_local(this, block) || progress;
>     }
>
> -   if (progress)
> -      invalidate_live_intervals();
> +   /* Live intervals are still valid. */
>
>     return progress;
>  }
> --
> 1.8.5.5
Re: [Mesa-dev] [PATCH 3/4] i965: Issue performance warnings for program cache related stalls.
On Friday, September 26, 2014 04:41:14 PM Chris Wilson wrote:
> On Fri, Sep 26, 2014 at 08:36:39AM -0700, Kristian Høgsberg wrote:
> > On Fri, Aug 29, 2014 at 11:10:49PM -0700, Kenneth Graunke wrote:
> > > We don't really want extra buffer copying or stalls when mapping, so
> > > it'd be nice to know when it's happening.
> > >
> > > Signed-off-by: Kenneth Graunke kenn...@whitecape.org
> >
> > Reviewed-by: Kristian Høgsberg k...@bitplanet.net
>
> This warns if the program cache is currently being read by the GPU
> (expected) but a read-read (as used here) does not incur a stall.
> -Chris

Good catch! Since we're doing a read-only mapping, and all of our
relocations to this buffer have 0 for the write domains, GEM knows that
nobody is altering it, so there shouldn't be a stall. Even though
i915_gem_set_domain_ioctl calls i915_gem_object_wait_rendering__nonblocking,
it shouldn't actually wait.

Thanks for spotting this. I'll drop this hunk.

I suppose this is a problem with my stall-warning code in general...
drm_intel_bo_busy() == true does not necessarily imply that there will be a
stall when mapping it. I hadn't considered that.

It sounds like patch 4 (using unsynchronized mappings) is still useful
though, as drm_intel_bo_subdata/pwrite doesn't know that it's safe to let
the CPU write the buffer even while the GPU is reading it.

--Ken
Re: [Mesa-dev] [PATCH 1/2] mesa: remove last DJGPP remains
And I was just going to start working on the Mesa software rasterizer for DOS. Oh well. Reviewed-by: Ian Romanick ian.d.roman...@intel.com On 09/26/2014 02:14 PM, Emil Velikov wrote: Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/mapi/glapi/gen/gl_x86_asm.py | 2 +- src/mesa/main/dlopen.h| 7 --- src/mesa/main/texcompress_s3tc.c | 2 -- src/mesa/x86/assyntax.h | 6 +++--- src/mesa/x86/read_rgba_span_x86.S | 4 ++-- 5 files changed, 6 insertions(+), 15 deletions(-) diff --git a/src/mapi/glapi/gen/gl_x86_asm.py b/src/mapi/glapi/gen/gl_x86_asm.py index 919bbc0..d87d0bd 100644 --- a/src/mapi/glapi/gen/gl_x86_asm.py +++ b/src/mapi/glapi/gen/gl_x86_asm.py @@ -72,7 +72,7 @@ class PrintGenericStubs(gl_XML.gl_print_base): print '' print '#define GL_OFFSET(x) CODEPTR(REGOFF(4 * x, EAX))' print '' -print '#if defined(GNU_ASSEMBLER) !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__)' +print '#if defined(GNU_ASSEMBLER) !defined(__MINGW32__) !defined(__APPLE__)' print '#define GLOBL_FN(x) GLOBL x ; .type x, @function' print '#else' print '#define GLOBL_FN(x) GLOBL x' diff --git a/src/mesa/main/dlopen.h b/src/mesa/main/dlopen.h index 55a56f0..3754ec1 100644 --- a/src/mesa/main/dlopen.h +++ b/src/mesa/main/dlopen.h @@ -73,13 +73,6 @@ _mesa_dlsym(void *handle, const char *fname) } u; #if defined(__blrts) u.v = NULL; -#elif defined(__DJGPP__) - /* need '_' prefix on symbol names */ - char fname2[1000]; - fname2[0] = '_'; - strncpy(fname2 + 1, fname, 998); - fname2[999] = 0; - u.v = dlsym(handle, fname2); #elif defined(HAVE_DLOPEN) u.v = dlsym(handle, fname); #elif defined(__MINGW32__) diff --git a/src/mesa/main/texcompress_s3tc.c b/src/mesa/main/texcompress_s3tc.c index 5b275ef..254f84e 100644 --- a/src/mesa/main/texcompress_s3tc.c +++ b/src/mesa/main/texcompress_s3tc.c @@ -51,8 +51,6 @@ #define DXTN_LIBNAME dxtn.dll #define RTLD_LAZY 0 #define RTLD_GLOBAL 0 -#elif defined(__DJGPP__) -#define DXTN_LIBNAME dxtn.dxe #else #define DXTN_LIBNAME libtxc_dxtn.so #endif 
diff --git a/src/mesa/x86/assyntax.h b/src/mesa/x86/assyntax.h index fa7d92e..67867bd 100644 --- a/src/mesa/x86/assyntax.h +++ b/src/mesa/x86/assyntax.h @@ -255,7 +255,7 @@ #endif /* ACK_ASSEMBLER */ -#if defined(__QNX__) || defined(Lynx) || (defined(SYSV) || defined(SVR4)) !defined(ACK_ASSEMBLER) || defined(__ELF__) || defined(__GNU__) || defined(__GNUC__) !defined(__DJGPP__) !defined(__MINGW32__) +#if defined(__QNX__) || defined(Lynx) || (defined(SYSV) || defined(SVR4)) !defined(ACK_ASSEMBLER) || defined(__ELF__) || defined(__GNU__) || defined(__GNUC__) !defined(__MINGW32__) #define GLNAME(a)a #else #define GLNAME(a)CONCAT(_,a) @@ -1727,9 +1727,9 @@ * If we build with gcc's -fvisibility=hidden flag, we'll need to change * the symbol visibility mode to 'default'. */ -#if defined(GNU_ASSEMBLER) !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) +#if defined(GNU_ASSEMBLER) !defined(__MINGW32__) !defined(__APPLE__) # define HIDDEN(x) .hidden x -#elif defined(__GNUC__) !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) +#elif defined(__GNUC__) !defined(__MINGW32__) !defined(__APPLE__) # pragma GCC visibility push(default) # define HIDDEN(x) .hidden x #else diff --git a/src/mesa/x86/read_rgba_span_x86.S b/src/mesa/x86/read_rgba_span_x86.S index 8177299..5def1f8 100644 --- a/src/mesa/x86/read_rgba_span_x86.S +++ b/src/mesa/x86/read_rgba_span_x86.S @@ -31,7 +31,7 @@ */ .file read_rgba_span_x86.S -#if !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) /* this one cries for assyntax.h */ +#if !defined(__MINGW32__) !defined(__APPLE__) /* this one cries for assyntax.h */ /* Kevin F. Quinn 2nd July 2006 * Replaced data segment constants with text-segment instructions. 
*/ @@ -671,7 +671,7 @@ _generic_read_RGBA_span_RGB565_MMX: emms #endif ret -#endif /* !defined(__DJGPP__) !defined(__MINGW32__) !defined(__APPLE__) */ +#endif /* !defined(__MINGW32__) !defined(__APPLE__) */ #if defined (__ELF__) defined (__linux__) .section .note.GNU-stack,,%progbits
Re: [Mesa-dev] [PATCH 2/2] Remove Bluegene/L wrappers
On 09/26/2014 02:14 PM, Emil Velikov wrote: Added back in 2009, with osmesa/GLU in mind. Unlikely to be working any more since the removal of the static makefiles. Cc: Brian Paul bri...@vmware.com Signed-off-by: Emil Velikov emil.l.veli...@gmail.com Reviewed-by: Ian Romanick ian.d.roman...@intel.com In dlopen.h, the code will be the same... the defined(__blrts) paths are the same as the last #else paths. --- src/mesa/main/compiler.h | 2 +- src/mesa/main/dlopen.h | 12 +++- 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/src/mesa/main/compiler.h b/src/mesa/main/compiler.h index 185c911..34671dc 100644 --- a/src/mesa/main/compiler.h +++ b/src/mesa/main/compiler.h @@ -150,7 +150,7 @@ extern C { #elif defined(__APPLE__) #include CoreFoundation/CFByteOrder.h #define CPU_TO_LE32( x ) CFSwapInt32HostToLittle( x ) -#elif (defined(_AIX) || defined(__blrts)) +#elif (defined(_AIX)) static inline GLuint CPU_TO_LE32(GLuint x) { return (((x 0x00ff) 24) | diff --git a/src/mesa/main/dlopen.h b/src/mesa/main/dlopen.h index 3754ec1..1e77849 100644 --- a/src/mesa/main/dlopen.h +++ b/src/mesa/main/dlopen.h @@ -47,9 +47,7 @@ typedef void (*GenericFunc)(void); static inline void * _mesa_dlopen(const char *libname, int flags) { -#if defined(__blrts) - return NULL; -#elif defined(HAVE_DLOPEN) +#if defined(HAVE_DLOPEN) flags = RTLD_LAZY | RTLD_GLOBAL; /* Overriding flags at this time */ return dlopen(libname, flags); #elif defined(__MINGW32__) @@ -71,9 +69,7 @@ _mesa_dlsym(void *handle, const char *fname) void *v; GenericFunc f; } u; -#if defined(__blrts) - u.v = NULL; -#elif defined(HAVE_DLOPEN) +#if defined(HAVE_DLOPEN) u.v = dlsym(handle, fname); #elif defined(__MINGW32__) u.v = (void *) GetProcAddress(handle, fname); @@ -89,9 +85,7 @@ _mesa_dlsym(void *handle, const char *fname) static inline void _mesa_dlclose(void *handle) { -#if defined(__blrts) - (void) handle; -#elif defined(HAVE_DLOPEN) +#if defined(HAVE_DLOPEN) dlclose(handle); #elif defined(__MINGW32__) 
FreeLibrary(handle);
[Mesa-dev] [PATCH 3/5] i965: Use 1ull instead of 1 in BRW_NEW_* defines.
Now that the bitfield is a uint64_t, we should use 1ull. Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h | 64 ++++++++++++++++----------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 3efd582..317724f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -185,43 +185,43 @@ enum brw_state_id {
    BRW_NUM_STATE_BITS
 };
 
-#define BRW_NEW_URB_FENCE                (1 << BRW_STATE_URB_FENCE)
-#define BRW_NEW_FRAGMENT_PROGRAM         (1 << BRW_STATE_FRAGMENT_PROGRAM)
-#define BRW_NEW_GEOMETRY_PROGRAM         (1 << BRW_STATE_GEOMETRY_PROGRAM)
-#define BRW_NEW_VERTEX_PROGRAM           (1 << BRW_STATE_VERTEX_PROGRAM)
-#define BRW_NEW_CURBE_OFFSETS            (1 << BRW_STATE_CURBE_OFFSETS)
-#define BRW_NEW_REDUCED_PRIMITIVE        (1 << BRW_STATE_REDUCED_PRIMITIVE)
-#define BRW_NEW_PRIMITIVE                (1 << BRW_STATE_PRIMITIVE)
-#define BRW_NEW_CONTEXT                  (1 << BRW_STATE_CONTEXT)
-#define BRW_NEW_PSP                      (1 << BRW_STATE_PSP)
-#define BRW_NEW_SURFACES                 (1 << BRW_STATE_SURFACES)
-#define BRW_NEW_VS_BINDING_TABLE         (1 << BRW_STATE_VS_BINDING_TABLE)
-#define BRW_NEW_GS_BINDING_TABLE         (1 << BRW_STATE_GS_BINDING_TABLE)
-#define BRW_NEW_PS_BINDING_TABLE         (1 << BRW_STATE_PS_BINDING_TABLE)
-#define BRW_NEW_INDICES                  (1 << BRW_STATE_INDICES)
-#define BRW_NEW_VERTICES                 (1 << BRW_STATE_VERTICES)
+#define BRW_NEW_URB_FENCE                (1ull << BRW_STATE_URB_FENCE)
+#define BRW_NEW_FRAGMENT_PROGRAM         (1ull << BRW_STATE_FRAGMENT_PROGRAM)
+#define BRW_NEW_GEOMETRY_PROGRAM         (1ull << BRW_STATE_GEOMETRY_PROGRAM)
+#define BRW_NEW_VERTEX_PROGRAM           (1ull << BRW_STATE_VERTEX_PROGRAM)
+#define BRW_NEW_CURBE_OFFSETS            (1ull << BRW_STATE_CURBE_OFFSETS)
+#define BRW_NEW_REDUCED_PRIMITIVE        (1ull << BRW_STATE_REDUCED_PRIMITIVE)
+#define BRW_NEW_PRIMITIVE                (1ull << BRW_STATE_PRIMITIVE)
+#define BRW_NEW_CONTEXT                  (1ull << BRW_STATE_CONTEXT)
+#define BRW_NEW_PSP                      (1ull << BRW_STATE_PSP)
+#define BRW_NEW_SURFACES                 (1ull << BRW_STATE_SURFACES)
+#define BRW_NEW_VS_BINDING_TABLE         (1ull << BRW_STATE_VS_BINDING_TABLE)
+#define BRW_NEW_GS_BINDING_TABLE         (1ull << BRW_STATE_GS_BINDING_TABLE)
+#define BRW_NEW_PS_BINDING_TABLE         (1ull << BRW_STATE_PS_BINDING_TABLE)
+#define BRW_NEW_INDICES                  (1ull << BRW_STATE_INDICES)
+#define BRW_NEW_VERTICES                 (1ull << BRW_STATE_VERTICES)
 /**
  * Used for any batch entry with a relocated pointer that will be used
  * by any 3D rendering.
  */
-#define BRW_NEW_BATCH                    (1 << BRW_STATE_BATCH)
+#define BRW_NEW_BATCH                    (1ull << BRW_STATE_BATCH)
 /** \see brw.state.depth_region */
-#define BRW_NEW_INDEX_BUFFER             (1 << BRW_STATE_INDEX_BUFFER)
-#define BRW_NEW_VS_CONSTBUF              (1 << BRW_STATE_VS_CONSTBUF)
-#define BRW_NEW_GS_CONSTBUF              (1 << BRW_STATE_GS_CONSTBUF)
-#define BRW_NEW_PROGRAM_CACHE            (1 << BRW_STATE_PROGRAM_CACHE)
-#define BRW_NEW_STATE_BASE_ADDRESS       (1 << BRW_STATE_STATE_BASE_ADDRESS)
-#define BRW_NEW_VUE_MAP_VS               (1 << BRW_STATE_VUE_MAP_VS)
-#define BRW_NEW_VUE_MAP_GEOM_OUT         (1 << BRW_STATE_VUE_MAP_GEOM_OUT)
-#define BRW_NEW_TRANSFORM_FEEDBACK       (1 << BRW_STATE_TRANSFORM_FEEDBACK)
-#define BRW_NEW_RASTERIZER_DISCARD       (1 << BRW_STATE_RASTERIZER_DISCARD)
-#define BRW_NEW_STATS_WM                 (1 << BRW_STATE_STATS_WM)
-#define BRW_NEW_UNIFORM_BUFFER           (1 << BRW_STATE_UNIFORM_BUFFER)
-#define BRW_NEW_ATOMIC_BUFFER            (1 << BRW_STATE_ATOMIC_BUFFER)
-#define BRW_NEW_META_IN_PROGRESS         (1 << BRW_STATE_META_IN_PROGRESS)
-#define BRW_NEW_INTERPOLATION_MAP        (1 << BRW_STATE_INTERPOLATION_MAP)
-#define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1 << BRW_STATE_PUSH_CONSTANT_ALLOCATION)
-#define BRW_NEW_NUM_SAMPLES              (1 << BRW_STATE_NUM_SAMPLES)
+#define BRW_NEW_INDEX_BUFFER             (1ull << BRW_STATE_INDEX_BUFFER)
+#define BRW_NEW_VS_CONSTBUF              (1ull << BRW_STATE_VS_CONSTBUF)
+#define BRW_NEW_GS_CONSTBUF              (1ull << BRW_STATE_GS_CONSTBUF)
+#define BRW_NEW_PROGRAM_CACHE            (1ull << BRW_STATE_PROGRAM_CACHE)
+#define BRW_NEW_STATE_BASE_ADDRESS       (1ull << BRW_STATE_STATE_BASE_ADDRESS)
+#define BRW_NEW_VUE_MAP_VS               (1ull << BRW_STATE_VUE_MAP_VS)
+#define BRW_NEW_VUE_MAP_GEOM_OUT         (1ull << BRW_STATE_VUE_MAP_GEOM_OUT)
+#define BRW_NEW_TRANSFORM_FEEDBACK       (1ull << BRW_STATE_TRANSFORM_FEEDBACK)
+#define BRW_NEW_RASTERIZER_DISCARD       (1ull << BRW_STATE_RASTERIZER_DISCARD)
+#define BRW_NEW_STATS_WM                 (1ull << BRW_STATE_STATS_WM)
+#define BRW_NEW_UNIFORM_BUFFER           (1ull << BRW_STATE_UNIFORM_BUFFER)
+#define BRW_NEW_ATOMIC_BUFFER            (1ull
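To illustrate the point of the commit message: a plain `1` has type `int`, so a shift by 32 or more is undefined on platforms with a 32-bit `int`, whereas `1ull` performs the shift in `unsigned long long`. The following is only a sketch with made-up names (`example_state_id`, `EXAMPLE_FLAG`, and the helpers are hypothetical and not Mesa identifiers), showing a flag above bit 31 in a 64-bit dirty word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state IDs standing in for enum brw_state_id. */
enum example_state_id {
   EXAMPLE_STATE_FIRST = 0,
   EXAMPLE_STATE_LAST_32 = 31,
   EXAMPLE_STATE_BEYOND_32 = 40,
};

/* The 1ull literal makes the shift happen in unsigned long long, so bit
 * positions 32..63 are representable in the resulting flag. */
#define EXAMPLE_FLAG(id) (1ull << (id))

/* Return the dirty word with the flag for id set. */
static inline uint64_t
example_set_dirty(uint64_t dirty, enum example_state_id id)
{
   assert((unsigned) id < 64);
   return dirty | EXAMPLE_FLAG(id);
}

/* Return nonzero if the flag for id is set in the dirty word. */
static inline int
example_is_dirty(uint64_t dirty, enum example_state_id id)
{
   assert((unsigned) id < 64);
   return (dirty & EXAMPLE_FLAG(id)) != 0;
}
```

Setting `EXAMPLE_STATE_BEYOND_32` on an empty word yields a value with only bit 40 set, which a 32-bit flag type could not hold.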
[Mesa-dev] [PATCH 5/5] i965: Drop brwBindProgram driver hook.
This function flagged BRW_NEW_*_PROGRAM.

When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.

However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on. So, this looks
to be entirely redundant.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_program.c | 20 --------------------
 1 file changed, 20 deletions(-)

Tested with Piglit and a manual inspection of an apitrace of Shadowrun
Returns, which uses a variety of ARB programs.

diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index d782b4f..b37da4e 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -54,25 +54,6 @@ get_new_program_id(struct intel_screen *screen)
    return id;
 }
 
-static void brwBindProgram( struct gl_context *ctx,
-                            GLenum target,
-                            struct gl_program *prog )
-{
-   struct brw_context *brw = brw_context(ctx);
-
-   switch (target) {
-   case GL_VERTEX_PROGRAM_ARB:
-      brw->state.dirty.brw |= BRW_NEW_VERTEX_PROGRAM;
-      break;
-   case MESA_GEOMETRY_PROGRAM:
-      brw->state.dirty.brw |= BRW_NEW_GEOMETRY_PROGRAM;
-      break;
-   case GL_FRAGMENT_PROGRAM_ARB:
-      brw->state.dirty.brw |= BRW_NEW_FRAGMENT_PROGRAM;
-      break;
-   }
-}
-
 static struct gl_program *brwNewProgram( struct gl_context *ctx,
                                          GLenum target,
                                          GLuint id )
@@ -250,7 +231,6 @@ void brwInitFragProgFuncs( struct dd_function_table *functions )
 {
    assert(functions->ProgramStringNotify == _tnl_program_string);
 
-   functions->BindProgram = brwBindProgram;
    functions->NewProgram = brwNewProgram;
    functions->DeleteProgram = brwDeleteProgram;
    functions->IsProgramNative = brwIsProgramNative;
--
2.1.0
[Mesa-dev] [PATCH 1/5] i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.
Unused since krh rewrote fast clears to use meta.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h      | 2 --
 src/mesa/drivers/dri/i965/brw_state_upload.c | 1 -
 2 files changed, 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 377853e..3efd582 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -685,7 +685,6 @@ enum brw_cache_id {
    BRW_CC_UNIT,
    BRW_WM_PROG,
    BRW_BLORP_BLIT_PROG,
-   BRW_BLORP_CONST_COLOR_PROG,
    BRW_SAMPLER,
    BRW_WM_UNIT,
    BRW_SF_PROG,
@@ -780,7 +779,6 @@ enum shader_time_shader_type {
 #define CACHE_NEW_CC_UNIT                (1<<BRW_CC_UNIT)
 #define CACHE_NEW_WM_PROG                (1<<BRW_WM_PROG)
 #define CACHE_NEW_BLORP_BLIT_PROG        (1<<BRW_BLORP_BLIT_PROG)
-#define CACHE_NEW_BLORP_CONST_COLOR_PROG (1<<BRW_BLORP_CONST_COLOR_PROG)
 #define CACHE_NEW_SAMPLER                (1<<BRW_SAMPLER)
 #define CACHE_NEW_WM_UNIT                (1<<BRW_WM_UNIT)
 #define CACHE_NEW_SF_PROG                (1<<BRW_SF_PROG)
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index dd0ceb6..f4b0475 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -520,7 +520,6 @@ static struct dirty_bit_map cache_bits[] = {
    DEFINE_BIT(CACHE_NEW_CC_UNIT),
    DEFINE_BIT(CACHE_NEW_WM_PROG),
    DEFINE_BIT(CACHE_NEW_BLORP_BLIT_PROG),
-   DEFINE_BIT(CACHE_NEW_BLORP_CONST_COLOR_PROG),
    DEFINE_BIT(CACHE_NEW_SAMPLER),
    DEFINE_BIT(CACHE_NEW_WM_UNIT),
    DEFINE_BIT(CACHE_NEW_SF_PROG),
--
2.1.0
[Mesa-dev] [PATCH 2/5] i965: Update dirty_bit_map::bit to be a uint64_t.
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31. We missed doing this when widening the driver flags
from uint32_t to uint64_t.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_state_upload.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index f4b0475..b2d1bdf 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -438,7 +438,7 @@ static void xor_states( struct brw_state_flags *result,
 }
 
 struct dirty_bit_map {
-   uint32_t bit;
+   uint64_t bit;
    char *name;
    uint32_t count;
 };
@@ -560,7 +560,7 @@ brw_print_dirty_count(struct dirty_bit_map *bit_map)
       if (bit_map[i].bit == 0)
          return;
 
-      fprintf(stderr, "0x%08x: %12d (%s)\n",
+      fprintf(stderr, "0x%08lx: %12d (%s)\n",
               bit_map[i].bit, bit_map[i].count, bit_map[i].name);
    }
 }
-- 
2.1.0
[Mesa-dev] [PATCH 4/5] i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments.
I had to dig a bit to figure out why this was necessary.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 5 +++--
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 4 ++--
 src/mesa/drivers/dri/i965/gen8_sf_state.c | 4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 843507e..d0411b0 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -155,6 +155,7 @@ calculate_attr_overrides(const struct brw_context *brw,
    memset(attr_overrides, 0, 16*sizeof(*attr_overrides));
 
    for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
+      /* BRW_NEW_FRAGMENT_PROGRAM */
       enum glsl_interp_qualifier interp_qualifier =
          brw->fragment_program->InterpQualifier[attr];
       bool is_gl_Color = attr == VARYING_SLOT_COL0 || attr == VARYING_SLOT_COL1;
@@ -369,8 +370,8 @@ upload_sf_state(struct brw_context *brw)
          (1 << GEN6_SF_TRIFAN_PROVOKE_SHIFT);
    }
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
-    * CACHE_NEW_WM_PROG
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | BRW_NEW_FRAGMENT_PROGRAM |
+    * _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM | CACHE_NEW_WM_PROG
     */
    uint32_t urb_entry_read_length;
    calculate_attr_overrides(brw, attr_overrides, &point_sprite_enables,
diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index 4badc82..67e4448 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -60,8 +60,8 @@ upload_sbe_state(struct brw_context *brw)
    }
    dw1 |= point_sprite_origin;
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
-    * CACHE_NEW_WM_PROG
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | BRW_NEW_FRAGMENT_PROGRAM
+    * _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM | CACHE_NEW_WM_PROG
     */
    uint32_t urb_entry_read_length;
    calculate_attr_overrides(brw, attr_overrides, &point_sprite_enables,
diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c b/src/mesa/drivers/dri/i965/gen8_sf_state.c
index 4263eaf..555e6a8 100644
--- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
@@ -57,8 +57,8 @@ upload_sbe(struct brw_context *brw)
    else
       dw1 |= GEN6_SF_POINT_SPRITE_UPPERLEFT;
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
-    * CACHE_NEW_WM_PROG
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | BRW_NEW_FRAGMENT_PROGRAM |
+    * _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM | CACHE_NEW_WM_PROG
     */
    calculate_attr_overrides(brw, attr_overrides,
                             &point_sprite_enables,
-- 
2.1.0
[Mesa-dev] [PATCH] i965/fs: Recalculate cfg in emit_curb_setup
Signed-off-by: Jason Ekstrand <jason.ekstr...@intel.com>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index ffe8ba8..95af5ab 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1477,6 +1477,8 @@ fs_visitor::assign_curb_setup()
 
    prog_data->curb_read_length = ALIGN(stage_prog_data->nr_params, 8) / 8;
 
+   calculate_cfg();
+
    /* Map the offsets in the UNIFORM file to fixed HW regs. */
    foreach_block_and_inst(block, fs_inst, inst, cfg) {
       for (unsigned int i = 0; i < inst->sources; i++) {
-- 
2.1.0
Re: [Mesa-dev] [PATCH 4/4] i965: Use unsynchronized maps for the program cache on LLC platforms.
On Fri, Sep 26, 2014 at 2:21 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Friday, September 26, 2014 09:22:31 AM Kristian Høgsberg wrote: On Fri, Aug 29, 2014 at 11:10:50PM -0700, Kenneth Graunke wrote: There's no reason to stall on pwrite - the CPU always appends to the buffer and never modifies existing contents, and the GPU never writes it. Further, the CPU always appends new data before submitting a batch that requires it. This code predates the unsynchronized mapping feature, so we simply didn't have the option when it was written. Ideally, we would do this for non-LLC platforms too, but unsynchronized mapping support only exists for LLC systems. Saves repeated 0.001ms stalls on program upload. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_state_cache.c | 34 +++-- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c index b9bb0fc..1d2d32f 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -172,14 +172,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size) drm_intel_bo *new_bo; new_bo = drm_intel_bo_alloc(brw-bufmgr, program cache, new_size, 64); + if (brw-has_llc) + drm_intel_gem_bo_map_unsynchronized(new_bo); /* Copy any existing data that needs to be saved. */ if (cache-next_offset != 0) { - brw_bo_map(brw, cache-bo, false, program cache); - drm_intel_bo_subdata(new_bo, 0, cache-next_offset, cache-bo-virtual); - drm_intel_bo_unmap(cache-bo); + if (brw-has_llc) { + memcpy(new_bo-virtual, cache-bo-virtual, cache-next_offset); Move the drm_intel_gem_bo_map_unsynchronized() and drm_intel_bo_unmap() calls into this block so they bracket the memcpy as for the subdata case below? 
Other than that, Reviewed-by: Kristian Høgsberg k...@bitplanet.net That won't work---the point is to map new_bo, and leave it mapped...and unmap the old BO before throwing it away. If I moved the map call into the if (cache-next_offset != 0) block, then the initial mapping would never occur. Yup, that makes sense. Kristian + } else { + brw_bo_map(brw, cache-bo, false, program cache); + drm_intel_bo_subdata(new_bo, 0, cache-next_offset, + cache-bo-virtual); + drm_intel_bo_unmap(cache-bo); + } } + if (brw-has_llc) + drm_intel_bo_unmap(cache-bo); drm_intel_bo_unreference(cache-bo); cache-bo = new_bo; cache-bo_used_by_gpu = false; @@ -222,9 +231,11 @@ brw_try_upload_using_copy(struct brw_cache *cache, continue; } -brw_bo_map(brw, cache-bo, false, program cache); + if (!brw-has_llc) +brw_bo_map(brw, cache-bo, false, program cache); ret = memcmp(cache-bo-virtual + item-offset, data, item-size); -drm_intel_bo_unmap(cache-bo); + if (!brw-has_llc) +drm_intel_bo_unmap(cache-bo); if (ret) continue; @@ -257,7 +268,7 @@ brw_upload_item_data(struct brw_cache *cache, /* If we would block on writing to an in-use program BO, just * recreate it. 
*/ - if (cache-bo_used_by_gpu) { + if (!brw-has_llc cache-bo_used_by_gpu) { perf_debug(Copying busy program cache buffer.\n); brw_cache_new_bo(cache, cache-bo-size); } @@ -280,6 +291,7 @@ brw_upload_cache(struct brw_cache *cache, uint32_t *out_offset, void *out_aux) { + struct brw_context *brw = cache-brw; struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item); GLuint hash; void *tmp; @@ -320,7 +332,11 @@ brw_upload_cache(struct brw_cache *cache, cache-n_items++; /* Copy data to the buffer */ - drm_intel_bo_subdata(cache-bo, item-offset, data_size, data); + if (brw-has_llc) { + memcpy((char *) cache-bo-virtual + item-offset, data, data_size); + } else { + drm_intel_bo_subdata(cache-bo, item-offset, data_size, data); + } *out_offset = item-offset; *(void **)out_aux = (void *)((char *)item-key + item-key_size); @@ -342,6 +358,8 @@ brw_init_caches(struct brw_context *brw) cache-bo = drm_intel_bo_alloc(brw-bufmgr, program cache, 4096, 64); + if (brw-has_llc) + drm_intel_gem_bo_map_unsynchronized(cache-bo); cache-aux_compare[BRW_VS_PROG] = brw_vs_prog_data_compare; cache-aux_compare[BRW_GS_PROG] = brw_gs_prog_data_compare; @@ -408,6 +426,8 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache *cache) DBG(%s\n, __FUNCTION__); + if (brw-has_llc) + drm_intel_bo_unmap(cache-bo);
[Mesa-dev] [PATCH 2/2] mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.
Cuts the number of i965 color calculator viewport uploads by 100x
(11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/main/viewport.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/main/viewport.c b/src/mesa/main/viewport.c
index 6545bf6..222ae30 100644
--- a/src/mesa/main/viewport.c
+++ b/src/mesa/main/viewport.c
@@ -58,6 +58,12 @@ set_viewport_no_notify(struct gl_context *ctx, unsigned idx,
                  ctx->Const.ViewportBounds.Min, ctx->Const.ViewportBounds.Max);
    }
 
+   if (ctx->ViewportArray[idx].X == x &&
+       ctx->ViewportArray[idx].Width == width &&
+       ctx->ViewportArray[idx].Y == y &&
+       ctx->ViewportArray[idx].Height == height)
+      return;
+
    ctx->ViewportArray[idx].X = x;
    ctx->ViewportArray[idx].Width = width;
    ctx->ViewportArray[idx].Y = y;
-- 
2.1.0
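The early-return idea in this patch can be shown outside of Mesa. The following is only a minimal sketch under stated assumptions: struct viewport and set_viewport are hypothetical names made up for the illustration, not Mesa API, and the boolean return value stands in for whatever dirty flag the caller would raise.

```c
#include <stdbool.h>

/* Hypothetical viewport state; not the Mesa structure. */
struct viewport {
   float x, y, width, height;
};

/* Store the new rectangle and report whether anything changed.
 * A caller would only flag the state as dirty (and re-upload it)
 * when this returns true.
 */
bool
set_viewport(struct viewport *vp, float x, float y, float width, float height)
{
   if (vp->x == x && vp->y == y &&
       vp->width == width && vp->height == height)
      return false;   /* redundant update: nothing to flag */

   vp->x = x;
   vp->y = y;
   vp->width = width;
   vp->height = height;
   return true;
}
```

Calling set_viewport twice with the same rectangle reports a change only the first time, which is the effect the commit message measures as fewer viewport uploads.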
[Mesa-dev] [PATCH 1/2] i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not
needed here anyway - only SBE needs it. Just a copy and paste mistake.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index 67e4448..150a4d3 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -254,7 +254,7 @@ const struct brw_tracked_state gen7_sf_state = {
                 _NEW_POINT |
                 _NEW_MULTISAMPLE),
       .brw   = BRW_NEW_CONTEXT,
-      .cache = CACHE_NEW_VS_PROG
+      .cache = 0,
    },
    .emit = upload_sf_state,
 };
-- 
2.1.0
Re: [Mesa-dev] [PATCH] i965/fs: Recalculate cfg in emit_curb_setup
On Fri, Sep 26, 2014 at 2:59 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> Signed-off-by: Jason Ekstrand <jason.ekstr...@intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index ffe8ba8..95af5ab 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1477,6 +1477,8 @@ fs_visitor::assign_curb_setup()
>
>     prog_data->curb_read_length = ALIGN(stage_prog_data->nr_params, 8) / 8;
>
> +   calculate_cfg();
> +
>     /* Map the offsets in the UNIFORM file to fixed HW regs. */
>     foreach_block_and_inst(block, fs_inst, inst, cfg) {
>        for (unsigned int i = 0; i < inst->sources; i++) {
> --
> 2.1.0

This shouldn't be necessary. We never invalidate the cfg after
calculating it the first time. Something's wrong.
Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL
On Fri, Sep 26, 2014 at 3:04 PM, Alex Deucher <alexdeuc...@gmail.com> wrote:
> On Thu, Sep 25, 2014 at 3:02 PM, Tom Stellard <t...@stellard.net> wrote:
>> On Mon, Sep 22, 2014 at 09:48:43PM +0200, Marek Olšák wrote:
>>> No, we cannot detect compute-only contexts yet. We need to add a new
>>> parameter to pipe_context::context_create which says that a context
>>> is compute-only. That should be OpenCL but not OpenGL.
>>>
>>> Also, some code paths like resource_copy_region use the graphics
>>> engine for copying, which cannot be used with compute rings and must
>>> be implemented with either DMA or compute-based blits. DMA isn't
>>> flexible enough, so some additional work for compute-based blits
>>> might be needed. We can also use the graphics ring for copying only
>>> and the compute ring for compute stuff.
>>
>> If possible, I think I would prefer continuing to use the graphic ring
>> for blits and only submit compute specific packets to the compute
>> ring. I'm a little concerned that adding a compute-flag to context
>> create might make it harder to share code between compute and
>> graphics, which I think is important.
>>
>> What are the downsides of using both rings at once? Will we need to
>> add synchronization code for the two rings? I think the last time I
>> looked into doing this, the biggest problem was that fences were
>> submitted via the graphics ring even though they were meant for jobs
>> on the compute ring. Is there a good solution to this?
>
> It would be nice to not have any dependencies on the gfx ring. That
> way compute jobs can run on the compute rings without requiring the
> gfx ring which should avoid any latency issues with desktop gfx jobs.

In that case we have to rewrite resource_copy_region and make it use
compute shaders only. The ideal time for that would be after
ARB_compute_shader (from GL4.3) has been implemented.

Marek
Re: [Mesa-dev] [PATCH 2/2] mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.
On 09/26/2014 04:13 PM, Kenneth Graunke wrote:
> Cuts the number of i965 color calculator viewport uploads by 100x
> (11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/main/viewport.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/mesa/main/viewport.c b/src/mesa/main/viewport.c
> index 6545bf6..222ae30 100644
> --- a/src/mesa/main/viewport.c
> +++ b/src/mesa/main/viewport.c
> @@ -58,6 +58,12 @@ set_viewport_no_notify(struct gl_context *ctx, unsigned idx,
>                   ctx->Const.ViewportBounds.Min, ctx->Const.ViewportBounds.Max);
>     }
>
> +   if (ctx->ViewportArray[idx].X == x &&
> +       ctx->ViewportArray[idx].Width == width &&
> +       ctx->ViewportArray[idx].Y == y &&
> +       ctx->ViewportArray[idx].Height == height)
> +      return;
> +
>     ctx->ViewportArray[idx].X = x;
>     ctx->ViewportArray[idx].Width = width;
>     ctx->ViewportArray[idx].Y = y;

Reviewed-by: Brian Paul <bri...@vmware.com>
[Mesa-dev] [PATCH v2 3/5] i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits.
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31. We missed doing this when widening the driver flags
from uint32_t to uint64_t.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_state_upload.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

NAK on "i965: Update dirty_bit_map::bit to be a uint64_t." It wasn't
sufficient to keep this working. I've now actually created bits 32 and
33, and verified that they are counted and printed correctly.

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index f4b0475..e124ce4 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -438,7 +438,7 @@ static void xor_states( struct brw_state_flags *result,
 }
 
 struct dirty_bit_map {
-   uint32_t bit;
+   uint64_t bit;
    char *name;
    uint32_t count;
 };
@@ -475,7 +475,8 @@ static struct dirty_bit_map mesa_bits[] = {
    DEFINE_BIT(_NEW_PROGRAM_CONSTANTS),
    DEFINE_BIT(_NEW_BUFFER_OBJECT),
    DEFINE_BIT(_NEW_FRAG_CLAMP),
-   DEFINE_BIT(_NEW_VARYING_VP_INPUTS),
+   /* Avoid sign extension problems. */
+   {(unsigned) _NEW_VARYING_VP_INPUTS, "_NEW_VARYING_VP_INPUTS", 0},
    {0, 0, 0}
 };
 
@@ -538,14 +539,9 @@ static struct dirty_bit_map cache_bits[] = {
 
 
 static void
-brw_update_dirty_count(struct dirty_bit_map *bit_map, int32_t bits)
+brw_update_dirty_count(struct dirty_bit_map *bit_map, uint64_t bits)
 {
-   int i;
-
-   for (i = 0; i < 32; i++) {
-      if (bit_map[i].bit == 0)
-         return;
-
+   for (int i = 0; bit_map[i].bit != 0; i++) {
       if (bit_map[i].bit & bits)
          bit_map[i].count++;
    }
@@ -554,13 +550,8 @@ brw_update_dirty_count(struct dirty_bit_map *bit_map, int32_t bits)
 static void
 brw_print_dirty_count(struct dirty_bit_map *bit_map)
 {
-   int i;
-
-   for (i = 0; i < 32; i++) {
-      if (bit_map[i].bit == 0)
-         return;
-
-      fprintf(stderr, "0x%08x: %12d (%s)\n",
+   for (int i = 0; bit_map[i].bit != 0; i++) {
+      fprintf(stderr, "0x%016lx: %12d (%s)\n",
               bit_map[i].bit, bit_map[i].count, bit_map[i].name);
    }
 }
-- 
2.1.0
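For readers following the dirty-bit counting logic, here is a small standalone sketch of a sentinel-terminated table of 64-bit flags. It is not the driver code: struct bit_count, update_counts and print_counts are hypothetical names. One deliberate difference is labeled in the comments: the sketch prints with PRIx64 from <inttypes.h>, since %lx only matches uint64_t on platforms where long is 64 bits wide.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a dirty-bit table entry. */
struct bit_count {
   uint64_t bit;        /* a single flag bit; 0 terminates the table */
   const char *name;
   uint32_t count;
};

/* Bump the counter of every entry whose bit is set in 'bits'.
 * The loop runs until the zero sentinel, so the table may hold
 * flags above bit 31 and is not limited to 32 entries.
 */
void
update_counts(struct bit_count *map, uint64_t bits)
{
   for (int i = 0; map[i].bit != 0; i++) {
      if (map[i].bit & bits)
         map[i].count++;
   }
}

/* Print one line per entry. PRIx64/PRIu32 are an assumption of this
 * sketch (chosen for portability), not what the patch uses.
 */
void
print_counts(const struct bit_count *map)
{
   for (int i = 0; map[i].bit != 0; i++) {
      fprintf(stderr, "0x%016" PRIx64 ": %12" PRIu32 " (%s)\n",
              map[i].bit, map[i].count, map[i].name);
   }
}
```

Entries are built with UINT64_C(1) << n so that shifts past bit 31 are performed in 64-bit arithmetic rather than in int.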
[Mesa-dev] [PATCH 3.5/5] i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.
~0 is 0xFFFFFFFF, which only covers the first 32 bits. We need all 64.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_blorp.cpp      | 2 +-
 src/mesa/drivers/dri/i965/brw_state_cache.c  | 2 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

I think Jordan/Paul fixed this with macros, but we reverted that patch.
This fixes it in the minimal way; we can think about adding macros later.

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index 2c00bce..20ce7b7 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -276,7 +276,7 @@ retry:
    /* We've smashed all state compared to what the normal 3D pipeline
     * rendering tracks for GL.
     */
-   brw->state.dirty.brw = ~0;
+   brw->state.dirty.brw = ~0ull;
    brw->state.dirty.cache = ~0;
    brw->no_depth_or_stencil = false;
    brw->ib.type = -1;
diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
index 882d131..62e03b1 100644
--- a/src/mesa/drivers/dri/i965/brw_state_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
@@ -379,7 +379,7 @@ brw_clear_cache(struct brw_context *brw, struct brw_cache *cache)
     * any offsets leftover in brw_context will no longer be valid.
     */
    brw->state.dirty.mesa |= ~0;
-   brw->state.dirty.brw |= ~0;
+   brw->state.dirty.brw |= ~0ull;
    brw->state.dirty.cache |= ~0;
    intel_batchbuffer_flush(brw);
 }
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index e124ce4..9e3cfb8 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -388,7 +388,7 @@ void brw_init_state( struct brw_context *brw )
    brw_upload_initial_gpu_state(brw);
 
    brw->state.dirty.mesa = ~0;
-   brw->state.dirty.brw = ~0;
+   brw->state.dirty.brw = ~0ull;
 
    /* Make sure that brw->state.dirty.brw has enough bits to hold all possible
     * dirty flags.
@@ -575,7 +575,7 @@ void brw_upload_state(struct brw_context *brw)
    if (0) {
       /* Always re-emit all state. */
       state->mesa |= ~0;
-      state->brw |= ~0;
+      state->brw |= ~0ull;
       state->cache |= ~0;
    }
 
-- 
2.1.0
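How an "all ones" constant widens to 64 bits depends on the type of the literal, which a short experiment makes visible. The sketch below is standalone C, not driver code, and the function names are hypothetical. It assumes a 32-bit int. Under the C conversion rules a plain ~0 is the signed int -1 and converts to a uint64_t with every bit set, while the unsigned spelling ~0u keeps only the low 32 bits; ~0ull is 64 bits wide from the start and leaves no room for doubt, which is a reasonable argument for the spelling chosen in the patch.

```c
#include <stdint.h>

/* ~0 has type int and value -1; converting -1 to uint64_t wraps
 * modulo 2^64, so every bit ends up set.
 */
uint64_t
all_ones_from_int(void)
{
   return ~0;
}

/* ~0u has type unsigned int; the conversion zero-extends, so only
 * the low 32 bits are set (assuming a 32-bit unsigned int).
 */
uint64_t
all_ones_from_unsigned(void)
{
   return ~0u;
}

/* ~0ull is at least 64 bits wide already: all 64 bits are set. */
uint64_t
all_ones_from_ull(void)
{
   return ~0ull;
}
```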
[Mesa-dev] [PATCH] i965/fs: Properly calculate the number of instructions in calculate_register_pressure
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b9bd94c..97b39e1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3400,7 +3400,9 @@ fs_visitor::calculate_register_pressure()
    invalidate_live_intervals();
    calculate_live_intervals();
 
-   unsigned num_instructions = instructions.length();
+   unsigned num_instructions = 0;
+   foreach_block(block, cfg)
+      num_instructions = block->instructions.length();
 
    regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions);
 
-- 
2.1.0
Re: [Mesa-dev] [PATCH] i965/fs: Properly calculate the number of instructions in calculate_register_pressure
On Fri, Sep 26, 2014 at 7:09 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index b9bd94c..97b39e1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -3400,7 +3400,9 @@ fs_visitor::calculate_register_pressure()
>     invalidate_live_intervals();
>     calculate_live_intervals();
>
> -   unsigned num_instructions = instructions.length();
> +   unsigned num_instructions = 0;
> +   foreach_block(block, cfg)
> +      num_instructions = block->instructions.length();

This seems odd. Did you mean += perchance?

>
>     regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions);
>
> --
> 2.1.0
Re: [Mesa-dev] [PATCH] i965/fs: Properly calculate the number of instructions in calculate_register_pressure
On Fri, Sep 26, 2014 at 4:09 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index b9bd94c..97b39e1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -3400,7 +3400,9 @@ fs_visitor::calculate_register_pressure()
>     invalidate_live_intervals();
>     calculate_live_intervals();
>
> -   unsigned num_instructions = instructions.length();
> +   unsigned num_instructions = 0;
> +   foreach_block(block, cfg)
> +      num_instructions = block->instructions.length();

+=

>
>     regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions);
>
> --
> 2.1.0

Oh, yeah. Nice find.

Reviewed-by: Matt Turner <matts...@gmail.com>

We should get rid of the instructions member entirely to avoid (my)
mistakes like this.
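The difference the reviewers point out between = and += is easy to reproduce in isolation. This is a minimal sketch in plain C rather than the i965 C++ code: struct block and count_instructions are hypothetical names, and an array stands in for the list of basic blocks.

```c
#include <stddef.h>

/* Hypothetical basic block that only records its instruction count. */
struct block {
   unsigned num_instructions;
};

/* Total number of instructions over all blocks. Accumulating with
 * '+=' is what makes this a sum; a plain '=' would leave only the
 * count of the last block visited.
 */
unsigned
count_instructions(const struct block *blocks, size_t num_blocks)
{
   unsigned total = 0;

   for (size_t i = 0; i < num_blocks; i++)
      total += blocks[i].num_instructions;

   return total;
}
```

A buffer sized from the last block alone would be too small whenever the program has more than one non-empty block, which is why the one-character slip matters for an array indexed by instruction.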
Re: [Mesa-dev] [PATCH 1/2] i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.
Series is

Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>

On 09/26/2014 03:13 PM, Kenneth Graunke wrote:
> I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
> which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not
> needed here anyway - only SBE needs it. Just a copy and paste mistake.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/gen7_sf_state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c b/src/mesa/drivers/dri/i965/gen7_sf_state.c
> index 67e4448..150a4d3 100644
> --- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
> @@ -254,7 +254,7 @@ const struct brw_tracked_state gen7_sf_state = {
>                  _NEW_POINT |
>                  _NEW_MULTISAMPLE),
>        .brw   = BRW_NEW_CONTEXT,
> -      .cache = CACHE_NEW_VS_PROG
> +      .cache = 0,
>     },
>     .emit = upload_sf_state,
>  };
Re: [Mesa-dev] [PATCH 1/5] i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.
Series is Reviewed-by: Ian Romanick ian.d.roman...@intel.com On 09/26/2014 02:53 PM, Kenneth Graunke wrote: Unused since krh rewrote fast clears to use meta. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_context.h | 2 -- src/mesa/drivers/dri/i965/brw_state_upload.c | 1 - 2 files changed, 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 377853e..3efd582 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -685,7 +685,6 @@ enum brw_cache_id { BRW_CC_UNIT, BRW_WM_PROG, BRW_BLORP_BLIT_PROG, - BRW_BLORP_CONST_COLOR_PROG, BRW_SAMPLER, BRW_WM_UNIT, BRW_SF_PROG, @@ -780,7 +779,6 @@ enum shader_time_shader_type { #define CACHE_NEW_CC_UNIT(1BRW_CC_UNIT) #define CACHE_NEW_WM_PROG(1BRW_WM_PROG) #define CACHE_NEW_BLORP_BLIT_PROG(1BRW_BLORP_BLIT_PROG) -#define CACHE_NEW_BLORP_CONST_COLOR_PROG (1BRW_BLORP_CONST_COLOR_PROG) #define CACHE_NEW_SAMPLER(1BRW_SAMPLER) #define CACHE_NEW_WM_UNIT(1BRW_WM_UNIT) #define CACHE_NEW_SF_PROG(1BRW_SF_PROG) diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c index dd0ceb6..f4b0475 100644 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -520,7 +520,6 @@ static struct dirty_bit_map cache_bits[] = { DEFINE_BIT(CACHE_NEW_CC_UNIT), DEFINE_BIT(CACHE_NEW_WM_PROG), DEFINE_BIT(CACHE_NEW_BLORP_BLIT_PROG), - DEFINE_BIT(CACHE_NEW_BLORP_CONST_COLOR_PROG), DEFINE_BIT(CACHE_NEW_SAMPLER), DEFINE_BIT(CACHE_NEW_WM_UNIT), DEFINE_BIT(CACHE_NEW_SF_PROG), ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3.5/5] i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.
On Fri, Sep 26, 2014 at 4:09 PM, Kenneth Graunke kenn...@whitecape.org wrote: ~0 is 0x, which only covers the first 32 bits. We need all 64. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_blorp.cpp | 2 +- src/mesa/drivers/dri/i965/brw_state_cache.c | 2 +- src/mesa/drivers/dri/i965/brw_state_upload.c | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) I think Jordan/Paul fixed this with macros, but we reverted that patch. This fixes it in the minimal way; we can think about adding macros later. diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp b/src/mesa/drivers/dri/i965/brw_blorp.cpp index 2c00bce..20ce7b7 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp @@ -276,7 +276,7 @@ retry: /* We've smashed all state compared to what the normal 3D pipeline * rendering tracks for GL. */ - brw-state.dirty.brw = ~0; + brw-state.dirty.brw = ~0ull; brw-state.dirty.cache = ~0; brw-no_depth_or_stencil = false; brw-ib.type = -1; diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c index 882d131..62e03b1 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -379,7 +379,7 @@ brw_clear_cache(struct brw_context *brw, struct brw_cache *cache) * any offsets leftover in brw_context will no longer be valid. 
*/ brw-state.dirty.mesa |= ~0; - brw-state.dirty.brw |= ~0; + brw-state.dirty.brw |= ~0ull; brw-state.dirty.cache |= ~0; intel_batchbuffer_flush(brw); } diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c index e124ce4..9e3cfb8 100644 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -388,7 +388,7 @@ void brw_init_state( struct brw_context *brw ) brw_upload_initial_gpu_state(brw); brw-state.dirty.mesa = ~0; - brw-state.dirty.brw = ~0; + brw-state.dirty.brw = ~0ull; /* Make sure that brw-state.dirty.brw has enough bits to hold all possible * dirty flags. @@ -575,7 +575,7 @@ void brw_upload_state(struct brw_context *brw) if (0) { /* Always re-emit all state. */ state-mesa |= ~0; - state-brw |= ~0; + state-brw |= ~0ull; state-cache |= ~0; Something stupid about ORing with a field-width set of 1s, but that's how the code is. Looks good to me. The whole series is Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/compaction: Avoid (unexpected) unsigned division.
... which leads to incorrect results on 32-bit x86.

Reported-by: Mark Janes <mark.a.ja...@intel.com>
---
I tried writing up a nice commit message that explained what was going
on and why this worked on 64-bit, but then I realized that it was taking
orders of magnitude longer than the fix itself and probably no one would
care anyway.

 src/mesa/drivers/dri/i965/brw_eu_compact.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c b/src/mesa/drivers/dri/i965/brw_eu_compact.c
index 114d18f..3f655ac 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
@@ -1445,8 +1445,8 @@ brw_compact_instructions(struct brw_compile *p, int start_offset,
             assert(brw_inst_src1_reg_file(brw, insn) == BRW_IMMEDIATE_VALUE);
 
             int jump = brw_inst_imm_d(brw, insn);
-            int jump_compacted = jump / sizeof(brw_compact_inst);
-            int jump_uncompacted = jump / sizeof(brw_inst);
+            int jump_compacted = jump / (int)sizeof(brw_compact_inst);
+            int jump_uncompacted = jump / (int)sizeof(brw_inst);
 
             target_old_ip = this_old_ip + jump_uncompacted;
             target_compacted_count = compacted_counts[target_old_ip];
-- 
1.8.5.5
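The message leaves out why the unsigned division happened to work on 64-bit. One plausible reading, offered here as an assumption rather than as the author's explanation, follows from the usual arithmetic conversions: dividing an int by a size_t converts the int to size_t first. The sketch below models both widths explicitly with fixed-size types so it behaves the same on any host. The function names are hypothetical, and the narrowing casts back to int32_t rely on the wrap-around behaviour of common compilers such as GCC and Clang (the C standard calls that conversion implementation-defined).

```c
#include <stdint.h>

/* Models 'int / size_t' where size_t is 32 bits wide: a negative
 * jump becomes a huge unsigned value before the division, and the
 * quotient still fits in 32 bits, so the wrong result survives.
 */
int32_t
div_with_32bit_size(int32_t jump, uint32_t size)
{
   return (int32_t)((uint32_t)jump / size);
}

/* Models the same expression where size_t is 64 bits wide: the
 * quotient is also wrong as a 64-bit number, but for exact
 * power-of-two divisions its low 32 bits match the signed answer
 * once it is truncated back into an int.
 */
int32_t
div_with_64bit_size(int32_t jump, uint64_t size)
{
   return (int32_t)((uint64_t)jump / size);
}

/* The fix: make the divisor signed so the division itself is signed. */
int32_t
div_signed(int32_t jump, uint32_t size)
{
   return jump / (int32_t)size;
}
```

For a backward jump of -16 bytes and an 8-byte element, the 32-bit model yields 536870910 while the 64-bit model and the signed division both yield -2.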
[Mesa-dev] [PATCH 2/3] glsl: replace while loop with without_array function
Signed-off-by: Timothy Arceri <t_arc...@yahoo.com.au>
---
 src/glsl/ast_to_hir.cpp | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 5ec1614..1c1815b 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -3560,9 +3560,7 @@ ast_declarator_list::hir(exec_list *instructions,
           *    vectors. Vertex shader inputs cannot be arrays or
           *    structures.
           */
-         const glsl_type *check_type = var->type;
-         while (check_type->is_array())
-            check_type = check_type->element_type();
+         const glsl_type *check_type = var->type->without_array();
 
          switch (check_type->base_type) {
          case GLSL_TYPE_FLOAT:
-- 
1.9.3
[Mesa-dev] [PATCH 3/3] glsl: simplify varying lowering check
This adds support for arrays of arrays and simplifies the check for
gs and ts.

Signed-off-by: Timothy Arceri <t_arc...@yahoo.com.au>
---
 src/glsl/lower_packed_varyings.cpp | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/glsl/lower_packed_varyings.cpp b/src/glsl/lower_packed_varyings.cpp
index 7801483..60b06f4 100644
--- a/src/glsl/lower_packed_varyings.cpp
+++ b/src/glsl/lower_packed_varyings.cpp
@@ -590,14 +590,7 @@ lower_packed_varyings_visitor::needs_lowering(ir_variable *var)
    if (var->data.explicit_location)
       return false;
 
-   const glsl_type *type = var->type;
-   if (this->gs_input_vertices != 0) {
-      assert(type->is_array());
-      type = type->element_type();
-   }
-   if (type->is_array())
-      type = type->fields.array;
-   if (type->vector_elements == 4)
+   if (var->type->without_array()->vector_elements == 4)
       return false;
    return true;
 }
-- 
1.9.3
[Mesa-dev] [PATCH 1/3] glsl: add arrays of arrays support to without_array function
Signed-off-by: Timothy Arceri <t_arc...@yahoo.com.au>
---
 src/glsl/glsl_types.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
index eeb14c2..f1d578e 100644
--- a/src/glsl/glsl_types.h
+++ b/src/glsl/glsl_types.h
@@ -505,7 +505,12 @@ struct glsl_type {
     */
    const glsl_type *without_array() const
    {
-      return this->is_array() ? this->fields.array : this;
+      const glsl_type *t = this;
+
+      while (t->is_array())
+         t = t->fields.array;
+
+      return t;
    }
 
    /**
-- 
1.9.3
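The change from a single conditional to a loop is what makes nested arrays work. The sketch below restates the idea in plain C with a hypothetical struct type_node (not the glsl_type class): an array type points at its element type, and stripping continues until a non-array type is reached.

```c
#include <stddef.h>

/* Hypothetical type description: 'element' is non-NULL only for
 * array types and points at the type of one array element.
 */
struct type_node {
   const struct type_node *element;
   int vector_elements;
};

/* Peel off every array level and return the innermost non-array
 * type. A one-shot check would stop after the first level and hand
 * back an array type for arrays of arrays.
 */
const struct type_node *
without_array(const struct type_node *t)
{
   while (t->element != NULL)
      t = t->element;

   return t;
}
```

For a type such as vec4[3][2], the loop returns the vec4 node, so a caller can inspect vector_elements without caring how many array levels wrapped it.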
[Mesa-dev] [PATCH] Allow texture2DProjLod and textureCubeLod with Gles.
According to the GLES spec (1.0 and above), textureCubeLod and texture2DProjLod are built-in functions. We seem to disable support for these functions with GLES. This patch enables the support.

Signed-off-by: Kalyan Kondapally kalyan.kondapa...@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355
---
 src/glsl/builtin_functions.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 9be7f6d..5a024cb 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -1882,8 +1882,8 @@ builtin_builder::create_builtins()
                 NULL);
 
    add_function("texture2DProjLod",
-                _texture(ir_txl, v110_lod, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
-                _texture(ir_txl, v110_lod, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+                _texture(ir_txl, lod_exists_in_stage, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+                _texture(ir_txl, lod_exists_in_stage, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
                 NULL);
 
    add_function("texture3D",
@@ -1910,7 +1910,7 @@ builtin_builder::create_builtins()
                 NULL);
 
    add_function("textureCubeLod",
-                _texture(ir_txl, v110_lod, glsl_type::vec4_type, glsl_type::samplerCube_type, glsl_type::vec3_type),
+                _texture(ir_txl, lod_exists_in_stage, glsl_type::vec4_type, glsl_type::samplerCube_type, glsl_type::vec3_type),
                 NULL);
 
    add_function("texture2DRect",
-- 
1.9.1
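To make the effect of swapping the availability predicate concrete, here is a small standalone model. The predicate names v110_lod and lod_exists_in_stage come from the diff; everything else is an assumption for illustration. In particular, the `ParseState` struct is hypothetical, and both predicate bodies are simplified guesses (an ES exclusion for the old one, a vertex-stage-only rule for the new one), not copies of the code in builtin_functions.cpp.

```cpp
#include <cassert>

// Hypothetical, reduced stand-in for the parser state a predicate inspects.
struct ParseState {
    bool es_shader;     // true when compiling a GLSL ES shader
    bool vertex_stage;  // true when compiling a vertex shader
};

// Assumed, simplified model of the predicate the patch switches to:
// the *Lod built-ins are offered based on the shader stage alone.
static bool lod_exists_in_stage(const ParseState &s) {
    return s.vertex_stage;
}

// Assumed model of the old predicate: the same stage rule,
// but never satisfied for ES shaders.
static bool v110_lod(const ParseState &s) {
    return !s.es_shader && lod_exists_in_stage(s);
}

// Self-check: an ES vertex shader is rejected by the old predicate
// and accepted by the new one; desktop GL behaves the same under both.
static void check_predicates() {
    const ParseState es_vertex = { true, true };
    const ParseState gl_vertex = { false, true };

    assert(!v110_lod(es_vertex));
    assert(lod_exists_in_stage(es_vertex));
    assert(v110_lod(gl_vertex) == lod_exists_in_stage(gl_vertex));
}
```

Under these assumptions the change only widens availability for ES shaders, which matches the behaviour described in the commit message.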
Re: [Mesa-dev] [PATCH] Allow texture2DProjLod and textureCubeLod with Gles.
On Fri, Sep 26, 2014 at 7:44 PM, Kalyan Kondapally kondapallykalyancontrib...@gmail.com wrote:
> According to the GLES spec (1.0 and above), textureCubeLod and texture2DProjLod are built-in functions. We seem to disable support for these functions with GLES. This patch enables the support.
>
> Signed-off-by: Kalyan Kondapally kalyan.kondapa...@intel.com
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355

Change the subject to "glsl: Allow texture2DProjLod and textureCubeLod in GL ES."

Reviewed-by: Matt Turner matts...@gmail.com