[Mesa-dev] [PATCH 2/9] gallium: use $(top_builddir) when referencing other .la's
Just like every other place in gallium. Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/drivers/freedreno/Makefile.am | 2 +- src/gallium/drivers/nouveau/Makefile.am | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/freedreno/Makefile.am b/src/gallium/drivers/freedreno/Makefile.am index e798e44..cbf62c6 100644 --- a/src/gallium/drivers/freedreno/Makefile.am +++ b/src/gallium/drivers/freedreno/Makefile.am @@ -28,7 +28,7 @@ ir3_compiler_SOURCES = \ ir3_compiler_LDADD = \ libfreedreno.la \ - ../../auxiliary/libgallium.la \ + $(top_builddir)/src/gallium/auxiliary/libgallium.la \ $(top_builddir)/src/glsl/libnir.la \ $(top_builddir)/src/libglsl_util.la \ $(top_builddir)/src/util/libmesautil.la \ diff --git a/src/gallium/drivers/nouveau/Makefile.am b/src/gallium/drivers/nouveau/Makefile.am index 0aefc03..d05f0a1 100644 --- a/src/gallium/drivers/nouveau/Makefile.am +++ b/src/gallium/drivers/nouveau/Makefile.am @@ -48,7 +48,7 @@ nouveau_compiler_SOURCES = \ nouveau_compiler_LDADD = \ libnouveau.la \ - ../../auxiliary/libgallium.la \ + $(top_builddir)/src/gallium/auxiliary/libgallium.la \ $(top_builddir)/src/util/libmesautil.la \ $(GALLIUM_COMMON_LIB_DEPS) -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Add the header for constant loads outside of the generator
Ben Widawsky b...@bwidawsk.net writes: + void generate_set_simd4x2_header_gen9(vec4_instruction *inst, + struct brw_reg dst); super bikesheddy, but this function name sounds pretty bad, though I understand it's a side-effect of the name of the opcode (which seems consistent with similar opcodes). I can't think of anything great either though. Yes, it's a bit awkward, but unless someone proposes a better name I will leave it. It matches methods like fs_generator::generate_set_simd4x2_offset which already exists. void +vec4_generator::generate_set_simd4x2_header_gen9(vec4_instruction *inst, + struct brw_reg dst) +{ + /* Skylake requires a message header in order to use SIMD4x2 mode. */ This comment is pretty useless here. Yes, good point. I'll remove it. + brw_push_insn_state(p); + brw_set_default_mask_control(p, BRW_MASK_DISABLE); + + brw_MOV(p, dst, retype(brw_vec4_grf(0, 0), BRW_REGISTER_TYPE_UD)); + retype(brw_vec8_grf(0, 0)? It would jive better with the diff. Ah yes, oops. I've accidentally undone the fix I made in 07c571a39fa1 here. I'll fix it. +void +vec4_visitor::emit_pull_constant_load_reg(dst_reg dst, + src_reg surf_index, + src_reg offset_reg, + bblock_t *before_block, + vec4_instruction *before_inst) +{ + assert((before_inst == NULL before_block == NULL) || + (before_inst before_block)); + + vec4_instruction *pull; + + if (brw-gen = 9) { + /* Gen9+ needs a message header in order to use SIMD4x2 mode */ + src_reg header(this, glsl_type::uvec4_type, 2); + Just curious why you didn't go for a uvec8_type. It seems more natural for what we want to do, and how the other code handles things. I don't think I picked uvec4_type here for any particular reason. I guess it's hard to pick a type that makes sense because it's a virtual register that needs to be sized for 2 actual registers where the first one is a chunk of structed data used for the header and the second one is actually just an integer I think. As Ian pointed out, uvec8_type doesn't exist. + dst_reg index_reg = retype(offset(dst_reg(header), 1), + offset_reg.type); + pull = MOV(writemask(index_reg, WRITEMASK_X), offset_reg); I am not finding where the 1 in the offset there comes from? I guess that's why you used a uvec4 above, though I think suboffset can achieve the same. Can you elaborate for me, I was sort of expecting to see a '2' instead of 1? This isn't a suboffset into a register, the 1 means to add a whole register. I think the register type is irrelevant in that case. The header register is allocated as two-registers wide. The first actual register gets the header and here we are copying in the index into the second register. I wouldn't be opposed to doing the extraction as a prep-patch fwiw. Should make things slightly easier to review, but I can live with whatever you prefer. Ok, good idea. I'll do that. I couldn't spot any issues otherwise. If you can address my questions comments, I think I'm ready to slap on the r-b Thanks for the review. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] llvmpipe support for ARM?
Hi, I have been trying to get OpenGL sw rastering to work on Linux based ARM CortexA15 device but for some reason gallium llvmpipe gives me a segmentation fault. Has anybody managed to run sw rastering using gallium llvmpipe on ARM? Softpipe seems to be working ok. I am using the following versions: - Mesa 9.2.5 - LLVM3.3 - Wayland/Weston 1.5 The issue is related to opengl/gles2 rendering i.e. weston-simple-egl demo and Qt openGL demos give seg faults when executed with llvmpipe enabled. Backtrace from weston-simple-egl looks something like this: #0 0x6d7de620 in ?? () #1 0xb354004c in fs17_variant0_partial () #2 0xb6841740 in lp_rast_shade_quads_mask (task=task@entry=0x1e360, inputs=inputs@entry=0x47a00, x=128, y=80, mask=4369) at lp_rast.c:466 #3 0xb6842d04 in do_block_4_1 (c=optimized out, y=optimized out, x=optimized out, plane=optimized out, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:61 #4 do_block_16_1 (c=synthetic pointer, y=optimized out, x=optimized out, plane=0xb4f4ccf0, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:130 #5 lp_rast_triangle_1 (task=0x1e360, arg=...) at lp_rast_tri_tmp.h:232 #6 0xb6840420 in do_rasterize_bin (bin=optimized out, task=0x1e360, x=optimized out, y=optimized out) at lp_rast.c:607 #7 rasterize_bin (y=optimized out, x=optimized out, bin=optimized out, task=0x1e360) at lp_rast.c:626 #8 rasterize_scene (task=task@entry=0x1e360, scene=0xb3646008) at lp_rast.c:675 #9 0xb6840cb0 in thread_function (init_data=0x1e360) at lp_rast.c:788 #10 0xb6d98ed2 in start_thread () from /lib/libpthread.so.0 #11 0xb6c91058 in ?? () from /lib/libc.so.6 #12 0xb6c91058 in ?? () from /lib/libc.so.6 regards, Marko marko.s.mob...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] gallium/ttn: add support for texture offsets
On Tue, Apr 14, 2015 at 1:04 PM, Eric Anholt e...@anholt.net wrote: Rob Clark robdcl...@gmail.com writes: From: Rob Clark robcl...@freedesktop.org Signed-off-by: Rob Clark robcl...@freedesktop.org 1-3 (with the fix to 1 that you posted in irc) are: Reviewed-by: Eric Anholt e...@anholt.net I don't like the mismatch on bytes vs vec4s in the load_ubo_indirect arguments for patch 4, and will be interested in seeing the version you were working on to fix that. so, if I pull out the addressing-modes patch, then for the ubo patch we are back to 'index *= 16' vec4-byte conversion. Which is funny looking, but at the moment my best definition of correct is what glsl_to_nir and intel driver do, and by that definition, this is the correct thing to do. BR, -R ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/9] freedreno: use CXX linker rather than explicit link against libstdc++
Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/drivers/freedreno/Makefile.am | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/Makefile.am b/src/gallium/drivers/freedreno/Makefile.am index 4b2629f..e798e44 100644 --- a/src/gallium/drivers/freedreno/Makefile.am +++ b/src/gallium/drivers/freedreno/Makefile.am @@ -21,6 +21,8 @@ libfreedreno_la_SOURCES = \ noinst_PROGRAMS = ir3_compiler +# XXX: Required due to the C++ sources in libnir/libglsl_util +nodist_EXTRA_ir3_compiler_SOURCES = dummy.cpp ir3_compiler_SOURCES = \ ir3/ir3_cmdline.c @@ -29,7 +31,6 @@ ir3_compiler_LDADD = \ ../../auxiliary/libgallium.la \ $(top_builddir)/src/glsl/libnir.la \ $(top_builddir)/src/libglsl_util.la \ - -lstdc++ \ $(top_builddir)/src/util/libmesautil.la \ $(GALLIUM_COMMON_LIB_DEPS) \ $(FREEDRENO_LIBS) -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/9] pipe-loader: add libnir and libglsl_util to the link
Otherwise we'll error out due to unresolved nir symbols. Note that we still fail to link due to unresolved _mesa_error_no_memory(). Based on commit 101142c4010(xa: support for drivers which use NIR) Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/targets/pipe-loader/Makefile.am | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/targets/pipe-loader/Makefile.am b/src/gallium/targets/pipe-loader/Makefile.am index 967cdb7..e4048b5 100644 --- a/src/gallium/targets/pipe-loader/Makefile.am +++ b/src/gallium/targets/pipe-loader/Makefile.am @@ -52,6 +52,8 @@ endif PIPE_LIBS += \ $(top_builddir)/src/gallium/auxiliary/libgallium.la \ + $(top_builddir)/src/glsl/libnir.la \ + $(top_builddir)/src/libglsl_util.la \ $(top_builddir)/src/util/libmesautil.la \ $(top_builddir)/src/gallium/drivers/rbug/librbug.la \ $(top_builddir)/src/gallium/drivers/trace/libtrace.la \ -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/9] freedreno/ir3: use the weak _mesa_error_no_memory() symbol
We no longer need to (although we can) provide this symbol ourselves. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/drivers/freedreno/ir3/ir3_cmdline.c | 8 1 file changed, 8 deletions(-) diff --git a/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c b/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c index d0517aa..0b16cc1 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c @@ -360,11 +360,3 @@ int main(int argc, char **argv) } dump_info(v, info); } - -void _mesa_error_no_memory(const char *caller); - -void -_mesa_error_no_memory(const char *caller) -{ - fprintf(stderr, Mesa error: out of memory in %s, caller); -} -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 9/9] r300: do not link against libdrm_intel
Accidentally added since the introduction of the file. Cc: 10.4 10.5 mesa-sta...@lists.freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/drivers/r300/Automake.inc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/r300/Automake.inc b/src/gallium/drivers/r300/Automake.inc index 9334973..d4ddc40 100644 --- a/src/gallium/drivers/r300/Automake.inc +++ b/src/gallium/drivers/r300/Automake.inc @@ -5,7 +5,7 @@ TARGET_CPPFLAGS += -DGALLIUM_R300 TARGET_LIB_DEPS += \ $(top_builddir)/src/gallium/drivers/r300/libr300.la \ $(RADEON_LIBS) \ - $(INTEL_LIBS) + $(LIBDRM_LIBS) TARGET_RADEON_WINSYS = \ $(top_builddir)/src/gallium/winsys/radeon/drm/libradeonwinsys.la -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/9] prog; add weak _mesa_error_no_memory() symbol and add it to libglsl_util
Rather than forcing everyone to provide their own definition of the symbol provide a common weak one, which anyone can override if needed. This resolved the build of the standalone pipe-drivers, as it provides a default symbol which was missing previously. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/Makefile.am| 3 ++- src/glsl/SConscript| 2 ++ src/mesa/Android.libmesa_glsl_utils.mk | 6 -- src/mesa/program/weak_errors.c | 30 ++ 4 files changed, 38 insertions(+), 3 deletions(-) create mode 100644 src/mesa/program/weak_errors.c diff --git a/src/Makefile.am b/src/Makefile.am index 18cb4ce..461da27 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -72,4 +72,5 @@ noinst_LTLIBRARIES = libglsl_util.la libglsl_util_la_SOURCES = \ mesa/main/imports.c \ mesa/program/prog_hash_table.c \ - mesa/program/symbol_table.c + mesa/program/symbol_table.c \ + mesa/program/weak_errors.c diff --git a/src/glsl/SConscript b/src/glsl/SConscript index 284b375..d18a8a7 100644 --- a/src/glsl/SConscript +++ b/src/glsl/SConscript @@ -71,6 +71,7 @@ env.Command('imports.c', '#src/mesa/main/imports.c', Copy('$TARGET', '$SOURCE')) env.Prepend(CPPPATH = ['#src/mesa/program']) env.Command('prog_hash_table.c', '#src/mesa/program/prog_hash_table.c', Copy('$TARGET', '$SOURCE')) env.Command('symbol_table.c', '#src/mesa/program/symbol_table.c', Copy('$TARGET', '$SOURCE')) +env.Command('weak_errors.c', '#src/mesa/program/weak_errors.c', Copy('$TARGET', '$SOURCE')) compiler_objs = env.StaticObject(source_lists['GLSL_COMPILER_CXX_FILES']) @@ -78,6 +79,7 @@ mesa_objs = env.StaticObject([ 'imports.c', 'prog_hash_table.c', 'symbol_table.c', +'weak_errors.c', ]) compiler_objs += mesa_objs diff --git a/src/mesa/Android.libmesa_glsl_utils.mk b/src/mesa/Android.libmesa_glsl_utils.mk index a9f6ff5..08786e3 100644 --- a/src/mesa/Android.libmesa_glsl_utils.mk +++ b/src/mesa/Android.libmesa_glsl_utils.mk @@ -43,7 +43,8 @@ LOCAL_C_INCLUDES := \ LOCAL_SRC_FILES := \ main/imports.c \ program/prog_hash_table.c \ - program/symbol_table.c + program/symbol_table.c \ + program/weak_errors.c include $(MESA_COMMON_MK) include $(BUILD_STATIC_LIBRARY) @@ -66,7 +67,8 @@ LOCAL_C_INCLUDES := \ LOCAL_SRC_FILES := \ main/imports.c \ program/prog_hash_table.c \ - program/symbol_table.c + program/symbol_table.c \ + program/weak_errors.c include $(MESA_COMMON_MK) include $(BUILD_HOST_STATIC_LIBRARY) diff --git a/src/mesa/program/weak_errors.c b/src/mesa/program/weak_errors.c new file mode 100644 index 000..85c35dc --- /dev/null +++ b/src/mesa/program/weak_errors.c @@ -0,0 +1,30 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ +#include stdio.h +#include main/errors.h + +__attribute__((weak)) void +_mesa_error_no_memory(const char *caller) +{ + fprintf(stderr, Mesa error: out of memory in %s, caller); +} -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 8/9] targets/nine: link against libnir/libglsl_util
Similar to commit 127f8767e0a and 101142c4010. Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/targets/d3dadapter9/Makefile.am | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/targets/d3dadapter9/Makefile.am b/src/gallium/targets/d3dadapter9/Makefile.am index 1dc55f5..591978f 100644 --- a/src/gallium/targets/d3dadapter9/Makefile.am +++ b/src/gallium/targets/d3dadapter9/Makefile.am @@ -74,6 +74,8 @@ endif # HAVE_LD_VERSION_SCRIPT d3dadapter9_la_LIBADD = \ $(top_builddir)/src/gallium/auxiliary/libgalliumvl_stub.la \ $(top_builddir)/src/gallium/auxiliary/libgallium.la \ + $(top_builddir)/src/glsl/libnir.la \ + $(top_builddir)/src/libglsl_util.la \ $(top_builddir)/src/gallium/state_trackers/nine/libninetracker.la \ $(top_builddir)/src/util/libmesautil.la \ $(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \ -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/9] st/xa: use the weak _mesa_error_no_memory() symbol
Thus we can remove the workaround that we previously had. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/state_trackers/xa/xa_tracker.c | 12 1 file changed, 12 deletions(-) diff --git a/src/gallium/state_trackers/xa/xa_tracker.c b/src/gallium/state_trackers/xa/xa_tracker.c index 8901998..f69ac8e 100644 --- a/src/gallium/state_trackers/xa/xa_tracker.c +++ b/src/gallium/state_trackers/xa/xa_tracker.c @@ -535,15 +535,3 @@ xa_surface_format(const struct xa_surface *srf) { return srf-fdesc.xa_format; } - -/* - * _mesa_error_no_memory() is expected by NIR to be provided by the - * user. Normally this is in mesa st, but other state trackers - * must provide their own. - */ -void _mesa_error_no_memory(const char *caller); -void -_mesa_error_no_memory(const char *caller) -{ - debug_printf(Mesa error: out of memory in %s, caller); -} -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glx: Allow to create any OpenGL ES version.
On 14 April 2015 at 17:22, Ian Romanick i...@freedesktop.org wrote: On 04/14/2015 08:36 AM, Emil Velikov wrote: On 14 April 2015 at 13:52, Jose Fonseca jfons...@vmware.com wrote: On 13/04/15 22:59, Ian Romanick wrote: On 04/10/2015 03:36 PM, Jose Fonseca wrote: From: José Fonseca jfons...@vmware.com The latest version of GLX_EXT_create_context_es2_profile states: If the version requested is a valid and supported OpenGL-ES version, and the GLX_CONTEXT_ES_PROFILE_BIT_EXT bit is set in the GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context returned will implement the OpenGL ES version requested. We must also export EXT_create_context_es_profile too, as EXT_create_context_es2_profile specification is crystal clear: NOTE: implementations of this extension must export BOTH extension strings, for backwards compatibility with applications written against version 1 of this extension. Totally untested. (Just happened to noticed this while implementing GLX_EXT_create_context_es2_profile for st/xlib.) Reviewed-by: Brian Paul bri...@vmware.com Reviewed-by: Emil Velikov emil.l.veli...@gmail.com v2: Replicate the drisw_glx.c to dri2_glx.c and dri3_glx.c as suggested by Emil Velikov. --- src/glx/dri2_glx.c | 5 - src/glx/dri3_glx.c | 5 - src/glx/dri_common.c | 32 src/glx/drisw_glx.c | 2 ++ 4 files changed, 26 insertions(+), 18 deletions(-) diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 462d560..8192c54 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1102,9 +1102,12 @@ dri2BindExtensions(struct dri2_screen *psc, struct glx_display * priv, __glXEnableDirectExtension(psc-base, GLX_ARB_create_context); __glXEnableDirectExtension(psc-base, GLX_ARB_create_context_profile); - if ((mask (1 __DRI_API_GLES2)) != 0) + if ((mask (1 __DRI_API_GLES2)) != 0) { +__glXEnableDirectExtension(psc-base, + GLX_EXT_create_context_es_profile); __glXEnableDirectExtension(psc-base, GLX_EXT_create_context_es2_profile); + } } for (i = 0; extensions[i]; i++) { diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c index 1ddc723..6973ad1 100644 --- a/src/glx/dri3_glx.c +++ b/src/glx/dri3_glx.c @@ -1825,9 +1825,12 @@ dri3_bind_extensions(struct dri3_screen *psc, struct glx_display * priv, __glXEnableDirectExtension(psc-base, GLX_ARB_create_context); __glXEnableDirectExtension(psc-base, GLX_ARB_create_context_profile); - if ((mask (1 __DRI_API_GLES2)) != 0) + if ((mask (1 __DRI_API_GLES2)) != 0) { + __glXEnableDirectExtension(psc-base, + GLX_EXT_create_context_es_profile); __glXEnableDirectExtension(psc-base, GLX_EXT_create_context_es2_profile); + } for (i = 0; extensions[i]; i++) { /* when on a different gpu than the server, the server pixmaps diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c index 63c8de3..541abbb 100644 --- a/src/glx/dri_common.c +++ b/src/glx/dri_common.c @@ -544,9 +544,22 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs, case GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB: *api = __DRI_API_OPENGL; break; - case GLX_CONTEXT_ES2_PROFILE_BIT_EXT: -*api = __DRI_API_GLES2; -break; + case GLX_CONTEXT_ES_PROFILE_BIT_EXT: + switch (*major_ver) { + case 3: +*api = __DRI_API_GLES3; +break; + case 2: +*api = __DRI_API_GLES2; +break; + case 1: +*api = __DRI_API_GLES; +break; + default: +*error = __DRI_CTX_ERROR_BAD_API; +return false; + } + break; default: *error = __DRI_CTX_ERROR_BAD_API; return false; @@ -577,19 +590,6 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs, return false; } - /* The GLX_EXT_create_context_es2_profile spec says: -* -* ... If the version requested is 2.0, and the -* GLX_CONTEXT_ES2_PROFILE_BIT_EXT bit is set in the -* GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context -* returned will implement OpenGL ES 2.0. This is the only way in which -* an implementation may request an OpenGL ES 2.0 context. -*/ - if (*api == __DRI_API_GLES2 (*major_ver != 2 || *minor_ver != 0)) { - *error = __DRI_CTX_ERROR_BAD_API; - return false; - } I guess this text was removed from the extension spec, and now we rely on some other layer detecting invalid versions (like 2.1)? Yes, the spec replaced that with ... If the version requested is a valid and supported OpenGL-ES version, and the
Re: [Mesa-dev] llvmpipe support for ARM?
Hi, I can run arm llvmpipe glxgears on IFC6410 (Snapdragon 600 APQ8064, Krait 300) without problems. llvm 3.5 mesa 10.4.3 both are fedora 21 standard packages without modifications. I haven't tried anything more complex than glxgears jan On Wed, 2015-04-15 at 16:30 +0200, Roland Scheidegger wrote: There were reports with varying success of running llvmpipe with arm in the past but I haven't heard anything for a while. In theory it should work (and crashing in the pixel shader is an indication the vertex shader worked or at least didn't crash). But it's not really tested and hence might easily break. My suggestion would be to try a newer Mesa version since 9.2.5 is very old, but I really don't know if it will help. If that doesn't help, you could try debugging it, you can get the crashing instruction in gdb with x/i address and see if that was an unaligned access which should have been aligned or something else. GALLIVM_DEBUG=ir,asm will get you the IR and compiled code (with debug builds). Roland Am 15.04.2015 um 14:37 schrieb Marko Moberg: Hi, I have been trying to get OpenGL sw rastering to work on Linux based ARM CortexA15 device but for some reason gallium llvmpipe gives me a segmentation fault. Has anybody managed to run sw rastering using gallium llvmpipe on ARM? Softpipe seems to be working ok. I am using the following versions: - Mesa 9.2.5 - LLVM3.3 - Wayland/Weston 1.5 The issue is related to opengl/gles2 rendering i.e. weston-simple-egl demo and Qt openGL demos give seg faults when executed with llvmpipe enabled. Backtrace from weston-simple-egl looks something like this: #0 0x6d7de620 in ?? () #1 0xb354004c in fs17_variant0_partial () #2 0xb6841740 in lp_rast_shade_quads_mask (task=task@entry=0x1e360, inputs=inputs@entry=0x47a00, x=128, y=80, mask=4369) at lp_rast.c:466 #3 0xb6842d04 in do_block_4_1 (c=optimized out, y=optimized out, x=optimized out, plane=optimized out, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:61 #4 do_block_16_1 (c=synthetic pointer, y=optimized out, x=optimized out, plane=0xb4f4ccf0, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:130 #5 lp_rast_triangle_1 (task=0x1e360, arg=...) at lp_rast_tri_tmp.h:232 #6 0xb6840420 in do_rasterize_bin (bin=optimized out, task=0x1e360, x=optimized out, y=optimized out) at lp_rast.c:607 #7 rasterize_bin (y=optimized out, x=optimized out, bin=optimized out, task=0x1e360) at lp_rast.c:626 #8 rasterize_scene (task=task@entry=0x1e360, scene=0xb3646008) at lp_rast.c:675 #9 0xb6840cb0 in thread_function (init_data=0x1e360) at lp_rast.c:788 #10 0xb6d98ed2 in start_thread () from /lib/libpthread.so.0 #11 0xb6c91058 in ?? () from /lib/libc.so.6 #12 0xb6c91058 in ?? () from /lib/libc.so.6 regards, Marko marko.s.mob...@gmail.com mailto:marko.s.mob...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AwIGaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=5qZ4KpVhu_wXA8-dTn8xYiul5G713jYwztPr8kfcUZMs=wBNyDvf5DiNBjxXXl7QvyHOvNuTVjZAaw0n7WaQM-bAe= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev -- Jan Vesely jan.ves...@rutgers.edu signature.asc Description: This is a digitally signed message part ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] llvmpipe support for ARM?
There were reports with varying success of running llvmpipe with arm in the past but I haven't heard anything for a while. In theory it should work (and crashing in the pixel shader is an indication the vertex shader worked or at least didn't crash). But it's not really tested and hence might easily break. My suggestion would be to try a newer Mesa version since 9.2.5 is very old, but I really don't know if it will help. If that doesn't help, you could try debugging it, you can get the crashing instruction in gdb with x/i address and see if that was an unaligned access which should have been aligned or something else. GALLIVM_DEBUG=ir,asm will get you the IR and compiled code (with debug builds). Roland Am 15.04.2015 um 14:37 schrieb Marko Moberg: Hi, I have been trying to get OpenGL sw rastering to work on Linux based ARM CortexA15 device but for some reason gallium llvmpipe gives me a segmentation fault. Has anybody managed to run sw rastering using gallium llvmpipe on ARM? Softpipe seems to be working ok. I am using the following versions: - Mesa 9.2.5 - LLVM3.3 - Wayland/Weston 1.5 The issue is related to opengl/gles2 rendering i.e. weston-simple-egl demo and Qt openGL demos give seg faults when executed with llvmpipe enabled. Backtrace from weston-simple-egl looks something like this: #0 0x6d7de620 in ?? () #1 0xb354004c in fs17_variant0_partial () #2 0xb6841740 in lp_rast_shade_quads_mask (task=task@entry=0x1e360, inputs=inputs@entry=0x47a00, x=128, y=80, mask=4369) at lp_rast.c:466 #3 0xb6842d04 in do_block_4_1 (c=optimized out, y=optimized out, x=optimized out, plane=optimized out, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:61 #4 do_block_16_1 (c=synthetic pointer, y=optimized out, x=optimized out, plane=0xb4f4ccf0, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:130 #5 lp_rast_triangle_1 (task=0x1e360, arg=...) at lp_rast_tri_tmp.h:232 #6 0xb6840420 in do_rasterize_bin (bin=optimized out, task=0x1e360, x=optimized out, y=optimized out) at lp_rast.c:607 #7 rasterize_bin (y=optimized out, x=optimized out, bin=optimized out, task=0x1e360) at lp_rast.c:626 #8 rasterize_scene (task=task@entry=0x1e360, scene=0xb3646008) at lp_rast.c:675 #9 0xb6840cb0 in thread_function (init_data=0x1e360) at lp_rast.c:788 #10 0xb6d98ed2 in start_thread () from /lib/libpthread.so.0 #11 0xb6c91058 in ?? () from /lib/libc.so.6 #12 0xb6c91058 in ?? () from /lib/libc.so.6 regards, Marko marko.s.mob...@gmail.com mailto:marko.s.mob...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AwIGaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=5qZ4KpVhu_wXA8-dTn8xYiul5G713jYwztPr8kfcUZMs=wBNyDvf5DiNBjxXXl7QvyHOvNuTVjZAaw0n7WaQM-bAe= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glx: Create proper server dependency for GLX_EXT_create_context_es2_profile
On 14 April 2015 at 17:35, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com Previously GLX_EXT_create_context_es2_profile was marked as direct only so that it would not depend on server support. Since the extension required functions that are part of GLX_ARB_create_context_profile, support for the EXT was disabled if the ARB was not supported. This was complete rubbish. If the server supported the ARB but not the EXT, sending a request with GLX_CONTEXT_ES2_PROFILE_BIT_EXT would result in GLXBadProfileARB. Instead of the misguided hack, make GLX_EXT_create_context_es2_profile properly depend on server support by not marking it as direct only. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: José Fonseca jfons...@vmware.com Cc: Chad Versace chad.vers...@intel.com Cc: Emil Velikov emil.l.veli...@gmail.com --- src/glx/glxextensions.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/src/glx/glxextensions.c b/src/glx/glxextensions.c index ce5d66d..a326f0d 100644 --- a/src/glx/glxextensions.c +++ b/src/glx/glxextensions.c @@ -84,7 +84,7 @@ static const struct extension_info known_glx_extensions[] = { { GLX(EXT_visual_rating), VER(0,0), Y, Y, N, N }, { GLX(EXT_fbconfig_packed_float), VER(0,0), Y, Y, N, N }, { GLX(EXT_framebuffer_sRGB),VER(0,0), Y, Y, N, N }, - { GLX(EXT_create_context_es2_profile), VER(0,0), Y, N, N, Y }, + { GLX(EXT_create_context_es2_profile), VER(0,0), Y, N, N, N }, I've had a look at these a while back, and I don't think I ever got the idea how an extension can be direct_only without having direct_support. If you can spare a minute that would be greatly appreciated, the glx code can be a bit misty at times. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes
On Wed, Apr 15, 2015 at 10:32 AM, Connor Abbott cwabbo...@gmail.com wrote: But more immediately, I hit a sort of snag: I cannot seem to narrow from 32b to 16b at the same time I move to address register. Which ends up meaning I need a mov from 32b to 16b followed by a 2nd mov to get it into address register... which sort of defeats the purpose of this whole exercise. I need to do some more r/e around this, but it may end up being better the way it was before. And if we end up needing to do the shl in half-precision registers, then solving this in NIR would (I think) require NIR to be aware of half-precision. Which sounds useful, but -EBIGGER_FIRES I don't quite understand... is this just a problem with using registers? Would the entire sequence of operations need to be in 16 bits, or can you have whatever instruction computed your address do the conversion to 16-bit as part of the output? If it's the latter, you can just re-emit a 16-bit-outputting version of it and use that, although it's a bit of a hack. Well, the problem if the shl is done in NIR, I commenly end up with: cov.f32s32 r0.x, r0.x shl.b r0.x, r0.x, 2 cov.s32s16 hr0.x, r0.x mova a0.x, hr0.x vs cov.f32s16 hr0.x, r0.x shl.b hr0.x, hr0.x, 2 mova a0.x, hr0.x (in both cases, with four nop's between each if I don't have other instructions I can schedule in between, so the extra instruction here could translate to 4 cycles in a lot of cases) I tried to see if I could do something like 'shl.b hr0.x, r0.x, 2' but that didn't seem to work.. possibly the narrowing on alu op's only works for float. (The ISA has all these fun holes like that, where the instruction encoding supports things, but the designers apparently didn't think it was worth it to spend transistors for less common cases) Long-term, support for half-precision in NIR is definitely in the cards, but it'll probably have to wait until fp64 support as they're both very similar wrt changes we have to make in the IR. Unless someone has a burning desire to do half-precision first :). It would be useful.. or at least my understanding is half-precision should be faster for me. Although I don't see a particularly good way to solve that until we figure out how to get TGSI out from the middle. I guess I need to take a look at the tie-ins between vbo/uniform/texture/etc state in mesa st and tgsi, and figure out how to make that work without a glsl_to_tgsi pass so drivers can request NIR directly. That probably comes after flow control in ir3 backend, but *eventually* I'll be really interested in proper mediump support. BR, -R ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes
On Tue, Apr 14, 2015 at 7:08 PM, Rob Clark robdcl...@gmail.com wrote: On Tue, Apr 14, 2015 at 6:24 PM, Connor Abbott cwabbo...@gmail.com wrote: On Tue, Apr 14, 2015 at 5:16 PM, Rob Clark robdcl...@gmail.com wrote: On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand ja...@jlekstrand.net wrote: + /** +* Addressing mode for corresponding _indirect intrinsics: +*/ + nir_addressing_mode var_addressing_mode; + nir_addressing_mode input_addressing_mode; + nir_addressing_mode output_addressing_mode; + nir_addressing_mode uniform_addressing_mode; + nir_addressing_mode ubo_addressing_mode; What is var_addressing_mode? Sorry for not bringing that up before. well, originally in my thinking it was for load_var/store_var.. but perhaps that doesn't make sense (given lower_io). Maybe it makes more sense to define is as applying to var_local/var_global (where the others apply to shader_in/shader_out/uniform and their equivalent intrinsic forms)? Maybe it's a bit weird since I don't lower vars to regs before feeding to my ir3 frontend, but the whole load_var/store_var for array access, and ssa for everything else form works kind of nicely for me. BR, -R I don't think we should be letting the driver define the stride of variable array accesses. Variables are supposed to be structured, backend-independent things that core NIR can manipulate and optimize as it pleases; it shouldn't need to know anything about how the driver will index the data. For doing the kinds of optimizations you're talking about, you have registers that are backend-dependent, and core NIR (other than the lower_locals_to_regs) doesn't need to know what the indices mean. What you're doing right now is a hack, and if you want to get the benefits of optimizing the index expression in core NIR you should be using lower_locals_to_regs(). Having scalars be SSA values and arrays be registers can't be that much more complicated than having arrays be variables, and that's how it was set up to work from the beginning. well, it is pretty convenient for me to have direct and indirect array access come via intrinsics, since that gives me a nice single point to do all the magic I need to do to set up instruction dependencies for scheduling and register assignment where the arrays get allocated in registers. Possibly that means we need an option to lower array access to some new sort of intrinsic? Not sure, I'll play with the lower_locals_to_regs without first coming out of SSA.. maybe if then the only things in regs are array accesses, I can achieve the same result. Yeah, it seems like some sort of load_register intrinsic might be more useful to you... there are a few reasons I never added it from the beginning: - You can't have a pointer to the nir_register * that contains useful info like the number of vector components. You would have to search through the list of registers, which takes O(n) time and is a lot more annoying. - There's another usecase for registers, namely backends that don't support SSA at all; there, the possibility of register reads/writes that are arbitrarily ordered compared to how they're used seems like no fun. Having more than one way to represent a register load/store doesn't seem like a great idea either. although I can't imagine supporting the current way of loading/storing registers would be that much more complicated. Variables definitely aren't what you want. But more immediately, I hit a sort of snag: I cannot seem to narrow from 32b to 16b at the same time I move to address register. Which ends up meaning I need a mov from 32b to 16b followed by a 2nd mov to get it into address register... which sort of defeats the purpose of this whole exercise. I need to do some more r/e around this, but it may end up being better the way it was before. And if we end up needing to do the shl in half-precision registers, then solving this in NIR would (I think) require NIR to be aware of half-precision. Which sounds useful, but -EBIGGER_FIRES I don't quite understand... is this just a problem with using registers? Would the entire sequence of operations need to be in 16 bits, or can you have whatever instruction computed your address do the conversion to 16-bit as part of the output? If it's the latter, you can just re-emit a 16-bit-outputting version of it and use that, although it's a bit of a hack. Long-term, support for half-precision in NIR is definitely in the cards, but it'll probably have to wait until fp64 support as they're both very similar wrt changes we have to make in the IR. Unless someone has a burning desire to do half-precision first :). The other problem is that currently ttn gives addr src in float, which is how things are in tgsi land. I'm not sure if changing this will be a problem for Eric. An interesting alternative solution to consider, is to allow the backend to lower to driver specific specific alu opcodes.. and then
[Mesa-dev] [PATCH v2] nir: Try commutative sources in CSE
From: Ian Romanick ian.d.roman...@intel.com Shader-db results: GM45 NIR: total instructions in shared programs: 4082044 - 4081919 (-0.00%) instructions in affected programs: 27609 - 27484 (-0.45%) helped:44 Iron Lake NIR: total instructions in shared programs: 5678776 - 5678646 (-0.00%) instructions in affected programs: 27406 - 27276 (-0.47%) helped:45 Sandy Bridge NIR: total instructions in shared programs: 7329995 - 7329096 (-0.01%) instructions in affected programs: 142035 - 141136 (-0.63%) helped:406 HURT: 19 Ivy Bridge NIR: total instructions in shared programs: 6769314 - 6768359 (-0.01%) instructions in affected programs: 140820 - 139865 (-0.68%) helped:423 HURT: 2 Haswell NIR: total instructions in shared programs: 6183693 - 6183298 (-0.01%) instructions in affected programs: 96538 - 96143 (-0.41%) helped:303 HURT: 4 Broadwell NIR: total instructions in shared programs: 7501711 - 7498170 (-0.05%) instructions in affected programs: 266403 - 262862 (-1.33%) helped:705 HURT: 5 GAINED:4 v2: Rebase on top of Connor's fix. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Reviewed-by: Jordan Justen jordan.l.jus...@intel.com [v1] Cc: Jason Ekstrand jason.ekstr...@intel.com Cc: Connor Abbott cwabbo...@gmail.com --- The v2 changes were a little more intrusive than before, so it seemed worth sending out again. src/glsl/nir/nir_opt_cse.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/src/glsl/nir/nir_opt_cse.c b/src/glsl/nir/nir_opt_cse.c index 56d491c..3c5ece7 100644 --- a/src/glsl/nir/nir_opt_cse.c +++ b/src/glsl/nir/nir_opt_cse.c @@ -37,18 +37,19 @@ struct cse_state { }; static bool -nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src) +nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src1, + unsigned src2) { - if (alu1-src[src].abs != alu2-src[src].abs || - alu1-src[src].negate != alu2-src[src].negate) + if (alu1-src[src1].abs != alu2-src[src2].abs || + alu1-src[src1].negate != alu2-src[src2].negate) return false; - for (unsigned i = 0; i nir_ssa_alu_instr_src_components(alu1, src); i++) { - if (alu1-src[src].swizzle[i] != alu2-src[src].swizzle[i]) + for (unsigned i = 0; i nir_ssa_alu_instr_src_components(alu1, src1); i++) { + if (alu1-src[src1].swizzle[i] != alu2-src[src2].swizzle[i]) return false; } - return nir_srcs_equal(alu1-src[src].src, alu2-src[src].src); + return nir_srcs_equal(alu1-src[src1].src, alu2-src[src2].src); } static bool @@ -71,9 +72,17 @@ nir_instrs_equal(nir_instr *instr1, nir_instr *instr2) if (alu1-dest.dest.ssa.num_components != alu2-dest.dest.ssa.num_components) return false; - for (unsigned i = 0; i nir_op_infos[alu1-op].num_inputs; i++) { - if (!nir_alu_srcs_equal(alu1, alu2, i)) -return false; + if (nir_op_infos[alu1-op].num_inputs == 2 + (nir_op_infos[alu1-op].algebraic_properties NIR_OP_IS_COMMUTATIVE)) { + return (nir_alu_srcs_equal(alu1, alu2, 0, 0) + nir_alu_srcs_equal(alu1, alu2, 1, 1)) || +(nir_alu_srcs_equal(alu1, alu2, 0, 1) + nir_alu_srcs_equal(alu1, alu2, 1, 0)); + } else { + for (unsigned i = 0; i nir_op_infos[alu1-op].num_inputs; i++) { +if (!nir_alu_srcs_equal(alu1, alu2, i, i)) + return false; + } } return true; } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] llvmpipe support for ARM?
Thanks for the replies. I tried to upgrade LLVM from 3.3 to 3.4 after I sent the e-mail but ended up having an internal compiler error from gcc linaro toolchain compiler. Nice. I could try a more up-to-date version of Mesa to see if it helps. Although I have learned that I can not take the most up-to-date version since something has gotten broken related to SW rendering/wayland/egl. Marko On Wed, Apr 15, 2015 at 6:45 PM, Jan Vesely jan.ves...@rutgers.edu wrote: Hi, I can run arm llvmpipe glxgears on IFC6410 (Snapdragon 600 APQ8064, Krait 300) without problems. llvm 3.5 mesa 10.4.3 both are fedora 21 standard packages without modifications. I haven't tried anything more complex than glxgears jan On Wed, 2015-04-15 at 16:30 +0200, Roland Scheidegger wrote: There were reports with varying success of running llvmpipe with arm in the past but I haven't heard anything for a while. In theory it should work (and crashing in the pixel shader is an indication the vertex shader worked or at least didn't crash). But it's not really tested and hence might easily break. My suggestion would be to try a newer Mesa version since 9.2.5 is very old, but I really don't know if it will help. If that doesn't help, you could try debugging it, you can get the crashing instruction in gdb with x/i address and see if that was an unaligned access which should have been aligned or something else. GALLIVM_DEBUG=ir,asm will get you the IR and compiled code (with debug builds). Roland Am 15.04.2015 um 14:37 schrieb Marko Moberg: Hi, I have been trying to get OpenGL sw rastering to work on Linux based ARM CortexA15 device but for some reason gallium llvmpipe gives me a segmentation fault. Has anybody managed to run sw rastering using gallium llvmpipe on ARM? Softpipe seems to be working ok. I am using the following versions: - Mesa 9.2.5 - LLVM3.3 - Wayland/Weston 1.5 The issue is related to opengl/gles2 rendering i.e. weston-simple-egl demo and Qt openGL demos give seg faults when executed with llvmpipe enabled. Backtrace from weston-simple-egl looks something like this: #0 0x6d7de620 in ?? () #1 0xb354004c in fs17_variant0_partial () #2 0xb6841740 in lp_rast_shade_quads_mask (task=task@entry=0x1e360, inputs=inputs@entry=0x47a00, x=128, y=80, mask=4369) at lp_rast.c:466 #3 0xb6842d04 in do_block_4_1 (c=optimized out, y=optimized out, x=optimized out, plane=optimized out, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:61 #4 do_block_16_1 (c=synthetic pointer, y=optimized out, x=optimized out, plane=0xb4f4ccf0, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:130 #5 lp_rast_triangle_1 (task=0x1e360, arg=...) at lp_rast_tri_tmp.h:232 #6 0xb6840420 in do_rasterize_bin (bin=optimized out, task=0x1e360, x=optimized out, y=optimized out) at lp_rast.c:607 #7 rasterize_bin (y=optimized out, x=optimized out, bin=optimized out, task=0x1e360) at lp_rast.c:626 #8 rasterize_scene (task=task@entry=0x1e360, scene=0xb3646008) at lp_rast.c:675 #9 0xb6840cb0 in thread_function (init_data=0x1e360) at lp_rast.c:788 #10 0xb6d98ed2 in start_thread () from /lib/libpthread.so.0 #11 0xb6c91058 in ?? () from /lib/libc.so.6 #12 0xb6c91058 in ?? () from /lib/libc.so.6 regards, Marko marko.s.mob...@gmail.com mailto:marko.s.mob...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AwIGaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=5qZ4KpVhu_wXA8-dTn8xYiul5G713jYwztPr8kfcUZMs=wBNyDvf5DiNBjxXXl7QvyHOvNuTVjZAaw0n7WaQM-bAe= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev -- Jan Vesely jan.ves...@rutgers.edu ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] nir: Try commutative sources in CSE
On Wed, Apr 15, 2015 at 12:37 PM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com Shader-db results: GM45 NIR: total instructions in shared programs: 4082044 - 4081919 (-0.00%) instructions in affected programs: 27609 - 27484 (-0.45%) helped:44 Iron Lake NIR: total instructions in shared programs: 5678776 - 5678646 (-0.00%) instructions in affected programs: 27406 - 27276 (-0.47%) helped:45 Sandy Bridge NIR: total instructions in shared programs: 7329995 - 7329096 (-0.01%) instructions in affected programs: 142035 - 141136 (-0.63%) helped:406 HURT: 19 Ivy Bridge NIR: total instructions in shared programs: 6769314 - 6768359 (-0.01%) instructions in affected programs: 140820 - 139865 (-0.68%) helped:423 HURT: 2 Haswell NIR: total instructions in shared programs: 6183693 - 6183298 (-0.01%) instructions in affected programs: 96538 - 96143 (-0.41%) helped:303 HURT: 4 Broadwell NIR: total instructions in shared programs: 7501711 - 7498170 (-0.05%) instructions in affected programs: 266403 - 262862 (-1.33%) helped:705 HURT: 5 GAINED:4 v2: Rebase on top of Connor's fix. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Reviewed-by: Jordan Justen jordan.l.jus...@intel.com [v1] Cc: Jason Ekstrand jason.ekstr...@intel.com Cc: Connor Abbott cwabbo...@gmail.com --- The v2 changes were a little more intrusive than before, so it seemed worth sending out again. src/glsl/nir/nir_opt_cse.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/src/glsl/nir/nir_opt_cse.c b/src/glsl/nir/nir_opt_cse.c index 56d491c..3c5ece7 100644 --- a/src/glsl/nir/nir_opt_cse.c +++ b/src/glsl/nir/nir_opt_cse.c @@ -37,18 +37,19 @@ struct cse_state { }; static bool -nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src) +nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src1, + unsigned src2) { - if (alu1-src[src].abs != alu2-src[src].abs || - alu1-src[src].negate != alu2-src[src].negate) + if (alu1-src[src1].abs != alu2-src[src2].abs || + alu1-src[src1].negate != alu2-src[src2].negate) return false; - for (unsigned i = 0; i nir_ssa_alu_instr_src_components(alu1, src); i++) { - if (alu1-src[src].swizzle[i] != alu2-src[src].swizzle[i]) + for (unsigned i = 0; i nir_ssa_alu_instr_src_components(alu1, src1); i++) { + if (alu1-src[src1].swizzle[i] != alu2-src[src2].swizzle[i]) return false; } - return nir_srcs_equal(alu1-src[src].src, alu2-src[src].src); + return nir_srcs_equal(alu1-src[src1].src, alu2-src[src2].src); } static bool @@ -71,9 +72,17 @@ nir_instrs_equal(nir_instr *instr1, nir_instr *instr2) if (alu1-dest.dest.ssa.num_components != alu2-dest.dest.ssa.num_components) return false; - for (unsigned i = 0; i nir_op_infos[alu1-op].num_inputs; i++) { - if (!nir_alu_srcs_equal(alu1, alu2, i)) -return false; + if (nir_op_infos[alu1-op].num_inputs == 2 + (nir_op_infos[alu1-op].algebraic_properties NIR_OP_IS_COMMUTATIVE)) { Just a couple of thoughts (could well go into a later patch, or not at all) -- (a) is the NIR_OP_IS_COMMUTATIVE flag ever set when num_inputs != 2? (b) I see that there is an isub/fsub/etc, which are (hopefully) not marked as commutative. Would it make sense to first get rid of them (and introduce negate flags) before CSE is done? I guess it's an argument that expressions should have a clear canonical form for maximal CSE, after which various negs can be propagated through. + return (nir_alu_srcs_equal(alu1, alu2, 0, 0) + nir_alu_srcs_equal(alu1, alu2, 1, 1)) || +(nir_alu_srcs_equal(alu1, alu2, 0, 1) + nir_alu_srcs_equal(alu1, alu2, 1, 0)); + } else { + for (unsigned i = 0; i nir_op_infos[alu1-op].num_inputs; i++) { +if (!nir_alu_srcs_equal(alu1, alu2, i, i)) + return false; + } } return true; } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/13] i965/fs_cse: Factor out code to create copy instructions
On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: --- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 75 1 file changed, 38 insertions(+), 37 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index f2c4098..dd199fa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -183,6 +183,29 @@ instructions_match(fs_inst *a, fs_inst *b, bool *negate) operands_match(a, b, negate); } +static fs_inst * +create_copy_instr(fs_visitor *v, fs_inst *inst, fs_reg src, bblock_t *block, const fs_reg src With that and Topi's comments addressed, Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2 v2] i965/skl: Add the header for constant loads outside of the generator
Commit 5a06ee738 added a step to the generator to set up the message header when generating the VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 instruction. That pseudo opcode is implemented in terms of multiple actual opcodes, one of which writes to one of the source registers in order to set up the message header. This causes problems because the scheduler isn't aware that the source register is written to and it can end up reorganising the instructions incorrectly such that the write to the source register overwrites a needed value from a previous instruction. This problem was presenting itself as a rendering error in the weapon in Enemy Territory: Quake Wars. Since commit 588859e1 there is an additional problem that the double register allocated to include the message header would end up being split into two. This wasn't happening previously because the code to split registers was explicitly avoided for instructions that are sending from the GRF. This patch fixes both problems by splitting the code to set up the message header into a new pseudo opcode so that it will be done outside of the generator. This new opcode has the header register as a destination so the scheduler can recognise that the register is written to. This has the additional benefit that the scheduler can optimise the message header slightly better by moving the mov instructions further away from the send instructions. On Skylake it appears to fix the following three Piglit tests without causing any regressions: gs-float-array-variable-index gs-mat3x4-row-major gs-mat4x3-row-major I think we actually may need to do something similar for the fs backend and possibly for message headers from regular texture sampling but I'm not entirely sure. v2: Make sure the exec-size is retained as 8 for the mov instruction to initialise the header from g0. This was accidentally lost during a rebase on top of 07c571a39fa1. Split the patch into two so that the helper function is a separate change. Fix emitting the MOV instruction on Gen7. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89058 --- src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/brw_shader.cpp | 4 ++ src/mesa/drivers/dri/i965/brw_vec4.h | 2 + .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 52 +++--- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 38 6 files changed, 63 insertions(+), 35 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index da6ed5b..a97a944 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -948,6 +948,7 @@ enum opcode { VS_OPCODE_URB_WRITE, VS_OPCODE_PULL_CONSTANT_LOAD, VS_OPCODE_PULL_CONSTANT_LOAD_GEN7, + VS_OPCODE_SET_SIMD4X2_HEADER_GEN9, VS_OPCODE_UNPACK_FLAGS_SIMD4X2, /** diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 335a800..0d6ac0c 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -568,6 +568,10 @@ brw_instruction_name(enum opcode op) return pull_constant_load; case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7: return pull_constant_load_gen7; + + case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9: + return set_simd4x2_header_gen9; + case VS_OPCODE_UNPACK_FLAGS_SIMD4X2: return unpack_flags_simd4x2; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 0363924..a0ee2cc 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -500,6 +500,8 @@ private: struct brw_reg dst, struct brw_reg surf_index, struct brw_reg offset); + void generate_set_simd4x2_header_gen9(vec4_instruction *inst, + struct brw_reg dst); void generate_unpack_flags(struct brw_reg dst); void generate_untyped_atomic(vec4_instruction *inst, diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp index 980e266..70d2af5 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp @@ -44,6 +44,7 @@ can_do_writemask(const struct brw_context *brw, case SHADER_OPCODE_GEN4_SCRATCH_READ: case VS_OPCODE_PULL_CONSTANT_LOAD: case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7: + case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9: return false; default: /* The MATH instruction on Gen6 only executes in align1 mode, which does diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index
[Mesa-dev] [PATCH 1/2] i965/vec4: Add a helper function to emit VS_OPCODE_PULL_CONSTANT_LOAD
There were three places in the visitor that had a similar chunk of code to emit the VS_OPCODE_PULL_CONSTANT_LOAD opcode using a register for the offset. This patch combines the chunks into a helper function to reduce the code duplication. It will also be useful in the next patch to expand what happens on Gen9+. This shouldn't introduce any functional changes. --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 ++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 120 + src/mesa/drivers/dri/i965/brw_vec4_vp.cpp | 27 ++ 3 files changed, 75 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 700ca69..0363924 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -364,6 +364,11 @@ public: dst_reg dst, src_reg orig_src, int base_offset); + void emit_pull_constant_load_reg(dst_reg dst, +src_reg surf_index, +src_reg offset, +bblock_t *before_block, +vec4_instruction *before_inst); src_reg emit_resolve_reladdr(int scratch_loc[], bblock_t *block, vec4_instruction *inst, src_reg src); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index ffbe04d..f7d542b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1296,6 +1296,63 @@ vec4_visitor::emit_lrp(const dst_reg dst, } } +/** + * Emits the instructions needed to perform a pull constant load. before_block + * and before_inst can be NULL in which case the instruction will be appended + * to the end of the instruction list. + */ +void +vec4_visitor::emit_pull_constant_load_reg(dst_reg dst, + src_reg surf_index, + src_reg offset_reg, + bblock_t *before_block, + vec4_instruction *before_inst) +{ + assert((before_inst == NULL before_block == NULL) || + (before_inst before_block)); + + vec4_instruction *pull; + + if (brw-gen = 7) { + dst_reg grf_offset = dst_reg(this, glsl_type::int_type); + + /* We have to use a message header on Skylake to get SIMD4x2 mode. + * Reserve space for the register. + */ + if (brw-gen = 9) { + grf_offset.reg_offset++; + alloc.sizes[grf_offset.reg] = 2; + } + + grf_offset.type = offset_reg.type; + + pull = MOV(grf_offset, offset_reg); + + if (before_inst) + emit_before(before_block, before_inst, pull); + else + emit(pull); + + pull = new(mem_ctx) vec4_instruction(VS_OPCODE_PULL_CONSTANT_LOAD_GEN7, + dst, + surf_index, + src_reg(grf_offset)); + pull-mlen = 1; + } else { + pull = new(mem_ctx) vec4_instruction(VS_OPCODE_PULL_CONSTANT_LOAD, + dst, + surf_index, + offset_reg); + pull-base_mrf = 14; + pull-mlen = 1; + } + + if (before_inst) + emit_before(before_block, before_inst, pull); + else + emit(pull); +} + void vec4_visitor::visit(ir_expression *ir) { @@ -1774,36 +1831,10 @@ vec4_visitor::visit(ir_expression *ir) emit(SHR(dst_reg(offset), op[1], src_reg(4))); } - if (brw-gen = 7) { - dst_reg grf_offset = dst_reg(this, glsl_type::int_type); - - /* We have to use a message header on Skylake to get SIMD4x2 mode. - * Reserve space for the register. - */ - if (brw-gen = 9) { -grf_offset.reg_offset++; -alloc.sizes[grf_offset.reg] = 2; - } - - grf_offset.type = offset.type; - - emit(MOV(grf_offset, offset)); - - vec4_instruction *pull = -emit(new(mem_ctx) vec4_instruction(VS_OPCODE_PULL_CONSTANT_LOAD_GEN7, - dst_reg(packed_consts), - surf_index, - src_reg(grf_offset))); - pull-mlen = 1; - } else { - vec4_instruction *pull = -emit(new(mem_ctx) vec4_instruction(VS_OPCODE_PULL_CONSTANT_LOAD, - dst_reg(packed_consts), - surf_index, - offset)); - pull-base_mrf = 14; - pull-mlen = 1; - }
Re: [Mesa-dev] [PATCH 2/2 v2] i965/skl: Add the header for constant loads outside of the generator
On Wed, Apr 15, 2015 at 06:58:02PM +0100, Neil Roberts wrote: Commit 5a06ee738 added a step to the generator to set up the message header when generating the VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 instruction. That pseudo opcode is implemented in terms of multiple actual opcodes, one of which writes to one of the source registers in order to set up the message header. This causes problems because the scheduler isn't aware that the source register is written to and it can end up reorganising the instructions incorrectly such that the write to the source register overwrites a needed value from a previous instruction. This problem was presenting itself as a rendering error in the weapon in Enemy Territory: Quake Wars. Since commit 588859e1 there is an additional problem that the double register allocated to include the message header would end up being split into two. This wasn't happening previously because the code to split registers was explicitly avoided for instructions that are sending from the GRF. This patch fixes both problems by splitting the code to set up the message header into a new pseudo opcode so that it will be done outside of the generator. This new opcode has the header register as a destination so the scheduler can recognise that the register is written to. This has the additional benefit that the scheduler can optimise the message header slightly better by moving the mov instructions further away from the send instructions. On Skylake it appears to fix the following three Piglit tests without causing any regressions: gs-float-array-variable-index gs-mat3x4-row-major gs-mat4x3-row-major I think we actually may need to do something similar for the fs backend and possibly for message headers from regular texture sampling but I'm not entirely sure. v2: Make sure the exec-size is retained as 8 for the mov instruction to initialise the header from g0. This was accidentally lost during a rebase on top of 07c571a39fa1. Split the patch into two so that the helper function is a separate change. Fix emitting the MOV instruction on Gen7. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89058 Very nice fix. Reviewed-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/brw_shader.cpp | 4 ++ src/mesa/drivers/dri/i965/brw_vec4.h | 2 + .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 52 +++--- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 38 6 files changed, 63 insertions(+), 35 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index da6ed5b..a97a944 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -948,6 +948,7 @@ enum opcode { VS_OPCODE_URB_WRITE, VS_OPCODE_PULL_CONSTANT_LOAD, VS_OPCODE_PULL_CONSTANT_LOAD_GEN7, + VS_OPCODE_SET_SIMD4X2_HEADER_GEN9, VS_OPCODE_UNPACK_FLAGS_SIMD4X2, /** diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 335a800..0d6ac0c 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -568,6 +568,10 @@ brw_instruction_name(enum opcode op) return pull_constant_load; case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7: return pull_constant_load_gen7; + + case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9: + return set_simd4x2_header_gen9; + case VS_OPCODE_UNPACK_FLAGS_SIMD4X2: return unpack_flags_simd4x2; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 0363924..a0ee2cc 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -500,6 +500,8 @@ private: struct brw_reg dst, struct brw_reg surf_index, struct brw_reg offset); + void generate_set_simd4x2_header_gen9(vec4_instruction *inst, + struct brw_reg dst); void generate_unpack_flags(struct brw_reg dst); void generate_untyped_atomic(vec4_instruction *inst, diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp index 980e266..70d2af5 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp @@ -44,6 +44,7 @@ can_do_writemask(const struct brw_context *brw, case SHADER_OPCODE_GEN4_SCRATCH_READ: case VS_OPCODE_PULL_CONSTANT_LOAD: case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7: + case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9: return
Re: [Mesa-dev] [PATCH v2] nir: Try commutative sources in CSE
On Wed, Apr 15, 2015 at 9:50 AM, Ilia Mirkin imir...@alum.mit.edu wrote: On Wed, Apr 15, 2015 at 12:37 PM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com Shader-db results: GM45 NIR: total instructions in shared programs: 4082044 - 4081919 (-0.00%) instructions in affected programs: 27609 - 27484 (-0.45%) helped:44 Iron Lake NIR: total instructions in shared programs: 5678776 - 5678646 (-0.00%) instructions in affected programs: 27406 - 27276 (-0.47%) helped:45 Sandy Bridge NIR: total instructions in shared programs: 7329995 - 7329096 (-0.01%) instructions in affected programs: 142035 - 141136 (-0.63%) helped:406 HURT: 19 Ivy Bridge NIR: total instructions in shared programs: 6769314 - 6768359 (-0.01%) instructions in affected programs: 140820 - 139865 (-0.68%) helped:423 HURT: 2 Haswell NIR: total instructions in shared programs: 6183693 - 6183298 (-0.01%) instructions in affected programs: 96538 - 96143 (-0.41%) helped:303 HURT: 4 Broadwell NIR: total instructions in shared programs: 7501711 - 7498170 (-0.05%) instructions in affected programs: 266403 - 262862 (-1.33%) helped:705 HURT: 5 GAINED:4 v2: Rebase on top of Connor's fix. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Reviewed-by: Jordan Justen jordan.l.jus...@intel.com [v1] Cc: Jason Ekstrand jason.ekstr...@intel.com Cc: Connor Abbott cwabbo...@gmail.com --- The v2 changes were a little more intrusive than before, so it seemed worth sending out again. src/glsl/nir/nir_opt_cse.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/src/glsl/nir/nir_opt_cse.c b/src/glsl/nir/nir_opt_cse.c index 56d491c..3c5ece7 100644 --- a/src/glsl/nir/nir_opt_cse.c +++ b/src/glsl/nir/nir_opt_cse.c @@ -37,18 +37,19 @@ struct cse_state { }; static bool -nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src) +nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src1, + unsigned src2) { - if (alu1-src[src].abs != alu2-src[src].abs || - alu1-src[src].negate != alu2-src[src].negate) + if (alu1-src[src1].abs != alu2-src[src2].abs || + alu1-src[src1].negate != alu2-src[src2].negate) return false; - for (unsigned i = 0; i nir_ssa_alu_instr_src_components(alu1, src); i++) { - if (alu1-src[src].swizzle[i] != alu2-src[src].swizzle[i]) + for (unsigned i = 0; i nir_ssa_alu_instr_src_components(alu1, src1); i++) { + if (alu1-src[src1].swizzle[i] != alu2-src[src2].swizzle[i]) return false; } - return nir_srcs_equal(alu1-src[src].src, alu2-src[src].src); + return nir_srcs_equal(alu1-src[src1].src, alu2-src[src2].src); } static bool @@ -71,9 +72,17 @@ nir_instrs_equal(nir_instr *instr1, nir_instr *instr2) if (alu1-dest.dest.ssa.num_components != alu2-dest.dest.ssa.num_components) return false; - for (unsigned i = 0; i nir_op_infos[alu1-op].num_inputs; i++) { - if (!nir_alu_srcs_equal(alu1, alu2, i)) -return false; + if (nir_op_infos[alu1-op].num_inputs == 2 + (nir_op_infos[alu1-op].algebraic_properties NIR_OP_IS_COMMUTATIVE)) { Just a couple of thoughts (could well go into a later patch, or not at all) -- (a) is the NIR_OP_IS_COMMUTATIVE flag ever set when num_inputs != 2? No it's not. It would probably be reasonable to turn that into an assert. With that changed (and piglitted to make sure I'm right), Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com (b) I see that there is an isub/fsub/etc, which are (hopefully) not marked as commutative. Would it make sense to first get rid of them (and introduce negate flags) before CSE is done? I guess it's an argument that expressions should have a clear canonical form for maximal CSE, after which various negs can be propagated through. We (i965) do at the moment. Eric doesn't for vc4 but that was his choice. + return (nir_alu_srcs_equal(alu1, alu2, 0, 0) + nir_alu_srcs_equal(alu1, alu2, 1, 1)) || +(nir_alu_srcs_equal(alu1, alu2, 0, 1) + nir_alu_srcs_equal(alu1, alu2, 1, 0)); + } else { + for (unsigned i = 0; i nir_op_infos[alu1-op].num_inputs; i++) { +if (!nir_alu_srcs_equal(alu1, alu2, i, i)) + return false; + } } return true; } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
[Mesa-dev] [PATCH] glx: Massive update of comments in struct extension_info
From: Ian Romanick ian.d.roman...@intel.com In response to another patch, Emil asked for some clarification how this stuff works. Rather than just reply to the e-mail, I decided to update the exlanation in the code. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: Emil Velikov emil.l.veli...@gmail.com --- src/glx/glxextensions.c | 69 ++--- 1 file changed, 65 insertions(+), 4 deletions(-) diff --git a/src/glx/glxextensions.c b/src/glx/glxextensions.c index a326f0d..cb8cd66 100644 --- a/src/glx/glxextensions.c +++ b/src/glx/glxextensions.c @@ -61,12 +61,73 @@ struct extension_info */ unsigned char version_major; unsigned char version_minor; + + /** +* The client (i.e., libGL) supports this extension. +* +* Except during bring up, all extensions should have this set to Y. There +* are a few cases of extensions that have partial (or speculative) +* support, but these are rare. There also shouldn't be any new ones +* added. +* +* Generally, extensions require server support and ::client_support to be +* enabled. If the display is capable of direct rendering, +* ::direct_support is also required. +* +* \sa ::client_only +*/ unsigned char client_support; + + /** +* The direct-renderer (e.g., i965_dri.so) supports this extension. +* +* For cases where all of the infrastructure to support the extension is a +* required part of the loader/driver interface, this can default to Y. +* For most cases, extended functionality, usually in the form of DRI2 +* extensions, is necessary to support the extension. The loader will set +* the flag true if all the requirements are met. +* +* If the display is capable of direct rendering, ::direct_support is +* required for the extension to be enabled. +*/ unsigned char direct_support; - unsigned char client_only;/** Is the extension client-side only? */ - unsigned char direct_only;/** Is the extension for direct - * contexts only? - */ + + /** +* The extension depends only on client support. +* +* This is for extensions like GLX_ARB_get_proc_address that are contained +* entirely in the client library. There is no dependency on the server or +* the direct-renderer. +* +* These extensions will be enabled if ::client_support is set. +* +* \note +* An extension \b cannot be both client-only and direct-only because being +* direct-only implies a dependency on the direct renderer. +* +* \sa ::client_support, ::direct_only +*/ + unsigned char client_only; + + /** +* The extension only functions with direct-rendering contexts +* +* The extension has no GLX protocol, and, therefore, no explicit +* dependency on the server. The functionality is contained entirely in +* the client library and the direct renderer. A few of the swap-related +* extensions are intended to behave this way. +* +* These extensions will be enabled if both ::client_support and +* ::direct_support are set. +* +* \note +* An extension \b cannot be both client-only and direct-only because being +* client-only implies that all functionality is outside the +* direct-renderer. +* +* \sa ::direct_support, ::client_only +*/ + unsigned char direct_only; }; /* *INDENT-OFF* */ -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Render R16G16B16X16 as R16G16B16A16
This enables using _mesa_meta_pbo_TexSubImage() to upload data to R16G16B16X16 texture. Earlier it fell back to slower paths. Jenkins run shows no piglit regressions. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_surface_formats.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index f04bac5..7bec8fa 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -582,6 +582,12 @@ brw_init_surface_formats(struct brw_context *brw) case BRW_SURFACEFORMAT_L16_FLOAT: render = BRW_SURFACEFORMAT_R16_FLOAT; break; + case BRW_SURFACEFORMAT_R16G16B16X16_UNORM: + render = BRW_SURFACEFORMAT_R16G16B16A16_UNORM; + break; + case BRW_SURFACEFORMAT_R16G16B16X16_FLOAT: + render = BRW_SURFACEFORMAT_R16G16B16A16_FLOAT; + break; case BRW_SURFACEFORMAT_B8G8R8X8_UNORM: /* XRGB is handled as ARGB because the chips in this family * cannot render to XRGB targets. This means that we have to -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Update the comment about platforms supporting blorp
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp index 644cb41..7fd812e 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp @@ -223,8 +223,8 @@ brw_blorp_copytexsubimage(struct brw_context *brw, struct intel_mipmap_tree *src_mt = src_irb-mt; struct intel_mipmap_tree *dst_mt = intel_image-mt; - /* BLORP is not supported before Gen6. */ - if (brw-gen 6 || brw-gen = 8) + /* BLORP is only supported for Gen6-7. */ + if (brw-gen 6 || brw-gen 7) return false; if (_mesa_get_format_base_format(src_rb-Format) != -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Change the order of conditions tested in if
Reduces the number of conditions tested in if to one in case of non-integer formats. Makes no functional changes. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_surface_formats.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index 7261c01..f04bac5 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -683,9 +683,10 @@ brw_render_target_supported(struct brw_context *brw, * available to fake it like we do for XRGB. Force them to being * unsupported. */ - if ((rb-_BaseFormat != GL_RGBA + if (_mesa_is_format_integer_color(format) + (rb-_BaseFormat != GL_RGBA rb-_BaseFormat != GL_RG - rb-_BaseFormat != GL_RED) _mesa_is_format_integer_color(format)) + rb-_BaseFormat != GL_RED)) return false; /* Under some conditions, MSAA is not supported for formats whose width is -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] prog; add weak _mesa_error_no_memory() symbol and add it to libglsl_util
On 15 April 2015 at 18:33, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 7:08 AM, Emil Velikov emil.l.veli...@gmail.com wrote: Rather than forcing everyone to provide their own definition of the symbol provide a common weak one, which anyone can override if needed. This resolved the build of the standalone pipe-drivers, as it provides a default symbol which was missing previously. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/Makefile.am| 3 ++- src/glsl/SConscript| 2 ++ src/mesa/Android.libmesa_glsl_utils.mk | 6 -- src/mesa/program/weak_errors.c | 30 ++ Is program/ really the right place for this? (maybe main/ instead?) Hmm main/ might be better indeed. If there are any recommendations about the name I'll gladly take it :) Jose/Brian From a quick look I cannot find a suitable alternative for __attribute__((weak)) for MSVC. Would you know any of the top of your head or should I just wrap the attribute in ifndef _MSC_VER and leave the rest ? If I drop the weak_errors.c hunk from the SCons here, and replace tests/common.c with it things should just work. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965/vec4: Add a helper function to emit VS_OPCODE_PULL_CONSTANT_LOAD
On Wed, Apr 15, 2015 at 06:58:01PM +0100, Neil Roberts wrote: There were three places in the visitor that had a similar chunk of code to emit the VS_OPCODE_PULL_CONSTANT_LOAD opcode using a register for the offset. This patch combines the chunks into a helper function to reduce the code duplication. It will also be useful in the next patch to expand what happens on Gen9+. This shouldn't introduce any functional changes. Hopefully you agreed and didn't just do it for me :-) Reviewed-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 ++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 120 + src/mesa/drivers/dri/i965/brw_vec4_vp.cpp | 27 ++ 3 files changed, 75 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 700ca69..0363924 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -364,6 +364,11 @@ public: dst_reg dst, src_reg orig_src, int base_offset); + void emit_pull_constant_load_reg(dst_reg dst, +src_reg surf_index, +src_reg offset, +bblock_t *before_block, +vec4_instruction *before_inst); src_reg emit_resolve_reladdr(int scratch_loc[], bblock_t *block, vec4_instruction *inst, src_reg src); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index ffbe04d..f7d542b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1296,6 +1296,63 @@ vec4_visitor::emit_lrp(const dst_reg dst, } } +/** + * Emits the instructions needed to perform a pull constant load. before_block + * and before_inst can be NULL in which case the instruction will be appended + * to the end of the instruction list. + */ +void +vec4_visitor::emit_pull_constant_load_reg(dst_reg dst, + src_reg surf_index, + src_reg offset_reg, + bblock_t *before_block, + vec4_instruction *before_inst) +{ + assert((before_inst == NULL before_block == NULL) || + (before_inst before_block)); + + vec4_instruction *pull; + + if (brw-gen = 7) { + dst_reg grf_offset = dst_reg(this, glsl_type::int_type); + + /* We have to use a message header on Skylake to get SIMD4x2 mode. + * Reserve space for the register. + */ + if (brw-gen = 9) { + grf_offset.reg_offset++; + alloc.sizes[grf_offset.reg] = 2; + } + + grf_offset.type = offset_reg.type; + + pull = MOV(grf_offset, offset_reg); + + if (before_inst) + emit_before(before_block, before_inst, pull); + else + emit(pull); + + pull = new(mem_ctx) vec4_instruction(VS_OPCODE_PULL_CONSTANT_LOAD_GEN7, + dst, + surf_index, + src_reg(grf_offset)); + pull-mlen = 1; + } else { + pull = new(mem_ctx) vec4_instruction(VS_OPCODE_PULL_CONSTANT_LOAD, + dst, + surf_index, + offset_reg); + pull-base_mrf = 14; + pull-mlen = 1; + } + + if (before_inst) + emit_before(before_block, before_inst, pull); + else + emit(pull); +} + void vec4_visitor::visit(ir_expression *ir) { @@ -1774,36 +1831,10 @@ vec4_visitor::visit(ir_expression *ir) emit(SHR(dst_reg(offset), op[1], src_reg(4))); } - if (brw-gen = 7) { - dst_reg grf_offset = dst_reg(this, glsl_type::int_type); - - /* We have to use a message header on Skylake to get SIMD4x2 mode. - * Reserve space for the register. - */ - if (brw-gen = 9) { -grf_offset.reg_offset++; -alloc.sizes[grf_offset.reg] = 2; - } - - grf_offset.type = offset.type; - - emit(MOV(grf_offset, offset)); - - vec4_instruction *pull = -emit(new(mem_ctx) vec4_instruction(VS_OPCODE_PULL_CONSTANT_LOAD_GEN7, - dst_reg(packed_consts), - surf_index, - src_reg(grf_offset))); - pull-mlen = 1; - } else { - vec4_instruction *pull = -emit(new(mem_ctx)
Re: [Mesa-dev] [PATCH 4/9] prog; add weak _mesa_error_no_memory() symbol and add it to libglsl_util
On Wed, Apr 15, 2015 at 7:08 AM, Emil Velikov emil.l.veli...@gmail.com wrote: Rather than forcing everyone to provide their own definition of the symbol provide a common weak one, which anyone can override if needed. This resolved the build of the standalone pipe-drivers, as it provides a default symbol which was missing previously. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/Makefile.am| 3 ++- src/glsl/SConscript| 2 ++ src/mesa/Android.libmesa_glsl_utils.mk | 6 -- src/mesa/program/weak_errors.c | 30 ++ Is program/ really the right place for this? (maybe main/ instead?) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: remove bogus r600-- triple
On 15 April 2015 at 10:24, Michel Dänzer mic...@daenzer.net wrote: On 09.04.2015 03:27, Emil Velikov wrote: As mentioned by Michel Dänzer for LLVM = 3.6 we create the LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi context. For older LLVM or hardware (r600) the triple is always r600-- and is created at a later stage - radeon_llvm_compile() Cc: Michel Dänzer michel.daen...@amd.com Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/drivers/radeonsi/si_pipe.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index e761d20..5ea8868 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -85,8 +85,6 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, void * LLVMTargetRef r600_target; #if HAVE_LLVM = 0x0306 const char *triple = amdgcn--; -#else - const char *triple = r600--; #endif int shader, i; r600_target should probably also be moved inside the #if HAVE_LLVM = 0x0306 as it's only used in that case. But that can be done in a separate patch. Either way, this patch is Reviewed-by: Michel Dänzer michel.daen...@amd.com Will keep the r600_target change in a separate patch. I have a few patches which handle unused variables/used uninitialised warnings. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. I don't think it buys us anything. If we just run copy propagation after lower_load_payload() we'll get the code we want. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f8e26c0..bc45a38 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload() { bool progress = false; - int vgrf_to_reg[alloc.count]; - int reg_count = 0; - for (unsigned i = 0; i alloc.count; ++i) { - vgrf_to_reg[i] = reg_count; - reg_count += alloc.sizes[i]; - } - - struct { - bool written:1; /* Whether this register has ever been written */ - bool force_writemask_all:1; - bool force_sechalf:1; - } metadata[reg_count]; - memset(metadata, 0, sizeof(metadata)); - foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { - if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; - bool force_sechalf = inst-force_sechalf - !inst-force_writemask_all; - bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 - !inst-force_writemask_all; - for (int i = 0; i inst-regs_written; ++i) { -metadata[dst_reg + i].written = true; -metadata[dst_reg + i].force_sechalf = force_sechalf; -metadata[dst_reg + i].force_writemask_all = inst-force_writemask_all; -force_sechalf = (toggle_sechalf != force_sechalf); - } - } - if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) { We should invert this condition and 'continue' so that we can avoid an extra level of indentation which only makes the code harder to read. assert(inst-dst.file == MRF || inst-dst.file == GRF); + fs_reg dst = inst-dst; - for (int i = 0; i inst-sources; i++) { -dst.width = inst-src[i].effective_width; -dst.type = inst-src[i].type; - -if (inst-src[i].file == BAD_FILE) { - /* Do nothing but otherwise increment as normal */ -} else if (dst.file == MRF - dst.width == 8 - brw-has_compr4 - i + 4 inst-sources - inst-src[i + 4].equals(horiz_offset(inst-src[i], 8))) { - fs_reg compr4_dst = dst; - compr4_dst.reg += BRW_MRF_COMPR4; - compr4_dst.width = 16; - fs_reg compr4_src = inst-src[i]; - compr4_src.width = 16; - fs_inst *mov = MOV(compr4_dst, compr4_src); + /* Get rid of COMPR4. We'll add it back in if we need it */ + if (dst.file == MRF dst.reg BRW_MRF_COMPR4) +dst.reg = dst.reg ~BRW_MRF_COMPR4; No point in checking whether the BRW_MRF_COMPR4 bit is set before clearing it. Just clear it. + + dst.width = 8; + for (uint8_t i = 0; i inst-header_size; i++) { +if (inst-src[i].file != BAD_FILE) { + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD); + fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD); + mov_src.width = 8; + fs_inst *mov = MOV(mov_dst, mov_src); mov-force_writemask_all = true; inst-insert_before(block, mov); - /* Mark i+4 as BAD_FILE so we don't emit a MOV for it */ - inst-src[i + 4].file = BAD_FILE; -} else { - fs_inst *mov = MOV(dst, inst-src[i]); -
Re: [Mesa-dev] [PATCH] glx: Allow to create any OpenGL ES version.
On Wed 15 Apr 2015, Emil Velikov wrote: On 14 April 2015 at 17:22, Ian Romanick i...@freedesktop.org wrote: On 04/14/2015 08:36 AM, Emil Velikov wrote: On 14 April 2015 at 13:52, Jose Fonseca jfons...@vmware.com wrote: On 13/04/15 22:59, Ian Romanick wrote: On 04/10/2015 03:36 PM, Jose Fonseca wrote: From: José Fonseca jfons...@vmware.com diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 8192c54..192525a 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1102,7 +1102,7 @@ dri2BindExtensions(struct dri2_screen *psc, struct glx_display * priv, __glXEnableDirectExtension(psc-base, GLX_ARB_create_context); __glXEnableDirectExtension(psc-base, GLX_ARB_create_context_profile); - if ((mask (1 __DRI_API_GLES2)) != 0) { + if ((mask ((1 __DRI_API_GLES2) | (1 __DRI_API_GLES3))) != 0) Since ES3 is a superset of ES2, this shouldn't be necessary. I can't imagine a driver only setting __DRI_API_GLES3... and the common code may not even make that possible. We could, however, enable GLX_EXT_create_context_es2_profile if only __DRI_API_GLES is set. I don't really feel like testing any drivers that only support ES 1.x, do you? :) Think of it as allowing gles1 support via GLX (apart from gles2/3), rather than testing gles1 only drivers. I would assume that you already support gles1 via EGL so things should just work for GLX. Either way the decision is not mine to make ;-) Drivers that support GLES1 but not GLES2 are easily created. All that's needed is rebuilding i965 with --enable-gles1 --disable-gles2, right? I also have no time to rebuild the xserver. I try that once in a blue moon, and everytime it leads to broken input drivers and the need for vodka. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glx: Create proper server dependency for GLX_EXT_create_context_es2_profile
On Tue 14 Apr 2015, Ian Romanick wrote: From: Ian Romanick ian.d.roman...@intel.com Previously GLX_EXT_create_context_es2_profile was marked as direct only so that it would not depend on server support. Since the extension required functions that are part of GLX_ARB_create_context_profile, support for the EXT was disabled if the ARB was not supported. This was complete rubbish. If the server supported the ARB but not the EXT, sending a request with GLX_CONTEXT_ES2_PROFILE_BIT_EXT would result in GLXBadProfileARB. Instead of the misguided hack, make GLX_EXT_create_context_es2_profile properly depend on server support by not marking it as direct only. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: José Fonseca jfons...@vmware.com Cc: Chad Versace chad.vers...@intel.com Cc: Emil Velikov emil.l.veli...@gmail.com --- src/glx/glxextensions.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) There's a lot of subtlety here, but it looks correct as far as I can tell. Reviewed-by: Chad Versace chad.vers...@intel.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Update the comment about platforms supporting blorp
On Wed 15 Apr 2015, Anuj Phogat wrote: Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp index 644cb41..7fd812e 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp @@ -223,8 +223,8 @@ brw_blorp_copytexsubimage(struct brw_context *brw, struct intel_mipmap_tree *src_mt = src_irb-mt; struct intel_mipmap_tree *dst_mt = intel_image-mt; - /* BLORP is not supported before Gen6. */ - if (brw-gen 6 || brw-gen = 8) + /* BLORP is only supported for Gen6-7. */ + if (brw-gen 6 || brw-gen 7) return false; if (_mesa_get_format_base_format(src_rb-Format) != Reviewed-by: Chad Versace chad.vers...@intel.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965: Make shader_time store names/ids instead of referencing shaders.
On Wed, Apr 15, 2015 at 2:24 AM, Kenneth Graunke kenn...@whitecape.org wrote: Jason noticed that shader_time was bumping the reference count on the gl_shader_program and gl_program structures, in code called during compilation. Not only were these never unreferenced, but it meant fragment shaders might be referenced twice (SIMD8 and SIMD16)...or only once. We don't actually need the programs. We just need their numeric ID and their language (GLSL/ARB/FF) or KHR_debug label. If there's a label, we have to strdup it since the underlying program could be deleted. To be fair, we're not exactly cleaning that up either, but we at least ralloc it out of the shader_time arrays, so if we ever bother cleaning those up, they'll go away properly. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_context.h | 4 +-- src/mesa/drivers/dri/i965/brw_program.c | 52 +++-- 2 files changed, 19 insertions(+), 37 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 0bd0ed1..a6d6787 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1454,8 +1454,8 @@ struct brw_context struct { drm_intel_bo *bo; - struct gl_shader_program **shader_programs; - struct gl_program **programs; + const char **names; + int *ids; enum shader_time_shader_type *types; uint64_t *cumulative; int num_entries; diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index 7ea08e6..81a0c19 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -294,10 +294,8 @@ brw_init_shader_time(struct brw_context *brw) brw-shader_time.bo = drm_intel_bo_alloc(brw-bufmgr, shader time, max_entries * SHADER_TIME_STRIDE, 4096); - brw-shader_time.shader_programs = rzalloc_array(brw, struct gl_shader_program *, -max_entries); - brw-shader_time.programs = rzalloc_array(brw, struct gl_program *, - max_entries); + brw-shader_time.names = rzalloc_array(brw, const char *, max_entries); + brw-shader_time.ids = rzalloc_array(brw, int, max_entries); brw-shader_time.types = rzalloc_array(brw, enum shader_time_shader_type, max_entries); brw-shader_time.cumulative = rzalloc_array(brw, uint64_t, @@ -434,36 +432,15 @@ brw_report_shader_time(struct brw_context *brw) fprintf(stderr, \n); fprintf(stderr, type ID cycles spent %% of total\n); for (int s = 0; s brw-shader_time.num_entries; s++) { - const char *shader_name; const char *stage; /* Work back from the sorted pointers times to a time to print. */ int i = sorted[s] - scaled; - struct gl_shader_program *prog = brw-shader_time.shader_programs[i]; if (scaled[i] == 0) continue; - int shader_num = 0; - if (prog) { - shader_num = prog-Name; - - if (prog-Label) { -shader_name = prog-Label; - } else if (shader_num == 0) { -shader_name = ff; - } else { -shader_name = glsl; - } - } else if (brw-shader_time.programs[i]) { - shader_num = brw-shader_time.programs[i]-Id; - if (shader_num == 0) { -shader_name = ff; - } else { -shader_name = prog; - } - } else { - shader_name = other; - } + int shader_num = brw-shader_time.ids[i]; + const char *shader_name = brw-shader_time.names[i]; switch (brw-shader_time.types[i]) { case ST_VS: @@ -543,19 +520,24 @@ brw_get_shader_time_index(struct brw_context *brw, struct gl_program *prog, enum shader_time_shader_type type) { I'm still not happy that this is still in get_shader_time_index but at least it doesn't use the GL context anymore. Series is Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com - struct gl_context *ctx = brw-ctx; - int shader_time_index = brw-shader_time.num_entries++; assert(shader_time_index brw-shader_time.max_entries); brw-shader_time.types[shader_time_index] = type; - _mesa_reference_shader_program(ctx, - brw-shader_time.shader_programs[shader_time_index], - shader_prog); + int id = shader_prog ? shader_prog-Name : prog-Id; + const char *name; + if (id == 0) { + name = ff; + } else if (!shader_prog) { + name = prog; + } else if (shader_prog-Label) { + name =
Re: [Mesa-dev] [PATCH] glx: Allow to create any OpenGL ES version.
On 14/04/15 17:22, Ian Romanick wrote: On 04/14/2015 08:36 AM, Emil Velikov wrote: On 14 April 2015 at 13:52, Jose Fonseca jfons...@vmware.com wrote: On 13/04/15 22:59, Ian Romanick wrote: On 04/10/2015 03:36 PM, Jose Fonseca wrote: From: José Fonseca jfons...@vmware.com The latest version of GLX_EXT_create_context_es2_profile states: If the version requested is a valid and supported OpenGL-ES version, and the GLX_CONTEXT_ES_PROFILE_BIT_EXT bit is set in the GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context returned will implement the OpenGL ES version requested. We must also export EXT_create_context_es_profile too, as EXT_create_context_es2_profile specification is crystal clear: NOTE: implementations of this extension must export BOTH extension strings, for backwards compatibility with applications written against version 1 of this extension. Totally untested. (Just happened to noticed this while implementing GLX_EXT_create_context_es2_profile for st/xlib.) Reviewed-by: Brian Paul bri...@vmware.com Reviewed-by: Emil Velikov emil.l.veli...@gmail.com v2: Replicate the drisw_glx.c to dri2_glx.c and dri3_glx.c as suggested by Emil Velikov. --- src/glx/dri2_glx.c | 5 - src/glx/dri3_glx.c | 5 - src/glx/dri_common.c | 32 src/glx/drisw_glx.c | 2 ++ 4 files changed, 26 insertions(+), 18 deletions(-) diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 462d560..8192c54 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1102,9 +1102,12 @@ dri2BindExtensions(struct dri2_screen *psc, struct glx_display * priv, __glXEnableDirectExtension(psc-base, GLX_ARB_create_context); __glXEnableDirectExtension(psc-base, GLX_ARB_create_context_profile); - if ((mask (1 __DRI_API_GLES2)) != 0) + if ((mask (1 __DRI_API_GLES2)) != 0) { +__glXEnableDirectExtension(psc-base, + GLX_EXT_create_context_es_profile); __glXEnableDirectExtension(psc-base, GLX_EXT_create_context_es2_profile); + } } for (i = 0; extensions[i]; i++) { diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c index 1ddc723..6973ad1 100644 --- a/src/glx/dri3_glx.c +++ b/src/glx/dri3_glx.c @@ -1825,9 +1825,12 @@ dri3_bind_extensions(struct dri3_screen *psc, struct glx_display * priv, __glXEnableDirectExtension(psc-base, GLX_ARB_create_context); __glXEnableDirectExtension(psc-base, GLX_ARB_create_context_profile); - if ((mask (1 __DRI_API_GLES2)) != 0) + if ((mask (1 __DRI_API_GLES2)) != 0) { + __glXEnableDirectExtension(psc-base, + GLX_EXT_create_context_es_profile); __glXEnableDirectExtension(psc-base, GLX_EXT_create_context_es2_profile); + } for (i = 0; extensions[i]; i++) { /* when on a different gpu than the server, the server pixmaps diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c index 63c8de3..541abbb 100644 --- a/src/glx/dri_common.c +++ b/src/glx/dri_common.c @@ -544,9 +544,22 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs, case GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB: *api = __DRI_API_OPENGL; break; - case GLX_CONTEXT_ES2_PROFILE_BIT_EXT: -*api = __DRI_API_GLES2; -break; + case GLX_CONTEXT_ES_PROFILE_BIT_EXT: + switch (*major_ver) { + case 3: +*api = __DRI_API_GLES3; +break; + case 2: +*api = __DRI_API_GLES2; +break; + case 1: +*api = __DRI_API_GLES; +break; + default: +*error = __DRI_CTX_ERROR_BAD_API; +return false; + } + break; default: *error = __DRI_CTX_ERROR_BAD_API; return false; @@ -577,19 +590,6 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs, return false; } - /* The GLX_EXT_create_context_es2_profile spec says: -* -* ... If the version requested is 2.0, and the -* GLX_CONTEXT_ES2_PROFILE_BIT_EXT bit is set in the -* GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context -* returned will implement OpenGL ES 2.0. This is the only way in which -* an implementation may request an OpenGL ES 2.0 context. -*/ - if (*api == __DRI_API_GLES2 (*major_ver != 2 || *minor_ver != 0)) { - *error = __DRI_CTX_ERROR_BAD_API; - return false; - } I guess this text was removed from the extension spec, and now we rely on some other layer detecting invalid versions (like 2.1)? Yes, the spec replaced that with ... If the version requested is a valid and supported OpenGL-ES version, and the GLX_CONTEXT_ES_PROFILE_BIT_EXT bit is set in the GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Fri, Apr 3, 2015 at 7:28 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. Yeah, true, but this seems like the least orthogonal and most annoying to use solution for this problem, it forces the caller to provide redundant information, it takes into account the saturate flag on some arguments and not others, There's no way to get a saturated load_payload, as far as I can tell. Let's just avoid the problem you mention all together and continue not doing it. I agree that this patch adding support for that is not a good idea. it shuffles sources with respect to the specified order when COMPR4 is set, but only for the first four non-header sources. To confirm I understand -- the existing lower_load_payload takes 2x 8-channel color sources and recognizes that src[i] and src[i+4] can be emitted as a compr4 MOV. The proposed patch changes that to take 1x 16-channel color source and just emits a compr4 MOV. I don't really find that change particularly contentious. In fact, I kind of wonder what case we were trying to handle by allowing separate sources for the first 8-channels and the second 8-channels. Is there some case where these are going to be produced by different instructions that can't do compr4? Texturing or math instructions come to mind, but they can't have MRF destinations anyway. In another email, you mention that compr4 change in this patch means that optimization passes need to know some extra information. I'm not able to think of any places where we'd need that information in optimization passes. The only passes that really operate on load_payload are CSE, copy propagate, and register coalescing; and I don't think any of these will touch things writing to MRFs. Am I forgetting something? I think I understand the concern in general -- the sources are copied into different destination registers dependent on (dst.reg BRW_MRF_COMPR4) == 0. That's a little strange, but that strangeness is in the hardware. I mean, if it's strange for load_payload to do this, isn't it strange for other hardware instructions to do it? Or is it just that it's a virtual opcode doing a hardware-specific thing? I don't know. It seems like it's just hiding a little bit of complexity (give me 16-channels and I'll emit compr4 MOVs to put them in the right hardware-specific places). I think any of the following
Re: [Mesa-dev] [PATCH 4/9] prog; add weak _mesa_error_no_memory() symbol and add it to libglsl_util
On 04/15/2015 11:56 AM, Emil Velikov wrote: On 15 April 2015 at 18:33, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 7:08 AM, Emil Velikov emil.l.veli...@gmail.com wrote: Rather than forcing everyone to provide their own definition of the symbol provide a common weak one, which anyone can override if needed. This resolved the build of the standalone pipe-drivers, as it provides a default symbol which was missing previously. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/Makefile.am| 3 ++- src/glsl/SConscript| 2 ++ src/mesa/Android.libmesa_glsl_utils.mk | 6 -- src/mesa/program/weak_errors.c | 30 ++ Is program/ really the right place for this? (maybe main/ instead?) Hmm main/ might be better indeed. If there are any recommendations about the name I'll gladly take it :) Jose/Brian From a quick look I cannot find a suitable alternative for __attribute__((weak)) for MSVC. Would you know any of the top of your head or should I just wrap the attribute in ifndef _MSC_VER and leave the rest ? If I drop the weak_errors.c hunk from the SCons here, and replace tests/common.c with it things should just work. I'd have to do some digging to answer the MSVC question. I guess I'm not totally clear on the background of what's going on here, but let me make a suggestion. The comment in src/gallium/state_trackers/xa/xa_tracker.c says that it's called by NIR, but I can't find any calls to _mesa_error_no_memory() in src/glsl/nir/. I guess the actual calls are in the src/glsl/ code, right? I ran into a similar thing in the glapi code where I wanted to call some core Mesa function, but didn't want a dependency of that nature. The solution was to use a callback function. So, could we add a function to the glsl/nir code which registers an out-of-memory callback function? The glsl/nir code could have a default fprintf(stderr) fallback. But Mesa would register a callback that actually calls _mesa_error() to record the error. That seems cleaner than using a compiler/linker work-around. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Rewrite ir_tex to ir_txl with lod 0 for vertex shaders
The ir_tex opcode turns into a sample or sample_c message, which will try to compute derivatives to determine the lod. This produces garbage for non-fragment shaders where the sample coordinates don't correspond to subspans. We fix this by rewriting the opcode from ir_tex to ir_txl and setting the lod to 0. https://bugs.freedesktop.org/show_bug.cgi?id=89457 Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 0049b2d..4e99366 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1839,6 +1839,15 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst, offset_value.file != BAD_FILE offset_value.file != IMM; bool coordinate_done = false; + /* The sampler can only meaningfully compute LOD for fragment shader +* messages. For all other stages, we change the opcode to ir_txl and +* hardcode the LOD to 0. +*/ + if (stage != MESA_SHADER_FRAGMENT op == ir_tex) { + op = ir_txl; + lod = fs_reg(0.0f); + } + /* Set up the LOD info */ switch (op) { case ir_tex: -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] prog; add weak _mesa_error_no_memory() symbol and add it to libglsl_util
On 15/04/15 20:38, Brian Paul wrote: On 04/15/2015 11:56 AM, Emil Velikov wrote: On 15 April 2015 at 18:33, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 7:08 AM, Emil Velikov emil.l.veli...@gmail.com wrote: Rather than forcing everyone to provide their own definition of the symbol provide a common weak one, which anyone can override if needed. This resolved the build of the standalone pipe-drivers, as it provides a default symbol which was missing previously. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/Makefile.am| 3 ++- src/glsl/SConscript| 2 ++ src/mesa/Android.libmesa_glsl_utils.mk | 6 -- src/mesa/program/weak_errors.c | 30 ++ Is program/ really the right place for this? (maybe main/ instead?) Hmm main/ might be better indeed. If there are any recommendations about the name I'll gladly take it :) Jose/Brian From a quick look I cannot find a suitable alternative for __attribute__((weak)) for MSVC. Would you know any of the top of your head or should I just wrap the attribute in ifndef _MSC_VER and leave the rest ? If I drop the weak_errors.c hunk from the SCons here, and replace tests/common.c with it things should just work. I'd have to do some digging to answer the MSVC question. I don't believe MSVC has the equivalent of __attribute__((weak)). There's _declspec(selectany) , which allows to pick any of multiple instances of the same symbol, but it does not allow to say which of the multiple implementations is strong vs weak. I guess I'm not totally clear on the background of what's going on here, but let me make a suggestion. The comment in src/gallium/state_trackers/xa/xa_tracker.c says that it's called by NIR, but I can't find any calls to _mesa_error_no_memory() in src/glsl/nir/. I guess the actual calls are in the src/glsl/ code, right? I ran into a similar thing in the glapi code where I wanted to call some core Mesa function, but didn't want a dependency of that nature. The solution was to use a callback function. So, could we add a function to the glsl/nir code which registers an out-of-memory callback function? The glsl/nir code could have a default fprintf(stderr) fallback. But Mesa would register a callback that actually calls _mesa_error() to record the error. That seems cleaner than using a compiler/linker work-around. I agree with Brian. This sort of compiler/linker tricks are trick to sustain with portable code. Callbacks seem a much more elegant solution. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] gallium/ttn: add support for texture offsets
Rob Clark robdcl...@gmail.com writes: On Tue, Apr 14, 2015 at 1:04 PM, Eric Anholt e...@anholt.net wrote: Rob Clark robdcl...@gmail.com writes: From: Rob Clark robcl...@freedesktop.org Signed-off-by: Rob Clark robcl...@freedesktop.org 1-3 (with the fix to 1 that you posted in irc) are: Reviewed-by: Eric Anholt e...@anholt.net I don't like the mismatch on bytes vs vec4s in the load_ubo_indirect arguments for patch 4, and will be interested in seeing the version you were working on to fix that. so, if I pull out the addressing-modes patch, then for the ubo patch we are back to 'index *= 16' vec4-byte conversion. Which is funny looking, but at the moment my best definition of correct is what glsl_to_nir and intel driver do, and by that definition, this is the correct thing to do. OK. But we should be doing the SHL here, too (also gets us consistency with glsl_to_nir). signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glx: Create proper server dependency for GLX_EXT_create_context_es2_profile
On 14/04/15 17:35, Ian Romanick wrote: From: Ian Romanick ian.d.roman...@intel.com Previously GLX_EXT_create_context_es2_profile was marked as direct only so that it would not depend on server support. Since the extension required functions that are part of GLX_ARB_create_context_profile, support for the EXT was disabled if the ARB was not supported. This was complete rubbish. If the server supported the ARB but not the EXT, sending a request with GLX_CONTEXT_ES2_PROFILE_BIT_EXT would result in GLXBadProfileARB. Instead of the misguided hack, make GLX_EXT_create_context_es2_profile properly depend on server support by not marking it as direct only. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: José Fonseca jfons...@vmware.com Cc: Chad Versace chad.vers...@intel.com Cc: Emil Velikov emil.l.veli...@gmail.com --- src/glx/glxextensions.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/src/glx/glxextensions.c b/src/glx/glxextensions.c index ce5d66d..a326f0d 100644 --- a/src/glx/glxextensions.c +++ b/src/glx/glxextensions.c @@ -84,7 +84,7 @@ static const struct extension_info known_glx_extensions[] = { { GLX(EXT_visual_rating), VER(0,0), Y, Y, N, N }, { GLX(EXT_fbconfig_packed_float), VER(0,0), Y, Y, N, N }, { GLX(EXT_framebuffer_sRGB),VER(0,0), Y, Y, N, N }, - { GLX(EXT_create_context_es2_profile), VER(0,0), Y, N, N, Y }, + { GLX(EXT_create_context_es2_profile), VER(0,0), Y, N, N, N }, { GLX(MESA_copy_sub_buffer),VER(0,0), Y, N, N, N }, { GLX(MESA_multithread_makecurrent),VER(0,0), Y, N, Y, N }, { GLX(MESA_query_renderer), VER(0,0), Y, N, N, Y }, @@ -627,16 +627,6 @@ __glXCalculateUsableExtensions(struct glx_screen * psc, } } - /* This hack is necessary because GLX_ARB_create_context_profile depends on -* server support, but GLX_EXT_create_context_es2_profile is direct-only. -* Without this hack, it would be possible to advertise -* GLX_EXT_create_context_es2_profile without -* GLX_ARB_create_context_profile. That would be a problem. -*/ - if (!IS_SET(server_support, ARB_create_context_profile_bit)) { - CLR_BIT(usable, EXT_create_context_es2_profile_bit); - } - psc-effectiveGLXexts = __glXGetStringFromTable(known_glx_extensions, usable); } Makes sense AFAICT. But I confess I'm not very familiar with these subtle interactions between GLX and X server. Acked-by: Jose Fonseca jfons...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Render R16G16B16X16 as R16G16B16A16
On Wednesday, April 15, 2015 11:38:51 AM Anuj Phogat wrote: This enables using _mesa_meta_pbo_TexSubImage() to upload data to R16G16B16X16 texture. Earlier it fell back to slower paths. Jenkins run shows no piglit regressions. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_surface_formats.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index f04bac5..7bec8fa 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -582,6 +582,12 @@ brw_init_surface_formats(struct brw_context *brw) case BRW_SURFACEFORMAT_L16_FLOAT: render = BRW_SURFACEFORMAT_R16_FLOAT; break; + case BRW_SURFACEFORMAT_R16G16B16X16_UNORM: + render = BRW_SURFACEFORMAT_R16G16B16A16_UNORM; + break; + case BRW_SURFACEFORMAT_R16G16B16X16_FLOAT: + render = BRW_SURFACEFORMAT_R16G16B16A16_FLOAT; + break; case BRW_SURFACEFORMAT_B8G8R8X8_UNORM: /* XRGB is handled as ARGB because the chips in this family * cannot render to XRGB targets. This means that we have to LGTM, thanks! Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] st/mesa: also emit labels for TGSI_OPCODE_BGNSUB
On 04/14/2015 02:20 PM, Jose Fonseca wrote: On 14/04/15 17:56, Brian Paul wrote: Subroutines need labels so they can be identied by CAL instructions. --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 67a4da7..eb0ce07 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -4771,18 +4771,25 @@ compile_tgsi_instruction(struct st_translate *t, } switch(inst-op) { + case TGSI_OPCODE_BGNSUB: case TGSI_OPCODE_BGNLOOP: case TGSI_OPCODE_CAL: case TGSI_OPCODE_ELSE: case TGSI_OPCODE_ENDLOOP: case TGSI_OPCODE_IF: case TGSI_OPCODE_UIF: - assert(num_dst == 0); - ureg_label_insn(ureg, - inst-op, - src, num_src, - get_label(t, -inst-op == TGSI_OPCODE_CAL ? inst-function-sig_id : 0)); + { + int sig_id = 0; + if (inst-op == TGSI_OPCODE_CAL || + inst-op == TGSI_OPCODE_BGNSUB) { I'm not sure about this. One thing is the label of the instruction (ie, instrution no), the other is the operand label. CAL takes an operand label (the #no of the instruction where BGNSUB starts). But BGNSUB shouldn't need a label (what matter is the instrucion #no). If it does take a label, then it's probably the label of the ENDSUB, not where the function starts. I thought we had removed all these pointless labels. I suppose that's why the label 0 is used. In other words, I believe the right thing here is to add the case statement for TGSI_OPCODE_BGNSUB, but not treat it like CAL. Hmm, OK, I'll have to take another look at this when I can find some time. I know labels were kind of iffy... -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium/ttn: add UBO support
From: Rob Clark robcl...@freedesktop.org v2: move ishl into ttn (instead of driver backend) to keep the units consistent between immediate and indirect offsets Signed-off-by: Rob Clark robcl...@freedesktop.org --- src/gallium/auxiliary/nir/tgsi_to_nir.c | 69 - 1 file changed, 60 insertions(+), 9 deletions(-) diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c b/src/gallium/auxiliary/nir/tgsi_to_nir.c index b6123e0..6c8d3fc 100644 --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c @@ -162,6 +162,10 @@ ttn_emit_declaration(struct ttn_compile *c) file == TGSI_FILE_OUTPUT || file == TGSI_FILE_CONSTANT); + /* nothing to do for UBOs: */ + if ((file == TGSI_FILE_CONSTANT) decl-Declaration.Dimension) + return; + var = rzalloc(b-shader, nir_variable); var-data.driver_location = decl-Range.First; @@ -286,7 +290,9 @@ ttn_array_deref(struct ttn_compile *c, nir_intrinsic_instr *instr, static nir_src ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index, - struct tgsi_ind_register *indirect) + struct tgsi_ind_register *indirect, + struct tgsi_dimension *dim, + struct tgsi_ind_register *dimind) { nir_builder *b = c-build; nir_src src; @@ -314,15 +320,18 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index, assert(!indirect); src.reg.reg = c-temp_regs[index].reg; } + assert(!dim); break; case TGSI_FILE_ADDRESS: src.reg.reg = c-addr_reg; + assert(!dim); break; case TGSI_FILE_IMMEDIATE: src = nir_src_for_ssa(c-imm_defs[index]); assert(!indirect); + assert(!dim); break; case TGSI_FILE_SYSTEM_VALUE: { @@ -330,6 +339,9 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index, nir_intrinsic_op op; unsigned ncomp = 1; + assert(!indirect); + assert(!dim); + switch (c-scan-system_value_semantic_name[index]) { case TGSI_SEMANTIC_VERTEXID_NOBASE: op = nir_intrinsic_load_vertex_id_zero_base; @@ -361,15 +373,24 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index, case TGSI_FILE_CONSTANT: { nir_intrinsic_instr *load; nir_intrinsic_op op; + unsigned srcn = 0; switch (file) { case TGSI_FILE_INPUT: op = indirect ? nir_intrinsic_load_input_indirect : nir_intrinsic_load_input; + assert(!dim); break; case TGSI_FILE_CONSTANT: - op = indirect ? nir_intrinsic_load_uniform_indirect : - nir_intrinsic_load_uniform; + if (dim) { +op = indirect ? nir_intrinsic_load_ubo_indirect : +nir_intrinsic_load_ubo; +/* convert index from vec4 to byte: */ +index *= 16; + } else { +op = indirect ? nir_intrinsic_load_uniform_indirect : +nir_intrinsic_load_uniform; + } break; default: unreachable(No other load files supported); @@ -381,8 +402,28 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index, load-num_components = 4; load-const_index[0] = index; load-const_index[1] = 1; + if (dim) { + if (dimind) { +load-src[srcn] = + ttn_src_for_file_and_index(c, dimind-File, dimind-Index, + NULL, NULL, NULL); + } else { +/* UBOs start at index 1 in TGSI: */ +load-src[srcn] = + nir_src_for_ssa(nir_imm_int(b, dim-Index - 1)); + } + srcn++; + } if (indirect) { - load-src[0] = ttn_src_for_indirect(c, indirect); + load-src[srcn] = ttn_src_for_indirect(c, indirect); + if (dim) { +assert(load-src[srcn].is_ssa); +/* we also need to covert vec4 to byte here too: */ +load-src[srcn] = + nir_src_for_ssa(nir_ishl(b, load-src[srcn].ssa, +nir_imm_int(b, 4))); + } + srcn++; } nir_ssa_dest_init(load-instr, load-dest, 4, NULL); nir_instr_insert_after_cf_list(b-cf_node_list, load-instr); @@ -409,7 +450,8 @@ ttn_src_for_indirect(struct ttn_compile *c, struct tgsi_ind_register *indirect) src.swizzle[i] = indirect-Swizzle; src.src = ttn_src_for_file_and_index(c, indirect-File, -indirect-Index, NULL); +indirect-Index, +NULL, NULL, NULL); return
Re: [Mesa-dev] [PATCH] i965: Rewrite ir_tex to ir_txl with lod 0 for vertex shaders
On 04/15/2015 12:41 PM, Kristian Høgsberg wrote: The ir_tex opcode turns into a sample or sample_c message, which will try to compute derivatives to determine the lod. This produces garbage for non-fragment shaders where the sample coordinates don't correspond to subspans. We fix this by rewriting the opcode from ir_tex to ir_txl and setting the lod to 0. This seems logical, but... why the heck hasn't this been a problem before? Is this some weirdness of just BSW? Should we perhaps only do this on BSW? https://bugs.freedesktop.org/show_bug.cgi?id=89457 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89457 That allows the bin/bugzilla_mesa.sh script to find it for generating release notes. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 0049b2d..4e99366 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1839,6 +1839,15 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst, offset_value.file != BAD_FILE offset_value.file != IMM; bool coordinate_done = false; + /* The sampler can only meaningfully compute LOD for fragment shader +* messages. For all other stages, we change the opcode to ir_txl and +* hardcode the LOD to 0. +*/ + if (stage != MESA_SHADER_FRAGMENT op == ir_tex) { + op = ir_txl; + lod = fs_reg(0.0f); + } + /* Set up the LOD info */ switch (op) { case ir_tex: ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90032] build fails, undefined reference to llvm:RTDyldMemoryManager
https://bugs.freedesktop.org/show_bug.cgi?id=90032 --- Comment #11 from Felix von Leitner felix-freedesk...@fefe.de --- Good news! It works now that I recompiled llvm with RTTI support! I thought I had RTTI in my LLVM but I was doing it wrong. For the record: the right way to build LLVM with RTTI support is: cmake [path-to-llvm-sources] -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DLLVM_LIBDIR_SUFFIX=64 -DBUILD_SHARED_LIBS=ON -DLLVM_ENABLE_RTTI=1 (the -DLLVM_ENABLE_RTTI=1 part is the important part) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] prog; add weak _mesa_error_no_memory() symbol and add it to libglsl_util
On Wed, Apr 15, 2015 at 6:15 PM, Ian Romanick i...@freedesktop.org wrote: On 04/15/2015 10:56 AM, Emil Velikov wrote: On 15 April 2015 at 18:33, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 7:08 AM, Emil Velikov emil.l.veli...@gmail.com wrote: Rather than forcing everyone to provide their own definition of the symbol provide a common weak one, which anyone can override if needed. This resolved the build of the standalone pipe-drivers, as it provides a default symbol which was missing previously. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/Makefile.am| 3 ++- src/glsl/SConscript| 2 ++ src/mesa/Android.libmesa_glsl_utils.mk | 6 -- src/mesa/program/weak_errors.c | 30 ++ Is program/ really the right place for this? (maybe main/ instead?) Hmm main/ might be better indeed. If there are any recommendations about the name I'll gladly take it :) So... who is going to use this other than src/glsl/main.cpp and src/glsl/tests? It seems like you could just put the (non-weak) symbol in a common file that only those things link. Is the goal just code sharing, or is the goal partitioning src/glsl from src/mesa? The former probably isn't worth the effort, and the latter should have a more systematic approach... and I could get behind that. fwiw, the goal is sharing libnir (both vc4 and freedreno/ir3 use it, and I expect others might like to at some point).. but that currently brings along some glsl dependencies, which bring along the _mesa_error_no_memory() dependency.. BR, -R Jose/Brian From a quick look I cannot find a suitable alternative for __attribute__((weak)) for MSVC. Would you know any of the top of your head or should I just wrap the attribute in ifndef _MSC_VER and leave the rest ? If I drop the weak_errors.c hunk from the SCons here, and replace tests/common.c with it things should just work. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] prog; add weak _mesa_error_no_memory() symbol and add it to libglsl_util
On 04/15/2015 10:56 AM, Emil Velikov wrote: On 15 April 2015 at 18:33, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 7:08 AM, Emil Velikov emil.l.veli...@gmail.com wrote: Rather than forcing everyone to provide their own definition of the symbol provide a common weak one, which anyone can override if needed. This resolved the build of the standalone pipe-drivers, as it provides a default symbol which was missing previously. Cc: Rob Clark robcl...@freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/Makefile.am| 3 ++- src/glsl/SConscript| 2 ++ src/mesa/Android.libmesa_glsl_utils.mk | 6 -- src/mesa/program/weak_errors.c | 30 ++ Is program/ really the right place for this? (maybe main/ instead?) Hmm main/ might be better indeed. If there are any recommendations about the name I'll gladly take it :) So... who is going to use this other than src/glsl/main.cpp and src/glsl/tests? It seems like you could just put the (non-weak) symbol in a common file that only those things link. Is the goal just code sharing, or is the goal partitioning src/glsl from src/mesa? The former probably isn't worth the effort, and the latter should have a more systematic approach... and I could get behind that. Jose/Brian From a quick look I cannot find a suitable alternative for __attribute__((weak)) for MSVC. Would you know any of the top of your head or should I just wrap the attribute in ifndef _MSC_VER and leave the rest ? If I drop the weak_errors.c hunk from the SCons here, and replace tests/common.c with it things should just work. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/3] util: Use set_foreach instead of rolling our own
On Wed, Apr 15, 2015 at 2:14 PM, Thomas Helland thomashellan...@gmail.com wrote: This follows the same pattern as in the hash_table. Reviewed-by: Jason Ekstrand jason.ekstrand at intel.com Fix the email address before committing, whoever commits. Signed-off-by: Thomas Helland thomashellan...@gmail.com --- ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Rewrite ir_tex to ir_txl with lod 0 for vertex shaders
On Wednesday, April 15, 2015 02:47:01 PM Ian Romanick wrote: On 04/15/2015 12:41 PM, Kristian Høgsberg wrote: The ir_tex opcode turns into a sample or sample_c message, which will try to compute derivatives to determine the lod. This produces garbage for non-fragment shaders where the sample coordinates don't correspond to subspans. We fix this by rewriting the opcode from ir_tex to ir_txl and setting the lod to 0. This seems logical, but... why the heck hasn't this been a problem before? Is this some weirdness of just BSW? Should we perhaps only do this on BSW? In the vec4 backend, we've always made ir_tex use the SAMPLE_LOD message with lod set to 0.0 - so Gen4-7.5 have always done this workaround. When Kristian added SIMD8 VS support (Oct 2014, Gen8+), we started using SAMPLE (by virtue of not changing the FS behavior). We honestly have no idea why it appears to work on Broadwell - it sure seems like it ought to be broken there. Maybe we've just been lucky. There's also a message header bit for Force LOD to Zero, which the docs say is mandatory if you're going to use the SAMPLE message in SIMD4x2 stages (i.e. not fragment). We've wondered whether headerless messages in non-fragment stages magically get that right. That could make the issue less noticable... Using SAMPLE_LOD is easier than setting the message header bit, and seems just as effective. I think we should do it on Broadwell too. Thanks for fixing this, Kristian! Cc: 10.5 mesa-sta...@lists.freedesktop.org Reviewed-by: Kenneth Graunke kenn...@whitecape.org https://bugs.freedesktop.org/show_bug.cgi?id=89457 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89457 That allows the bin/bugzilla_mesa.sh script to find it for generating release notes. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 0049b2d..4e99366 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1839,6 +1839,15 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst, offset_value.file != BAD_FILE offset_value.file != IMM; bool coordinate_done = false; + /* The sampler can only meaningfully compute LOD for fragment shader +* messages. For all other stages, we change the opcode to ir_txl and +* hardcode the LOD to 0. +*/ + if (stage != MESA_SHADER_FRAGMENT op == ir_tex) { + op = ir_txl; + lod = fs_reg(0.0f); + } + /* Set up the LOD info */ switch (op) { case ir_tex: ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 9/9] r300: do not link against libdrm_intel
Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Wed, Apr 15, 2015 at 4:08 PM, Emil Velikov emil.l.veli...@gmail.com wrote: Accidentally added since the introduction of the file. Cc: 10.4 10.5 mesa-sta...@lists.freedesktop.org Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/drivers/r300/Automake.inc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/r300/Automake.inc b/src/gallium/drivers/r300/Automake.inc index 9334973..d4ddc40 100644 --- a/src/gallium/drivers/r300/Automake.inc +++ b/src/gallium/drivers/r300/Automake.inc @@ -5,7 +5,7 @@ TARGET_CPPFLAGS += -DGALLIUM_R300 TARGET_LIB_DEPS += \ $(top_builddir)/src/gallium/drivers/r300/libr300.la \ $(RADEON_LIBS) \ - $(INTEL_LIBS) + $(LIBDRM_LIBS) TARGET_RADEON_WINSYS = \ $(top_builddir)/src/gallium/winsys/radeon/drm/libradeonwinsys.la -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir/builder: add nir_builder_insert_after_instr()
From: Rob Clark robcl...@freedesktop.org For lowering if/else, I need a way to insert at the end of the previous block. Signed-off-by: Rob Clark robcl...@freedesktop.org --- src/glsl/nir/nir_builder.h | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_builder.h b/src/glsl/nir/nir_builder.h index 587d014..d1419ee 100644 --- a/src/glsl/nir/nir_builder.h +++ b/src/glsl/nir/nir_builder.h @@ -29,6 +29,7 @@ struct exec_list; typedef struct nir_builder { struct exec_list *cf_node_list; nir_instr *before_instr; + nir_instr *after_instr; nir_shader *shader; nir_function_impl *impl; @@ -47,12 +48,24 @@ nir_builder_insert_after_cf_list(nir_builder *build, struct exec_list *cf_node_list) { build-cf_node_list = cf_node_list; + build-before_instr = NULL; + build-after_instr = NULL; } static inline void nir_builder_insert_before_instr(nir_builder *build, nir_instr *before_instr) { + build-cf_node_list = NULL; build-before_instr = before_instr; + build-after_instr = NULL; +} + +static inline void +nir_builder_insert_after_instr(nir_builder *build, nir_instr *after_instr) +{ + build-cf_node_list = NULL; + build-before_instr = NULL; + build-after_instr = after_instr; } static inline void @@ -60,9 +73,12 @@ nir_builder_instr_insert(nir_builder *build, nir_instr *instr) { if (build-cf_node_list) { nir_instr_insert_after_cf_list(build-cf_node_list, instr); - } else { - assert(build-before_instr); + } else if (build-before_instr) { nir_instr_insert_before(build-before_instr, instr); + } else { + assert(build-after_instr); + nir_instr_insert_after(build-after_instr, instr); + build-after_instr = instr; } } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Rewrite ir_tex to ir_txl with lod 0 for vertex shaders
On Wed, Apr 15, 2015 at 4:28 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Wednesday, April 15, 2015 02:47:01 PM Ian Romanick wrote: On 04/15/2015 12:41 PM, Kristian Høgsberg wrote: The ir_tex opcode turns into a sample or sample_c message, which will try to compute derivatives to determine the lod. This produces garbage for non-fragment shaders where the sample coordinates don't correspond to subspans. We fix this by rewriting the opcode from ir_tex to ir_txl and setting the lod to 0. This seems logical, but... why the heck hasn't this been a problem before? Is this some weirdness of just BSW? Should we perhaps only do this on BSW? In the vec4 backend, we've always made ir_tex use the SAMPLE_LOD message with lod set to 0.0 - so Gen4-7.5 have always done this workaround. When Kristian added SIMD8 VS support (Oct 2014, Gen8+), we started using SAMPLE (by virtue of not changing the FS behavior). We honestly have no idea why it appears to work on Broadwell - it sure seems like it ought to be broken there. Maybe we've just been lucky. There's also a message header bit for Force LOD to Zero, which the docs say is mandatory if you're going to use the SAMPLE message in SIMD4x2 stages (i.e. not fragment). We've wondered whether headerless messages in non-fragment stages magically get that right. That could make the issue less noticable... More speculation: since SKL adds a new message type (sample_c_lz) for sample_c with lod=0, it could be that some of that got pulled into BSW. Maybe pre-SKL, the thread dispatcher set the header bit in the thread payload for VS so that it carries over in message header for sample and sample_c and gets the right behavior. If BSW removes that special case in favor of a dedicated opcode, it would explain why it breaks on BSW but not BDW. The docs doesn't mention sample_c_lz for BSW, so who knows... Using SAMPLE_LOD is easier than setting the message header bit, and seems just as effective. I think we should do it on Broadwell too. Thanks for fixing this, Kristian! Thanks for the help. Kristian Cc: 10.5 mesa-sta...@lists.freedesktop.org Reviewed-by: Kenneth Graunke kenn...@whitecape.org https://bugs.freedesktop.org/show_bug.cgi?id=89457 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89457 That allows the bin/bugzilla_mesa.sh script to find it for generating release notes. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 0049b2d..4e99366 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1839,6 +1839,15 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst, offset_value.file != BAD_FILE offset_value.file != IMM; bool coordinate_done = false; + /* The sampler can only meaningfully compute LOD for fragment shader +* messages. For all other stages, we change the opcode to ir_txl and +* hardcode the LOD to 0. +*/ + if (stage != MESA_SHADER_FRAGMENT op == ir_tex) { + op = ir_txl; + lod = fs_reg(0.0f); + } + /* Set up the LOD info */ switch (op) { case ir_tex: ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Wed, Apr 15, 2015 at 1:31 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. Sure. I can drop saturate and just assert that it's not set. We do want to keep force_sechalf and force_writemask_all though. I don't think it buys us anything. If we just run copy propagation after lower_load_payload() we'll get the code we want. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f8e26c0..bc45a38 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload() { bool progress = false; - int vgrf_to_reg[alloc.count]; - int reg_count = 0; - for (unsigned i = 0; i alloc.count; ++i) { - vgrf_to_reg[i] = reg_count; - reg_count += alloc.sizes[i]; - } - - struct { - bool written:1; /* Whether this register has ever been written */ - bool force_writemask_all:1; - bool force_sechalf:1; - } metadata[reg_count]; - memset(metadata, 0, sizeof(metadata)); - foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { - if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; - bool force_sechalf = inst-force_sechalf - !inst-force_writemask_all; - bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 - !inst-force_writemask_all; - for (int i = 0; i inst-regs_written; ++i) { -metadata[dst_reg + i].written = true; -metadata[dst_reg + i].force_sechalf = force_sechalf; -metadata[dst_reg + i].force_writemask_all = inst-force_writemask_all; -force_sechalf = (toggle_sechalf != force_sechalf); - } - } - if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) { We should invert this condition and 'continue' so that we can avoid an extra level of indentation which only makes the code harder to read. Sure. I can do that. assert(inst-dst.file == MRF || inst-dst.file == GRF); + fs_reg dst = inst-dst; - for (int i = 0; i inst-sources; i++) { -dst.width = inst-src[i].effective_width; -dst.type = inst-src[i].type; - -if (inst-src[i].file == BAD_FILE) { - /* Do nothing but otherwise increment as normal */ -} else if (dst.file == MRF - dst.width == 8 - brw-has_compr4 - i + 4 inst-sources - inst-src[i + 4].equals(horiz_offset(inst-src[i], 8))) { - fs_reg compr4_dst = dst; - compr4_dst.reg += BRW_MRF_COMPR4; - compr4_dst.width = 16; - fs_reg compr4_src = inst-src[i]; - compr4_src.width = 16; - fs_inst *mov = MOV(compr4_dst, compr4_src); + /* Get rid of COMPR4. We'll add it back in if we need it */ + if (dst.file == MRF dst.reg BRW_MRF_COMPR4) +dst.reg = dst.reg ~BRW_MRF_COMPR4; No point in checking whether the BRW_MRF_COMPR4 bit is set before clearing it. Just clear it. Sure. + + dst.width = 8; + for (uint8_t i = 0; i inst-header_size; i++) { +if (inst-src[i].file != BAD_FILE) { + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD); + fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD); + mov_src.width = 8; + fs_inst *mov = MOV(mov_dst, mov_src); mov-force_writemask_all = true;
Re: [Mesa-dev] [PATCH] nir: Convert the if-test for num_inputs == 2 to an assertion
Reviewed-by: Connor Abbott cwabo...@gmail.com On Wed, Apr 15, 2015 at 9:18 PM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com Suggested by Jason on a different patch after some comments / questions by Ilia. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: Jason Ekstrand jason.ekstr...@intel.com --- src/glsl/nir/nir_search.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index 73a802b..5ba0160 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -218,8 +218,8 @@ match_expression(const nir_search_expression *expr, nir_alu_instr *instr, if (matched) return true; - if (nir_op_infos[instr-op].num_inputs == 2 - (nir_op_infos[instr-op].algebraic_properties NIR_OP_IS_COMMUTATIVE)) { + if (nir_op_infos[instr-op].algebraic_properties NIR_OP_IS_COMMUTATIVE) { + assert(nir_op_infos[instr-op].num_inputs == 2); if (!match_value(expr-srcs[0], instr, 1, num_components, swizzle, state)) return false; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Convert the if-test for num_inputs == 2 to an assertion
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com On Wed, Apr 15, 2015 at 6:18 PM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com Suggested by Jason on a different patch after some comments / questions by Ilia. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: Jason Ekstrand jason.ekstr...@intel.com --- src/glsl/nir/nir_search.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index 73a802b..5ba0160 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -218,8 +218,8 @@ match_expression(const nir_search_expression *expr, nir_alu_instr *instr, if (matched) return true; - if (nir_op_infos[instr-op].num_inputs == 2 - (nir_op_infos[instr-op].algebraic_properties NIR_OP_IS_COMMUTATIVE)) { + if (nir_op_infos[instr-op].algebraic_properties NIR_OP_IS_COMMUTATIVE) { + assert(nir_op_infos[instr-op].num_inputs == 2); if (!match_value(expr-srcs[0], instr, 1, num_components, swizzle, state)) return false; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/ttn: add UBO support
Rob Clark robdcl...@gmail.com writes: From: Rob Clark robcl...@freedesktop.org v2: move ishl into ttn (instead of driver backend) to keep the units consistent between immediate and indirect offsets This looks good to me. Reviewed-by: Eric Anholt e...@anholt.net signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir: Convert the if-test for num_inputs == 2 to an assertion
From: Ian Romanick ian.d.roman...@intel.com Suggested by Jason on a different patch after some comments / questions by Ilia. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: Jason Ekstrand jason.ekstr...@intel.com --- src/glsl/nir/nir_search.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index 73a802b..5ba0160 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -218,8 +218,8 @@ match_expression(const nir_search_expression *expr, nir_alu_instr *instr, if (matched) return true; - if (nir_op_infos[instr-op].num_inputs == 2 - (nir_op_infos[instr-op].algebraic_properties NIR_OP_IS_COMMUTATIVE)) { + if (nir_op_infos[instr-op].algebraic_properties NIR_OP_IS_COMMUTATIVE) { + assert(nir_op_infos[instr-op].num_inputs == 2); if (!match_value(expr-srcs[0], instr, 1, num_components, swizzle, state)) return false; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Wed, Apr 15, 2015 at 5:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Apr 15, 2015 at 1:31 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. Sure. I can drop saturate and just assert that it's not set. We do want to keep force_sechalf and force_writemask_all though. I didn't think about those before, but I don't know how a load_payload could have force_writemask_all set. Have I missed something? I see that setup_color_payload sets force_sechalf for dual-source fbwrites -- that's the only case we're going to have force_sechalf set, right? That is, the Gen 6 case is going to be handled by passing 16-channel sources to load_payload and letting it do compr4? I don't think it buys us anything. If we just run copy propagation after lower_load_payload() we'll get the code we want. [snip] +/* The COMPR4 code took care of the first 4 sources. We'll let + * the regular path handle any remaining sources. Yes, we are + * modifying the instruction but we're about to delete it so + * this really doesn't hurt anything. + */ +inst-header_size += 4; I mean, while the comment is a true statement, why is doing this any better than just... + } + + for (uint8_t i = inst-header_size; i inst-sources; i++) { ... changing this to inst-header_size + 4? Because the inst-header_size += 4 is predicated on it being a COMPR4 destination while the code below handles both the remaining sources (in the COMPR4 case) and the regular non-COMPR4 case. Ahh, right. It'd be fewer lines (no commenting necessary) to just have a 'start' variable that you set to header_size at the top and +=4 here. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Rewrite ir_tex to ir_txl with lod 0 for vertex shaders
On 04/15/2015 04:28 PM, Kenneth Graunke wrote: On Wednesday, April 15, 2015 02:47:01 PM Ian Romanick wrote: On 04/15/2015 12:41 PM, Kristian Høgsberg wrote: The ir_tex opcode turns into a sample or sample_c message, which will try to compute derivatives to determine the lod. This produces garbage for non-fragment shaders where the sample coordinates don't correspond to subspans. We fix this by rewriting the opcode from ir_tex to ir_txl and setting the lod to 0. This seems logical, but... why the heck hasn't this been a problem before? Is this some weirdness of just BSW? Should we perhaps only do this on BSW? In the vec4 backend, we've always made ir_tex use the SAMPLE_LOD message with lod set to 0.0 - so Gen4-7.5 have always done this workaround. When Kristian added SIMD8 VS support (Oct 2014, Gen8+), we started using SAMPLE (by virtue of not changing the FS behavior). We honestly have no Yes, that's the part the slipped my mind. In that case, FWIW, Reviewed-by: Ian Romanick ian.d.roman...@intel.com idea why it appears to work on Broadwell - it sure seems like it ought to be broken there. Maybe we've just been lucky. There's also a message header bit for Force LOD to Zero, which the docs say is mandatory if you're going to use the SAMPLE message in SIMD4x2 stages (i.e. not fragment). We've wondered whether headerless messages in non-fragment stages magically get that right. That could make the issue less noticable... Using SAMPLE_LOD is easier than setting the message header bit, and seems just as effective. I think we should do it on Broadwell too. Thanks for fixing this, Kristian! Cc: 10.5 mesa-sta...@lists.freedesktop.org Reviewed-by: Kenneth Graunke kenn...@whitecape.org https://bugs.freedesktop.org/show_bug.cgi?id=89457 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89457 That allows the bin/bugzilla_mesa.sh script to find it for generating release notes. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 0049b2d..4e99366 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1839,6 +1839,15 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst, offset_value.file != BAD_FILE offset_value.file != IMM; bool coordinate_done = false; + /* The sampler can only meaningfully compute LOD for fragment shader +* messages. For all other stages, we change the opcode to ir_txl and +* hardcode the LOD to 0. +*/ + if (stage != MESA_SHADER_FRAGMENT op == ir_tex) { + op = ir_txl; + lod = fs_reg(0.0f); + } + /* Set up the LOD info */ switch (op) { case ir_tex: ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/3] Hash-table and hash-set, V4
Jason Ekstrand ja...@jlekstrand.net writes: On Sat, Apr 11, 2015 at 4:25 PM, Thomas Helland thomashellan...@gmail.com wrote: The performance numbers (shader-db runtime) are: Difference at 95.0% confidence -14.7608 +/- 3.36786 -9.05064% +/- 2.06501% (Original runtime was 160 seconds) Good Work! I had one comment on the hash set patch. With that fixed, the series is Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Probably want to give Eric a a couple of days to look at it too. Yeah, I'm splitting the series up into each individual change (resizing cutoff, linear probing, power-of-two, then quadratic probing) so that we can make sure that they're all good for both the compiler and GL object lookup. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] util: Change util/set to use quadratic probing
The same rationale applies here as for the hash table. Power of two size should give better performance, and using the algorithm hash = sh + i/2 + i*i/2 should result in only distinct hash values when hitting collisions. Difference at 95.0% confidence -7.9505 +/- 2.44011 -5.04357% +/- 1.54794% V4: Feedback from Jason Ekstrand - Split cleanup of set_rehash into separate commit V3: Feedback from Eric Anholt - Don't change load factor and starting size. V2: Feedback from Connor Abbott - Don't set initial hash address before potential rehash - Remove hash_sizes table - Correct the quadratic hashing algorithm - Use correct comment style Feedback from Jason Ekstrand - Use unreachable() to detect if we fail to insert Reviewed-by: Jason Ekstrand jason.ekstrand at intel.com Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/util/set.c | 110 + src/util/set.h | 3 +- 2 files changed, 41 insertions(+), 72 deletions(-) diff --git a/src/util/set.c b/src/util/set.c index f01f869..2bcc743 100644 --- a/src/util/set.c +++ b/src/util/set.c @@ -32,6 +32,17 @@ *Keith Packard kei...@keithp.com */ +/** + * Implements an open-addressing, quadratic probing hash-set. + * + * We choose set sizes that's a power of two. + * This is computationally less expensive than primes. + * As a bonus the size and free space can be calculated instead of looked up. + * FNV-1a has good avalanche properties, so collision is not an issue. + * These sets are sized to have an extra 10% free to avoid + * exponential performance degradation as the set fills. + */ + #include stdlib.h #include assert.h @@ -39,51 +50,9 @@ #include ralloc.h #include set.h -/* - * From Knuth -- a good choice for hash/rehash values is p, p-2 where - * p and p-2 are both prime. These tables are sized to have an extra 10% - * free to avoid exponential performance degradation as the hash table fills - */ - uint32_t deleted_key_value; const void *deleted_key = deleted_key_value; -static const struct { - uint32_t max_entries, size, rehash; -} hash_sizes[] = { - { 2,5,3}, - { 4,7,5}, - { 8,13, 11 }, - { 16, 19, 17 }, - { 32, 43, 41 }, - { 64, 73, 71 }, - { 128, 151, 149 }, - { 256, 283, 281 }, - { 512, 571, 569 }, - { 1024, 1153, 1151 }, - { 2048, 2269, 2267 }, - { 4096, 4519, 4517 }, - { 8192, 9013, 9011 }, - { 16384,18043,18041}, - { 32768,36109,36107}, - { 65536,72091,72089}, - { 131072, 144409, 144407 }, - { 262144, 288361, 288359 }, - { 524288, 576883, 576881 }, - { 1048576, 1153459, 1153457 }, - { 2097152, 2307163, 2307161 }, - { 4194304, 4613893, 4613891 }, - { 8388608, 9227641, 9227639 }, - { 16777216, 18455029, 18455027 }, - { 33554432, 36911011, 36911009 }, - { 67108864, 73819861, 73819859 }, - { 134217728,147639589,147639587}, - { 268435456,295279081,295279079}, - { 536870912,590559793,590559791}, - { 1073741824, 1181116273, 1181116271 }, - { 2147483648ul, 2362232233ul, 2362232231ul } -}; - static int entry_is_free(struct set_entry *entry) { @@ -114,10 +83,9 @@ _mesa_set_create(void *mem_ctx, if (ht == NULL) return NULL; - ht-size_index = 0; - ht-size = hash_sizes[ht-size_index].size; - ht-rehash = hash_sizes[ht-size_index].rehash; - ht-max_entries = hash_sizes[ht-size_index].max_entries; + ht-size_iteration = 2; + ht-size = 1 ht-size_iteration; + ht-max_entries = ht-size * 0.9; ht-key_hash_function = key_hash_function; ht-key_equals_function = key_equals_function; ht-table = rzalloc_array(ht, struct set_entry, ht-size); @@ -163,12 +131,11 @@ _mesa_set_destroy(struct set *ht, void (*delete_function)(struct set_entry *entr static struct set_entry * set_search(const struct set *ht, uint32_t hash, const void *key) { - uint32_t hash_address; + uint32_t start_hash_address = hash (ht-size - 1); + uint32_t hash_address = start_hash_address; + uint32_t quad_hash = 1; - hash_address = hash % ht-size; do { - uint32_t double_hash; - struct set_entry *entry = ht-table + hash_address; if (entry_is_free(entry)) { @@ -179,10 +146,10 @@ set_search(const struct set *ht, uint32_t hash, const void *key) } } - double_hash = 1 + hash % ht-rehash; - -
[Mesa-dev] [PATCH 4/3] util: Use set_foreach instead of rolling our own
This follows the same pattern as in the hash_table. Reviewed-by: Jason Ekstrand jason.ekstrand at intel.com Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/util/set.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/src/util/set.c b/src/util/set.c index 2bcc743..7ff9520 100644 --- a/src/util/set.c +++ b/src/util/set.c @@ -196,12 +196,8 @@ set_rehash(struct set *ht, uint32_t new_size_iteration) ht-entries = 0; ht-deleted_entries = 0; - for (entry = old_ht.table; -entry != old_ht.table + old_ht.size; -entry++) { - if (entry_is_present(entry)) { - set_add(ht, entry-hash, entry-key); - } + set_foreach(old_ht, entry) { + set_add(ht, entry-hash, entry-key); } ralloc_free(old_ht.table); -- 2.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] llvmpipe support for ARM?
That's good to know. I've always thought it's a bit of a pity that it isn't optimized at all for arm (doesn't use any arm intrinsics, though with newer llvm versions it is probably not all that important any more, and I'm not even sure which neon intrinsics would be useful) but someone would need to do the work. But that it's running is a good start :-). Roland Am 15.04.2015 um 17:45 schrieb Jan Vesely: Hi, I can run arm llvmpipe glxgears on IFC6410 (Snapdragon 600 APQ8064, Krait 300) without problems. llvm 3.5 mesa 10.4.3 both are fedora 21 standard packages without modifications. I haven't tried anything more complex than glxgears jan On Wed, 2015-04-15 at 16:30 +0200, Roland Scheidegger wrote: There were reports with varying success of running llvmpipe with arm in the past but I haven't heard anything for a while. In theory it should work (and crashing in the pixel shader is an indication the vertex shader worked or at least didn't crash). But it's not really tested and hence might easily break. My suggestion would be to try a newer Mesa version since 9.2.5 is very old, but I really don't know if it will help. If that doesn't help, you could try debugging it, you can get the crashing instruction in gdb with x/i address and see if that was an unaligned access which should have been aligned or something else. GALLIVM_DEBUG=ir,asm will get you the IR and compiled code (with debug builds). Roland Am 15.04.2015 um 14:37 schrieb Marko Moberg: Hi, I have been trying to get OpenGL sw rastering to work on Linux based ARM CortexA15 device but for some reason gallium llvmpipe gives me a segmentation fault. Has anybody managed to run sw rastering using gallium llvmpipe on ARM? Softpipe seems to be working ok. I am using the following versions: - Mesa 9.2.5 - LLVM3.3 - Wayland/Weston 1.5 The issue is related to opengl/gles2 rendering i.e. weston-simple-egl demo and Qt openGL demos give seg faults when executed with llvmpipe enabled. Backtrace from weston-simple-egl looks something like this: #0 0x6d7de620 in ?? () #1 0xb354004c in fs17_variant0_partial () #2 0xb6841740 in lp_rast_shade_quads_mask (task=task@entry=0x1e360, inputs=inputs@entry=0x47a00, x=128, y=80, mask=4369) at lp_rast.c:466 #3 0xb6842d04 in do_block_4_1 (c=optimized out, y=optimized out, x=optimized out, plane=optimized out, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:61 #4 do_block_16_1 (c=synthetic pointer, y=optimized out, x=optimized out, plane=0xb4f4ccf0, tri=optimized out, task=optimized out) at lp_rast_tri_tmp.h:130 #5 lp_rast_triangle_1 (task=0x1e360, arg=...) at lp_rast_tri_tmp.h:232 #6 0xb6840420 in do_rasterize_bin (bin=optimized out, task=0x1e360, x=optimized out, y=optimized out) at lp_rast.c:607 #7 rasterize_bin (y=optimized out, x=optimized out, bin=optimized out, task=0x1e360) at lp_rast.c:626 #8 rasterize_scene (task=task@entry=0x1e360, scene=0xb3646008) at lp_rast.c:675 #9 0xb6840cb0 in thread_function (init_data=0x1e360) at lp_rast.c:788 #10 0xb6d98ed2 in start_thread () from /lib/libpthread.so.0 #11 0xb6c91058 in ?? () from /lib/libc.so.6 #12 0xb6c91058 in ?? () from /lib/libc.so.6 regards, Marko marko.s.mob...@gmail.com mailto:marko.s.mob...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AwIGaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=5qZ4KpVhu_wXA8-dTn8xYiul5G713jYwztPr8kfcUZMs=wBNyDvf5DiNBjxXXl7QvyHOvNuTVjZAaw0n7WaQM-bAe= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AwIFaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=nAdzrx0oFETmuwc9kjHUEBdG2y7GhDKXpaAS5jLedxAs=ldDTgR0M9Ltu-Kyh-nH37uDBxApVBRuxXB086icVVMce= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/radeon: don't crash when getting out-of-bounds TEMP references
On 12.04.2015 04:11, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com --- src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 1690194..333f7ae 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -191,6 +191,8 @@ emit_fetch( break; case TGSI_FILE_TEMPORARY: + if (reg-Register.Index = ctx-temps_count) + return LLVMGetUndef(tgsi2llvmtype(bld_base, type)); if (uses_temp_indirect_addressing(bld_base)) { ptr = lp_get_temp_ptr_soa(bld, reg-Register.Index, swizzle); break; @@ -395,6 +397,8 @@ emit_store( break; case TGSI_FILE_TEMPORARY: + if (range.First + i = ctx-temps_count) + continue; if (uses_temp_indirect_addressing(bld_base)) temp_ptr = lp_get_temp_ptr_soa(bld, i + range.First, chan_index); else @@ -416,6 +420,8 @@ emit_store( break; case TGSI_FILE_TEMPORARY: + if (reg-Register.Index = ctx-temps_count) + continue; if (uses_temp_indirect_addressing(bld_base)) { temp_ptr = NULL; break; Looks good to me, but Tom should probably also take a look. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Mesa (10.5): radeonsi: Cache LLVMTargetMachineRef in context instead of in screen
On 09.04.2015 20:27, eveli...@kemper.freedesktop.org (Emil Velikov) wrote: @@ -2740,7 +2741,45 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen, return r; } -int si_shader_create(struct si_screen *sscreen, struct si_shader *shader) +static void si_dump_key(unsigned shader, union si_shader_key *key) +{ + int i; + + fprintf(stderr, SHADER KEY\n); + + switch (shader) { + case PIPE_SHADER_VERTEX: + fprintf(stderr, instance_divisors = {); + for (i = 0; i Elements(key-vs.instance_divisors); i++) + fprintf(stderr, !i ? %u : , %u, + key-vs.instance_divisors[i]); + fprintf(stderr, }\n); + + if (key-vs.as_es) + fprintf(stderr, gs_used_inputs = 0x%PRIx64\n, + key-vs.gs_used_inputs); + fprintf(stderr, as_es = %u\n, key-vs.as_es); + break; + + case PIPE_SHADER_GEOMETRY: + break; + + case PIPE_SHADER_FRAGMENT: + fprintf(stderr, export_16bpc = 0x%X\n, key-ps.export_16bpc); + fprintf(stderr, last_cbuf = %u\n, key-ps.last_cbuf); + fprintf(stderr, color_two_side = %u\n, key-ps.color_two_side); + fprintf(stderr, alpha_func = %u\n, key-ps.alpha_func); + fprintf(stderr, alpha_to_one = %u\n, key-ps.alpha_to_one); + fprintf(stderr, poly_stipple = %u\n, key-ps.poly_stipple); + break; + + default: + assert(0); + } +} The si_dump_key function shouldn't have been added. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] i965: Delete some unnecessary code in brw_report_shader_time().
It is true that a gl_shader_program with ID 0 will be a fixed-function fragment program; a gl_program with ID 0 but NULL gl_shader_program means that it's a fixed-function vertex shader. But that's not terribly interesting or relevant to what we're doing. We just need to know that ID 0 means fixed function. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_program.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index 0ca63de..7ea08e6 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -447,14 +447,9 @@ brw_report_shader_time(struct brw_context *brw) if (prog) { shader_num = prog-Name; - /* The fixed function fragment shader generates GLSL IR with a Name - * of 0, and nothing else does. - */ if (prog-Label) { shader_name = prog-Label; - } else if (shader_num == 0 - (brw-shader_time.types[i] == ST_FS8 || - brw-shader_time.types[i] == ST_FS16)) { + } else if (shader_num == 0) { shader_name = ff; } else { shader_name = glsl; -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965: Make shader_time use 0 instead of -1 for no meaningful ID.
0 is not a valid GLSL shader or ARB program ID. For some reason, shader_time used -1 instead...so we had code to detect 0, then override it to -1. We can just delete that. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_program.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index 9e27c2a..0ca63de 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -346,7 +346,7 @@ print_shader_time_line(const char *stage, const char *name, { fprintf(stderr, %-6s%-18s, stage, name); - if (shader_num != -1) + if (shader_num != 0) fprintf(stderr, %4d: , shader_num); else fprintf(stderr, : ); @@ -443,7 +443,7 @@ brw_report_shader_time(struct brw_context *brw) if (scaled[i] == 0) continue; - int shader_num = -1; + int shader_num = 0; if (prog) { shader_num = prog-Name; @@ -456,7 +456,6 @@ brw_report_shader_time(struct brw_context *brw) (brw-shader_time.types[i] == ST_FS8 || brw-shader_time.types[i] == ST_FS16)) { shader_name = ff; -shader_num = -1; } else { shader_name = glsl; } @@ -464,7 +463,6 @@ brw_report_shader_time(struct brw_context *brw) shader_num = brw-shader_time.programs[i]-Id; if (shader_num == 0) { shader_name = ff; -shader_num = -1; } else { shader_name = prog; } @@ -495,10 +493,10 @@ brw_report_shader_time(struct brw_context *brw) } fprintf(stderr, \n); - print_shader_time_line(total, vs, -1, total_by_type[ST_VS], total); - print_shader_time_line(total, gs, -1, total_by_type[ST_GS], total); - print_shader_time_line(total, fs8, -1, total_by_type[ST_FS8], total); - print_shader_time_line(total, fs16, -1, total_by_type[ST_FS16], total); + print_shader_time_line(total, vs, 0, total_by_type[ST_VS], total); + print_shader_time_line(total, gs, 0, total_by_type[ST_GS], total); + print_shader_time_line(total, fs8, 0, total_by_type[ST_FS8], total); + print_shader_time_line(total, fs16, 0, total_by_type[ST_FS16], total); } static void -- 2.3.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: remove bogus r600-- triple
On 09.04.2015 03:27, Emil Velikov wrote: As mentioned by Michel Dänzer for LLVM = 3.6 we create the LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi context. For older LLVM or hardware (r600) the triple is always r600-- and is created at a later stage - radeon_llvm_compile() Cc: Michel Dänzer michel.daen...@amd.com Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gallium/drivers/radeonsi/si_pipe.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index e761d20..5ea8868 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -85,8 +85,6 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, void * LLVMTargetRef r600_target; #if HAVE_LLVM = 0x0306 const char *triple = amdgcn--; -#else - const char *triple = r600--; #endif int shader, i; r600_target should probably also be moved inside the #if HAVE_LLVM = 0x0306 as it's only used in that case. But that can be done in a separate patch. Either way, this patch is Reviewed-by: Michel Dänzer michel.daen...@amd.com -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev