Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs
On Thu, 06 Nov 2014 13:01:03 -0800 Ian Romanick i...@freedesktop.org wrote:

I thought Eric and Chad already NAKed it in bugzilla. The problem is that applications ask for an RGBA visual for GL blending. They use the alpha channel to generate their images, but the final alpha values are, basically, random... and the composited result would be pure garbage.

Reading https://bugs.freedesktop.org/show_bug.cgi?id=67676#c5:

We should certainly be exposing EGLConfigs that match up to the rgba visual, though, so you can find it when you try. - Eric

To me that sounds like Eric would accept having the visuals there in additional configs (as long as they are sorted after the otherwise equivalent xRGB configs?). Eric, would you like to confirm your current opinion?

As Chad points out in comment #1, EGL just doesn't let applications do the thing the patch is trying to do.

True in general, but not exactly for X11. Surely the native visual is exposed in EGL configs for some reason? (Ok, yeah, not a good argument, EGL considered.)

For the record, if a Wayland compositor uses the opaque region hint in the Wayland core protocol, we do not see this problem on Wayland at all. In Wayland, the implementations so far always assume that, if the alpha channel exists in the buffer, it will be used for window blending, except on regions marked as opaque. So we kind of already have a way to make it work in Wayland, and only X11 is broken here.

Still, I won't even try to deny that an extension to EGL_TRANSPARENT_TYPE would be hugely useful and the superior solution over all the hacking. I do understand you would rather see that developed than a native-visual-based solution. More below...

On 11/06/2014 05:12 AM, Emil Velikov wrote:

Humble ping x2

On 14/10/14 15:25, Emil Velikov wrote:

Humble ping.
On 23/09/14 01:25, Emil Velikov wrote:

From: Sjoerd Simons sjoerd.sim...@collabora.co.uk

When using RGBA EGLConfigs, allow both RGB and RGBA X visuals, such that applications can decide whether they want to use RGBA (and have the compositor blend their windows). On my system with this change, EGLConfigs with a 24 bit visual come up first, so applications blindly picking the first EGLConfig will still get an RGB X visual.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
---
Hello gents,

This patch has been stuck in bugzilla since February this year. Bringing it around here to gather greater exposure and perhaps some comments/reviews.

-Emil

 src/egl/drivers/dri2/egl_dri2.c     |  5 +
 src/egl/drivers/dri2/platform_x11.c | 17 +
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 20a7243..2ed90a7 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
 static EGLBoolean
 dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
 {
+
+   if (criteria->NativeVisualID != EGL_DONT_CARE &&
+       conf->NativeVisualID != criteria->NativeVisualID)
+      return EGL_FALSE;

Does this directly affect the behaviour of eglChooseConfig? The EGL 1.4 spec quite clearly says that NATIVE_VISUAL_ID is ignored if specified as a matching criterion to eglChooseConfig, which is why apps are expected to match it manually rather than via eglChooseConfig. Doesn't that mean that apps that want alpha blending of the window already check the native visual? Maybe not, if it worked by accident...

Now, this magical alpha-blending visual thing used to work somehow in the past for a long time, didn't it? Until someone noticed a performance problem (fd.o #59783), not even a correctness problem. Or are there other bugs about correctness too?
The patch that caused bug #59783 to be reported was an intel-driver-specific commit, and then #59783 got fixed by a hardware-agnostic change elsewhere, causing some existing apps to produce unwanted results rather than just degraded performance, leading to fd.o #67676.

To avoid breaking most existing apps that need alpha for rendering but do not intend the alpha channel to go to window-system blending, we only need to make sure the configs with an ARGB visual get sorted after the equivalent xRGB visual config? Reading the EGL 1.4 spec, that seems perfectly valid. I don't know if this patch guarantees the config ordering, though.

Thanks,
pq

+
    if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
       return EGL_FALSE;

diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c
index a7a7338..3395fb7 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
             dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
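Since eglChooseConfig ignores EGL_NATIVE_VISUAL_ID as a matching criterion, an application that wants the ARGB visual has to filter the returned configs itself, as the thread notes. A minimal sketch of that manual match, assuming the visual IDs have already been queried per config with eglGetConfigAttrib(dpy, cfg, EGL_NATIVE_VISUAL_ID, &id); the helper name is ours, not Mesa or EGL API:

```c
#include <assert.h>

/* ids[i] holds the EGL_NATIVE_VISUAL_ID of the i-th config returned by
 * eglChooseConfig(), fetched with eglGetConfigAttrib().  Returns the index
 * of the first config whose native visual matches `wanted` (for example a
 * 32-bit ARGB visual found via XRenderFindVisualFormat), or -1 if none does. */
int find_config_for_visual(const int *ids, int count, int wanted)
{
    for (int i = 0; i < count; i++)
        if (ids[i] == wanted)
            return i;
    return -1;
}
```

With the ordering the thread asks for (xRGB configs sorted first), applications that blindly take the first config keep getting an opaque visual, while ones that filter like this can still locate the ARGB one.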
[Mesa-dev] Require micro-benchmarks for performance optimization oriented patches
I know that this might sound troublesome, but since no benchmarks are run by reviewers before performance-optimization patches are pushed into the master branch, I think it is as important as piglit tests, and necessary, to ask the patch provider for simple OpenGL micro-benchmarks triggering the optimized code path, to see if there is any real benefit and to make sure the patch isn't degrading performance. Being more strict about pushing and quality assurance of this kind of patch will save hours of bisecting and hair-pulling to find the root cause of performance regressions.

Best regards,
Siavash Eliasi.

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-announce] Mesa 10.3 release candidate 1
On Fri, Nov 7, 2014 at 1:07 AM, Thierry Vignaud thierry.vign...@gmail.com wrote:

On 5 November 2014 04:44, Matt Turner matts...@gmail.com wrote:

I tried to reproduce this today and couldn't. (...) Thanks. Maybe you could give a little more information, like an error message or something?

Same error as Thierry reported in this thread in August.

Unfortunately Thierry's was from a re-run of make, so it wasn't useful.

No, it wasn't a re-run! It was a clean build in our build system with make -jXX, with XX automatically set to the number of cores, and it is always reproducible given enough cores.

Oh, weird. I've gone over this all and can't spot the problem. The dependencies look fine. I tried automake 1.13 and 1.14, and make 3.82 and 4.0. Maybe I'll have more luck on a 40-core system.

As already explained, in order to reproduce you must either have a large system or force make -j to a high value (e.g. -j24).

Did you see the rest of the thread where I said I couldn't reproduce on a 40-core system? Perhaps someone who can reproduce could try to take a look?
[Mesa-dev] [PATCH 1/2] clover: fix tab/space and alignement
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 3a4fcf0..d29f5a6 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -214,7 +214,7 @@ namespace {
 #if HAVE_LLVM >= 0x0306
       c.getPreprocessorOpts().addRemappedFile(name,
-              llvm::MemoryBuffer::getMemBuffer(source).release());
+          llvm::MemoryBuffer::getMemBuffer(source).release());
 #else
       c.getPreprocessorOpts().addRemappedFile(name,
          llvm::MemoryBuffer::getMemBuffer(source));
@@ -675,7 +675,7 @@ static const struct debug_named_value debug_options[] = {
    {"llvm", DBG_LLVM, "Dump the generated LLVM IR for all kernels."},
    {"asm", DBG_ASM, "Dump kernel assembly code for targets specifying PIPE_SHADER_IR_NATIVE"},
-	DEBUG_NAMED_VALUE_END // must be last
+   DEBUG_NAMED_VALUE_END // must be last
 };

 module
@@ -737,7 +737,7 @@ clover::compile_program_llvm(const compat::string &source,
       break;
    case PIPE_SHADER_IR_NATIVE: {
       std::vector<char> code = compile_native(mod, triple, processor,
-                                             debug_flags & DBG_ASM, r_log);
+                                              debug_flags & DBG_ASM, r_log);
       m = build_module_native(code, mod, kernels, address_spaces, r_log);
       break;
    }
--
1.9.3
[Mesa-dev] [PATCH 0/2] clover CL_INVALID_COMPILER_OPTIONS
When I added clCompileProgram, I forgot to make the change to trigger CL_INVALID_COMPILER_OPTIONS instead of CL_INVALID_BUILD_OPTIONS when invalid options are passed to the compiler. This was caught by a yet-to-be-reviewed piglit test.

The first patch removes some whitespace inconsistencies in llvm/invocation.cpp. It could have been a stand-alone one.

EdB (2):
  clover: fix tab/space and alignement
  clover: clCompileProgram CL_INVALID_COMPILER_OPTIONS

 src/gallium/state_trackers/clover/api/program.cpp     | 2 ++
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 8
 2 files changed, 6 insertions(+), 4 deletions(-)

--
1.9.3
[Mesa-dev] [PATCH 2/2] clover: clCompileProgram CL_INVALID_COMPILER_OPTIONS
clCompileProgram should return CL_INVALID_COMPILER_OPTIONS instead of CL_INVALID_BUILD_OPTIONS.
---
 src/gallium/state_trackers/clover/api/program.cpp     | 2 ++
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/api/program.cpp b/src/gallium/state_trackers/clover/api/program.cpp
index 3a6c054..60184ed 100644
--- a/src/gallium/state_trackers/clover/api/program.cpp
+++ b/src/gallium/state_trackers/clover/api/program.cpp
@@ -182,6 +182,8 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs,
       prog.build(devs, opts);
       return CL_SUCCESS;
    } catch (error &e) {
+      if (e.get() == CL_INVALID_COMPILER_OPTIONS)
+         return CL_INVALID_BUILD_OPTIONS;
       if (e.get() == CL_COMPILE_PROGRAM_FAILURE)
          return CL_BUILD_PROGRAM_FAILURE;
       return e.get();

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index d29f5a6..30547d0 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -177,7 +177,7 @@ namespace {
                                      opts_carray.data() + opts_carray.size(),
                                      Diags);
       if (!Success) {
-         throw error(CL_INVALID_BUILD_OPTIONS);
+         throw error(CL_INVALID_COMPILER_OPTIONS);
       }
       c.getFrontendOpts().ProgramAction = clang::frontend::EmitLLVMOnly;
       c.getHeaderSearchOpts().UseBuiltinIncludes = true;
--
1.9.3
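The net effect of the two hunks: the shared clang invocation now throws CL_INVALID_COMPILER_OPTIONS, and clBuildProgram translates it back so each entry point reports the error its spec mandates. The translation reduces to the following (error values as defined in CL/cl.h; the helper function is ours for illustration, not clover code):

```c
/* Numeric values from the OpenCL headers (CL/cl.h). */
#define CL_INVALID_BUILD_OPTIONS    (-43)
#define CL_INVALID_COMPILER_OPTIONS (-66)

/* clCompileProgram propagates the compiler error unchanged, while
 * clBuildProgram (per this patch) maps it to the build-option error it
 * is required to return; all other errors pass through untouched. */
int build_error_from_compile_error(int err)
{
    if (err == CL_INVALID_COMPILER_OPTIONS)
        return CL_INVALID_BUILD_OPTIONS;
    return err;
}
```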
[Mesa-dev] [PATCH] glsl: validate invariance between builtin variables
Patch adds additional validation for GLSL ES 1.00, which specifies cross-stage invariance requirements for a set of specified builtins. Fixes failures in the WebGL conformance test 'shaders-with-invariance'.

Signed-off-by: Tapani Pälli tapani.pa...@intel.com
---
 src/glsl/linker.cpp | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index bd2aa3c..57082a4 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -695,10 +695,29 @@ cross_validate_globals(struct gl_shader_program *prog,
     * them.
     */
    glsl_symbol_table variables;
+
+   ir_variable *gl_fcoord = NULL, *gl_position = NULL,
+               *gl_pcoord = NULL, *gl_psize = NULL,
+               *gl_frontf = NULL;
+
    for (unsigned i = 0; i < num_shaders; i++) {
       if (shader_list[i] == NULL)
          continue;

+      /* For GLSL ES 1.00, store builtin variables that require
+       * cross stage validation.
+       */
+      if (prog->IsES && prog->Version < 300) {
+         if (shader_list[i]->Stage == MESA_SHADER_VERTEX) {
+            gl_position = shader_list[i]->symbols->get_variable("gl_Position");
+            gl_psize = shader_list[i]->symbols->get_variable("gl_PointSize");
+         } else if (shader_list[i]->Stage == MESA_SHADER_FRAGMENT) {
+            gl_fcoord = shader_list[i]->symbols->get_variable("gl_FragCoord");
+            gl_pcoord = shader_list[i]->symbols->get_variable("gl_PointCoord");
+            gl_frontf = shader_list[i]->symbols->get_variable("gl_FrontFacing");
+         }
+      }
+
       foreach_in_list(ir_instruction, node, shader_list[i]->ir) {
          ir_variable *const var = node->as_variable();

@@ -904,6 +923,30 @@ cross_validate_globals(struct gl_shader_program *prog,
             variables.add_variable(var);
       }
    }
+
+   /* GLSL ES 1.00 specification, Section "Invariance and Linkage" says:
+    *
+    *    "For the built-in special variables, gl_FragCoord can only be
+    *    declared invariant if and only if gl_Position is declared invariant.
+    *    Similarly gl_PointCoord can only be declared invariant if and only
+    *    if gl_PointSize is declared invariant. It is an error to declare
+    *    gl_FrontFacing as invariant. The invariance of gl_FrontFacing is the
+    *    same as the invariance of gl_Position."
+    */
+   if (prog->IsES && prog->Version < 300) {
+      if (gl_fcoord && gl_position)
+         if (gl_fcoord->data.invariant && !gl_position->data.invariant)
+            linker_error(prog, "gl_FragCoord declared invariant "
+                         "while gl_Position is declared variant.");
+
+      if (gl_pcoord && gl_psize)
+         if (gl_pcoord->data.invariant && !gl_psize->data.invariant)
+            linker_error(prog, "gl_PointCoord declared invariant "
+                         "while gl_PointSize is declared variant.");
+
+      if (gl_frontf && gl_frontf->data.invariant)
+         linker_error(prog, "gl_FrontFacing cannot be declared invariant.");
+   }
 }
--
1.9.3
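The cross-stage rule the patch enforces can be restated as a small predicate over per-builtin invariant flags (a hypothetical helper for illustration only, not linker code):

```c
/* Returns 1 if the declarations satisfy GLSL ES 1.00 "Invariance and
 * Linkage": invariant gl_FragCoord requires invariant gl_Position,
 * invariant gl_PointCoord requires invariant gl_PointSize, and
 * gl_FrontFacing may never be declared invariant. */
int invariance_valid(int fragcoord_inv, int position_inv,
                     int pointcoord_inv, int pointsize_inv,
                     int frontfacing_inv)
{
    if (fragcoord_inv && !position_inv)
        return 0;
    if (pointcoord_inv && !pointsize_inv)
        return 0;
    if (frontfacing_inv)
        return 0;
    return 1;
}
```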
[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error
https://bugs.freedesktop.org/show_bug.cgi?id=85918

--- Comment #7 from José Fonseca jfons...@vmware.com ---

(In reply to Emil Velikov from comment #2)

Roland, can we encourage newcomers to use MSVC 2013 and later (if ever available)? This way, once you guys (and others) are over to 2013, we can just go with "anything prior to MSVC 2013 is not supported" :P I.e. having a gradual transition period is always nice.

I'm afraid VMware needs to build with MSVC 2012. See the explanation on http://lists.freedesktop.org/archives/piglit/2014-October/013129.html

It's OK to recommend MSVC 2013, but we need to continue to support MSVC 2012 a bit longer.

--
You are receiving this mail because: You are the assignee for the bug.
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Thu, 2014-11-06 at 14:39 -0800, Matt Turner wrote:

On Thu, Nov 6, 2014 at 2:33 PM, Marc Dietrich marvi...@gmx.de wrote:

So this is only a problem with Link-Time-Opt.

I don't actually know how you're getting to this point in the build with LTO. It fails for me in src/mapi. I think LTO is something we should make work at some point, but I don't think we should gate contributions on whether LTO works or not.

FWIW, I always compile mesa with -flto; the only place where it fails is the stand-alone glsl compiler.
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:

On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:

Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for the SSE-optimized code paths in action, and it just adds extra overhead (unnecessary branches) to the generated code.

No.

Sorry, I realize I misread your previous question:

I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by the user at runtime, because USE_SSE41 is a compile-time check and requires the target machine to be SSE 4.1 capable already.

USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though...

I wonder what difference it would make to have an option to compile out the run-time check code, to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is (i.e. Gentoo)?
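The distinction being clarified here: USE_SSE41 only records that the *compiler* can emit SSE 4.1, so the optimized object is built unconditionally, and the cpu_has_sse4_1 flag decides at run time whether to call into it. A sketch of that pattern, using GCC's x86 builtin as a stand-in for Mesa's feature flag (the SSE-variant call is a placeholder; both paths fall back to a scalar copy here so the sketch stays self-contained):

```c
#include <string.h>

static void copy_scalar(void *d, const void *s, size_t n)
{
    memcpy(d, s, n);
}

/* Dispatch per call: compile-time support ("can we build the fast path?")
 * is separate from run-time support ("may we execute it on this CPU?").
 * On non-x86 targets or old compilers only the scalar path exists. */
void fast_copy(void *d, const void *s, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    if (__builtin_cpu_supports("sse4.1")) {
        /* a real build would jump to the separately -msse4.1-compiled
         * variant here */
        copy_scalar(d, s, n);
        return;
    }
#endif
    copy_scalar(d, s, n);
}
```

A Gentoo-style build, where the run-time CPU is known at configure time, could additionally compile the branch out behind a configure option, which is the trade-off the last paragraph raises.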
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
On Thu, 2014-11-06 at 19:30 -0500, Frank Henigman wrote:

I tested your patch with the teximage program in mesa demos, the same thing I used to benchmark when I developed this code. As Matt and Chad point out, the odd-looking _faster functions are there for a reason. Your change causes a huge slowdown.

Yes, I should have known better than to assume it was left-over code. I didn't know that gcc could inline memcpy like that; very nice. In fact, I was reading a blog just last week that said msvc was better than gcc for memcpy because gcc relied on a library implementation. A good reminder not to believe everything you read on the internet.

Anyway, I've had another go at it and the performance regression should be fixed. In my testing I couldn't spot any real difference. The main downside is that the ssse3 code can't be inlined, so there will be a small trade-off compared to the current way of building with ssse3 enabled.

Also, thanks for pointing out teximage; I didn't know the mesa demos contained perf tools. I tested on a Sandybridge system with an Intel(R) Celeron(R) CPU 857 @ 1.20GHz, Mesa compiled with -O2.
original code:

TexSubImage(RGBA/ubyte 256 x 256): 9660.4 images/sec, 2415.1 MB/sec
TexSubImage(RGBA/ubyte 1024 x 1024): 821.2 images/sec, 3284.7 MB/sec
TexSubImage(RGBA/ubyte 4096 x 4096): 76.3 images/sec, 4884.9 MB/sec
TexSubImage(BGRA/ubyte 256 x 256): 11307.1 images/sec, 2826.8 MB/sec
TexSubImage(BGRA/ubyte 1024 x 1024): 944.6 images/sec, 3778.6 MB/sec
TexSubImage(BGRA/ubyte 4096 x 4096): 76.7 images/sec, 4908.3 MB/sec
TexSubImage(L/ubyte 256 x 256): 17847.5 images/sec, 1115.5 MB/sec
TexSubImage(L/ubyte 1024 x 1024): 3068.2 images/sec, 3068.2 MB/sec
TexSubImage(L/ubyte 4096 x 4096): 224.6 images/sec, 3593.0 MB/sec

your code:

TexSubImage(RGBA/ubyte 256 x 256): 3271.6 images/sec, 817.9 MB/sec
TexSubImage(RGBA/ubyte 1024 x 1024): 232.3 images/sec, 929.2 MB/sec
TexSubImage(RGBA/ubyte 4096 x 4096): 47.5 images/sec, 3038.6 MB/sec
TexSubImage(BGRA/ubyte 256 x 256): 2426.5 images/sec, 606.6 MB/sec
TexSubImage(BGRA/ubyte 1024 x 1024): 164.1 images/sec, 656.4 MB/sec
TexSubImage(BGRA/ubyte 4096 x 4096): 13.4 images/sec, 854.8 MB/sec
TexSubImage(L/ubyte 256 x 256): 9514.5 images/sec, 594.7 MB/sec
TexSubImage(L/ubyte 1024 x 1024): 864.1 images/sec, 864.1 MB/sec
TexSubImage(L/ubyte 4096 x 4096): 59.7 images/sec, 955.2 MB/sec

This is just one run, not an average, but you can see it's slower across the board, up to a factor of around 6.

Also, I couldn't configure the build after your patch. I think you left out a change to configure.ac to define SSSE3_SUPPORTED.
[Mesa-dev] [Bug 84566] Unify the format conversion code
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #52 from Samuel Iglesias sigles...@igalia.com ---

(In reply to Jason Ekstrand from comment #51)
(In reply to Samuel Iglesias from comment #48)
(In reply to Jason Ekstrand from comment #34)
(In reply to Samuel Iglesias from comment #33)

Jason, I would like to know your opinion about the integer RGBA clamping done in pack.c (_mesa_pack_rgba_span_from_ints()). The glReadPixels() and glGetTexImage() specs say (for example, in the OpenGL 3.3 core specification) that for an integer RGBA color, each component is clamped to the representable range of type. Those GL functions end up calling pack.c's functions to perform the conversion (_mesa_pack_rgba_span_from_ints() and others).

It's possible to replace some of those pack.c conversion functions with the master conversion function, but the problem is in the clamping operation. I would like to add a boolean argument called "clamp" to the master conversion function, which is passed on to _mesa_swizzle_and_convert(), where each of its convert_{uint,int,byte,ubyte,short,ushort}() functions does the right thing when clamp is true. This clamp parameter would be false for every call to either the master conversion function or _mesa_swizzle_and_convert(), except when they are called from pack.c. What do you think?

Supporting clamping is probably fine. I think we determined we needed a clamp parameter anyway for some of the float conversions. I guess it makes sense for integers too. Let's watch out for performance when we implement it, though. Changing the loop inside _mesa_swizzle_and_convert() without hurting performance can be tricky. The teximage-colors test in piglit has a -benchmark flag that can be used for testing that.

In the end, we did not make that change in pack.c, as we could just use the autogenerated format pack/unpack functions.
However, I am retaking this topic again because we found another issue which would require a similar solution:

The convert_*() functions in format_utils.c convert between source and destination data and are used by _mesa_swizzle_and_convert(). We found that these were not good enough for various conversions that involved non-normalized types of different sizes: INT to SHORT, INT to BYTE, etc. Because of that, several piglit tests related to glGetTexImage() and others failed, for example bin/ext_texture_integer-getteximage-clamping. In order to fix that we added the clamp expressions for these cases [1], and with that we achieved no regressions when executing a full piglit run on the i965 driver.

Unfortunately, when testing our patches on a couple of Gallium drivers, we found a regression that we tracked down to that patch: bin/arb_clear_buffer_object-formats. Reverting the patch fixes the problem with these Gallium drivers, but then bin/ext_texture_integer-getteximage-clamping fails again on i965. We wonder if there could be more cases like this that piglit is not covering, since it looks weird that this affects just this one test.

So, we wonder if that patch for _mesa_swizzle_and_convert() is correct and we should fix the failing Gallium cases elsewhere, or if we should revert that patch and fix the cases it fixed in a different way. What do you think? Was _mesa_swizzle_and_convert() implemented like that on purpose, or are these real bugs?

From my brief reading of the GL spec, it looks like clamping integers to the max representable range is what it expects by default. From the glTexImage spec:

"The selected groups are transferred to the GL as described in section 3.7.2 and then clamped to the representable range of the internal format. If the internalformat of the texture is signed or unsigned integer, components are clamped to [-2^(n-1), 2^(n-1)-1] or [0, 2^n-1], respectively, where n is the number of bits per component."
"For color component groups, if the internalformat of the texture is signed or unsigned normalized fixed-point, components are clamped to [-1, 1] or [0, 1], respectively."

Therefore, it seems as if we want to be clamping when we have integer destinations. I'm not sure why the gallium drivers are regressing when you do.

One more observation: it doesn't look like your clamping patch is complete. If we're going to clamp when we have an integer destination, we should always clamp with integer destinations, not just in the few cases that piglit hits. I wouldn't be surprised if piglit's coverage in those areas is terrible.

Yes, you are right about the specification. We have added clamping for all the needed conversions in _mesa_swizzle_and_convert() following the spec but, after analyzing why the Gallium drivers have this regression, I discovered that it comes from another place. The following text was copied from piglit's
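The clamp discussed for the non-normalized narrowing conversions (INT to SHORT and friends) is the usual saturate-on-narrow. A minimal sketch of the INT-to-SHORT case, matching the spec wording quoted in the comment; the helper name is ours, not the format_utils.c code:

```c
#include <limits.h>

/* Clamp a non-normalized 32-bit integer component to the representable
 * range of a signed 16-bit destination, [-2^(n-1), 2^(n-1)-1] with n = 16,
 * as the quoted glTexImage language requires for integer internal formats. */
short clamp_int_to_short(int v)
{
    if (v > SHRT_MAX)
        return SHRT_MAX;
    if (v < SHRT_MIN)
        return SHRT_MIN;
    return (short)v;
}
```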
Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds
On Wed, 2014-11-05 at 00:46 +0000, Emil Velikov wrote:

On 04/11/14 22:42, Marek Olšák wrote:

Hi everybody,

I'm about to address this long-standing issue: the EGL state tracker is redundant. It duplicates what st/dri does, and it also duplicates what the common loader egl_dri2 does, which is used by all classic drivers and even works better with gallium drivers.

Let's compare EGL extensions for both backends:

st/egl:
EGL version string: 1.4 (Gallium)
EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenVG
EGL extensions string: EGL_WL_bind_wayland_display EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync EGL_KHR_surfaceless_context EGL_NOK_swap_region EGL_NV_post_sub_buffer

egl_dri2:
EGL version string: 1.4 (DRI2)
EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenGL_ES3
EGL extensions string: EGL_MESA_drm_image EGL_MESA_configless_context EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_surfaceless_context EGL_KHR_create_context EGL_NOK_swap_region EGL_NOK_texture_from_pixmap EGL_CHROMIUM_sync_control EGL_EXT_image_dma_buf_import EGL_NV_post_sub_buffer

egl_dri2 also supports MSAA on the window framebuffer (through st/dri). It's really obvious which one is better.

I'm aware of 2 features that we will lose:
- swrast on Wayland - I'm not sure about this. Perhaps kms-swrast has addressed this already.
- OpenVG - It has never taken off. If people want this on Linux, it should use egl_dri2 and st/dri, like OpenGL does.

I think it would be a shame to lose the OpenVG backend. It's been disappointing with the lack of interest on desktop systems, even through cairo, but I wonder if that was because of its inherently Gallium-only nature? Cairo's GL backend is probably more likely to work on more systems because of this.
Admittedly, it was probably mostly useful as a technology on weaker CPUs and simpler GPUs without full OpenGL(ES) capability, where it could provide a performant GUI.

This series removes st/egl and st/gbm support from the autoconf build (the latter depends on the former and is probably just as redundant). The next step is to remove all Linux-specific backends from st/egl. Windows, Android, and other platform backends will be kept intact, therefore st/egl won't be removed completely. Please comment.

I use EGL_DRIVER=egl_gallium to switch to using the ilo driver at run-time. (Admittedly, mostly for testing more than anything useful.) There would have to be some other way of at least selecting the run-time backend to egl_dri2 for this functionality to continue working.

A few thoughts:
- Iirc Eric is using st/egl, as the dri2 backend seems to be causing problems :(
- Android supports dri modules, but st/dri/Android.mk is missing. Should be trivial to add.
- Windows - might be a pain in the a** to get working. Then again, does Windows have EGL?
- fbdev, OpenVG - the etnaviv project was using the former, currently moving to full-blown drm :)

If one is to nuke the three state trackers, we can remove:
- the libEGL symbol-export mayhem
- gallium's st_api, and flatten things a bit
- a bit of duplication :)

In summary: with a bit of work we can remove Linux and Android from the equation. How many people/companies use EGL for Windows/fbdev? How about OpenVG on any platform?

I'm not sure we can know how many use OpenVG. Probably more than is commonly thought on this list.
[Mesa-dev] [PATCH V2 2/2] i965: add runtime check for SSSE3 rgba8_copy
Callgrind CPU usage results from pts benchmarks, for ytile_copy_faster() in Nexuiz 1.6.1: 2.16% -> 1.20%

V2:
- put back the if statements and add one for the SSSE3 rgba8_copy
- move some header files out of the header
- don't indent the preprocessor tests
- changed copyright to Google and add author Frank Henigman

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/mesa/Makefile.am                           |  8 +++
 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 80 -
 src/mesa/main/fast_rgba8_copy.c                | 83 ++
 src/mesa/main/fast_rgba8_copy.h                | 38
 4 files changed, 169 insertions(+), 40 deletions(-)
 create mode 100644 src/mesa/main/fast_rgba8_copy.c
 create mode 100644 src/mesa/main/fast_rgba8_copy.h

Thanks to everyone who took the time to provide feedback on the first patchset and for pointing out that the baby had been thrown out with the bath water. I've had another go at it and the performance regression should be fixed. With the tests available I couldn't spot any real difference. However, the downside is that the ssse3 code can't be inlined, so there will be a small trade-off compared to the current way of building with ssse3 enabled. But for the majority of builds, done without ssse3, it should be a win.
diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..cf87289 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
 ARCH_LIBS =
+if SSSE3_SUPPORTED
+ARCH_LIBS += libmesa_ssse3.la
+endif
+
 if SSE41_SUPPORTED
 ARCH_LIBS += libmesa_sse41.la
 endif
@@ -150,6 +154,10 @@ libmesagallium_la_LIBADD = \
 	$(top_builddir)/src/glsl/libglsl.la \
 	$(ARCH_LIBS)
+libmesa_ssse3_la_SOURCES = \
+	main/fast_rgba8_copy.c
+libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
+
 libmesa_sse41_la_SOURCES = \
 	main/streaming-load-memcpy.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1

diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index cb5738a..81a2310 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -27,6 +27,7 @@
  **/
 #include "main/bufferobj.h"
+#include "main/fast_rgba8_copy.h"
 #include "main/image.h"
 #include "main/macros.h"
 #include "main/mtypes.h"
@@ -42,9 +43,7 @@
 #include "intel_mipmap_tree.h"
 #include "intel_blit.h"
-#ifdef __SSSE3__
-#include <tmmintrin.h>
-#endif
+#include "x86/common_x86_asm.h"
 #define FILE_DEBUG_FLAG DEBUG_TEXTURE
@@ -175,18 +174,6 @@ err:
    return false;
 }
-#ifdef __SSSE3__
-static const uint8_t rgba8_permutation[16] =
-   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
-
-/* NOTE: dst must be 16 byte aligned */
-#define rgba8_copy_16(dst, src) \
-   *(__m128i *)(dst) = _mm_shuffle_epi8( \
-      (__m128i) _mm_loadu_ps((float *)(src)), \
-      *(__m128i *) rgba8_permutation \
-   )
-#endif
-
 /**
  * Copy RGBA to BGRA - swap R and B.
  */
@@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
    uint8_t *d = dst;
    uint8_t const *s = src;
-#ifdef __SSSE3__
-   /* Fast copying for tile spans.
-    *
-    * As long as the destination texture is 16 aligned,
-    * any 16 or 64 spans we get here should also be 16 aligned.
-    */
-
-   if (bytes == 16) {
-      assert(!(((uintptr_t)dst) & 0xf));
-      rgba8_copy_16(d+ 0, s+ 0);
-      return dst;
-   }
-
-   if (bytes == 64) {
-      assert(!(((uintptr_t)dst) & 0xf));
-      rgba8_copy_16(d+ 0, s+ 0);
-      rgba8_copy_16(d+16, s+16);
-      rgba8_copy_16(d+32, s+32);
-      rgba8_copy_16(d+48, s+48);
-      return dst;
-   }
-#endif
-
    while (bytes >= 4) {
       d[0] = s[2];
       d[1] = s[1];
@@ -355,6 +319,12 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
    if (mem_copy == memcpy)
       return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
                         dst, src, src_pitch, swizzle_bit, memcpy);
+#if defined(USE_SSSE3)
+   else if (mem_copy == _mesa_fast_rgba8_copy)
+      return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
+                        dst, src, src_pitch, swizzle_bit,
+                        _mesa_fast_rgba8_copy);
+#endif
    else if (mem_copy == rgba8_copy)
       return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
                         dst, src, src_pitch, swizzle_bit, rgba8_copy);
@@ -362,6 +332,12 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
    if (mem_copy == memcpy)
       return xtile_copy(x0, x1, x2, x3, y0, y1,
                         dst, src, src_pitch, swizzle_bit, memcpy);
+#if defined(USE_SSSE3)
+   else if (mem_copy ==
[Mesa-dev] [PATCH V2 1/2] mesa: add runtime support for SSSE3
V2: - remove unrequired #ifdef bit_SSSE3
    - order flag check in config

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 configure.ac                       | 6 ++++++
 src/mesa/x86/common_x86.c          | 2 ++
 src/mesa/x86/common_x86_features.h | 4 +++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 03f1bca..f86edf1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -252,6 +252,12 @@ AC_SUBST([VISIBILITY_CXXFLAGS])
 dnl
 dnl Optional flags, check for compiler support
 dnl
+AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
+if test x$SSSE3_SUPPORTED = x1; then
+    DEFINES="$DEFINES -DUSE_SSSE3"
+fi
+AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
+
 AX_CHECK_COMPILE_FLAG([-msse4.1], [SSE41_SUPPORTED=1], [SSE41_SUPPORTED=0])
 if test x$SSE41_SUPPORTED = x1; then
     DEFINES="$DEFINES -DUSE_SSE41"
diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
index 25f5c40..bef9cf2 100644
--- a/src/mesa/x86/common_x86.c
+++ b/src/mesa/x86/common_x86.c
@@ -352,6 +352,8 @@ _mesa_get_x86_features(void)
       __get_cpuid(1, &eax, &ebx, &ecx, &edx);
 
+      if (ecx & bit_SSSE3)
+         _mesa_x86_cpu_features |= X86_FEATURE_SSSE3;
       if (ecx & bit_SSE4_1)
          _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
    }
diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h
index 66f2cf6..6eb2b38 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -43,7 +43,8 @@
 #define X86_FEATURE_XMM2     (1<<6)
 #define X86_FEATURE_3DNOWEXT (1<<7)
 #define X86_FEATURE_3DNOW    (1<<8)
-#define X86_FEATURE_SSE4_1   (1<<9)
+#define X86_FEATURE_SSSE3    (1<<9)
+#define X86_FEATURE_SSE4_1   (1<<10)
 
 /* standard X86 CPU features */
 #define X86_CPU_FPU (1<<0)
@@ -65,6 +66,7 @@
 #define cpu_has_xmm2     (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
 #define cpu_has_3dnow    (_mesa_x86_cpu_features & X86_FEATURE_3DNOW)
 #define cpu_has_3dnowext (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
+#define cpu_has_ssse3    (_mesa_x86_cpu_features & X86_FEATURE_SSSE3)
 #define cpu_has_sse4_1   (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 
 #endif
-- 
1.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Fri, 2014-11-07 at 11:44 +0000, Steven Newbury wrote: On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote: Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for the SSE-optimized code paths in action, and it just adds extra overhead (unnecessary branches) to the generated code. No. Sorry, I realize I misread your previous question: I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by the user at runtime, because USE_SSE41 is a compile-time check and requires the target machine to be SSE 4.1 capable already. USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though... I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (i.e. Gentoo) As long as the check is placed in the right location it shouldn't really make a noticeable difference, i.e. just outside the hotspot and not inside it. Things that will have more impact are not being able to inline certain code, such as in the latest patchset I sent out. It seems this is another side effect of the way gcc handles intrinsics. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds
On Fri, Nov 7, 2014 at 1:10 PM, Steven Newbury st...@snewbury.org.uk wrote: On Wed, 2014-11-05 at 00:46 +, Emil Velikov wrote: On 04/11/14 22:42, Marek Olšák wrote: Hi everybody, I'm about to address this long-standing issue: The EGL state tracker is redundant. It duplicates what st/dri does and it also duplicates what the common loader egl_dri2 does, which is used by all classic drivers and even works better with gallium drivers. Let's compare EGL extensions for both backends: st/egl: EGL version string: 1.4 (Gallium) EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenVG EGL extensions string: EGL_WL_bind_wayland_display EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync EGL_KHR_surfaceless_context EGL_NOK_swap_region EGL_NV_post_sub_buffer egl_dri2: EGL version string: 1.4 (DRI2) EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenGL_ES3 EGL extensions string: EGL_MESA_drm_image EGL_MESA_configless_context EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_surfaceless_context EGL_KHR_create_context EGL_NOK_swap_region EGL_NOK_texture_from_pixmap EGL_CHROMIUM_sync_control EGL_EXT_image_dma_buf_import EGL_NV_post_sub_buffer egl_dri2 also supports MSAA on the window framebuffer (through st/dri). It's really obvious which one is better. I'm aware of 2 features that we will lose: - swrast on Wayland - I'm not sure about this. Perhaps kms-swrast has addressed this already. - OpenVG - It has never taken off. If people want this on Linux, it should use egl_dri2 and st/dri, like OpenGL does. I think it would be a shame to lose the OpenVG backend. It's been disappointing with the lack of interest on desktop systems even through cairo, but I wonder if it was because of the inherent Gallium only nature? Cairo's GL backend is probably more likely to work on more systems because of this. 
Admittedly, it was probably mostly useful as a technology on weaker CPUs and simpler GPUs without full OpenGL(ES) capability where it could provide a performant GUI. This series removes st/egl and st/gbm support from the autoconf build (the latter depends on the former and is probably just as redundant). The next step is to remove all Linux-specific backends from st/egl. Windows, Android, and other platform backends will be kept intact, therefore st/egl won't be removed completely. Please comment. I use EGL_DRIVER=egl_gallium to switch to using the ilo driver at run-time. (Admittedly, mostly for testing more than anything useful.) There would have to be some other way of at least selecting run-time backend to egl_dri2 for this functionality to continue working. A few thoughts: - Iirc Eric is using st/egl as the dri2 backend seems to be causing problems :( - Android supports dri modules, but st/dri/Android.mk is missing. Should be trivial to add. - Windows - might be a pain in the a** to get it working. Then again does Windows have EGL? - fbdev, OpenVG - the etnaviv project was using the former, currently moving to full-blown drm :) If one is to nuke the three state trackers we can remove - the libEGL symbol export mayhem - gallium's st_api, and flatten things a bit - a bit of duplication :) In summary: With a bit of work we can remove Linux and Android from the equation. How many people/companies use EGL for Windows/fbdev, how about OpenVG on any platform? I'm not sure we can know how many use OpenVG. Probably more than is commonly thought on this list. Then I'd like those people or companies to speak up. I seriously doubt anyone is using OpenVG with a Gallium driver. It even needs GL3 hardware (because it uses ARL in a fragment shader). That means every non-GL3 Gallium driver has incomplete OpenVG support, so OpenVG users should use these: radeon >= r600, nouveau >= nv50, ilo maybe? That's it.
I wouldn't bother with softpipe and llvmpipe -- if you have to use them, you're better off with a VG-over-GL wrapper and a good OpenGL driver/hardware combo. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 11/07/2014 03:14 PM, Steven Newbury wrote: On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote: Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for the SSE-optimized code paths in action, and it just adds extra overhead (unnecessary branches) to the generated code. No. Sorry, I realize I misread your previous question: I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by the user at runtime, because USE_SSE41 is a compile-time check and requires the target machine to be SSE 4.1 capable already. USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though... I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (i.e. Gentoo) I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 during compile time if it's going to be built for an SSE 4.1 capable target, so your smart compiler will totally get rid of the unnecessary runtime check. I guess common_x86_features.h should be modified to something like this:

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.
From: José Fonseca jfons...@vmware.com

This commit causes the generated C code to change as

    union util_format_r32g32b32a32_sscaled pixel;
 -  pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647);
 -  pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647);
 -  pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647);
 -  pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647);
 +  pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f);
 +  pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 2147483647.0f);
 +  pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 2147483647.0f);
 +  pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 2147483647.0f);
    memcpy(dst, &pixel, sizeof pixel);

which surprisingly makes a difference for MSVC.

Thanks to Juraj Svec for diagnosing this and drafting a fix.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661

Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
---
 src/gallium/auxiliary/util/u_format_pack.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format_pack.py b/src/gallium/auxiliary/util/u_format_pack.py
index 6ccf04c..90f348e 100644
--- a/src/gallium/auxiliary/util/u_format_pack.py
+++ b/src/gallium/auxiliary/util/u_format_pack.py
@@ -226,9 +226,9 @@ def native_to_constant(type, value):
     '''Get the value of unity for this type.'''
     if type.type == FLOAT:
         if type.size <= 32:
-            return "%ff" % value
+            return "%.1ff" % float(value)
         else:
-            return "%ff" % value
+            return "%.1f" % float(value)
     else:
         return str(int(value))
@@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, value):
     dst_max = dst_channel.max()
 
     # Translate the destination range to the src native value
-    dst_min_native = value_to_native(src_channel, dst_min)
-    dst_max_native = value_to_native(src_channel, dst_max)
+    dst_min_native = native_to_constant(src_channel, value_to_native(src_channel, dst_min))
+    dst_max_native = native_to_constant(src_channel, value_to_native(src_channel, dst_max))
 
     if src_min < dst_min and src_max > dst_max:
         return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native)
-- 
1.9.1
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
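The effect of routing the clamp bounds through native_to_constant() can be seen directly in Python. This is a simplified model of the patched helper, not the real generator (the actual function takes a channel-type object rather than separate flags):

```python
def native_to_constant(is_float, size, value):
    """Format a value as a C literal, mimicking the patched helper."""
    if is_float:
        if size <= 32:
            return "%.1ff" % float(value)   # single-precision suffix
        return "%.1f" % float(value)        # double-precision literal
    return str(int(value))


# The old code emitted bare integer literals for the clamp bounds;
# the patch emits explicit float constants instead, which is what
# changed MSVC's behaviour.
print(native_to_constant(True, 32, 2147483647))    # 2147483647.0f
print(native_to_constant(True, 64, -2147483648))   # -2147483648.0
print(native_to_constant(False, 32, 255))          # 255
```

The `%.1f` format keeps the literal short while still forcing a decimal point, so the C compiler sees a floating-point constant rather than an int that gets implicitly converted.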
[Mesa-dev] [Bug 61312] Assertion failure: freeing bad or corrupted memory in st_translate_mesa_program
https://bugs.freedesktop.org/show_bug.cgi?id=61312

José Fonseca jfons...@vmware.com changed:

           What      |Removed |Added
           Status    |NEW     |RESOLVED
           Resolution|---     |FIXED

--- Comment #2 from José Fonseca jfons...@vmware.com ---
This assertion happened because code in st_mesa_to_tgsi.c was mixing up realloc() with FREE() for the same pointer, instead of using REALLOC()+FREE() or realloc()+free() consistently, as the debugging helpers are only enabled with the macros. But this has been fixed since then. It wasn't a single fix. The last one was 11070105f0b5ad20f12bb40a8dd0b357924bcfdd.

--
You are receiving this mail because: You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.
A possible wrinkle here is that 2147483647 isn't exactly representable as a float (the adjacent floats are 2147483520 and 2147483648). Is the clamp actually doing the right thing at the top end? On Sat, Nov 8, 2014 at 3:32 AM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com This commit causes the generated C code to change as union util_format_r32g32b32a32_sscaled pixel; - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647); - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647); - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647); - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647); + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f); + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 2147483647.0f); + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 2147483647.0f); + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 2147483647.0f); memcpy(dst, pixel, sizeof pixel); which surprisingly makes a difference for MSVC. Thanks to Juraj Svec for diagnosing this and drafting a fix. 
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org --- src/gallium/auxiliary/util/u_format_pack.py | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_pack.py b/src/gallium/auxiliary/util/u_format_pack.py index 6ccf04c..90f348e 100644 --- a/src/gallium/auxiliary/util/u_format_pack.py +++ b/src/gallium/auxiliary/util/u_format_pack.py @@ -226,9 +226,9 @@ def native_to_constant(type, value): '''Get the value of unity for this type.''' if type.type == FLOAT: if type.size = 32: -return %ff % value +return %.1ff % float(value) else: -return %ff % value +return %.1f % float(value) else: return str(int(value)) @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, value): dst_max = dst_channel.max() # Translate the destination range to the src native value -dst_min_native = value_to_native(src_channel, dst_min) -dst_max_native = value_to_native(src_channel, dst_max) +dst_min_native = native_to_constant(src_channel, value_to_native(src_channel, dst_min)) +dst_max_native = native_to_constant(src_channel, value_to_native(src_channel, dst_max)) if src_min dst_min and src_max dst_max: return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Fri Nov 7 14:09:09 2014 GMT, Siavash Eliasi wrote: On 11/07/2014 03:14 PM, Steven Newbury wrote: On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote: Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for the SSE-optimized code paths in action, and it just adds extra overhead (unnecessary branches) to the generated code. No. Sorry, I realize I misread your previous question: I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by the user at runtime, because USE_SSE41 is a compile-time check and requires the target machine to be SSE 4.1 capable already. USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though... I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (i.e. Gentoo) I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 during compile time if it's going to be built for an SSE 4.1 capable target, so your smart compiler will totally get rid of the unnecessary runtime check. I guess common_x86_features.h should be modified to something like this:

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif

Yes, this was what I was thinking. Then perhaps an option for disabling run-time detection, with the available cpu features then determined during configuration, setting appropriate defines. Whether it's worth it I don't know. I can imagine the compiler having an easier job optimizing the code.
-- ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Use the predicate enable bit for conditional rendering without stalling
Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers are not in the whitelist of available registers in the kernel driver so this needs a kernel patch to work. This patch tries to check whether it is possible to write to this register by creating a DRM buffer with a single command to write to the register and then trying to exec it. The return value of the exec function (and hence the ioctl) is checked. There is a function with a similar goal just above to check whether the OACONTROL register can be used. This works by putting the test command in the regular batch buffer and trying to compare whether the value was successfully written. I couldn't get this approach to work because intel_batchbuffer_flush actually calls exit() if the ioctl fails. I am suspicious that this approach doesn't work for OACONTROL either. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. 
--- I'm not really sure whether I'm using the right cache domain to load the pixel count values into the predicate source values. Currently I'm using the instruction domain because that is what is used to write the values from a PIPE_CONTROL command but that doesn't seem to make much sense because the comment for that domain says it is for shaders. I've added the PIPE_CONTROL_FLUSH_ENABLE flag to the PIPE_CONTROL command used to save the depth pass count because the PRM says that you should do this. However it seems to work just as well without it and I'm not really sure what it does. Perhaps without the bit it can start the next command without waiting for the depth pass count to be written and that would be bad because the next command is likely to be the load into the predicate register. I'm not sure if it also implies a cache flush because I would expect that the PIPE_CONTROL command and the register load command would be using the same cache domain? I'll post the kernel patch to the intel-gfx list. I've also written a little demo using conditional rendering here: https://github.com/bpeel/glthing/tree/wip/conditional-render With the patch the FPS jumps from ~20 to ~50 although it's a bit of a silly test because if you just completely disable conditional rendering then the FPS goes all the way up to 270. 
src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 + src/mesa/drivers/dri/i965/brw_context.c| 4 + src/mesa/drivers/dri/i965/brw_context.h| 21 +++ src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/brw_draw.c | 16 +- src/mesa/drivers/dri/i965/brw_queryobj.c | 17 ++- src/mesa/drivers/dri/i965/intel_extensions.c | 48 ++ src/mesa/drivers/dri/i965/intel_reg.h | 23 +++ 9 files changed, 286 insertions(+), 12 deletions(-) create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 711aabe..0adaf4d 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -38,6 +38,7 @@ i965_FILES = \ brw_clip_tri.c \ brw_clip_unfilled.c \ brw_clip_util.c \ + brw_conditional_render.c \ brw_context.c \ brw_cubemap_normalize.cpp \ brw_curbe.c \ diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c new file mode 100644 index 000..e676aa0 --- /dev/null +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c @@ -0,0 +1,167 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + *
Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.
Great point. The source values, src[], are already in floating point, so it's OK not to represent 2147483647 exactly; however, if the compiler rounds 2147483647.0f upwards to 2147483648 then indeed the clamping won't work properly, as 2147483648 will overflow when converted to int32. I've confirmed that gcc generates precisely the same assembly, so at least this change is a safe one there. But you're right, we do have a bug, before and after this change:

$ cat test.c
#include <stdint.h>
#include <stdio.h>

#define CLAMP( X, MIN, MAX )  ( (X)<(MIN) ? (MIN) : ((X)>(MAX) ? (MAX) : (X)) )

int main() {
   float src = 2147483648.0f;
   int32_t dst = (int32_t)CLAMP(src, -2147483648, 2147483647);
   printf("%f %i\n", src, dst);
   return 0;
}

$ gcc -O0 -Wall -o test test.c
$ ./test
2147483648.000000 -2147483648

And to make things even stranger, it depends on compiler optimization:

$ gcc -O2 -Wall -o test test.c
$ ./test
2147483648.000000 2147483647

Bummer. It's quite hard to come up with general rules to cover all these corner cases. I suspect llvmpipe's LLVM IR generation suffers from similar issues. For starters we need to add test cases for these...

Jose

From: Chris Forbes chr...@ijw.co.nz Sent: 07 November 2014 15:12 To: Jose Fonseca Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger; Brian Paul Subject: Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping. A possible wrinkle here is that 2147483647 isn't exactly representable as a float (the adjacent floats are 2147483520 and 2147483648). Is the clamp actually doing the right thing at the top end?
On Sat, Nov 8, 2014 at 3:32 AM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com This commit causes the generated C code to change as union util_format_r32g32b32a32_sscaled pixel; - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647); - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647); - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647); - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647); + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f); + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 2147483647.0f); + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 2147483647.0f); + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 2147483647.0f); memcpy(dst, pixel, sizeof pixel); which surprisingly makes a difference for MSVC. Thanks to Juraj Svec for diagnosing this and drafting a fix. Fixes https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D29661d=AAIFaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=zfmBZnnVGHeYde45pMKNnVyzeaZbdIqVLprmZCM2zzEm=djQWlVnf6dbYVyPQ5qxDycdd3R5KbOq7Bj00kHD0dZEs=1xrbiqwOdtyjZXGw4e6Kj338lZaorHg_Hj-BNByAboQe= Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org --- src/gallium/auxiliary/util/u_format_pack.py | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_pack.py b/src/gallium/auxiliary/util/u_format_pack.py index 6ccf04c..90f348e 100644 --- a/src/gallium/auxiliary/util/u_format_pack.py +++ b/src/gallium/auxiliary/util/u_format_pack.py @@ -226,9 +226,9 @@ def native_to_constant(type, value): '''Get the value of unity for this type.''' if type.type == FLOAT: if type.size = 32: -return %ff % value +return %.1ff % float(value) else: -return %ff % value +return %.1f % float(value) else: return str(int(value)) @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, value): dst_max = dst_channel.max() # Translate the destination range to 
the src native value -dst_min_native = value_to_native(src_channel, dst_min) -dst_max_native = value_to_native(src_channel, dst_max) +dst_min_native = native_to_constant(src_channel, value_to_native(src_channel, dst_min)) +dst_max_native = native_to_constant(src_channel, value_to_native(src_channel, dst_max)) if src_min dst_min and src_max dst_max: return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AAIFaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=zfmBZnnVGHeYde45pMKNnVyzeaZbdIqVLprmZCM2zzEm=djQWlVnf6dbYVyPQ5qxDycdd3R5KbOq7Bj00kHD0dZEs=imZ41uT_6T2fgHee2dCxYwjQLSqfLYKgC8YgzpGFbpMe= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.
Surprising indeed... Reviewed-by: Roland Scheidegger srol...@vmware.com Am 07.11.2014 um 15:32 schrieb jfons...@vmware.com: From: José Fonseca jfons...@vmware.com This commit causes the generated C code to change as union util_format_r32g32b32a32_sscaled pixel; - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647); - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647); - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647); - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647); + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f); + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 2147483647.0f); + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 2147483647.0f); + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 2147483647.0f); memcpy(dst, pixel, sizeof pixel); which surprisingly makes a difference for MSVC. Thanks to Juraj Svec for diagnosing this and drafting a fix. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org --- src/gallium/auxiliary/util/u_format_pack.py | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_pack.py b/src/gallium/auxiliary/util/u_format_pack.py index 6ccf04c..90f348e 100644 --- a/src/gallium/auxiliary/util/u_format_pack.py +++ b/src/gallium/auxiliary/util/u_format_pack.py @@ -226,9 +226,9 @@ def native_to_constant(type, value): '''Get the value of unity for this type.''' if type.type == FLOAT: if type.size = 32: -return %ff % value +return %.1ff % float(value) else: -return %ff % value +return %.1f % float(value) else: return str(int(value)) @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, value): dst_max = dst_channel.max() # Translate the destination range to the src native value -dst_min_native = value_to_native(src_channel, dst_min) -dst_max_native = value_to_native(src_channel, dst_max) +dst_min_native = 
native_to_constant(src_channel, value_to_native(src_channel, dst_min)) +dst_max_native = native_to_constant(src_channel, value_to_native(src_channel, dst_max)) if src_min dst_min and src_max dst_max: return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 11/07/2014 06:09 AM, Siavash Eliasi wrote: On 11/07/2014 03:14 PM, Steven Newbury wrote: On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote: Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for the SSE-optimized code paths in action, and it just adds extra overhead (unnecessary branches) to the generated code. No. Sorry, I realize I misread your previous question: I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by the user at runtime, because USE_SSE41 is a compile-time check and requires the target machine to be SSE 4.1 capable already. USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though... I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (i.e. Gentoo) I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 during compile time if it's going to be built for an SSE 4.1 capable target, so your smart compiler will totally get rid of the unnecessary runtime check. I guess common_x86_features.h should be modified to something like this:

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif

I was thinking about doing something similar for cpu_has_xmm and cpu_has_xmm2 for x64. SSE and SSE2 are required parts of that instruction set, so they're always there.
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.
Ah yes, I forgot about that, even though I have fixed this same problem elsewhere in the past... It is indeed an annoying problem; luckily it only affects 32-bit numbers. On 07.11.2014 16:35, Jose Fonseca wrote: Great point. The source values, src[], are already in floating point, so it's OK not to represent 2147483647 exactly; however, if the compiler rounds 2147483647.0f upwards to 2147483648 then indeed the clamping won't work properly, as 2147483648 will overflow when converted to int32. I've confirmed that gcc generates precisely the same assembly, so at least this change is a safe one there. But you're right, we do have a bug, before and after this change:

$ cat test.c
#include <stdint.h>
#include <stdio.h>

#define CLAMP( X, MIN, MAX )  ( (X)<(MIN) ? (MIN) : ((X)>(MAX) ? (MAX) : (X)) )

int main() {
   float src = 2147483648.0f;
   int32_t dst = (int32_t)CLAMP(src, -2147483648, 2147483647);
   printf("%f %i\n", src, dst);
   return 0;
}

$ gcc -O0 -Wall -o test test.c
$ ./test
2147483648.000000 -2147483648

And to make things even stranger, it depends on compiler optimization:

$ gcc -O2 -Wall -o test test.c
$ ./test
2147483648.000000 2147483647

That's not surprising, since it's undefined. On x86 out-of-bounds values return the integer indefinite value, which is 0x80000000 (though it might be different depending on whether sse is used or not, I forgot). With O2 it probably got optimized away completely by gcc and the logic apparently got it correct. Bummer. It's quite hard to come up with general rules to cover all these corner cases. I suspect llvmpipe's LLVM IR generation suffers from similar issues. For starters we need to add test cases for these... I think we mostly get lucky because a lot of possible conversions are things which aren't hit in practice. Jose From: Chris Forbes chr...@ijw.co.nz Sent: 07 November 2014 15:12 To: Jose Fonseca Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger; Brian Paul Subject: Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.
A possible wrinkle here is that 2147483647 isn't exactly representable as a float (the adjacent floats are 2147483520 and 2147483648). Is the clamp actually doing the right thing at the top end? On Sat, Nov 8, 2014 at 3:32 AM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com This commit causes the generated C code to change as union util_format_r32g32b32a32_sscaled pixel; - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647); - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647); - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647); - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647); + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f); + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 2147483647.0f); + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 2147483647.0f); + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 2147483647.0f); memcpy(dst, pixel, sizeof pixel); which surprisingly makes a difference for MSVC. Thanks to Juraj Svec for diagnosing this and drafting a fix. 
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org --- src/gallium/auxiliary/util/u_format_pack.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_pack.py b/src/gallium/auxiliary/util/u_format_pack.py index 6ccf04c..90f348e 100644 --- a/src/gallium/auxiliary/util/u_format_pack.py +++ b/src/gallium/auxiliary/util/u_format_pack.py @@ -226,9 +226,9 @@ def native_to_constant(type, value): '''Get the value of unity for this type.''' if type.type == FLOAT: if type.size <= 32: -return "%ff" % value +return "%.1ff" % float(value) else: -return "%ff" % value +return "%.1f" % float(value) else: return str(int(value)) @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, value): dst_max = dst_channel.max() # Translate the destination range to the src native value -dst_min_native = value_to_native(src_channel, dst_min) -dst_max_native = value_to_native(src_channel, dst_max) +dst_min_native = native_to_constant(src_channel, value_to_native(src_channel, dst_min)) +dst_max_native = native_to_constant(src_channel, value_to_native(src_channel, dst_max)) if src_min <= dst_min and src_max >= dst_max: return
[Mesa-dev] [Bug 54080] glXQueryDrawable fails with GLXBadDrawable for a Window in direct context
https://bugs.freedesktop.org/show_bug.cgi?id=54080 Eero Tamminen eero.t.tammi...@intel.com changed: What|Removed |Added CC||eero.t.tammi...@intel.com -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
Then I do recommend removing the if (cpu_has_sse4_1) check from this patch and similar places, because there is no runtime CPU dispatching happening for the SSE-optimized code paths; the check just adds extra overhead (unnecessary branches) to the generated code. The same applies to these patches: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy http://lists.freedesktop.org/archives/mesa-dev/2014-November/070256.html Best regards, Siavash Eliasi.
Re: [Mesa-dev] [Mesa-announce] Mesa 10.3 release candidate 1
On 5 November 2014 04:44, Matt Turner matts...@gmail.com wrote: I tried to reproduce this today and couldn't. (...) Thanks. Maybe you could give a little more information, like an error message or something? Same error as Thierry reported in this thread in August: Unfortunately Thierry's was from a re-run of make, so it wasn't useful. No, it wasn't a re-run! It was a clean build in our build system with make -jXX, with XX automatically set to the number of cores, and it is always reproducible given enough cores. I've gone over this all and can't spot the problem. The dependencies look fine. I tried automake-1.13 and 1.14, and make-3.82 and 4.0. Maybe I'll have more luck on a 40-core system. As already explained, in order to reproduce it you must either have a large system or force make -j to a high value (e.g. -j24)
[Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll
From: José Fonseca jfons...@vmware.com On Windows, DllMain calls and thread creation/destruction are serialized, so when llvmpipe is destroyed from DllMain, waiting for the rasterizer threads to finish will deadlock. So, instead of waiting for the rasterizer threads to have finished, simply wait for the rasterizer threads to notify that they are just about to finish. Verified with this very simple program: #include <windows.h> int main() { HMODULE hModule = LoadLibraryA("opengl32.dll"); FreeLibrary(hModule); } Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/llvmpipe/lp_rast.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c index a3420a2..6b54d43 100644 --- a/src/gallium/drivers/llvmpipe/lp_rast.c +++ b/src/gallium/drivers/llvmpipe/lp_rast.c @@ -800,6 +800,8 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data ) pipe_semaphore_signal(&task->work_done); } + pipe_semaphore_signal(&task->work_done); + return 0; } @@ -885,9 +887,11 @@ void lp_rast_destroy( struct lp_rasterizer *rast ) pipe_semaphore_signal(&rast->tasks[i].work_ready); } - /* Wait for threads to terminate before cleaning up per-thread data */ + /* Wait for threads to terminate before cleaning up per-thread data. +* We don't actually call pipe_thread_wait to avoid deadlock on Windows, +* per https://bugs.freedesktop.org/show_bug.cgi?id=76252 */ for (i = 0; i < rast->num_threads; i++) { - pipe_thread_wait(rast->threads[i]); + pipe_semaphore_wait(&rast->tasks[i].work_done); } /* Clean up per-thread data */ -- 1.9.1
Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll
Reviewed-by: Roland Scheidegger srol...@vmware.com Am 07.11.2014 um 17:52 schrieb jfons...@vmware.com: From: José Fonseca jfons...@vmware.com On Windows, DllMain calls and thread creation/destruction are serialized, so when llvmpipe is destroyed from DllMain, waiting for the rasterizer threads to finish will deadlock. [...]
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 11/07/2014 07:31 PM, Ian Romanick wrote: On 11/07/2014 06:09 AM, Siavash Eliasi wrote: On 11/07/2014 03:14 PM, Steven Newbury wrote: On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote: Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code. No. Sorry, I realize I misread your previous question: I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by the user at runtime, because USE_SSE41 is a compile-time check and requires the target machine to be SSE 4.1 capable already. USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though... I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (i.e. Gentoo) I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 at compile time if the build targets an SSE 4.1 capable machine, so your smart compiler will totally get rid of the unnecessary runtime check. I guess common_x86_features.h should be modified to something like this: #ifdef __SSE4_1__ #define cpu_has_sse4_1 1 #else #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1) #endif I was thinking about doing something similar for cpu_has_xmm and cpu_has_xmm2 for x64. SSE and SSE2 are required parts of that instruction set, so they're always there.
I can come up with a patch implementing the same for SSE, SSE2, SSE3 and SSSE3 if current approach is fine by you.
[Mesa-dev] [Bug 84566] Unify the format conversion code
https://bugs.freedesktop.org/show_bug.cgi?id=84566 --- Comment #53 from Jason Ekstrand ja...@jlekstrand.net --- (In reply to Samuel Iglesias from comment #52) (In reply to Jason Ekstrand from comment #51) (In reply to Samuel Iglesias from comment #48) (In reply to Jason Ekstrand from comment #34) (In reply to Samuel Iglesias from comment #33) Jason, I would like to know your opinion about the integer RGBA clamping done in pack.c (_mesa_pack_rgba_span_from_ints()). The glReadPixels() and glGetTexImage() specs say (for example, in the OpenGL 3.3 core specification) that for an integer RGBA color, each component is clamped to the representable range of type. Those GL functions end up calling pack.c's functions for performing the conversion (_mesa_pack_rgba_span_from_ints() and others). It's possible to replace some of those pack.c conversion functions by the master conversion, but the problem is in the clamping operation. I would like to add a boolean argument called clamp to the master conversion function, which is passed to _mesa_swizzle_and_convert(), where each of its convert_{uint,int,byte,ubyte,short,ushort}() do the right thing when clamp is true. This clamp parameter would be false for every call to either the master conversion function or _mesa_swizzle_and_convert(), except when they are being called from pack.c. What do you think? Supporting clamping is probably fine. I think we determined we needed a clamp parameter anyway for some of the float conversions. I guess it makes sense for integers too. Let's watch out for performance when we implement it though. Changing the loop inside _mesa_swizzle_and_convert() without hurting performance can be tricky. The teximage-colors test in Piglit has a -benchmark flag that can be used for testing that. In the end, we did not make that change in pack.c as we could just use the autogenerated format pack/unpack functions.
However, I am retaking this topic because we found another issue which would require a similar solution: The convert_*() functions in format_utils.c convert between source and destination data and are used by _mesa_swizzle_and_convert(). We found that these were not good enough for various conversions that involved non-normalized types of different sizes: INT to SHORT, INT to BYTE, etc. Because of that, several piglit tests related to glGetTexImage() and others failed, for example bin/ext_texture_integer-getteximage-clamping. In order to fix that we added the clamp expressions for these cases [1], and with that we achieved no regressions when executing a full piglit run on the i965 driver. Unfortunately, when testing our patches on a couple of Gallium drivers, we found a regression that we tracked down to that patch: bin/arb_clear_buffer_object-formats. Reverting the patch fixes the problem with these Gallium drivers, but then bin/ext_texture_integer-getteximage-clamping fails again on i965. We wonder if there could be more cases like this that piglit is not covering, since it looks weird that this affects just this one test. So, we wonder if that patch for _mesa_swizzle_and_convert() is correct and we should fix the failing Gallium cases elsewhere, or if we should revert that patch and fix the cases it fixed in a different way. What do you think? Was _mesa_swizzle_and_convert() implemented like that on purpose, or are these real bugs? From my brief reading of the GL spec, it looks like clamping integers to the max representable range is what it expects by default. From the glTexImage spec: The selected groups are transferred to the GL as described in section 3.7.2 and then clamped to the representable range of the internal format. If the internalformat of the texture is signed or unsigned integer, components are clamped to [-2^(n-1), 2^(n-1)-1] or [0, 2^n - 1], respectively, where n is the number of bits per component.
For color component groups, if the internalformat of the texture is signed or unsigned normalized fixed-point, components are clamped to [-1, 1] or [0, 1], respectively. Therefore, it seems as if we want to be clamping when we have integer destinations. I'm not sure why the gallium drivers are regressing when you do. One more observation is that it doesn't look like your clamping patch is complete. If we're going to clamp when we have an integer destination, we should always clamp with integer destinations, not just in the few cases that piglit hits. I wouldn't be surprised if piglit's coverage in those areas is terrible. Yes, you are right about the specification. We have added clamping for all the needed conversions in
Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds
On Friday, November 07, 2014 01:52:13 PM Marek Olšák wrote: On Fri, Nov 7, 2014 at 1:10 PM, Steven Newbury st...@snewbury.org.uk wrote: On Wed, 2014-11-05 at 00:46 +, Emil Velikov wrote: On 04/11/14 22:42, Marek Olšák wrote: Hi everybody, I'm about to address this long-standing issue: The EGL state tracker is redundant. It duplicates what st/dri does and it also duplicates what the common loader egl_dri2 does, which is used by all classic drivers and even works better with gallium drivers. Let's compare EGL extensions for both backends: st/egl: EGL version string: 1.4 (Gallium) EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenVG EGL extensions string: EGL_WL_bind_wayland_display EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync EGL_KHR_surfaceless_context EGL_NOK_swap_region EGL_NV_post_sub_buffer egl_dri2: EGL version string: 1.4 (DRI2) EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenGL_ES3 EGL extensions string: EGL_MESA_drm_image EGL_MESA_configless_context EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_surfaceless_context EGL_KHR_create_context EGL_NOK_swap_region EGL_NOK_texture_from_pixmap EGL_CHROMIUM_sync_control EGL_EXT_image_dma_buf_import EGL_NV_post_sub_buffer egl_dri2 also supports MSAA on the window framebuffer (through st/dri). It's really obvious which one is better. I'm aware of 2 features that we will lose: - swrast on Wayland - I'm not sure about this. Perhaps kms-swrast has addressed this already. - OpenVG - It has never taken off. If people want this on Linux, it should use egl_dri2 and st/dri, like OpenGL does. I think it would be a shame to lose the OpenVG backend. It's been disappointing with the lack of interest on desktop systems even through cairo, but I wonder if it was because of the inherent Gallium only nature? 
Cairo's GL backend is probably more likely to work on more systems because of this. Admittedly, it was probably mostly useful as a technology on weaker CPUs and simpler GPUs without full OpenGL(ES) capability, where it could provide a performant GUI. This series removes st/egl and st/gbm support from the autoconf build (the latter depends on the former and is probably just as redundant). The next step is to remove all Linux-specific backends from st/egl. Windows, Android, and other platform backends will be kept intact, therefore st/egl won't be removed completely. Please comment. I use EGL_DRIVER=egl_gallium to switch to using the ilo driver at run-time. (Admittedly, mostly for testing more than anything useful.) There would have to be some other way of at least selecting the run-time backend to egl_dri2 for this functionality to continue working. A few thoughts: - Iirc Eric is using st/egl, as the dri2 backend seems to be causing problems :( - Android supports dri modules, but st/dri/Android.mk is missing. Should be trivial to add. - Windows - might be a pain in the a** to get it working. Then again, does Windows have EGL? - fbdev, OpenVG - the etnaviv project was using the former, currently moving to full-blown drm :) If one is to nuke the three state trackers, we can remove - the libEGL symbol export mayhem - gallium's st_api, and flatten things a bit - a bit of duplication :) In summary: With a bit of work we can remove Linux and Android from the equation. How many people/companies use EGL for Windows/fbdev, and how about OpenVG on any platform? I'm not sure we can know how many use OpenVG. Probably more than is commonly thought on this list. Then I'd like those people or companies to speak up. I seriously doubt anyone is using OpenVG with a Gallium driver. It even needs GL3 hardware (because it uses ARL in a fragment shader). That means every non-GL3 Gallium driver has incomplete OpenVG support, so OpenVG users should use these: radeon >= r600, nouveau >= nv50, ilo maybe?
That's it. I wouldn't bother with softpipe and llvmpipe -- if you have to use them, you're better off with a VG-over-GL wrapper and a good OpenGL driver/hardware combo. Marek It seems like every time we discuss this, there's been the "but we have it" argument and one or two people saying "but it would be cool". But in practice, I don't know of anybody even using OpenVG, much less OpenVG on Gallium. People use Cairo or Skia. It's very clearly not being maintained or developed. Marek showed data indicating no one has worked on it in some time. Several developers have expressed that they would like OpenVG to work with egl_dri2, if we're going to keep it at all, and no one has stepped forward to do that work. Perhaps dropping it would inspire a maintainer to
Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches
On Friday, November 07, 2014 12:01:20 PM Siavash Eliasi wrote: I know that this might sound troublesome, but since there are no benchmarks done by reviewers before pushing performance-optimization-oriented patches into the master branch, I think it's as important as piglit tests and necessary to ask the patch provider for simple OpenGL micro-benchmarks triggering the optimized code path, to see if there are any real benefits and make sure that it isn't degrading performance. Being more strict about pushing and quality assurance of these kinds of patches will save hours of bisecting and hair-pulling to find the root cause of performance degradations. Best regards, Siavash Eliasi. For performance patches, we generally require authors to include data in their commit messages showing that an application benefits by some metric. (It's useful for posterity too, so people can answer "what is all this code for?".) Honestly, I think I'm okay with our usual metrics like: - Increased FPS in a game or benchmark - Reduced number of instructions or memory accesses in a shader program - Reduced memory consumption - Significant cycle reduction in callgrind or better generated code (ideally if it's a lot of code I'd like better justification) We've had a lot of cases of people submitting +LOC with no actual metric of improvement at all, and we should absolutely reject those until /some/ data is supplied indicating why the patch is useful. Microbenchmarks can be helpful in demonstrating performance improvements. But they can also be pretty misleading - you can make a microbenchmark run a lot faster, yet not impact real applications. On the flip side, if a developer can show that a patch makes a real application (e.g. DOTA 2) run faster, I don't see any reason to require them to additionally write a microbenchmark. The patch is clearly justified, and developers are already overworked.
That said, if someone would like to start building an open source microbenchmarking suite, I could definitely see that being a valuable project. --Ken
[Mesa-dev] [PATCH v2 2/2] i965: Use the predicate enable bit for conditional rendering without stalling
Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers are not in the whitelist of available registers in the kernel driver so this needs a kernel patch to work. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. --- Glenn Kennard suggested that instead of trying to send a load register command to detect whether the predicate source registers can be set we could just increase the command parser version in the kernel driver and query that. That seems nicer to me so here is a second version of the patch to do that. I will post a v2 of the kernel patch too. 
src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 + src/mesa/drivers/dri/i965/brw_context.c| 4 + src/mesa/drivers/dri/i965/brw_context.h| 21 +++ src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/brw_draw.c | 16 +- src/mesa/drivers/dri/i965/brw_queryobj.c | 17 ++- src/mesa/drivers/dri/i965/intel_extensions.c | 5 + src/mesa/drivers/dri/i965/intel_reg.h | 23 +++ 9 files changed, 243 insertions(+), 12 deletions(-) create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 711aabe..0adaf4d 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -38,6 +38,7 @@ i965_FILES = \ brw_clip_tri.c \ brw_clip_unfilled.c \ brw_clip_util.c \ + brw_conditional_render.c \ brw_context.c \ brw_cubemap_normalize.cpp \ brw_curbe.c \ diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c new file mode 100644 index 000..e676aa0 --- /dev/null +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c @@ -0,0 +1,167 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software.
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *    Neil Roberts n...@linux.intel.com + */ + +/** @file brw_conditional_render.c + * + * Support for conditional rendering based on query objects + * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+. + */ + +#include "main/imports.h" +#include "main/condrender.h" + +#include "brw_context.h" +#include "brw_defines.h" +#include "intel_batchbuffer.h" + +static void +set_predicate_enable(struct brw_context *brw, + bool value) +{ +
[Mesa-dev] [PATCH 1/2] i965: Store the command parser version number in intel_screen
In order to detect whether the predicate source registers can be used in a later patch we will need to know the version number for the command parser. This patch just adds a member to intel_screen and does an ioctl to get the version. --- src/mesa/drivers/dri/i965/intel_screen.c | 7 +++ src/mesa/drivers/dri/i965/intel_screen.h | 8 +++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 6618c1a..ab7ac85 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1362,6 +1362,13 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp) (ret != -1 || errno != EINVAL); } + struct drm_i915_getparam getparam; + getparam.param = I915_PARAM_CMD_PARSER_VERSION; + getparam.value = &intelScreen->cmd_parser_version; + const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam); + if (ret == -1) + intelScreen->cmd_parser_version = 0; + psp->extensions = !intelScreen->has_context_reset_notification ? intelScreenExtensions : intelRobustScreenExtensions; diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h index 393315e..a7706b3 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.h +++ b/src/mesa/drivers/dri/i965/intel_screen.h @@ -115,7 +115,13 @@ struct intel_screen * Configuration cache with default values for all contexts */ driOptionCache optionCache; -}; + + /** +* Version of the command parser reported by the +* I915_PARAM_CMD_PARSER_VERSION parameter +*/ + int cmd_parser_version; +}; extern void intelDestroyContext(__DRIcontext * driContextPriv); -- 1.9.3
Re: [Mesa-dev] [PATCH 1/2] i965: Store the command parser version number in intel_screen
On Friday, November 07, 2014 06:52:59 PM Neil Roberts wrote: In order to detect whether the predicate source registers can be used in a later patch we will need to know the version number for the command parser. This patch just adds a member to intel_screen and does an ioctl to get the version. [...] Patch 1 is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org We'll clearly want this eventually. Whether or not we can actually use the command parser is unclear - at present, the kernel leaves the hardware command validator enabled even if you also validate in software.
It looks like Brad sent out new patches that might fix this, though, so assuming those land, maybe we'll be good to go... :) --Ken
Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs
Pekka Paalanen ppaala...@gmail.com writes: On Thu, 06 Nov 2014 13:01:03 -0800 Ian Romanick i...@freedesktop.org wrote: I thought Eric and Chad already NAKed it in bugzilla. The problem is that applications ask for an RGBA visual for GL blending. They use the alpha channel to generate their images, but the final alpha values are, basically, random... and the composited result would be pure garbage. Reading https://bugs.freedesktop.org/show_bug.cgi?id=67676#c5 We should certainly be exposing EGLConfigs that match up to the rgba visual, though, so you can find it when you try. - Eric To me that sounds like Eric would accept having the visuals there in additional configs (as long as they are sorted after the otherwise equivalent xRGB configs?). Eric, would you like to confirm your current opinion? What I believe we want: Somebody just requesting RGBA with ChooseConfig doesn't end up forced into the depth-32 (blending) X visual. This is the most important. There is some mechanism for somebody that does want the depth-32 visual to get an EGL config to draw to it. This is important but secondary to not breaking everything else, and them having to jump through hoops is reasonable but probably avoidable.
Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches
On Fri, Nov 7, 2014 at 12:31 AM, Siavash Eliasi siavashser...@gmail.com wrote: Being more strict about pushing and quality assurance of these kind of patches will save hours of bisecting and hair-pulling to find the root cause of performance degrades. You say this as if it has happened...?
[Mesa-dev] [PATCH] util/format: Fix clamping to 32bit integers.
From: José Fonseca jfons...@vmware.com

Use clamping constants that guarantee no integer overflows. As spotted by Chris Forbes.

This causes the code to change as:

- value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
+ value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);

- value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f));
+ value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483520.0f));

---
 src/gallium/auxiliary/util/u_format_pack.py | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/src/gallium/auxiliary/util/u_format_pack.py b/src/gallium/auxiliary/util/u_format_pack.py
index 90f348e..d5138cc 100644
--- a/src/gallium/auxiliary/util/u_format_pack.py
+++ b/src/gallium/auxiliary/util/u_format_pack.py
@@ -207,9 +207,36 @@ def get_one_shift(type):
     assert False
 
 
+def truncate_mantissa(x, bits):
+    '''Truncate an integer so it can be represented exactly with a floating
+    point mantissa'''
+
+    assert isinstance(x, (int, long))
+
+    s = 1
+    if x < 0:
+        s = -1
+        x = -x
+
+    # We can represent integers up to mantissa + 1 bits exactly
+    mask = (1 << (bits + 1)) - 1
+
+    # Slide the mask until the MSB matches
+    shift = 0
+    while (x >> shift) & ~mask:
+        shift += 1
+
+    x &= mask << shift
+    x *= s
+    return x
+
+
 def value_to_native(type, value):
     '''Get the value of unity for this type.'''
     if type.type == FLOAT:
+        if type.size <= 32 \
+           and isinstance(value, (int, long)):
+            return truncate_mantissa(value, 23)
         return value
     if type.type == FIXED:
         return int(value * (1 << (type.size/2)))
-- 
1.9.1
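The mantissa-truncation idea in the patch can be checked concretely. Below is a self-contained C rendition of the same algorithm — an illustrative sketch, not Mesa code — clearing low-order bits so the value fits exactly in a float's 24 significant bits (23 explicit mantissa bits).

```c
#include <assert.h>
#include <stdint.h>

/* C rendition of the patch's Python truncate_mantissa(): clear low-order
 * bits so the value is exactly representable in a float with the given
 * mantissa width. */
static int64_t truncate_mantissa(int64_t x, int bits)
{
   int64_t s = 1;
   if (x < 0) { s = -1; x = -x; }

   /* Integers up to mantissa + 1 bits are exact. */
   int64_t mask = (1ll << (bits + 1)) - 1;

   /* Slide the mask up until it covers the most significant set bit. */
   int shift = 0;
   while ((x >> shift) & ~mask)
      shift++;

   return s * (x & (mask << shift));
}
```

Running it on the two constants from the commit message reproduces exactly the values the generated code changed to: 4294967295 truncates to 4294967040, and 2147483647 to 2147483520.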
Re: [Mesa-dev] [PATCH] util/format: Fix clamping to 32bit integers.
Looks good to me though I barely understand the code there... I guess ideally you'd generate code which actually clamps to the max int value representable instead, but that probably would get even more messy (something like: val = MAX(src[0], -2147483648.0f); if (src[0] >= 2147483648.0f) val = 2147483647)

Reviewed-by: Roland Scheidegger srol...@vmware.com

Am 07.11.2014 um 22:25 schrieb jfons...@vmware.com:
> From: José Fonseca jfons...@vmware.com
> Use clamping constants that guarantee no integer overflows. As spotted by Chris Forbes.
> [quoted patch snipped]
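Why the old constants were unsafe can be demonstrated in a few lines: 2147483647.0f does not exist as a float — the literal rounds up to 2147483648.0f (2^31) — so casting the clamped result to int32_t could overflow. The truncated constant 2147483520.0f (2^31 − 128) is exactly representable, so the cast is always in range. The `CLAMP` macro and `pack_s32` helper below are illustrative, not the generated Mesa code.

```c
#include <assert.h>
#include <stdint.h>

/* A minimal CLAMP, in the spirit of the one the generated code uses. */
#define CLAMP(x, lo, hi) ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

/* Pack a float into a signed 32-bit integer using the corrected (exactly
 * representable) upper clamp constant, so the cast never overflows. */
static int32_t pack_s32(float v)
{
   return (int32_t)CLAMP(v, -2147483648.0f, 2147483520.0f);
}
```

Note the lower bound is fine either way: −2147483648.0f is exactly −2^31 and converts to INT32_MIN without overflow.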
Re: [Mesa-dev] [PATCH] util/format: Fix clamping to 32bit integers.
> I guess ideally you'd generate code which actually clamps to the max int value representable instead

Yes, that would be more complex to generate.

Jose

From: Roland Scheidegger
Sent: 07 November 2014 21:40
To: Jose Fonseca; mesa-dev@lists.freedesktop.org; chr...@ijw.co.nz
Subject: Re: [PATCH] util/format: Fix clamping to 32bit integers.

[quoted review and patch snipped]
[Mesa-dev] [Bug 76252] Dynamic loading/unloading of opengl32.dll results in a deadlock
https://bugs.freedesktop.org/show_bug.cgi?id=76252

José Fonseca jfons...@vmware.com changed:

    What      |Removed |Added
    Status    |NEW     |RESOLVED
    Resolution|---     |FIXED

--- Comment #9 from José Fonseca jfons...@vmware.com ---
Fixed with http://cgit.freedesktop.org/mesa/mesa/commit/?id=706ad3b649e6a75fdac9dc9acc3caa9e6067b853

-- You are receiving this mail because: You are the assignee for the bug.
[Mesa-dev] [Bug 86025] src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
https://bugs.freedesktop.org/show_bug.cgi?id=86025

Bug ID: 86025
Summary: src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Windows (All)
Status: NEW
Keywords: bisected, regression
Severity: blocker
Priority: medium
Component: Mesa core
Assignee: mesa-dev@lists.freedesktop.org
Reporter: v...@freedesktop.org
CC: ja...@jlekstrand.net, matts...@gmail.com

mesa: 0c36aac83252d32dac8c4da2850539bff0b10301 (master 10.4.0-devel)

Compiling src\mesa\main\shaderapi.c ...
shaderapi.c
src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
src\glsl\list.h(535) : error C2143: syntax error : missing ')' before 'type'
src\glsl\list.h(536) : error C2065: 'node' : undeclared identifier

commit 0c36aac83252d32dac8c4da2850539bff0b10301
Author: Jason Ekstrand jason.ekstr...@intel.com
Date: Wed Nov 5 13:57:09 2014 -0800

    glsl/list: Add an exec_list_validate function

    This can be very useful for trying to debug list corruptions.

    Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
    Reviewed-by: Matt Turner matts...@gmail.com

-- You are receiving this mail because: You are the assignee for the bug.
[Mesa-dev] [Bug 86025] src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
https://bugs.freedesktop.org/show_bug.cgi?id=86025

Vinson Lee v...@freedesktop.org changed:

    What      |Removed |Added
    Status    |NEW     |RESOLVED
    Resolution|---     |FIXED

--- Comment #1 from Vinson Lee v...@freedesktop.org ---
commit f9fc3949e182cb2da8ce6d8d71f441986795664f
Author: Vinson Lee v...@freedesktop.org
Date: Fri Nov 7 15:33:41 2014 -0800

    glsl/list: Move declaration before code.

    Fixes MSVC build error.

    shaderapi.c
    src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
    src\glsl\list.h(535) : error C2143: syntax error : missing ')' before 'type'
    src\glsl\list.h(536) : error C2065: 'node' : undeclared identifier

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86025
    Signed-off-by: Vinson Lee v...@freedesktop.org

-- You are receiving this mail because: You are the assignee for the bug.
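For readers unfamiliar with this MSVC failure mode: MSVC compiles .c files as C89/C90, where declarations must precede all statements in a block, so a declaration introduced mid-block breaks the build with exactly the C2143 errors above. A hypothetical example of the pattern the fix applies (not the actual list.h code):

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; int value; };

/* C89-friendly: every declaration comes before the first statement of the
 * block; "struct node *n" declared in the for-init would be rejected. */
static int count_list(struct node *head)
{
   int count = 0;
   struct node *n;

   for (n = head; n != NULL; n = n->next)
      count++;
   return count;
}
```

This is why Mesa headers shared with MSVC-compiled C files keep the declare-then-use layout even where C99 would allow mixing.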
Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)
Ian Romanick i...@freedesktop.org writes: On 11/06/2014 06:16 PM, Michel Dänzer wrote: On 06.11.2014 19:18, Joonas Lahtinen wrote: On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote: On 05.11.2014 20:14, Joonas Lahtinen wrote: Modified not refer to DRI3, just uses the present extension to get rid of the excess buffer invalidations. AFAICT there's no fallback from your changes to the current behaviour if the X server doesn't support the Present extension. There probably needs to be such a fallback. It gets rid of such nasty hack (the intel_viewport one), that I thought there is no point making fallback. Because without this, the egl dri2 backend is fundamentally broken anyway. Well, AFAICT your code uses Present extension functionality unconditionally, without checking that the X server supports Present. I can't see how that could possibly work on an X server which doesn't support Present, but I think it would be better to keep it working at least as badly as it does now in that case. :) I was going to say pretty much the same thing. Aren't there (non-Intel) drivers that don't do Present? If I'm not mistaken, some parts of DRI3 (not sure about Present) are even disabled in the Intel driver when SNA is in use... or at least that was the case at one point. They actually get a fallback implementation if there's no driver support, which would be sufficient for this code. However, Present is too new for Mesa to be unconditionally relying on in my opinion. pgpD4o39VLXLa.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro
---
 src/glsl/list.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/glsl/list.h b/src/glsl/list.h
index 1d18ec9..1fb9178 100644
--- a/src/glsl/list.h
+++ b/src/glsl/list.h
@@ -673,4 +673,13 @@ inline void exec_node::insert_before(exec_list *before)
         __node = __next, __next =                                    \
            exec_node_data(__type, (__next)->__field.next, __field))
 
+#define foreach_list_typed_safe_reverse(__type, __node, __field, __list)  \
+   for (__type * __node =                                                 \
+           exec_node_data(__type, (__list)->tail_pred, __field),          \
+        * __prev =                                                        \
+           exec_node_data(__type, (__node)->__field.prev, __field);       \
+        __prev != NULL;                                                   \
+        __node = __prev, __prev =                                         \
+           exec_node_data(__type, (__prev)->__field.prev, __field))
+
 #endif /* LIST_CONTAINER_H */
-- 
2.1.0
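To see why the "safe" variant caches the predecessor before the body runs, here is a self-contained toy version: a simplified sentinel-based list standing in for exec_list (names and layout are illustrative, not the glsl/list.h types). Because `__prev` is saved up front, the body may unlink and free the current node without losing the walk's place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node { struct node *prev, *next; int value; };
struct list { struct node head, tail; };  /* sentinel-based, like exec_list */

static void list_init(struct list *l)
{
   l->head.prev = NULL;   l->head.next = &l->tail;
   l->tail.prev = &l->head;  l->tail.next = NULL;
}

static void list_push(struct list *l, struct node *n)
{
   n->prev = l->tail.prev;  n->next = &l->tail;
   l->tail.prev->next = n;  l->tail.prev = n;
}

/* Same shape as foreach_list_typed_safe_reverse: the head sentinel's prev
 * is NULL, which terminates the loop, and __prev is cached so the current
 * node can be freed mid-iteration. */
#define foreach_safe_reverse(__node, __list)                  \
   for (struct node *__node = (__list)->tail.prev,            \
                    *__prev = __node->prev;                   \
        __prev != NULL;                                       \
        __node = __prev, __prev = __prev->prev)

/* Sum all values, unlinking and freeing the odd-valued nodes as we go. */
static int sum_and_free_odd(struct list *l)
{
   int sum = 0;
   foreach_safe_reverse(n, l) {
      sum += n->value;
      if (n->value % 2) {
         n->prev->next = n->next;
         n->next->prev = n->prev;
         free(n);          /* safe: the loop already saved n->prev */
      }
   }
   return sum;
}
```

A plain reverse walk would read `n->prev` after `free(n)` here; caching the predecessor is the entire point of the `_safe` macros.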
Re: [Mesa-dev] [PATCH WIP 1/1] configure: include llvm systemlibs when using static llvm
On Wed, 2014-10-29 at 19:00 +, Emil Velikov wrote: On 27/10/14 21:03, Jan Vesely wrote: On Mon, 2014-10-27 at 20:22 +, Emil Velikov wrote: On 27/10/14 18:05, Jan Vesely wrote: On Mon, 2014-10-27 at 02:24 +, Emil Velikov wrote: On 26/10/14 19:36, Jan Vesely wrote: On Fri, 2014-10-24 at 23:54 +, Emil Velikov wrote: On 24/10/14 17:03, Jan Vesely wrote: -Wl,--exclude-libs prevents automatic export of symbols CC: Kai Wasserbach k...@dev.carbon-project.org CC: Emil Velikov emil.l.veli...@gmail.com Signed-off-by: Jan Vesely jan.ves...@rutgers.edu --- Kai, can you try this patch with your setup, and check whether LLVM symbols are exported from mesa library? (and it's still working) Emil, would it help to have --exclude-libs ALL enabled globally? Haven't really looked up on the documentation about it, yet there should be no (unneeded) exported symbols thanks to the version scripts. As such I'm not entirely sure what this patch (attempts to) resolve :( you are right. I don't know why I thought it was still a problem. In that case the attached patch should fix compiling with llvm static libs (#70410) For future patches please add the full link in the commit message Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70410 Afaics the bug mentioned has a slightly different patch, which brings the question - do we need to append to the existing LLVM_LIBS, or can we overwrite them ? Either way it would be nice if we can get a tested-by or two :) Hi, I looked at this again. LLVM cmake links system-libs as either PUBLIC or INTERFACE [0]. it means my patch is incorrect, and we should link against system-libs even if we use llvm-shared-libs. you can add my R-b to the patch by K. Sobiecky that is attached to the bug.[1] Sigh... cmake why you so PoS ? On a more mature note: I do not see why would we need it to link against those libraries for shared linking. If their libs are broken (have unresolved symbols), and we need this hack to get them working then maybe, but ... 
looking at line 151 - # FIXME: Should this be really PUBLIC? Answer: PRIVATE for shared libs, PUBLIC for static ones. Using PUBLIC causes all the users to recursively link against those deps. Leading to over-linking and opening the door for serious issues. looks like misdesign on llvm side. they use public to bring systemlibs to other llvm libs, while ignore private deps elsewhere (like libffi). I'd try to send a patch, but my last attempt to get Alexander's rtti patch in is stuck for 9 months... Yes, dealing with build systems is a job no-one wants to do, and I fear most people underestimate it. Please let them know and be persistent, otherwise they might end up abusing it too much :) just FYI the llvm cmake part got fixed in r221530 and r221531. anyway, since we only need those libs in static builds, do you want me to repost the patch with bug reference and Kai's tested-by? There is no problem. I've picked it up, added the tags + cc'd mesa-stable, as I would expect a few more people willing to use mesa with static linked llvm. Thanks Emil jan -Emil P.S. Both their automake + cmake builds seem _quite_ bad. autoconf/Readme has a nice documentation of it :) jan [0] lib/Support/CMakeLists.txt:150 [1] https://bugs.freedesktop.org/attachment.cgi?id=91764 Thanks Emil jan -Emil jan

 configure.ac | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 3c76deb..b4b4b13 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1981,7 +1981,15 @@ if test "x$MESA_LLVM" != x0; then
             dnl already added all of these objects to LLVM_LIBS.
         fi
     else
-        AC_MSG_WARN([Building mesa with staticly linked LLVM may cause compilation issues])
+        AC_MSG_WARN([Building mesa with statically linked LLVM may cause compilation issues])
+        dnl Don't export symbols automatically
+        dnl TODO: Do we want to list llvm libs explicitly here?
+        LLVM_LDFLAGS="$LLVM_LDFLAGS -Wl,--exclude-libs,ALL"
+        dnl We need to link to llvm system libs when using static libs
+        dnl However, only llvm 3.5+ provides --system-libs
+        if test $LLVM_VERSION_MAJOR -eq 3 -a $LLVM_VERSION_MINOR -ge 5; then
+            LLVM_LIBS="$LLVM_LIBS `$LLVM_CONFIG --system-libs`"
+        fi
     fi
fi

--
Jan Vesely jan.ves...@rutgers.edu ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches
Sounds okay to me, thanks! Best regards, Siavash Eliasi.
Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches
On 11/07/2014 10:56 PM, Matt Turner wrote: On Fri, Nov 7, 2014 at 12:31 AM, Siavash Eliasi siavashser...@gmail.com wrote: Being more strict about pushing and quality assurance of these kind of patches will save hours of bisecting and hair-pulling to find the root cause of performance degrades. You say this as if it has happened...? Nope, I'm not aware of any. Just saying that it's better to stay safe. Because mesa has reached a pretty much stable and usable state for gaming and daily tasks, we will see more of these patches in upcoming years. Best regards, Siavash Eliasi.
Re: [Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro
FWIW, Ian commented on my patch to add foreach_list_typed_reverse that we shouldn't merge it until it had users so people wouldn't try to remove it -- it seems like this has the same issue. So either merge both of them at the same time, or neither until NIR gets merged...

Connor

On Fri, Nov 7, 2014 at 9:32 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
> [quoted patch snipped]
Re: [Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro
On Fri, Nov 7, 2014 at 8:57 PM, Connor Abbott cwabbo...@gmail.com wrote:
> FWIW, Ian commented on my patch to add foreach_list_typed_reverse that we shouldn't merge it until it had users so people wouldn't try to remove it -- it seems like this has the same issue. So either merge both of them at the same time, or neither until NIR gets merged...

Fair enough. This at least gets it on the list for review.

> On Fri, Nov 7, 2014 at 9:32 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
> [quoted patch snipped]
Re: [Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro
Reviewed-by: Matt Turner matts...@gmail.com
[Mesa-dev] [PATCH] mesa: Permanently enable features supported by target CPU at compile time.
This will remove the need for unnecessary runtime checks for CPU features if already supported by target CPU, resulting in smaller and less branchy code.
---
 src/mesa/x86/common_x86_features.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h
index 66f2cf6..9888c26 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -67,5 +67,30 @@
 #define cpu_has_3dnowext (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
 #define cpu_has_sse4_1   (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 
+/* Permanently enable features supported by target CPU at compile time */
+#ifdef __MMX__
+#define cpu_has_mmx 1
+#endif
+
+#ifdef __SSE__
+#define cpu_has_xmm 1
+#endif
+
+#ifdef __SSE2__
+#define cpu_has_xmm2 1
+#endif
+
+#ifdef __3dNOW__
+#define cpu_has_3dnow 1
+#endif
+
+#ifdef __SSSE3__
+#define cpu_has_ssse3 1
+#endif
+
+#ifdef __SSE4_1__
+#define cpu_has_sse4_1 1
+#endif
+
 #endif
-- 
2.1.3
Re: [Mesa-dev] [PATCH] mesa: Permanently enable features supported by target CPU at compile time.
On Fri, Nov 7, 2014 at 10:43 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> This will remove the need for unnecessary runtime checks for CPU features if already supported by target CPU, resulting in smaller and less branchy code.
> [patch context snipped]
> +#ifdef __SSSE3__
> +#define cpu_has_ssse3 1
> +#endif

There's not an existing cpu_has_ssse3 macro.

> +#ifdef __SSE4_1__
> +#define cpu_has_sse4_1 1
> +#endif

As you can see at the beginning of the patch, you're just redefining the same macros... They should be

#ifdef __SSE__
#define cpu_has_xmm 1
#else
#define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
#endif
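Matt's suggested `#ifdef`/`#else` shape can be sketched in isolation. The feature word and bit value below are stand-ins for Mesa's `_mesa_x86_cpu_features` and `X86_FEATURE_XMM`; the point is that the `#else` branch keeps a single definition per macro, so the compile-time constant replaces the runtime bit-test instead of conflicting with it.

```c
#include <assert.h>

/* Hypothetical feature word and bit, standing in for the Mesa globals. */
#define X86_FEATURE_XMM (1 << 1)
int _mesa_x86_cpu_features = X86_FEATURE_XMM;

/* Compile-time constant when the compiler already targets SSE,
 * runtime bit-test otherwise -- never both definitions at once. */
#ifdef __SSE__
#define cpu_has_xmm 1
#else
#define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
#endif
```

When `cpu_has_xmm` expands to the literal `1`, any `if (cpu_has_xmm)` dispatch collapses to the SSE path and the other branch is dead-code-eliminated, which is the "smaller and less branchy code" the patch is after.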
Re: [Mesa-dev] [PATCH] mesa: Permanently enable features supported by target CPU at compile time.
On 11/08/2014 10:30 AM, Matt Turner wrote:
> On Fri, Nov 7, 2014 at 10:43 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> [quoted patch snipped]
>
> There's not an existing cpu_has_ssse3 macro.

Just wanted to add it in advance in case Timothy Arceri's patch gets merged: [Mesa-dev] [PATCH V2 1/2] mesa: add runtime support for SSSE3 http://lists.freedesktop.org/archives/mesa-dev/2014-November/070338.html I'll remove it if this isn't fine by you.

> As you can see at the beginning of the patch, you're just redefining the same macros... They should be
>
> #ifdef __SSE__
> #define cpu_has_xmm 1
> #else
> #define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
> #endif

Sure, will modify it. By the way, GCC should use the last macro definition, so cpu_has_xmm should be 1 in case it's going to be compiled for an SSE capable target, and the previous definition should be ignored. Am I missing something?

Best regards, Siavash Eliasi.