Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

2014-11-07 Thread Pekka Paalanen
On Thu, 06 Nov 2014 13:01:03 -0800
Ian Romanick i...@freedesktop.org wrote:

 I thought Eric and Chad already NAKed it in bugzilla.  The problem is
 that applications ask for an RGBA visual for GL blending.  They use the
 alpha channel to generate their images, but the final alpha values are,
 basically, random... and the composited result would be pure garbage.

Reading
https://bugs.freedesktop.org/show_bug.cgi?id=67676#c5
"We should certainly be exposing EGLConfigs that match up to the rgba
visual, though, so you can find it when you try." - Eric

To me that sounds like Eric would accept having the visuals there
in additional configs (as long as they are sorted after the otherwise
equivalent xRGB configs?). Eric, would you like to confirm your current
opinion?

 As Chad points out in comment #1, EGL just doesn't let applications do
 the thing the patch is trying to do.

True in general, though not exactly for X11. Surely the native visual is
exposed in EGL configs for some reason? (OK, yeah, not a good argument
where EGL is concerned.)

For the record, if a Wayland compositor uses the opaque region hint in
Wayland core protocol, we do not see this problem on Wayland at all. In
Wayland, so far the implementations always assume that if the alpha
channel exists in the buffer, it will be used for window blending,
except on regions marked as opaque. So we kind of already have a way to
make it work in Wayland, and just X11 is broken here.
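
For illustration, the Wayland-side mechanism looks roughly like this from a
client's point of view (my own sketch using the core wl_surface API, nothing
from the patch; 'compositor', 'surface', 'width' and 'height' are assumed to
already exist):

#include <wayland-client.h>

/* A client that renders with an alpha channel but does not want the
 * compositor to blend the window can declare the whole surface opaque. */
static void
mark_surface_opaque(struct wl_compositor *compositor,
                    struct wl_surface *surface,
                    int32_t width, int32_t height)
{
   struct wl_region *region = wl_compositor_create_region(compositor);
   wl_region_add(region, 0, 0, width, height);
   wl_surface_set_opaque_region(surface, region);
   wl_region_destroy(region);
}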

Still, I won't even try to deny that an extension to
EGL_TRANSPARENT_TYPE would be hugely useful and the superior solution
over all the hacking. I do understand you would rather see that
developed than a native visual based solution.

More below...

 On 11/06/2014 05:12 AM, Emil Velikov wrote:
  Humble ping x2
  
  On 14/10/14 15:25, Emil Velikov wrote:
  Humble ping.
 
  On 23/09/14 01:25, Emil Velikov wrote:
  From: Sjoerd Simons sjoerd.sim...@collabora.co.uk
 
  When using RGBA EGLConfigs, allow both RGB and RGBA X visuals, such that
  applications can decide whether they want to use RGBA (and have the
  compositor blend their windows).
 
  On my system with this change, EGLConfigs with a 24-bit visual come up
  first, so applications blindly picking the first EGLConfig will still
  get an RGB X visual.
 
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
  ---
 
  Hello gents,
 
  This patch has been stuck in bugzilla since February this year. Bringing 
  it around here to gather greater exposure and perhaps some 
  comments/reviews.
 
  -Emil
 
   src/egl/drivers/dri2/egl_dri2.c |  5 +
   src/egl/drivers/dri2/platform_x11.c | 17 +
   2 files changed, 14 insertions(+), 8 deletions(-)
 
  diff --git a/src/egl/drivers/dri2/egl_dri2.c 
  b/src/egl/drivers/dri2/egl_dri2.c
  index 20a7243..2ed90a7 100644
  --- a/src/egl/drivers/dri2/egl_dri2.c
  +++ b/src/egl/drivers/dri2/egl_dri2.c
  @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
   static EGLBoolean
   dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
   {
  +
  +   if (criteria->NativeVisualID != EGL_DONT_CARE &&
  +       conf->NativeVisualID != criteria->NativeVisualID)
  +      return EGL_FALSE;

Does this directly affect the behaviour of eglChooseConfig?

EGL 1.4 spec quite clearly says that NATIVE_VISUAL_ID is ignored if
specified as matching criterion to eglChooseConfig, which is why
apps are expected to match it manually rather than via eglChooseConfig.

Doesn't that mean that apps that want alpha blending of the window
already check the native visual? Maybe not if it worked by accident...
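
For illustration, the manual matching the spec expects looks roughly like
this (my own sketch, not code from the patch; the caller is assumed to
already know which X visual ID it wants, e.g. the 32-bit ARGB one):

#include <EGL/egl.h>

/* eglChooseConfig ignores EGL_NATIVE_VISUAL_ID as a matching criterion,
 * so filter the returned configs by hand. */
static EGLConfig
choose_config_for_visual(EGLDisplay dpy, EGLint wanted_visual_id)
{
   static const EGLint attribs[] = {
      EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
      EGL_ALPHA_SIZE, 8,
      EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
      EGL_NONE
   };
   EGLConfig configs[64];
   EGLint i, count = 0;

   if (!eglChooseConfig(dpy, attribs, configs, 64, &count))
      return NULL;

   for (i = 0; i < count; i++) {
      EGLint vid = 0;
      if (eglGetConfigAttrib(dpy, configs[i], EGL_NATIVE_VISUAL_ID, &vid) &&
          vid == wanted_visual_id)
         return configs[i];
   }
   return NULL; /* no config advertises the wanted visual */
}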

Now, this magical alpha-blending visual thing used to work somehow in the
past for a long time, didn't it? Until someone noticed a performance
problem (fd.o #59783), not even a correctness problem, or are there
other bugs about correctness too? The patch that caused bug #59783 to
be reported was an intel-driver specific commit, and then #59783 got
fixed by a hardware agnostic change elsewhere, causing some existing
apps to produce unwanted results rather than just affect performance,
leading to fd.o #67676.

To avoid breaking most existing apps that need alpha for rendering but
do not intend the alpha channel to go to window-system blending, we only
need to make sure the configs with an ARGB visual get sorted after the
equivalent xRGB-visual configs, right? Reading the EGL 1.4 spec, that
seems perfectly valid.

I don't know if this patch guarantees the config ordering though.


Thanks,
pq

  +
  if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
 return EGL_FALSE;
   
  diff --git a/src/egl/drivers/dri2/platform_x11.c 
  b/src/egl/drivers/dri2/platform_x11.c
  index a7a7338..3395fb7 100644
  --- a/src/egl/drivers/dri2/platform_x11.c
  +++ b/src/egl/drivers/dri2/platform_x11.c
  @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct 
  dri2_egl_display *dri2_dpy,
dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,

[Mesa-dev] Require micro-benchmarks for performance optimization oriented patches

2014-11-07 Thread Siavash Eliasi
I know that this might sound troublesome, but since reviewers do no 
benchmarking before pushing performance-optimization oriented patches into 
the master branch, I think it's as important as piglit tests, and necessary, 
to ask the patch author for a simple OpenGL micro-benchmark triggering the 
optimized code path (something as small as the sketch below would do), to see 
whether there is any real benefit and to make sure it isn't degrading 
performance.
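
A minimal sketch of what I mean, assuming a window and GL context have
already been created with whatever toolkit the author prefers, and that the
relevant buffers are already bound (the names below are mine, nothing from
an actual patch):

#include <time.h>
#include <stdio.h>
#include <GL/gl.h>

/* Time the code path the patch touches -- here glDrawElements -- before
 * and after the change. */
static void
bench_draw_elements(GLsizei index_count, int iterations)
{
   struct timespec t0, t1;

   clock_gettime(CLOCK_MONOTONIC, &t0);
   for (int i = 0; i < iterations; i++)
      glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, NULL);
   glFinish(); /* make sure the GPU work is actually done before stopping */
   clock_gettime(CLOCK_MONOTONIC, &t1);

   double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
   printf("%d draws of %d indices: %.3f ms/draw\n",
          iterations, index_count, secs * 1000.0 / iterations);
}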


Being more strict about pushing and quality assurance of these kinds of 
patches will save hours of bisecting and hair-pulling to find the root 
cause of performance regressions.


Best regards,
Siavash Eliasi.




Re: [Mesa-dev] [Mesa-announce] Mesa 10.3 release candidate 1

2014-11-07 Thread Matt Turner
On Fri, Nov 7, 2014 at 1:07 AM, Thierry Vignaud
thierry.vign...@gmail.com wrote:
 On 5 November 2014 04:44, Matt Turner matts...@gmail.com wrote:
I tried to reproduce this today and couldn't.

 (...)

 Thanks. Maybe you could give a little more information, like an error
 message or something?

 Same error as Thierry reported in this thread in August:

 Unfortunately Thierry's was from a re-run of make, so it wasn't useful.

 No, it wasn't a re-run!
 It was a clean build in our build system with make -jXX, with XX automatically
 set to the number of cores, and it is always reproducible given enough cores

Oh, weird.

 I've gone over this all and can't spot the problem. The dependencies
 look fine. I tried automake-1.13 and 1.14, and make-3.82 and 4.0.
 Maybe I'll have more luck on a 40 core system.

 As already explained, in order to be able to reproduce, you must either have
 a large system or force make -j to a high value (eg: -j24)

Did you see the rest of the thread where I said I couldn't reproduce
on a 40 core system?

Perhaps someone who can reproduce could try to take a look?


[Mesa-dev] [PATCH 1/2] clover: fix tab/space and alignment

2014-11-07 Thread EdB
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 3a4fcf0..d29f5a6 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -214,7 +214,7 @@ namespace {
 
 #if HAVE_LLVM >= 0x0306
   c.getPreprocessorOpts().addRemappedFile(name,
-  
llvm::MemoryBuffer::getMemBuffer(source).release());
+  
llvm::MemoryBuffer::getMemBuffer(source).release());
 #else
   c.getPreprocessorOpts().addRemappedFile(name,
   
llvm::MemoryBuffer::getMemBuffer(source));
@@ -675,7 +675,7 @@ static const struct debug_named_value debug_options[] = {
   {"llvm", DBG_LLVM, "Dump the generated LLVM IR for all kernels."},
   {"asm", DBG_ASM, "Dump kernel assembly code for targets specifying "
    "PIPE_SHADER_IR_NATIVE"},
-   DEBUG_NAMED_VALUE_END // must be last
+   DEBUG_NAMED_VALUE_END // must be last
 };
 
 module
@@ -737,7 +737,7 @@ clover::compile_program_llvm(const compat::string source,
  break;
   case PIPE_SHADER_IR_NATIVE: {
   std::vector<char> code = compile_native(mod, triple, processor,
-debug_flags & DBG_ASM, r_log);
+ debug_flags & DBG_ASM, r_log);
  m = build_module_native(code, mod, kernels, address_spaces, r_log);
  break;
   }
-- 
1.9.3



[Mesa-dev] [PATCH 0/2] clover CL_INVALID_COMPILER_OPTIONS

2014-11-07 Thread EdB
When I added clCompileProgram, I forgot to make the change to
trigger CL_INVALID_COMPILER_OPTIONS instead of CL_INVALID_BUILD_OPTIONS
when invalid options are passed to the compiler.
This was caught by the yet-to-be-reviewed Piglit test.

The first patch removes some whitespace inconsistencies in
llvm/invocation.cpp. It could have been a stand-alone patch.
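
For illustration, a minimal host-side check of the behaviour described above
(my own sketch, not the actual Piglit test; 'prog' is assumed to be a valid
cl_program with source attached and 'dev' one of its devices):

#include <CL/cl.h>
#include <assert.h>

static void
check_invalid_options(cl_program prog, cl_device_id dev)
{
   cl_int err;

   /* An unknown option should now make clCompileProgram fail with
    * CL_INVALID_COMPILER_OPTIONS... */
   err = clCompileProgram(prog, 1, &dev, "-totally-bogus-option",
                          0, NULL, NULL, NULL, NULL);
   assert(err == CL_INVALID_COMPILER_OPTIONS);

   /* ...while clBuildProgram keeps returning CL_INVALID_BUILD_OPTIONS. */
   err = clBuildProgram(prog, 1, &dev, "-totally-bogus-option", NULL, NULL);
   assert(err == CL_INVALID_BUILD_OPTIONS);
}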

EdB (2):
  clover: fix tab/space and alignment
  clover: clCompileProgram CL_INVALID_COMPILER_OPTIONS

 src/gallium/state_trackers/clover/api/program.cpp | 2 ++
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 8 
 2 files changed, 6 insertions(+), 4 deletions(-)

-- 
1.9.3



[Mesa-dev] [PATCH 2/2] clover: clCompileProgram CL_INVALID_COMPILER_OPTIONS

2014-11-07 Thread EdB
clCompileProgram should return CL_INVALID_COMPILER_OPTIONS
instead of CL_INVALID_BUILD_OPTIONS
---
 src/gallium/state_trackers/clover/api/program.cpp | 2 ++
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/api/program.cpp 
b/src/gallium/state_trackers/clover/api/program.cpp
index 3a6c054..60184ed 100644
--- a/src/gallium/state_trackers/clover/api/program.cpp
+++ b/src/gallium/state_trackers/clover/api/program.cpp
@@ -182,6 +182,8 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs,
prog.build(devs, opts);
return CL_SUCCESS;
 } catch (error e) {
+   if (e.get() == CL_INVALID_COMPILER_OPTIONS)
+  return CL_INVALID_BUILD_OPTIONS;
if (e.get() == CL_COMPILE_PROGRAM_FAILURE)
   return CL_BUILD_PROGRAM_FAILURE;
return e.get();
diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index d29f5a6..30547d0 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -177,7 +177,7 @@ namespace {
 opts_carray.data() + 
opts_carray.size(),
 Diags);
   if (!Success) {
- throw error(CL_INVALID_BUILD_OPTIONS);
+ throw error(CL_INVALID_COMPILER_OPTIONS);
   }
   c.getFrontendOpts().ProgramAction = clang::frontend::EmitLLVMOnly;
   c.getHeaderSearchOpts().UseBuiltinIncludes = true;
-- 
1.9.3



[Mesa-dev] [PATCH] glsl: validate invariance between builtin variables

2014-11-07 Thread Tapani Pälli
Patch adds additional validation for GLSL ES 1.00, which specifies
cross-stage invariance requirements for a set of built-in variables.

Fixes failures in WebGL conformance test 'shaders-with-invariance'.

Signed-off-by: Tapani Pälli tapani.pa...@intel.com
---
 src/glsl/linker.cpp | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index bd2aa3c..57082a4 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -695,10 +695,29 @@ cross_validate_globals(struct gl_shader_program *prog,
 * them.
 */
glsl_symbol_table variables;
+
+   ir_variable *gl_fcoord = NULL, *gl_position = NULL,
+   *gl_pcoord = NULL, *gl_psize = NULL,
+   *gl_frontf = NULL;
+
   for (unsigned i = 0; i < num_shaders; i++) {
   if (shader_list[i] == NULL)
 continue;
 
+  /* For GLSL ES 1.00, store builtin variables that require
+   * cross stage validation.
+   */
+      if (prog->IsES && prog->Version < 300) {
+         if (shader_list[i]->Stage == MESA_SHADER_VERTEX) {
+            gl_position = shader_list[i]->symbols->get_variable("gl_Position");
+            gl_psize = shader_list[i]->symbols->get_variable("gl_PointSize");
+         } else if (shader_list[i]->Stage == MESA_SHADER_FRAGMENT) {
+            gl_fcoord = shader_list[i]->symbols->get_variable("gl_FragCoord");
+            gl_pcoord = shader_list[i]->symbols->get_variable("gl_PointCoord");
+            gl_frontf = shader_list[i]->symbols->get_variable("gl_FrontFacing");
+         }
+      }
+  }
+
   foreach_in_list(ir_instruction, node, shader_list[i]->ir) {
 ir_variable *const var = node->as_variable();
 
@@ -904,6 +923,30 @@ cross_validate_globals(struct gl_shader_program *prog,
variables.add_variable(var);
   }
}
+
+   /* GLSL ES 1.00 specification, Section "Invariance and Linkage" says:
+*
+* For the built-in special variables, gl_FragCoord can only be
+* declared invariant if and only if gl_Position is declared invariant.
+* Similarly gl_PointCoord can only be declared invariant if and only
+* if gl_PointSize is declared invariant. It is an error to declare
+* gl_FrontFacing as invariant. The invariance of gl_FrontFacing is the
+* same as the invariance of gl_Position.
+*/
+   if (prog->IsES && prog->Version < 300) {
+      if (gl_fcoord && gl_position)
+         if (gl_fcoord->data.invariant && !gl_position->data.invariant)
+            linker_error(prog, "gl_FragCoord declared invariant "
+                         "while gl_Position is declared variant.");
+
+      if (gl_pcoord && gl_psize)
+         if (gl_pcoord->data.invariant && !gl_psize->data.invariant)
+            linker_error(prog, "gl_PointCoord declared invariant "
+                         "while gl_PointSize is declared variant.");
+
+      if (gl_frontf && gl_frontf->data.invariant)
+         linker_error(prog, "gl_FrontFacing cannot be declared invariant.");
+   }
 }
 
 
-- 
1.9.3



[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=85918

--- Comment #7 from José Fonseca jfons...@vmware.com ---
(In reply to Emil Velikov from comment #2)
 Roland,
 Can we encourage newcomers to use MSVC 2013 and later (if ever available)
 ? This way once you guys (and others) are over to 2013 we can just go with -
 anything prior to MSVC2013 is not supported :P
 I.e. having a gradual transition period is always nice.

I'm afraid VMware needs to build with MSVC 2012.  See explanation on 
http://lists.freedesktop.org/archives/piglit/2014-October/013129.html

It's OK recommending MSVC 2013, but we need to continue to support MSVC 2012 a
bit longer.



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Steven Newbury
On Thu, 2014-11-06 at 14:39 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 2:33 PM, Marc Dietrich marvi...@gmx.de 
 wrote:
  So this is only a problem with Link-Time-Opt.
 
 I don't actually know how you're getting to this point in the build 
 with LTO. It fails for me in src/mapi.
 
 I think LTO is something we should make work at some point, but I 
 don't think we should gate contributions on whether LTO works or 
 not.

FWIW, I always compile Mesa with -flto; the only place where it fails 
is the stand-alone GLSL compiler.




Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Steven Newbury
On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
 siavashser...@gmail.com wrote:
  Then I do recommend removing the if (cpu_has_sse4_1) from this 
  patch and similar places, because there is no runtime CPU 
  dispatching happening for SSE optimized code paths in action and 
  just adds extra overhead (unnecessary branches) to the generated 
  code.
 
 No. Sorry, I realize I misread your previous question:
 
   I guess checking for cpu_has_sse4_1 is unnecessary if it isn't 
   controllable by user at runtime; because USE_SSE41 is a 
   compile time check and requires the target machine to be SSE 4.1 
   capable already.
 
 USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you 
 to build the code and then use it only on systems that actually 
 support it.
 
 All of this could have been pretty easily answered by a few greps 
 though...

I wonder what difference it would make to have an option to compile 
out the run-time check code to avoid the additional overhead in cases 
where the builder *knows* at compile time what the run-time system is? 
(ie Gentoo)




Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-07 Thread Timothy Arceri
On Thu, 2014-11-06 at 19:30 -0500, Frank Henigman wrote:
 I tested your patch with the teximage program in mesa demos, the
 same thing I used to benchmark when I developed this code.
 As Matt and Chad point out, the odd-looking _faster functions are
 there for a reason.  Your change causes a huge slowdown.

Yes, I should have known better than to assume it was left-over code. I
didn't know that gcc could inline memcpy like that; very nice. In fact, I
was reading a blog just last week claiming MSVC was better than gcc for
memcpy because gcc relied on a library implementation. A good reminder not
to believe everything you read on the internet.

Anyway, I've had another go at it and the performance regression should
be fixed. In my testing I couldn't spot any real difference. The main
downside is that the SSSE3 code can't be inlined, so there will be a small
trade-off compared to the current way of building with SSSE3 enabled.

Also, thanks for pointing out teximage; I didn't know the mesa demos
contained perf tools.

 I tested on a sandybridge system with a Intel(R) Celeron(R) CPU 857 @
 1.20GHz.  Mesa compiled with -O2.
 
 original code:
   TexSubImage(RGBA/ubyte 256 x 256): 9660.4 images/sec, 2415.1 MB/sec
   TexSubImage(RGBA/ubyte 1024 x 1024): 821.2 images/sec, 3284.7 MB/sec
   TexSubImage(RGBA/ubyte 4096 x 4096): 76.3 images/sec, 4884.9 MB/sec
 
   TexSubImage(BGRA/ubyte 256 x 256): 11307.1 images/sec, 2826.8 MB/sec
   TexSubImage(BGRA/ubyte 1024 x 1024): 944.6 images/sec, 3778.6 MB/sec
   TexSubImage(BGRA/ubyte 4096 x 4096): 76.7 images/sec, 4908.3 MB/sec
 
   TexSubImage(L/ubyte 256 x 256): 17847.5 images/sec, 1115.5 MB/sec
   TexSubImage(L/ubyte 1024 x 1024): 3068.2 images/sec, 3068.2 MB/sec
   TexSubImage(L/ubyte 4096 x 4096): 224.6 images/sec, 3593.0 MB/sec
 
 your code:
   TexSubImage(RGBA/ubyte 256 x 256): 3271.6 images/sec, 817.9 MB/sec
   TexSubImage(RGBA/ubyte 1024 x 1024): 232.3 images/sec, 929.2 MB/sec
   TexSubImage(RGBA/ubyte 4096 x 4096): 47.5 images/sec, 3038.6 MB/sec
 
   TexSubImage(BGRA/ubyte 256 x 256): 2426.5 images/sec, 606.6 MB/sec
   TexSubImage(BGRA/ubyte 1024 x 1024): 164.1 images/sec, 656.4 MB/sec
   TexSubImage(BGRA/ubyte 4096 x 4096): 13.4 images/sec, 854.8 MB/sec
 
   TexSubImage(L/ubyte 256 x 256): 9514.5 images/sec, 594.7 MB/sec
   TexSubImage(L/ubyte 1024 x 1024): 864.1 images/sec, 864.1 MB/sec
   TexSubImage(L/ubyte 4096 x 4096): 59.7 images/sec, 955.2 MB/sec
 
 This is just one run, not an average, but you can see it's slower
 across the board up to a factor of around 6.
 Also I couldn't configure the build after your patch.  I think you
 left out a change to configure.ac to define SSSE3_SUPPORTED.
 




[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #52 from Samuel Iglesias sigles...@igalia.com ---
(In reply to Jason Ekstrand from comment #51)
 (In reply to Samuel Iglesias from comment #48)
  (In reply to Jason Ekstrand from comment #34)
   (In reply to Samuel Iglesias from comment #33)
Jason, I would like to know your opinion about the integer RGBA clamping
done in pack.c (_mesa_pack_rgba_span_from_ints()).

The glReadPixels() and glGetTexImage() specs say (for example, in the
OpenGL 3.3 core specification) that for an integer RGBA color, each
component is clamped to the representable range of the type.

Those GL functions end up calling pack.c's functions for performing the
conversion (_mesa_pack_rgba_span_from_ints() and others).

It's possible to replace some of those pack.c's conversion functions by 
the
master conversion but the problem is in the clamping operation. I would 
like
to add a boolean argument called clamp to the master conversion 
function
which is passed to _mesa_swizzle_and_convert() where each of its
convert_{uint,int,byte,ubyte,short,ushort}() do the right thing when 
clamp
is true.

This clamp parameter would be false for every call to either master
conversion function or _mesa_swizzle_and_convert() except when they are
being called from pack.c.

What do you think?
   
   Supporting clamping is probably fine.  I think we determined we needed a
   clamp parameter anyway for some of the float conversions.  I guess it 
   makes
   sense for integers too.  Let's watch out for performance when we implement
   it though.  Changing the loop inside _mesa_swizzle_and_convert without
   hurting performance can be tricky.  The teximage-colors test in Piglit 
   has a
   -benchmark flag that can be used for testing that.
  
  In the end, we did not make that change in pack.c as we could just use the
  autogenerated format pack/unpack functions. However I am retaking this topic
  again because we found another issue which would require a similar solution:
  
  The convert_*() functions in format_utils.c convert between source and
  destination data and are used by _mesa_swizzle_and_convert. We found that
  these were not good enough for various conversions that involved
  non-normalized types of different sizes: INT to SHORT, INT to BYTE, etc.
  Because of that, several piglit tests related to glGetTexImage() and others
  failed, like for example bin/ext_texture_integer-getteximage-clamping.
  
  In order to fix that we added the clamp expressions for these cases [1]  and
  with that we achieved no regressions when executing a full piglit run on
  i965 driver.
  Unfortunately, when testing our patches on a couple of Gallium drivers, we
  found a regression that we tracked down back to that patch:
  bin/arb_clear_buffer_object-formats.
  Reverting the patch fixes the problem with these Gallium drivers but
  then, bin/ext_texture_integer-getteximage-clamping fails again on i965. We
  wonder if there could be more cases like this that piglit is not covering,
  since it looks weird that this affects just this one test.
  
  So, we wonder if that patch for _mesa_swizzle_and_convert is correct and we
  should fix the failed gallium cases elsewhere, or if we should revert that
  patch and fix the cases it fixed in a different way. What do you think? Was
  _mesa_swizzle_and_convert implemented like that  on purpose or are these
  real bugs?
 
 From my brief reading of the GL spec, it looks like clamping integers to the
 max representable range is what it expects by default.  From the glTexImage
 spec:
 
 The selected groups are transferred to the GL as described in section 3.7.2
 and then clamped to the representable range of the internal format. If the
 internalformat of the texture is signed or unsigned integer, components are
 clamped to [-2^(n-1), 2^(n-1)-1] or [0, 2^n-1], respectively, where n is
 the number of bits per component. For color component groups, if the
 internalformat of the texture is signed or unsigned normalized fixed-point,
 components are clamped to [-1, 1] or [0, 1], respectively.
 
 Therefore, it seems as if we want to be clamping when we have integer
 destinations.  I'm not sure why the gallium drivers are regressing when you
 do.
 
 One more observation is that it doesn't look like your clamping patch is
 complete.  If we're going to clamp when we have an integer destination, we
 should always clamp with integer destinations, not just in the few cases
 that piglit hits.  I wouldn't be surprised if piglit's coverage in those
 areas is terrible.


Yes, you are right about the specification. We have added clamping for all the
needed conversions in _mesa_swizzle_and_convert() following the spec but, after
analyzing why the gallium drivers have this regression, I discovered that it
comes from somewhere else.

The following text was copied from piglit's 

Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds

2014-11-07 Thread Steven Newbury
On Wed, 2014-11-05 at 00:46 +, Emil Velikov wrote:
 On 04/11/14 22:42, Marek Olšák wrote:
  Hi everybody,
  
  I'm about to address this long-standing issue: The EGL state 
  tracker is redundant. It duplicates what st/dri does and it also 
  duplicates what the common loader egl_dri2 does, which is used by 
  all classic drivers and even works better with gallium drivers.
  
  Let's compare EGL extensions for both backends:
  
  st/egl:
  EGL version string: 1.4 (Gallium)
  EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenVG
  EGL extensions string:
  EGL_WL_bind_wayland_display EGL_KHR_image_base 
  EGL_KHR_image_pixmap
  EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync
  EGL_KHR_surfaceless_context EGL_NOK_swap_region
  EGL_NV_post_sub_buffer
  
  egl_dri2:
  EGL version string: 1.4 (DRI2)
  EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenGL_ES3
  EGL extensions string:
  EGL_MESA_drm_image EGL_MESA_configless_context 
  EGL_KHR_image_base
  EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_texture_2D_image
  EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image
  EGL_KHR_surfaceless_context EGL_KHR_create_context
  EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
  EGL_CHROMIUM_sync_control EGL_EXT_image_dma_buf_import
  EGL_NV_post_sub_buffer
  
  egl_dri2 also supports MSAA on the window framebuffer (through 
  st/dri). It's really obvious which one is better.
  
  I'm aware of 2 features that we will lose:
  - swrast on Wayland - I'm not sure about this. Perhaps kms-swrast 
  has
  addressed this already.
  - OpenVG - It has never taken off. If people want this on Linux, 
  it should
  use egl_dri2 and st/dri, like OpenGL does.
  
I think it would be a shame to lose the OpenVG backend.  The lack of 
interest on desktop systems, even through cairo, has been disappointing, 
but I wonder if that was because of its inherently Gallium-only nature? 
Cairo's GL backend is probably more likely to work on more systems 
because of this.  Admittedly, it was probably mostly useful as a 
technology on weaker CPUs and simpler GPUs without full OpenGL(ES) 
capability, where it could provide a performant GUI.

  This series removes st/egl and st/gbm support from the autoconf 
  build (the latter depends on the former and is probably just as 
  redundant). The next step is to remove all Linux-specific backends 
  from st/egl. Windows, Android, and other platform backends will be 
  kept intact, therefore st/egl won't be removed completely.
  
  Please comment.
  
I use EGL_DRIVER=egl_gallium to switch to using the ilo driver at run-
time.  (Admittedly, mostly for testing more than anything useful.)  
There would have to be some other way of at least selecting a run-time 
backend with egl_dri2 for this functionality to continue working.

 A few thoughts:
  - Iirc Eric is using st/egl as the dri2 backend seems to be causing
 problems :(
  - Android supports dri modules, but st/dri/Android.mk is missing.
 Should be trivial to add.
  - Windows - might be a pain in the a** to get it working. Then again
 does Windows have EGL ?
  - fbdev, OpenVG - the etnaviv project was using the former, 
 currently
 moving to full-blown drm :)
 
 If one is to nuke the three state trackers we can remove
  - the libEGL symbol export mayhem
  - gallium's st_api, and flatten things a bit
  - a bit of duplication :)
 
 In summary:
 With a bit of work we can remove Linux and Android from the 
 equation. How many people/companies use EGL for Windows/fbdev, how 
 about OpenVG on any platform ?
 
I'm not sure we can know how many use OpenVG.  Probably more than is 
commonly thought on this list.




[Mesa-dev] [PATCH V2 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-07 Thread Timothy Arceri
Callgrind cpu usage results from pts benchmarks:

For ytile_copy_faster()

Nexuiz 1.6.1: 2.16% -> 1.20%

V2:
- put back the if statements and add one for the SSSE3 rgba8_copy
- move some header files out of the header
- don't indent the preprocessor tests
- changed copyright to Google and add author Frank Henigman

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/mesa/Makefile.am   |  8 +++
 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 80 -
 src/mesa/main/fast_rgba8_copy.c| 83 ++
 src/mesa/main/fast_rgba8_copy.h| 38 
 4 files changed, 169 insertions(+), 40 deletions(-)
 create mode 100644 src/mesa/main/fast_rgba8_copy.c
 create mode 100644 src/mesa/main/fast_rgba8_copy.h

 Thanks to everyone who took the time to provide feedback on the first
 patchset and for pointing out that the baby had been thrown out with the
 bath water.

 I've had another go at it and the performance regression should be fixed.
 With the tests available I couldn't spot any real difference. However, the
 downside is that the SSSE3 code can't be inlined, so there will be a small
 trade-off compared to the current way of building with SSSE3 enabled. But
 for the majority of builds, done without SSSE3, it should be a win.

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..cf87289 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
 
 ARCH_LIBS =
 
+if SSSE3_SUPPORTED
+ARCH_LIBS += libmesa_ssse3.la
+endif
+
 if SSE41_SUPPORTED
 ARCH_LIBS += libmesa_sse41.la
 endif
@@ -150,6 +154,10 @@ libmesagallium_la_LIBADD = \
$(top_builddir)/src/glsl/libglsl.la \
$(ARCH_LIBS)
 
+libmesa_ssse3_la_SOURCES = \
+   main/fast_rgba8_copy.c
+libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
+
 libmesa_sse41_la_SOURCES = \
main/streaming-load-memcpy.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index cb5738a..81a2310 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -27,6 +27,7 @@
  **/
 
 #include main/bufferobj.h
+#include main/fast_rgba8_copy.h
 #include main/image.h
 #include main/macros.h
 #include main/mtypes.h
@@ -42,9 +43,7 @@
 #include intel_mipmap_tree.h
 #include intel_blit.h
 
-#ifdef __SSSE3__
-#include tmmintrin.h
-#endif
+#include x86/common_x86_asm.h
 
 #define FILE_DEBUG_FLAG DEBUG_TEXTURE
 
@@ -175,18 +174,6 @@ err:
return false;
 }
 
-#ifdef __SSSE3__
-static const uint8_t rgba8_permutation[16] =
-   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
-
-/* NOTE: dst must be 16 byte aligned */
-#define rgba8_copy_16(dst, src) \
-   *(__m128i *)(dst) = _mm_shuffle_epi8(\
-  (__m128i) _mm_loadu_ps((float *)(src)),   \
-  *(__m128i *) rgba8_permutation\
-   )
-#endif
-
 /**
  * Copy RGBA to BGRA - swap R and B.
  */
@@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
uint8_t *d = dst;
uint8_t const *s = src;
 
-#ifdef __SSSE3__
-   /* Fast copying for tile spans.
-*
-* As long as the destination texture is 16 aligned,
-* any 16 or 64 spans we get here should also be 16 aligned.
-*/
-
-   if (bytes == 16) {
-  assert(!(((uintptr_t)dst) & 0xf));
-  rgba8_copy_16(d+ 0, s+ 0);
-  return dst;
-   }
-
-   if (bytes == 64) {
-  assert(!(((uintptr_t)dst) & 0xf));
-  rgba8_copy_16(d+ 0, s+ 0);
-  rgba8_copy_16(d+16, s+16);
-  rgba8_copy_16(d+32, s+32);
-  rgba8_copy_16(d+48, s+48);
-  return dst;
-   }
-#endif
-
   while (bytes >= 4) {
   d[0] = s[2];
   d[1] = s[1];
@@ -355,6 +319,12 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
   if (mem_copy == memcpy)
  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
dst, src, src_pitch, swizzle_bit, memcpy);
+  #if defined(USE_SSSE3)
+  else if (mem_copy == _mesa_fast_rgba8_copy)
+ return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
+   dst, src, src_pitch, swizzle_bit,
+   _mesa_fast_rgba8_copy);
+  #endif
   else if (mem_copy == rgba8_copy)
  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
dst, src, src_pitch, swizzle_bit, rgba8_copy);
@@ -362,6 +332,12 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
   if (mem_copy == memcpy)
  return xtile_copy(x0, x1, x2, x3, y0, y1,
dst, src, src_pitch, swizzle_bit, memcpy);
+  #if defined(USE_SSSE3)
+  else if (mem_copy == 

[Mesa-dev] [PATCH V2 1/2] mesa: add runtime support for SSSE3

2014-11-07 Thread Timothy Arceri
V2:
- remove unrequired #ifdef bit_SSSE3
- order flag check in config

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 configure.ac   | 6 ++
 src/mesa/x86/common_x86.c  | 2 ++
 src/mesa/x86/common_x86_features.h | 4 +++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 03f1bca..f86edf1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -252,6 +252,12 @@ AC_SUBST([VISIBILITY_CXXFLAGS])
 dnl
 dnl Optional flags, check for compiler support
 dnl
+AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
+if test x$SSSE3_SUPPORTED = x1; then
+DEFINES="$DEFINES -DUSE_SSSE3"
+fi
+AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
+
 AX_CHECK_COMPILE_FLAG([-msse4.1], [SSE41_SUPPORTED=1], [SSE41_SUPPORTED=0])
 if test x$SSE41_SUPPORTED = x1; then
 DEFINES="$DEFINES -DUSE_SSE41"
diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
index 25f5c40..bef9cf2 100644
--- a/src/mesa/x86/common_x86.c
+++ b/src/mesa/x86/common_x86.c
@@ -352,6 +352,8 @@ _mesa_get_x86_features(void)
 
   __get_cpuid(1, &eax, &ebx, &ecx, &edx);
 
+  if (ecx & bit_SSSE3)
+ _mesa_x86_cpu_features |= X86_FEATURE_SSSE3;
   if (ecx & bit_SSE4_1)
  _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
}
diff --git a/src/mesa/x86/common_x86_features.h 
b/src/mesa/x86/common_x86_features.h
index 66f2cf6..6eb2b38 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -43,7 +43,8 @@
 #define X86_FEATURE_XMM2   (1<<6)
 #define X86_FEATURE_3DNOWEXT   (1<<7)
 #define X86_FEATURE_3DNOW  (1<<8)
-#define X86_FEATURE_SSE4_1 (1<<9)
+#define X86_FEATURE_SSSE3  (1<<9)
+#define X86_FEATURE_SSE4_1 (1<<10)
 
 /* standard X86 CPU features */
 #define X86_CPU_FPU    (1<<0)
@@ -65,6 +66,7 @@
 #define cpu_has_xmm2   (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
 #define cpu_has_3dnow  (_mesa_x86_cpu_features & X86_FEATURE_3DNOW)
 #define cpu_has_3dnowext   (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
+#define cpu_has_ssse3  (_mesa_x86_cpu_features & X86_FEATURE_SSSE3)
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 
 #endif
-- 
1.9.3



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Timothy Arceri
On Fri, 2014-11-07 at 11:44 +, Steven Newbury wrote:
 On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
  On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
  siavashser...@gmail.com wrote:
   Then I do recommend removing the if (cpu_has_sse4_1) from this 
   patch and similar places, because there is no runtime CPU 
   dispatching happening for SSE optimized code paths in action and 
   just adds extra overhead (unnecessary branches) to the generated 
   code.
  
  No. Sorry, I realize I misread your previous question:
  
I guess checking for cpu_has_sse4_1 is unnecessary if it isn't 
controllable by user at runtime; because USE_SSE41 is a 
compile time check and requires the target machine to be SSE 4.1 
capable already.
  
  USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you 
  to build the code and then use it only on systems that actually 
  support it.
  
  All of this could have been pretty easily answered by a few greps 
  though...
 
 I wonder what difference it would make to have an option to compile 
 out the run-time check code to avoid the additional overhead in cases 
 where the builder *knows* at compile time what the run-time system is? 
 (ie Gentoo)

As long as the check is placed in the right location it shouldn't really
make a noticeable difference, i.e. just outside the hotspot and not
inside it.
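
Roughly the shape of the pattern in question (a sketch of mine, not the
actual patch; copy_indices_sse41 is a hypothetical SSE4.1 variant, and
cpu_has_sse4_1 comes from x86/common_x86_features.h):

/* USE_SSE41 says the compiler could build the fast path; cpu_has_sse4_1
 * says the CPU we are running on can execute it.  The branch sits outside
 * the per-element loop, so it is paid once per draw, not once per index. */
static void
copy_indices(unsigned *dst, const unsigned *src, unsigned count)
{
#ifdef USE_SSE41
   if (cpu_has_sse4_1) {
      copy_indices_sse41(dst, src, count);
      return;
   }
#endif
   while (count--)
      *dst++ = *src++;
}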

What will have more impact is not being able to inline certain code, such
as in the latest patchset I sent out. It seems this is another side effect
of the way gcc handles intrinsics.




Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds

2014-11-07 Thread Marek Olšák
On Fri, Nov 7, 2014 at 1:10 PM, Steven Newbury st...@snewbury.org.uk wrote:
 On Wed, 2014-11-05 at 00:46 +, Emil Velikov wrote:
 On 04/11/14 22:42, Marek Olšák wrote:
  Hi everybody,
 
  I'm about to address this long-standing issue: The EGL state
  tracker is redundant. It duplicates what st/dri does and it also
  duplicates what the common loader egl_dri2 does, which is used by
  all classic drivers and even works better with gallium drivers.
 
  Let's compare EGL extensions for both backends:
 
  st/egl:
  EGL version string: 1.4 (Gallium)
  EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenVG
  EGL extensions string:
  EGL_WL_bind_wayland_display EGL_KHR_image_base
  EGL_KHR_image_pixmap
  EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync
  EGL_KHR_surfaceless_context EGL_NOK_swap_region
  EGL_NV_post_sub_buffer
 
  egl_dri2:
  EGL version string: 1.4 (DRI2)
  EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenGL_ES3
  EGL extensions string:
  EGL_MESA_drm_image EGL_MESA_configless_context
  EGL_KHR_image_base
  EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_texture_2D_image
  EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image
  EGL_KHR_surfaceless_context EGL_KHR_create_context
  EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
  EGL_CHROMIUM_sync_control EGL_EXT_image_dma_buf_import
  EGL_NV_post_sub_buffer
 
  egl_dri2 also supports MSAA on the window framebuffer (through
  st/dri). It's really obvious which one is better.
 
  I'm aware of 2 features that we will lose:
  - swrast on Wayland - I'm not sure about this. Perhaps kms-swrast
  has
  addressed this already.
  - OpenVG - It has never taken off. If people want this on Linux,
  it should
  use egl_dri2 and st/dri, like OpenGL does.
 
 I think it would be a shame to lose the OpenVG backend.  It's been
 disappointing with the lack of interest on desktop systems even
 through cairo, but I wonder if it was because of the inherent Gallium
 only nature?  Cairo's GL backend is probably more likely to work on
 more systems because of this.  Admittedly, it was probably mostly
 useful as a technology on weaker CPUs and simpler GPUs without full
 OpenGL(ES) capability where it could provide a performant GUI.

  This series removes st/egl and st/gbm support from the autoconf
  build (the latter depends on the former and is probably just as
  redundant). The next step is to remove all Linux-specific backends
  from st/egl. Windows, Android, and other platform backends will be
  kept intact, therefore st/egl won't be removed completely.
 
  Please comment.
 
 I use EGL_DRIVER=egl_gallium to switch to using the ilo driver at run-
 time.  (Admittedly, mostly for testing more than anything useful.)
 There would have to be some other way of at least selecting run-time
 backend to egl_dri2 for this functionality to continue working.

 A few thoughts:
  - Iirc Eric is using st/egl as the dri2 backend seems to be causing
 problems :(
  - Android supports dri modules, but st/dri/Android.mk is missing.
 Should be trivial to add.
  - Windows - might be a pain in the a** to get it working. Then again
 does Windows have EGL ?
  - fbdev, OpenVG - the etnaviv project was using the former,
 currently
 moving to full-blown drm :)

 If one is to nuke the three state trackers we can remove
  - the libEGL symbol export mayhem
  - gallium's st_api, and flatten things a bit
  - a bit of duplication :)

 In summary:
 With a bit of work we can remove Linux and Android from the
 equation. How many people/companies use EGL for Windows/fbdev, how
 about OpenVG on any platform ?

 I'm not sure we can know how many use OpenVG.  Probably more than is
 commonly thought on this list.

Then I'd like those people or companies to speak up.

I seriously doubt anyone is using OpenVG with a Gallium driver. It
even needs GL3 hardware (because it uses ARL in a fragment shader).
That means every non-GL3 Gallium driver has incomplete OpenVG support,
so OpenVG users should use these: radeon >= r600, nouveau >= nv50, ilo
maybe? That's it. I wouldn't bother with softpipe and llvmpipe -- if
you have to use them, you're better off with a VG-over-GL wrapper and
a good OpenGL driver/hardware combo.

Marek


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Siavash Eliasi


On 11/07/2014 03:14 PM, Steven Newbury wrote:

On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:

On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
siavashser...@gmail.com wrote:

Then I do recommend removing the if (cpu_has_sse4_1) from this
patch and similar places, because there is no runtime CPU
dispatching happening for SSE optimized code paths in action and
just adds extra overhead (unnecessary branches) to the generated
code.

No. Sorry, I realize I misread your previous question:


I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
controllable by user at runtime; because USE_SSE41 is a
compile time check and requires the target machine to be SSE 4.1
capable already.

USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
to build the code and then use it only on systems that actually
support it.

All of this could have been pretty easily answered by a few greps
though...

I wonder what difference it would make to have an option to compile
out the run-time check code to avoid the additional overhead in cases
where the builder *knows* at compile time what the run-time system is?
(ie Gentoo)
I think that's possible. Since cpu_has_sse4_1 and friends are simply 
macros, one can set them to 1 at compile time if it's going to be built 
for an SSE 4.1-capable target, so your smart compiler will totally get 
rid of the unnecessary runtime check.


I guess common_x86_features.h should be modified to something like this:

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif


[Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.

2014-11-07 Thread jfonseca
From: José Fonseca jfons...@vmware.com

This commit causes the generated C code to change as

union util_format_r32g32b32a32_sscaled pixel;
  - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647);
  - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647);
  - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647);
  - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647);
  + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 
2147483647.0f);
  + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 
2147483647.0f);
  + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 
2147483647.0f);
  + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 
2147483647.0f);
memcpy(dst, pixel, sizeof pixel);

which surprisingly makes a difference for MSVC.

Thanks to Juraj Svec for diagnosing this and drafting a fix.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661

Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
---
 src/gallium/auxiliary/util/u_format_pack.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
b/src/gallium/auxiliary/util/u_format_pack.py
index 6ccf04c..90f348e 100644
--- a/src/gallium/auxiliary/util/u_format_pack.py
+++ b/src/gallium/auxiliary/util/u_format_pack.py
@@ -226,9 +226,9 @@ def native_to_constant(type, value):
 '''Get the value of unity for this type.'''
 if type.type == FLOAT:
 if type.size <= 32:
-return "%ff" % value 
+return "%.1ff" % float(value)
 else:
-return "%ff" % value 
+return "%.1f" % float(value)
 else:
 return str(int(value))
 
@@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, 
value):
 dst_max = dst_channel.max()
 
 # Translate the destination range to the src native value
-dst_min_native = value_to_native(src_channel, dst_min)
-dst_max_native = value_to_native(src_channel, dst_max)
+dst_min_native = native_to_constant(src_channel, 
value_to_native(src_channel, dst_min))
+dst_max_native = native_to_constant(src_channel, 
value_to_native(src_channel, dst_max))
 
 if src_min < dst_min and src_max > dst_max:
 return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native)
-- 
1.9.1



[Mesa-dev] [Bug 61312] Assertion failure: freeing bad or corrupted memory in st_translate_mesa_program

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=61312

José Fonseca jfons...@vmware.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from José Fonseca jfons...@vmware.com ---
This assertion happened because code in st_mesa_to_tgsi.c was mixing up
realloc() with FREE() for the same pointer, instead of using REALLOC()+FREE()
or realloc()+free() consistently, as the debugging helpers are only enabled
via the macros.
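
To illustrate the mismatch (my own sketch, not the actual st_mesa_to_tgsi.c
code; MALLOC/REALLOC/FREE are Gallium's debug wrappers from "util/u_memory.h"):

#include "util/u_memory.h"

void bad(void)
{
   int *p = MALLOC(16 * sizeof *p);
   p = realloc(p, 32 * sizeof *p);   /* libc realloc on a MALLOC'd pointer */
   FREE(p);                          /* debug FREE of a libc pointer: corruption */
}

void good(void)
{
   int *p = MALLOC(16 * sizeof *p);
   p = REALLOC(p, 16 * sizeof *p, 32 * sizeof *p);  /* stay within one family */
   FREE(p);
}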

But this has been fixed since then.

It wasn't a single fix. The last one was
11070105f0b5ad20f12bb40a8dd0b357924bcfdd.



Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.

2014-11-07 Thread Chris Forbes
A possible wrinkle here is that 2147483647 isn't exactly representable
as a float (the adjacent floats are 2147483520 and 2147483648). Is the
clamp actually doing the right thing at the top end?

On Sat, Nov 8, 2014 at 3:32 AM,  jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com

 This commit causes the generated C code to change as

 union util_format_r32g32b32a32_sscaled pixel;
   - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647);
   - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647);
   - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647);
   - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647);
   + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 
 2147483647.0f);
 memcpy(dst, pixel, sizeof pixel);

 which surprisingly makes a difference for MSVC.

 Thanks to Juraj Svec for diagnosing this and drafting a fix.

 Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661

 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/auxiliary/util/u_format_pack.py | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
 b/src/gallium/auxiliary/util/u_format_pack.py
 index 6ccf04c..90f348e 100644
 --- a/src/gallium/auxiliary/util/u_format_pack.py
 +++ b/src/gallium/auxiliary/util/u_format_pack.py
 @@ -226,9 +226,9 @@ def native_to_constant(type, value):
  '''Get the value of unity for this type.'''
  if type.type == FLOAT:
  if type.size <= 32:
 -return "%ff" % value
 +return "%.1ff" % float(value)
  else:
 -return "%ff" % value
 +return "%.1f" % float(value)
  else:
  return str(int(value))

 @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, 
 value):
  dst_max = dst_channel.max()

  # Translate the destination range to the src native value
 -dst_min_native = value_to_native(src_channel, dst_min)
 -dst_max_native = value_to_native(src_channel, dst_max)
 +dst_min_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_min))
 +dst_max_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_max))

  if src_min < dst_min and src_max > dst_max:
  return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native)
 --
 1.9.1



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread steve
On Fri Nov 7 14:09:09 2014 GMT, Siavash Eliasi wrote:
 
 On 11/07/2014 03:14 PM, Steven Newbury wrote:
  On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
  On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
  siavashser...@gmail.com wrote:
  Then I do recommend removing the if (cpu_has_sse4_1) from this
  patch and similar places, because there is no runtime CPU
  dispatching happening for SSE optimized code paths in action and
  just adds extra overhead (unnecessary branches) to the generated
  code.
  No. Sorry, I realize I misread your previous question:
 
  I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
  controllable by user at runtime; because USE_SSE41 is a
  compile time check and requires the target machine to be SSE 4.1
  capable already.
  USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
  to build the code and then use it only on systems that actually
  support it.
 
  All of this could have been pretty easily answered by a few greps
  though...
  I wonder what difference it would make to have an option to compile
  out the run-time check code to avoid the additional overhead in cases
  where the builder *knows* at compile time what the run-time system is?
  (ie Gentoo)
 I think that's possible. Since cpu_has_sse4_1 and friends are simply 
 macros, one can set them to true or 1 during compile time if it's 
 going to be built for an SSE 4.1 capable target so your smart compiler 
 will totally get rid of the unnecessary runtime check.
 
 I guess common_x86_features.h should be modified to something like this:
 
 #ifdef __SSE4_1__
 #define cpu_has_sse4_1 1
 #else
  #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 #endif

Yes, this was what I was thinking.  Then perhaps an option for disabling 
run-time detection, with the available CPU features then determined during 
configuration, setting the appropriate defines.

Whether it's worth it I don't know. I can imagine the compiler having an easier 
job optimizing the code.
-- 


[Mesa-dev] [PATCH] i965: Use the predicate enable bit for conditional rendering without stalling

2014-11-07 Thread Neil Roberts
Previously whenever a primitive is drawn the driver would call
_mesa_check_conditional_render which blocks waiting for the result of the
query to determine whether to render. On Gen7+ there is a bit in the
3DPRIMITIVE command which can be used to disable the primitive based on the
value of a state bit. This state bit can be set based on whether two registers
have different values using the MI_PREDICATE command. We can load these two
registers with the pixel count values stored in the query begin and end to
implement conditional rendering without stalling.

Unfortunately these two source registers are not in the whitelist of available
registers in the kernel driver so this needs a kernel patch to work. This
patch tries to check whether it is possible to write to this register by
creating a DRM buffer with a single command to write to the register and then
trying to exec it. The return value of the exec function (and hence the ioctl)
is checked.

There is a function with a similar goal just above to check whether the
OACONTROL register can be used. This works by putting the test command in the
regular batch buffer and trying to compare whether the value was successfully
written. I couldn't get this approach to work because intel_batchbuffer_flush
actually calls exit() if the ioctl fails. I am suspicious that this approach
doesn't work for OACONTROL either.

The predicate enable bit is currently only used for drawing 3D primitives. For
blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For
most of these it would probably just work to call the new
brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of rendering
primitives. However it's a bit trickier for blits because it can use the BLT
ring or the blorp codepath. I think these operations are less useful for
conditional rendering than rendering primitives so it might be best to leave
it for a later patch.
---

I'm not really sure whether I'm using the right cache domain to load
the pixel count values into the predicate source values. Currently I'm
using the instruction domain because that is what is used to write the
values from a PIPE_CONTROL command but that doesn't seem to make much
sense because the comment for that domain says it is for shaders.

I've added the PIPE_CONTROL_FLUSH_ENABLE flag to the PIPE_CONTROL
command used to save the depth pass count because the PRM says that
you should do this. However it seems to work just as well without it
and I'm not really sure what it does. Perhaps without the bit it can
start the next command without waiting for the depth pass count to be
written and that would be bad because the next command is likely to be
the load into the predicate register. I'm not sure if it also implies
a cache flush because I would expect that the PIPE_CONTROL command and
the register load command would be using the same cache domain?

I'll post the kernel patch to the intel-gfx list.

I've also written a little demo using conditional rendering here:

https://github.com/bpeel/glthing/tree/wip/conditional-render

With the patch the FPS jumps from ~20 to ~50 although it's a bit of a
silly test because if you just completely disable conditional
rendering then the FPS goes all the way up to 270.

 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 +
 src/mesa/drivers/dri/i965/brw_context.c|   4 +
 src/mesa/drivers/dri/i965/brw_context.h|  21 +++
 src/mesa/drivers/dri/i965/brw_defines.h|   1 +
 src/mesa/drivers/dri/i965/brw_draw.c   |  16 +-
 src/mesa/drivers/dri/i965/brw_queryobj.c   |  17 ++-
 src/mesa/drivers/dri/i965/intel_extensions.c   |  48 ++
 src/mesa/drivers/dri/i965/intel_reg.h  |  23 +++
 9 files changed, 286 insertions(+), 12 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 711aabe..0adaf4d 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -38,6 +38,7 @@ i965_FILES = \
brw_clip_tri.c \
brw_clip_unfilled.c \
brw_clip_util.c \
+   brw_conditional_render.c \
brw_context.c \
brw_cubemap_normalize.cpp \
brw_curbe.c \
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c 
b/src/mesa/drivers/dri/i965/brw_conditional_render.c
new file mode 100644
index 000..e676aa0
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * 

Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.

2014-11-07 Thread Jose Fonseca
Great point.


The source values, src[], are already in floating point, so it's OK not to 
represent 2147483647 exactly, however, if the compiler rounds 2147483647.0f upwards to 
2147483648 then indeed the clamping won't work properly, as 2147483648 will 
overflow when converted to int32.

I've confirmed that gcc generates precisely the same assembly, so at least this 
change doesn't break anything there.

But you're right, we do have a bug, before and after this change:

$ cat test.c 
#include <stdint.h>
#include <stdio.h>

#define CLAMP( X, MIN, MAX )  ( (X)<(MIN) ? (MIN) : ((X)>(MAX) ? (MAX) : (X)) )

int main()
{
float src = 2147483648.0f;
int32_t dst = (int32_t)CLAMP(src, -2147483648, 2147483647);
printf("%f %i\n", src, dst);
return 0;
}
$ gcc -O0 -Wall -o test test.c
$ ./test 
2147483648.00 -2147483648


And to make things even stranger, it depends on compiler optimization:

$ gcc -O2 -Wall -o test test.c
$ ./test 
2147483648.00 2147483647

Bummer..
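
To see where the constant itself goes wrong, the rounding can be reproduced in 
isolation (standalone snippet, independent of the generated pack code):

   #include <stdio.h>

   int main(void)
   {
      /* 2147483647 has no exact float representation: the adjacent floats
       * are 2147483520.0f and 2147483648.0f, and round-to-nearest picks the
       * larger one, so the upper clamp bound already overflows int32. */
      printf("%.1f\n", 2147483647.0f);   /* prints 2147483648.0 */
      printf("%.1f\n", 2147483520.0f);   /* largest float below INT32_MAX */
      return 0;
   }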

It's quite hard to come up with general rules to cover all these corner cases.
I suspect llvmpipe's LLVM IR generation suffers from similar issues.

For starters we need to add test cases for these...

Jose


From: Chris Forbes chr...@ijw.co.nz
Sent: 07 November 2014 15:12
To: Jose Fonseca
Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger; Brian Paul
Subject: Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants 
for clamping.

A possible wrinkle here is that 2147483647 isn't exactly representable
as a float (the adjacent floats are 2147483520 and 2147483648). Is the
clamp actually doing the right thing at the top end?

On Sat, Nov 8, 2014 at 3:32 AM,  jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com

 This commit causes the generated C code to change as

 union util_format_r32g32b32a32_sscaled pixel;
   - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647);
   - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647);
   - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647);
   - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647);
   + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 
 2147483647.0f);
 memcpy(dst, pixel, sizeof pixel);

 which surprisingly makes a difference for MSVC.

 Thanks to Juraj Svec for diagnosing this and drafting a fix.

 Fixes 
 https://bugs.freedesktop.org/show_bug.cgi?id=29661

 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/auxiliary/util/u_format_pack.py | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
 b/src/gallium/auxiliary/util/u_format_pack.py
 index 6ccf04c..90f348e 100644
 --- a/src/gallium/auxiliary/util/u_format_pack.py
 +++ b/src/gallium/auxiliary/util/u_format_pack.py
 @@ -226,9 +226,9 @@ def native_to_constant(type, value):
  '''Get the value of unity for this type.'''
  if type.type == FLOAT:
  if type.size <= 32:
 -return "%ff" % value
 +return "%.1ff" % float(value)
  else:
 -return "%ff" % value
 +return "%.1f" % float(value)
  else:
  return str(int(value))

 @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, 
 value):
  dst_max = dst_channel.max()

  # Translate the destination range to the src native value
 -dst_min_native = value_to_native(src_channel, dst_min)
 -dst_max_native = value_to_native(src_channel, dst_max)
 +dst_min_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_min))
 +dst_max_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_max))

  if src_min < dst_min and src_max > dst_max:
  return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native)
 --
 1.9.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.

2014-11-07 Thread Roland Scheidegger
Surprising indeed...

Reviewed-by: Roland Scheidegger srol...@vmware.com

Am 07.11.2014 um 15:32 schrieb jfons...@vmware.com:
 From: José Fonseca jfons...@vmware.com
 
 This commit causes the generated C code to change as
 
 union util_format_r32g32b32a32_sscaled pixel;
   - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647);
   - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647);
   - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647);
   - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647);
   + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 
 2147483647.0f);
 memcpy(dst, pixel, sizeof pixel);
 
 which surprisingly makes a difference for MSVC.
 
 Thanks to Juraj Svec for diagnosing this and drafting a fix.
 
 Fixes https://bugs.freedesktop.org/show_bug.cgi?id=29661
 
 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/auxiliary/util/u_format_pack.py | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
 b/src/gallium/auxiliary/util/u_format_pack.py
 index 6ccf04c..90f348e 100644
 --- a/src/gallium/auxiliary/util/u_format_pack.py
 +++ b/src/gallium/auxiliary/util/u_format_pack.py
 @@ -226,9 +226,9 @@ def native_to_constant(type, value):
  '''Get the value of unity for this type.'''
  if type.type == FLOAT:
  if type.size <= 32:
 -return "%ff" % value 
 +return "%.1ff" % float(value)
  else:
 -return "%ff" % value 
 +return "%.1f" % float(value)
  else:
  return str(int(value))
  
 @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, dst_native_type, 
 value):
  dst_max = dst_channel.max()
  
  # Translate the destination range to the src native value
 -dst_min_native = value_to_native(src_channel, dst_min)
 -dst_max_native = value_to_native(src_channel, dst_max)
 +dst_min_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_min))
 +dst_max_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_max))
  
  if src_min < dst_min and src_max > dst_max:
  return 'CLAMP(%s, %s, %s)' % (value, dst_min_native, dst_max_native)
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Ian Romanick
On 11/07/2014 06:09 AM, Siavash Eliasi wrote:
 
 On 11/07/2014 03:14 PM, Steven Newbury wrote:
 On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
 siavashser...@gmail.com wrote:
 Then I do recommend removing the if (cpu_has_sse4_1) from this
 patch and similar places, because there is no runtime CPU
 dispatching happening for SSE optimized code paths in action and
 just adds extra overhead (unnecessary branches) to the generated
 code.
 No. Sorry, I realize I misread your previous question:

 I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
 controllable by user at runtime; because USE_SSE41 is a
 compile time check and requires the target machine to be SSE 4.1
 capable already.
 USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
 to build the code and then use it only on systems that actually
 support it.

 All of this could have been pretty easily answered by a few greps
 though...
 I wonder what difference it would make to have an option to compile
 out the run-time check code to avoid the additional overhead in cases
 where the builder *knows* at compile time what the run-time system is?
 (ie Gentoo)
 I think that's possible. Since cpu_has_sse4_1 and friends are simply
 macros, one can set them to true or 1 during compile time if it's
 going to be built for an SSE 4.1 capable target so your smart compiler
 will totally get rid of the unnecessary runtime check.
 
 I guess common_x86_features.h should be modified to something like this:
 
 #ifdef __SSE4_1__
 #define cpu_has_sse4_1 1
 #else
  #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 #endif

I was thinking about doing something similar for cpu_has_xmm and
cpu_has_xmm2 for x64.  SSE and SSE2 are required parts of that
instruction set, so they're always there.
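
A sketch of what that could look like for SSE2, assuming the existing
X86_FEATURE_XMM2 bit (illustration only, not a tested patch):

   /* SSE2 is an architectural requirement of x86-64, so the runtime check
    * can collapse to a constant there and on -msse2 builds. */
   #if defined(__SSE2__) || defined(__x86_64__)
   #define cpu_has_xmm2 1
   #else
   #define cpu_has_xmm2 (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
   #endif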

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util/format: Generate floating point constants for clamping.

2014-11-07 Thread Roland Scheidegger
Ah yes I forgot about that, even though I have fixed this same problem
elsewhere in the past...
It is indeed an annoying problem, luckily only affects 32bit numbers.

Am 07.11.2014 um 16:35 schrieb Jose Fonseca:
 Great point.
 
 
 The source values, src[], are already in floating point, so it's OK not to 
  represent 2147483647 exactly, however, if the compiler rounds 2147483647.0f upwards 
 to 2147483648 then indeed the clamping won't work properly, as 2147483648 
 will overflow when converted to int32.
 
 I've confirmed that gcc generates precisely the same assembly, so at least 
  this change doesn't break anything there.
 
 But you're right, we do have a bug, before and after this change:
 
 $ cat test.c 
  #include <stdint.h>
  #include <stdio.h>
  
  #define CLAMP( X, MIN, MAX )  ( (X)<(MIN) ? (MIN) : ((X)>(MAX) ? (MAX) : (X)) 
  )
  
  int main()
  {
    float src = 2147483648.0f;
    int32_t dst = (int32_t)CLAMP(src, -2147483648, 2147483647);
    printf("%f %i\n", src, dst);
   return 0;
 }
 $ gcc -O0 -Wall -o test test.c
 $ ./test 
 2147483648.00 -2147483648
 
 
 And to make things even stranger, it depends on compiler optimization:
 
 $ gcc -O2 -Wall -o test test.c
 $ ./test 
 2147483648.00 2147483647

That's not surprising, since it's undefined. On x86 out-of-bounds values
return the integer indefinite value, which is 0x80000000 (though might
be different if using sse or not I forgot).
With O2 it probably got optimized away completely by gcc and the logic
apparently got it correct.

 
 Bummer..
 
  It's quite hard to come up with general rules to cover all these corner cases.
   I suspect llvmpipe's LLVM IR generation suffers from similar issues.
  
  For starters we need to add test cases for these...
I think we mostly get lucky because a lot of possible conversions are
things which aren't hit in practice.


 Jose
 
 
 From: Chris Forbes chr...@ijw.co.nz
 Sent: 07 November 2014 15:12
 To: Jose Fonseca
 Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger; Brian Paul
 Subject: Re: [Mesa-dev] [PATCH] util/format: Generate floating point 
 constants for clamping.
 
 A possible wrinkle here is that 2147483647 isn't exactly representable
 as a float (the adjacent floats are 2147483520 and 2147483648). Is the
 clamp actually doing the right thing at the top end?
 
 On Sat, Nov 8, 2014 at 3:32 AM,  jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com

 This commit causes the generated C code to change as

 union util_format_r32g32b32a32_sscaled pixel;
   - pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648, 2147483647);
   - pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648, 2147483647);
   - pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648, 2147483647);
   - pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648, 2147483647);
   + pixel.chan.r = (int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.g = (int32_t)CLAMP(src[1], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.b = (int32_t)CLAMP(src[2], -2147483648.0f, 
 2147483647.0f);
   + pixel.chan.a = (int32_t)CLAMP(src[3], -2147483648.0f, 
 2147483647.0f);
 memcpy(dst, pixel, sizeof pixel);

 which surprisingly makes a difference for MSVC.

 Thanks to Juraj Svec for diagnosing this and drafting a fix.

 Fixes 
  https://bugs.freedesktop.org/show_bug.cgi?id=29661

 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/auxiliary/util/u_format_pack.py | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
 b/src/gallium/auxiliary/util/u_format_pack.py
 index 6ccf04c..90f348e 100644
 --- a/src/gallium/auxiliary/util/u_format_pack.py
 +++ b/src/gallium/auxiliary/util/u_format_pack.py
 @@ -226,9 +226,9 @@ def native_to_constant(type, value):
  '''Get the value of unity for this type.'''
  if type.type == FLOAT:
  if type.size <= 32:
 -return "%ff" % value
 +return "%.1ff" % float(value)
  else:
 -return "%ff" % value
 +return "%.1f" % float(value)
  else:
  return str(int(value))

 @@ -251,8 +251,8 @@ def clamp_expr(src_channel, dst_channel, 
 dst_native_type, value):
  dst_max = dst_channel.max()

  # Translate the destination range to the src native value
 -dst_min_native = value_to_native(src_channel, dst_min)
 -dst_max_native = value_to_native(src_channel, dst_max)
 +dst_min_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_min))
 +dst_max_native = native_to_constant(src_channel, 
 value_to_native(src_channel, dst_max))

  if src_min < dst_min and src_max > dst_max:
  return 

[Mesa-dev] [Bug 54080] glXQueryDrawable fails with GLXBadDrawable for a Window in direct context

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=54080

Eero Tamminen eero.t.tammi...@intel.com changed:

   What|Removed |Added

 CC||eero.t.tammi...@intel.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Siavash Eliasi
Then I do recommend removing the if (cpu_has_sse4_1) from this patch 
and similar places, because there is no runtime CPU dispatching 
happening for SSE optimized code paths in action and just adds extra 
overhead (unnecessary branches) to the generated code.


Same must be applied to these patches:
[Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
http://lists.freedesktop.org/archives/mesa-dev/2014-November/070256.html

Best regards,
Siavash Eliasi.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-announce] Mesa 10.3 release candidate 1

2014-11-07 Thread Thierry Vignaud
On 5 November 2014 04:44, Matt Turner matts...@gmail.com wrote:
I tried to reproduce this today and couldn't.

(...)

 Thanks. Maybe you could give a little more information, like an error
 message or something?

 Same error as Thierry reported in this thread in August:

 Unfortunately Thierry's was from a re-run of make, so it wasn't useful.

No, it wasn't a re-run!
It was a clean build in our build system with make -jXX, with XX auto-set to
the number of cores, and it is always reproducible given enough cores.

 I've gone over this all and can't spot the problem. The dependencies
 look fine. I tried automake-1.13 and 1.14, and make-3.82 and 4.0.
 Maybe I'll have more luck on a 40 core system.

As already explained, in order to be able to reproduce, you must either have
a large system or force make -j to a high value (eg: -j24)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll

2014-11-07 Thread jfonseca
From: José Fonseca jfons...@vmware.com

On Windows, DllMain calls and thread creation/destruction are
serialized, so when llvmpipe is destroyed from DllMain waiting for the
rasterizer threads to finish will deadlock.

So, instead of waiting for rasterizer threads to have finished, simply wait for 
the
rasterizer threads to notify they are just about to finish.
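
The shutdown handshake can be sketched with plain POSIX primitives (illustration
only; the driver uses the gallium pipe_semaphore/pipe_thread wrappers and the
names below are made up):

   #include <pthread.h>
   #include <semaphore.h>
   #include <stdio.h>

   static sem_t work_ready;     /* stands in for task->work_ready */
   static sem_t about_to_exit;  /* stands in for task->work_done at shutdown */
   static int shutting_down;

   static void *worker(void *arg)
   {
      (void)arg;
      for (;;) {
         sem_wait(&work_ready);
         if (shutting_down)
            break;
         /* ... rasterize a bin ... */
      }
      /* Announce "just about to finish" instead of relying on a join:
       * on Windows a join from DllMain deadlocks because thread exit is
       * serialized against DllMain itself. */
      sem_post(&about_to_exit);
      return NULL;
   }

   int main(void)
   {
      pthread_t t;
      sem_init(&work_ready, 0, 0);
      sem_init(&about_to_exit, 0, 0);
      pthread_create(&t, NULL, worker, NULL);

      shutting_down = 1;
      sem_post(&work_ready);     /* wake the worker so it can exit */
      sem_wait(&about_to_exit);  /* wait for the notification, not for exit */
      pthread_detach(t);         /* the thread is never joined */

      sem_destroy(&work_ready);
      sem_destroy(&about_to_exit);
      printf("worker shut down without a join\n");
      return 0;
   }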

Verified with this very simple program:

   #include <windows.h>
   int main() {
  HMODULE hModule = LoadLibraryA("opengl32.dll");
  FreeLibrary(hModule);
   }

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252

Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/llvmpipe/lp_rast.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c 
b/src/gallium/drivers/llvmpipe/lp_rast.c
index a3420a2..6b54d43 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -800,6 +800,8 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data )
   pipe_semaphore_signal(&task->work_done);
}
 
+   pipe_semaphore_signal(&task->work_done);
+
return 0;
 }
 
@@ -885,9 +887,11 @@ void lp_rast_destroy( struct lp_rasterizer *rast )
   pipe_semaphore_signal(&rast->tasks[i].work_ready);
}
 
-   /* Wait for threads to terminate before cleaning up per-thread data */
+   /* Wait for threads to terminate before cleaning up per-thread data.
+* We don't actually call pipe_thread_wait to avoid dead lock on Windows
+* per https://bugs.freedesktop.org/show_bug.cgi?id=76252 */
 for (i = 0; i < rast->num_threads; i++) {
-  pipe_thread_wait(rast->threads[i]);
+  pipe_semaphore_wait(&rast->tasks[i].work_done);
}
 
/* Clean up per-thread data */
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll

2014-11-07 Thread Roland Scheidegger
Reviewed-by: Roland Scheidegger srol...@vmware.com

Am 07.11.2014 um 17:52 schrieb jfons...@vmware.com:
 From: José Fonseca jfons...@vmware.com
 
 On Windows, DllMain calls and thread creation/destruction are
 serialized, so when llvmpipe is destroyed from DllMain waiting for the
 rasterizer threads to finish will deadlock.
 
 So, instead of waiting for rasterizer threads to have finished, simply wait 
 for the
 rasterizer threads to notify they are just about to finish.
 
 Verified with this very simple program:
 
 #include <windows.h>
int main() {
    HMODULE hModule = LoadLibraryA("opengl32.dll");
   FreeLibrary(hModule);
}
 
 Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252
 
 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/drivers/llvmpipe/lp_rast.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c 
 b/src/gallium/drivers/llvmpipe/lp_rast.c
 index a3420a2..6b54d43 100644
 --- a/src/gallium/drivers/llvmpipe/lp_rast.c
 +++ b/src/gallium/drivers/llvmpipe/lp_rast.c
 @@ -800,6 +800,8 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data )
    pipe_semaphore_signal(&task->work_done);
 }
  
  +   pipe_semaphore_signal(&task->work_done);
 +
 return 0;
  }
  
 @@ -885,9 +887,11 @@ void lp_rast_destroy( struct lp_rasterizer *rast )
    pipe_semaphore_signal(&rast->tasks[i].work_ready);
 }
  
 -   /* Wait for threads to terminate before cleaning up per-thread data */
 +   /* Wait for threads to terminate before cleaning up per-thread data.
 +* We don't actually call pipe_thread_wait to avoid dead lock on Windows
 +* per https://bugs.freedesktop.org/show_bug.cgi?id=76252 */
  for (i = 0; i < rast->num_threads; i++) {
  -  pipe_thread_wait(rast->threads[i]);
  +  pipe_semaphore_wait(&rast->tasks[i].work_done);
 }
  
 /* Clean up per-thread data */
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Siavash Eliasi


On 11/07/2014 07:31 PM, Ian Romanick wrote:

On 11/07/2014 06:09 AM, Siavash Eliasi wrote:

On 11/07/2014 03:14 PM, Steven Newbury wrote:

On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:

On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
siavashser...@gmail.com wrote:

Then I do recommend removing the if (cpu_has_sse4_1) from this
patch and similar places, because there is no runtime CPU
dispatching happening for SSE optimized code paths in action and
just adds extra overhead (unnecessary branches) to the generated
code.

No. Sorry, I realize I misread your previous question:


I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
controllable by user at runtime; because USE_SSE41 is a
compile time check and requires the target machine to be SSE 4.1
capable already.

USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
to build the code and then use it only on systems that actually
support it.

All of this could have been pretty easily answered by a few greps
though...

I wonder what difference it would make to have an option to compile
out the run-time check code to avoid the additional overhead in cases
where the builder *knows* at compile time what the run-time system is?
(ie Gentoo)

I think that's possible. Since cpu_has_sse4_1 and friends are simply
macros, one can set them to true or 1 during compile time if it's
going to be built for an SSE 4.1 capable target so your smart compiler
will totally get rid of the unnecessary runtime check.

I guess common_x86_features.h should be modified to something like this:

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif

I was thinking about doing something similar for cpu_has_xmm and
cpu_has_xmm2 for x64.  SSE and SSE2 are required parts of that
instruction set, so they're always there.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



I can come up with a patch implementing the same for SSE, SSE2, SSE3 and 
SSSE3 if current approach is fine by you.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #53 from Jason Ekstrand ja...@jlekstrand.net ---
(In reply to Samuel Iglesias from comment #52)
 (In reply to Jason Ekstrand from comment #51)
  (In reply to Samuel Iglesias from comment #48)
   (In reply to Jason Ekstrand from comment #34)
(In reply to Samuel Iglesias from comment #33)
 Jason, I would like to know your opinion about the integer RGBA 
 clamping
 done in pack.c (_mesa_pack_rgba_span_from_ints()).
 
 glReadPixels() and glGetTexImage() specs said (for example, in OpenGL 
 3.3.
 core specification) that for an integer RGBA color, each component is
 clamped to the representable range of type.
 
 Those GL functions end up calling pack.c's functions for performing 
 the
 conversion (mesa_pack_rgba_span_from_ints() and others).
 
 It's possible to replace some of those pack.c's conversion functions 
 by the
 master conversion but the problem is in the clamping operation. I 
 would like
 to add a boolean argument called clamp to the master conversion 
 function
 which is passed to _mesa_swizzle_and_convert() where each of its
 convert_{uint,int,byte,ubyte,short,ushort}() do the right thing when 
 clamp
 is true.
 
 This clamp parameter would be false for every call to either master
 conversion function or _mesa_swizzle_and_convert() except when they 
 are
 being called from pack.c.
 
 What do you think?

Supporting clamping is probably fine.  I think we determined we needed a
clamp parameter anyway for some of the float conversions.  I guess it 
makes
sense for integers too.  Let's watch out for performance when we 
implement
it though.  Changing the loop inside mesa_swizzle_and_convert without
hurting performance can be tricky.  The teximage-colors test in Piglit 
has a
-benchmark flag that can be used for testing that.
   
   In the end, we did not make that change in pack.c as we could just use the
   autogenerated format pack/unpack functions. However I am retaking this 
   topic
   again because we found another issue which would require a similar 
   solution:
   
   The convert_*() functions in format_utils.c convert between source and
   destination data and are used by _mesa_swizzle_and_convert. We found that
   these were not good enough for various conversions that involved
   non-normalized types of different sizes: INT to SHORT, INT to BYTE, etc.
   Because of that, several piglit tests related to glGetTexImage() and 
   others
   failed, like for example bin/ext_texture_integer-getteximage-clamping.
   
   In order to fix that we added the clamp expressions for these cases [1]  
   and
   with that we achieved no regressions when executing a full piglit run on
   i965 driver.
   Unfortunately, when testing our patches on a couple of Gallium drivers, we
   found a regression that we tracked down back to that patch:
   bin/arb_clear_buffer_object-formats.
   Reverting the patch fixes the problem with these Gallium drivers but
   then, bin/ext_texture_integer-getteximage-clamping fails again on i965. We
   wonder if there could be more cases like this that piglit is not covering,
   since it looks weird that this affects just this one test.
   
   So, we wonder if that patch for _mesa_swizzle_and_convert is correct and 
   we
   should fix the failed gallium cases elsewhere, or if we should revert that
   patch and fix the cases it fixed in a different way. What do you think? 
   Was
   _mesa_swizzle_and_convert implemented like that  on purpose or are these
   real bugs?
  
  From my brief reading of the GL spec, it looks like clamping integers to the
  max representable range is what it expects by default.  From the glTexImage
  spec:
  
  The selected groups are transferred to the GL as described in section 3.7.2
  and then clamped to the representable range of the internal format. If the
  internalformat of the texture is signed or unsigned integer, components are
  clamped to [-2^(n-1), 2^(n-1)-1] or [0, 2^n-1], respectively, where n is
  the number of bits per component. For color component groups, if the
  internalformat of the texture is signed or unsigned normalized fixed-point,
  components are clamped to [-1, 1] or [0, 1], respectively.
  
  Therefore, it seems as if we want to be clamping when we have integer
  destinations.  I'm not sure why the gallium drivers are regressing when you
  do.
  
  One more observation is that it doesn't look like your clamping patch is
  complete.  If we're going to clamp when we have an integer destination, we
  should always clamp with integer destinations, not just in the few cases
  that piglit hits.  I wouldn't be surprised if piglit's coverage in those
  areas is terrible.
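  
  For reference, the kind of per-component clamp that spec language implies for
  an INT to SHORT readback is simply (illustration only, not the mesa code):
  
   #include <stdint.h>
  
   static inline int16_t
   clamp_int32_to_int16(int32_t v)
   {
      if (v > INT16_MAX)
         return INT16_MAX;
      if (v < INT16_MIN)
         return INT16_MIN;
      return (int16_t)v;
   }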
 
 
 Yes, you are right about the specification. We have added clamping for all
 the needed conversions in 

Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds

2014-11-07 Thread Kenneth Graunke
On Friday, November 07, 2014 01:52:13 PM Marek Olšák wrote:
 On Fri, Nov 7, 2014 at 1:10 PM, Steven Newbury st...@snewbury.org.uk 
wrote:
  On Wed, 2014-11-05 at 00:46 +, Emil Velikov wrote:
  On 04/11/14 22:42, Marek Olšák wrote:
   Hi everybody,
  
   I'm about to address this long-standing issue: The EGL state
   tracker is redundant. It duplicates what st/dri does and it also
   duplicates what the common loader egl_dri2 does, which is used by
   all classic drivers and even works better with gallium drivers.
  
   Let's compare EGL extensions for both backends:
  
   st/egl:
   EGL version string: 1.4 (Gallium)
   EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenVG
   EGL extensions string:
   EGL_WL_bind_wayland_display EGL_KHR_image_base
   EGL_KHR_image_pixmap
   EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync
   EGL_KHR_surfaceless_context EGL_NOK_swap_region
   EGL_NV_post_sub_buffer
  
   egl_dri2:
   EGL version string: 1.4 (DRI2)
   EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2 OpenGL_ES3
   EGL extensions string:
   EGL_MESA_drm_image EGL_MESA_configless_context
   EGL_KHR_image_base
   EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_texture_2D_image
   EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image
   EGL_KHR_surfaceless_context EGL_KHR_create_context
   EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
   EGL_CHROMIUM_sync_control EGL_EXT_image_dma_buf_import
   EGL_NV_post_sub_buffer
  
   egl_dri2 also supports MSAA on the window framebuffer (through
   st/dri). It's really obvious which one is better.
  
   I'm aware of 2 features that we will lose:
   - swrast on Wayland - I'm not sure about this. Perhaps kms-swrast
   has
   addressed this already.
   - OpenVG - It has never taken off. If people want this on Linux,
   it should
   use egl_dri2 and st/dri, like OpenGL does.
  
  I think it would be a shame to lose the OpenVG backend.  It's been
  disappointing with the lack of interest on desktop systems even
  through cairo, but I wonder if it was because of the inherent Gallium
  only nature?  Cairo's GL backend is probably more likely to work on
  more systems because of this.  Admittedly, it was probably mostly
  useful as a technology on weaker CPUs and simpler GPUs without full
  OpenGL(ES) capability where it could provide a performant GUI.
 
   T his series removes st/egl and st/gbm support from the autoconf
   build (the latter depends on the former and is probably just as
   redundant). The next step is to remove all Linux-specific backends
   from st/egl. Windows, Android, and other platform backends will be
   kept intact, therefore st/egl won't be removed completely.
  
   Please comment.
  
  I use EGL_DRIVER=egl_gallium to switch to using the ilo driver at run-
  time.  (Admittedly, mostly for testing more than anything useful.)
  There would have to be some other way of at least selecting run-time
  backend to egl_dri2 for this functionality to continue working.
 
  A few thoughts:
   - Iirc Eric is using st/egl as the dri2 backend seems to be causing
  problems :(
   - Android supports dri modules, but st/dri/Android.mk is missing.
  Should be trivial to add.
   - Windows - might be a pain in the a** to get it working. Then again
  does Windows have EGL ?
   - fbdev, OpenVG - the etnaviv project was using the former,
  currently
  moving to full-blown drm :)
 
  If one is to nuking the three st we can remove
   - the libEGL symbol export mayhem
   - gallium's st_api, and flatten things a bit
   - a bit of duplication :)
 
  In summary:
  With a bit of work we can remove Linux and Android from the
  equation. How many people/companies use EGL for Windows/fbdev, how
  about OpenVG on any platform ?
 
  I'm not sure we can know how many use OpenVG.  Probably more than is
  commonly thought on this list.
 
 Then I'd like those people or companies to speak up.
 
 I seriously doubt anyone is using OpenVG with a Gallium driver. It
 even needs GL3 hardware (because it uses ARL in a fragment shader).
 That means every non-GL3 Gallium driver has incomplete OpenVG support,
 so OpenVG users should use these: radeon = r600, nouveau = nv50, ilo
 maybe? That's it. I wouldn't bother with softpipe and llvmpipe -- if
 you have to use them, you're better off with a VG-over-GL wrapper and
 a good OpenGL driver/hardware combo.
 
 Marek

It seems like every time we discuss this, there's been the "but we have it" 
argument and one or two people saying "but it would be cool".  But in 
practice, I don't know of anybody even using OpenVG, much less OpenVG on 
Gallium.  People use Cairo or Skia.

It's very clearly not being maintained or developed.  Marek showed data 
indicating no one has worked on it in some time.  Several developers have 
expressed that they would like OpenVG to work with egl_dri2, if we're going to 
keep it at all, and no one has stepped forward to do that work.

Perhaps dropping it would inspire a maintainer to 

Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches

2014-11-07 Thread Kenneth Graunke
On Friday, November 07, 2014 12:01:20 PM Siavash Eliasi wrote:
 I know that this might sound troublesome but since there is no 
 benchmarks done by reviewers before pushing the performance optimization 
 oriented patches into master branch, I think it's as important as piglit 
 tests and necessary to ask the patch provider for simple OpenGL micro 
 benchmarks triggering the optimized code path to see if there is any 
 real benefits and make sure that it isn't degrading the performance.
 
 Being more strict about pushing and quality assurance of these kind of 
 patches will save hours of bisecting and hair-pulling to find the root 
 cause of performance degrades.
 
 Best regards,
 Siavash Eliasi.

For performance patches, we generally require authors to include data in their 
commit messages showing that an application benefits by some metric.  (It's 
useful for posterity too, so people can answer "what is all this code for?").

Honestly, I think I'm okay with our usual metrics like:
- Increased FPS in a game or benchmark
- Reduced number of instructions or memory accesses in a shader program
- Reduced memory consumption
- Significant cycle reduction in callgrind or better generated code
  (ideally if it's a lot of code I'd like better justification)

We've had a lot of cases of people submitting +LOC with no actual metric of 
improvement at all, and we should absolutely reject those until /some/ data is 
supplied indicating why the patch is useful.

Microbenchmarks can be helpful in demonstrating performance improvements.  But 
they can also be pretty misleading - you can make a microbenchmark run a lot 
faster, yet not impact real applications.  On the flip side, if a developer 
can show that a patch makes a real application (e.g. DOTA 2) run faster, I 
don't see any reason to require them to additionally write a microbenchmark.  
The patch is clearly justified, and developers are already overworked.

That said, if someone would like to start building an open source 
microbenchmarking suite, I could definitely see that being a valuable project.

--Ken

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/2] i965: Use the predicate enable bit for conditional rendering without stalling

2014-11-07 Thread Neil Roberts
Previously whenever a primitive is drawn the driver would call
_mesa_check_conditional_render which blocks waiting for the result of the
query to determine whether to render. On Gen7+ there is a bit in the
3DPRIMITIVE command which can be used to disable the primitive based on the
value of a state bit. This state bit can be set based on whether two registers
have different values using the MI_PREDICATE command. We can load these two
registers with the pixel count values stored in the query begin and end to
implement conditional rendering without stalling.
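
For context, the application-side pattern this path accelerates looks roughly
like the following (plain GL usage, not part of the patch):

   GLuint query;
   glGenQueries(1, &query);

   glBeginQuery(GL_SAMPLES_PASSED, query);
   /* draw a cheap occlusion proxy, e.g. a bounding box */
   glEndQuery(GL_SAMPLES_PASSED);

   glBeginConditionalRender(query, GL_QUERY_BY_REGION_WAIT);
   /* draw the expensive geometry; it is skipped if no samples passed */
   glEndConditionalRender();

With the predicate enable bit the skip decision happens on the GPU instead of
the driver stalling on the query result.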

Unfortunately these two source registers are not in the whitelist of available
registers in the kernel driver so this needs a kernel patch to work. This
patch uses the command parser version from intel_screen to detect whether to
attempt to set the predicate data registers.

The predicate enable bit is currently only used for drawing 3D primitives. For
blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For
most of these it would probably just work to call the new
brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of rendering
primitives. However it's a bit trickier for blits because it can use the BLT
ring or the blorp codepath. I think these operations are less useful for
conditional rendering than rendering primitives so it might be best to leave
it for a later patch.

v2: Use the command parser version to detect whether we can write to the
predicate data registers instead of trying to execute a register load
command.
---

Glenn Kennard suggested that instead of trying to send a load register
command to detect whether the predicate source registers can be set we
could just increase the command parser version in the kernel driver
and query that. That seems nicer to me so here is a second version of
the patch to do that. I will post a v2 of the kernel patch too.

 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 +
 src/mesa/drivers/dri/i965/brw_context.c|   4 +
 src/mesa/drivers/dri/i965/brw_context.h|  21 +++
 src/mesa/drivers/dri/i965/brw_defines.h|   1 +
 src/mesa/drivers/dri/i965/brw_draw.c   |  16 +-
 src/mesa/drivers/dri/i965/brw_queryobj.c   |  17 ++-
 src/mesa/drivers/dri/i965/intel_extensions.c   |   5 +
 src/mesa/drivers/dri/i965/intel_reg.h  |  23 +++
 9 files changed, 243 insertions(+), 12 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 711aabe..0adaf4d 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -38,6 +38,7 @@ i965_FILES = \
brw_clip_tri.c \
brw_clip_unfilled.c \
brw_clip_util.c \
+   brw_conditional_render.c \
brw_context.c \
brw_cubemap_normalize.cpp \
brw_curbe.c \
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c 
b/src/mesa/drivers/dri/i965/brw_conditional_render.c
new file mode 100644
index 000..e676aa0
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Neil Roberts n...@linux.intel.com
+ */
+
+/** @file brw_conditional_render.c
+ *
+ * Support for conditional rendering based on query objects
+ * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+.
+ */
+
+#include main/imports.h
+#include main/condrender.h
+
+#include brw_context.h
+#include brw_defines.h
+#include intel_batchbuffer.h
+
+static void
+set_predicate_enable(struct brw_context *brw,
+ bool value)
+{
+ 

[Mesa-dev] [PATCH 1/2] i965: Store the command parser version number in intel_screen

2014-11-07 Thread Neil Roberts
In order to detect whether the predicate source registers can be used in a
later patch we will need to know the version number for the command parser.
This patch just adds a member to intel_screen and does an ioctl to get the
version.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 7 +++
 src/mesa/drivers/dri/i965/intel_screen.h | 8 +++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 6618c1a..ab7ac85 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1362,6 +1362,13 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
  (ret != -1 || errno != EINVAL);
}
 
+   struct drm_i915_getparam getparam;
+   getparam.param = I915_PARAM_CMD_PARSER_VERSION;
+   getparam.value = &intelScreen->cmd_parser_version;
+   const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam);
+   if (ret == -1)
+  intelScreen->cmd_parser_version = 0;
+
   psp->extensions = !intelScreen->has_context_reset_notification
   ? intelScreenExtensions : intelRobustScreenExtensions;
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
index 393315e..a7706b3 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -115,7 +115,13 @@ struct intel_screen
* Configuration cache with default values for all contexts
*/
driOptionCache optionCache;
-};
+
+   /**
+* Version of the command parser reported by the
+* I915_PARAM_CMD_PARSER_VERSION parameter
+*/
+   int cmd_parser_version;
+ };
 
 extern void intelDestroyContext(__DRIcontext * driContextPriv);
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965: Store the command parser version number in intel_screen

2014-11-07 Thread Kenneth Graunke
On Friday, November 07, 2014 06:52:59 PM Neil Roberts wrote:
 In order to detect whether the predicate source registers can be used in a
 later patch we will need to know the version number for the command parser.
 This patch just adds a member to intel_screen and does an ioctl to get the
 version.
 ---
  src/mesa/drivers/dri/i965/intel_screen.c | 7 +++
  src/mesa/drivers/dri/i965/intel_screen.h | 8 +++-
  2 files changed, 14 insertions(+), 1 deletion(-)
 
 diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
 index 6618c1a..ab7ac85 100644
 --- a/src/mesa/drivers/dri/i965/intel_screen.c
 +++ b/src/mesa/drivers/dri/i965/intel_screen.c
 @@ -1362,6 +1362,13 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
   (ret != -1 || errno != EINVAL);
 }
  
 +   struct drm_i915_getparam getparam;
 +   getparam.param = I915_PARAM_CMD_PARSER_VERSION;
  +   getparam.value = &intelScreen->cmd_parser_version;
  +   const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam);
  +   if (ret == -1)
  +  intelScreen->cmd_parser_version = 0;
  +
  psp->extensions = !intelScreen->has_context_reset_notification
? intelScreenExtensions : intelRobustScreenExtensions;
  
 diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
 index 393315e..a7706b3 100644
 --- a/src/mesa/drivers/dri/i965/intel_screen.h
 +++ b/src/mesa/drivers/dri/i965/intel_screen.h
 @@ -115,7 +115,13 @@ struct intel_screen
 * Configuration cache with default values for all contexts
 */
 driOptionCache optionCache;
 -};
 +
 +   /**
 +* Version of the command parser reported by the
 +* I915_PARAM_CMD_PARSER_VERSION parameter
 +*/
 +   int cmd_parser_version;
 + };
  
  extern void intelDestroyContext(__DRIcontext * driContextPriv);

Patch 1 is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

We'll clearly want this eventually.  Whether or not we can actually use the 
command parser is unclear - at present, the kernel leaves the hardware command 
validator enabled even if you also validate in software.  It looks like Brad 
sent out new patches that might fix this, though, so assuming those land, 
maybe we'll be good to go... :)

--Ken

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

2014-11-07 Thread Eric Anholt
Pekka Paalanen ppaala...@gmail.com writes:

 On Thu, 06 Nov 2014 13:01:03 -0800
 Ian Romanick i...@freedesktop.org wrote:

 I thought Eric and Chad already NAKed it in bugzilla.  The problem is
 that applications ask for an RGBA visual for GL blending.  They use the
 alpha channel to generate their images, but the final alpha values are,
 basically, random... and the composited result would be pure garbage.

 Reading
 https://bugs.freedesktop.org/show_bug.cgi?id=67676#c5
 We should certainly be exposing EGLConfigs that match up to the rgba
 visual, though, so you can find it when you try. - Eric

 To me that sounds like Eric would accept having the visuals there
 in additional configs (as long as they are sorted after the otherwise
 equivalent xRGB configs?). Eric, would you like to confirm your current
 opinion?

What I believe we want:

Somebody just requesting RGBA with ChooseConfig doesn't end up forced
into the depth-32 (blending) X visual.  This is the most important.

There is some mechanism for somebody that does want the depth 32 visual
to get an EGL config to draw to it.  This is important but secondary to
not breaking everything else, and them having to jump through hoops is
reasonable but probably avoidable.
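
Roughly, the hoop-jumping on the client side could look like this (sketch only;
dpy is assumed to be an initialized EGLDisplay):

   EGLint attribs[] = {
      EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8, EGL_ALPHA_SIZE, 8,
      EGL_NONE
   };
   EGLConfig configs[64];
   EGLint count = 0;
   eglChooseConfig(dpy, attribs, configs, 64, &count);

   for (EGLint i = 0; i < count; i++) {
      EGLint visual_id = 0;
      eglGetConfigAttrib(dpy, configs[i], EGL_NATIVE_VISUAL_ID, &visual_id);
      /* Look the id up with XGetVisualInfo() and check the visual depth
       * (24 vs 32) to decide which config to actually use. */
   }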


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches

2014-11-07 Thread Matt Turner
On Fri, Nov 7, 2014 at 12:31 AM, Siavash Eliasi siavashser...@gmail.com wrote:
 Being more strict about pushing and quality assurance of these kind of
 patches will save hours of bisecting and hair-pulling to find the root cause
 of performance degrades.

You say this as if it has happened...?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] util/format: Fix clamping to 32bit integers.

2014-11-07 Thread jfonseca
From: José Fonseca jfons...@vmware.com

Use clamping constants that guarantee no integer overflows.

As spotted by Chris Forbes.

This causes the code to change as:

- value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
+ value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);

- value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 
2147483647.0f));
+ value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 
2147483520.0f));
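
A quick standalone sanity check of the new constants (not part of the patch):

   #include <stdio.h>

   int main(void)
   {
      /* The old bounds round up when stored in a float and then overflow on
       * the integer cast; the new ones are the largest floats that still
       * fit, so the cast stays in range. */
      printf("%.1f\n", 4294967295.0f);   /* becomes 4294967296.0 - overflows uint32 */
      printf("%.1f\n", 4294967040.0f);   /* largest float <= UINT32_MAX */
      printf("%.1f\n", 2147483647.0f);   /* becomes 2147483648.0 - overflows int32 */
      printf("%.1f\n", 2147483520.0f);   /* largest float <= INT32_MAX */
      return 0;
   }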
---
 src/gallium/auxiliary/util/u_format_pack.py | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
b/src/gallium/auxiliary/util/u_format_pack.py
index 90f348e..d5138cc 100644
--- a/src/gallium/auxiliary/util/u_format_pack.py
+++ b/src/gallium/auxiliary/util/u_format_pack.py
@@ -207,9 +207,36 @@ def get_one_shift(type):
 assert False
 
 
+def truncate_mantissa(x, bits):
+'''Truncate an integer so it can be represented exactly with a floating
+point mantissa'''
+
+assert isinstance(x, (int, long))
+
+s = 1
+if x < 0:
+s = -1
+x = -x
+
+# We can represent integers up to mantissa + 1 bits exactly
+mask = (1 << (bits + 1)) - 1
+
+# Slide the mask until the MSB matches
+shift = 0
+while (x >> shift) & ~mask:
+shift += 1
+
+x &= mask << shift
+x *= s
+return x
+
+
 def value_to_native(type, value):
 '''Get the value of unity for this type.'''
 if type.type == FLOAT:
+if type.size <= 32 \
+and isinstance(value, (int, long)):
+return truncate_mantissa(value, 23)
 return value
 if type.type == FIXED:
 return int(value * (1  (type.size/2)))
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util/format: Fix clamping to 32bit integers.

2014-11-07 Thread Roland Scheidegger
Looks good to me though I barely understand the code there...
I guess ideally you'd generate code which actually clamps to the max int
value representable instead, but that probably would get even more messy
(something like:
val = MAX(src[0], -2147483648.0f)
if (src[0] >= 2147483648.0f) val = 2147483647)
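
Written out as C that would be something like the following sketch (NaN still
not handled, same as today):

   #include <stdint.h>

   static inline int32_t
   float_to_int32_clamped(float f)
   {
      if (f >= 2147483648.0f)    /* 2^31 and above all map to INT32_MAX */
         return INT32_MAX;
      if (f <= -2147483648.0f)   /* -2^31 is exactly representable */
         return INT32_MIN;
      return (int32_t)f;
   }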


Reviewed-by: Roland Scheidegger srol...@vmware.com


Am 07.11.2014 um 22:25 schrieb jfons...@vmware.com:
 From: José Fonseca jfons...@vmware.com
 
 Use clamping constants that guarantee no integer overflows.
 
 As spotted by Chris Forbes.
 
 This causes the code to change as:
 
 - value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
 + value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);
 
 - value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483647.0f));
 + value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483520.0f));
 ---
  src/gallium/auxiliary/util/u_format_pack.py | 27 +++
  1 file changed, 27 insertions(+)
 
 diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
 b/src/gallium/auxiliary/util/u_format_pack.py
 index 90f348e..d5138cc 100644
 --- a/src/gallium/auxiliary/util/u_format_pack.py
 +++ b/src/gallium/auxiliary/util/u_format_pack.py
 @@ -207,9 +207,36 @@ def get_one_shift(type):
  assert False
  
  
 +def truncate_mantissa(x, bits):
 +'''Truncate an integer so it can be represented exactly with a floating
 +point mantissa'''
 +
 +assert isinstance(x, (int, long))
 +
 +s = 1
  +if x < 0:
 +s = -1
 +x = -x
 +
 +# We can represent integers up to mantissa + 1 bits exactly
  +mask = (1 << (bits + 1)) - 1
 +
 +# Slide the mask until the MSB matches
 +shift = 0
  +while (x >> shift) & ~mask:
 +shift += 1
 +
  +x &= mask << shift
 +x *= s
 +return x
 +
 +
  def value_to_native(type, value):
  '''Get the value of unity for this type.'''
  if type.type == FLOAT:
  +if type.size <= 32 \
 +and isinstance(value, (int, long)):
 +return truncate_mantissa(value, 23)
  return value
  if type.type == FIXED:
  return int(value * (1  (type.size/2)))
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util/format: Fix clamping to 32bit integers.

2014-11-07 Thread Jose Fonseca
 I guess ideally you'd generate code which actually clamps to the max int
value representable instead

Yes, that would be more complex to generate.

Jose



From: Roland Scheidegger
Sent: 07 November 2014 21:40
To: Jose Fonseca; mesa-dev@lists.freedesktop.org; chr...@ijw.co.nz
Subject: Re: [PATCH] util/format: Fix clamping to 32bit integers.

Looks good to me though I barely understand the code there...
I guess ideally you'd generate code which actually clamps to the max int
value representable instead, but that probably would get even more messy
(something like:
val = MAX(src[0], -2147483648.0f)
if (src[0] >= 2147483648.0f) val = 2147483647)



Reviewed-by: Roland Scheidegger srol...@vmware.com


Am 07.11.2014 um 22:25 schrieb jfons...@vmware.com:
 From: José Fonseca jfons...@vmware.com

 Use clamping constants that guarantee no integer overflows.

 As spotted by Chris Forbes.

 This causes the code to change as:

 - value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
 + value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);

 - value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483647.0f));
 + value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 
 2147483520.0f));
 ---
  src/gallium/auxiliary/util/u_format_pack.py | 27 +++
  1 file changed, 27 insertions(+)

 diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
 b/src/gallium/auxiliary/util/u_format_pack.py
 index 90f348e..d5138cc 100644
 --- a/src/gallium/auxiliary/util/u_format_pack.py
 +++ b/src/gallium/auxiliary/util/u_format_pack.py
 @@ -207,9 +207,36 @@ def get_one_shift(type):
  assert False


 +def truncate_mantissa(x, bits):
 +'''Truncate an integer so it can be represented exactly with a floating
 +point mantissa'''
 +
 +assert isinstance(x, (int, long))
 +
 +s = 1
 +if x < 0:
 +s = -1
 +x = -x
 +
 +# We can represent integers up to mantissa + 1 bits exactly
 +mask = (1 << (bits + 1)) - 1
 +
 +# Slide the mask until the MSB matches
 +shift = 0
 +while (x >> shift) & ~mask:
 +shift += 1
 +
 +x &= mask << shift
 +x *= s
 +return x
 +
 +
  def value_to_native(type, value):
  '''Get the value of unity for this type.'''
  if type.type == FLOAT:
 +if type.size <= 32 \
 +and isinstance(value, (int, long)):
 +return truncate_mantissa(value, 23)
  return value
  if type.type == FIXED:
  return int(value * (1  (type.size/2)))

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 76252] Dynamic loading/unloading of opengl32.dll results in a deadlock

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=76252

José Fonseca jfons...@vmware.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from José Fonseca jfons...@vmware.com ---
Fixed with
http://cgit.freedesktop.org/mesa/mesa/commit/?id=706ad3b649e6a75fdac9dc9acc3caa9e6067b853

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86025] src\glsl\list.h(535) : error C2143: syntax error : missing '; ' before 'type'

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86025

Bug ID: 86025
   Summary: src\glsl\list.h(535) : error C2143: syntax error :
missing ';' before 'type'
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Windows (All)
Status: NEW
  Keywords: bisected, regression
  Severity: blocker
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: v...@freedesktop.org
CC: ja...@jlekstrand.net, matts...@gmail.com

mesa: 0c36aac83252d32dac8c4da2850539bff0b10301 (master 10.4.0-devel)

  Compiling src\mesa\main\shaderapi.c ...
shaderapi.c
src\glsl\list.h(535) : error C2143: syntax error : missing ';' before 'type'
src\glsl\list.h(535) : error C2143: syntax error : missing ')' before 'type'
src\glsl\list.h(536) : error C2065: 'node' : undeclared identifier


commit 0c36aac83252d32dac8c4da2850539bff0b10301
Author: Jason Ekstrand jason.ekstr...@intel.com
Date:   Wed Nov 5 13:57:09 2014 -0800

glsl/list: Add an exec_list_validate function

This can be very useful for trying to debug list corruptions.

Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
Reviewed-by: Matt Turner matts...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86025] src\glsl\list.h(535) : error C2143: syntax error : missing '; ' before 'type'

2014-11-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86025

Vinson Lee v...@freedesktop.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Vinson Lee v...@freedesktop.org ---
commit f9fc3949e182cb2da8ce6d8d71f441986795664f
Author: Vinson Lee v...@freedesktop.org
Date:   Fri Nov 7 15:33:41 2014 -0800

glsl/list: Move declaration before code.

Fixes MSVC build error.

shaderapi.c
src\glsl\list.h(535) : error C2143: syntax error : missing ';' before
'type'
src\glsl\list.h(535) : error C2143: syntax error : missing ')' before
'type'
src\glsl\list.h(536) : error C2065: 'node' : undeclared identifier

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86025
Signed-off-by: Vinson Lee v...@freedesktop.org
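
For reference, the failure mode is MSVC compiling the header as C89, where a
declaration may not follow a statement inside a block; a minimal illustration
(hypothetical code, not the actual list.h hunk):

   void example(struct exec_list *list)
   {
      exec_list_validate(list);             /* a statement ...              */
      struct exec_node *node = list->head;  /* ... then a declaration:
                                               error C2143 before 'type'    */
      (void) node;
   }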

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-07 Thread Eric Anholt
Ian Romanick i...@freedesktop.org writes:

 On 11/06/2014 06:16 PM, Michel Dänzer wrote:
 On 06.11.2014 19:18, Joonas Lahtinen wrote:
 On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote:
 On 05.11.2014 20:14, Joonas Lahtinen wrote:

 Modified to not refer to DRI3; it just uses the Present extension to get rid
 of the excess buffer invalidations.

 AFAICT there's no fallback from your changes to the current behaviour if
 the X server doesn't support the Present extension. There probably needs
 to be such a fallback.

  It gets rid of such a nasty hack (the intel_viewport one) that I thought
  there was no point in making a fallback. Because without this, the egl dri2
  backend is fundamentally broken anyway.
 
 Well, AFAICT your code uses Present extension functionality
 unconditionally, without checking that the X server supports Present. I
 can't see how that could possibly work on an X server which doesn't
 support Present, but I think it would be better to keep it working at
 least as badly as it does now in that case. :)

 I was going to say pretty much the same thing.  Aren't there (non-Intel)
 drivers that don't do Present?  If I'm not mistaken, some parts of DRI3
 (not sure about Present) are even disabled in the Intel driver when SNA
 is in use... or at least that was the case at one point.

They actually get a fallback implementation if there's no driver
support, which would be sufficient for this code.

However, Present is too new for Mesa to be unconditionally relying on in
my opinion.
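
For reference, a minimal sketch of the probe such a fallback would need, written directly against xcb-present; this is not from the patch, and the helper name is made up:

#include <stdbool.h>
#include <stdlib.h>
#include <xcb/xcb.h>
#include <xcb/present.h>

/* Hypothetical helper: check for the Present extension before relying on it,
 * so the existing DRI2 invalidation path can be kept as a fallback. */
static bool
has_present_extension(xcb_connection_t *conn)
{
   const xcb_query_extension_reply_t *ext =
      xcb_get_extension_data(conn, &xcb_present_id);
   if (!ext || !ext->present)
      return false;

   /* Negotiate a version; 1.0 is assumed sufficient for this use. */
   xcb_present_query_version_cookie_t cookie =
      xcb_present_query_version(conn, 1, 0);
   xcb_present_query_version_reply_t *reply =
      xcb_present_query_version_reply(conn, cookie, NULL);
   bool supported = (reply != NULL);
   free(reply);
   return supported;
}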


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro

2014-11-07 Thread Jason Ekstrand
---
 src/glsl/list.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/glsl/list.h b/src/glsl/list.h
index 1d18ec9..1fb9178 100644
--- a/src/glsl/list.h
+++ b/src/glsl/list.h
@@ -673,4 +673,13 @@ inline void exec_node::insert_before(exec_list *before)
 __node = __next, __next =  \
exec_node_data(__type, (__next)->__field.next, __field))
 
+#define foreach_list_typed_safe_reverse(__type, __node, __field, __list)   \
+   for (__type * __node =  \
+   exec_node_data(__type, (__list)->tail_pred, __field),   \
+   * __prev =  \
+   exec_node_data(__type, (__node)->__field.prev, __field);\
+__prev != NULL;\
+__node = __prev, __prev =  \
+   exec_node_data(__type, (__prev)->__field.prev, __field))
+
 #endif /* LIST_CONTAINER_H */
-- 
2.1.0
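
A hypothetical usage sketch for review context (the node type, field name and dead flag are made up; exec_node_remove() is assumed to be usable from C here, otherwise the C++ exec_node::remove() does the same):

#include <stdbool.h>
#include "list.h"   /* exec_list, exec_node, iteration macros */

/* Made-up example type embedding an exec_node, as foreach_list_typed expects. */
struct fake_instr {
   struct exec_node node;
   bool dead;
};

/* Walk the list backwards and drop dead nodes; caching __prev is what makes
 * removing the current node safe, which is the point of the macro. */
static void
purge_dead_reverse(struct exec_list *list)
{
   foreach_list_typed_safe_reverse(struct fake_instr, ins, node, list) {
      if (ins->dead)
         exec_node_remove(&ins->node);
   }
}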

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH WIP 1/1] configure: include llvm systemlibs when using static llvm

2014-11-07 Thread Jan Vesely
On Wed, 2014-10-29 at 19:00 +, Emil Velikov wrote:
 On 27/10/14 21:03, Jan Vesely wrote:
  On Mon, 2014-10-27 at 20:22 +, Emil Velikov wrote:
  On 27/10/14 18:05, Jan Vesely wrote:
  On Mon, 2014-10-27 at 02:24 +, Emil Velikov wrote:
  On 26/10/14 19:36, Jan Vesely wrote:
  On Fri, 2014-10-24 at 23:54 +, Emil Velikov wrote:
  On 24/10/14 17:03, Jan Vesely wrote:
  -Wl,--exclude-libs prevents automatic export of symbols
 
 
  CC: Kai Wasserbach k...@dev.carbon-project.org
  CC: Emil Velikov emil.l.veli...@gmail.com
  Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
  ---
 
  Kai,
  can you try this patch with your setup, and check whether LLVM 
  symbols are
  exported from mesa library? (and it's still working)
 
  Emil,
  would it help to have --exclude-libs ALL enabled globally?
 
  Haven't really looked up on the documentation about it, yet there 
  should
  be no (unneeded) exported symbols thanks to the version scripts.
  As such I'm not entirely sure what this patch (attempts to) resolve :(
 
  you are right. I don't know why I thought it was still a problem.
  In that case the attached patch should fix compiling with llvm static
  libs (#70410)
 
  For future patches please add the full link in the commit message
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70410
 
  Afaics the bug mentioned has a slightly different patch, which brings
  the question - do we need to append to the existing LLVM_LIBS, or can we
  overwrite them ? Either way it would be nice if we can get a tested-by
  or two :)
 
  Hi,
 
  I looked at this again. LLVM cmake links system-libs as either PUBLIC or
  INTERFACE [0].
  it means my patch is incorrect, and we should link against system-libs
  even if we use llvm-shared-libs.
  you can add my R-b to the patch by K. Sobiecky that is attached to the
  bug.[1]
 
  Sigh... cmake why you so PoS ?
 
  On a more mature note:
   I do not see why we would need to link against those libraries for
   shared linking. If their libs are broken (have unresolved symbols) and
   we need this hack to get them working, then maybe, but
  ... looking at line 151 - # FIXME: Should this be really PUBLIC?
  Answer: PRIVATE for shared libs, PUBLIC for static ones.
 
  Using PUBLIC causes all the users to recursively link against those
  deps. Leading to over-linking and opening the door for serious issues.
  
   Looks like a misdesign on the LLVM side. They use PUBLIC to bring system libs
   to other LLVM libs, while ignoring private deps elsewhere (like libffi).
   I'd try to send a patch, but my last attempt to get Alexander's RTTI
   patch in has been stuck for 9 months...
  
 Yes, dealing with build systems is a job no-one wants to do, and I fear
 most people underestimate it. Please let them know and be persistent,
 otherwise they might end up abusing it too much :)

just FYI
the llvm cmake part got fixed in r221530 and r221531.

 
  anyway, since we only need those libs in static builds, do you want me
  to repost the patch with bug reference and Kai's tested by?
  
 There is no problem. I've picked it up, added the tags + cc'd
 mesa-stable, as I would expect a few more people willing to use mesa
 with static linked llvm.
 
 Thanks
 Emil
 
  jan
  
 
 
  -Emil
 
  P.S. Both their automake + cmake builds seems _quite_ bad.
  autoconf/Readme has a nice documentation of it :)
 
 
  jan
 
 
  [0] lib/Support/CMakeLists.txt:150
  [1] https://bugs.freedesktop.org/attachment.cgi?id=91764
 
  Thanks
  Emil
 
  jan
 
 
  -Emil
 
  jan
 
   configure.ac | 10 +-
   1 file changed, 9 insertions(+), 1 deletion(-)
 
  diff --git a/configure.ac b/configure.ac
  index 3c76deb..b4b4b13 100644
  --- a/configure.ac
  +++ b/configure.ac
  @@ -1981,7 +1981,15 @@ if test x$MESA_LLVM != x0; then
  dnl already added all of these objects to LLVM_LIBS.
   fi
   else
  -AC_MSG_WARN([Building mesa with staticly linked LLVM may 
  cause compilation issues])
  +AC_MSG_WARN([Building mesa with statically linked LLVM may 
  cause compilation issues])
  + dnl Don't export symbols automatically
  + dnl TODO: Do we want to list llvm libs explicitly here?
  + LLVM_LDFLAGS+= -Wl,exclude-libs ALL
  + dnl We need to link to llvm system libs when using static libs
  + dnl However, only llvm 3.5+ provides --system-libs
  + if test $LLVM_VERSION_MAJOR -eq 3 -a $LLVM_VERSION_MINOR -ge 5; 
  then
  + LLVM_LIBS+= `$LLVM_CONFIG --system-libs`
  + fi
   fi
   fi
   
 
 
 
 
 
 
  
 

-- 
Jan Vesely jan.ves...@rutgers.edu


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches

2014-11-07 Thread Siavash Eliasi

Sounds okay to me, thanks!

Best regards,
Siavash Eliasi.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Require micro-benchmarks for performance optimization oriented patches

2014-11-07 Thread Siavash Eliasi


On 11/07/2014 10:56 PM, Matt Turner wrote:

On Fri, Nov 7, 2014 at 12:31 AM, Siavash Eliasi siavashser...@gmail.com wrote:

Being more strict about pushing and quality assurance of this kind of
patch will save hours of bisecting and hair-pulling to find the root cause
of performance regressions.

You say this as if it has happened...?


Nope, I'm not aware of any. Just saying that it's better to stay safe.
Because Mesa has reached a pretty stable and usable state for
gaming and daily tasks, we will see more of these patches in the upcoming years.


Best regards,
Siavash Eliasi.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro

2014-11-07 Thread Connor Abbott
FWIW, Ian commented on my patch to add foreach_list_typed_reverse that
we shouldn't merge it until it had users so people wouldn't try to
remove it -- it seems like this has the same issue. So either merge
both of them at the same time, or neither until NIR gets merged...

Connor


On Fri, Nov 7, 2014 at 9:32 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 ---
  src/glsl/list.h | 9 +
  1 file changed, 9 insertions(+)

 diff --git a/src/glsl/list.h b/src/glsl/list.h
 index 1d18ec9..1fb9178 100644
 --- a/src/glsl/list.h
 +++ b/src/glsl/list.h
 @@ -673,4 +673,13 @@ inline void exec_node::insert_before(exec_list *before)
  __node = __next, __next =  \
 exec_node_data(__type, (__next)->__field.next, __field))

 +#define foreach_list_typed_safe_reverse(__type, __node, __field, __list)   \
 +   for (__type * __node =  \
 +   exec_node_data(__type, (__list)->tail_pred, __field),   \
 +   * __prev =  \
 +   exec_node_data(__type, (__node)->__field.prev, __field);\
 +__prev != NULL;\
 +__node = __prev, __prev =  \
 +   exec_node_data(__type, (__prev)->__field.prev, __field))
 +
  #endif /* LIST_CONTAINER_H */
 --
 2.1.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro

2014-11-07 Thread Jason Ekstrand
On Fri, Nov 7, 2014 at 8:57 PM, Connor Abbott cwabbo...@gmail.com wrote:

 FWIW, Ian commented on my patch to add foreach_list_typed_reverse that
 we shouldn't merge it until it had users so people wouldn't try to
 remove it -- it seems like this has the same issue. So either merge
 both of them at the same time, or neither until NIR gets merged...


Fair enough.  This at least gets it on the list for review.



 Connor


 On Fri, Nov 7, 2014 at 9:32 PM, Jason Ekstrand ja...@jlekstrand.net
 wrote:
  ---
   src/glsl/list.h | 9 +
   1 file changed, 9 insertions(+)
 
  diff --git a/src/glsl/list.h b/src/glsl/list.h
  index 1d18ec9..1fb9178 100644
  --- a/src/glsl/list.h
  +++ b/src/glsl/list.h
  @@ -673,4 +673,13 @@ inline void exec_node::insert_before(exec_list *before)
   __node = __next, __next =  \
  exec_node_data(__type, (__next)->__field.next, __field))
 
  +#define foreach_list_typed_safe_reverse(__type, __node, __field, __list)   \
  +   for (__type * __node =  \
  +   exec_node_data(__type, (__list)->tail_pred, __field),   \
  +   * __prev =  \
  +   exec_node_data(__type, (__node)->__field.prev, __field);\
  +__prev != NULL;\
  +__node = __prev, __prev =  \
  +   exec_node_data(__type, (__prev)->__field.prev, __field))
  +
   #endif /* LIST_CONTAINER_H */
  --
  2.1.0
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl/list: Add a foreach_list_typed_safe_reverse macro

2014-11-07 Thread Matt Turner
Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Permanently enable features supported by target CPU at compile time.

2014-11-07 Thread Siavash Eliasi
This removes runtime checks for CPU features that the target CPU is already
known to support at compile time, resulting in smaller and less branchy code.
---
 src/mesa/x86/common_x86_features.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/src/mesa/x86/common_x86_features.h 
b/src/mesa/x86/common_x86_features.h
index 66f2cf6..9888c26 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -67,5 +67,30 @@
 #define cpu_has_3dnowext   (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 
+/* Permanently enable features supported by target CPU at compile time */
+#ifdef __MMX__
+#define cpu_has_mmx    1
+#endif
+
+#ifdef __SSE__
+#define cpu_has_xmm    1
+#endif
+
+#ifdef __SSE2__
+#define cpu_has_xmm2   1
+#endif
+
+#ifdef __3dNOW__
+#define cpu_has_3dnow  1
+#endif
+
+#ifdef __SSSE3__
+#define cpu_has_ssse3  1
+#endif
+
+#ifdef __SSE4_1__
+#define cpu_has_sse4_1 1
+#endif
+
 #endif
 
-- 
2.1.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Permanently enable features supported by target CPU at compile time.

2014-11-07 Thread Matt Turner
On Fri, Nov 7, 2014 at 10:43 PM, Siavash Eliasi siavashser...@gmail.com wrote:
 This will remove the need for unnecessary runtime checks for CPU features if
 already supported by target CPU, resulting in smaller and less branchy code.
 ---
  src/mesa/x86/common_x86_features.h | 25 +
  1 file changed, 25 insertions(+)

 diff --git a/src/mesa/x86/common_x86_features.h 
 b/src/mesa/x86/common_x86_features.h
 index 66f2cf6..9888c26 100644
 --- a/src/mesa/x86/common_x86_features.h
 +++ b/src/mesa/x86/common_x86_features.h
 @@ -67,5 +67,30 @@
  #define cpu_has_3dnowext   (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
  #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)

 +/* Permanently enable features supported by target CPU at compile time */
 +#ifdef __MMX__
 +#define cpu_has_mmx    1
 +#endif
 +
 +#ifdef __SSE__
 +#define cpu_has_xmm    1
 +#endif
 +
 +#ifdef __SSE2__
 +#define cpu_has_xmm2   1
 +#endif
 +
 +#ifdef __3dNOW__
 +#define cpu_has_3dnow  1
 +#endif
 +
 +#ifdef __SSSE3__
 +#define cpu_has_ssse3  1
 +#endif

There's not an existing cpu_has_ssse3 macro.

 +
 +#ifdef __SSE4_1__
 +#define cpu_has_sse4_1 1
 +#endif

As you can see at the beginning of the patch, you're just redefining
the same macros...

They should be

#ifdef __SSE__
#define cpu_has_xmm 1
#else
#define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
#endif
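
Spelled out for one more feature bit, the suggested shape would be something like the sketch below (the extern declaration, the bit value and the two path functions are placeholders, not Mesa code):

extern unsigned _mesa_x86_cpu_features;   /* placeholder; the real declaration lives in common_x86_features.h */
#define X86_FEATURE_SSE4_1 (1 << 10)      /* illustrative bit value only */

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif

/* Hypothetical call site: with -msse4.1 the condition is the constant 1,
 * so the compiler can drop the runtime check and the fallback branch. */
extern void sse4_1_path(void);
extern void generic_path(void);

static void
pick_path(void)
{
   if (cpu_has_sse4_1)
      sse4_1_path();
   else
      generic_path();
}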
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Permanently enable features supported by target CPU at compile time.

2014-11-07 Thread Siavash Eliasi


On 11/08/2014 10:30 AM, Matt Turner wrote:

On Fri, Nov 7, 2014 at 10:43 PM, Siavash Eliasi siavashser...@gmail.com wrote:

This will remove the need for unnecessary runtime checks for CPU features if
already supported by target CPU, resulting in smaller and less branchy code.
---
  src/mesa/x86/common_x86_features.h | 25 +
  1 file changed, 25 insertions(+)

diff --git a/src/mesa/x86/common_x86_features.h 
b/src/mesa/x86/common_x86_features.h
index 66f2cf6..9888c26 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -67,5 +67,30 @@
  #define cpu_has_3dnowext   (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
  #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)

+/* Permanently enable features supported by target CPU at compile time */
+#ifdef __MMX__
+#define cpu_has_mmx    1
+#endif
+
+#ifdef __SSE__
+#define cpu_has_xmm    1
+#endif
+
+#ifdef __SSE2__
+#define cpu_has_xmm2   1
+#endif
+
+#ifdef __3dNOW__
+#define cpu_has_3dnow  1
+#endif
+
+#ifdef __SSSE3__
+#define cpu_has_ssse3  1
+#endif

There's not an existing cpu_has_ssse3 macro.


Just wanted to add it in advance in case Timothy Arceri's patch gets merged:
[Mesa-dev] [PATCH V2 1/2] mesa: add runtime support for SSSE3
http://lists.freedesktop.org/archives/mesa-dev/2014-November/070338.html

I'll remove it if this isn't fine by you.




+
+#ifdef __SSE4_1__
+#define cpu_has_sse4_1 1
+#endif

As you can see at the beginning of the patch, you're just redefining
the same macros...

They should be

#ifdef __SSE__
#define cpu_has_xmm 1
#else
#define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
#endif


Sure, will modify it. By the way, GCC should use the last macro
definition, so cpu_has_xmm should be 1 if it's going to be
compiled for an SSE-capable target and the previous definition should be
ignored. Am I missing something?
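
For illustration (this snippet is not from the thread): the plain redefinition described above does compile, and the later definition wins for code after it, but GCC warns about redefining the macro, which the #ifdef/#else form avoids:

/* Two conflicting object-like macro definitions in one header. */
#define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
#define cpu_has_xmm 1   /* gcc: warning: "cpu_has_xmm" redefined */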


Best regards,
Siavash Eliasi.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev