Re: [Mesa-dev] [PATCH] softpipe: fix shadow sampling and remove nonsensical approximation of linear interpolation behavior for shadow samplers.

2014-04-14 Thread Janzing, Heinrich
Yes please. Thanks!

-Original Message-
Date: Fri, 11 Apr 2014 08:06:42 -0600
From: Brian Paul bri...@vmware.com
To: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] softpipe: fix shadow sampling and
remove nonsensical approximation of linear interpolation behavior for
shadow samplers.
Message-ID: 5347f6f2.20...@vmware.com
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 04/11/2014 01:14 AM, Janzing, Heinrich wrote:
 Shadow sampling appeared to be fundamentally broken; fix in attachment.


Reviewed-by: Brian Paul bri...@vmware.com

Piglit tests seem OK too.

Do you need me to commit this for you?

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] glsl: Fix incorrect indentation.

2014-04-14 Thread Iago Toral Quiroga
Thanks for reviewing and sorry for the huge patch, unfortunately
indentation was wrong all over the place, so it was this or a large number
of smaller patches :-(

I fixed the issues you found and completed the patch to fix some other things
that I missed in my previous patch, mostly about wrong indentation of function
parameters and wrong alignment of parameters for various function calls.

Iago Toral Quiroga (1):
  glsl: Fix incorrect indentation.

 src/glsl/ast_to_hir.cpp | 1575 +++
 1 file changed, 782 insertions(+), 793 deletions(-)

-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 73106] gtest-port.h:499:13: fatal error: 'tr1/tuple' file not found

2014-04-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=73106

Vinson Lee v...@freedesktop.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Vinson Lee v...@freedesktop.org ---
commit 36fb36aa36d8933fef04c1bdaed3ccc9b5108908
Author: Vinson Lee v...@freedesktop.org
Date:   Wed Feb 26 22:54:24 2014 -0800

gtest: Update to 1.7.0.

This patch fixes gtest build errors on Mac OS X 10.9.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73106
Signed-off-by: Vinson Lee v...@freedesktop.org
Tested-by: Ian Romanick ian.d.roman...@intel.com
Acked-by: Ian Romanick ian.d.roman...@intel.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] meta: Clip src/dest rects in BlitFramebuffer.

2014-04-14 Thread Chris Forbes
Nothing was bothering to clip the blit. If the src rect was partially
outside the framebuffer, we'd end up picking up more copies of the
edge texels due to clamping.

Note that this is slight overkill -- we could get away with clipping src
only, since fragments outside the destination surface will be discarded
anyway.

Fixes piglit's fbo-blit-stretch test on drivers which use the meta path.
(i965: should fix Broadwell, but also fixes Sandybridge/Ivybridge/Haswell
since this test falls off the blorp path now due to format conversion)

Signed-off-by: Chris Forbes chr...@ijw.co.nz
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77414
---
 src/mesa/drivers/common/meta_blit.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/common/meta_blit.c 
b/src/mesa/drivers/common/meta_blit.c
index 31e494f..38d53ef 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -33,6 +33,7 @@
 #include main/enable.h
 #include main/enums.h
 #include main/fbobject.h
+#include main/image.h
 #include main/macros.h
 #include main/matrix.h
 #include main/multisample.h
@@ -590,12 +591,9 @@ blitframebuffer_texture(struct gl_context *ctx,
return true;
 }
 
-/**
- * Meta implementation of ctx-Driver.BlitFramebuffer() in terms
- * of texture mapping and polygon rendering.
- */
-void
-_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
+
+static void
+_mesa_meta_blit_preclipped(struct gl_context *ctx,
GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
GLbitfield mask, GLenum filter)
@@ -804,6 +802,27 @@ fallback:
}
 }
 
+/**
+ * Meta implementation of ctx-Driver.BlitFramebuffer() in terms
+ * of texture mapping and polygon rendering.
+ */
+void
+_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
+   GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
+   GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
+   GLbitfield mask, GLenum filter)
+{
+   if (!_mesa_clip_blit(ctx, srcX0, srcY0, srcX1, srcY1,
+dstX0, dstY0, dstX1, dstY1)) {
+  /* nothing left! */
+  return;
+   }
+
+   _mesa_meta_blit_preclipped(ctx, srcX0, srcY0, srcX1, srcY1,
+  dstX0, dstY0, dstX1, dstY1,
+  mask, filter);
+}
+
 void
 _mesa_meta_glsl_blit_cleanup(struct blit_state *blit)
 {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Clip src/dest rects in BlitFramebuffer.

2014-04-14 Thread Marek Olšák
For scaled blits, this is not correct. For example, if you blit a
10x10 texture to another 10x10 texture with horizontal scaling of
1.5x, the X texture coordinate of the rightmost pixel should be
6.66. Mesa clip blit doesn't return floating-point coordinates of
the source, so you can't use them. You can however take the clipped
destination coordinates and set them as a scissor rectangle, keeping
the original coordinates unmodified. That way, you'll get correct
clipping (in hardware though) with correct source coordinates for
scaled blits.

See st_BlitFramebuffer.

Marek

On Mon, Apr 14, 2014 at 10:20 AM, Chris Forbes chr...@ijw.co.nz wrote:
 Nothing was bothering to clip the blit. If the src rect was partially
 outside the framebuffer, we'd end up picking up more copies of the
 edge texels due to clamping.

 Note that this is slight overkill -- we could get away with clipping src
 only, since fragments outside the destination surface will be discarded
 anyway.

 Fixes piglit's fbo-blit-stretch test on drivers which use the meta path.
 (i965: should fix Broadwell, but also fixes Sandybridge/Ivybridge/Haswell
 since this test falls off the blorp path now due to format conversion)

 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77414
 ---
  src/mesa/drivers/common/meta_blit.c | 31 +--
  1 file changed, 25 insertions(+), 6 deletions(-)

 diff --git a/src/mesa/drivers/common/meta_blit.c 
 b/src/mesa/drivers/common/meta_blit.c
 index 31e494f..38d53ef 100644
 --- a/src/mesa/drivers/common/meta_blit.c
 +++ b/src/mesa/drivers/common/meta_blit.c
 @@ -33,6 +33,7 @@
  #include main/enable.h
  #include main/enums.h
  #include main/fbobject.h
 +#include main/image.h
  #include main/macros.h
  #include main/matrix.h
  #include main/multisample.h
 @@ -590,12 +591,9 @@ blitframebuffer_texture(struct gl_context *ctx,
 return true;
  }

 -/**
 - * Meta implementation of ctx-Driver.BlitFramebuffer() in terms
 - * of texture mapping and polygon rendering.
 - */
 -void
 -_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
 +
 +static void
 +_mesa_meta_blit_preclipped(struct gl_context *ctx,
 GLint srcX0, GLint srcY0, GLint srcX1, GLint 
 srcY1,
 GLint dstX0, GLint dstY0, GLint dstX1, GLint 
 dstY1,
 GLbitfield mask, GLenum filter)
 @@ -804,6 +802,27 @@ fallback:
 }
  }

 +/**
 + * Meta implementation of ctx-Driver.BlitFramebuffer() in terms
 + * of texture mapping and polygon rendering.
 + */
 +void
 +_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
 +   GLint srcX0, GLint srcY0, GLint srcX1, GLint 
 srcY1,
 +   GLint dstX0, GLint dstY0, GLint dstX1, GLint 
 dstY1,
 +   GLbitfield mask, GLenum filter)
 +{
 +   if (!_mesa_clip_blit(ctx, srcX0, srcY0, srcX1, srcY1,
 +dstX0, dstY0, dstX1, dstY1)) {
 +  /* nothing left! */
 +  return;
 +   }
 +
 +   _mesa_meta_blit_preclipped(ctx, srcX0, srcY0, srcX1, srcY1,
 +  dstX0, dstY0, dstX1, dstY1,
 +  mask, filter);
 +}
 +
  void
  _mesa_meta_glsl_blit_cleanup(struct blit_state *blit)
  {
 --
 1.9.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/10] winsys/radeon: remove cs_write_reloc, add simpler cs_get_reloc

2014-04-14 Thread Marek Olšák
Yes, I will remove the flag in the next patch series.

Marek

On Sun, Apr 13, 2014 at 10:45 AM, Christian König
deathsim...@vodafone.de wrote:
 For this series: Reviewed-by: Christian König christian.koe...@amd.com

 BTW: Can we get rid of RADEON_FLUSH_ASYNC and always flush asynchronously?
 Just calling cs_sync_flush directly after the flush should have the same
 effect as synchronous flushing.

 Christian.

 Am 12.04.2014 18:34, schrieb Marek Olšák:

 From: Marek Olšák marek.ol...@amd.com

 The only difference is that it doesn't write to the CS and only returns
 the index.
 ---
   src/gallium/drivers/r300/r300_cs.h|  3 ++-
   src/gallium/drivers/r300/r300_emit.c  |  4 ++--
   src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 26
 +-
   src/gallium/winsys/radeon/drm/radeon_winsys.h | 19 ++-
   4 files changed, 23 insertions(+), 29 deletions(-)

 diff --git a/src/gallium/drivers/r300/r300_cs.h
 b/src/gallium/drivers/r300/r300_cs.h
 index 744e19e..748d6ea 100644
 --- a/src/gallium/drivers/r300/r300_cs.h
 +++ b/src/gallium/drivers/r300/r300_cs.h
 @@ -109,7 +109,8 @@
   #define OUT_CS_RELOC(r) do { \
   assert((r)); \
   assert((r)-cs_buf); \
 -cs_winsys-cs_write_reloc(cs_copy, (r)-cs_buf); \
 +OUT_CS(0xc0001000); /* PKT3_NOP */ \
 +OUT_CS(cs_winsys-cs_get_reloc(cs_copy, (r)-cs_buf) * 4); \
   CS_USED_DW(2); \
   } while (0)
   diff --git a/src/gallium/drivers/r300/r300_emit.c
 b/src/gallium/drivers/r300/r300_emit.c
 index d99b919..9f16413 100644
 --- a/src/gallium/drivers/r300/r300_emit.c
 +++ b/src/gallium/drivers/r300/r300_emit.c
 @@ -1043,8 +1043,8 @@ void r300_emit_vertex_arrays_swtcl(struct
 r300_context *r300, boolean indexed)
   OUT_CS(0);
 assert(r300-vbo_cs);
 -cs_winsys-cs_write_reloc(cs_copy, r300-vbo_cs);
 -CS_USED_DW(2);
 +OUT_CS(0xc0001000); /* PKT3_NOP */
 +OUT_CS(r300-rws-cs_get_reloc(r300-cs, r300-vbo_cs) * 4);
   END_CS;
   }
   diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 index b55eb80..4ce1717 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 @@ -313,6 +313,14 @@ static unsigned radeon_drm_cs_add_reloc(struct
 radeon_winsys_cs *rcs,
   return index;
   }
   +static int radeon_drm_cs_get_reloc(struct radeon_winsys_cs *rcs,
 +   struct radeon_winsys_cs_handle *buf)
 +{
 +struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
 +
 +return radeon_get_reloc(cs-csc, (struct radeon_bo*)buf, NULL);
 +}
 +
   static boolean radeon_drm_cs_validate(struct radeon_winsys_cs *rcs)
   {
   struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
 @@ -359,22 +367,6 @@ static boolean
 radeon_drm_cs_memory_below_limit(struct radeon_winsys_cs *rcs, ui
   return status;
   }
   -static void radeon_drm_cs_write_reloc(struct radeon_winsys_cs *rcs,
 -  struct radeon_winsys_cs_handle
 *buf)
 -{
 -struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
 -struct radeon_bo *bo = (struct radeon_bo*)buf;
 -unsigned index = radeon_get_reloc(cs-csc, bo, NULL);
 -
 -if (index == -1) {
 -fprintf(stderr, radeon: Cannot get a relocation in %s.\n,
 __func__);
 -return;
 -}
 -
 -OUT_CS(cs-base, 0xc0001000);
 -OUT_CS(cs-base, index * RELOC_DWORDS);
 -}
 -
   void radeon_drm_cs_emit_ioctl_oneshot(struct radeon_drm_cs *cs, struct
 radeon_cs_context *csc)
   {
   unsigned i;
 @@ -650,9 +642,9 @@ void radeon_drm_cs_init_functions(struct
 radeon_drm_winsys *ws)
   ws-base.cs_create = radeon_drm_cs_create;
   ws-base.cs_destroy = radeon_drm_cs_destroy;
   ws-base.cs_add_reloc = radeon_drm_cs_add_reloc;
 +ws-base.cs_get_reloc = radeon_drm_cs_get_reloc;
   ws-base.cs_validate = radeon_drm_cs_validate;
   ws-base.cs_memory_below_limit = radeon_drm_cs_memory_below_limit;
 -ws-base.cs_write_reloc = radeon_drm_cs_write_reloc;
   ws-base.cs_flush = radeon_drm_cs_flush;
   ws-base.cs_set_flush_callback = radeon_drm_cs_set_flush;
   ws-base.cs_is_buffer_referenced = radeon_bo_is_referenced;
 diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h
 b/src/gallium/winsys/radeon/drm/radeon_winsys.h
 index 485e925..320989c 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_winsys.h
 +++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h
 @@ -450,6 +450,16 @@ struct radeon_winsys {
enum radeon_bo_priority priority);
 /**
 + * Return the index of an already-added buffer.
 + *
 + * \param csCommand stream
 + * \param buf   Buffer
 + * \return  The buffer index, or -1 if the buffer has not
 been added.
 + */
 +int (*cs_get_reloc)(struct radeon_winsys_cs *cs,
 +struct radeon_winsys_cs_handle *buf);
 +
 +/**
* Return TRUE if there is enough memory 

Re: [Mesa-dev] [PATCH 2/2] i965: Disable Z16 in all APIs.

2014-04-14 Thread Chia-I Wu
On Mon, Apr 14, 2014 at 1:04 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 We originally thought that GL 3.0 required GL_DEPTH_COMPONENT16 to map
 exactly to Z16.  However, we misread the specification, thanks in part
 to LaTeX reordering the tables in the PDF.

 Page 180 of the GL 3.0 specification (glspec30.20080923.pdf) says:
 [...] memory allocation per texture component is assigned by the GL to
 match the allocations listed in tables 3.16-3.18 as closely as possible.
 [...]

 Required Texture Formats
 [...]
 In addition, implementations are required to support the following sized
 internal formats.  Requesting one of these internal formats for any
 texture type will allocate exactly the internal component sizes and
 types shown for that format in tables 3.16-3.17:

 Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16
 or table 3.17.  It appears in table 3.18, where the exact rule doesn't
 apply, and it falls back to the closely as possible rule.

 The confusing part is that the ordering of the tables in the PDF is:

 Table 3.16 (pages 182-184)
 Table 3.18 (bottom of page 184 to top of 185)
 Table 3.17 (page 185)

 Presumably, people saw table 3.16, then saw the table immediately
 following with DEPTH_COMPONENT* formats, and assumed it was 3.17.

 Based on a batch by Chia-I Wu, but without the driconf option to force
s/batch/patch/

Both patches look good to me.  Unless I overlooked your patch for
piglit, this is needed

http://lists.freedesktop.org/archives/piglit/2014-February/009650.html

to avoid a false regression.  It would be great if you could review
and commit the piglit fix, as I do not have commit access.

 Z16 to be used.  It's not required, and there's apparently no benefit
 to actually using it.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_surface_formats.c | 6 --
  1 file changed, 6 deletions(-)

 Sorry if this is a duplicate of an earlier patch...the only one I could
 find in my inbox was the one with the driconf option.

 diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
 b/src/mesa/drivers/dri/i965/brw_surface_formats.c
 index 196f139..5907dd9 100644
 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
 +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
 @@ -608,7 +608,6 @@ brw_init_surface_formats(struct brw_context *brw)
 brw-format_supported_as_render_target[MESA_FORMAT_Z24_UNORM_S8_UINT] = 
 true;
 brw-format_supported_as_render_target[MESA_FORMAT_Z24_UNORM_X8_UINT] = 
 true;
 brw-format_supported_as_render_target[MESA_FORMAT_S_UINT8] = true;
 -   brw-format_supported_as_render_target[MESA_FORMAT_Z_UNORM16] = true;
 brw-format_supported_as_render_target[MESA_FORMAT_Z_FLOAT32] = true;
 brw-format_supported_as_render_target[MESA_FORMAT_Z32_FLOAT_S8X24_UINT] 
 = true;

 @@ -630,12 +629,7 @@ brw_init_surface_formats(struct brw_context *brw)
  *
  * Other speculation is that we may be hitting increased fragment shader
  * execution from GL_LEQUAL/GL_EQUAL depth tests at reduced precision.
 -*
 -* However, desktop GL 3.0+ require that you get exactly 16 bits when
 -* asking for DEPTH_COMPONENT16, so we have to respect that.
  */
 -   if (_mesa_is_desktop_gl(ctx))
 -  ctx-TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true;

 /* On hardware that lacks support for ETC1, we map ETC1 to RGBX
  * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1.
 --
 1.9.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev



-- 
o...@lunarg.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] scons: Add an analyze option.

2014-04-14 Thread jfonseca
From: José Fonseca jfons...@vmware.com

For Clang static code analyzer, the scan-build script will produce more
comprehensive output.  Nevertheless you can invoke it as

  CC=clang CXX=clang++ scons analyze=1

For MSVC this is the best way to use its static code analysis.  Simply
invoke as

  scons analyze=1
---
 common.py|  1 +
 scons/gallium.py | 12 
 2 files changed, 13 insertions(+)

diff --git a/common.py b/common.py
index 22c1725..d6e6215 100644
--- a/common.py
+++ b/common.py
@@ -91,6 +91,7 @@ def AddOptions(opts):
opts.Add(EnumOption('platform', 'target platform', host_platform,

 allowed_values=('cygwin', 'darwin', 'freebsd', 'haiku', 'linux', 
'sunos', 'windows')))
opts.Add(BoolOption('embedded', 'embedded build', 'no'))
+   opts.Add(BoolOption('analyze', 'enable static code analysis where 
available', 'no'))
opts.Add('toolchain', 'compiler toolchain', default_toolchain)
opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support', 
'no'))
opts.Add(BoolOption('llvm', 'use LLVM', default_llvm))
diff --git a/scons/gallium.py b/scons/gallium.py
index 42e8f7c..e873c65 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -467,6 +467,18 @@ def generate(env):
 env.Append(CCFLAGS = ['/MT'])
 env.Append(SHCCFLAGS = ['/LD'])
 
+# Static code analysis
+if env['analyze']:
+if env['msvc']:
+# http://msdn.microsoft.com/en-us/library/ms173498.aspx
+env.Append(CCFLAGS = [
+'/analyze',
+#'/analyze:log', '${TARGET.base}.xml',
+])
+if env['clang']:
+# scan-build will produce more comprehensive output
+env.Append(CCFLAGS = ['--analyze'])
+
 # Assembler options
 if gcc_compat:
 if env['machine'] == 'x86':
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] scons: Enable building through Clang Static Analyzer.

2014-04-14 Thread jfonseca
From: José Fonseca jfons...@vmware.com

By accurately detecting gcc/clang through --version option instead
of executable name.

Clang Static Analyzer reports many issues, most false positives, but it
found at least one real and subtle use-after-free issue
in st_texture_get_sampler_view():

  
http://people.freedesktop.org/~jrfonseca/scan-build-2014-04-14-1/report-869047.html#EndPath
---
 scons/gallium.py | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/scons/gallium.py b/scons/gallium.py
index e11d4db..42e8f7c 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -104,6 +104,19 @@ def num_jobs():
 return 1
 
 
+def get_cc_version(env):
+# Get the first line of `$CC --version`
+pipe = SCons.Action._subproc(env, [env['CC'], '--version'],
+ stdin = 'devnull',
+ stderr = 'devnull',
+ stdout = subprocess.PIPE)
+if pipe.wait() != 0:
+return ''
+
+line = pipe.stdout.readline()
+return line
+
+
 def generate(env):
 Common environment generation code
 
@@ -119,12 +132,8 @@ def generate(env):
 if os.environ.has_key('CC'):
 env['CC'] = os.environ['CC']
 # Update CCVERSION to match
-pipe = SCons.Action._subproc(env, [env['CC'], '--version'],
- stdin = 'devnull',
- stderr = 'devnull',
- stdout = subprocess.PIPE)
-if pipe.wait() == 0:
-line = pipe.stdout.readline()
+line = get_cc_version(env)
+if line:
 match = re.search(r'[0-9]+(\.[0-9]+)+', line)
 if match:
 env['CCVERSION'] = match.group(0)
@@ -137,10 +146,16 @@ def generate(env):
 if os.environ.has_key('LDFLAGS'):
 env['LINKFLAGS'] += SCons.Util.CLVar(os.environ['LDFLAGS'])
 
-env['gcc'] = 'gcc' in os.path.basename(env['CC']).split('-')
+# Detect gcc/clang not by executable name, but through `--version` option,
+# to avoid drawing wrong conclusions when using tools that overrice CC/CXX
+# like scan-build.
+cc_version = get_cc_version(env)
+cc_version_words = cc_version.split()
+
+env['gcc'] = 'gcc' in cc_version_words
 env['msvc'] = env['CC'] == 'cl'
 env['suncc'] = env['platform'] == 'sunos' and os.path.basename(env['CC']) 
== 'cc'
-env['clang'] = env['CC'] == 'clang'
+env['clang'] = 'clang' in cc_version_words
 env['icc'] = 'icc' == os.path.basename(env['CC'])
 
 if env['msvc'] and env['toolchain'] == 'default' and env['machine'] == 
'x86_64':
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] util/u_debug: Add noreturn attribute to _debug_assert_fail().

2014-04-14 Thread jfonseca
From: José Fonseca jfons...@vmware.com

As recommended by
http://clang-analyzer.llvm.org/annotations.html#attr_noreturn
---
 src/gallium/auxiliary/util/u_debug.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/util/u_debug.h 
b/src/gallium/auxiliary/util/u_debug.h
index 9e4eb41..8570597 100644
--- a/src/gallium/auxiliary/util/u_debug.h
+++ b/src/gallium/auxiliary/util/u_debug.h
@@ -154,7 +154,11 @@ debug_get_num_option(const char *name, long dfault);
 void _debug_assert_fail(const char *expr, 
 const char *file, 
 unsigned line, 
-const char *function);
+const char *function)
+#ifdef __GNUC__
+   __attribute__((__noreturn__))
+#endif
+;
 
 
 /** 
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/vec4: unit test for copy propagation and writemask

2014-04-14 Thread Chia-I Wu
This unit test demonstrates a subtle bug fixed by
4ddf51db6af36736d5d42c1043eeea86e47459ce.

Signed-off-by: Chia-I Wu o...@lunarg.com
Cc: Eric Anholt e...@anholt.net
---
 .../dri/i965/test_vec4_copy_propagation.cpp| 30 ++
 1 file changed, 30 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
index cb70096..fd517f8 100644
--- a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
@@ -154,3 +154,33 @@ TEST_F(copy_propagation_test, test_swizzle_swizzle)
 SWIZZLE_X,
 SWIZZLE_Y));
 }
+
+TEST_F(copy_propagation_test, test_swizzle_writemask)
+{
+   dst_reg a = dst_reg(v, glsl_type::vec4_type);
+   dst_reg b = dst_reg(v, glsl_type::vec4_type);
+   dst_reg c = dst_reg(v, glsl_type::vec4_type);
+
+   v-emit(v-MOV(b, swizzle(src_reg(a), BRW_SWIZZLE4(SWIZZLE_X,
+  SWIZZLE_Y,
+  SWIZZLE_X,
+  SWIZZLE_Z;
+
+   v-emit(v-MOV(writemask(a, WRITEMASK_XYZ), src_reg(1.0f)));
+
+   vec4_instruction *test_mov =
+  v-MOV(c, swizzle(src_reg(b), BRW_SWIZZLE4(SWIZZLE_W,
+ SWIZZLE_W,
+ SWIZZLE_W,
+ SWIZZLE_W)));
+   v-emit(test_mov);
+
+   copy_propagation(v);
+
+   /* should not copy propagate */
+   EXPECT_EQ(test_mov-src[0].reg, b.reg);
+   EXPECT_EQ(test_mov-src[0].swizzle, BRW_SWIZZLE4(SWIZZLE_W,
+SWIZZLE_W,
+SWIZZLE_W,
+SWIZZLE_W));
+}
-- 
1.8.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Clover clEnqueue* function don't implement blocking?

2014-04-14 Thread Dorrington, Albert
From reviewing api/transfer.cpp, it looks like all of the API calls that have 
a blocking parameter do not have anything that implement blocking 
functionality.

clEnqueueReadBuffer(), clEnqueueWriteBuffer(),
clEnqueueReadBufferRect(), clEnqueueWriteBufferRect(),
clEnqueueReadImage(), clEnqueueWriteImage()
clEnqueueMapBuffer(),
clEnqueueMapImage()

For all of these functions, shouldn't there be a conditional wait on the event 
implemented, something like the following?

  auto hev = createhard_event(...);
  ret_object(rd_ev, hev);
  if (blocking) { hev().wait(); }
  return CL_SUCCESS;

My initial tests indicate this works, but I'm not sure if I'm missing something 
in the event handling/cleanup.

Al Dorrington
Software Engineer Sr
Lockheed Martin, Mission Systems and Training

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] softpipe: fix shadow sampling and remove nonsensical approximation of linear interpolation behavior for shadow samplers.

2014-04-14 Thread Roland Scheidegger
Am 11.04.2014 09:14, schrieb Janzing, Heinrich:
 Shadow sampling appeared to be fundamentally broken; fix in
 attachment.
 

This looks ok to me (I put in the comment about this not making sense in
the first place...).
It should be noted though and I'd have preferred if a comment would
still say that somewhere that even with this the behavior isn't quite
right neither (though arguably much better) and the reason I didn't
actually fix it at that time since it was a bit more complex, what will
happen is that we'll do bilinear filter first, then replace with 0/1
depending on the filtered result. GL actually allows this but ideally
you'd really exchange the values with 0/1 _before_ doing the bilinear
filter.

Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] softpipe: fix shadow sampling and remove nonsensical approximation of linear interpolation behavior for shadow samplers.

2014-04-14 Thread Janzing, Heinrich
Are you saying that the nearest neighbour sampling implementation is still not 
entirely correct now? For the linear interpolation case I much prefer the 
current imperfect behavior over the previous behavior, though a comment might 
still have been in order to highlight the remaining work in that area. Sorry 
for not adding it.

Heinrich 

-Original Message-
From: Roland Scheidegger [mailto:srol...@vmware.com] 
Sent: maandag 14 april 2014 16:20
To: Janzing, Heinrich; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] softpipe: fix shadow sampling and remove 
nonsensical approximation of linear interpolation behavior for shadow samplers.

Am 11.04.2014 09:14, schrieb Janzing, Heinrich:
 Shadow sampling appeared to be fundamentally broken; fix in 
 attachment.
 

This looks ok to me (I put in the comment about this not making sense in the 
first place...).
It should be noted though and I'd have preferred if a comment would still say 
that somewhere that even with this the behavior isn't quite right neither 
(though arguably much better) and the reason I didn't actually fix it at that 
time since it was a bit more complex, what will happen is that we'll do 
bilinear filter first, then replace with 0/1 depending on the filtered result. 
GL actually allows this but ideally you'd really exchange the values with 0/1 
_before_ doing the bilinear filter.

Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Clover clEnqueue* function don't implement blocking?

2014-04-14 Thread Francisco Jerez
Dorrington, Albert albert.dorring...@lmco.com writes:

 From reviewing api/transfer.cpp, it looks like all of the API calls that have 
 a blocking parameter do not have anything that implement blocking 
 functionality.

 clEnqueueReadBuffer(), clEnqueueWriteBuffer(),
 clEnqueueReadBufferRect(), clEnqueueWriteBufferRect(),
 clEnqueueReadImage(), clEnqueueWriteImage()
 clEnqueueMapBuffer(),
 clEnqueueMapImage()

 For all of these functions, shouldn't there be a conditional wait on the 
 event implemented, something like the following?

   auto hev = createhard_event(...);
   ret_object(rd_ev, hev);
   if (blocking) { hev().wait(); }
   return CL_SUCCESS;

 My initial tests indicate this works, but I'm not sure if I'm missing 
 something in the event handling/cleanup.


Nope, that's usually not necessary.  The blocking parameter of some
OpenCL entry points surprisingly doesn't imply that the enqueued
operation itself should be blocking, it just determines whether the
application is allowed to reuse the provided memory for other purposes
after the call returns -- and in our implementation that's always the
case because we don't do DMA from client addresses, as we discussed
previously, so we can ignore the blocking parameter most of the time.

Note that this doesn't mean that buffer upload is necessarily
synchronous [buffer download is, though], pipe drivers are free to keep
the written bits in some temporary storage and perform the final copy
asynchronously.

 Al Dorrington
 Software Engineer Sr
 Lockheed Martin, Mission Systems and Training

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


pgp_0cKxxqjLV.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] translate_sse: Preserve low bit during unsigned - float conversion.

2014-04-14 Thread Roland Scheidegger
I am not too familiar with translate sse code, so someone else should
probably look at it. The rest of the series looks good to me, a comment
inline.


Am 13.04.2014 22:29, schrieb Andreas Hartmetz:
 From: Andreas Hartmetz andreas.hartm...@kdab.com
 
 ---
  src/gallium/auxiliary/translate/translate_sse.c | 35 
 -
  1 file changed, 28 insertions(+), 7 deletions(-)
 
 diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
 b/src/gallium/auxiliary/translate/translate_sse.c
 index dace722..03d8276 100644
 --- a/src/gallium/auxiliary/translate/translate_sse.c
 +++ b/src/gallium/auxiliary/translate/translate_sse.c
 @@ -71,7 +71,7 @@ struct translate_group
  
  #define ELEMENT_BUFFER_INSTANCE_ID  1001
  
 -#define NUM_CONSTS 7
 +#define NUM_CONSTS 8
  
  enum
  {
 @@ -81,6 +81,7 @@ enum
 CONST_INV_32767,
 CONST_INV_65535,
 CONST_INV_2147483647,
 +   CONST_INV_4294967295,
 CONST_255
  };
  
 @@ -92,6 +93,7 @@ static float consts[NUM_CONSTS][4] = {
 C(1.0 / 32767.0),
 C(1.0 / 65535.0),
 C(1.0 / 2147483647.0),
 +   C(1.0 / 4294967295.0),
 C(255.0)
  };
  
 @@ -515,6 +517,7 @@ translate_attr_convert(struct translate_sse *p,
  || a-output_format == PIPE_FORMAT_R32G32B32_FLOAT
  || a-output_format == PIPE_FORMAT_R32G32B32A32_FLOAT)) {
struct x86_reg dataXMM = x86_make_reg(file_XMM, 0);
 +  struct x86_reg dataXMM2 = x86_make_reg(file_XMM, 1);
  
for (i = 0; i  output_desc-nr_channels; ++i) {
   if (swizzle[i] == UTIL_FORMAT_SWIZZLE_0
 @@ -546,17 +549,38 @@ translate_attr_convert(struct translate_sse *p,
  */
 sse2_punpcklbw(p-func, dataXMM, get_const(p, 
 CONST_IDENTITY));
 sse2_punpcklbw(p-func, dataXMM, get_const(p, 
 CONST_IDENTITY));
 +   sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
 break;
  case 16:
 sse2_punpcklwd(p-func, dataXMM, get_const(p, 
 CONST_IDENTITY));
 +   sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
 break;
 -case 32:   /* we lose precision here */
 +case 32:   /* we lose precision if value  2^23 - 1 */
 +   /* ...but try to keep all bits for value = 2^23 - 1 */
 +   /* this could be done much smarter for 1 or 2 dwords of input
 +* on X64, using sth like cvtsi2ss xmm0, rax (note *R*ax) */
 +
 +   /* save the low bit */
 +   sse_movss(p-func, dataXMM2, dataXMM);
I don't think that's right, this code can handle multiple channels, i.e.
true vectors hence you need sse2_movdqa (or sse_movaps but the former
should cause less domain transitions).

 +   /* right shift  convert, losing the low bit - must clear
 +* high bit because there is no unsigned convert instruction 
 */
 sse2_psrld_imm(p-func, dataXMM, 1);
 +   sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
 +
 +   /* convert low bit to float */
 +   sse2_pslld_imm(p-func, dataXMM2, 31);
 +   sse2_psrld_imm(p-func, dataXMM2, 31);
 +   sse2_cvtdq2ps(p-func, dataXMM2, dataXMM2);
Is this really ideal, wouldn't something like (in horrible pseudo-code
notation)
dataXMM2 = dataXMM2  CONST(0x1 vec)
dataXMM2 = cvtdq2ps(dataXMM2)
be faster?
I guess though your method avoids the constant, so probably not worth
bothering (I am actually wondering what code llvm generates for its
UIToFP instruction or what in general the fastest way to do this is).

Roland



 +
 +   /* mostly undo the right-shift by 1 */
 +   sse_addps(p-func, dataXMM, dataXMM);
 +   /* add back the lost low bit */
 +   sse_addps(p-func, dataXMM, dataXMM2);
 break;
  default:
 return FALSE;
  }
 -sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
 +
  if (input_desc-channel[0].normalized) {
 struct x86_reg factor;
 switch (input_desc-channel[0].size) {
 @@ -567,7 +591,7 @@ translate_attr_convert(struct translate_sse *p,
factor = get_const(p, CONST_INV_65535);
break;
 case 32:
 -  factor = get_const(p, CONST_INV_2147483647);
 +  factor = get_const(p, CONST_INV_4294967295);
break;
 default:
assert(0);
 @@ -579,9 +603,6 @@ translate_attr_convert(struct translate_sse *p,
 }
 sse_mulps(p-func, dataXMM, factor);
  }
 -else if (input_desc-channel[0].size == 32)
 -   /* compensate for the bit we threw away to fit u32 into s32 */
 -   sse_addps(p-func, dataXMM, dataXMM);
  break;
   case UTIL_FORMAT_TYPE_SIGNED:
  if (!(x86_target_caps(p-func)  X86_SSE2))
 

Re: [Mesa-dev] [PATCH] softpipe: fix shadow sampling and remove nonsensical approximation of linear interpolation behavior for shadow samplers.

2014-04-14 Thread Roland Scheidegger
Am 14.04.2014 16:32, schrieb Janzing, Heinrich:
 Are you saying that the nearest neighbour sampling implementation is 
 still not entirely correct now?
No this is correct indeed.

 For the linear interpolation case I
 much prefer the current imperfect behavior over the previous 
 behavior, though a comment might still have been in order to 
 highlight the remaining work in that area. Sorry for not adding it.
Yes this ought to be much better (afaict this also is what some hw which
couldn't do shadow comparisons in the sampler itself has done some time
ago). I was really only missing the comment, but no biggie (the error is
easy to spot and notice what it actually is).

Roland



 Heinrich
 
 -Original Message- From: Roland Scheidegger 
 [mailto:srol...@vmware.com] Sent: maandag 14 april 2014 16:20 To: 
 Janzing, Heinrich; mesa-dev@lists.freedesktop.org Subject: Re: 
 [Mesa-dev] [PATCH] softpipe: fix shadow sampling and remove 
 nonsensical approximation of linear interpolation behavior for
 shadow samplers.
 
 Am 11.04.2014 09:14, schrieb Janzing, Heinrich:
 Shadow sampling appeared to be fundamentally broken; fix in 
 attachment.
 
 
 This looks ok to me (I put in the comment about this not making
 sense in the first place...). It should be noted though and I'd have 
 preferred if a comment would still say that somewhere that even with 
 this the behavior isn't quite right neither (though arguably much 
 better) and the reason I didn't actually fix it at that time since
 it was a bit more complex, what will happen is that we'll do
 bilinear filter first, then replace with 0/1 depending on the
 filtered result. GL actually allows this but ideally you'd really
 exchange the values with 0/1 _before_ doing the bilinear filter.
 
 Roland
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] scons: Enable building through Clang Static Analyzer.

2014-04-14 Thread Brian Paul

On 04/14/2014 07:32 AM, jfons...@vmware.com wrote:

From: José Fonseca jfons...@vmware.com

By accurately detecting gcc/clang through --version option instead
of executable name.

Clang Static Analyzer reports many issues, most false positives, but it
found at least one real and subtle use-after-free issue
in st_texture_get_sampler_view():

   
http://people.freedesktop.org/~jrfonseca/scan-build-2014-04-14-1/report-869047.html#EndPath
---
  scons/gallium.py | 31 +++
  1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/scons/gallium.py b/scons/gallium.py
index e11d4db..42e8f7c 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -104,6 +104,19 @@ def num_jobs():
  return 1


+def get_cc_version(env):
+# Get the first line of `$CC --version`
+pipe = SCons.Action._subproc(env, [env['CC'], '--version'],
+ stdin = 'devnull',
+ stderr = 'devnull',
+ stdout = subprocess.PIPE)
+if pipe.wait() != 0:
+return ''
+
+line = pipe.stdout.readline()
+return line
+
+
  def generate(env):
  Common environment generation code

@@ -119,12 +132,8 @@ def generate(env):
  if os.environ.has_key('CC'):
  env['CC'] = os.environ['CC']
  # Update CCVERSION to match
-pipe = SCons.Action._subproc(env, [env['CC'], '--version'],
- stdin = 'devnull',
- stderr = 'devnull',
- stdout = subprocess.PIPE)
-if pipe.wait() == 0:
-line = pipe.stdout.readline()
+line = get_cc_version(env)
+if line:
  match = re.search(r'[0-9]+(\.[0-9]+)+', line)
  if match:
  env['CCVERSION'] = match.group(0)
@@ -137,10 +146,16 @@ def generate(env):
  if os.environ.has_key('LDFLAGS'):
  env['LINKFLAGS'] += SCons.Util.CLVar(os.environ['LDFLAGS'])

-env['gcc'] = 'gcc' in os.path.basename(env['CC']).split('-')
+# Detect gcc/clang not by executable name, but through `--version` option,
+# to avoid drawing wrong conclusions when using tools that overrice CC/CXX
+# like scan-build.
+cc_version = get_cc_version(env)
+cc_version_words = cc_version.split()
+
+env['gcc'] = 'gcc' in cc_version_words
  env['msvc'] = env['CC'] == 'cl'
  env['suncc'] = env['platform'] == 'sunos' and os.path.basename(env['CC']) 
== 'cc'
-env['clang'] = env['CC'] == 'clang'
+env['clang'] = 'clang' in cc_version_words
  env['icc'] = 'icc' == os.path.basename(env['CC'])

  if env['msvc'] and env['toolchain'] == 'default' and env['machine'] == 
'x86_64':



Series looks good to me.

Reviewed-by: Brian Paul bri...@vmware.com

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa/st: unmap transfer only if not null

2014-04-14 Thread Sebastian Wick
Fixes crash for r600g in piglit tests
fbo-generatemipmap-3d RGB9_E5
fbo-generatemipmap-array RGB9_E5

Signed-off-by: Sebastian Wick sebast...@sebastianwick.net
---
 src/mesa/state_tracker/st_texture.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_texture.c 
b/src/mesa/state_tracker/st_texture.c
index 8d559df..0e53823 100644
--- a/src/mesa/state_tracker/st_texture.c
+++ b/src/mesa/state_tracker/st_texture.c
@@ -271,7 +271,8 @@ st_texture_image_unmap(struct st_context *st,
 
DBG(%s\n, __FUNCTION__);
 
-   pipe_transfer_unmap(pipe, stImage-transfer);
+   if (stImage-transfer)
+  pipe_transfer_unmap(pipe, stImage-transfer);
stImage-transfer = NULL;
 }
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: unmap transfer only if not null

2014-04-14 Thread Brian Paul

On 04/14/2014 10:17 AM, Sebastian Wick wrote:

Fixes crash for r600g in piglit tests
fbo-generatemipmap-3d RGB9_E5
fbo-generatemipmap-array RGB9_E5

Signed-off-by: Sebastian Wick sebast...@sebastianwick.net
---
  src/mesa/state_tracker/st_texture.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_texture.c 
b/src/mesa/state_tracker/st_texture.c
index 8d559df..0e53823 100644
--- a/src/mesa/state_tracker/st_texture.c
+++ b/src/mesa/state_tracker/st_texture.c
@@ -271,7 +271,8 @@ st_texture_image_unmap(struct st_context *st,

 DBG(%s\n, __FUNCTION__);

-   pipe_transfer_unmap(pipe, stImage-transfer);
+   if (stImage-transfer)
+  pipe_transfer_unmap(pipe, stImage-transfer);
 stImage-transfer = NULL;
  }




I think it would be interesting to know how this is happening in the 
first place.  We shouldn't be calling unmap on a texture that wasn't 
mapped or failed to map in the first place.


-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] EXTERNAL: Re: Clover clEnqueue* function don't implement blocking?

2014-04-14 Thread Dorrington, Albert


 -Original Message-
 From: Francisco Jerez [mailto:curroje...@riseup.net]
 Sent: Monday, April 14, 2014 11:21 AM
 To: Dorrington, Albert; mesa-dev@lists.freedesktop.org
 Subject: EXTERNAL: Re: [Mesa-dev] Clover clEnqueue* function don't
 implement blocking?
 
 Dorrington, Albert albert.dorring...@lmco.com writes:
 
  From reviewing api/transfer.cpp, it looks like all of the API calls that 
  have a
 blocking parameter do not have anything that implement blocking
 functionality.
 
  clEnqueueReadBuffer(), clEnqueueWriteBuffer(),
  clEnqueueReadBufferRect(), clEnqueueWriteBufferRect(),
  clEnqueueReadImage(), clEnqueueWriteImage() clEnqueueMapBuffer(),
  clEnqueueMapImage()
 
  For all of these functions, shouldn't there be a conditional wait on the
 event implemented, something like the following?
 
auto hev = createhard_event(...);
ret_object(rd_ev, hev);
if (blocking) { hev().wait(); }
return CL_SUCCESS;
 
  My initial tests indicate this works, but I'm not sure if I'm missing 
  something
 in the event handling/cleanup.
 
 
 Nope, that's usually not necessary.  The blocking parameter of some
 OpenCL entry points surprisingly doesn't imply that the enqueued operation
 itself should be blocking, it just determines whether the application is
 allowed to reuse the provided memory for other purposes after the call
 returns -- and in our implementation that's always the case because we don't
 do DMA from client addresses, as we discussed previously, so we can ignore
 the blocking parameter most of the time.
 
 Note that this doesn't mean that buffer upload is necessarily synchronous
 [buffer download is, though], pipe drivers are free to keep the written bits 
 in
 some temporary storage and perform the final copy asynchronously.
 

Ah, ok - that makes sense - I was misinterpreting the OpenCL spec.

But when these events are queued, if there isn't a wait(), then what triggers 
their flush from the queued_events list?
That seems to only happen when the hard_event::wait() is called (assuming the 
status is queued)

Shouldn't something be flushing the queue as the events are processed 
(especially if nothing is pending on the events), well before the clFinish() 
call?



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: unmap transfer only if not null

2014-04-14 Thread Marek Olšák
Hi, the tests have already been fixed in the master branch. You seem
to use Mesa code that is quite old. This patch won't apply.

Marek

On Mon, Apr 14, 2014 at 6:17 PM, Sebastian Wick
sebast...@sebastianwick.net wrote:
 Fixes crash for r600g in piglit tests
 fbo-generatemipmap-3d RGB9_E5
 fbo-generatemipmap-array RGB9_E5

 Signed-off-by: Sebastian Wick sebast...@sebastianwick.net
 ---
  src/mesa/state_tracker/st_texture.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/state_tracker/st_texture.c 
 b/src/mesa/state_tracker/st_texture.c
 index 8d559df..0e53823 100644
 --- a/src/mesa/state_tracker/st_texture.c
 +++ b/src/mesa/state_tracker/st_texture.c
 @@ -271,7 +271,8 @@ st_texture_image_unmap(struct st_context *st,

 DBG(%s\n, __FUNCTION__);

 -   pipe_transfer_unmap(pipe, stImage-transfer);
 +   if (stImage-transfer)
 +  pipe_transfer_unmap(pipe, stImage-transfer);
 stImage-transfer = NULL;
  }

 --
 1.9.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 77443] New: Potential regressions from 1afe3359258a9

2014-04-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=77443

  Priority: medium
Bug ID: 77443
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: Potential regressions from 1afe3359258a9
  Severity: normal
Classification: Unclassified
OS: All
  Reporter: cwo...@cworth.org
  Hardware: Other
Status: NEW
   Version: unspecified
 Component: Mesa core
   Product: Mesa

I recently tried cherry-picking commit
1afe3359258a9e89b62c8638761f52d78f6d1cbc to the 10.1 branch of
Mesa. When I did that, I found that the following piglit tests
regressed (passing before the commit and failing afterward):

all/spec/glsl-1.50/execution/geometry
clip-distance-out-values
max-input-components
point-size-out
primitive-id-out

all/spec/glsl-1.50/execution
gs-redeclares-both-pervertex-blocks
interface-vs-named-to-gs-array
redeclare-pervertex-out-subset-gs
redeclare-pervertex-subset-vs-to-gs

I talked about the regressions with Ken a bit. He thinks it might be
right to just convert the patch from mesa master.

Here's the original patch:

commit 1afe3359258a9e89b62c8638761f52d78f6d1cbc
Author: Kenneth Graunke kenn...@whitecape.org
Date:   Thu Mar 20 11:53:16 2014 -0700

mesa: In core profile, refuse to draw unless a VAO is bound.

Core profile requires a non-default VAO to be bound.  Currently, calls
to glVertexAttribPointer raise INVALID_OPERATION unless a VAO is bound,
and we never actually get any vertex data set.  Trying to draw without
any vertex data can only cause problems.  In i965, it causes a crash.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76400
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
Cc: mesa-sta...@lists.freedesktop.org

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index efc3a2b..8f0b199 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -125,8 +125,11 @@ check_valid_to_render(struct gl_context *ctx, const char
*f
 return GL_FALSE;
   break;

-   case API_OPENGL_COMPAT:
case API_OPENGL_CORE:
+  if (ctx-Array.VAO == ctx-Array.DefaultVAO)
+ return GL_FALSE;
+  /* fallthrough */
+   case API_OPENGL_COMPAT:
   {
  const struct gl_shader_program *vsProg =
 ctx-_Shader-CurrentProgram[MESA_SHADER_VERTEX];

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 77443] Potential regressions from 1afe3359258a9

2014-04-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=77443

Carl Worth cwo...@cworth.org changed:

   What|Removed |Added

   Assignee|mesa-dev@lists.freedesktop. |kenn...@whitecape.org
   |org |

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Preparing for 10.1.1

2014-04-14 Thread Carl Worth
Dylan Baker baker.dyla...@gmail.com writes:
 Carl, the 10.1 branch does not currently build for r600, I get the following 
 message:

Thanks for testing, Dylan!

As part of my standard list of release steps, I do generally do a build
of all drivers. I guess I just need to get in the habit of doing this
before pushing to the stable branch as well.

 I think this is probably fixed by radeon colorswap patch that you couldn't 
 apply.

Probably would be, yes. Meanwhile, Marek suggested just skipping the
next patch in the series as well. And that should fix this build
problem.

-Carl


pgpAK60flJdVn.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] mesa: Track max enabled tex image unit

2014-04-14 Thread Anuj Phogat
On Sat, Apr 12, 2014 at 5:37 PM, Chris Forbes chr...@ijw.co.nz wrote:

 This gives us a better bound for some hot loops in the drivers than
 MAX_COMBINED_TEXTURE_IMAGE_UNITS, which is ridiculously large on modern
 hardware, and only getting worse as more shader stages are added.

 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 ---
  src/mesa/main/mtypes.h   | 3 +++
  src/mesa/main/texstate.c | 2 ++
  2 files changed, 5 insertions(+)

 diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
 index 4d014d1..6694383 100644
 --- a/src/mesa/main/mtypes.h
 +++ b/src/mesa/main/mtypes.h
 @@ -1402,6 +1402,9 @@ struct gl_texture_attrib

 /** Bitwise-OR of all Texture.Unit[i]._GenFlags */
 GLbitfield _GenFlags;
 +
 +   /** Upper bound on _ReallyEnabled texunits. */
 +   GLint _MaxEnabledTexImageUnit;
  };


 diff --git a/src/mesa/main/texstate.c b/src/mesa/main/texstate.c
 index fcae878..b68920c 100644
 --- a/src/mesa/main/texstate.c
 +++ b/src/mesa/main/texstate.c
 @@ -550,6 +550,7 @@ update_texture_state( struct gl_context *ctx )
 ctx-Texture._GenFlags = 0x0;
 ctx-Texture._TexMatEnabled = 0x0;
 ctx-Texture._TexGenEnabled = 0x0;
 +   ctx-Texture._MaxEnabledTexImageUnit = -1;

 /*
  * Update texture unit state.
 @@ -636,6 +637,7 @@ update_texture_state( struct gl_context *ctx )
/* if we get here, we know this texture unit is enabled */

ctx-Texture._EnabledUnits |= (1  unit);
 +  ctx-Texture._MaxEnabledTexImageUnit = unit;

if (enabledTargetsByStage[MESA_SHADER_FRAGMENT])
   enabledFragUnits |= (1  unit);
 --
 1.9.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Looks good to me.
Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] radeonsi: move translate_colorswap to common code

2014-04-14 Thread Carl Worth
Marek Olšák mar...@gmail.com writes:
 The patch is probably not necessary. The same applies to patch 2.

Thanks. I've dropped both, and this seems to have fixed the build
problem I had introduced.

I also just force-pushed this new 10.1 branch without the cherry-picked
commit that was causing the build failure. I hope that's OK with
everybody.

-Carl

-- 
carl.d.wo...@intel.com


pgpohQtiBXgqu.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Clip src/dest rects in BlitFramebuffer.

2014-04-14 Thread Chris Forbes
Right, that makes good sense.

On Mon, Apr 14, 2014 at 11:28 PM, Marek Olšák mar...@gmail.com wrote:
 For scaled blits, this is not correct. For example, if you blit a
 10x10 texture to another 10x10 texture with horizontal scaling of
 1.5x, the X texture coordinate of the rightmost pixel should be
 6.66. Mesa clip blit doesn't return floating-point coordinates of
 the source, so you can't use them. You can however take the clipped
 destination coordinates and set them as a scissor rectangle, keeping
 the original coordinates unmodified. That way, you'll get correct
 clipping (in hardware though) with correct source coordinates for
 scaled blits.

 See st_BlitFramebuffer.

 Marek

 On Mon, Apr 14, 2014 at 10:20 AM, Chris Forbes chr...@ijw.co.nz wrote:
 Nothing was bothering to clip the blit. If the src rect was partially
 outside the framebuffer, we'd end up picking up more copies of the
 edge texels due to clamping.

 Note that this is slight overkill -- we could get away with clipping src
 only, since fragments outside the destination surface will be discarded
 anyway.

 Fixes piglit's fbo-blit-stretch test on drivers which use the meta path.
 (i965: should fix Broadwell, but also fixes Sandybridge/Ivybridge/Haswell
 since this test falls off the blorp path now due to format conversion)

 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77414
 ---
  src/mesa/drivers/common/meta_blit.c | 31 +--
  1 file changed, 25 insertions(+), 6 deletions(-)

 diff --git a/src/mesa/drivers/common/meta_blit.c 
 b/src/mesa/drivers/common/meta_blit.c
 index 31e494f..38d53ef 100644
 --- a/src/mesa/drivers/common/meta_blit.c
 +++ b/src/mesa/drivers/common/meta_blit.c
 @@ -33,6 +33,7 @@
  #include main/enable.h
  #include main/enums.h
  #include main/fbobject.h
 +#include main/image.h
  #include main/macros.h
  #include main/matrix.h
  #include main/multisample.h
 @@ -590,12 +591,9 @@ blitframebuffer_texture(struct gl_context *ctx,
 return true;
  }

 -/**
 - * Meta implementation of ctx-Driver.BlitFramebuffer() in terms
 - * of texture mapping and polygon rendering.
 - */
 -void
 -_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
 +
 +static void
 +_mesa_meta_blit_preclipped(struct gl_context *ctx,
 GLint srcX0, GLint srcY0, GLint srcX1, GLint 
 srcY1,
 GLint dstX0, GLint dstY0, GLint dstX1, GLint 
 dstY1,
 GLbitfield mask, GLenum filter)
 @@ -804,6 +802,27 @@ fallback:
 }
  }

 +/**
 + * Meta implementation of ctx-Driver.BlitFramebuffer() in terms
 + * of texture mapping and polygon rendering.
 + */
 +void
 +_mesa_meta_BlitFramebuffer(struct gl_context *ctx,
 +   GLint srcX0, GLint srcY0, GLint srcX1, GLint 
 srcY1,
 +   GLint dstX0, GLint dstY0, GLint dstX1, GLint 
 dstY1,
 +   GLbitfield mask, GLenum filter)
 +{
 +   if (!_mesa_clip_blit(ctx, srcX0, srcY0, srcX1, srcY1,
 +dstX0, dstY0, dstX1, dstY1)) {
 +  /* nothing left! */
 +  return;
 +   }
 +
 +   _mesa_meta_blit_preclipped(ctx, srcX0, srcY0, srcX1, srcY1,
 +  dstX0, dstY0, dstX1, dstY1,
 +  mask, filter);
 +}
 +
  void
  _mesa_meta_glsl_blit_cleanup(struct blit_state *blit)
  {
 --
 1.9.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] EXTERNAL: Re: Clover clEnqueue* function don't implement blocking?

2014-04-14 Thread Francisco Jerez
Dorrington, Albert albert.dorring...@lmco.com writes:
[...]
 But when these events are queued, if there isn't a wait(), then what
 triggers their flush from the queued_events list?  That seems to only
 happen when the hard_event::wait() is called (assuming the status is
 queued)


Yes, that's right, or when clFlush() is called.  So basically we only
flush the queued commands to the hardware when the user does some
blocking call like reading back a buffer object or waiting for an event
explicitly.

 Shouldn't something be flushing the queue as the events are processed
 (especially if nothing is pending on the events), well before the
 clFinish() call?

I don't have any evidence that doing so would improve performance.
Clover tries to minimize the frequency of flushes because it can be
quite an expensive operation -- AFAIK for all hardware gallium drivers
it involves at least one system call.


pgp_GUkhdx4nB.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 77449] Tracker bug for all bugs related to Steam titles

2014-04-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=77449

Ian Romanick i...@freedesktop.org changed:

   What|Removed |Added

 Depends on||76858

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 77449] New: Tracker bug for all bugs related to Steam titles

2014-04-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=77449

  Priority: medium
Bug ID: 77449
CC: court...@lunarg.com, j...@lunarg.com
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: Tracker bug for all bugs related to Steam titles
  Severity: normal
Classification: Unclassified
OS: All
  Reporter: i...@freedesktop.org
  Hardware: Other
Status: NEW
   Version: git
 Component: Other
   Product: Mesa

As the summary says, this is a tracker bug for everything related to Steam and
SteamOS.  I was going to accomplish this with a keyword, but those are not
quite as generic as I had thought.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] Turning swrast into DRI2 drivers

2014-04-14 Thread Dave Airlie
On Sat, Apr 12, 2014 at 10:25 AM, Giovanni Campagna
scampa.giova...@gmail.com wrote:
 Hi everyone,

 Some time ago I sent patches to enable the swrast driver on
 GBM/DRM, and I did them in the dumbest way possible (that is,
 having GBM implement the dri-swrast interface), to make sure
 it would work without kernel support.
 This patch series is a little smarter, in that it creates
 more than one KMS buffer and has llvmpipe render directly
 into the KMS buffer, so we don't need to copy from the back
 to the shadow (before the kernel copies from the shadow to
 the front).

 For background, this is necessary to get mutter-wayland running
 inside VMs such as gnome-continuous, which has a qxl DRM driver.
 mutter-wayland only has a KMS/EGL rendering backend, and we
 have no intention to add pixman or fbdev.
 GNOME bug: https://bugzilla.gnome.org/show_bug.cgi?id=728059

 The older patches were 
 http://lists.freedesktop.org/archives/mesa-dev/2014-March/055113.html
 I can rebase if so is desired.

I actually did something similiar that was qxl specific before,
(qxl-dri2-driver in my repo, though it had some problems I can barely
remember now), the only question I have is what does this produce?
another swrast_dri.so? and will it have any affect on current
scenarios?

Dave.

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] EXTERNAL: Re: Clover clEnqueue* function don't implement blocking?

2014-04-14 Thread Dorrington, Albert


 -Original Message-
 From: Francisco Jerez [mailto:curroje...@riseup.net]
 Sent: Monday, April 14, 2014 4:04 PM
 To: Dorrington, Albert; mesa-dev@lists.freedesktop.org
 Subject: RE: EXTERNAL: Re: [Mesa-dev] Clover clEnqueue* function don't
 implement blocking?
 
 Dorrington, Albert albert.dorring...@lmco.com writes:
 [...]
  But when these events are queued, if there isn't a wait(), then what
 triggers their flush from the queued_events list?  That seems to only
 happen when the hard_event::wait() is called (assuming the status is
  queued)
 
 
 Yes, that's right, or when clFlush() is called.  So basically we only flush 
 the
 queued commands to the hardware when the user does some blocking call
 like reading back a buffer object or waiting for an event explicitly.
 
  Shouldn't something be flushing the queue as the events are processed
  (especially if nothing is pending on the events), well before the
  clFinish() call?
 
 I don't have any evidence that doing so would improve performance.
 Clover tries to minimize the frequency of flushes because it can be quite an
 expensive operation -- AFAIK for all hardware gallium drivers it involves at
 least one system call.

From reading the OpenCL spec (and perhaps I'm misinterpreting something 
again), section 5.10 Flush and Finish says:

Any blocking commands queued in a command-queue such as 
clEnqueueRead{Image|Buffer} with blocking_read set to CL_TRUE, 
clEnqueueWrite{Image|Buffer} with blocking_write set to CL_TRUE, 
clEnqueueMap{Buffer|Image} with blocking_map set to CL_TRUE or 
clWaitForEvents perform an implicit flush of the command-queue. 

From this statement, I would expect that the command-queue would be flushed 
when the blocking flag is set.



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] i965/fs: Clear variable from live-set if it's completely overwritten.

2014-04-14 Thread Matt Turner
One program affected:

instructions in affected programs: 246 - 244 (-0.81%)
---
 src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
index 6addbb3..1ae7606 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
@@ -80,6 +80,13 @@ fs_visitor::dead_code_eliminate()
 }
  }
 
+ if (inst-dst.file == GRF) {
+if (!inst-is_partial_write()) {
+   int var = live_intervals-var_from_reg(inst-dst);
+   BITSET_CLEAR(live, var);
+}
+ }
+
  for (int i = 0; i  3; i++) {
 if (inst-src[i].file == GRF) {
int var = live_intervals-var_from_vgrf[inst-src[i].reg];
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3] i965/fs: New dead code elimination.

2014-04-14 Thread Matt Turner
In the process of doing other things, dead code elimination got in my way.
So I killed it.

The new dead code elimination pass uses the liveout set from dataflow
analysis to remove dead instructions. Neither of the two old passes
could, for instance remove a dead write if it was later overwritten
unconditionally in a different block:

mov vgrf4, 0.0
if
...
endif
mov vgrf4, vgrf5

The new pass handles everything the other two passes did and more. It
also speeds up compile times. Check patch 1's commit summary for neat
stats.

Matt Turner (3):
  i965/fs: Reimplement dead_code_elimination().
  i965/fs: Clear variable from live-set if it's completely overwritten.
  i965/fs: Remove dead_code_eliminate_local().

 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 215 +
 src/mesa/drivers/dri/i965/brw_fs.h |   1 -
 .../dri/i965/brw_fs_dead_code_eliminate.cpp| 116 +++
 4 files changed, 118 insertions(+), 215 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp

-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] i965/fs: Remove dead_code_eliminate_local().

2014-04-14 Thread Matt Turner
Subsumed by the new dead_code_eliminate() function. No shader-db
changes.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 158 ---
 src/mesa/drivers/dri/i965/brw_fs.h   |   1 -
 2 files changed, 159 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index c723bf0..88dfdfc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2085,163 +2085,6 @@ fs_visitor::opt_algebraic()
return progress;
 }
 
-struct dead_code_hash_key
-{
-   int vgrf;
-   int reg_offset;
-};
-
-static bool
-dead_code_hash_compare(const void *a, const void *b)
-{
-   return memcmp(a, b, sizeof(struct dead_code_hash_key)) == 0;
-}
-
-static void
-clear_dead_code_hash(struct hash_table *ht)
-{
-   struct hash_entry *entry;
-
-   hash_table_foreach(ht, entry) {
-  _mesa_hash_table_remove(ht, entry);
-   }
-}
-
-static void
-insert_dead_code_hash(struct hash_table *ht,
-  int vgrf, int reg_offset, fs_inst *inst)
-{
-   /* We don't bother freeing keys, because they'll be GCed with the ht. */
-   struct dead_code_hash_key *key = ralloc(ht, struct dead_code_hash_key);
-
-   key-vgrf = vgrf;
-   key-reg_offset = reg_offset;
-
-   _mesa_hash_table_insert(ht, _mesa_hash_data(key, sizeof(*key)), key, inst);
-}
-
-static struct hash_entry *
-get_dead_code_hash_entry(struct hash_table *ht, int vgrf, int reg_offset)
-{
-   struct dead_code_hash_key key;
-
-   key.vgrf = vgrf;
-   key.reg_offset = reg_offset;
-
-   return _mesa_hash_table_search(ht, _mesa_hash_data(key, sizeof(key)), 
key);
-}
-
-static void
-remove_dead_code_hash(struct hash_table *ht,
-  int vgrf, int reg_offset)
-{
-   struct hash_entry *entry = get_dead_code_hash_entry(ht, vgrf, reg_offset);
-   if (!entry)
-  return;
-
-   _mesa_hash_table_remove(ht, entry);
-}
-
-/**
- * Walks basic blocks, removing any regs that are written but not read before
- * being redefined.
- *
- * The dead_code_eliminate() function implements a global dead code
- * elimination, but it only handles the removing the last write to a register
- * if it's never read.  This one can handle intermediate writes, but only
- * within a basic block.
- */
-bool
-fs_visitor::dead_code_eliminate_local()
-{
-   struct hash_table *ht;
-   bool progress = false;
-
-   ht = _mesa_hash_table_create(mem_ctx, dead_code_hash_compare);
-
-   if (ht == NULL) {
-  return false;
-   }
-
-   foreach_list_safe(node, this-instructions) {
-  fs_inst *inst = (fs_inst *)node;
-
-  /* At a basic block, empty the HT since we don't understand dataflow
-   * here.
-   */
-  if (inst-is_control_flow()) {
- clear_dead_code_hash(ht);
- continue;
-  }
-
-  /* Clear the HT of any instructions that got read. */
-  for (int i = 0; i  3; i++) {
- fs_reg src = inst-src[i];
- if (src.file != GRF)
-continue;
-
- int read = 1;
- if (inst-is_send_from_grf())
-read = virtual_grf_sizes[src.reg] - src.reg_offset;
-
- for (int reg_offset = src.reg_offset;
-  reg_offset  src.reg_offset + read;
-  reg_offset++) {
-remove_dead_code_hash(ht, src.reg, reg_offset);
- }
-  }
-
-  /* Add any update of a GRF to the HT, removing a previous write if it
-   * wasn't read.
-   */
-  if (inst-dst.file == GRF) {
- if (inst-regs_written  1) {
-/* We don't know how to trim channels from an instruction's
- * writes, so we can't incrementally remove unread channels from
- * it.  Just remove whatever it overwrites from the table
- */
-for (int i = 0; i  inst-regs_written; i++) {
-   remove_dead_code_hash(ht,
- inst-dst.reg,
- inst-dst.reg_offset + i);
-}
- } else {
-struct hash_entry *entry =
-   get_dead_code_hash_entry(ht, inst-dst.reg,
-inst-dst.reg_offset);
-
-if (entry) {
-   if (inst-is_partial_write()) {
-  /* For a partial write, we can't remove any previous dead 
code
-   * candidate, since we're just modifying their result.
-   */
-   } else {
-  /* We're completely updating a channel, and there was a
-   * previous write to the channel that wasn't read.  Kill it!
-   */
-  fs_inst *inst = (fs_inst *)entry-data;
-  inst-remove();
-  progress = true;
-   }
-
-   _mesa_hash_table_remove(ht, entry);
-}
-
-if (!inst-has_side_effects())
-   insert_dead_code_hash(ht, inst-dst.reg, inst-dst.reg_offset,
- inst);
- }

[Mesa-dev] [PATCH 1/3] i965/fs: Reimplement dead_code_elimination().

2014-04-14 Thread Matt Turner
total instructions in shared programs: 1653399 - 1651790 (-0.10%)
instructions in affected programs: 92157 - 90548 (-1.75%)
GAINED:2
LOST:  2

Also significantly reduces the number of optimization loop iterations:

total loop iterations in shared programs: 39724 - 31651 (-20.32%)
loop iterations in affected programs: 21617 - 13544 (-37.35%)

Including some great pathological cases, like 29 - 3 in Strike Suit
Zero and 24 - 3 in Dota2.
---
Moving the placement of the dead_code_elimination() call in the optimization
loop avoids a butterfly-effect instruction count regression. It's placement
is immaterial, because dead_code_eliminate_local() is going away too.

 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  57 +--
 .../dri/i965/brw_fs_dead_code_eliminate.cpp| 109 +
 3 files changed, 111 insertions(+), 56 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 8205fe9..836c62b 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -58,6 +58,7 @@ i965_FILES = \
brw_fs_channel_expressions.cpp \
brw_fs_copy_propagation.cpp \
brw_fs_cse.cpp \
+   brw_fs_dead_code_eliminate.cpp \
brw_fs_fp.cpp \
brw_fs_generator.cpp \
brw_fs_live_variables.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 85a5463..c723bf0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2085,61 +2085,6 @@ fs_visitor::opt_algebraic()
return progress;
 }
 
-/**
- * Removes any instructions writing a VGRF where that VGRF is not used by any
- * later instruction.
- */
-bool
-fs_visitor::dead_code_eliminate()
-{
-   bool progress = false;
-   int pc = 0;
-
-   calculate_live_intervals();
-
-   foreach_list_safe(node, this-instructions) {
-  fs_inst *inst = (fs_inst *)node;
-
-  if (inst-dst.file == GRF  !inst-has_side_effects()) {
- bool dead = true;
-
- for (int i = 0; i  inst-regs_written; i++) {
-int var = live_intervals-var_from_vgrf[inst-dst.reg];
-assert(live_intervals-end[var + inst-dst.reg_offset + i] = pc);
-if (live_intervals-end[var + inst-dst.reg_offset + i] != pc) {
-   dead = false;
-   break;
-}
- }
-
- if (dead) {
-/* Don't dead code eliminate instructions that write to the
- * accumulator as a side-effect. Instead just set the destination
- * to the null register to free it.
- */
-switch (inst-opcode) {
-case BRW_OPCODE_ADDC:
-case BRW_OPCODE_SUBB:
-case BRW_OPCODE_MACH:
-   inst-dst = fs_reg(retype(brw_null_reg(), inst-dst.type));
-   break;
-default:
-   inst-remove();
-   progress = true;
-   break;
-}
- }
-  }
-
-  pc++;
-   }
-
-   if (progress)
-  invalidate_live_intervals();
-
-   return progress;
-}
-
 struct dead_code_hash_key
 {
int vgrf;
@@ -3249,8 +3194,8 @@ fs_visitor::run()
 progress = opt_cse() || progress;
 progress = opt_copy_propagate() || progress;
  progress = opt_peephole_predicated_break() || progress;
-progress = dead_code_eliminate() || progress;
 progress = dead_code_eliminate_local() || progress;
+ progress = dead_code_eliminate() || progress;
  progress = opt_peephole_sel() || progress;
  progress = dead_control_flow_eliminate(this) || progress;
  progress = opt_saturate_propagation() || progress;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
new file mode 100644
index 000..6addbb3
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
@@ -0,0 +1,109 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the Software),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * 

[Mesa-dev] [Bug 77449] Tracker bug for all bugs related to Steam titles

2014-04-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=77449

Tapani Pälli lem...@gmail.com changed:

   What|Removed |Added

 CC||lem...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] EXTERNAL: Re: Clover clEnqueue* function don't implement blocking?

2014-04-14 Thread Francisco Jerez
Dorrington, Albert albert.dorring...@lmco.com writes:

 -Original Message-
 From: Francisco Jerez [mailto:curroje...@riseup.net]
 Sent: Monday, April 14, 2014 4:04 PM
 To: Dorrington, Albert; mesa-dev@lists.freedesktop.org
 Subject: RE: EXTERNAL: Re: [Mesa-dev] Clover clEnqueue* function don't
 implement blocking?
 
 Dorrington, Albert albert.dorring...@lmco.com writes:
 [...]
  But when these events are queued, if there isn't a wait(), then what
 triggers their flush from the queued_events list?  That seems to only
 happen when the hard_event::wait() is called (assuming the status is
  queued)
 
 
 Yes, that's right, or when clFlush() is called.  So basically we only flush 
 the
 queued commands to the hardware when the user does some blocking call
 like reading back a buffer object or waiting for an event explicitly.
 
  Shouldn't something be flushing the queue as the events are processed
  (especially if nothing is pending on the events), well before the
  clFinish() call?
 
 I don't have any evidence that doing so would improve performance.
 Clover tries to minimize the frequency of flushes because it can be quite an
 expensive operation -- AFAIK for all hardware gallium drivers it involves at
 least one system call.

 From reading the OpenCL spec (and perhaps I'm misinterpreting something 
 again), section 5.10 Flush and Finish says:

   Any blocking commands queued in a command-queue such as 
   clEnqueueRead{Image|Buffer} with blocking_read set to CL_TRUE, 
   clEnqueueWrite{Image|Buffer} with blocking_write set to CL_TRUE, 
   clEnqueueMap{Buffer|Image} with blocking_map set to CL_TRUE or 
   clWaitForEvents perform an implicit flush of the command-queue. 

 From this statement, I would expect that the command-queue would be flushed 
 when the blocking flag is set.

clEnqueueRead*, clEnqueueMap* and clWaitForEvents already flush the
command queue (the first two are flushing indirectly as we try to map a
buffer referenced by the GPU).  clEnqueueWrite* doesn't flush, but it's
not clear to me that not doing it can be considered a violation of the
spec.  The guarantees given by clFlush() are rather vague (to some
extent an empty function could be a valid implementation) and it seems
to me that a compliant implementation might, for instance, choose to
batch up commands across flushes if that's the most efficient thing to
do, as long as the user has no way to tell the difference.

I'd like to see some real-world example where clover's behavior
represents a problem before we change it to flush more frequently,
because I'm worried that changing this will actually worsen performance
rather than improving it.


pgpwTsYtHTyVU.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Add pass to rename registers to break live ranges.

2014-04-14 Thread Matt Turner
From: Kenneth Graunke kenn...@whitecape.org

The pass breaks live ranges of virtual registers by allocating new
registers when it sees an assignment to a virtual GRF it's already seen
written.

total instructions in shared programs: 1656292 - 1651898 (-0.27%)
instructions in affected programs: 466836 - 462442 (-0.94%)
GAINED:173
LOST:  3
---
[mattst88]: Perform renaming at control flow level 0, rather than bailing
at first sight of control flow.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 56 
 src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
 2 files changed, 57 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index dbce167..aa957cc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2076,6 +2076,61 @@ fs_visitor::opt_algebraic()
 }
 
 bool
+fs_visitor::opt_register_renaming()
+{
+   bool progress = false;
+   int depth = 0;
+
+   int remap[virtual_grf_count];
+   memset(remap, -1, sizeof(int) * virtual_grf_count);
+
+   foreach_list(node, this-instructions) {
+  fs_inst *inst = (fs_inst *) node;
+
+  if (inst-opcode == BRW_OPCODE_IF || inst-opcode == BRW_OPCODE_DO) {
+ depth++;
+  } else if (inst-opcode == BRW_OPCODE_ENDIF ||
+ inst-opcode == BRW_OPCODE_WHILE) {
+ depth--;
+  }
+
+  /* Rewrite instruction sources. */
+  for (int i = 0; i  3; i++) {
+ if (inst-src[i].file == GRF 
+ remap[inst-src[i].reg] != -1 
+ remap[inst-src[i].reg] != inst-src[i].reg) {
+inst-src[i].reg = remap[inst-src[i].reg];
+progress = true;
+ }
+  }
+
+  int dst = inst-dst.reg;
+
+  if (depth == 0 
+  inst-dst.file == GRF 
+  virtual_grf_sizes[inst-dst.reg] == 1 
+  !inst-is_partial_write()) {
+ if (remap[dst] == -1) {
+remap[dst] = dst;
+ } else {
+remap[dst] = virtual_grf_alloc(virtual_grf_sizes[dst]);
+inst-dst.reg = remap[dst];
+progress = true;
+ }
+  } else if (inst-dst.file == GRF 
+ remap[dst] != -1 
+ remap[dst] != dst) {
+ inst-dst.reg = remap[dst];
+ progress = true;
+  }
+   }
+
+   invalidate_live_intervals();
+
+   return progress;
+}
+
+bool
 fs_visitor::compute_to_mrf()
 {
bool progress = false;
@@ -3031,6 +3086,7 @@ fs_visitor::run()
  progress = opt_peephole_sel() || progress;
  progress = dead_control_flow_eliminate(this) || progress;
  progress = opt_saturate_propagation() || progress;
+ progress = opt_register_renaming() || progress;
  progress = register_coalesce() || progress;
 progress = compute_to_mrf() || progress;
   } while (progress);
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index acf6a11..3f368c2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -371,6 +371,7 @@ public:
bool opt_copy_propagate_local(void *mem_ctx, bblock_t *block,
  exec_list *acp);
void opt_drop_redundant_mov_to_flags();
+   bool opt_split_live_ranges();
bool register_coalesce();
bool compute_to_mrf();
bool dead_code_eliminate();
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Add pass to rename registers to break live ranges.

2014-04-14 Thread Matt Turner
On Mon, Apr 14, 2014 at 3:01 PM, Matt Turner matts...@gmail.com wrote:
 From: Kenneth Graunke kenn...@whitecape.org

 The pass breaks live ranges of virtual registers by allocating new
 registers when it sees an assignment to a virtual GRF it's already seen
 written.

 total instructions in shared programs: 1656292 - 1651898 (-0.27%)
 instructions in affected programs: 466836 - 462442 (-0.94%)
 GAINED:173
 LOST:  3
 ---
 [mattst88]: Perform renaming at control flow level 0, rather than bailing
 at first sight of control flow.

  src/mesa/drivers/dri/i965/brw_fs.cpp | 56 
 
  src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
  2 files changed, 57 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index dbce167..aa957cc 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -2076,6 +2076,61 @@ fs_visitor::opt_algebraic()
  }

  bool
 +fs_visitor::opt_register_renaming()
 +{
 +   bool progress = false;
 +   int depth = 0;
 +
 +   int remap[virtual_grf_count];
 +   memset(remap, -1, sizeof(int) * virtual_grf_count);
 +
 +   foreach_list(node, this-instructions) {
 +  fs_inst *inst = (fs_inst *) node;
 +
 +  if (inst-opcode == BRW_OPCODE_IF || inst-opcode == BRW_OPCODE_DO) {
 + depth++;
 +  } else if (inst-opcode == BRW_OPCODE_ENDIF ||
 + inst-opcode == BRW_OPCODE_WHILE) {
 + depth--;
 +  }
 +
 +  /* Rewrite instruction sources. */
 +  for (int i = 0; i  3; i++) {
 + if (inst-src[i].file == GRF 
 + remap[inst-src[i].reg] != -1 
 + remap[inst-src[i].reg] != inst-src[i].reg) {
 +inst-src[i].reg = remap[inst-src[i].reg];
 +progress = true;
 + }
 +  }
 +
 +  int dst = inst-dst.reg;
 +
 +  if (depth == 0 
 +  inst-dst.file == GRF 
 +  virtual_grf_sizes[inst-dst.reg] == 1 
 +  !inst-is_partial_write()) {
 + if (remap[dst] == -1) {
 +remap[dst] = dst;
 + } else {
 +remap[dst] = virtual_grf_alloc(virtual_grf_sizes[dst]);
 +inst-dst.reg = remap[dst];
 +progress = true;
 + }
 +  } else if (inst-dst.file == GRF 
 + remap[dst] != -1 
 + remap[dst] != dst) {
 + inst-dst.reg = remap[dst];
 + progress = true;
 +  }
 +   }
 +
 +   invalidate_live_intervals();
 +
 +   return progress;
 +}
 +
 +bool
  fs_visitor::compute_to_mrf()
  {
 bool progress = false;
 @@ -3031,6 +3086,7 @@ fs_visitor::run()
   progress = opt_peephole_sel() || progress;
   progress = dead_control_flow_eliminate(this) || progress;
   progress = opt_saturate_propagation() || progress;
 + progress = opt_register_renaming() || progress;
   progress = register_coalesce() || progress;
  progress = compute_to_mrf() || progress;
} while (progress);
 diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
 b/src/mesa/drivers/dri/i965/brw_fs.h
 index acf6a11..3f368c2 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.h
 +++ b/src/mesa/drivers/dri/i965/brw_fs.h
 @@ -371,6 +371,7 @@ public:
 bool opt_copy_propagate_local(void *mem_ctx, bblock_t *block,
   exec_list *acp);
 void opt_drop_redundant_mov_to_flags();
 +   bool opt_split_live_ranges();

Rebase fail. This should obviously be opt_register_renaming().
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/10] winsys/radeon: remove cs_write_reloc, add simpler cs_get_reloc

2014-04-14 Thread Marek Olšák
Actually, I take it back. There are several reasons for having an explicit flag.

1) Flushes from the state tracker are always synchronous. It's because
the CS ioctl must be finished before libGL sends the SwapBuffers
request to the X server.

2) Flushes before calling buffer_map where synchronization is expected
are also synchronous. The buffer_map and buffer_wait functions don't
call cs_sync_flush, instead, they loop and call sched_yield(). This is
probably fixable now that there is a global CS queue (originally, each
CS instance had its own thread).

3) I guess there is some overhead in switching threads, and doing
command submission in the main thread is probably easier for the CPU.

Marek


On Mon, Apr 14, 2014 at 2:41 PM, Marek Olšák mar...@gmail.com wrote:
 Yes, I will remove the flag in the next patch series.

 Marek

 On Sun, Apr 13, 2014 at 10:45 AM, Christian König
 deathsim...@vodafone.de wrote:
 For this series: Reviewed-by: Christian König christian.koe...@amd.com

 BTW: Can we get rid of RADEON_FLUSH_ASYNC and always flush asynchronously?
 Just calling cs_sync_flush directly after the flush should have the same
 effect as synchronous flushing.

 Christian.

 Am 12.04.2014 18:34, schrieb Marek Olšák:

 From: Marek Olšák marek.ol...@amd.com

 The only difference is that it doesn't write to the CS and only returns
 the index.
 ---
   src/gallium/drivers/r300/r300_cs.h|  3 ++-
   src/gallium/drivers/r300/r300_emit.c  |  4 ++--
   src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 26
 +-
   src/gallium/winsys/radeon/drm/radeon_winsys.h | 19 ++-
   4 files changed, 23 insertions(+), 29 deletions(-)

 diff --git a/src/gallium/drivers/r300/r300_cs.h
 b/src/gallium/drivers/r300/r300_cs.h
 index 744e19e..748d6ea 100644
 --- a/src/gallium/drivers/r300/r300_cs.h
 +++ b/src/gallium/drivers/r300/r300_cs.h
 @@ -109,7 +109,8 @@
   #define OUT_CS_RELOC(r) do { \
   assert((r)); \
   assert((r)-cs_buf); \
 -cs_winsys-cs_write_reloc(cs_copy, (r)-cs_buf); \
 +OUT_CS(0xc0001000); /* PKT3_NOP */ \
 +OUT_CS(cs_winsys-cs_get_reloc(cs_copy, (r)-cs_buf) * 4); \
   CS_USED_DW(2); \
   } while (0)
   diff --git a/src/gallium/drivers/r300/r300_emit.c
 b/src/gallium/drivers/r300/r300_emit.c
 index d99b919..9f16413 100644
 --- a/src/gallium/drivers/r300/r300_emit.c
 +++ b/src/gallium/drivers/r300/r300_emit.c
 @@ -1043,8 +1043,8 @@ void r300_emit_vertex_arrays_swtcl(struct
 r300_context *r300, boolean indexed)
   OUT_CS(0);
 assert(r300-vbo_cs);
 -cs_winsys-cs_write_reloc(cs_copy, r300-vbo_cs);
 -CS_USED_DW(2);
 +OUT_CS(0xc0001000); /* PKT3_NOP */
 +OUT_CS(r300-rws-cs_get_reloc(r300-cs, r300-vbo_cs) * 4);
   END_CS;
   }
   diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 index b55eb80..4ce1717 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 @@ -313,6 +313,14 @@ static unsigned radeon_drm_cs_add_reloc(struct
 radeon_winsys_cs *rcs,
   return index;
   }
   +static int radeon_drm_cs_get_reloc(struct radeon_winsys_cs *rcs,
 +   struct radeon_winsys_cs_handle *buf)
 +{
 +struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
 +
 +return radeon_get_reloc(cs-csc, (struct radeon_bo*)buf, NULL);
 +}
 +
   static boolean radeon_drm_cs_validate(struct radeon_winsys_cs *rcs)
   {
   struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
 @@ -359,22 +367,6 @@ static boolean
 radeon_drm_cs_memory_below_limit(struct radeon_winsys_cs *rcs, ui
   return status;
   }
   -static void radeon_drm_cs_write_reloc(struct radeon_winsys_cs *rcs,
 -  struct radeon_winsys_cs_handle
 *buf)
 -{
 -struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
 -struct radeon_bo *bo = (struct radeon_bo*)buf;
 -unsigned index = radeon_get_reloc(cs-csc, bo, NULL);
 -
 -if (index == -1) {
 -fprintf(stderr, radeon: Cannot get a relocation in %s.\n,
 __func__);
 -return;
 -}
 -
 -OUT_CS(cs-base, 0xc0001000);
 -OUT_CS(cs-base, index * RELOC_DWORDS);
 -}
 -
   void radeon_drm_cs_emit_ioctl_oneshot(struct radeon_drm_cs *cs, struct
 radeon_cs_context *csc)
   {
   unsigned i;
 @@ -650,9 +642,9 @@ void radeon_drm_cs_init_functions(struct
 radeon_drm_winsys *ws)
   ws-base.cs_create = radeon_drm_cs_create;
   ws-base.cs_destroy = radeon_drm_cs_destroy;
   ws-base.cs_add_reloc = radeon_drm_cs_add_reloc;
 +ws-base.cs_get_reloc = radeon_drm_cs_get_reloc;
   ws-base.cs_validate = radeon_drm_cs_validate;
   ws-base.cs_memory_below_limit = radeon_drm_cs_memory_below_limit;
 -ws-base.cs_write_reloc = radeon_drm_cs_write_reloc;
   ws-base.cs_flush = radeon_drm_cs_flush;
   ws-base.cs_set_flush_callback = radeon_drm_cs_set_flush;
   ws-base.cs_is_buffer_referenced = radeon_bo_is_referenced;
 

[Mesa-dev] [PATCH 1/3] i965: Put an assertion to check valid varying_to_slot[varying]

2014-04-14 Thread Anuj Phogat
Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 1b58efb..06a4cdf 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -132,6 +132,7 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
   buffer_mask |= 1  buffer;
 
   decl |= buffer  SO_DECL_OUTPUT_BUFFER_SLOT_SHIFT;
+  assert(vue_map-varying_to_slot[varying] = 0);
   decl |= vue_map-varying_to_slot[varying] 
 SO_DECL_REGISTER_INDEX_SHIFT;
   decl |= component_mask  SO_DECL_COMPONENT_MASK_SHIFT;
-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] i965: Fix component mask and varying_to_slot mapping for gl_ViewportIndex

2014-04-14 Thread Anuj Phogat
gl_ViewportIndex doesn't get its own varying slot. It is stored
in VARYING_SLOT_PSIZ.z. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.

Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_ViewportIndex'.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 3623238..8e554af 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -123,6 +123,7 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
 
   /* gl_PointSize is stored in VARYING_SLOT_PSIZ.w
* gl_Layer is stored in VARYING_SLOT_PSIZ.y
+   * gl_ViewportIndex is stored in VARYING_SLOT_PSIZ.z
*/
   if (varying == VARYING_SLOT_PSIZ) {
  assert(components == 1);
@@ -130,6 +131,9 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
   } else if (varying == VARYING_SLOT_LAYER) {
  assert(components == 1);
  component_mask = 1;
+  } else if (varying == VARYING_SLOT_VIEWPORT) {
+ assert(components == 1);
+ component_mask = 2;
   } else {
  component_mask = linked_xfb_info-Outputs[i].ComponentOffset;
   }
@@ -137,7 +141,7 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
   buffer_mask |= 1  buffer;
 
   decl |= buffer  SO_DECL_OUTPUT_BUFFER_SLOT_SHIFT;
-  if (varying == VARYING_SLOT_LAYER) {
+  if (varying == VARYING_SLOT_LAYER || varying == VARYING_SLOT_VIEWPORT) {
  decl |= vue_map-varying_to_slot[VARYING_SLOT_PSIZ] 
 SO_DECL_REGISTER_INDEX_SHIFT;
   } else {
-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] i965: Fix component mask and varying_to_slot mapping for gl_Layer

2014-04-14 Thread Anuj Phogat
gl_Layer doesn't get its own varying slot. It is stored in
VARYING_SLOT_PSIZ.y. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.

Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_Layer'.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 06a4cdf..3623238 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -121,10 +121,15 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
   const unsigned components = linked_xfb_info-Outputs[i].NumComponents;
   unsigned component_mask = (1  components) - 1;
 
-  /* gl_PointSize is stored in VARYING_SLOT_PSIZ.w. */
+  /* gl_PointSize is stored in VARYING_SLOT_PSIZ.w
+   * gl_Layer is stored in VARYING_SLOT_PSIZ.y
+   */
   if (varying == VARYING_SLOT_PSIZ) {
  assert(components == 1);
  component_mask = 3;
+  } else if (varying == VARYING_SLOT_LAYER) {
+ assert(components == 1);
+ component_mask = 1;
   } else {
  component_mask = linked_xfb_info-Outputs[i].ComponentOffset;
   }
@@ -132,9 +137,14 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
   buffer_mask |= 1  buffer;
 
   decl |= buffer  SO_DECL_OUTPUT_BUFFER_SLOT_SHIFT;
-  assert(vue_map-varying_to_slot[varying] = 0);
-  decl |= vue_map-varying_to_slot[varying] 
-SO_DECL_REGISTER_INDEX_SHIFT;
+  if (varying == VARYING_SLOT_LAYER) {
+ decl |= vue_map-varying_to_slot[VARYING_SLOT_PSIZ] 
+SO_DECL_REGISTER_INDEX_SHIFT;
+  } else {
+ assert(vue_map-varying_to_slot[varying] = 0);
+ decl |= vue_map-varying_to_slot[varying] 
+SO_DECL_REGISTER_INDEX_SHIFT;
+  }
   decl |= component_mask  SO_DECL_COMPONENT_MASK_SHIFT;
 
   /* Mesa doesn't store entries for gl_SkipComponents in the Outputs[]
-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

2014-04-14 Thread Eric Anholt
This manifested as rendering failures or sometimes GPU hangs in
compositors when they accidentally got MSAA visuals due to a bug in the X
Server.  Today we decided that the problem in compositors was equivalent
to a corruption bug we'd noticed recently in resizing MSAA-visual
glxgears, and debugging got a lot easier.

When we allocate our MCS MT, libdrm takes the size we request, aligns it
to Y tile size (blowing it up from 300x300=90 bytes to 384*320=122880
bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to it.
When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
data gets stored to the first 20 or so scanlines of each of the 3 tiled
pages in that row, even though only 2 of those pages were allocated by
libdrm.  In the glxgears case, the missing 3rd page happened to
consistently be the static VBO that got mapped right after the first MCS
allocation, so corruption only appeared once window resize made us throw
out the old MCS and then allocate the same BO to back the new MCS.

Instead, just memset the amount of data we actually asked libdrm to
allocate for, which will be smaller (more efficient) and not overrun.
Thanks go to Kenneth for doing most of the hard debugging to eliminate a
lot of the search space for the bug.

Cc: 10.0 10.1 mesa-sta...@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
---

Ken: how I eventually figured this out was I thought well, I'm
clearing the whole surface at the start of rendering glxgears, so I
don't *really* need to memset because I'll be initializing the whole
buffer during fallback glClear() anyway, so let me just drop the whole
memset step to definitely eliminate that as a potential problem.  Huh,
the problem's gone.

The worst is I remember briefly thinking about this code when it was
landing, and thought nope, seems safe.

 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 5996a1b..59700ed 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
 * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
 */
void *data = intel_miptree_map_raw(brw, mt-mcs_mt);
-   memset(data, 0xff, mt-mcs_mt-region-bo-size);
+   memset(data, 0xff, mt-mcs_mt-region-height * mt-mcs_mt-region-pitch);
intel_miptree_unmap_raw(brw, mt-mcs_mt);
mt-fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
 
-- 
1.9.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Clear variable from live-set if it's completely overwritten.

2014-04-14 Thread Eric Anholt
Matt Turner matts...@gmail.com writes:

 One program affected:

 instructions in affected programs: 246 - 244 (-0.81%)
 ---
  src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 7 +++
  1 file changed, 7 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 index 6addbb3..1ae7606 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 @@ -80,6 +80,13 @@ fs_visitor::dead_code_eliminate()
  }
   }
  
 + if (inst-dst.file == GRF) {
 +if (!inst-is_partial_write()) {
 +   int var = live_intervals-var_from_reg(inst-dst);
 +   BITSET_CLEAR(live, var);
 +}
 + }

Any more effect if you:

 +   int var = live_intervals-var_from_vgrf[inst-dst.reg];
 +   for (int i = 0; i  inst-regs_written; i++)
 +  BITSET_CLEAR(live, var + i);

instead?


pgpEZiXTiNn62.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] translate_sse: Preserve low bit during unsigned - float conversion.

2014-04-14 Thread Andreas Hartmetz
From: Andreas Hartmetz andreas.hartm...@kdab.com

---
 src/gallium/auxiliary/translate/translate_sse.c | 35 -
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index dace722..2a0d3c1 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -71,7 +71,7 @@ struct translate_group
 
 #define ELEMENT_BUFFER_INSTANCE_ID  1001
 
-#define NUM_CONSTS 7
+#define NUM_CONSTS 8
 
 enum
 {
@@ -81,6 +81,7 @@ enum
CONST_INV_32767,
CONST_INV_65535,
CONST_INV_2147483647,
+   CONST_INV_4294967295,
CONST_255
 };
 
@@ -92,6 +93,7 @@ static float consts[NUM_CONSTS][4] = {
C(1.0 / 32767.0),
C(1.0 / 65535.0),
C(1.0 / 2147483647.0),
+   C(1.0 / 4294967295.0),
C(255.0)
 };
 
@@ -515,6 +517,7 @@ translate_attr_convert(struct translate_sse *p,
 || a-output_format == PIPE_FORMAT_R32G32B32_FLOAT
 || a-output_format == PIPE_FORMAT_R32G32B32A32_FLOAT)) {
   struct x86_reg dataXMM = x86_make_reg(file_XMM, 0);
+  struct x86_reg dataXMM2 = x86_make_reg(file_XMM, 1);
 
   for (i = 0; i  output_desc-nr_channels; ++i) {
  if (swizzle[i] == UTIL_FORMAT_SWIZZLE_0
@@ -546,17 +549,38 @@ translate_attr_convert(struct translate_sse *p,
 */
sse2_punpcklbw(p-func, dataXMM, get_const(p, CONST_IDENTITY));
sse2_punpcklbw(p-func, dataXMM, get_const(p, CONST_IDENTITY));
+   sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
break;
 case 16:
sse2_punpcklwd(p-func, dataXMM, get_const(p, CONST_IDENTITY));
+   sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
break;
-case 32:   /* we lose precision here */
+case 32:   /* we lose precision if value  2^23 - 1 */
+   /* ...but try to keep all bits for value = 2^23 - 1 */
+   /* this could be done much smarter for 1 or 2 dwords of input
+* on X64, using sth like cvtsi2ss xmm0, rax (note *R*ax) */
+
+   /* save the low bit */
+   sse2_movdqa(p-func, dataXMM2, dataXMM);
+   /* right shift  convert, losing the low bit - must clear
+* high bit because there is no unsigned convert instruction */
sse2_psrld_imm(p-func, dataXMM, 1);
+   sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
+
+   /* convert low bit to float */
+   sse2_pslld_imm(p-func, dataXMM2, 31);
+   sse2_psrld_imm(p-func, dataXMM2, 31);
+   sse2_cvtdq2ps(p-func, dataXMM2, dataXMM2);
+
+   /* mostly undo the right-shift by 1 */
+   sse_addps(p-func, dataXMM, dataXMM);
+   /* add back the lost low bit */
+   sse_addps(p-func, dataXMM, dataXMM2);
break;
 default:
return FALSE;
 }
-sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
+
 if (input_desc-channel[0].normalized) {
struct x86_reg factor;
switch (input_desc-channel[0].size) {
@@ -567,7 +591,7 @@ translate_attr_convert(struct translate_sse *p,
   factor = get_const(p, CONST_INV_65535);
   break;
case 32:
-  factor = get_const(p, CONST_INV_2147483647);
+  factor = get_const(p, CONST_INV_4294967295);
   break;
default:
   assert(0);
@@ -579,9 +603,6 @@ translate_attr_convert(struct translate_sse *p,
}
sse_mulps(p-func, dataXMM, factor);
 }
-else if (input_desc-channel[0].size == 32)
-   /* compensate for the bit we threw away to fit u32 into s32 */
-   sse_addps(p-func, dataXMM, dataXMM);
 break;
  case UTIL_FORMAT_TYPE_SIGNED:
 if (!(x86_target_caps(p-func)  X86_SSE2))
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] translate_sse: Preserve low bit during unsigned - float conversion.

2014-04-14 Thread Andreas Hartmetz
  +   /* right shift  convert, losing the low bit - must clear
  +* high bit because there is no unsigned convert
  instruction */ 
  sse2_psrld_imm(p-func, dataXMM, 1);
  
  +   sse2_cvtdq2ps(p-func, dataXMM, dataXMM);
  +
  +   /* convert low bit to float */
  +   sse2_pslld_imm(p-func, dataXMM2, 31);
  +   sse2_psrld_imm(p-func, dataXMM2, 31);
  +   sse2_cvtdq2ps(p-func, dataXMM2, dataXMM2);
 
 Is this really ideal, wouldn't something like (in horrible pseudo-code
 notation)
 dataXMM2 = dataXMM2  CONST(0x1 vec)
 dataXMM2 = cvtdq2ps(dataXMM2)
 be faster?
 I guess though your method avoids the constant, so probably not worth
 bothering (I am actually wondering what code llvm generates for its
 UIToFP instruction or what in general the fastest way to do this is).
 
Well, this whole sequence is somewhat wasteful for a single bit, but
it mattered in my application before I realized that it won't work
anyway on too many drivers.

I was reluctant to add another register for the 0x1 constant(*);
loading it from memory each time seemed like it would take a lot of
bandwidth. Alternatively, loading an immediate into one 32 bit
(or smaller?) sub register and then copying it into the others
would amount to about the same instruction count after all.
Maybe there are more execution units available for doing it that way,
I don't know. Also I haven't found how to actually copy a value from
the lowest subregister into all others in one (fast) instruction.

(*) The code in this file generally uses rather few registers. If
adding one to keep the 0x1 is not a problem, that's likely optimal.

FWIW, the best I could get compilers to produce was four times
cvtsi2ss xmm0, rax - looks like the somewhat clever use of rax
with a non maxed out value range is a hardcoded pattern for the
conversion and the compilers have no means to be more creative
there. With x86 target I also saw a code sequence splitting the uint
value into two 16 bit values, converting them and then adding
them after multiplying the higher order bits by 0x1. Repeated
four times... not sure if there is a good reason for that or if it's
just a compiler limitation that the code wasn't properly SIMDed.
Every one(!) of those four iterations seemed at least equally
expensive to the whole sequence in this patch.
Compilers tested were GCC 4.8 -O3 and Clang trunk -O3.

If anybody knows an optimal sequence I'd happily see that used
instead.

 Roland
 

snip
Patch that fixes accidentally scalar mov follows.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] i965/fs: Reimplement dead_code_elimination().

2014-04-14 Thread Eric Anholt
Matt Turner matts...@gmail.com writes:
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 new file mode 100644
 index 000..6addbb3
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp

 +
 +/** @file brw_fs_dead_code_eliminate.cpp
 + */

Perhaps some actual comment:

* Dataflow-aware dead code elimination.
*
* Walks the instruction list from the bottom, removing instructions that
* have results that both aren't used in later blocks and haven't been read
* yet in the tail end of this block.

 +bool
 +fs_visitor::dead_code_eliminate()
 +{
 +   bool progress = false;
 +
 +   cfg_t cfg(instructions);
 +
 +   calculate_live_intervals();
 +
 +   int num_vars = live_intervals-num_vars;
 +   BITSET_WORD *live = ralloc_array(NULL, BITSET_WORD, 
 BITSET_WORDS(num_vars));
 +
 +   for (int b = 0; b  cfg.num_blocks; b++) {
 +  bblock_t *block = cfg.blocks[b];
 +  memcpy(live, live_intervals-bd[b].liveout,
 + sizeof(BITSET_WORD) * BITSET_WORDS(num_vars));
 +
 +  for (fs_inst *inst = (fs_inst *)block-end;
 +   inst != block-start-prev;
 +   inst = (fs_inst *)inst-prev) {
 + if (inst-dst.file == GRF 
 + !inst-has_side_effects() 
 + !inst-writes_flag()) {
 +bool result_live = false;
 +
 +if (inst-regs_written == 1) {
 +   int var = live_intervals-var_from_reg(inst-dst);
 +   result_live = BITSET_TEST(live, var);
 +} else {
 +   int var = live_intervals-var_from_vgrf[inst-dst.reg];
 +   for (int i = 0; i  inst-regs_written; i++) {
 +  result_live = result_live || BITSET_TEST(live, var + i);
 +   }
 +}

You could just use the else block of this if statement all the time,
right?  Seems easier.

 +if (!result_live) {
 +   progress = true;
 +
 +   switch (inst-opcode) {
 +   case BRW_OPCODE_ADDC:
 +   case BRW_OPCODE_SUBB:
 +   case BRW_OPCODE_MACH:
 +  inst-dst = fs_reg(retype(brw_null_reg(), inst-dst.type));
 +  break;
 +   default:
 +  inst-opcode = BRW_OPCODE_NOP;
 +  break;
 +   }
 +   continue;

I don't think the continue here is quite right: you'll skip looking at
the src args for a nulled-destination ADDC/SUBB/MACH.  I think the
continue would still be appropriate in the default case.

 +}
 + }
 +
 + for (int i = 0; i  3; i++) {
 +if (inst-src[i].file == GRF) {
 +   int var = live_intervals-var_from_vgrf[inst-src[i].reg];
 +
 +   for (int j = 0; j  inst-regs_read(this, i); j++) {
 +  BITSET_SET(live, var + inst-src[i].reg_offset + j);
 +   }
 +}
 + }
 +  }
 +   }
 +
 +   ralloc_free(live);
 +
 +   foreach_list_safe(node, this-instructions) {
 +  fs_inst *inst = (fs_inst *)node;
 +
 +  if (inst-opcode == BRW_OPCODE_NOP) {
 + inst-remove();
 +  }
 +   }

Tiny optimization: this block could go under if (progress).

 +   if (progress)
 +  invalidate_live_intervals();
 +
 +   return progress;

The continue comment is the only important one, I think.  Anything
else you can take or leave, and this series is:

Reviewed-by: Eric Anholt e...@anholt.net


pgpKavHPi_FCw.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

2014-04-14 Thread Ian Romanick
On 04/14/2014 05:33 PM, Eric Anholt wrote:
 This manifested as rendering failures or sometimes GPU hangs in
 compositors when they accidentally got MSAA visuals due to a bug in the X
 Server.  Today we decided that the problem in compositors was equivalent
 to a corruption bug we'd noticed recently in resizing MSAA-visual
 glxgears, and debugging got a lot easier.
 
 When we allocate our MCS MT, libdrm takes the size we request, aligns it
 to Y tile size (blowing it up from 300x300=90 bytes to 384*320=122880
  ^
Infinitesimal nit...   one too many zeros here.

 bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to it.
 When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
 data gets stored to the first 20 or so scanlines of each of the 3 tiled
 pages in that row, even though only 2 of those pages were allocated by
 libdrm.  In the glxgears case, the missing 3rd page happened to
 consistently be the static VBO that got mapped right after the first MCS
 allocation, so corruption only appeared once window resize made us throw
 out the old MCS and then allocate the same BO to back the new MCS.

Oh man...  I didn't realize that bo-size wasn't necessarily the amount
of backing storage.  It makes sense... do we track the backed size
anywhere?  It seems possible to have similar problems elsewhere...

 Instead, just memset the amount of data we actually asked libdrm to
 allocate for, which will be smaller (more efficient) and not overrun.
 Thanks go to Kenneth for doing most of the hard debugging to eliminate a
 lot of the search space for the bug.
 
 Cc: 10.0 10.1 mesa-sta...@lists.freedesktop.org
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
 ---
 
 Ken: how I eventually figured this out was I thought well, I'm
 clearing the whole surface at the start of rendering glxgears, so I
 don't *really* need to memset because I'll be initializing the whole
 buffer during fallback glClear() anyway, so let me just drop the whole
 memset step to definitely eliminate that as a potential problem.  Huh,
 the problem's gone.
 
 The worst is I remember briefly thinking about this code when it was
 landing, and thought nope, seems safe.
 
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
 b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 index 5996a1b..59700ed 100644
 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 @@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
  * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
  */
 void *data = intel_miptree_map_raw(brw, mt-mcs_mt);
 -   memset(data, 0xff, mt-mcs_mt-region-bo-size);
 +   memset(data, 0xff, mt-mcs_mt-region-height * 
 mt-mcs_mt-region-pitch);
 intel_miptree_unmap_raw(brw, mt-mcs_mt);
 mt-fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
  
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 3/3] i965: Fix component mask and varying_to_slot mapping for gl_ViewportIndex

2014-04-14 Thread Chris Forbes
Are the streamout values allowed to be restricted to the ranges of the
vertex header fields, or do they have to reflect exactly what the
shader wrote?

(I ran into a similar issue with ARB_fragment_layer_viewport, and
while nothing has been done about it, the consensus was that the FS
had to see exactly what the previous stage wrote, and so a real slot
would be necessary)

On Tue, Apr 15, 2014 at 11:23 AM, Anuj Phogat anuj.pho...@gmail.com wrote:
 gl_ViewportIndex doesn't get its own varying slot. It is stored
 in VARYING_SLOT_PSIZ.z. This patch fixes the issue for both gen7
 and gen8 because gen7_upload_3dstate_so_decl_list() is shared
 between them.

 Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
 Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
 'gl_ViewportIndex'.

 Cc: mesa-sta...@lists.freedesktop.org
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
 b/src/mesa/drivers/dri/i965/gen7_sol_state.c
 index 3623238..8e554af 100644
 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
 +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
 @@ -123,6 +123,7 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,

/* gl_PointSize is stored in VARYING_SLOT_PSIZ.w
 * gl_Layer is stored in VARYING_SLOT_PSIZ.y
 +   * gl_ViewportIndex is stored in VARYING_SLOT_PSIZ.z
 */
if (varying == VARYING_SLOT_PSIZ) {
   assert(components == 1);
 @@ -130,6 +131,9 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
} else if (varying == VARYING_SLOT_LAYER) {
   assert(components == 1);
   component_mask = 1;
 +  } else if (varying == VARYING_SLOT_VIEWPORT) {
 + assert(components == 1);
 + component_mask = 2;
} else {
   component_mask = linked_xfb_info-Outputs[i].ComponentOffset;
}
 @@ -137,7 +141,7 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
buffer_mask |= 1  buffer;

decl |= buffer  SO_DECL_OUTPUT_BUFFER_SLOT_SHIFT;
 -  if (varying == VARYING_SLOT_LAYER) {
 +  if (varying == VARYING_SLOT_LAYER || varying == VARYING_SLOT_VIEWPORT) 
 {
   decl |= vue_map-varying_to_slot[VARYING_SLOT_PSIZ] 
  SO_DECL_REGISTER_INDEX_SHIFT;
} else {
 --
 1.8.3.1

 ___
 mesa-stable mailing list
 mesa-sta...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-stable
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Clear variable from live-set if it's completely overwritten.

2014-04-14 Thread Matt Turner
On Mon, Apr 14, 2014 at 5:50 PM, Eric Anholt e...@anholt.net wrote:
 Matt Turner matts...@gmail.com writes:

 One program affected:

 instructions in affected programs: 246 - 244 (-0.81%)
 ---
  src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 7 +++
  1 file changed, 7 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 index 6addbb3..1ae7606 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 @@ -80,6 +80,13 @@ fs_visitor::dead_code_eliminate()
  }
   }

 + if (inst-dst.file == GRF) {
 +if (!inst-is_partial_write()) {
 +   int var = live_intervals-var_from_reg(inst-dst);
 +   BITSET_CLEAR(live, var);
 +}
 + }

 Any more effect if you:

  +   int var = live_intervals-var_from_vgrf[inst-dst.reg];
  +   for (int i = 0; i  inst-regs_written; i++)
  +  BITSET_CLEAR(live, var + i);

 instead?

Needs to be BITSET_CLEAR(live, var + inst-dst.reg_offset + i), but
nope. No changes.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] i965/fs: Reimplement dead_code_elimination().

2014-04-14 Thread Matt Turner
On Mon, Apr 14, 2014 at 5:59 PM, Eric Anholt e...@anholt.net wrote:
 Matt Turner matts...@gmail.com writes:
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
 new file mode 100644
 index 000..6addbb3
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp

 +
 +/** @file brw_fs_dead_code_eliminate.cpp
 + */

 Perhaps some actual comment:

 * Dataflow-aware dead code elimination.
 *
 * Walks the instruction list from the bottom, removing instructions that
 * have results that both aren't used in later blocks and haven't been read
 * yet in the tail end of this block.

Thanks, will add.

 +bool
 +fs_visitor::dead_code_eliminate()
 +{
 +   bool progress = false;
 +
 +   cfg_t cfg(instructions);
 +
 +   calculate_live_intervals();
 +
 +   int num_vars = live_intervals-num_vars;
 +   BITSET_WORD *live = ralloc_array(NULL, BITSET_WORD, 
 BITSET_WORDS(num_vars));
 +
 +   for (int b = 0; b  cfg.num_blocks; b++) {
 +  bblock_t *block = cfg.blocks[b];
 +  memcpy(live, live_intervals-bd[b].liveout,
 + sizeof(BITSET_WORD) * BITSET_WORDS(num_vars));
 +
 +  for (fs_inst *inst = (fs_inst *)block-end;
 +   inst != block-start-prev;
 +   inst = (fs_inst *)inst-prev) {
 + if (inst-dst.file == GRF 
 + !inst-has_side_effects() 
 + !inst-writes_flag()) {
 +bool result_live = false;
 +
 +if (inst-regs_written == 1) {
 +   int var = live_intervals-var_from_reg(inst-dst);
 +   result_live = BITSET_TEST(live, var);
 +} else {
 +   int var = live_intervals-var_from_vgrf[inst-dst.reg];
 +   for (int i = 0; i  inst-regs_written; i++) {
 +  result_live = result_live || BITSET_TEST(live, var + i);
 +   }
 +}

 You could just use the else block of this if statement all the time,
 right?  Seems easier.

I don't think so, because of destinations with reg_offset != 0.

 +if (!result_live) {
 +   progress = true;
 +
 +   switch (inst-opcode) {
 +   case BRW_OPCODE_ADDC:
 +   case BRW_OPCODE_SUBB:
 +   case BRW_OPCODE_MACH:
 +  inst-dst = fs_reg(retype(brw_null_reg(), 
 inst-dst.type));
 +  break;
 +   default:
 +  inst-opcode = BRW_OPCODE_NOP;
 +  break;
 +   }
 +   continue;

 I don't think the continue here is quite right: you'll skip looking at
 the src args for a nulled-destination ADDC/SUBB/MACH.  I think the
 continue would still be appropriate in the default case.

Yep, good catch.

 +}
 + }
 +
 + for (int i = 0; i  3; i++) {
 +if (inst-src[i].file == GRF) {
 +   int var = live_intervals-var_from_vgrf[inst-src[i].reg];
 +
 +   for (int j = 0; j  inst-regs_read(this, i); j++) {
 +  BITSET_SET(live, var + inst-src[i].reg_offset + j);
 +   }
 +}
 + }
 +  }
 +   }
 +
 +   ralloc_free(live);
 +
 +   foreach_list_safe(node, this-instructions) {
 +  fs_inst *inst = (fs_inst *)node;
 +
 +  if (inst-opcode == BRW_OPCODE_NOP) {
 + inst-remove();
 +  }
 +   }

 Tiny optimization: this block could go under if (progress).

Good idea.

 +   if (progress)
 +  invalidate_live_intervals();
 +
 +   return progress;

 The continue comment is the only important one, I think.  Anything
 else you can take or leave, and this series is:

 Reviewed-by: Eric Anholt e...@anholt.net

Thanks Eric!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

2014-04-14 Thread Kenneth Graunke
On 04/14/2014 05:33 PM, Eric Anholt wrote:
 This manifested as rendering failures or sometimes GPU hangs in
 compositors when they accidentally got MSAA visuals due to a bug in the X
 Server.  Today we decided that the problem in compositors was equivalent
 to a corruption bug we'd noticed recently in resizing MSAA-visual
 glxgears, and debugging got a lot easier.
 
 When we allocate our MCS MT, libdrm takes the size we request, aligns it
 to Y tile size (blowing it up from 300x300=90 bytes to 384*320=122880
 bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to it.
 When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
 data gets stored to the first 20 or so scanlines of each of the 3 tiled
 pages in that row, even though only 2 of those pages were allocated by
 libdrm.

What?

I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
drm_intel_bo where bo-size is larger than what you asked for, due to
the BO cache.  But...what you're saying is, it doesn't actually allocate
enough pages to back the whole bo-size it gives you?  So, if you write
bytes 0..(bo-size - 1), you'll randomly clobber memory in a way that's
really difficult to detect?

I never realized that.  It seems pretty scary.

There are other places where we memset an entire BO using bo-size.  For
example, your INTEL_DEBUG=shader_time code does exactly that (though it
isn't tiled).

Could we change libdrm to set bo-size to the actual usable size of the
buffer, rather than the bucket size?  That seems much safer, and would
prevent mistakes like this in the future.  If libdrm needs to know the
bucket size internally, we could store that as a field in the private
drm_intel_gem_bo structure, and not expose it to consumers of the
library.  I can't imagine users of libdrm want the current value.

 In the glxgears case, the missing 3rd page happened to
 consistently be the static VBO that got mapped right after the first MCS
 allocation, so corruption only appeared once window resize made us throw
 out the old MCS and then allocate the same BO to back the new MCS.
 
 Instead, just memset the amount of data we actually asked libdrm to
 allocate for, which will be smaller (more efficient) and not overrun.
 Thanks go to Kenneth for doing most of the hard debugging to eliminate a
 lot of the search space for the bug.
 
 Cc: 10.0 10.1 mesa-sta...@lists.freedesktop.org
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
 ---
 
 Ken: how I eventually figured this out was I thought well, I'm
 clearing the whole surface at the start of rendering glxgears, so I
 don't *really* need to memset because I'll be initializing the whole
 buffer during fallback glClear() anyway, so let me just drop the whole
 memset step to definitely eliminate that as a potential problem.  Huh,
 the problem's gone.
 
 The worst is I remember briefly thinking about this code when it was
 landing, and thought nope, seems safe.
 
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
 b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 index 5996a1b..59700ed 100644
 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 @@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
  * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
  */
 void *data = intel_miptree_map_raw(brw, mt-mcs_mt);
 -   memset(data, 0xff, mt-mcs_mt-region-bo-size);
 +   memset(data, 0xff, mt-mcs_mt-region-height * 
 mt-mcs_mt-region-pitch);
 intel_miptree_unmap_raw(brw, mt-mcs_mt);
 mt-fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;

This does seem to fix the KWin problem, as well as the glxgears problem.

I agree this is the correct amount of data to memset, and even if we
make the libdrm change I suggested, this seems worth doing.  bo-size
may have been rounded up beyond what we need, and memsetting that extra
space is wasteful (even if it did work).

Reviewed-by: Kenneth Graunke kenn...@whitecape.org

Thanks a ton for your help on this, Eric.  I was really stumped.



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g/radeonsi: Map transfer staging texture unsynchronized when possible

2014-04-14 Thread Michel Dänzer
From: Michel Dänzer michel.daen...@amd.com

The transfer staging texture is always freshly allocated, so for write-only
transfers we don't need to explicitly wait for the BO to become idle.

Squeezes a few hundered MB/s more out of x11perf -shmput500 with glamor.

Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
 src/gallium/drivers/radeon/r600_texture.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 293eeaa..c410543 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -1039,6 +1039,8 @@ static void *r600_texture_transfer_map(struct 
pipe_context *ctx,
 
if (trans-staging) {
buf = trans-staging;
+   if (!rtex-is_depth  !(usage  PIPE_TRANSFER_READ))
+   usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
} else {
buf = rtex-resource;
}
-- 
1.9.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 77449] Tracker bug for all bugs related to Steam titles

2014-04-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=77449

Nicholas Miell nmi...@gmail.com changed:

   What|Removed |Added

 Depends on||66067

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev