Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 08:08, Tom Stellard  wrote:
> 
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
This is the only mode that should ever be used. I’m not sure why these are 
options. There technically are separate flush on input or flush on output 
options, but I’m not sure why they would be used.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
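For readers unfamiliar with the terms in the reply above, here is a minimal C sketch of what a denormal value is and what flushing one to zero means. The helpers `is_denormal` and `flush_denormal` are hypothetical names invented for this illustration. They model the arithmetic on the CPU only and say nothing about how SI/CI hardware or radeonsi implements its float modes.

```c
#include <float.h>
#include <stdbool.h>

/* Returns true if x is a denormal (subnormal) double: non-zero, but smaller
 * in magnitude than the smallest normal double, DBL_MIN. */
static bool is_denormal(double x)
{
   double a = x < 0.0 ? -x : x;
   return a > 0.0 && a < DBL_MIN;
}

/* Hypothetical model of "flush to zero": a denormal is replaced by a zero
 * of the same sign, every other value passes through unchanged. */
static double flush_denormal(double x)
{
   if (!is_denormal(x))
      return x;
   return x < 0.0 ? -0.0 : 0.0;
}
```

With denormals enabled, `DBL_MIN / 2.0` stays a small non-zero value. A flush-on-input or flush-on-output mode would behave like `flush_denormal()` applied to operands or results respectively.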


Re: [Mesa-dev] [PATCH] st/mesa: move VS creation in bitmap code

2016-02-08 Thread Brian Paul

On 02/08/2016 07:52 AM, Nicolai Hähnle wrote:

On 05.02.2016 19:55, Brian Paul wrote:

Do this one-time init with the other one-time inits.


Since Bitmap is something that few programs use, wouldn't it be better
to move in the other direction and do all the one-time inits on-demand
rather than at context init?


Sure, we could do that.  This change is part of a longer glBitmap 
optimization series that I'm still working on...


-Brian




[Mesa-dev] [PATCH 04/23] i965: Don't try to create aux buffer for non-msrt aux-buffer

2016-02-08 Thread Topi Pohjolainen
In addition to simply calling miptree_create() the higher level
call intel_miptree_create() also considers if the buffer should
be associated with an auxiliary buffer based on the given format.

Here we are allocating an auxiliary buffer whose format would later
mislead intel_miptree_create_layout() into trying to associate the
auxiliary buffer with an auxiliary buffer of its own. To prevent this,
the actual buffer creation logic was split out into its own function.
Let's invoke that instead.
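The split described above can be pictured with a toy model. This is only a sketch: `struct toy_tree` and the `toy_tree_*` functions are hypothetical stand-ins, not the i965 miptree API. The point is that the auxiliary buffer is created through the low-level call and so never gets an auxiliary buffer of its own.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a miptree. */
struct toy_tree {
   bool format_wants_aux;   /* the format alone would suggest an aux buffer */
   struct toy_tree *aux;    /* attached auxiliary buffer, or NULL */
};

/* Low-level creation: allocates the tree and nothing else. */
static struct toy_tree *toy_tree_create_plain(bool format_wants_aux)
{
   struct toy_tree *t = calloc(1, sizeof(*t));
   if (t)
      t->format_wants_aux = format_wants_aux;
   return t;
}

/* High-level creation: additionally attaches an aux buffer when the format
 * suggests one. The aux buffer is created with the low-level call, so the
 * format check is never applied to it. */
static struct toy_tree *toy_tree_create(bool format_wants_aux)
{
   struct toy_tree *t = toy_tree_create_plain(format_wants_aux);
   if (t && format_wants_aux)
      t->aux = toy_tree_create_plain(true);
   return t;
}

/* Frees a tree together with its aux buffer, if any. */
static void toy_tree_free(struct toy_tree *t)
{
   if (!t)
      return;
   toy_tree_free(t->aux);
   free(t);
}
```

Had `toy_tree_create()` called itself for the aux buffer, the same format check would have been applied to the aux buffer as well, which is the situation the patch avoids.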

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index d655de8..6c447ba 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1550,16 +1550,17 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context 
*brw,
if (brw->gen >= 8) {
   layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
}
-   mt->mcs_mt = intel_miptree_create(brw,
- mt->target,
- format,
- mt->first_level,
- mt->last_level,
- mcs_width,
- mcs_height,
- mt->logical_depth0,
- 0 /* num_samples */,
- layout_flags);
+   mt->mcs_mt = miptree_create(brw,
+   mt->target,
+   format,
+   mt->first_level,
+   mt->last_level,
+   mcs_width,
+   mcs_height,
+   mt->logical_depth0,
+   0 /* num_samples */,
+   INTEL_MSAA_LAYOUT_NONE,
+   layout_flags);
 
return mt->mcs_mt;
 }
-- 
2.5.0



[Mesa-dev] [PATCH 14/23] i965: Add means for limiting color resolves

2016-02-08 Thread Topi Pohjolainen
Until now there has been only one type of color buffer that needs
to be resolved, namely single-sampled fast clear. As even the
sampler engine in the GPU doesn't understand the associated meta data,
the color values always need to be resolved prior to reading them.

From SKL onwards a new scheme is supported, called lossless
compression of single-sampled color buffers. This is something that
is understood by the sampling engine, and therefore resolving
these types of buffers is not necessary before sampling.
This patch adds means to make the distinction when considering whether
a resolve is needed.
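The distinction can be sketched as a small predicate. All `toy_`/`TOY_` names below are hypothetical and exist only for illustration. The flag mirrors the INTEL_MIPTREE_IGNORE_CCS_E flag that a later patch in this series introduces, and the real driver tracks more state than this.

```c
#include <stdbool.h>

/* Hypothetical, reduced versions of the layout and fast clear state. */
enum toy_layout { TOY_LAYOUT_NONE, TOY_LAYOUT_CSS };
enum toy_state  { TOY_STATE_RESOLVED, TOY_STATE_UNRESOLVED, TOY_STATE_CLEAR };

/* Caller can consume lossless compressed buffers directly (e.g. sampling). */
#define TOY_IGNORE_CCS_E (1 << 0)

/* Returns true if the color buffer has to be resolved before it is read. */
static bool toy_needs_color_resolve(enum toy_layout layout,
                                    enum toy_state state, int flags)
{
   /* Lossless compressed buffers can be skipped when the caller says the
    * consumer understands them. */
   if ((flags & TOY_IGNORE_CCS_E) && layout == TOY_LAYOUT_CSS)
      return false;

   /* Otherwise anything not already resolved needs a resolve. */
   return state != TOY_STATE_RESOLVED;
}
```

Passing `0` as flags, as every call site in this patch does, keeps the previous behaviour of always resolving.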

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp   | 2 +-
 src/mesa/drivers/dri/i965/brw_context.c| 8 
 src/mesa/drivers/dri/i965/intel_blit.c | 4 ++--
 src/mesa/drivers/dri/i965/intel_copy_image.c   | 4 ++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c  | 9 ++---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h  | 3 ++-
 src/mesa/drivers/dri/i965/intel_pixel_bitmap.c | 2 +-
 src/mesa/drivers/dri/i965/intel_pixel_read.c   | 2 +-
 src/mesa/drivers/dri/i965/intel_tex_image.c| 2 +-
 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 2 +-
 10 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 9ca33b4..bc008ae 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -71,7 +71,7 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
 * to destination color buffers, and the standard render path is
 * fast-color-aware.
 */
-   intel_miptree_resolve_color(brw, src_mt);
+   intel_miptree_resolve_color(brw, src_mt, 0);
intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer);
intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 44d2fe4..a5f7a2e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -208,7 +208,7 @@ intel_update_state(struct gl_context * ctx, GLuint 
new_state)
   if (!tex_obj || !tex_obj->mt)
 continue;
   intel_miptree_all_slices_resolve_depth(brw, tex_obj->mt);
-  intel_miptree_resolve_color(brw, tex_obj->mt);
+  intel_miptree_resolve_color(brw, tex_obj->mt, 0);
   brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
}
 
@@ -223,7 +223,7 @@ intel_update_state(struct gl_context * ctx, GLuint 
new_state)
 tex_obj = intel_texture_object(u->TexObj);
 
 if (tex_obj && tex_obj->mt) {
-   intel_miptree_resolve_color(brw, tex_obj->mt);
+   intel_miptree_resolve_color(brw, tex_obj->mt, 0);
brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
 }
  }
@@ -252,7 +252,7 @@ intel_update_state(struct gl_context * ctx, GLuint 
new_state)
  _mesa_get_srgb_format_linear(mt->format) == mt->format)
continue;
 
- intel_miptree_resolve_color(brw, mt);
+ intel_miptree_resolve_color(brw, mt, 0);
  brw_render_cache_set_check_flush(brw, mt->bo);
   }
}
@@ -1227,7 +1227,7 @@ intel_resolve_for_dri2_flush(struct brw_context *brw,
   if (rb == NULL || rb->mt == NULL)
  continue;
   if (rb->mt->num_samples <= 1)
- intel_miptree_resolve_color(brw, rb->mt);
+ intel_miptree_resolve_color(brw, rb->mt, 0);
   else
  intel_renderbuffer_downsample(brw, rb);
}
diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
b/src/mesa/drivers/dri/i965/intel_blit.c
index 6d29fbd..72cf9af 100644
--- a/src/mesa/drivers/dri/i965/intel_blit.c
+++ b/src/mesa/drivers/dri/i965/intel_blit.c
@@ -317,8 +317,8 @@ intel_miptree_blit(struct brw_context *brw,
 */
intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_slice);
intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_slice);
-   intel_miptree_resolve_color(brw, src_mt);
-   intel_miptree_resolve_color(brw, dst_mt);
+   intel_miptree_resolve_color(brw, src_mt, 0);
+   intel_miptree_resolve_color(brw, dst_mt, 0);
 
if (src_flip)
   src_y = minify(src_mt->physical_height0, src_level - 
src_mt->first_level) - src_y - height;
diff --git a/src/mesa/drivers/dri/i965/intel_copy_image.c 
b/src/mesa/drivers/dri/i965/intel_copy_image.c
index 0a3337e..fac4252 100644
--- a/src/mesa/drivers/dri/i965/intel_copy_image.c
+++ b/src/mesa/drivers/dri/i965/intel_copy_image.c
@@ -269,11 +269,11 @@ intel_copy_image_sub_data(struct gl_context *ctx,
 */
intel_miptree_all_slices_resolve_hiz(brw, src_mt);
intel_miptree_all_slices_resolve_depth(brw, src_mt);
-   intel_miptree_resolve_color(brw, src_mt);
+   intel_miptree_resolve_color(brw, src_mt, 0);
 
intel_miptree_all_slices_resolve_hiz(brw, dst_mt);

[Mesa-dev] [PATCH 20/23] i965: Expose logic telling if non-msrt mcs is supported

2016-02-08 Thread Topi Pohjolainen
Also use the opportunity to mark inputs constant. (The context has to be
given as read-write to intel_miptree_supports_non_msrt_fast_clear()
to support debug output.)

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 8 
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 18803f7..1fd2654 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -164,8 +164,9 @@ intel_get_non_msrt_mcs_alignment(struct intel_mipmap_tree 
*mt,
}
 }
 
-static bool
-intel_tiling_supports_non_msrt_mcs(struct brw_context *brw, unsigned tiling)
+bool
+intel_tiling_supports_non_msrt_mcs(const struct brw_context *brw,
+   unsigned tiling)
 {
/* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
 * Target(s)", beneath the "Fast Color Clear" bullet (p326):
@@ -203,9 +204,9 @@ intel_tiling_supports_non_msrt_mcs(struct brw_context *brw, 
unsigned tiling)
  * - MCS and Lossless compression is supported for TiledY/TileYs/TileYf
  * non-MSRTs only.
  */
-static bool
+bool
 intel_miptree_supports_non_msrt_fast_clear(struct brw_context *brw,
-   struct intel_mipmap_tree *mt)
+   const struct intel_mipmap_tree *mt)
 {
/* MCS support does not exist prior to Gen7 */
if (brw->gen < 7)
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 582522d..f05436d 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -676,6 +676,14 @@ intel_get_non_msrt_mcs_alignment(struct intel_mipmap_tree 
*mt,
  unsigned *width_px, unsigned *height);
 
 bool
+intel_tiling_supports_non_msrt_mcs(const struct brw_context *brw,
+   unsigned tiling);
+
+bool
+intel_miptree_supports_non_msrt_fast_clear(struct brw_context *brw,
+   const struct intel_mipmap_tree *mt);
+
+bool
 intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
  struct intel_mipmap_tree *mt);
 
-- 
2.5.0



[Mesa-dev] [PATCH 16/23] i965: Add a few assertions on lossless compression

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 4 
 src/mesa/drivers/dri/i965/brw_context.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index bc008ae..af3bb15 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -70,7 +70,11 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
 * the destination buffer because we use the standard render path to render
 * to destination color buffers, and the standard render path is
 * fast-color-aware.
+* Lossless compression is only introduced for gen9 onwards whereas
+* blorp is not supported even for gen8. Therefore it should be impossible
+* to end up here with single sampled compressed surfaces.
 */
+   assert(src_mt->msaa_layout != INTEL_MSAA_LAYOUT_CSS);
intel_miptree_resolve_color(brw, src_mt, 0);
intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer);
intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer);
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index b61036e..c3d02bc 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -262,6 +262,10 @@ intel_update_state(struct gl_context * ctx, GLuint 
new_state)
  _mesa_get_srgb_format_linear(mt->format) == mt->format)
continue;
 
+ /* Lossless compression is not supported for SRGB formats, it
+  * should be impossible to get here with such surfaces.
+  */
+ assert(mt->msaa_layout != INTEL_MSAA_LAYOUT_CSS);
  intel_miptree_resolve_color(brw, mt, 0);
  brw_render_cache_set_check_flush(brw, mt->bo);
   }
-- 
2.5.0



[Mesa-dev] [PATCH 06/23] i965/gen9: Add buffer layout representing lossless compression

2016-02-08 Thread Topi Pohjolainen
Skylake also introduces compression support for single-sampled
color buffers. As in the multi-sampled case, the color buffer
will be associated with an auxiliary surface tracking the
compression state.
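As a rough sketch of how the new layout relates to the existing ones, the helpers below classify layouts and check the two constraints that the diff asserts for CSS (single-sampled, gen9 or later). The `toy_` names are hypothetical, and the enum is a reduced copy containing only the layouts that appear in this series.

```c
#include <stdbool.h>

/* Hypothetical, reduced copy of the layout enum for illustration only. */
enum toy_msaa_layout {
   TOY_LAYOUT_NONE,   /* single-sampled, no compression */
   TOY_LAYOUT_IMS,    /* interleaved multisample */
   TOY_LAYOUT_CMS,    /* compressed multisample */
   TOY_LAYOUT_CSS     /* compressed single-sample (lossless compression) */
};

/* True for layouts that use an auxiliary surface to track compression. */
static bool toy_layout_is_compressed(enum toy_msaa_layout layout)
{
   return layout == TOY_LAYOUT_CMS || layout == TOY_LAYOUT_CSS;
}

/* Checks only the CSS constraints: single-sampled and gen9 or later.
 * Other layouts are accepted unconditionally in this sketch. */
static bool toy_layout_is_valid(enum toy_msaa_layout layout,
                                unsigned num_samples, int gen)
{
   if (layout != TOY_LAYOUT_CSS)
      return true;
   return num_samples <= 1 && gen >= 9;
}
```

In the patch itself these constraints show up as `assert(brw->gen >= 9)` and `unreachable()` in the switch statements that handle the layout.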

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  | 15 +++
 src/mesa/drivers/dri/i965/brw_tex_layout.c|  2 ++
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  4 
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  9 +
 4 files changed, 30 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index c7cb394..9ca33b4 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -1162,6 +1162,11 @@ brw_blorp_blit_program::encode_msaa(unsigned num_samples,
   SWAP_XY_AND_XPYP();
   s_is_zero = true;
   break;
+   case INTEL_MSAA_LAYOUT_CSS:
+  /* This is impossible combination, blorp is supported only on
+   * gen < 8 while CSS is only supported from gen 9 onwards.
+   */
+  unreachable("Blorp does not support lossless compression");
}
 }
 
@@ -1239,6 +1244,11 @@ brw_blorp_blit_program::decode_msaa(unsigned num_samples,
   s_is_zero = false;
   SWAP_XY_AND_XPYP();
   break;
+   case INTEL_MSAA_LAYOUT_CSS:
+  /* This is impossible combination, blorp is supported only on
+   * gen < 8 while CSS is only supported from gen 9 onwards.
+   */
+  unreachable("Blorp does not support lossless compression");
}
 }
 
@@ -1642,6 +1652,11 @@ brw_blorp_blit_program::texel_fetch(struct brw_reg dst)
  texture_lookup(dst, SHADER_OPCODE_TXF, gen7_ld_args,
 ARRAY_SIZE(gen7_ld_args));
  break;
+  case INTEL_MSAA_LAYOUT_CSS:
+ /* This is impossible combination, blorp is supported only on
+  * gen < 8 while CSS is only supported from gen 9 onwards.
+  */
+unreachable("Blorp does not support lossless compression");
   }
   break;
default:
diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index a294829..2366bfa 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -686,6 +686,8 @@ intel_miptree_set_total_width_height(struct brw_context 
*brw,
   case INTEL_MSAA_LAYOUT_CMS:
  brw_miptree_layout_texture_array(brw, mt);
  break;
+  case INTEL_MSAA_LAYOUT_CSS:
+ assert(brw->gen >= 9);
   case INTEL_MSAA_LAYOUT_NONE:
   case INTEL_MSAA_LAYOUT_IMS:
  if (gen9_use_linear_1d_layout(brw, mt))
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index d40a529..fe525c3 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -460,6 +460,8 @@ intel_miptree_create_layout(struct brw_context *brw,
   case INTEL_MSAA_LAYOUT_CMS:
  mt->array_layout = ALL_SLICES_AT_EACH_LOD;
  break;
+  case INTEL_MSAA_LAYOUT_CSS:
+unreachable("Lossless compression is only support for gen9+");
   }
}
 
@@ -1051,6 +1053,8 @@ intel_miptree_match_image(struct intel_mipmap_tree *mt,
   case INTEL_MSAA_LAYOUT_CMS:
  level_depth /= mt->num_samples;
  break;
+  case INTEL_MSAA_LAYOUT_CSS:
+ unreachable("Lossless compression is only for single-sampled");
   }
}
 
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 64f73ea..0970fd5 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -198,6 +198,15 @@ enum intel_msaa_layout
 * @see PRM section "Compressed Multisampled Surfaces"
 */
INTEL_MSAA_LAYOUT_CMS,
+
+   /**
+* Compressed Singlesample Surface. The color values are stored in one
+* slice and an auxiliary buffer is used to track compression state just
+* as in the Compressed Multisample case.
+*
+* @see section "Lossless Compression"
+*/
+   INTEL_MSAA_LAYOUT_CSS,
 };
 
 
-- 
2.5.0



[Mesa-dev] [PATCH 15/23] i965: Add a flag telling color resolve pass to ignore CCS_E

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_context.c   | 12 +++-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 10 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  9 +
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index a5f7a2e..b61036e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -208,7 +208,11 @@ intel_update_state(struct gl_context * ctx, GLuint 
new_state)
   if (!tex_obj || !tex_obj->mt)
 continue;
   intel_miptree_all_slices_resolve_depth(brw, tex_obj->mt);
-  intel_miptree_resolve_color(brw, tex_obj->mt, 0);
+  /* Sampling engine understands lossless compression and resolving
+   * those surfaces should be skipped for performance reasons.
+   */
+  intel_miptree_resolve_color(brw, tex_obj->mt,
+  INTEL_MIPTREE_IGNORE_CCS_E);
   brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
}
 
@@ -223,6 +227,12 @@ intel_update_state(struct gl_context * ctx, GLuint 
new_state)
 tex_obj = intel_texture_object(u->TexObj);
 
 if (tex_obj && tex_obj->mt) {
+   /* Access to images is implemented using indirect messages
+* against data port. Normal render target write understands
+* lossless compression but unfortunately the typed/untyped
+* read/write interface doesn't. Therefore the compressed
+* surfaces need to be resolved prior to accessing them.
+*/
intel_miptree_resolve_color(brw, tex_obj->mt, 0);
brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
 }
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 5b176ca..e9ff800 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2031,7 +2031,15 @@ intel_miptree_resolve_color(struct brw_context *brw,
 struct intel_mipmap_tree *mt,
 int flags)
 {
-   (void)flags;
+   /* From gen9 onwards there is new compression scheme for single sampled
+* surfaces called "lossless compressed". These don't need to be always
+* resolved.
+*/
+   if ((flags & INTEL_MIPTREE_IGNORE_CCS_E) &&
+   mt->msaa_layout == INTEL_MSAA_LAYOUT_CSS) {
+  assert(brw->gen >= 9);
+  return;
+   }
 
switch (mt->fast_clear_state) {
case INTEL_FAST_CLEAR_STATE_NO_MCS:
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 819ec5a..582522d 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -893,6 +893,15 @@ intel_miptree_used_for_rendering(struct intel_mipmap_tree 
*mt)
   mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
 }
 
+/**
+ * Flag values telling color resolve pass which special types of buffers
+ * can be ignored.
+ *
+ * INTEL_MIPTREE_IGNORE_CCS_E:   Lossless compressed (single-sample
+ *   compression scheme since gen9)
+ */
+#define INTEL_MIPTREE_IGNORE_CCS_E (1 << 0)
+
 void
 intel_miptree_resolve_color(struct brw_context *brw,
 struct intel_mipmap_tree *mt,
-- 
2.5.0



[Mesa-dev] [PATCH 17/23] i965: Set buffer cleared after actually clearing it

2016-02-08 Thread Topi Pohjolainen
A subsequent patch will modify the surface state setup to set the state
to unresolved whenever the surface is used as a render target. Color
resolve itself will use the same surface setup path, and marking
the buffer as cleared after the draw call ensures that the state
is correct after the resolve.
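Why the ordering matters can be shown with a toy model. The names below are hypothetical; `toy_draw_to()` stands in for the surface setup path that, after the subsequent patch, marks a render target as unresolved.

```c
/* Hypothetical, reduced fast clear state. */
enum toy_state { TOY_STATE_RESOLVED, TOY_STATE_UNRESOLVED, TOY_STATE_CLEAR };

struct toy_surface {
   enum toy_state state;
};

/* Models the surface setup path: using a surface as a render target marks
 * it unresolved as a side effect. */
static void toy_draw_to(struct toy_surface *s)
{
   s->state = TOY_STATE_UNRESOLVED;
}

/* The resolve is itself a draw, so the final state has to be written after
 * the draw. Writing it before would be overwritten by toy_draw_to(). */
static void toy_resolve(struct toy_surface *s)
{
   toy_draw_to(s);
   s->state = TOY_STATE_RESOLVED;
}
```

With the two statements in `toy_resolve()` swapped, the surface would be left unresolved after every resolve, which is the situation the reordering in this patch avoids.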

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 07696cf..8117727 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -880,11 +880,12 @@ brw_meta_resolve_color(struct brw_context *brw,
else
   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE);
 
-   mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
get_resolve_rect(brw, mt, );
 
brw_draw_rectlist(brw, , 1);
 
+   mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
+
set_fast_clear_op(brw, 0);
use_rectlist(brw, false);
 
-- 
2.5.0



[Mesa-dev] [PATCH 18/23] i965/gen9: Prepare surface state setup for lossless compression

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_defines.h| 1 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index fa71865..f7f904c 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -656,6 +656,7 @@
 #define GEN8_SURFACE_AUX_MODE_MCS   1
 #define GEN8_SURFACE_AUX_MODE_APPEND2
 #define GEN8_SURFACE_AUX_MODE_HIZ   3
+#define GEN9_SURFACE_AUX_MODE_CCS_E 5
 
 /* Surface state DW7 */
 #define GEN9_SURFACE_RT_COMPRESSION_SHIFT   30
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 0a52815..b140ff4 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -216,6 +216,9 @@ gen8_get_aux_mode(const struct brw_context *brw,
if (brw->gen >= 9 || mt->num_samples == 1)
   assert(mt->halign == 16);
 
+   if (mt->msaa_layout == INTEL_MSAA_LAYOUT_CSS)
+  return GEN9_SURFACE_AUX_MODE_CCS_E;
+
return GEN8_SURFACE_AUX_MODE_MCS;
 }
 
@@ -484,6 +487,9 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
struct intel_mipmap_tree *aux_mt = mt->mcs_mt;
const uint32_t aux_mode = gen8_get_aux_mode(brw, mt, surf_type);
 
+   if (aux_mode == GEN9_SURFACE_AUX_MODE_CCS_E)
+  mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
+
uint32_t *surf = allocate_surface_state(brw, , surf_index);
 
surf[0] = (surf_type << BRW_SURFACE_TYPE_SHIFT) |
-- 
2.5.0



[Mesa-dev] [PATCH 13/23] i965: Resolve color buffer also in lossless compression case

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 6f46385..6ec02d8 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2038,8 +2038,10 @@ intel_miptree_resolve_color(struct brw_context *brw,
case INTEL_FAST_CLEAR_STATE_UNRESOLVED:
case INTEL_FAST_CLEAR_STATE_CLEAR:
   /* Fast color clear resolves only make sense for non-MSAA buffers. */
-  if (mt->msaa_layout == INTEL_MSAA_LAYOUT_NONE)
+  if (mt->msaa_layout == INTEL_MSAA_LAYOUT_NONE ||
+  mt->msaa_layout == INTEL_MSAA_LAYOUT_CSS) {
  brw_meta_resolve_color(brw, mt);
+  }
   break;
}
 }
-- 
2.5.0



[Mesa-dev] [PATCH 19/23] i965/gen9: Refactor msrt mcs initialization

2016-02-08 Thread Topi Pohjolainen
This will be re-used to initialize auxiliary buffers in the lossless
compression case.
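The shape of the refactoring is simply a memset helper parameterized by the initial value. A minimal sketch, assuming a hypothetical `struct toy_aux` in place of the mapped MCS miptree:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a mapped auxiliary buffer. */
struct toy_aux {
   unsigned char *data;   /* mapped storage */
   size_t pitch;          /* bytes per row */
   size_t total_height;   /* number of rows */
};

/* Fills the whole auxiliary buffer with init_value. The caller chooses the
 * value, so the same helper can serve different kinds of aux buffers. */
static void toy_aux_init(struct toy_aux *aux, int init_value)
{
   memset(aux->data, init_value, aux->total_height * aux->pitch);
}
```

The existing MSRT caller passes 0xFF, matching the all-ones MCS clear value quoted from the PRM. Which value the lossless compression case passes is not stated in this patch.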

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 36 ---
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index e9ff800..18803f7 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1437,6 +1437,27 @@ intel_miptree_copy_teximage(struct brw_context *brw,
intel_obj->needs_validate = true;
 }
 
+static void
+intel_miptree_init_mcs(struct brw_context *brw,
+   struct intel_mipmap_tree *mt,
+   int init_value)
+{
+   /* From the Ivy Bridge PRM, Vol 2 Part 1 p326:
+*
+* When MCS buffer is enabled and bound to MSRT, it is required that it
+* is cleared prior to any rendering.
+*
+* Since we don't use the MCS buffer for any purpose other than rendering,
+* it makes sense to just clear it immediately upon allocation.
+*
+* Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
+*/
+   void *data = intel_miptree_map_raw(brw, mt->mcs_mt);
+   memset(data, init_value, mt->mcs_mt->total_height * mt->mcs_mt->pitch);
+   intel_miptree_unmap_raw(mt->mcs_mt);
+   mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
+}
+
 static bool
 intel_miptree_alloc_mcs(struct brw_context *brw,
 struct intel_mipmap_tree *mt,
@@ -1494,20 +1515,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
INTEL_MSAA_LAYOUT_NONE,
mcs_flags);
 
-   /* From the Ivy Bridge PRM, Vol 2 Part 1 p326:
-*
-* When MCS buffer is enabled and bound to MSRT, it is required that it
-* is cleared prior to any rendering.
-*
-* Since we don't use the MCS buffer for any purpose other than rendering,
-* it makes sense to just clear it immediately upon allocation.
-*
-* Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
-*/
-   void *data = intel_miptree_map_raw(brw, mt->mcs_mt);
-   memset(data, 0xff, mt->mcs_mt->total_height * mt->mcs_mt->pitch);
-   intel_miptree_unmap_raw(mt->mcs_mt);
-   mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
+   intel_miptree_init_mcs(brw, mt, 0xFF);
 
return mt->mcs_mt;
 }
-- 
2.5.0



[Mesa-dev] [PATCH 01/23] i965: Let caller of intel_miptree_create_layout() decide msaa layout

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 108dd87..0edd59f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -64,8 +64,11 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
  */
 static enum intel_msaa_layout
 compute_msaa_layout(struct brw_context *brw, mesa_format format,
-bool disable_aux_buffers)
+unsigned num_samples, bool disable_aux_buffers)
 {
+   if (num_samples <= 1)
+  return INTEL_MSAA_LAYOUT_NONE;
+
/* Prior to Gen7, all MSAA surfaces used IMS layout. */
if (brw->gen < 7)
   return INTEL_MSAA_LAYOUT_IMS;
@@ -299,6 +302,7 @@ intel_miptree_create_layout(struct brw_context *brw,
 GLuint height0,
 GLuint depth0,
 GLuint num_samples,
+enum intel_msaa_layout msaa_layout,
 uint32_t layout_flags)
 {
struct intel_mipmap_tree *mt = calloc(sizeof(*mt), 1);
@@ -343,13 +347,11 @@ intel_miptree_create_layout(struct brw_context *brw,
mt->cpp = _mesa_get_format_bytes(format);
mt->num_samples = num_samples;
mt->compressed = _mesa_is_format_compressed(format);
-   mt->msaa_layout = INTEL_MSAA_LAYOUT_NONE;
+   mt->msaa_layout = msaa_layout;
mt->refcount = 1;
 
if (num_samples > 1) {
   /* Adjust width/height/depth for MSAA */
-  mt->msaa_layout = compute_msaa_layout(brw, format,
-mt->disable_aux_buffers);
   if (mt->msaa_layout == INTEL_MSAA_LAYOUT_IMS) {
  /* From the Ivybridge PRM, Volume 1, Part 1, page 108:
   * "If the surface is multisampled and it is a depth or stencil
@@ -636,6 +638,8 @@ intel_miptree_create(struct brw_context *brw,
mt = intel_miptree_create_layout(brw, target, format,
 first_level, last_level, width0,
 height0, depth0, num_samples,
+compute_msaa_layout(brw, format,
+num_samples, false),
 layout_flags);
/*
 * pitch == 0 || height == 0  indicates the null texture
@@ -743,6 +747,7 @@ intel_miptree_create_for_bo(struct brw_context *brw,
struct intel_mipmap_tree *mt;
uint32_t tiling, swizzle;
GLenum target;
+   const bool disable_aux_buffers = layout_flags & MIPTREE_LAYOUT_DISABLE_AUX;
 
   drm_intel_bo_get_tiling(bo, &tiling, &swizzle);
 
@@ -769,6 +774,8 @@ intel_miptree_create_for_bo(struct brw_context *brw,
mt = intel_miptree_create_layout(brw, target, format,
 0, 0,
 width, height, depth, 0,
+compute_msaa_layout(brw, format, 0,
+disable_aux_buffers),
 layout_flags);
if (!mt)
   return NULL;
-- 
2.5.0



[Mesa-dev] [PATCH 08/23] i965: Allow fast clear to be used with lossless compression

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 871a77e..5391794 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -227,7 +227,10 @@ get_fast_clear_rect(struct brw_context *brw, struct 
gl_framebuffer *fb,
unsigned int x_align, y_align;
unsigned int x_scaledown, y_scaledown;
 
-   if (irb->mt->msaa_layout == INTEL_MSAA_LAYOUT_NONE) {
+   switch (irb->mt->msaa_layout) {
+   case INTEL_MSAA_LAYOUT_CSS:
+  assert(brw->gen >= 9);
+   case INTEL_MSAA_LAYOUT_NONE:
   /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
* Target(s)", beneath the "Fast Color Clear" bullet (p327):
*
@@ -278,7 +281,8 @@ get_fast_clear_rect(struct brw_context *brw, struct 
gl_framebuffer *fb,
*/
   x_align *= 2;
   y_align *= 2;
-   } else {
+  break;
+   default:
   /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
* Target(s)", beneath the "MSAA Compression" bullet (p326):
*
-- 
2.5.0



[Mesa-dev] [PATCH 09/23] i965: Add resolve option for lossless compression

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_defines.h | 1 +
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 01e0c99..fa71865 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2711,6 +2711,7 @@ enum brw_wm_barycentric_interp_mode {
 # define GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE   (1 << 8)
 # define GEN7_PS_DUAL_SOURCE_BLEND_ENABLE  (1 << 7)
 # define GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE  (1 << 6)
+# define GEN9_PS_RENDER_TARGET_RESOLVE_FULL (3 << 6)
 # define HSW_PS_UAV_ACCESS_ENABLE  (1 << 5)
 # define GEN7_PS_POSOFFSET_NONE(0 << 3)
 # define GEN7_PS_POSOFFSET_CENTROID(2 << 3)
diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 5391794..07696cf 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -875,7 +875,10 @@ brw_meta_resolve_color(struct brw_context *brw,
 * bits to let us select the type of resolve.  For fast clear resolves, it
 * turns out we can use the same value as pre-SKL though.
 */
-   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE);
+   if (mt->msaa_layout == INTEL_MSAA_LAYOUT_CSS)
+  set_fast_clear_op(brw, GEN9_PS_RENDER_TARGET_RESOLVE_FULL);
+   else
+  set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE);
 
mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
get_resolve_rect(brw, mt, );
-- 
2.5.0
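
The selection this patch adds to brw_meta_resolve_color() can be sketched
in isolation as below. The two macro values are copied from the
brw_defines.h hunk above; the enum and the function name are hypothetical
stand-ins used only for illustration and are not part of the driver.

```c
#include <stdint.h>

/* Values copied from the brw_defines.h hunk in this patch. */
#define GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE (1 << 6)
#define GEN9_PS_RENDER_TARGET_RESOLVE_FULL   (3 << 6)

/* Hypothetical stand-in for enum intel_msaa_layout; only the two
 * values needed for the sketch are modelled.
 */
enum sketch_msaa_layout {
   SKETCH_MSAA_LAYOUT_NONE,
   SKETCH_MSAA_LAYOUT_CSS,
};

/* Pick the resolve operation: the full resolve for the layout that this
 * series uses to mark lossless compression, the pre-SKL value otherwise.
 */
static uint32_t
sketch_resolve_op(enum sketch_msaa_layout layout)
{
   if (layout == SKETCH_MSAA_LAYOUT_CSS)
      return GEN9_PS_RENDER_TARGET_RESOLVE_FULL;

   return GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE;
}
```

Note that (3 << 6) sets bits 6 and 7, so the full resolve value is a
superset of the plain resolve enable bit.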



[Mesa-dev] i965/gen9: Compression support for single-sampled

2016-02-08 Thread Topi Pohjolainen
This series enables compression for single-sampled color surfaces,
also referred to as "lossless compression". For now this is only for
driver-internal use, easing pressure on memory bandwidth and caches
when writing, blending and sampling surfaces using the GPU.

As a side effect the need for color buffer resolves after fast
clears is also decreased. The current understanding is that the sampling
engine doesn't understand the meta data (auxiliary buffer) for
single-sampled fast-cleared surfaces. However, if the meta data is written
with lossless compression enabled, even the sampling engine is capable
of reading both the color buffer and the auxiliary buffer, and resolves
can be omitted in those cases.

The final enabling patch depends on an earlier two-patch series
fixing the state restore mechanism in i965 meta operations.

There are some performance numbers available in the final commit.

Topi Pohjolainen (23):
  i965: Let caller of intel_miptree_create_layout() decide msaa layout
  i965: Use miptree non-aligned dimensions directly for x-tiled
  i965: Separate miptree creation from auxiliary buffer setup
  i965: Don't try to create aux buffer for non-msrt aux-buffer
  i965: Stop considering if msrt aux buffers need aux buffer
  i965/gen9: Add buffer layout representing lossless compression
  i965/gen9: Allow halign == 16 also for losslessly compressed
  i965: Allow fast clear to be used with lossless compression
  i965: Add resolve option for lossless compression
  i965: Use constant pointer when checking for compression
  i965/gen8: Remove dead assertion
  i965: Refactor resolving of auxiliary mode
  i965: Resolve color buffer also in lossless compression case
  i965: Add means for limiting color resolves
  i965: Add a flag telling color resolve pass to ignore CCS_E
  i965: Add a few assertions on lossless compression
  i965: Set buffer cleared after actually clearing it
  i965/gen9: Prepare surface state setup for lossless compression
  i965/gen9: Refactor msrt mcs initialization
  i965: Expose logic telling if non-msrt mcs is supported
  i965/gen9: Setup MCS for compressed texture surfaces
  i965: Add helper for checking for lossless compressible
  i965/gen9: Enable lossless compression

 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp|  21 +-
 src/mesa/drivers/dri/i965/brw_context.c |  22 ++-
 src/mesa/drivers/dri/i965/brw_context.h |   2 +-
 src/mesa/drivers/dri/i965/brw_defines.h |   2 +
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c |  16 +-
 src/mesa/drivers/dri/i965/brw_surface_formats.c |   2 +-
 src/mesa/drivers/dri/i965/brw_tex_layout.c  |   2 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c  |  90 +
 src/mesa/drivers/dri/i965/intel_blit.c  |   4 +-
 src/mesa/drivers/dri/i965/intel_copy_image.c|   4 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   | 249 +---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |  32 ++-
 src/mesa/drivers/dri/i965/intel_pixel_bitmap.c  |   2 +-
 src/mesa/drivers/dri/i965/intel_pixel_read.c|   2 +-
 src/mesa/drivers/dri/i965/intel_tex_image.c |   2 +-
 src/mesa/drivers/dri/i965/intel_tex_subimage.c  |   2 +-
 16 files changed, 318 insertions(+), 136 deletions(-)

-- 
2.5.0



[Mesa-dev] [PATCH 03/23] i965: Separate miptree creation from auxiliary buffer setup

2016-02-08 Thread Topi Pohjolainen
Currently the logic allocating and setting up miptrees is closely
combined with the decision making on when to re-allocate buffers in
X-tiled layout and when to associate color buffers with auxiliary buffers.

These auxiliary buffers are in turn also represented as miptrees
and are created by the same miptree creation logic calling itself
recursively. This means considering in vain if the auxiliary buffers
should be represented in X-tiled layout or if they should be
associated with auxiliary buffers again.

While this is somewhat unnecessary, it doesn't pose any problems
currently. Miptrees for auxiliary buffers are created as single-sampled,
which rules out the consideration for multi-sampled compression auxiliary
buffers. The format in turn is such that it is not applicable for
single-sampled fast clears (which would require an accompanying auxiliary
buffer).

But once the driver starts to support lossless compression of color
buffers the auxiliary buffer will have a format that would itself
be applicable for lossless compression. This would be rather
difficult and ugly to detect in the current miptree creation logic,
and therefore this patch seeks to separate the association logic
from the general allocation and setup steps.
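
The shape of the split can be sketched with a toy model. Everything
below (the sketch_ types and functions) is hypothetical and only
illustrates the pattern: a low-level allocator that takes the layout from
its caller and never considers auxiliary buffers, and a public entry point
that makes the association decision. It does not reflect the driver's
actual allocation policy, which defers auxiliary buffer creation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum sketch_layout {
   SKETCH_LAYOUT_NONE,
   SKETCH_LAYOUT_COMPRESSED,
};

struct sketch_tree {
   enum sketch_layout layout;
   struct sketch_tree *aux;   /* auxiliary buffer, itself a tree */
};

/* Allocation and setup only. The caller decides the layout and no
 * auxiliary buffer is ever considered here.
 */
static struct sketch_tree *
sketch_tree_alloc(enum sketch_layout layout)
{
   struct sketch_tree *t = calloc(1, sizeof(*t));

   if (t)
      t->layout = layout;

   return t;
}

/* Public entry point: picks the layout and associates an auxiliary buffer. */
struct sketch_tree *
sketch_tree_create(bool wants_compression)
{
   struct sketch_tree *t = sketch_tree_alloc(
      wants_compression ? SKETCH_LAYOUT_COMPRESSED : SKETCH_LAYOUT_NONE);

   if (!t)
      return NULL;

   if (wants_compression) {
      /* The auxiliary buffer goes through the low-level path, so it can
       * never end up with an auxiliary buffer of its own.
       */
      t->aux = sketch_tree_alloc(SKETCH_LAYOUT_NONE);
      if (!t->aux) {
         free(t);
         return NULL;
      }
   }

   return t;
}

void
sketch_tree_release(struct sketch_tree *t)
{
   if (!t)
      return;

   sketch_tree_release(t->aux);
   free(t);
}
```

Because the auxiliary tree is built by the low-level function, the
recursion described above stops after one level regardless of the format
the auxiliary buffer happens to have.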

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 75 +--
 1 file changed, 49 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 033f4c6..d655de8 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -611,17 +611,18 @@ intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, 
unsigned *alignment,
return size;
 }
 
-struct intel_mipmap_tree *
-intel_miptree_create(struct brw_context *brw,
- GLenum target,
- mesa_format format,
- GLuint first_level,
- GLuint last_level,
- GLuint width0,
- GLuint height0,
- GLuint depth0,
- GLuint num_samples,
- uint32_t layout_flags)
+static struct intel_mipmap_tree *
+miptree_create(struct brw_context *brw,
+   GLenum target,
+   mesa_format format,
+   GLuint first_level,
+   GLuint last_level,
+   GLuint width0,
+   GLuint height0,
+   GLuint depth0,
+   GLuint num_samples,
+   enum intel_msaa_layout msaa_layout,
+   uint32_t layout_flags)
 {
struct intel_mipmap_tree *mt;
mesa_format tex_format = format;
@@ -638,9 +639,7 @@ intel_miptree_create(struct brw_context *brw,
mt = intel_miptree_create_layout(brw, target, format,
 first_level, last_level, width0,
 height0, depth0, num_samples,
-compute_msaa_layout(brw, format,
-num_samples, false),
-layout_flags);
+msaa_layout, layout_flags);
/*
 * pitch == 0 || height == 0  indicates the null texture
 */
@@ -658,12 +657,8 @@ intel_miptree_create(struct brw_context *brw,
   total_height = ALIGN(total_height, 64);
}
 
-   bool y_or_x = false;
-
-   if (mt->tiling == (I915_TILING_Y | I915_TILING_X)) {
-  y_or_x = true;
+   if (mt->tiling == (I915_TILING_Y | I915_TILING_X))
   mt->tiling = I915_TILING_Y;
-   }
 
if (layout_flags & MIPTREE_LAYOUT_ACCELERATED_UPLOAD)
   alloc_flags |= BO_ALLOC_FOR_RENDER;
@@ -686,12 +681,46 @@ intel_miptree_create(struct brw_context *brw,
}
 
mt->pitch = pitch;
+   mt->offset = 0;
+
+   if (!mt->bo) {
+  intel_miptree_release();
+  return NULL;
+   }
+
+   return mt;
+}
+
+struct intel_mipmap_tree *
+intel_miptree_create(struct brw_context *brw,
+ GLenum target,
+ mesa_format format,
+ GLuint first_level,
+ GLuint last_level,
+ GLuint width0,
+ GLuint height0,
+ GLuint depth0,
+ GLuint num_samples,
+ uint32_t layout_flags)
+{
+   struct intel_mipmap_tree *mt = miptree_create(
+ brw, target, format,
+ first_level, last_level,
+ width0, height0, depth0, num_samples,
+ compute_msaa_layout(brw, format,
+ num_samples, false),
+ layout_flags);
 
/* If the BO is too large to fit in the aperture, we need to use the
 * BLT engine to support it.  Prior to Sandybridge, the 

[Mesa-dev] [PATCH 05/23] i965: Stop considering if msrt aux buffers need aux buffer

2016-02-08 Thread Topi Pohjolainen
Auxiliary buffers are always created with a sample count of zero,
which effectively prevents intel_miptree_create_layout() from trying
to associate auxiliary buffers with auxiliary buffers.

Now that there is a more direct path available, let's start using it
instead and stop even checking for such an (im)possibility.

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 6c447ba..d40a529 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1477,16 +1477,17 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
 */
const uint32_t mcs_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD |
   MIPTREE_LAYOUT_TILING_Y;
-   mt->mcs_mt = intel_miptree_create(brw,
- mt->target,
- format,
- mt->first_level,
- mt->last_level,
- mt->logical_width0,
- mt->logical_height0,
- mt->logical_depth0,
- 0 /* num_samples */,
- mcs_flags);
+   mt->mcs_mt = miptree_create(brw,
+   mt->target,
+   format,
+   mt->first_level,
+   mt->last_level,
+   mt->logical_width0,
+   mt->logical_height0,
+   mt->logical_depth0,
+   0 /* num_samples */,
+   INTEL_MSAA_LAYOUT_NONE,
+   mcs_flags);
 
/* From the Ivy Bridge PRM, Vol 2 Part 1 p326:
 *
-- 
2.5.0



[Mesa-dev] [PATCH 07/23] i965/gen9: Allow halign == 16 also for losslessly compressed

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index fe525c3..6f46385 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -530,7 +530,8 @@ intel_miptree_create_layout(struct brw_context *brw,
if (intel_miptree_supports_non_msrt_fast_clear(brw, mt)) {
   if (brw->gen >= 9 || (brw->gen == 8 && num_samples <= 1))
  layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
-   } else if (brw->gen >= 9 && num_samples > 1) {
+   } else if (brw->gen >= 9 &&
+  (num_samples > 1 || mt->msaa_layout == INTEL_MSAA_LAYOUT_CSS)) {
   layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
} else {
   /* For now, nothing else has this requirement */
-- 
2.5.0



[Mesa-dev] [PATCH 12/23] i965: Refactor resolving of auxiliary mode

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 62 --
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index fc8f701..0a52815 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -197,6 +197,28 @@ gen8_emit_fast_clear_color(struct brw_context *brw,
   surf[7] |= mt->fast_clear_color_value;
 }
 
+static uint32_t
+gen8_get_aux_mode(const struct brw_context *brw,
+  const struct intel_mipmap_tree *mt,
+  uint32_t surf_type)
+{
+   if (mt->mcs_mt == NULL)
+  return GEN8_SURFACE_AUX_MODE_NONE;
+
+   /*
+* From the BDW PRM, Volume 2d, page 260 (RENDER_SURFACE_STATE):
+* "When MCS is enabled for non-MSRT, HALIGN_16 must be used"
+*
+* From the hardware spec for GEN9:
+* "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN
+*  16 must be used."
+*/
+   if (brw->gen >= 9 || mt->num_samples == 1)
+  assert(mt->halign == 16);
+
+   return GEN8_SURFACE_AUX_MODE_MCS;
+}
+
 static void
 gen8_emit_texture_surface_state(struct brw_context *brw,
 struct intel_mipmap_tree *mt,
@@ -209,13 +231,13 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
 bool rw, bool for_gather)
 {
const unsigned depth = max_layer - min_layer;
-   struct intel_mipmap_tree *aux_mt = NULL;
-   uint32_t aux_mode = GEN8_SURFACE_AUX_MODE_NONE;
+   struct intel_mipmap_tree *aux_mt = mt->mcs_mt;
uint32_t mocs_wb = brw->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
int surf_index = surf_offset - >wm.base.surf_offset[0];
unsigned tiling_mode, pitch;
const unsigned tr_mode = surface_tiling_resource_mode(mt->tr_mode);
const uint32_t surf_type = translate_tex_target(target);
+   uint32_t aux_mode = gen8_get_aux_mode(brw, mt, surf_type);
 
if (mt->format == MESA_FORMAT_S_UINT8) {
   tiling_mode = GEN8_SURFACE_TILING_W;
@@ -229,20 +251,9 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
 * buffer should always have been resolved before it is used as a texture
 * so there is no need for it.
 */
-   if (mt->mcs_mt && mt->num_samples > 1) {
-  aux_mt = mt->mcs_mt;
-  aux_mode = GEN8_SURFACE_AUX_MODE_MCS;
-
-  /*
-   * From the BDW PRM, Volume 2d, page 260 (RENDER_SURFACE_STATE):
-   * "When MCS is enabled for non-MSRT, HALIGN_16 must be used"
-   *
-   * From the hardware spec for GEN9:
-   * "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN
-   *  16 must be used."
-   */
-  if (brw->gen >= 9 || mt->num_samples == 1)
- assert(mt->halign == 16);
+   if (mt->num_samples <= 1) {
+  aux_mt = NULL;
+  aux_mode = GEN8_SURFACE_AUX_MODE_NONE;
}
 
uint32_t *surf = allocate_surface_state(brw, surf_offset, surf_index);
@@ -418,8 +429,6 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
struct gl_context *ctx = >ctx;
struct intel_renderbuffer *irb = intel_renderbuffer(rb);
struct intel_mipmap_tree *mt = irb->mt;
-   struct intel_mipmap_tree *aux_mt = NULL;
-   uint32_t aux_mode = GEN8_SURFACE_AUX_MODE_NONE;
unsigned width = mt->logical_width0;
unsigned height = mt->logical_height0;
unsigned pitch = mt->pitch;
@@ -472,21 +481,8 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
__func__, _mesa_get_format_name(rb_format));
}
 
-   if (mt->mcs_mt) {
-  aux_mt = mt->mcs_mt;
-  aux_mode = GEN8_SURFACE_AUX_MODE_MCS;
-
-  /*
-   * From the BDW PRM, Volume 2d, page 260 (RENDER_SURFACE_STATE):
-   * "When MCS is enabled for non-MSRT, HALIGN_16 must be used"
-   *
-   * From the hardware spec for GEN9:
-   * "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN
-   *  16 must be used."
-   */
-  if (brw->gen >= 9 || mt->num_samples == 1)
- assert(mt->halign == 16);
-   }
+   struct intel_mipmap_tree *aux_mt = mt->mcs_mt;
+   const uint32_t aux_mode = gen8_get_aux_mode(brw, mt, surf_type);
 
uint32_t *surf = allocate_surface_state(brw, , surf_index);
 
-- 
2.5.0



Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 08:08, Tom Stellard  wrote:
> 
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
> 
> We should tell the compiler we are enabling fp-64 denorms by adding
> +fp64-denormals to the feature string.  It would also be better to
> read the float_mode value from the config registers emitted by the
> compiler.

Yes, the runtime here should read the value out of the binary and enable it in 
the compiler rather than the runtime hardcoding it. If you wanted to load a 
shader with different FP rules for example it should be able to switch.

-Matt


Re: [Mesa-dev] [PATCH 1/2] draw: use util_pstipple_create_fragment_shader

2016-02-08 Thread Brian Paul

On 02/08/2016 07:59 AM, Nicolai Hähnle wrote:

Ping?


Looks good.  For both:

Reviewed-by: Brian Paul 




On 22.01.2016 11:56, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

This reduces code duplication. It also adds support for drivers where the
fragment position is a system value.

Suggested-by: Jose Fonseca 
---
A basic polygon stippling test shows no regression on llvmpipe, but
that's
the extent of my testing.

  src/gallium/auxiliary/draw/draw_pipe_pstipple.c | 209
++--
  1 file changed, 12 insertions(+), 197 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
index cf52ca4..e468cc3 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
@@ -43,10 +43,10 @@
  #include "util/u_format.h"
  #include "util/u_math.h"
  #include "util/u_memory.h"
+#include "util/u_pstipple.h"
  #include "util/u_sampler.h"

  #include "tgsi/tgsi_transform.h"
-#include "tgsi/tgsi_dump.h"

  #include "draw_context.h"
  #include "draw_pipe.h"
@@ -114,178 +114,6 @@ struct pstip_stage
  };


-
-/**
- * Subclass of tgsi_transform_context, used for transforming the
- * user's fragment shader to add the extra texture sample and
fragment kill
- * instructions.
- */
-struct pstip_transform_context {
-   struct tgsi_transform_context base;
-   uint tempsUsed;  /**< bitmask */
-   int wincoordInput;
-   int maxInput;
-   uint samplersUsed;  /**< bitfield of samplers used */
-   bool hasSview;
-   int freeSampler;  /** an available sampler for the pstipple */
-   int texTemp;  /**< temp registers */
-   int numImmed;
-};
-
-
-/**
- * TGSI declaration transform callback.
- * Look for a free sampler, a free input attrib, and two free temp regs.
- */
-static void
-pstip_transform_decl(struct tgsi_transform_context *ctx,
- struct tgsi_full_declaration *decl)
-{
-   struct pstip_transform_context *pctx = (struct
pstip_transform_context *) ctx;
-
-   if (decl->Declaration.File == TGSI_FILE_SAMPLER) {
-  uint i;
-  for (i = decl->Range.First;
-   i <= decl->Range.Last; i++) {
- pctx->samplersUsed |= 1 << i;
-  }
-   }
-   else if (decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) {
-  pctx->hasSview = true;
-   }
-   else if (decl->Declaration.File == TGSI_FILE_INPUT) {
-  pctx->maxInput = MAX2(pctx->maxInput, (int) decl->Range.Last);
-  if (decl->Semantic.Name == TGSI_SEMANTIC_POSITION)
- pctx->wincoordInput = (int) decl->Range.First;
-   }
-   else if (decl->Declaration.File == TGSI_FILE_TEMPORARY) {
-  uint i;
-  for (i = decl->Range.First;
-   i <= decl->Range.Last; i++) {
- pctx->tempsUsed |= (1 << i);
-  }
-   }
-
-   ctx->emit_declaration(ctx, decl);
-}
-
-
-/**
- * TGSI immediate declaration transform callback.
- * We're just counting the number of immediates here.
- */
-static void
-pstip_transform_immed(struct tgsi_transform_context *ctx,
-  struct tgsi_full_immediate *immed)
-{
-   struct pstip_transform_context *pctx = (struct
pstip_transform_context *) ctx;
-   ctx->emit_immediate(ctx, immed); /* emit to output shader */
-   pctx->numImmed++;
-}
-
-
-/**
- * Find the lowest zero bit in the given word, or -1 if bitfield is
all ones.
- */
-static int
-free_bit(uint bitfield)
-{
-   return ffs(~bitfield) - 1;
-}
-
-
-/**
- * TGSI transform prolog callback.
- */
-static void
-pstip_transform_prolog(struct tgsi_transform_context *ctx)
-{
-   struct pstip_transform_context *pctx = (struct
pstip_transform_context *) ctx;
-   uint i;
-   int wincoordInput;
-
-   /* find free sampler */
-   pctx->freeSampler = free_bit(pctx->samplersUsed);
-   if (pctx->freeSampler >= PIPE_MAX_SAMPLERS)
-  pctx->freeSampler = PIPE_MAX_SAMPLERS - 1;
-
-   if (pctx->wincoordInput < 0)
-  wincoordInput = pctx->maxInput + 1;
-   else
-  wincoordInput = pctx->wincoordInput;
-
-   /* find one free temp reg */
-   for (i = 0; i < 32; i++) {
-  if ((pctx->tempsUsed & (1 << i)) == 0) {
-  /* found a free temp */
-  if (pctx->texTemp < 0)
- pctx->texTemp  = i;
-  else
- break;
-  }
-   }
-   assert(pctx->texTemp >= 0);
-
-   if (pctx->wincoordInput < 0) {
-  /* declare new position input reg */
-  tgsi_transform_input_decl(ctx, wincoordInput,
-TGSI_SEMANTIC_POSITION, 1,
-TGSI_INTERPOLATE_LINEAR);
-   }
-
-   /* declare new sampler */
-   tgsi_transform_sampler_decl(ctx, pctx->freeSampler);
-
-   /* if the src shader has SVIEW decl's for each SAMP decl, we
-* need to continue the trend and ensure there is a matching
-* SVIEW for the new SAMP we just created
-*/
-   if (pctx->hasSview) {
-  tgsi_transform_sampler_view_decl(ctx,
-   pctx->freeSampler,
-   

[Mesa-dev] [PATCH 11/23] i965/gen8: Remove dead assertion

2016-02-08 Thread Topi Pohjolainen
The assertion is inside a condition mandating num_samples > 1 and
therefore the first half of the constraint is always met. The
second half in turn would only be applicable for the single-sampled
case, and moreover it incorrectly checks against the surface
type instead of the format.

Subsequent patches will introduce proper support for lossless
compression, and dropping this here makes those patches a little
simpler.
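
The first point can be checked mechanically with a small model. The two
helper functions below are hypothetical and only mirror the boolean
structure of the guard and of the removed assertion; they are not driver
code.

```c
#include <stdbool.h>

/* Mirrors the enclosing guard: mt->mcs_mt && mt->num_samples > 1. */
static bool
sketch_guard(bool has_mcs, unsigned num_samples)
{
   return has_mcs && num_samples > 1;
}

/* Mirrors the removed assertion's condition on Gen9:
 * num_samples > 1 || <format is losslessly compressible>.
 */
static bool
sketch_removed_condition(unsigned num_samples, bool compressible)
{
   return num_samples > 1 || compressible;
}

/* Returns true if the removed condition holds for every input that can
 * reach it through the guard, i.e. the assertion can never fire.
 */
static bool
sketch_assertion_is_dead(void)
{
   for (unsigned samples = 0; samples <= 16; samples++) {
      for (int compressible = 0; compressible <= 1; compressible++) {
         if (sketch_guard(true, samples) &&
             !sketch_removed_condition(samples, compressible != 0))
            return false;
      }
   }

   return true;
}
```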

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 0df25d2..fc8f701 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -243,12 +243,6 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
*/
   if (brw->gen >= 9 || mt->num_samples == 1)
  assert(mt->halign == 16);
-
-  if (brw->gen >= 9) {
- assert(mt->num_samples > 1 ||
-brw_losslessly_compressible_format(brw, surf_type));
-  }
-
}
 
uint32_t *surf = allocate_surface_state(brw, surf_offset, surf_index);
-- 
2.5.0



[Mesa-dev] [PATCH 02/23] i965: Use miptree non-aligned dimensions directly for x-tiled

2016-02-08 Thread Topi Pohjolainen
The logic in intel_miptree_create() uses local copies of the
dimensions for their 64-byte aligned equivalents, but only for stencil
buffers, which in turn are never X-tiled. This makes the logic a little
more explicit and helps to keep subsequent patches easier to read.

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 0edd59f..033f4c6 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -698,7 +698,7 @@ intel_miptree_create(struct brw_context *brw,
   mt->tiling = I915_TILING_X;
   drm_intel_bo_unreference(mt->bo);
   mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree",
-  total_width, total_height, mt->cpp,
+  mt->total_width, mt->total_height, mt->cpp,
   >tiling, , alloc_flags);
   mt->pitch = pitch;
}
-- 
2.5.0



[Mesa-dev] [PATCH 22/23] i965: Add helper for checking for lossless compressible

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 21 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  3 +++
 2 files changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 1fd2654..59961f2 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -270,6 +270,27 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
brw_context *brw,
   return true;
 }
 
+bool
+intel_miptree_supports_lossless_compressed(mesa_format format)
+{
+   /* For now compression is only enabled for integer formats even though
+* there exist supported floating point formats also. This is a heuristic
+* decision based on current public benchmarks. In none of the cases these
+* formats provided any improvement but a few cases were seen to regress.
+* Hence these are left to be enabled in the future when they are known
+* to improve things.
+*/
+   if (!_mesa_is_format_integer_color(format))
+  return false;
+
+   /* In principle, fast clear mechanism and lossless compression go hand in
+* hand. However, fast clear can be also used to clear srgb surfaces by
+* using equivalent linear format. This trick, however, can't be extended
+* to be used with lossless compression and therefore a check is needed to
+* see if the format really is linear.
+*/
+   return _mesa_get_srgb_format_linear(format) == format;
+}
 
 /**
  * Determine depth format corresponding to a depth+stencil format,
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index f05436d..3a1ecd2 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -684,6 +684,9 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
brw_context *brw,
const struct intel_mipmap_tree *mt);
 
 bool
+intel_miptree_supports_lossless_compressed(mesa_format format);
+
+bool
 intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
  struct intel_mipmap_tree *mt);
 
-- 
2.5.0



[Mesa-dev] [PATCH 21/23] i965/gen9: Setup MCS for compressed texture surfaces

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index b140ff4..eaf5874 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -250,11 +250,12 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
   pitch = mt->pitch;
}
 
-   /* The MCS is not uploaded for single-sampled surfaces because the color
-* buffer should always have been resolved before it is used as a texture
-* so there is no need for it.
+   /* Prior to Gen9 MCS is not uploaded for single-sampled surfaces because
+* the color buffer should always have been resolved before it is used as
+* a texture so there is no need for it. On Gen9 it will be uploaded when
+* the surface is losslessly compressed (CCS_E).
 */
-   if (mt->num_samples <= 1) {
+   if (mt->num_samples <= 1 && aux_mode != GEN9_SURFACE_AUX_MODE_CCS_E) {
   aux_mt = NULL;
   aux_mode = GEN8_SURFACE_AUX_MODE_NONE;
}
-- 
2.5.0



[Mesa-dev] [PATCH 23/23] i965/gen9: Enable lossless compression

2016-02-08 Thread Topi Pohjolainen
I first tried creating the auxiliary buffer at the same time as the
color buffer. That, however, led to a situation where we would
later create the rest of the mip-levels and the compression would
need to be disabled (it is only supported for single-level buffers).

Here we instead create it on demand just before the hardware starts
to render. This is similar to what we do with fast clear buffers:
their creation is deferred until the first clear.

This setup also gives the opportunity to detect if the miptree
represents the temporary texture used internally in the mesa core.
This texture is mostly written by the CPU and therefore enabling
compression for it doesn't make much sense.

Note that a heuristic is included. Floating point formats are not
enabled yet as they are only seen to hurt performance.

Some highlights with window system driver kept fixed to default
and only the application driver changing:

Manhattan: 8.32152% +/- 0.355881%
Offscreen: 9.09713% +/- 0.340763%

Glb trex: 8.46231% +/- 0.460624%
Offscreen: 9.31872% +/- 0.463743%

Numbers from our system where the driver is changed also for
the windowing environment:

GFXBench3_Manhattan                    41.8 FPS   12.0 %   46.8 FPS
GFXBench3_Manhattan_OffScreen          48.7 FPS    9.0 %   53.1 FPS

GLBenchmark_Trex_FixedTime            133.0 FPS    9.0 %  145.0 FPS
GLBenchmark_Trex_FixedTime_OffScreen  168.0 FPS    9.5 %  184.0 FPS

Unigine-heaven regresses: -2.31021% +/- 0.217207%. There are no color
resolves needed during the run so the hit comes from something else.
Perhaps the content is such that it doesn't really compress but
the additional work required of the hardware to maintain the
associated meta data slows us down.
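
The percentages in the FPS listing above appear consistent with a plain
relative gain between the first and last columns. A minimal sketch,
assuming the first column is the result before the change and the last
one the result after it (the message does not label the columns
explicitly, and percent_gain() is a hypothetical helper):

```c
/* Relative gain of 'after' over 'before', in percent. */
static double
percent_gain(double before, double after)
{
   return (after - before) / before * 100.0;
}
```

For instance, percent_gain(41.8, 46.8) comes to roughly 11.96, which
rounds to the 12.0 % listed for GFXBench3_Manhattan, and
percent_gain(168.0, 184.0) comes to roughly 9.52, matching the 9.5 %
listed for GLBenchmark_Trex_FixedTime_OffScreen.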

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c |  9 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c  | 23 ++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index eaf5874..b9b6b5a 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -485,6 +485,15 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
__func__, _mesa_get_format_name(rb_format));
}
 
+   /* Consider if lossless compression is supported but the needed
+* auxiliary buffer doesn't exist yet.
+*/
+   if (brw->gen >= 9 && mt->mcs_mt == NULL &&
+   intel_tiling_supports_non_msrt_mcs(brw, mt->tiling) &&
+   intel_miptree_supports_non_msrt_fast_clear(brw, mt) &&
+   intel_miptree_supports_lossless_compressed(mt->format))
+  intel_miptree_alloc_non_msrt_mcs(brw, mt);
+
struct intel_mipmap_tree *aux_mt = mt->mcs_mt;
const uint32_t aux_mode = gen8_get_aux_mode(brw, mt, surf_type);
 
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 59961f2..083139a 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -770,7 +770,8 @@ intel_miptree_create(struct brw_context *brw,
/* If this miptree is capable of supporting fast color clears, set
 * fast_clear_state appropriately to ensure that fast clears will occur.
 * Allocation of the MCS miptree will be deferred until the first fast
-* clear actually occurs.
+* clear actually occurs or when compressed single sampled buffer is
+* written by the GPU for the first time.
 */
if (intel_tiling_supports_non_msrt_mcs(brw, mt->tiling) &&
intel_miptree_supports_non_msrt_fast_clear(brw, mt)) {
@@ -1598,6 +1599,26 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
INTEL_MSAA_LAYOUT_NONE,
layout_flags);
 
+   /* From Gen9 onwards single-sampled (non-msrt) auxiliary buffers are
+* used for lossless compression which requires similar initialisation
+* as multi-sample compression.
+*/
+   if (brw->gen >= 9 &&
+   intel_miptree_supports_lossless_compressed(mt->format)) {
+  /* Hardware sets the auxiliary buffer to all zeroes when it does full
+   * resolve. Initialize it accordingly in case the first renderer is
+   * cpu (or another non-compression-aware party).
+   *
+   * This is also explicitly stated in the spec (MCS Buffer for Render
+   * Target(s)):
+   *   "If Software wants to enable Color Compression without Fast clear,
+   *Software needs to initialize MCS with zeros."
+   */
+  intel_miptree_init_mcs(brw, mt, 0);
+  mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
+  mt->msaa_layout = INTEL_MSAA_LAYOUT_CSS;
+   }
+
return mt->mcs_mt;
 }
 
-- 
2.5.0



[Mesa-dev] [PATCH 10/23] i965: Use constant pointer when checking for compression

2016-02-08 Thread Topi Pohjolainen
Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_context.h | 2 +-
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 55d6723..5c63b8f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1573,7 +1573,7 @@ void brw_upload_image_surfaces(struct brw_context *brw,
 /* brw_surface_formats.c */
 bool brw_render_target_supported(struct brw_context *brw,
  struct gl_renderbuffer *rb);
-bool brw_losslessly_compressible_format(struct brw_context *brw,
+bool brw_losslessly_compressible_format(const struct brw_context *brw,
 uint32_t brw_format);
 uint32_t brw_depth_format(struct brw_context *brw, mesa_format format);
 mesa_format brw_lower_mesa_image_format(const struct brw_device_info *devinfo,
diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index b5c1a35..3c0b23b 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -824,7 +824,7 @@ brw_render_target_supported(struct brw_context *brw,
  * compression.
  */
 bool
-brw_losslessly_compressible_format(struct brw_context *brw,
+brw_losslessly_compressible_format(const struct brw_context *brw,
uint32_t brw_format)
 {
const struct surface_format_info * const sinfo =
-- 
2.5.0



Re: [Mesa-dev] [PATCH] winsys/radeon: fix a wrong NUM_TILE_PIPES value from the kernel

2016-02-08 Thread Michel Dänzer
On 08.02.2016 04:25, Marek Olšák wrote:
> From: Marek Olšák 
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94019
> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
> b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
> index 35dc7e6..49c310c 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
> @@ -405,6 +405,12 @@ static boolean do_winsys_init(struct radeon_drm_winsys 
> *ws)
>  radeon_get_drm_value(ws->fd, RADEON_INFO_NUM_TILE_PIPES, NULL,
>   &ws->info.num_tile_pipes);
>  
> +/* The kernel returns 12 for some cards for an unknown reason.
> + * I thought this was supposed to be a power of two.
> + */
> +if (ws->gen == DRV_SI && ws->info.num_tile_pipes == 12)
> +ws->info.num_tile_pipes = 8;
> +
>  if (radeon_get_drm_value(ws->fd, RADEON_INFO_BACKEND_MAP, NULL,
>   &ws->info.r600_gb_backend_map))
>  ws->info.r600_gb_backend_map_valid = TRUE;
> 

Reviewed-by: Michel Dänzer 
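
For reference, the fix-up in the quoted patch (12 becomes 8 on SI) can be
seen as rounding down to the nearest power of two. The sketch below shows
that general idea only. round_down_pow2() and is_pow2() are hypothetical
helpers, not functions in the winsys, and the patch itself special-cases
just the value 12.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if value is a non-zero power of two. */
static int
is_pow2(uint32_t value)
{
   return value != 0 && (value & (value - 1)) == 0;
}

/* Hypothetical helper: largest power of two that is <= value.
 * value must be non-zero. */
static uint32_t
round_down_pow2(uint32_t value)
{
   uint32_t result = 1;

   assert(value != 0);
   /* Compare against value / 2 so that result * 2 can never overflow. */
   while (result <= value / 2)
      result *= 2;
   return result;
}
```

Under this sketch a reported pipe count of 12 is rejected by is_pow2() and
rounded down to 8, which matches the value the patch hard-codes.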


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] mesa: move GL_ARB_debug_output code into new debug_output.c file

2016-02-08 Thread Brian Paul

On 02/05/2016 06:28 PM, Timothy Arceri wrote:

On Fri, 2016-02-05 at 17:54 -0700, Brian Paul wrote:

The errors.c file had grown quite large so split off this extension
code into its own file.  This involved making a handful of functions
non-static.


I was going to do this when I added KHR_debug but was too new at the
time and didn't want to make such a big change.

Acked-by: Timothy Arceri 

I guess we could also move the object labels stuff in there but probably
not worth the effort.


I think I'd prefer to keep them separate as-is, actually.

-Brian



Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Tom Stellard
On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> but not SI & CI, which can't disable denorms for those instructions.

Do you know why this fixes FP16 conversions?  What does the OpenGL
spec say about denormal handling?

> ---
>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>  3 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index a4680ce..3f1db70 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>  
>   si_shader_binary_read_config(binary, conf, 0);
>  
> + /* Enable 64-bit and 16-bit denormals, because there is no performance
> +  * cost.
> +  *
> +  * If denormals are enabled, all floating-point output modifiers are
> +  * ignored.
> +  *
> +  * Don't enable denormals for 32-bit floats, because:
> +  * - Floating-point output modifiers would be ignored by the hw.
> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
> +  *   have to stop using those.
> +  * - SI & CI would be very slow.
> +  */
> + conf->float_mode |= V_00B028_FP_64_DENORMS;
> +

Do SI/CI support fp64 denorms?  If so, won't this hurt performance?

We should tell the compiler we are enabling fp-64 denorms by adding
+fp64-denormals to the feature string.  It would also be better to
read the float_mode value from the config registers emitted by the
compiler.


-Tom
>   FREE(binary->config);
>   FREE(binary->global_symbol_offsets);
>   binary->config = NULL;
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index ce795c0..77a4e47 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
>   shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 
> 4) |
>  S_00B528_SGPRS((num_sgprs - 1) / 8) |
>  S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B528_DX10_CLAMP(1);
> +S_00B528_DX10_CLAMP(1) |
> +S_00B528_FLOAT_MODE(shader->config.float_mode);
>   shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
>  
> S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
>  }
> @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
>  S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B428_SGPRS((num_sgprs - 1) / 8) |
> -S_00B428_DX10_CLAMP(1));
> +S_00B428_DX10_CLAMP(1) |
> +S_00B428_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
>  S_00B42C_USER_SGPR(num_user_sgprs) |
>  
> S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
>  S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B328_SGPRS((num_sgprs - 1) / 8) |
>  S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B328_DX10_CLAMP(1));
> +S_00B328_DX10_CLAMP(1) |
> +S_00B328_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
>  S_00B32C_USER_SGPR(num_user_sgprs) |
>  
> S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
>  S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B228_SGPRS((num_sgprs - 1) / 8) |
> -S_00B228_DX10_CLAMP(1));
> +S_00B228_DX10_CLAMP(1) |
> +S_00B228_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
>  S_00B22C_USER_SGPR(num_user_sgprs) |
>  
> S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct 
> si_shader *gs)
>  S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  

[Mesa-dev] [PATCH 2/3] st/mesa: move the setup_bitmap_vertex_data() code into draw_bitmap_quad()

2016-02-08 Thread Brian Paul
Now all the code to setup the vertex data and draw it is in one place.
---
 src/mesa/state_tracker/st_cb_bitmap.c | 168 --
 1 file changed, 78 insertions(+), 90 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
b/src/mesa/state_tracker/st_cb_bitmap.c
index 31f57c4..c26ee7f 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -176,77 +176,6 @@ make_bitmap_texture(struct gl_context *ctx, GLsizei width, 
GLsizei height,
return pt;
 }
 
-static void
-setup_bitmap_vertex_data(struct st_context *st, bool normalized,
- int x, int y, int width, int height,
- float z, const float color[4],
-struct pipe_resource **vbuf,
-unsigned *vbuf_offset)
-{
-   const GLfloat fb_width = (GLfloat)st->state.framebuffer.width;
-   const GLfloat fb_height = (GLfloat)st->state.framebuffer.height;
-   const GLfloat x0 = (GLfloat)x;
-   const GLfloat x1 = (GLfloat)(x + width);
-   const GLfloat y0 = (GLfloat)y;
-   const GLfloat y1 = (GLfloat)(y + height);
-   GLfloat sLeft = (GLfloat)0.0, sRight = (GLfloat)1.0;
-   GLfloat tTop = (GLfloat)0.0, tBot = (GLfloat)1.0 - tTop;
-   const GLfloat clip_x0 = (GLfloat)(x0 / fb_width * 2.0 - 1.0);
-   const GLfloat clip_y0 = (GLfloat)(y0 / fb_height * 2.0 - 1.0);
-   const GLfloat clip_x1 = (GLfloat)(x1 / fb_width * 2.0 - 1.0);
-   const GLfloat clip_y1 = (GLfloat)(y1 / fb_height * 2.0 - 1.0);
-   GLuint i;
-   float (*vertices)[3][4];  /**< vertex pos + color + texcoord */
-
-   if (!normalized) {
-  sRight = (GLfloat) width;
-  tBot = (GLfloat) height;
-   }
-
-   u_upload_alloc(st->uploader, 0, 4 * sizeof(vertices[0]), 4,
-  vbuf_offset, vbuf, (void **) &vertices);
-   if (!*vbuf) {
-  return;
-   }
-
-   /* Positions are in clip coords since we need to do clipping in case
-* the bitmap quad goes beyond the window bounds.
-*/
-   vertices[0][0][0] = clip_x0;
-   vertices[0][0][1] = clip_y0;
-   vertices[0][2][0] = sLeft;
-   vertices[0][2][1] = tTop;
-
-   vertices[1][0][0] = clip_x1;
-   vertices[1][0][1] = clip_y0;
-   vertices[1][2][0] = sRight;
-   vertices[1][2][1] = tTop;
-   
-   vertices[2][0][0] = clip_x1;
-   vertices[2][0][1] = clip_y1;
-   vertices[2][2][0] = sRight;
-   vertices[2][2][1] = tBot;
-   
-   vertices[3][0][0] = clip_x0;
-   vertices[3][0][1] = clip_y1;
-   vertices[3][2][0] = sLeft;
-   vertices[3][2][1] = tBot;
-   
-   /* same for all verts: */
-   for (i = 0; i < 4; i++) {
-  vertices[i][0][2] = z;
-  vertices[i][0][3] = 1.0f;
-  vertices[i][1][0] = color[0];
-  vertices[i][1][1] = color[1];
-  vertices[i][1][2] = color[2];
-  vertices[i][1][3] = color[3];
-  vertices[i][2][2] = 0.0; /*R*/
-  vertices[i][2][3] = 1.0; /*Q*/
-   }
-
-   u_upload_unmap(st->uploader);
-}
-
 
 /**
  * Setup pipeline state prior to rendering the bitmap textured quad.
@@ -395,36 +324,95 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint 
y, GLfloat z,
struct st_context *st = st_context(ctx);
struct pipe_context *pipe = st->pipe;
struct pipe_resource *vbuf = NULL;
-   GLuint maxSize;
-   GLuint offset;
+   const float fb_width = (float) st->state.framebuffer.width;
+   const float fb_height = (float) st->state.framebuffer.height;
+   const float x0 = (float) x;
+   const float x1 = (float) (x + width);
+   const float y0 = (float) y;
+   const float y1 = (float) (y + height);
+   float sLeft = 0.0f, sRight = 1.0f;
+   float tTop = 0.0f, tBot = 1.0f - tTop;
+   const float clip_x0 = x0 / fb_width * 2.0f - 1.0f;
+   const float clip_y0 = y0 / fb_height * 2.0f - 1.0f;
+   const float clip_x1 = x1 / fb_width * 2.0f - 1.0f;
+   const float clip_y1 = y1 / fb_height * 2.0f - 1.0f;
+   float (*vertices)[3][4];  /**< vertex pos + color + texcoord */
+   unsigned offset, i;
 
/* limit checks */
-   /* XXX if the bitmap is larger than the max texture size, break
-* it up into chunks.
-*/
-   maxSize = 1 << (pipe->screen->get_param(pipe->screen,
+   {
+  /* XXX if the bitmap is larger than the max texture size, break
+   * it up into chunks.
+   */
+  GLuint maxSize = 1 << (pipe->screen->get_param(pipe->screen,
 PIPE_CAP_MAX_TEXTURE_2D_LEVELS) - 1);
-   assert(width <= (GLsizei)maxSize);
-   assert(height <= (GLsizei)maxSize);
+  assert(width <= (GLsizei) maxSize);
+  assert(height <= (GLsizei) maxSize);
+   }
 
setup_render_state(ctx, sv, color);
 
/* convert Z from [0,1] to [-1,-1] to match viewport Z scale/bias */
z = z * 2.0f - 1.0f;
 
-   /* draw textured quad */
-   setup_bitmap_vertex_data(st, sv->texture->target != PIPE_TEXTURE_RECT,
-   x, y, width, height, z, color, &vbuf, &offset);
-
-   if (vbuf) {
-  util_draw_vertex_buffer(pipe, st->cso_context, vbuf,
-  

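The coordinate math that the patch above moves into draw_bitmap_quad()
maps window coordinates to clip space with coord / size * 2 - 1, and maps
Z from [0,1] to [-1,1] with z * 2 - 1. A minimal standalone sketch of both
mappings follows. The sketch_* names are made up for illustration and are
not part of st_cb_bitmap.c.

```c
#include <assert.h>

/* Hypothetical helper: map a window-space coordinate in [0, fb_size]
 * to clip space in [-1, 1], as done for clip_x0/clip_y0 in the patch. */
static float
sketch_window_to_clip(float coord, float fb_size)
{
   assert(fb_size > 0.0f);
   return coord / fb_size * 2.0f - 1.0f;
}

/* Hypothetical helper: map a depth value in [0, 1] to [-1, 1]. */
static float
sketch_depth_to_clip(float z)
{
   return z * 2.0f - 1.0f;
}
```

For a 640 pixel wide framebuffer, x = 0 lands on -1, x = 320 on 0 and
x = 640 on +1. Coordinates outside the window simply fall outside [-1, 1]
and are left to the clipper.
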
[Mesa-dev] [PATCH 1/3] st/mesa: refactor some bitmap drawing code

2016-02-08 Thread Brian Paul
Move setup/restoration of rendering state into helper functions.
This makes the draw_bitmap_quad() function much more concise.
---
 src/mesa/state_tracker/st_cb_bitmap.c | 90 ++-
 1 file changed, 57 insertions(+), 33 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
b/src/mesa/state_tracker/st_cb_bitmap.c
index 87c606a..31f57c4 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -248,24 +248,18 @@ setup_bitmap_vertex_data(struct st_context *st, bool 
normalized,
 }
 
 
-
 /**
- * Render a glBitmap by drawing a textured quad
+ * Setup pipeline state prior to rendering the bitmap textured quad.
  */
 static void
-draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint y, GLfloat z,
- GLsizei width, GLsizei height,
- struct pipe_sampler_view *sv,
- const GLfloat *color)
+setup_render_state(struct gl_context *ctx,
+   struct pipe_sampler_view *sv,
+   const GLfloat *color)
 {
struct st_context *st = st_context(ctx);
-   struct pipe_context *pipe = st->pipe;
struct cso_context *cso = st->cso_context;
struct st_fp_variant *fpv;
struct st_fp_variant_key key;
-   GLuint maxSize;
-   GLuint offset;
-   struct pipe_resource *vbuf = NULL;
 
memset(&key, 0, sizeof(key));
key.st = st->has_shareable_shaders ? NULL : st;
@@ -291,16 +285,6 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint y, 
GLfloat z,
   COPY_4V(ctx->Current.Attrib[VERT_ATTRIB_COLOR0], colorSave);
}
 
-
-   /* limit checks */
-   /* XXX if the bitmap is larger than the max texture size, break
-* it up into chunks.
-*/
-   maxSize = 1 << (pipe->screen->get_param(pipe->screen,
-PIPE_CAP_MAX_TEXTURE_2D_LEVELS) - 1);
-   assert(width <= (GLsizei)maxSize);
-   assert(height <= (GLsizei)maxSize);
-
cso_save_rasterizer(cso);
cso_save_fragment_samplers(cso);
cso_save_fragment_sampler_views(cso);
@@ -372,6 +356,58 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint y, 
GLfloat z,
 
cso_set_vertex_elements(cso, 3, st->velems_util_draw);
cso_set_stream_outputs(st->cso_context, 0, NULL, NULL);
+}
+
+
+/**
+ * Restore pipeline state after rendering the bitmap textured quad.
+ */
+static void
+restore_render_state(struct gl_context *ctx)
+{
+   struct st_context *st = st_context(ctx);
+   struct cso_context *cso = st->cso_context;
+
+   cso_restore_rasterizer(cso);
+   cso_restore_fragment_samplers(cso);
+   cso_restore_fragment_sampler_views(cso);
+   cso_restore_viewport(cso);
+   cso_restore_fragment_shader(cso);
+   cso_restore_vertex_shader(cso);
+   cso_restore_tessctrl_shader(cso);
+   cso_restore_tesseval_shader(cso);
+   cso_restore_geometry_shader(cso);
+   cso_restore_vertex_elements(cso);
+   cso_restore_aux_vertex_buffer_slot(cso);
+   cso_restore_stream_outputs(cso);
+}
+
+
+/**
+ * Render a glBitmap by drawing a textured quad
+ */
+static void
+draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint y, GLfloat z,
+ GLsizei width, GLsizei height,
+ struct pipe_sampler_view *sv,
+ const GLfloat *color)
+{
+   struct st_context *st = st_context(ctx);
+   struct pipe_context *pipe = st->pipe;
+   struct pipe_resource *vbuf = NULL;
+   GLuint maxSize;
+   GLuint offset;
+
+   /* limit checks */
+   /* XXX if the bitmap is larger than the max texture size, break
+* it up into chunks.
+*/
+   maxSize = 1 << (pipe->screen->get_param(pipe->screen,
+PIPE_CAP_MAX_TEXTURE_2D_LEVELS) - 1);
+   assert(width <= (GLsizei)maxSize);
+   assert(height <= (GLsizei)maxSize);
+
+   setup_render_state(ctx, sv, color);
 
/* convert Z from [0,1] to [-1,-1] to match viewport Z scale/bias */
z = z * 2.0f - 1.0f;
@@ -389,19 +425,7 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint y, 
GLfloat z,
   3); /* attribs/vert */
}
 
-   /* restore state */
-   cso_restore_rasterizer(cso);
-   cso_restore_fragment_samplers(cso);
-   cso_restore_fragment_sampler_views(cso);
-   cso_restore_viewport(cso);
-   cso_restore_fragment_shader(cso);
-   cso_restore_vertex_shader(cso);
-   cso_restore_tessctrl_shader(cso);
-   cso_restore_tesseval_shader(cso);
-   cso_restore_geometry_shader(cso);
-   cso_restore_vertex_elements(cso);
-   cso_restore_aux_vertex_buffer_slot(cso);
-   cso_restore_stream_outputs(cso);
+   restore_render_state(ctx);
 
pipe_resource_reference(&vbuf, NULL);
 
-- 
1.9.1



[Mesa-dev] [PATCH 3/3] st/mesa: don't allocate bitmap drawing state until needed

2016-02-08 Thread Brian Paul
Most apps don't use glBitmap so don't allocate the bitmap cache or
gallium state objects/shaders/etc until the first call to st_Bitmap().
---
 src/mesa/state_tracker/st_cb_bitmap.c | 145 ++
 src/mesa/state_tracker/st_cb_bitmap.h |   3 -
 src/mesa/state_tracker/st_context.c   |   1 -
 3 files changed, 77 insertions(+), 72 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
b/src/mesa/state_tracker/st_cb_bitmap.c
index c26ee7f..ca1dfab 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -497,8 +497,9 @@ create_cache_trans(struct st_context *st)
 void
 st_flush_bitmap_cache(struct st_context *st)
 {
-   if (!st->bitmap.cache->empty) {
-  struct bitmap_cache *cache = st->bitmap.cache;
+   struct bitmap_cache *cache = st->bitmap.cache;
+
+   if (cache && !st->bitmap.cache->empty) {
   struct pipe_context *pipe = st->pipe;
   struct pipe_sampler_view *sv;
 
@@ -617,6 +618,76 @@ accum_bitmap(struct gl_context *ctx,
 }
 
 
+/**
+ * One-time init for drawing bitmaps.
+ */
+static void
+init_bitmap_state(struct st_context *st)
+{
+   struct pipe_sampler_state *sampler = &st->bitmap.samplers[0];
+   struct pipe_context *pipe = st->pipe;
+   struct pipe_screen *screen = pipe->screen;
+
+   /* This function should only be called once */
+   assert(st->bitmap.cache == NULL);
+
+   /* alloc bitmap cache object */
+   st->bitmap.cache = ST_CALLOC_STRUCT(bitmap_cache);
+
+   /* init sampler state once */
+   memset(sampler, 0, sizeof(*sampler));
+   sampler->wrap_s = PIPE_TEX_WRAP_CLAMP;
+   sampler->wrap_t = PIPE_TEX_WRAP_CLAMP;
+   sampler->wrap_r = PIPE_TEX_WRAP_CLAMP;
+   sampler->min_img_filter = PIPE_TEX_FILTER_NEAREST;
+   sampler->min_mip_filter = PIPE_TEX_MIPFILTER_NONE;
+   sampler->mag_img_filter = PIPE_TEX_FILTER_NEAREST;
+   st->bitmap.samplers[1] = *sampler;
+   st->bitmap.samplers[1].normalized_coords = 1;
+
+   /* init baseline rasterizer state once */
+   memset(&st->bitmap.rasterizer, 0, sizeof(st->bitmap.rasterizer));
+   st->bitmap.rasterizer.half_pixel_center = 1;
+   st->bitmap.rasterizer.bottom_edge_rule = 1;
+   st->bitmap.rasterizer.depth_clip = 1;
+
+   /* find a usable texture format */
+   if (screen->is_format_supported(screen, PIPE_FORMAT_I8_UNORM,
+   PIPE_TEXTURE_2D, 0,
+   PIPE_BIND_SAMPLER_VIEW)) {
+  st->bitmap.tex_format = PIPE_FORMAT_I8_UNORM;
+   }
+   else if (screen->is_format_supported(screen, PIPE_FORMAT_A8_UNORM,
+PIPE_TEXTURE_2D, 0,
+PIPE_BIND_SAMPLER_VIEW)) {
+  st->bitmap.tex_format = PIPE_FORMAT_A8_UNORM;
+   }
+   else if (screen->is_format_supported(screen, PIPE_FORMAT_L8_UNORM,
+PIPE_TEXTURE_2D, 0,
+PIPE_BIND_SAMPLER_VIEW)) {
+  st->bitmap.tex_format = PIPE_FORMAT_L8_UNORM;
+   }
+   else {
+  /* XXX support more formats */
+  assert(0);
+   }
+
+   /* Create the vertex shader */
+   {
+  const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
+  TGSI_SEMANTIC_COLOR,
+st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
+  TGSI_SEMANTIC_GENERIC };
+  const uint semantic_indexes[] = { 0, 0, 0 };
+  st->bitmap.vs = util_make_vertex_passthrough_shader(st->pipe, 3,
+  semantic_names,
+  semantic_indexes,
+  FALSE);
+   }
+
+   reset_cache(st);
+}
+
 
 /**
  * Called via ctx->Driver.Bitmap()
@@ -632,6 +703,10 @@ st_Bitmap(struct gl_context *ctx, GLint x, GLint y,
assert(width > 0);
assert(height > 0);
 
+   if (!st->bitmap.cache) {
+  init_bitmap_state(st);
+   }
+
/* We only need to validate state of the st dirty flags are set or
 * any non-_NEW_PROGRAM_CONSTANTS mesa flags are set.  The VS we use
 * for bitmap drawing uses no constants and the FS constants are
@@ -641,19 +716,6 @@ st_Bitmap(struct gl_context *ctx, GLint x, GLint y,
   st_validate_state(st);
}
 
-   if (!st->bitmap.vs) {
-  /* create pass-through vertex shader now */
-  const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
-  TGSI_SEMANTIC_COLOR,
-st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
-  TGSI_SEMANTIC_GENERIC };
-  const uint semantic_indexes[] = { 0, 0, 0 };
-  st->bitmap.vs = util_make_vertex_passthrough_shader(st->pipe, 3,
-  semantic_names,
-  semantic_indexes,
-  FALSE);
-   }
-
if (UseBitmapCache && 

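The pattern in the patch above is allocate-on-first-use: the draw entry
point checks for a NULL cache and initializes once, and the flush path has
to tolerate a context that never drew a bitmap. A stripped-down sketch of
that pattern is below. All sketch_* types and functions are hypothetical
stand-ins, not the real st/mesa structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the bitmap cache. */
struct sketch_bitmap_cache {
   int empty;                           /* 1 when nothing is queued */
};

/* Hypothetical stand-in for the state tracker context. */
struct sketch_context {
   struct sketch_bitmap_cache *cache;   /* NULL until the first bitmap */
   int init_count;                      /* how many times init ran */
};

/* One-time init. Returns 0 if the allocation failed. */
static int
sketch_init_bitmap_state(struct sketch_context *ctx)
{
   /* This function should only be called once. */
   assert(ctx->cache == NULL);

   ctx->cache = calloc(1, sizeof(*ctx->cache));
   if (!ctx->cache)
      return 0;
   ctx->cache->empty = 1;
   ctx->init_count++;
   return 1;
}

/* Draw entry point: allocate the state on first use only. */
static int
sketch_bitmap(struct sketch_context *ctx)
{
   if (!ctx->cache && !sketch_init_bitmap_state(ctx))
      return 0;
   ctx->cache->empty = 0;               /* pretend a bitmap was queued */
   return 1;
}

/* Flush must cope with a context that never drew a bitmap. */
static void
sketch_flush_bitmap_cache(struct sketch_context *ctx)
{
   if (ctx->cache && !ctx->cache->empty)
      ctx->cache->empty = 1;            /* pretend the queue was drawn */
}
```

The NULL check in the flush function mirrors the "cache && ..." test the
patch adds to st_flush_bitmap_cache(). Without it, flushing before the
first bitmap would dereference a NULL pointer.
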
Re: [Mesa-dev] [PATCH 2/7] st/mesa: implement GL_ATI_fragment_shader

2016-02-08 Thread Marek Olšák
I see that the shader doesn't specify texture target types and they
must be obtained from the current GL states. That means texture
targets have to be properly listed in the shader key. The shader
compilation *must not* read any states from the current context. The
shader key is the only place from which the compilation can read
states.

Right now, if an ATI fragment shader is used with a different texture
target than it has been compiled for, the texturing won't work.

Marek
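
To illustrate the point about shader keys, here is a rough sketch of a
variant cache where the texture targets are part of the key, so that a
different target selects (or creates) a different compiled variant. Every
name here is hypothetical, including the unit count and the target enum.
It is not the st/mesa variant code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NUM_UNITS    6   /* assumed unit count, for illustration */
#define SKETCH_MAX_VARIANTS 8

enum sketch_tex_target { SKETCH_TEX_2D, SKETCH_TEX_3D, SKETCH_TEX_CUBE };

/* Everything the compile step is allowed to depend on lives in the key.
 * An array of unsigned char has no padding, so memcmp() is safe. */
struct sketch_key {
   unsigned char tex_target[SKETCH_NUM_UNITS];
};

struct sketch_variant {
   struct sketch_key key;
   int id;                      /* stands in for the compiled shader */
};

struct sketch_cache {
   struct sketch_variant variants[SKETCH_MAX_VARIANTS];
   int count;
};

/* Return the variant matching key, "compiling" a new one if needed.
 * Returns NULL when the cache is full. */
static const struct sketch_variant *
sketch_get_variant(struct sketch_cache *cache, const struct sketch_key *key)
{
   struct sketch_variant *variant;
   int i;

   assert(cache != NULL && key != NULL);

   for (i = 0; i < cache->count; i++) {
      if (memcmp(&cache->variants[i].key, key, sizeof(*key)) == 0)
         return &cache->variants[i];
   }

   if (cache->count == SKETCH_MAX_VARIANTS)
      return NULL;

   /* The "compile" reads only the key, never any current context state. */
   variant = &cache->variants[cache->count];
   variant->key = *key;
   variant->id = cache->count++;
   return variant;
}
```

With this shape, binding a cube map where a 2D texture was used before
changes the key, so the lookup misses and a new variant is built instead
of reusing one compiled for the wrong target.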

On Fri, Feb 5, 2016 at 10:11 PM, Miklós Máté  wrote:
> v2: fix arithmetic for special opcodes
>  (based on comments from Marek and Ilia),
>  fix fog state, cleanup
> ---
>  src/mesa/Makefile.sources |   1 +
>  src/mesa/program/program.h|   2 +
>  src/mesa/state_tracker/st_atifs_to_tgsi.c | 734 
> ++
>  src/mesa/state_tracker/st_atifs_to_tgsi.h |  65 +++
>  src/mesa/state_tracker/st_atom_constbuf.c |  14 +
>  src/mesa/state_tracker/st_atom_shader.c   |   5 +-
>  src/mesa/state_tracker/st_cb_drawpixels.c |   1 +
>  src/mesa/state_tracker/st_cb_program.c|  36 +-
>  src/mesa/state_tracker/st_program.c   |  30 +-
>  src/mesa/state_tracker/st_program.h   |   4 +
>  10 files changed, 889 insertions(+), 3 deletions(-)
>  create mode 100644 src/mesa/state_tracker/st_atifs_to_tgsi.c
>  create mode 100644 src/mesa/state_tracker/st_atifs_to_tgsi.h
>
> diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
> index ffe560f..23fe42a 100644
> --- a/src/mesa/Makefile.sources
> +++ b/src/mesa/Makefile.sources
> @@ -393,6 +393,7 @@ VBO_FILES = \
> vbo/vbo_split_inplace.c
>
>  STATETRACKER_FILES = \
> +   state_tracker/st_atifs_to_tgsi.c \
> state_tracker/st_atom_array.c \
> state_tracker/st_atom_atomicbuf.c \
> state_tracker/st_atom_blend.c \
> diff --git a/src/mesa/program/program.h b/src/mesa/program/program.h
> index 24e0597..09e6928 100644
> --- a/src/mesa/program/program.h
> +++ b/src/mesa/program/program.h
> @@ -172,6 +172,8 @@ _mesa_program_enum_to_shader_stage(GLenum v)
>return MESA_SHADER_VERTEX;
> case GL_FRAGMENT_PROGRAM_ARB:
>return MESA_SHADER_FRAGMENT;
> +   case GL_FRAGMENT_SHADER_ATI:
> +  return MESA_SHADER_FRAGMENT;
> case GL_GEOMETRY_PROGRAM_NV:
>return MESA_SHADER_GEOMETRY;
> case GL_TESS_CONTROL_PROGRAM_NV:
> diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c 
> b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> new file mode 100644
> index 000..fe303f6
> --- /dev/null
> +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> @@ -0,0 +1,734 @@
> +/*
> + * Copyright (C) 2016 Miklós Máté
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "main/mtypes.h"
> +#include "main/atifragshader.h"
> +#include "main/texobj.h"
> +#include "main/errors.h"
> +#include "program/prog_parameter.h"
> +
> +#include "tgsi/tgsi_transform.h"
> +#include "tgsi/tgsi_ureg.h"
> +#include "util/u_math.h"
> +#include "util/u_memory.h"
> +
> +#include "st_program.h"
> +#include "st_atifs_to_tgsi.h"
> +
> +/**
> + * Intermediate state used during shader translation.
> + */
> +struct st_translate {
> +   struct ureg_program *ureg;
> +   struct gl_context *ctx;
> +   struct ati_fragment_shader *atifs;
> +
> +   struct ureg_dst temps[MAX_PROGRAM_TEMPS];
> +   struct ureg_src *constants;
> +   struct ureg_dst outputs[PIPE_MAX_SHADER_OUTPUTS];
> +   struct ureg_src inputs[PIPE_MAX_SHADER_INPUTS];
> +   struct ureg_src samplers[PIPE_MAX_SAMPLERS];
> +
> +   const GLuint *inputMapping;
> +   const GLuint *outputMapping;
> +
> +   unsigned current_pass;
> +
> +   bool regs_written[MAX_NUM_PASSES_ATI][MAX_NUM_FRAGMENT_REGISTERS_ATI];
> +
> +   boolean error;
> +};
> +
> +struct instruction_desc {
> +   unsigned TGSI_opcode;
> +   const char *name;
> +   unsigned char arg_count;
> +   unsigned char special; /* no 1:1 

Re: [Mesa-dev] [PATCH v2 i-g-t] igt/list-workarounds: Extend the script to Mesa

2016-02-08 Thread Damien Lespiau
On Fri, Feb 05, 2016 at 04:12:08PM -0800, Dylan Baker wrote:
> > >   parse(work_arounds)
> > > + print "\nList of workarounds found in %s:" % project
> 
> Hey Damien, the script says it's python 3, and this ^^^ is broken syntax
> in python 3 (but not in 2).

:(

I did notice the python2 construct, but then, of course, the diff is
missing the full context so didn't realize it was a python3 script.

Sent and pushed the obvious fix.

I really want to trust that developers run the code at least once before
submitting, even if it's a rework of the original patch. Even better
would be a simple unit test, and hook make distcheck to patchwork. I'll
look into that at some point.

-- 
Damien


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 1:11 PM, Roland Scheidegger  wrote:
> This looks good to me, albeit I know nothing about the hw.
> So VI could do (just with some restrictions) even full-speed fp32 denorms
> whereas SI/CI can't? Interesting, I suppose that would be intended for

Yes, VI has full-speed fp32 denorms except for a few instructions
(e.g. MAD) which can be replaced by other instructions.

Marek
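
As background for the float_mode discussion, the sketch below shows on the
CPU what "denormals enabled" versus "flushed to zero" means for a single
value. It is only a software model with made-up sketch_* helpers. It says
nothing about how the GPU implements either mode, and it assumes the host
keeps denormals (no flush-to-zero compiler or FPU settings).

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: true if value is non-zero but smaller in
 * magnitude than the smallest normal float, i.e. a denormal. */
static int
sketch_is_denormal(float value)
{
   float magnitude = value < 0.0f ? -value : value;

   return magnitude != 0.0f && magnitude < FLT_MIN;
}

/* Hypothetical software model of a "denormals disabled" float mode:
 * a denormal is replaced by a zero of the same sign. */
static float
sketch_flush_denormal(float value)
{
   assert(value == value);   /* NaN is out of scope for this sketch */

   if (sketch_is_denormal(value))
      return value < 0.0f ? -0.0f : 0.0f;
   return value;
}
```

Halving FLT_MIN gives a denormal when denormals are kept. Passing that
value through sketch_flush_denormal() gives 0, which is the information
loss that enabling denormals avoids.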


Re: [Mesa-dev] [PATCH 5/5] st/mesa: enable AoA for gallium drivers reporting GLSL 1.30

2016-02-08 Thread Mike Lothian
Cool, thanks.

On Mon, 8 Feb 2016, 4:33 a.m. Dave Airlie  wrote:

> On 8 February 2016 at 14:26, Mike Lothian  wrote:
> > Does that also add in AoA for OpenGL ES 3.1 or will that require more
> work?
>
> Good question, I've no idea. but I think desktop is > GLES in this
> case, so I should
> update GL3.txt for that as well then.
>
> Dave.
>


Re: [Mesa-dev] [PATCH 0/2] Simple Klocwork patches

2016-02-08 Thread Eero Tamminen

Hi,

On 08.02.2016 00:07, Matt Turner wrote:

On Sun, Feb 7, 2016 at 1:37 PM, Juha-Pekka Heikkilä
 wrote:

I know there are a lot of places where malloc is still unchecked,
and then there is ralloc, which is a story of its own. The reason I
think checking these would be remotely useful on Windows only (or,
the other way around, not under the Linux kernel) is that on Windows one
can get a null pointer from malloc. On Android I think memory
overcommit has always been enabled, and on Linux I suspect I belong to
the minority who like to set ulimits for memory.

I agree that checking these is mostly quite useless, but there are
corner cases where it may suddenly become valuable. Once a process is
running and everything has settled, it would be odd to hit any of these
checks, but I have noticed that code run at process startup is where
things fail if they fail at all. This is of course just my
opinion about the value of these checks, but I really dislike the
possibility of a segfault coming from a library.

I didn't quickly notice where _mesa_error() would get more heap. It needs
stack, of course, but when I stress tested these, _mesa_error() still
worked. I cannot promise my test was 100% correct though; I think it was
over a year ago when I was playing with it.


There's no guarantee that fprintf() doesn't call malloc. In fact, glibc's does.


If one is just concerned about not calling something that may use 
malloc, the functions listed by "man 7 signal" as async-signal-safe are 
safe in that respect (one cannot use malloc within signal handler context).
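
As a sketch of that idea, an out-of-memory path could report the failure
with write(2), which POSIX lists as async-signal-safe, instead of
fprintf(). sketch_report_error() below is a hypothetical helper, not
something that exists in Mesa.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: emit a fixed message using only write(2).
 * The caller passes the length, so no other library call is needed
 * here and nothing on this path can end up in malloc. */
static ssize_t
sketch_report_error(int fd, const char *message, size_t length)
{
   return write(fd, message, length);
}
```

A caller would pass a string literal and sizeof(literal) - 1 as the
length, with STDERR_FILENO as the descriptor. Formatting numbers into the
message is deliberately left out, since that is where printf-style
functions may start allocating.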




Adding these checks is really useless.


If one runs out of memory when overcommit is enabled, then in theory almost 
anything the program does can cause it to use more memory, not just 
allocations.  E.g. running the code that handles an error may require the 
kernel to allocate space to page that code into RAM.  This can happen 
even if the code isn't being run for the first time, if the kernel had 
dropped that page from RAM.



- Eero

(Modern desktop systems don't work well without overcommit.  A system 
running programs that have a large number of threads, or memory mappings 
that are otherwise hardly used, as is the case e.g. after fork, could be 
in trouble unless it has an overly large amount of RAM.)




Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 12:33 AM, Ilia Mirkin  wrote:
> On Sun, Feb 7, 2016 at 6:26 PM, Axel Davy  wrote:
>> On 08/02/2016 00:21, Ilia Mirkin wrote:
>>>
>>> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:

 SQRT is not supported everywhere, so replace
 it by RSQ + RCP

 Signed-off-by: Axel Davy 
 ---
   src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/state_trackers/nine/nine_ff.c
 b/src/gallium/state_trackers/nine/nine_ff.c
 index a5466a7..894fc63 100644
 --- a/src/gallium/state_trackers/nine/nine_ff.c
 +++ b/src/gallium/state_trackers/nine/nine_ff.c
 @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct
 vs_build_ctx *vs)
   struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);

   ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
 -ureg_SQRT(ureg, tmp_y, _X(tmp));
 +ureg_RSQ(ureg, tmp_y, _X(tmp));
 +ureg_RCP(ureg, tmp_y, _Y(tmp));
>>>
>>> I'd recommend doing
>>>
>>> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))
>>>
>>> instead. That should be (a) more numerically stable (rcp doesn't have
>>> great precision), and (b) not blow up for 0.
>>
>> Ok for the precision, but I'm not sure for 0
>>
>> With the mul version, with 0, it ends up computing inf * 0 = NaN,
>> whereas with the rcp version, it does 1/inf == 0 (as far as I know),
>> which is the expected result.
>
> Hmmm... not sure what RSQ(0) returns actually. I assumed it was NaN.
> What you really want is a "flush nan to 0" option on the mul like nvc0
> has, but there's no way to express that in TGSI.
>
> Perhaps you can keep the SQRT if PIPE_CAP_TGSI_SQRT is exposed, and
> otherwise do the MUL or the RCP. FWIW this is what glsl_to_tgsi does:
>
>  emit_scalar(ir, TGSI_OPCODE_RSQ, result_dst, op[0]);
>  emit_asm(ir, TGSI_OPCODE_MUL, result_dst, result_src, op[0]);
>  /* For incoming channels <= 0, set the result to 0. */
>  op[0].negate = ~op[0].negate;
>  emit_asm(ir, TGSI_OPCODE_CMP, result_dst,
>   op[0], result_src, st_src_reg_for_float(0.0));

FWIW, "NaN to 0" is always enabled on radeon. We also enable it for compute.

Marek
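
For illustration, the zero-input behaviour discussed in this thread can be reproduced on the CPU. This is only a minimal sketch, assuming IEEE 754 float semantics: rsq_model(), rcp_model() and the sqrt_via_* helpers are hypothetical stand-ins for the TGSI opcodes, not Mesa functions, and real hardware may return something else for RSQ(0).

```c
#include <math.h>   /* INFINITY only; no libm functions are called */

/* Hypothetical software model of RSQ: 1/sqrt(x) for x >= 0.
 * RSQ(0) is modelled as +inf, the IEEE 754 result of 1/sqrt(0). */
static float rsq_model(float x)
{
    if (x == 0.0f)
        return INFINITY;
    float s = x;                      /* Heron's iteration for sqrt(x) */
    for (int i = 0; i < 40; i++)
        s = 0.5f * (s + x / s);
    return 1.0f / s;
}

/* Hypothetical model of RCP. */
static float rcp_model(float x) { return 1.0f / x; }

/* Variant from the patch: sqrt(x) = RCP(RSQ(x)). */
static float sqrt_via_rcp(float x) { return rcp_model(rsq_model(x)); }

/* Suggested variant: sqrt(x) = x * RSQ(x). */
static float sqrt_via_mul(float x) { return x * rsq_model(x); }

/* MUL variant plus a guard in the spirit of the CMP quoted from
 * glsl_to_tgsi: inputs <= 0 are forced to 0. */
static float sqrt_via_mul_cmp(float x)
{
    float r = x * rsq_model(x);
    return (x > 0.0f) ? r : 0.0f;
}
```

Under these assumptions the RCP form gives 0 for an input of 0 (1/inf), the plain MUL form gives NaN (0 * inf), and the guarded MUL form gives 0 again.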


Re: [Mesa-dev] [PATCH v2 4/5] st/mesa: make use of the occlusion predicate query

2016-02-08 Thread Marek Olšák
Reviewed-by: Marek Olšák 

On Sun, Feb 7, 2016 at 2:54 AM, Ilia Mirkin  wrote:
> Signed-off-by: Ilia Mirkin 
> Reviewed-by: Marek Olšák  (v1)
>
> v1 -> v2: read .b for result of predicate
> ---
>  src/mesa/state_tracker/st_cb_queryobj.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_cb_queryobj.c 
> b/src/mesa/state_tracker/st_cb_queryobj.c
> index fc239bc..cdb9efc 100644
> --- a/src/mesa/state_tracker/st_cb_queryobj.c
> +++ b/src/mesa/state_tracker/st_cb_queryobj.c
> @@ -96,7 +96,8 @@ st_BeginQuery(struct gl_context *ctx, struct 
> gl_query_object *q)
> switch (q->Target) {
> case GL_ANY_SAMPLES_PASSED:
> case GL_ANY_SAMPLES_PASSED_CONSERVATIVE:
> -  /* fall-through */
> +  type = PIPE_QUERY_OCCLUSION_PREDICATE;
> +  break;
> case GL_SAMPLES_PASSED_ARB:
>type = PIPE_QUERY_OCCLUSION_COUNTER;
>break;
> @@ -240,7 +241,14 @@ get_query_result(struct pipe_context *pipe,
>stq->base.Result = data.pipeline_statistics.c_primitives;
>break;
> default:
> -  stq->base.Result = data.u64;
> +  switch (stq->type) {
> +  case PIPE_QUERY_OCCLUSION_PREDICATE:
> + stq->base.Result = !!data.b;
> + break;
> +  default:
> + stq->base.Result = data.u64;
> + break;
> +  }
>break;
> }
>
> --
> 2.4.10
>
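
As a side note on the result handling in the quoted patch, the idea of reading the boolean member for predicate queries and the 64-bit member otherwise can be sketched in isolation. The enum, union and function names below are hypothetical simplifications, not the gallium definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the query types and result union. */
enum query_type_sketch {
    QUERY_OCCLUSION_COUNTER_SKETCH,
    QUERY_OCCLUSION_PREDICATE_SKETCH
};

union query_result_sketch {
    bool b;         /* written by predicate queries */
    uint64_t u64;   /* written by counter queries */
};

/* Pick the union member that matches the query type and normalise a
 * predicate to exactly 0 or 1, as the patch does with "!!data.b". */
static uint64_t gl_result_sketch(enum query_type_sketch type,
                                 const union query_result_sketch *data)
{
    switch (type) {
    case QUERY_OCCLUSION_PREDICATE_SKETCH:
        return !!data->b;
    default:
        return data->u64;
    }
}
```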


Re: [Mesa-dev] [RFCv3 11/11] mesa/st: add support for NIR as possible driver IR

2016-02-08 Thread Roland Scheidegger
Am 06.02.2016 um 22:30 schrieb Marek Olšák:
> On Sat, Feb 6, 2016 at 2:45 PM, Rob Clark  wrote:
>> On Sun, Jan 31, 2016 at 3:16 PM, Rob Clark  wrote:
>>> +   // XXX get from pipe_screen?  Or just let pipe driver provide?
>>> +   nir_options.lower_fpow = true;
>>> +   nir_options.lower_fsat = true;
>>> +   nir_options.lower_scmp = true;
>>> +   nir_options.lower_flrp = true;
>>> +   nir_options.lower_ffract = true;
>>> +   nir_options.native_integers = true;
>>> +
>>
>>
>> btw, one of the few remaining things to tackle is how to handle
>> nir_shader_compiler_options struct.  To follow the existing approach
>> of shader caps, I'd have to add a big pile of caps now, and then keep
>> adding them as nir_shader_compiler_options struct grows.  Which seems
>> sub-optimal.
>>
>> How do people feel about adding a screen->get_shader_paramp() which,
>> along the lines of get_paramf, returns a 'const void *'?  Then we
>> could add a single cap to return the whole compiler-options struct.
>> (And maybe if at some point there was direct support for LLVM as an
>> IR, it might need something similar??)
>>
>> Other possibility is just a pipe->get_nir_compiler_options() type
>> hook.  A bit more of a point solution, but might make sense if we
>> can't think of any other plausible uses for ->get_shader_paramp()..
>> and less churn since it would only need to be implemented by drivers
>> consuming NIR..
>>
>> Thoughts/opinions?
> 
> pipe->get_nir_compiler_options() sounds good.
> 
> Maybe wait for VMWare guys' opinion as well.

Looks usable to me, albeit I'm not sure you really need NIR-specific
options as such? That is those options above don't really look nir
specific - maybe they aren't used with just glsl->tgsi, but it looks to
me like they would in theory be applicable to other IR as well. Though I
suppose if you just had compiler_options it would be a bit confusing if
you had entries which then may not be used...

Roland
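
To make the options being weighed here concrete, a hook that hands back a whole options struct as a 'const void *' might look roughly like the following. This is a sketch under stated assumptions: every type and function name is made up for illustration, and the real gallium interface may end up looking quite different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IR selector. */
enum shader_ir_sketch { IR_TGSI_SKETCH, IR_NIR_SKETCH };

/* Hypothetical subset of the compiler options mentioned in the thread. */
struct nir_options_sketch {
    bool lower_fpow;
    bool lower_fsat;
    bool lower_scmp;
    bool lower_flrp;
    bool lower_ffract;
    bool native_integers;
};

/* Hypothetical screen with a single hook instead of one cap per option. */
struct screen_sketch {
    const void *(*get_compiler_options)(struct screen_sketch *screen,
                                        enum shader_ir_sketch ir);
};

static const struct nir_options_sketch driver_nir_options = {
    .lower_fpow = true,
    .lower_fsat = true,
    .lower_scmp = true,
    .lower_flrp = true,
    .lower_ffract = true,
    .native_integers = true,
};

/* Driver-side implementation: only NIR has an options struct here. */
static const void *driver_get_compiler_options(struct screen_sketch *screen,
                                               enum shader_ir_sketch ir)
{
    (void)screen;
    return ir == IR_NIR_SKETCH ? &driver_nir_options : NULL;
}
```

The state tracker would cast the returned pointer to the struct type that matches the IR it asked for; only drivers that consume that IR need to implement the hook.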




Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Roland Scheidegger
This looks good to me, albeit I know nothing about the hw.
So VI could do (just with some restrictions) even full-speed fp32 denorms
whereas SI/CI can't? Interesting, I suppose that would be intended for
compute. intel x86 can't even do that (actually, I think skylake can),
though certainly other cpus could do that for ages.

(Albeit there's still nothing in the glsl spec which says this is
required for fp16 pack...)

Roland
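
For readers following the patch quoted below: it ORs a denormal flag into the shader's float mode and then packs that mode into each stage's RSRC1 register value. A minimal sketch of that bit manipulation follows. The bit positions and values are assumptions for illustration only (the authoritative definitions are in sid.h), and all names carrying a _SKETCH or _sketch suffix are hypothetical.

```c
#include <stdint.h>

/* Assumed encodings, for illustration only; see sid.h for the real ones. */
#define FP_32_DENORMS_SKETCH   0x30u   /* fp32 denormal bits (left clear) */
#define FP_64_DENORMS_SKETCH   0xc0u   /* fp64/fp16 denormal bits */
#define FLOAT_MODE_SKETCH(x)   (((uint32_t)(x) & 0xffu) << 12)
#define DX10_CLAMP_SKETCH(x)   (((uint32_t)(x) & 0x1u) << 21)

/* Enable fp64/fp16 denormals while leaving the fp32 bits untouched. */
static uint32_t enable_fp64_fp16_denorms(uint32_t float_mode)
{
    return float_mode | FP_64_DENORMS_SKETCH;
}

/* Pack the float mode next to the clamp bit, as each si_shader_*()
 * function in the patch does for its own RSRC1 register. */
static uint32_t pack_rsrc1_sketch(uint32_t float_mode)
{
    return DX10_CLAMP_SKETCH(1) | FLOAT_MODE_SKETCH(float_mode);
}
```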

Am 06.02.2016 um 13:15 schrieb Marek Olšák:
> From: Marek Olšák 
> 
> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> but not SI & CI, which can't disable denorms for those instructions.
> ---
>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>  3 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index a4680ce..3f1db70 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>  
>   si_shader_binary_read_config(binary, conf, 0);
>  
> + /* Enable 64-bit and 16-bit denormals, because there is no performance
> +  * cost.
> +  *
> +  * If denormals are enabled, all floating-point output modifiers are
> +  * ignored.
> +  *
> +  * Don't enable denormals for 32-bit floats, because:
> +  * - Floating-point output modifiers would be ignored by the hw.
> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
> +  *   have to stop using those.
> +  * - SI & CI would be very slow.
> +  */
> + conf->float_mode |= V_00B028_FP_64_DENORMS;
> +
>   FREE(binary->config);
>   FREE(binary->global_symbol_offsets);
>   binary->config = NULL;
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index ce795c0..77a4e47 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
>   shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 
> 4) |
>  S_00B528_SGPRS((num_sgprs - 1) / 8) |
>  S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B528_DX10_CLAMP(1);
> +S_00B528_DX10_CLAMP(1) |
> +S_00B528_FLOAT_MODE(shader->config.float_mode);
>   shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
>  
> S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
>  }
> @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
>  S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B428_SGPRS((num_sgprs - 1) / 8) |
> -S_00B428_DX10_CLAMP(1));
> +S_00B428_DX10_CLAMP(1) |
> +S_00B428_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
>  S_00B42C_USER_SGPR(num_user_sgprs) |
>  
> S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
>  S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B328_SGPRS((num_sgprs - 1) / 8) |
>  S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B328_DX10_CLAMP(1));
> +S_00B328_DX10_CLAMP(1) |
> +S_00B328_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
>  S_00B32C_USER_SGPR(num_user_sgprs) |
>  
> S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
>  S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B228_SGPRS((num_sgprs - 1) / 8) |
> -S_00B228_DX10_CLAMP(1));
> +S_00B228_DX10_CLAMP(1) |
> +S_00B228_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
>  S_00B22C_USER_SGPR(num_user_sgprs) |
>  
> S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct 
> si_shader *gs)
>  

[Mesa-dev] New stable-branch 11.1 candidate pushed

2016-02-08 Thread Emil Velikov
Hello list,

The candidate for Mesa 11.1.2 is now available. Currently we have:
 - 45 queued
 - 14 nominated (outstanding)
 - and 2 rejected/obsolete patches

With the current queue nothing in particular stands out - we have fixes all
over the place - core mesa, glsl, i965, nouveau, r600, radeonsi, omx. Yet
piglit shows a significant number of fixes for the software based renderers :-)

Take a look at section "Mesa stable queue" for more information.

Testing
---
The following results are against piglit d34b3f77191.


Changes - classic i965(snb)
---
None.


Changes - swrast classic

Fixes:
 - arb_debug_output
   + arb_debug_output-api_error
 fail > pass
 - arb_draw_elements_base_vertex
   + arb_draw_elements_base_vertex-bounds
 fail > pass
   + arb_draw_elements_base_vertex-drawelements
 fail > pass
   + arb_draw_elements_base_vertex-drawelements-instanced
 fail > pass
   + arb_draw_elements_base_vertex-drawelements-user_varrays
 fail > pass
 - khr_debug
   + object-label_gles2
 fail > pass
   + object-label_gles3
 fail > pass
   + push-pop-group_gl
 fail > pass
   + push-pop-group_gles2
 fail > pass
   + push-pop-group_gles3
 fail > pass
 - oes_draw_elements_base_vertex
   + oes_draw_elements_base_vertex-drawelements
 fail > pass
   + oes_draw_elements_base_vertex-multidrawelements
 fail > pass

Regressions:
 - arb_draw_elements_base_vertex
   + arb_draw_elements_base_vertex-negative-index
 fail > crash
   + arb_draw_elements_base_vertex-negative-index-user_varrays
 fail > crash


Changes - gallium softpipe
--
Fixes:
 - arb_blend_func_extended
   + arb_blend_func_extended-bindfragdataindexed-invalid-parameters
 fail > pass
   + arb_blend_func_extended-bindfragdataindexed-invalid-parameters_gles3
  fail > pass
   + arb_blend_func_extended-fbo-extended-blend
 fail > pass
   + arb_blend_func_extended-fbo-extended-blend_gles3
 fail > pass
   + arb_blend_func_extended-getfragdataindex
 fail > pass
   + arb_blend_func_extended-getfragdataindex_gles3
 fail > pass
 - arb_debug_output
   + arb_debug_output-api_error
 fail > pass
 - arb_draw_elements_base_vertex
   + arb_draw_elements_base_vertex-bounds
 fail > pass
   + arb_draw_elements_base_vertex-drawelements
 fail > pass
   + arb_draw_elements_base_vertex-drawelements-instanced
 fail > pass
   + arb_draw_elements_base_vertex-drawelements-user_varrays
 fail > pass
   + arb_draw_elements_base_vertex-drawrangeelements
 fail > pass
   + arb_draw_elements_base_vertex-multidrawelements
 fail > pass
   + arb_draw_elements_base_vertex-negative-index
 fail > pass
   + arb_draw_elements_base_vertex-negative-index-user_varrays
 fail > pass
 - arb_texture_cube_map_array
   + arb_texture_cube_map_array-cubemap
 fail > pass
 - khr_debug
   + object-label_gl
 fail > pass
   + object-label_gles2
 fail > pass
   + object-label_gles3
 fail > pass
   + push-pop-group_gl
 fail > pass
   + push-pop-group_gles2
 fail > pass
   + push-pop-group_gles3
 fail > pass
 - oes_draw_elements_base_vertex
   + oes_draw_elements_base_vertex-drawelements
 fail > pass
   + oes_draw_elements_base_vertex-drawelements-instanced
 fail > pass
   + oes_draw_elements_base_vertex-drawrangeelements
 fail > pass
   + oes_draw_elements_base_vertex-multidrawelements
 fail > pass


Changes - gallium llvmpipe (LLVM 3.7.0)
---
Fixes:
 - !opengl 3.2
   + gl_vertexid used with glmultidrawelementsbasevertex
 fail > pass
 - arb_blend_func_extended
   + arb_blend_func_extended-bindfragdataindexed-invalid-parameters
 fail > pass
   + arb_blend_func_extended-bindfragdataindexed-invalid-parameters_gles3
  fail > pass
   + arb_blend_func_extended-fbo-extended-blend
 fail > pass
   + arb_blend_func_extended-fbo-extended-blend_gles3
 fail > pass
   + arb_blend_func_extended-getfragdataindex
 fail > pass
   + arb_blend_func_extended-getfragdataindex_gles3
 fail > pass
 - arb_buffer_storage
   + bufferstorage-persistent draw coherent
 fail > pass
   + bufferstorage-persistent draw coherent client-storage
 fail > pass
   + bufferstorage-persistent read coherent
 fail > pass
   + bufferstorage-persistent read coherent client-storage
 fail > pass
 - arb_debug_output
   + arb_debug_output-api_error
 fail > pass
 - arb_draw_elements_base_vertex
   + arb_draw_elements_base_vertex-bounds
 fail > pass
   + arb_draw_elements_base_vertex-drawelements
 fail > pass
   + arb_draw_elements_base_vertex-drawelements-instanced
 fail > pass
   + arb_draw_elements_base_vertex-drawelements-user_varrays
 fail > pass
   + arb_draw_elements_base_vertex-drawrangeelements
 fail > pass
   + arb_draw_elements_base_vertex-multidrawelements
 fail > pass
   + arb_draw_elements_base_vertex-negative-index
 fail > pass

Re: [Mesa-dev] [PATCH] mesa: rewrite save_CallLists() code

2016-02-08 Thread Brian Paul

On 02/08/2016 06:25 PM, Ian Romanick wrote:

On 02/08/2016 05:06 PM, Brian Paul wrote:

When glCallLists() is compiled into a display list, preserve the call
as a single glCallLists rather than 'n' glCallList calls.  This will
matter for an upcoming display list optimization project.


I think this code is generally better than what was here before, but
reading that last sentence made me die inside just a little. :)


Yeah, I know what you mean.  But legacy GL apps will be with us for a 
long time and here at VMware we find people are very interested in 
running legacy apps on legacy OSes in VMs.


In this particular case, we have an app that makes quite a few calls to 
glCallLists() to render glBitmap text.  I'm working on a texture atlas 
optimization which puts all the glBitmap glyphs into a texture so we can 
render glCallLists() text by drawing textured quads.  This greatly 
reduces texture uploads and gives us a worthwhile speed-up.  The code's 
nearly done.  Just working on some last details...


-Brian
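
To illustrate the difference described in the commit message, here is a rough sketch of compiling a glCallLists() call into one node versus 'n' glCallList nodes. It assumes a made-up node layout and made-up function names; Mesa's real display list code is organised differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opcodes and node layout. */
enum dl_opcode_sketch { OP_CALL_LIST_SKETCH, OP_CALL_LISTS_SKETCH };

struct dl_node_sketch {
    enum dl_opcode_sketch op;
    unsigned count;    /* number of list IDs held by this node */
    unsigned *ids;     /* owned copy of the list IDs */
};

/* Fill one node with a private copy of 'count' IDs (count > 0 assumed).
 * Returns 0 on success, -1 if the allocation fails. */
static int make_node_sketch(struct dl_node_sketch *node,
                            enum dl_opcode_sketch op,
                            const unsigned *ids, unsigned count)
{
    node->ids = malloc(count * sizeof *node->ids);
    if (!node->ids)
        return -1;
    memcpy(node->ids, ids, count * sizeof *node->ids);
    node->op = op;
    node->count = count;
    return 0;
}

/* Old shape: one CALL_LIST node per ID. Returns the nodes written. */
static size_t compile_as_call_list(struct dl_node_sketch *out,
                                   const unsigned *ids, unsigned n)
{
    size_t written = 0;
    for (unsigned i = 0; i < n; i++) {
        if (make_node_sketch(&out[written], OP_CALL_LIST_SKETCH, &ids[i], 1))
            break;
        written++;
    }
    return written;
}

/* New shape: a single CALL_LISTS node that keeps the whole ID array. */
static size_t compile_as_call_lists(struct dl_node_sketch *out,
                                    const unsigned *ids, unsigned n)
{
    return make_node_sketch(&out[0], OP_CALL_LISTS_SKETCH, ids, n) == 0 ? 1 : 0;
}
```

Keeping the IDs together in one node is what lets a later pass see the whole string of glyph lists at once, which is the property a batching optimization would rely on.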



[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94040

--- Comment #7 from b...@bob131.so ---
In the specific case of blender, there are 31 hung up threads with around 9
each waiting on

__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
do_futex_wait () at ../sysdeps/unix/sysv/linux/futex-internal.h:205

The remaining threads are waiting on poll () at
../sysdeps/unix/syscall-template.S:84 with the last being the one I provided a
backtrace for. Is there any one you're interested in particular? After sampling
some of the backtraces they seem to be unrelated, either blender-specific or
just mainloops for udev and pulseaudio waiting for events, but please let me
know if there are any backtraces you'd like.



Re: [Mesa-dev] [PATCH] [v2] i965: Make sure we blit a full compressed block

2016-02-08 Thread Jason Ekstrand
On Sat, Feb 6, 2016 at 6:11 PM, Ben Widawsky 
wrote:

> This fixes an assertion failure in [at least] one of the Unreal Engine
> Linux
> demo/games that uses DXT1 compression. Specifically, the "Vehicle Game".
>
> At some point, the game ends up trying to blit a mip level whose size is 2x2,
> which is smaller than a DXT1 block. As a result, the assertion in the blit
> path
> is triggered. It should be safe to simply make sure we align the width and
> height, which is sadly an example of compression being less efficient.
>
> NOTE: The demo seems to work fine without the assert, and therefore release
> builds of mesa wouldn't stumble over this. Perhaps there is some
> unnoticeable
> corruption, but I had trouble spotting it.
>
> Thanks to Jason for looking at my backtrace and figuring out what was
> going on.
>
> v2: Use NPOT alignment to make sure ASTC is handled properly (Ilia)
> Remove comment about how this doesn't fix other bugs, because it does.
>
> Cc: Jason Ekstrand 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93358
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/intel_copy_image.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_copy_image.c
> b/src/mesa/drivers/dri/i965/intel_copy_image.c
> index 0a3337e..42bd7ff 100644
> --- a/src/mesa/drivers/dri/i965/intel_copy_image.c
> +++ b/src/mesa/drivers/dri/i965/intel_copy_image.c
> @@ -212,6 +212,7 @@ intel_copy_image_sub_data(struct gl_context *ctx,
> struct brw_context *brw = brw_context(ctx);
> struct intel_mipmap_tree *src_mt, *dst_mt;
> unsigned src_level, dst_level;
> +   GLuint bw, bh;
>
> if (_mesa_meta_CopyImageSubData_uncompressed(ctx,
>  src_image,
> src_renderbuffer,
> @@ -275,6 +276,18 @@ intel_copy_image_sub_data(struct gl_context *ctx,
> intel_miptree_all_slices_resolve_depth(brw, dst_mt);
> intel_miptree_resolve_color(brw, dst_mt);
>
> +   _mesa_get_format_block_size(src_mt->format, &bw, &bh);
> +   /* It's legal to have a WxH that's smaller than a compressed block.
> This
> +* happens for example when you are using a higher level LOD. For this
> case,
> +* we still want to copy the entire block, or else the decompression
> will be
> +* incorrect.
> +*/
> +   if (src_width < bw)
> +  src_width = ALIGN_NPOT(src_width, bw);
> +
> +   if (src_height < bh)
> +  src_height = ALIGN_NPOT(src_height, bh);
>

I've been going back-and-forth as to whether or not this is the right place
to do this or if we wanted it further up or down the stack.  At the end of
the day, I concluded that it's as good a place as any.
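
For reference, the rounding the patch relies on can be sketched as below. These helpers are hypothetical stand-ins written for this note (they are not Mesa's ALIGN_NPOT macro itself), but they show why a non-power-of-two-safe round-up is needed once block sizes such as ASTC's 6x6 are in play.

```c
/* Round value up to a multiple of alignment; valid for any non-zero
 * alignment, including non-power-of-two block sizes. */
static unsigned align_npot_sketch(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/* Mask-based rounding, shown for contrast: only correct when the
 * alignment is a power of two. */
static unsigned align_pot_sketch(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Mirror of the patch's condition: an extent smaller than one block is
 * grown to a full block, larger extents are left alone. */
static unsigned blit_extent_sketch(unsigned extent, unsigned block)
{
    return extent < block ? align_npot_sketch(extent, block) : extent;
}
```

With a 4-wide DXT1 block, a 2-pixel mip level is copied as 4 pixels; with a 6-wide block, the mask-based variant would give a wrong answer for an extent of 5, while the division-based one returns 6.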

Reviewed-by: Jason Ekstrand 
Cc "11.0 11.1" 

--Jason


> +
> if (copy_image_with_blitter(brw, src_mt, src_level,
> src_x, src_y, src_z,
> dst_mt, dst_level,
> --
> 2.7.0
>


[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94040

--- Comment #8 from Francisco Jerez  ---
(In reply to bob from comment #7)
> In the specific case of blender, there are 31 hung up threads with around 9
> each waiting on
> 
> __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> pthread_cond_wait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> do_futex_wait () at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> 
> The remaining threads are waiting on poll () at
> ../sysdeps/unix/syscall-template.S:84 with the last being the one I provided
> a backtrace for. Is there any one you're interested in particular? After
> sampling some of the backtraces they seem to be unrelated, either
> blender-specific or just mainloops for udev and pulseaudio waiting for
> events, but please let me know if there are any backtraces you'd like.

Is there any other hung thread showing mesa function calls in the stack trace? 
That would be particularly interesting.  In any case the more backtraces you
can provide the better :)



[Mesa-dev] Request for someone who understands viewports, clip control, etc

2016-02-08 Thread Ilia Mirkin
https://bugs.freedesktop.org/show_bug.cgi?id=93813

Can someone who *actually* understands what's going on rule on this
one way or the other? I've tried to catch people's attention on IRC,
but unsuccessfully. If this is a real bug, seems like a very bad
one...

  -ilia


Re: [Mesa-dev] [PATCH v2] workarounds: Update workaround names and platforms

2016-02-08 Thread Ben Widawsky
On Fri, Feb 05, 2016 at 01:59:23PM -0800, Sameer Kibey wrote:
> Update the format in which workarounds are documented
> in the source code. This allows mesa to be parsed
> by the list-workarounds utility in intel-gpu-tools.
> 
> Signed-off-by: Sameer Kibey 
> ---
> changed byt to vlv for consistency.
>  src/mesa/drivers/dri/i965/brw_binding_tables.c | 2 +-
>  src/mesa/drivers/dri/i965/brw_blorp.cpp| 2 ++
>  src/mesa/drivers/dri/i965/brw_defines.h| 3 ++-
>  src/mesa/drivers/dri/i965/brw_eu_emit.c| 3 ++-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 9 ++---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c   | 4 +++-
>  src/mesa/drivers/dri/i965/gen6_queryobj.c  | 2 +-
>  src/mesa/drivers/dri/i965/gen8_depth_state.c   | 3 ++-
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c  | 2 +-
>  9 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
> b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> index f3a0310..bcf6422 100644
> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> @@ -54,7 +54,7 @@ static uint32_t
>  reserve_hw_bt_space(struct brw_context *brw, unsigned bytes)
>  {
> /* From the Broadwell PRM, Volume 16, "Workarounds",
> -* WaStateBindingTableOverfetch:
> +* WaStateBindingTableOverfetch:hsw,bdw,chv,bxt
>  * "HW over-fetches two cache lines of binding table indices.  When
>  *  using the resource streamer, SW needs to pad binding table pointer
>  *  updates with an additional two cache lines."
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> index 1bc6d15..dd01ea8 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> @@ -304,6 +304,8 @@ brw_hiz_op_params::brw_hiz_op_params(struct 
> intel_mipmap_tree *mt,
>  * aligned to an 8x4 pixel block relative to the upper left corner
>  * of the depth buffer [...]
>  *
> +* WaHizAmbiguate8x4Aligned:hsw
> +*
>  * For hiz resolves, the rectangle must also be 8x4 aligned. Item
>  * WaHizAmbiguate8x4Aligned from the Haswell workarounds page and the
>  * Ivybridge simulator require the alignment.
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 01e0c99..5410a1d 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1756,7 +1756,8 @@ enum brw_message_target {
>  /* Dataport special binding table indices: */
>  #define BRW_BTI_STATELESS255
>  #define GEN7_BTI_SLM 254
> -/* Note that on Gen8+ BTI 255 was redefined to be IA-coherent according to 
> the
> +/* WaForceEnableNonCoherent:bdw,chv,skl,kbl
> + * Note that on Gen8+ BTI 255 was redefined to be IA-coherent according to 
> the
>   * hardware spec, however because the DRM sets bit 4 of HDC_CHICKEN0 on BDW,
>   * CHV and at least some pre-production steppings of SKL due to
>   * WaForceEnableNonCoherent, HDC memory access may have been overridden by 
> the
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 35d8039..918d69e 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -1885,7 +1885,8 @@ void brw_CMP(struct brw_codegen *p,
> brw_set_src0(p, insn, src0);
> brw_set_src1(p, insn, src1);
>  
> -   /* Item WaCMPInstNullDstForcesThreadSwitch in the Haswell Bspec 
> workarounds
> +   /* WaCMPInstNullDstForcesThreadSwitch:ivb,hsw,vlv
> +* Item WaCMPInstNullDstForcesThreadSwitch in the Haswell Bspec 
> workarounds
>  * page says:
>  *"Any CMP instruction with a null destination must use a {switch}."
>  *
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 1916a99..24d4a9d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1836,7 +1836,8 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>   brw_F16TO32(p, dst, src[0]);
>   break;
>case BRW_OPCODE_CMP:
> - /* The Ivybridge/BayTrail WaCMPInstFlagDepClearedEarly workaround 
> says
> + /* WaCMPInstFlagDepClearedEarly:ivb,hsw,vlv
> +  * The Ivybridge/BayTrail WaCMPInstFlagDepClearedEarly workaround 
> says
>* that when the destination is a GRF that the dependency-clear bit 
> on
>* the flag register is cleared early.
>*
> @@ -1928,7 +1929,8 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>  
>case BRW_OPCODE_BFI1:
>   assert(devinfo->gen >= 7);
> - /* The Haswell WaForceSIMD8ForBFIInstruction workaround says that we
> + /* 

Re: [Mesa-dev] [PATCH 4/7] mesa: remove check_compatible() in make_current

2016-02-08 Thread Ian Romanick
On 02/05/2016 01:11 PM, Miklós Máté wrote:
> this was marked for removal since 2007
> ctx::Visual is also removed, since this was its only legit user
> ---
>  .../drivers/dri/radeon/radeon_common_context.c |  2 +-
>  src/mesa/main/blend.c  |  4 +-
>  src/mesa/main/blend.h  |  4 +-
>  src/mesa/main/context.c| 89 
> ++
>  src/mesa/main/mtypes.h |  7 --
>  src/mesa/main/pixel.c  |  4 +-
>  src/mesa/main/pixel.h  |  4 +-
>  7 files changed, 15 insertions(+), 99 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.c 
> b/src/mesa/drivers/dri/radeon/radeon_common_context.c
> index 4660d98..2989f63 100644
> --- a/src/mesa/drivers/dri/radeon/radeon_common_context.c
> +++ b/src/mesa/drivers/dri/radeon/radeon_common_context.c
> @@ -604,7 +604,7 @@ GLboolean radeonMakeCurrent(__DRIcontext * driContextPriv,
>   }
>  
>   if(driDrawPriv == NULL && driReadPriv == NULL) {
> - drfb = _mesa_create_framebuffer(>glCtx.Visual);
> + drfb = _mesa_get_incomplete_framebuffer();
>   readfb = drfb;
>   }
>   else {
> diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
> index 2ae22e9..28e2dbf 100644
> --- a/src/mesa/main/blend.c
> +++ b/src/mesa/main/blend.c
> @@ -921,7 +921,7 @@ _mesa_get_render_format(const struct gl_context *ctx, 
> mesa_format format)
>   * Initializes the related fields in the context color attribute group,
>   * __struct gl_contextRec::Color.
>   */
> -void _mesa_init_color( struct gl_context * ctx )
> +void _mesa_init_color( struct gl_context * ctx, GLuint doubleBufferMode )

Mesa has changed its style (years ago, at this point) to not have the
spaces after the ( or before the ).  Since the prototype is being
updated anyway, this is a good time to fix the style.

I think I'd also prefer doubleBufferMode to be bool.

I have a bunch of comments below about the other changes, but I think
explicitly passing the double buffer mode around is a reasonable change.
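
A rough sketch of what the suggested signature could look like, with the current spacing style and a bool parameter. The struct, constants and function name are hypothetical simplifications, and the GL_FRONT fallback in the else case is an assumption, since that part of the hunk is not visible in the quoted diff.

```c
#include <stdbool.h>

/* Local copies of the GL enum values, named as sketch constants. */
#define GL_FRONT_SKETCH 0x0404
#define GL_BACK_SKETCH  0x0405

/* Hypothetical, heavily reduced color attribute group. */
struct color_attrib_sketch {
    unsigned draw_buffer;
};

/* The double buffer mode is passed in explicitly instead of being read
 * from a visual stored in the context. */
static void init_color_sketch(struct color_attrib_sketch *color,
                              bool double_buffer_mode, bool is_gles)
{
    /* GL_FRONT is not possible on GLES, so GLES always starts at GL_BACK. */
    if (double_buffer_mode || is_gles)
        color->draw_buffer = GL_BACK_SKETCH;
    else
        color->draw_buffer = GL_FRONT_SKETCH;
}
```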
>  {
> GLuint i;
>  
> @@ -951,7 +951,7 @@ void _mesa_init_color( struct gl_context * ctx )
>  
> /* GL_FRONT is not possible on GLES. Instead GL_BACK will render to either
>  * the front or the back buffer depending on the config */
> -   if (ctx->Visual.doubleBufferMode || _mesa_is_gles(ctx)) {
> +   if (doubleBufferMode || _mesa_is_gles(ctx)) {
>ctx->Color.DrawBuffer[0] = GL_BACK;
> }
> else {
> diff --git a/src/mesa/main/blend.h b/src/mesa/main/blend.h
> index 8ab9e02..f4854a6 100644
> --- a/src/mesa/main/blend.h
> +++ b/src/mesa/main/blend.h
> @@ -124,7 +124,7 @@ _mesa_update_clamp_vertex_color(struct gl_context *ctx,
>  extern mesa_format
>  _mesa_get_render_format(const struct gl_context *ctx, mesa_format format);
>  
> -extern void  
> -_mesa_init_color( struct gl_context * ctx );
> +extern void
> +_mesa_init_color( struct gl_context * ctx, GLuint doubleBufferMode );

Same comment here about fixing the spacing while changing the prototype.

>  
>  #endif
> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
> index 8b415ed..2a512c6 100644
> --- a/src/mesa/main/context.c
> +++ b/src/mesa/main/context.c
> @@ -796,7 +796,7 @@ check_context_limits(struct gl_context *ctx)
>   * functions for the more complex data structures.
>   */
>  static GLboolean
> -init_attrib_groups(struct gl_context *ctx)
> +init_attrib_groups(struct gl_context *ctx, GLuint doubleBufferMode)

Same comment here about s/GLuint/bool/

>  {
> assert(ctx);
>  
> @@ -810,7 +810,7 @@ init_attrib_groups(struct gl_context *ctx)
> _mesa_init_accum( ctx );
> _mesa_init_attrib( ctx );
> _mesa_init_buffer_objects( ctx );
> -   _mesa_init_color( ctx );
> +   _mesa_init_color( ctx, doubleBufferMode );
> _mesa_init_current( ctx );
> _mesa_init_depth( ctx );
> _mesa_init_debug( ctx );
> @@ -828,7 +828,7 @@ init_attrib_groups(struct gl_context *ctx)
> _mesa_init_multisample( ctx );
> _mesa_init_performance_monitors( ctx );
> _mesa_init_pipeline( ctx );
> -   _mesa_init_pixel( ctx );
> +   _mesa_init_pixel( ctx, doubleBufferMode );

Spaces.

> _mesa_init_pixelstore( ctx );
> _mesa_init_point( ctx );
> _mesa_init_polygon( ctx );
> @@ -1159,15 +1159,6 @@ _mesa_initialize_context(struct gl_context *ctx,
> ctx->WinSysDrawBuffer = NULL;
> ctx->WinSysReadBuffer = NULL;
>  
> -   if (visual) {
> -  ctx->Visual = *visual;
> -  ctx->HasConfig = GL_TRUE;
> -   }
> -   else {
> -  memset(&ctx->Visual, 0, sizeof ctx->Visual);
> -  ctx->HasConfig = GL_FALSE;
> -   }
> -
> _mesa_override_gl_version(ctx);
>  
> /* misc one-time initializations */
> @@ -1193,7 +1184,7 @@ _mesa_initialize_context(struct gl_context *ctx,
>  
> _mesa_reference_shared_state(ctx, &ctx->Shared, shared);
>  
> -   if (!init_attrib_groups( ctx ))

Re: [Mesa-dev] [PATCH] [v2] i965: Make sure we blit a full compressed block

2016-02-08 Thread Matt Turner
On Sat, Feb 6, 2016 at 6:11 PM, Ben Widawsky
 wrote:
> This fixes an assertion failure in [at least] one of the Unreal Engine Linux
> demo/games that uses DXT1 compression. Specifically, the "Vehicle Game".
>
> At some point, the game ends up trying to blit a mip level whose size is 2x2,
> which is smaller than a DXT1 block. As a result, the assertion in the blit 
> path
> is triggered. It should be safe to simply make sure we align the width and
> height, which is sadly an example of compression being less efficient.
>
> NOTE: The demo seems to work fine without the assert, and therefore release
> builds of mesa wouldn't stumble over this. Perhaps there is some unnoticeable
> corruption, but I had trouble spotting it.
>
> Thanks to Jason for looking at my backtrace and figuring out what was going 
> on.
>
> v2: Use NPOT alignment to make sure ASTC is handled properly (Ilia)
> Remove comment about how this doesn't fix other bugs, because it does.
>
> Cc: Jason Ekstrand 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93358

Tested-by: Matt Turner 


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Matt Arsenault

> On Feb 8, 2016, at 12:38, Marek Olšák  wrote:
> 
>> 
>> We should tell the compiler we are enabling fp-64 denorms by adding
>> +fp64-denormals to the feature string.  It would also be better to
>> read the float_mode value from the config registers emitted by the
>> compiler.
> 
> Yes, I agree, but LLVM only sets these parameters for compute or even
> HSA-only kernels, not for graphics shaders. We need to set the mode
> for all users _now_, not in 6 months. Last time I looked,
> +fp64-denormals had no effect on graphics shaders.

This is a bug. I think I left these because the config register macro names 
were different for the other shader types, even though they appeared to be the 
same thing.

-Matt


Re: [Mesa-dev] [PATCH v2 0/5] st/mesa: use PIPE_QUERY_OCCLUSION_PREDICATE

2016-02-08 Thread Nicolai Hähnle

Patch 4 & 5: Reviewed-by: Nicolai Hähnle 

On 06.02.2016 20:53, Ilia Mirkin wrote:

Resending this as a bunch of the patches ended up with updates. They
seem rather trivial, but wanted to make sure people had a chance to
object.

Ilia Mirkin (5):
   ilo: add PIPE_QUERY_OCCLUSION_PREDICATE support
   nv30: add PIPE_QUERY_OCCLUSION_PREDICATE support
   nv50: add PIPE_QUERY_OCCLUSION_PREDICATE support
   st/mesa: make use of the occlusion predicate query
   mesa: remove hack to fix up GL_ANY_SAMPLES_PASSED results

  src/gallium/drivers/ilo/ilo_draw.c   |  2 ++
  src/gallium/drivers/ilo/ilo_query.c  |  9 -
  src/gallium/drivers/ilo/ilo_render.c |  2 ++
  src/gallium/drivers/nouveau/nv30/nv30_query.c|  7 +--
  src/gallium/drivers/nouveau/nv50/nv50_query_hw.c |  6 ++
  src/mesa/main/queryobj.c |  5 -
  src/mesa/state_tracker/st_cb_queryobj.c  | 12 ++--
  7 files changed, 33 insertions(+), 10 deletions(-)




Re: [Mesa-dev] [RFCv3 11/11] mesa/st: add support for NIR as possible driver IR

2016-02-08 Thread Rob Clark
On Mon, Feb 8, 2016 at 6:58 AM, Roland Scheidegger  wrote:
> Am 06.02.2016 um 22:30 schrieb Marek Olšák:
>> On Sat, Feb 6, 2016 at 2:45 PM, Rob Clark  wrote:
>>> On Sun, Jan 31, 2016 at 3:16 PM, Rob Clark  wrote:
>>>> +   // XXX get from pipe_screen?  Or just let pipe driver provide?
>>>> +   nir_options.lower_fpow = true;
>>>> +   nir_options.lower_fsat = true;
>>>> +   nir_options.lower_scmp = true;
>>>> +   nir_options.lower_flrp = true;
>>>> +   nir_options.lower_ffract = true;
>>>> +   nir_options.native_integers = true;
>>>> +
>>>
>>>
>>> btw, one of the few remaining things to tackle is how to handle
>>> nir_shader_compiler_options struct.  To follow the existing approach
>>> of shader caps, I'd have to add a big pile of caps now, and then keep
>>> adding them as nir_shader_compiler_options struct grows.  Which seems
>>> sub-optimal.
>>>
>>> How do people feel about adding a screen->get_shader_paramp() which,
>>> along the lines of get_paramf, returns a 'const void *'?  Then we
>>> could add a single cap to return the whole compiler-options struct.
>>> (And maybe if at some point there was direct support for LLVM as an
>>> IR, it might need something similar??)
>>>
>>> Other possibility is just a pipe->get_nir_compiler_options() type
>>> hook.  A bit more of a point solution, but might make sense if we
>>> can't think of any other plausible uses for ->get_shader_paramp()..
>>> and less churn since it would only need to be implemented by drivers
>>> consuming NIR..
>>>
>>> Thoughts/opinions?
>>
>> pipe->get_nir_compiler_options() sounds good.
>>
>> Maybe wait for VMWare guys' opinion as well.
>
> Looks usable to me, albeit I'm not sure you really need NIR-specific
> options as such? That is those options above don't really look nir
> specific - maybe they aren't used with just glsl->tgsi, but it looks to
> me like they would in theory be applicable to other IR as well. Though I
> suppose if you just had compiler_options it would be a bit confusing if
> you had entries which then may not be used...

Yeah, not really NIR specific (and there are a couple that overlap w/
existing caps), other than being used only by NIR..  although it would
be a lot of churn to keep adding caps when the compiler_options struct
is extended, and it might be confusing that some of the lowering
options aren't supported in the TGSI path..

I guess right now it really only matters for two drivers, and down the
road I think we won't have more than 3 or 4 drivers using NIR, so I
suppose it is also an option to start w/
screen->get_nir_compiler_options() for now and revisit later.  If we
get to the point where we are always doing glsl->nir and then
optionally nir->tgsi for drivers that don't consume NIR directly,
maybe then it would make more sense to expose everything as caps?

BR,
-R


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Nicolai Hähnle

On 06.02.2016 07:15, Marek Olšák wrote:

From: Marek Olšák 

This fixes FP16 conversion instructions for VI, which has 16-bit floats,
but not SI & CI, which can't disable denorms for those instructions.


Reviewed-by: Nicolai Hähnle 


---
  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
  src/gallium/drivers/radeonsi/sid.h  |  3 +++
  3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index a4680ce..3f1db70 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,

si_shader_binary_read_config(binary, conf, 0);

+   /* Enable 64-bit and 16-bit denormals, because there is no performance
+* cost.
+*
+* If denormals are enabled, all floating-point output modifiers are
+* ignored.
+*
+* Don't enable denormals for 32-bit floats, because:
+* - Floating-point output modifiers would be ignored by the hw.
+* - Some opcodes don't support denormals, such as v_mad_f32. We would
+*   have to stop using those.
+* - SI & CI would be very slow.
+*/
+   conf->float_mode |= V_00B028_FP_64_DENORMS;
+
FREE(binary->config);
FREE(binary->global_symbol_offsets);
binary->config = NULL;
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index ce795c0..77a4e47 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 
4) |
   S_00B528_SGPRS((num_sgprs - 1) / 8) |
   S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B528_DX10_CLAMP(1);
+  S_00B528_DX10_CLAMP(1) |
+  S_00B528_FLOAT_MODE(shader->config.float_mode);
shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
   
S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
  }
@@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
   S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B428_SGPRS((num_sgprs - 1) / 8) |
-  S_00B428_DX10_CLAMP(1));
+  S_00B428_DX10_CLAMP(1) |
+  S_00B428_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
   S_00B42C_USER_SGPR(num_user_sgprs) |
   S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave 
> 0));
@@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
   S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B328_SGPRS((num_sgprs - 1) / 8) |
   S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B328_DX10_CLAMP(1));
+  S_00B328_DX10_CLAMP(1) |
+  S_00B328_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
   S_00B32C_USER_SGPR(num_user_sgprs) |
   S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave 
> 0));
@@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
   S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B228_SGPRS((num_sgprs - 1) / 8) |
-  S_00B228_DX10_CLAMP(1));
+  S_00B228_DX10_CLAMP(1) |
+  S_00B228_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
   S_00B22C_USER_SGPR(num_user_sgprs) |
   S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave 
> 0));
@@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct 
si_shader *gs)
   S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) |
   S_00B128_SGPRS((num_sgprs - 1) / 8) |
   S_00B128_VGPR_COMP_CNT(vgpr_comp_cnt) |
-  S_00B128_DX10_CLAMP(1));
+  S_00B128_DX10_CLAMP(1) |
+  S_00B128_FLOAT_MODE(shader->config.float_mode));
si_pm4_set_reg(pm4, R_00B12C_SPI_SHADER_PGM_RSRC2_VS,
   S_00B12C_USER_SGPR(num_user_sgprs) |
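
A minimal sketch of what the hunks above do, assuming an 8-bit float mode
value that is OR-ed with a denormal-enable mask and then packed into a field
of the RSRC1 register word. All EX_ names, bit positions, and mask values
below are placeholders chosen for illustration; the real definitions are the
ones in sid.h.

```c
#include <stdint.h>

/* Placeholder encodings, assumed for this example only. */
#define EX_FP_32_DENORMS  0x30u
#define EX_FP_64_DENORMS  0xc0u   /* also covers 16-bit floats in this sketch */
#define EX_FLOAT_MODE(x)  (((uint32_t)(x) & 0xffu) << 12)
#define EX_DX10_CLAMP(x)  (((uint32_t)(x) & 0x1u) << 21)

/* Turn on 64-bit/16-bit denormals and leave the 32-bit setting alone. */
static uint32_t
ex_enable_fp64_fp16_denorms(uint32_t float_mode)
{
   return float_mode | EX_FP_64_DENORMS;
}

/* Pack the float mode next to the other RSRC1 fields, as each
 * si_shader_*() function in the patch does for its own register.
 */
static uint32_t
ex_build_rsrc1(uint32_t float_mode)
{
   return EX_DX10_CLAMP(1) | EX_FLOAT_MODE(float_mode);
}
```

The point of routing the value through the shader config is that every
shader stage programs the same mode, whether it came from the compiler or
from the driver-side override.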
  

Re: [Mesa-dev] [PATCH] st/mesa: move VS creation in bitmap code

2016-02-08 Thread Nicolai Hähnle

On 05.02.2016 19:55, Brian Paul wrote:

Do this one-time init with the other one-time inits.


Since Bitmap is something that few programs use, wouldn't it be better 
to move in the other direction and do all the one-time inits on-demand 
rather than at context init?


Cheers,
Nicolai


---
  src/mesa/state_tracker/st_cb_bitmap.c | 26 +-
  1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
b/src/mesa/state_tracker/st_cb_bitmap.c
index d8c3dbd..f39d956 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -631,19 +631,6 @@ st_Bitmap(struct gl_context *ctx, GLint x, GLint y,
st_validate_state(st);
 }

-   if (!st->bitmap.vs) {
-  /* create pass-through vertex shader now */
-  const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
-  TGSI_SEMANTIC_COLOR,
-st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
-  TGSI_SEMANTIC_GENERIC };
-  const uint semantic_indexes[] = { 0, 0, 0 };
-  st->bitmap.vs = util_make_vertex_passthrough_shader(st->pipe, 3,
-  semantic_names,
-  semantic_indexes,
-  FALSE);
-   }
-
 if (UseBitmapCache && accum_bitmap(ctx, x, y, width, height, unpack, 
bitmap))
return;

@@ -722,6 +709,19 @@ st_init_bitmap(struct st_context *st)
assert(0);
 }

+   /* Create VS for rendering bitmaps */
+   {
+  const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
+  TGSI_SEMANTIC_COLOR,
+st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
+  TGSI_SEMANTIC_GENERIC };
+  const uint semantic_indexes[] = { 0, 0, 0 };
+  st->bitmap.vs = util_make_vertex_passthrough_shader(st->pipe, 3,
+  semantic_names,
+  semantic_indexes,
+  FALSE);
+   }
+
 /* alloc bitmap cache object */
 st->bitmap.cache = ST_CALLOC_STRUCT(bitmap_cache);





Re: [Mesa-dev] [PATCH 1/2] draw: use util_pstipple_create_fragment_shader

2016-02-08 Thread Nicolai Hähnle

Ping?

On 22.01.2016 11:56, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

This reduces code duplication. It also adds support for drivers where the
fragment position is a system value.

Suggested-by: Jose Fonseca 
---
A basic polygon stippling test shows no regression on llvmpipe, but that's
the extent of my testing.

  src/gallium/auxiliary/draw/draw_pipe_pstipple.c | 209 ++--
  1 file changed, 12 insertions(+), 197 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c 
b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
index cf52ca4..e468cc3 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
@@ -43,10 +43,10 @@
  #include "util/u_format.h"
  #include "util/u_math.h"
  #include "util/u_memory.h"
+#include "util/u_pstipple.h"
  #include "util/u_sampler.h"

  #include "tgsi/tgsi_transform.h"
-#include "tgsi/tgsi_dump.h"

  #include "draw_context.h"
  #include "draw_pipe.h"
@@ -114,178 +114,6 @@ struct pstip_stage
  };


-
-/**
- * Subclass of tgsi_transform_context, used for transforming the
- * user's fragment shader to add the extra texture sample and fragment kill
- * instructions.
- */
-struct pstip_transform_context {
-   struct tgsi_transform_context base;
-   uint tempsUsed;  /**< bitmask */
-   int wincoordInput;
-   int maxInput;
-   uint samplersUsed;  /**< bitfield of samplers used */
-   bool hasSview;
-   int freeSampler;  /** an available sampler for the pstipple */
-   int texTemp;  /**< temp registers */
-   int numImmed;
-};
-
-
-/**
- * TGSI declaration transform callback.
- * Look for a free sampler, a free input attrib, and two free temp regs.
- */
-static void
-pstip_transform_decl(struct tgsi_transform_context *ctx,
- struct tgsi_full_declaration *decl)
-{
-   struct pstip_transform_context *pctx = (struct pstip_transform_context *) 
ctx;
-
-   if (decl->Declaration.File == TGSI_FILE_SAMPLER) {
-  uint i;
-  for (i = decl->Range.First;
-   i <= decl->Range.Last; i++) {
- pctx->samplersUsed |= 1 << i;
-  }
-   }
-   else if (decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) {
-  pctx->hasSview = true;
-   }
-   else if (decl->Declaration.File == TGSI_FILE_INPUT) {
-  pctx->maxInput = MAX2(pctx->maxInput, (int) decl->Range.Last);
-  if (decl->Semantic.Name == TGSI_SEMANTIC_POSITION)
- pctx->wincoordInput = (int) decl->Range.First;
-   }
-   else if (decl->Declaration.File == TGSI_FILE_TEMPORARY) {
-  uint i;
-  for (i = decl->Range.First;
-   i <= decl->Range.Last; i++) {
- pctx->tempsUsed |= (1 << i);
-  }
-   }
-
-   ctx->emit_declaration(ctx, decl);
-}
-
-
-/**
- * TGSI immediate declaration transform callback.
- * We're just counting the number of immediates here.
- */
-static void
-pstip_transform_immed(struct tgsi_transform_context *ctx,
-  struct tgsi_full_immediate *immed)
-{
-   struct pstip_transform_context *pctx = (struct pstip_transform_context *) 
ctx;
-   ctx->emit_immediate(ctx, immed); /* emit to output shader */
-   pctx->numImmed++;
-}
-
-
-/**
- * Find the lowest zero bit in the given word, or -1 if bitfield is all ones.
- */
-static int
-free_bit(uint bitfield)
-{
-   return ffs(~bitfield) - 1;
-}
-
-
-/**
- * TGSI transform prolog callback.
- */
-static void
-pstip_transform_prolog(struct tgsi_transform_context *ctx)
-{
-   struct pstip_transform_context *pctx = (struct pstip_transform_context *) 
ctx;
-   uint i;
-   int wincoordInput;
-
-   /* find free sampler */
-   pctx->freeSampler = free_bit(pctx->samplersUsed);
-   if (pctx->freeSampler >= PIPE_MAX_SAMPLERS)
-  pctx->freeSampler = PIPE_MAX_SAMPLERS - 1;
-
-   if (pctx->wincoordInput < 0)
-  wincoordInput = pctx->maxInput + 1;
-   else
-  wincoordInput = pctx->wincoordInput;
-
-   /* find one free temp reg */
-   for (i = 0; i < 32; i++) {
-  if ((pctx->tempsUsed & (1 << i)) == 0) {
-  /* found a free temp */
-  if (pctx->texTemp < 0)
- pctx->texTemp  = i;
-  else
- break;
-  }
-   }
-   assert(pctx->texTemp >= 0);
-
-   if (pctx->wincoordInput < 0) {
-  /* declare new position input reg */
-  tgsi_transform_input_decl(ctx, wincoordInput,
-TGSI_SEMANTIC_POSITION, 1,
-TGSI_INTERPOLATE_LINEAR);
-   }
-
-   /* declare new sampler */
-   tgsi_transform_sampler_decl(ctx, pctx->freeSampler);
-
-   /* if the src shader has SVIEW decl's for each SAMP decl, we
-* need to continue the trend and ensure there is a matching
-* SVIEW for the new SAMP we just created
-*/
-   if (pctx->hasSview) {
-  tgsi_transform_sampler_view_decl(ctx,
-   pctx->freeSampler,
-   TGSI_TEXTURE_2D,
-   TGSI_RETURN_TYPE_FLOAT);
-   }
-
-   

Re: [Mesa-dev] [PATCH] gallium: pass the robust buffer access context flag to drivers

2016-02-08 Thread Nicolai Hähnle

On 06.02.2016 16:26, Marek Olšák wrote:

From: Marek Olšák 

radeonsi will not do bounds checking for loads if this is not set.
---
  src/gallium/include/pipe/p_defines.h | 6 ++
  src/mesa/state_tracker/st_manager.c  | 6 +-
  2 files changed, 11 insertions(+), 1 deletion(-)


Reviewed-by: Nicolai Hähnle 



diff --git a/src/gallium/include/pipe/p_defines.h 
b/src/gallium/include/pipe/p_defines.h
index 800f16c..b01f6ea 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -349,6 +349,12 @@ enum pipe_flush_flags
  #define PIPE_CONTEXT_DEBUG (1 << 1)

  /**
+ * Whether out-of-bounds shader loads must return zero and out-of-bounds
+ * shader stores must be dropped.
+ */
+#define PIPE_CONTEXT_ROBUST_BUFFER_ACCESS (1 << 2)
+
+/**
   * Flags for pipe_context::memory_barrier.
   */
  #define PIPE_BARRIER_MAPPED_BUFFER (1 << 0)
diff --git a/src/mesa/state_tracker/st_manager.c 
b/src/mesa/state_tracker/st_manager.c
index 385e26b..162810f 100644
--- a/src/mesa/state_tracker/st_manager.c
+++ b/src/mesa/state_tracker/st_manager.c
@@ -635,6 +635,7 @@ st_api_create_context(struct st_api *stapi, struct 
st_manager *smapi,
 struct pipe_context *pipe;
 struct gl_config mode;
 gl_api api;
+   unsigned ctx_flags = 0;

 if (!(stapi->profile_mask & (1 << attribs->profile)))
return NULL;
@@ -658,7 +659,10 @@ st_api_create_context(struct st_api *stapi, struct 
st_manager *smapi,
break;
 }

-   pipe = smapi->screen->context_create(smapi->screen, NULL, 0);
+   if (attribs->flags & ST_CONTEXT_FLAG_ROBUST_ACCESS)
+  ctx_flags |= PIPE_CONTEXT_ROBUST_BUFFER_ACCESS;
+
+   pipe = smapi->screen->context_create(smapi->screen, NULL, ctx_flags);
 if (!pipe) {
*error = ST_CONTEXT_ERROR_NO_MEMORY;
return NULL;
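
A minimal sketch of the two halves of this change, assuming a frontend flag
that is translated into a context-creation flag, plus a software model of
the semantics the new comment describes (out-of-bounds loads return zero,
out-of-bounds stores are dropped). The EX_ names and the load/store helpers
are hypothetical; only the (1 << 2) value mirrors the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_ST_FLAG_ROBUST_ACCESS       (1u << 0)  /* assumed value */
#define EX_CTX_ROBUST_BUFFER_ACCESS    (1u << 2)  /* as in the patch */

/* Frontend side: map API-level attributes to context creation flags. */
static unsigned
ex_translate_flags(unsigned st_flags)
{
   unsigned ctx_flags = 0;

   if (st_flags & EX_ST_FLAG_ROBUST_ACCESS)
      ctx_flags |= EX_CTX_ROBUST_BUFFER_ACCESS;
   return ctx_flags;
}

/* Driver side, modelled in software: a robust load returns zero when the
 * index is out of bounds instead of reading past the buffer.
 */
static uint32_t
ex_robust_load(const uint32_t *buf, size_t count, size_t index)
{
   return index < count ? buf[index] : 0;
}

/* A robust store is silently dropped when the index is out of bounds. */
static void
ex_robust_store(uint32_t *buf, size_t count, size_t index, uint32_t value)
{
   if (index < count)
      buf[index] = value;
}
```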




Re: [Mesa-dev] [RFCv3 11/11] mesa/st: add support for NIR as possible driver IR

2016-02-08 Thread Rob Clark
On Mon, Feb 8, 2016 at 1:01 PM, Ilia Mirkin  wrote:
> On Mon, Feb 8, 2016 at 8:59 AM, Rob Clark  wrote:
>> On Mon, Feb 8, 2016 at 6:58 AM, Roland Scheidegger  
>> wrote:
>>> Am 06.02.2016 um 22:30 schrieb Marek Olšák:
>>>> On Sat, Feb 6, 2016 at 2:45 PM, Rob Clark  wrote:
>>>>> On Sun, Jan 31, 2016 at 3:16 PM, Rob Clark  wrote:
>>>>>> +   // XXX get from pipe_screen?  Or just let pipe driver provide?
>>>>>> +   nir_options.lower_fpow = true;
>>>>>> +   nir_options.lower_fsat = true;
>>>>>> +   nir_options.lower_scmp = true;
>>>>>> +   nir_options.lower_flrp = true;
>>>>>> +   nir_options.lower_ffract = true;
>>>>>> +   nir_options.native_integers = true;
>>>>>> +
>>>>>
>>>>>
>>>>> btw, one of the few remaining things to tackle is how to handle
>>>>> nir_shader_compiler_options struct.  To follow the existing approach
>>>>> of shader caps, I'd have to add a big pile of caps now, and then keep
>>>>> adding them as nir_shader_compiler_options struct grows.  Which seems
>>>>> sub-optimal.
>>>>>
>>>>> How do people feel about adding a screen->get_shader_paramp() which,
>>>>> along the lines of get_paramf, returns a 'const void *'?  Then we
>>>>> could add a single cap to return the whole compiler-options struct.
>>>>> (And maybe if at some point there was direct support for LLVM as an
>>>>> IR, it might need something similar??)
>>>>>
>>>>> Other possibility is just a pipe->get_nir_compiler_options() type
>>>>> hook.  A bit more of a point solution, but might make sense if we
>>>>> can't think of any other plausible uses for ->get_shader_paramp()..
>>>>> and less churn since it would only need to be implemented by drivers
>>>>> consuming NIR..
>>>>>
>>>>> Thoughts/opinions?
>>>>
>>>> pipe->get_nir_compiler_options() sounds good.
>>>>
>>>> Maybe wait for VMWare guys' opinion as well.
>>>
>>> Looks usable to me, albeit I'm not sure you really need NIR-specific
>>> options as such? That is those options above don't really look nir
>>> specific - maybe they aren't used with just glsl->tgsi, but it looks to
>>> me like they would in theory be applicable to other IR as well. Though I
>>> suppose if you just had compiler_options it would be a bit confusing if
>>> you had entries which then may not be used...
>>
>> Yeah, not really NIR specific (and there are a couple that overlap w/
>> existing caps), other than being used only by NIR..  although it would
>> be a lot of churn to keep adding caps when the compiler_options struct
>> is extended, and it might be confusing that some of the lowering
>> options aren't supported in the TGSI path..
>>
>> I guess right now it really only matters for two drivers, and down the
>> road I think we won't have more than 3 or 4 drivers using NIR, so I
>> suppose it is also an option to start w/
>> screen->get_nir_compiler_options() for now and revisit later.  If we
>> get to the point where we are always doing glsl->nir and then
>> optionally nir->tgsi for drivers that don't consume NIR directly,
>> maybe then it would make more sense to expose everything as caps?
>
> I actually kinda want this for TGSI as well, eventually. Perhaps something 
> like
>
> bool get_compiler_options(pipe_shader_ir, void *)

perhaps:

  const struct pipe_compiler_options * (*get_compiler_options)(struct
pipe_screen *, unsigned shader)

imo, it should take shader stage as arg so we can have different
config per stage, and return a const ptr.. and I think it could
directly return options struct (no need for bool return)..

I suppose if you plan to add lots of knobs to twiddle for TGSI then
shader cap for each would be annoying.  Although not super-thrilled
about having to translate from generic(ish) pipe struct to nir struct,
since the driver will already want a const version of the nir struct
for the tgsi_to_nir case.

I guess we could do:

  const void * (*get_compiler_options)(struct pipe_screen *, unsigned
shader, enum pipe_shader_ir type)

where the return value could be 'struct tgsi_shader_options *' or
'struct nir_shader_compiler_options *', etc..

hmm.. also not sure how to roll that out without a flag day.  Perhaps
keep the shader params for now (for tgsi) with a helper to populate a
tgsi_compiler_options struct for drivers where
screen->get_compiler_options() is null.. (and then what about other
st's?)

BR,
-R

> or perhaps struct pipe_compiler_options * (which would contain some
> common stuff + a union of the per-ir options) instead of the void *
> would make more sense? The reason I'm interested is to be able to
> indicate that various frontend opts should be disabled. Also it could
> be used to get rid of a bunch of PIPE_CAP_TGSI_XYZ's, which are a huge
> pain to add all the time.
>
>   -ilia
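
A minimal sketch of the last variant discussed above, assuming a single
screen hook that takes the IR type and shader stage and returns a const
pointer to a per-IR options struct. Every type and function name here is a
hypothetical stand-in (prefixed ex_), not an existing gallium interface. The
NULL check on the hook models the "no flag day" concern: drivers that do not
implement it keep working through the existing shader caps.

```c
#include <stdbool.h>
#include <stddef.h>

enum ex_shader_ir { EX_IR_TGSI, EX_IR_NIR };
enum ex_shader_stage { EX_STAGE_VERTEX, EX_STAGE_FRAGMENT };

/* Stand-in for an IR-specific options struct. */
struct ex_nir_options {
   bool lower_fpow;
   bool lower_fsat;
   bool native_integers;
};

struct ex_screen {
   /* Optional hook: may be NULL for drivers that only expose caps. */
   const void *(*get_compiler_options)(struct ex_screen *screen,
                                       enum ex_shader_ir ir,
                                       unsigned shader);
};

static const struct ex_nir_options ex_driver_nir_options = {
   .lower_fpow = true,
   .lower_fsat = true,
   .native_integers = true,
};

/* Driver side: hand out a const struct, no translation needed. */
static const void *
ex_driver_get_compiler_options(struct ex_screen *screen,
                               enum ex_shader_ir ir, unsigned shader)
{
   (void)screen;
   (void)shader;   /* this sketch uses the same options for every stage */
   return ir == EX_IR_NIR ? &ex_driver_nir_options : NULL;
}

/* State tracker side: NULL means "fall back to the per-option caps". */
static const struct ex_nir_options *
ex_query_nir_options(struct ex_screen *screen, unsigned shader)
{
   if (!screen->get_compiler_options)
      return NULL;
   return screen->get_compiler_options(screen, EX_IR_NIR, shader);
}
```

Returning the pointer directly, rather than filling a caller-provided struct
and returning bool, lets the driver reuse the same const struct it already
needs for the tgsi_to_nir path.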


Re: [Mesa-dev] [PATCH 4/5] nir: Handle large unsigned values in opt_algebraic.

2016-02-08 Thread Dylan Baker
This seems perfectly fine to me. For what it's worth:

Reviewed-by: Dylan Baker 

Quoting Matt Turner (2016-02-04 17:48:00)
> The next patch adds an algebraic rule that uses the constant 0xff00ff00.
> 
> Without this change, the build fails with
> 
>return hex(struct.unpack('I', struct.pack('i', self.value))[0])
>struct.error: 'i' format requires -2147483648 <= number <= 2147483647
> 
> The hex() function handles integers of any size, and assigning a
> negative value to an unsigned does what we want in C. The pack/unpack is
> unnecessary (and as we see, buggy).
> ---
>  src/compiler/nir/nir_algebraic.py | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_algebraic.py 
> b/src/compiler/nir/nir_algebraic.py
> index 77ad35e..2357b57 100644
> --- a/src/compiler/nir/nir_algebraic.py
> +++ b/src/compiler/nir/nir_algebraic.py
> @@ -102,13 +102,10 @@ class Constant(Value):
>self.value = val
>  
> def __hex__(self):
> -  # Even if it's an integer, we still need to unpack as an unsigned
> -  # int.  This is because, without C99, we can only assign to the first
> -  # element of a union in an initializer.
>if isinstance(self.value, (bool)):
>   return 'NIR_TRUE' if self.value else 'NIR_FALSE'
>if isinstance(self.value, (int, long)):
> - return hex(struct.unpack('I', struct.pack('i', self.value))[0])
> + return hex(self.value)
>elif isinstance(self.value, float):
>   return hex(struct.unpack('I', struct.pack('f', self.value))[0])
>else:
> -- 
> 2.4.10
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
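
The claim that assigning a negative value to an unsigned type does the right
thing in C can be checked with a small sketch. The helper name ex_to_u32 is
hypothetical; it relies only on the C rule that conversion to an unsigned
type is performed modulo 2^32 for a 32-bit type, so the generated table can
carry either the signed or the unsigned spelling of a constant.

```c
#include <stdint.h>

/* Conversion to uint32_t is well defined for any integer input:
 * the result is the value reduced modulo 2^32.
 */
static uint32_t
ex_to_u32(int64_t value)
{
   return (uint32_t)value;
}
```

Under that rule both 0xff00ff00 and its signed 32-bit reading, -16711936,
end up as the same bit pattern.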




Re: [Mesa-dev] [PATCH v2] workarounds: Update workaround names and platforms

2016-02-08 Thread Kibey, Sameer


> -Original Message-
> From: Widawsky, Benjamin
> Sent: Saturday, February 06, 2016 10:30 AM
> To: Kibey, Sameer
> Cc: mesa-dev@lists.freedesktop.org; Sharp, Sarah A
> Subject: Re: [PATCH v2] workarounds: Update workaround names and
> platforms
> 
> On Fri, Feb 05, 2016 at 01:59:23PM -0800, Sameer Kibey wrote:
> > Update the format in which workarounds are documented in the source
> > code. This allows mesa to be parsed by the list-workarounds utility in
> > intel-gpu-tools.
> >
> > Signed-off-by: Sameer Kibey 
> 
> Do you have any plan for updating these as we add new platforms to mesa? I
> foresee a problem of these getting stale. I wonder how the drm-intel devs
> deal with that.

To update this for the new platforms should be a trivial patch. I do not see 
any issues with that.
 
> [snip]
> 
> --
> Ben Widawsky, Intel Open Source Technology Center


Re: [Mesa-dev] [PATCH v2] workarounds: Update workaround names and platforms

2016-02-08 Thread Ben Widawsky
On Mon, Feb 08, 2016 at 09:09:30AM -0800, Kibey, Sameer wrote:
> 
> 
> > -Original Message-
> > From: Widawsky, Benjamin
> > Sent: Saturday, February 06, 2016 10:30 AM
> > To: Kibey, Sameer
> > Cc: mesa-dev@lists.freedesktop.org; Sharp, Sarah A
> > Subject: Re: [PATCH v2] workarounds: Update workaround names and
> > platforms
> > 
> > On Fri, Feb 05, 2016 at 01:59:23PM -0800, Sameer Kibey wrote:
> > > Update the format in which workarounds are documented in the source
> > > code. This allows mesa to be parsed by the list-workarounds utility in
> > > intel-gpu-tools.
> > >
> > > Signed-off-by: Sameer Kibey 
> > 
> > Do you have any plan for updating these as we add new platforms to mesa? I
> > foresee a problem of these getting stale. I wonder how the drm-intel devs
> > deal with that.
> 
> To update this for the new platforms should be a trivial patch. I do not see 
> any issues with that.

The issue is remembering to do it. But I take it that we have no good solution
for that.

>  
> > [snip]
> > 
> > --
> > Ben Widawsky, Intel Open Source Technology Center

-- 
Ben Widawsky, Intel Open Source Technology Center


Re: [Mesa-dev] [PATCH 3/3] st/mesa: don't allocate bitmap drawing state until needed

2016-02-08 Thread Gustaw Smolarczyk
2016-02-08 18:07 GMT+01:00 Brian Paul :
> Most apps don't use glBitmap so don't allocate the bitmap cache or
> gallium state objects/shaders/etc until the first call to st_Bitmap().
> ---
>  src/mesa/state_tracker/st_cb_bitmap.c | 145 
> ++
>  src/mesa/state_tracker/st_cb_bitmap.h |   3 -
>  src/mesa/state_tracker/st_context.c   |   1 -
>  3 files changed, 77 insertions(+), 72 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
> b/src/mesa/state_tracker/st_cb_bitmap.c
> index c26ee7f..ca1dfab 100644
> --- a/src/mesa/state_tracker/st_cb_bitmap.c
> +++ b/src/mesa/state_tracker/st_cb_bitmap.c
> @@ -497,8 +497,9 @@ create_cache_trans(struct st_context *st)
>  void
>  st_flush_bitmap_cache(struct st_context *st)
>  {
> -   if (!st->bitmap.cache->empty) {
> -  struct bitmap_cache *cache = st->bitmap.cache;
> +   struct bitmap_cache *cache = st->bitmap.cache;
> +
> +   if (cache && !st->bitmap.cache->empty) {
Maybe do the following:

if (cache && !cache->empty) {

>struct pipe_context *pipe = st->pipe;
>struct pipe_sampler_view *sv;
>
> @@ -617,6 +618,76 @@ accum_bitmap(struct gl_context *ctx,
>  }
>
>
> +/**
> + * One-time init for drawing bitmaps.
> + */
> +static void
> +init_bitmap_state(struct st_context *st)
> +{
> +   struct pipe_sampler_state *sampler = &st->bitmap.samplers[0];
> +   struct pipe_context *pipe = st->pipe;
> +   struct pipe_screen *screen = pipe->screen;
> +
> +   /* This function should only be called once */
> +   assert(st->bitmap.cache == NULL);
> +
> +   /* alloc bitmap cache object */
> +   st->bitmap.cache = ST_CALLOC_STRUCT(bitmap_cache);
> +
> +   /* init sampler state once */
> +   memset(sampler, 0, sizeof(*sampler));
> +   sampler->wrap_s = PIPE_TEX_WRAP_CLAMP;
> +   sampler->wrap_t = PIPE_TEX_WRAP_CLAMP;
> +   sampler->wrap_r = PIPE_TEX_WRAP_CLAMP;
> +   sampler->min_img_filter = PIPE_TEX_FILTER_NEAREST;
> +   sampler->min_mip_filter = PIPE_TEX_MIPFILTER_NONE;
> +   sampler->mag_img_filter = PIPE_TEX_FILTER_NEAREST;
> +   st->bitmap.samplers[1] = *sampler;
> +   st->bitmap.samplers[1].normalized_coords = 1;
> +
> +   /* init baseline rasterizer state once */
> +   memset(&st->bitmap.rasterizer, 0, sizeof(st->bitmap.rasterizer));
> +   st->bitmap.rasterizer.half_pixel_center = 1;
> +   st->bitmap.rasterizer.bottom_edge_rule = 1;
> +   st->bitmap.rasterizer.depth_clip = 1;
> +
> +   /* find a usable texture format */
> +   if (screen->is_format_supported(screen, PIPE_FORMAT_I8_UNORM,
> +   PIPE_TEXTURE_2D, 0,
> +   PIPE_BIND_SAMPLER_VIEW)) {
> +  st->bitmap.tex_format = PIPE_FORMAT_I8_UNORM;
> +   }
> +   else if (screen->is_format_supported(screen, PIPE_FORMAT_A8_UNORM,
> +PIPE_TEXTURE_2D, 0,
> +PIPE_BIND_SAMPLER_VIEW)) {
> +  st->bitmap.tex_format = PIPE_FORMAT_A8_UNORM;
> +   }
> +   else if (screen->is_format_supported(screen, PIPE_FORMAT_L8_UNORM,
> +PIPE_TEXTURE_2D, 0,
> +PIPE_BIND_SAMPLER_VIEW)) {
> +  st->bitmap.tex_format = PIPE_FORMAT_L8_UNORM;
> +   }
> +   else {
> +  /* XXX support more formats */
> +  assert(0);
> +   }
> +
> +   /* Create the vertex shader */
> +   {
> +  const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
> +  TGSI_SEMANTIC_COLOR,
> +st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
> +  TGSI_SEMANTIC_GENERIC };
> +  const uint semantic_indexes[] = { 0, 0, 0 };
> +  st->bitmap.vs = util_make_vertex_passthrough_shader(st->pipe, 3,
> +  semantic_names,
> +  semantic_indexes,
> +  FALSE);
> +   }
> +
> +   reset_cache(st);
> +}
> +
>
>  /**
>   * Called via ctx->Driver.Bitmap()
> @@ -632,6 +703,10 @@ st_Bitmap(struct gl_context *ctx, GLint x, GLint y,
> assert(width > 0);
> assert(height > 0);
>
> +   if (!st->bitmap.cache) {
> +  init_bitmap_state(st);
> +   }
> +
> /* We only need to validate state of the st dirty flags are set or
>  * any non-_NEW_PROGRAM_CONSTANTS mesa flags are set.  The VS we use
>  * for bitmap drawing uses no constants and the FS constants are
> @@ -641,19 +716,6 @@ st_Bitmap(struct gl_context *ctx, GLint x, GLint y,
>st_validate_state(st);
> }
>
> -   if (!st->bitmap.vs) {
> -  /* create pass-through vertex shader now */
> -  const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
> -  TGSI_SEMANTIC_COLOR,
> -st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
> -  TGSI_SEMANTIC_GENERIC };
> -  const uint 
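
A minimal sketch of the on-demand pattern this patch introduces, assuming a
context whose bitmap cache pointer doubles as the "already initialized"
marker. The struct and function names are hypothetical simplifications of
the ones in st_cb_bitmap.c. It also shows why the flush path needs the NULL
check suggested in the review: flush can run before any bitmap was drawn.

```c
#include <stdbool.h>
#include <stdlib.h>

struct ex_bitmap_cache {
   bool empty;
};

struct ex_context {
   struct ex_bitmap_cache *cache;   /* NULL until the first bitmap call */
   unsigned init_count;             /* only here to observe the sketch */
};

/* One-time init, called lazily from the first draw. */
static void
ex_init_bitmap_state(struct ex_context *st)
{
   st->cache = calloc(1, sizeof(*st->cache));
   if (st->cache)
      st->cache->empty = true;
   st->init_count++;
}

/* Draw entry point: allocate state on first use, then accumulate. */
static void
ex_bitmap(struct ex_context *st)
{
   if (!st->cache)
      ex_init_bitmap_state(st);
   if (!st->cache)
      return;   /* allocation failed */
   st->cache->empty = false;
}

/* Flush must tolerate a context that never drew a bitmap.
 * Returns true if there was something to flush.
 */
static bool
ex_flush_bitmap_cache(struct ex_context *st)
{
   struct ex_bitmap_cache *cache = st->cache;

   if (cache && !cache->empty) {
      cache->empty = true;
      return true;
   }
   return false;
}
```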

Re: [Mesa-dev] [PATCH 3/5] nir: Do opt_algebraic in reverse order.

2016-02-08 Thread Jason Ekstrand
On Feb 8, 2016 9:17 AM, "Matt Turner"  wrote:
>
> On Sun, Feb 7, 2016 at 8:06 AM, Jason Ekstrand 
wrote:
> >
> > On Feb 4, 2016 5:45 PM, "Matt Turner"  wrote:
> >>
> >> Walking the SSA definitions in order means that we consider the
smallest
> >> algebraic optimizations before larger optimizations. So if a smaller
> >> rule is part of a larger rule, the smaller one will happen first,
> >> preventing the larger one from happening.
> >>
> >> instructions in affected programs: 32721 -> 32611 (-0.34%)
> >> helped: 106
> >>
> >> Prevents regressions and annoyances in the next commits.
> >
> > Mind doing just a little tooling to try and determine whether or not
this
> > increases the number of times the optimization loop runs?  Some
> > Optimizations may immediately allow some other optimization on their
result
> > which will now have to wait until the next time through the loop.
>
> In affected programs (129 of them):
>
> before:  1164 optimization loops
> after: 1071 optimization loops
>
> Of the 129 affected, 16 programs' optimization loop counts increased.

Good enough for me. Please add that to the commit message. R-B
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFCv3 11/11] mesa/st: add support for NIR as possible driver IR

2016-02-08 Thread Roland Scheidegger
Am 08.02.2016 um 14:59 schrieb Rob Clark:
> On Mon, Feb 8, 2016 at 6:58 AM, Roland Scheidegger  wrote:
>> Am 06.02.2016 um 22:30 schrieb Marek Olšák:
>>> On Sat, Feb 6, 2016 at 2:45 PM, Rob Clark  wrote:
>>>> On Sun, Jan 31, 2016 at 3:16 PM, Rob Clark  wrote:
>>>>> +   // XXX get from pipe_screen?  Or just let pipe driver provide?
>>>>> +   nir_options.lower_fpow = true;
>>>>> +   nir_options.lower_fsat = true;
>>>>> +   nir_options.lower_scmp = true;
>>>>> +   nir_options.lower_flrp = true;
>>>>> +   nir_options.lower_ffract = true;
>>>>> +   nir_options.native_integers = true;
>>>>> +
>>>>
>>>>
>>>> btw, one of the few remaining things to tackle is how to handle
>>>> nir_shader_compiler_options struct.  To follow the existing approach
>>>> of shader caps, I'd have to add a big pile of caps now, and then keep
>>>> adding them as nir_shader_compiler_options struct grows.  Which seems
>>>> sub-optimal.
>>>>
>>>> How do people feel about adding a screen->get_shader_paramp() which,
>>>> along the lines of get_paramf, returns a 'const void *'?  Then we
>>>> could add a single cap to return the whole compiler-options struct.
>>>> (And maybe if at some point there was direct support for LLVM as an
>>>> IR, it might need something similar??)
>>>>
>>>> Other possibility is just a pipe->get_nir_compiler_options() type
>>>> hook.  A bit more of a point solution, but might make sense if we
>>>> can't think of any other plausible uses for ->get_shader_paramp()..
>>>> and less churn since it would only need to be implemented by drivers
>>>> consuming NIR..
>>>>
>>>> Thoughts/opinions?
>>>
>>> pipe->get_nir_compiler_options() sounds good.
>>>
>>> Maybe wait for VMWare guys' opinion as well.
>>
>> Looks usable to me, albeit I'm not sure you really need NIR-specific
>> options as such? That is those options above don't really look nir
>> specific - maybe they aren't used with just glsl->tgsi, but it looks to
>> me like they would in theory be applicable to other IR as well. Though I
>> suppose if you just had compiler_options it would be a bit confusing if
>> you had entries which then may not be used...
> 
> Yeah, not really NIR specific (and there are a couple that overlap w/
> existing caps), other than being used only by NIR..  although it would
> be a lot of churn to keep adding caps when the compiler_options struct
> is extended, and it might be confusing that some of the lowering
> options aren't supported in the TGSI path..
> 
> I guess right now it really only matters for two drivers, and down the
> road I think we won't have more than 3 or 4 drivers using NIR, so I
> suppose it is also an option to start w/
> screen->get_nir_compiler_options() for now and revisit later.  If we
> get to the point where we are always doing glsl->nir and then
> optionally nir->tgsi for drivers that don't consume NIR directly,
> maybe then it would make more sense to expose everything as caps?
> 

Probably. In any case, it looks like it would be easy enough to change
later, so whatever solution looks good now is ok.

Roland




Re: [Mesa-dev] [PATCH 3/3] st/mesa: don't allocate bitmap drawing state until needed

2016-02-08 Thread Brian Paul

On 02/08/2016 10:10 AM, Gustaw Smolarczyk wrote:

2016-02-08 18:07 GMT+01:00 Brian Paul :

Most apps don't use glBitmap so don't allocate the bitmap cache or
gallium state objects/shaders/etc until the first call to st_Bitmap().
---
  src/mesa/state_tracker/st_cb_bitmap.c | 145 ++
  src/mesa/state_tracker/st_cb_bitmap.h |   3 -
  src/mesa/state_tracker/st_context.c   |   1 -
  3 files changed, 77 insertions(+), 72 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
b/src/mesa/state_tracker/st_cb_bitmap.c
index c26ee7f..ca1dfab 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -497,8 +497,9 @@ create_cache_trans(struct st_context *st)
  void
  st_flush_bitmap_cache(struct st_context *st)
  {
-   if (!st->bitmap.cache->empty) {
-  struct bitmap_cache *cache = st->bitmap.cache;
+   struct bitmap_cache *cache = st->bitmap.cache;
+
+   if (cache && !st->bitmap.cache->empty) {

Maybe do the following:

if (cache && !cache->empty) {



Yes.  Thanks.

-Brian
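
The pattern discussed above (allocate the bitmap state only on first use, and make the flush path tolerate a cache that was never created) can be sketched roughly as follows. This is a toy Python model, not the state tracker code: the class names, fields, and methods are hypothetical stand-ins chosen for illustration.

```python
class BitmapCache:
    """Hypothetical stand-in for the bitmap cache; only tracks emptiness."""

    def __init__(self):
        self.empty = True


class Context:
    """Hypothetical stand-in for a context whose cache is created lazily."""

    def __init__(self):
        self.cache = None   # nothing is allocated at context creation
        self.flushes = 0

    def bitmap(self):
        # The first call allocates the cache on demand.
        if self.cache is None:
            self.cache = BitmapCache()
        self.cache.empty = False

    def flush_bitmap_cache(self):
        cache = self.cache
        # Use the local variable for both checks, as suggested in the review.
        if cache is not None and not cache.empty:
            self.flushes += 1
            cache.empty = True


ctx = Context()
ctx.flush_bitmap_cache()   # safe: no cache exists yet, nothing to flush
ctx.bitmap()
ctx.flush_bitmap_cache()   # flushes the one pending bitmap
```

The point of the null check is that a flush can now be requested before any bitmap has ever been drawn.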




Re: [Mesa-dev] [RFCv3 11/11] mesa/st: add support for NIR as possible driver IR

2016-02-08 Thread Ilia Mirkin
On Mon, Feb 8, 2016 at 8:59 AM, Rob Clark  wrote:
> On Mon, Feb 8, 2016 at 6:58 AM, Roland Scheidegger  wrote:
>> Am 06.02.2016 um 22:30 schrieb Marek Olšák:
>>> On Sat, Feb 6, 2016 at 2:45 PM, Rob Clark  wrote:
>>>> On Sun, Jan 31, 2016 at 3:16 PM, Rob Clark  wrote:
>>>>> +   // XXX get from pipe_screen?  Or just let pipe driver provide?
>>>>> +   nir_options.lower_fpow = true;
>>>>> +   nir_options.lower_fsat = true;
>>>>> +   nir_options.lower_scmp = true;
>>>>> +   nir_options.lower_flrp = true;
>>>>> +   nir_options.lower_ffract = true;
>>>>> +   nir_options.native_integers = true;
>>>>> +
>>>>
>>>>
>>>> btw, one of the few remaining things to tackle is how to handle
>>>> nir_shader_compiler_options struct.  To follow the existing approach
>>>> of shader caps, I'd have to add a big pile of caps now, and then keep
>>>> adding them as nir_shader_compiler_options struct grows.  Which seems
>>>> sub-optimal.
>>>>
>>>> How do people feel about adding a screen->get_shader_paramp() which,
>>>> along the lines of get_paramf, returns a 'const void *'?  Then we
>>>> could add a single cap to return the whole compiler-options struct.
>>>> (And maybe if at some point there was direct support for LLVM as an
>>>> IR, it might need something similar??)
>>>>
>>>> Other possibility is just a pipe->get_nir_compiler_options() type
>>>> hook.  A bit more of a point solution, but might make sense if we
>>>> can't think of any other plausible uses for ->get_shader_paramp()..
>>>> and less churn since it would only need to be implemented by drivers
>>>> consuming NIR..
>>>>
>>>> Thoughts/opinions?
>>>
>>> pipe->get_nir_compiler_options() sounds good.
>>>
>>> Maybe wait for VMWare guys' opinion as well.
>>
>> Looks usable to me, albeit I'm not sure you really need NIR-specific
>> options as such? That is those options above don't really look nir
>> specific - maybe they aren't used with just glsl->tgsi, but it looks to
>> me like they would in theory be applicable to other IR as well. Though I
>> suppose if you just had compiler_options it would be a bit confusing if
>> you had entries which then may not be used...
>
> Yeah, not really NIR specific (and there are a couple that overlap w/
> existing caps), other than being used only by NIR..  although it would
> be a lot of churn to keep adding caps when the compiler_options struct
> is extended, and it might be confusing that some of the lowering
> options aren't supported in the TGSI path..
>
> I guess right now it really only matters for two drivers, and down the
> road I think we won't have more than 3 or 4 drivers using NIR, so I
> suppose it is also an option to start w/
> screen->get_nir_compiler_options() for now and revisit later.  If we
> get to the point where we are always doing glsl->nir and then
> optionally nir->tgsi for drivers that don't consume NIR directly,
> maybe then it would make more sense to expose everything as caps?

I actually kinda want this for TGSI as well, eventually. Perhaps something like

bool get_compiler_options(pipe_shader_ir, void *)

or perhaps struct pipe_compiler_options * (which would contain some
common stuff + a union of the per-ir options) instead of the void *
would make more sense? The reason I'm interested is to be able to
indicate that various frontend opts should be disabled. Also it could
be used to get rid of a bunch of PIPE_CAP_TGSI_XYZ's, which are a huge
pain to add all the time.

  -ilia


Re: [Mesa-dev] [PATCH 3/5] nir: Do opt_algebraic in reverse order.

2016-02-08 Thread Matt Turner
On Sun, Feb 7, 2016 at 8:06 AM, Jason Ekstrand  wrote:
>
> On Feb 4, 2016 5:45 PM, "Matt Turner"  wrote:
>>
>> Walking the SSA definitions in order means that we consider the smallest
>> algebraic optimizations before larger optimizations. So if a smaller
>> rule is part of a larger rule, the smaller one will happen first,
>> preventing the larger one from happening.
>>
>> instructions in affected programs: 32721 -> 32611 (-0.34%)
>> helped: 106
>>
>> Prevents regressions and annoyances in the next commits.
>
> Mind doing just a little tooling to try and determine whether or not this
> increases the number of times the optimization loop runs?  Some
> Optimizations may immediately allow some other optimization on their result
> which will now have to wait until the next time through the loop.

In affected programs (129 of them):

before:  1164 optimization loops
after: 1071 optimization loops

Of the 129 affected, 16 programs' optimization loop counts increased.
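
To make the ordering argument from the commit message concrete, here is a toy sketch. It is not nir_search: expressions are nested tuples rather than SSA definitions, the matcher is a few lines long, and the "small" rule (x * 2.0 -> x + x) is a hypothetical rule picked only because it overlaps the larger fexp2/fmul/flog2 pattern. Rewriting operands first stands in for walking definitions in order; rewriting the user first stands in for the reverse walk.

```python
def match(pattern, expr, env):
    """Match expr against pattern; string operands in a pattern are variables."""
    if isinstance(pattern, str):
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    if isinstance(pattern, tuple):
        return (isinstance(expr, tuple)
                and len(expr) == len(pattern)
                and expr[0] == pattern[0]
                and all(match(p, e, env)
                        for p, e in zip(pattern[1:], expr[1:])))
    return pattern == expr  # numeric constant


def substitute(template, env):
    """Build the replacement expression from the variable bindings."""
    if isinstance(template, str):
        return env[template]
    if isinstance(template, tuple):
        return (template[0],) + tuple(substitute(t, env) for t in template[1:])
    return template


RULES = [
    # Small, hypothetical rule: x * 2.0 -> x + x
    (('fmul', 'a', 2.0), ('fadd', 'a', 'a')),
    # Larger rule containing an fmul: 2^(lg2(a)*b) -> a^b
    (('fexp2', ('fmul', ('flog2', 'a'), 'b')), ('fpow', 'a', 'b')),
]


def apply_rules(expr):
    for pattern, replacement in RULES:
        env = {}
        if match(pattern, expr, env):
            return substitute(replacement, env)
    return expr


def rewrite_operands_first(expr):
    """Forward order: operands are rewritten before their user."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite_operands_first(e) for e in expr[1:])
    return apply_rules(expr)


def rewrite_users_first(expr):
    """Reverse order: the user is rewritten before its operands."""
    expr = apply_rules(expr)
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite_users_first(e) for e in expr[1:])
    return expr


expr = ('fexp2', ('fmul', ('flog2', 'x'), 2.0))
forward = rewrite_operands_first(expr)
reverse = rewrite_users_first(expr)
```

In this toy, the forward walk rewrites the inner fmul first, so the fexp2 pattern no longer matches, while the reverse walk sees the whole fexp2 expression intact and produces the fpow.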


Re: [Mesa-dev] [PATCH 3/5] nir: Do opt_algebraic in reverse order.

2016-02-08 Thread Matt Turner
On Mon, Feb 8, 2016 at 9:52 AM, Jason Ekstrand  wrote:
> Good enough for me. Please add that to the commit message. R-B

Thanks, will do.

Do you plan to review any of the others in the series?


Re: [Mesa-dev] [PATCH 1/2] draw: use util_pstipple_create_fragment_shader

2016-02-08 Thread Jose Fonseca

Thanks for this.  Series looks good to me.

Reviewed-by: Jose Fonseca 

Sorry for not replying sooner -- I missed it.  (Unfortunately I haven't 
been able to keep up with mesa-dev traffic and if I'm not CC'ed the odds 
are I miss things.)


Jose


On 08/02/16 14:59, Nicolai Hähnle wrote:

Ping?

On 22.01.2016 11:56, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

This reduces code duplication. It also adds support for drivers where the
fragment position is a system value.

Suggested-by: Jose Fonseca 
---
A basic polygon stippling test shows no regression on llvmpipe, but
that's
the extent of my testing.

  src/gallium/auxiliary/draw/draw_pipe_pstipple.c | 209
++--
  1 file changed, 12 insertions(+), 197 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
index cf52ca4..e468cc3 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
@@ -43,10 +43,10 @@
  #include "util/u_format.h"
  #include "util/u_math.h"
  #include "util/u_memory.h"
+#include "util/u_pstipple.h"
  #include "util/u_sampler.h"

  #include "tgsi/tgsi_transform.h"
-#include "tgsi/tgsi_dump.h"

  #include "draw_context.h"
  #include "draw_pipe.h"
@@ -114,178 +114,6 @@ struct pstip_stage
  };


-
-/**
- * Subclass of tgsi_transform_context, used for transforming the
- * user's fragment shader to add the extra texture sample and
fragment kill
- * instructions.
- */
-struct pstip_transform_context {
-   struct tgsi_transform_context base;
-   uint tempsUsed;  /**< bitmask */
-   int wincoordInput;
-   int maxInput;
-   uint samplersUsed;  /**< bitfield of samplers used */
-   bool hasSview;
-   int freeSampler;  /** an available sampler for the pstipple */
-   int texTemp;  /**< temp registers */
-   int numImmed;
-};
-
-
-/**
- * TGSI declaration transform callback.
- * Look for a free sampler, a free input attrib, and two free temp regs.
- */
-static void
-pstip_transform_decl(struct tgsi_transform_context *ctx,
- struct tgsi_full_declaration *decl)
-{
-   struct pstip_transform_context *pctx = (struct
pstip_transform_context *) ctx;
-
-   if (decl->Declaration.File == TGSI_FILE_SAMPLER) {
-  uint i;
-  for (i = decl->Range.First;
-   i <= decl->Range.Last; i++) {
- pctx->samplersUsed |= 1 << i;
-  }
-   }
-   else if (decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) {
-  pctx->hasSview = true;
-   }
-   else if (decl->Declaration.File == TGSI_FILE_INPUT) {
-  pctx->maxInput = MAX2(pctx->maxInput, (int) decl->Range.Last);
-  if (decl->Semantic.Name == TGSI_SEMANTIC_POSITION)
- pctx->wincoordInput = (int) decl->Range.First;
-   }
-   else if (decl->Declaration.File == TGSI_FILE_TEMPORARY) {
-  uint i;
-  for (i = decl->Range.First;
-   i <= decl->Range.Last; i++) {
- pctx->tempsUsed |= (1 << i);
-  }
-   }
-
-   ctx->emit_declaration(ctx, decl);
-}
-
-
-/**
- * TGSI immediate declaration transform callback.
- * We're just counting the number of immediates here.
- */
-static void
-pstip_transform_immed(struct tgsi_transform_context *ctx,
-  struct tgsi_full_immediate *immed)
-{
-   struct pstip_transform_context *pctx = (struct
pstip_transform_context *) ctx;
-   ctx->emit_immediate(ctx, immed); /* emit to output shader */
-   pctx->numImmed++;
-}
-
-
-/**
- * Find the lowest zero bit in the given word, or -1 if bitfield is
all ones.
- */
-static int
-free_bit(uint bitfield)
-{
-   return ffs(~bitfield) - 1;
-}
-
-
-/**
- * TGSI transform prolog callback.
- */
-static void
-pstip_transform_prolog(struct tgsi_transform_context *ctx)
-{
-   struct pstip_transform_context *pctx = (struct
pstip_transform_context *) ctx;
-   uint i;
-   int wincoordInput;
-
-   /* find free sampler */
-   pctx->freeSampler = free_bit(pctx->samplersUsed);
-   if (pctx->freeSampler >= PIPE_MAX_SAMPLERS)
-  pctx->freeSampler = PIPE_MAX_SAMPLERS - 1;
-
-   if (pctx->wincoordInput < 0)
-  wincoordInput = pctx->maxInput + 1;
-   else
-  wincoordInput = pctx->wincoordInput;
-
-   /* find one free temp reg */
-   for (i = 0; i < 32; i++) {
-  if ((pctx->tempsUsed & (1 << i)) == 0) {
-  /* found a free temp */
-  if (pctx->texTemp < 0)
- pctx->texTemp  = i;
-  else
- break;
-  }
-   }
-   assert(pctx->texTemp >= 0);
-
-   if (pctx->wincoordInput < 0) {
-  /* declare new position input reg */
-  tgsi_transform_input_decl(ctx, wincoordInput,
-TGSI_SEMANTIC_POSITION, 1,
-TGSI_INTERPOLATE_LINEAR);
-   }
-
-   /* declare new sampler */
-   tgsi_transform_sampler_decl(ctx, pctx->freeSampler);
-
-   /* if the src shader has SVIEW decl's for each SAMP decl, we
-* need to continue the trend and ensure there is a matching
-* 

Re: [Mesa-dev] [RFCv3 11/11] mesa/st: add support for NIR as possible driver IR

2016-02-08 Thread Ilia Mirkin
On Mon, Feb 8, 2016 at 1:55 PM, Rob Clark  wrote:
> On Mon, Feb 8, 2016 at 1:01 PM, Ilia Mirkin  wrote:
>> On Mon, Feb 8, 2016 at 8:59 AM, Rob Clark  wrote:
>>> On Mon, Feb 8, 2016 at 6:58 AM, Roland Scheidegger  wrote:
>>>> Am 06.02.2016 um 22:30 schrieb Marek Olšák:
>>>>> On Sat, Feb 6, 2016 at 2:45 PM, Rob Clark  wrote:
>>>>>> On Sun, Jan 31, 2016 at 3:16 PM, Rob Clark  wrote:
>>>>>>> +   // XXX get from pipe_screen?  Or just let pipe driver provide?
>>>>>>> +   nir_options.lower_fpow = true;
>>>>>>> +   nir_options.lower_fsat = true;
>>>>>>> +   nir_options.lower_scmp = true;
>>>>>>> +   nir_options.lower_flrp = true;
>>>>>>> +   nir_options.lower_ffract = true;
>>>>>>> +   nir_options.native_integers = true;
>>>>>>> +
>>>>>>
>>>>>>
>>>>>> btw, one of the few remaining things to tackle is how to handle
>>>>>> nir_shader_compiler_options struct.  To follow the existing approach
>>>>>> of shader caps, I'd have to add a big pile of caps now, and then keep
>>>>>> adding them as nir_shader_compiler_options struct grows.  Which seems
>>>>>> sub-optimal.
>>>>>>
>>>>>> How do people feel about adding a screen->get_shader_paramp() which,
>>>>>> along the lines of get_paramf, returns a 'const void *'?  Then we
>>>>>> could add a single cap to return the whole compiler-options struct.
>>>>>> (And maybe if at some point there was direct support for LLVM as an
>>>>>> IR, it might need something similar??)
>>>>>>
>>>>>> Other possibility is just a pipe->get_nir_compiler_options() type
>>>>>> hook.  A bit more of a point solution, but might make sense if we
>>>>>> can't think of any other plausible uses for ->get_shader_paramp()..
>>>>>> and less churn since it would only need to be implemented by drivers
>>>>>> consuming NIR..
>>>>>>
>>>>>> Thoughts/opinions?
>>>>>
>>>>> pipe->get_nir_compiler_options() sounds good.
>>>>>
>>>>> Maybe wait for VMWare guys' opinion as well.
>>>>
>>>> Looks usable to me, albeit I'm not sure you really need NIR-specific
>>>> options as such? That is those options above don't really look nir
>>>> specific - maybe they aren't used with just glsl->tgsi, but it looks to
>>>> me like they would in theory be applicable to other IR as well. Though I
>>>> suppose if you just had compiler_options it would be a bit confusing if
>>>> you had entries which then may not be used...
>>>
>>> Yeah, not really NIR specific (and there are a couple that overlap w/
>>> existing caps), other than being used only by NIR..  although it would
>>> be a lot of churn to keep adding caps when the compiler_options struct
>>> is extended, and it might be confusing that some of the lowering
>>> options aren't supported in the TGSI path..
>>>
>>> I guess right now it really only matters for two drivers, and down the
>>> road I think we won't have more than 3 or 4 drivers using NIR, so I
>>> suppose it is also an option to start w/
>>> screen->get_nir_compiler_options() for now and revisit later.  If we
>>> get to the point where we are always doing glsl->nir and then
>>> optionally nir->tgsi for drivers that don't consume NIR directly,
>>> maybe then it would make more sense to expose everything as caps?
>>
>> I actually kinda want this for TGSI as well, eventually. Perhaps something 
>> like
>>
>> bool get_compiler_options(pipe_shader_ir, void *)
>
> perhaps:
>
>   const struct pipe_compiler_options * (*get_compiler_options)(struct
> pipe_screen *, unsigned shader)
>
> imo, it should take shader stage as arg so we can have different
> config per stage, and return a const ptr.. and I think it could
> directly return options struct (no need for bool return)..
>
> I suppose if you plan to add lots of knobs to twiddle for TGSI then
> shader cap for each would be annoying.  Although not super-thrilled
> about having to translate from generic(ish) pipe struct to nir struct,
> since the driver will already want a const version of the nir struct
> for the tgsi_to_nir case.
>
> I guess we could do:
>
>   const void * (*get_compiler_options)(struct pipe_screen *, unsigned
> shader, enum pipe_shader_ir type)

That means the driver has to allocate and store the options somewhere.
This can get annoying... I'd rather just have it fill in a struct
that's passed in.

>
> where the return value could be 'struct tgsi_shader_options *' or
> 'struct nir_shader_compiler_options *', etc..

Well I was thinking that there'd be a

struct pipe_compiler_options {
  some common stuff?
  union {
struct nir_options;
struct tgsi_options;
  }
}

>
> hmm.. also not sure how to roll that out without a flag day.  Perhaps
> keep the shader params for now (for tgsi) with a helper to populate a
> tgsi_compiler_options struct for drivers where
> screen->get_compiler_options() is null.. (and then what about other
> st's?)

Yeah, definitely some sort of transition plan would have to happen.
And maybe leave all the current 

Re: [Mesa-dev] [PATCH] mesa/extensions: Fix NVX_gpu_memory_info lexicographical order.

2016-02-08 Thread Nanley Chery
On Sat, Feb 06, 2016 at 01:20:30PM +0100, Kai Wasserbäch wrote:
> Hey Vinson,
> I would say the test is wrong. If I sort as a human, "NV_" comes before 
> "NVX_".
> 
> And running this through sort (the tool), it agrees:
> 
> $ echo -e "NVX_gpu_memory_info\nNV_blend_square" | sort -d
> NV_blend_square
> NVX_gpu_memory_info
> 
> From src/mesa/main/tests/mesa_extensions.cpp I'm seeing that you don't
> actually check the dictionary order but rather the character value. So it
> depends whether you want to make humans or a simple test happy. ;-) But
> that's for people more involved in Mesa to decide.
> 

The test implements the same character sorting method used to create
the extensions string in Mesa.
See extension_compare() in src/mesa/main/extensions.c.

If we want to change the test behaviour, I think it'd be good to change
Mesa's as well for consistency.

Regards,
Nanley

> Cheers,
> Kai
> 
> 
> Vinson Lee wrote on 06.02.2016 08:30:
> > Fixes MesaExtensionsTest.AlphabeticallySorted.
> > 
> > Fixes: 1d79b9958090 ("mesa: implement GL_NVX_gpu_memory_info (v2)")
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94016
> > Signed-off-by: Vinson Lee 
> > ---
> >  src/mesa/main/extensions_table.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/main/extensions_table.h 
> > b/src/mesa/main/extensions_table.h
> > index ded6f2c..d1e3a99 100644
> > --- a/src/mesa/main/extensions_table.h
> > +++ b/src/mesa/main/extensions_table.h
> > @@ -273,6 +273,8 @@ EXT(MESA_texture_signed_rgba, 
> > EXT_texture_snorm
> >  EXT(MESA_window_pos , dummy_true   
> >   , GLL,  x ,  x ,  x , 2000)
> >  EXT(MESA_ycbcr_texture  , MESA_ycbcr_texture   
> >   , GLL, GLC,  x ,  x , 2002)
> >  
> > +EXT(NVX_gpu_memory_info , NVX_gpu_memory_info  
> >   , GLL, GLC,  x ,  x , 2013)
> > +
> >  EXT(NV_blend_square , dummy_true   
> >   , GLL,  x ,  x ,  x , 1999)
> >  EXT(NV_conditional_render   , NV_conditional_render
> >   , GLL, GLC,  x ,  x , 2008)
> >  EXT(NV_depth_clamp  , ARB_depth_clamp  
> >   , GLL, GLC,  x ,  x , 2001)
> > @@ -293,7 +295,6 @@ EXT(NV_texture_barrier  , 
> > NV_texture_barrier
> >  EXT(NV_texture_env_combine4 , NV_texture_env_combine4  
> >   , GLL,  x ,  x ,  x , 1999)
> >  EXT(NV_texture_rectangle, NV_texture_rectangle 
> >   , GLL,  x ,  x ,  x , 2000)
> >  EXT(NV_vdpau_interop, NV_vdpau_interop 
> >   , GLL, GLC,  x ,  x , 2010)
> > -EXT(NVX_gpu_memory_info , NVX_gpu_memory_info  
> >   , GLL, GLC,  x ,  x , 2013)
> >  
> >  EXT(OES_EGL_image   , OES_EGL_image
> >   , GLL, GLC, ES1, ES2, 2006) /* FIXME: Mesa expects 
> > GL_OES_EGL_image to be available in OpenGL contexts. */
> >  EXT(OES_EGL_image_external  , OES_EGL_image_external   
> >   ,  x ,  x , ES1, ES2, 2010)
> > 
> 
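
The disagreement above comes down to which comparison is used. As a minimal sketch, assuming the test and extension_compare() compare plain character values (as described in the reply), Python's default string sort shows the same result, because 'X' (0x58) has a lower code than '_' (0x5f):

```python
names = ["NV_blend_square", "NVX_gpu_memory_info"]

# Plain character-value ordering, comparable to strcmp() on ASCII strings.
by_character_value = sorted(names)

# The two names first differ at index 2: 'X' versus '_'.
x_code = ord("X")
underscore_code = ord("_")
```

Under this ordering "NVX_gpu_memory_info" sorts before "NV_blend_square", which is what the patch does, while `sort -d` ignores the underscore and therefore gives the opposite order.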


Re: [Mesa-dev] [PATCH 5/5] nir: Recognize open-coded bitfield_reverse.

2016-02-08 Thread Dylan Baker
Quoting Jason Ekstrand (2016-02-08 12:04:15)
[snip]

> 
>I trust Dylan on patch 4.  I was just trying to ensure that we got/used a
>32-bit value.
>--Jason
> 

I mentioned to you offline, but I think it might be worth converting
at least the opt algebraic passes to use numpy, since those *are* C
types so you can guarantee the C behavior.

If that's interesting to anyone, I could give converting to numpy a go.

Dylan
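
As a rough illustration of the concern about C types: Python integers never wrap, so a value that must behave like a 32-bit unsigned has to be truncated explicitly. The sketch below uses the standard-library ctypes module purely as a stand-in for the numpy types suggested above; `add_u32` is a hypothetical helper, not something from the patch series.

```python
import ctypes


def add_u32(a, b):
    """Add two values with C-style 32-bit unsigned wraparound."""
    # ctypes integer types truncate silently instead of raising on overflow.
    return ctypes.c_uint32(a + b).value


python_sum = 0xFFFFFFFF + 1           # a plain Python int keeps growing
c_like_sum = add_u32(0xFFFFFFFF, 1)   # wraps like a 32-bit unsigned add
```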

[snip]




Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 5:16 PM, Matt Arsenault  wrote:
>
> On Feb 8, 2016, at 08:08, Tom Stellard  wrote:
>
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
>
> We should tell the compiler we are enabling fp-64 denorms by adding
> +fp64-denormals to the feature string.  It would also be better to
> read the float_mode value from the config registers emitted by the
> compiler.
>
>
> Yes, the runtime here should read the value out of the binary and enable it
> in the compiler rather than the runtime hardcoding it. If you wanted to load
> a shader with different FP rules for example it should be able to switch.

This is not the runtime. This is the compiler. :) It's the middle-end
though, not the back-end.

I would use +fp64-denormals if it did something to graphics shaders.

Marek


Re: [Mesa-dev] [PATCH 5/5] nir: Recognize open-coded bitfield_reverse.

2016-02-08 Thread Jason Ekstrand
On Thu, Feb 4, 2016 at 5:48 PM, Matt Turner  wrote:

> Helps 11 shaders in UnrealEngine4 demos.
>
> I seriously hope they would have given us bitfieldReverse() if we
> exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that
> anyway?).
>
> instructions in affected programs: 4875 -> 4633 (-4.96%)
> cycles in affected programs: 270516 -> 244516 (-9.61%)
>
> I suspect there's a *lot* of room to improve nir_search/opt_algebraic's
> handling of this. We'd actually like to match, e.g., step2 by matching
> step1 once and then doing a pointer comparison for the second instance
> of step1, but unfortunately we generate an enormous tuple instead.
>
> The .text size increases by 6.5% and the .data by 17.5%.
>
>    text    data    bss     dec    hex  filename
>   22957   45224      0   68181  10a55  nir_libnir_la-nir_opt_algebraic.o
>   24461   53160      0   77621  12f35  nir_libnir_la-nir_opt_algebraic.o
>
> I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is
> in a GL 4.0 context once we expose GL 4.0.
> ---
> Maybe it'd be better to make this a separate pass capable of recognizing
> this pattern without blowing up the compiled code size. Probably worth
> checking whether they use bitfieldReverse() under GL 4.0 first...
>
>  src/compiler/nir/nir_opt_algebraic.py | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index 0a248a2..f92c6b9 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -311,6 +311,18 @@ optimizations = [
>   'options->lower_unpack_snorm_4x8'),
>  ]
>
> +def bitfield_reverse(u):
> +step1 = ('ior', ('ishl', u, 16), ('ushr', u, 16))
> +step2 = ('ior', ('ishl', ('iand', step1, 0x00ff00ff), 8), ('ushr',
> ('iand', step1, 0xff00ff00), 8))
> +step3 = ('ior', ('ishl', ('iand', step2, 0x0f0f0f0f), 4), ('ushr',
> ('iand', step2, 0xf0f0f0f0), 4))
> +step4 = ('ior', ('ishl', ('iand', step3, 0x), 2), ('ushr',
> ('iand', step3, 0x), 2))
> +step5 = ('ior', ('ishl', ('iand', step4, 0x), 1), ('ushr',
> ('iand', step4, 0x), 1))
> +
> +return step5
>

Mind calling this "ue4_bitfield_reverse"?  You're not detecting a generic
bitfield reverse here.  With that, patches 1, 3, and 5 are

Reviewed-by: Jason Ekstrand 

I trust Dylan on patch 4.  I was just trying to ensure that we got/used a
32-bit value.
--Jason


> +
> +optimizations += [(bitfield_reverse('x'), ('bitfield_reverse', 'x'))]
> +
> +
>  # Add optimizations to handle the case where the result of a ternary is
>  # compared to a constant.  This way we can take things like
>  #
> --
> 2.4.10
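
For readers following the five steps in the quoted pattern, here is a plain-Python sketch of the same swap sequence on a 32-bit value. Two caveats: the mask constants for the last two steps are garbled in this archive copy of the patch, so the sketch assumes the conventional 0x33333333/0xcccccccc and 0x55555555/0xaaaaaaaa masks for this algorithm; and the function names are hypothetical. The explicit masking is needed because Python integers are not limited to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def bitfield_reverse32(u):
    """Reverse the bit order of a 32-bit value by swapping ever smaller groups."""
    u &= MASK32
    # step1: swap the two 16-bit halves
    u = ((u << 16) | (u >> 16)) & MASK32
    # step2: swap bytes within each half
    u = ((u & 0x00ff00ff) << 8) | ((u & 0xff00ff00) >> 8)
    # step3: swap nibbles within each byte
    u = ((u & 0x0f0f0f0f) << 4) | ((u & 0xf0f0f0f0) >> 4)
    # step4: swap bit pairs within each nibble (masks assumed, see above)
    u = ((u & 0x33333333) << 2) | ((u & 0xcccccccc) >> 2)
    # step5: swap adjacent bits (masks assumed, see above)
    u = ((u & 0x55555555) << 1) | ((u & 0xaaaaaaaa) >> 1)
    return u


def reference_reverse32(u):
    """Slow reference: reverse the 32-character binary string."""
    return int(format(u & MASK32, "032b")[::-1], 2)
```

Comparing the two functions on a handful of inputs is a quick way to check that a mask was not mistyped.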


Re: [Mesa-dev] [PATCH 2/5] nir: Recognize sum of open-coded pow()s.

2016-02-08 Thread Matt Turner
On Mon, Feb 8, 2016 at 12:01 PM, Jason Ekstrand  wrote:
> On Thu, Feb 4, 2016 at 5:47 PM, Matt Turner  wrote:
>>
>> Prevents regressions in the next commit.
>> ---
>>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index 60df69f..0a248a2 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -167,6 +167,7 @@ optimizations = [
>> (('flog2', ('fexp2', a)), a), # lg2(2^a) = a
>> (('fpow', a, b), ('fexp2', ('fmul', ('flog2', a), b)),
>> 'options->lower_fpow'), # a^b = 2^(lg2(a)*b)
>> (('fexp2', ('fmul', ('flog2', a), b)), ('fpow', a, b),
>> '!options->lower_fpow'), # 2^(lg2(a)*b) = a^b
>> +   (('fexp2', ('fadd', ('fmul', ('flog2', a), b), ('fmul', ('flog2', c),
>> d))), ('fadd', ('fpow', a, b), ('fpow', c, d))),
>
>
> I think you mean ('fmul', ('fpow', a, b), ('fpow', c, d)).  You can't pull
> an add out of an exp and get another add.

Whoops. Yes, thank you!
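
The correction follows from 2^(x+y) = 2^x * 2^y, so 2^(lg2(a)*b + lg2(c)*d) = a^b * c^d. A quick numeric check, as a sketch with arbitrarily chosen positive inputs and a hypothetical helper name:

```python
import math


def exp2_of_log_sum(a, b, c, d):
    """Evaluate 2^(lg2(a)*b + lg2(c)*d), the expression being matched."""
    return 2.0 ** (math.log2(a) * b + math.log2(c) * d)


a, b, c, d = 3.0, 1.5, 5.0, 0.5
matched = exp2_of_log_sum(a, b, c, d)
as_product = (a ** b) * (c ** d)   # replacement suggested in the review
as_sum = (a ** b) + (c ** d)       # replacement in the patch as posted

product_agrees = math.isclose(matched, as_product)
sum_agrees = math.isclose(matched, as_sum)
```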


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Marek Olšák
On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>> but not SI & CI, which can't disable denorms for those instructions.
>
> Do you know why this fixes FP16 conversions?  What does the OpenGL
> spec say about denormal handling?

Yes, I know why. The patch explains everything as far as I can see,
though. What isn't clear?

SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
and accept FP16 denormals.
VI: Supports FP16. FP16 denormal support is now configurable and
affects FP16 conversions as well (shared setting with FP64).

OpenGL doesn't require denormals. Piglit does. I think this is
incorrect piglit behavior.

>
>> ---
>>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>>  3 files changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
>> b/src/gallium/drivers/radeonsi/si_shader.c
>> index a4680ce..3f1db70 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>>
>>   si_shader_binary_read_config(binary, conf, 0);
>>
>> + /* Enable 64-bit and 16-bit denormals, because there is no performance
>> +  * cost.
>> +  *
>> +  * If denormals are enabled, all floating-point output modifiers are
>> +  * ignored.
>> +  *
>> +  * Don't enable denormals for 32-bit floats, because:
>> +  * - Floating-point output modifiers would be ignored by the hw.
>> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
>> +  *   have to stop using those.
>> +  * - SI & CI would be very slow.
>> +  */
>> + conf->float_mode |= V_00B028_FP_64_DENORMS;
>> +
>
> Do SI/CI support fp64 denorms?  If so, won't this hurt performance?

Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms
do on SI & CI.

>
> We should tell the compiler we are enabling fp-64 denorms by adding
> +fp64-denormals to the feature string.  It would also be better to
> read the float_mode value from the config registers emitted by the
> compiler.

Yes, I agree, but LLVM only sets these parameters for compute or even
HSA-only kernels, not for graphics shaders. We need to set the mode
for all users _now_, not in 6 months. Last time I looked,
+fp64-denormals had no effect on graphics shaders.

Marek


[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94040

Francisco Jerez  changed:

   What|Removed |Added

 CC||curroje...@riseup.net

--- Comment #6 from Francisco Jerez  ---
Can you also provide backtraces for any concurrently running threads?  I
suspect that reverting commit d5b1731178378b3d828c74368f6bfe85edc10618 may fix
the deadlock. Any chance you could try that?

Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH] i965: Don't add barrier deps for FB write messages.

2016-02-08 Thread Kenneth Graunke
There are never render target reads, so there are no scheduling hazards.

Giving the extra flexibility to the scheduler makes it possible to do
FB writes as soon as their sources are available, reducing register
pressure.  It also makes it possible to do the payload setup for more
than one FB write message at a time, which could better hide latency.

shader-db results on Skylake:

total instructions in shared programs: 9110254 -> 9110211 (-0.00%)
instructions in affected programs: 2898 -> 2855 (-1.48%)
helped: 3
HURT:   0
LOST:   0
GAINED: 1

A reduction in instruction counts is surprising, but legitimate:
the three shaders helped were spilling, and reducing register
pressure allowed us to issue fewer spills/fills.

total cycles in shared programs: 69035108 -> 68928820 (-0.15%)
cycles in affected programs: 4412402 -> 4306114 (-2.41%)
helped: 4457
HURT: 213

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
index 60f7fd9..4f97577 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -939,8 +939,9 @@ fs_instruction_scheduler::calculate_deps()
foreach_in_list(schedule_node, n, ) {
   fs_inst *inst = (fs_inst *)n->inst;
 
-  if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
- inst->has_side_effects())
+  if ((inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
+   inst->has_side_effects()) &&
+  inst->opcode != FS_OPCODE_FB_WRITE)
  add_barrier_deps(n);
 
   /* read-after-write deps. */
@@ -1195,7 +1196,7 @@ vec4_instruction_scheduler::calculate_deps()
foreach_in_list(schedule_node, n, ) {
   vec4_instruction *inst = (vec4_instruction *)n->inst;
 
-  if (inst->has_side_effects())
+  if (inst->has_side_effects() && inst->opcode != FS_OPCODE_FB_WRITE)
  add_barrier_deps(n);
 
   /* read-after-write deps. */
-- 
2.7.0



Re: [Mesa-dev] [PATCH] i965: Don't add barrier deps for FB write messages.

2016-02-08 Thread Connor Abbott
Reviewed-by: Connor Abbott 

FWIW, in this area, another place where we unnecessarily introduce
dependencies between instructions is when multiple instructions write
to different parts of a virtual register, for example when setting up
message headers. Instead of tracking dependencies per-vgrf, we should
be tracking them per-register (similar to what liveness analysis does)
to avoid that. The glassy mesa repo has some changes to that effect,
but their implementation is broken.
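
The difference between per-vgrf and per-register tracking can be sketched
with a toy model. This is only an illustration under stated assumptions, not
the i965 scheduler: the Write tuple, the waw_deps() helper, and the
instruction names are all hypothetical, and only write-after-write ordering
is modelled.

```python
from collections import namedtuple

# Hypothetical, simplified instruction: writes `size` registers of virtual
# GRF `vgrf`, starting at register `offset` within that vgrf.
Write = namedtuple("Write", "name vgrf offset size")


def waw_deps(writes, per_register):
    """Return (earlier, later) pairs that must stay ordered (write-after-write)."""
    last_writer = {}  # tracking key -> name of the most recent writer
    deps = []
    for w in writes:
        if per_register:
            # One key per register actually written.
            keys = [(w.vgrf, w.offset + i) for i in range(w.size)]
        else:
            # One key for the whole vgrf, however little of it is written.
            keys = [w.vgrf]
        for key in keys:
            prev = last_writer.get(key)
            if prev is not None and (prev, w.name) not in deps:
                deps.append((prev, w.name))
            last_writer[key] = w.name
    return deps


# Three writes to disjoint parts of the same vgrf, as in message header setup.
header_setup = [
    Write("mov_header0", vgrf=7, offset=0, size=1),
    Write("mov_header1", vgrf=7, offset=1, size=1),
    Write("mov_payload", vgrf=7, offset=2, size=2),
]

print(waw_deps(header_setup, per_register=False))
print(waw_deps(header_setup, per_register=True))
```

With per-vgrf keys the three writes form a chain even though they touch
disjoint registers; with per-register keys no ordering is required between
them, which is the extra freedom described above.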

On Mon, Feb 8, 2016 at 2:31 PM, Kenneth Graunke  wrote:
> There are never render target reads, so there are no scheduling hazards.
>
> Giving the extra flexibility to the scheduler makes it possible to do
> FB writes as soon as their sources are available, reducing register
> pressure.  It also makes it possible to do the payload setup for more
> than one FB write message at a time, which could better hide latency.
>
> shader-db results on Skylake:
>
> total instructions in shared programs: 9110254 -> 9110211 (-0.00%)
> instructions in affected programs: 2898 -> 2855 (-1.48%)
> helped: 3
> HURT:   0
> LOST:   0
> GAINED: 1
>
> A reduction in instruction counts is surprising, but legitimate:
> the three shaders helped were spilling, and reducing register
> pressure allowed us to issue fewer spills/fills.
>
> total cycles in shared programs: 69035108 -> 68928820 (-0.15%)
> cycles in affected programs: 4412402 -> 4306114 (-2.41%)
> helped: 4457
> HURT: 213
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
> b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 60f7fd9..4f97577 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -939,8 +939,9 @@ fs_instruction_scheduler::calculate_deps()
> foreach_in_list(schedule_node, n, ) {
>fs_inst *inst = (fs_inst *)n->inst;
>
> -  if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
> - inst->has_side_effects())
> +  if ((inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||
> +   inst->has_side_effects()) &&
> +  inst->opcode != FS_OPCODE_FB_WRITE)
>   add_barrier_deps(n);
>
>/* read-after-write deps. */
> @@ -1195,7 +1196,7 @@ vec4_instruction_scheduler::calculate_deps()
> foreach_in_list(schedule_node, n, ) {
>vec4_instruction *inst = (vec4_instruction *)n->inst;
>
> -  if (inst->has_side_effects())
> +  if (inst->has_side_effects() && inst->opcode != FS_OPCODE_FB_WRITE)
>   add_barrier_deps(n);
>
>/* read-after-write deps. */
> --
> 2.7.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 94050] test_vec4_register_coalesce regression

2016-02-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94050

            Bug ID: 94050
           Summary: test_vec4_register_coalesce regression
           Product: Mesa
           Version: git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Keywords: bisected, regression
          Severity: normal
          Priority: medium
         Component: Mesa core
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: v...@freedesktop.org
        QA Contact: mesa-dev@lists.freedesktop.org
                CC: imir...@alum.mit.edu, matts...@gmail.com

mesa: 01dacc83ff43a054513277e3e1296c3fc8cd750a (master 11.2.0-devel)

Running main() from gtest_main.cc
[==] Running 2 tests from 1 test case.
[--] Global test environment set-up.
[--] 2 tests from copy_propagation_test
[ RUN  ] copy_propagation_test.test_swizzle_swizzle

Program received signal SIGSEGV, Segmentation fault.
brw::vec4_visitor::opt_copy_propagation (this=this@entry=0xab73a0,
do_constant_prop=do_constant_prop@entry=true)
at brw_vec4_copy_propagation.cpp:403
403  prog_data->dispatch_mode == DISPATCH_MODE_4X2_DUAL_OBJECT ? 1 : 2;
(gdb) bt
#0  brw::vec4_visitor::opt_copy_propagation (this=this@entry=0xab73a0,
do_constant_prop=do_constant_prop@entry=true)
at brw_vec4_copy_propagation.cpp:403
#1  0x0040aa89 in copy_propagation (v=0xab73a0) at
test_vec4_copy_propagation.cpp:118
#2  copy_propagation_test_test_swizzle_swizzle_Test::TestBody (this=) at test_vec4_copy_propagation.cpp:146
#3  0x0042eb63 in
testing::internal::HandleSehExceptionsInMethodIfSupported
(
location=0x744498 "the test body", method=,
object=) at ./src/gtest.cc:2078
#4  testing::internal::HandleExceptionsInMethodIfSupported
(object=object@entry=0xab6950, 
method=(void (testing::Test::*)(testing::Test * const)) 0x40a860
,
location=location@entry=0x744498 "the test body") at ./src/gtest.cc:2114
#5  0x004265da in testing::Test::Run (this=0xab6950) at
./src/gtest.cc:2151
#6  0x00426728 in testing::TestInfo::Run (this=0xab2cb0) at
./src/gtest.cc:2326
#7  0x00426805 in testing::TestCase::Run (this=0xab31a0) at
./src/gtest.cc:2444
#8  0x0042745f in testing::internal::UnitTestImpl::RunAllTests
(this=0xab2de0) at ./src/gtest.cc:4315
#9  0x0042f043 in
testing::internal::HandleSehExceptionsInMethodIfSupported (
location=0x743760 "auxiliary test code (environments or event listeners)",
method=, object=)
at ./src/gtest.cc:2078
#10
testing::internal::HandleExceptionsInMethodIfSupported (object=0xab2de0, 
method=(bool
(testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const))
0x427220 ,
location=location@entry=0x743760 "auxiliary test code (environments or event
listeners)")
at ./src/gtest.cc:2114
#11 0x004268d4 in testing::UnitTest::Run (this=0xa9b380
)
at ./src/gtest.cc:3926
#12 0x00407cd2 in RUN_ALL_TESTS () at
../../src/gtest/include/gtest/gtest.h:2288
#13 main (argc=1, argv=0x7fffde88) at src/gtest_main.cc:37
(gdb) frame 0
#0  brw::vec4_visitor::opt_copy_propagation (this=this@entry=0xab73a0,
do_constant_prop=do_constant_prop@entry=true)
at brw_vec4_copy_propagation.cpp:403
403  prog_data->dispatch_mode == DISPATCH_MODE_4X2_DUAL_OBJECT ? 1 : 2;
(gdb) print prog_data
$1 = (brw_vue_prog_data * const) 0x0

9f2e22bf343b21d6b44e6a502f00a86d169f5ade is the first bad commit
commit 9f2e22bf343b21d6b44e6a502f00a86d169f5ade
Author: Matt Turner 
Date:   Sun Jan 17 20:30:14 2016 -0500

i965/vec4: don't copy ATTR into 3src instructions with complex swizzles

The vec4 backend, at the end, does this:

if (inst->is_3src()) {
   for (int i = 0; i < 3; i++) {
  if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0)
 assert(brw_is_single_value_swizzle(inst->src[i].swizzle));

So make sure that we use the same conditions when trying to
copy-propagate. UNIFORMs will be converted to vstride 0 in
convert_to_hw_regs, but so will ATTRs when interleaved (as will happen
in a GS with multiple attributes). Since the vstride is not set at
copy-prop time, infer it by inspecting dispatch_mode and reject ATTRs if
they have non-scalar swizzles and are interleaved.

Fixes assertion errors in dolphin-generated geometry shaders (or
misrendering on opt builds) on Sandybridge or on IVB/HSW with
INTEL_DEBUG=nodualobj.

Co-authored-by: Ilia Mirkin 
Reviewed-by: Ilia Mirkin 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93418
Cc: "11.0 11.1" 

[Mesa-dev] [Bug 94050] test_vec4_register_coalesce regression

2016-02-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94050

--- Comment #1 from Ilia Mirkin  ---
From a quick glance, this appears to be a test shortcoming, not an issue with
the actual patch. It assumes that prog_data is there, which is true in real
life, but I guess the test doesn't set that up.



Re: [Mesa-dev] [RFCv3 11/11] mesa/st: add support for NIR as possible driver IR

2016-02-08 Thread Rob Clark
On Mon, Feb 8, 2016 at 2:00 PM, Ilia Mirkin  wrote:
> On Mon, Feb 8, 2016 at 1:55 PM, Rob Clark  wrote:
>> On Mon, Feb 8, 2016 at 1:01 PM, Ilia Mirkin  wrote:
>>> On Mon, Feb 8, 2016 at 8:59 AM, Rob Clark  wrote:
>>>> On Mon, Feb 8, 2016 at 6:58 AM, Roland Scheidegger wrote:
>>>>> On 06.02.2016 at 22:30, Marek Olšák wrote:
>>>>>> On Sat, Feb 6, 2016 at 2:45 PM, Rob Clark  wrote:
>>>>>>> On Sun, Jan 31, 2016 at 3:16 PM, Rob Clark  wrote:
>>>>>>>> +   // XXX get from pipe_screen?  Or just let pipe driver provide?
>>>>>>>> +   nir_options.lower_fpow = true;
>>>>>>>> +   nir_options.lower_fsat = true;
>>>>>>>> +   nir_options.lower_scmp = true;
>>>>>>>> +   nir_options.lower_flrp = true;
>>>>>>>> +   nir_options.lower_ffract = true;
>>>>>>>> +   nir_options.native_integers = true;
>>>>>>>> +
>>>>>>>
>>>>>>>
>>>>>>> btw, one of the few remaining things to tackle is how to handle
>>>>>>> nir_shader_compiler_options struct.  To follow the existing approach
>>>>>>> of shader caps, I'd have to add a big pile of caps now, and then keep
>>>>>>> adding them as nir_shader_compiler_options struct grows.  Which seems
>>>>>>> sub-optimal.
>>>>>>>
>>>>>>> How do people feel about adding a screen->get_shader_paramp() which,
>>>>>>> along the lines of get_paramf, returns a 'const void *'?  Then we
>>>>>>> could add a single cap to return the whole compiler-options struct.
>>>>>>> (And maybe if at some point there was direct support for LLVM as an
>>>>>>> IR, it might need something similar??)
>>>>>>>
>>>>>>> Other possibility is just a pipe->get_nir_compiler_options() type
>>>>>>> hook.  A bit more of a point solution, but might make sense if we
>>>>>>> can't think of any other plausible uses for ->get_shader_paramp()..
>>>>>>> and less churn since it would only need to be implemented by drivers
>>>>>>> consuming NIR..
>>>>>>>
>>>>>>> Thoughts/opinions?
>>>>>>
>>>>>> pipe->get_nir_compiler_options() sounds good.
>>>>>>
>>>>>> Maybe wait for VMWare guys' opinion as well.
>>>>>
>>>>> Looks usable to me, albeit I'm not sure you really need NIR-specific
>>>>> options as such? That is those options above don't really look nir
>>>>> specific - maybe they aren't used with just glsl->tgsi, but it looks to
>>>>> me like they would in theory be applicable to other IR as well. Though I
>>>>> suppose if you just had compiler_options it would be a bit confusing if
>>>>> you had entries which then may not be used...
>>>>
>>>> Yeah, not really NIR specific (and there are a couple that overlap w/
>>>> existing caps), other than being used only by NIR..  although it would
>>>> be a lot of churn to keep adding caps when the compiler_options struct
>>>> is extended, and it might be confusing that some of the lowering
>>>> options aren't supported in the TGSI path..
>>>>
>>>> I guess right now it really only matters for two drivers, and down the
>>>> road I think we won't have more than 3 or 4 drivers using NIR, so I
>>>> suppose it is also an option to start w/
>>>> screen->get_nir_compiler_options() for now and revisit later.  If we
>>>> get to the point where we are always doing glsl->nir and then
>>>> optionally nir->tgsi for drivers that don't consume NIR directly,
>>>> maybe then it would make more sense to expose everything as caps?
>>>
>>> I actually kinda want this for TGSI as well, eventually. Perhaps something 
>>> like
>>>
>>> bool get_compiler_options(pipe_shader_ir, void *)
>>
>> perhaps:
>>
>>   const struct pipe_compiler_options * (*get_compiler_options)(struct
>> pipe_screen *, unsigned shader)
>>
>> imo, it should take shader stage as arg so we can have different
>> config per stage, and return a const ptr.. and I think it could
>> directly return options struct (no need for bool return)..
>>
>> I suppose if you plan to add lots of knobs to twiddle for TGSI then
>> shader cap for each would be annoying.  Although not super-thrilled
>> about having to translate from generic(ish) pipe struct to nir struct,
>> since the driver will already want a const version of the nir struct
>> for the tgsi_to_nir case.
>>
>> I guess we could do:
>>
>>   const void * (*get_compiler_options)(struct pipe_screen *, unsigned
>> shader, enum pipe_shader_ir type)
>
> That means the driver has to allocate and store the options somewhere.
> This can get annoying... I'd rather just have it fill in a struct
> that's passed in.

If you really have enough diff settings between gen's then just embed
the struct in your pipe_screen and populate it at screen init..

Currently we've managed to keep nir_shader_compiler_options as
something that can be const (by keeping options that depend on draw
state out).  It would be annoying to lose that.  Not to mention that
it would be duplication for the NIR drivers, since they already need
the nir options struct in tgsi_to_nir case.

>>
>> where the return value 

Re: [Mesa-dev] [PATCH] i965: Don't add barrier deps for FB write messages.

2016-02-08 Thread Matt Turner
On Mon, Feb 8, 2016 at 11:31 AM, Kenneth Graunke  wrote:
> There are never render target reads, so there are no scheduling hazards.
>
> Giving the extra flexibility to the scheduler makes it possible to do
> FB writes as soon as their sources are available, reducing register
> pressure.  It also makes it possible to do the payload setup for more
> than one FB write message at a time, which could better hide latency.
>
> shader-db results on Skylake:
>
> total instructions in shared programs: 9110254 -> 9110211 (-0.00%)
> instructions in affected programs: 2898 -> 2855 (-1.48%)
> helped: 3
> HURT:   0
> LOST:   0
> GAINED: 1
>
> A reduction in instruction counts is surprising, but legitimate:
> the three shaders helped were spilling, and reducing register
> pressure allowed us to issue fewer spills/fills.
>
> total cycles in shared programs: 69035108 -> 68928820 (-0.15%)
> cycles in affected programs: 4412402 -> 4306114 (-2.41%)
> helped: 4457
> HURT: 213
>

Nice!

Reviewed-by: Matt Turner 


Re: [Mesa-dev] [PATCH 2/5] nir: Recognize sum of open-coded pow()s.

2016-02-08 Thread Jason Ekstrand
On Thu, Feb 4, 2016 at 5:47 PM, Matt Turner  wrote:

> Prevents regressions in the next commit.
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index 60df69f..0a248a2 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -167,6 +167,7 @@ optimizations = [
> (('flog2', ('fexp2', a)), a), # lg2(2^a) = a
> (('fpow', a, b), ('fexp2', ('fmul', ('flog2', a), b)),
> 'options->lower_fpow'), # a^b = 2^(lg2(a)*b)
> (('fexp2', ('fmul', ('flog2', a), b)), ('fpow', a, b),
> '!options->lower_fpow'), # 2^(lg2(a)*b) = a^b
> +   (('fexp2', ('fadd', ('fmul', ('flog2', a), b), ('fmul', ('flog2', c),
> d))), ('fadd', ('fpow', a, b), ('fpow', c, d))),
>

I think you mean ('fmul', ('fpow', a, b), ('fpow', c, d)).  You can't pull
an add out of an exp and get another add.


> (('fpow', a, 1.0), a),
> (('fpow', a, 2.0), ('fmul', a, a)),
> (('fpow', a, 4.0), ('fmul', ('fmul', a, a), ('fmul', a, a))),
> --
> 2.4.10
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH 1/6] nir: const_index helpers

2016-02-08 Thread Jason Ekstrand
On Mon, Feb 8, 2016 at 12:16 PM, Rob Clark  wrote:

> On Mon, Feb 8, 2016 at 3:13 PM, Jason Ekstrand 
> wrote:
> >> +   if (info->index_map[NIR_INTRINSIC_BASE] ||
> >> +   info->index_map[NIR_INTRINSIC_WRMASK]) {
> >> +  fprintf(fp, " /*");
> >> +  if (info->index_map[NIR_INTRINSIC_BASE])
> >> + fprintf(fp, " base=%d", nir_intrinsic_base(instr));
> >> +  if (info->index_map[NIR_INTRINSIC_WRMASK]) {
> >> +  unsigned wrmask = nir_intrinsic_write_mask(instr);
> >> +  fprintf(fp, " wrmask=");
> >> +  for (unsigned i = 0; i < 4; i++)
> >> + if ((wrmask >> i) & 1)
> >> +fprintf(fp, "%c", "xyzw"[i]);
> >> +  }
> >> +  fprintf(fp, " */");
> >> +   }
> >
> >
> > Can we use an enum -> name table to name all of them?  Right now, it looks
> > like it only names base and writemask.
>
> Hmm.. sure.  I think writemask is really the only one needing special
> casing.  I guess all the others would just show as %d..
>

Sure.  That one makes sense as hex.  The others shouldn't matter.
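
A table-driven version of the printing discussed above might look roughly
like the following sketch. It is a Python model, not the actual nir_print.c
code: INDEX_NAMES and format_const_indices() are hypothetical names, and only
the two indices mentioned in this thread are listed. The write mask is kept
as the "xyzw" component letters used by the quoted patch.

```python
# Hypothetical name table: one entry per const_index kind, in print order.
# Only the two kinds mentioned in this thread are listed.
INDEX_NAMES = ["base", "wrmask"]


def format_const_indices(indices):
    """Format the const indices an intrinsic carries as a trailing comment.

    `indices` maps an index name to its integer value; names that are
    absent are simply skipped, mirroring the index_map checks in the patch.
    """
    parts = []
    for name in INDEX_NAMES:
        if name not in indices:
            continue
        value = indices[name]
        if name == "wrmask":
            # Special case: print the enabled components as letters.
            letters = "".join(
                ch for i, ch in enumerate("xyzw") if (value >> i) & 1
            )
            parts.append("wrmask=" + letters)
        else:
            # Everything else is printed as a plain decimal integer.
            parts.append("%s=%d" % (name, value))
    if not parts:
        return ""
    return " /* " + " ".join(parts) + " */"


print(format_const_indices({"base": 3, "wrmask": 0x5}))
```

Adding a new index kind then only needs a new table entry, unless it wants
special formatting the way the write mask does.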


Re: [Mesa-dev] [PATCH 2/5] nir: Recognize sum of open-coded pow()s.

2016-02-08 Thread Jason Ekstrand
On Mon, Feb 8, 2016 at 12:18 PM, Matt Turner  wrote:

> On Mon, Feb 8, 2016 at 12:01 PM, Jason Ekstrand 
> wrote:
> > On Thu, Feb 4, 2016 at 5:47 PM, Matt Turner  wrote:
> >>
> >> Prevents regressions in the next commit.
> >> ---
> >>  src/compiler/nir/nir_opt_algebraic.py | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> >> b/src/compiler/nir/nir_opt_algebraic.py
> >> index 60df69f..0a248a2 100644
> >> --- a/src/compiler/nir/nir_opt_algebraic.py
> >> +++ b/src/compiler/nir/nir_opt_algebraic.py
> >> @@ -167,6 +167,7 @@ optimizations = [
> >> (('flog2', ('fexp2', a)), a), # lg2(2^a) = a
> >> (('fpow', a, b), ('fexp2', ('fmul', ('flog2', a), b)),
> >> 'options->lower_fpow'), # a^b = 2^(lg2(a)*b)
> >> (('fexp2', ('fmul', ('flog2', a), b)), ('fpow', a, b),
> >> '!options->lower_fpow'), # 2^(lg2(a)*b) = a^b
> >> +   (('fexp2', ('fadd', ('fmul', ('flog2', a), b), ('fmul', ('flog2', c),
> >> d))), ('fadd', ('fpow', a, b), ('fpow', c, d))),
> >
> >
> > I think you mean ('fmul', ('fpow', a, b), ('fpow', c, d)).  You can't pull
> > an add out of an exp and get another add.
>
> Whoops. Yes, thank you!
>

With that fixed, this one gets my R-B too.
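
The correction follows from 2^(x + y) = 2^x * 2^y.  A quick numeric check,
written as a standalone sketch rather than as nir_opt_algebraic.py code (the
function names and sample values below are made up for illustration):

```python
import math


def open_coded(a, b, c, d):
    # The pattern being matched: fexp2(fadd(fmul(flog2(a), b), fmul(flog2(c), d)))
    return 2.0 ** (math.log2(a) * b + math.log2(c) * d)


def as_product(a, b, c, d):
    # The corrected replacement: fmul(fpow(a, b), fpow(c, d))
    return (a ** b) * (c ** d)


def as_sum(a, b, c, d):
    # The replacement as originally posted: fadd(fpow(a, b), fpow(c, d))
    return (a ** b) + (c ** d)


a, b, c, d = 2.0, 3.0, 4.0, 0.5
print(open_coded(a, b, c, d))   # 2^(1*3 + 2*0.5) = 2^4
print(as_product(a, b, c, d))   # 8 * 2
print(as_sum(a, b, c, d))       # 8 + 2
```

For these inputs the open-coded form and the product agree, while the sum
does not.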


Re: [Mesa-dev] OpenGL ES context using EGL on framebuffer

2016-02-08 Thread Ilia Mirkin
Perhaps you'd be interested in having a look at kmscube:
https://github.com/robclark/kmscube

This is a simple demo which uses GBM and passes the resulting buffers
to KMS for scanout. Make sure you build mesa with --enable-gbm and
--enable-egl-platforms=drm [and x11 if you still want egl to work in
X11]

  -ilia


On Mon, Feb 8, 2016 at 5:14 PM, Jörg Wille  wrote:
> Is there a way to create an OpenGL ES context using EGL on a Intel Atom
> E3845 without XServer?
> For an embedded board running a Linux (based on Yocto) I want to use a
> OpenGL ES context on a framebuffer device.
> I am not familiar with Intel graphics on linux at all. There seem to be 3
> different projects:
> - Intel Embedded Graphics Drivers
> - Intel Embedded Media and Graphics Driver (EMGD)
> - Intel Graphics for Linux
>
> If possible at all, which of these is the right place to look for? And what
> are the differences between these projects?
> As I understood so far, the Intel open-source graphics driver supports the
> Mesa OpenGL (ES) implementation, which itself has an EGL implementation. But
> is it possible to target the framebuffer device?
> Thanks.
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH 4/5] nir: Handle large unsigned values in opt_algebraic.

2016-02-08 Thread Kenneth Graunke
On Thursday, February 4, 2016 5:48:00 PM PST Matt Turner wrote:
> The next patch adds an algebraic rule that uses the constant 0xff00ff00.
> 
> Without this change, the build fails with
> 
>return hex(struct.unpack('I', struct.pack('i', self.value))[0])
>struct.error: 'i' format requires -2147483648 <= number <= 2147483647
> 
> The hex() function handles integers of any size, and assigning a
> negative value to an unsigned does what we want in C. The pack/unpack is
> unnecessary (and as we see, buggy).
> ---
>  src/compiler/nir/nir_algebraic.py | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_algebraic.py
> b/src/compiler/nir/nir_algebraic.py index 77ad35e..2357b57 100644
> --- a/src/compiler/nir/nir_algebraic.py
> +++ b/src/compiler/nir/nir_algebraic.py
> @@ -102,13 +102,10 @@ class Constant(Value):
>self.value = val
> 
> def __hex__(self):
> -  # Even if it's an integer, we still need to unpack as an unsigned
> -  # int.  This is because, without C99, we can only assign to the first
> -  # element of a union in an initializer.
>if isinstance(self.value, (bool)):
>   return 'NIR_TRUE' if self.value else 'NIR_FALSE'
>if isinstance(self.value, (int, long)):
> - return hex(struct.unpack('I', struct.pack('i', self.value))[0])
> + return hex(self.value)
>elif isinstance(self.value, float):
>   return hex(struct.unpack('I', struct.pack('f', self.value))[0])
>else:

FWIW, I sent a patch to fix this on January 19th which went unreviewed:
https://lists.freedesktop.org/archives/mesa-dev/2016-January/105387.html

Your patch is probably better, though.  I never understood the point of
pack/unpacking these.




[Mesa-dev] OpenGL ES context using EGL on framebuffer

2016-02-08 Thread Jörg Wille
Is there a way to create an OpenGL ES context using EGL on a Intel Atom
E3845 without XServer?
For an embedded board running a Linux (based on Yocto) I want to use a
OpenGL ES context on a framebuffer device.
I am not familiar with Intel graphics on linux at all. There seem to be 3
different projects:
- Intel Embedded Graphics Drivers
- Intel Embedded Media and Graphics Driver (EMGD)
- Intel Graphics for Linux

If possible at all, which of these is the right place to look for? And what
are the differences between these projects?
As I understood so far, the Intel open-source graphics driver supports the
Mesa OpenGL (ES) implementation, which itself has an EGL implementation.
But is it possible to target the framebuffer device?
Thanks.


[Mesa-dev] [PATCH 4/4] mesa/readpix: Dedent former _mesa_readpixels() if block

2016-02-08 Thread Nanley Chery
From: Nanley Chery 

Formatting patch split out for easy reviewing.

Signed-off-by: Nanley Chery 
---
 src/mesa/main/readpix.c | 58 -
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 56e9d60..470182a 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -861,38 +861,38 @@ _mesa_readpixels(struct gl_context *ctx,
if (ctx->NewState)
   _mesa_update_state(ctx);
 
-  pixels = _mesa_map_pbo_dest(ctx, packing, pixels);
-
-  if (pixels) {
- /* Try memcpy first. */
- if (readpixels_memcpy(ctx, x, y, width, height, format, type,
-   pixels, packing)) {
-_mesa_unmap_pbo_dest(ctx, packing);
-return;
- }
-
- /* Otherwise take the slow path. */
- switch (format) {
- case GL_STENCIL_INDEX:
-read_stencil_pixels(ctx, x, y, width, height, type, pixels,
-packing);
-break;
- case GL_DEPTH_COMPONENT:
-read_depth_pixels(ctx, x, y, width, height, type, pixels,
-  packing);
-break;
- case GL_DEPTH_STENCIL_EXT:
-read_depth_stencil_pixels(ctx, x, y, width, height, type, pixels,
-  packing);
-break;
- default:
-/* all other formats should be color formats */
-read_rgba_pixels(ctx, x, y, width, height, format, type, pixels,
- packing);
- }
+   pixels = _mesa_map_pbo_dest(ctx, packing, pixels);
 
+   if (pixels) {
+  /* Try memcpy first. */
+  if (readpixels_memcpy(ctx, x, y, width, height, format, type,
+pixels, packing)) {
  _mesa_unmap_pbo_dest(ctx, packing);
+ return;
+  }
+
+  /* Otherwise take the slow path. */
+  switch (format) {
+  case GL_STENCIL_INDEX:
+ read_stencil_pixels(ctx, x, y, width, height, type, pixels,
+ packing);
+ break;
+  case GL_DEPTH_COMPONENT:
+ read_depth_pixels(ctx, x, y, width, height, type, pixels,
+   packing);
+ break;
+  case GL_DEPTH_STENCIL_EXT:
+ read_depth_stencil_pixels(ctx, x, y, width, height, type, pixels,
+   packing);
+ break;
+  default:
+ /* all other formats should be color formats */
+ read_rgba_pixels(ctx, x, y, width, height, format, type, pixels,
+  packing);
   }
+
+  _mesa_unmap_pbo_dest(ctx, packing);
+   }
 }
 
 
-- 
2.7.0



[Mesa-dev] [PATCH 2/4] mesa/readpix: Clip ReadPixels() area to the ReadBuffer's

2016-02-08 Thread Nanley Chery
From: Nanley Chery 

The fast path for Intel's ReadPixels() unintentionally omits clipping
the specified area to a valid one. Rather than clip in various
corner-cases, perform this operation in the API validation stage.

The bug in intel_readpixels_tiled_memcpy() showed itself when the winsys
ReadBuffer's height was smaller than the one specified by ReadPixels().
yoffset became negative, which was an invalid input for tiled_to_linear().
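
The kind of clipping involved can be modelled with a short sketch.  This is
a simplified Python illustration, not _mesa_clip_readpixels() itself:
clip_read_region() is a hypothetical name, and the pixel-pack adjustments
(such as the row length handling) that the real function also performs are
left out.

```python
def clip_read_region(x, y, width, height, buf_width, buf_height):
    """Clip (x, y, width, height) to a buf_width x buf_height buffer.

    Returns the clipped (x, y, width, height), or None when the requested
    region lies completely outside the buffer (nothing to read).
    """
    # Left clipping: drop the columns that fall before x == 0.
    if x < 0:
        width += x
        x = 0
    # Right clipping: drop the columns past the buffer's right edge.
    if x + width > buf_width:
        width -= x + width - buf_width
    if width <= 0:
        return None

    # Bottom clipping: drop the rows that fall before y == 0.
    if y < 0:
        height += y
        y = 0
    # Top clipping: drop the rows past the buffer's top edge.
    if y + height > buf_height:
        height -= y + height - buf_height
    if height <= 0:
        return None

    return x, y, width, height


# A read that asks for more rows than the buffer has is shortened.
print(clip_read_region(0, 0, 64, 128, buf_width=64, buf_height=100))
```

Doing this once during validation means every later path, fast or slow,
only ever sees a region that lies inside the buffer.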

v2: Move clipping to validation stage (Jason)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Marta Löfstedt 
Cc: "11.0 11.1" 
Signed-off-by: Nanley Chery 
---
 src/mesa/main/readpix.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 8cdc9fe..a5b74bc 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -993,6 +993,7 @@ _mesa_ReadnPixelsARB( GLint x, GLint y, GLsizei width, 
GLsizei height,
 {
GLenum err = GL_NO_ERROR;
struct gl_renderbuffer *rb;
+   struct gl_pixelstore_attrib clippedPacking;
 
GET_CURRENT_CONTEXT(ctx);
 
@@ -1094,7 +1095,9 @@ _mesa_ReadnPixelsARB( GLint x, GLint y, GLsizei width, 
GLsizei height,
   }
}
 
-   if (width == 0 || height == 0)
+   /* Do all needed clipping here, so that we can forget about it later */
+   clippedPacking = ctx->Pack;
+   if (!_mesa_clip_readpixels(ctx, &x, &y, &width, &height, &clippedPacking))
   return; /* nothing to do */
 
if (!_mesa_validate_pbo_access(2, &ctx->Pack, width, height, 1,
@@ -1118,7 +1121,7 @@ _mesa_ReadnPixelsARB( GLint x, GLint y, GLsizei width, 
GLsizei height,
}
 
ctx->Driver.ReadPixels(ctx, x, y, width, height,
- format, type, &ctx->Pack, pixels);
+  format, type, &clippedPacking, pixels);
 }
 
 void GLAPIENTRY
-- 
2.7.0



[Mesa-dev] [PATCH 3/4] mesa/readpix: Don't clip in _mesa_readpixels()

2016-02-08 Thread Nanley Chery
From: Nanley Chery 

The clipping is performed higher up in the call-chain.

Signed-off-by: Nanley Chery 
---
 src/mesa/main/readpix.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index a5b74bc..56e9d60 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -858,21 +858,16 @@ _mesa_readpixels(struct gl_context *ctx,
  const struct gl_pixelstore_attrib *packing,
  GLvoid *pixels)
 {
-   struct gl_pixelstore_attrib clippedPacking = *packing;
-
if (ctx->NewState)
   _mesa_update_state(ctx);
 
-   /* Do all needed clipping here, so that we can forget about it later */
-   if (_mesa_clip_readpixels(ctx, &x, &y, &width, &height, &clippedPacking)) {
-
-  pixels = _mesa_map_pbo_dest(ctx, &clippedPacking, pixels);
+  pixels = _mesa_map_pbo_dest(ctx, packing, pixels);
 
   if (pixels) {
  /* Try memcpy first. */
  if (readpixels_memcpy(ctx, x, y, width, height, format, type,
pixels, packing)) {
-_mesa_unmap_pbo_dest(ctx, &clippedPacking);
+_mesa_unmap_pbo_dest(ctx, packing);
 return;
  }
 
@@ -880,25 +875,24 @@ _mesa_readpixels(struct gl_context *ctx,
  switch (format) {
  case GL_STENCIL_INDEX:
 read_stencil_pixels(ctx, x, y, width, height, type, pixels,
-&clippedPacking);
+packing);
 break;
  case GL_DEPTH_COMPONENT:
 read_depth_pixels(ctx, x, y, width, height, type, pixels,
-  &clippedPacking);
+  packing);
 break;
  case GL_DEPTH_STENCIL_EXT:
 read_depth_stencil_pixels(ctx, x, y, width, height, type, pixels,
-  &clippedPacking);
+  packing);
 break;
  default:
 /* all other formats should be color formats */
 read_rgba_pixels(ctx, x, y, width, height, format, type, pixels,
- &clippedPacking);
+ packing);
  }
 
- _mesa_unmap_pbo_dest(ctx, &clippedPacking);
+ _mesa_unmap_pbo_dest(ctx, packing);
   }
-   }
 }
 
 
-- 
2.7.0



[Mesa-dev] [PATCH 1/4] mesa/image: Make _mesa_clip_readpixels() work with renderbuffers

2016-02-08 Thread Nanley Chery
From: Nanley Chery 

v2: Use gl_renderbuffer::{Width,Height} (Jason)

Cc: "11.0 11.1" 
Signed-off-by: Nanley Chery 
---
 src/mesa/main/image.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/image.c b/src/mesa/main/image.c
index e79e3e6..99f253c 100644
--- a/src/mesa/main/image.c
+++ b/src/mesa/main/image.c
@@ -670,7 +670,7 @@ _mesa_clip_drawpixels(const struct gl_context *ctx,
  * so that the image region is entirely within the window bounds.
  * Note: this is different from _mesa_clip_drawpixels() in that the
  * scissor box is ignored, and we use the bounds of the current readbuffer
- * surface.
+ * surface or the attached image.
  *
  * \return  GL_TRUE if region to read is in bounds
  *  GL_FALSE if region is completely out of bounds (nothing to read)
@@ -682,6 +682,18 @@ _mesa_clip_readpixels(const struct gl_context *ctx,
   struct gl_pixelstore_attrib *pack)
 {
const struct gl_framebuffer *buffer = ctx->ReadBuffer;
+   struct gl_renderbuffer *rb = buffer->_ColorReadBuffer;
+   GLsizei clip_width;
+   GLsizei clip_height;
+
+   if (rb) {
+  clip_width = rb->Width;
+  clip_height = rb->Height;
+   } else {
+  clip_width = buffer->Width;
+  clip_height = buffer->Height;
+   }
+
 
if (pack->RowLength == 0) {
   pack->RowLength = *width;
@@ -694,8 +706,8 @@ _mesa_clip_readpixels(const struct gl_context *ctx,
   *srcX = 0;
}
/* right clipping */
-   if (*srcX + *width > (GLsizei) buffer->Width)
-  *width -= (*srcX + *width - buffer->Width);
+   if (*srcX + *width > clip_width)
+  *width -= (*srcX + *width - clip_width);
 
if (*width <= 0)
   return GL_FALSE;
@@ -707,8 +719,8 @@ _mesa_clip_readpixels(const struct gl_context *ctx,
   *srcY = 0;
}
/* top clipping */
-   if (*srcY + *height > (GLsizei) buffer->Height)
-  *height -= (*srcY + *height - buffer->Height);
+   if (*srcY + *height > clip_height)
+  *height -= (*srcY + *height - clip_height);
 
if (*height <= 0)
   return GL_FALSE;
-- 
2.7.0
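
The clipping arithmetic in the patch above can be tried in isolation. What
follows is only a minimal sketch, not the Mesa function: struct read_region
and clip_read_region() are hypothetical names, and the pixel-store
bookkeeping that _mesa_clip_readpixels() also performs on `pack` is left out.
The caller is assumed to pass the renderbuffer's dimensions when one is
attached and the framebuffer's dimensions otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the srcX/srcY/width/height arguments. */
struct read_region {
   int x, y;
   int width, height;
};

/* Clip `r` against a surface of clip_width x clip_height pixels.
 * Returns false if the region lies completely outside the surface.
 */
bool
clip_read_region(struct read_region *r, int clip_width, int clip_height)
{
   assert(r);

   /* left clipping: x is negative, so adding it shrinks the width */
   if (r->x < 0) {
      r->width += r->x;
      r->x = 0;
   }
   /* right clipping */
   if (r->x + r->width > clip_width)
      r->width -= r->x + r->width - clip_width;
   if (r->width <= 0)
      return false;

   /* bottom clipping */
   if (r->y < 0) {
      r->height += r->y;
      r->y = 0;
   }
   /* top clipping */
   if (r->y + r->height > clip_height)
      r->height -= r->y + r->height - clip_height;
   if (r->height <= 0)
      return false;

   return true;
}
```

With an 8x8 surface, a request starting at (-2, 3) for 10x10 pixels shrinks
to an 8x5 region at (0, 3), and a request starting at x = 20 is rejected.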



[Mesa-dev] [PATCH] mesa: add missing error check in _mesa_CallLists()

2016-02-08 Thread Brian Paul
Generate GL_INVALID_VALUE if n < 0.  Return early if n==0.
---
 src/mesa/main/dlist.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c
index cd8e3b6..24aea35 100644
--- a/src/mesa/main/dlist.c
+++ b/src/mesa/main/dlist.c
@@ -9105,6 +9105,15 @@ _mesa_CallLists(GLsizei n, GLenum type, const GLvoid * 
lists)
   return;
}
 
+   if (n < 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE, "glCallLists(n < 0)");
+  return;
+   }
+   else if (n == 0) {
+  /* nothing to do */
+  return;
+   }
+
/* Save the CompileFlag status, turn it off, execute display list,
 * and restore the CompileFlag.
 */
-- 
1.9.1



Re: [Mesa-dev] [PATCH] mesa: add missing error check in _mesa_CallLists()

2016-02-08 Thread Ian Romanick
On 02/08/2016 02:31 PM, Brian Paul wrote:
> Generate GL_INVALID_VALUE if n < 0.  Return early if n==0.
> ---
>  src/mesa/main/dlist.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c
> index cd8e3b6..24aea35 100644
> --- a/src/mesa/main/dlist.c
> +++ b/src/mesa/main/dlist.c
> @@ -9105,6 +9105,15 @@ _mesa_CallLists(GLsizei n, GLenum type, const GLvoid * 
> lists)
>return;
> }
>  
> +   if (n < 0) {
> +  _mesa_error(ctx, GL_INVALID_VALUE, "glCallLists(n < 0)");
> +  return;
> +   }
> +   else if (n == 0) {

I think the modern style is to put the 'else if' on the same line with
the closing curly brace.  I'm not too picky about it since this matches
all the rest of dlist.c.

I'm also wondering... should this check go before the call to
SAVE_FLUSH_VERTICES?  Usually we try to bail from errors before doing
anything.

> +  /* nothing to do */
> +  return;
> +   }
> +
> /* Save the CompileFlag status, turn it off, execute display list,
>  * and restore the CompileFlag.
>  */
> 
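
The ordering question raised above (reject bad arguments before any state is
touched) can be illustrated with a toy model. This is only a sketch under
stated assumptions: sketch_ctx, sketch_call_lists() and the `flushes` counter
are hypothetical stand-ins, with the counter playing the role of the vertex
flush that happens early in the real function. It does not reflect how
dlist.c is actually organised.

```c
#include <assert.h>

enum sketch_error { SKETCH_NO_ERROR = 0, SKETCH_INVALID_VALUE };

/* Hypothetical, heavily reduced context. */
struct sketch_ctx {
   enum sketch_error last_error;
   int flushes;     /* how often pending vertices were flushed */
   int lists_run;   /* how many display lists were executed */
};

void
sketch_call_lists(struct sketch_ctx *ctx, int n)
{
   assert(ctx);

   /* Validate first, so an erroneous call has no side effects. */
   if (n < 0) {
      ctx->last_error = SKETCH_INVALID_VALUE;
      return;
   } else if (n == 0) {
      /* nothing to do */
      return;
   }

   /* Only a valid, non-empty call pays for the flush. */
   ctx->flushes++;
   ctx->lists_run += n;
}
```

In this arrangement a call with n = -1 records the error and leaves
`flushes` at zero, which is the behaviour the review comment asks about.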



Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Roland Scheidegger
Am 09.02.2016 um 00:53 schrieb Ian Romanick:
> On 02/08/2016 03:37 PM, Roland Scheidegger wrote:
>> Am 09.02.2016 um 00:02 schrieb Ian Romanick:
>>> On 02/08/2016 12:38 PM, Marek Olšák wrote:
>>>> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard  wrote:
>>>>> On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
>>>>>> From: Marek Olšák 
>>>>>>
>>>>>> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
>>>>>> but not SI & CI, which can't disable denorms for those instructions.
>>>>>
>>>>> Do you know why this fixes FP16 conversions?  What does the OpenGL
>>>>> spec say about denormal handling?
>>>>
>>>> Yes, I know why. The patch explains everything as far as I can see
>>>> though. What isn't clear?
>>>>
>>>> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
>>>> and accept FP16 denormals.
>>>> VI: Supports FP16. FP16 denormal support is now configurable and
>>>> affects FP16 conversions as well (shared setting with FP64).
>>>>
>>>> OpenGL doesn't require denormals. Piglit does. I think this is
>>>> incorrect piglit behavior.
>>>
>>> I submitted a public spec bug for this issue:
>>>
>>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1460
>>>
>>> I'm investigating whether a similar bug is needed for the SPIR-V
>>> specification.
>>>
>>> I think an argument can be made for either the flush-to-zero or
>>> non-flush-to-zero behavior in the case of unpackHalf2x16 and (possibly)
>>> packHalf2x16.  The only place in the GLSL 4.50.5 specification that
>>> mentions subnormal values is section 4.7.1 (Range and Precision).
>>>
>>> "The precision of stored single- and double-precision floating-point
>>> variables is defined by the IEEE 754 standard for 32-bit and 64-bit
>>> floating-point numbers. ... Any denormalized value input into a
>>> shader or potentially generated by any operation in a shader can be
>>> flushed to 0."
>>>
>>> Since there is no half-precision type in desktop GLSL, there is no
>>> mention of 16-bit subnormal values.  As Roland mentioned before, all
>>> 16-bit subnormal values are 32-bit normal values.
>>>
>>> As I mentioned before, from the point of view of an application
>>> developer, the flush-to-zero behavior for unpackHalf2x16 is both
>>> surprising and awful. :)
>>>
>>> While I think an argument can be made for either behavior, I also think
>>> the argument for the non-flush-to-zero behavior is slightly stronger.
>>> The case for flush-to-zero based on the above spec quotation fails for
>>> two reasons.  First, the "input into [the] shader" is not a subnormal
>>> number.  It is an integer.  Second, the "[value] potentially generated
>>> by [the] operation" is not subnormal in single-precision.
>>
>> I don't disagree with that, however OTOH you could make an argument that
>> such a strong guarantee for packed half floats is inconsistent with
>> what's required for them elsewhere in GL. In particular half float
>> texture formats - these are still based on ARB_half_float_pixel. Which
>> says denormals are optional, infs are optional, NaNs are optional -
>> albeit that's not any different to ordinary floats...
> 
> Thanks for mentioning this. :)  The same issue had occurred to me, and I
> was trying to find some relevant text in the GL spec.  I hadn't thought
> to look in the extension spec.
GL core spec 4.5 actually mentions pretty much the same within the
generic numeric bits, section 2.3.4.2 - except the extension bit has
explicitly listed that exponent 0 and mantissa non-zero may be decoded
to zero (and similar for infs, nans). But the core bits text still
mentions just that "providing a denormalized number or negative zero to
GL must yield predictable results" so flush to zero is apparently still
allowed.
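
The observation quoted above, that every 16-bit subnormal is a normal 32-bit
value, is easy to check with a small software decoder. A minimal sketch,
assuming the usual IEEE 754 binary16 layout (1 sign bit, 5 exponent bits,
10 mantissa bits); half_to_float() and half_subnormal_is_float_normal() are
hypothetical helper names and say nothing about what any particular GPU does:

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern without flushing subnormals. */
float
half_to_float(uint16_t h)
{
   uint32_t sign = (uint32_t) (h >> 15) & 0x1;
   uint32_t exp = (h >> 10) & 0x1f;
   uint32_t mant = h & 0x3ff;
   uint32_t bits;
   float value;

   if (exp == 0) {
      /* Zero or fp16 subnormal: the value is mant * 2^-24, which is
       * exactly representable in fp32.
       */
      value = (float) mant * (1.0f / 16777216.0f);
      return sign ? -value : value;
   }

   if (exp == 31)
      bits = 0x7f800000u | (mant << 13);            /* Inf or NaN */
   else
      bits = ((exp + 112) << 23) | (mant << 13);    /* rebias 15 -> 127 */

   bits |= sign << 31;
   memcpy(&value, &bits, sizeof(value));
   return value;
}

/* True if h is a nonzero fp16 subnormal whose fp32 value is normal. */
bool
half_subnormal_is_float_normal(uint16_t h)
{
   bool half_subnormal = (h & 0x7c00) == 0 && (h & 0x03ff) != 0;
   float magnitude = half_to_float(h & 0x7fff);

   return half_subnormal && magnitude >= FLT_MIN;
}
```

The smallest fp16 subnormal, 0x0001, decodes to 2^-24, far above FLT_MIN
(2^-126), so a flush-to-zero rule that only looks at the fp32 result would
never apply to it.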


> 
>> (And I still have the problem that d3d10 wants trunc behavior instead of
>> round... fwiw the precedent there in GL is also for r11g11b10 format,
>> which says round-to-nearest recommended but trunc allowed, and all too
>> large finite numbers converted to max finite (which is inconsistent with
>> nearest rounding). The spec is completely silent both within GLSL or GL
>> how rounding should be done for fp32 to fp16, albeit I don't disagree
>> round-to-nearest seems the most reasonable.)
> 
> The GLSL spec isn't silent.  Section 4.7.1 explicitly says, "The
> rounding mode cannot be set and is undefined."
Yes, but at least to me it's not really obvious this applies to all
operations - and at least the basic operations say "must be correctly
rounded", what does this even mean if the rounding mode isn't defined in
the first place? Would the rounding mode have to be consistent for all
operations, so, always "trunc" for all operations would 

Re: [Mesa-dev] [PATCH 3/5] nir: Do opt_algebraic in reverse order.

2016-02-08 Thread Matt Turner
On Mon, Feb 8, 2016 at 3:57 PM, Ian Romanick  wrote:
> On 02/04/2016 05:47 PM, Matt Turner wrote:
>> Walking the SSA definitions in order means that we consider the smallest
>> algebraic optimizations before larger optimizations. So if a smaller
>> rule is part of a larger rule, the smaller one will happen first,
>> preventing the larger one from happening.
>
> Doesn't that just mean that our "larger pattern" space is somehow
> incomplete?  This seems to indicate that applications could (but
> probably don't?) open-code these patterns and we'll miss them.

I don't understand the question. Does my reply to Eduardo perhaps
answer your question?


Re: [Mesa-dev] [PATCH 4/5] nir: Handle large unsigned values in opt_algebraic.

2016-02-08 Thread Kenneth Graunke
On Monday, February 8, 2016 4:01:37 PM PST Ian Romanick wrote:
> On 02/08/2016 01:59 PM, Kenneth Graunke wrote:
> > On Thursday, February 4, 2016 5:48:00 PM PST Matt Turner wrote:
> >> The next patch adds an algebraic rule that uses the constant 0xff00ff00.
> >>
> >> Without this change, the build fails with
> >>
> >>return hex(struct.unpack('I', struct.pack('i', self.value))[0])
> >>struct.error: 'i' format requires -2147483648 <= number <= 2147483647
> >>
> >> The hex() function handles integers of any size, and assigning a
> >> negative value to an unsigned does what we want in C. The pack/unpack is
> >> unnecessary (and as we see, buggy).
> >> ---
> >>  src/compiler/nir/nir_algebraic.py | 5 +
> >>  1 file changed, 1 insertion(+), 4 deletions(-)
> >>
> >> diff --git a/src/compiler/nir/nir_algebraic.py
> >> b/src/compiler/nir/nir_algebraic.py index 77ad35e..2357b57 100644
> >> --- a/src/compiler/nir/nir_algebraic.py
> >> +++ b/src/compiler/nir/nir_algebraic.py
> >> @@ -102,13 +102,10 @@ class Constant(Value):
> >>self.value = val
> >>
> >> def __hex__(self):
> >> -  # Even if it's an integer, we still need to unpack as an unsigned
> >> -  # int.  This is because, without C99, we can only assign to the 
> >> first
> >> -  # element of a union in an initializer.
> >>if isinstance(self.value, (bool)):
> >>   return 'NIR_TRUE' if self.value else 'NIR_FALSE'
> >>if isinstance(self.value, (int, long)):
> >> - return hex(struct.unpack('I', struct.pack('i', self.value))[0])
> >> + return hex(self.value)
> >>elif isinstance(self.value, float):
> >>   return hex(struct.unpack('I', struct.pack('f', self.value))[0])
> >>else:
> > 
> > FWIW, I sent a patch to fix this on January 19th which went unreviewed:
> > https://lists.freedesktop.org/archives/mesa-dev/2016-January/105387.html
> 
> I was going to R-b it...  After you NAKed the second patch in the series
> I waited for v2.

Oh.  Sorry for the confusion.  That series was actually fine - I just
had a fabs/iabs mixup.  With that fixed, everything worked fine, and I
considered it out for review again.  I suppose I should just re-send it.
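
The quoted commit message relies on the C rule that converting a negative
value to an unsigned type reduces it modulo 2^32. A minimal sketch of that
rule, assuming 32-bit constants; as_unsigned_bits() is a hypothetical helper
and not part of nir_algebraic.py or its generated code:

```c
#include <assert.h>
#include <stdint.h>

/* Return the 32-bit pattern that C stores when `value` is assigned to an
 * unsigned 32-bit field.  Accepts anything that fits in either int32_t or
 * uint32_t.
 */
uint32_t
as_unsigned_bits(int64_t value)
{
   assert(value >= INT32_MIN && value <= UINT32_MAX);

   /* Conversion to an unsigned type is defined as reduction modulo 2^32,
    * so negative inputs wrap to their two's-complement bit pattern.
    */
   return (uint32_t) value;
}
```

Both 0xff00ff00 and -16711936 map to the same bit pattern here, which is why
the generated source can carry either spelling of the constant.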



