Re: [Mesa-dev] [PATCH v2] i965: do not round line width when multisampling or antialiasing are enabled

2015-06-10 Thread Kenneth Graunke
On Wednesday, June 10, 2015 09:07:32 AM Iago Toral Quiroga wrote:
> In commit fe74fee8fa721a we rounded the line width to the nearest integer to
> match the GLES3 spec requirements stated in section 13.4.2.1, but that seems
> to break a dEQP test that renders wide lines in some multisampling scenarios.
> 
> Ian noted that the OpenGL 4.4 spec has the following similar text:
> 
> "The actual width of non-antialiased lines is determined by rounding the
> supplied width to the nearest integer, then clamping it to the
> implementation-dependent maximum non-antialiased line width."
> 
> and suggested that when ES removed antialiased lines, they removed
> "non-antialiased" from that paragraph but probably should not have.
> 
> Going by that note, this patch restricts the quantization implemented in
> fe74fee8fa721a only to regular aliased lines. This seems to keep the
> tests fixed with that commit passing while fixing the broken test.
> 
> v2:
>   - Drop one of the clamps (Ken, Marius)
>   - Add a rule to prevent advertising line widths that when rounded go beyond
> the limits allowed by the hardware (Ken)
>   - Update comments in the code accordingly (Ian)
>   - Put the code in a utility function (Ian)
> 
> Fixes:
> dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749

Looks good to me.  Thanks for doing this!

Reviewed-by: Kenneth Graunke 

By the way, I noticed that Marius' line-width < 1.5 code never got added
to gen8_sf_state.c, so a couple of Piglit tests still fail.  It might be
nice to put that in the helper function too.  Feel like making a
follow-up patch?
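
For reference, the rule the patch settles on can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the driver code: the name effective_line_width and its parameters are made up here, and roundf() is modelled as round-half-away-from-zero for non-negative widths.

```python
import math


def effective_line_width(width, multisample, smooth, max_width):
    """Hypothetical model of the rounding rule discussed above."""
    # Only plain aliased lines are quantized to an integer width.
    if not multisample and not smooth:
        # C's roundf() rounds halves away from zero, while Python's round()
        # rounds halves to even, so use floor(x + 0.5) for x >= 0.
        width = math.floor(width + 0.5)
    # Clamp to [0, max_width], like CLAMP() in the patch.
    return float(min(max(width, 0.0), max_width))
```

With a hypothetical maximum of 7.0, a requested width of 2.5 becomes 3.0 for aliased lines but stays 2.5 when multisampling or line smoothing is on.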


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Momentarily pretend to support ARB_texture_stencil8 for blits.

2015-06-10 Thread Kenneth Graunke
On Wednesday, June 10, 2015 05:57:05 PM Neil Roberts wrote:
> Kenneth Graunke  writes:
> 
> > _mesa_meta_fb_tex_blit_begin(ctx, &blit);
> > +   ctx->Extensions.ARB_texture_stencil8 = true;
> 
> Maybe you could put assert(ctx->Extensions.ARB_texture_stencil8==false)
> just before setting it to true so that we'll definitely remember to
> remove it if we eventually enable the extension. Otherwise as it stands
> if we forget about this it would probably not break any tests but the
> extension would mysteriously disable itself and no-one would notice
> because they would probably check the extensions just once upfront.
> 
> - Neil

Great idea.  Added, retested, and pushed.  Thanks all!
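
Neil's suggestion amounts to a guard-then-flip pattern. A minimal sketch of it in Python follows, assuming a plain dict stands in for ctx->Extensions; the helper name pretend_supported is hypothetical and nothing like it exists in Mesa.

```python
from contextlib import contextmanager


@contextmanager
def pretend_supported(extensions, name):
    """Temporarily flip an extension flag on (hypothetical helper)."""
    # If the extension is ever enabled for real, this assert fires and
    # reminds us to delete the workaround, instead of the flag being
    # silently switched off again on the way out.
    assert extensions[name] is False, name + " is already enabled"
    extensions[name] = True
    try:
        yield
    finally:
        extensions[name] = False
```

The point of the assert is that the failure mode without it is quiet: most applications query extensions once at startup, so nobody would notice the flag being reset later.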




Re: [Mesa-dev] [PATCH] i965/gen8: Fix antialiased line rendering with width < 1.5

2015-06-11 Thread Kenneth Graunke
On Thursday, June 11, 2015 09:03:37 AM Iago Toral Quiroga wrote:
> The same fix Marius implemented for gen6 (commit a9b04d8a) and
> gen7 (commit 24ecf37a).
> ---
>  src/mesa/drivers/dri/i965/gen8_sf_state.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> Ken, I don't have gen8 hardware available to test this so it would
> be nice if someone else verifies that this fixes the piglit tests you
> mentioned.

Oh, sorry - I was hoping we could include this in the helper function,
since it's the exact same code for Gen6/7/8...

> diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c 
> b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> index 83ef62b..22c1c4a 100644
> --- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> @@ -156,8 +156,26 @@ upload_sf(struct brw_context *brw)
> /* _NEW_LINE */
> float line_width = brw_get_line_width(brw);
> uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> -   if (line_width_u3_7 == 0)
> -  line_width_u3_7 = 1;
> +
> +   /* Line width of 0 is not allowed when MSAA enabled */
> +   if (ctx->Multisample._Enabled) {
> +  if (line_width_u3_7 == 0)
> +  line_width_u3_7 = 1;
> +   } else if (ctx->Line.SmoothFlag && ctx->Line.Width < 1.5) {
> +  /* For 1 pixel line thickness or less, the general
> +   * anti-aliasing algorithm gives up, and a garbage line is
> +   * generated.  Setting a Line Width of 0.0 specifies the
> +   * rasterization of the "thinnest" (one-pixel-wide),
> +   * non-antialiased lines.
> +   *
> +   * Lines rendered with zero Line Width are rasterized using
> +   * Grid Intersection Quantization rules as specified by
> +   * bspec section 6.3.12.1 Zero-Width (Cosmetic) Line
> +   * Rasterization.
> +   */
> +  line_width_u3_7 = 0;
> +   }
> +
> if (brw->gen >= 9 || brw->is_cherryview) {
>dw1 |= line_width_u3_7 << GEN9_SF_LINE_WIDTH_SHIFT;
> } else {
> 




Re: [Mesa-dev] [PATCH 2/2] i965: Delete linked GLSL IR when using NIR.

2015-06-11 Thread Kenneth Graunke
On Thursday, June 11, 2015 10:41:53 AM Tapani Pälli wrote:
> This is based on Kenneth's patch to delete 'most of the IR'. Due to
> linker changes to clone variables, we can now free all of the IR.
> 
> Saves 58MB of memory when replaying a Dota 2 trace on Broadwell.
> 
> Signed-off-by: Tapani Pälli 
> ---
>  src/mesa/drivers/dri/i965/brw_shader.cpp | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 76285f2..99de1cd 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -297,8 +297,11 @@ brw_link_shader(struct gl_context *ctx, struct 
> gl_shader_program *shProg)
>  
>brw_add_texrect_params(prog);
>  
> -  if (options->NirOptions)
> +  if (options->NirOptions) {
>   prog->nir = brw_create_nir(brw, shProg, prog, (gl_shader_stage) 
> stage);
> + ralloc_free(shader->ir);
> + shader->ir = NULL;
> +  }
>  
>    _mesa_reference_program(ctx, &prog, NULL);
> }
> 

Thanks, Tapani!

Both are:
Reviewed-by: Kenneth Graunke 

I also think they might be worth a
Cc: mesa-sta...@lists.freedesktop.org
since it saves a pretty large amount of memory.  I'm happy to concede on
that point if others feel differently, though.




Re: [Mesa-dev] [PATCH v2] i965/gen8: Fix antialiased line rendering with width < 1.5

2015-06-11 Thread Kenneth Graunke
On Thursday, June 11, 2015 09:50:53 AM Iago Toral Quiroga wrote:
> The same fix Marius implemented for gen6 (commit a9b04d8a) and
> gen7 (commit 24ecf37a).
> 
> Also, we need the same code to handle special cases of line width
> in gen6, gen7 and now gen8, so put that in the helper function
> we use to compute the line width.

Fixes Piglit's line-aa-width and line-flat-clip-color on Broadwell.

Reviewed-by: Kenneth Graunke 

Thanks, Iago!
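
The special cases that the helper now covers for gen6, gen7 and gen8 can be sketched as follows. This is a rough Python model, not the driver code: the function signature is made up, the input is assumed to be already rounded and clamped, and U_FIXED(x, 7) is assumed to truncate x * 2**7 to an unsigned integer.

```python
def line_width_u3_7(line_width, multisample, smooth):
    """Hypothetical model of the special cases folded into the helper."""
    # U3.7 fixed point: 3 integer bits and 7 fractional bits.
    encoded = int(line_width * (1 << 7))
    if multisample:
        # A line width of 0 is not allowed when MSAA is enabled.
        if encoded == 0:
            encoded = 1
    elif smooth and line_width < 1.5:
        # Thin antialiased lines: program 0 to request the thinnest
        # (one-pixel-wide) non-antialiased line instead.
        encoded = 0
    return encoded
```

Under these assumptions a width of 2.0 encodes as 256, while a smooth line of width 1.0 without MSAA encodes as 0.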

> ---
>  src/mesa/drivers/dri/i965/brw_util.h  | 31 
> +++
>  src/mesa/drivers/dri/i965/gen6_sf_state.c | 22 +-
>  src/mesa/drivers/dri/i965/gen7_sf_state.c | 21 +
>  src/mesa/drivers/dri/i965/gen8_sf_state.c |  5 +
>  4 files changed, 30 insertions(+), 49 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_util.h 
> b/src/mesa/drivers/dri/i965/brw_util.h
> index 671d72e..04e4e94 100644
> --- a/src/mesa/drivers/dri/i965/brw_util.h
> +++ b/src/mesa/drivers/dri/i965/brw_util.h
> @@ -41,7 +41,7 @@ extern GLuint brw_translate_blend_factor( GLenum factor );
>  extern GLuint brw_translate_blend_equation( GLenum mode );
>  extern GLenum brw_fix_xRGB_alpha(GLenum function);
>  
> -static inline float
> +static inline uint32_t
>  brw_get_line_width(struct brw_context *brw)
>  {
> /* From the OpenGL 4.4 spec:
> @@ -50,9 +50,32 @@ brw_get_line_width(struct brw_context *brw)
>  * the supplied width to the nearest integer, then clamping it to the
>  * implementation-dependent maximum non-antialiased line width."
>  */
> -   return CLAMP(!brw->ctx.Multisample._Enabled && !brw->ctx.Line.SmoothFlag
> -? roundf(brw->ctx.Line.Width) : brw->ctx.Line.Width,
> -0.0, brw->ctx.Const.MaxLineWidth);
> +   float line_width =
> +  CLAMP(!brw->ctx.Multisample._Enabled && !brw->ctx.Line.SmoothFlag
> +? roundf(brw->ctx.Line.Width) : brw->ctx.Line.Width,
> +0.0, brw->ctx.Const.MaxLineWidth);
> +   uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> +
> +   /* Line width of 0 is not allowed when MSAA enabled */
> +   if (brw->ctx.Multisample._Enabled) {
> +  if (line_width_u3_7 == 0)
> + line_width_u3_7 = 1;
> +   } else if (brw->ctx.Line.SmoothFlag && line_width < 1.5) {
> +  /* For 1 pixel line thickness or less, the general
> +   * anti-aliasing algorithm gives up, and a garbage line is
> +   * generated.  Setting a Line Width of 0.0 specifies the
> +   * rasterization of the "thinnest" (one-pixel-wide),
> +   * non-antialiased lines.
> +   *
> +   * Lines rendered with zero Line Width are rasterized using
> +   * Grid Intersection Quantization rules as specified by
> +   * bspec section 6.3.12.1 Zero-Width (Cosmetic) Line
> +   * Rasterization.
> +   */
> +  line_width_u3_7 = 0;
> +   }
> +
> +   return line_width_u3_7;
>  }
>  
>  #endif
> diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
> b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> index d577764..5809628 100644
> --- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> @@ -361,27 +361,7 @@ upload_sf_state(struct brw_context *brw)
>  
> /* _NEW_LINE */
> {
> -  float line_width = brw_get_line_width(brw);
> -  uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> -
> -  /* Line width of 0 is not allowed when MSAA enabled */
> -  if (ctx->Multisample._Enabled) {
> - if (line_width_u3_7 == 0)
> - line_width_u3_7 = 1;
> -  } else if (ctx->Line.SmoothFlag && ctx->Line.Width < 1.5) {
> - /* For 1 pixel line thickness or less, the general
> -  * anti-aliasing algorithm gives up, and a garbage line is
> -  * generated.  Setting a Line Width of 0.0 specifies the
> -  * rasterization of the "thinnest" (one-pixel-wide),
> -  * non-antialiased lines.
> -  *
> -  * Lines rendered with zero Line Width are rasterized using
> -  * Grid Intersection Quantization rules as specified by
> -  * bspec section 6.3.12.1 Zero-Width (Cosmetic) Line
> -  * Rasterization.
> -  */
> - line_width_u3_7 = 0;
> -  }
> +  uint32_t line_width_u3_7 = brw_get_line_width(brw);
>dw3 |= line_width_u3_7 << GEN6_SF_LINE_WIDTH_SHIFT;
> }
> if (ctx->Line.SmoothFlag) {
> diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
> b/src/mesa/drivers/dri/i965/gen7_sf_state.c
> index 87ff284..a20967c 100644
> --- a/src/mesa/drivers/

[Mesa-dev] [PATCH] nir: Recognize max(min(a, 1.0), 0.0) as fsat(a).

2015-06-11 Thread Kenneth Graunke
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).

shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs: 11928 -> 11670 (-2.16%)
helped:64
HURT:  0

Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir_opt_algebraic.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/nir/nir_opt_algebraic.py 
b/src/glsl/nir/nir_opt_algebraic.py
index eace791..3068445 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -101,6 +101,7 @@ optimizations = [
(('umin', a, a), a),
(('umax', a, a), a),
(('fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'),
+   (('fmax', ('fmin', a, 1.0), 0.0), ('fsat', a), '!options->lower_fsat'),
(('fsat', a), ('fmin', ('fmax', a, 0.0), 1.0), 'options->lower_fsat'),
(('fsat', ('fsat', a)), ('fsat', a)),
(('fmin', ('fmax', ('fmin', ('fmax', a, 0.0), 1.0), 0.0), 1.0), ('fmin', 
('fmax', a, 0.0), 1.0)),
-- 
2.2.2
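
As an illustration of the added rule, here is a toy rewrite over nested tuples in the same shape as the patterns above. It is a sketch only: rewrite_fsat is a hypothetical function, far simpler than the real nir_opt_algebraic machinery, and it ignores the lower_fsat condition and NaN inputs.

```python
def fsat(a):
    # Saturate: clamp a to the range [0.0, 1.0].
    return min(max(a, 0.0), 1.0)


def rewrite_fsat(expr):
    """Toy rewrite for ('fmax', ('fmin', a, 1.0), 0.0) and its mirror."""
    if isinstance(expr, tuple) and len(expr) == 3:
        op, inner, const = expr
        if isinstance(inner, tuple) and len(inner) == 3:
            inner_op, a, inner_const = inner
            shape = (op, inner_op, inner_const, const)
            # Already recognized: min(max(a, 0.0), 1.0)
            if shape == ('fmin', 'fmax', 0.0, 1.0):
                return ('fsat', a)
            # Newly recognized: max(min(a, 1.0), 0.0)
            if shape == ('fmax', 'fmin', 1.0, 0.0):
                return ('fsat', a)
    # Anything else is left untouched.
    return expr
```

Both orderings clamp to the same range for ordinary (non-NaN) floats, which is why they can share a single replacement.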



Re: [Mesa-dev] What branch to get patch 47790

2015-06-16 Thread Kenneth Graunke
On Tuesday, June 16, 2015 10:08:38 PM Meng, David wrote:
> Hi:
> I am new to this email list and would like to ask for your help.
> 
> I found a patch, numbered 47790, which supports the Intel Broadwell (BDW) 
> gen8 GPU.  The author is Topi Pohjolainen.  The description is 
> below.
> I need this patch to launch a virtual machine on a BDW system on which we are 
> using the Mesa library in user space.  But I could not find this patch in mesa 
> master or any branch.  Would you please point me to a branch 
> that includes this patch? 
> 
> I highly appreciate any help.
> 
> Regards,
> 
> David
> ----- patch title and description -----
> [Mesa-dev] i965: Don't use gl-context for fbo-blits
> This series introduces new blorp parameter type for blit programs
> compiled from glsl-sources. For most parts the launch logic just
> calls core i965 batch emission logic.
> Vertex batches are handcrafted containing full vertex header
> information. This is needed because the pipeline is programmed to
> skip vertex shader, clip and viewport transformation in strips&fans
> (SF) but to provide the vertices directly from vertex fetcher (VF)
> to the windower (WM).
> 
> Topi Pohjolainen (14):
>   i965/blorp/gen7: Support for loading glsl-based fragment shaders
>   i965/blorp/gen6: Support for loading glsl-based fragment shaders
>   meta: Provide read access to blit shaders
>   i965/meta: Add helper for looking up blit programs
>   i965/blorp: Add plumbing for glsl-based color blits
>   i965/blorp: Add support for loading vertices for glsl-based blits
>   i965/blorp: Add support for setting up surfaces for glsl-based blits
>   i965/blorp: Add support for setting samplers for glsl-based blits
>   i965/gen6: Add support for setting minimum layer for tex surfaces
>   i965/blorp: Enable glsl-based fbo blits
>   i965/blorp/gen7: Prepare re-using for gen8
>   i965/blorp/gen7: Expose state setup applicable to gen8
>   i965/blorp/gen6: Prepare vertex buffer setup logic for gen8
>   i965/blorp/gen8: Execution support

Hi David,

I'm not sure what you mean by "patch with a number of 47790".  We don't
number patches in the Mesa community.  That must be some Intel internal
number.

Those patches optimize our implementation of glBlitFramebuffer() on
Broadwell, which should provide better performance in some cases.  We
haven't landed them yet because Topi is rewriting them.

Those patches are not required for Broadwell support, however.
Broadwell has been supported since Mesa 10.1 (but you should use a more
recent version, such as 10.6).

I don't know what's required for virtual machines, as I've never tried
that.  But these patches definitely aren't relevant.

--Ken




Re: [Mesa-dev] [PATCH] i965/compute: Fix undefined code with right_mask for SIMD32

2015-06-16 Thread Kenneth Graunke
On Tuesday, June 16, 2015 02:46:22 PM Jordan Justen wrote:
> Although we don't support SIMD32, krh pointed out that the left shift
> by 32 is undefined by C/C++ for 32-bit integers.
> 
> Suggested-by: Kristian Høgsberg 
> Signed-off-by: Jordan Justen 
> Cc: Kristian Høgsberg 
> ---
>  src/mesa/drivers/dri/i965/brw_compute.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
> b/src/mesa/drivers/dri/i965/brw_compute.c
> index b3d6de5..5693ab5 100644
> --- a/src/mesa/drivers/dri/i965/brw_compute.c
> +++ b/src/mesa/drivers/dri/i965/brw_compute.c
> @@ -45,7 +45,7 @@ brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint 
> *num_groups)
> unsigned thread_width_max =
>(group_size + simd_size - 1) / simd_size;
>  
> -   uint32_t right_mask = (1u << simd_size) - 1;
> +   uint32_t right_mask = 0xffffffffu >> (32 - simd_size);
> const unsigned right_non_aligned = group_size & (simd_size - 1);
> if (right_non_aligned != 0)
>right_mask >>= (simd_size - right_non_aligned);
> 

Patch works as advertised:

python>> hex((1 << 8) - 1)
'0xff'
python>> hex((1 << 16) - 1)
'0xffff'
python>> hex((1 << 32) - 1)
'0xffffffff'
python>> hex(0xffffffff >> (32 - 8))
'0xff'
python>> hex(0xffffffff >> (32 - 16))
'0xffff'
python>> hex(0xffffffff >> (32 - 32))
'0xffffffff'

Reviewed-by: Kenneth Graunke 
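
The whole mask computation from the hunk, including the adjustment for a partially filled last thread, can be modelled like this. It is a sketch with a made-up function signature. Python integers are unbounded, so it shows only the resulting mask values and does not reproduce the undefined behaviour of a 32-bit shift by 32 in C.

```python
def right_mask(group_size, simd_size):
    """Hypothetical model of the right_mask logic in the patch."""
    assert simd_size in (8, 16, 32)
    # Start from all ones and shift right, so the shift count stays in
    # the range 0..24 and never reaches the width of the type.
    mask = 0xFFFFFFFF >> (32 - simd_size)
    # Channels left over when group_size is not a multiple of simd_size.
    right_non_aligned = group_size & (simd_size - 1)
    if right_non_aligned != 0:
        mask >>= simd_size - right_non_aligned
    return mask
```

For example, a group of 10 invocations at SIMD8 leaves two channels in the last thread, so the mask comes out as 0x3.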




Re: [Mesa-dev] What branch to get patch 47790

2015-06-16 Thread Kenneth Graunke
On Tuesday, June 16, 2015 11:34:17 PM Meng, David wrote:
> Hi Ken:
> 
> Thank you very much for your quick response.
> 
> I have been developing a GPU driver for VMware ESXi kernel.  ESXi kernel is a 
> virtualized hypervisor and our GPU driver provides graphics support for that 
> kernel. We used Mesa libraries in the user space.  The kernel driver and Mesa 
> library work fine on HSW and IVB systems.  But we found that Mesa 
> (10.4.0) hits unreachable code when we launch a VM on the BDW system.  
> Launching a VM is equivalent to starting a 3D graphics application on a Linux 
> system.
> 
> The assert happens in the following function:
>   
> brw_blorp_exec(struct brw_context *brw, const brw_blorp_params *params)

BLORP should never be used on Gen8, currently.  Could you post a
backtrace (in gdb, run 'bt') that shows what's calling it?

Alternatively, do you have additional patches to your Mesa tree that
might cause it to be called?  brw_blorp_exec should never be called with
upstream Mesa.




Re: [Mesa-dev] What branch to get patch 47790

2015-06-16 Thread Kenneth Graunke
On Wednesday, June 17, 2015 12:05:05 AM Meng, David wrote:
> Hi Ken:
> Thank you for the help and clarification.
>  
> The back trace we got from the dump file is below.  brw_blorp_exec()  
> is called in thread 1. 
> 
> We do not have other major patches in the Mesa tree, only some small ones.  I can 
> find them if you need.
> 
> Regards,
> 
> David
> --back trace-
> Thread 3 (Thread 38344):
> #0  0x077cccac in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x0623d9c1 in ?? ()
> #2  0x065d2f82 in ?? ()
> #3  0x06245a38 in ?? ()
> #4  0x077c8ddc in start_thread () from /lib64/libpthread.so.0
> #5  0x0860f12d in clone () from /lib64/libc.so.6
> 
> Thread 2 (Thread 38341):
> #0  0x0623c77f in ?? ()
> #1  0x0623c7ff in ?? ()
> #2  0x0623c4c4 in ?? ()
> #3  0x06294047 in ?? ()
> #4  0x060a2d0b in ?? ()
> #5  0x06018135 in ?? ()
> #6  0x06019654 in ?? ()
> #7  0x060164b6 in main ()
> 
> Thread 1 (Thread 38345):
> #0  0x065d5942 in ?? ()
> #1  0x06018991 in ?? ()
> #2  0x065d5816 in ?? ()
> #3  <signal handler called>
> #4  0x0857014e in raise () from /lib64/libc.so.6
> #5  0x085715fb in abort () from /lib64/libc.so.6
> #6  0x08568e4d in __assert_fail () from /lib64/libc.so.6
> #7  0x0ad1fd97 in brw_blorp_exec (brw=0x32d55360, 
> params=0x3ffc38b1270) at brw_blorp.cpp:241
> #8  0x0ad20c40 in brw_blorp_blit_miptrees (brw=0x32d55360, 
> src_mt=0x337b02a0, src_level=0, src_layer=1, 
> src_format=MESA_FORMAT_R8G8B8A8_SRGB, dst_mt=0x337bb4c0, dst_level=0, 
> dst_layer=0, dst_format=MESA_FORMAT_B8G8R8A8_SRGB, src_x0=0, src_y0=0, 
> src_x1=32, src_y1=32, dst_x0=0, dst_y0=0, dst_x1=32, dst_y1=32, filter=9728, 
> mirror_x=false, mirror_y=false) at brw_blorp_blit.cpp:96
> #9  0x0ad1cd52 in blit_texture_to_pbo (ctx=0x32d55360, format=32993, 
> type=33639, pixels=0x0, texImage=0x337b04e0) at intel_tex_image.c:531
> #10 0x0ad1cdf2 in intel_get_tex_image (ctx=0x32d55360, format=32993, 
> type=33639, pixels=0x0, texImage=0x337b04e0) at intel_tex_image.c:551
> #11 0x0aac5ab6 in _mesa_GetnTexImageARB (target=34070, level=0, 
> format=32993, type=33639, bufSize=2147483647, pixels=0x0) at 
> ../../src/mesa/main/texgetimage.c:949
> #12 0x0aac5b13 in _mesa_GetTexImage (target=34070, level=0, 
> format=32993, type=33639, pixels=0x0) at ../../src/mesa/main/texgetimage.c:959
> #13 0x09a4a19b in shared_dispatch_stub_281 (target=34070, level=0, 
> format=32993, type=33639, pixels=0x0) at 
> ../../src/mapi/shared-glapi/glapi_mapi_tmp.h:16289
> #14 0x062d1360 in ?? ()
> #15 0x062cd9fd in ?? ()
> #16 0x062f8b24 in ?? ()
> #17 0x062f8be3 in ?? ()
> #18 0x062ac937 in ?? ()
> #19 0x06294f25 in ?? ()
> #20 0x062a9073 in ?? ()
> #21 0x06245a38 in ?? ()
> #22 0x077c8ddc in start_thread () from /lib64/libpthread.so.0
> 
> Following is the back trace we get from the dump file:

Note that blit_texture_to_pbo has been deleted in commit 779923194c65e,
which was in Mesa 10.5.0.  You must be using some older version of Mesa;
it may just work if you update to a newer version.

But I'm confused.  Even when it existed, blit_texture_to_pbo() never
directly called brw_blorp_blit_miptrees().  It used the BLT engine.

So I'm suspicious that this is caused by a patch in your tree.

--Ken




[Mesa-dev] [PATCH 1/2] i965: Split VUE map handling out of brw_vs.c into brw_vue_map.c.

2015-06-17 Thread Kenneth Graunke
This was originally only used by the vertex shader, but it's now used by
the geometry shader as well, and will also eventually be used for
tessellation control and evaluation shaders.

I suspect it will be easier to find in a file named after the concept.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_vs.c | 102 --
 src/mesa/drivers/dri/i965/brw_vue_map.c| 136 +
 3 files changed, 137 insertions(+), 102 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_vue_map.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 93f336e..981fe79 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -130,6 +130,7 @@ i965_FILES = \
brw_vs.h \
brw_vs_state.c \
brw_vs_surface_state.c \
+   brw_vue_map.c \
brw_wm.c \
brw_wm.h \
brw_wm_iz.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index d03567e..6e9848f 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -40,108 +40,6 @@
 
 #include "util/ralloc.h"
 
-static inline void assign_vue_slot(struct brw_vue_map *vue_map,
-   int varying)
-{
-   /* Make sure this varying hasn't been assigned a slot already */
-   assert (vue_map->varying_to_slot[varying] == -1);
-
-   vue_map->varying_to_slot[varying] = vue_map->num_slots;
-   vue_map->slot_to_varying[vue_map->num_slots++] = varying;
-}
-
-/**
- * Compute the VUE map for vertex shader program.
- */
-void
-brw_compute_vue_map(const struct brw_device_info *devinfo,
-struct brw_vue_map *vue_map,
-GLbitfield64 slots_valid)
-{
-   vue_map->slots_valid = slots_valid;
-   int i;
-
-   /* gl_Layer and gl_ViewportIndex don't get their own varying slots -- they
-* are stored in the first VUE slot (VARYING_SLOT_PSIZ).
-*/
-   slots_valid &= ~(VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT);
-
-   /* Make sure that the values we store in vue_map->varying_to_slot and
-* vue_map->slot_to_varying won't overflow the signed chars that are used
-* to store them.  Note that since vue_map->slot_to_varying sometimes holds
-* values equal to BRW_VARYING_SLOT_COUNT, we need to ensure that
-* BRW_VARYING_SLOT_COUNT is <= 127, not 128.
-*/
-   STATIC_ASSERT(BRW_VARYING_SLOT_COUNT <= 127);
-
-   vue_map->num_slots = 0;
-   for (i = 0; i < BRW_VARYING_SLOT_COUNT; ++i) {
-  vue_map->varying_to_slot[i] = -1;
-  vue_map->slot_to_varying[i] = BRW_VARYING_SLOT_COUNT;
-   }
-
-   /* VUE header: format depends on chip generation and whether clipping is
-* enabled.
-*/
-   if (devinfo->gen < 6) {
-  /* There are 8 dwords in VUE header pre-Ironlake:
-   * dword 0-3 is indices, point width, clip flags.
-   * dword 4-7 is ndc position
-   * dword 8-11 is the first vertex data.
-   *
-   * On Ironlake the VUE header is nominally 20 dwords, but the hardware
-   * will accept the same header layout as Gen4 [and should be a bit 
faster]
-   */
-  assign_vue_slot(vue_map, VARYING_SLOT_PSIZ);
-  assign_vue_slot(vue_map, BRW_VARYING_SLOT_NDC);
-  assign_vue_slot(vue_map, VARYING_SLOT_POS);
-   } else {
-  /* There are 8 or 16 DWs (D0-D15) in VUE header on Sandybridge:
-   * dword 0-3 of the header is indices, point width, clip flags.
-   * dword 4-7 is the 4D space position
-   * dword 8-15 of the vertex header is the user clip distance if
-   * enabled.
-   * dword 8-11 or 16-19 is the first vertex element data we fill.
-   */
-  assign_vue_slot(vue_map, VARYING_SLOT_PSIZ);
-  assign_vue_slot(vue_map, VARYING_SLOT_POS);
-  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0))
- assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST0);
-  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1))
- assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST1);
-
-  /* front and back colors need to be consecutive so that we can use
-   * ATTRIBUTE_SWIZZLE_INPUTATTR_FACING to swizzle them when doing
-   * two-sided color.
-   */
-  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_COL0))
- assign_vue_slot(vue_map, VARYING_SLOT_COL0);
-  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_BFC0))
- assign_vue_slot(vue_map, VARYING_SLOT_BFC0);
-  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_COL1))
- assign_vue_slot(vue_map, VARYING_SLOT_COL1);
-  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_BFC1))
- assign_vue_slot(vue_map, VARYING_SLOT_BFC1);
-   }
-
-   /* The hardware doesn't care about the rest of the vertex outputs, so just
- 

[Mesa-dev] [PATCH 2/2] i965: Add and fix comments in brw_vue_map.c.

2015-06-17 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vue_map.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vue_map.c 
b/src/mesa/drivers/dri/i965/brw_vue_map.c
index ff92bd2..7687578 100644
--- a/src/mesa/drivers/dri/i965/brw_vue_map.c
+++ b/src/mesa/drivers/dri/i965/brw_vue_map.c
@@ -24,6 +24,15 @@
 /**
  * @file brw_vue_map.c
  *
+ * This file computes the "VUE map" for a (non-fragment) shader stage, which
+ * describes the layout of its output varyings.  The VUE map is used to match
+ * outputs from one stage with the inputs of the next.
+ *
+ * Largely, varyings can be placed however we like - producers/consumers simply
+ * have to agree on the layout.  However, there is also a "VUE Header" that
+ * prescribes a fixed-layout for items that interact with fixed function
+ * hardware, such as the clipper and rasterizer.
+ *
  * Authors:
  *   Paul Berry 
  *   Chris Forbes 
@@ -45,7 +54,7 @@ assign_vue_slot(struct brw_vue_map *vue_map, int varying)
 }
 
 /**
- * Compute the VUE map for vertex shader program.
+ * Compute the VUE map for a shader stage.
  */
 void
 brw_compute_vue_map(const struct brw_device_info *devinfo,
@@ -76,6 +85,9 @@ brw_compute_vue_map(const struct brw_device_info *devinfo,
 
/* VUE header: format depends on chip generation and whether clipping is
 * enabled.
+*
+* See the Sandybridge PRM, Volume 2 Part 1, section 1.5.1 (page 30),
+* "Vertex URB Entry (VUE) Formats" which describes the VUE header layout.
 */
if (devinfo->gen < 6) {
   /* There are 8 dwords in VUE header pre-Ironlake:
-- 
2.4.3
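
The fixed-layout header part of the VUE map described in these two patches can be sketched roughly as below. This is a simplified Python model under stated assumptions: compute_vue_header is a hypothetical name, varyings are plain strings instead of the driver's 64-bit bitfield, and the placement of the remaining (non-header) outputs is not modelled because that part of the patch is cut off above.

```python
def compute_vue_header(gen, slots_valid):
    """Return the fixed-order VUE header slots (hypothetical model)."""
    # gl_Layer and gl_ViewportIndex share the PSIZ slot, so drop them.
    slots_valid = set(slots_valid) - {"LAYER", "VIEWPORT"}
    slots = []

    def assign(varying):
        # Each varying may be given a slot only once.
        assert varying not in slots
        slots.append(varying)

    if gen < 6:
        for varying in ("PSIZ", "NDC", "POS"):
            assign(varying)
    else:
        assign("PSIZ")
        assign("POS")
        # Clip distances first, then front/back colors kept adjacent so
        # they can be swizzled for two-sided color.
        for varying in ("CLIP_DIST0", "CLIP_DIST1",
                        "COL0", "BFC0", "COL1", "BFC1"):
            if varying in slots_valid:
                assign(varying)
    return slots
```

The list index plays the role of the slot number, which is what producers and consumers have to agree on.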



Re: [Mesa-dev] [PATCH] i965: Add missing braces around if-statement.

2015-06-18 Thread Kenneth Graunke
On Thursday, June 18, 2015 04:28:25 PM Ben Widawsky wrote:
> 
> On Thu, Jun 18, 2015 at 04:19:36PM -0700, Matt Turner wrote:
> > Fixes a performance problem caused by commit b639ed2f.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90895
> 
> Ken spotted this in review.
> /me hides
> 
> Reviewed-by: Ben Widawsky 

Scratch one mystery!  Thanks Matt.

Reviewed-by: Kenneth Graunke 
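
For anyone wondering why two missing braces cost performance: without them, only the inner if was guarded by the format check, and the return ran for every format. A small Python model of the two control flows follows. The function names and the log parameter are made up, and the final return True is a stand-in for the per-channel color checks that are not modelled here.

```python
def fast_clear_compatible_buggy(is_integer_format, gen, log):
    # How the compiler read the unbraced code: the outer "if" guards only
    # the inner "if", and the return is unconditional.
    if is_integer_format:
        if gen >= 8:
            log.append("Integer fast clear not enabled")
    return False


def fast_clear_compatible_fixed(is_integer_format, gen, log):
    if is_integer_format:
        if gen >= 8:
            log.append("Integer fast clear not enabled")
        return False
    # Stand-in for the remaining color checks (not modelled).
    return True
```

In the buggy version no format is ever reported as compatible, so the fast clear path is never taken.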

> 
> > ---
> >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > index c0c8dfa..49f2e3e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > @@ -339,12 +339,13 @@ is_color_fast_clear_compatible(struct brw_context 
> > *brw,
> > mesa_format format,
> > const union gl_color_union *color)
> >  {
> > -   if (_mesa_is_format_integer_color(format))
> > +   if (_mesa_is_format_integer_color(format)) {
> >if (brw->gen >= 8) {
> >   perf_debug("Integer fast clear not enabled for (%s)",
> >  _mesa_get_format_name(format));
> >}
> >return false;
> > +   }
> >  
> > for (int i = 0; i < 4; i++) {
> >if (color->f[i] != 0.0 && color->f[i] != 1.0 &&
> 




Re: [Mesa-dev] [PATCH 3/4] i965/gen9: Don't use encrypted MOCS

2015-06-18 Thread Kenneth Graunke
On Wednesday, June 17, 2015 03:50:13 PM Ben Widawsky wrote:
> On gen9+ MOCS is an index into a table. It is 7 bits, and AFAICT, bit 0 is for
> doing encrypted reads.
> 
> I don't recall how I decided to do this for BXT. I don't know if this patch was
> ever needed, since it seems nothing is broken today on SKL. Furthermore, this
> patch may no longer be needed because of the ongoing changes with MOCS setup.
> It is what is being used/tested, so it's included in the series.
> 
> The chosen values are the old values left shifted. That was also an arbitrary
> choice.
> 
> Cc:  Francisco Jerez 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index bfcc442..5358edc 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2495,8 +2495,8 @@ enum brw_wm_barycentric_interp_mode {
>   * cache settings.  We still use only either write-back or write-through; and
>   * rely on the documented default values.
>   */
> -#define SKL_MOCS_WB 9
> -#define SKL_MOCS_WT 5
> +#define SKL_MOCS_WB 0x12
> +#define SKL_MOCS_WT 0xa


Yeah, it looks like Kristian made these defines the indices into the
table, but may have missed that the MOCS field puts that table index in
[6:1] and bit 0 is something else.

So shifting left by 1 seems like a good plan.  Perhaps write it as

#define SKL_MOCS_WB (0b001001 << 1)
#define SKL_MOCS_WT (0b000101 << 1)

so the index value is written like it is in the documentation, and the
shift 1 indicates moving it into the right place for MOCS?

Either way,
Reviewed-by: Kenneth Graunke 

Incidentally...the WT value (index 5) appears to skip eLLC - the target
cache is 01b = "LLC only".  That doesn't seem desirable.  We probably
want index 6 instead (0b000110 << 1) which uses both LLC and eLLC.

That said, we shouldn't ever be using WT in the driver - we want to use
the PTE value.  (krh even added a FINISHME comment to that effect.)

I think a proper value for that would be:
#define SKL_MOCS_PTE (0b10 << 1)
(Default: 0b10,
 LeCC = 0x00 - use cacheability controls from page table / ...
 TC = LLC/eLLC allowed)

We could either fix the _WT define or just delete it.
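
To make the bit layout concrete, here is a minimal sketch, assuming the
table index lives in bits [6:1] of the MOCS field as described above.
The helper name MOCS_FROM_INDEX and the variable names are hypothetical
and not part of the patch; the indices 9 and 5 are the old define values.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): move a MOCS table index into
 * bits [6:1] of the MOCS field, leaving bit 0 clear.
 */
#define MOCS_FROM_INDEX(index) ((uint32_t)(index) << 1)

/* The old defines were the raw table indices: 9 for WB and 5 for WT. */
static const uint32_t skl_mocs_wb = MOCS_FROM_INDEX(9); /* 0x12 */
static const uint32_t skl_mocs_wt = MOCS_FROM_INDEX(5); /* 0x0a */
```

Writing the index in one place and the shift in another keeps the
number comparable with the table in the documentation.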

>  
>  #define MEDIA_VFE_STATE 0x7000
>  /* GEN7 DW2, GEN8+ DW3 */
> 




Re: [Mesa-dev] [PATCH] docs: update developer info

2015-06-18 Thread Kenneth Graunke
On Friday, June 19, 2015 01:14:07 PM Timothy Arceri wrote:
> Just link directly to the piglit repo; the old link has outdated information.
> 
> Add note about updating patchwork when sending patch revisions.
> ---
>  docs/devinfo.html | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/devinfo.html b/docs/devinfo.html
> index 0da18b9..6779ab4 100644
> --- a/docs/devinfo.html
> +++ b/docs/devinfo.html
> @@ -244,7 +244,7 @@ to update the tests themselves.
>  
>  
>  Whenever possible and applicable, test the patch with
> -<a href="http://people.freedesktop.org/~nh/piglit/">Piglit</a> to
> +<a href="http://cgit.freedesktop.org/piglit">Piglit</a> to
>  check for regressions.
>  
>  
> @@ -266,6 +266,12 @@ re-sending the whole series). Using --in-reply-to makes
>  it harder for reviewers to accidentally review old patches.
>  
>  
> +
> +When submitting follow-up patches you should also login to
> +<a href="https://patchwork.freedesktop.org">patchwork</a> and change the
> +state of your old patches to Superseded.
> +
> +
>  Reviewing Patches
>  
>  
> 

I might link to http://piglit.freedesktop.org/ instead - it's the actual
Piglit website.  (There's not much more than the git link, though -
either is definitely better than linking to ~nh!)





Re: [Mesa-dev] [PATCH 15/46] mesa: add tessellation shader getters.

2015-06-19 Thread Kenneth Graunke
On Wednesday, June 17, 2015 01:01:11 AM Marek Olšák wrote:
> From: Fabian Bieler 
> 
> Tessellation dependencies added by Marek.
> ---
>  src/mesa/main/get.c  |  1 +
>  src/mesa/main/get_hash_params.py | 28 ++
>  src/mesa/main/shaderapi.c| 84 
> 
>  3 files changed, 113 insertions(+)
> 
> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
> index 1bc9b5d..6f2e1ec 100644
> --- a/src/mesa/main/get.c
> +++ b/src/mesa/main/get.c
> @@ -400,6 +400,7 @@ EXTRA_EXT(INTEL_performance_query);
>  EXTRA_EXT(ARB_explicit_uniform_location);
>  EXTRA_EXT(ARB_clip_control);
>  EXTRA_EXT(EXT_polygon_offset_clamp);
> +EXTRA_EXT(ARB_tessellation_shader);
>  
>  static const int
>  extra_ARB_color_buffer_float_or_glcore[] = {
> diff --git a/src/mesa/main/get_hash_params.py 
> b/src/mesa/main/get_hash_params.py
> index 513d5d2..6d393cc 100644
> --- a/src/mesa/main/get_hash_params.py
> +++ b/src/mesa/main/get_hash_params.py
> @@ -820,6 +820,34 @@ descriptor=[
>  
>  # GL_EXT_polygon_offset_clamp
>[ "POLYGON_OFFSET_CLAMP_EXT", "CONTEXT_FLOAT(Polygon.OffsetClamp), 
> extra_EXT_polygon_offset_clamp" ],
> +
> +# GL_ARB_tessellation_shader
> +  [ "PATCH_VERTICES", "CONTEXT_INT(TessCtrlProgram.patch_vertices), 
> extra_ARB_tessellation_shader" ],
> +  [ "PATCH_DEFAULT_OUTER_LEVEL", 
> "CONTEXT_FLOAT4(TessCtrlProgram.patch_default_outer_level), 
> extra_ARB_tessellation_shader" ],
> +  [ "PATCH_DEFAULT_INNER_LEVEL", 
> "CONTEXT_FLOAT2(TessCtrlProgram.patch_default_inner_level), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_GEN_LEVEL", "CONTEXT_INT(Const.MaxTessGenLevel), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_PATCH_VERTICES", "CONTEXT_INT(Const.MaxPatchVertices), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_CONTROL_UNIFORM_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxUniformComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_EVALUATION_UNIFORM_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxUniformComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_CONTROL_TEXTURE_IMAGE_UNITS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxTextureImageUnits), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_EVALUATION_TEXTURE_IMAGE_UNITS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxTextureImageUnits), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_CONTROL_OUTPUT_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxOutputComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_PATCH_COMPONENTS", "CONTEXT_INT(Const.MaxTessPatchComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_CONTROL_TOTAL_OUTPUT_COMPONENTS", 
> "CONTEXT_INT(Const.MaxTessControlTotalOutputComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_EVALUATION_OUTPUT_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxOutputComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_CONTROL_INPUT_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxInputComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_EVALUATION_INPUT_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxInputComponents), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_CONTROL_UNIFORM_BLOCKS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxUniformBlocks), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_TESS_EVALUATION_UNIFORM_BLOCKS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxUniformBlocks), 
> extra_ARB_tessellation_shader" ],
> +  [ "MAX_COMBINED_TESS_CONTROL_UNIFORM_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxCombinedUniformComponents),
>  extra_ARB_tessellation_shader" ],
> +  [ "MAX_COMBINED_TESS_EVALUATION_UNIFORM_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxCombinedUniformComponents),
>  extra_ARB_tessellation_shader" ],
> +# Dependencies on GL_ARB_tessellation_shader
> +  [ "MAX_TESS_CONTROL_ATOMIC_COUNTER_BUFFERS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxAtomicBuffers), 
> extra_ARB_shader_atomic_counters" ],
> +  [ "MAX_TESS_CONTROL_ATOMIC_COUNTERS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxAtomicCounters), 
> extra_ARB_shader_atomic_counters" ],
> +  [ "MAX_TESS_EVALUATION_ATOMIC_COUNTER_BUFFERS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxAtomicBuffers), 
> extra_ARB_shader_atomic_counters" ],
> +  [ "MAX_TESS_EVALUATION_ATOMIC_COUNTERS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxAtomicCounters), 
> extra_ARB_shader_atomic_counters" ],
> +  [ "MAX_TESS_CONTROL_IMAGE_UNIFORMS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_CTRL].MaxImageUniforms), 
> extra_ARB_shader_image_load_store"],
> +  [ "MAX_TESS_EVALUATION_IMAGE_UNIFORMS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_TESS_EVAL].MaxImageUniforms), 
> extra_ARB_shader_image_load_store"],

S

Re: [Mesa-dev] [PATCH 19/46] mesa: don't allow drawing with tess ctrl shader and without tess eval shader

2015-06-19 Thread Kenneth Graunke
On Wednesday, June 17, 2015 01:01:15 AM Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/mesa/main/api_validate.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index 401120a..9a5ac69 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -69,6 +69,25 @@ check_valid_to_render(struct gl_context *ctx, const char 
> *function)
>   return false;
>}
>  
> +  /* The spec argues that this is allowed because a tess ctrl shader
> +   * without a tess eval shader can be used with transform feedback.
> +   * However, glBeginTransformFeedback doesn't allow GL_PATCHES and
> +   * therefore doesn't allow tessellation.
> +   *
> +   * Further investigation showed that this is indeed a spec bug and
> +   * a tess ctrl shader without a tess eval shader shouldn't have been
> +   * allowed, because there is no API in GL 4.0 that can make use of this
> +   * to produce something useful.
> +   *
> +   * Also, all vendors except one don't support a tess ctrl shader 
> without
> +   * a tess eval shader anyway.
> +   */
> +  if (ctx->TessCtrlProgram._Current && !ctx->TessEvalProgram._Current) {
> + _mesa_error(ctx, GL_INVALID_OPERATION,
> + "%s(tess eval shader is missing)", function);
> + return false;
> +  }
> +
>/* Section 7.3 (Program Objects) of the OpenGL 4.5 Core Profile spec
> * says:
> *
> 

This makes sense to me - the TCS always generates patches, and I don't
see any way to record those with transform feedback.  And nothing else
can use patches.  So...effectively you need a TES.

It sounds like they were leaving the door open for a vendor extension
that allowed transform feedback on patches, but nobody has written such
an extension yet.
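
For reference, the rule boils down to something like the sketch below.
This is a simplified stand-in, not Mesa code: struct draw_state and
tess_stages_valid() are hypothetical names, and the real check lives in
check_valid_to_render() and reports GL_INVALID_OPERATION.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the relevant context state. */
struct draw_state {
   const void *tess_ctrl; /* currently bound TCS, or NULL */
   const void *tess_eval; /* currently bound TES, or NULL */
};

/* Returns false only for the combination the patch rejects:
 * a tess ctrl shader bound without a tess eval shader.
 */
static bool
tess_stages_valid(const struct draw_state *s)
{
   /* The TCS only produces patches, and only a TES can consume them. */
   return !(s->tess_ctrl != NULL && s->tess_eval == NULL);
}
```

Note that the sketch, like the patch, only rejects TCS-without-TES; it
says nothing about the other combinations.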




Re: [Mesa-dev] [PATCH 20/46] glsl: add tessellation shader parsing support.

2015-06-19 Thread Kenneth Graunke
On Wednesday, June 17, 2015 01:01:16 AM Marek Olšák wrote:
> From: Fabian Bieler 
> 
> ---
>  src/glsl/ast.h  |  54 +++-
>  src/glsl/ast_to_hir.cpp | 133 
> +++-
>  src/glsl/ast_type.cpp   | 112 -
>  src/glsl/glsl_parser.yy | 118 +--
>  src/glsl/glsl_parser_extras.cpp |  39 +++-
>  src/glsl/glsl_parser_extras.h   |  31 --
>  6 files changed, 471 insertions(+), 16 deletions(-)
> 
> diff --git a/src/glsl/ast.h b/src/glsl/ast.h
> index ef74e51..26ad3bf 100644
> --- a/src/glsl/ast.h
> +++ b/src/glsl/ast.h
> @@ -514,6 +514,17 @@ struct ast_type_qualifier {
>   unsigned stream:1; /**< Has stream value assigned  */
>   unsigned explicit_stream:1; /**< stream value assigned explicitly 
> by shader code */
>   /** \} */
> +
> +  /** \name Layout qualifiers for GL_ARB_tessellation_shader */
> +  /** \{ */
> +  /* tess eval input layout */
> +  /* gs prim_type reused for primitive mode */
> +  unsigned vertex_spacing:1;
> +  unsigned ordering:1;
> +  unsigned point_mode:1;
> +  /* tess control output layout */
> +  unsigned vertices:1;
> +  /** \} */
>}
>/** \brief Set of flags, accessed by name. */
>q;
> @@ -549,7 +560,10 @@ struct ast_type_qualifier {
> /** Stream in GLSL 1.50 geometry shaders. */
> unsigned stream;
>  
> -   /** Input or output primitive type in GLSL 1.50 geometry shaders */
> +   /**
> +* Input or output primitive type in GLSL 1.50 geometry shaders
> +* and tessellation shaders.
> +*/
> GLenum prim_type;
>  
> /**
> @@ -576,6 +590,18 @@ struct ast_type_qualifier {
>  */
> int local_size[3];
>  
> +   /** Tessellation evaluation shader: vertex spacing (equal, fractional 
> even/odd) */
> +   GLenum vertex_spacing;
> +
> +   /** Tessellation evaluation shader: vertex ordering (CW or CCW) */
> +   GLenum ordering;
> +
> +   /** Tessellation evaluation shader: point mode */
> +   bool point_mode;
> +
> +   /** Tessellation control shader: number of output vertices */
> +   int vertices;
> +
> /**
>  * Image format specified with an ARB_shader_image_load_store
>  * layout qualifier.
> @@ -631,6 +657,11 @@ struct ast_type_qualifier {
>   _mesa_glsl_parse_state *state,
>   ast_type_qualifier q);
>  
> +   bool merge_out_qualifier(YYLTYPE *loc,
> +   _mesa_glsl_parse_state *state,
> +   ast_type_qualifier q,
> +   ast_node* &node);
> +
> bool merge_in_qualifier(YYLTYPE *loc,
> _mesa_glsl_parse_state *state,
> ast_type_qualifier q,
> @@ -1031,6 +1062,27 @@ public:
>  
>  
>  /**
> + * AST node representing a declaration of the output layout for tessellation
> + * control shaders.
> + */
> +class ast_tcs_output_layout : public ast_node
> +{
> +public:
> +   ast_tcs_output_layout(const struct YYLTYPE &locp, int vertices)
> +  : vertices(vertices)
> +   {
> +  set_location(locp);
> +   }
> +
> +   virtual ir_rvalue *hir(exec_list *instructions,
> +  struct _mesa_glsl_parse_state *state);
> +
> +private:
> +   const int vertices;
> +};
> +
> +
> +/**
>   * AST node representing a declaration of the input layout for geometry
>   * shaders.
>   */
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index 259e01e..53daf13 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -79,6 +79,7 @@ _mesa_ast_to_hir(exec_list *instructions, struct 
> _mesa_glsl_parse_state *state)
> state->toplevel_ir = instructions;
>  
> state->gs_input_prim_type_specified = false;
> +   state->tcs_output_vertices_specified = false;
> state->cs_input_local_size_specified = false;
>  
> /* Section 4.2 of the GLSL 1.20 specification states:
> @@ -2205,6 +2206,8 @@ validate_explicit_location(const struct 
> ast_type_qualifier *qual,
>  * inputoutput
>  * ---
>  * vertex  explicit_loc sso
> +* tess controlsso  sso
> +* tess eval   sso  sso
>  * geometrysso  sso
>  * fragmentsso  explicit_loc
>  */
> @@ -2227,6 +2230,8 @@ validate_explicit_location(const struct 
> ast_type_qualifier *qual,
>fail = true;
>break;
>  
> +   case MESA_SHADER_TESS_CTRL:
> +   case MESA_SHADER_TESS_EVAL:
> case MESA_SHADER_GEOMETRY:
>if (var->data.mode == ir_var_shader_in || var->data.mode == 
> ir_var_shader_out) {
>   if (!state->check_separate_shader_objects_allowed(loc, var))
> @@ -2286,6 +2291,8 @@ validate_explicit_location(const struct 
> ast_type_qualifier *qual,
>  

Re: [Mesa-dev] [PATCH 22/46] glsl: add the patch in/out qualifier

2015-06-19 Thread Kenneth Graunke
On Wednesday, June 17, 2015 01:01:18 AM Marek Olšák wrote:
> From: Fabian Bieler 
> 
> ---
>  src/glsl/ast.h|  1 +
>  src/glsl/ast_to_hir.cpp   | 45 
>  src/glsl/ast_type.cpp |  3 +-
>  src/glsl/builtin_variables.cpp|  8 ++--
>  src/glsl/glsl_lexer.ll|  2 +-
>  src/glsl/glsl_parser.yy   | 15 ---
>  src/glsl/glsl_parser_extras.cpp   |  2 +
>  src/glsl/glsl_types.cpp   |  5 +++
>  src/glsl/glsl_types.h |  6 +++
>  src/glsl/ir.cpp   |  2 +
>  src/glsl/ir.h |  1 +
>  src/glsl/ir_print_visitor.cpp |  5 ++-
>  src/glsl/ir_reader.cpp|  2 +
>  src/glsl/ir_set_program_inouts.cpp| 69 
> +++
>  src/glsl/link_varyings.cpp| 15 ++-
>  src/glsl/lower_named_interface_blocks.cpp |  1 +
>  src/glsl/lower_packed_varyings.cpp|  1 +
>  17 files changed, 161 insertions(+), 22 deletions(-)
> 
> diff --git a/src/glsl/ast.h b/src/glsl/ast.h
> index 26ad3bf..87e1354 100644
> --- a/src/glsl/ast.h
> +++ b/src/glsl/ast.h
> @@ -434,6 +434,7 @@ struct ast_type_qualifier {
>unsigned out:1;
>unsigned centroid:1;
>   unsigned sample:1;
> +  unsigned patch:1;
>unsigned uniform:1;
>unsigned smooth:1;
>unsigned flat:1;
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index 53daf13..837bac7 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -2461,6 +2461,9 @@ apply_type_qualifier_to_variable(const struct 
> ast_type_qualifier *qual,
>var->data.stream = qual->stream;
> }
>  
> +   if (qual->flags.q.patch)
> +  var->data.patch = 1;
> +
> if (qual->flags.q.attribute && state->stage != MESA_SHADER_VERTEX) {
>var->type = glsl_type::error_type;
>_mesa_glsl_error(loc, state,
> @@ -3119,6 +3122,17 @@ handle_tess_ctrl_shader_output_decl(struct 
> _mesa_glsl_parse_state *state,
>num_vertices = state->out_qualifier->vertices;
> }
>  
> +   if (!var->type->is_array() && !var->data.patch) {
> +  _mesa_glsl_error(&loc, state,
> +   "tessellation control shader outputs must be arrays");
> +
> +  /* To avoid cascading failures, short circuit the checks below. */
> +  return;
> +   }

Seems like this block should have gone in patch 20 and just the
!var->data.patch part added here.  But it's already a huge patch, so I
suppose it doesn't matter so much...

> +
> +   if (var->data.patch)
> +  return;
> +
> if (var->type->is_unsized_array()) {
>if (num_vertices != 0)
>   var->type = glsl_type::get_array_instance(var->type->fields.array,
> @@ -3940,6 +3954,33 @@ ast_declarator_list::hir(exec_list *instructions,
>}
>  
>  
> +  /* From section 4.3.4 of the GLSL 4.00 spec:
> +   *"Input variables may not be declared using the patch in qualifier
> +   *in tessellation control or geometry shaders."
> +   *
> +   * From section 4.3.6 of the GLSL 4.00 spec:
> +   *"It is an error to use patch out in a vertex, tessellation
> +   *evaluation, or geometry shader."
> +   *
> +   * This doesn't explicitly forbid using them in a fragment shader, but
> +   * that's probably just an oversight.
> +   */
> +  if (state->stage != MESA_SHADER_TESS_EVAL
> +  && this->type->qualifier.flags.q.patch
> +  && this->type->qualifier.flags.q.in) {
> +
> + _mesa_glsl_error(&loc, state, "'patch in' can only be used in a "
> +  "tessellation evaluation shader");
> +  }
> +
> +  if (state->stage != MESA_SHADER_TESS_CTRL
> +  && this->type->qualifier.flags.q.patch
> +  && this->type->qualifier.flags.q.out) {
> +
> + _mesa_glsl_error(&loc, state, "'patch out' can only be used in a "
> +  "tessellation control shader");
> +  }
> +
>/* Precision qualifiers exists only in GLSL versions 1.00 and >= 1.30.
> */
>if (this->type->qualifier.precision != ast_precision_none) {
> @@ -5463,6 +5504,7 @@ ast_process_structure_or_interface_block(exec_list 
> *instructions,
>  interpret_interpolation_qualifier(qual, var_mode, state, &loc);
>   fields[i].centroid = qual->flags.q.centroid ? 1 : 0;
>   fields[i].sample = qual->flags.q.sample ? 1 : 0;
> + fields[i].patch = qual->flags.q.patch ? 1 : 0;
>  
>   /* Only save explicitly defined streams in block's field */
>   fields[i].stream = qual->flags.q.explicit_stream ? qual->stream : 
> -1;
> @@ -5794,6 +5836,8 @@ ast_interface_block::hir(exec_list *instructions,
> earlier_per_vertex->fields.structure[j].centroid;
>  fields[i].sample =
> earlier_per_vert

Re: [Mesa-dev] [PATCH 03/46] mesa: add tessellation shader structs

2015-06-19 Thread Kenneth Graunke
On Wednesday, June 17, 2015 01:00:59 AM Marek Olšák wrote:
> From: Fabian Bieler 
> 
> Marek: remove unused members, cleanup
> ---
>  src/mesa/main/mtypes.h | 105 
> +
>  1 file changed, 105 insertions(+)
> 
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 086f553..12789f1 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -2163,6 +2163,29 @@ struct gl_vertex_program
>  };
>  
>  
> +/** Tessellation control program object */
> +struct gl_tess_ctrl_program
> +{
> +   struct gl_program Base;   /**< base class */
> +
> +   /* output layout */
> +   GLint VerticesOut;
> +};
> +
> +
> +/** Tessellation evaluation program object */
> +struct gl_tess_eval_program
> +{
> +   struct gl_program Base;   /**< base class */
> +
> +   /* input layout */
> +   GLenum PrimitiveMode; /* GL_TRIANGLES, GL_QUADS or GL_ISOLINES */
> +   GLenum Spacing;   /* GL_EQUAL, GL_FRACTIONAL_EVEN, GL_FRACTIONAL_ODD 
> */
> +   GLenum VertexOrder;   /* GL_CW or GL_CCW */
> +   bool PointMode;
> +};
> +
> +
>  /** Geometry program object */
>  struct gl_geometry_program
>  {
> @@ -2265,6 +2288,27 @@ struct gl_vertex_program_state
> GLboolean _Overriden;
>  };
>  
> +/**
> + * Context state for tessellation control programs.
> + */
> +struct gl_tess_ctrl_program_state
> +{
> +   /** Currently bound and valid shader. */
> +   struct gl_tess_ctrl_program *_Current;
> +
> +   GLint patch_vertices;
> +   GLfloat patch_default_outer_level[4];
> +   GLfloat patch_default_inner_level[2];
> +};
> +
> +/**
> + * Context state for tessellation evaluation programs.
> + */
> +struct gl_tess_eval_program_state
> +{
> +   /** Currently bound and valid shader. */
> +   struct gl_tess_eval_program *_Current;
> +};
>  
>  /**
>   * Context state for geometry programs.
> @@ -2445,6 +2489,41 @@ struct gl_shader
> bool pixel_center_integer;
>  
> /**
> +* Tessellation Control shader state from layout qualifiers.
> +*/
> +   struct {
> +  /**
> +   * 0 - vertices not declared in shader, or
> +   * 1 .. GL_MAX_PATCH_VERTICES
> +   */
> +  GLint VerticesOut;
> +   } TessCtrl;
> +
> +   /**
> +* Tessellation Evaluation shader state from layout qualifiers.
> +*/
> +   struct {
> +  /**
> +   * GL_TRIANGLES, GL_QUADS, GL_ISOLINES or PRIM_UNKNOWN if it's not set
> +   * in this shader.
> +   */
> +  GLenum PrimitiveMode;
> +  /**
> +   * GL_EQUAL, GL_FRACTIONAL_ODD, GL_FRACTIONAL_EVEN, or 0 if it's not 
> set
> +   * in this shader.
> +   */
> +  GLenum Spacing;
> +  /**
> +   * GL_CW, GL_CCW, or 0 if it's not set in this shader.
> +   */
> +  GLenum VertexOrder;
> +  /**
> +   * 1, 0, or -1 if it's not set in this shader.
> +   */
> +  int PointMode;
> +   } TessEval;
> +
> +   /**
>  * Geometry shader state from GLSL 1.50 layout qualifiers.
>  */
> struct {
> @@ -2668,6 +2747,30 @@ struct gl_shader_program
> enum gl_frag_depth_layout FragDepthLayout;
>  
> /**
> +* Tessellation Control shader state from layout qualifiers.
> +*/
> +   struct {
> +  /**
> +   * 0 - vertices not declared in shader, or
> +   * 1 .. GL_MAX_PATCH_VERTICES
> +   */
> +  GLint VerticesOut;
> +   } TessCtrl;
> +
> +   /**
> +* Tessellation Evaluation shader state from layout qualifiers.
> +*/
> +   struct {
> +  /** GL_TRIANGLES, GL_QUADS or GL_ISOLINES */
> +  GLenum PrimitiveMode;
> +  /** GL_EQUAL, GL_FRACTIONAL_ODD or GL_FRACTIONAL_EVEN */
> +  GLenum Spacing;
> +  /** GL_CW or GL_CCW */
> +  GLenum VertexOrder;
> +  bool PointMode;
> +   } TessEval;

Seems a little odd that we've basically represented this same struct 2-3
times now.  Perhaps give it an actual type and reuse it?  Though I
suppose it doesn't matter much...
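
Roughly what I have in mind, as a sketch only.  The type name
tess_eval_layout and the copy helper are hypothetical, and I'm assuming
the shared type keeps the "not set" encoding that the gl_shader copy
uses (-1 for PointMode), since the other copies store a plain bool:

```c
/* Hypothetical shared type; the fields mirror the TessEval structs in
 * the patch, but the name is made up for illustration.
 */
struct tess_eval_layout {
   unsigned primitive_mode; /* GL_TRIANGLES, GL_QUADS or GL_ISOLINES */
   unsigned spacing;        /* GL_EQUAL, GL_FRACTIONAL_ODD/EVEN, or 0 */
   unsigned vertex_order;   /* GL_CW, GL_CCW, or 0 */
   int point_mode;          /* 1, 0, or -1 if not set */
};

/* With a named type, copying shader state to the linked program (or to
 * the program object) becomes a single struct assignment.
 */
static void
copy_tess_eval_layout(struct tess_eval_layout *dst,
                      const struct tess_eval_layout *src)
{
   *dst = *src;
}
```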

> +
> +   /**
>  * Geometry shader state - copied into gl_geometry_program by
>  * _mesa_copy_linked_program_data().
>  */
> @@ -4201,6 +4304,8 @@ struct gl_context
> struct gl_fragment_program_state FragmentProgram;
> struct gl_geometry_program_state GeometryProgram;
> struct gl_compute_program_state ComputeProgram;
> +   struct gl_tess_ctrl_program_state TessCtrlProgram;
> +   struct gl_tess_eval_program_state TessEvalProgram;
> struct gl_ati_fragment_shader_state ATIFragmentShader;
>  
> struct gl_pipeline_shader_state Pipeline; /**< GLSL pipeline shader 
> object state */
> 




Re: [Mesa-dev] [PATCH 01/46] drirc: drop support for Heaven 3.0, fixes tessellation in 4.0

2015-06-19 Thread Kenneth Graunke
I made some comments, but assuming those are taken care of,
patches 1-22 are:

Reviewed-by: Kenneth Graunke 

I plan on reviewing the rest, but probably not tonight.
Thanks for picking this up!




Re: [Mesa-dev] [PATCH 25/46] glsl: lower gl_TessLevel* from float[n] to vecn.

2015-06-19 Thread Kenneth Graunke
 +   expr->operands[1]);
> +   }
> +   ir->set_lhs(new_lhs);
> +
> +   if (old_index_constant) {
> +  /* gl_TessLevel* is being accessed via a constant index.  Don't bother
> +   * creating a vector insert op. Just use a write mask.
> +   */
> +  ir->write_mask = 1 << old_index_constant->get_int_component(0);
> +   } else {
> +  if(expr->operands[0]->type == glsl_type::vec4_type)
> + ir->write_mask = WRITEMASK_XYZW;
> +  else
> + ir->write_mask = WRITEMASK_XY;

This could be shortened to:

   ir->write_mask = (1 << expr->operands[0]->type->vector_elements) - 1;
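
A quick standalone check of that identity, as a sketch: the WRITEMASK_*
values below are assumed to match Mesa's one-bit-per-component defines,
and writemask_for_components() is a hypothetical name.

```c
/* Assumed to match Mesa's defines: one bit per vector component. */
#define WRITEMASK_XY   0x3
#define WRITEMASK_XYZW 0xf

/* Mask with the low n bits set, i.e. the first n components. */
static unsigned
writemask_for_components(unsigned n)
{
   return (1u << n) - 1;
}
```

For n = 2 this gives 0x3 and for n = 4 it gives 0xf, so both branches
of the if/else collapse into the one expression.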

> +   }
> +}
> +
> +/**
> + * Replace any assignment having a gl_TessLevel* (undereferenced) as
> + * its LHS or RHS with a sequence of assignments, one for each component of
> + * the array.  Each of these assignments is lowered to refer to
> + * gl_TessLevel*MESA as appropriate.
> + */
> +ir_visitor_status
> +lower_tess_level_visitor::visit_leave(ir_assignment *ir)
> +{
> +   /* First invoke the base class visitor.  This causes handle_rvalue() to be
> +* called on ir->rhs and ir->condition.
> +*/
> +   ir_rvalue_visitor::visit_leave(ir);
> +
> +   if (this->is_tess_level_array(ir->lhs) ||
> +   this->is_tess_level_array(ir->rhs)) {
> +  /* LHS or RHS of the assignment is the entire gl_TessLevel* array.
> +   * Since we are
> +   * reshaping gl_TessLevel* from an array of floats to a
> +   * vec4, this isn't going to work as a bulk assignment anymore, so
> +   * unroll it to element-by-element assignments and lower each of them.
> +   *
> +   * Note: to unroll into element-by-element assignments, we need to make
> +   * clones of the LHS and RHS.  This is safe because expressions and
> +   * l-values are side-effect free.
> +   */
> +  void *ctx = ralloc_parent(ir);
> +  int array_size = ir->lhs->type->array_size();
> +  for (int i = 0; i < array_size; ++i) {
> + ir_dereference_array *new_lhs = new(ctx) ir_dereference_array(
> +ir->lhs->clone(ctx, NULL), new(ctx) ir_constant(i));
> + ir_dereference_array *new_rhs = new(ctx) ir_dereference_array(
> +ir->rhs->clone(ctx, NULL), new(ctx) ir_constant(i));
> + this->handle_rvalue((ir_rvalue **) &new_rhs);
> +
> + /* Handle the LHS after creating the new assignment.  This must
> +  * happen in this order because handle_rvalue may replace the old 
> LHS
> +  * with an ir_expression of ir_binop_vector_extract.  Since this is
> +  * not a valid l-value, this will cause an assertion in the
> +  * ir_assignment constructor to fail.
> +  *
> +  * If this occurs, replace the mangled LHS with a dereference of the
> +  * vector, and replace the RHS with an ir_triop_vector_insert.
> +  */
> + ir_assignment *const assign = new(ctx) ir_assignment(new_lhs, 
> new_rhs);
> + this->handle_rvalue((ir_rvalue **) &assign->lhs);
> + this->fix_lhs(assign);
> +
> + this->base_ir->insert_before(assign);
> +  }
> +  ir->remove();
> +
> +  return visit_continue;
> +   }
> +
> +   /* Handle the LHS as if it were an r-value.  Normally
> +* rvalue_visit(ir_assignment *) only visits the RHS, but we need to lower
> +* expressions in the LHS as well.
> +*
> +* This may cause the LHS to get replaced with an ir_expression of
> +* ir_binop_vector_extract.  If this occurs, replace it with a dereference
> +* of the vector, and replace the RHS with an ir_triop_vector_insert.
> +*/
> +   handle_rvalue((ir_rvalue **)&ir->lhs);
> +   this->fix_lhs(ir);
> +
> +   return rvalue_visit(ir);
> +}
> +
> +
> +/**
> + * Set up base_ir properly and call visit_leave() on a newly created
> + * ir_assignment node.  This is used in cases where we have to insert an
> + * ir_assignment in a place where we know the hierarchical visitor won't see
> + * it.
> + */
> +void
> +lower_tess_level_visitor::visit_new_assignment(ir_assignment *ir)
> +{
> +   ir_instruction *old_base_ir = this->base_ir;
> +   this->base_ir = ir;
> +   ir->accept(this);
> +   this->base_ir = old_base_ir;
> +}
> +
> +
> +/**
> + * If a gl_TessLevel* variable appears as an argument in an ir_call
> + * expression, replace it with a temporary variable, and make sure the 
> ir_call
> + * is preceded and/or followed by assignments that copy the contents of the
> + * temporary variable to and/or from gl_TessLevel*.  

Re: [Mesa-dev] [PATCH 44/46] glsl: fix locations of 2-dimensional varyings without varying packing

2015-06-19 Thread Kenneth Graunke
On Wednesday, June 17, 2015 01:01:40 AM Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/glsl/link_varyings.cpp | 37 -
>  1 file changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
> index 5fa9ddf..6bd8dba 100644
> --- a/src/glsl/link_varyings.cpp
> +++ b/src/glsl/link_varyings.cpp
> @@ -750,7 +750,9 @@ namespace {
>  class varying_matches
>  {
>  public:
> -   varying_matches(bool disable_varying_packing, bool consumer_is_fs);
> +   varying_matches(bool disable_varying_packing,
> +   gl_shader_stage producer_type,
> +   gl_shader_stage consumer_type);

Could we perhaps call these producer_stage and consumer_stage?

Either way,
Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH] i965/fs: Fix ir_txs in emit_texture_gen4_simd16().

2015-06-22 Thread Kenneth Graunke
We were not emitting the LOD, which led to message lengths of 1 instead
of 3.  Setting has_lod makes us emit the LOD, but I had to make changes
to avoid emitting the non-existent coordinate as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91022
Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 4770838..12253e4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -247,7 +247,7 @@ fs_visitor::emit_texture_gen4_simd16(ir_texture_opcode op, 
fs_reg dst,
  uint32_t sampler)
 {
fs_reg message(MRF, 2, BRW_REGISTER_TYPE_F, dispatch_width);
-   bool has_lod = op == ir_txl || op == ir_txb || op == ir_txf;
+   bool has_lod = op == ir_txl || op == ir_txb || op == ir_txf || op == ir_txs;
 
if (has_lod && shadow_c.file != BAD_FILE)
   no16("TXB and TXL with shadow comparison unsupported in SIMD16.");
@@ -264,14 +264,15 @@ fs_visitor::emit_texture_gen4_simd16(ir_texture_opcode 
op, fs_reg dst,
fs_reg msg_end = offset(message, vector_elements);
 
/* Messages other than sample and ld require all three components */
-   if (has_lod || shadow_c.file != BAD_FILE) {
+   if (vector_elements > 0 && (has_lod || shadow_c.file != BAD_FILE)) {
   for (int i = vector_elements; i < 3; i++) {
  bld.MOV(offset(message, i), fs_reg(0.0f));
   }
+  msg_end = offset(message, 3);
}
 
if (has_lod) {
-  fs_reg msg_lod = retype(offset(message, 3), op == ir_txf ?
+  fs_reg msg_lod = retype(msg_end, op == ir_txf ?
   BRW_REGISTER_TYPE_UD : BRW_REGISTER_TYPE_F);
   bld.MOV(msg_lod, lod);
   msg_end = offset(msg_lod, 1);
-- 
1.7.10.4



[Mesa-dev] [PATCH] i965: Don't count NIR instructions for shader-db.

2015-06-22 Thread Kenneth Graunke
Matt, Jason, and I haven't found this useful in a long time.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_nir.c |   31 ---
 1 file changed, 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index c13708a..dffb8ab 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -57,28 +57,6 @@ nir_optimize(nir_shader *nir)
} while (progress);
 }
 
-static bool
-count_nir_instrs_in_block(nir_block *block, void *state)
-{
-   int *count = (int *) state;
-   nir_foreach_instr(block, instr) {
-  *count = *count + 1;
-   }
-   return true;
-}
-
-static int
-count_nir_instrs(nir_shader *nir)
-{
-   int count = 0;
-   nir_foreach_overload(nir, overload) {
-  if (!overload->impl)
- continue;
-  nir_foreach_block(overload->impl, count_nir_instrs_in_block, &count);
-   }
-   return count;
-}
-
 nir_shader *
 brw_create_nir(struct brw_context *brw,
const struct gl_shader_program *shader_prog,
@@ -178,15 +156,6 @@ brw_create_nir(struct brw_context *brw,
   nir_print_shader(nir, stderr);
}
 
-   static GLuint msg_id = 0;
-   _mesa_gl_debug(&brw->ctx, &msg_id,
-  MESA_DEBUG_SOURCE_SHADER_COMPILER,
-  MESA_DEBUG_TYPE_OTHER,
-  MESA_DEBUG_SEVERITY_NOTIFICATION,
-  "%s NIR shader: %d inst\n",
-  _mesa_shader_stage_to_abbrev(stage),
-  count_nir_instrs(nir));
-
nir_convert_from_ssa(nir);
nir_validate_shader(nir);
 
-- 
1.7.10.4



Re: [Mesa-dev] [PATCH] i965/fs: Don't mess up stride for uniform integer multiplication.

2015-06-22 Thread Kenneth Graunke
On Monday, June 22, 2015 02:58:36 PM Matt Turner wrote:
> If the stride is 0, the source is a uniform and we should not modify the
> stride.
> 
> Cc: "10.6" 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91047
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 5563c5a..903624c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -3196,10 +3196,16 @@ fs_visitor::lower_integer_multiplication()
> src1_1_w.fixed_hw_reg.dw1.ud >>= 16;
>  } else {
> src1_0_w.type = BRW_REGISTER_TYPE_UW;
> -   src1_0_w.stride = 2;
> +   if (src1_0_w.stride != 0) {
> +  assert(src1_0_w.stride == 1);
> +  src1_0_w.stride = 2;
> +   }
>  
> src1_1_w.type = BRW_REGISTER_TYPE_UW;
> -   src1_1_w.stride = 2;
> +   if (src1_1_w.stride != 0) {
> +  assert(src1_1_w.stride == 1);
> +  src1_1_w.stride = 2;
> +   }
> src1_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW);
>  }
>  ibld.MUL(low, inst->src[0], src1_0_w);
> @@ -3209,10 +3215,16 @@ fs_visitor::lower_integer_multiplication()
>  fs_reg src0_1_w = inst->src[0];
>  
>  src0_0_w.type = BRW_REGISTER_TYPE_UW;
> -src0_0_w.stride = 2;
> +if (src0_0_w.stride != 0) {
> +   assert(src0_0_w.stride == 1);
> +   src0_0_w.stride = 2;
> +}
>  
>  src0_1_w.type = BRW_REGISTER_TYPE_UW;
> -src0_1_w.stride = 2;
> +if (src0_1_w.stride != 0) {
> +   assert(src0_1_w.stride == 1);
> +   src0_1_w.stride = 2;
> +}
>  src0_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW);
>  
>  ibld.MUL(low, src0_0_w, inst->src[1]);
> 

Whoops.  Yeah, this makes sense.

Reviewed-by: Kenneth Graunke 
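
For anyone skimming the thread, the rule the patch enforces can be sketched in isolation.  This is only an illustration, not the driver code: widen_stride_for_uw is a hypothetical helper name, and the stride is modeled as a plain int rather than a field of fs_reg.

```c
#include <assert.h>

/* Hypothetical helper (not part of Mesa): returns the stride to use after
 * retyping a 32-bit source as 16-bit words (UW).
 *
 * A stride of 0 means the source is a uniform, i.e. every channel reads the
 * same value, so it must stay 0.  A packed source (stride 1) has to skip
 * every other word once it is viewed as UW, so its stride becomes 2.
 */
static int
widen_stride_for_uw(int stride)
{
   if (stride == 0)
      return 0;

   assert(stride == 1);
   return 2;
}
```

Unconditionally writing 2, as the old code did, would turn a uniform read into a strided one and pick up the wrong data for every channel but the first.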




Re: [Mesa-dev] [PATCH 28/46] glsl: don't lower variable indexing on non-patch tessellation inputs/outputs

2015-06-22 Thread Kenneth Graunke
On Wednesday, June 17, 2015 01:01:24 AM Marek Olšák wrote:
> From: Marek Olšák 
> 
> There is no way to lower them, because the array sizes are unknown
> at compile time.
> 
> Based on a patch from: Fabian Bieler 

I'm a bit confused by the justification given for this patch.

TCS/TES per-vertex inputs:
--------------------------

...are always fixed-size arrays of length gl_MaxPatchVertices, because:

"The length of gl_in is equal to the implementation-dependent maximum
 patch size (gl_MaxPatchVertices)."

"Similarly to the built-in inputs, each user-defined input variable has
 a value for each vertex and thus needs to be declared as arrays or
 inside input blocks declared as arrays.  Declaring an array size is
 optional.  If no size is specified, it will be taken from the
 implementation-dependent maximum patch size (gl_MaxPatchVertices).
 If a size is specified, it must match the maximum patch size;
 otherwise, a link-error will occur."

This same text exists for both TCS inputs and TES inputs.  Since we
always know the array size, I don't see why we can't do lowering in
this case.

I'm pretty new to tessellation shaders, so am I missing something?

TCS per-patch inputs:
---------------------

...don't exist AFAICT.

TES per-patch inputs:
---------------------

...do exist, require no special handling.

TCS per-vertex outputs:
-----------------------

...are arrays whose size is known at link time, but not necessarily
compile time.

"The length of gl_out is equal to the output patch size specified in the
 tessellation control shader output layout declaration."

"A tessellation control shader may also declare user-defined per-vertex
 output variables. User-defined per-vertex output variables are declared
 with the qualifier out and have a value for each vertex in the output
 patch. Such variables must be declared as arrays or inside output blocks
 declared as arrays. Declaring an array size is optional. If no size is
 specified, it will be taken from the output patch size declared in the
 shader."

Apparently, the index must also be gl_InvocationID when writing:

"While per-vertex output variables are declared as arrays indexed by
 vertex number, each tessellation control shader invocation may write only
 to those outputs corresponding to its output patch vertex. Tessellation
 control shaders must use the input variable gl_InvocationID as the
 vertex number index when writing to per-vertex output variables."

So we clearly don't want to do lowering on writes.  But for reads, it
seems like we could do lowering when the array size is known (such as
post-linking).  I'm not sure whether or not it's beneficial...

It might be nice to add a comment explaining why it makes no sense to
lower variable indexing on TCS output writes (with the above spec
citation).

TES outputs:
------------

...require no special handling.


> ---
>  src/glsl/ir_optimization.h   |  5 +--
>  src/glsl/lower_variable_index_to_cond_assign.cpp | 43 +---
>  src/glsl/test_optpass.cpp|  3 +-
>  src/mesa/drivers/dri/i965/brw_shader.cpp |  8 +++--
>  src/mesa/program/ir_to_mesa.cpp  |  2 +-
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp   |  2 +-
>  6 files changed, 42 insertions(+), 21 deletions(-)
> 
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index 688a5e1..a174c96 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -114,8 +114,9 @@ bool lower_discard(exec_list *instructions);
>  void lower_discard_flow(exec_list *instructions);
>  bool lower_instructions(exec_list *instructions, unsigned what_to_lower);
>  bool lower_noise(exec_list *instructions);
> -bool lower_variable_index_to_cond_assign(exec_list *instructions,
> -bool lower_input, bool lower_output, bool lower_temp, bool lower_uniform);
> +bool lower_variable_index_to_cond_assign(gl_shader_stage stage,
> +exec_list *instructions, bool lower_input, bool lower_output,
> +bool lower_temp, bool lower_uniform);
>  bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz);
>  bool lower_const_arrays_to_uniforms(exec_list *instructions);
>  bool lower_clip_distance(gl_shader *shader);
> diff --git a/src/glsl/lower_variable_index_to_cond_assign.cpp b/src/glsl/lower_variable_index_to_cond_assign.cpp
> index d878cb0..b6421f5 100644
> --- a/src/glsl/lower_variable_index_to_cond_assign.cpp
> +++ b/src/glsl/lower_variable_index_to_cond_assign.cpp
> @@ -335,12 +335,14 @@ struct switch_generator
>  
>  class variable_index_to_cond_assign_visitor : public ir_rvalue_visitor {
>  public:
> -   variable_index_to_cond_assign_visitor(bool lower_input,
> -  bool lower_output,
> -  bool lower_temp,
> -  bool lower_uniform)
> +   variable_index_to_cond_assign_visitor(gl_shader_stage stage,
> +  

Re: [Mesa-dev] [PATCH 3/4] i965/gen9: Don't use encrypted MOCS

2015-06-22 Thread Kenneth Graunke
On Monday, June 22, 2015 05:24:11 PM Ben Widawsky wrote:
> On Thu, Jun 18, 2015 at 03:41:50PM -0700, Kenneth Graunke wrote:
> > On Wednesday, June 17, 2015 03:50:13 PM Ben Widawsky wrote:
> > > > On gen9+ MOCS is an index into a table. It is 7 bits, and AFAICT, bit 0 is for
> > > > doing encrypted reads.
> > > > 
> > > > I don't recall how I decided to do this for BXT. I don't know this patch was
> > > > ever needed, since it seems nothing is broken today on SKL. Furthermore, this
> > > > patch may no longer be needed because of the ongoing changes with MOCS setup. It
> > > > is what is being used/tested, so it's included in the series.
> > > > 
> > > > The chosen values are the old values left shifted. That was also an arbitrary
> > > > choice.
> > > 
> > > Cc:  Francisco Jerez 
> > > Signed-off-by: Ben Widawsky 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> > > index bfcc442..5358edc 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > > @@ -2495,8 +2495,8 @@ enum brw_wm_barycentric_interp_mode {
> > > >   * cache settings.  We still use only either write-back or write-through; and
> > >   * rely on the documented default values.
> > >   */
> > > -#define SKL_MOCS_WB 9
> > > -#define SKL_MOCS_WT 5
> > > +#define SKL_MOCS_WB 0x12
> > > +#define SKL_MOCS_WT 0xa
> > 
> > 
> > Yeah, it looks like Kristian made these defines the indices into the
> > table, but may have missed that the MOCS field puts that table index in
> > [6:1] and bit 0 is something else.
> > 
> > So shifting left by 1 seems like a good plan.  Perhaps write it as
> > 
> > #define SKL_MOCS_WB (0b000101 << 1)
> > #define SKL_MOCS_WT (0b001001 << 1)
> > 
> 
> You meant this, right (you reversed it, I think)?
> #define SKL_MOCS_WB (0b001001 << 1)
> #define SKL_MOCS_WT (0b000101 << 1)

Whoops!  Yes, that's what I meant.  Thanks!
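
To spell the layout out as a sketch (the macro and helper names below are made up for illustration, and the bit layout is only what's described earlier in this thread: table index in bits [6:1], bit 0 not part of the index):

```c
#include <stdint.h>

/* Assumption for illustration: the MOCS field is 7 bits wide, the table
 * index lives in bits [6:1], and bit 0 means something else and is left
 * clear.  SKL_MOCS_FROM_INDEX is a hypothetical macro name.
 */
#define SKL_MOCS_FROM_INDEX(index) ((uint32_t)(index) << 1)

#define SKL_MOCS_WB SKL_MOCS_FROM_INDEX(9)   /* 0x12 */
#define SKL_MOCS_WT SKL_MOCS_FROM_INDEX(5)   /* 0x0a */

/* Hypothetical helper: recover the table index from a MOCS field value. */
static uint32_t
skl_mocs_table_index(uint32_t mocs)
{
   return (mocs >> 1) & 0x3f;
}
```

Writing the defines in terms of the index keeps the old table entries (9 and 5) visible while still producing 0x12 and 0xa.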

--Ken




Re: [Mesa-dev] [PATCH 05/16] i965: Move INTEL_DEBUG variable parsing to screen creation time

2015-06-23 Thread Kenneth Graunke
On Monday, June 22, 2015 06:07:25 PM Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_context.c  | 10 +-
>  src/mesa/drivers/dri/i965/intel_debug.c  | 13 ++---
>  src/mesa/drivers/dri/i965/intel_debug.h  |  4 ++--
>  src/mesa/drivers/dri/i965/intel_screen.c |  2 ++
>  4 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index c629f39..327a668 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -822,7 +822,15 @@ brwCreateContext(gl_api api,
> _mesa_meta_init(ctx);
>  
> brw_process_driconf_options(brw);
> -   brw_process_intel_debug_variable(brw);
> +
> +   if (INTEL_DEBUG & DEBUG_BUFMGR)
> +  dri_bufmgr_set_debug(brw->bufmgr, true);

This should be done at screen creation time.  brw->bufmgr is just a
shadow copy of intelScreen->bufmgr; there is only one bufmgr for the
whole process.

> +
> +   if (INTEL_DEBUG & DEBUG_PERF)
> +  brw->perf_debug = true;
> +
> +   if (INTEL_DEBUG & DEBUG_AUB)
> +  drm_intel_bufmgr_gem_set_aub_dump(brw->bufmgr, true);

Ditto for aub dumping.

Perhaps just pass the screen into brw_process_intel_debug_variable
instead of devinfo?  Or also pass the bufmgr?

>  
> if (brw->gen >= 8 && !(INTEL_DEBUG & DEBUG_VEC4VS))
>brw->scalar_vs = true;
> diff --git a/src/mesa/drivers/dri/i965/intel_debug.c b/src/mesa/drivers/dri/i965/intel_debug.c
> index 53f575a..0f4e556 100644
> --- a/src/mesa/drivers/dri/i965/intel_debug.c
> +++ b/src/mesa/drivers/dri/i965/intel_debug.c
> @@ -88,25 +88,16 @@ intel_debug_flag_for_shader_stage(gl_shader_stage stage)
>  }
>  
>  void
> -brw_process_intel_debug_variable(struct brw_context *brw)
> +brw_process_intel_debug_variable(const struct brw_device_info *devinfo)
>  {
> uint64_t intel_debug = driParseDebugString(getenv("INTEL_DEBUG"), debug_control);
> (void) p_atomic_cmpxchg(&INTEL_DEBUG, 0, intel_debug);
>  
> -   if (INTEL_DEBUG & DEBUG_BUFMGR)
> -  dri_bufmgr_set_debug(brw->bufmgr, true);
> -
> -   if ((INTEL_DEBUG & DEBUG_SHADER_TIME) && brw->gen < 7) {
> +   if ((INTEL_DEBUG & DEBUG_SHADER_TIME) && devinfo->gen < 7) {
>fprintf(stderr,
>"shader_time debugging requires gen7 (Ivybridge) or 
> better.\n");
>INTEL_DEBUG &= ~DEBUG_SHADER_TIME;
> }
> -
> -   if (INTEL_DEBUG & DEBUG_PERF)
> -  brw->perf_debug = true;
> -
> -   if (INTEL_DEBUG & DEBUG_AUB)
> -  drm_intel_bufmgr_gem_set_aub_dump(brw->bufmgr, true);
>  }
>  
>  /**
> diff --git a/src/mesa/drivers/dri/i965/intel_debug.h b/src/mesa/drivers/dri/i965/intel_debug.h
> index f754be2..96212df 100644
> --- a/src/mesa/drivers/dri/i965/intel_debug.h
> +++ b/src/mesa/drivers/dri/i965/intel_debug.h
> @@ -114,8 +114,8 @@ extern uint64_t INTEL_DEBUG;
>  
>  extern uint64_t intel_debug_flag_for_shader_stage(gl_shader_stage stage);
>  
> -struct brw_context;
> +struct brw_device_info;
>  
> -extern void brw_process_intel_debug_variable(struct brw_context *brw);
> +extern void brw_process_intel_debug_variable(const struct brw_device_info *);
>  
>  extern bool brw_env_var_as_boolean(const char *var_name, bool default_value);
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
> index 896a125..38475b9 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -1372,6 +1372,8 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
> if (!intelScreen->devinfo)
>return false;
>  
> +   brw_process_intel_debug_variable(intelScreen->devinfo);
> +
> intelScreen->hw_must_use_separate_stencil = intelScreen->devinfo->gen >= 7;
>  
> intelScreen->hw_has_swizzling = intel_detect_swizzling(intelScreen);
> 




Re: [Mesa-dev] [PATCH 09/16] i965: Add compiler options to brw_compiler

2015-06-23 Thread Kenneth Graunke
On Monday, June 22, 2015 06:07:29 PM Jason Ekstrand wrote:
> This creates the options at screen creation time and then we just copy them
> into the context at context creation time.  We also move is_scalar to the
> brw_compiler structure.
> 
> We also end up manually setting some values that the core would have set by
> default for us.  Fortunately, there are only two non-zero shader compiler
> option defaults that we aren't overriding anyway so this isn't a big deal.
> ---
>  src/mesa/drivers/dri/i965/brw_context.c  | 46 ++
>  src/mesa/drivers/dri/i965/brw_context.h  |  1 -
>  src/mesa/drivers/dri/i965/brw_shader.cpp | 49 +++-
>  src/mesa/drivers/dri/i965/brw_shader.h   |  3 ++
>  src/mesa/drivers/dri/i965/brw_vec4.cpp   |  2 +-
>  src/mesa/drivers/dri/i965/intel_screen.c |  1 +
>  6 files changed, 56 insertions(+), 46 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index 327a668..33cdbd2 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -50,6 +50,7 @@
>  
>  #include "brw_context.h"
>  #include "brw_defines.h"
> +#include "brw_shader.h"
>  #include "brw_draw.h"
>  #include "brw_state.h"
>  
> @@ -68,8 +69,6 @@
>  #include "tnl/t_pipeline.h"
>  #include "util/ralloc.h"
>  
> -#include "glsl/nir/nir.h"
> -
>  /***
>   * Mesa's Driver Functions
>   ***/
> @@ -558,48 +557,12 @@ brw_initialize_context_constants(struct brw_context *brw)
>ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents = 128;
> }
>  
> -   static const nir_shader_compiler_options nir_options = {
> -  .native_integers = true,
> -  /* In order to help allow for better CSE at the NIR level we tell NIR
> -   * to split all ffma instructions during opt_algebraic and we then
> -   * re-combine them as a later step.
> -   */
> -  .lower_ffma = true,
> -  .lower_sub = true,
> -   };
> -
> /* We want the GLSL compiler to emit code that uses condition codes */
> for (int i = 0; i < MESA_SHADER_STAGES; i++) {
> -  ctx->Const.ShaderCompilerOptions[i].MaxIfDepth = brw->gen < 6 ? 16 : UINT_MAX;
> -  ctx->Const.ShaderCompilerOptions[i].EmitCondCodes = true;
> -  ctx->Const.ShaderCompilerOptions[i].EmitNoNoise = true;
> -  ctx->Const.ShaderCompilerOptions[i].EmitNoMainReturn = true;
> -  ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectInput = true;
> -  ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectOutput =
> -  (i == MESA_SHADER_FRAGMENT);
> -  ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectTemp =
> -  (i == MESA_SHADER_FRAGMENT);
> -  ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectUniform = false;
> -  ctx->Const.ShaderCompilerOptions[i].LowerClipDistance = true;
> +  ctx->Const.ShaderCompilerOptions[i] =
> + brw->intelScreen->compiler->glsl_compiler_options[i];
> }
>  
> -   ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = true;
> -   ctx->Const.ShaderCompilerOptions[MESA_SHADER_GEOMETRY].OptimizeForAOS = true;
> -
> -   if (brw->scalar_vs) {
> -  /* If we're using the scalar backend for vertex shaders, we need to
> -   * configure these accordingly.
> -   */
> -  ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectOutput = true;
> -  ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectTemp = true;
> -  ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = false;
> -
> -  ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].NirOptions = &nir_options;
> -   }
> -
> -   ctx->Const.ShaderCompilerOptions[MESA_SHADER_FRAGMENT].NirOptions = &nir_options;
> -   ctx->Const.ShaderCompilerOptions[MESA_SHADER_COMPUTE].NirOptions = &nir_options;
> -
> /* ARB_viewport_array */
> if (brw->gen >= 6 && ctx->API == API_OPENGL_CORE) {
>ctx->Const.MaxViewports = GEN6_NUM_VIEWPORTS;
> @@ -832,9 +795,6 @@ brwCreateContext(gl_api api,
> if (INTEL_DEBUG & DEBUG_AUB)
>drm_intel_bufmgr_gem_set_aub_dump(brw->bufmgr, true);
>  
> -   if (brw->gen >= 8 && !(INTEL_DEBUG & DEBUG_VEC4VS))
> -  brw->scalar_vs = true;
> -
> brw_initialize_context_constants(brw);
>  
> ctx->Const.ResetStrategy = notify_reset
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 58119ee..d8fcfff 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1137,7 +1137,6 @@ struct brw_context
> bool has_pln;
> bool no_simd8;
> bool use_rep_send;
> -   bool scalar_vs;
>  
> /**
>  * Some versions of Gen hardware don't do centroid interpolation correctly
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/b

Re: [Mesa-dev] [PATCH 00/16] i965: Finish removing brw_context from the compiler

2015-06-23 Thread Kenneth Graunke
On Monday, June 22, 2015 06:07:20 PM Jason Ekstrand wrote:
> I started working on this project some time ago to remove brw_context from
> the backend compiler.  I got a bunch of refactoring done but eventually got
> stuck up on shader_time and some debug logging stuff.  I've finally gotten
> around to finishing it and here it is.
> 
> Jason Ekstrand (15):
>   i965: Replace some instances of brw->gen with devinfo->gen
>   i965: Plumb compiler debug logging through a function pointer in
> brw_compiler
>   i965: Remove the dependance on brw_context from the generators
>   i965: Move INTEL_DEBUG variable parsing to screen creation time
>   i965/fs: Make no16 non-variadic
>   i965/fs: Do the no16 perf logging directly in fs_visitor::no16()
>   i965/fs: Plumb compiler debug logging through brw_compiler
>   i965: Add compiler options to brw_compiler
>   i965: Use a single index per shader for shader_time.
>   i965: Pull calls to get_shader_time_index out of the visitor
>   i965/fs: Add a do_rep_send flag to run_fs
>   i965/vs: Pass the current set of clip planes through run() and
> run_vs()
>   i965/vec4: Turn some _mesa_problem calls into asserts
>   i965/vec4_vs: Add an explicit use_legacy_snorm_formula flag
>   i965: Remove the brw_context from the visitors
> 
> Kenneth Graunke (1):
>   mesa: Add a va_args variant of _mesa_gl_debug().

I requested a few small changes.  With those fixed,

For the series (minus my patch),
Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH] nir: Use a switch statement for detecting move-like operations.

2015-06-23 Thread Kenneth Graunke
Suggested by Jason Ekstrand.

Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir_opt_peephole_select.c |   20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/glsl/nir/nir_opt_peephole_select.c b/src/glsl/nir/nir_opt_peephole_select.c
index ef7c977..6620e5d 100644
--- a/src/glsl/nir/nir_opt_peephole_select.c
+++ b/src/glsl/nir/nir_opt_peephole_select.c
@@ -82,14 +82,22 @@ block_check_for_allowed_instrs(nir_block *block)
  break;
 
   case nir_instr_type_alu: {
- /* It must be a move operation */
  nir_alu_instr *mov = nir_instr_as_alu(instr);
- if (mov->op != nir_op_fmov && mov->op != nir_op_imov &&
- mov->op != nir_op_fneg && mov->op != nir_op_ineg &&
- mov->op != nir_op_fabs && mov->op != nir_op_iabs &&
- mov->op != nir_op_vec2 && mov->op != nir_op_vec3 &&
- mov->op != nir_op_vec4)
+ switch (mov->op) {
+ case nir_op_fmov:
+ case nir_op_imov:
+ case nir_op_fneg:
+ case nir_op_ineg:
+ case nir_op_fabs:
+ case nir_op_iabs:
+ case nir_op_vec2:
+ case nir_op_vec3:
+ case nir_op_vec4:
+/* It must be a move-like operation. */
+break;
+ default:
 return false;
+ }
 
  /* Can't handle saturate */
  if (mov->dest.saturate)
-- 
1.7.10.4



[Mesa-dev] [PATCH] i965: Drop brw->depthstencil.stencil_offset from gen8_depth_state.c.

2015-06-24 Thread Kenneth Graunke
This is always 0 - only brw_workaround_depthstencil_alignment ever sets
it, and that doesn't run on Gen6+.  My initial Broadwell depth state
commit had this mistake.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/gen8_depth_state.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c
index 7c4ec06..76ba09c 100644
--- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
@@ -41,7 +41,6 @@ emit_depth_packets(struct brw_context *brw,
bool depth_writable,
struct intel_mipmap_tree *stencil_mt,
bool stencil_writable,
-   uint32_t stencil_offset,
bool hiz,
uint32_t width,
uint32_t height,
@@ -127,8 +126,7 @@ emit_depth_packets(struct brw_context *brw,
   OUT_BATCH(HSW_STENCIL_ENABLED | mocs_wb << 22 |
 (2 * stencil_mt->pitch - 1));
   OUT_RELOC64(stencil_mt->bo,
-  I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,
-  stencil_offset);
+  I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, 0);
   OUT_BATCH(stencil_mt ? stencil_mt->qpitch >> 2 : 0);
   ADVANCE_BATCH();
}
@@ -220,7 +218,6 @@ gen8_emit_depth_stencil_hiz(struct brw_context *brw,
emit_depth_packets(brw, depth_mt, brw_depthbuffer_format(brw), surftype,
   ctx->Depth.Mask != 0,
   stencil_mt, ctx->Stencil._WriteEnabled,
-  brw->depthstencil.stencil_offset,
   hiz, width, height, depth, lod, min_array_element);
 }
 
@@ -439,7 +436,7 @@ gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
   brw_depth_format(brw, mt->format),
   BRW_SURFACE_2D,
   true, /* depth writes */
-  NULL, false, 0, /* no stencil for now */
+  NULL, false, /* no stencil for now */
   true, /* hiz */
   surface_width,
   surface_height,
-- 
1.7.10.4



[Mesa-dev] [PATCH] i965: Don't use GCC extension for ?: with only two operands.

2015-06-24 Thread Kenneth Graunke
From the "apparently I don't know C" files...GCC apparently supports:

x ?: y

which is equivalent to

x ? x : y

except that it doesn't cause side-effects to occur twice.  See:
https://gcc.gnu.org/onlinedocs/gcc/Conditionals.html#Conditionals

This was confusing and looked like a typo.  It doesn't really buy us
anything, so just write the obvious code in normal C.
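
For the curious, here is a small sketch of the difference.  It is not taken from the driver; next_value() and the two wrappers are hypothetical names, and the first wrapper only builds with a compiler that accepts the GNU extension (GCC or Clang).

```c
/* Counts how many times next_value() has been called. */
static int calls;

static int
next_value(void)
{
   calls++;
   return 7;
}

/* GNU extension: the first operand is evaluated exactly once. */
static int
with_gnu_extension(int fallback)
{
   return next_value() ?: fallback;
}

/* Standard C spelling: a nonzero first operand is evaluated a second time. */
static int
with_standard_c(int fallback)
{
   return next_value() ? next_value() : fallback;
}
```

In intel_fbo.c the first operand is a plain field read with no side effects, so the two spellings behave identically there.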

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_fbo.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c b/src/mesa/drivers/dri/i965/intel_fbo.c
index 1b3a72f..d1f5770 100644
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -551,10 +551,12 @@ intel_renderbuffer_update_wrapper(struct brw_context *brw,
 
irb->mt_layer = layer_multiplier * layer;
 
-   if (layered) {
-  irb->layer_count = image->TexObject->NumLayers ?: mt->level[level].depth / layer_multiplier;
-   } else {
+   if (!layered) {
   irb->layer_count = 1;
+   } else if (image->TexObject->NumLayers > 0) {
+  irb->layer_count = image->TexObject->NumLayers;
+   } else {
+  irb->layer_count = mt->level[level].depth / layer_multiplier;
}
 
intel_miptree_reference(&irb->mt, mt);
-- 
1.7.10.4



Re: [Mesa-dev] [RFC] i965: Don't consider uniform value locations in program uploads

2015-06-25 Thread Kenneth Graunke
On Wednesday, June 03, 2015 09:21:11 PM Topi Pohjolainen wrote:
> Shader programs are cached per stage (FS, VS, GS) using the
> corresponding shader source identifier and compile time choices
> as key. However, one not only stores the program binary but
> a pair consisting of program binary and program data. The latter
> represents the store of constants (such as uniforms) used by
> the program.
> 
> However, when programs are searched in the cache for reloading
> only the program key representing the binary is considered
> (see for example, brw_upload_wm_prog() and brw_search_cache()).
> Hence, when programs are re-loaded from cache the first program
> binary, program data pair is extracted without considering if
> the program data matches the currently in use uniform storage
> as well.
> 
> My reasoning for why this actually works is that the key
> contains the identifier of the corresponding gl_program that
> represents the source code for the shader program. Hence,
> two programs having identical source code still have unique
> keys.
> And therefore brw_try_upload_using_copy() never encounters
> a case where a matching binary is found but the program data
> doesn't match.
> 
> My ultimate goal is to stop storing pointers to the individual
> components of a uniform but to store only a pointer to the
> "struct gl_uniform_storage" instead, and allow
> gen6_upload_push_constants() to iterate over individual
> components and array elements. This is needed to be able to
> convert 32-bit floats to fp16 - otherwise there is only a
> pointer to 32 bits without knowing its type (int, float, etc.),
> let alone its target precision.
> No regression in jenkins. However, we talked about this with
> Ken and this doesn't really tell much as piglit doesn't really
> re-use shader sources during one execution.
> 
> Signed-off-by: Topi Pohjolainen 
> CC: Kenneth Graunke 
> CC: Tapani Pälli 
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> index e5c0d3c..7f5fde8 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -576,12 +576,6 @@ brw_stage_prog_data_compare(const struct brw_stage_prog_data *a,
> if (memcmp(a, b, offsetof(struct brw_stage_prog_data, param)))
>return false;
>  
> -   if (memcmp(a->param, b->param, a->nr_params * sizeof(void *)))
> -  return false;
> -
> -   if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void *)))
> -  return false;
> -
> return true;
>  }
>  
> 

(Eric, feel free to add thoughts if you care to.  If not, no worries...)

Okay, I've spent a while doing some git archaeology and trying to piece
together this puzzle...this all goes back to 2011 and even 2010, so it's
pushing my limits of recollection...

Eric introduced brw_try_upload_using_copy() in 2011 (18d4a44bd).  His
commit message indicates that it actually did something back then.

I thought of one reason why it might have worked: in the bad old days,
we used to call ProgramStringNotify() every time sampler uniforms
changed.  Which increments key->program_string_id, meaning that every
time sampler uniforms changed, the key would never match again.  But the
shader assembly would be identical, and the uniform storage pointers
should even have been the same.  Which should have hit Eric's code,
preventing us from uploading an extra duplicate copy.

In late 2012 (174d44a9c4d3), I fixed that, so we stopped doing that.
I suspect that at this point, brw_try_upload_using_copy() basically
stopped being useful.  I'm having a real hard time thinking of another
case where the key wouldn't match, but both the shader assembly and
prog_data - including the param pointers - would match.

Looking at it now, I don't see any point at all in the aux_compare
functions.  I've got no idea why brw_try_upload_using_copy() would
bother checking prog_data.  All it does is avoid uploading an extra
copy of the shader assembly into cache->bo.  Whether it succeeds or
fails, we still create a new brw_cache_item entry which contains
both the key and prog_data, which goes in brw->cache.items[].

So, I think we should just delete the aux compare functions
entirely.  AFAICT this stuff goes way back - maybe even to the original
driver import - so I suspect we just kept doing it, even when
unnecessary...




[Mesa-dev] [PATCH] i965/gen6: Set up layer constraints properly for depth buffers.

2015-06-25 Thread Kenneth Graunke
This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c
from commit 77d55ef4819436ebbf9786a1e720ec00707bbb19.

No Piglit changes on Sandybridge.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/gen6_depth_state.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c b/src/mesa/drivers/dri/i965/gen6_depth_state.c
index 8f0d7dc..febd478 100644
--- a/src/mesa/drivers/dri/i965/gen6_depth_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c
@@ -73,7 +73,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
rb = (struct gl_renderbuffer*) irb;
 
if (rb) {
-  depth = MAX2(rb->Depth, 1);
+  depth = MAX2(irb->layer_count, 1);
   if (rb->TexImage)
  gl_target = rb->TexImage->TexObject->Target;
}
@@ -89,6 +89,10 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
   surftype = BRW_SURFACE_2D;
   depth *= 6;
   break;
+   case GL_TEXTURE_3D:
+  assert(mt);
+  depth = MAX2(mt->logical_depth0, 1);
+  /* fallthrough */
default:
   surftype = translate_tex_target(gl_target);
   break;
-- 
2.4.0



[Mesa-dev] [PATCH] i965: Remove special case for layered drawbuffer attachments.

2015-06-25 Thread Kenneth Graunke
When binding a layered texture, the layer is already 0.  There's no need
to special case this.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/gen6_surface_state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_surface_state.c b/src/mesa/drivers/dri/i965/gen6_surface_state.c
index 03e913a..39de62f 100644
--- a/src/mesa/drivers/dri/i965/gen6_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_surface_state.c
@@ -88,7 +88,8 @@ gen6_update_renderbuffer_surface(struct brw_context *brw,
   break;
}
 
-   const int min_array_element = layered ? 0 : irb->mt_layer;
+   const int min_array_element = irb->mt_layer;
+   assert(!layered || irb->mt_layer == 0);
 
surf[0] = SET_FIELD(surftype, BRW_SURFACE_TYPE) |
  SET_FIELD(format, BRW_SURFACE_FORMAT);
-- 
2.4.0



Re: [Mesa-dev] [PATCH] Only change and restore viewport 0 in mesa meta mode

2015-06-26 Thread Kenneth Graunke
On Friday, June 26, 2015 03:15:46 PM Mike Stroyan wrote:
> The meta code was setting a default depth range for all viewports
> and 'restoring' all viewports to depth range values saved from viewport 0.
> ---
>  src/mesa/drivers/common/meta.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
> index 214a68a..9a75019 100644
> --- a/src/mesa/drivers/common/meta.c
> +++ b/src/mesa/drivers/common/meta.c
> @@ -728,7 +728,7 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
>save->DepthNear = ctx->ViewportArray[0].Near;
>save->DepthFar = ctx->ViewportArray[0].Far;
>/* set depth range to default */
> -  _mesa_DepthRange(0.0, 1.0);
> +  _mesa_set_depth_range(ctx, 0, 0.0, 1.0);
> }
>  
> if (state & MESA_META_CLAMP_FRAGMENT_COLOR) {
> @@ -1129,7 +1129,7 @@ _mesa_meta_end(struct gl_context *ctx)
>   _mesa_set_viewport(ctx, 0, save->ViewportX, save->ViewportY,
>  save->ViewportW, save->ViewportH);
>}
> -  _mesa_DepthRange(save->DepthNear, save->DepthFar);
> +  _mesa_set_depth_range(ctx, 0, save->DepthNear, save->DepthFar);
> }
>  
> if (state & MESA_META_CLAMP_FRAGMENT_COLOR &&
> 

Good catch - this code predates GL_ARB_viewport_array, and really ought
to only change viewport 0.  Thanks, Mike!
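
As an illustration of why the indexed call matters, here is a minimal standalone sketch. It is not Mesa code: Context, Viewport, SavedState and the helper functions are hypothetical stand-ins for gl_context, ViewportArray[] and the meta save state, assumed only for this example.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for the real Mesa structures (illustration only).
struct Viewport { double near_val; double far_val; };
struct Context { std::array<Viewport, 4> viewports; };
struct SavedState { double depth_near; double depth_far; };

// Old-style behaviour: a non-indexed depth range call touches every viewport.
void set_depth_range_all(Context &ctx, double n, double f)
{
   for (Viewport &vp : ctx.viewports) {
      vp.near_val = n;
      vp.far_val = f;
   }
}

// Indexed behaviour: only the requested viewport changes.
void set_depth_range(Context &ctx, unsigned idx, double n, double f)
{
   ctx.viewports[idx].near_val = n;
   ctx.viewports[idx].far_val = f;
}

// Save viewport 0's range and reset only viewport 0 to the default.
SavedState meta_begin(Context &ctx)
{
   SavedState save = { ctx.viewports[0].near_val, ctx.viewports[0].far_val };
   set_depth_range(ctx, 0, 0.0, 1.0);
   return save;
}

// Restore only viewport 0; the other viewports were never modified.
void meta_end(Context &ctx, const SavedState &save)
{
   set_depth_range(ctx, 0, save.depth_near, save.depth_far);
}
```

If meta_begin() and meta_end() used set_depth_range_all() instead, every viewport would come back holding viewport 0's saved values, which is the behaviour the patch removes.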

Cc: "10.6 10.5" 
Reviewed-by: Kenneth Graunke 

Is there a bugzilla entry related to this patch?

I'll plan to push this tonight/tomorrow unless someone else objects.




Re: [Mesa-dev] [PATCH 2/2] i965: Delete linked GLSL IR when using NIR.

2015-06-26 Thread Kenneth Graunke
On Thursday, June 25, 2015 02:29:13 PM Tapani Pälli wrote:
> Huh I see this went in already, I've noticed a problem and thought to 
> share it.
> 
> Currently program resource list (used by gl api shader queries) is 
> generated in linker, before backend LinkShader call. What this means is 
> that it relies on frontend optimization passes to get rid of dead inputs 
> and outputs. But .. this does not seem to always happen, sometimes these 
> get removed only during backend optimization passes. I have a bug on 
> this as #90925.
> 
> There are two ways to move forward with this: either move resource list 
> creation to happen after LinkShader (which would mean we should not free 
> IR during LinkShader), or try to fix frontend dead code removal to 
> recognize the case in the bug. I will keep digging into why the variable in 
> question is not recognized by the frontend passes; just wanted to let 
> you know!

Sorry, I didn't see any indication on the mailing list that the patches
were broken, so I thought there were reviewed-and-ready-to-push patches
that just hadn't gone in.  So I did a Piglit run and pushed them.

Feel free to revert if they're not baked yet.




[Mesa-dev] [PATCH] i965: Write at least some data in SIMD8 URB write messages.

2015-06-26 Thread Kenneth Graunke
According to the "URB SIMD8 Write > Write Data Payload" documentation,
"The write data payload can be between 1 and 8 message phases long."

Apparently, the simulator considers it an error if you issue an URB
SIMD8 message with only a header and no actual data to write.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 9a4bad6..7074b5c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1800,14 +1800,13 @@ fs_visitor::emit_urb_writes(gl_clip_plane *clip_planes)
/* If we don't have any valid slots to write, just do a minimal urb write
 * send to terminate the shader. */
if (vue_map->slots_valid == 0) {
-
-  fs_reg payload = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
+  fs_reg payload = fs_reg(GRF, alloc.allocate(2), BRW_REGISTER_TYPE_UD);
   bld.exec_all().MOV(payload, fs_reg(retype(brw_vec8_grf(1, 0),
 BRW_REGISTER_TYPE_UD)));
 
   fs_inst *inst = bld.emit(SHADER_OPCODE_URB_WRITE_SIMD8, reg_undef, payload);
   inst->eot = true;
-  inst->mlen = 1;
+  inst->mlen = 2;
   inst->offset = 1;
   return;
}
-- 
1.7.10.4



[Mesa-dev] [PATCH] i965/vs: Move compute_clip_distance() out of emit_urb_writes().

2015-06-26 Thread Kenneth Graunke
Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by
turning those into the modern gl_ClipDistance equivalents.

This is unnecessary in Core Profile: if user clipping is enabled, but
the shader doesn't write the corresponding gl_ClipDistance entry,
results are undefined.  Hence, it is also unnecessary for geometry
shaders.

This patch moves the call up to run_vs().  This is equivalent for VS,
but removes the need to pass clip distances into emit_urb_writes().

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |4 +++-
 src/mesa/drivers/dri/i965/brw_fs.h   |2 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |   16 +++-
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 4292aa6..8658554 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3816,7 +3816,9 @@ fs_visitor::run_vs(gl_clip_plane *clip_planes)
if (failed)
   return false;
 
-   emit_urb_writes(clip_planes);
+   compute_clip_distance(clip_planes);
+
+   emit_urb_writes();
 
if (shader_time_index >= 0)
   emit_shader_time_end();
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 243baf6..d08d438 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -271,7 +271,7 @@ public:
  fs_reg src0_alpha, unsigned components,
  unsigned exec_size, bool use_2nd_half = false);
void emit_fb_writes();
-   void emit_urb_writes(gl_clip_plane *clip_planes);
+   void emit_urb_writes();
void emit_cs_terminate();
 
void emit_barrier();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 7074b5c..854e49b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1730,6 +1730,12 @@ fs_visitor::setup_uniform_clipplane_values(gl_clip_plane *clip_planes)
}
 }
 
+/**
+ * Lower legacy fixed-function and gl_ClipVertex clipping to clip distances.
+ *
+ * This does nothing if the shader uses gl_ClipDistance or user clipping is
+ * disabled altogether.
+ */
 void fs_visitor::compute_clip_distance(gl_clip_plane *clip_planes)
 {
struct brw_vue_prog_data *vue_prog_data =
@@ -1737,6 +1743,10 @@ void fs_visitor::compute_clip_distance(gl_clip_plane *clip_planes)
const struct brw_vue_prog_key *key =
   (const struct brw_vue_prog_key *) this->key;
 
+   /* Bail unless some sort of legacy clipping is enabled */
+   if (!key->userclip_active || prog->UsesClipDistanceOut)
+  return;
+
/* From the GLSL 1.30 spec, section 7.1 (Vertex Shader Special Variables):
 *
 * "If a linked set of shaders forming the vertex stage contains no
@@ -1780,7 +1790,7 @@ void fs_visitor::compute_clip_distance(gl_clip_plane *clip_planes)
 }
 
 void
-fs_visitor::emit_urb_writes(gl_clip_plane *clip_planes)
+fs_visitor::emit_urb_writes()
 {
int slot, urb_offset, length;
struct brw_vs_prog_data *vs_prog_data =
@@ -1793,10 +1803,6 @@ fs_visitor::emit_urb_writes(gl_clip_plane *clip_planes)
bool flush;
fs_reg sources[8];
 
-   /* Lower legacy ff and ClipVertex clipping to clip distances */
-   if (key->base.userclip_active && !prog->UsesClipDistanceOut)
-  compute_clip_distance(clip_planes);
-
/* If we don't have any valid slots to write, just do a minimal urb write
 * send to terminate the shader. */
if (vue_map->slots_valid == 0) {
-- 
1.7.10.4



[Mesa-dev] [PATCH] i965: Switch on shader stage in nir_setup_outputs().

2015-06-26 Thread Kenneth Graunke
Adding new shader stages to a switch statement is less confusing than an
if-else-if ladder where all but the first case are fragment shader
specific (but don't claim to be).
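
The shape of the change can be sketched outside the driver as follows. This is an illustration only, assuming a hypothetical Stage enum and a describe_output_setup() helper that are not part of Mesa; the real function assigns registers rather than returning strings, and uses unreachable() rather than an exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stage enum; the driver uses gl_shader_stage instead.
enum class Stage { Vertex, Fragment, Compute };

// One case per stage: each stage's logic lives under its own label, and a
// stage nobody has handled yet fails loudly instead of silently falling into
// fragment-specific code, as it would at the end of an if-else-if ladder.
std::string describe_output_setup(Stage stage)
{
   switch (stage) {
   case Stage::Vertex:
      return "vertex";
   case Stage::Fragment:
      return "fragment";
   default:
      throw std::logic_error("unhandled shader stage");
   }
}
```

Adding a new stage then means adding one more case label, without touching the conditions of the existing ones.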

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp |   59 +-
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 59081ea..8bcd5e2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -133,38 +133,45 @@ fs_visitor::nir_setup_outputs(nir_shader *shader)
  var->type->is_array() ? var->type->fields.array->vector_elements
: var->type->vector_elements;
 
-  if (stage == MESA_SHADER_VERTEX) {
+  switch (stage) {
+  case MESA_SHADER_VERTEX:
  for (int i = 0; i < ALIGN(type_size(var->type), 4) / 4; i++) {
 int output = var->data.location + i;
 this->outputs[output] = offset(reg, 4 * i);
 this->output_components[output] = vector_elements;
  }
-  } else if (var->data.index > 0) {
- assert(var->data.location == FRAG_RESULT_DATA0);
- assert(var->data.index == 1);
- this->dual_src_output = reg;
- this->do_dual_src = true;
-  } else if (var->data.location == FRAG_RESULT_COLOR) {
- /* Writing gl_FragColor outputs to all color regions. */
- for (unsigned int i = 0; i < MAX2(key->nr_color_regions, 1); i++) {
-this->outputs[i] = reg;
-this->output_components[i] = 4;
- }
-  } else if (var->data.location == FRAG_RESULT_DEPTH) {
- this->frag_depth = reg;
-  } else if (var->data.location == FRAG_RESULT_SAMPLE_MASK) {
- this->sample_mask = reg;
-  } else {
- /* gl_FragData or a user-defined FS output */
- assert(var->data.location >= FRAG_RESULT_DATA0 &&
-var->data.location < FRAG_RESULT_DATA0 + BRW_MAX_DRAW_BUFFERS);
-
- /* General color output. */
- for (unsigned int i = 0; i < MAX2(1, var->type->length); i++) {
-int output = var->data.location - FRAG_RESULT_DATA0 + i;
-this->outputs[output] = offset(reg, vector_elements * i);
-this->output_components[output] = vector_elements;
+ break;
+  case MESA_SHADER_FRAGMENT:
+ if (var->data.index > 0) {
+assert(var->data.location == FRAG_RESULT_DATA0);
+assert(var->data.index == 1);
+this->dual_src_output = reg;
+this->do_dual_src = true;
+ } else if (var->data.location == FRAG_RESULT_COLOR) {
+/* Writing gl_FragColor outputs to all color regions. */
+for (unsigned int i = 0; i < MAX2(key->nr_color_regions, 1); i++) {
+   this->outputs[i] = reg;
+   this->output_components[i] = 4;
+}
+ } else if (var->data.location == FRAG_RESULT_DEPTH) {
+this->frag_depth = reg;
+ } else if (var->data.location == FRAG_RESULT_SAMPLE_MASK) {
+this->sample_mask = reg;
+ } else {
+/* gl_FragData or a user-defined FS output */
+assert(var->data.location >= FRAG_RESULT_DATA0 &&
+   var->data.location < FRAG_RESULT_DATA0+BRW_MAX_DRAW_BUFFERS);
+
+/* General color output. */
+for (unsigned int i = 0; i < MAX2(1, var->type->length); i++) {
+   int output = var->data.location - FRAG_RESULT_DATA0 + i;
+   this->outputs[output] = offset(reg, vector_elements * i);
+   this->output_components[output] = vector_elements;
+}
  }
+ break;
+  default:
+ unreachable("unhandled shader stage");
   }
}
 }
-- 
1.7.10.4



Re: [Mesa-dev] [PATCH] i965/vs: Move compute_clip_distance() out of emit_urb_writes().

2015-06-26 Thread Kenneth Graunke
On Friday, June 26, 2015 04:17:39 PM Jason Ekstrand wrote:
> On Fri, Jun 26, 2015 at 3:56 PM, Kenneth Graunke wrote:
> > Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by
> > turning those into the modern gl_ClipDistance equivalents.
> >
> > This is unnecessary in Core Profile: if user clipping is enabled, but
> > the shader doesn't write the corresponding gl_ClipDistance entry,
> > results are undefined.  Hence, it is also unnecessary for geometry
> > shaders.
> >
> > This patch moves the call up to run_vs().  This is equivalent for VS,
> > but removes the need to pass clip distances into emit_urb_writes().
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp |4 +++-
> >  src/mesa/drivers/dri/i965/brw_fs.h   |2 +-
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |   16 +++-
> >  3 files changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 4292aa6..8658554 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -3816,7 +3816,9 @@ fs_visitor::run_vs(gl_clip_plane *clip_planes)
> > if (failed)
> >return false;
> >
> > -   emit_urb_writes(clip_planes);
> > +   compute_clip_distance(clip_planes);
> > +
> > +   emit_urb_writes();
> >
> > if (shader_time_index >= 0)
> >emit_shader_time_end();
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> > index 243baf6..d08d438 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> > @@ -271,7 +271,7 @@ public:
> >   fs_reg src0_alpha, unsigned components,
> >   unsigned exec_size, bool use_2nd_half = false);
> > void emit_fb_writes();
> > -   void emit_urb_writes(gl_clip_plane *clip_planes);
> > +   void emit_urb_writes();
> > void emit_cs_terminate();
> >
> > void emit_barrier();
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 7074b5c..854e49b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -1730,6 +1730,12 @@ fs_visitor::setup_uniform_clipplane_values(gl_clip_plane *clip_planes)
> > }
> >  }
> >
> > +/**
> > + * Lower legacy fixed-function and gl_ClipVertex clipping to clip distances.
> > + *
> > + * This does nothing if the shader uses gl_ClipDistance or user clipping is
> > + * disabled altogether.
> > + */
> >  void fs_visitor::compute_clip_distance(gl_clip_plane *clip_planes)
> >  {
> > struct brw_vue_prog_data *vue_prog_data =
> > @@ -1737,6 +1743,10 @@ void fs_visitor::compute_clip_distance(gl_clip_plane *clip_planes)
> > const struct brw_vue_prog_key *key =
> >(const struct brw_vue_prog_key *) this->key;
> >
> > +   /* Bail unless some sort of legacy clipping is enabled */
> > +   if (!key->userclip_active || prog->UsesClipDistanceOut)
> > +  return;
> > +
> 
> Any reason why you changed this from a conditional call to
> compute_clip_distance to an early return?  I don't know that I care
> much either way.
> 
> Thanks for making this less gross.
> 
> Reviewed-by: Jason Ekstrand 

I did it that way because compute_clip_distance() already prods at
brw_vue_prog_key, and run_vs() currently doesn't.  I would have had
to introduce key casts there.  I felt the unconditional call kept
run_vs() less cluttered, too.

Either way would work.
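
For readers comparing the two shapes, here is a minimal standalone sketch of the early-return variant. ProgKey, Program and Visitor are hypothetical, heavily simplified stand-ins for brw_vue_prog_key, gl_program and fs_visitor, and the counter is only a placeholder for the real lowering work.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the driver's key and program types.
struct ProgKey { bool userclip_active; };
struct Program { bool uses_clip_distance_out; };

struct Visitor {
   const ProgKey *key;
   const Program *prog;
   int clip_distances_computed;

   Visitor(const ProgKey *k, const Program *p)
      : key(k), prog(p), clip_distances_computed(0) {}

   // The callee owns the "is legacy clipping needed?" decision.
   void compute_clip_distance()
   {
      // Bail unless some sort of legacy clipping is enabled.
      if (!key->userclip_active || prog->uses_clip_distance_out)
         return;
      ++clip_distances_computed;   // placeholder for the actual lowering
   }

   // The caller stays free of key casts and conditions.
   void run_vs()
   {
      compute_clip_distance();
   }
};
```

The alternative keeps compute_clip_distance() unconditional and moves the same test into run_vs(); the observable behaviour is identical, so the choice is about where the key is already in scope.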




Re: [Mesa-dev] [PATCH] i965: Write at least some data in SIMD8 URB write messages.

2015-06-26 Thread Kenneth Graunke
On Friday, June 26, 2015 04:15:46 PM Jordan Justen wrote:
> On 2015-06-26 15:18:52, Kenneth Graunke wrote:
> > According to the "URB SIMD8 Write > Write Data Payload" documentation,
> > "The write data payload can be between 1 and 8 message phases long."
> 
> Would a more precise PRM ref location be possible?
> 
> Reviewed-by: Jordan Justen 

That's a good idea.  I wrote this before the Broadwell documentation
was released - now that it is, citing the PRM seems appropriate.

It's unfortunately hard to quote these days: the recent documentation
no longer has section numbers, and we effectively can't use page numbers
because they've taken to updating the PDFs randomly without changing any
sort of revision number.  They even changed the volume numbers on the
Broadwell docs (when they added the Observability Architecture
information), so even that's not guaranteed...

Here's what I've come up with for v2:

/* If we don't have any valid slots to write, just do a minimal urb write
-* send to terminate the shader. */
+* send to terminate the shader.  This includes 1 slot of undefined data,
+* because it's invalid to write 0 data:
+*
+* From the Broadwell PRM, Volume 7: 3D Media GPGPU, Shared Functions -
+* Unified Return Buffer (URB) > URB_SIMD8_Write and URB_SIMD8_Read >
+* Write Data Payload:
+*
+*"The write data payload can be between 1 and 8 message phases long."
+*/

Lately, I've taken to just searching for the quoted text in Okular in
order to find it in the documentation.  That works pretty well no matter
what set of PRMs you're looking at.
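
The arithmetic behind the mlen change can be sketched as below. This is an illustration under stated assumptions, not driver code: urb_write_mlen() and the constants are hypothetical names, and the only facts taken from the thread are the one-register header and the quoted 1 to 8 phase limit on the data payload.

```cpp
#include <algorithm>
#include <cassert>

// Assumptions for this sketch: one header register, and a data payload that
// must be between 1 and 8 message phases long (per the quoted PRM text).
constexpr int kHeaderRegs = 1;
constexpr int kMinDataPhases = 1;
constexpr int kMaxDataPhases = 8;

// Message length for a SIMD8 URB write carrying `data_phases` registers of
// real data.  With nothing to write, one phase of undefined data is still
// sent so the message stays valid.
int urb_write_mlen(int data_phases)
{
   assert(data_phases <= kMaxDataPhases);
   return kHeaderRegs + std::max(data_phases, kMinDataPhases);
}
```

With no valid slots this yields 2, matching the patch's change from mlen = 1 to mlen = 2.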

--Ken

> 
> > Apparently, the simulator considers it an error if you issue an URB
> > SIMD8 message with only a header and no actual data to write.
> > 
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 9a4bad6..7074b5c 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -1800,14 +1800,13 @@ fs_visitor::emit_urb_writes(gl_clip_plane *clip_planes)
> > /* If we don't have any valid slots to write, just do a minimal urb write
> >  * send to terminate the shader. */
> > if (vue_map->slots_valid == 0) {
> > -
> > -  fs_reg payload = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
> > +  fs_reg payload = fs_reg(GRF, alloc.allocate(2), BRW_REGISTER_TYPE_UD);
> >bld.exec_all().MOV(payload, fs_reg(retype(brw_vec8_grf(1, 0),
> >  BRW_REGISTER_TYPE_UD)));
> >  
> >fs_inst *inst = bld.emit(SHADER_OPCODE_URB_WRITE_SIMD8, reg_undef, payload);
> >inst->eot = true;
> > -  inst->mlen = 1;
> > +  inst->mlen = 2;
> >inst->offset = 1;
> >return;
> > }
> 




Re: [Mesa-dev] [PATCH] i965: Switch on shader stage in nir_setup_outputs().

2015-06-27 Thread Kenneth Graunke
On Saturday, June 27, 2015 05:00:22 PM Jordan Justen wrote:
> On 2015-06-26 16:03:21, Kenneth Graunke wrote:
> > Adding new shader stages to a switch statement is less confusing than an
> > if-else-if ladder where all but the first case are fragment shader
> > specific (but don't claim to be).
> > 
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |   59 +-
> >  1 file changed, 33 insertions(+), 26 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index 59081ea..8bcd5e2 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -133,38 +133,45 @@ fs_visitor::nir_setup_outputs(nir_shader *shader)
> >   var->type->is_array() ? var->type->fields.array->vector_elements
> > : var->type->vector_elements;
> >  
> > -  if (stage == MESA_SHADER_VERTEX) {
> > +  switch (stage) {
> > +  case MESA_SHADER_VERTEX:
> >   for (int i = 0; i < ALIGN(type_size(var->type), 4) / 4; i++) {
> >  int output = var->data.location + i;
> >  this->outputs[output] = offset(reg, 4 * i);
> >  this->output_components[output] = vector_elements;
> >   }
> > -  } else if (var->data.index > 0) {
> > - assert(var->data.location == FRAG_RESULT_DATA0);
> > - assert(var->data.index == 1);
> > - this->dual_src_output = reg;
> > - this->do_dual_src = true;
> > -  } else if (var->data.location == FRAG_RESULT_COLOR) {
> > - /* Writing gl_FragColor outputs to all color regions. */
> > - for (unsigned int i = 0; i < MAX2(key->nr_color_regions, 1); i++) {
> > -this->outputs[i] = reg;
> > -this->output_components[i] = 4;
> > - }
> > -  } else if (var->data.location == FRAG_RESULT_DEPTH) {
> > - this->frag_depth = reg;
> > -  } else if (var->data.location == FRAG_RESULT_SAMPLE_MASK) {
> > - this->sample_mask = reg;
> > -  } else {
> > - /* gl_FragData or a user-defined FS output */
> > - assert(var->data.location >= FRAG_RESULT_DATA0 &&
> > -var->data.location < FRAG_RESULT_DATA0 + BRW_MAX_DRAW_BUFFERS);
> > -
> > - /* General color output. */
> > - for (unsigned int i = 0; i < MAX2(1, var->type->length); i++) {
> > -int output = var->data.location - FRAG_RESULT_DATA0 + i;
> > -this->outputs[output] = offset(reg, vector_elements * i);
> > -this->output_components[output] = vector_elements;
> > + break;
> > +  case MESA_SHADER_FRAGMENT:
> > + if (var->data.index > 0) {
> > +assert(var->data.location == FRAG_RESULT_DATA0);
> > +assert(var->data.index == 1);
> > +this->dual_src_output = reg;
> > +this->do_dual_src = true;
> > + } else if (var->data.location == FRAG_RESULT_COLOR) {
> > +/* Writing gl_FragColor outputs to all color regions. */
> > +for (unsigned int i = 0; i < MAX2(key->nr_color_regions, 1); i++) {
> > +   this->outputs[i] = reg;
> > +   this->output_components[i] = 4;
> > +}
> > + } else if (var->data.location == FRAG_RESULT_DEPTH) {
> > +this->frag_depth = reg;
> > + } else if (var->data.location == FRAG_RESULT_SAMPLE_MASK) {
> > +this->sample_mask = reg;
> > + } else {
> > +/* gl_FragData or a user-defined FS output */
> > +assert(var->data.location >= FRAG_RESULT_DATA0 &&
> > +   var->data.location < FRAG_RESULT_DATA0+BRW_MAX_DRAW_BUFFERS);
> > +
> > +/* General color output. */
> > +for (unsigned int i = 0; i < MAX2(1, var->type->length); i++) {
> > +   int output = var->data.location - FRAG_RESULT_DATA0 + i;
> > +   this->outputs[output] = offset(reg, vector_elements * i);
> > +   this->output_components[output] = vector_elements;
> > +}
> 
> I noticed that it looks like MESA_SHADER_FRAGMENT case could use a
> [...]

Re: [Mesa-dev] [PATCH v2] glsl: use non-null context when cloning variable

2015-06-28 Thread Kenneth Graunke
On Monday, June 29, 2015 01:13:30 AM Ilia Mirkin wrote:
> ProgramResourceList might not yet have been initialized. In that case,
> parent the var to the program.
> 
> Fixes: c2ff3485b3d (glsl: clone inputs and outputs during linking)
> Signed-off-by: Ilia Mirkin 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
> 
> v1 -> v2: parent to prog only if the resource list doesn't exist
> 
> Perhaps it's not worth it to clone in the first place if the resource
> list isn't there?
> 
>  src/glsl/linker.cpp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index 5da9cad..c8cd858 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -2639,7 +2639,9 @@ add_interface_variables(struct gl_shader_program *shProg,
>  
>/* Clone ir_variable data so that backend is able to free memory. */
>if (!add_program_resource(shProg, programInterface,
> -var->clone(shProg->ProgramResourceList, NULL),
> +var->clone(shProg->ProgramResourceList ?
> +   (void *)shProg->ProgramResourceList :
> +   (void *)shProg, NULL),
>  build_stageref(shProg, var->name) | mask))
>   return false;
> }
> 

There's a mistake in the original patch - add_program_resource is what
allocates ProgramResourceList in the first place, making it non-NULL.
But before it can do that, we use it as a memory context for cloning.
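
The fallback that the v2 patch applies can be sketched in isolation like this. It is only an illustration: Program and clone_parent() are hypothetical names, and a plain std::vector pointer stands in for ProgramResourceList; no ralloc calls are modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for gl_shader_program.  resource_list plays the role
// of ProgramResourceList: it stays null until the first resource is added.
struct Program {
   std::vector<int> *resource_list;
};

// Choose the parent memory context for a cloned variable: prefer the
// resource list, but fall back to the program itself while the list has not
// been allocated yet, so the clone never ends up with a null parent.
const void *clone_parent(const Program &prog)
{
   return prog.resource_list
      ? static_cast<const void *>(prog.resource_list)
      : static_cast<const void *>(&prog);
}
```

The first clone is therefore parented to the program and later ones to the list, which is the asymmetry behind the question in the patch notes about whether cloning is worth it when the list is absent.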

Tapani, any thoughts on that?  You know this code much better than I do :)

I've gone ahead and reverted those two patches, which should fix this
leak.  Tapani found some other bugs in those patches and is, I believe,
looking into fixing them, so we may as well keep them reverted in the
meantime.

Sorry for making a mess of this!




Re: [Mesa-dev] [PATCH 02/78] i965/nir/vec4: Select between new nir_vec4 or current vec4_visitor code-paths

2015-06-29 Thread Kenneth Graunke
On Friday, June 26, 2015 10:06:18 AM Eduardo Lima Mitev wrote:
> The NIR->vec4 pass will be activated if ALL the following conditions are met:
> 
> * INTEL_USE_NIR environment variable is defined and is positive (1 or true)
> * The stage is vertex shader
> * The HW generation is either SandyBridge (gen6), IvyBridge or Haswell (gen7)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580
> ---
>  src/mesa/drivers/dri/i965/brw_program.c  |  5 +
>  src/mesa/drivers/dri/i965/brw_shader.cpp | 14 --
>  src/mesa/drivers/dri/i965/brw_vec4.cpp   | 32 ++--
>  src/mesa/drivers/dri/i965/brw_vec4.h |  2 ++
>  4 files changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> index 2327af7..7e5d23d 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -574,6 +574,11 @@ brw_dump_ir(const char *stage, struct gl_shader_program *shader_prog,
>  struct gl_shader *shader, struct gl_program *prog)
>  {
> if (shader_prog) {
> +  /* Since git~104c8fc, shader->ir can be NULL if NIR is used.
> +   * That must have been checked prior to calling this function, but
> +   * we double-check here just in case.
> +   */

That got reverted, so you can probably drop this comment.  The assertion
seems reasonable.

> +  assert(shader->ir != NULL);
>fprintf(stderr,
>"GLSL IR for native %s shader %d:\n", stage, shader_prog->Name);
>_mesa_print_ir(stderr, shader->ir, NULL);
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 5653d6b..0b53647 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -118,12 +118,14 @@ brw_compiler_create(void *mem_ctx, const struct brw_device_info *devinfo)
> compiler->glsl_compiler_options[MESA_SHADER_VERTEX].OptimizeForAOS = true;
> compiler->glsl_compiler_options[MESA_SHADER_GEOMETRY].OptimizeForAOS = true;
>  
> -   if (compiler->scalar_vs) {
> -  /* If we're using the scalar backend for vertex shaders, we need to
> -   * configure these accordingly.
> -   */
> -  compiler->glsl_compiler_options[MESA_SHADER_VERTEX].EmitNoIndirectOutput = true;
> -  compiler->glsl_compiler_options[MESA_SHADER_VERTEX].EmitNoIndirectTemp = true;
> +   if (compiler->scalar_vs || brw_env_var_as_boolean("INTEL_USE_NIR", false)) {
> +  if (compiler->scalar_vs) {
> + /* If we're using the scalar backend for vertex shaders, we need to
> + * configure these accordingly.
> + */

indentation looks a bit off here.

> + compiler->glsl_compiler_options[MESA_SHADER_VERTEX].EmitNoIndirectOutput = true;
> + compiler->glsl_compiler_options[MESA_SHADER_VERTEX].EmitNoIndirectTemp = true;
> +  }
>compiler->glsl_compiler_options[MESA_SHADER_VERTEX].OptimizeForAOS = false;
>  
>compiler->glsl_compiler_options[MESA_SHADER_VERTEX].NirOptions = nir_options;
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index a5c686c..dcffa04 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1707,6 +1707,21 @@ vec4_visitor::emit_shader_time_write(int shader_time_subindex, src_reg value)
>  }
>  
>  bool
> +vec4_visitor::should_use_vec4_nir()
> +{
> +   /* NIR->vec4 pass is activated when all these conditions meet:
> +*
> +* 1) it is a vertex shader
> +* 2) INTEL_USE_NIR env-var set to true, so NirOptions are defined for VS
> +* 3) hardware gen is SNB, IVB or HSW
> +*/
> +   return
> +  stage == MESA_SHADER_VERTEX &&
> +  compiler->glsl_compiler_options[MESA_SHADER_VERTEX].NirOptions != NULL &&
> +  devinfo->gen >= 6 && devinfo->gen < 8;

Why not just do:

   return compiler->glsl_compiler_options[stage].NirOptions != NULL;

As long as you don't set NirOptions for geometry shaders, that will
work fine.  You could also only set NirOptions[MESA_SHADER_VERTEX] when
gen >= 6 && gen < 8, if you like.  You could probably just eliminate the
function at that point.
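
A rough standalone sketch of that idea follows: decide once at compiler creation which stages get NIR options, then have the visitor test only for a non-null pointer. All names here (ShaderStage, StageOptions, Compiler, compiler_init, should_use_nir) are hypothetical simplifications, and the scalar VS path from the patch is deliberately left out.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the compiler's per-stage options.
enum ShaderStage { STAGE_VERTEX, STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_COUNT };

struct NirOptions { bool placeholder; };
struct StageOptions { const NirOptions *nir_options; };
struct Compiler { StageOptions options[STAGE_COUNT]; };

static const NirOptions default_nir_options = { true };

// Make the generation and environment checks once, up front.  Only the
// vertex stage on gen6/gen7 gets NIR options in this sketch.
void compiler_init(Compiler &c, int gen, bool use_nir_env)
{
   for (int i = 0; i < STAGE_COUNT; i++)
      c.options[i].nir_options = nullptr;

   if (use_nir_env && gen >= 6 && gen < 8)
      c.options[STAGE_VERTEX].nir_options = &default_nir_options;
}

// The per-shader test then collapses to a pointer check on the stage.
bool should_use_nir(const Compiler &c, ShaderStage stage)
{
   return c.options[stage].nir_options != nullptr;
}
```

Because geometry shaders never receive options here, the single check also excludes them without naming the stage explicitly.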

> +}
> +
> +bool
>  vec4_visitor::run(gl_clip_plane *clip_planes)
>  {
> sanity_param_count = prog->Parameters->NumParameters;
> @@ -1722,7 +1737,17 @@ vec4_visitor::run(gl_clip_plane *clip_planes)
>  * functions called "main").
>  */
> if (shader) {
> -  visit_instructions(shader->base.ir);
> +  if (should_use_vec4_nir()) {
> + assert(prog->nir != NULL);
> + emit_nir_code();
> + if (failed)
> +return false;
> +  } else {
> + /* Generate VS IR for main().  (the visitor only descends into
> +  * functions called "main").
> +  */
> + visit_instructions(shader->base.ir);
> + 

Re: [Mesa-dev] [PATCH v2 02/19] i965/fs: Actually set/use the mlen for gen7 uniform pull constant loads

2015-06-30 Thread Kenneth Graunke
On Thursday, June 25, 2015 01:24:46 PM Jason Ekstrand wrote:
> Previously, we were allocating the payload with different sizes per gen and
> then figuring out the mlen in the generator based on gen.  This meant,
> among other things, that the higher level passes knew nothing about it.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 19 ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  9 +++--
>  2 files changed, 15 insertions(+), 13 deletions(-)

Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH 3/6] i965/vec4: Move total_scratch calculation into the visitor.

2015-07-01 Thread Kenneth Graunke
This is more consistent with how we do it in the FS backend, and reduces
a tiny bit of duplication.  It'll also allow for a bit more tidying.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_gs.c | 5 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 7 +--
 src/mesa/drivers/dri/i965/brw_vs.c | 5 +
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c
index 7f947e0..9c59c8a 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -267,10 +267,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
}
 
/* Scratch space is used for register spilling */
-   if (c.base.last_scratch) {
-  c.prog_data.base.base.total_scratch
- = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
-
+   if (c.prog_data.base.base.total_scratch) {
   brw_get_scratch_bo(brw, &stage_state->scratch_bo,
 c.prog_data.base.base.total_scratch *
  brw->max_gs_threads);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 60f73e2..7b367ec 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1846,6 +1846,11 @@ vec4_visitor::run(gl_clip_plane *clip_planes)
 
opt_set_dependency_control();
 
+   if (c->last_scratch > 0) {
+  prog_data->base.total_scratch =
+ brw_get_scratch_size(c->last_scratch * REG_SIZE);
+   }
+
/* If any state parameters were appended, then ParameterValues could have
 * been realloced, in which case the driver uniform storage set up by
 * _mesa_associate_uniform_storage() would point to freed memory.  Make
@@ -1943,8 +1948,6 @@ brw_vs_emit(struct brw_context *brw,
   }
   g.generate_code(v.cfg, 8);
   assembly = g.get_assembly(final_assembly_size);
-
-  c->base.last_scratch = v.last_scratch;
}
 
if (!assembly) {
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index edbcbcf..ee3f664 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -195,10 +195,7 @@ brw_codegen_vs_prog(struct brw_context *brw,
}
 
/* Scratch space is used for register spilling */
-   if (c.base.last_scratch) {
-  prog_data.base.base.total_scratch
- = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
-
+   if (prog_data.base.base.total_scratch) {
   brw_get_scratch_bo(brw, &brw->vs.base.scratch_bo,
 prog_data.base.base.total_scratch *
  brw->max_vs_threads);
-- 
2.4.4
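
The scratch sizing that this patch moves into vec4_visitor::run() can be
sketched on its own.  This is only an illustration: the register size is
taken to be 32 bytes, and sketch_scratch_size() is a hypothetical stand-in
for brw_get_scratch_size(), assumed here to round up to a power of two with
a 1 KB minimum.  The real helper is not shown in this thread.

```cpp
#include <cassert>

// Assumption: one register is 32 bytes, so last_scratch counts 32-byte slots.
static const unsigned SKETCH_REG_SIZE = 32;

// Hypothetical stand-in for brw_get_scratch_size(): round the request up
// to a power of two, never going below 1 KB.
static unsigned sketch_scratch_size(unsigned bytes)
{
   unsigned size = 1024;
   while (size < bytes)
      size <<= 1;
   return size;
}

// Mirrors the hunk added to vec4_visitor::run(): a size is only recorded
// when at least one scratch slot was handed out.
static unsigned sketch_total_scratch(unsigned last_scratch)
{
   if (last_scratch > 0)
      return sketch_scratch_size(last_scratch * SKETCH_REG_SIZE);
   return 0;
}

// Size requested for the scratch buffer: per-thread size times thread count.
static unsigned sketch_scratch_bo_size(unsigned last_scratch,
                                       unsigned max_threads)
{
   return sketch_total_scratch(last_scratch) * max_threads;
}
```

Under these assumptions, and provided prog_data starts out zeroed, a shader
that uses no scratch slots leaves total_scratch at 0, so the simplified
"if (total_scratch)" checks in brw_vs.c and brw_gs.c skip the buffer.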



[Mesa-dev] [PATCH 5/6] i965/vs: Remove 'c'/vs_compile from vec4_vs_visitor.

2015-07-01 Thread Kenneth Graunke
At this point, the brw_vs_compile structure only contains the key and
gl_vertex_program pointer.  We may as well pass and store them directly;
it's simpler and more convenient (key-> instead of vs_compile->key...).

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_vp.cpp |  9 +++--
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 11 ++-
 src/mesa/drivers/dri/i965/brw_vs.h|  6 --
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e5db268..42d014c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1953,8 +1953,8 @@ brw_vs_emit(struct brw_context *brw,
if (!assembly) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
 
-  vec4_vs_visitor v(brw->intelScreen->compiler, brw,
-c, prog_data, prog, mem_ctx, st_index,
+  vec4_vs_visitor v(brw->intelScreen->compiler, brw, &c->key, prog_data,
+&c->vp->program, prog, mem_ctx, st_index,
 !_mesa_is_gles3(&brw->ctx));
   if (!v.run(brw_select_clip_planes(&brw->ctx))) {
  if (prog) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
index dcbd240..d1a72d7 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp
@@ -394,8 +394,7 @@ vec4_vs_visitor::emit_program_code()
 * pull constants.  Do that now.
 */
if (this->need_all_constants_in_pull_buffer) {
-  const struct gl_program_parameter_list *params =
- vs_compile->vp->program.Base.Parameters;
+  const struct gl_program_parameter_list *params = vp->Base.Parameters;
   unsigned i;
   for (i = 0; i < params->NumParameters * 4; i++) {
  stage_prog_data->pull_param[i] =
@@ -415,8 +414,7 @@ vec4_vs_visitor::setup_vp_regs()
   vp_temp_regs[i] = src_reg(this, glsl_type::vec4_type);
 
/* PROGRAM_STATE_VAR etc. */
-   struct gl_program_parameter_list *plist =
-  vs_compile->vp->program.Base.Parameters;
+   struct gl_program_parameter_list *plist = vp->Base.Parameters;
for (unsigned p = 0; p < plist->NumParameters; p++) {
   unsigned components = plist->Parameters[p].Size;
 
@@ -486,8 +484,7 @@ vec4_vs_visitor::get_vp_dst_reg(const prog_dst_register &dst)
 src_reg
 vec4_vs_visitor::get_vp_src_reg(const prog_src_register &src)
 {
-   struct gl_program_parameter_list *plist =
-  vs_compile->vp->program.Base.Parameters;
+   struct gl_program_parameter_list *plist = vp->Base.Parameters;
 
src_reg result;
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
index 35b601a..b7ec8b9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
@@ -36,7 +36,7 @@ vec4_vs_visitor::emit_prolog()
 
for (int i = 0; i < VERT_ATTRIB_MAX; i++) {
   if (vs_prog_data->inputs_read & BITFIELD64_BIT(i)) {
- uint8_t wa_flags = vs_compile->key.gl_attrib_wa_flags[i];
+ uint8_t wa_flags = key->gl_attrib_wa_flags[i];
  dst_reg reg(ATTR, i);
  dst_reg reg_d = reg;
  reg_d.type = BRW_REGISTER_TYPE_D;
@@ -213,20 +213,21 @@ vec4_vs_visitor::emit_thread_end()
 
 vec4_vs_visitor::vec4_vs_visitor(const struct brw_compiler *compiler,
  void *log_data,
- struct brw_vs_compile *vs_compile,
+ const struct brw_vs_prog_key *key,
  struct brw_vs_prog_data *vs_prog_data,
+ struct gl_vertex_program *vp,
  struct gl_shader_program *prog,
  void *mem_ctx,
  int shader_time_index,
  bool use_legacy_snorm_formula)
: vec4_visitor(compiler, log_data,
-  &vs_compile->vp->program.Base,
-  &vs_compile->key.base, &vs_prog_data->base, prog,
+  &vp->Base, &key->base, &vs_prog_data->base, prog,
   MESA_SHADER_VERTEX,
   mem_ctx, false /* no_spills */,
   shader_time_index),
- vs_compile(vs_compile),
+ key(key),
  vs_prog_data(vs_prog_data),
+ vp(vp),
  use_legacy_snorm_formula(use_legacy_snorm_formula)
 {
 }
diff --git a/src/mesa/drivers/dri/i965/brw_vs.h b/src/mesa/drivers/dri/i965/brw_vs.h
index 3a131b0..0481c44 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.h
+++ b/src/mesa/drivers/dri/i9

[Mesa-dev] [PATCH 2/6] i965/vec4: Move perf_debug about register spilling into the visitor.

2015-07-01 Thread Kenneth Graunke
This patch makes us only issue the performance warning about register
spilling if we actually spilled registers.  We also use scratch space
for indirect addressing and the like.

This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for
the vec4 backend.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_gs.c |  4 
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 +---
 src/mesa/drivers/dri/i965/brw_vs.c |  4 
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c
index 52c7303..7f947e0 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -268,10 +268,6 @@ brw_codegen_gs_prog(struct brw_context *brw,
 
/* Scratch space is used for register spilling */
if (c.base.last_scratch) {
-  perf_debug("Geometry shader triggered register spilling.  "
- "Try reducing the number of live vec4 values to "
- "improve performance.\n");
-
   c.prog_data.base.base.total_scratch
  = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 2a56564..60f73e2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1827,9 +1827,19 @@ vec4_visitor::run(gl_clip_plane *clip_planes)
   }
}
 
-   while (!reg_allocate()) {
-  if (failed)
- return false;
+   bool allocated_without_spills = reg_allocate();
+
+   if (!allocated_without_spills) {
+  compiler->shader_perf_log(log_data,
+"%s shader triggered register spilling.  "
+"Try reducing the number of live vec4 values "
+"to improve performance.\n",
+stage_name);
+
+  while (!reg_allocate()) {
+ if (failed)
+return false;
+  }
}
 
opt_schedule_instructions();
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index 6e9848f..edbcbcf 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -196,10 +196,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
 
/* Scratch space is used for register spilling */
if (c.base.last_scratch) {
-  perf_debug("Vertex shader triggered register spilling.  "
- "Try reducing the number of live vec4 values to "
- "improve performance.\n");
-
   prog_data.base.base.total_scratch
  = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
 
-- 
2.4.4
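
The control flow this patch introduces in vec4_visitor::run() can be
sketched outside the driver.  Everything below is a hypothetical model, not
the i965 code: sketch_compiler stands in for the compiler's
shader_perf_log hook, log_data is the opaque pointer handed back to it, and
reg_allocate() is simplified to fail a fixed number of times (one per
register it would spill).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the perf-log hook: the callback receives the
// opaque log_data pointer that the caller plumbed through.
struct sketch_compiler {
   void (*shader_perf_log)(void *log_data, const char *msg);
};

struct sketch_visitor {
   const sketch_compiler *compiler;
   void *log_data;
   int spills_needed;   // number of reg_allocate() attempts that will fail
   bool failed;

   // Simplified model: each failed attempt "spills" one register.
   bool reg_allocate()
   {
      if (spills_needed == 0)
         return true;
      spills_needed--;
      return false;
   }

   bool run()
   {
      bool allocated_without_spills = reg_allocate();

      if (!allocated_without_spills) {
         // Warn once, and only because spilling really happened.
         compiler->shader_perf_log(log_data,
                                   "shader triggered register spilling");
         while (!reg_allocate()) {
            if (failed)
               return false;
         }
      }
      return true;
   }
};

// Example callback: treat log_data as a vector of messages.
static void sketch_log(void *log_data, const char *msg)
{
   static_cast<std::vector<std::string> *>(log_data)->push_back(msg);
}

// Runs the model and reports how many warnings were logged.
static std::size_t sketch_warnings_for(int spills_needed)
{
   std::vector<std::string> log;
   sketch_compiler compiler = { sketch_log };
   sketch_visitor v = { &compiler, &log, spills_needed, false };
   v.run();
   return log.size();
}
```

In this model a shader that allocates on the first try logs nothing, even
if it uses scratch space for other reasons, while a shader that spills
several registers still logs a single warning.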



[Mesa-dev] [PATCH 1/6] i965/vec4: Plumb log_data through so the backend_shader field gets set.

2015-07-01 Thread Kenneth Graunke
Jason plumbed this through a while back in the FS backend, but
apparently we were just passing NULL in the vec4 backend.

This patch passes brw in as intended.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  2 +-
 src/mesa/drivers/dri/i965/brw_vec4.h  |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 10 ++
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  3 ++-
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |  4 +++-
 src/mesa/drivers/dri/i965/brw_vs.h|  1 +
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  4 +++-
 8 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index a5c686c..2a56564 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1940,7 +1940,7 @@ brw_vs_emit(struct brw_context *brw,
if (!assembly) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
 
-  vec4_vs_visitor v(brw->intelScreen->compiler,
+  vec4_vs_visitor v(brw->intelScreen->compiler, brw,
 c, prog_data, prog, mem_ctx, st_index,
 !_mesa_is_gles3(&brw->ctx));
   if (!v.run(brw_select_clip_planes(&brw->ctx))) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 2ac1693..043557b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -77,6 +77,7 @@ class vec4_visitor : public backend_shader, public ir_visitor
 {
 public:
vec4_visitor(const struct brw_compiler *compiler,
+void *log_data,
 struct brw_vec4_compile *c,
 struct gl_program *prog,
 const struct brw_vue_prog_key *key,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index 69bcf5a..80c59af 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -35,12 +35,14 @@ const unsigned MAX_GS_INPUT_VERTICES = 6;
 namespace brw {
 
 vec4_gs_visitor::vec4_gs_visitor(const struct brw_compiler *compiler,
+ void *log_data,
  struct brw_gs_compile *c,
  struct gl_shader_program *prog,
  void *mem_ctx,
  bool no_spills,
  int shader_time_index)
-   : vec4_visitor(compiler, &c->base, &c->gp->program.Base, &c->key.base,
+   : vec4_visitor(compiler, log_data,
+  &c->base, &c->gp->program.Base, &c->key.base,
   &c->prog_data.base, prog, MESA_SHADER_GEOMETRY, mem_ctx,
   no_spills, shader_time_index),
  c(c)
@@ -662,7 +664,7 @@ brw_gs_emit(struct brw_context *brw,
   likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
  c->prog_data.base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
 
- vec4_gs_visitor v(brw->intelScreen->compiler,
+ vec4_gs_visitor v(brw->intelScreen->compiler, brw,
c, prog, mem_ctx, true /* no_spills */, st_index);
  if (v.run(NULL /* clip planes */)) {
 return generate_assembly(brw, prog, &c->gp->program.Base,
@@ -704,11 +706,11 @@ brw_gs_emit(struct brw_context *brw,
const unsigned *ret = NULL;
 
if (brw->gen >= 7)
-  gs = new vec4_gs_visitor(brw->intelScreen->compiler,
+  gs = new vec4_gs_visitor(brw->intelScreen->compiler, brw,
c, prog, mem_ctx, false /* no_spills */,
st_index);
else
-  gs = new gen6_gs_visitor(brw->intelScreen->compiler,
+  gs = new gen6_gs_visitor(brw->intelScreen->compiler, brw,
c, prog, mem_ctx, false /* no_spills */,
st_index);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
index e693c56..e48d861 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
@@ -69,6 +69,7 @@ class vec4_gs_visitor : public vec4_visitor
 {
 public:
vec4_gs_visitor(const struct brw_compiler *compiler,
+   void *log_data,
struct brw_gs_compile *c,
struct gl_shader_program *prog,
void *mem_ctx,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 8d7a80b..21d9b01 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers

[Mesa-dev] [PATCH 6/6] i965/vs: Get rid of brw_vs_compile completely.

2015-07-01 Thread Kenneth Graunke
After tearing it out another level or two, and just passing the key and
vp directly, we can finally remove this struct.  It also eliminates a
pointless memcpy() of the key.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 37 +-
 src/mesa/drivers/dri/i965/brw_vs.c | 20 --
 src/mesa/drivers/dri/i965/brw_vs.h | 13 
 3 files changed, 31 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 42d014c..39715c4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1872,10 +1872,11 @@ extern "C" {
  */
 const unsigned *
 brw_vs_emit(struct brw_context *brw,
-struct gl_shader_program *prog,
-struct brw_vs_compile *c,
-struct brw_vs_prog_data *prog_data,
 void *mem_ctx,
+const struct brw_vs_prog_key *key,
+struct brw_vs_prog_data *prog_data,
+struct gl_vertex_program *vp,
+struct gl_shader_program *prog,
 unsigned *final_assembly_size)
 {
bool start_busy = false;
@@ -1894,29 +1895,29 @@ brw_vs_emit(struct brw_context *brw,
 
int st_index = -1;
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
-  st_index = brw_get_shader_time_index(brw, prog, &c->vp->program.Base,
+  st_index = brw_get_shader_time_index(brw, prog, &vp->Base,
ST_VS);
 
if (unlikely(INTEL_DEBUG & DEBUG_VS))
-  brw_dump_ir("vertex", prog, &shader->base, &c->vp->program.Base);
+  brw_dump_ir("vertex", prog, &shader->base, &vp->Base);
 
if (brw->intelScreen->compiler->scalar_vs) {
-  if (!c->vp->program.Base.nir) {
+  if (!vp->Base.nir) {
  /* Normally we generate NIR in LinkShader() or
   * ProgramStringNotify(), but Mesa's fixed-function vertex program
   * handling doesn't notify the driver at all.  Just do it here, at
   * the last minute, even though it's lame.
   */
- assert(c->vp->program.Base.Id == 0 && prog == NULL);
- c->vp->program.Base.nir =
-brw_create_nir(brw, NULL, &c->vp->program.Base, MESA_SHADER_VERTEX);
+ assert(vp->Base.Id == 0 && prog == NULL);
+ vp->Base.nir =
+brw_create_nir(brw, NULL, &vp->Base, MESA_SHADER_VERTEX);
   }
 
   prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
 
   fs_visitor v(brw->intelScreen->compiler, brw,
-   mem_ctx, MESA_SHADER_VERTEX, &c->key,
-   &prog_data->base.base, prog, &c->vp->program.Base,
+   mem_ctx, MESA_SHADER_VERTEX, key,
+   &prog_data->base.base, prog, &vp->Base,
8, st_index);
   if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
  if (prog) {
@@ -1931,8 +1932,8 @@ brw_vs_emit(struct brw_context *brw,
   }
 
   fs_generator g(brw->intelScreen->compiler, brw,
- mem_ctx, (void *) &c->key, &prog_data->base.base,
- &c->vp->program.Base, v.promoted_constants,
+ mem_ctx, (void *) key, &prog_data->base.base,
+ &vp->Base, v.promoted_constants,
  v.runtime_check_aads_emit, "VS");
   if (INTEL_DEBUG & DEBUG_VS) {
  char *name;
@@ -1942,7 +1943,7 @@ brw_vs_emit(struct brw_context *brw,
prog->Name);
  } else {
 name = ralloc_asprintf(mem_ctx, "vertex program %d",
-   c->vp->program.Base.Id);
+   vp->Base.Id);
  }
  g.enable_debug(name);
   }
@@ -1953,8 +1954,8 @@ brw_vs_emit(struct brw_context *brw,
if (!assembly) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
 
-  vec4_vs_visitor v(brw->intelScreen->compiler, brw, &c->key, prog_data,
-&c->vp->program, prog, mem_ctx, st_index,
+  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
+vp, prog, mem_ctx, st_index,
 !_mesa_is_gles3(&brw->ctx));
   if (!v.run(brw_select_clip_planes(&brw->ctx))) {
  if (prog) {
@@ -1969,14 +1970,14 @@ brw_vs_emit(struct brw_context *brw,
   }
 
   vec4_generator g(brw->intelScreen->compiler, brw,
-   prog, &c->vp->program.Base, &prog_data->base,
+   prog, &vp->Base, &prog_data->base,

[Mesa-dev] [PATCH 4/6] i965/vec4: Move c->last_scratch into vec4_visitor.

2015-07-01 Thread Kenneth Graunke
Nothing outside of vec4_visitor uses it, so we may as well keep it
internal.

This is basically commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for
the vec4 backend.

(The empty class will be going away soon.)

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp  |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4.h|  8 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h |  1 -
 src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp  | 17 -
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_vs.h  |  1 -
 8 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 7b367ec..e5db268 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1846,9 +1846,9 @@ vec4_visitor::run(gl_clip_plane *clip_planes)
 
opt_set_dependency_control();
 
-   if (c->last_scratch > 0) {
+   if (last_scratch > 0) {
   prog_data->base.total_scratch =
- brw_get_scratch_size(c->last_scratch * REG_SIZE);
+ brw_get_scratch_size(last_scratch * REG_SIZE);
}
 
/* If any state parameters were appended, then ParameterValues could have
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 043557b..3643651 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -47,10 +47,6 @@ extern "C" {
 #include "glsl/ir.h"
 
 
-struct brw_vec4_compile {
-   GLuint last_scratch; /**< measured in 32-byte (register size) units */
-};
-
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -78,7 +74,6 @@ class vec4_visitor : public backend_shader, public ir_visitor
 public:
vec4_visitor(const struct brw_compiler *compiler,
 void *log_data,
-struct brw_vec4_compile *c,
 struct gl_program *prog,
 const struct brw_vue_prog_key *key,
 struct brw_vue_prog_data *prog_data,
@@ -104,7 +99,6 @@ public:
   return dst_reg(retype(brw_null_reg(), BRW_REGISTER_TYPE_UD));
}
 
-   struct brw_vec4_compile * const c;
const struct brw_vue_prog_key * const key;
struct brw_vue_prog_data * const prog_data;
unsigned int sanity_param_count;
@@ -412,6 +406,8 @@ private:
const bool no_spills;
 
int shader_time_index;
+
+   unsigned last_scratch; /**< measured in 32-byte (register size) units */
 };
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index 80c59af..2f948ee 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -42,7 +42,7 @@ vec4_gs_visitor::vec4_gs_visitor(const struct brw_compiler *compiler,
  bool no_spills,
  int shader_time_index)
: vec4_visitor(compiler, log_data,
-  &c->base, &c->gp->program.Base, &c->key.base,
+  &c->gp->program.Base, &c->key.base,
   &c->prog_data.base, prog, MESA_SHADER_GEOMETRY, mem_ctx,
   no_spills, shader_time_index),
  c(c)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
index e48d861..e51399d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
@@ -37,7 +37,6 @@
  */
 struct brw_gs_compile
 {
-   struct brw_vec4_compile base;
struct brw_gs_prog_key key;
struct brw_gs_prog_data prog_data;
struct brw_vue_map input_vue_map;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index 555c42e..cd89edd 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
@@ -339,7 +339,7 @@ void
 vec4_visitor::spill_reg(int spill_reg_nr)
 {
assert(alloc.sizes[spill_reg_nr] == 1);
-   unsigned int spill_offset = c->last_scratch++;
+   unsigned int spill_offset = last_scratch++;
 
/* Generate spill/unspill instructions for the objects being spilled. */
foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 21d9b01..c9c2661 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -3484,16 +3484,16 @@ vec4_visitor::move_grf_array_access_to_scratch()
foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
   if (inst->dst.file == GRF && i

[Mesa-dev] [PATCH 1/2] i965/gs: Move vertex_count != 0 check up a level; skip one caller.

2015-07-01 Thread Kenneth Graunke
Paul's original code had emit_control_data_bits() skip the URB write if
vertex_count was 0.  This meant wrapping every control data write in a
conditional write.

We accumulate control data bits in a single UD (32-bit) register.  For
simple shaders that don't emit many vertices, the control data header
will be <= 32 bits long, so we only need to write it once at the end of
the shader.

For shaders with larger headers, we write out batches of control data
bits at EmitVertex(), when (vertex_count * bits_per_vertex) % 32 == 0.
On the first EmitVertex() call, the above expression will evaluate to
true simply because vertex_count == 0.  But we want to avoid emitting
the control data bits, because we haven't accumulated 32 bits' worth yet.

In other words, the vertex_count != 0 check is really only necessary in
the EmitVertex() batching case, not the end-of-thread case.

This saves a CMP/IF/ENDIF in every shader that uses EndPrimitive() or
multiple streams.  The only downside is that a shader which emits no
vertices at all will execute an additional URB write---but such shaders
are pointless and not worth optimizing.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index 2f948ee..55408eb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -348,11 +348,6 @@ vec4_gs_visitor::emit_control_data_bits()
if (c->control_data_header_size_bits > 128)
   urb_write_flags = urb_write_flags | BRW_URB_WRITE_PER_SLOT_OFFSET;
 
-   /* If vertex_count is 0, then no control data bits have been accumulated
-* yet, so we should do nothing.
-*/
-   emit(CMP(dst_null_d(), this->vertex_count, 0u, BRW_CONDITIONAL_NEQ));
-   emit(IF(BRW_PREDICATE_NORMAL));
{
   /* If we are using either channel masks or a per-slot offset, then we
* need to figure out which DWORD we are trying to write to, using the
@@ -431,7 +426,6 @@ vec4_gs_visitor::emit_control_data_bits()
   inst->base_mrf = base_mrf;
   inst->mlen = 2;
}
-   emit(BRW_OPCODE_ENDIF);
 }
 
 void
@@ -531,9 +525,17 @@ vec4_gs_visitor::visit(ir_emit_vertex *ir)
 emit(AND(dst_null_d(), this->vertex_count,
  (uint32_t) (32 / c->control_data_bits_per_vertex - 1)));
  inst->conditional_mod = BRW_CONDITIONAL_Z;
+
  emit(IF(BRW_PREDICATE_NORMAL));
  {
+/* If vertex_count is 0, then no control data bits have been
+ * accumulated yet, so we skip emitting them.
+ */
+emit(CMP(dst_null_d(), this->vertex_count, 0u,
+ BRW_CONDITIONAL_NEQ));
+emit(IF(BRW_PREDICATE_NORMAL));
 emit_control_data_bits();
+emit(BRW_OPCODE_ENDIF);
 
 /* Reset control_data_bits to 0 so we can start accumulating a new
  * batch.
-- 
2.4.4
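
The batching condition described in the commit message can be checked with
a small stand-alone sketch.  It is a model rather than driver code: the
function names are made up, bits_per_vertex is assumed to be a power of two
that divides 32, and vertex_count is the value seen at EmitVertex() time.

```cpp
#include <cassert>
#include <vector>

// Batch boundary test as written in the commit message.
static bool sketch_boundary_modulo(unsigned vertex_count,
                                   unsigned bits_per_vertex)
{
   return (vertex_count * bits_per_vertex) % 32 == 0;
}

// The same test in the form the visitor emits: AND with
// (32 / bits_per_vertex - 1) and compare the result against zero.
static bool sketch_boundary_masked(unsigned vertex_count,
                                   unsigned bits_per_vertex)
{
   return (vertex_count & (32 / bits_per_vertex - 1)) == 0;
}

// Returns the vertex_count values at which a full batch of control data
// bits would be written.  vertex_count == 0 also hits the boundary test,
// but nothing has been accumulated yet, so it is skipped.
static std::vector<unsigned> sketch_flush_points(unsigned bits_per_vertex,
                                                 unsigned emits)
{
   std::vector<unsigned> flushes;
   for (unsigned vertex_count = 0; vertex_count < emits; vertex_count++) {
      if (sketch_boundary_masked(vertex_count, bits_per_vertex) &&
          vertex_count != 0)
         flushes.push_back(vertex_count);
   }
   return flushes;
}
```

With 1 bit per vertex and 70 emitted vertices, the model flushes at vertex
counts 32 and 64.  A shader emitting only 10 vertices never flushes inside
EmitVertex() and relies on the single write at the end of the thread, which
is the case that no longer needs the vertex_count != 0 check.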



[Mesa-dev] [PATCH 2/2] i965: Fix indentation in emit_control_data_bits().

2015-07-01 Thread Kenneth Graunke
The last patch left the code indented too far.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 142 +++---
 1 file changed, 70 insertions(+), 72 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index 55408eb..d6b350b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -348,84 +348,82 @@ vec4_gs_visitor::emit_control_data_bits()
if (c->control_data_header_size_bits > 128)
   urb_write_flags = urb_write_flags | BRW_URB_WRITE_PER_SLOT_OFFSET;
 
-   {
-  /* If we are using either channel masks or a per-slot offset, then we
-   * need to figure out which DWORD we are trying to write to, using the
-   * formula:
-   *
-   * dword_index = (vertex_count - 1) * bits_per_vertex / 32
-   *
-   * Since bits_per_vertex is a power of two, and is known at compile
-   * time, this can be optimized to:
-   *
-   * dword_index = (vertex_count - 1) >> (6 - log2(bits_per_vertex))
+   /* If we are using either channel masks or a per-slot offset, then we
+* need to figure out which DWORD we are trying to write to, using the
+* formula:
+*
+* dword_index = (vertex_count - 1) * bits_per_vertex / 32
+*
+* Since bits_per_vertex is a power of two, and is known at compile
+* time, this can be optimized to:
+*
+* dword_index = (vertex_count - 1) >> (6 - log2(bits_per_vertex))
+*/
+   src_reg dword_index(this, glsl_type::uint_type);
+   if (urb_write_flags) {
+  src_reg prev_count(this, glsl_type::uint_type);
+  emit(ADD(dst_reg(prev_count), this->vertex_count, 0xffffffffu));
+  unsigned log2_bits_per_vertex =
+ _mesa_fls(c->control_data_bits_per_vertex);
+  emit(SHR(dst_reg(dword_index), prev_count,
+   (uint32_t) (6 - log2_bits_per_vertex)));
+   }
+
+   /* Start building the URB write message.  The first MRF gets a copy of
+* R0.
+*/
+   int base_mrf = 1;
+   dst_reg mrf_reg(MRF, base_mrf);
+   src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
+   vec4_instruction *inst = emit(MOV(mrf_reg, r0));
+   inst->force_writemask_all = true;
+
+   if (urb_write_flags & BRW_URB_WRITE_PER_SLOT_OFFSET) {
+  /* Set the per-slot offset to dword_index / 4, to that we'll write to
+   * the appropriate OWORD within the control data header.
*/
-  src_reg dword_index(this, glsl_type::uint_type);
-  if (urb_write_flags) {
- src_reg prev_count(this, glsl_type::uint_type);
- emit(ADD(dst_reg(prev_count), this->vertex_count, 0xffffffffu));
- unsigned log2_bits_per_vertex =
-_mesa_fls(c->control_data_bits_per_vertex);
- emit(SHR(dst_reg(dword_index), prev_count,
-  (uint32_t) (6 - log2_bits_per_vertex)));
-  }
+  src_reg per_slot_offset(this, glsl_type::uint_type);
+  emit(SHR(dst_reg(per_slot_offset), dword_index, 2u));
+  emit(GS_OPCODE_SET_WRITE_OFFSET, mrf_reg, per_slot_offset, 1u);
+   }
 
-  /* Start building the URB write message.  The first MRF gets a copy of
-   * R0.
+   if (urb_write_flags & BRW_URB_WRITE_USE_CHANNEL_MASKS) {
+  /* Set the channel masks to 1 << (dword_index % 4), so that we'll
+   * write to the appropriate DWORD within the OWORD.  We need to do
+   * this computation with force_writemask_all, otherwise garbage data
+   * from invocation 0 might clobber the mask for invocation 1 when
+   * GS_OPCODE_PREPARE_CHANNEL_MASKS tries to OR the two masks
+   * together.
*/
-  int base_mrf = 1;
-  dst_reg mrf_reg(MRF, base_mrf);
-  src_reg r0(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
-  vec4_instruction *inst = emit(MOV(mrf_reg, r0));
+  src_reg channel(this, glsl_type::uint_type);
+  inst = emit(AND(dst_reg(channel), dword_index, 3u));
   inst->force_writemask_all = true;
-
-  if (urb_write_flags & BRW_URB_WRITE_PER_SLOT_OFFSET) {
- /* Set the per-slot offset to dword_index / 4, to that we'll write to
-  * the appropriate OWORD within the control data header.
-  */
- src_reg per_slot_offset(this, glsl_type::uint_type);
- emit(SHR(dst_reg(per_slot_offset), dword_index, 2u));
- emit(GS_OPCODE_SET_WRITE_OFFSET, mrf_reg, per_slot_offset, 1u);
-  }
-
-  if (urb_write_flags & BRW_URB_WRITE_USE_CHANNEL_MASKS) {
- /* Set the channel masks to 1 << (dword_index % 4), so that we'll
-  * write to the appropriate DWORD within the OWORD.  We need to do
-  * this computation with force_writemask_all, otherwise garbage data
-  * from invocation 0 might clobber the mask for invocation 1 when
-  * GS_OPCODE_PREPARE_CHANNEL_MASKS tries to 
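
The index arithmetic in the comments above can be sketched as plain integer
code.  This is a hedged model, not the visitor: sketch_fls() is a
hypothetical stand-in for _mesa_fls(), assumed to return the 1-based
position of the highest set bit (so log2 + 1 for a power of two), which
makes the emitted shift amount 6 - fls equal to 5 - log2, the shift that
dividing by 32 requires.  The ADD is modelled as adding 0xffffffff, that
is, subtracting 1 with unsigned wraparound.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for _mesa_fls(): 1-based index of the highest set
// bit, 0 when no bit is set.
static unsigned sketch_fls(uint32_t n)
{
   unsigned pos = 0;
   while (n) {
      pos++;
      n >>= 1;
   }
   return pos;
}

// dword_index = (vertex_count - 1) * bits_per_vertex / 32
static uint32_t sketch_dword_index_formula(uint32_t vertex_count,
                                           uint32_t bits_per_vertex)
{
   return (vertex_count - 1) * bits_per_vertex / 32;
}

// The shift form: ADD with 0xffffffff, then SHR by (6 - fls(bits)).
static uint32_t sketch_dword_index_shift(uint32_t vertex_count,
                                         uint32_t bits_per_vertex)
{
   uint32_t prev_count = vertex_count + 0xffffffffu;  // vertex_count - 1
   return prev_count >> (6 - sketch_fls(bits_per_vertex));
}

// Per-slot offset selects the OWORD (4 DWORDs) holding the target DWORD.
static uint32_t sketch_per_slot_offset(uint32_t dword_index)
{
   return dword_index >> 2;
}

// Channel mask selects the DWORD within that OWORD.
static uint32_t sketch_channel_mask(uint32_t dword_index)
{
   return 1u << (dword_index & 3u);
}
```

For example, with 2 bits per vertex and vertex_count 150, both forms give
dword_index 9, which lands in OWORD 2 with channel mask 0x2.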

Re: [Mesa-dev] [PATCH] i965: allocate at least 1 BLEND_STATE element

2015-07-02 Thread Kenneth Graunke
On Wednesday, July 01, 2015 10:16:28 AM Mike Stroyan wrote:
> When there are no color buffer render targets, gen6 and gen7 still
> use the first BLEND_STATE element to determine alpha test.
> gen6_upload_blend_state was allocating zero elements when
> ctx->Color.AlphaEnabled was false.
> That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS
> pointing to random data from some previous brw_state_batch().
> That sometimes suppressed depth rendering when those bits
> happened to mean COMPAREFUNC_NEVER.
> This produced flickering shadows for dota2 reborn.
> ---
>  src/mesa/drivers/dri/i965/gen6_cc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c
> index 2bfa271..2b76e24 100644
> --- a/src/mesa/drivers/dri/i965/gen6_cc.c
> +++ b/src/mesa/drivers/dri/i965/gen6_cc.c
> @@ -51,7 +51,7 @@ gen6_upload_blend_state(struct brw_context *brw)
>  * with render target 0, which will reference BLEND_STATE[0] for
>  * alpha test enable.
>  */
> -   if (nr_draw_buffers == 0 && ctx->Color.AlphaEnabled)
> +   if (nr_draw_buffers == 0)
>nr_draw_buffers = 1;
>  
> size = sizeof(*blend) * nr_draw_buffers;
> 

Great catch!

Reviewed-by: Kenneth Graunke 

And pushed:
   9d408a4..fe2b748  master -> master

I think we ought to change gen8_blend_state.c as well, but I'm not quite
sure what change to make.  Either we should make the same change you did
here, or delete the whole "We need at least 1 BLEND_STATE written"
block.

On Gen8+, it looks like the alpha test and other functions that might
discard pixels are all in the shared/common DWord, and the per-color
target DWord pairs look relatively harmless.  I suppose the null RT
would still refer to BLEND_STATE[0]...so it might still be worth
emitting one.  Any thoughts?




[Mesa-dev] [PATCH 10.6] i965/vs: Fix matNxM vertex attributes where M != 4.

2015-07-02 Thread Kenneth Graunke
Matrix vertex attributes have their columns padded out to vec4s, which
I was failing to account for.  Scalar NIR expects them to be packed,
however.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

Here's a backport of the patch to 10.6, since the one on master won't apply
due to all the builder changes.

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 5dd8363..6d11c0d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -89,12 +89,19 @@ fs_visitor::nir_setup_inputs(nir_shader *shader)
   * So, we need to copy from fs_reg(ATTR, var->location) to
   * offset(nir_inputs, var->data.driver_location).
   */
- unsigned components = var->type->without_array()->components();
+ const glsl_type *const t = var->type->without_array();
+ const unsigned components = t->components();
+ const unsigned cols = t->matrix_columns;
+ const unsigned elts = t->vector_elements;
  unsigned array_length = var->type->is_array() ? var->type->length : 1;
  for (unsigned i = 0; i < array_length; i++) {
-for (unsigned j = 0; j < components; j++) {
-   emit(MOV(retype(offset(input, components * i + j), type),
-offset(fs_reg(ATTR, var->data.location + i, type), j)));
+for (unsigned j = 0; j < cols; j++) {
+   for (unsigned k = 0; k < elts; k++) {
+  emit(MOV(offset(retype(input, type),
+  components * i + elts * j + k),
+   offset(fs_reg(ATTR, var->data.location + i, type),
+  4 * j + k)));
+   }
 }
  }
  break;
-- 
2.4.4
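
The layout mismatch fixed here can be illustrated with a small index
calculation.  The sketch below is hypothetical and covers a single
non-array matrix only: it lists, in packed destination order, which
component of the padded ATTR storage each value is read from, using the
4 * j + k offset from the patch, alongside what the old loop read.

```cpp
#include <cassert>
#include <vector>

// Source component offsets in the padded layout, where every column
// occupies a full vec4.  Element n of the result is the source for packed
// destination slot n (= elts * j + k).
static std::vector<unsigned> sketch_padded_offsets(unsigned cols,
                                                   unsigned elts)
{
   std::vector<unsigned> src;
   for (unsigned j = 0; j < cols; j++) {
      for (unsigned k = 0; k < elts; k++)
         src.push_back(4 * j + k);
   }
   return src;
}

// What the old loop read: simply the first cols * elts components.
static std::vector<unsigned> sketch_old_offsets(unsigned cols, unsigned elts)
{
   std::vector<unsigned> src;
   for (unsigned j = 0; j < cols * elts; j++)
      src.push_back(j);
   return src;
}
```

For a mat3x2 (3 columns of 2 components) the padded offsets are
0, 1, 4, 5, 8, 9, whereas the old loop read 0 through 5 and so picked up
padding.  For a mat4 the two lists are identical, which is consistent with
a mat4-only test not catching the bug.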



[Mesa-dev] [PATCH] i965/vs: Fix matNxM vertex attributes where M != 4.

2015-07-02 Thread Kenneth Graunke
Matrix vertex attributes have their columns padded out to vec4s, which
I was failing to account for.  Scalar NIR expects them to be packed,
however.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

I still need to write proper Piglit tests for this.  We have basically a single
test for matrix vertex attributes, and that's a mat4 (which worked).

But I figure we probably shouldn't hold up the bugfix on that.
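
As a rough illustration of the index mapping the patch below is after, here
is a standalone sketch.  The helper names (padded_slot, packed_slot) are made
up for this note and are not Mesa API; the sketch only assumes what the commit
message states, namely that each matrix column occupies a full vec4 on the
ATTR side while the scalar NIR side is tightly packed.

```c
/* Illustrative only: these helpers are hypothetical, not part of Mesa. */

/* Component offset of (column j, row k) in the padded ATTR layout,
 * where every matrix column takes a full vec4 (4 components).
 */
unsigned padded_slot(unsigned j, unsigned k)
{
   return 4 * j + k;
}

/* Component offset of array element i, column j, row k in the packed
 * layout, where a column only takes `elts` components.
 */
unsigned packed_slot(unsigned i, unsigned cols, unsigned elts,
                     unsigned j, unsigned k)
{
   const unsigned components = cols * elts;
   return components * i + elts * j + k;
}
```

For a mat2x3 (cols = 2, elts = 3), column 1 starts at component 3 on the
packed side but at component 4 on the padded side.  For a mat4 the two
layouts agree, which is consistent with the existing mat4 test passing.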

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index caf1300..37b1ed7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -91,12 +91,19 @@ fs_visitor::nir_setup_inputs(nir_shader *shader)
   * So, we need to copy from fs_reg(ATTR, var->location) to
   * offset(nir_inputs, var->data.driver_location).
   */
- unsigned components = var->type->without_array()->components();
+ const glsl_type *const t = var->type->without_array();
+ const unsigned components = t->components();
+ const unsigned cols = t->matrix_columns;
+ const unsigned elts = t->vector_elements;
  unsigned array_length = var->type->is_array() ? var->type->length : 1;
  for (unsigned i = 0; i < array_length; i++) {
-for (unsigned j = 0; j < components; j++) {
-   bld.MOV(retype(offset(input, bld, components * i + j), type),
-   offset(fs_reg(ATTR, var->data.location + i, type), bld, j));
+for (unsigned j = 0; j < cols; j++) {
+   for (unsigned k = 0; k < elts; k++) {
+  bld.MOV(offset(retype(input, type), bld,
+ components * i + elts * j + k),
+  offset(fs_reg(ATTR, var->data.location + i, type),
+ bld, 4 * j + k));
+   }
 }
  }
  break;
-- 
2.4.4



Re: [Mesa-dev] [PATCH 1/6] i965/vec4: Plumb log_data through so the backend_shader field gets set.

2015-07-03 Thread Kenneth Graunke
On Friday, July 03, 2015 10:50:52 AM Pohjolainen, Topi wrote:
> On Wed, Jul 01, 2015 at 03:03:31PM -0700, Kenneth Graunke wrote:
> > Jason plumbed this through a while back in the FS backend, but
> > apparently we were just passing NULL in the vec4 backend.
> > 
> > This patch passes brw in as intended.
> > 
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp|  2 +-
> >  src/mesa/drivers/dri/i965/brw_vec4.h  |  1 +
> >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 10 ++
> >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   |  1 +
> >  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  3 ++-
> >  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |  4 +++-
> >  src/mesa/drivers/dri/i965/brw_vs.h|  1 +
> >  src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  4 +++-
> >  8 files changed, 18 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > index a5c686c..2a56564 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > @@ -1940,7 +1940,7 @@ brw_vs_emit(struct brw_context *brw,
> > if (!assembly) {
> >prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
> >  
> > -  vec4_vs_visitor v(brw->intelScreen->compiler,
> > +  vec4_vs_visitor v(brw->intelScreen->compiler, brw,
> >  c, prog_data, prog, mem_ctx, st_index,
> >  !_mesa_is_gles3(&brw->ctx));
> >if (!v.run(brw_select_clip_planes(&brw->ctx))) {
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> > index 2ac1693..043557b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> > @@ -77,6 +77,7 @@ class vec4_visitor : public backend_shader, public ir_visitor
> >  {
> >  public:
> > vec4_visitor(const struct brw_compiler *compiler,
> > +void *log_data,
> 
> As far as I can see, all the constructors addressed in this patch are
> "struct brw_context" aware. Could we use the type "struct brw_context *"
> instead of "void *"? The pointer is in the end given to shader_perf_log_mesa()
> which in turn unconditionally casts it to "struct brw_context *".

Jason is trying to separate the compiler backend from the OpenGL driver,
so we can more easily reuse it...elsewhere :)  "elsewhere" does not have
a brw_context, but will have some other structure.  So instead he made the
logging functions pass a void * closure.

I'm also concerned that if we pass in brw_context that people will start
using it everywhere.  Admittedly, having to type log_data-> might deter
them, though...
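
A minimal sketch of the closure pattern described above, under stated
assumptions: the struct and function names prefixed with sketch_ are invented
for illustration and do not match the real brw_compiler or brw_context
definitions.  Only the shape of the shader_perf_log(log_data, ...) callback is
taken from this series.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real Mesa structures differ. */
struct sketch_compiler {
   /* The backend only ever sees an opaque pointer. */
   void (*shader_perf_log)(void *log_data, const char *fmt, ...);
};

struct sketch_gl_driver {
   int messages_logged;
};

/* Driver-side callback: this is the only place the closure is cast back. */
void sketch_gl_perf_log(void *log_data, const char *fmt, ...)
{
   struct sketch_gl_driver *drv = log_data;
   va_list args;

   va_start(args, fmt);
   vfprintf(stderr, fmt, args);
   va_end(args);
   drv->messages_logged++;
}

/* Backend-side code: it passes log_data through without knowing its type. */
void sketch_backend_compile(const struct sketch_compiler *compiler,
                            void *log_data)
{
   compiler->shader_perf_log(log_data, "%s shader did something slow\n", "VS");
}
```

A different user of the backend could supply its own callback and its own
structure behind log_data without the backend code changing.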


signature.asc
Description: This is a digitally signed message part.


Re: [Mesa-dev] [PATCH 2/6] i965/vec4: Move perf_debug about register spilling into the visitor.

2015-07-03 Thread Kenneth Graunke
On Friday, July 03, 2015 11:11:45 AM Pohjolainen, Topi wrote:
> On Wed, Jul 01, 2015 at 03:03:32PM -0700, Kenneth Graunke wrote:
> > This patch makes us only issue the performance warning about register
> > spilling if we actually spilled registers.  We also use scratch space
> > for indirect addressing and the like.
> > 
> > This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for
> > the vec4 backend.
> > 
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_gs.c |  4 
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 +---
> >  src/mesa/drivers/dri/i965/brw_vs.c |  4 
> >  3 files changed, 13 insertions(+), 11 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c
> > index 52c7303..7f947e0 100644
> > --- a/src/mesa/drivers/dri/i965/brw_gs.c
> > +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> > @@ -268,10 +268,6 @@ brw_codegen_gs_prog(struct brw_context *brw,
> >  
> > /* Scratch space is used for register spilling */
> > if (c.base.last_scratch) {
> > -  perf_debug("Geometry shader triggered register spilling.  "
> > - "Try reducing the number of live vec4 values to "
> > - "improve performance.\n");
> > -
> >c.prog_data.base.base.total_scratch
> >   = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
> >  
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > index 2a56564..60f73e2 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > @@ -1827,9 +1827,19 @@ vec4_visitor::run(gl_clip_plane *clip_planes)
> >}
> > }
> >  
> > -   while (!reg_allocate()) {
> > -  if (failed)
> > - return false;
> > +   bool allocated_without_spills = reg_allocate();
> > +
> > +   if (!allocated_without_spills) {
> > +  compiler->shader_perf_log(log_data,
> > +"%s shader triggered register spilling.  "
> > +"Try reducing the number of live vec4 values "
> > +"to improve performance.\n",
> > +stage_name);
> > +
> > +  while (!reg_allocate()) {
> 
> I tried to understand how repeated calls to reg_allocate() differ from the
> first one in their results. I didn't really get it, but that doesn't
> prevent me from reviewing this patch. This patch preserves the logic while
> matching the intent in the commit message.

The interface is indeed weird.  reg_allocate() may fail to allocate
without spilling, at which point it will spill a register, and return
false.  The caller is expected to call it again to retry.

I don't know why it doesn't just do that itself.
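
To make the retry contract concrete, here is a toy model.  Everything in it
(toy_alloc, toy_reg_allocate, toy_run) is hypothetical and far simpler than
the real vec4_visitor; it only mirrors the behaviour described above: a failed
attempt spills one value and returns false, and the caller retries.

```c
#include <stdbool.h>

/* Hypothetical toy allocator, not the vec4_visitor code. */
struct toy_alloc {
   int live_values;   /* values that want a register */
   int registers;     /* registers available */
   int spills;        /* how many values have been spilled so far */
   bool failed;       /* set when allocation cannot succeed at all */
};

/* One attempt.  Returns true on success.  On failure it either spills one
 * value or sets `failed`, then returns false so the caller can retry.
 */
bool toy_reg_allocate(struct toy_alloc *a)
{
   if (a->live_values <= a->registers)
      return true;

   if (a->registers == 0) {
      a->failed = true;
      return false;
   }

   a->live_values--;
   a->spills++;
   return false;
}

/* Caller-side loop, shaped like the one in the quoted patch. */
bool toy_run(struct toy_alloc *a)
{
   bool allocated_without_spills = toy_reg_allocate(a);

   if (!allocated_without_spills) {
      /* The performance warning would be logged here, exactly once. */
      while (!toy_reg_allocate(a)) {
         if (a->failed)
            return false;
      }
   }

   return true;
}
```

With five live values and three registers, toy_run() succeeds after two
spills; the warning site is reached only because the first attempt failed.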

> Reviewed-by: Topi Pohjolainen 
> 
> > + if (failed)
> > +return false;
> > +  }
> > }
> >  
> > opt_schedule_instructions();
> > diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
> > index 6e9848f..edbcbcf 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vs.c
> > +++ b/src/mesa/drivers/dri/i965/brw_vs.c
> > @@ -196,10 +196,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
> >  
> > /* Scratch space is used for register spilling */
> > if (c.base.last_scratch) {
> > -  perf_debug("Vertex shader triggered register spilling.  "
> > - "Try reducing the number of live vec4 values to "
> > - "improve performance.\n");
> > -
> >prog_data.base.base.total_scratch
> >   = brw_get_scratch_size(c.base.last_scratch*REG_SIZE);
> >  
> 




Re: [Mesa-dev] [PATCH] i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRA

2015-07-03 Thread Kenneth Graunke
On Friday, July 03, 2015 01:15:21 PM Neil Roberts wrote:
> On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
> the shader sends a message to the pixel interpolator. This fixes the
> interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
> but that is not implemented anywhere so it's not a regression.
> ---
>  src/mesa/drivers/dri/i965/brw_context.h   | 1 +
>  src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp  | 4 
>  src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +++
>  4 files changed, 9 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 3553f6e..7596139 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -415,6 +415,7 @@ struct brw_wm_prog_data {
> bool uses_pos_offset;
> bool uses_omask;
> bool uses_kill;
> +   bool pulls_bary;
> uint32_t prog_offset_16;
>  
> /**
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> index 66b9abc..19489ab 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2145,6 +2145,7 @@ enum brw_pixel_shader_computed_depth_mode {
>  # define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE (1 << 7)
>  # define GEN8_PSX_SHADER_IS_PER_SAMPLE  (1 << 6)
>  # define GEN8_PSX_SHADER_COMPUTES_STENCIL   (1 << 5)
> +# define GEN9_PSX_SHADER_PULLS_BARY (1 << 3)
>  # define GEN8_PSX_SHADER_HAS_UAV(1 << 2)
>  # define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK   (1 << 1)
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index bd71404..3ebc3a2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1481,6 +1481,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
> case nir_intrinsic_interp_var_at_centroid:
> case nir_intrinsic_interp_var_at_sample:
> case nir_intrinsic_interp_var_at_offset: {
> +  assert(stage == MESA_SHADER_FRAGMENT);
> +
> +  ((struct brw_wm_prog_data *) prog_data)->pulls_bary = true;
> +
>fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
>  
>/* For most messages, we need one reg of ignored data; the hardware
> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> index a88f109..d544509 100644
> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> @@ -58,6 +58,9 @@ gen8_upload_ps_extra(struct brw_context *brw,
> if (prog_data->uses_omask)
>dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET;
>  
> +   if (brw->gen >= 9 && prog_data->pulls_bary)
> +  dw1 |= GEN9_PSX_SHADER_PULLS_BARY;
> +
> if (_mesa_active_fragment_shader_has_atomic_ops(&brw->ctx))
>dw1 |= GEN8_PSX_SHADER_HAS_UAV;
>  
> 

Good find!  That explains a lot.

Cc: "10.6 10.5" 
Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH] program: Allow redundant OPTION ARB_fog_* directives.

2015-07-04 Thread Kenneth Graunke
A fragment program from "Pixel Piracy" contains redundant OPTION
directives:

!!ARBfp1.0
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
...

We already allow redundant ARB_precision_hint_fastest directives, but
disallow the redundant (yet consistent) ARB_fog_exp2 directives, failing
to compile the program.

The specification seems to contradict itself - the main text says that
only one fog application option may be specified, but then backpedals,
indicating the intent is to disallow /contradictory/ flags.  One of the
issues suggests that specifying contradictory ones is stupid, but
allowed, and only the last one should take effect.

Accepting multiple redundant (but consistent) directives seems harmless,
and like a reasonable interpretation of the specification.  It also
fixes a fragment program found in the wild.

Cc: i...@freedesktop.org
Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/program/program_parse_extra.c | 50 +-
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/src/mesa/program/program_parse_extra.c b/src/mesa/program/program_parse_extra.c
index 32b54af..71f86d1 100644
--- a/src/mesa/program/program_parse_extra.c
+++ b/src/mesa/program/program_parse_extra.c
@@ -163,6 +163,8 @@ _mesa_ARBvp_parse_option(struct asm_parser_state *state, const char *option)
 int
 _mesa_ARBfp_parse_option(struct asm_parser_state *state, const char *option)
 {
+   unsigned fog_option;
+
/* All of the options currently supported start with "ARB_".  The code is
 * currently structured with nested if-statements because eventually options
 * that start with "NV_" will be supported.  This structure will result in
@@ -177,20 +179,42 @@ _mesa_ARBfp_parse_option(struct asm_parser_state *state, const char *option)
   if (strncmp(option, "fog_", 4) == 0) {
 option += 4;
 
-if (state->option.Fog == OPTION_NONE) {
-   if (strcmp(option, "exp") == 0) {
-  state->option.Fog = OPTION_FOG_EXP;
-  return 1;
-   } else if (strcmp(option, "exp2") == 0) {
-  state->option.Fog = OPTION_FOG_EXP2;
-  return 1;
-   } else if (strcmp(option, "linear") == 0) {
-  state->option.Fog = OPTION_FOG_LINEAR;
-  return 1;
-   }
-}
+ if (strcmp(option, "exp") == 0) {
+fog_option = OPTION_FOG_EXP;
+ } else if (strcmp(option, "exp2") == 0) {
+fog_option = OPTION_FOG_EXP2;
+ } else if (strcmp(option, "linear") == 0) {
+fog_option = OPTION_FOG_LINEAR;
+ } else {
+/* invalid option */
+return 0;
+ }
 
-return 0;
+ if (state->option.Fog == OPTION_NONE) {
+state->option.Fog = fog_option;
+return 1;
+ }
+
+ /* The ARB_fragment_program specification instructs us to handle
+  * redundant options in two seemingly contradictory ways:
+  *
+  * Section 3.11.4.5.1 says:
+  * "Only one fog application option may be specified by any given
+  *  fragment program.  A fragment program that specifies more than one
+  *  of the program options "ARB_fog_exp", "ARB_fog_exp2", and
+  *  "ARB_fog_linear", will fail to load."
+  *
+  * Issue 27 says:
+  * "The three mandatory options are ARB_fog_exp, ARB_fog_exp2, and
+  *  ARB_fog_linear.  As these options are mutually exclusive by
+  *  nature, specifying more than one is not useful.  If more than one
+  *  is specified, the last one encountered in the 
+  *  will be the one to actually modify the execution environment."
+  *
+  * We choose to allow programs to specify the same OPTION redundantly,
+  * but fail to load programs that specify contradictory options.
+  */
+ return state->option.Fog == fog_option ? 1 : 0;
   } else if (strncmp(option, "precision_hint_", 15) == 0) {
 option += 15;
 
-- 
2.4.4
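
The accept/reject rule from the patch above can be tried in isolation with
the following sketch.  The enum and the accept_fog_option() helper are
hypothetical names for this note; they only mirror the OPTION_* values and
the control flow in the patch and are not the parser's real interface.

```c
#include <string.h>

/* Hypothetical enum mirroring the OPTION_* names used in the patch. */
enum fog_mode { FOG_NONE, FOG_EXP, FOG_EXP2, FOG_LINEAR };

/* `suffix` is the text after "ARB_fog_"; `current` is the mode chosen so far.
 * Returns 1 if the directive is accepted, 0 if the program should fail.
 */
int accept_fog_option(enum fog_mode *current, const char *suffix)
{
   enum fog_mode requested;

   if (strcmp(suffix, "exp") == 0)
      requested = FOG_EXP;
   else if (strcmp(suffix, "exp2") == 0)
      requested = FOG_EXP2;
   else if (strcmp(suffix, "linear") == 0)
      requested = FOG_LINEAR;
   else
      return 0;   /* invalid option */

   if (*current == FOG_NONE) {
      *current = requested;
      return 1;
   }

   /* Redundant but consistent is fine; contradictory is rejected. */
   return *current == requested ? 1 : 0;
}
```

Feeding it "exp2" twice succeeds both times, while "exp2" followed by
"linear" fails on the second directive and leaves the first choice in place.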



Re: [Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator

2015-07-05 Thread Kenneth Graunke
On Monday, July 06, 2015 02:45:59 AM Francisco Jerez wrote:
> Matt Turner  writes:
> > On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez  
> > wrote:
[snip]
> Yeah.  I did in fact try to implement uaddCarry last Friday without
> using the accumulator by doing something like:
> 
> | CMP.o tmp, src0, -src1
> | MOV dst, -tmp
> 
> ...which of course didn't work because of the extra argument precision
> post-source modifiers and also because the .o condmod doesn't work at
> all on CMP, but...
> 
> > Ideally, we'd recognize and merge the addition and carry operations into a
> > single ADDC instruction, but it's pretty unimportant. It's all pretty
> > academic -- I've never seen an application use either operation (or
> > [iu]mulExtended either).
> 
> ...if we did the following instead:
> 
> | ADD tmp, src0, src1
> | CMP.l tmp, tmp, src0
> | MOV dst, -tmp
> 
> the ADD could be easily CSE'ed with the original ADD instruction (and
> the source modifier of the last MOV can also be easily propagated into
> some other instruction), so even though it seems like one instruction
> more than what we emit now it might be a net win (aside from it working
> on SIMD16).  usubBorrow is even easier:
> 
> | CMP.l tmp, src0, src1
> | MOV dst, -tmp
> 
> I was planning to run it through shader-db tomorrow but if you say
> you've never seen them used I guess I shouldn't get my hopes too high? :P

Yeah, there's nothing in shader-db that uses them, so I wouldn't bother.

I'm definitely a huge fan of avoiding the accumulator - it's always been
a total pain to deal with.  If you guys come up with a solution that
uses normal registers and avoids MACH, that sounds great to me.
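
The arithmetic behind the quoted sequences can be checked in plain C.  This is
only a model: it assumes CMP writes all ones for a true comparison, as the
"MOV dst, -tmp" step in the quoted sequences implies, and the names cmp_less,
sketch_uadd_carry and sketch_usub_borrow are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: models CMP.l writing all ones (~0u) when a < b. */
static uint32_t cmp_less(uint32_t a, uint32_t b)
{
   return a < b ? ~0u : 0u;
}

/* Carry out of src0 + src1: the wrapped sum is smaller than src0
 * exactly when the addition overflowed.
 */
uint32_t sketch_uadd_carry(uint32_t src0, uint32_t src1)
{
   uint32_t tmp = src0 + src1;   /* ADD   tmp, src0, src1 (wraps mod 2^32) */
   tmp = cmp_less(tmp, src0);    /* CMP.l tmp, tmp, src0                   */
   return 0u - tmp;              /* MOV   dst, -tmp  (all ones becomes 1)  */
}

/* Borrow out of src0 - src1: needed exactly when src0 < src1. */
uint32_t sketch_usub_borrow(uint32_t src0, uint32_t src1)
{
   uint32_t tmp = cmp_less(src0, src1);   /* CMP.l tmp, src0, src1 */
   return 0u - tmp;                       /* MOV   dst, -tmp       */
}
```

Neither function needs anything beyond ordinary 32-bit registers, which is
the point of the proposal.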




Re: [Mesa-dev] [PATCH] i965: allocate at least 1 BLEND_STATE element

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 04:24:12 PM Emil Velikov wrote:
> Hello gents,
> 
> On 2 July 2015 at 08:45, Kenneth Graunke  wrote:
> > On Wednesday, July 01, 2015 10:16:28 AM Mike Stroyan wrote:
> >> When there are no color buffer render targets, gen6 and gen7 still
> >> use the first BLEND_STATE element to determine alpha test.
> >> gen6_upload_blend_state was allocating zero elements when
> >> ctx->Color.AlphaEnabled was false.
> >> That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS
> >> pointing to random data from some previous brw_state_batch().
> >> That sometimes suppressed depth rendering when those bits
> >> happened to mean COMPAREFUNC_NEVER.
> >> This produced flickering shadows for dota2 reborn.
> >> ---
> >>  src/mesa/drivers/dri/i965/gen6_cc.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c
> >> index 2bfa271..2b76e24 100644
> >> --- a/src/mesa/drivers/dri/i965/gen6_cc.c
> >> +++ b/src/mesa/drivers/dri/i965/gen6_cc.c
> >> @@ -51,7 +51,7 @@ gen6_upload_blend_state(struct brw_context *brw)
> >>  * with render target 0, which will reference BLEND_STATE[0] for
> >>  * alpha test enable.
> >>  */
> >> -   if (nr_draw_buffers == 0 && ctx->Color.AlphaEnabled)
> >> +   if (nr_draw_buffers == 0)
> >>nr_draw_buffers = 1;
> >>
> >> size = sizeof(*blend) * nr_draw_buffers;
> >>
> >
> > Great catch!
> >
> > Reviewed-by: Kenneth Graunke 
> >
> > And pushed:
> >9d408a4..fe2b748  master -> master
> >
> > I think we ought to change gen8_blend_state.c as well, but I'm not quite
> > sure what change to make.  Either we should make the same change you did
> > here, or delete the whole "We need at least 1 BLEND_STATE written"
> > block.
> >
> > On Gen8+, it looks like the alpha test and other functions that might
> > discard pixels are all in the shared/common DWord, and the per-color
> > target DWord pairs look relatively harmless.  I suppose the null RT
> > would still refer to BLEND_STATE[0]...so it might still be worth
> > emitting one.  Any thoughts?
> Should we get this into 10.6, or are there some not so obvious
> dependencies that we're missing in the 10.6 branch ? Asking every user
> to rebuild mesa just to play Dota2 seems unreasonable imho.
> 
> Cheers,
> Emil

Yeah, it should definitely land in 10.6.  I think I just forgot to add
the Cc stable tag (sorry!).  It shouldn't have any dependencies.

--Ken




Re: [Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:07 AM Chris Wilson wrote:
> With the exception of gen8, the sole use of the workaround bo is for
> emitting pipe controls. Move it out of the purview of the batchbuffer
> and into the pipecontrol.
> 
> Signed-off-by: Chris Wilson 

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:08 AM Chris Wilson wrote:
> Since the workaround bo is used strictly as a write-only buffer, we need
> only allocate one per screen and use the same one from all contexts.
> 
> (The caveat here is during extension initialisation, where we write into
> and read back register values from the buffer, but that is performed only
> once for the first context - and barring synchronisation issues should not
> be a problem. Safer would be to move that also to the screen.)
> 
> Signed-off-by: Chris Wilson 
> ---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++---
>  src/mesa/drivers/dri/i965/intel_screen.c | 4 
>  src/mesa/drivers/dri/i965/intel_screen.h | 1 +
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 7ee3cb6..05e14cd 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw,
>  * the gen6 workaround because it involves actually writing to
>  * the buffer, and the kernel doesn't let us write to the batch.
>  */
> -   brw->workaround_bo = drm_intel_bo_alloc(brw->bufmgr,
> -   "pipe_control workaround",
> -   4096, 4096);
> +   brw->workaround_bo = brw->intelScreen->workaround_bo;
> if (brw->workaround_bo == NULL)
>return -ENOMEM;

Checking for out-of-memory conditions in code that doesn't actually
allocate anything looks funky now.  I'd be inclined just to drop the
-ENOMEM path and make this a void function.

Alternatively, you could just fold this into the brw_context setup and
ditch the functions altogether.  Up to you.

>  
> +   drm_intel_bo_reference(brw->workaround_bo);
> +
> brw->pipe_controls_since_last_cs_stall = 0;
>  
> return 0;
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
> index 839a984..cd8e6eb 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv)
>  {
> struct intel_screen *intelScreen = sPriv->driverPrivate;
>  
> +   drm_intel_bo_unreference(intelScreen->workaround_bo);
> dri_bufmgr_destroy(intelScreen->bufmgr);
> driDestroyOptionInfo(&intelScreen->optionCache);
>  
> @@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
>return false;
> }
>  
> +   intelScreen->workaround_bo =
> +  drm_intel_bo_alloc(intelScreen->bufmgr, "pipe_control w/a", 4096, 4096);
> +

Seems a little funny to put this in intel_init_bufmgr, since it's not
setting up libdrm - why not put it in the caller?

> return true;
>  }
>  
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h
> index 941e0fc..e55fddb 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.h
> +++ b/src/mesa/drivers/dri/i965/intel_screen.h
> @@ -60,6 +60,7 @@ struct intel_screen
> bool has_context_reset_notification;
>  
> dri_bufmgr *bufmgr;
> +   drm_intel_bo *workaround_bo;
>  
> /**
>  * A unique ID for shader programs.
> 




Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote:
> Rather than allocating a fresh page every time we clear a buffer, keep
> that page around between invocations by tracking the last used offset
> and only allocating a fresh page when we wrap.
> 
> Signed-off-by: Chris Wilson 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)

This looks okay to me.  Do you have any performance data to justify the
extra complexity?




Re: [Mesa-dev] [PATCH 06/18] i965: Pass the map-mode along to intel_mipmap_tree_map_raw()

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:11 AM Chris Wilson wrote:
> Since we can distinguish when mapping between READ and WRITE, we can
> pass along the map mode to avoid stalls and flushes where possible.
> 
> Signed-off-by: Chris Wilson 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 28 ++-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  3 ++-
>  2 files changed, 17 insertions(+), 14 deletions(-)

Huh, I thought I fixed this ages ago.  Guess not.  Thanks!

Reviewed-by: Kenneth Graunke 





Re: [Mesa-dev] [PATCH 07/18] i965: Make intel_mipmap_tree_map_raw() static

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:12 AM Chris Wilson wrote:
> No external users, so no need to export the symbol outside of our
> compilation unit.
> 
> Signed-off-by: Chris Wilson 

Good call.

Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.

2015-07-06 Thread Kenneth Graunke
It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a
(i965: Make a brw_stage_prog_data for storing the SURF_INDEX
information.)   In other words, it's been missing since we moved to
dynamic binding table slot assignments.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 72aad96..b67d9ca 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -770,7 +770,7 @@ update_renderbuffer_surfaces(struct brw_context *brw)
 {
const struct gl_context *ctx = &brw->ctx;
 
-   /* _NEW_BUFFERS | _NEW_COLOR */
+   /* _NEW_BUFFERS | _NEW_COLOR | BRW_NEW_FS_PROG_DATA */
const struct gl_framebuffer *fb = ctx->DrawBuffer;
brw_update_renderbuffer_surfaces(
   brw, fb,
@@ -792,7 +792,8 @@ const struct brw_tracked_state brw_renderbuffer_surfaces = {
 const struct brw_tracked_state gen6_renderbuffer_surfaces = {
.dirty = {
   .mesa = _NEW_BUFFERS,
-  .brw = BRW_NEW_BATCH,
+  .brw = BRW_NEW_BATCH |
+ BRW_NEW_FS_PROG_DATA,
},
.emit = update_renderbuffer_surfaces,
 };
-- 
2.4.4
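
Why a missing dirty bit matters can be shown with a small model.  It assumes
the usual scheme in which a state atom is re-emitted only when one of the
flags it lists is currently dirty; the SKETCH_* values and the sketch_atom
type are hypothetical and do not match the real BRW_NEW_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real BRW_NEW_* bits are defined elsewhere. */
#define SKETCH_NEW_BATCH        (1u << 0)
#define SKETCH_NEW_FS_PROG_DATA (1u << 1)

struct sketch_atom {
   uint32_t dirty;   /* flags this atom listens for */
};

/* An atom is re-emitted only if one of its listed flags is dirty now. */
bool sketch_atom_needs_emit(const struct sketch_atom *atom, uint32_t dirty_now)
{
   return (atom->dirty & dirty_now) != 0;
}
```

An atom listing only SKETCH_NEW_BATCH is skipped when just the program data
changes, so anything it derives from that data would go stale until the next
batch.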



Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 05:12:10 PM Chris Wilson wrote:
> On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote:
> > 
> > 
> > On 06/07/15 16:15, Martin Peres wrote:
> > >On 06/07/15 16:13, Chris Wilson wrote:
> > >>On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote:
> > >>>On 06/07/15 13:33, Chris Wilson wrote:
> > >>>>Move the query for the TIMESTAMP register from context init to the
> > >>>>screen, so that it is only queried once for all contexts.
> > >>>>
> > >>>>On 32bit systems, some old kernels trigger a hw bug resulting in the
> > >>>>TIMESTAMP register being shifted and the low bits always zero. Detect
> > >>>>this by repeating the read a few times and check the register is
> > >>>>incrementing.
> > >>>You do not do the latter. You only check for the low bits.
> > >>>
> > >>>I guess the counter is supposed to be monotonically increasing and
> > >>>with a resolution of a few microseconds which would make this
> > >>>perfectly valid. Could you confirm and make sure to add this
> > >>>information in the commit message please?
> > >>The counter should increment every 80ns. What's misleading in what I
> > >>wrote? It describes the hw bug and how to detect it.
> > >
> > >Well, it is not misleading, it just lacks this information.
> > >
> > >If it incremented every second, the patch would be stupid because
> > >the timestamp could be at 0 and polling 10 times at a few us of
> > >interval would always yield the same result. That's all :)
> > 
> > Oh, forgot to say: With this information added in the commit message
> > and the commit message duplicated as a comment in
> > intel_detect_timestamp(), the patch is:
> 
> How about:
> 
> On 32bit systems, some old kernels trigger a hw bug resulting in the
> TIMESTAMP register being shifted and the low 32bits always zero. Detect
> this by repeating the read a few times and check the register is
> incrementing every 80ns as expected and not stuck on zero (as would be 
> the case with the buggy kernel/hw.).
> -Chris
> 
> 

It would be great to put this in a comment above the loop rather than
only in the commit message.

Also, moving the check to screen-time and fixing the check to use a loop
are really two separate things - but they're so trivial I'm inclined to
not be picky. :)
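
For reference, the kind of check being discussed could look roughly like the
sketch below.  It is not the code from the patch: the reader callback type and
all sketch_* names are hypothetical, and the only facts taken from the thread
are that the register is read several times (ten here, as mentioned above) and
that the buggy case leaves the low 32 bits at zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reader callback standing in for the kernel register read.
 * Returns 0 on success and stores the raw 64-bit value in *value.
 */
typedef int (*read_timestamp_fn)(void *ctx, uint64_t *value);

/* The TIMESTAMP should advance every 80ns, so a few reads are enough to
 * see the low 32 bits change unless they are stuck at zero.
 */
bool sketch_timestamp_usable(read_timestamp_fn read_reg, void *ctx)
{
   for (int i = 0; i < 10; i++) {
      uint64_t value = 0;

      if (read_reg(ctx, &value) != 0)
         return false;   /* the read itself is not supported */

      if ((value & 0xffffffffu) != 0)
         return true;    /* low dword is live */
   }

   return false;         /* low dword stuck at zero: shifted register */
}

/* Fake readers for trying the check out; ctx points at a tick counter. */
int sketch_good_reader(void *ctx, uint64_t *value)
{
   uint64_t *ticks = ctx;
   *value = ++(*ticks);
   return 0;
}

int sketch_shifted_reader(void *ctx, uint64_t *value)
{
   uint64_t *ticks = ctx;
   *value = (++(*ticks)) << 32;
   return 0;
}
```

The two fake readers model a healthy register and the shifted one; only the
first passes the check.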

Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for __attribute__((warn_unused_result)).

2015-07-06 Thread Kenneth Graunke
In the kernel, this is called __must_check; all our attribute macros in
Mesa appear to be uppercase, so I went with that.

Signed-off-by: Kenneth Graunke 
Cc: ch...@chris-wilson.co.uk
Cc: matts...@gmail.com
---
 configure.ac  | 1 +
 src/util/macros.h | 6 ++
 2 files changed, 7 insertions(+)

I noticed Chris wants to use this in one of his patches, so I figured I'd
throw together a patch to do this a bit more cleanly.

diff --git a/configure.ac b/configure.ac
index ea0f069..d240c06 100644
--- a/configure.ac
+++ b/configure.ac
@@ -210,6 +210,7 @@ AX_GCC_FUNC_ATTRIBUTE([format])
 AX_GCC_FUNC_ATTRIBUTE([malloc])
 AX_GCC_FUNC_ATTRIBUTE([packed])
 AX_GCC_FUNC_ATTRIBUTE([unused])
+AX_GCC_FUNC_ATTRIBUTE([warn_unused_result])
 
 AM_CONDITIONAL([GEN_ASM_OFFSETS], test "x$GEN_ASM_OFFSETS" = xyes)
 
diff --git a/src/util/macros.h b/src/util/macros.h
index 3b708ed..66698e7 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -182,6 +182,12 @@ do {   \
 #define UNUSED
 #endif
 
+#ifdef HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT
+#define MUST_CHECK __attribute__((warn_unused_result))
+#else
+#define MUST_CHECK
+#endif
+
 /** Compute ceiling of integer quotient of A divided by B. */
 #define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
 
-- 
2.4.4
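
A short usage sketch for the macro.  Two assumptions are labelled here and in
the code: the configure-time HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT check is
replaced by a plain compiler test so the snippet builds on its own, and
sketch_alloc_buffer() is a hypothetical function, not anything in Mesa.

```c
#include <stdlib.h>

/* Assumption: a compiler test stands in for the configure check
 * (HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT) used by the actual patch.
 */
#if defined(__GNUC__) || defined(__clang__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Hypothetical function: returns 0 on success, -1 on allocation failure. */
MUST_CHECK int sketch_alloc_buffer(void **out, size_t size)
{
   *out = malloc(size);
   return *out ? 0 : -1;
}
```

Calling sketch_alloc_buffer(&p, 16); as a bare statement and dropping the
result should then produce an unused-result warning with GCC or Clang, while
compilers without the attribute simply see an empty macro.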



Re: [Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 09:03:35 PM Francisco Jerez wrote:
> The hardware docs don't mention explicitly what these fields should
> be, but I've verified experimentally on ILK that using a GRF as
> destination causes the register to be corrupted when the execution
> size of an ENDIF instruction is higher than 8 -- and because the
> destination we were using was g0, eventually a hang.
> 
> Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
> if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.
> ---
>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 0f53604..4d39762 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p)
> }
>  
> if (devinfo->gen < 6) {
> -  brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
> -  brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
> +  brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
> +  brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
>brw_set_src1(p, insn, brw_imm_d(0x0));
> } else if (devinfo->gen == 6) {
>brw_set_dest(p, insn, brw_imm_w(0));
> 

Hah!  I always thought there were just some bugs in this area, but I'm
surprised to see that they were so small.  Great work tracking them
down!

We were doing a bunch of other bogus stuff back in the day, but some of
that got fixed during the Gen4-7 and Gen8+ code generator merging.

Nice work :) Series is:
Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:09 AM Chris Wilson wrote:
> When submitting commands to the GPU every cycle of latency counts;
> mutexes, spinlocks, even atomics quickly add to substantial overhead.
> 
> This "batch manager" acts as a thread-local shim over the buffer manager
> (drm_intel_bufmgr_gem). As we are only ever used from within a single
> context, we can rely on the upper layers providing thread safety.
> This allows us to import buffers from the shared screen (sharing buffers
> between multiple contexts, threads and users) and wrap that handle in
> our own. Similarly, we want to share the buffer cache between all
> users on the file and so allocate from the global threadsafe buffer
> manager, with a very small and transient local cache of active buffers.
> 
> The batch manager provides a cheap way of busyness tracking and very
> efficient batch construction and kernel submission.
> 
> The restrictions over and above the generic submission engine in
> intel_bufmgr_gem are:
>  - not thread-safe
>  - flat relocations, only the batch buffer itself carries
>relocations. Relocations relative to auxiliary buffers
>must be performed via STATE_BASE
>  - direct mapping of the batch for writes, expect reads
>from the batch to be slow
>  - the batch is a fixed 64k in size
>  - access to the batch must be wrapped by brw_batch_begin/_end
>  - all relocations must be immediately written into the batch
> 
> The importance of the flat relocation tree with local offset handling is
> that it allows us to use the "relocation-less" execbuffer interfaces,
> dramatically reducing the overhead of batch submission. However, that
> can be relaxed to allow other buffers than the batch buffer to carry
> relocations, if need be.
> 
> ivb/bdw OglBatch7 improves by ~20% above and beyond my kernel relocation
> speedups.
> 
> ISSUES:
> * shared mipmap trees
>   - we instantiate a context local copy on use, but what are the semantics for
> serializing read/writes between them - do we need automagic flushing of
> execution on other contexts and common busyness tracking?
>   - we retain references to the bo past the lifetime of its parent
> batchmgr as the mipmap_tree is retained past the lifetime of its
> original context, see glx_arb_create_context/default_major_version
> * OglMultithread is nevertheless unhappy; but that looks like undefined
>   behaviour - i.e. a buggy client concurrently executing the same GL
>   context in multiple threads, unpatched is equally buggy.
> * Add full-ppgtt softpinning support (no more relocations, at least for
>   the first 256TiB), at the moment there is a limited proof-of-principle
>   demonstration
> * polish and move to libdrm; though at the cost of sealing the structs?
> 
> Signed-off-by: Chris Wilson 
> Cc: Daniel Vetter 
> Cc: Kristian Høgsberg 
> Cc: Kenneth Graunke 
> Cc: Jesse Barnes 
> Cc: Ian Romanick 
> Cc: Abdiel Janulgue 
> Cc: Eero Tamminen 
> Cc: Martin Peres 
> ---
>  src/mesa/drivers/dri/i965/Makefile.sources |4 +-
>  src/mesa/drivers/dri/i965/brw_batch.c  | 1946
>  src/mesa/drivers/dri/i965/brw_batch.h  |  377 
>  src/mesa/drivers/dri/i965/brw_binding_tables.c |1 -
>  src/mesa/drivers/dri/i965/brw_blorp.cpp|   46 +-
>  src/mesa/drivers/dri/i965/brw_cc.c |   16 +-
>  src/mesa/drivers/dri/i965/brw_clear.c  |1 -
>  src/mesa/drivers/dri/i965/brw_clip.c   |2 -
>  src/mesa/drivers/dri/i965/brw_clip_line.c  |2 -
>  src/mesa/drivers/dri/i965/brw_clip_point.c |2 -
>  src/mesa/drivers/dri/i965/brw_clip_state.c |   14 +-
>  src/mesa/drivers/dri/i965/brw_clip_tri.c   |2 -
>  src/mesa/drivers/dri/i965/brw_clip_unfilled.c  |2 -
>  src/mesa/drivers/dri/i965/brw_clip_util.c  |2 -
>  src/mesa/drivers/dri/i965/brw_compute.c|   42 +-
>  src/mesa/drivers/dri/i965/brw_conditional_render.c |2 +-
>  src/mesa/drivers/dri/i965/brw_context.c|  233 ++-
>  src/mesa/drivers/dri/i965/brw_context.h|  144 +-
>  src/mesa/drivers/dri/i965/brw_cs.cpp   |6 +-
>  src/mesa/drivers/dri/i965/brw_curbe.c  |1 -
>  src/mesa/drivers/dri/i965/brw_draw.c   |  103 +-
>  src/mesa/drivers/dri/i965/brw_draw_upload.c|   23 +-
>  src/mesa/drivers/dri/i965/brw_ff_gs.c  |2 -
>  src/mesa/drivers/dri/i965/brw_ff_gs_emit.c |1 -
>  src/mesa/drivers/dri/i965/brw_fs.cpp   |5 +-
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c|   11 +-
>  src/mesa/d

Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:
> Since the purpose of transform feedback tends to be for the client to
> act upon the results to change the geometry in the scene, it is likely
> that the client will soon be waiting upon the results. Flush the batch
> early so that we don't build up a long queue of commands afterwards that
> could delay the readback.
> ---
>  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> index 857ebe5..13dbe5b 100644
> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
>  
> brw_batch_end(&brw->batch);
>  
> +   /* We will likely want to read the results in the very near future, so
> +* push this primitive to hardware if it is currently idle.
> +*/
> +   if (!brw_batch_busy(&brw->batch))
> +  brw_batch_flush(&brw->batch);
> +
> /* EndTransformFeedback() means that we need to update the number of
>  * vertices written.  Since it's only necessary if DrawTransformFeedback()
>  * is called and it means mapping a buffer object, we delay computing it
> 

We need some data to justify this change.

--Ken




Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 11:33:20 AM Chris Wilson wrote:
> The vertices require a large chunk of memory, currently allocated during
> context creation. However, this memory is not required until use so we
> can defer the allocation until the first swrast_Wakeup().
> 
> Signed-off-by: Chris Wilson 
> Cc: Kenneth Graunke 
> ---
>  src/mesa/swrast_setup/ss_context.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/swrast_setup/ss_context.c b/src/mesa/swrast_setup/ss_context.c
> index 74b1da3..028eccb 100644
> --- a/src/mesa/swrast_setup/ss_context.c
> +++ b/src/mesa/swrast_setup/ss_context.c
> @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx )
> swsetup->NewState = ~0;
> _swsetup_trifuncs_init( ctx );
>  
> -   _tnl_init_vertices( ctx, ctx->Const.MaxArrayLockSize + 12, 
> -sizeof(SWvertex) );
> -
> -
> return GL_TRUE;
>  }
>  
> @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx )
> TNLcontext *tnl = TNL_CONTEXT(ctx);
> SScontext *swsetup = SWSETUP_CONTEXT(ctx);
>  
> +   if (!(GET_VERTEX_STATE(ctx))->max_vertex_size)
> +  _tnl_init_vertices(ctx,
> +  ctx->Const.MaxArrayLockSize + 12,
> +  sizeof(SWvertex));
> +
> tnl->Driver.Render.Start = _swsetup_RenderStart;
> tnl->Driver.Render.Finish = _swsetup_RenderFinish;
> tnl->Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive;
> 

Looks reasonable - this saves 2.59MB of memory in Piglit's glsl-cos
test.  (It's worth noting that in the commit message.)
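
The deferral pattern in the patch can be sketched roughly as below. This
is only an illustration under assumed names: struct example_ctx,
example_wakeup() and example_destroy() are hypothetical, and the real
code checks max_vertex_size rather than a pointer.

```c
#include <stdlib.h>

/* Hypothetical context; the field names are illustrative, not swrast's. */
struct example_ctx {
   void  *vertices;       /* NULL until the first wakeup */
   size_t max_vertices;
   size_t vertex_size;
};

/* Allocate on the first call and reuse the buffer on later calls.
 * Returns 0 on success and -1 if the allocation fails. */
static int
example_wakeup(struct example_ctx *ctx)
{
   if (!ctx->vertices) {
      ctx->vertices = calloc(ctx->max_vertices, ctx->vertex_size);
      if (!ctx->vertices)
         return -1;
   }
   return 0;
}

/* Safe even if the context was never woken up: free(NULL) is a no-op. */
static void
example_destroy(struct example_ctx *ctx)
{
   free(ctx->vertices);
   ctx->vertices = NULL;
}
```

A context that never reaches the software path never pays for the
buffer, which is where the memory saving comes from; the teardown path
has to tolerate the never-allocated case.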

I wonder if we can do better though by avoiding most of the
swrast context entirely...

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 12:18:18 PM Matt Turner wrote:
> On Mon, Jul 6, 2015 at 9:55 AM, Kenneth Graunke  wrote:
> > It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a
> > (i965: Make a brw_stage_prog_data for storing the SURF_INDEX
> > information.)   In other words, it's been missing since we moved to
> > dynamic binding table slot assignments.
> 
> Author: Eric Anholt 
> AuthorDate: Wed Oct 2 14:07:40 2013 -0700
> CommitDate: Tue Oct 15 10:18:42 2013 -0700
> 
> Dang.
> 
> How did you find this?

After reading Eero's latest performance analysis observations, I decided
to try marking render targets uncached (MOCS = UC) when blending and
logic ops (i.e. things that cause render target reads) are disabled.

This necessitated adding _NEW_COLOR.  Then I realized
BRW_NEW_FS_PROG_DATA was just plain missing.




Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for __attribute__((warn_unused_result)).

2015-07-06 Thread Kenneth Graunke
On Monday, July 06, 2015 09:27:12 PM Chris Wilson wrote:
> On Mon, Jul 06, 2015 at 11:18:00AM -0700, Kenneth Graunke wrote:
> > In the kernel, this is called __must_check; all our attribute macros in
> > Mesa appear to be uppercase, so I went with that.
> > 
> > Signed-off-by: Kenneth Graunke 
> > Cc: ch...@chris-wilson.co.uk
> > Cc: matts...@gmail.com
> > ---
> >  configure.ac  | 1 +
> >  src/util/macros.h | 6 ++
> >  2 files changed, 7 insertions(+)
> > 
> > I noticed Chris wants to use this in one of his patches, so I figured I'd
> > throw together a patch to do this a bit more cleanly.
> 
> That would suit me very much. I guess I need to learn about
> AX_GCC_FUNC_ATTRIBUTE!
> 
> Reviewed-by: Chris Wilson 
> -Chris

It's pretty handy!  Matt found it in the autoconf archive:

http://www.gnu.org/software/autoconf-archive/ax_gcc_func_attribute.html

We just dropped it in our m4/ directory.




Re: [Mesa-dev] [PATCH v2 5/5] i965/gen9: Allocate YF/YS tiled buffer objects

2015-07-07 Thread Kenneth Graunke
On Tuesday, June 23, 2015 01:23:05 PM Anuj Phogat wrote:
> In case of I915_TILING_{X,Y} we need to pass the tiling format to libdrm
> using drm_intel_bo_alloc_tiled(). But in the case of YF/YS tiled buffers,
> libdrm need not know about the tiling format because these buffers
> don't have hardware support to be tiled or detiled through a fenced
> region. libdrm still needs to know the buffer alignment value for its use
> in the kernel when resolving relocations.
> 
> Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers
> satisfies both of the above conditions.
> 
> V2: Delete min/max buffer size restrictions not valid for i965+.
> Remove redundant align to tile size statements.
> Remove some redundant code now that there are no min/max buffer size
> restrictions.
> 
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 62 +--
>  1 file changed, 58 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 80c52f2..5bcb094 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -558,6 +558,48 @@ intel_lower_compressed_format(struct brw_context *brw, mesa_format format)
> }
>  }
>  
> +/* This function computes Yf/Ys tiled bo size, alignment and pitch. */
> +static uint64_t
> +intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment,
> +uint64_t *pitch)

Hi Anuj,

This patch has a subtle bug: you've declared the pitch parameter as a
uint64_t pointer here, but below, when you call it

[snip]
> @@ -616,11 +658,23 @@ intel_miptree_create(struct brw_context *brw,
>alloc_flags |= BO_ALLOC_FOR_RENDER;
>  
> unsigned long pitch;
> -   mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree", total_width,
> - total_height, mt->cpp, &mt->tiling,
> - &pitch, alloc_flags);
> mt->etc_format = etc_format;
> -   mt->pitch = pitch;
> +
> +   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> +  unsigned alignment = 0;
> +  unsigned long size;
> +  size = intel_get_yf_ys_bo_size(mt, &alignment, &pitch);

...you're passing a pointer to an unsigned long.  On 32-bit builds,
unsigned long is a 4 byte value, while uint64_t is 8 bytes.  This could
lead to stack corruption.  (GCC warns about this during a 32-bit build.)
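
One way to sidestep the width mismatch, sketched under assumptions:
example_get_size() is a hypothetical stand-in for a helper with a
uint64_t out-parameter, and the values it writes are made up.

```c
#include <stdint.h>

/* Hypothetical helper that reports a pitch through a 64-bit
 * out-parameter and returns a size; the numbers are placeholders. */
static uint64_t
example_get_size(uint64_t *pitch)
{
   *pitch = 4096;          /* always stores sizeof(uint64_t) == 8 bytes */
   return 4096 * 64;
}

/* Safe pattern: hand the helper a temporary of exactly the pointee type,
 * then narrow explicitly into the unsigned long the caller wants. */
static unsigned long
example_pitch_for_caller(void)
{
   uint64_t pitch64 = 0;

   (void) example_get_size(&pitch64);
   return (unsigned long) pitch64;
}
```

Passing the address of an unsigned long directly would compile with only
a warning, but where unsigned long is 4 bytes the 8-byte store would
spill into whatever sits next to it on the stack.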

I assumed the solution was to make everything uint32_t, but apparently
drm_intel_bo_alloc_tiled actually expects an unsigned long.  So we can't
change that.

Then I looked at your code, and realized that nothing even uses the
pitch value.  Is there some point to the parameter existing at all?

--Ken

> +  assert(size);
> +  mt->bo = drm_intel_bo_alloc_for_render(brw->bufmgr, "miptree",
> + size, alignment);
> +  mt->pitch = pitch;
> +   } else {
> +  mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree",
> +total_width, total_height, mt->cpp,
> +&mt->tiling, &pitch,
> +alloc_flags);
> +  mt->pitch = pitch;
> +   }
>  
> /* If the BO is too large to fit in the aperture, we need to use the
>  * BLT engine to support it.  Prior to Sandybridge, the BLT paths can't
> 




Re: [Mesa-dev] [PATCH 2/5] i965/gen9: Plugin the code for selecting YF/YS tiling on skl+

2015-07-07 Thread Kenneth Graunke
On Wednesday, June 10, 2015 03:30:47 PM Anuj Phogat wrote:
> Buffers with Yf/Ys tiling end up using the meta upload / download
> paths or the blitter in cases where Y-tiled buffers would use the
> tiled_memcpy paths. This has exposed some bugs in the meta path. To
> avoid any piglit regressions on SKL, this patch keeps Yf/Ys
> tiling disabled for the moment.
> 
> V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben)
> Few cosmetic changes.
> V4: Get rid of brw_miptree_choose_tr_mode().
> Take care of all tile resource modes {Yf, Ys, none} for all
> generations at one place.
> 
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 97 --
>  1 file changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index b9ac4cf..c0ef5cc 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -807,27 +807,88 @@ brw_miptree_layout(struct brw_context *brw,
> enum intel_miptree_tiling_mode requested,
> struct intel_mipmap_tree *mt)
>  {
> -   mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE;
> +   const unsigned bpp = mt->cpp * 8;
> +   const bool is_tr_mode_yf_ys_allowed =
> +  brw->gen >= 9 &&
> +  !for_bo &&
> +  !mt->compressed &&
> +  /* Enable YF/YS tiling only for color surfaces because depth and
> +   * stencil surfaces are not supported in blitter using fast copy
> +   * blit and meta PBO upload, download paths. No other paths
> +   * currently support Yf/Ys tiled surfaces.
> +   * FIXME:  Remove this restriction once we have a tiled_memcpy()
> +   * path to do depth/stencil data upload/download to Yf/Ys tiled
> +   * surfaces.
> +   */
> +  _mesa_is_format_color_format(mt->format) &&
> +  (requested == INTEL_MIPTREE_TILING_Y ||
> +   requested == INTEL_MIPTREE_TILING_ANY) &&
> +  (bpp && is_power_of_two(bpp)) &&
> +  /* FIXME: To avoid piglit regressions keep the Yf/Ys tiling
> +   * disabled at the moment.
> +   */
> +  false;

I must say, I was a bit surprised to see this land as is.  You've got a
lot of conditions there, only to finish them up with && false - with a
comment saying that your code isn't passing Piglit yet.  That doesn't
really meet our usual qualifications for merging.

Coverity also pointed out that your if (is_tr_mode_yf_ys_allowed) block
below is dead code, issuing new warnings.
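
A reduced sketch of why a static analyzer would flag this. The two
functions are hypothetical and keep only two of the real conditions;
they are not the Mesa code.

```c
#include <stdbool.h>

/* Hypothetical reduction of the condition chain. */
static bool
example_yf_ys_allowed(int gen, bool is_color)
{
   /* The trailing "&& false" forces the result to false for every input. */
   return gen >= 9 && is_color && false;
}

/* Returns 1 for the Yf/Ys path and 0 for the fallback path. */
static int
example_pick_path(int gen, bool is_color)
{
   if (example_yf_ys_allowed(gen, is_color))
      return 1;   /* never reached while the "&& false" is in place */
   return 0;
}
```

Since the guarded branch can never run, nothing exercises it, which is
the concern about merging it in this state.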

Forgive my ignorance, but what's the purpose of Yf/Ys tiling?  My
understanding was that Ys is primarily in support of a new OpenGL
feature - GL_ARB_sparse_texture(*) - which isn't yet enabled:

https://www.opengl.org/registry/specs/ARB/sparse_texture.txt

Is Yf tiling supposed to be more efficient than legacy Y-tiling?  If so,
then switching to it is an optimization, isn't it?  We usually require
data indicating some kind of performance improvement (any kind!) before
landing a bunch of code for optimizations.  Obviously that's pretty
tricky with pre-release hardware, so I'd settle for "it's complete
and functions correctly."

At any rate, it's merged, and hopefully you're able to get it working...

>  
> -   intel_miptree_set_alignment(brw, mt);
> -   intel_miptree_set_total_width_height(brw, mt);
> +   /* Lower index (Yf) is the higher priority mode */
> +   const uint32_t tr_mode[3] = {INTEL_MIPTREE_TRMODE_YF,
> +INTEL_MIPTREE_TRMODE_YS,
> +INTEL_MIPTREE_TRMODE_NONE};
> +   int i = is_tr_mode_yf_ys_allowed ? 0 : ARRAY_SIZE(tr_mode) - 1;
>  
> -   if (!mt->total_width || !mt->total_height) {
> -  intel_miptree_release(&mt);
> -  return;
> -   }
> +   while (i < ARRAY_SIZE(tr_mode)) {
> +  if (brw->gen < 9)
> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE);
> +  else
> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_YF ||
> +tr_mode[i] == INTEL_MIPTREE_TRMODE_YS ||
> +tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE);
>  
> -   /* On Gen9+ the alignment values are expressed in multiples of the block
> -* size
> -*/
> -   if (brw->gen >= 9) {
> -  unsigned int i, j;
> -  _mesa_get_format_block_size(mt->format, &i, &j);
> -  mt->align_w /= i;
> -  mt->align_h /= j;
> -   }
> +  mt->tr_mode = tr_mode[i];
> +  intel_miptree_set_alignment(brw, mt);
> +  intel_miptree_set_total_width_height(brw, mt);
>  
> -   if (!for_bo)
> -  mt->tiling = brw_miptree_choose_tiling(brw, requested, mt);
> +  if (!mt->total_width || !mt->total_height) {
> + intel_miptree_release(&mt);
> + return;
> +  }
> +
> +  /* On Gen9+ the alignment values are expressed in multiples of the
> +   * block size.
> +   */
> +  if (brw->gen >= 9) {
> + unsigned int i, j;
> + _mesa_get_format

Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

2015-07-07 Thread Kenneth Graunke
On Tuesday, July 07, 2015 04:46:22 PM Chris Wilson wrote:
> On Tue, Jul 07, 2015 at 10:12:20AM +0100, Chris Wilson wrote:
> > On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote:
> > > > On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke wrote:
> > > > > On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:
> > > > >> Since the purpose of transform feedback tends to be for the client to
> > > > >> act upon the results to change the geometry in the scene, it is likely
> > > > >> that the client will soon be waiting upon the results. Flush the batch
> > > > >> early so that we don't build up a long queue of commands afterwards that
> > > > >> could delay the readback.
> > > > >> ---
> > > > >>  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
> > > > >>  1 file changed, 6 insertions(+)
> > > > >>
> > > > >> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > >> index 857ebe5..13dbe5b 100644
> > > > >> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > >> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > >> @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
> > > > >>
> > > > >> brw_batch_end(&brw->batch);
> > > > >>
> > > > >> +   /* We will likely want to read the results in the very near future, so
> > > > >> +* push this primitive to hardware if it is currently idle.
> > > > >> +*/
> > > > >> +   if (!brw_batch_busy(&brw->batch))
> > > > >> +  brw_batch_flush(&brw->batch);
> > > > >> +
> > > > >> /* EndTransformFeedback() means that we need to update the number of
> > > > >>  * vertices written.  Since it's only necessary if DrawTransformFeedback()
> > > > >>  * is called and it means mapping a buffer object, we delay computing it
> > > >>
> > > >
> > > > We need some data to justify this change.
> > > 
> > > I think even the theory is not correct - transform feedback is
> > > typically fed back into the GPU (as new geometry, eg) rather than
> > > consumed by the CPU, and in that case the flush is not helpful. But at
> > > the end of the day, data will tell.
> > 
> > How are they fed back? Can the xfb buffer be bound to the vertex buffer?
> > (Genuine question! The only examples I've seen were for testing by the
> > CPU.)

Yes, it can.  Just glBindBuffer() some buffers around.  Or, I suspect
one could bind it as a texture buffer object or SSBO and then use a
compute shader on the results.

With GL 4.x, the "avoid synchronizing with the CPU" mentality is a lot
more prevalent, due to the advent of compute shaders.

> 
> I've reviewed the code again, and gen7_end_transform_feedback() is always
> followed by brw_compute_xfb_vertices_written (and a read of the sol
> buffer) afaict, maybe not immediately but always before the next
> transform feedback.

Sadly, yes.  We have a primitive count and we need a vertex count - so,
a tiny bit of math.  Ideally, we would use the Gen7.5 MI_MATH+ feature
to do this, eliminating the CPU-GPU synchronization point.
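
The "tiny bit of math" is roughly the following. This is a sketch with
hypothetical names; the enum stands in for the three primitive modes
transform feedback can record (GL_POINTS, GL_LINES, GL_TRIANGLES), and
it is not the driver's actual function.

```c
#include <stdint.h>

/* Hypothetical enum standing in for GL_POINTS / GL_LINES / GL_TRIANGLES. */
enum example_xfb_mode {
   EXAMPLE_POINTS,
   EXAMPLE_LINES,
   EXAMPLE_TRIANGLES
};

/* Number of vertices each recorded primitive contributes. */
static unsigned
example_vertices_per_prim(enum example_xfb_mode mode)
{
   switch (mode) {
   case EXAMPLE_POINTS:    return 1;
   case EXAMPLE_LINES:     return 2;
   case EXAMPLE_TRIANGLES: return 3;
   }
   return 0;
}

/* Vertex count = primitives written * vertices per primitive. */
static uint64_t
example_vertices_written(enum example_xfb_mode mode, uint64_t prims_written)
{
   return (uint64_t) example_vertices_per_prim(mode) * prims_written;
}
```

The multiplication is trivial; the cost is that the primitive count
lives in a buffer the GPU writes, so doing it on the CPU means waiting
for the GPU to finish first.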

> Also afaict it is not possible to map the sol buffer directly into the
> application.
> -Chris

It definitely is - the application creates GL buffer objects and binds
them for use with transform feedback.  They can certainly
glMapBufferRange() those buffers.

--Ken




Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

2015-07-07 Thread Kenneth Graunke
On Tuesday, July 07, 2015 09:02:16 PM Chris Wilson wrote:
> On Tue, Jul 07, 2015 at 10:31:07AM -0700, Kenneth Graunke wrote:
> > On Tuesday, July 07, 2015 04:46:22 PM Chris Wilson wrote:
> > > On Tue, Jul 07, 2015 at 10:12:20AM +0100, Chris Wilson wrote:
> > > > On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote:
> > > > > > On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke wrote:
> > > > > > > On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:
> > > > > > >> Since the purpose of transform feedback tends to be for the client to
> > > > > > >> act upon the results to change the geometry in the scene, it is likely
> > > > > > >> that the client will soon be waiting upon the results. Flush the batch
> > > > > > >> early so that we don't build up a long queue of commands afterwards that
> > > > > > >> could delay the readback.
> > > > > > >> ---
> > > > > > >>  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
> > > > > > >>  1 file changed, 6 insertions(+)
> > > > > > >>
> > > > > > >> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > > > >> index 857ebe5..13dbe5b 100644
> > > > > > >> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > > > >> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > > > >> @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
> > > > > > >>
> > > > > > >> brw_batch_end(&brw->batch);
> > > > > > >>
> > > > > > >> +   /* We will likely want to read the results in the very near future, so
> > > > > > >> +* push this primitive to hardware if it is currently idle.
> > > > > > >> +*/
> > > > > > >> +   if (!brw_batch_busy(&brw->batch))
> > > > > > >> +  brw_batch_flush(&brw->batch);
> > > > > > >> +
> > > > > > >> /* EndTransformFeedback() means that we need to update the number of
> > > > > > >>  * vertices written.  Since it's only necessary if DrawTransformFeedback()
> > > > > > >>  * is called and it means mapping a buffer object, we delay computing it
> > > > > >>
> > > > > >
> > > > > > We need some data to justify this change.
> > > > > 
> > > > > I think even the theory is not correct - transform feedback is
> > > > > typically fed back into the GPU (as new geometry, eg) rather than
> > > > > consumed by the CPU, and in that case the flush is not helpful. But at
> > > > > the end of the day, data will tell.
> > > > 
> > > > How are they fed back? Can the xfb buffer be bound to the vertex buffer?
> > > > (Genuine question! The only examples I've seen were for testing by the
> > > > CPU.)
> > 
> > Yes, it can.  Just glBindBuffer() some buffers around.  Or, I suspect
> > one could bind it as a texture buffer object or SSBO and then use a
> > compute shader on the results.
> > 
> > With GL 4.x, the "avoid synchronizing with the CPU" mentality is a lot
> > more prevalent, due to the advent of compute shaders.
> > 
> > > 
> > > I've reviewed the code again, and gen7_end_transform_feedback() is always
> > > followed by brw_compute_xfb_vertices_written (and a read of the sol
> > > buffer) afaict, maybe not immediately but always before the next
> > > transform feedback.
> > 
> > Sadly, yes.  We have a primitive count and we need a vertex count - so,
> > a tiny bit of math.  Ideally, we would use the Gen7.5 MI_MATH+ feature
> > to do this, eliminating the CPU-GPU synchronization point.
> > 
> > > Also afaict it is not possible to map the sol buffer directly into the
> > > application.
> > > -Chris
> > 
> > It definitely is - the application creates GL buffer objects and binds
> > them for use with transform feedback.  They can certainly
> >

Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-07 Thread Kenneth Graunke
Hi Chris,

I made a genuine effort to review this patch, hoping to better understand
the various changes and what you were trying to accomplish.  I spent many
hours reading and trying to enumerate changes - or potential changes I
needed to look hard at to convince myself whether they were correct.

I came up with a frighteningly long list of changes:

* Relocation handling changes considerably (the original point of
  Kristian's endeavour which led up to this).

* Fencing, busy tracking, and sync objects are completely reworked.

* Render-to-texture cache flushing and dirty buffer tracking is
  completely reworked.

* Gen7 SOL buffer offset resetting now uses MI_LOAD_REGISTER_IMM rather
  than the execbuf2 parameter, requiring the command validator on Haswell.
  This effectively bumps the kernel requirement from v3.6 to v4.2-rc1,
  which will simply not fly with distributions at this time.

* glBufferSubData() now uses intel_upload_data() rather than allocating
  a temporary BO.  This is the first use of the upload buffer by the
  BLT engine, and could imply that the upload buffer's lifetime now
  extends across batches - longer than before.  Separable change that
  requires separate evaluation and justification.

* Per buffer cache-coherency checking rather than brw->has_llc?

* glBufferSubData()'s prefer_stall_to_blit flag appears to depend on
  per-buffer cache-coherency rather than being set globally.  Could
  impact performance of buffer uploads.

* Potential missing flushes (which can cause hangs or misrendering):

  - It looks like calling brw_bo_busy() with BUSY_FLUSH causes a flush
when necessary.  However, some instances of the old bo_busy,
bo_references, batch_flush pattern are replaced without that flag.
One occurrence was in BufferSubData(); I did not spend time to
check every case.

  - Flushes are often done implicitly by e.g. brw_bo_read calling
brw_bo_map with the appropriate flags, and many explicit checks
and flushes are removed.  Not bad, but needs careful review.

  - Gen6+ query object code might have dropped an implicit flush
guaranteeing that when the GL application requests the result,
any pending work will be kicked off so they can poll/spin
repeatedly until the result arrives.

  - New code to avoid redundant flushes.

* perf_debug() warnings are removed all over the code for some reason:

  - Unsynchronized maps/BufferSubData not working on !LLC platforms?
If they work now, that's a huge change!  If not, why drop the warning?

  - Warnings about stalls on mapping buffers and miptrees are gone now.
These have been useful in tracking down performance problems.  They
might not always be accurate, but surely removing them should be done
separately with justification?

  - Warnings about stalls on query objects are gone.  I've used these when
analyzing application performance.  Why?

  - Warnings about implicit flushes are gone.

* BO unmap calls appear to be missing in some places.  A few map calls
  have moved around in hard-to-follow ways.  Unclear how lifetimes of
  buffers and lifetimes of maps are affected.

* Possible mmap vs. pwrite preference changes?  Hard to follow.

* Texture upload (tiled_memcpy) changes, which is notoriously fragile
  and can lose all of the performance benefit if the compiler isn't able
  to optimize it just right.  Ideally separate.

* Assertions change to GL errors in brw_get_graphics_reset_status().

* Aperture space checking significantly reworked, especially for the BLT
  paths.  Honestly, a lot nicer, but couldn't this be separated?

* The bo_reuse driconf option is removed.

* Gen4-5 structure changes.

* brw_get_timestamp() - removes initialization of result to 0.
  Probably unnecessary and OK to delete; should be separate.

* New helper functions and coding patterns.  Separable.

* Noise (renaming, moving code between files, some other trivial changes
  like removing 'brw' variables and moving code into "else" blocks).

* ...I probably missed some things.

Based upon this, I cannot in good conscience consider merging this patch.
The potential for breakage is staggering.  As a proof-of-concept, you've
done an excellent job in proving we can do much better, and introduced a
lot of good ideas.  But there's a lot of work left to be done before we
can consider applying it to our production quality driver.

Please advise whether you would like to work towards making a mergeable,
incremental patch series, or if someone else should embark on that
endeavour.

--Ken




[Mesa-dev] [PATCH] nir: Fix comment above nir_convert_from_ssa() prototype.

2015-07-08 Thread Kenneth Graunke
Presumably Connor renamed the parameter, inverting the sense.
Update the comment accordingly.

Cc: Connor Abbott 
Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 4cb7d2f..9e2a281 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1659,9 +1659,9 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def 
*b);
 void nir_convert_to_ssa_impl(nir_function_impl *impl);
 void nir_convert_to_ssa(nir_shader *shader);
 
-/* If convert_everything is true, convert all values (even those not involved
- * in a phi node) to registers. If false, only convert SSA values involved in
- * phi nodes to registers.
+/* If phi_webs_only is true, only convert SSA values involved in phi nodes to
+ * registers.  If false, convert all values (even those not involved in a phi
+ * node) to registers.
  */
 void nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only);
 
-- 
2.4.5



Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-08 Thread Kenneth Graunke
On Wednesday, July 08, 2015 03:17:35 PM Chris Wilson wrote:
> On Wed, Jul 08, 2015 at 09:51:07AM +0100, Chris Wilson wrote:
> > On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote:
> > > * Gen4-5 structure changes.
> 
> Did you mean brw_structs.h?
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_structs.h 
> b/src/mesa/drivers/dri/i965/brw_structs.h
> index 55338c0..e167254 100644
> --- a/src/mesa/drivers/dri/i965/brw_structs.h
> +++ b/src/mesa/drivers/dri/i965/brw_structs.h
> @@ -391,13 +391,16 @@ struct brw_sf_unit_state
>unsigned pad3:1;
> } thread4;
>  
> -   struct
> +   union
> {
> +  struct {
>  unsigned front_winding:1;
>  unsigned viewport_transform:1;
>  unsigned pad0:3;
>  unsigned sf_viewport_state_offset:27; /* Offset from 
> GENERAL_STATE_BASE */
>} sf5;
> +  uint32_t dw5;
> +   };
>  
> struct
> {
> @@ -525,15 +528,17 @@ struct brw_wm_unit_state
> struct thread2 thread2;
> struct thread3 thread3;
>  
> +   union {
>struct {
>  unsigned stats_enable:1;
>  unsigned depth_buffer_clear:1;
>  unsigned sampler_count:3;
>  unsigned sampler_state_pointer:27;
>} wm4;
> +  uint32_t dw4;
> +   };
>  
> -   struct
> -   {
> +   struct {
>unsigned enable_8_pix:1;
>unsigned enable_16_pix:1;
>unsigned enable_32_pix:1;
> diff --git a/src/mesa/drivers/dri/i965/brw_urb.c 
> b/src/mesa/drivers/dri/i965/brw_urb.c
> 
> Or something else?
> -Chris

Yes - and the changes are fine.  But...separable.
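
For readers unfamiliar with the pattern in the quoted hunk, here is a minimal standalone sketch of overlaying a bitfield struct with a raw dword through a union. The type name sf_dword5 and the helper sf_dword5_clear() are hypothetical and are not Mesa code; only the field names are copied from the quoted diff. Bitfield layout is implementation-defined, so the sketch relies only on both views sharing the same storage.

```c
#include <stdint.h>

/* Hypothetical stand-in for one dword of brw_sf_unit_state. */
union sf_dword5 {
   struct {
      unsigned front_winding:1;
      unsigned viewport_transform:1;
      unsigned pad0:3;
      unsigned sf_viewport_state_offset:27;
   } sf5;
   uint32_t dw5;   /* raw view of the same 32 bits */
};

/* Clear every field at once through the raw view. */
static inline void
sf_dword5_clear(union sf_dword5 *d)
{
   d->dw5 = 0;
}
```

The raw member is handy when the whole dword has to be written or relocated as a unit, while the named fields stay available for readable state setup.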




Re: [Mesa-dev] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.

2015-07-08 Thread Kenneth Graunke
On Tuesday, July 07, 2015 11:33:57 PM Matt Turner wrote:
> Cc: "10.6" 
> ---
>  src/util/Makefile.am | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/util/Makefile.am b/src/util/Makefile.am
> index 2e7542e..1e087b4 100644
> --- a/src/util/Makefile.am
> +++ b/src/util/Makefile.am
> @@ -46,9 +46,9 @@ libmesautil_la_SOURCES = \
>  
>  if ENABLE_SHADER_CACHE
>  libmesautil_la_SOURCES += $(MESA_UTIL_SHADER_CACHE_FILES)
> -endif
>  
>  libmesautil_la_LIBADD = $(SHA1_LIBS)
> +endif
>  
>  roundeven_test_LDADD = -lm
>  
> 

Yeah...haven't thought through a plan yet.  This certainly seems
reasonable.  Reverting it all and pushing it back in later isn't crazy
either, but with this patch I'm not sure it's hurting anything...

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH 06/78] i965/nir/vec4: Add setup of uniform variables

2015-07-09 Thread Kenneth Graunke
On Tuesday, June 30, 2015 10:04:47 AM Iago Toral wrote:
> Hi Jason,
> 
> On Mon, 2015-06-29 at 16:22 -0700, Jason Ekstrand wrote:
> > On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev  
> > wrote:
> > > From: Iago Toral Quiroga 
> > >
> > > This is based on similar code existing in vec4_visitor. It builds the
> > > uniform register file iterating through each uniform variable. It
> > > also stores the index of each register at the corresponding offset
> > > in a map. This map will later be used by load_uniform intrinsic
> > > instructions to build the correct UNIFORM source register.
> > >
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_vec4.h   |   2 +
> > >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 115 
> > > -
> > >  2 files changed, 114 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
> > > b/src/mesa/drivers/dri/i965/brw_vec4.h
> > > index 673df4e..6535f19 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> > > @@ -414,6 +414,8 @@ public:
> > > src_reg *nir_inputs;
> > > int *nir_outputs;
> > > brw_reg_type *nir_output_types;
> > > +   unsigned *nir_uniform_offset;
> > > +   unsigned *nir_uniform_driver_location;
> > >
> > >  protected:
> > > void emit_vertex();
> > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > index 2d457a6..40ec66f 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > @@ -106,19 +106,128 @@ vec4_visitor::nir_setup_outputs(nir_shader *shader)
> > >  void
> > >  vec4_visitor::nir_setup_uniforms(nir_shader *shader)
> > >  {
> > > -   /* @TODO: Not yet implemented */
> > > +   uniforms = 0;
> > > +
> > > +   nir_uniform_offset =
> > > +  rzalloc_array(mem_ctx, unsigned, this->uniform_array_size);
> > > +   memset(nir_uniform_offset, 0, this->uniform_array_size * 
> > > sizeof(unsigned));
> > 
> > rzalloc memsets the whole thing to 0 for you, this memset is redundant.
> > 
> > > +
> > > +   nir_uniform_driver_location =
> > > +  rzalloc_array(mem_ctx, unsigned, this->uniform_array_size);
> > > +   memset(nir_uniform_driver_location, 0,
> > > +  this->uniform_array_size * sizeof(unsigned));
> > 
> > Same here.
> 
> Oh, right.
> 
> > > +
> > > +   if (shader_prog) {
> > > +  foreach_list_typed(nir_variable, var, node, &shader->uniforms) {
> > > + /* UBO's, atomics and samplers don't take up space in the
> > > +uniform file */
> > > + if (var->interface_type != NULL || var->type->contains_atomic() 
> > > ||
> > > + type_size(var->type) == 0) {
> > 
> > I'm curious as to why you have this extra type_size() == 0 condition.
> > We don't have that in the FS NIR code.  What caused you to add it?
> 
> Take this piglit test for example:
> bin/textureSize vs isampler1D -auto -fbo
> 
> here, 'tex' is a uniform of size 0 since type_size() returns 0 for all
> sampler types. If we do not ignore these, we will try to store uniform
> information for them in the various structures we have to track uniform
> data, like uniform_size[] and others. The size allocated for these
> arrays is computed in the vec4_visitor constructor based on
> stage_prog_data->nr_params (uniform_array_size) and that does not seem
> to make room for zero-sized uniforms. Without that check we would
> process more uniforms than uniform_array_size and overflow the arrays we
> allocate to track uniform information. I understand that
> stage_prog_data->nr_params does not track uniforms that don't use
> register space, so skipping uniforms with no size seems to make sense
> here.
> 
> Notice that this is done in the current vec4_visitor too, when we visit
> the variable in vec4_visitor::visit(ir_variable *ir), for ir_var_uniform
> there is this code:
> 
> if (ir->is_in_uniform_block() || type_size(ir->type) == 0)
> return; 

Oh, right.  I think we handle sampler uniforms a bit differently in the
vec4 and FS backends, and I was never quite sure why...I know we've had
bugs relating to zero-sized uniforms, but I was never quite able to sort
them out.

This seems fine for now - it keeps the vec4 world doing what it's always
done.
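
The overflow concern discussed above can be illustrated with a much-simplified sketch. Everything here (struct uniform_var, assign_uniform_offsets(), the capacity parameter) is hypothetical and only mimics the idea that the tracking arrays are sized for uniforms that occupy register space, so zero-sized entries such as samplers have to be skipped.

```c
#include <stddef.h>

/* Hypothetical stand-in for a uniform variable; size is in vec4 slots. */
struct uniform_var {
   const char *name;
   unsigned size;
};

/* Record the starting slot of every uniform that occupies register space.
 * Zero-sized entries (e.g. samplers) are skipped so that at most
 * `capacity` offsets are written.  Returns the number of slots used, or
 * (unsigned)-1 if the tracking array would overflow.
 */
static unsigned
assign_uniform_offsets(const struct uniform_var *vars, size_t count,
                       unsigned *offsets, unsigned capacity)
{
   unsigned next = 0;
   unsigned written = 0;

   for (size_t i = 0; i < count; i++) {
      if (vars[i].size == 0)
         continue;
      if (written == capacity)
         return (unsigned)-1;
      offsets[written++] = next;
      next += vars[i].size;
   }
   return next;
}
```

Without the size check at the top of the loop, a sampler would consume an entry in an array that was never sized to hold it.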






[Mesa-dev] [PATCH] i965/hsw: Implement end of batch workaround

2015-07-09 Thread Kenneth Graunke
From: Ben Widawsky 

This patch can cause an infinite recursion if the previous patch titled, "i965:
Track finished batch state" isn't present (backporters take notice).

v2: Sent out the wrong patch originally. This patch switches the order of
flushes, doing the generic flush before the CC_STATE, and the required
workaround flush afterwards

v3: Only perform workaround for render ring
Add text to the BATCH_RESERVE comments

v4: Rebase; update citation to mention PRM and Wa name; combine two blocks.

Signed-off-by: Ben Widawsky 
Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 27 +--
 src/mesa/drivers/dri/i965/intel_batchbuffer.h |  4 
 2 files changed, 29 insertions(+), 2 deletions(-)

Hey Ben,

I was going to suggest a few minor changes, and then realized it'd save us both
time if I just typed them up.  Here's a v4 of your patch.  If it passes Jenkins,
I'd say let's push it.  Thanks for remembering this!

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 969d92c..d93ee6e 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -32,6 +32,7 @@
 #include "intel_buffers.h"
 #include "intel_fbo.h"
 #include "brw_context.h"
+#include "brw_defines.h"
 
 #include 
 #include 
@@ -206,10 +207,32 @@ brw_finish_batch(struct brw_context *brw)
 */
brw_emit_query_end(brw);
 
-   /* We may also need to snapshot and disable OA counters. */
-   if (brw->batch.ring == RENDER_RING)
+   if (brw->batch.ring == RENDER_RING) {
+  /* We may also need to snapshot and disable OA counters. */
   brw_perf_monitor_finish_batch(brw);
 
+  if (brw->is_haswell) {
+ /* From the Haswell PRM, Volume 2b, Command Reference: Instructions,
+  * 3DSTATE_CC_STATE_POINTERS > "Note":
+  *
+  * "SW must program 3DSTATE_CC_STATE_POINTERS command at the end of every
+  *  3D batch buffer followed by a PIPE_CONTROL with RC flush and CS stall."
+  *
+  * From the example in the docs, it seems to expect a regular pipe control
+  * flush here as well. We may have done it already, but meh.
+  *
+  * See also WaAvoidRCZCounterRollover.
+  */
+ brw_emit_mi_flush(brw);
+ BEGIN_BATCH(2);
+ OUT_BATCH(_3DSTATE_CC_STATE_POINTERS << 16 | (2 - 2));
+ OUT_BATCH(brw->cc.state_offset | 1);
+ ADVANCE_BATCH();
+ brw_emit_pipe_control_flush(brw, PIPE_CONTROL_RENDER_TARGET_FLUSH |
+  PIPE_CONTROL_CS_STALL);
+  }
+   }
+
/* Mark that the current program cache BO has been used by the GPU.
 * It will be reallocated if we need to put new programs in for the
 * next batch.
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index fdd07e0..8eaedd1 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -26,6 +26,10 @@ extern "C" {
  * - 3 DWords for MI_REPORT_PERF_COUNT itself on Gen6+.  ==> 12 bytes.
  *   On Ironlake, it's 6 DWords, but we have some slack due to the lack of
  *   Sandybridge PIPE_CONTROL madness.
+ *   - CC_STATE workaround on HSW (12 * 4 = 48 bytes)
+ * - 5 dwords for initial mi_flush
+ * - 2 dwords for CC state setup
+ * - 5 dwords for the required pipe control at the end
  */
 #define BATCH_RESERVED 152
 
-- 
2.4.5
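
As a side note, the reserved-space arithmetic from the BATCH_RESERVED comment and the "length minus two" header convention used by the OUT_BATCH() call can be checked with a small standalone sketch. The enum names and both helpers are hypothetical; only the dword counts (5, 2, 5) and the header expression come from the patch.

```c
#include <stdint.h>

enum {
   DWORD_BYTES         = 4,
   HSW_WA_MI_FLUSH_DW  = 5,  /* initial mi_flush */
   HSW_WA_CC_STATE_DW  = 2,  /* 3DSTATE_CC_STATE_POINTERS */
   HSW_WA_PIPE_CTRL_DW = 5,  /* trailing PIPE_CONTROL */
};

/* Bytes that the Haswell end-of-batch workaround needs in reserve. */
static inline uint32_t
hsw_cc_workaround_bytes(void)
{
   return (HSW_WA_MI_FLUSH_DW + HSW_WA_CC_STATE_DW + HSW_WA_PIPE_CTRL_DW)
          * DWORD_BYTES;
}

/* Packet header as written in the patch: opcode in the high half,
 * total dword count minus two in the low bits. */
static inline uint32_t
packet_header(uint32_t opcode, uint32_t num_dwords)
{
   return opcode << 16 | (num_dwords - 2);
}
```

With these numbers the workaround accounts for 12 dwords, i.e. the 48 bytes quoted in the comment.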



Re: [Mesa-dev] [PATCH] [v3] i965: Split out gen8 push constant state upload

2015-07-11 Thread Kenneth Graunke
On Thursday, July 09, 2015 11:00:40 AM Ben Widawsky wrote:
> While implementing the workaround in the previous patch I noticed things were
> starting to get a bit messy. Since gen8 works differently enough from gen7, I
> thought splitting it out would be good.

IMHO this is still a bit messy.  What about separating the packet
emission from the decision-making about which of the 4 buffers to use?
Something along these lines (warning: doesn't compile):

#define OUT_RELOC_NULL(x) if (x != 0) { OUT_RELOC(x) } else { OUT_BATCH(x); }
#define OUT_RELOC64_NULL(x) if (x != 0) { OUT_RELOC64(x) } else { OUT_BATCH(x); }

static inline void
emit_3dstate_constant(struct brw_context *brw,
  uint32_t opcode,
  uint32_t mocs,
  uint16_t read_length_0,
  uint16_t read_length_1,
  uint16_t read_length_2,
  uint16_t read_length_3,
  uint64_t ptr_0,
  uint64_t ptr_1,
  uint64_t ptr_2,
  uint64_t ptr_3)
{
   // XXX: or in mocs wherever it goes
   if (brw->gen >= 8) {
      BEGIN_BATCH(11);
      OUT_BATCH(opcode << 16 | (11 - 2));
      OUT_BATCH(read_length_0 | read_length_1 << 16);
      OUT_BATCH(read_length_2 | read_length_3 << 16);
      OUT_BATCH64(ptr_0);
      OUT_RELOC64_NULL(ptr_1);
      OUT_RELOC64_NULL(ptr_2);
      OUT_RELOC64_NULL(ptr_3);
      ADVANCE_BATCH();
   } else if (brw->gen == 7) {
      /* XXX: we could even put asserts here about the buffers being enabled
       * in order, i.e. if you use 2 you have to use 0 and 1 also
       */
      BEGIN_BATCH(7);
      OUT_BATCH(opcode << 16 | (7 - 2));
      OUT_BATCH(read_length_0 | read_length_1 << 16);
      OUT_BATCH(read_length_2 | read_length_3 << 16);
      OUT_BATCH(ptr_0);
      OUT_RELOC_NULL(ptr_1);
      OUT_RELOC_NULL(ptr_2);
      OUT_RELOC_NULL(ptr_3);
      ADVANCE_BATCH();
   } else if (brw->gen == 6) {
      /* XXX: could probably do gen6 here too */
   } else {
      unreachable("unhandled gen in emit_3dstate_constant");
   }
}

void
gen7_upload_constant_state(struct brw_context *brw,
   const struct brw_stage_state *stage_state,
   bool active, unsigned opcode)
{
   uint32_t mocs = brw->gen < 8 ? GEN7_MOCS_L3 : 0;

   /* Disable if the shader stage is inactive or there are no push constants. */
   active = active && stage_state->push_const_size != 0;

   if (!active) {
      emit_3dstate_constant(brw, opcode, mocs, 0, 0, 0, 0, 0, 0, 0, 0);
   } else if (brw->gen >= 9) {
      /* Workaround for SKL+ (we use option #2 until we have a need for more
       * constant buffers). This comes from the documentation for
       * 3DSTATE_CONSTANT_*
       *
       * The driver must ensure The following case does not occur without a
       * flush to the 3D engine: 3DSTATE_CONSTANT_* with buffer 3 read length
       * equal to zero committed followed by a 3DSTATE_CONSTANT_* with buffer
       * 0 read length not equal to zero committed. Possible ways to avoid
       * this condition include:
       * 1. always force buffer 3 to have a non zero read length
       * 2. always force buffer 0 to a zero read length
       */
      emit_3dstate_constant(brw, opcode, mocs,
                            0, stage_state->push_const_size, 0, 0,
                            0, stage_state->push_const_offset, 0, 0);
   } else {
      emit_3dstate_constant(brw, opcode, mocs,
                            stage_state->push_const_size, 0, 0, 0,
                            stage_state->push_const_offset, 0, 0, 0);
   }

   /* On SKL+ the new constants don't take effect until the next corresponding
    * 3DSTATE_BINDING_TABLE_POINTER_* command is parsed so we need to ensure
    * that is sent
    */
   if (brw->gen >= 9)
      brw->ctx.NewDriverState |= BRW_NEW_SURFACES;
}

By using a static inline, all the code for unused buffers *should* get
compiled away, seeing as it's all 0.  We might have to stuff it in a .h
file, or put all the gen6+ constbuf stuff in a single file, i.e.
gen6_push_constants.c.

Anyway, just an idea...
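
To make the "separate the decision from the emission" idea concrete, here is a standalone sketch of only the decision half, written so it can be compiled and tested without the driver. The struct push_const_layout and choose_push_const_layout() are hypothetical; the gen, size and offset parameters stand in for brw->gen, push_const_size and push_const_offset from the code above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the four constant-buffer slots. */
struct push_const_layout {
   uint16_t read_length[4];
   uint64_t ptr[4];
};

/* Decide which slot carries the push constants; emitting the packet
 * from the returned layout would be a separate step. */
static struct push_const_layout
choose_push_const_layout(int gen, bool active, uint16_t size, uint64_t offset)
{
   struct push_const_layout l = { {0, 0, 0, 0}, {0, 0, 0, 0} };

   if (!active || size == 0)
      return l;                 /* stage disabled: all slots empty */

   /* Gen9+ workaround option #2: keep buffer 0 at zero read length. */
   const int slot = gen >= 9 ? 1 : 0;
   l.read_length[slot] = size;
   l.ptr[slot] = offset;
   return l;
}
```

A layout value like this could then be handed to a single emit function per generation, which keeps the workaround logic in one place.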

--Ken




Re: [Mesa-dev] [PATCH] i965: Fix 32 bit build warnings in intel_get_yf_ys_bo_size()

2015-07-13 Thread Kenneth Graunke
On Monday, July 13, 2015 03:35:16 PM Anuj Phogat wrote:
> Along with fixing the type of the pitch parameter, the patch also changes
> the types of a few local variables and the function return type.
> 
> Warnings fixed are:
> intel_mipmap_tree.c:671:7: warning: passing argument 3 of
> 'intel_get_yf_ys_bo_size' from incompatible pointer type
> 
> intel_mipmap_tree.c:563:1: note: expected 'uint64_t *' but
> argument is of type 'long unsigned int *'
> 
> Reported-by: Kenneth Graunke 
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index fb896a9..1529651 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -559,14 +559,14 @@ intel_lower_compressed_format(struct brw_context *brw, 
> mesa_format format)
>  }
>  
>  /* This function computes Yf/Ys tiled bo size, alignment and pitch. */
> -static uint64_t
> +static unsigned long
>  intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment,
> -uint64_t *pitch)
> +unsigned long *pitch)
>  {
> const uint32_t bpp = mt->cpp * 8;
> const uint32_t aspect_ratio = (bpp == 16 || bpp == 64) ? 2 : 1;
> uint32_t tile_width, tile_height;
> -   uint64_t stride, size, aligned_y;
> +   unsigned long stride, size, aligned_y;
>  
> assert(mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE);

This looks good to me.  I don't think anything should overflow, though I
didn't check thoroughly.  Thanks Anuj :)

Reviewed-by: Kenneth Graunke 
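
For context on the overflow remark: on typical 32-bit Linux targets unsigned long is 32 bits wide while uint64_t is unsigned long long, which is why the two pointer types are incompatible. A minimal sketch of a checked narrowing follows; narrow_to_ulong() is a hypothetical helper, not something in the patch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Store a 64-bit value into an unsigned long, reporting whether it fit.
 * On LP64 targets this always succeeds; on ILP32 it can fail. */
static bool
narrow_to_ulong(uint64_t value, unsigned long *out)
{
   if (value > ULONG_MAX)
      return false;
   *out = (unsigned long) value;
   return true;
}
```

A check like this would only matter if a computed size or pitch could exceed 4 GiB on a 32-bit build.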




Re: [Mesa-dev] [PATCH] glsl: free interface_types and replace old hash_table uses

2015-07-13 Thread Kenneth Graunke
On Monday, July 13, 2015 11:21:05 AM Iago Toral wrote:
> On Sat, 2015-07-11 at 10:13 +1000, Timothy Arceri wrote:
> > @@ -648,27 +653,28 @@ glsl_type::get_array_instance(const glsl_type *base, 
> > unsigned array_size)
> > mtx_lock(&glsl_type::mutex);
> >  
> > if (array_types == NULL) {
> > -  array_types = hash_table_ctor(64, hash_table_string_hash,
> > -   hash_table_string_compare);
> > +  array_types = _mesa_hash_table_create(NULL, _mesa_key_hash_string,
> > +_mesa_key_string_equal);
> > }
> >  
> > -   const glsl_type *t = (glsl_type *) hash_table_find(array_types, key);
> > -
> > -   if (t == NULL) {
> > +   const struct hash_entry *entry = _mesa_hash_table_search(array_types, 
> > key);
> > +   if (entry == NULL) {
> >mtx_unlock(&glsl_type::mutex);
> > -  t = new glsl_type(base, array_size);
> > +  const glsl_type *t = new glsl_type(base, array_size);
> >mtx_lock(&glsl_type::mutex);
> >  
> > -  hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, 
> > key));
> > +  entry = _mesa_hash_table_insert(array_types,
> > +  ralloc_strdup(mem_ctx, key),
> > +  (void *) t);
> > }
> >  
> > -   assert(t->base_type == GLSL_TYPE_ARRAY);
> > -   assert(t->length == array_size);
> > -   assert(t->fields.array == base);
> > +   assert(((glsl_type *)entry->data)->base_type == GLSL_TYPE_ARRAY);
> > +   assert(((glsl_type *)entry->data)->length == array_size);
> > +   assert(((glsl_type *)entry->data)->fields.array == base);
> 
> Other parts of this file put a blank between the type cast and the
> variable, so I would add that here (and in all other places where you
> cast entry to glsl_type* in this patch).

Or...why not continue to have a local variable t, and just set t =
entry->data?  Then these could all stay the same, and there would be
less casting.
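
A rough sketch of that suggestion, using simplified hypothetical types (struct array_type and entry_as_array_type() are made up for illustration, and hash_entry is reduced to two members). The real file is C++, where the assignment from void * would still need one explicit cast; in C the conversion is implicit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real types. */
struct hash_entry { const void *key; void *data; };
struct array_type { int base_type; unsigned length; };

enum { TYPE_ARRAY = 1 };

/* Convert once into a typed local, then check fields without further casts. */
static const struct array_type *
entry_as_array_type(const struct hash_entry *entry, unsigned array_size)
{
   const struct array_type *t = entry->data;   /* void * converts implicitly in C */

   assert(t->base_type == TYPE_ARRAY);
   assert(t->length == array_size);
   return t;
}
```

The trade-off is one extra local against a cast repeated in every assert.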




Re: [Mesa-dev] [PATCH] glsl: free interface_types and replace old hash_table uses

2015-07-14 Thread Kenneth Graunke
On Tuesday, July 14, 2015 04:26:50 PM Timothy Arceri wrote:
> On Mon, 2015-07-13 at 22:19 -0700, Kenneth Graunke wrote:
> > On Monday, July 13, 2015 11:21:05 AM Iago Toral wrote:
> > > On Sat, 2015-07-11 at 10:13 +1000, Timothy Arceri wrote:
> > > > @@ -648,27 +653,28 @@ glsl_type::get_array_instance(const glsl_type 
> > > > *base, unsigned array_size)
> > > > mtx_lock(&glsl_type::mutex);
> > > >  
> > > > if (array_types == NULL) {
> > > > -  array_types = hash_table_ctor(64, hash_table_string_hash,
> > > > - hash_table_string_compare);
> > > > +  array_types = _mesa_hash_table_create(NULL, 
> > > > _mesa_key_hash_string,
> > > > +_mesa_key_string_equal);
> > > > }
> > > >  
> > > > -   const glsl_type *t = (glsl_type *) hash_table_find(array_types, 
> > > > key);
> > > > -
> > > > -   if (t == NULL) {
> > > > +   const struct hash_entry *entry = 
> > > > _mesa_hash_table_search(array_types, key);
> > > > +   if (entry == NULL) {
> > > >mtx_unlock(&glsl_type::mutex);
> > > > -  t = new glsl_type(base, array_size);
> > > > +  const glsl_type *t = new glsl_type(base, array_size);
> > > >mtx_lock(&glsl_type::mutex);
> > > >  
> > > > -  hash_table_insert(array_types, (void *) t, 
> > > > ralloc_strdup(mem_ctx, 
> > > > key));
> > > > +  entry = _mesa_hash_table_insert(array_types,
> > > > +  ralloc_strdup(mem_ctx, key),
> > > > +  (void *) t);
> > > > }
> > > >  
> > > > -   assert(t->base_type == GLSL_TYPE_ARRAY);
> > > > -   assert(t->length == array_size);
> > > > -   assert(t->fields.array == base);
> > > > +   assert(((glsl_type *)entry->data)->base_type == GLSL_TYPE_ARRAY);
> > > > +   assert(((glsl_type *)entry->data)->length == array_size);
> > > > +   assert(((glsl_type *)entry->data)->fields.array == base);
> > > 
> > > Other parts of this file put a blank between the type cast and the
> > > variable, so I would add that here (and in all other places where you
> > > cast entry to glsl_type* in this patch).
> > 
> > Or...why not continue to have a local variable t, and just set t =
> > entry->data?  Then these could all stay the same, and there would be
> > less casting.
> 
> I did have it like that but in my opinion it just looked messy. 2 extra lines
> vs 3 extra casts all of which are in asserts(), it felt like I was making the
> surrounding code worse for the sake of the asserts.
> 
> I've pushed this already so I guess it doesn't matter now anyway.

Yup, just an idea - what you had was fine.  Thanks!




Re: [Mesa-dev] [PATCH 1/2] mesa: Detect and provide macros for function attributes pure and const.

2015-07-14 Thread Kenneth Graunke
On Tuesday, July 14, 2015 11:45:57 AM Eric Anholt wrote:
> These are really useful hints to the compiler in the absence of link-time
> optimization, and I'm going to use them in VC4.
> 
> I've made the const attribute be ATTRIBUTE_CONST unlike other function
> attributes, because we have other things in the tree #defining CONST for
> their own unrelated purposes.
> ---
>  configure.ac  |  2 ++
>  src/util/macros.h | 20 
>  2 files changed, 22 insertions(+)
> 
> diff --git a/configure.ac b/configure.ac
> index bdfd134..38ad398 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -210,6 +210,8 @@ AX_GCC_FUNC_ATTRIBUTE([format])
>  AX_GCC_FUNC_ATTRIBUTE([malloc])
>  AX_GCC_FUNC_ATTRIBUTE([packed])
>  AX_GCC_FUNC_ATTRIBUTE([unused])
> +AX_GCC_FUNC_ATTRIBUTE([const])
> +AX_GCC_FUNC_ATTRIBUTE([pure])
>  AX_GCC_FUNC_ATTRIBUTE([warn_unused_result])
>  
>  AM_CONDITIONAL([GEN_ASM_OFFSETS], test "x$GEN_ASM_OFFSETS" = xyes)
> diff --git a/src/util/macros.h b/src/util/macros.h
> index 66698e7..4d16183 100644
> --- a/src/util/macros.h
> +++ b/src/util/macros.h
> @@ -130,6 +130,26 @@ do {   \
>  #define PACKED
>  #endif
>  
> +/* Attribute pure is used for functions that have no effects other than their
> + * return value.  As a result, calls to it can be dead code eliminated.
> + */
> +#ifdef HAVE_FUNC_ATTRIBUTE_PURE
> +#define PURE __attribute__((__pure__))
> +#else
> +#define PURE
> +#endif
> +
> +/* Attribute const is used for functions that have no effects other than 
> their
> + * return value, and only rely on the argument values to compute the return
> + * value.  As a result, calls to it can be CSEed.  Note that using memory
> + * pointed to by the arguments is not allowed for const functions.
> + */
> +#ifdef HAVE_FUNC_ATTRIBUTE_CONST
> +#define ATTRIBUTE_CONST __attribute__((__const__))
> +#else
> +#define ATTRIBUTE_CONST
> +#endif
> +
>  #ifdef __cplusplus
>  /**
>   * Macro function that evaluates to true if T is a trivially
> 

This patch is:
Reviewed-by: Kenneth Graunke 
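
A small usage sketch of the two macros, for anyone wondering where the line between them falls. It assumes detection via __GNUC__ rather than the HAVE_FUNC_ATTRIBUTE_* macros that configure defines, and the two functions (align_up, count_nonzero) are made-up examples.

```c
#include <stddef.h>

/* Assumption: detect the attributes via __GNUC__ instead of the
 * HAVE_FUNC_ATTRIBUTE_* macros that configure would define. */
#if defined(__GNUC__)
#define PURE __attribute__((__pure__))
#define ATTRIBUTE_CONST __attribute__((__const__))
#else
#define PURE
#define ATTRIBUTE_CONST
#endif

/* const: the result depends only on the argument values. */
static ATTRIBUTE_CONST unsigned
align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) / alignment * alignment;
}

/* pure: no side effects, but it reads memory through the pointer,
 * so it must not be marked const. */
static PURE size_t
count_nonzero(const unsigned char *bytes, size_t n)
{
   size_t count = 0;
   for (size_t i = 0; i < n; i++)
      count += bytes[i] != 0;
   return count;
}
```

Marking count_nonzero() as const would be wrong because its result can change when the pointed-to bytes change, even if the pointer value stays the same.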




[Mesa-dev] [PATCH] i965: Fix comment about DRM_IOCTL_I915_GEM_WAIT.

2015-07-15 Thread Kenneth Graunke
From: Chris Wilson 

The kernel actually waits forever when supplied a timeout value < 0,
rather than returning immediately.  See i915_gem_wait_ioctl() in
i915_gem.c's call to __i915_wait_request().

(split by Ken from a large patch authored by Chris Wilson)

Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_syncobj.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_syncobj.c 
b/src/mesa/drivers/dri/i965/intel_syncobj.c
index c44c4be..c2f4fa9 100644
--- a/src/mesa/drivers/dri/i965/intel_syncobj.c
+++ b/src/mesa/drivers/dri/i965/intel_syncobj.c
@@ -105,9 +105,9 @@ brw_fence_client_wait(struct brw_context *brw, struct 
brw_fence *fence,
assert(fence->batch_bo);
 
/* DRM_IOCTL_I915_GEM_WAIT uses a signed 64 bit timeout and returns
-* immediately for timeouts <= 0.  The best we can do is to clamp the
-* timeout to INT64_MAX.  This limits the maximum timeout from 584 years to
-* 292 years - likely not a big deal.
+* immediately for timeout == 0, and indefinitely if timeout is negative.
+* The best we can do is to clamp the timeout to INT64_MAX.  This limits
+* the maximum timeout from 584 years to 292 years - likely not a big deal.
 */
if (timeout > INT64_MAX)
   timeout = INT64_MAX;
-- 
2.4.5
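
The clamp and the "584 years to 292 years" figure in the comment can be reproduced with a standalone sketch, assuming the timeout is a nanosecond count (as GL sync timeouts are). Both helper names are hypothetical.

```c
#include <stdint.h>

/* Clamp an unsigned nanosecond timeout into the signed range so it can
 * never be reinterpreted as negative (which would mean "wait forever"). */
static inline int64_t
clamp_timeout_ns(uint64_t timeout)
{
   return timeout > INT64_MAX ? INT64_MAX : (int64_t) timeout;
}

/* Whole 365-day years represented by a nanosecond count. */
static inline uint64_t
ns_to_years(uint64_t ns)
{
   const uint64_t ns_per_year = UINT64_C(365) * 24 * 60 * 60 * 1000000000;
   return ns / ns_per_year;
}
```

Feeding UINT64_MAX and INT64_MAX through ns_to_years() gives 584 and 292 respectively, matching the comment.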



[Mesa-dev] [PATCH 3/3] nir: add the ability to insert a CF node after an instruction

2015-07-16 Thread Kenneth Graunke
From: Connor Abbott 

This will split the block containing the instruction and put the CF node
in between.

v2: (by Kenneth Graunke)
- Simplify split_block_after_instr()'s implementation by using
  split_block_end() rather than duplicating code.
- Fix a bug in nir_cf_node_insert_after_instr() where inserting a
  non-block after the last instruction would cause update_if_uses()
  to be called twice, making us try to add the same SSA def to the
  if_uses list twice, corrupting the list.
- Comment changes.

Cc: Jason Ekstrand 
Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir.c | 62 ++
 src/glsl/nir/nir.h |  3 +++
 2 files changed, 65 insertions(+)

Nothing uses this yet, but I've tested it with my SIMD8 geometry shader patches,
which use this to replace emit_vertex intrinsics with if blocks (for "safety
checks" that make sure the program hasn't emitted too many vertices).  It seems
to work just fine, and seems like a really useful piece of infrastructure to
have, so I'm submitting it now.

Jason, would you mind reviewing it, since Connor and I both hacked on it?
It would be nice to have a non-author take a look at it :)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index 78ff886..0c53bab 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -843,6 +843,29 @@ split_block_end(nir_block *block)
 }
 
 /**
+ * Creates a new block, and moves all the instructions after the given
+ * instruction to the new block.
+ */
+static nir_block *
+split_block_after_instr(nir_instr *instr)
+{
+   /* We don't have to do anything special for handling jump instructions,
+* as this will move the successors associated with the jump to the new
+* block already.
+*/
+   nir_block *new_block = split_block_end(instr->block);
+
+   nir_instr *cur_instr;
+   while ((cur_instr = nir_instr_next(instr)) != NULL) {
+  exec_node_remove(&cur_instr->node);
+  exec_list_push_tail(&new_block->instr_list, &cur_instr->node);
+  cur_instr->block = new_block;
+   }
+
+   return new_block;
+}
+
+/**
  * Inserts a non-basic block between two basic blocks and links them together.
  */
 
@@ -1124,6 +1147,45 @@ nir_cf_node_insert_after(nir_cf_node *node, nir_cf_node 
*after)
 }
 
 void
+nir_cf_node_insert_after_instr(nir_instr *instr, nir_cf_node *after)
+{
+   /* If the instruction is the last in its block, then this is equivalent
+* to inserting the CF node after this block.  Just call that, to avoid
+* attempting to split blocks unnecessarily.
+*/
+   if (nir_instr_is_last(instr)) {
+  nir_cf_node_insert_after(&instr->block->cf_node, after);
+  return;
+   }
+
+   update_if_uses(after);
+
+   if (after->type == nir_cf_node_block) {
+  /* We're attempting to insert a block after an instruction; instead,
+   * just move all of the instructions into the existing block.  Actually
+   * removing and adding them would involve removing and adding uses/defs,
+   * which we don't need to do, so just take them off the list directly.
+   */
+  nir_block *after_block = nir_cf_node_as_block(after);
+  nir_foreach_instr_safe_reverse(after_block, new_instr) {
+ exec_node_remove(&new_instr->node);
+ new_instr->block = instr->block;
+ exec_node_insert_after(&instr->node, &new_instr->node);
+  }
+   } else {
+  /* We're inserting a loop or if after an instruction.  Split up the
+   * basic block and insert it between those two blocks.
+   */
+  nir_block *before_block = instr->block;
+  nir_block *after_block = split_block_after_instr(instr);
+  insert_non_block(before_block, after, after_block);
+   }
+
+   nir_function_impl *impl = nir_cf_node_get_function(&instr->block->cf_node);
+   nir_metadata_preserve(impl, nir_metadata_none);
+}
+
+void
 nir_cf_node_insert_before(nir_cf_node *node, nir_cf_node *before)
 {
update_if_uses(before);
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 62cdbd4..6efbc18 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1506,6 +1506,9 @@ void nir_cf_node_insert_after(nir_cf_node *node, 
nir_cf_node *after);
 /** puts a control flow node immediately before another control flow node */
 void nir_cf_node_insert_before(nir_cf_node *node, nir_cf_node *before);
 
+/** puts a control flow node immediately after a given instruction */
+void nir_cf_node_insert_after_instr(nir_instr *instr, nir_cf_node *after);
+
 /** puts a control flow node at the beginning of a list from an if, loop, or 
function */
 void nir_cf_node_insert_begin(struct exec_list *list, nir_cf_node *node);
 
-- 
2.4.5
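
A note on the block-merging branch above: the instructions of the inserted block are walked in reverse and each one is re-inserted directly after the same anchor instruction, which is what keeps them in their original order. Below is a minimal standalone sketch of that idea. It assumes a simplified doubly linked list with two explicit sentinel nodes; struct node, struct list, splice_after() and list_values() are hypothetical names for illustration and do not reproduce Mesa's exec_list or the NIR API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified list with two explicit sentinels; an illustration
 * only, not Mesa's exec_list layout. */
struct node {
   struct node *prev;
   struct node *next;
   int value;
};

struct list {
   struct node head;   /* head sentinel: head.prev == NULL */
   struct node tail;   /* tail sentinel: tail.next == NULL */
};

static void list_init(struct list *l)
{
   l->head.prev = NULL;
   l->head.next = &l->tail;
   l->tail.prev = &l->head;
   l->tail.next = NULL;
}

static void node_insert_after(struct node *pos, struct node *n)
{
   n->prev = pos;
   n->next = pos->next;
   pos->next->prev = n;
   pos->next = n;
}

static void list_push_tail(struct list *l, struct node *n)
{
   node_insert_after(l->tail.prev, n);
}

static void node_remove(struct node *n)
{
   n->prev->next = n->next;
   n->next->prev = n->prev;
   n->prev = NULL;
   n->next = NULL;
}

/* Move every node of src to just after anchor, preserving src's order.
 * Walking src backwards means the last node is placed first and each earlier
 * node then lands in front of it. */
static void splice_after(struct node *anchor, struct list *src)
{
   assert(anchor != NULL && src != NULL);

   struct node *n = src->tail.prev;
   while (n->prev != NULL) {          /* stop at the head sentinel */
      struct node *prev = n->prev;    /* saved before n is unlinked */
      node_remove(n);
      node_insert_after(anchor, n);
      n = prev;
   }
}

/* Copy values in list order into out (at most cap); return the count. */
static int list_values(const struct list *l, int *out, int cap)
{
   int count = 0;
   for (const struct node *n = l->head.next;
        n->next != NULL && count < cap; n = n->next)
      out[count++] = n->value;
   return count;
}
```

Walking forward and inserting after a fixed anchor would reverse the moved nodes; the alternative is to advance the anchor after each insertion.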



[Mesa-dev] [PATCH 1/3] nir: add nir_instr_is_first() and nir_instr_is_last() helpers

2015-07-16 Thread Kenneth Graunke
From: Connor Abbott 

Reviewed-by: Kenneth Graunke 
---
 src/glsl/nir/nir.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index e9a506c..0db1fc3 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -443,6 +443,18 @@ nir_instr_prev(nir_instr *instr)
   return exec_node_data(nir_instr, prev, node);
 }
 
+static inline bool
+nir_instr_is_first(nir_instr *instr)
+{
+   return exec_node_is_head_sentinel(exec_node_get_prev(&instr->node));
+}
+
+static inline bool
+nir_instr_is_last(nir_instr *instr)
+{
+   return exec_node_is_tail_sentinel(exec_node_get_next(&instr->node));
+}
+
 typedef struct {
    /** for debugging only, can be NULL */
    const char* name;
-- 
2.4.5
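
Since this patch carries no description: both helpers work by checking whether the instruction's neighbour is a list sentinel rather than a real node. The following is a minimal standalone sketch of that check, assuming a simplified list with two explicit sentinel nodes. The struct node / struct list types and the node_is_* functions are hypothetical names for illustration; Mesa's exec_list is laid out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified list with two explicit sentinels; an illustration
 * only, not Mesa's exec_list layout. */
struct node {
   struct node *prev;
   struct node *next;
};

struct list {
   struct node head;   /* head sentinel: head.prev == NULL */
   struct node tail;   /* tail sentinel: tail.next == NULL */
};

static void list_init(struct list *l)
{
   l->head.prev = NULL;
   l->head.next = &l->tail;
   l->tail.prev = &l->head;
   l->tail.next = NULL;
}

static void list_push_tail(struct list *l, struct node *n)
{
   assert(l != NULL && n != NULL);
   n->prev = l->tail.prev;
   n->next = &l->tail;
   l->tail.prev->next = n;
   l->tail.prev = n;
}

/* Only the sentinels have a NULL link, so one pointer test identifies them. */
static bool node_is_head_sentinel(const struct node *n)
{
   return n->prev == NULL;
}

static bool node_is_tail_sentinel(const struct node *n)
{
   return n->next == NULL;
}

/* A real node is first (last) when its previous (next) neighbour is a
 * sentinel; no pointer to the owning list is needed. */
static bool node_is_first(const struct node *n)
{
   return node_is_head_sentinel(n->prev);
}

static bool node_is_last(const struct node *n)
{
   return node_is_tail_sentinel(n->next);
}
```

The point of the sentinel test is that the caller only needs the node itself, not the list it lives in.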



[Mesa-dev] [PATCH 2/3] nir: add nir_foreach_instr_safe_reverse()

2015-07-16 Thread Kenneth Graunke
From: Connor Abbott 

Reviewed-by: Kenneth Graunke 
---
 src/glsl/nir/nir.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 0db1fc3..62cdbd4 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1233,6 +1233,8 @@ nir_block_last_instr(nir_block *block)
    foreach_list_typed_reverse(nir_instr, instr, node, &(block)->instr_list)
 #define nir_foreach_instr_safe(block, instr) \
    foreach_list_typed_safe(nir_instr, instr, node, &(block)->instr_list)
+#define nir_foreach_instr_safe_reverse(block, instr) \
+   foreach_list_typed_safe_reverse(nir_instr, instr, node, &(block)->instr_list)
 
 typedef struct nir_if {
    nir_cf_node cf_node;
-- 
2.4.5
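
This patch also has no description, so for context: a "safe" iterator caches the next node to visit before the loop body runs, so the body may unlink the current node; the reverse variant does the same while walking from tail to head. Here is a minimal standalone sketch, assuming a simplified list with two explicit sentinels. The list_foreach_safe_reverse macro, struct node, struct list and remove_odd_values() are hypothetical names for illustration and are not the Mesa macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified list with two explicit sentinels; an illustration
 * only, not Mesa's exec_list layout. */
struct node {
   struct node *prev;
   struct node *next;
   int value;
};

struct list {
   struct node head;   /* head sentinel: head.prev == NULL */
   struct node tail;   /* tail sentinel: tail.next == NULL */
};

static void list_init(struct list *l)
{
   l->head.prev = NULL;
   l->head.next = &l->tail;
   l->tail.prev = &l->head;
   l->tail.next = NULL;
}

static void list_push_tail(struct list *l, struct node *n)
{
   n->prev = l->tail.prev;
   n->next = &l->tail;
   l->tail.prev->next = n;
   l->tail.prev = n;
}

static void node_remove(struct node *n)
{
   n->prev->next = n->next;
   n->next->prev = n->prev;
   n->prev = NULL;   /* the removed node's own links are now unusable */
   n->next = NULL;
}

/* Walk from tail to head. The predecessor is cached in n##_prev before the
 * body runs, so the body may unlink n. The loop ends when n would be the
 * head sentinel, whose prev pointer is NULL. */
#define list_foreach_safe_reverse(n, l)                         \
   for (struct node *n = (l)->tail.prev, *n##_prev = n->prev;   \
        n##_prev != NULL;                                       \
        n = n##_prev, n##_prev = n->prev)

/* Unlink every node holding an odd value; return how many were removed. */
static int remove_odd_values(struct list *l)
{
   assert(l != NULL);

   int removed = 0;
   list_foreach_safe_reverse(n, l) {
      if (n->value % 2 != 0) {
         node_remove(n);
         removed++;
      }
   }
   return removed;
}
```

A plain reverse loop that read n->prev after the body would follow the NULL link left behind by node_remove().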



Re: [Mesa-dev] [PATCH v5 2/6] i965: Enable resource streamer for the batchbuffer

2015-07-16 Thread Kenneth Graunke
On Friday, July 03, 2015 10:00:30 AM Abdiel Janulgue wrote:
> Check first whether the hardware and kernel support the resource streamer.
> If they do, tell the kernel to set the resource streamer enable bit on
> MI_BATCHBUFFER_START by specifying the I915_EXEC_RESOURCE_STREAMER
> execbuffer flag.
> 
> v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel
>   supports RS (Ken).
> - Add brw_device_info::has_resource_streamer and toggle it for
>   Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken).
> v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel.
> v4: - Always inspect the getparam.value (Chris Wilson).
> v5: - Fold redundant devinfo->has_resource_streamer check in context create
>   into init screen.
> 
> Cc: kenn...@whitecape.org
> Cc: ch...@chris-wilson.co.uk
> Signed-off-by: Abdiel Janulgue 

This patch is:
Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH] i965/nir/fs: removed unneeded support for global variables

2015-07-16 Thread Kenneth Graunke
On Friday, June 26, 2015 01:47:48 PM Alejandro Piñeiro wrote:
> As functions are inlined, and nir_lower_global_vars_to_local gets
> run, all global variables are lowered to local variables.
> ---
> 
> Jason Ekstrand already confirmed, on the bug opened for nir/vec4 support,
> that global support is not needed:
> https://bugs.freedesktop.org/show_bug.cgi?id=89580#c9
> 
> So this patch just applies that answer to the fs path.
> 
> Full piglit run. No regressions.
> 
>  src/mesa/drivers/dri/i965/brw_fs.h   |  1 -
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 14 ++
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  1 -
>  3 files changed, 2 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index 243baf6..c49d0f8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -345,7 +345,6 @@ public:
> unsigned max_grf;
>  
> fs_reg *nir_locals;
> -   fs_reg *nir_globals;
> fs_reg nir_inputs;
> fs_reg nir_outputs;
> fs_reg *nir_system_values;
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 59081ea..a648a5a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -55,14 +55,6 @@ fs_visitor::emit_nir_code()
>  
> nir_emit_system_values(nir);
>  
> -   nir_globals = ralloc_array(mem_ctx, fs_reg, nir->reg_alloc);
> -   foreach_list_typed(nir_register, reg, node, &nir->registers) {
> -  unsigned array_elems =
> - reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
> -  unsigned size = array_elems * reg->num_components;
> -  nir_globals[reg->index] = bld.vgrf(BRW_REGISTER_TYPE_F, size);
> -   }
> -
> /* get the main function and emit it */
> nir_foreach_overload(nir, overload) {
>assert(strcmp(overload->function->name, "main") == 0);
> @@ -1151,10 +1143,8 @@ fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg,
> unsigned base_offset, nir_src *indirect)
>  {
> fs_reg reg;
> -   if (nir_reg->is_global)
> -  reg = v->nir_globals[nir_reg->index];
> -   else
> -  reg = v->nir_locals[nir_reg->index];
> +

Perhaps include a sanity check:

   assert(!nir_reg->is_global);

Either way,
Reviewed-by: Kenneth Graunke 

> +   reg = v->nir_locals[nir_reg->index];
>  
> reg = offset(reg, base_offset * nir_reg->num_components);
> if (indirect) {
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 9a4bad6..90d5706 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -2012,7 +2012,6 @@ fs_visitor::fs_visitor(const struct brw_compiler *compiler, void *log_data,
> this->no16_msg = NULL;
>  
> this->nir_locals = NULL;
> -   this->nir_globals = NULL;
>  
> memset(&this->payload, 0, sizeof(this->payload));
> memset(this->outputs, 0, sizeof(this->outputs));
> 



