Re: [Mesa-dev] [PATCH v2 6/7] i965: Enable arb_transform_feedback_overflow_query.

2016-12-09 Thread Jordan Justen
On 2016-12-09 13:39:52, Rafael Antognolli wrote:
> This extension adds new query types which can be used to detect overflow
> of transform feedback buffers. The new query types are also accepted by
> conditional rendering commands.
> 
> Signed-off-by: Rafael Antognolli 
> ---
>  docs/features.txt                            | 2 +-
>  docs/relnotes/13.1.0.html                    | 1 +
>  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/features.txt b/docs/features.txt
> index c27d521..bb7925e 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -303,7 +303,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
>    GL_ARB_sparse_texture2                    not started
>    GL_ARB_sparse_texture_clamp               not started
>    GL_ARB_texture_filter_minmax              not started
> -  GL_ARB_transform_feedback_overflow_query  not started
> +  GL_ARB_transform_feedback_overflow_query  DONE (i965/gen7+)
>    GL_KHR_blend_equation_advanced_coherent   DONE (i965/gen9+)
>    GL_KHR_no_error                           not started
>    GL_KHR_texture_compression_astc_hdr       DONE (core only)
> diff --git a/docs/relnotes/13.1.0.html b/docs/relnotes/13.1.0.html
> index 5b8b016..4f52cd1 100644
> --- a/docs/relnotes/13.1.0.html
> +++ b/docs/relnotes/13.1.0.html
> @@ -45,6 +45,7 @@ Note: some of the new features are only available with certain drivers.
>  
>  
>  GL_ARB_post_depth_coverage on i965/gen9+
> +GL_ARB_transform_feedback_overflow_query on i965/gen7+
>  GL_NV_image_formats on any driver supporting GL_ARB_shader_image_load_store (i965, nvc0, radeonsi, softpipe)
>  
>  
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
> index c1f42aa..d5e4164 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -320,6 +320,7 @@ intelInitExtensions(struct gl_context *ctx)
>    ctx->Extensions.EXT_framebuffer_multisample = true;
>    ctx->Extensions.EXT_framebuffer_multisample_blit_scaled = true;
>    ctx->Extensions.EXT_transform_feedback = true;
> +  ctx->Extensions.ARB_transform_feedback_overflow_query = true;

Is this enabling the extension on gen6?

Should it depend on brw->predicate.supported (and thus the next
patch)?

-Jordan
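Jordan's suggestion amounts to deriving the extension flag from a capability rather than from the gen6+ block alone. A minimal sketch of that gating, assuming hypothetical `hw_caps`/`ext_flags` structs in place of the real driver and context fields (only `brw->predicate.supported` is named in the thread); the gen check is kept to match the "gen7+" claim in features.txt:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; not the real i965 structs. */
struct hw_caps {
   int gen;                  /* hardware generation, e.g. 6, 7, 8 */
   bool predicate_supported; /* models the idea of brw->predicate.supported */
};

struct ext_flags {
   bool arb_transform_feedback_overflow_query;
};

/* Advertise the overflow query only where hardware predication is usable,
 * instead of on every device that reaches the gen6+ block. */
static void
init_overflow_query_ext(const struct hw_caps *caps, struct ext_flags *ext)
{
   assert(caps != NULL && ext != NULL);
   ext->arb_transform_feedback_overflow_query =
      caps->gen >= 7 && caps->predicate_supported;
}
```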

>    ctx->Extensions.OES_depth_texture_cube_map = true;
>    ctx->Extensions.OES_sample_variables = true;
>  
> -- 
> 2.7.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library

2016-12-09 Thread Alexandre Courbot
Hi Emil,

On 12/09/2016 11:20 PM, Emil Velikov wrote:
> On 9 December 2016 at 13:20, Alexandre Courbot  wrote:
>> On 12/08/2016 04:16 PM, Alexandre Courbot wrote:
>>> On 11/30/2016 10:44 PM, Christian Gmeiner wrote:
 This is a very lightweight library to add basic support for
 renderonly GPUs. It does all the magic regarding in/exporting
 buffers etc. This library will likely break android support and
 hopefully will get replaced with a better solution based on gbm2.
>>>
>>> Since we have no idea when said better solution will be available, and
>>> the situation of render-only GPUs has been unsustainable for way too
>>> long, I really hope a solution like this one can be merged in the meantime.
>>>
>>> I have tried it after porting support for Tegra
>>> (https://github.com/austriancoder/mesa/commit/2c7354701ee21ca28f69f5d7588f1d497553b4bf)
>>> to this latest version. Here are a few issues I have met:
>>>
>>> First, setting the tiling works indeed just fine if we are using an
>>> ioctl for this. However my impression was that the preferred way of
>>> doing it was through FB modifiers, and we started moving Tegra to this
>>> scheme. Problem: the FB modifier is passed through a call to
>>> drmModeAddFB2WithModifiers(), which is called by the client program, not
>>> Mesa - which in this case leaves the program with the burden of figuring
>>> out what the modifier should be. So with FB modifiers the problem is
>>> still here.
>>>
>>> Another issue I have seen is that GLX does not seem to work with this.
>>> X/modesetting starts just fine, and GLamor also seems to initialize.
>>> However glxinfo freezes on a xshmfence_await() call, and all GLX
>>> programs fail as follows:
>>
>> Solved that issue by forcing is_different_gpu to true in
>> loader_dri3_drawable_init() (pretty hackish, looking for a better way).
>>
>> Also I had another issue with Wayland where EGL windows would be
>> displayed all black. I traced this to the fact that Wayland was trying
>> to share the buffer by calling the old FLINK ioctl on the rendernode
>> device, which is forbidden. Opening card1 instead of renderD128 did the
>> trick as a workaround, but I am surprised as I thought Wayland was using
>> DRI3 exclusively? I am not very familiar with either Mesa or Wayland
>> though, so my assumption may very well be incorrect.
>>
> Some of these issues are due to the hardcoded nature of the card/render
> node. There's the drmDevice API, which could/should be extended and
> utilised here.
> Earlier versions were quite buggy, so make sure to use
> 677cd97dc4a930af508388713f5016baf664ed18 or later.

My libdrm was 2.4.74, so I think this patch is there.

> 
> Since from kernel there is no relation between the KMS and GPU device,
> one will need to apply some heuristics locally. At some point we might
> want to make things more systematic/configurable, but let's get it
> working first ;-)
> 
> Thus, please propose/add anything to drmDevice that you think is
> enough to build some heuristics on.
> 
> With that sorted, the Wayland FLINK issues should go away.

Thanks for the hint. I will look into that.
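Emil's point is that the kernel does not link the KMS device to the GPU device, so userspace has to guess. One possible heuristic, sketched with a hypothetical `drm_dev_info` struct rather than libdrm's real `drmDevice` layout: pick the first other device on the same bus type that exposes a render node as the display's render-only partner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a DRM device; libdrm's drmDevice
 * carries node paths, bus type and bus info in its own layout. */
struct drm_dev_info {
   const char *name;
   bool has_primary; /* exposes a card node (KMS capable) */
   bool has_render;  /* exposes a render node */
   int bus_type;     /* e.g. platform vs. PCI, as an opaque integer */
};

/* Return the index of a render-capable device on the same bus type as
 * the display device, or -1 if nothing suitable is found. */
static int
pick_render_device(const struct drm_dev_info *devs, size_t count,
                   size_t display_idx)
{
   assert(devs != NULL && display_idx < count);
   for (size_t i = 0; i < count; i++) {
      if (i == display_idx)
         continue;
      if (devs[i].has_render &&
          devs[i].bus_type == devs[display_idx].bus_type)
         return (int)i;
   }
   return -1;
}
```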


Re: [Mesa-dev] [PATCH] mesa: Return LINEAR encoding for winsys FBO depth/stencil.

2016-12-09 Thread Jordan Justen
Reviewed-by: Jordan Justen 

On 2016-12-09 17:16:37, Kenneth Graunke wrote:
> GetFramebufferAttachmentParameteriv should return GL_LINEAR for the
> window system default framebuffer's GL_DEPTH or GL_STENCIL attachments
> when there are zero depth or stencil bits.
> 
> The GL 4.5 spec's GetFramebufferAttachmentParameteriv section says:
> 
> "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is not NONE,
>  these queries apply to all other framebuffer types:
> 
>  [...]
> 
>  If attachment is not a color attachment, or no data storage or texture
>  image has been specified for the attachment, then params will contain
>  the value LINEAR."
> 
> Note that we already return LINEAR for the case where there is an actual
> depth or stencil renderbuffer attached.  In the case modified by this
> patch, FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE returns FRAMEBUFFER_DEFAULT
> rather than NONE.
> 
> Fixes a CTS test when run in a visual without depth / stencil buffers:
> GL45-CTS.gtf30.GL3Tests.framebuffer_srgb.framebuffer_srgb_default_encoding
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/main/fbobject.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
> index 64c4ab5..26fc15d 100644
> --- a/src/mesa/main/fbobject.c
> +++ b/src/mesa/main/fbobject.c
> @@ -3788,8 +3788,13 @@ _mesa_get_framebuffer_attachment_parameter(struct gl_context *ctx,
>           goto invalid_pname_enum;
>        }
>        else if (att->Type == GL_NONE) {
> -         _mesa_error(ctx, err, "%s(invalid pname %s)", caller,
> -                     _mesa_enum_to_string(pname));
> +         if (_mesa_is_winsys_fbo(buffer) &&
> +             (attachment == GL_DEPTH || attachment == GL_STENCIL)) {
> +            *params = GL_LINEAR;
> +         } else {
> +            _mesa_error(ctx, err, "%s(invalid pname %s)", caller,
> +                        _mesa_enum_to_string(pname));
> +         }
>        }
>        else {
>           if (ctx->Extensions.EXT_framebuffer_sRGB) {
> -- 
> 2.10.2
> 
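The new branch can be read as a small decision function. The sketch below isolates it under stated assumptions: the function name is hypothetical, the enum values are those from the GL headers, and the real `_mesa_get_framebuffer_attachment_parameter()` handles many more pnames and reports the error through `_mesa_error()` rather than returning `false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Enum values as defined in the GL headers. */
#define GL_LINEAR    0x2601
#define GL_DEPTH     0x1801
#define GL_STENCIL   0x1802
#define GL_BACK_LEFT 0x0402

/* Hypothetical model of the COLOR_ENCODING query for an attachment whose
 * type is NONE: window-system depth/stencil report LINEAR, everything else
 * takes the "invalid pname" error path (modelled as returning false). */
static bool
query_encoding_no_storage(bool is_winsys_fbo, unsigned attachment,
                          int *params)
{
   assert(params != NULL);
   if (is_winsys_fbo &&
       (attachment == GL_DEPTH || attachment == GL_STENCIL)) {
      *params = GL_LINEAR;
      return true;
   }
   return false;
}
```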


Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library

2016-12-09 Thread Alexandre Courbot
Hi Daniel,

On 12/09/2016 11:13 PM, Daniel Stone wrote:
> Hi Alexandre,
> 
> On 9 December 2016 at 13:20, Alexandre Courbot  wrote:
>> On 12/08/2016 04:16 PM, Alexandre Courbot wrote:
>>> First, setting the tiling works indeed just fine if we are using an
>>> ioctl for this. However my impression was that the preferred way of
>>> doing it was through FB modifiers, and we started moving Tegra to this
>>> scheme. Problem: the FB modifier is passed through a call to
>>> drmModeAddFB2WithModifiers(), which is called by the client program, not
>>> Mesa - which in this case leaves the program with the burden of figuring
>>> out what the modifier should be. So with FB modifiers the problem is
>>> still here.
>>>
>>> Another issue I have seen is that GLX does not seem to work with this.
>>> X/modesetting starts just fine, and GLamor also seems to initialize.
>>> However glxinfo freezes on a xshmfence_await() call, and all GLX
>>> programs fail as follows:
>>
>> Solved that issue by forcing is_different_gpu to true in
>> loader_dri3_drawable_init() (pretty hackish, looking for a better way).
>>
>> Also I had another issue with Wayland where EGL windows would be
>> displayed all black. I traced this to the fact that Wayland was trying
>> to share the buffer by calling the old FLINK ioctl on the rendernode
>> device, which is forbidden. Opening card1 instead of renderD128 did the
>> trick as a workaround, but I am surprised as I thought Wayland was using
>> DRI3 exclusively? I am not very familiar with either Mesa or Wayland
>> though, so my assumption may very well be incorrect.
> 
> Wayland doesn't use DRI-anything; Mesa has its own interface for
> Wayland. I'm really surprised that you're seeing this behaviour
> though: if you search for WL_DRM_CAPABILITY_PRIME (i.e. send dmabufs
> rather than flink names) in src/egl/drivers/dri2/platform_wayland.c,
> you'll see that a) we always use it when available, and b) we refuse
> to initialise when the device is a rendernode and we don't have PRIME.
> So I'm not sure how this could ever happen ...

Interesting. I will try to get to the bottom of this as a way to improve
my (weak) understanding of Mesa.

> 
>> Anyway, with this patch and the corresponding Tegra support, I have a
>> working solution that can run unmodified Mesa applications using KMS,
>> EGL/Wayland and GLX backends on TK1 and TX1 platforms. Neat!
> 
> Cool! I assume this will work on Tegra124 more generally then - do you
> have a branch somewhere?

Yes, anything using Tegra124 or Tegra210 with working display drivers (a
few Chromebooks so far, and probably the Pixel C sometime soon) should
directly benefit from it.

I have pushed a branch (just Christian's initial branch + a port of his
first Tegra patch and my hacks to make Wayland and GLX work) here:
https://github.com/Gnurou/mesa/tree/renderonly

> 
>> Considering that we have been resorting to hacking all the KMS
>> applications of interest to connect the render and display nodes
>> together with the right tiling settings for the last two years, I regard
>> this patch as a huge improvement for mobile graphics and would like to
>> strongly support it.
>>
>> My only remaining concern is that this scheme cannot support the case
>> where the tiling format is specified using FB modifiers, since this
>> requires drmModeAddFB2WithModifiers() to be called from the application.
>> So for Tegra we have to resort to a staging, not enabled by default
>> SET_TILING ioctl. Not ideal, but recompiling your kernel with an
>> additional config option is much less of a hassle than patching every KMS
>> app under the sun.
>>
>> So while thoughts about how this last issue can be addressed are
>> welcome, I think this little lib can improve the life of many SoC users.
> 
> Check out Ben Widawsky's 'Renderbuffer Decompression (and GBM
> modifiers)' patchset. With this, as well as krh's pending GETPLANE2
> ioctl that will allow us to get a list of acceptable modifiers for
> display from KMS, we can trivially implement this in clients without
> the need for a backchannel ioctl:
> https://git.collabora.com/cgit/user/daniels/weston.git/commit/?h=wip/2016-11/gbm-planes-modifiers

That should make a good reading for the weekend. :) Thanks!

Alex.
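The client-side approach Daniel describes boils down to intersecting the modifiers the renderer can produce with those the display accepts. A minimal sketch, assuming both lists are already available as arrays (how they are queried, e.g. via the pending GETPLANE2 ioctl, is not modelled) and assuming linear is an acceptable fallback for the display:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as DRM_FORMAT_MOD_LINEAR in drm_fourcc.h. */
#define MOD_LINEAR ((uint64_t)0)

/* Return the first modifier, in the renderer's preference order, that the
 * display also accepts; fall back to linear when the lists do not overlap. */
static uint64_t
choose_modifier(const uint64_t *render_mods, size_t num_render,
                const uint64_t *display_mods, size_t num_display)
{
   assert(num_render == 0 || render_mods != NULL);
   assert(num_display == 0 || display_mods != NULL);
   for (size_t i = 0; i < num_render; i++) {
      for (size_t j = 0; j < num_display; j++) {
         if (render_mods[i] == display_mods[j])
            return render_mods[i];
      }
   }
   return MOD_LINEAR;
}
```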


Re: [Mesa-dev] [PATCH 2/2] i965: make use of nir_lower_returns() for GL

2016-12-09 Thread Timothy Arceri
Updated shader-db numbers for BDW with recent shader-db:

fills helped:   shaders/closed/steam/deus-ex-mankind-divided/306.shader_test CS SIMD16: 56 -> 53 (-5.36%)
fills helped:   shaders/closed/steam/deus-ex-mankind-divided/206.shader_test CS SIMD16: 56 -> 53 (-5.36%)

total instructions in shared programs: 13065576 -> 13065524 (-0.00%)
instructions in affected programs: 37675 -> 37623 (-0.14%)
helped: 36
HURT: 4

total cycles in shared programs: 295966670 -> 295964212 (-0.00%)
cycles in affected programs: 10168934 -> 10166476 (-0.02%)
helped: 38
HURT: 5

total fills in shared programs: 20301 -> 20295 (-0.03%)
fills in affected programs: 112 -> 106 (-5.36%)
helped: 2
HURT: 0
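The percentages in these shader-db summaries are relative changes, (after - before) / before * 100. A small helper to double-check them, assuming that formula; it reproduces the figures above, e.g. the fills line: (106 - 112) / 112 * 100 ≈ -5.36%.

```c
#include <assert.h>

/* Relative change in percent: (after - before) / before * 100. */
static double
pct_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```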


On Fri, 2016-12-09 at 16:49 +1100, Timothy Arceri wrote:
> total instructions in shared programs: 8673389 -> 8673371 (-0.00%)
> instructions in affected programs: 558 -> 540 (-3.23%)
> helped: 2
> HURT: 0
> 
> total cycles in shared programs: 73195178 -> 73195104 (-0.00%)
> cycles in affected programs: 45680 -> 45606 (-0.16%)
> helped: 2
> HURT: 1
> ---
>  src/mesa/drivers/dri/i965/brw_link.cpp  | 6 --
>  src/mesa/drivers/dri/i965/brw_program.c | 2 ++
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp b/src/mesa/drivers/dri/i965/brw_link.cpp
> index 3f6041b..38d1349 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -161,12 +161,6 @@ process_glsl_ir(struct brw_context *brw,
>   brw_do_vector_splitting(shader->ir);
>    }
>  
> -      progress = do_lower_jumps(shader->ir, true, true,
> -                                true, /* main return */
> -                                false, /* continue */
> -                                false /* loops */
> -                                ) || progress;
> -
>        progress = do_common_optimization(shader->ir, true, true,
>                                          options, ctx->Const.NativeIntegers) || progress;
> } while (progress);
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c
> b/src/mesa/drivers/dri/i965/brw_program.c
> index a502b8e..c4ab5ee 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -78,6 +78,8 @@ brw_create_nir(struct brw_context *brw,
> if (shader_prog) {
>    nir = glsl_to_nir(shader_prog, stage, options);
>    nir_remove_dead_variables(nir, nir_var_shader_in | nir_var_shader_out);
> +  nir_lower_returns(nir);
> +  nir_validate_shader(nir);
>    NIR_PASS_V(nir, nir_lower_io_to_temporaries,
>   nir_shader_get_entrypoint(nir), true, false);
> } else {
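For context on the new `nir_lower_returns()` call: early returns are removed so that the function has a single exit, with the code after a return made conditional. The plain-C analogy below shows the flag-based form of that transformation; it is an illustration only, not the NIR the pass emits, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Original shape: an early return in the middle of the function. */
static int
with_early_return(int x)
{
   int y = x * 2;
   if (x < 0)
      return -1;
   y += 10;
   return y;
}

/* Lowered shape: the return becomes a flag plus a single exit, and the
 * code that followed the early return is guarded by that flag. */
static int
with_lowered_return(int x)
{
   bool returned = false;
   int retval = 0;
   int y = x * 2;
   if (x < 0) {
      retval = -1;
      returned = true;
   }
   if (!returned) {
      y += 10;
      retval = y;
   }
   return retval;
}
```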


Re: [Mesa-dev] [PATCH 08/10] nir: add a loop unrolling pass

2016-12-09 Thread Timothy Arceri
On Fri, 2016-12-09 at 17:23 -0800, Jason Ekstrand wrote:
> Wow!  This is way better than the last time I read through it.  Good
> work!
> 
> Overall, I'm much happier with the code now.  The structure is
> better, some of the crazy phi logic is gone, and clone_cf_list is
> helping a lot.  That said... I still have a pile of comments.  Most
> of them are cosmetic, one is a bug, and one is a suggestion for how
> to make things simpler.  I think fixing some of the cosmetic stuff,
> especially in unroll_complex, will help with readability a lot.
> 
> The suggestion, which I'd like to highlight here, was to take
> advantage of nir_repair_ssa and see if we can get rid of a bunch of
> the phi node pain from unroll_complex.  Most of the pain in that part
> of the pass appears to be in trying to deal with the phi nodes at the
> end of the pass and put everything back where it belongs.  The
> repair_ssa pass (which I think I wrote after you started on this
> project) is a tiny pass that takes a shader that is in "mostly" ssa
> form and makes it into "proper" NIR ssa.  The only requirement on the
> shader is that the sources of the phi nodes point to the correct SSA
> values.  Dominance doesn't have to hold and you don't have to have
> the whole ladder of phi nodes so long as the data flow is correct.  I
> *think* based on my sketchy reading of things, that we should be able
> to just apply the final remap table to each of the phis after the
> loop and then just run repair_ssa on the shader once we're done
> unrolling.
> 
> On Mon, Dec 5, 2016 at 5:12 PM, Timothy Arceri wrote:
> > V2:
> > - tidy ups suggested by Connor.
> > - tidy up cloning logic and handle copy propagation
> >  based of suggestion by Connor.
> > - use nir_ssa_def_rewrite_uses to fix up lcssa phis
> >   suggested by Connor.
> >  based on suggestion by Connor.
> > - handle case where the ssa def's use outside the loop is already a phi
> > - support unrolling loops with multiple terminators when trip count
> >   is known for each terminator
> > 
> > V3:
> > - set correct num_components when creating phi in complex unroll
> > - rewrite update remap table based on Jason's suggestions.
> > - remove unrequired extract_loop_body() helper as suggested by Jason.
> > - simplify the lcssa phi fix up code for simple loops as per Jason's
> > suggestions.
> > - use mem context to keep track of hash table memory as suggested
> > by Jason.
> > - move is_{complex,simple}_loop helpers to the unroll code
> > - require nir_metadata_block_index
> > - partially rewrote complex unroll to be simpler and easier to
> > follow.
> > 
> > V4:
> > - use rzalloc() when creating nir_phi_src but not setting pred right away
> >  fixes regression caused by ralloc() no longer zeroing memory.
> > ---
> >  src/compiler/Makefile.sources          |   1 +
> >  src/compiler/nir/nir.h                 |   2 +
> >  src/compiler/nir/nir_opt_loop_unroll.c | 729 +
> >  3 files changed, 732 insertions(+)
> >  create mode 100644 src/compiler/nir/nir_opt_loop_unroll.c
> > 
> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> > index d3e158a..799fb38 100644
> > --- a/src/compiler/Makefile.sources
> > +++ b/src/compiler/Makefile.sources
> > @@ -238,6 +238,7 @@ NIR_FILES = \
> >         nir/nir_opt_dead_cf.c \
> >         nir/nir_opt_gcm.c \
> >         nir/nir_opt_global_to_local.c \
> > +       nir/nir_opt_loop_unroll.c \
> >         nir/nir_opt_peephole_select.c \
> >         nir/nir_opt_remove_phis.c \
> >         nir/nir_opt_undef.c \
> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > index d948e97..b8813e4 100644
> > --- a/src/compiler/nir/nir.h
> > +++ b/src/compiler/nir/nir.h
> > @@ -2573,6 +2573,8 @@ bool nir_opt_dead_cf(nir_shader *shader);
> > 
> >  bool nir_opt_gcm(nir_shader *shader, bool value_number);
> > 
> > +bool nir_opt_loop_unroll(nir_shader *shader, nir_variable_mode indirect_mask);
> > +
> >  bool nir_opt_peephole_select(nir_shader *shader, unsigned limit);
> > 
> >  bool nir_opt_remove_phis(nir_shader *shader);
> > diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c
> > new file mode 100644
> > index 000..8715757
> > --- /dev/null
> > +++ b/src/compiler/nir/nir_opt_loop_unroll.c
> > @@ -0,0 +1,729 @@
> > +/*
> > + * Copyright © 2016 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The 

[Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.

2016-12-09 Thread Kenneth Graunke
A number of games have large arrays of constants, which we promote to
uniforms.  This introduces copies from the uniform array to the original
temporary array.  Normally, copy propagation eliminates those copies,
making everything refer to the uniform array directly.

A number of shaders in "Deus Ex: Mankind Divided" recently exposed a
limitation of copy propagation - if we had any intrinsics (i.e. image
access in a compute shader), we weren't able to get rid of these copies.

That meant that any variable indexing remained on the temporary array
rather than being moved to the uniform array.  i965's scalar backend
currently doesn't support indirect addressing of temporary arrays,
which meant lowering it to if-ladders.  This was horrible.
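The patch's logic can be modelled with a toy "available copies" table. In this sketch (hypothetical names; the real pass keeps `acp` in a hash table, uses its own `kill()`, and also sets `killed_all`), a call to an unknown function discards every copy, while an intrinsic only kills copies involving its return value and its out/inout arguments:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COPIES 16

/* Each entry records that variable lhs currently holds a copy of rhs. */
struct copy { int lhs, rhs; };
struct acp { struct copy e[MAX_COPIES]; int count; };

static void
acp_add(struct acp *t, int lhs, int rhs)
{
   assert(t->count < MAX_COPIES);
   t->e[t->count].lhs = lhs;
   t->e[t->count].rhs = rhs;
   t->count++;
}

/* Drop every copy that reads or writes var. */
static void
acp_kill(struct acp *t, int var)
{
   int kept = 0;
   for (int i = 0; i < t->count; i++) {
      if (t->e[i].lhs != var && t->e[i].rhs != var)
         t->e[kept++] = t->e[i];
   }
   t->count = kept;
}

struct call {
   bool is_intrinsic;
   int ret_var;        /* -1 if there is no return value */
   const int *written; /* variables passed as out/inout arguments */
   int num_written;
};

static void
acp_visit_call(struct acp *t, const struct call *c)
{
   if (!c->is_intrinsic) {
      t->count = 0; /* unknown side effects: kill all copies */
      return;
   }
   if (c->ret_var >= 0)
      acp_kill(t, c->ret_var);
   for (int i = 0; i < c->num_written; i++)
      acp_kill(t, c->written[i]);
}
```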

On Skylake:

total instructions in shared programs: 13700090 -> 13654519 (-0.33%)
instructions in affected programs: 56438 -> 10867 (-80.75%)
helped: 14
HURT: 0

total cycles in shared programs: 288879704 -> 291270232 (0.83%)
cycles in affected programs: 12758080 -> 15148608 (18.74%)
helped: 6
HURT: 8

All shaders helped are compute shaders in Tomb Raider or Deus Ex.

Signed-off-by: Kenneth Graunke 
---
 src/compiler/glsl/opt_copy_propagation.cpp | 31 ++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation.cpp b/src/compiler/glsl/opt_copy_propagation.cpp
index 247c498..2240421 100644
--- a/src/compiler/glsl/opt_copy_propagation.cpp
+++ b/src/compiler/glsl/opt_copy_propagation.cpp
@@ -186,11 +186,34 @@ ir_copy_propagation_visitor::visit_enter(ir_call *ir)
   }
}
 
-   /* Since we're unlinked, we don't (necessarily) know the side effects of
-    * this call.  So kill all copies.
+   /* Since this pass can run when unlinked, we don't (necessarily) know
+    * the side effects of calls.  (When linked, most calls are inlined
+    * anyway, so it doesn't matter much.)
+    *
+    * One place where this does matter is IR intrinsics.  They're never
+    * inlined.  We also know what they do - while some have side effects
+    * (such as image writes), none edit random global variables.  So we
+    * can assume they're side-effect free (other than the return value
+    * and out parameters).
     */
-   _mesa_hash_table_clear(acp, NULL);
-   this->killed_all = true;
+   if (!ir->callee->is_intrinsic()) {
+      _mesa_hash_table_clear(acp, NULL);
+      this->killed_all = true;
+   } else {
+      if (ir->return_deref)
+         kill(ir->return_deref->var);
+
+      foreach_two_lists(formal_node, &ir->callee->parameters,
+                        actual_node, &ir->actual_parameters) {
+         ir_variable *sig_param = (ir_variable *) formal_node;
+         if (sig_param->data.mode == ir_var_function_out ||
+             sig_param->data.mode == ir_var_function_inout) {
+            ir_rvalue *ir = (ir_rvalue *) actual_node;
+            ir_variable *var = ir->variable_referenced();
+            kill(var);
+         }
+      }
+   }
 
return visit_continue_with_parent;
 }
-- 
2.10.2



Re: [Mesa-dev] [PATCH 07/10] nir: add helper for cloning nir_cf_list

2016-12-09 Thread Timothy Arceri
On Fri, 2016-12-09 at 14:20 -0800, Jason Ekstrand wrote:
> On Mon, Dec 5, 2016 at 5:12 PM, Timothy Arceri wrote:
> > V2:
> > - updated to create a generic list clone helper nir_cf_list_clone()
> > - continue to assert on clone when fallback flag not set as
> > suggested
> >   by Jason.
> > ---
> >  src/compiler/nir/nir_clone.c        | 58 +++--
> >  src/compiler/nir/nir_control_flow.h |  3 ++
> >  2 files changed, 52 insertions(+), 9 deletions(-)
> > 
> > diff --git a/src/compiler/nir/nir_clone.c b/src/compiler/nir/nir_clone.c
> > index e6483b1..b9b7829 100644
> > --- a/src/compiler/nir/nir_clone.c
> > +++ b/src/compiler/nir/nir_clone.c
> > @@ -22,7 +22,7 @@
> >   */
> > 
> >  #include "nir.h"
> > -#include "nir_control_flow_private.h"
> > +#include "nir_control_flow.h"
> > 
> >  /* Secret Decoder Ring:
> >   *   clone_foo():
> > @@ -35,6 +35,11 @@ typedef struct {
> >     /* True if we are cloning an entire shader. */
> >     bool global_clone;
> > 
> > +   /* This allows us to clone a loop body without having to add srcs from
> > +    * outside the loop to the remap table. This is useful for loop unrolling.
> 
> It would be better if this comment started with a description of what
> the variable means rather than a use-case.  The other is fine, but we
> should say something such as "allow the clone operation to fall back
> to the original pointer if no clone pointer is found in the remap
> table."
>  
> > +    */
> > +   bool allow_remap_fallback;
> > +
> >     /* maps orig ptr -> cloned ptr: */
> >     struct hash_table *remap_table;
> > 
> > @@ -46,11 +51,19 @@ typedef struct {
> >  } clone_state;
> > 
> >  static void
> > -init_clone_state(clone_state *state, bool global)
> > +init_clone_state(clone_state *state, struct hash_table *remap_table,
> > +                 bool global, bool allow_remap_fallback)
> >  {
> >     state->global_clone = global;
> > -   state->remap_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
> > -                                                _mesa_key_pointer_equal);
> > +   state->allow_remap_fallback = allow_remap_fallback;
> > +
> > +   if (remap_table) {
> > +      state->remap_table = remap_table;
> > +   } else {
> > +      state->remap_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
> > +                                                   _mesa_key_pointer_equal);
> > +   }
> > +
> >     list_inithead(&state->phi_srcs);
> >  }
> > 
> > @@ -72,9 +85,10 @@ _lookup_ptr(clone_state *state, const void *ptr, bool global)
> >        return (void *)ptr;
> > 
> >     entry = _mesa_hash_table_search(state->remap_table, ptr);
> > -   assert(entry && "Failed to find pointer!");
> > -   if (!entry)
> > -      return NULL;
> > +   if (!entry) {
> > +      assert(state->allow_remap_fallback);
> > +      return (void *)ptr;
> > +   }
> > 
> >     return entry->data;
> >  }
> > @@ -613,6 +627,32 @@ fixup_phi_srcs(clone_state *state)
> >     assert(list_empty(&state->phi_srcs));
> >  }
> > 
> > +void
> > +nir_cf_list_clone(nir_cf_list *dst, nir_cf_list *src, nir_cf_node *parent,
> > +                  struct hash_table *remap_table)
> > +{
> > +   exec_list_make_empty(&dst->list);
> > +   dst->impl = src->impl;
> > +
> > +   if (exec_list_is_empty(&src->list))
> > +      return;
> > +
> > +   clone_state state;
> > +   init_clone_state(&state, remap_table, false, true);
> > +
> > +   /* We use the same shader */
> > +   state.ns = src->impl->function->shader;
> > +
> > +   /* Dest list needs to at least have one block */
> 
> I'm confused by this.  If src->list is empty, then we'll bail above
> and never get here.  Or is this just so that the control-flow code
> will be happy?

We exit if the src list is empty but the dst needs to at least have a
block to begin with to make clone_cf_list() happy. I'll add your
comment, thanks.
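The fallback added to `_lookup_ptr()` can be modelled with a linear table in place of the hash table. A sketch with hypothetical names, assuming the same rule as the patch: a missing entry returns the original pointer, and that is only allowed when the caller opted in (as `nir_cf_list_clone()` does when cloning a loop body whose sources may live outside the loop).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ENTRIES 8

/* Hypothetical fixed-size stand-in for the pointer remap hash table. */
struct remap {
   const void *orig[MAX_ENTRIES];
   void *clone[MAX_ENTRIES];
   int count;
   bool allow_fallback; /* models clone_state::allow_remap_fallback */
};

static void
remap_add(struct remap *r, const void *orig, void *clone)
{
   assert(r->count < MAX_ENTRIES);
   r->orig[r->count] = orig;
   r->clone[r->count] = clone;
   r->count++;
}

/* Return the clone recorded for ptr; if there is none, fall back to ptr
 * itself, which is only legal when the caller enabled the fallback. */
static void *
remap_lookup(const struct remap *r, const void *ptr)
{
   for (int i = 0; i < r->count; i++) {
      if (r->orig[i] == ptr)
         return r->clone[i];
   }
   assert(r->allow_fallback);
   return (void *)ptr;
}
```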

>  If that's the case, perhaps we should say something like "The
> control-flow code assumes that the list of cf_nodes always starts and
> ends with a block.  We start by adding an empty block."
> 
> With those two comments addressed, 6 and 7 are
> 
> Reviewed-by: Jason Ekstrand 
> 
> This is much nicer.  Thank you!
>  
> > +   nir_block *nblk = nir_block_create(state.ns);
> > +   nblk->cf_node.parent = parent;
> > +   exec_list_push_tail(&dst->list, &nblk->cf_node.node);
> > +
> > +   clone_cf_list(&state, &dst->list, &src->list);
> > +
> > +   fixup_phi_srcs(&state);
> > +}
> > +
> >  static nir_function_impl *
> >  clone_function_impl(clone_state *state, const nir_function_impl *fi)
> >  {
> > @@ -646,7 +686,7 @@ nir_function_impl *
> >  nir_function_impl_clone(const nir_function_impl *fi)
> >  {
> >     clone_state state;
> > -   init_clone_state(&state, false);
> > +   init_clone_state(&state, NULL, false, false);
> > 
> >     /* We use the same shader */
> >     state.ns = fi->function->shader;
> > @@ -686,7 +726,7 @@ nir_shader *
> >  nir_shader_clone(void *mem_ctx, const 

Re: [Mesa-dev] [PATCH shader-db] run: don't use alloca to avoid random crashes in the GLSL compiler

2016-12-09 Thread Marek Olšák
On Sat, Dec 10, 2016 at 2:07 AM, Timothy Arceri wrote:
> On Wed, 2016-12-07 at 18:33 +0100, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> ---
>>  run.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/run.c b/run.c
>> index 08fd543..ded224a 100644
>> --- a/run.c
>> +++ b/run.c
>> @@ -656,28 +656,32 @@ main(int argc, char **argv)
>>
>>  /* If there's only one GLSL shader, mark it separable so
>>   * inputs and outputs aren't eliminated.
>>   */
>>  if (num_shaders == 1 && type != TYPE_VP && type != TYPE_FP)
>>  use_separate_shader_objects = true;
>>
>>  if (use_separate_shader_objects) {
>>  for (unsigned i = 0; i < num_shaders; i++) {
>>  const char *const_text;
>> -char *text = alloca(shader[i].length + 1);
>> +unsigned size = shader[i].length + 1000;
>
> I was hitting this crash also, shader[i].length + 1 worked for me.
>
> Assuming that changing it to + 1 works for you.
>
> Reviewed-by: Timothy Arceri 

Thanks. I've pushed the commit with that.

I'd also like to have some answers from valgrind, but that thing takes
forever to run.

Marek
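The patch body is only partly quoted above, so this is a sketch of the general idea the subject line describes rather than the committed code: copy the non-terminated shader text into a heap buffer of `length + 1` bytes (the size Timothy confirmed works). Memory from `alloca()` is only released when the enclosing function returns, even when it is called inside a loop, whereas a heap buffer can be freed per shader. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a NUL-terminated heap copy of a shader source that is not itself
 * terminated, or NULL if the allocation fails. */
static char *
copy_shader_source(const char *src, size_t length)
{
   assert(src != NULL);
   char *text = malloc(length + 1);
   if (text == NULL)
      return NULL;
   memcpy(text, src, length);
   text[length] = '\0';
   return text;
}
```

The caller owns the returned buffer and must `free()` it.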


Re: [Mesa-dev] [PATCH 08/10] nir: add a loop unrolling pass

2016-12-09 Thread Jason Ekstrand
Wow!  This is way better than the last time I read through it.  Good work!

Overall, I'm much happier with the code now.  The structure is better, some
of the crazy phi logic is gone, and clone_cf_list is helping a lot.  That
said... I still have a pile of comments.  Most of them are cosmetic, one is
a bug, and one is a suggestion for how to make things simpler.  I think
fixing some of the cosmetic stuff, especially in unroll_complex, will help
with readability a lot.

The suggestion, which I'd like to highlight here, was to take advantage of
nir_repair_ssa and see if we can get rid of a bunch of the phi node pain
from unroll_complex.  Most of the pain in that part of the pass appears to
be in trying to deal with the phi nodes at the end of the pass and put
everything back where it belongs.  The repair_ssa pass (which I think I
wrote after you started on this project) is a tiny pass that takes a shader
that is in "mostly" ssa form and makes it into "proper" NIR ssa.  The only
requirement on the shader is that the sources of the phi nodes point to the
correct SSA values.  Dominance doesn't have to hold and you don't have to
have the whole ladder of phi nodes so long as the data flow is correct.  I
*think* based on my sketchy reading of things, that we should be able to
just apply the final remap table to each of the phis after the loop and
then just run repair_ssa on the shader once we're done unrolling.
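For readers new to the pass: with a trip count known at compile time, complete unrolling replaces the loop by one copy of the body per iteration, each copy consuming the values produced by the previous one. A plain-C analogy (not NIR; function names hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Original loop with a trip count known at compile time (4). */
static int
sum_loop(const int *a)
{
   int sum = 0;
   for (int i = 0; i < 4; i++)
      sum += a[i] * 2;
   return sum;
}

/* Fully unrolled form: one copy of the body per iteration.  Each copy
 * reads the value produced by the previous one (sum1 uses sum0, and so
 * on), which is the bookkeeping a remap table does; the value live after
 * the loop is simply the last copy's result. */
static int
sum_unrolled(const int *a)
{
   assert(a != NULL);
   int sum0 = 0;
   int sum1 = sum0 + a[0] * 2;
   int sum2 = sum1 + a[1] * 2;
   int sum3 = sum2 + a[2] * 2;
   int sum4 = sum3 + a[3] * 2;
   return sum4;
}
```

In SSA terms, the post-loop phi only needs to be redirected to the last copy's value, which is the situation Jason suggests handing to `nir_repair_ssa` once unrolling is done.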

On Mon, Dec 5, 2016 at 5:12 PM, Timothy Arceri  wrote:

> V2:
> - tidy ups suggested by Connor.
> - tidy up cloning logic and handle copy propagation
>  based on suggestion by Connor.
> - use nir_ssa_def_rewrite_uses to fix up lcssa phis
>   suggested by Connor.
> - add support for complex loop unrolling (two terminators)
> - handle case where the ssa def's use outside the loop is already a phi
> - support unrolling loops with multiple terminators when trip count
>   is known for each terminator
>
> V3:
> - set correct num_components when creating phi in complex unroll
> - rewrite update remap table based on Jason's suggestions.
> - remove unrequired extract_loop_body() helper as suggested by Jason.
> - simplify the lcssa phi fix up code for simple loops as per Jason's
> suggestions.
> - use mem context to keep track of hash table memory as suggested by Jason.
> - move is_{complex,simple}_loop helpers to the unroll code
> - require nir_metadata_block_index
> - partially rewrote complex unroll to be simpler and easier to follow.
>
> V4:
> - use rzalloc() when creating nir_phi_src but not setting pred right away
>  fixes regression cause by ralloc() no longer zeroing memory.
> ---
>  src/compiler/Makefile.sources  |   1 +
>  src/compiler/nir/nir.h |   2 +
>  src/compiler/nir/nir_opt_loop_unroll.c | 729
> +
>  3 files changed, 732 insertions(+)
>  create mode 100644 src/compiler/nir/nir_opt_loop_unroll.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index d3e158a..799fb38 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -238,6 +238,7 @@ NIR_FILES = \
> nir/nir_opt_dead_cf.c \
> nir/nir_opt_gcm.c \
> nir/nir_opt_global_to_local.c \
> +   nir/nir_opt_loop_unroll.c \
> nir/nir_opt_peephole_select.c \
> nir/nir_opt_remove_phis.c \
> nir/nir_opt_undef.c \
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index d948e97..b8813e4 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2573,6 +2573,8 @@ bool nir_opt_dead_cf(nir_shader *shader);
>
>  bool nir_opt_gcm(nir_shader *shader, bool value_number);
>
> +bool nir_opt_loop_unroll(nir_shader *shader, nir_variable_mode indirect_mask);
> +
>  bool nir_opt_peephole_select(nir_shader *shader, unsigned limit);
>
>  bool nir_opt_remove_phis(nir_shader *shader);
> diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c
> new file mode 100644
> index 000..8715757
> --- /dev/null
> +++ b/src/compiler/nir/nir_opt_loop_unroll.c
> @@ -0,0 +1,729 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 

[Mesa-dev] [PATCH] mesa: Return LINEAR encoding for winsys FBO depth/stencil.

2016-12-09 Thread Kenneth Graunke
GetFramebufferAttachmentParameteriv should return GL_LINEAR for the
window system default framebuffer's GL_DEPTH or GL_STENCIL attachments
when there are zero depth or stencil bits.

The GL 4.5 spec's GetFramebufferAttachmentParameteriv section says:

"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is not NONE,
 these queries apply to all other framebuffer types:

 [...]

 If attachment is not a color attachment, or no data storage or texture
 image has been specified for the attachment, then params will contain
 the value LINEAR."

Note that we already return LINEAR for the case where there is an actual
depth or stencil renderbuffer attached.  In the case modified by this
patch, FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE returns FRAMEBUFFER_DEFAULT
rather than NONE.

Fixes a CTS test when run in a visual without depth / stencil buffers:
GL45-CTS.gtf30.GL3Tests.framebuffer_srgb.framebuffer_srgb_default_encoding

Signed-off-by: Kenneth Graunke 
---
 src/mesa/main/fbobject.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 64c4ab5..26fc15d 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -3788,8 +3788,13 @@ _mesa_get_framebuffer_attachment_parameter(struct gl_context *ctx,
  goto invalid_pname_enum;
   }
   else if (att->Type == GL_NONE) {
-     _mesa_error(ctx, err, "%s(invalid pname %s)", caller,
-                 _mesa_enum_to_string(pname));
+     if (_mesa_is_winsys_fbo(buffer) &&
+         (attachment == GL_DEPTH || attachment == GL_STENCIL)) {
+        *params = GL_LINEAR;
+     } else {
+        _mesa_error(ctx, err, "%s(invalid pname %s)", caller,
+                    _mesa_enum_to_string(pname));
+     }
   }
   else {
  if (ctx->Extensions.EXT_framebuffer_sRGB) {
-- 
2.10.2
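A standalone sketch of the branch this patch adds for attachments with no storage. The GL token values are the standard ones, defined locally so no GL headers are needed; the struct and function names are hypothetical and far simpler than Mesa's real `_mesa_get_framebuffer_attachment_parameter()`. The function returns false where Mesa would raise the "invalid pname" error instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Standard GL token values, defined locally for a self-contained sketch. */
#define GL_NONE     0x0000
#define GL_DEPTH    0x1801
#define GL_STENCIL  0x1802
#define GL_LINEAR   0x2601

/* Hypothetical, heavily simplified view of one attachment. */
struct attachment_info {
   bool winsys_fbo;      /* belongs to the window-system framebuffer */
   unsigned attachment;  /* GL_DEPTH, GL_STENCIL, or a color attachment */
   unsigned type;        /* GL_NONE when no storage backs the attachment */
};

/* Models the COLOR_ENCODING query for an attachment with no storage. */
static bool
query_encoding_no_storage(const struct attachment_info *att, int *params)
{
   assert(att->type == GL_NONE);
   if (att->winsys_fbo &&
       (att->attachment == GL_DEPTH || att->attachment == GL_STENCIL)) {
      *params = GL_LINEAR;   /* spec: non-color attachments report LINEAR */
      return true;
   }
   return false;             /* error path, left unchanged by the patch */
}
```

User FBOs with an empty depth attachment still take the error path; only the window-system depth/stencil case changes.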

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH shader-db] run: don't use alloca to avoid random crashes in the GLSL compiler

2016-12-09 Thread Timothy Arceri
On Wed, 2016-12-07 at 18:33 +0100, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  run.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/run.c b/run.c
> index 08fd543..ded224a 100644
> --- a/run.c
> +++ b/run.c
> @@ -656,28 +656,32 @@ main(int argc, char **argv)
>  
>  /* If there's only one GLSL shader, mark it separable so
>   * inputs and outputs aren't eliminated.
>   */
>  if (num_shaders == 1 && type != TYPE_VP && type != TYPE_FP)
>  use_separate_shader_objects = true;
>  
>  if (use_separate_shader_objects) {
>  for (unsigned i = 0; i < num_shaders; i++) {
>  const char *const_text;
> -char *text = alloca(shader[i].length + 1);
> +unsigned size = shader[i].length + 1000;

I was hitting this crash too; shader[i].length + 1 worked for me.

Assuming that changing it to + 1 works for you:

Reviewed-by: Timothy Arceri 


> +/* Using alloca crashes in the GLSL compiler.  */
> +char *text = malloc(size);
> +memset(text, 0, size);
>  
>  /* Make it zero-terminated. */
>  memcpy(text, shader[i].text, shader[i].length);
>  text[shader[i].length] = 0;
>  
>  const_text = text;
>  glCreateShaderProgramv(shader[i].type, 1, &const_text);
> +free(text);
>  }
>  } else if (type == TYPE_CORE || type == TYPE_COMPAT) {
>  GLuint prog = glCreateProgram();
>  
>  for (unsigned i = 0; i < num_shaders; i++) {
>  GLuint s = glCreateShader(shader[i].type);
>  glShaderSource(s, 1, &shader[i].text, &shader[i].length);
>  glCompileShader(s);
>  
>  GLint param;
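As Timothy notes, one extra byte for the terminator is enough, and once the terminator is written explicitly the `+ 1000` padding and the `memset()` should not be needed. A minimal sketch of that pattern, using a hypothetical helper name; the caller frees the copy after glCreateShaderProgramv() has consumed it:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated, NUL-terminated copy of the
 * first `len` bytes of `text`.  The caller must free() the result. */
static char *
copy_shader_text(const char *text, size_t len)
{
   char *copy = malloc(len + 1);   /* one extra byte for the terminator */
   if (copy == NULL)
      return NULL;
   memcpy(copy, text, len);
   copy[len] = '\0';
   return copy;
}
```

Heap allocation keeps the copy off the stack, which is the point of replacing alloca here.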


[Mesa-dev] [PATCH] run: stop leaking SSO programs

2016-12-09 Thread Timothy Arceri
This was causing my poor 8GB laptop to run out of memory.
---
 run.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/run.c b/run.c
index 08fd543..6d635c1 100644
--- a/run.c
+++ b/run.c
@@ -670,7 +670,9 @@ main(int argc, char **argv)
 text[shader[i].length] = 0;
 
 const_text = text;
-glCreateShaderProgramv(shader[i].type, 1, &const_text);
+GLuint prog = glCreateShaderProgramv(shader[i].type, 1,
+ &const_text);
+glDeleteProgram(prog);
 }
 } else if (type == TYPE_CORE || type == TYPE_COMPAT) {
 GLuint prog = glCreateProgram();
-- 
2.7.4



Re: [Mesa-dev] [PATCH 0/2] i965: Fix dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer

2016-12-09 Thread Kenneth Graunke
On Friday, December 9, 2016 4:32:53 PM PST Chad Versace wrote:
> The inescapable vortex of HiZ finds me wherever I go...
> 
> This series brings us one step closer to passing the Android N CTS.
> 
> See https://bugs.freedesktop.org/show_bug.cgi?id=98329.
> 
> Chad Versace (2):
>   i965/mt: Disable aux surfaces after making miptree shareable
>   i965/mt: Disable HiZ when sharing depth buffer externally
> 
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> 

It would be good if Topi could take a look too, but these get my

Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH 2/2] i965/mt: Disable HiZ when sharing depth buffer externally (v2)

2016-12-09 Thread Chad Versace
intel_miptree_make_shareable() discarded and disabled CCS. Fix it so
that it discards and disables HiZ too.

Fixes 
dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer
on Skylake.

v2: Actually do what the commit message says. Discard the HiZ buffer.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98329
Cc: Topi Pohjolainen 
Cc: Nanley Chery 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 29 ---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 15404dae32..c4afab94ca 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -945,6 +945,19 @@ intel_miptree_reference(struct intel_mipmap_tree **dst,
*dst = src;
 }
 
+static void
+intel_miptree_hiz_buffer_free(struct intel_miptree_hiz_buffer *hiz_buf)
+{
+   if (hiz_buf == NULL)
+  return;
+
+   if (hiz_buf->mt)
+  intel_miptree_release(&hiz_buf->mt);
+   else
+  drm_intel_bo_unreference(hiz_buf->aux_base.bo);
+
+   free(hiz_buf);
+}
 
 void
 intel_miptree_release(struct intel_mipmap_tree **mt)
@@ -961,13 +974,7 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
   drm_intel_bo_unreference((*mt)->bo);
   intel_miptree_release(&(*mt)->stencil_mt);
   intel_miptree_release(&(*mt)->r8stencil_mt);
-  if ((*mt)->hiz_buf) {
- if ((*mt)->hiz_buf->mt)
-intel_miptree_release(&(*mt)->hiz_buf->mt);
- else
-drm_intel_bo_unreference((*mt)->hiz_buf->aux_base.bo);
- free((*mt)->hiz_buf);
-  }
+  intel_miptree_hiz_buffer_free((*mt)->hiz_buf);
   if ((*mt)->mcs_buf) {
  drm_intel_bo_unreference((*mt)->mcs_buf->bo);
  free((*mt)->mcs_buf);
@@ -2311,6 +2318,8 @@ intel_miptree_all_slices_resolve_color(struct brw_context *brw,
  * Fast color clears are unsafe with shared buffers, so we need to resolve and
  * then discard the MCS buffer, if present.  We also set the no_ccs flag to
  * ensure that no MCS buffer gets allocated in the future.
+ *
+ * HiZ is similarly unsafe with shared buffers.
  */
 void
 intel_miptree_make_shareable(struct brw_context *brw,
@@ -2331,6 +2340,12 @@ intel_miptree_make_shareable(struct brw_context *brw,
   mt->mcs_buf = NULL;
}
 
+   if (mt->hiz_buf) {
+  intel_miptree_all_slices_resolve_depth(brw, mt);
+  intel_miptree_hiz_buffer_free(mt->hiz_buf);
+  mt->hiz_buf = NULL;
+   }
+
mt->disable_aux_buffers = true;
 }
 
-- 
2.11.0.rc2
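A simplified model of what intel_miptree_make_shareable() does with HiZ after this patch: resolve, free through a NULL-safe helper, clear the pointer, and set disable_aux_buffers so nothing re-creates the aux surface. The structs, the refcount field, and the resolve counter are hypothetical stand-ins, not the i965 types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the i965 structures. */
struct bo { int refcount; };

struct hiz_buffer {
   struct bo *bo;            /* stands in for aux_base.bo or the HiZ miptree */
};

struct miptree {
   struct hiz_buffer *hiz_buf;
   bool disable_aux_buffers;
   int resolves;             /* counts depth resolves, for illustration only */
};

static void
bo_unreference(struct bo *bo)
{
   if (bo)
      bo->refcount--;
}

/* NULL-safe free, mirroring the shape of intel_miptree_hiz_buffer_free(). */
static void
hiz_buffer_free(struct hiz_buffer *hiz_buf)
{
   if (hiz_buf == NULL)
      return;
   bo_unreference(hiz_buf->bo);
   free(hiz_buf);
}

/* Resolve first so the main depth surface holds valid data, then drop the
 * aux buffer and forbid allocating a new one later. */
static void
make_shareable(struct miptree *mt)
{
   if (mt->hiz_buf) {
      mt->resolves++;        /* stands in for the all-slices depth resolve */
      hiz_buffer_free(mt->hiz_buf);
      mt->hiz_buf = NULL;
   }
   mt->disable_aux_buffers = true;
}
```

Factoring the free into one helper lets intel_miptree_release() and the shareable path share the same teardown logic.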



[Mesa-dev] [PATCH 2/2] i965/mt: Disable HiZ when sharing depth buffer externally

2016-12-09 Thread Chad Versace
intel_miptree_make_shareable() discarded and disabled CCS. Fix it so
that it discards and disables HiZ too.

Fixes 
dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer
on Skylake.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98329
Cc: Haixia Shi 
Cc: Topi Pohjolainen 
Cc: Nanley Chery mt)
+  intel_miptree_release(&hiz_buf->mt);
+   else
+  drm_intel_bo_unreference(hiz_buf->aux_base.bo);
+
+   free(hiz_buf);
+}
 
 void
 intel_miptree_release(struct intel_mipmap_tree **mt)
@@ -961,13 +974,7 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
   drm_intel_bo_unreference((*mt)->bo);
   intel_miptree_release(&(*mt)->stencil_mt);
   intel_miptree_release(&(*mt)->r8stencil_mt);
-  if ((*mt)->hiz_buf) {
- if ((*mt)->hiz_buf->mt)
-intel_miptree_release(&(*mt)->hiz_buf->mt);
- else
-drm_intel_bo_unreference((*mt)->hiz_buf->aux_base.bo);
- free((*mt)->hiz_buf);
-  }
+  intel_miptree_hiz_buffer_free((*mt)->hiz_buf);
   if ((*mt)->mcs_buf) {
  drm_intel_bo_unreference((*mt)->mcs_buf->bo);
  free((*mt)->mcs_buf);
@@ -2311,6 +2318,8 @@ intel_miptree_all_slices_resolve_color(struct brw_context *brw,
  * Fast color clears are unsafe with shared buffers, so we need to resolve and
  * then discard the MCS buffer, if present.  We also set the no_ccs flag to
  * ensure that no MCS buffer gets allocated in the future.
+ *
+ * HiZ is similarly unsafe with shared buffers.
  */
 void
 intel_miptree_make_shareable(struct brw_context *brw,
-- 
2.11.0.rc2



[Mesa-dev] [PATCH 0/2] i965: Fix dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer

2016-12-09 Thread Chad Versace
The inescapable vortex of HiZ finds me wherever I go...

This series brings us one step closer to passing the Android N CTS.

See https://bugs.freedesktop.org/show_bug.cgi?id=98329.

Chad Versace (2):
  i965/mt: Disable aux surfaces after making miptree shareable
  i965/mt: Disable HiZ when sharing depth buffer externally

 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

-- 
2.11.0.rc2



[Mesa-dev] [PATCH 1/2] i965/mt: Disable aux surfaces after making miptree shareable

2016-12-09 Thread Chad Versace
The entire goal of intel_miptree_make_shareable() is to permanently
disable the miptree's aux surfaces. So set
intel_mipmap_tree::disable_aux_buffers once the function is done
discarding the aux surfaces.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98329
Cc: Haixia Shi 
Cc: Topi Pohjolainen 
Cc: Nanley Chery mcs_buf);
   mt->mcs_buf = NULL;
}
+
+   mt->disable_aux_buffers = true;
 }
 
 
-- 
2.11.0.rc2



Re: [Mesa-dev] [PATCH 01/28] configure: remove unneeded dri3/present proto requirements

2016-12-09 Thread Eric Anholt
Emil Velikov  writes:

> From: Emil Velikov 
>
> All the information required is provided via the respective xcb packages.

I was confused by the commit message, as I thought you meant that xcb
had the depends.  But what it is is that we don't actually include these
headers at all, and we use XCB instead.

1 and 2 are:

Reviewed-by: Eric Anholt 




Re: [Mesa-dev] [PATCH 1/3] mesa: main: use _NEW_MULTISAMPLE for conservative rasterization state

2016-12-09 Thread Marek Olšák
Hi,

If the state change doesn't require any state validation in mesa/main,
it shouldn't flag _NEW_MULTISAMPLE. Instead, a new flag should be
added to gl_driver_flags and used here. The final code:

FLUSH_VERTICES(ctx, 0);
ctx->NewDriverState |= ctx->DriverFlags.NewIntelConservativeRasterization;

In your driver, you can set:
ctx->DriverFlags.NewIntelConservativeRasterization = BRW_/*whatever updates that state*/;

Marek
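A minimal model of this suggestion, assuming hypothetical simplified context structs and a made-up driver bit: core Mesa only ORs the driver-provided bit into NewDriverState, and the driver decides at init time which of its own dirty bits that maps to. FLUSH_VERTICES(ctx, 0) is omitted from the sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver dirty bit; i965 would use whichever BRW_NEW_* bit
 * causes the affected state packets to be re-emitted. */
#define MY_DRIVER_NEW_CONSERVATIVE_RAST (1ull << 5)

/* Simplified stand-ins for the relevant parts of gl_context. */
struct driver_flags {
   uint64_t NewIntelConservativeRasterization;
};

struct context {
   uint64_t NewState;        /* core _NEW_* bits: not touched here */
   uint64_t NewDriverState;  /* driver-owned dirty bits */
   struct driver_flags DriverFlags;
   int IntelConservativeRasterization;
};

/* Driver init: map the state change onto one of the driver's own bits. */
static void
driver_init_flags(struct context *ctx)
{
   ctx->DriverFlags.NewIntelConservativeRasterization =
      MY_DRIVER_NEW_CONSERVATIVE_RAST;
}

/* glEnable/glDisable path: no core validation is needed, so only the
 * driver flag is raised and NewState stays clean. */
static void
set_conservative_rasterization(struct context *ctx, int state)
{
   if (ctx->IntelConservativeRasterization == state)
      return;
   ctx->NewDriverState |= ctx->DriverFlags.NewIntelConservativeRasterization;
   ctx->IntelConservativeRasterization = state;
}
```

Keeping the bit out of NewState avoids triggering core-Mesa revalidation for a change only the driver cares about.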

On Thu, Dec 8, 2016 at 11:59 AM, Lionel Landwerlin
 wrote:
> Suggested by Ilia.
>
> Signed-off-by: Lionel Landwerlin 
> Cc: Ilia Mirkin 
> ---
>  src/mesa/drivers/dri/i965/gen8_ps_state.c | 5 ++---
>  src/mesa/drivers/dri/i965/gen8_sf_state.c | 2 +-
>  src/mesa/main/enable.c| 2 +-
>  3 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> index e43192d..24a062e 100644
> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> @@ -53,7 +53,7 @@ gen8_upload_ps_extra(struct brw_context *brw,
> if (prog_data->persample_dispatch)
>dw1 |= GEN8_PSX_SHADER_IS_PER_SAMPLE;
>
> -   /* _NEW_POLYGON */
> +   /* _NEW_MULTISAMPLE */
> if (prog_data->uses_sample_mask) {
>if (brw->gen >= 9) {
>   if (prog_data->post_depth_coverage)
> @@ -291,8 +291,7 @@ upload_ps_state(struct brw_context *brw)
>
>  const struct brw_tracked_state gen8_ps_state = {
> .dirty = {
> -  .mesa  = _NEW_MULTISAMPLE |
> -   _NEW_POLYGON,
> +  .mesa  = _NEW_MULTISAMPLE,
>.brw   = BRW_NEW_BATCH |
> BRW_NEW_BLORP |
> BRW_NEW_FS_PROG_DATA,
> diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> index afe7b52..f227f33 100644
> --- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> @@ -319,7 +319,7 @@ upload_raster(struct brw_context *brw)
>}
> }
>
> -   /* _NEW_POLYGON */
> +   /* _NEW_MULTISAMPLE */
> if (ctx->IntelConservativeRasterization) {
>if (brw->gen >= 9)
>   dw1 |= GEN9_RASTER_CONSERVATIVE_RASTERIZATION_ENABLE;
> diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c
> index c9f10ab..8440c62 100644
> --- a/src/mesa/main/enable.c
> +++ b/src/mesa/main/enable.c
> @@ -444,7 +444,7 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, GLboolean state)
>  goto invalid_enum_error;
>   if (ctx->IntelConservativeRasterization == state)
>  return;
> - FLUSH_VERTICES(ctx, _NEW_POLYGON);
> + FLUSH_VERTICES(ctx, _NEW_MULTISAMPLE);
>   ctx->IntelConservativeRasterization = state;
>   break;
>case GL_COLOR_LOGIC_OP:
> --
> 2.10.2
>


Re: [Mesa-dev] [PATCH v2 25/25] radeonsi: shrink the GSVS ring to account for the reduced item sizes

2016-12-09 Thread Marek Olšák
For the rest:

Reviewed-by: Marek Olšák 

Marek

On Tue, Dec 6, 2016 at 11:48 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> ---
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index 151ed17..4a7f638 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -1954,21 +1954,21 @@ static bool si_update_gs_ring_buffers(struct si_context *sctx)
> unsigned max_size = ((unsigned)(63.999 * 1024 * 1024) & ~255) * num_se;
>
> /* Calculate the minimum size. */
> unsigned min_esgs_ring_size = align(es->esgs_itemsize * gs_vertex_reuse *
> wave_size, alignment);
>
> /* These are recommended sizes, not minimum sizes. */
> unsigned esgs_ring_size = max_gs_waves * 2 * wave_size *
>   es->esgs_itemsize * gs->gs_input_verts_per_prim;
> unsigned gsvs_ring_size = max_gs_waves * 2 * wave_size *
> - gs->max_gsvs_emit_size * (gs->max_gs_stream + 1);
> + gs->max_gsvs_emit_size;
>
> min_esgs_ring_size = align(min_esgs_ring_size, alignment);
> esgs_ring_size = align(esgs_ring_size, alignment);
> gsvs_ring_size = align(gsvs_ring_size, alignment);
>
> esgs_ring_size = CLAMP(esgs_ring_size, min_esgs_ring_size, max_size);
> gsvs_ring_size = MIN2(gsvs_ring_size, max_size);
>
> /* Some rings don't have to be allocated if shaders don't use them.
>  * (e.g. no varyings between ES and GS or GS and VS)
> --
> 2.7.4
>
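A sketch of the GSVS sizing arithmetic after this patch, assuming a power-of-two alignment; `align_pot()` and the wrapper function are hypothetical, while the 63.999 MiB per-SE cap and the size formula follow the hunk above. The only change from the old code is that `max_gsvs_emit_size` is no longer multiplied by `max_gs_stream + 1`:

```c
#include <assert.h>

/* Round x up to a multiple of `a`, which must be a power of two. */
static unsigned
align_pot(unsigned x, unsigned a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (x + a - 1) & ~(a - 1);
}

/* Recommended GSVS ring size, clamped to the per-SE maximum. */
static unsigned
gsvs_ring_size(unsigned max_gs_waves, unsigned wave_size,
               unsigned max_gsvs_emit_size, unsigned alignment,
               unsigned num_se)
{
   unsigned max_size = ((unsigned)(63.999 * 1024 * 1024) & ~255u) * num_se;
   unsigned size = max_gs_waves * 2 * wave_size * max_gsvs_emit_size;

   size = align_pot(size, alignment);
   return size < max_size ? size : max_size;  /* MIN2() in the driver */
}
```

For one shader engine the cap works out to 67107584 bytes, i.e. 63.999 MiB truncated down to a multiple of 256.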


Re: [Mesa-dev] [PATCH 07/10] nir: add helper for cloning nir_cf_list

2016-12-09 Thread Jason Ekstrand
On Mon, Dec 5, 2016 at 5:12 PM, Timothy Arceri  wrote:

> V2:
> - updated to create a generic list clone helper nir_cf_list_clone()
> - continue to assert on clone when fallback flag not set as suggested
>   by Jason.
> ---
>  src/compiler/nir/nir_clone.c| 58 ++
> +--
>  src/compiler/nir/nir_control_flow.h |  3 ++
>  2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/src/compiler/nir/nir_clone.c b/src/compiler/nir/nir_clone.c
> index e6483b1..b9b7829 100644
> --- a/src/compiler/nir/nir_clone.c
> +++ b/src/compiler/nir/nir_clone.c
> @@ -22,7 +22,7 @@
>   */
>
>  #include "nir.h"
> -#include "nir_control_flow_private.h"
> +#include "nir_control_flow.h"
>
>  /* Secret Decoder Ring:
>   *   clone_foo():
> @@ -35,6 +35,11 @@ typedef struct {
> /* True if we are cloning an entire shader. */
> bool global_clone;
>
> +   /* This allows us to clone a loop body without having to add srcs from
> +* outside the loop to the remap table. This is useful for loop unrolling.
>

It would be better if this comment started with a description of what the
variable means rather than a use-case.  The use-case part is fine, but we
should say something such as "allow the clone operation to fall back to the
original pointer if no clone pointer is found in the remap table."
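A standalone sketch of the fallback semantics described above, with a hypothetical linear pointer map in place of the hash table; the real `_lookup_ptr()` also has early returns (for example for global pointers) that are left out here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size pointer map standing in for the hash table. */
struct ptr_map_entry {
   const void *key;
   void *data;
};

struct clone_state_sketch {
   bool allow_remap_fallback;
   const struct ptr_map_entry *entries;
   size_t num_entries;
};

/* Return the clone of `ptr`.  If no clone exists and fallback is allowed,
 * return the original pointer (e.g. a value defined outside a loop body);
 * otherwise a missing entry is a bug. */
static void *
lookup_ptr(const struct clone_state_sketch *state, const void *ptr)
{
   for (size_t i = 0; i < state->num_entries; i++) {
      if (state->entries[i].key == ptr)
         return state->entries[i].data;
   }
   assert(state->allow_remap_fallback && "Failed to find pointer!");
   return (void *)ptr;
}
```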


> +*/
> +   bool allow_remap_fallback;
> +
> /* maps orig ptr -> cloned ptr: */
> struct hash_table *remap_table;
>
> @@ -46,11 +51,19 @@ typedef struct {
>  } clone_state;
>
>  static void
> -init_clone_state(clone_state *state, bool global)
> +init_clone_state(clone_state *state, struct hash_table *remap_table,
> + bool global, bool allow_remap_fallback)
>  {
> state->global_clone = global;
> -   state->remap_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
> -_mesa_key_pointer_equal);
> +   state->allow_remap_fallback = allow_remap_fallback;
> +
> +   if (remap_table) {
> +  state->remap_table = remap_table;
> +   } else {
> +  state->remap_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
> +                                               _mesa_key_pointer_equal);
> +   }
> +
> list_inithead(&state->phi_srcs);
>  }
>
> @@ -72,9 +85,10 @@ _lookup_ptr(clone_state *state, const void *ptr, bool global)
>return (void *)ptr;
>
> entry = _mesa_hash_table_search(state->remap_table, ptr);
> -   assert(entry && "Failed to find pointer!");
> -   if (!entry)
> -  return NULL;
> +   if (!entry) {
> +  assert(state->allow_remap_fallback);
> +  return (void *)ptr;
> +   }
>
> return entry->data;
>  }
> @@ -613,6 +627,32 @@ fixup_phi_srcs(clone_state *state)
> assert(list_empty(&state->phi_srcs));
>  }
>
> +void
> +nir_cf_list_clone(nir_cf_list *dst, nir_cf_list *src, nir_cf_node *parent,
> +  struct hash_table *remap_table)
> +{
> +   exec_list_make_empty(&dst->list);
> +   dst->impl = src->impl;
> +
> +   if (exec_list_is_empty(&src->list))
> +      return;
> +
> +   clone_state state;
> +   init_clone_state(&state, remap_table, false, true);
> +
> +   /* We use the same shader */
> +   state.ns = src->impl->function->shader;
> +
> +   /* Dest list needs to at least have one block */
>

I'm confused by this.  If src->list is empty, then we'll bail out above and
never get here.  Or is this just so that the control-flow code will be
happy?  If that's the case, perhaps we should say something like "The
control-flow code assumes that the list of cf_nodes always starts and ends
with a block.  We start by adding an empty block."

With those two comments addressed, 6 and 7 are

Reviewed-by: Jason Ekstrand 

This is much nicer.  Thank you!


> +   nir_block *nblk = nir_block_create(state.ns);
> +   nblk->cf_node.parent = parent;
> +   exec_list_push_tail(&dst->list, &nblk->cf_node.node);
> +
> +   clone_cf_list(&state, &dst->list, &src->list);
> +
> +   fixup_phi_srcs(&state);
> +}
> +
>  static nir_function_impl *
>  clone_function_impl(clone_state *state, const nir_function_impl *fi)
>  {
> @@ -646,7 +686,7 @@ nir_function_impl *
>  nir_function_impl_clone(const nir_function_impl *fi)
>  {
> clone_state state;
> -   init_clone_state(&state, false);
> +   init_clone_state(&state, NULL, false, false);
>
> /* We use the same shader */
> state.ns = fi->function->shader;
> @@ -686,7 +726,7 @@ nir_shader *
>  nir_shader_clone(void *mem_ctx, const nir_shader *s)
>  {
> clone_state state;
> -   init_clone_state(&state, true);
> +   init_clone_state(&state, NULL, true, false);
>
> nir_shader *ns = nir_shader_create(mem_ctx, s->stage, s->options, NULL);
> state.ns = ns;
> diff --git a/src/compiler/nir/nir_control_flow.h b/src/compiler/nir/nir_control_flow.h
> index b71382f..b496aec 100644
> --- a/src/compiler/nir/nir_control_flow.h
> +++ b/src/compiler/nir/nir_control_flow.h
> @@ -141,6 +141,9 @@ void nir_cf_reinsert(nir_cf_list *cf_list, nir_cursor cursor);
>
>  void nir_cf_delete(nir_cf_list *cf_list);
>
> +void 

Re: [Mesa-dev] [PATCH v4 7/7] gallium: add pipe_screen::resource_changed callback wrappers

2016-12-09 Thread Marek Olšák
On Tue, Dec 6, 2016 at 5:17 PM, Philipp Zabel  wrote:
> Add resource_changed to the ddebug, rbug, and trace wrappers. Since it
> is optional, there is no need to add it to noop.
>
> Signed-off-by: Philipp Zabel 
> Suggested-by: Nicolai Hähnle 
> ---
>  src/gallium/drivers/ddebug/dd_screen.c | 10 ++
>  src/gallium/drivers/rbug/rbug_screen.c | 11 +++
>  src/gallium/drivers/trace/tr_screen.c  | 20 
>  3 files changed, 41 insertions(+)
>
> diff --git a/src/gallium/drivers/ddebug/dd_screen.c b/src/gallium/drivers/ddebug/dd_screen.c
> index a0c0dd0..3e20abe 100644
> --- a/src/gallium/drivers/ddebug/dd_screen.c
> +++ b/src/gallium/drivers/ddebug/dd_screen.c
> @@ -227,6 +227,15 @@ dd_screen_resource_from_user_memory(struct pipe_screen *_screen,
>  }
>
>  static void
> +dd_screen_resource_changed(struct pipe_screen *_screen,
> +   struct pipe_resource *res)
> +{
> +   struct pipe_screen *screen = dd_screen(_screen)->screen;
> +
> +   screen->resource_changed(screen, res);
> +}
> +
> +static void
>  dd_screen_resource_destroy(struct pipe_screen *_screen,
> struct pipe_resource *res)
>  {
> @@ -385,6 +394,7 @@ ddebug_screen_create(struct pipe_screen *screen)
> dscreen->base.resource_from_handle = dd_screen_resource_from_handle;
> SCR_INIT(resource_from_user_memory);
> dscreen->base.resource_get_handle = dd_screen_resource_get_handle;
> +   dscreen->base.resource_changed = dd_screen_resource_changed;

This should use SCR_INIT, because it's optional.

Marek
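A reduced sketch of why an optional hook should be wrapped conditionally: the wrapper is installed only when the wrapped screen implements the hook, so callers that check the pointer for NULL still see it as absent. The vtable is hypothetical, and `SKETCH_SCR_INIT` only mirrors the intent of ddebug's `SCR_INIT`, not its exact definition:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced screen vtable with one optional callback. */
struct screen {
   void (*resource_changed)(struct screen *s, int res);
   struct screen *wrapped;   /* only used by the wrapper screen */
   int changed_calls;
};

/* A driver that implements the optional hook. */
static void
driver_resource_changed(struct screen *s, int res)
{
   (void)res;
   s->changed_calls++;
}

/* Wrapper: forward to the wrapped screen's implementation. */
static void
wrapper_resource_changed(struct screen *s, int res)
{
   struct screen *inner = s->wrapped;
   inner->resource_changed(inner, res);
}

/* Install the wrapper only if the wrapped screen provides the hook. */
#define SKETCH_SCR_INIT(wrap, inner, member) \
   ((wrap)->member = (inner)->member ? wrapper_##member : NULL)
```

Assigning the wrapper unconditionally, as in the hunk above, would make an absent hook look present and the forwarder would then call through a NULL pointer.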


[Mesa-dev] [PATCH v2 6/7] i965: Enable arb_transform_feedback_overflow_query.

2016-12-09 Thread Rafael Antognolli
This extension adds new query types which can be used to detect overflow
of transform feedback buffers. The new query types are also accepted by
conditional rendering commands.

Signed-off-by: Rafael Antognolli 
---
 docs/features.txt| 2 +-
 docs/relnotes/13.1.0.html| 1 +
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/features.txt b/docs/features.txt
index c27d521..bb7925e 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -303,7 +303,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
   GL_ARB_sparse_texture2not started
   GL_ARB_sparse_texture_clamp   not started
   GL_ARB_texture_filter_minmax  not started
-  GL_ARB_transform_feedback_overflow_query  not started
+  GL_ARB_transform_feedback_overflow_query  DONE (i965/gen7+)
   GL_KHR_blend_equation_advanced_coherent   DONE (i965/gen9+)
   GL_KHR_no_error   not started
   GL_KHR_texture_compression_astc_hdr   DONE (core only)
diff --git a/docs/relnotes/13.1.0.html b/docs/relnotes/13.1.0.html
index 5b8b016..4f52cd1 100644
--- a/docs/relnotes/13.1.0.html
+++ b/docs/relnotes/13.1.0.html
@@ -45,6 +45,7 @@ Note: some of the new features are only available with certain drivers.
 
 
 GL_ARB_post_depth_coverage on i965/gen9+
+GL_ARB_transform_feedback_overflow_query on i965/gen7+
 GL_NV_image_formats on any driver supporting GL_ARB_shader_image_load_store (i965, nvc0, radeonsi, softpipe)
 
 
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index c1f42aa..d5e4164 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -320,6 +320,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.EXT_framebuffer_multisample = true;
   ctx->Extensions.EXT_framebuffer_multisample_blit_scaled = true;
   ctx->Extensions.EXT_transform_feedback = true;
+  ctx->Extensions.ARB_transform_feedback_overflow_query = true;
   ctx->Extensions.OES_depth_texture_cube_map = true;
   ctx->Extensions.OES_sample_variables = true;
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 5/7] i965: Add support for xfb overflow query on conditional render.

2016-12-09 Thread Rafael Antognolli
Enable the use of a transform feedback overflow query with
glBeginConditionalRender. The render commands will only execute if the
query is true (i.e. if there was an overflow).

Use ARB_conditional_render_inverted to change this behavior.

Signed-off-by: Rafael Antognolli 
---
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 111 +++--
 1 file changed, 101 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c
index 122a4ec..db2b722 100644
--- a/src/mesa/drivers/dri/i965/brw_conditional_render.c
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -48,20 +48,83 @@ set_predicate_enable(struct brw_context *brw,
 }
 
 static void
-set_predicate_for_result(struct brw_context *brw,
- struct brw_query_object *query,
- bool inverted)
+set_predicate_for_overflow_query(struct brw_context *brw,
+ struct brw_query_object *query,
+ int stream_start, int count)
 {
-   int load_op;
+   /* R3 = R4 - R3; generated vertices
+* R1 = R2 - R1; written vertices
+* R1 = R3 - R1; there was an overflow on this stream
+* R0 = R0 | R1; accumulate whether there was any overflow
+*/
+   static const uint32_t maths[] = {
+  MI_MATH_ALU2(LOAD, SRCA, R4),
+  MI_MATH_ALU2(LOAD, SRCB, R3),
+  MI_MATH_ALU0(SUB),
+  MI_MATH_ALU2(STORE, R3, ACCU),
+  MI_MATH_ALU2(LOAD, SRCA, R2),
+  MI_MATH_ALU2(LOAD, SRCB, R1),
+  MI_MATH_ALU0(SUB),
+  MI_MATH_ALU2(STORE, R1, ACCU),
+  MI_MATH_ALU2(LOAD, SRCA, R3),
+  MI_MATH_ALU2(LOAD, SRCB, R1),
+  MI_MATH_ALU0(SUB),
+  MI_MATH_ALU2(STORE, R1, ACCU),
+  MI_MATH_ALU2(LOAD, SRCA, R1),
+  MI_MATH_ALU2(LOAD, SRCB, R0),
+  MI_MATH_ALU0(OR),
+  MI_MATH_ALU2(STORE, R0, ACCU),
+   };
 
-   assert(query->bo != NULL);
+   brw_load_register_imm64(brw, HSW_CS_GPR(0), 0ull);
 
-   /* Needed to ensure the memory is coherent for the MI_LOAD_REGISTER_MEM
-* command when loading the values into the predicate source registers for
-* conditional rendering.
-*/
-   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);
+   for (int i = stream_start; i < stream_start + count; i++) {
+  int offset = 32 * i;
+  brw_load_register_mem64(brw,
+  HSW_CS_GPR(1),
+  query->bo,
+  I915_GEM_DOMAIN_INSTRUCTION,
+  0, /* write domain */
+  offset);
+  offset += 8;
+  brw_load_register_mem64(brw,
+  HSW_CS_GPR(2),
+  query->bo,
+  I915_GEM_DOMAIN_INSTRUCTION,
+  0, /* write domain */
+  offset);
+  offset += 8;
+  brw_load_register_mem64(brw,
+  HSW_CS_GPR(3),
+  query->bo,
+  I915_GEM_DOMAIN_INSTRUCTION,
+  0, /* write domain */
+  offset);
+  offset += 8;
+  brw_load_register_mem64(brw,
+  HSW_CS_GPR(4),
+  query->bo,
+  I915_GEM_DOMAIN_INSTRUCTION,
+  0, /* write domain */
+  offset);
 
+  BEGIN_BATCH(1 + ARRAY_SIZE(maths));
+  OUT_BATCH(HSW_MI_MATH | (1 + ARRAY_SIZE(maths) - 2));
+
+  for (int m = 0; m < ARRAY_SIZE(maths); m++)
+ OUT_BATCH(maths[m]);
+
+  ADVANCE_BATCH();
+   }
+
+   brw_load_register_reg64(brw, HSW_CS_GPR(0), MI_PREDICATE_SRC0);
+   brw_load_register_imm64(brw, MI_PREDICATE_SRC1, 0ull);
+}
+
+static void
+set_predicate_for_occlusion_query(struct brw_context *brw,
+  struct brw_query_object *query)
+{
brw_load_register_mem64(brw,
MI_PREDICATE_SRC0,
query->bo,
@@ -74,6 +137,34 @@ set_predicate_for_result(struct brw_context *brw,
I915_GEM_DOMAIN_INSTRUCTION,
0, /* write domain */
8 /* offset */);
+}
+
+static void
+set_predicate_for_result(struct brw_context *brw,
+ struct brw_query_object *query,
+ bool inverted)
+{
+
+   int load_op;
+
+   assert(query->bo != NULL);
+
+   /* Needed to ensure the memory is coherent for the MI_LOAD_REGISTER_MEM
+* command when loading the values into the predicate source registers for
+* conditional rendering.
+*/
+   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);
+
+   switch (query->Base.Target) {
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+  

[Mesa-dev] [PATCH v2 3/7] i965: add plumbing for ARB_transform_feedback_overflow_query.

2016-12-09 Thread Rafael Antognolli
When querying for transform feedback overflow on one or all of the
streams, store the number of generated and written primitives, then
check whether generated == written.
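The overflow test described above can be sketched on the CPU as follows. This is a minimal illustration, not driver code. `xfb_streams_overflowed` is a hypothetical name, and the buffer layout (four 64-bit snapshots per stream, with begin/end pairs at slots 4*i+0/1 and 4*i+2/3) is inferred from the `w_idx`/`g_idx` arithmetic in `write_xfb_overflow_streams()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, four 64-bit snapshots per stream:
 *   results[4*i + 0], results[4*i + 1]: begin/end of the first counter
 *   results[4*i + 2], results[4*i + 3]: begin/end of the second counter
 * A stream overflowed when the two counters advanced by different amounts. */
static bool
xfb_streams_overflowed(const uint64_t *results, int count)
{
   assert(count >= 0);

   for (int i = 0; i < count; i++) {
      const uint64_t *r = &results[4 * i];
      /* Unsigned subtraction, so the deltas wrap modulo 2^64. */
      const uint64_t delta_first = r[1] - r[0];
      const uint64_t delta_second = r[3] - r[2];

      if (delta_first != delta_second)
         return true;   /* one overflowing stream is enough */
   }
   return false;
}
```

Because only the two deltas are compared, it does not matter which of SO_NUM_PRIMS_WRITTEN and SO_PRIM_STORAGE_NEEDED ends up in which pair of slots.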

v2:
- use only SO_PRIM_STORAGE_NEEDED, do not fall back to
  CL_INVOCATION_COUNT. (Kenneth)

Signed-off-by: Rafael Antognolli 
---
 src/mesa/drivers/dri/i965/brw_queryobj.c  |  2 +
 src/mesa/drivers/dri/i965/gen6_queryobj.c | 73 +++
 2 files changed, 75 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index dda17de..40b86a0 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -530,6 +530,8 @@ brw_is_query_pipelined(struct brw_query_object *query)
 
case GL_PRIMITIVES_GENERATED:
case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
case GL_VERTICES_SUBMITTED_ARB:
case GL_PRIMITIVES_SUBMITTED_ARB:
case GL_VERTEX_SHADER_INVOCATIONS_ARB:
diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c
index bbd3c44..98cbbff 100644
--- a/src/mesa/drivers/dri/i965/gen6_queryobj.c
+++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c
@@ -98,6 +98,54 @@ write_xfb_primitives_written(struct brw_context *brw,
}
 }
 
+static void
+write_xfb_overflow_streams(struct gl_context *ctx,
+   drm_intel_bo *bo, int stream, int count,
+   int idx)
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   brw_emit_mi_flush(brw);
+
+   for (int i = 0; i < count; i++) {
+  int w_idx = 4 * i + idx;
+  int g_idx = 4 * i + idx + 2;
+
+  if (brw->gen >= 7) {
+ brw_store_register_mem64(brw, bo,
+  GEN7_SO_NUM_PRIMS_WRITTEN(stream + i),
+  g_idx * sizeof(uint64_t));
+ brw_store_register_mem64(brw, bo,
+  GEN7_SO_PRIM_STORAGE_NEEDED(stream + i),
+  w_idx * sizeof(uint64_t));
+  } else {
+ brw_store_register_mem64(brw, bo,
+  GEN6_SO_NUM_PRIMS_WRITTEN,
+  g_idx * sizeof(uint64_t));
+ brw_store_register_mem64(brw, bo,
+  GEN6_SO_PRIM_STORAGE_NEEDED,
+  w_idx * sizeof(uint64_t));
+  }
+   }
+}
+
+static bool
+check_xfb_overflow_streams(uint64_t *results, int count)
+{
+   bool overflow = false;
+
+   for (int i = 0; i < count; i++) {
+  uint64_t *result_i = &results[4 * i];
+
+  if ((result_i[3] - result_i[2]) != (result_i[1] - result_i[0])) {
+ overflow = true;
+ break;
+  }
+   }
+
+   return overflow;
+}
+
 static inline int
 pipeline_target_to_index(int target)
 {
@@ -225,6 +273,14 @@ gen6_queryobj_get_results(struct gl_context *ctx,
   query->Base.Result = results[1] - results[0];
   break;
 
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+  query->Base.Result = check_xfb_overflow_streams(results, 1);
+  break;
+
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
+  query->Base.Result = check_xfb_overflow_streams(results, MAX_VERTEX_STREAMS);
+  break;
+
case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
   query->Base.Result = (results[1] - results[0]);
   /* Implement the "WaDividePSInvocationCountBy4:HSW,BDW" workaround:
@@ -314,6 +370,14 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q)
   write_xfb_primitives_written(brw, query->bo, query->Base.Stream, 0);
   break;
 
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+  write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, 0);
+  break;
+
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
+  write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, 0);
+  break;
+
case GL_VERTICES_SUBMITTED_ARB:
case GL_PRIMITIVES_SUBMITTED_ARB:
case GL_VERTEX_SHADER_INVOCATIONS_ARB:
@@ -368,6 +432,15 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q)
   write_xfb_primitives_written(brw, query->bo, query->Base.Stream, 1);
   break;
 
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+  write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, 1);
+  break;
+
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
+  write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, 1);
+  break;
+
+  /* calculate overflow here */
case GL_VERTICES_SUBMITTED_ARB:
case GL_PRIMITIVES_SUBMITTED_ARB:
case GL_VERTEX_SHADER_INVOCATIONS_ARB:
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 0/7] Add support for ARB_transform_feedback_overflow_query.

2016-12-09 Thread Rafael Antognolli
Updated version addressing things suggested by Kenneth Graunke.

The series is available on github here:

https://github.com/rantogno/mesa/tree/review/overflow_query-v02

There are also piglit tests available for it here:

https://github.com/rantogno/piglit/tree/review/overflow_query-v02

Regards,
Rafael


Rafael Antognolli (7):
  mesa: Add types for ARB_transform_feedback_overflow_query.
  mesa: Track transform feedback overflow query objects.
  i965: add plumbing for ARB_transform_feedback_overflow_query.
  i965: Add support for xfb overflow on query buffer objects.
  i965: Add support for xfb overflow query on conditional render.
  i965: Enable arb_transform_feedback_overflow_query.
  i965: Enable predicate support on gen >= 8.

 docs/features.txt  |   2 +-
 docs/relnotes/13.1.0.html  |   1 +
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 111 +++--
 src/mesa/drivers/dri/i965/brw_queryobj.c   |   2 +
 src/mesa/drivers/dri/i965/gen6_queryobj.c  |  73 ++
 src/mesa/drivers/dri/i965/hsw_queryobj.c   | 108 
 src/mesa/drivers/dri/i965/intel_extensions.c   |   2 +
 src/mesa/main/condrender.c |   4 +-
 src/mesa/main/extensions_table.h   |   1 +
 src/mesa/main/mtypes.h |   5 +
 src/mesa/main/queryobj.c   |  21 
 src/mesa/state_tracker/st_cb_queryobj.c|   6 ++
 12 files changed, 324 insertions(+), 12 deletions(-)

-- 
2.7.4



[Mesa-dev] [PATCH v2 4/7] i965: Add support for xfb overflow on query buffer objects.

2016-12-09 Thread Rafael Antognolli
Enable getting the results of a transform feedback overflow query with a
buffer object.
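To make the MI_MATH sequence in this patch easier to follow, here is a host-side emulation of what `calc_overflow_to_gpr0()` computes. It is a sketch under stated assumptions: the GPRs are modeled as plain `uint64_t` values (so SUB wraps modulo 2^64, as unsigned C arithmetic does), `gpr0_to_bool()` is assumed to turn any nonzero GPR0 into 1, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of HSW_CS_GPR(0..4), the only registers touched. */
struct gpr_file {
   uint64_t r[5];
};

/* One pass of the ALU program emitted by calc_overflow_for_stream(). */
static void
emulate_calc_overflow_for_stream(struct gpr_file *g)
{
   g->r[3] = g->r[4] - g->r[3];   /* R3 = R4 - R3 */
   g->r[1] = g->r[2] - g->r[1];   /* R1 = R2 - R1 */
   g->r[1] = g->r[3] - g->r[1];   /* R1 = R3 - R1: nonzero if deltas differ */
   g->r[0] = g->r[0] | g->r[1];   /* R0 = R0 | R1: accumulate over streams */
}

/* Mirrors calc_overflow_to_gpr0() followed by an assumed gpr0_to_bool(). */
static bool
emulate_overflow_query(const uint64_t *snapshots, int count)
{
   struct gpr_file g = { { 0 } };   /* GPR0 = 0 before the loop */

   assert(count >= 0);
   for (int i = 0; i < count; i++) {
      /* load_gen_written_data_to_regs(): GPR1..GPR4 <- qwords 4*i..4*i+3 */
      for (int k = 0; k < 4; k++)
         g.r[1 + k] = snapshots[4 * i + k];
      emulate_calc_overflow_for_stream(&g);
   }
   return g.r[0] != 0;
}
```

Accumulating with OR means a single mismatching stream leaves GPR0 nonzero, which gives the same answer as the early exit in check_xfb_overflow_streams() from patch 3/7.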

Signed-off-by: Rafael Antognolli 
---
 src/mesa/drivers/dri/i965/hsw_queryobj.c | 108 +++
 1 file changed, 108 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/hsw_queryobj.c b/src/mesa/drivers/dri/i965/hsw_queryobj.c
index 0da2c3d..0c9dbdc 100644
--- a/src/mesa/drivers/dri/i965/hsw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/hsw_queryobj.c
@@ -187,6 +187,103 @@ gpr0_to_bool(struct brw_context *brw)
 }
 
 static void
+load_gen_written_data_to_regs(struct brw_context *brw,
+  struct brw_query_object *query,
+  int idx)
+{
+   int offset = idx * sizeof(uint64_t) * 4;
+
+   brw_load_register_mem64(brw,
+   HSW_CS_GPR(1),
+   query->bo,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   offset);
+
+   offset += sizeof(uint64_t);
+   brw_load_register_mem64(brw,
+   HSW_CS_GPR(2),
+   query->bo,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   offset);
+
+   offset += sizeof(uint64_t);
+   brw_load_register_mem64(brw,
+   HSW_CS_GPR(3),
+   query->bo,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   offset);
+
+   offset += sizeof(uint64_t);
+   brw_load_register_mem64(brw,
+   HSW_CS_GPR(4),
+   query->bo,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   I915_GEM_DOMAIN_INSTRUCTION,
+   offset);
+}
+
+/*
+ * R3 = R4 - R3;
+ * R1 = R2 - R1;
+ * R1 = R3 - R1;
+ * R0 = R0 | R1;
+ */
+static void
+calc_overflow_for_stream(struct brw_context *brw)
+{
+   static const uint32_t maths[] = {
+  MI_MATH_ALU2(LOAD, SRCA, R4),
+  MI_MATH_ALU2(LOAD, SRCB, R3),
+  MI_MATH_ALU0(SUB),
+  MI_MATH_ALU2(STORE, R3, ACCU),
+  MI_MATH_ALU2(LOAD, SRCA, R2),
+  MI_MATH_ALU2(LOAD, SRCB, R1),
+  MI_MATH_ALU0(SUB),
+  MI_MATH_ALU2(STORE, R1, ACCU),
+  MI_MATH_ALU2(LOAD, SRCA, R3),
+  MI_MATH_ALU2(LOAD, SRCB, R1),
+  MI_MATH_ALU0(SUB),
+  MI_MATH_ALU2(STORE, R1, ACCU),
+  MI_MATH_ALU2(LOAD, SRCA, R1),
+  MI_MATH_ALU2(LOAD, SRCB, R0),
+  MI_MATH_ALU0(OR),
+  MI_MATH_ALU2(STORE, R0, ACCU),
+   };
+
+   BEGIN_BATCH(1 + ARRAY_SIZE(maths));
+   OUT_BATCH(HSW_MI_MATH | (1 + ARRAY_SIZE(maths) - 2));
+
+   for (int m = 0; m < ARRAY_SIZE(maths); m++)
+  OUT_BATCH(maths[m]);
+
+   ADVANCE_BATCH();
+}
+
+static void
+calc_overflow_to_gpr0(struct brw_context *brw, struct brw_query_object *query,
+   int count)
+{
+   brw_load_register_imm64(brw, HSW_CS_GPR(0), 0ull);
+
+   for (int i = 0; i < count; i++) {
+  load_gen_written_data_to_regs(brw, query, i);
+  calc_overflow_for_stream(brw);
+   }
+}
+
+static void
+overflow_result_to_grp0(struct brw_context *brw,
+struct brw_query_object *query,
+int count)
+{
+   calc_overflow_to_gpr0(brw, query, count);
+   gpr0_to_bool(brw);
+}
+
+static void
 hsw_result_to_gpr0(struct gl_context *ctx, struct brw_query_object *query,
struct gl_buffer_object *buf, intptr_t offset,
GLenum pname, GLenum ptype)
@@ -223,6 +320,11 @@ hsw_result_to_gpr0(struct gl_context *ctx, struct brw_query_object *query,
   I915_GEM_DOMAIN_INSTRUCTION,
   I915_GEM_DOMAIN_INSTRUCTION,
   0 * sizeof(uint64_t));
+   } else if (query->Base.Target == GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB
+  || query->Base.Target == GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB) {
+  /* Don't do anything in advance here, since the math for this is a little
+   * more complex.
+   */
} else {
   brw_load_register_mem64(brw,
   HSW_CS_GPR(1),
@@ -274,6 +376,12 @@ hsw_result_to_gpr0(struct gl_context *ctx, struct brw_query_object *query,
case GL_ANY_SAMPLES_PASSED_CONSERVATIVE:
   gpr0_to_bool(brw);
   break;
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+  overflow_result_to_grp0(brw, query, 1);
+  break;
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
+  overflow_result_to_grp0(brw, query, MAX_VERTEX_STREAMS);
+  break;
}
 }
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 7/7] i965: Enable predicate support on gen >= 8.

2016-12-09 Thread Rafael Antognolli
Predication requires the command parser only on Gen7; on newer platforms
it should be available without it.

Signed-off-by: Rafael Antognolli 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index d5e4164..848a8b9 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -407,6 +407,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.OES_geometry_shader = true;
   ctx->Extensions.OES_texture_cube_map_array = true;
   ctx->Extensions.OES_viewport_array = true;
+  brw->predicate.supported = true;
}
 
if (brw->gen >= 9) {
-- 
2.7.4



[Mesa-dev] [PATCH v2 1/7] mesa: Add types for ARB_transform_feedback_overflow_query.

2016-12-09 Thread Rafael Antognolli
Add some basic types and storage for the queries of this extension.

v2:
- update date of extension (Kenneth)

Signed-off-by: Rafael Antognolli 
---
 src/mesa/main/extensions_table.h | 1 +
 src/mesa/main/mtypes.h   | 5 +
 2 files changed, 6 insertions(+)

diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
index 9c3b776..7ddb520 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -161,6 +161,7 @@ EXT(ARB_timer_query , ARB_timer_query
 EXT(ARB_transform_feedback2 , ARB_transform_feedback2 , GLL, GLC,  x ,  x , 2010)
 EXT(ARB_transform_feedback3 , ARB_transform_feedback3 , GLL, GLC,  x ,  x , 2010)
 EXT(ARB_transform_feedback_instanced, ARB_transform_feedback_instanced , GLL, GLC,  x ,  x , 2011)
+EXT(ARB_transform_feedback_overflow_query   , ARB_transform_feedback_overflow_query  , GLL, GLC,  x ,  x , 2014)
 EXT(ARB_transpose_matrix, dummy_true , GLL,  x ,  x ,  x , 1999)
 EXT(ARB_uniform_buffer_object   , ARB_uniform_buffer_object , GLL, GLC,  x ,  x , 2009)
 EXT(ARB_vertex_array_bgra   , EXT_vertex_array_bgra , GLL, GLC,  x ,  x , 2008)
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 71bd89e..19956ab 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3003,6 +3003,10 @@ struct gl_query_state
struct gl_query_object *PrimitivesGenerated[MAX_VERTEX_STREAMS];
struct gl_query_object *PrimitivesWritten[MAX_VERTEX_STREAMS];
 
+   /** GL_ARB_transform_feedback_overflow_query */
+   struct gl_query_object *TransformFeedbackOverflow[MAX_VERTEX_STREAMS];
+   struct gl_query_object *TransformFeedbackOverflowAny;
+
/** GL_ARB_timer_query */
struct gl_query_object *TimeElapsed;
 
@@ -3873,6 +3877,7 @@ struct gl_extensions
GLboolean ARB_transform_feedback2;
GLboolean ARB_transform_feedback3;
GLboolean ARB_transform_feedback_instanced;
+   GLboolean ARB_transform_feedback_overflow_query;
GLboolean ARB_uniform_buffer_object;
GLboolean ARB_vertex_attrib_64bit;
GLboolean ARB_vertex_program;
-- 
2.7.4



[Mesa-dev] [PATCH v2 2/7] mesa: Track transform feedback overflow query objects.

2016-12-09 Thread Rafael Antognolli
Also update checks on conditional rendering.
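The validation rules this patch touches can be summarized with a small standalone sketch. The enum values are placeholders, not real GLenum tokens, and the function names are hypothetical. `begin_query_index_ok` also assumes that targets outside the indexed case list only accept index 0, which is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder identifiers for illustration; not the real GLenum values. */
enum query_target {
   TARGET_SAMPLES_PASSED,
   TARGET_ANY_SAMPLES_PASSED,
   TARGET_ANY_SAMPLES_PASSED_CONSERVATIVE,
   TARGET_XFB_STREAM_OVERFLOW,   /* GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB */
   TARGET_XFB_OVERFLOW,          /* GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB */
   TARGET_PRIMITIVES_GENERATED,
   TARGET_XFB_PRIMITIVES_WRITTEN,
};

/* Same acceptance rule as the updated _mesa_BeginConditionalRender() check:
 * occlusion queries and both overflow queries are allowed, but not while
 * the query is active.  false corresponds to GL_INVALID_OPERATION. */
static bool
condrender_query_ok(enum query_target target, bool active)
{
   switch (target) {
   case TARGET_SAMPLES_PASSED:
   case TARGET_ANY_SAMPLES_PASSED:
   case TARGET_ANY_SAMPLES_PASSED_CONSERVATIVE:
   case TARGET_XFB_STREAM_OVERFLOW:
   case TARGET_XFB_OVERFLOW:
      return !active;
   default:
      return false;
   }
}

/* Index rule from query_error_check_index(): the per-stream overflow query
 * is indexed like PRIMITIVES_GENERATED.  The default branch (index must be
 * 0) is an assumption about code not shown in the diff. */
static bool
begin_query_index_ok(enum query_target target, unsigned index,
                     unsigned max_vertex_streams)
{
   assert(max_vertex_streams > 0);

   switch (target) {
   case TARGET_XFB_PRIMITIVES_WRITTEN:
   case TARGET_PRIMITIVES_GENERATED:
   case TARGET_XFB_STREAM_OVERFLOW:
      return index < max_vertex_streams;
   default:
      return index == 0;
   }
}
```

Only the per-stream variant takes a meaningful index; the all-streams variant binds to the single TransformFeedbackOverflowAny slot, as the get_query_binding_point() hunk shows.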

Signed-off-by: Rafael Antognolli 
---
 src/mesa/main/condrender.c  |  4 +++-
 src/mesa/main/queryobj.c| 21 +
 src/mesa/state_tracker/st_cb_queryobj.c |  6 ++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/condrender.c b/src/mesa/main/condrender.c
index 46c6036..2ea2c88 100644
--- a/src/mesa/main/condrender.c
+++ b/src/mesa/main/condrender.c
@@ -99,7 +99,9 @@ _mesa_BeginConditionalRender(GLuint queryId, GLenum mode)
 */
if ((q->Target != GL_SAMPLES_PASSED &&
 q->Target != GL_ANY_SAMPLES_PASSED &&
-q->Target != GL_ANY_SAMPLES_PASSED_CONSERVATIVE) || q->Active) {
+q->Target != GL_ANY_SAMPLES_PASSED_CONSERVATIVE &&
+q->Target != GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB &&
+q->Target != GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB) || q->Active) {
   _mesa_error(ctx, GL_INVALID_OPERATION, "glBeginConditionalRender()");
   return;
}
diff --git a/src/mesa/main/queryobj.c b/src/mesa/main/queryobj.c
index 1fa0279..e4edb51 100644
--- a/src/mesa/main/queryobj.c
+++ b/src/mesa/main/queryobj.c
@@ -197,6 +197,16 @@ get_query_binding_point(struct gl_context *ctx, GLenum target, GLuint index)
  return &ctx->Query.PrimitivesWritten[index];
   else
  return NULL;
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+  if (ctx->Extensions.ARB_transform_feedback_overflow_query)
+ return &ctx->Query.TransformFeedbackOverflow[index];
+  else
+ return NULL;
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
+  if (ctx->Extensions.ARB_transform_feedback_overflow_query)
+ return &ctx->Query.TransformFeedbackOverflowAny;
+  else
+ return NULL;
 
case GL_VERTICES_SUBMITTED_ARB:
case GL_PRIMITIVES_SUBMITTED_ARB:
@@ -293,6 +303,8 @@ _mesa_CreateQueries(GLenum target, GLsizei n, GLuint *ids)
case GL_TIMESTAMP:
case GL_PRIMITIVES_GENERATED:
case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
   break;
default:
   _mesa_error(ctx, GL_INVALID_ENUM, "glCreateQueries(invalid target = %s)",
@@ -368,6 +380,7 @@ query_error_check_index(struct gl_context *ctx, GLenum target, GLuint index)
switch (target) {
case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
case GL_PRIMITIVES_GENERATED:
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
   if (index >= ctx->Const.MaxVertexStreams) {
  _mesa_error(ctx, GL_INVALID_VALUE,
  "glBeginQueryIndexed(index>=MaxVertexStreams)");
@@ -677,6 +690,14 @@ _mesa_GetQueryIndexediv(GLenum target, GLuint index, GLenum pname,
  case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
 *params = ctx->Const.QueryCounterBits.PrimitivesWritten;
 break;
+ case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+ case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
+/* The minimum value of this is 1 if it's nonzero, and the value
+ * is only ever GL_TRUE or GL_FALSE, so no sense in reporting more
+ * bits.
+ */
+*params = 1;
+break;
  case GL_VERTICES_SUBMITTED_ARB:
 *params = ctx->Const.QueryCounterBits.VerticesSubmitted;
 break;
diff --git a/src/mesa/state_tracker/st_cb_queryobj.c b/src/mesa/state_tracker/st_cb_queryobj.c
index 2489676..b1ac2aa 100644
--- a/src/mesa/state_tracker/st_cb_queryobj.c
+++ b/src/mesa/state_tracker/st_cb_queryobj.c
@@ -114,6 +114,12 @@ st_BeginQuery(struct gl_context *ctx, struct gl_query_object *q)
case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
   type = PIPE_QUERY_PRIMITIVES_EMITTED;
   break;
+   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
+  type = PIPE_QUERY_SO_OVERFLOW_PREDICATE;
+  break;
+   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
+  type = PIPE_QUERY_SO_OVERFLOW_PREDICATE;
+  break;
case GL_TIME_ELAPSED:
   if (st->has_time_elapsed)
  type = PIPE_QUERY_TIME_ELAPSED;
-- 
2.7.4



[Mesa-dev] [PATCH 6/9] i965/fs: Fetch one cacheline of pull constants at a time.

2016-12-09 Thread Francisco Jerez
Asking the DC for less than one cacheline (4 owords) of data for
uniform pull constants is suboptimal because the DC cannot request
less than that from L3, resulting in wasted bandwidth and unnecessary
message dispatch overhead, and exacerbating the IVB L3 serialization
bug.  The following table summarizes the overall framerate improvement
(with statistical significance of 5% and sample size ~10) from the
whole series up to this patch for several benchmarks and hardware
generations:

 | SKL   | BDW  | HSW
SynMark2 OglShMapPcf | 24.63% ±0.45% | 4.01% ±0.70% | 10.31% ±0.38%
GfxBench4 gl_manhattan31 |  5.93% ±0.35% | 3.92% ±0.31% |  6.62% ±0.22%
GfxBench4 gl_4   |  2.52% ±0.44% | 1.23% ±0.10% |  N/A
Unigine Valley   |  0.83% ±0.17% | 0.23% ±0.05% |  0.74% ±0.45%

Note that there are two versions of the Manhattan demo shipped with
GfxBench4: the original gl_manhattan demo, which doesn't use UBOs, so
this patch has no effect on it, and the gl_manhattan31 demo based on
GL 4.3/GLES 3.1, which benefits from this patch as shown above.

I haven't observed any statistically significant regressions in the
benchmarks I have at hand.

Going up to 8 oword blocks would improve performance of pull constants
even more, but at the cost of some additional bandwidth and register
pressure, so it would have to be done on-demand based on the number of
constants actually used by the shader.
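The offset arithmetic in this patch can be checked with a small standalone model. This is a sketch, not driver code. `count_block_loads` and its helpers are hypothetical and only reproduce the loop in `nir_emit_intrinsic()`, parameterized on the block size so the old 16-byte behaviour and the new 64-byte (one cacheline) behaviour can be compared. It assumes offsets are aligned to the component size.

```c
#include <assert.h>

/* Aligned start of the block containing byte offset `base`
 * (the `base & ~(block_sz - 1)` operand of the load). */
static unsigned
block_base(unsigned base, unsigned block_sz)
{
   return base & ~(block_sz - 1);
}

/* Byte offset of `base` inside that block. */
static unsigned
offset_in_block(unsigned base, unsigned block_sz)
{
   return base & (block_sz - 1);
}

/* Number of block-aligned pull constant loads needed to fetch
 * `num_components` components of `type_size` bytes starting at `first`. */
static unsigned
count_block_loads(unsigned block_sz, unsigned first, unsigned num_components,
                  unsigned type_size)
{
   /* The masks above are only valid for power-of-two block sizes. */
   assert(block_sz != 0 && (block_sz & (block_sz - 1)) == 0);
   assert(type_size != 0 && block_sz % type_size == 0);
   assert(first % type_size == 0);   /* otherwise `count` could be 0 */

   unsigned loads = 0;
   for (unsigned c = 0; c < num_components;) {
      const unsigned base = first + c * type_size;
      /* Usable components left in the block containing `base`. */
      unsigned count = (block_sz - offset_in_block(base, block_sz)) / type_size;
      if (count > num_components - c)
         count = num_components - c;

      loads++;
      c += count;
   }
   return loads;
}
```

With 4-byte components, sixteen constants starting at offset 0 take four messages with 16-byte blocks but only one with 64-byte blocks, consistent with the reduced message dispatch overhead described above.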

v2: Fix for Gen4 and 5.
v3: Non-trivial rebase.  Rework to allow the visitor to specify
arbitrary pull constant block sizes.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 21 +
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 16 +---
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b6a571a..0221287 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2111,25 +2111,22 @@ fs_visitor::lower_constant_loads()
  if (pull_index == -1)
continue;
 
- const unsigned index = stage_prog_data->binding_table.pull_constants_start;
- fs_reg dst;
-
- if (type_sz(inst->src[i].type) <= 4)
-dst = vgrf(glsl_type::float_type);
- else
-dst = vgrf(glsl_type::double_type);
-
  assert(inst->src[i].stride == 0);
 
- const fs_builder ubld = ibld.exec_all().group(4, 0);
- struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & ~15);
+ const unsigned index = stage_prog_data->binding_table.pull_constants_start;
+ const unsigned block_sz = 64; /* Fetch one cacheline at a time. */
+ const fs_builder ubld = ibld.exec_all().group(block_sz / 4, 0);
+ const fs_reg dst = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+ const unsigned base = pull_index * 4;
+
  ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
-   dst, brw_imm_ud(index), offset);
+   dst, brw_imm_ud(index), brw_imm_ud(base & ~(block_sz - 1)));
 
  /* Rewrite the instruction to use the temporary VGRF. */
  inst->src[i].file = VGRF;
  inst->src[i].nr = dst.nr;
- inst->src[i].offset = (pull_index & 3) * 4 + inst->src[i].offset % 4;
+ inst->src[i].offset = (base & (block_sz - 1)) +
+   inst->src[i].offset % 4;
 
  brw_mark_surface_used(prog_data, index);
   }
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 7e00086..e97cae3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -4059,21 +4059,23 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
   * and we have to split it if necessary.
   */
  const unsigned type_size = type_sz(dest.type);
- const fs_builder ubld = bld.exec_all().group(4, 0);
- const fs_reg packed_consts = ubld.vgrf(BRW_REGISTER_TYPE_F);
+ const unsigned block_sz = 64; /* Fetch one cacheline at a time. */
+ const fs_builder ubld = bld.exec_all().group(block_sz / 4, 0);
+ const fs_reg packed_consts = ubld.vgrf(BRW_REGISTER_TYPE_UD);
 
  for (unsigned c = 0; c < instr->num_components;) {
 const unsigned base = const_offset->u32[0] + c * type_size;
-
-/* Number of usable components in the next 16B-aligned load */
+/* Number of usable components in the next block-aligned load. */
 const unsigned count = MIN2(instr->num_components - c,
-(16 - base % 16) / type_size);
+(block_sz - base % block_sz) / type_size);
 
 ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
-  packed_consts, surf_index, brw_imm_ud(base & ~15));
+  

[Mesa-dev] [PATCH 0/9] i965/fs: Uniform pull constant loads through the constant cache.

2016-12-09 Thread Francisco Jerez
This is a respin of a series I sent nearly two years ago
reimplementing uniform pull constant loads in terms of constant cache
block read messages instead of using sampler LD messages.  The
motivation is that oword block read messages are able to fetch more
data with a single message than the current SIMD4x2 sampler LD
messages, and they don't contribute to thrashing of the sampler
caches, which can lead to performance problems with several workloads.
Here is a summary of the benchmarks that are improved by this series
along with an estimate of their standard deviation (see PATCH 6 for
more details):

   | SKL   | BDW  | HSW
  SynMark2 OglShMapPcf | 24.63% ±0.45% | 4.01% ±0.70% | 10.31% ±0.38%
  GfxBench4 gl_manhattan31 |  5.93% ±0.35% | 3.92% ±0.31% |  6.62% ±0.22%
  GfxBench4 gl_4   |  2.52% ±0.44% | 1.23% ±0.10% |  N/A
  Unigine Valley   |  0.83% ±0.17% | 0.23% ±0.05% |  0.74% ±0.45%

I'm resending the series because Mark pointed out that, during some
expensive draw calls of the Manhattan demo, the i965 driver generates
more sampler traffic than the proprietary driver, while its non-sampler
shader memory access counts are lower (in fact zero).  The original
Manhattan demo I tried two years ago wasn't affected by the change
because it didn't use UBOs at all, but the newer gl_manhattan31 demo
based on GL 4.3/GLES 3.1 does, as the table above shows.

The series should be roughly functionally equivalent to the last
revision, but rebased two years forward in time, which involved
nearly rewriting some of the patches.  Along the way I made things
slightly more flexible: the oword block read size can now be specified
arbitrarily by the back-end, which should ease future extension to a
larger block size, or to a smaller one in order to minimize register
pressure.

 src/mesa/drivers/dri/i965/brw_defines.h  |   7 ++-
 src/mesa/drivers/dri/i965/brw_disasm.c   |   1 +
 src/mesa/drivers/dri/i965/brw_eu.h   |   1 +
 src/mesa/drivers/dri/i965/brw_eu_emit.c  |  97 +++--
 src/mesa/drivers/dri/i965/brw_fs.cpp |  63 +--
 src/mesa/drivers/dri/i965/brw_fs.h   |   5 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 108 
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  19 +++
 src/mesa/drivers/dri/i965/brw_pipe_control.c |   1 +
 src/mesa/drivers/dri/i965/brw_shader.cpp |   2 --
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  15 ---
 11 files changed, 113 insertions(+), 206 deletions(-)

[PATCH 1/9] i965/gen6+: Invalidate constant cache on brw_emit_mi_flush().
[PATCH 2/9] i965: Let the caller of brw_set_dp_write/read_message control the target cache.
[PATCH 3/9] i965/fs: Switch to the constant cache for uniform pull constants.
[PATCH 4/9] i965: Factor out oword block read and write message control calculation.
[PATCH 5/9] i965/fs: Expose arbitrary pull constant load sizes to the IR.
[PATCH 6/9] i965/fs: Fetch one cacheline of pull constants at a time.
[PATCH 7/9] i965/fs: Drop useless access mode override from pull constant generator code.
[PATCH 8/9] i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.
[PATCH 9/9] i965/disasm: Decode dataport constant cache control fields.


[Mesa-dev] [PATCH 9/9] i965/disasm: Decode dataport constant cache control fields.

2016-12-09 Thread Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index 5e51be7..5930e44 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -1410,6 +1410,7 @@ brw_disassemble_inst(FILE *file, const struct gen_device_info *devinfo,
 }
 break;
  case GEN6_SFID_DATAPORT_SAMPLER_CACHE:
+ case GEN6_SFID_DATAPORT_CONSTANT_CACHE:
 /* aka BRW_SFID_DATAPORT_READ on Gen4-5 */
 if (devinfo->gen >= 6) {
format(file, " (%"PRIu64", %"PRIu64", %"PRIu64", %"PRIu64")",
-- 
2.10.2



[Mesa-dev] [PATCH 4/9] i965: Factor out oword block read and write message control calculation.

2016-12-09 Thread Francisco Jerez
We'll need roughly the same logic in other places and it would be
annoying to duplicate it.  Instead factor it out into a function-like
macro that takes the number of dwords per block (which will prove more
convenient than taking the same value in owords or some other unit).
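A function-style sketch of the new macro, together with the open-coded mapping it replaces, shows why the call sites pass `num_regs * 8`: one GRF is 32 bytes, i.e. 8 dwords or 2 owords. The constants are local copies and the function names are hypothetical; the 2/4/8-oword values come from the `brw_defines.h` hunk in this patch, while the value 0 for the one-oword (low half) encoding is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Local copies of the oword block message-control encodings. */
enum {
   OWORD_BLOCK_1_OWORDLOW = 0,   /* assumed value */
   OWORD_BLOCK_2_OWORDS = 2,
   OWORD_BLOCK_4_OWORDS = 3,
   OWORD_BLOCK_8_OWORDS = 4,
};

/* Function form of BRW_DATAPORT_OWORD_BLOCK_DWORDS(n): one oword is four
 * dwords, so n dwords select an n / 4 oword block. */
static unsigned
oword_block_dwords(unsigned n)
{
   switch (n) {
   case 4:  return OWORD_BLOCK_1_OWORDLOW;
   case 8:  return OWORD_BLOCK_2_OWORDS;
   case 16: return OWORD_BLOCK_4_OWORDS;
   case 32: return OWORD_BLOCK_8_OWORDS;
   default: abort();   /* same failure mode as the macro */
   }
}

/* The mapping the patch removes from the scratch read/write helpers. */
static unsigned
old_scratch_msg_control(unsigned num_regs)
{
   const unsigned msg_control =
      (num_regs == 1 ? OWORD_BLOCK_2_OWORDS :
       num_regs == 2 ? OWORD_BLOCK_4_OWORDS :
       num_regs == 4 ? OWORD_BLOCK_8_OWORDS : 0);
   assert(msg_control);
   return msg_control;
}
```

Taking the size in dwords also lets the same helper express a single-oword block, which the old scratch-only mapping had no case for.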
---
 src/mesa/drivers/dri/i965/brw_defines.h |  6 ++
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 ++
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index cae8e9a..1c638a0 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1669,6 +1669,12 @@ enum brw_message_target {
 #define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS 2
 #define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS 3
 #define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS 4
+#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n)  \
+   ((n) == 4 ? BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW :\
+(n) == 8 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :  \
+(n) == 16 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : \
+(n) == 32 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : \
+(abort(), ~0))
 
 #define BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD 0
 #define BRW_DATAPORT_OWORD_DUAL_BLOCK_4OWORDS2
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 341f543..6141bfb 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2056,11 +2056,6 @@ void brw_oword_block_write_scratch(struct brw_codegen *p,
mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
 
const unsigned mlen = 1 + num_regs;
-   const unsigned msg_control =
-  (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :
-   num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS :
-   num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0);
-   assert(msg_control);
 
/* Set up the message header.  This is g0, with g0.2 filled with
 * the offset.  We don't want to leave our offset around in g0 or
@@ -2134,7 +2129,7 @@ void brw_oword_block_write_scratch(struct brw_codegen *p,
   brw_set_dp_write_message(p,
   insn,
brw_scratch_surface_idx(p),
-  msg_control,
+  BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8),
   msg_type,
target_cache,
   mlen,
@@ -2181,11 +2176,6 @@ brw_oword_block_read_scratch(struct brw_codegen *p,
dest = retype(dest, BRW_REGISTER_TYPE_UW);
 
const unsigned rlen = num_regs;
-   const unsigned msg_control =
-  (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :
-   num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS :
-   num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0);
-   assert(msg_control);
const unsigned target_cache =
   (devinfo->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE :
devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE :
@@ -,7 +2212,7 @@ brw_oword_block_read_scratch(struct brw_codegen *p,
   brw_set_dp_read_message(p,
  insn,
   brw_scratch_surface_idx(p),
- msg_control,
+ BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8),
  BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ, /* msg_type */
  target_cache,
  1, /* msg_length */
-- 
2.10.2



[Mesa-dev] [PATCH 7/9] i965/fs: Drop useless access mode override from pull constant generator code.

2016-12-09 Thread Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index e73f2ca..6565f4d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1175,7 +1175,6 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
 
   brw_push_insn_state(p);
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
-  brw_set_default_access_mode(p, BRW_ALIGN_1);
 
   /* a0.0 = surf_index & 0xff */
   brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
@@ -1311,7 +1310,6 @@ fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
 
   brw_push_insn_state(p);
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
-  brw_set_default_access_mode(p, BRW_ALIGN_1);
 
   /* a0.0 = surf_index & 0xff */
   brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
-- 
2.10.2



[Mesa-dev] [PATCH 2/9] i965: Let the caller of brw_set_dp_write/read_message control the target cache.

2016-12-09 Thread Francisco Jerez
brw_set_dp_read_message already had a target_cache argument, but its
interpretation was rather convoluted (on Gen6 the render cache was
used if the caller asked for it; otherwise the argument was ignored and
the sampler cache was used instead), and the constant cache wasn't
representable at all.  brw_set_dp_write_message used the data cache on
Gen7+ except for RENDER_TARGET_WRITE messages, in which case it used
the render cache.  On Gen6 the render cache was always used.

Instead of the above, have the caller provide the shared unit SFID it
expects to be used.  This makes no functional changes.
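The claim that the rework makes no functional changes can be illustrated by modelling the old and new SFID selection for write messages. This is a sketch with placeholder SFID values and hypothetical function names; it assumes each caller now passes the cache the old code would have chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder SFIDs; the real values live in brw_defines.h. */
enum sfid {
   SFID_DATAPORT_WRITE,   /* stands in for BRW_SFID_DATAPORT_WRITE (Gen4-5) */
   SFID_RENDER_CACHE,     /* stands in for GEN6_SFID_DATAPORT_RENDER_CACHE */
   SFID_DATA_CACHE,       /* stands in for GEN7_SFID_DATAPORT_DATA_CACHE */
};

/* Old rule: brw_set_dp_write_message() chose the cache from gen and type. */
static enum sfid
old_write_sfid(int gen, bool is_rt_write)
{
   assert(gen >= 4);

   if (gen >= 7)
      return is_rt_write ? SFID_RENDER_CACHE : SFID_DATA_CACHE;
   else if (gen == 6)
      return SFID_RENDER_CACHE;
   else
      return SFID_DATAPORT_WRITE;
}

/* New rule: use the caller's target_cache on Gen6+, ignore it on Gen4-5. */
static enum sfid
new_write_sfid(int gen, enum sfid target_cache)
{
   return gen >= 6 ? target_cache : SFID_DATAPORT_WRITE;
}
```

Moving the choice to the caller is what makes the constant cache selectable later in the series, since the helper no longer needs to know which message types belong to which cache.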

v3: Non-trivial rebase.
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  1 +
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 69 +++-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 15 --
 3 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
index 737a335..c44896b 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -233,6 +233,7 @@ void brw_set_dp_write_message(struct brw_codegen *p,
  unsigned binding_table_index,
  unsigned msg_control,
  unsigned msg_type,
+  unsigned target_cache,
  unsigned msg_length,
  bool header_present,
  unsigned last_render_target,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index ca04221..72b6df6 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -706,6 +706,7 @@ brw_set_dp_write_message(struct brw_codegen *p,
 unsigned binding_table_index,
 unsigned msg_control,
 unsigned msg_type,
+ unsigned target_cache,
 unsigned msg_length,
 bool header_present,
 unsigned last_render_target,
@@ -714,20 +715,8 @@ brw_set_dp_write_message(struct brw_codegen *p,
 unsigned send_commit_msg)
 {
const struct gen_device_info *devinfo = p->devinfo;
-   unsigned sfid;
-
-   if (devinfo->gen >= 7) {
-  /* Use the Render Cache for RT writes; otherwise use the Data Cache */
-  if (msg_type == GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE)
-sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
-  else
-sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
-   } else if (devinfo->gen == 6) {
-  /* Use the render cache for all write messages. */
-  sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
-   } else {
-  sfid = BRW_SFID_DATAPORT_WRITE;
-   }
+   const unsigned sfid = (devinfo->gen >= 6 ? target_cache :
+  BRW_SFID_DATAPORT_WRITE);
 
brw_set_message_descriptor(p, insn, sfid, msg_length, response_length,
  header_present, end_of_thread);
@@ -753,26 +742,8 @@ brw_set_dp_read_message(struct brw_codegen *p,
unsigned response_length)
 {
const struct gen_device_info *devinfo = p->devinfo;
-   unsigned sfid;
-
-   if (devinfo->gen >= 7) {
-  if (target_cache == BRW_DATAPORT_READ_TARGET_RENDER_CACHE)
- sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
-  else if (target_cache == BRW_DATAPORT_READ_TARGET_DATA_CACHE)
- sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
-  else if (target_cache == BRW_DATAPORT_READ_TARGET_SAMPLER_CACHE)
- sfid = GEN6_SFID_DATAPORT_SAMPLER_CACHE;
-  else
- unreachable("Invalid target cache");
-
-   } else if (devinfo->gen == 6) {
-  if (target_cache == BRW_DATAPORT_READ_TARGET_RENDER_CACHE)
-sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
-  else
-sfid = GEN6_SFID_DATAPORT_SAMPLER_CACHE;
-   } else {
-  sfid = BRW_SFID_DATAPORT_READ;
-   }
+   const unsigned sfid = (devinfo->gen >= 6 ? target_cache :
+  BRW_SFID_DATAPORT_READ);
 
brw_set_message_descriptor(p, insn, sfid, msg_length, response_length,
  header_present, false);
@@ -2073,6 +2044,10 @@ void brw_oword_block_write_scratch(struct brw_codegen *p,
   unsigned offset)
 {
const struct gen_device_info *devinfo = p->devinfo;
+   const unsigned target_cache =
+  (devinfo->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE :
+   devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE :
+   BRW_DATAPORT_READ_TARGET_RENDER_CACHE);
uint32_t msg_type;
 
if (devinfo->gen >= 6)
@@ -2161,6 +2136,7 @@ void brw_oword_block_write_scratch(struct brw_codegen *p,
brw_scratch_surface_idx(p),
   msg_control,
   msg_type,
+   target_cache,
   mlen,
 

[Mesa-dev] [PATCH 8/9] i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.

2016-12-09 Thread Francisco Jerez
Not used anymore.  It was just a scalar MOV.
---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 -
 src/mesa/drivers/dri/i965/brw_fs.h |  3 ---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27 --
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  2 --
 4 files changed, 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 1c638a0..953f457 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1119,7 +1119,6 @@ enum opcode {
FS_OPCODE_MOV_DISPATCH_TO_FLAGS,
FS_OPCODE_DISCARD_JUMP,
FS_OPCODE_SET_SAMPLE_ID,
-   FS_OPCODE_SET_SIMD4X2_OFFSET,
FS_OPCODE_PACK_HALF_2x16_SPLIT,
FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X,
FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 2b02458..6e83f71 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -442,9 +442,6 @@ private:
struct brw_reg src0,
struct brw_reg src1);
 
-   void generate_set_simd4x2_offset(fs_inst *inst,
-struct brw_reg dst,
-struct brw_reg offset);
void generate_discard_jump(fs_inst *inst);
 
void generate_pack_half_2x16_split(fs_inst *inst,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 6565f4d..4f83efc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1379,29 +1379,6 @@ fs_generator::generate_pixel_interpolator_query(fs_inst *inst,
  inst->size_written / REG_SIZE);
 }
 
-
-/**
- * Sets the first word of a vgrf for gen7+ simd4x2 uniform pull constant
- * sampler LD messages.
- *
- * We don't want to bake it into the send message's code generation because
- * that means we don't get a chance to schedule the instructions.
- */
-void
-fs_generator::generate_set_simd4x2_offset(fs_inst *inst,
-  struct brw_reg dst,
-  struct brw_reg value)
-{
-   assert(value.file == BRW_IMMEDIATE_VALUE);
-
-   brw_push_insn_state(p);
-   brw_set_default_exec_size(p, BRW_EXECUTE_8);
-   brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
-   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
-   brw_MOV(p, retype(brw_vec1_reg(dst.file, dst.nr, 0), value.type), value);
-   brw_pop_insn_state(p);
-}
-
 /* Sets vstride=1, width=4, hstride=0 of register src1 during
  * the ADD instruction.
  */
@@ -2004,10 +1981,6 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
  brw_memory_fence(p, dst);
  break;
 
-  case FS_OPCODE_SET_SIMD4X2_OFFSET:
- generate_set_simd4x2_offset(inst, dst, src[0]);
- break;
-
   case SHADER_OPCODE_FIND_LIVE_CHANNEL: {
  const struct brw_reg mask =
 brw_stage_has_packed_dispatch(devinfo, stage,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 25f745d..afab4aa 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -363,8 +363,6 @@ brw_instruction_name(const struct gen_device_info *devinfo, enum opcode op)
 
case FS_OPCODE_SET_SAMPLE_ID:
   return "set_sample_id";
-   case FS_OPCODE_SET_SIMD4X2_OFFSET:
-  return "set_simd4x2_offset";
 
case FS_OPCODE_PACK_HALF_2x16_SPLIT:
   return "pack_half_2x16_split";
-- 
2.10.2



[Mesa-dev] [PATCH 5/9] i965/fs: Expose arbitrary pull constant load sizes to the IR.

2016-12-09 Thread Francisco Jerez
Change the FS generator to ask the dataport for enough owords worth of
constants to fill the execution size of the instruction, which means
that the visitor now needs to set the execution size correctly for
uniform pull constant load instructions, something we were kind of
neglecting until now.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c| 15 +++---
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27 --
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  9 +
 4 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 6141bfb..8536a13 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2256,7 +2256,7 @@ gen7_block_read_scratch(struct brw_codegen *p,
 }
 
 /**
- * Read a float[4] vector from the data port constant cache.
+ * Read float[4] vectors from the data port constant cache.
  * Location (in buffer) should be a multiple of 16.
  * Used for fetching shader constants.
  */
@@ -2270,6 +2270,7 @@ void brw_oword_block_read(struct brw_codegen *p,
const unsigned target_cache =
   (devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_CONSTANT_CACHE :
BRW_DATAPORT_READ_TARGET_DATA_CACHE);
+   const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
 
/* On newer hardware, offset is in units of owords. */
if (devinfo->gen >= 6)
@@ -2278,11 +2279,12 @@ void brw_oword_block_read(struct brw_codegen *p,
mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
 
brw_push_insn_state(p);
-   brw_set_default_exec_size(p, BRW_EXECUTE_8);
brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
brw_set_default_mask_control(p, BRW_MASK_DISABLE);
 
+   brw_push_insn_state(p);
+   brw_set_default_exec_size(p, BRW_EXECUTE_8);
brw_MOV(p, mrf, retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
 
/* set message header global offset field (reg 0, element 2) */
@@ -2291,6 +2293,7 @@ void brw_oword_block_read(struct brw_codegen *p,
   mrf.nr,
   2), BRW_REGISTER_TYPE_UD),
   brw_imm_ud(offset));
+   brw_pop_insn_state(p);
 
brw_inst *insn = next_insn(p, BRW_OPCODE_SEND);
 
@@ -2305,15 +2308,13 @@ void brw_oword_block_read(struct brw_codegen *p,
   brw_inst_set_base_mrf(devinfo, insn, mrf.nr);
}
 
-   brw_set_dp_read_message(p,
-  insn,
-  bind_table_index,
-  BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW,
+   brw_set_dp_read_message(p, insn, bind_table_index,
+   BRW_DATAPORT_OWORD_BLOCK_DWORDS(exec_size),
   BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ,
   target_cache,
   1, /* msg_length */
true, /* header_present */
-  1); /* response_length (1 reg, 2 owords!) */
+  DIV_ROUND_UP(exec_size, 8)); /* response_length */
 
brw_pop_insn_state(p);
 }
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 819d256..b6a571a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2121,7 +2121,7 @@ fs_visitor::lower_constant_loads()
 
  assert(inst->src[i].stride == 0);
 
- const fs_builder ubld = ibld.exec_all().group(8, 0);
+ const fs_builder ubld = ibld.exec_all().group(4, 0);
  struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & ~15);
  ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
dst, brw_imm_ud(index), offset);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 24bec5f..e73f2ca 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1127,6 +1127,7 @@ fs_generator::generate_uniform_pull_constant_load(fs_inst *inst,
   struct brw_reg index,
   struct brw_reg offset)
 {
+   assert(type_sz(dst.type) == 4);
assert(inst->mlen != 0);
 
assert(index.file == BRW_IMMEDIATE_VALUE &&
@@ -1149,27 +1150,25 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
 {
assert(index.type == BRW_REGISTER_TYPE_UD);
assert(payload.file == BRW_GENERAL_REGISTER_FILE);
+   assert(type_sz(dst.type) == 4);
 
if (index.file == BRW_IMMEDIATE_VALUE) {
   const uint32_t surf_index = index.ud;
 
   brw_push_insn_state(p);
-  brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
   brw_inst *send = brw_next_insn(p, 

[Mesa-dev] [PATCH 3/9] i965/fs: Switch to the constant cache for uniform pull constants.

2016-12-09 Thread Francisco Jerez
This reverts to using the oword block read messages for uniform pull
constant loads, as used to be the case until
4c1fdae0a01b3f92ec03b61aac1d3df5.  There are two important differences
though: Now the L3 cacheability bits are set up correctly for UBOs
(since 11f5d8a5d4fbb861ec161f68593e429cbd65d1cd), and we target the
constant cache instead of the data cache.  The latter used to get no
L3 way allocation on boot on all platforms that existed at the time,
so oword read messages wouldn't get cached on L3 regardless of the
MOCS bits, which probably explains the apparent slowness of oword
fetches.

Constant cache loads seem to perform better than SIMD4x2 sampler loads
in a number of cases: they alleviate some of the cache thrashing
caused by the competition with textures for the L1/L2 sampler caches,
and they allow fetching up to 128B worth of constants with a single
oword fetch message.

Note that IVB devices suffer from a hardware bug that leads to
serialization of L3 read requests overlapping the same cacheline, as a
result of an L3 mechanism for preserving coherency that is buggy on
IVB.  Since read requests for matching cachelines from any L3 client
are not pipelined, throughput may decrease in cases where the queue
holds no non-overlapping requests that could be processed between
them.

This situation should be relatively uncommon as long as we make sure
that we don't use the 1- or 2-oword messages in cases where the shader
intends to read from any other location of the same cacheline at some
other point.  This is generally a good idea anyway on all generations
because using the 1 and 2 oword messages is expected to waste
bandwidth since the minimum L3 request size for the DC is exactly 4
owords (i.e. one cacheline).  A future commit will have this effect.
I haven't been able to find any real-world example where this would
still result in a regression on IVB, but if someone happens to find
one it shouldn't be too difficult to add an IVB-specific check to have
it fall back to the sampler cache for pull constant loads.

v3: Non-trivial rebase.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c|  5 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 42 +++---
 src/mesa/drivers/dri/i965/brw_fs.h |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 78 +-
 4 files changed, 36 insertions(+), 91 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 72b6df6..341f543 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2266,7 +2266,7 @@ gen7_block_read_scratch(struct brw_codegen *p,
 }
 
 /**
- * Read a float[4] vector from the data port Data Cache (const buffer).
+ * Read a float[4] vector from the data port constant cache.
  * Location (in buffer) should be a multiple of 16.
  * Used for fetching shader constants.
  */
@@ -2278,8 +2278,7 @@ void brw_oword_block_read(struct brw_codegen *p,
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned target_cache =
-  (devinfo->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE :
-   devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_SAMPLER_CACHE :
+  (devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_CONSTANT_CACHE :
BRW_DATAPORT_READ_TARGET_DATA_CACHE);
 
/* On newer hardware, offset is in units of owords. */
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b5d1381..819d256 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3202,44 +3202,18 @@ fs_visitor::lower_uniform_pull_constant_loads()
  continue;
 
   if (devinfo->gen >= 7) {
- /* The offset arg is a vec4-aligned immediate byte offset. */
- fs_reg const_offset_reg = inst->src[1];
- assert(const_offset_reg.file == IMM &&
-const_offset_reg.type == BRW_REGISTER_TYPE_UD);
- assert(const_offset_reg.ud % 16 == 0);
-
- fs_reg payload, offset;
- if (devinfo->gen >= 9) {
-/* We have to use a message header on Skylake to get SIMD4x2
- * mode.  Reserve space for the register.
-*/
-offset = payload = fs_reg(VGRF, alloc.allocate(2));
-offset.offset += REG_SIZE;
-inst->mlen = 2;
- } else {
-offset = payload = fs_reg(VGRF, alloc.allocate(1));
-inst->mlen = 1;
- }
-
- /* This is actually going to be a MOV, but since only the first dword
-  * is accessed, we have a special opcode to do just that one.  Note
-  * that this needs to be an operation that will be considered a def
-  * by live variable analysis, or register allocation will explode.
-  */
- fs_inst *setup = new(mem_ctx) fs_inst(FS_OPCODE_SET_SIMD4X2_OFFSET,
-   8, offset, const_offset_reg);
- setup->force_writemask_all = 

[Mesa-dev] [PATCH 1/9] i965/gen6+: Invalidate constant cache on brw_emit_mi_flush().

2016-12-09 Thread Francisco Jerez
In order to make sure that the constant cache is coherent with
previous rendering when we start using it for pull constant loads.
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index dd426bf..b8f7406 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -351,6 +351,7 @@ brw_emit_mi_flush(struct brw_context *brw)
   int flags = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_RENDER_TARGET_FLUSH;
   if (brw->gen >= 6) {
  flags |= PIPE_CONTROL_INSTRUCTION_INVALIDATE |
+  PIPE_CONTROL_CONST_CACHE_INVALIDATE |
   PIPE_CONTROL_DEPTH_CACHE_FLUSH |
   PIPE_CONTROL_VF_CACHE_INVALIDATE |
   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
-- 
2.10.2



Re: [Mesa-dev] [PATCH 1/7] mesa: Add types for ARB_transform_feedback_overflow_query.

2016-12-09 Thread Kenneth Graunke
On Wednesday, December 7, 2016 10:50:29 AM PST Rafael Antognolli wrote:
> Add some basic types and storage for the queries of this extension.
> 
> Signed-off-by: Rafael Antognolli 
> ---
>  src/mesa/main/extensions_table.h | 1 +
>  src/mesa/main/mtypes.h   | 5 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
> index 9c3b776..18a7097 100644
> --- a/src/mesa/main/extensions_table.h
> +++ b/src/mesa/main/extensions_table.h
> @@ -161,6 +161,7 @@ EXT(ARB_timer_query , ARB_timer_query
>  EXT(ARB_transform_feedback2 , ARB_transform_feedback2 , GLL, GLC,  x ,  x , 2010)
>  EXT(ARB_transform_feedback3 , ARB_transform_feedback3 , GLL, GLC,  x ,  x , 2010)
>  EXT(ARB_transform_feedback_instanced, ARB_transform_feedback_instanced   , GLL, GLC,  x ,  x , 2011)
> +EXT(ARB_transform_feedback_overflow_query   , ARB_transform_feedback_overflow_query  , GLL, GLC,  x ,  x , 2016)

This should be 2014 (the date the spec was first written).

It's unlikely to matter though - this is for the MESA_EXTENSION_MAX_YEAR
hack which allows users to stop exposing extensions past a certain year,
which is rarely used.  (It's to work around old buggy GL apps that used
a fixed size buffer for the extension string, so exposing too many
extensions would trigger the buffer overflow in their game.)

>  EXT(ARB_transpose_matrix, dummy_true , GLL,  x ,  x ,  x , 1999)
>  EXT(ARB_uniform_buffer_object   , ARB_uniform_buffer_object  , GLL, GLC,  x ,  x , 2009)
>  EXT(ARB_vertex_array_bgra   , EXT_vertex_array_bgra  , GLL, GLC,  x ,  x , 2008)
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 71bd89e..19956ab 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3003,6 +3003,10 @@ struct gl_query_state
> struct gl_query_object *PrimitivesGenerated[MAX_VERTEX_STREAMS];
> struct gl_query_object *PrimitivesWritten[MAX_VERTEX_STREAMS];
>  
> +   /** GL_ARB_transform_feedback_overflow_query */
> +   struct gl_query_object *TransformFeedbackOverflow[MAX_VERTEX_STREAMS];
> +   struct gl_query_object *TransformFeedbackOverflowAny;
> +
> /** GL_ARB_timer_query */
> struct gl_query_object *TimeElapsed;
>  
> @@ -3873,6 +3877,7 @@ struct gl_extensions
> GLboolean ARB_transform_feedback2;
> GLboolean ARB_transform_feedback3;
> GLboolean ARB_transform_feedback_instanced;
> +   GLboolean ARB_transform_feedback_overflow_query;
> GLboolean ARB_uniform_buffer_object;
> GLboolean ARB_vertex_attrib_64bit;
> GLboolean ARB_vertex_program;
> 





[Mesa-dev] [Bug 94512] X segfaults with glx-tls enabled in a x32 environment

2016-12-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94512

--- Comment #8 from Emil Velikov  ---
Double-checking the logs, it seems like TLS is built/used across the board.
One thing which comes to mind: can you try with --disable-asm? I'm fairly sure
that the code we have in there doesn't account for x32.

Note: I'll be pushing a patch which makes --enable-glx-tls the default in a
moment, so please keep it disabled locally until we get to the bottom of this.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH 1/2] glsl: Use a simpler formula for tanh

2016-12-09 Thread Kenneth Graunke
On Friday, December 9, 2016 9:41:51 AM PST Jason Ekstrand wrote:
> The formula we have used in the past is a trivial reduction from the
> definition by simply multiplying both the numerator and denominator of the
> formula by 2.  However, by also multiplying by e^x, it can be reduced further.
> This allows us to get rid of one side of the clamp and two of the exponential
> functions, which should make it faster.  The new formula still passes the
> dEQP precision tests for tanh so it should be fine.
> ---
>  src/compiler/glsl/builtin_functions.cpp | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/src/compiler/glsl/builtin_functions.cpp b/src/compiler/glsl/builtin_functions.cpp
> index 3dead1a..94e8279 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -3563,17 +3563,19 @@ builtin_builder::_tanh(const glsl_type *type)
> ir_variable *x = in_var(type, "x");
> MAKE_SIG(type, v130, 1, x);
>  
> -   /* Clamp x to [-10, +10] to avoid precision problems.
> -* When x > 10, e^(-x) is so small relative to e^x that it gets flushed to
> -* zero in the computation e^x + e^(-x). The same happens in the other
> -* direction when x < -10.
> +   /* tanh(x) := (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x)))
> +*
> +* With a little algebra this reduces to (e^2x - 1) / (e^2x + 1)
> +*
> +* Clamp x to (-inf, +10] to avoid precision problems.  When x > 10, e^x is
> +* so much larger than 1.0 that 1.0 gets flushed to zero in the computation
> +* e^x +- 1 so it can be ignored.

e^2x (you say e^x here and e^2x in the spirv patch).  I'd also normally
write +/- instead of +-.

Both are
Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH 1/2] glsl: Use a simpler formula for tanh

2016-12-09 Thread Roland Scheidegger
Unsurprisingly, the formula looks great to me :-).
I was actually wondering about accuracy. I believe the biggest issue
(both with the original formula and this one) is probably values around
zero - because that gets calculated as (~1 - 1) / 2 - so the closest
values to zero you can get (other than zero) are ~2^-25 (whereas an
exact calculation could go down to 2^-127). So maybe the simplified
formula might actually be even a bit better there? glsl seems to be
quite lenient with required exp precision.

In any case,
Reviewed-by: Roland Scheidegger 

Am 09.12.2016 um 18:41 schrieb Jason Ekstrand:
> The formula we have used in the past is a trivial reduction from the
> definition by simply multiplying both the numerator and denominator of the
> formula by 2.  However, by also multiplying by e^x, it can be reduced further.
> This allows us to get rid of one side of the clamp and two of the exponential
> functions, which should make it faster.  The new formula still passes the
> dEQP precision tests for tanh so it should be fine.
> ---
>  src/compiler/glsl/builtin_functions.cpp | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/src/compiler/glsl/builtin_functions.cpp b/src/compiler/glsl/builtin_functions.cpp
> index 3dead1a..94e8279 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -3563,17 +3563,19 @@ builtin_builder::_tanh(const glsl_type *type)
> ir_variable *x = in_var(type, "x");
> MAKE_SIG(type, v130, 1, x);
>  
> -   /* Clamp x to [-10, +10] to avoid precision problems.
> -* When x > 10, e^(-x) is so small relative to e^x that it gets flushed to
> -* zero in the computation e^x + e^(-x). The same happens in the other
> -* direction when x < -10.
> +   /* tanh(x) := (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x)))
> +*
> +* With a little algebra this reduces to (e^2x - 1) / (e^2x + 1)
> +*
> +* Clamp x to (-inf, +10] to avoid precision problems.  When x > 10, e^x is
> +* so much larger than 1.0 that 1.0 gets flushed to zero in the computation
> +* e^x +- 1 so it can be ignored.
>  */
> ir_variable *t = body.make_temp(type, "tmp");
> -   body.emit(assign(t, min2(max2(x, imm(-10.0f)), imm(10.0f))));
> +   body.emit(assign(t, min2(x, imm(10.0f))));
>  
> -   /* (e^x - e^(-x)) / (e^x + e^(-x)) */
> -   body.emit(ret(div(sub(exp(t), exp(neg(t))),
> - add(exp(t), exp(neg(t))))));
> +   body.emit(ret(div(sub(exp(mul(t, imm(2.0f))), imm(1.0f)),
> + add(exp(mul(t, imm(2.0f))), imm(1.0f)))));
>  
> return sig;
>  }
> 



Re: [Mesa-dev] [PATCH v3] compiler/glsl: fix precision problem of tanh

2016-12-09 Thread Jason Ekstrand
On Thu, Dec 8, 2016 at 5:50 PM, Kenneth Graunke wrote:

> On Thursday, December 8, 2016 5:41:02 PM PST Haixia Shi wrote:
> > Clamp input scalar value to range [-10, +10] to avoid precision problems
> > when the absolute value of input is too large.
> >
> > Fixes dEQP-GLES3.functional.shaders.builtin_functions.precision.tanh.* test
> > failures.
> >
> > v2: added more explanation in the comment.
> > v3: fixed a typo in the comment.
> >
> > Signed-off-by: Haixia Shi 
> > Cc: Jason Ekstrand ,
> > Cc: Stéphane Marchesin ,
> > Cc: Kenneth Graunke 
> >
> > Change-Id: I324c948b3323ff8107127c42934f14459e124b95
> > ---
> >  src/compiler/glsl/builtin_functions.cpp | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/compiler/glsl/builtin_functions.cpp b/src/compiler/glsl/builtin_functions.cpp
> > index 3e4bcbb..0bacffb 100644
> > --- a/src/compiler/glsl/builtin_functions.cpp
> > +++ b/src/compiler/glsl/builtin_functions.cpp
> > @@ -3563,9 +3563,18 @@ builtin_builder::_tanh(const glsl_type *type)
> > ir_variable *x = in_var(type, "x");
> > MAKE_SIG(type, v130, 1, x);
> >
> > +   /*
>
> For future reference, /* doesn't go on its own line in Mesa.
> (We can fix that when pushing, no big deal.)
>
> Thanks for fixing this.  The explanation makes sense.
>
> Reviewed-by: Kenneth Graunke 
>

I just pushed this with my and Ken's reviews and the comment change Ken
suggested.


> > +* Clamp x to [-10, +10] to avoid precision problems.
> > +* When x > 10, e^(-x) is so small relative to e^x that it gets flushed to
> > +* zero in the computation e^x + e^(-x). The same happens in the other
> > +* direction when x < -10.
> > +*/
> > +   ir_variable *t = body.make_temp(type, "tmp");
> > +   body.emit(assign(t, min2(max2(x, imm(-10.0f)), imm(10.0f))));
> > +
> > /* (e^x - e^(-x)) / (e^x + e^(-x)) */
> > -   body.emit(ret(div(sub(exp(x), exp(neg(x))),
> > - add(exp(x), exp(neg(x))))));
> > +   body.emit(ret(div(sub(exp(t), exp(neg(t))),
> > + add(exp(t), exp(neg(t))))));
> >
> > return sig;
> >  }
> >


[Mesa-dev] [Bug 60197] Mesa Gallium VPATH build is broken

2016-12-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=60197

Emil Velikov  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from Emil Velikov  ---
Quentin, since the issues should be resolved I'm closing this.
If you're still interested in hacking on Mesa, please send patches directly to
the ML.

Not many people read through bug reports, I'm afraid. Even fewer look for
patches in them.



[Mesa-dev] [PATCH 2/2] spirv: Use a simpler and more correct implementation of tanh()

2016-12-09 Thread Jason Ekstrand
The new implementation is more correct because it clamps the incoming value
to 10 to avoid floating-point overflow.  It also uses a much reduced
version of the formula which only requires 1 exp() rather than 2.  This
fixes all of the dEQP-VK.glsl.builtin.precision.tanh.* tests.
---
 src/compiler/spirv/vtn_glsl450.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index cb0570d..f0c9544 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -565,16 +565,21 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 entrypoint,
build_exp(nb, nir_fneg(nb, src[0]))));
   return;
 
-   case GLSLstd450Tanh:
-  /* (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x))) */
-  val->ssa->def =
- nir_fdiv(nb, nir_fmul(nb, nir_imm_float(nb, 0.5f),
-   nir_fsub(nb, build_exp(nb, src[0]),
-build_exp(nb, nir_fneg(nb, src[0])))),
-  nir_fmul(nb, nir_imm_float(nb, 0.5f),
-   nir_fadd(nb, build_exp(nb, src[0]),
-build_exp(nb, nir_fneg(nb, src[0])))));
+   case GLSLstd450Tanh: {
+  /* tanh(x) := (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x)))
+   *
+   * With a little algebra this reduces to (e^2x - 1) / (e^2x + 1)
+   *
+   * We clamp x to (-inf, +10] to avoid precision problems.  When x > 10,
+   * e^2x is so much larger than 1.0 that 1.0 gets flushed to zero in the
+   * computation e^2x +- 1 so it can be ignored.
+   */
+  nir_ssa_def *x = nir_fmin(nb, src[0], nir_imm_float(nb, 10));
+  nir_ssa_def *exp2x = build_exp(nb, nir_fmul(nb, x, nir_imm_float(nb, 2)));
+  val->ssa->def = nir_fdiv(nb, nir_fsub(nb, exp2x, nir_imm_float(nb, 1)),
+   nir_fadd(nb, exp2x, nir_imm_float(nb, 1)));
   return;
+   }
 
case GLSLstd450Asinh:
   val->ssa->def = nir_fmul(nb, nir_fsign(nb, src[0]),
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 1/2] glsl: Use a simpler formula for tanh

2016-12-09 Thread Jason Ekstrand
The formula we have used in the past is a trivial reduction from the
definition by simply multiplying both the numerator and denominator of the
formula by 2.  However, by also multiplying by e^x, it can be reduced further.
This allows us to get rid of one side of the clamp and two of the exponential
functions, which should make it faster.  The new formula still passes the
dEQP precision tests for tanh so it should be fine.
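The reduction described above is a single step: multiply the numerator and denominator of the exponential definition by e^x. The worked derivation below is a sketch of that algebra, not part of the patch:

```latex
\tanh x = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
        = \frac{e^{x}\,(e^{x} - e^{-x})}{e^{x}\,(e^{x} + e^{-x})}
        = \frac{e^{2x} - 1}{e^{2x} + 1}
```

Assuming IEEE single precision (24-bit significand): for x >= 10 the clamped argument is t = 10, and e^20 (about 4.85e8) exceeds 2^25, so e^2t +/- 1 rounds back to e^2t and the quotient is exactly 1, which is also tanh(x) correctly rounded to float there. As x goes to -inf, e^2x underflows toward 0 and the quotient tends to -1 without overflowing, which is why the lower clamp can be dropped.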
---
 src/compiler/glsl/builtin_functions.cpp | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp b/src/compiler/glsl/builtin_functions.cpp
index 3dead1a..94e8279 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -3563,17 +3563,19 @@ builtin_builder::_tanh(const glsl_type *type)
ir_variable *x = in_var(type, "x");
MAKE_SIG(type, v130, 1, x);
 
-   /* Clamp x to [-10, +10] to avoid precision problems.
-    * When x > 10, e^(-x) is so small relative to e^x that it gets flushed to
-    * zero in the computation e^x + e^(-x). The same happens in the other
-    * direction when x < -10.
+   /* tanh(x) := (0.5 * (e^x - e^(-x))) / (0.5 * (e^x + e^(-x)))
+    *
+    * With a little algebra this reduces to (e^2x - 1) / (e^2x + 1)
+    *
+    * Clamp x to (-inf, +10] to avoid precision problems.  When x > 10, e^x is
+    * so much larger than 1.0 that 1.0 gets flushed to zero in the computation
+    * e^x +- 1 so it can be ignored.
     */
ir_variable *t = body.make_temp(type, "tmp");
-   body.emit(assign(t, min2(max2(x, imm(-10.0f)), imm(10.0f))));
+   body.emit(assign(t, min2(x, imm(10.0f))));
 
-   /* (e^x - e^(-x)) / (e^x + e^(-x)) */
-   body.emit(ret(div(sub(exp(t), exp(neg(t))),
-                     add(exp(t), exp(neg(t))))));
+   body.emit(ret(div(sub(exp(mul(t, imm(2.0f))), imm(1.0f)),
+                     add(exp(mul(t, imm(2.0f))), imm(1.0f)))));
 
return sig;
 }
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH] radeonsi: pass the scratch buffer via user SGPRs on LLVM 4.0

2016-12-09 Thread Marek Olšák
From: Marek Olšák 

TGSI compute shaders don't have RW_BUFFERS, so use SGPR[0:1].
Graphics shaders use the first slot of RW_BUFFERS.
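The TGSI compute path writes the 64-bit scratch address into two consecutive user SGPRs. The low 32 bits go in the first SGPR. The second SGPR holds the high address bits plus a swizzle-enable flag.

The sketch below shows that packing. It assumes word 1 of a buffer resource descriptor puts BASE_ADDRESS_HI in bits [15:0] and SWIZZLE_ENABLE in bit 31. That layout is meant to mirror the `S_008F04_*` macros used in the hunk; please check it against radeonsi's register headers. The `DEMO_*` macros and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the S_008F04_* field macros (assumed layout:
 * BASE_ADDRESS_HI in bits [15:0], SWIZZLE_ENABLE in bit 31). */
#define DEMO_BASE_ADDRESS_HI(x) ((uint32_t)(x) & 0xFFFFu)
#define DEMO_SWIZZLE_ENABLE(x)  (((uint32_t)(x) & 0x1u) << 31)

/* Split a GPU virtual address into the two dwords written to
 * COMPUTE_USER_DATA_0 and COMPUTE_USER_DATA_1 (SGPR[0:1]). */
static void demo_pack_scratch_sgprs(uint64_t scratch_va, uint32_t out[2])
{
   /* SGPR[0]: low 32 bits of the address. */
   out[0] = (uint32_t)scratch_va;
   /* SGPR[1]: high address bits plus the swizzle-enable flag. */
   out[1] = DEMO_BASE_ADDRESS_HI(scratch_va >> 32) | DEMO_SWIZZLE_ENABLE(1);
}
```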

TODO: Dave's patch only implements the latter; fix the attribute names.

UNTESTED
---
 src/gallium/drivers/radeonsi/si_compute.c   |  27 +--
 src/gallium/drivers/radeonsi/si_shader.c|  34 +---
 src/gallium/drivers/radeonsi/si_shader.h|   1 +
 src/gallium/drivers/radeonsi/si_state.h |   1 +
 src/gallium/drivers/radeonsi/si_state_draw.c|   8 ++
 src/gallium/drivers/radeonsi/si_state_shaders.c | 102 +---
 6 files changed, 111 insertions(+), 62 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 9d83cb3..8a4c02e 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -287,21 +287,23 @@ static bool si_setup_compute_scratch_buffer(struct si_context *sctx,
r600_resource_reference(&sctx->compute_scratch_buffer, NULL);
 
sctx->compute_scratch_buffer = (struct r600_resource*)
pipe_buffer_create(&sctx->screen->b.b, 0,
   PIPE_USAGE_DEFAULT, scratch_needed);
 
if (!sctx->compute_scratch_buffer)
return false;
}
 
-   if (sctx->compute_scratch_buffer != shader->scratch_bo && scratch_needed) {
+   if (HAVE_LLVM <= 0x0309 &&
+   scratch_needed &&
+   sctx->compute_scratch_buffer != shader->scratch_bo) {
uint64_t scratch_va = sctx->compute_scratch_buffer->gpu_address;
 
si_shader_apply_scratch_relocs(sctx, shader, config, scratch_va);
 
if (si_shader_binary_upload(sctx->screen, shader))
return false;
 
r600_resource_reference(&shader->scratch_bo,
sctx->compute_scratch_buffer);
}
@@ -351,30 +353,43 @@ static bool si_switch_compute_shader(struct si_context *sctx,
/* TODO: use si_multiwave_lds_size_workaround */
assert(lds_blocks <= 0xFF);
 
config->rsrc2 &= C_00B84C_LDS_SIZE;
config->rsrc2 |=  S_00B84C_LDS_SIZE(lds_blocks);
}
 
if (!si_setup_compute_scratch_buffer(sctx, shader, config))
return false;
 
-   if (shader->scratch_bo) {
+   if (config->scratch_bytes_per_wave) {
COMPUTE_DBG(sctx->screen, "Waves: %u; Scratch per wave: %u bytes; "
"Total Scratch: %u bytes\n", sctx->scratch_waves,
config->scratch_bytes_per_wave,
config->scratch_bytes_per_wave *
sctx->scratch_waves);
 
radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx,
- shader->scratch_bo, RADEON_USAGE_READWRITE,
- RADEON_PRIO_SCRATCH_BUFFER);
+ sctx->compute_scratch_buffer,
+ RADEON_USAGE_READWRITE,
+ RADEON_PRIO_SCRATCH_BUFFER);
+
+   /* Write the scratch pointer to SGPR[0:1]. */
+   if (HAVE_LLVM >= 0x0400 &&
+   program->ir_type == PIPE_SHADER_IR_TGSI) {
+   uint64_t scratch_va = sctx->compute_scratch_buffer->gpu_address;
+
+   radeon_set_sh_reg_seq(cs, R_00B900_COMPUTE_USER_DATA_0, 2);
+   radeon_emit(cs, scratch_va);
+   radeon_emit(cs,
+   S_008F04_BASE_ADDRESS_HI(scratch_va >> 32) |
+   S_008F04_SWIZZLE_ENABLE(1));
+   }
}
 
shader_va = shader->bo->gpu_address + offset;
if (program->use_code_object_v2) {
/* Shader code is placed after the amd_kernel_code_t
 * struct. */
shader_va += sizeof(amd_kernel_code_t);
}
 
radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx, shader->bo,
@@ -729,21 +744,23 @@ static void si_launch_grid(
 
si_upload_compute_shader_descriptors(sctx);
si_emit_compute_shader_userdata(sctx);
 
if (si_is_atom_dirty(sctx, sctx->atoms.s.render_cond)) {
sctx->atoms.s.render_cond->emit(&sctx->b,
sctx->atoms.s.render_cond);
si_set_atom_dirty(sctx, sctx->atoms.s.render_cond, false);
}
 
-   if (program->input_size || program->ir_type == PIPE_SHADER_IR_NATIVE)
+   if (program->ir_type == PIPE_SHADER_IR_TGSI)
+   assert(program->input_size == 0);
+   else if (program->ir_type == PIPE_SHADER_IR_NATIVE)
si_upload_compute_input(sctx, code_object, info);
 
/* Global buffers */
for (i = 0; i < MAX_GLOBAL_BUFFERS; 

Re: [Mesa-dev] [PATCH v2 1/2] isl: introduce depth pitch query function

2016-12-09 Thread Jason Ekstrand
On Fri, Dec 9, 2016 at 8:45 AM, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:

> On 08/12/16 19:19, Jason Ekstrand wrote:
>
> On Dec 8, 2016 8:48 AM, "Lionel Landwerlin"  wrote:
>
> v2: add lod level argument (Jason)
> return 0 for any lod level > 0 (Jason)
> return 0 for any surface not 3D (Jason)
>
>
> I'd rather have ISL assert these than just silently return 0.  That way
> it's clear they make no sense.  We can have a dimension check in the Vulkan
> driver where it calls this function.
>
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/Makefile.isl.am  |  10 +-
>  src/intel/isl/isl.c|  28 +++
>  src/intel/isl/isl.h|  11 +
>  src/intel/isl/tests/.gitignore |   1 +
>  .../tests/isl_surf_get_image_depth_pitch_test.c| 245 +
>  5 files changed, 294 insertions(+), 1 deletion(-)
>  create mode 100644 src/intel/isl/tests/isl_surf_get_image_depth_pitch_test.c
>
> diff --git a/src/intel/Makefile.isl.am b/src/intel/Makefile.isl.am
> index 5a317f522b..eb788f4a13 100644
> --- a/src/intel/Makefile.isl.am
> +++ b/src/intel/Makefile.isl.am
> @@ -67,10 +67,18 @@ isl/isl_format_layout.c: isl/gen_format_layout.py \
>  #  Tests
>  # 
> 
>
> -check_PROGRAMS += isl/tests/isl_surf_get_image_offset_test
> +check_PROGRAMS += \
> +   isl/tests/isl_surf_get_image_depth_pitch_test \
> +   isl/tests/isl_surf_get_image_offset_test
>
>  TESTS += $(check_PROGRAMS)
>
> +isl_tests_isl_surf_get_image_depth_pitch_test_LDADD = \
> +   common/libintel_common.la \
> +   isl/libisl.la \
> +   $(top_builddir)/src/mesa/drivers/dri/i965/libi965_compiler.la \
> +   -lm
> +
>  isl_tests_isl_surf_get_image_offset_test_LDADD = \
> common/libintel_common.la \
> isl/libisl.la \
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index 82ab68dc65..0d61cd7cdc 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -1874,3 +1874,31 @@ isl_surf_get_depth_format(const struct isl_device *dev,
>return 5; /* D16_UNORM */
> }
>  }
> +
> +uint32_t
> +isl_surf_get_depth_pitch(const struct isl_device *device,
>
>
> Could you please put some units on this function?
>
> + const struct isl_surf *surf,
> + uint32_t level)
> +{
> +   switch (surf->dim_layout) {
> +   case ISL_DIM_LAYOUT_GEN9_1D:
> +   case ISL_DIM_LAYOUT_GEN4_2D:
> +  return 0;
>
>
> This isn't right.  On Sky Lake and above, 3D surfaces have the GEN4_2D
> layout.  The depth pitch does make sense here and it's equal to the array
> pitch for all miplevels.
>
>
> So should that return the array pitch for GEN4_2D layout just on Skylake
> or other generations too?
>

Anything that does GEN4_2D for 3D textures.  I think what I'd do is
something like this:

assert(surf->dim == ISL_SURF_DIM_3D);
switch (surf->dim_layout) {
case ISL_DIM_LAYOUT_GEN4_2D:
   return isl_surf_get_array_pitch(surf);
case ISL_DIM_LAYOUT_GEN4_3D:
   /* Depth pitch doesn't make sense for gen4 3D textures at LOD1 and above */
   assert(level == 0);
   return isl_align(isl_align_div_npot(surf->phys_level0_sa.h,
                                       isl_format_get_layout(surf->format)->bh),
                    surf->image_alignment_el.h);
case ISL_DIM_LAYOUT_GEN9_1D:
default:
   unreachable("Invalid layout for a 3D texture");
}

or something along those lines.
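A self-contained sketch of the suggested shape, using trimmed-down hypothetical types rather than ISL's real structs:

- The function asserts on a non-3D surface, and asserts `level == 0` for the gen4 3D layout.
- The GEN4_2D layout returns the array pitch at every level.
- The gen4 3D layout returns the LOD0 height, converted to element rows and aligned to the vertical alignment.
- To answer the units question, every path returns element rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the ISL types (not ISL's API). */
enum demo_dim_layout {
   DEMO_DIM_LAYOUT_GEN4_2D,
   DEMO_DIM_LAYOUT_GEN4_3D,
   DEMO_DIM_LAYOUT_GEN9_1D,
};

struct demo_surf {
   int is_3d;                    /* surf->dim == ISL_SURF_DIM_3D */
   enum demo_dim_layout dim_layout;
   uint32_t level0_h_sa;         /* phys_level0_sa.h */
   uint32_t block_h;             /* format block height (bh) */
   uint32_t valign_el;           /* image_alignment_el.h */
   uint32_t array_pitch_el_rows; /* array pitch, in element rows */
};

static uint32_t demo_div_round_up(uint32_t n, uint32_t d)
{
   return (n + d - 1) / d;
}

/* Align n up to a multiple of a, where a need not be a power of two. */
static uint32_t demo_align_npot(uint32_t n, uint32_t a)
{
   return demo_div_round_up(n, a) * a;
}

/* Depth pitch in element rows; asserts instead of returning 0. */
static uint32_t demo_depth_pitch_el_rows(const struct demo_surf *surf,
                                         uint32_t level)
{
   assert(surf->is_3d);
   switch (surf->dim_layout) {
   case DEMO_DIM_LAYOUT_GEN4_2D:
      /* 3D surfaces using the 2D layout: depth pitch == array pitch. */
      return surf->array_pitch_el_rows;
   case DEMO_DIM_LAYOUT_GEN4_3D:
      /* Only meaningful at LOD0 for the gen4 3D layout. */
      assert(level == 0);
      return demo_align_npot(demo_div_round_up(surf->level0_h_sa,
                                               surf->block_h),
                             surf->valign_el);
   default:
      assert(!"invalid layout for a 3D surface");
      return 0;
   }
}
```

To get bytes, multiply the result by `surf->row_pitch`, as `isl_surf_get_array_pitch()` already does for the array pitch.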


>
>
> +   case ISL_DIM_LAYOUT_GEN4_3D: {
> +  if (level > 0)
> + return 0;
> +
> +  if (surf->tiling == ISL_TILING_LINEAR)
> + return surf->row_pitch * surf->phys_level0_sa.h;
> +
> +  struct isl_tile_info tile_info;
> +  isl_surf_get_tile_info(device, surf, &tile_info);
> +
> +  return surf->row_pitch * isl_align(surf->phys_level0_sa.h,
> + surf->image_alignment_el.h);
>
>
> This calculation isn't right.  In both cases, it should simply be the
> height of lod0 aligned to the surface vertical alignment.  It has nothing
> to do with tiling so far as I know.
>
> +  }
> +   default:
> +  unreachable("bad isl_dim_layout");
> +  break;
> +   }
> +}
> diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
> index 07368f9bcf..7c033f380c 100644
> --- a/src/intel/isl/isl.h
> +++ b/src/intel/isl/isl.h
> @@ -1388,10 +1388,21 @@ isl_surf_get_array_pitch_sa_rows(const struct isl_surf *surf)
>  static inline uint32_t
>  isl_surf_get_array_pitch(const struct isl_surf *surf)
>  {
> +   if (surf->dim_layout == ISL_DIM_LAYOUT_GEN4_3D)
> +  return 0;
> return isl_surf_get_array_pitch_sa_rows(surf) * surf->row_pitch;
>  }
>
>  /**
> + * Pitch between depth slices, in bytes (for 2D images, this should be
> + * equivalent to isl_surf_get_array_pitch()).
> + */
> +uint32_t
> +isl_surf_get_depth_pitch(const struct isl_device *device,

[Mesa-dev] [PATCH v2] glapi: add missing INTEL_conservative_rasterization

2016-12-09 Thread Lionel Landwerlin
v2: put enum directly in gl_API.xml (Ilia)

Signed-off-by: Lionel Landwerlin 
Cc: Ilia Mirkin 
---
 src/mapi/glapi/gen/gl_API.xml | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 00c9bb795c..6e00363b6f 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -12812,6 +12812,10 @@
 
 

+
+  
+
+
 http://www.w3.org/2001/XInclude"/>

 
--
2.11.0


Re: [Mesa-dev] [PATCH v2 1/2] isl: introduce depth pitch query function

2016-12-09 Thread Lionel Landwerlin

On 08/12/16 19:19, Jason Ekstrand wrote:
On Dec 8, 2016 8:48 AM, "Lionel Landwerlin" wrote:


v2: add lod level argument (Jason)
return 0 for any lod level > 0 (Jason)
return 0 for any surface not 3D (Jason)


I'd rather have ISL assert these than just silently return 0.  That 
way it's clear they make no sense. We can have a dimension check in 
the Vulkan driver where it calls this function.


Signed-off-by: Lionel Landwerlin
---
 src/intel/Makefile.isl.am  |  10 +-
 src/intel/isl/isl.c| 28 +++
 src/intel/isl/isl.h| 11 +
 src/intel/isl/tests/.gitignore |  1 +
 .../tests/isl_surf_get_image_depth_pitch_test.c | 245 +
 5 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 src/intel/isl/tests/isl_surf_get_image_depth_pitch_test.c

diff --git a/src/intel/Makefile.isl.am b/src/intel/Makefile.isl.am
index 5a317f522b..eb788f4a13 100644
--- a/src/intel/Makefile.isl.am
+++ b/src/intel/Makefile.isl.am
@@ -67,10 +67,18 @@ isl/isl_format_layout.c: isl/gen_format_layout.py \
 #  Tests
 #


-check_PROGRAMS += isl/tests/isl_surf_get_image_offset_test
+check_PROGRAMS += \
+   isl/tests/isl_surf_get_image_depth_pitch_test \
+   isl/tests/isl_surf_get_image_offset_test

 TESTS += $(check_PROGRAMS)

+isl_tests_isl_surf_get_image_depth_pitch_test_LDADD = \
+   common/libintel_common.la \
+   isl/libisl.la \
+   $(top_builddir)/src/mesa/drivers/dri/i965/libi965_compiler.la \
+   -lm
+
 isl_tests_isl_surf_get_image_offset_test_LDADD = \
common/libintel_common.la \
isl/libisl.la \
diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index 82ab68dc65..0d61cd7cdc 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -1874,3 +1874,31 @@ isl_surf_get_depth_format(const struct isl_device *dev,
   return 5; /* D16_UNORM */
}
 }
+
+uint32_t
+isl_surf_get_depth_pitch(const struct isl_device *device,


Could you please put some units on this function?

+ const struct isl_surf *surf,
+ uint32_t level)
+{
+   switch (surf->dim_layout) {
+   case ISL_DIM_LAYOUT_GEN9_1D:
+   case ISL_DIM_LAYOUT_GEN4_2D:
+  return 0;


This isn't right.  On Sky Lake and above, 3D surfaces have the GEN4_2D 
layout.  The depth pitch does make sense here and it's equal to the 
array pitch for all miplevels.


So should that return the array pitch for GEN4_2D layout just on Skylake 
or other generations too?




+   case ISL_DIM_LAYOUT_GEN4_3D: {
+  if (level > 0)
+ return 0;
+
+  if (surf->tiling == ISL_TILING_LINEAR)
+ return surf->row_pitch * surf->phys_level0_sa.h;
+
+  struct isl_tile_info tile_info;
+  isl_surf_get_tile_info(device, surf, &tile_info);
+
+  return surf->row_pitch * isl_align(surf->phys_level0_sa.h,
+  surf->image_alignment_el.h);


This calculation isn't right.  In both cases, it should simply be the 
height of lod0 aligned to the surface vertical alignment.  It has 
nothing to do with tiling so far as I know.


+  }
+   default:
+  unreachable("bad isl_dim_layout");
+  break;
+   }
+}
diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
index 07368f9bcf..7c033f380c 100644
--- a/src/intel/isl/isl.h
+++ b/src/intel/isl/isl.h
@@ -1388,10 +1388,21 @@ isl_surf_get_array_pitch_sa_rows(const struct isl_surf *surf)
 static inline uint32_t
 isl_surf_get_array_pitch(const struct isl_surf *surf)
 {
+   if (surf->dim_layout == ISL_DIM_LAYOUT_GEN4_3D)
+  return 0;
return isl_surf_get_array_pitch_sa_rows(surf) * surf->row_pitch;
 }

 /**
+ * Pitch between depth slices, in bytes (for 2D images, this should be
+ * equivalent to isl_surf_get_array_pitch()).
+ */
+uint32_t
+isl_surf_get_depth_pitch(const struct isl_device *device,
+ const struct isl_surf *surf,
+ uint32_t level);
+
+/**
  * Calculate the offset, in units of surface samples, to a subimage in the
  * surface.
  *
diff --git a/src/intel/isl/tests/.gitignore

Re: [Mesa-dev] [PATCH] anv: Clean up some unused variables

2016-12-09 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Dec 9, 2016 07:07, "Edward O'Callaghan" wrote:

Following on from the spirit of commit 011e5570f.

Signed-off-by: Edward O'Callaghan 
---
 src/intel/vulkan/anv_private.h | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 1f03b68..9e3b72e 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -176,14 +176,6 @@ vk_to_isl_color(VkClearColorValue color)
memcpy((dest), (src), (count) * sizeof(*(src))); \
 })

-/* Define no kernel as 1, since that's an illegal offset for a kernel */
-#define NO_KERNEL 1
-
-struct anv_common {
-VkStructureType sType;
-const void* pNext;
-};
-
 /* Whenever we generate an error, pass it through this function. Useful for
  * debugging, where we can break on it. Only call at error site, not when
  * propagating errors. Might be useful to plug in a stack trace here.
@@ -1859,13 +1851,6 @@ ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_shader_module, VkShaderModule)
   return (const __VkType *) __anv_obj; \
}

-#define ANV_COMMON_TO_STRUCT(__VkType, __vk_name, __common_name) \
-   const __VkType *__vk_name = anv_common_to_ ## __VkType(__common_name)
-
-ANV_DEFINE_STRUCT_CASTS(anv_common, VkMemoryBarrier)
-ANV_DEFINE_STRUCT_CASTS(anv_common, VkBufferMemoryBarrier)
-ANV_DEFINE_STRUCT_CASTS(anv_common, VkImageMemoryBarrier)
-
 /* Gen-specific function declarations */
 #ifdef genX
 #  include "anv_genX.h"
--
2.9.3



[Mesa-dev] [PATCH] anv: Clean up some unused variables

2016-12-09 Thread Edward O'Callaghan
Following on from the spirit of commit 011e5570f.

Signed-off-by: Edward O'Callaghan 
---
 src/intel/vulkan/anv_private.h | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 1f03b68..9e3b72e 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -176,14 +176,6 @@ vk_to_isl_color(VkClearColorValue color)
memcpy((dest), (src), (count) * sizeof(*(src))); \
 })
 
-/* Define no kernel as 1, since that's an illegal offset for a kernel */
-#define NO_KERNEL 1
-
-struct anv_common {
-VkStructureType sType;
-const void* pNext;
-};
-
 /* Whenever we generate an error, pass it through this function. Useful for
  * debugging, where we can break on it. Only call at error site, not when
  * propagating errors. Might be useful to plug in a stack trace here.
@@ -1859,13 +1851,6 @@ ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_shader_module, VkShaderModule)
   return (const __VkType *) __anv_obj; \
}
 
-#define ANV_COMMON_TO_STRUCT(__VkType, __vk_name, __common_name) \
-   const __VkType *__vk_name = anv_common_to_ ## __VkType(__common_name)
-
-ANV_DEFINE_STRUCT_CASTS(anv_common, VkMemoryBarrier)
-ANV_DEFINE_STRUCT_CASTS(anv_common, VkBufferMemoryBarrier)
-ANV_DEFINE_STRUCT_CASTS(anv_common, VkImageMemoryBarrier)
-
 /* Gen-specific function declarations */
 #ifdef genX
 #  include "anv_genX.h"
-- 
2.9.3



Re: [Mesa-dev] [PATCH] [rfc] radv: add initial prime support.

2016-12-09 Thread Mike Lothian
Hi

This no longer applies cleanly since "radv/meta: cleanup image info setup"
(71a9574ffa1463773ad7587262bacc50ed37c042).

Regards

Mike

On Wed, 23 Nov 2016 at 05:29 Dave Airlie  wrote:

> From: Dave Airlie 
>
> This is kind of a gross hack, but Vulkan doesn't specify anything here,
> and it would be nice to let people with PRIME systems at least
> see something rendering for now.
>
> This creates a linear shadow image in GART that gets blitted to at the
> image transition.
>
> Now, ideally:
> this would use SDMA, but we want to use SDMA for transfer queues;
> maybe we don't expose a transfer queue on PRIME cards, who knows.
>
> we wouldn't have to add two pointers to every image, but my other
> attempts at this were ugly.
>
> Is the image transition the proper place to hack this in? I'm not
> really sure anywhere else is appropriate.
>
> It also relies on DRI_PRIME=1 being set. I think I should be able
> to work this out automatically, probably by getting a DRI3 fd
> from the X server, calling drmGetDevice on it, and comparing
> where we end up.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c |  18 +++
>  src/amd/vulkan/radv_device.c |   3 ++
>  src/amd/vulkan/radv_meta.h   |   2 +
>  src/amd/vulkan/radv_meta_copy.c  |  31 +++
>  src/amd/vulkan/radv_private.h|   4 ++
>  src/amd/vulkan/radv_wsi.c| 111 ++-
>  6 files changed, 144 insertions(+), 25 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> index a2d55833..4432afc 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -2296,6 +2296,20 @@ static void radv_handle_dcc_image_transition(struct radv_cmd_buffer *cmd_buffer,
> }
>  }
>
> +static void radv_handle_prime_image_transition(struct radv_cmd_buffer *cmd_buffer,
> +  struct radv_image *image,
> +  VkImageLayout src_layout,
> +  VkImageLayout dst_layout,
> +  VkImageSubresourceRange range,
> +  VkImageAspectFlags pending_clears)
> +{
> +   cmd_buffer->state.flush_bits |= RADV_CMD_FLUSH_AND_INV_FRAMEBUFFER;
> +   si_emit_cache_flush(cmd_buffer);
> +   radv_blit_to_prime_linear(cmd_buffer, image);
> +   cmd_buffer->state.flush_bits |= RADV_CMD_FLUSH_AND_INV_FRAMEBUFFER;
> +   si_emit_cache_flush(cmd_buffer);
> +}
> +
>  static void radv_handle_image_transition(struct radv_cmd_buffer *cmd_buffer,
>  struct radv_image *image,
>  VkImageLayout src_layout,
> @@ -2314,6 +2328,10 @@ static void radv_handle_image_transition(struct radv_cmd_buffer *cmd_buffer,
> if (image->surface.dcc_size)
> radv_handle_dcc_image_transition(cmd_buffer, image, src_layout,
>  dst_layout, range, pending_clears);
> +
> +   if (image->prime_image && dst_layout == VK_IMAGE_LAYOUT_PRESENT_SRC_KHR)
> +   radv_handle_prime_image_transition(cmd_buffer, image, src_layout,
> +  dst_layout, range, pending_clears);
>  }
>
>  void radv_CmdPipelineBarrier(
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index c639d53..b21447f 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -105,6 +105,9 @@ radv_physical_device_init(struct radv_physical_device *device,
> }
> drmFreeVersion(version);
>
> +   if (getenv("DRI_PRIME"))
> +   device->is_different_gpu = true;
> +
> device->_loader_data.loaderMagic = ICD_LOADER_MAGIC;
> device->instance = instance;
> assert(strlen(path) < ARRAY_SIZE(device->path));
> diff --git a/src/amd/vulkan/radv_meta.h b/src/amd/vulkan/radv_meta.h
> index 97d020c..e43a0e7 100644
> --- a/src/amd/vulkan/radv_meta.h
> +++ b/src/amd/vulkan/radv_meta.h
> @@ -186,6 +186,8 @@ void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
>  uint32_t region_count,
>  const VkImageResolve *regions);
>
> +void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer,
> +  struct radv_image *image);
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/src/amd/vulkan/radv_meta_copy.c b/src/amd/vulkan/radv_meta_copy.c
> index 4c01eb7..3fd8d0c 100644
> --- a/src/amd/vulkan/radv_meta_copy.c
> +++ b/src/amd/vulkan/radv_meta_copy.c
> @@ -397,3 +397,34 @@ void radv_CmdCopyImage(
>
> radv_meta_restore(_state, cmd_buffer);
>  }
> +
> +void radv_blit_to_prime_linear(struct radv_cmd_buffer 

Re: [Mesa-dev] [PATCH 1/3] radv: Clean up some unused variables.

2016-12-09 Thread Emil Velikov
On 8 December 2016 at 22:11, Bas Nieuwenhuizen  wrote:
> Leftovers from anv?
>
> Signed-off-by: Bas Nieuwenhuizen 
> ---
>  src/amd/vulkan/radv_private.h | 16 
>  1 file changed, 16 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index 4b72017..67da46a 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -179,14 +179,6 @@ radv_clear_mask(uint32_t *inout_mask, uint32_t clear_mask)
>
>  #define zero(x) (memset(&(x), 0, sizeof(x)))
>
> -/* Define no kernel as 1, since that's an illegal offset for a kernel */
> -#define NO_KERNEL 1
> -
> -struct radv_common {
> -   VkStructureType sType;
> -   const void* pNext;
> -};
> -
>  /* Whenever we generate an error, pass it through this function. Useful for
>   * debugging, where we can break on it. Only call at error site, not when
>   * propagating errors. Might be useful to plug in a stack trace here.
> @@ -1282,12 +1274,4 @@ RADV_DEFINE_NONDISP_HANDLE_CASTS(radv_shader_module, VkShaderModule)
> return (const __VkType *) __radv_obj;   \
> }
>
> -#define RADV_COMMON_TO_STRUCT(__VkType, __vk_name, __common_name)  \
> -   const __VkType *__vk_name = radv_common_to_ ## __VkType(__common_name)
> -
> -RADV_DEFINE_STRUCT_CASTS(radv_common, VkMemoryBarrier)
> -RADV_DEFINE_STRUCT_CASTS(radv_common, VkBufferMemoryBarrier)
> -RADV_DEFINE_STRUCT_CASTS(radv_common, VkImageMemoryBarrier)
> -
> -
Skimming through - all three should be applicable to anv. Care to send
some patches? ;-)

Emil


Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library

2016-12-09 Thread Emil Velikov
On 9 December 2016 at 13:20, Alexandre Courbot  wrote:
> On 12/08/2016 04:16 PM, Alexandre Courbot wrote:
>> On 11/30/2016 10:44 PM, Christian Gmeiner wrote:
>>> This is a very lightweight library to add basic support for
>>> renderonly GPUs. It does all the magic regarding in/exporting
>>> buffers etc. This library will likely break Android support and
>>> hopefully will get replaced with a better solution based on gbm2.
>>
>> Since we have no idea when said better solution will be available, and
>> the situation of render-only GPUs has been unsustainable for way too
>> long, I really hope a solution like this one can be merged in the meantime.
>>
>> I have tried it after porting support for Tegra
>> (https://github.com/austriancoder/mesa/commit/2c7354701ee21ca28f69f5d7588f1d497553b4bf)
>> to this latest version. Here are a few issues I have met:
>>
>> First, setting the tiling works indeed just fine if we are using an
>> ioctl for this. However my impression was that the preferred way of
>> doing it was through FB modifiers, and we started moving Tegra to this
>> scheme. Problem: the FB modifier is passed through a call to
>> drmModeAddFB2WithModifiers(), which is called by the client program, not
>> Mesa - which in this case leaves the program with the burden of figuring
>> out what the modifier should be. So with FB modifiers the problem is
>> still here.
>>
>> Another issue I have seen is that GLX does not seem to work with this.
>> X/modesetting starts just fine, and GLamor also seems to initialize.
>> However glxinfo freezes on a xshmfence_await() call, and all GLX
>> programs fail as follows:
>
> Solved that issue by forcing is_different_gpu to true in
> loader_dri3_drawable_init() (pretty hackish, looking for a better way).
>
> Also I had another issue with Wayland where EGL windows would be
> displayed all black. I traced this to the fact that Wayland was trying
> to share the buffer by calling the old FLINK ioctl on the rendernode
> device, which is forbidden. Opening card1 instead of renderD128 did the
> trick as a workaround, but I am surprised as I thought Wayland was using
> DRI3 exclusively? I am not very familiar with either Mesa or Wayland
> though, so my assumption may very well be incorrect.
>
Some of these issues are due to the hardcoded nature of the card/render
node. There is the drmDevice API, which could/should be extended and
utilised here.
Earlier versions were quite buggy, so make sure to use
677cd97dc4a930af508388713f5016baf664ed18 or later.

Since the kernel exposes no relation between the KMS and GPU devices,
one will need to apply some heuristics locally. At some point we might
want to make things more systematic/configurable, but let's get it
working first ;-)

Thus, please propose/add anything to drmDevice that you think is
enough to build some heuristics on.

With that sorted, the Wayland FLINK issues should go away.

> Anyway, with this patch and the corresponding Tegra support, I have a
> working solution that can run unmodified Mesa applications using KMS,
> EGL/Wayland and GLX backends on TK1 and TX1 platforms. Neat!
>
> Considering that we have been resorting to hacking all the KMS
> applications of interest to connect the render and display nodes
> together with the right tiling settings for the last two years, I regard
> this patch as a huge improvement for mobile graphics and would like to
> strongly support it.
>
> My only remaining concern is that this scheme cannot support the case
> where the tiling format is specified using FB modifiers, since this
> requires drmModeAddFB2WithModifiers() to be called from the application.
> So for Tegra we have to resort to a staging, not enabled by default
> SET_TILING ioctl. Not ideal, but recompiling your kernel with an
> additional config option is much less of a hassle than patching every KMS
> app under the sun.
>
> So while thoughts about how this last issue can be addressed are
> welcome, I think this little lib can improve the life of many SoC users.
Agreed - let's have things as-is. One can "polish" the backend side of
things, once we have something in place.

Thanks
Emil


Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library

2016-12-09 Thread Daniel Stone
Hi Alexandre,

On 9 December 2016 at 13:20, Alexandre Courbot  wrote:
> On 12/08/2016 04:16 PM, Alexandre Courbot wrote:
>> First, setting the tiling works indeed just fine if we are using an
>> ioctl for this. However my impression was that the preferred way of
>> doing it was through FB modifiers, and we started moving Tegra to this
>> scheme. Problem: the FB modifier is passed through a call to
>> drmModeAddFB2WithModifiers(), which is called by the client program, not
>> Mesa - which in this case leaves the program with the burden of figuring
>> out what the modifier should be. So with FB modifiers the problem is
>> still here.
>>
>> Another issue I have seen is that GLX does not seem to work with this.
>> X/modesetting starts just fine, and GLamor also seems to initialize.
>> However glxinfo freezes on a xshmfence_await() call, and all GLX
>> programs fail as follows:
>
> Solved that issue by forcing is_different_gpu to true in
> loader_dri3_drawable_init() (pretty hackish, looking for a better way).
>
> Also I had another issue with Wayland where EGL windows would be
> displayed all black. I traced this to the fact that Wayland was trying
> to share the buffer by calling the old FLINK ioctl on the rendernode
> device, which is forbidden. Opening card1 instead of renderD128 did the
> trick as a workaround, but I am surprised as I thought Wayland was using
> DRI3 exclusively? I am not very familiar with neither Mesa nor Wayland
> though, so my assumption may very well be incorrect.

Wayland doesn't use DRI-anything; Mesa has its own interface for
Wayland. I'm really surprised that you're seeing this behaviour
though: if you search for WL_DRM_CAPABILITY_PRIME (i.e. send dmabufs
rather than flink names) in src/egl/drivers/dri2/platform_wayland.c,
you'll see that a) we always use it when available, and b) we refuse
to initialise when the device is a rendernode and we don't have PRIME.
So I'm not sure how this could ever happen ...
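The two rules Daniel describes can be written as a tiny decision function:

- Always use PRIME (dma-buf) when the compositor advertises it.
- Refuse to initialise on a render node without it, since FLINK is forbidden there.
- Otherwise, fall back to FLINK.

This is a hedged sketch of that logic as stated in the message, not the code in platform_wayland.c. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the buffer-sharing decision. */
enum demo_share_mode {
   DEMO_SHARE_DMABUF, /* WL_DRM_CAPABILITY_PRIME: send dma-buf fds */
   DEMO_SHARE_FLINK,  /* legacy global flink names */
   DEMO_SHARE_REFUSE, /* refuse to initialise */
};

static enum demo_share_mode
demo_pick_share_mode(bool has_prime_capability, bool is_render_node)
{
   /* a) PRIME is always used when available. */
   if (has_prime_capability)
      return DEMO_SHARE_DMABUF;
   /* b) Render nodes cannot use FLINK, so without PRIME give up. */
   if (is_render_node)
      return DEMO_SHARE_REFUSE;
   return DEMO_SHARE_FLINK;
}
```

Under these rules, a FLINK call on a render node should never be reached, which is why the reported behaviour is surprising.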

> Anyway, with this patch and the corresponding Tegra support, I have a
> working solution that can run unmodified Mesa applications using KMS,
> EGL/Wayland and GLX backends on TK1 and TX1 platforms. Neat!

Cool! I assume this will work on Tegra124 more generally then - do you
have a branch somewhere?

> Considering that we have been resorting to hacking all the KMS
> applications of interest to connect the render and display nodes
> together with the right tiling settings for the last two years, I regard
> this patch as a huge improvement for mobile graphics and would like to
> strongly support it.
>
> My only remaining concern is that this scheme cannot support the case
> where the tiling format is specified using FB modifiers, since this
> requires drmModeAddFB2WithModifiers() to be called from the application.
> So for Tegra we have to resort to a staging, not enabled by default
> SET_TILING ioctl. Not ideal, but recompiling your kernel with an
> additional config option is much less of a hassle than patching every KMS
> app under the sun.
>
> So while thoughts about how this last issue can be addressed are
> welcome, I think this little lib can improve the life of many SoC users.

Check out Ben Widawsky's 'Renderbuffer Decompression (and GBM
modifiers)' patchset. With this, as well as krh's pending GETPLANE2
ioctl that will allow us to get a list of acceptable modifiers for
display from KMS, we can trivially implement this in clients without
the need for a backchannel ioctl:
https://git.collabora.com/cgit/user/daniels/weston.git/commit/?h=wip/2016-11/gbm-planes-modifiers

Cheers,
Daniel


Re: [Mesa-dev] [PATCH 1/7] i965: Flush pipeline before saving SO_WRITE_OFFSETS

2016-12-09 Thread Emil Velikov
On 9 December 2016 at 10:54, Chris Wilson  wrote:
> Before saving the current position of the pipeline for the render
> stream, we need to flush.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99030
> Testcase: piglit/arb_transform_feedback2-draw-auto
> Signed-off-by: Chris Wilson 
Looks like we'd want this in the stable release, wouldn't we? If so,
please add the stable tag before pushing.
If you think we may have exhibited unwanted issues before patch 3/7, please
mention so (literally a line or two), and tag it for stable as well.

Cc: mesa-sta...@lists.freedesktop.org

Thanks !
Emil


Re: [Mesa-dev] [PATCH 5/7] i965: Move pipelined register access to its own file

2016-12-09 Thread Emil Velikov
On 9 December 2016 at 10:54, Chris Wilson  wrote:

> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.h

> +#ifndef BRW_PIPELINED_REGISTER_H
> +#define BRW_PIPELINED_REGISTER_H
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +void brw_load_register_mem32(struct brw_context *brw,
> + uint32_t reg,
> + drm_intel_bo *bo,
> + uint32_t offset);
> +void brw_load_register_mem64(struct brw_context *brw,
> + uint32_t reg,
> + drm_intel_bo *bo,
> + uint32_t offset);
> +
Please add a couple of forward declarations/includes to resolve the
above types. It will save you and others a few "wtf" moments when
someone reorders the header inclusions at a later stage.
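Here is a sketch of what that could look like. It assumes the header only ever needs pointer types, so incomplete types are enough. The typedef line is an assumption about how libdrm's intel_bufmgr.h spells drm_intel_bo. Including that header directly is the more conventional fix.

```c
/* brw_pipelined_register.h (sketch): compiles on its own */
#ifndef BRW_PIPELINED_REGISTER_H
#define BRW_PIPELINED_REGISTER_H

#include <stdint.h>   /* uint32_t */

#ifdef __cplusplus
extern "C" {
#endif

/* Only pointers are used below, so an incomplete type is enough. */
struct brw_context;

/* Assumption: libdrm's intel_bufmgr.h declares
 * "typedef struct _drm_intel_bo drm_intel_bo;". This line stands in for
 * including that header. Repeating an identical typedef is only valid
 * from C11 on, so with older compilers include the header instead. */
typedef struct _drm_intel_bo drm_intel_bo;

void brw_load_register_mem32(struct brw_context *brw,
                             uint32_t reg,
                             drm_intel_bo *bo,
                             uint32_t offset);
void brw_load_register_mem64(struct brw_context *brw,
                             uint32_t reg,
                             drm_intel_bo *bo,
                             uint32_t offset);

#ifdef __cplusplus
}
#endif

#endif /* BRW_PIPELINED_REGISTER_H */

/* --- a .c file can now include the header first, in any order --- */
#include <assert.h>

static int
sketch_only_needs_pointers(struct brw_context *brw, drm_intel_bo *bo)
{
   /* Compiles without brw_context.h or intel_bufmgr.h in scope. */
   return brw == 0 && bo == 0;
}
```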

Thanks
Emil


Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library

2016-12-09 Thread Alexandre Courbot
On 12/08/2016 04:16 PM, Alexandre Courbot wrote:
> On 11/30/2016 10:44 PM, Christian Gmeiner wrote:
>> This is a very lightweight library to add basic support for
>> renderonly GPUs. It does all the magic regarding in/exporting
>> buffers etc. This library will likely break android support and
>> hopefully will get replaced with a better solution based on gbm2.
> 
> Since we have no idea when said better solution will be available, and
> the situation of render-only GPUs has been unsustainable for way too
> long, I really hope a solution like this one can be merged in the meantime.
> 
> I have tried it after porting support for Tegra
> (https://github.com/austriancoder/mesa/commit/2c7354701ee21ca28f69f5d7588f1d497553b4bf)
> to this latest version. Here are a few issues I have met:
> 
> First, setting the tiling works indeed just fine if we are using an
> ioctl for this. However my impression was that the preferred way of
> doing it was through FB modifiers, and we started moving Tegra to this
> scheme. Problem: the FB modifier is passed through a call to
> drmModeAddFB2WithModifiers(), which is called by the client program, not
> Mesa - which in this case leaves the program with the burden of figuring
> out what the modifier should be. So with FB modifiers the problem is
> still here.
> 
> Another issue I have seen is that GLX does not seem to work with this.
> X/modesetting starts just fine, and GLamor also seems to initialize.
> However glxinfo freezes on a xshmfence_await() call, and all GLX
> programs fail as follows:

Solved that issue by forcing is_different_gpu to true in
loader_dri3_drawable_init() (pretty hackish, looking for a better way).

Also I had another issue with Wayland where EGL windows would be
displayed all black. I traced this to the fact that Wayland was trying
to share the buffer by calling the old FLINK ioctl on the rendernode
device, which is forbidden. Opening card1 instead of renderD128 did the
trick as a workaround, but I am surprised as I thought Wayland was using
DRI3 exclusively? I am not very familiar with either Mesa or Wayland,
though, so my assumption may very well be incorrect.

Anyway, with this patch and the corresponding Tegra support, I have a
working solution that can run unmodified Mesa applications using KMS,
EGL/Wayland and GLX backends on TK1 and TX1 platforms. Neat!

Considering that we have been resorting to hacking all the KMS
applications of interest to connect the render and display nodes
together with the right tiling settings for the last two years, I regard
this patch as a huge improvement for mobile graphics and would like to
strongly support it.

My only remaining concern is that this scheme cannot support the case
where the tiling format is specified using FB modifiers, since this
requires drmModeAddFB2WithModifiers() to be called from the application.
So for Tegra we have to resort to a staging SET_TILING ioctl that is not
enabled by default. Not ideal, but recompiling your kernel with an
additional config option is much less of a hassle than patching every KMS
app under the sun.

So while thoughts about how this last issue can be addressed are
welcome, I think this little lib can improve the life of many SoC users.


Re: [Mesa-dev] [PATCH 7/7] i965: Reorder parameters to brw_store_register_mem

2016-12-09 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 09/12/16 10:54, Chris Wilson wrote:

Reorder the parameters to brw_store_register_mem32 and
brw_store_register_mem64 so that the offset into the buffer and its
identifier are paired. This brings the interface into line with
brw_load_register_mem.

Signed-off-by: Chris Wilson 
---
  src/mesa/drivers/dri/i965/brw_performance_monitor.c |  3 ++-
  src/mesa/drivers/dri/i965/brw_pipelined_register.c  |  8 ++--
  src/mesa/drivers/dri/i965/brw_pipelined_register.h  |  4 ++--
  src/mesa/drivers/dri/i965/gen6_queryobj.c   | 21 -
  src/mesa/drivers/dri/i965/gen7_sol_state.c  |  5 ++---
  src/mesa/drivers/dri/i965/hsw_sol.c | 16 +---
  6 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
index 1b991bfafa..e198525f8f 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
@@ -589,8 +589,9 @@ snapshot_statistics_registers(struct brw_context *brw,
   assert(ctx->PerfMonitor.Groups[group].Counters[i].Type ==
  GL_UNSIGNED_INT64_AMD);
  
- brw_store_register_mem64(brw, monitor->pipeline_stats_bo,
+ brw_store_register_mem64(brw,
brw->perfmon.statistics_registers[i],
+  monitor->pipeline_stats_bo,
offset + i * sizeof(uint64_t));
}
 }
diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.c b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
index 0b226035e7..6b6b2487e8 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.c
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
@@ -81,7 +81,9 @@ brw_load_register_mem64(struct brw_context *brw,
   */
  void
  brw_store_register_mem32(struct brw_context *brw,
- drm_intel_bo *bo, uint32_t reg, uint32_t offset)
+ uint32_t reg,
+ drm_intel_bo *bo,
+ uint32_t offset)
  {
 assert(brw->gen >= 6);
  
@@ -107,7 +109,9 @@ brw_store_register_mem32(struct brw_context *brw,
   */
  void
  brw_store_register_mem64(struct brw_context *brw,
- drm_intel_bo *bo, uint32_t reg, uint32_t offset)
+ uint32_t reg,
+ drm_intel_bo *bo,
+ uint32_t offset)
  {
 assert(brw->gen >= 6);
  
diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.h b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
index 7730f4cad7..1904ae4a54 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.h
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
@@ -38,12 +38,12 @@ void brw_load_register_mem64(struct brw_context *brw,
   uint32_t offset);
  
  void brw_store_register_mem32(struct brw_context *brw,
-  drm_intel_bo *bo,
uint32_t reg,
+  drm_intel_bo *bo,
uint32_t offset);
  void brw_store_register_mem64(struct brw_context *brw,
-  drm_intel_bo *bo,
uint32_t reg,
+  drm_intel_bo *bo,
uint32_t offset);
  
  void brw_load_register_imm32(struct brw_context *brw,
diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c
index ce6813b531..9de83ed50b 100644
--- a/src/mesa/drivers/dri/i965/gen6_queryobj.c
+++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c
@@ -75,12 +75,13 @@ write_primitives_generated(struct brw_context *brw,
 brw_emit_mi_flush(brw);
  
 if (brw->gen >= 7 && stream > 0) {
-  brw_store_register_mem64(brw, query_bo,
+  brw_store_register_mem64(brw,
 GEN7_SO_PRIM_STORAGE_NEEDED(stream),
-   idx * sizeof(uint64_t));
+   query_bo, idx * sizeof(uint64_t));
 } else {
-  brw_store_register_mem64(brw, query_bo, CL_INVOCATION_COUNT,
-   idx * sizeof(uint64_t));
+  brw_store_register_mem64(brw,
+   CL_INVOCATION_COUNT,
+   query_bo, idx * sizeof(uint64_t));
 }
  }
  
@@ -91,11 +92,13 @@ write_xfb_primitives_written(struct brw_context *brw,
 brw_emit_mi_flush(brw);
  
 if (brw->gen >= 7) {
-  brw_store_register_mem64(brw, bo, GEN7_SO_NUM_PRIMS_WRITTEN(stream),
-   idx * sizeof(uint64_t));
+  brw_store_register_mem64(brw,
+   GEN7_SO_NUM_PRIMS_WRITTEN(stream),
+   bo, idx * 

Re: [Mesa-dev] [PATCH 3/7] i965: Stop passing read/write domains to load_reg_mem32/64

2016-12-09 Thread Lionel Landwerlin
Some I915_GEM_DOMAIN_VERTEX uses are changed to I915_GEM_DOMAIN_INSTRUCTION,
but the kernel treats both the same way, so I guess it doesn't matter.


Reviewed-by: Lionel Landwerlin 

On 09/12/16 10:54, Chris Wilson wrote:

The domains used are immaterial, and we should never be marking the read
from the buffer as a write, so stop passing them around from the caller
and choose the appropriate read domain when writing.
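Here is a toy model of a domain-free load helper to illustrate the idea. The caller passes only the register, the buffer and the offset. The helper records the buffer as read-only, with a write domain of 0. The structs, the SKETCH_* names and the opcode and domain values are all assumptions for illustration. They are not Mesa's or the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones live in
 * i915_drm.h and the driver's command headers. */
#define SKETCH_DOMAIN_INSTRUCTION   0x10u
#define SKETCH_MI_LOAD_REGISTER_MEM (0x29u << 23)
#define SKETCH_BATCH_DWORDS 64
#define SKETCH_MAX_RELOCS   16

struct sketch_bo { uint32_t handle; };

struct sketch_reloc {
   uint32_t handle;        /* which buffer */
   uint32_t delta;         /* offset into the buffer */
   uint32_t read_domains;
   uint32_t write_domain;  /* 0: the GPU only reads this buffer */
};

struct sketch_batch {
   uint32_t dw[SKETCH_BATCH_DWORDS];
   unsigned used;
   struct sketch_reloc relocs[SKETCH_MAX_RELOCS];
   unsigned nr_relocs;
};

/* No domain parameters: a register load only reads memory, so the
 * helper picks the read domain itself and never marks a write. */
static void
sketch_load_register_mem32(struct sketch_batch *batch, uint32_t reg,
                           struct sketch_bo *bo, uint32_t offset)
{
   assert(batch->used + 3 <= SKETCH_BATCH_DWORDS);
   assert(batch->nr_relocs < SKETCH_MAX_RELOCS);

   batch->dw[batch->used++] = SKETCH_MI_LOAD_REGISTER_MEM | (3 - 2);
   batch->dw[batch->used++] = reg;
   batch->relocs[batch->nr_relocs++] = (struct sketch_reloc) {
      .handle = bo->handle,
      .delta = offset,
      .read_domains = SKETCH_DOMAIN_INSTRUCTION,
      .write_domain = 0,
   };
   /* Placeholder for the buffer address a real relocation would patch. */
   batch->dw[batch->used++] = offset;
}
```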

Signed-off-by: Chris Wilson 
---
  src/mesa/drivers/dri/i965/brw_compute.c| 27 +--
  src/mesa/drivers/dri/i965/brw_conditional_render.c | 14 ++--
  src/mesa/drivers/dri/i965/brw_context.h| 10 +++---
  src/mesa/drivers/dri/i965/brw_draw.c   | 38 +-
  src/mesa/drivers/dri/i965/hsw_queryobj.c   | 29 -
  src/mesa/drivers/dri/i965/hsw_sol.c| 14 +++-
  src/mesa/drivers/dri/i965/intel_batchbuffer.c  | 19 +--
  7 files changed, 48 insertions(+), 103 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
index 16b5df7ca4..51cd45df7a 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -40,15 +40,12 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
 GLintptr indirect_offset = brw->compute.num_work_groups_offset;
 drm_intel_bo *bo = brw->compute.num_work_groups_bo;
  
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo,
- I915_GEM_DOMAIN_VERTEX, 0,
- indirect_offset + 0);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo,
- I915_GEM_DOMAIN_VERTEX, 0,
- indirect_offset + 4);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo,
- I915_GEM_DOMAIN_VERTEX, 0,
- indirect_offset + 8);
+   brw_load_register_mem32(brw,
+   GEN7_GPGPU_DISPATCHDIMX, bo, indirect_offset + 0);
+   brw_load_register_mem32(brw,
+   GEN7_GPGPU_DISPATCHDIMY, bo, indirect_offset + 4);
+   brw_load_register_mem32(brw,
+   GEN7_GPGPU_DISPATCHDIMZ, bo, indirect_offset + 8);
  
 if (brw->gen > 7)
return;
@@ -65,9 +62,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
 ADVANCE_BATCH();
  
 /* Load compute_dispatch_indirect_x_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
- I915_GEM_DOMAIN_INSTRUCTION, 0,
- indirect_offset + 0);
+   brw_load_register_mem32(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 0);
  
 /* predicate = (compute_dispatch_indirect_x_size == 0); */
 BEGIN_BATCH(1);
@@ -78,9 +73,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
 ADVANCE_BATCH();
  
 /* Load compute_dispatch_indirect_y_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
- I915_GEM_DOMAIN_INSTRUCTION, 0,
- indirect_offset + 4);
+   brw_load_register_mem32(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 4);
  
 /* predicate |= (compute_dispatch_indirect_y_size == 0); */
 BEGIN_BATCH(1);
@@ -91,9 +84,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
 ADVANCE_BATCH();
  
 /* Load compute_dispatch_indirect_z_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
- I915_GEM_DOMAIN_INSTRUCTION, 0,
- indirect_offset + 8);
+   brw_load_register_mem32(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 8);
  
 /* predicate |= (compute_dispatch_indirect_z_size == 0); */
 BEGIN_BATCH(1);
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c
index 122a4ecc0f..8574fc1aeb 100644
--- a/src/mesa/drivers/dri/i965/brw_conditional_render.c
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -62,18 +62,8 @@ set_predicate_for_result(struct brw_context *brw,
  */
 brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);
  
-   brw_load_register_mem64(brw,
-   MI_PREDICATE_SRC0,
-   query->bo,
-   I915_GEM_DOMAIN_INSTRUCTION,
-   0, /* write domain */
-   0 /* offset */);
-   brw_load_register_mem64(brw,
-   MI_PREDICATE_SRC1,
-   query->bo,
-   I915_GEM_DOMAIN_INSTRUCTION,
-   0, /* write domain */
-   8 /* offset */);
+   brw_load_register_mem64(brw, MI_PREDICATE_SRC0, query->bo, 0 /* offset */);
+   brw_load_register_mem64(brw, MI_PREDICATE_SRC1, query->bo, 8 /* offset */);
  
 if (inverted)
load_op = 

Re: [Mesa-dev] [PATCH 4/7] i965: Replace opencoding of brw_load_register_imm32

2016-12-09 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 09/12/16 10:54, Chris Wilson wrote:

There are a few open-coded settings of single registers using
MI_LOAD_REGISTER_IMM; replace those with calls to
brw_load_register_imm32().
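For reference, each open-coded block emits the same three dwords: the MI_LOAD_REGISTER_IMM header with the length field (3 - 2), the register offset, and the immediate value. Below is a minimal sketch of a helper that does exactly that. It assumes a toy batch struct and assumes the opcode value 0x22 sits in bits 28:23. Both are illustrative, not Mesa's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: MI opcode 0x22 in bits 28:23. The low bits carry
 * the packet length in dwords minus 2. */
#define SKETCH_MI_LOAD_REGISTER_IMM (0x22u << 23)
#define SKETCH_BATCH_DWORDS 64

/* Hypothetical stand-in for the batch buffer. */
struct sketch_batch {
   uint32_t dw[SKETCH_BATCH_DWORDS];
   unsigned used;
};

/* One call replaces BEGIN_BATCH(3), three OUT_BATCH() and ADVANCE_BATCH(). */
static void
sketch_load_register_imm32(struct sketch_batch *batch,
                           uint32_t reg, uint32_t imm)
{
   assert(batch->used + 3 <= SKETCH_BATCH_DWORDS);

   batch->dw[batch->used++] = SKETCH_MI_LOAD_REGISTER_IMM | (3 - 2);
   batch->dw[batch->used++] = reg;
   batch->dw[batch->used++] = imm;
}
```

For a masked register such as GEN7_CACHE_MODE_1, the caller still builds the immediate itself, for example REG_MASK(bit) | bit, just as the diffs below do.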

Signed-off-by: Chris Wilson 
---
  src/mesa/drivers/dri/i965/brw_draw.c |  6 +-
  src/mesa/drivers/dri/i965/brw_state_upload.c | 13 +
  src/mesa/drivers/dri/i965/gen7_l3_state.c| 20 
  src/mesa/drivers/dri/i965/gen8_depth_state.c |  8 +++-
  4 files changed, 17 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 52589d0d13..b78e73516e 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -239,11 +239,7 @@ brw_emit_prim(struct brw_context *brw,
} else {
   brw_load_register_mem32(brw, GEN7_3DPRIM_START_INSTANCE, bo,
   prim->indirect_offset + 12);
- BEGIN_BATCH(3);
- OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
- OUT_BATCH(GEN7_3DPRIM_BASE_VERTEX);
- OUT_BATCH(0);
- ADVANCE_BATCH();
+ brw_load_register_imm32(brw, GEN7_3DPRIM_BASE_VERTEX, 0);
}
 } else {
indirect_flag = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index b689ae41f6..ea58bf02cf 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -399,14 +399,11 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
 brw_upload_invariant_state(brw);
  
 /* Recommended optimization for Victim Cache eviction in pixel backend. */
-   if (brw->gen >= 9) {
-  BEGIN_BATCH(3);
-  OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-  OUT_BATCH(GEN7_CACHE_MODE_1);
-  OUT_BATCH(REG_MASK(GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC) |
-GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC);
-  ADVANCE_BATCH();
-   }
+   if (brw->gen >= 9)
+  brw_load_register_imm32(brw,
+  GEN7_CACHE_MODE_1,
+  REG_MASK(GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC) |
+  GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC);
  
 if (brw->gen >= 8) {
gen8_emit_3dstate_sample_pattern(brw);
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index e746b995c1..dd68f036b3 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -117,21 +117,17 @@ setup_l3_config(struct brw_context *brw, const struct gen_l3_config *cfg)
 PIPE_CONTROL_CS_STALL);
  
 if (brw->gen >= 8) {
-  assert(!cfg->n[GEN_L3P_IS] && !cfg->n[GEN_L3P_C] && !cfg->n[GEN_L3P_T]);
+  uint32_t partition;
  
-  BEGIN_BATCH(3);
-  OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
+  assert(!cfg->n[GEN_L3P_IS] && !cfg->n[GEN_L3P_C] && !cfg->n[GEN_L3P_T]);
  
  /* Set up the L3 partitioning. */
-  OUT_BATCH(GEN8_L3CNTLREG);
-  OUT_BATCH((has_slm ? GEN8_L3CNTLREG_SLM_ENABLE : 0) |
-SET_FIELD(cfg->n[GEN_L3P_URB], GEN8_L3CNTLREG_URB_ALLOC) |
-SET_FIELD(cfg->n[GEN_L3P_RO], GEN8_L3CNTLREG_RO_ALLOC) |
-SET_FIELD(cfg->n[GEN_L3P_DC], GEN8_L3CNTLREG_DC_ALLOC) |
-SET_FIELD(cfg->n[GEN_L3P_ALL], GEN8_L3CNTLREG_ALL_ALLOC));
-
-  ADVANCE_BATCH();
-
+  partition = has_slm ? GEN8_L3CNTLREG_SLM_ENABLE : 0;
+  partition |= SET_FIELD(cfg->n[GEN_L3P_URB], GEN8_L3CNTLREG_URB_ALLOC);
+  partition |= SET_FIELD(cfg->n[GEN_L3P_RO],  GEN8_L3CNTLREG_RO_ALLOC);
+  partition |= SET_FIELD(cfg->n[GEN_L3P_DC],  GEN8_L3CNTLREG_DC_ALLOC);
+  partition |= SET_FIELD(cfg->n[GEN_L3P_ALL], GEN8_L3CNTLREG_ALL_ALLOC);
+  brw_load_register_imm32(brw, GEN8_L3CNTLREG, partition);
 } else {
assert(!cfg->n[GEN_L3P_ALL]);
  
diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c
index 14689f400f..71e5831cf1 100644
--- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
@@ -347,11 +347,9 @@ gen8_write_pma_stall_bits(struct brw_context *brw, uint32_t pma_stall_bits)
 render_cache_flush);
  
 /* CACHE_MODE_1 is a non-privileged register. */
-   BEGIN_BATCH(3);
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-   OUT_BATCH(GEN7_CACHE_MODE_1);
-   OUT_BATCH(GEN8_HIZ_PMA_MASK_BITS | pma_stall_bits);
-   ADVANCE_BATCH();
+   brw_load_register_imm32(brw,
+   GEN7_CACHE_MODE_1,
+   GEN8_HIZ_PMA_MASK_BITS | pma_stall_bits);
  
 /* After the LRI, a PIPE_CONTROL with both the Depth Stall and Depth Cache
  * Flush bits is often necessary.  We do it regardless because it's easier.




Re: [Mesa-dev] [PATCH 6/7] i965: s/brw_load_register_reg/brw_load_register_reg32/

2016-12-09 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 09/12/16 10:54, Chris Wilson wrote:

Rename brw_load_register_reg to include the width (32 bits), like
all the other register routines.
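To show why the width matters, here is a toy model in which the registers are an array of dwords. A 32-bit copy moves one dword, and a 64-bit copy is modelled here as two. The same model reproduces the shr_gpr0_by_2_bits() trick in hsw_queryobj.c below, which depends on a 32-bit copy of GPR0's high dword. The register offsets and the array are made up for illustration and are not real Haswell MMIO addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file indexed by byte offset / 4 (illustrative only). */
static uint32_t sketch_regs[1024];
#define SKETCH_GPR0 0x600u

/* 32-bit register-to-register copy: exactly one dword moves. */
static void
sketch_load_register_reg32(uint32_t src, uint32_t dest)
{
   sketch_regs[dest / 4] = sketch_regs[src / 4];
}

/* 64-bit copy modelled as two 32-bit copies, low dword first. */
static void
sketch_load_register_reg64(uint32_t src, uint32_t dest)
{
   sketch_load_register_reg32(src, dest);
   sketch_load_register_reg32(src + 4, dest + 4);
}

static uint64_t
sketch_read64(uint32_t reg)
{
   return (uint64_t)sketch_regs[reg / 4] |
          ((uint64_t)sketch_regs[reg / 4 + 1] << 32);
}

static void
sketch_write64(uint32_t reg, uint64_t v)
{
   sketch_regs[reg / 4] = (uint32_t)v;
   sketch_regs[reg / 4 + 1] = (uint32_t)(v >> 32);
}

/* Mirrors shr_gpr0_by_2_bits(): shift left by 30, copy the high dword
 * into the low dword with a 32-bit move, then clear the high dword.
 * The result is (x >> 2) truncated to 32 bits. */
static void
sketch_shr_gpr0_by_2_bits(void)
{
   sketch_write64(SKETCH_GPR0, sketch_read64(SKETCH_GPR0) << 30);
   sketch_load_register_reg32(SKETCH_GPR0 + 4, SKETCH_GPR0);
   sketch_regs[(SKETCH_GPR0 + 4) / 4] = 0;
}
```

The trick is only exact while x >> 2 fits in 32 bits. Any bits of x above bit 33 are shifted out.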

Signed-off-by: Chris Wilson 
---
  src/mesa/drivers/dri/i965/brw_pipelined_register.c | 2 +-
  src/mesa/drivers/dri/i965/brw_pipelined_register.h | 6 +++---
  src/mesa/drivers/dri/i965/hsw_queryobj.c   | 2 +-
  3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.c b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
index b143bac04e..0b226035e7 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.c
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
@@ -175,7 +175,7 @@ brw_load_register_imm64(struct brw_context *brw, uint32_t reg, uint64_t imm)
   * Copies a 32-bit register.
   */
  void
-brw_load_register_reg(struct brw_context *brw, uint32_t src, uint32_t dest)
+brw_load_register_reg32(struct brw_context *brw, uint32_t src, uint32_t dest)
  {
 assert(brw->gen >= 8 || brw->is_haswell);
  
diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.h b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
index 94d52433a1..7730f4cad7 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.h
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
@@ -53,9 +53,9 @@ void brw_load_register_imm64(struct brw_context *brw,
   uint32_t reg,
   uint64_t imm);
  
-void brw_load_register_reg(struct brw_context *brw,
-   uint32_t src,
-   uint32_t dest);
+void brw_load_register_reg32(struct brw_context *brw,
+ uint32_t src,
+ uint32_t dest);
  void brw_load_register_reg64(struct brw_context *brw,
   uint32_t src,
   uint32_t dest);
diff --git a/src/mesa/drivers/dri/i965/hsw_queryobj.c b/src/mesa/drivers/dri/i965/hsw_queryobj.c
index c3eeafc091..e9a6f459a1 100644
--- a/src/mesa/drivers/dri/i965/hsw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/hsw_queryobj.c
@@ -156,7 +156,7 @@ static void
  shr_gpr0_by_2_bits(struct brw_context *brw)
  {
 shl_gpr0_by_30_bits(brw);
-   brw_load_register_reg(brw, HSW_CS_GPR(0) + 4, HSW_CS_GPR(0));
+   brw_load_register_reg32(brw, HSW_CS_GPR(0) + 4, HSW_CS_GPR(0));
 brw_load_register_imm32(brw, HSW_CS_GPR(0) + 4, 0);
  }
  





Re: [Mesa-dev] [PATCH 5/7] i965: Move pipelined register access to its own file

2016-12-09 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 09/12/16 10:54, Chris Wilson wrote:

My ulterior motive is to kill intel_batchbuffer.[ch], and moving
discrete pieces of functionality into their own files is a small step
towards that goal.

Signed-off-by: Chris Wilson 
---
  src/mesa/drivers/dri/i965/Makefile.sources |   2 +
  src/mesa/drivers/dri/i965/brw_compute.c|   1 +
  src/mesa/drivers/dri/i965/brw_conditional_render.c |   2 +
  src/mesa/drivers/dri/i965/brw_context.h|  26 ---
  src/mesa/drivers/dri/i965/brw_draw.c   |   1 +
  .../drivers/dri/i965/brw_performance_monitor.c |   2 +
  src/mesa/drivers/dri/i965/brw_pipelined_register.c | 252 +
  src/mesa/drivers/dri/i965/brw_pipelined_register.h |  76 +++
  src/mesa/drivers/dri/i965/brw_state_upload.c   |   1 +
  src/mesa/drivers/dri/i965/gen6_queryobj.c  |   1 +
  src/mesa/drivers/dri/i965/gen7_l3_state.c  |   2 +
  src/mesa/drivers/dri/i965/gen7_sol_state.c |   2 +
  src/mesa/drivers/dri/i965/gen8_depth_state.c   |   1 +
  src/mesa/drivers/dri/i965/hsw_queryobj.c   |   2 +
  src/mesa/drivers/dri/i965/hsw_sol.c|   2 +
  src/mesa/drivers/dri/i965/intel_batchbuffer.c  | 224 --
  16 files changed, 347 insertions(+), 250 deletions(-)
  create mode 100644 src/mesa/drivers/dri/i965/brw_pipelined_register.c
  create mode 100644 src/mesa/drivers/dri/i965/brw_pipelined_register.h

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index 1c33ea55fa..49044db169 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -137,6 +137,8 @@ i965_FILES = \
brw_object_purgeable.c \
brw_performance_monitor.c \
brw_pipe_control.c \
+   brw_pipelined_register.c \
+   brw_pipelined_register.h \
brw_program.c \
brw_program.h \
brw_program_cache.c \
diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
index 51cd45df7a..d63ebbe588 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -28,6 +28,7 @@
  #include "main/state.h"
  #include "brw_context.h"
  #include "brw_draw.h"
+#include "brw_pipelined_register.h"
  #include "brw_state.h"
  #include "intel_batchbuffer.h"
  #include "intel_buffer_objects.h"
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c
index 8574fc1aeb..6ad218be55 100644
--- a/src/mesa/drivers/dri/i965/brw_conditional_render.c
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -35,6 +35,8 @@
  
  #include "brw_context.h"
  #include "brw_defines.h"
+#include "brw_pipelined_register.h"
+
  #include "intel_batchbuffer.h"
  
  static void
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 77a5f8b879..428f5773c1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1362,32 +1362,6 @@ void hsw_init_queryobj_functions(struct dd_function_table *functions);
  void brw_init_conditional_render_functions(struct dd_function_table *functions);
  bool brw_check_conditional_render(struct brw_context *brw);
  
-/** intel_batchbuffer.c */
-void brw_load_register_mem32(struct brw_context *brw,
- uint32_t reg,
- drm_intel_bo *bo,
- uint32_t offset);
-void brw_load_register_mem64(struct brw_context *brw,
- uint32_t reg,
- drm_intel_bo *bo,
- uint32_t offset);
-void brw_store_register_mem32(struct brw_context *brw,
-  drm_intel_bo *bo, uint32_t reg, uint32_t offset);
-void brw_store_register_mem64(struct brw_context *brw,
-  drm_intel_bo *bo, uint32_t reg, uint32_t offset);
-void brw_load_register_imm32(struct brw_context *brw,
- uint32_t reg, uint32_t imm);
-void brw_load_register_imm64(struct brw_context *brw,
- uint32_t reg, uint64_t imm);
-void brw_load_register_reg(struct brw_context *brw, uint32_t src,
-   uint32_t dest);
-void brw_load_register_reg64(struct brw_context *brw, uint32_t src,
- uint32_t dest);
-void brw_store_data_imm32(struct brw_context *brw, drm_intel_bo *bo,
-  uint32_t offset, uint32_t imm);
-void brw_store_data_imm64(struct brw_context *brw, drm_intel_bo *bo,
-  uint32_t offset, uint64_t imm);
-
  /*==
   * brw_state_dump.c
   */
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 

Re: [Mesa-dev] [PATCH 2/7] i965: Replace open-coded store/load SO_WRITE_OFFSET to/from mem

2016-12-09 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 09/12/16 10:54, Chris Wilson wrote:

Rather than emit the instructions directly, make use of the helpers
brw_store_register_mem32() and brw_load_register_mem().

Signed-off-by: Chris Wilson 
---
  src/mesa/drivers/dri/i965/hsw_sol.c | 27 +--
  1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/hsw_sol.c b/src/mesa/drivers/dri/i965/hsw_sol.c
index 87d4ab531b..2f1112699b 100644
--- a/src/mesa/drivers/dri/i965/hsw_sol.c
+++ b/src/mesa/drivers/dri/i965/hsw_sol.c
@@ -204,15 +204,10 @@ hsw_pause_transform_feedback(struct gl_context *ctx,
brw_emit_mi_flush(brw);
  
  /* Save the SOL buffer offset register values. */
-  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++) {
- BEGIN_BATCH(3);
- OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));
- OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
- OUT_RELOC(brw_obj->offset_bo,
-   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-   i * sizeof(uint32_t));
- ADVANCE_BATCH();
-  }
+  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++)
+ brw_store_register_mem32(brw, brw_obj->offset_bo,
+  GEN7_SO_WRITE_OFFSET(i),
+  i * sizeof(uint32_t));
 }
  
 /* Add any primitives written to our tally */
@@ -232,15 +227,11 @@ hsw_resume_transform_feedback(struct gl_context *ctx,
  
 if (brw->is_haswell) {
/* Reload the SOL buffer offset registers. */
-  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++) {
- BEGIN_BATCH(3);
- OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2));
- OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
- OUT_RELOC(brw_obj->offset_bo,
-   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-   i * sizeof(uint32_t));
- ADVANCE_BATCH();
-  }
+  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++)
+ brw_load_register_mem(brw, GEN7_SO_WRITE_OFFSET(i),
+   brw_obj->offset_bo,
+   I915_GEM_DOMAIN_INSTRUCTION, 0,
+   i * sizeof(uint32_t));
 }
  
 /* Store the new starting value of the SO_NUM_PRIMS_WRITTEN counters. */





[Mesa-dev] [PATCH v3] EGL/android: Enhance pbuffer implementation

2016-12-09 Thread Liu Zhiquan
Some DRI drivers pass multiple bits in the buffer_mask parameter
to droid_image_get_buffer(), more than the combination of buffer
types that is actually supported. In that case, go through all the
bits, do not return an error when an unsupported buffer is requested,
and only return an error when the allocation of a supported buffer fails.
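Here is a minimal sketch of the mask handling described above. It uses hypothetical bit values and a toy surface whose allocations can be forced to fail. It is not the patch's code. The real patch also only creates the front image for pbuffer surfaces and just logs the request for window surfaces.

```c
#include <assert.h>

/* Hypothetical mask bits; the real ones come from the DRI image
 * loader interface. */
enum {
   SKETCH_BUFFER_BACK  = 1u << 0,
   SKETCH_BUFFER_FRONT = 1u << 1,
};

/* Toy surface: the fail_* flags force an allocation failure. */
struct sketch_surface {
   int fail_front, fail_back;
   int have_front, have_back;
};

static int
sketch_get_front(struct sketch_surface *s)
{
   if (s->fail_front)
      return -1;
   s->have_front = 1;
   return 0;
}

static int
sketch_get_back(struct sketch_surface *s)
{
   if (s->fail_back)
      return -1;
   s->have_back = 1;
   return 0;
}

/* Returns 1 on success, 0 on failure. Every requested bit is checked.
 * Bits for unsupported buffer types are skipped silently, and only a
 * failed allocation of a supported buffer counts as an error. */
static int
sketch_get_buffers(struct sketch_surface *s, unsigned int buffer_mask)
{
   assert(s != 0);

   if ((buffer_mask & SKETCH_BUFFER_FRONT) && sketch_get_front(s) < 0)
      return 0;

   if ((buffer_mask & SKETCH_BUFFER_BACK) && sketch_get_back(s) < 0)
      return 0;

   /* Any remaining bits are unsupported requests, not errors. */
   return 1;
}
```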

v2: coding style and log changes
v3: coding style changes and update patch format

Signed-off-by: Liu Zhiquan 
Signed-off-by: Long, Zhifang 
Reviewed-by: Tomasz Figa 
---
 src/egl/drivers/dri2/platform_android.c | 177 +---
 1 file changed, 96 insertions(+), 81 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_android.c b/src/egl/drivers/dri2/platform_android.c
index 373e2c0..1c880f9 100644
--- a/src/egl/drivers/dri2/platform_android.c
+++ b/src/egl/drivers/dri2/platform_android.c
@@ -434,7 +434,40 @@ update_buffers(struct dri2_egl_surface *dri2_surf)
 }
 
 static int
-get_back_bo(struct dri2_egl_surface *dri2_surf)
+get_front_bo(struct dri2_egl_surface *dri2_surf, unsigned int format)
+{
+   struct dri2_egl_display *dri2_dpy =
+  dri2_egl_display(dri2_surf->base.Resource.Display);
+
+   if (dri2_surf->dri_image_front)
+  return 0;
+
+   if (dri2_surf->base.Type == EGL_WINDOW_BIT) {
+  /* According to the current EGL spec, front buffer rendering
+   * for window surfaces is not supported, and Mesa doesn't
+   * implement this case. Log a debug message, but do not
+   * treat it as an error.
+   */
+  _eglLog(_EGL_DEBUG, "DRI driver requested unsupported front buffer for window surface");
+   } else if (dri2_surf->base.Type == EGL_PBUFFER_BIT) {
+  dri2_surf->dri_image_front =
+  dri2_dpy->image->createImage(dri2_dpy->dri_screen,
+  dri2_surf->base.Width,
+  dri2_surf->base.Height,
+  format,
+  0,
+  dri2_surf);
+  if (!dri2_surf->dri_image_front) {
+ _eglLog(_EGL_WARNING, "dri2_image_front allocation failed");
+ return -1;
+  }
+   }
+
+   return 0;
+}
+
+static int
+get_back_bo(struct dri2_egl_surface *dri2_surf, unsigned int format)
 {
struct dri2_egl_display *dri2_dpy =
   dri2_egl_display(dri2_surf->base.Resource.Display);
@@ -444,42 +477,68 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
if (dri2_surf->dri_image_back)
   return 0;
 
-   if (!dri2_surf->buffer)
-  return -1;
+   if (dri2_surf->base.Type == EGL_WINDOW_BIT) {
+  if (!dri2_surf->buffer) {
+ _eglLog(_EGL_WARNING, "Could not get native buffer");
+ return -1;
+  }
 
-   fd = get_native_buffer_fd(dri2_surf->buffer);
-   if (fd < 0) {
-  _eglLog(_EGL_WARNING, "Could not get native buffer FD");
-  return -1;
-   }
+  fd = get_native_buffer_fd(dri2_surf->buffer);
+  if (fd < 0) {
+ _eglLog(_EGL_WARNING, "Could not get native buffer FD");
+ return -1;
+  }
 
-   fourcc = get_fourcc(dri2_surf->buffer->format);
+  fourcc = get_fourcc(dri2_surf->buffer->format);
 
-   pitch = dri2_surf->buffer->stride *
-  get_format_bpp(dri2_surf->buffer->format);
+  pitch = dri2_surf->buffer->stride *
+ get_format_bpp(dri2_surf->buffer->format);
 
-   if (fourcc == -1 || pitch == 0) {
-  _eglLog(_EGL_WARNING, "Invalid buffer fourcc(%x) or pitch(%d)",
-  fourcc, pitch);
-  return -1;
-   }
+  if (fourcc == -1 || pitch == 0) {
+ _eglLog(_EGL_WARNING, "Invalid buffer fourcc(%x) or pitch(%d)",
+ fourcc, pitch);
+ return -1;
+  }
 
-   dri2_surf->dri_image_back =
-  dri2_dpy->image->createImageFromFds(dri2_dpy->dri_screen,
-  dri2_surf->base.Width,
-  dri2_surf->base.Height,
-  fourcc,
-  ,
-  1,
-  ,
-  ,
-  dri2_surf);
-   if (!dri2_surf->dri_image_back)
-  return -1;
+  dri2_surf->dri_image_back =
+ dri2_dpy->image->createImageFromFds(dri2_dpy->dri_screen,
+ dri2_surf->base.Width,
+ dri2_surf->base.Height,
+ fourcc,
+ ,
+ 1,
+ ,
+ ,
+ dri2_surf);
+  if (!dri2_surf->dri_image_back) {
+ 

Re: [Mesa-dev] [PATCH 4/5] glapi: add missing INTEL_conservative_rasterization

2016-12-09 Thread Lionel Landwerlin
We need the enum somewhere in the XML files for patch 5 to support
glGet*(); otherwise get_hash_params.py will error out.

On 09/12/16 01:29, Ilia Mirkin wrote:

While I'm not against it, not sure that this has much use... mostly
this would be for _mesa_enum_to_string() to work AFAIK. Also, for such
smaller exts, we tend to just stick them into gl_API.xml directly.
Lastly if you do want to keep it in a separate file, make sure to add
it to the list in Makefile.am.


Oops...



On Thu, Dec 8, 2016 at 7:11 AM, Lionel Landwerlin
 wrote:

Signed-off-by: Lionel Landwerlin 
Cc: Ilia Mirkin 
---
  src/mapi/glapi/gen/INTEL_conservative_rasterization.xml | 10 ++
  src/mapi/glapi/gen/gl_API.xml   |  1 +
  2 files changed, 11 insertions(+)
  create mode 100644 src/mapi/glapi/gen/INTEL_conservative_rasterization.xml

diff --git a/src/mapi/glapi/gen/INTEL_conservative_rasterization.xml b/src/mapi/glapi/gen/INTEL_conservative_rasterization.xml
new file mode 100644
index 000..0eeb4ce
--- /dev/null
+++ b/src/mapi/glapi/gen/INTEL_conservative_rasterization.xml
@@ -0,0 +1,10 @@
+
+
+
+
+
+
+  
+
+
+
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 00c9bb7..e65ab10 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -12812,6 +12812,7 @@
  
  

+http://www.w3.org/2001/XInclude"/>
  http://www.w3.org/2001/XInclude"/>

  
--
2.10.2





[Mesa-dev] [PATCH 7/7] i965: Reorder parameters to brw_store_register_mem

2016-12-09 Thread Chris Wilson
Reorder the parameters to brw_store_register_mem32 and
brw_store_register_mem64 so that the offset into the buffer and its
identifier are paired. This brings the interface into line with
brw_load_register_mem.
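Here is a toy model of the resulting symmetry. It assumes a CPU-visible buffer and a register array in place of the GPU. All names are hypothetical, and the brw context argument is dropped. Once both helpers take (reg, bo, offset), saving a register to a buffer and restoring it read the same way in both directions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: a CPU-visible buffer and a toy register file
 * indexed by byte offset / 4. */
struct sketch_bo { uint8_t data[64]; };
static uint32_t sketch_regs[256];

/* Load: register <- buffer + offset. */
static void
sketch_load_register_mem32(uint32_t reg, struct sketch_bo *bo, uint32_t offset)
{
   assert(reg / 4 < 256 && offset + sizeof(uint32_t) <= sizeof(bo->data));
   memcpy(&sketch_regs[reg / 4], bo->data + offset, sizeof(uint32_t));
}

/* Store, after the reorder: same (reg, bo, offset) argument order. */
static void
sketch_store_register_mem32(uint32_t reg, struct sketch_bo *bo, uint32_t offset)
{
   assert(reg / 4 < 256 && offset + sizeof(uint32_t) <= sizeof(bo->data));
   memcpy(bo->data + offset, &sketch_regs[reg / 4], sizeof(uint32_t));
}
```

This mirrors the save/restore of GEN7_SO_WRITE_OFFSET(i) in hsw_sol.c, where stream i's value lives at i * sizeof(uint32_t) in offset_bo.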

Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/brw_performance_monitor.c |  3 ++-
 src/mesa/drivers/dri/i965/brw_pipelined_register.c  |  8 ++--
 src/mesa/drivers/dri/i965/brw_pipelined_register.h  |  4 ++--
 src/mesa/drivers/dri/i965/gen6_queryobj.c   | 21 -
 src/mesa/drivers/dri/i965/gen7_sol_state.c  |  5 ++---
 src/mesa/drivers/dri/i965/hsw_sol.c | 16 +---
 6 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
index 1b991bfafa..e198525f8f 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
@@ -589,8 +589,9 @@ snapshot_statistics_registers(struct brw_context *brw,
  assert(ctx->PerfMonitor.Groups[group].Counters[i].Type ==
 GL_UNSIGNED_INT64_AMD);
 
- brw_store_register_mem64(brw, monitor->pipeline_stats_bo,
+ brw_store_register_mem64(brw,
   brw->perfmon.statistics_registers[i],
+  monitor->pipeline_stats_bo,
   offset + i * sizeof(uint64_t));
   }
}
diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.c b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
index 0b226035e7..6b6b2487e8 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.c
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
@@ -81,7 +81,9 @@ brw_load_register_mem64(struct brw_context *brw,
  */
 void
 brw_store_register_mem32(struct brw_context *brw,
- drm_intel_bo *bo, uint32_t reg, uint32_t offset)
+ uint32_t reg,
+ drm_intel_bo *bo,
+ uint32_t offset)
 {
assert(brw->gen >= 6);
 
@@ -107,7 +109,9 @@ brw_store_register_mem32(struct brw_context *brw,
  */
 void
 brw_store_register_mem64(struct brw_context *brw,
- drm_intel_bo *bo, uint32_t reg, uint32_t offset)
+ uint32_t reg,
+ drm_intel_bo *bo,
+ uint32_t offset)
 {
assert(brw->gen >= 6);
 
diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.h b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
index 7730f4cad7..1904ae4a54 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.h
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
@@ -38,12 +38,12 @@ void brw_load_register_mem64(struct brw_context *brw,
  uint32_t offset);
 
 void brw_store_register_mem32(struct brw_context *brw,
-  drm_intel_bo *bo,
   uint32_t reg,
+  drm_intel_bo *bo,
   uint32_t offset);
 void brw_store_register_mem64(struct brw_context *brw,
-  drm_intel_bo *bo,
   uint32_t reg,
+  drm_intel_bo *bo,
   uint32_t offset);
 
 void brw_load_register_imm32(struct brw_context *brw,
diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c
index ce6813b531..9de83ed50b 100644
--- a/src/mesa/drivers/dri/i965/gen6_queryobj.c
+++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c
@@ -75,12 +75,13 @@ write_primitives_generated(struct brw_context *brw,
brw_emit_mi_flush(brw);
 
if (brw->gen >= 7 && stream > 0) {
-  brw_store_register_mem64(brw, query_bo,
+  brw_store_register_mem64(brw,
GEN7_SO_PRIM_STORAGE_NEEDED(stream),
-   idx * sizeof(uint64_t));
+   query_bo, idx * sizeof(uint64_t));
} else {
-  brw_store_register_mem64(brw, query_bo, CL_INVOCATION_COUNT,
-   idx * sizeof(uint64_t));
+  brw_store_register_mem64(brw,
+   CL_INVOCATION_COUNT,
+   query_bo, idx * sizeof(uint64_t));
}
 }
 
@@ -91,11 +92,13 @@ write_xfb_primitives_written(struct brw_context *brw,
brw_emit_mi_flush(brw);
 
if (brw->gen >= 7) {
-  brw_store_register_mem64(brw, bo, GEN7_SO_NUM_PRIMS_WRITTEN(stream),
-   idx * sizeof(uint64_t));
+  brw_store_register_mem64(brw,
+   GEN7_SO_NUM_PRIMS_WRITTEN(stream),
+   bo, idx * sizeof(uint64_t));
} else {
-  brw_store_register_mem64(brw, bo, GEN6_SO_NUM_PRIMS_WRITTEN,
-   idx * sizeof(uint64_t));
+  

[Mesa-dev] [PATCH 3/7] i965: Stop passing read/write domains to load_reg_mem32/64

2016-12-09 Thread Chris Wilson
The domains used are immaterial, and we should never be marking the read
from the buffer as a write, so stop passing them around from the caller
and choose the appropriate read domain when writing.
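A minimal sketch of what "the helper chooses the domains" means for callers: they pass only (reg, offset) and the relocation always gets a read domain with write domain 0. The sketch_* names, the single implied buffer, and the values used (I915_GEM_DOMAIN_INSTRUCTION as 0x10, MI_LOAD_REGISTER_MEM as opcode 0x29 in its gen7 3-dword form) are assumptions for illustration; which read domain the real helper picks is not stated in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_DOMAIN_INSTRUCTION 0x10u                        /* I915_GEM_DOMAIN_INSTRUCTION */
#define SKETCH_MI_LOAD_REGISTER_MEM ((0x29u << 23) | (3 - 2))  /* gen7 3-dword form */

struct sketch_reloc {
   unsigned batch_index;   /* which dword holds the buffer address */
   uint32_t read_domains;
   uint32_t write_domain;
   uint32_t delta;         /* offset into the buffer */
};

struct sketch_batch {
   uint32_t dw[32];
   unsigned used;
   struct sketch_reloc relocs[8];
   unsigned nr_relocs;
};

/* The helper, not the caller, picks the domains: the buffer is only read. */
static void sketch_load_register_mem32(struct sketch_batch *b, uint32_t reg,
                                       uint32_t offset)
{
   b->dw[b->used++] = SKETCH_MI_LOAD_REGISTER_MEM;
   b->dw[b->used++] = reg;
   b->relocs[b->nr_relocs++] = (struct sketch_reloc) {
      .batch_index = b->used,
      .read_domains = SKETCH_DOMAIN_INSTRUCTION,
      .write_domain = 0,
      .delta = offset,
   };
   b->dw[b->used++] = offset;  /* placeholder; the relocation supplies the address */
}

/* 64-bit load as two 32-bit loads of the low and high dwords. */
static void sketch_load_register_mem64(struct sketch_batch *b, uint32_t reg,
                                       uint32_t offset)
{
   sketch_load_register_mem32(b, reg, offset);
   sketch_load_register_mem32(b, reg + 4, offset + 4);
}
```

Because no caller can pass a write domain any more, a read of the query or indirect buffer can no longer be marked as a write by mistake.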

Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/brw_compute.c| 27 +--
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 14 ++--
 src/mesa/drivers/dri/i965/brw_context.h| 10 +++---
 src/mesa/drivers/dri/i965/brw_draw.c   | 38 +-
 src/mesa/drivers/dri/i965/hsw_queryobj.c   | 29 -
 src/mesa/drivers/dri/i965/hsw_sol.c| 14 +++-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  | 19 +--
 7 files changed, 48 insertions(+), 103 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
index 16b5df7ca4..51cd45df7a 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -40,15 +40,12 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
GLintptr indirect_offset = brw->compute.num_work_groups_offset;
drm_intel_bo *bo = brw->compute.num_work_groups_bo;
 
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo,
- I915_GEM_DOMAIN_VERTEX, 0,
- indirect_offset + 0);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo,
- I915_GEM_DOMAIN_VERTEX, 0,
- indirect_offset + 4);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo,
- I915_GEM_DOMAIN_VERTEX, 0,
- indirect_offset + 8);
+   brw_load_register_mem32(brw,
+   GEN7_GPGPU_DISPATCHDIMX, bo, indirect_offset + 0);
+   brw_load_register_mem32(brw,
+   GEN7_GPGPU_DISPATCHDIMY, bo, indirect_offset + 4);
+   brw_load_register_mem32(brw,
+   GEN7_GPGPU_DISPATCHDIMZ, bo, indirect_offset + 8);
 
if (brw->gen > 7)
   return;
@@ -65,9 +62,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
ADVANCE_BATCH();
 
/* Load compute_dispatch_indirect_x_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
- I915_GEM_DOMAIN_INSTRUCTION, 0,
- indirect_offset + 0);
+   brw_load_register_mem32(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 0);
 
/* predicate = (compute_dispatch_indirect_x_size == 0); */
BEGIN_BATCH(1);
@@ -78,9 +73,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
ADVANCE_BATCH();
 
/* Load compute_dispatch_indirect_y_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
- I915_GEM_DOMAIN_INSTRUCTION, 0,
- indirect_offset + 4);
+   brw_load_register_mem32(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 4);
 
/* predicate |= (compute_dispatch_indirect_y_size == 0); */
BEGIN_BATCH(1);
@@ -91,9 +84,7 @@ prepare_indirect_gpgpu_walker(struct brw_context *brw)
ADVANCE_BATCH();
 
/* Load compute_dispatch_indirect_z_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo,
- I915_GEM_DOMAIN_INSTRUCTION, 0,
- indirect_offset + 8);
+   brw_load_register_mem32(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 8);
 
/* predicate |= (compute_dispatch_indirect_z_size == 0); */
BEGIN_BATCH(1);
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c
index 122a4ecc0f..8574fc1aeb 100644
--- a/src/mesa/drivers/dri/i965/brw_conditional_render.c
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -62,18 +62,8 @@ set_predicate_for_result(struct brw_context *brw,
 */
brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);
 
-   brw_load_register_mem64(brw,
-   MI_PREDICATE_SRC0,
-   query->bo,
-   I915_GEM_DOMAIN_INSTRUCTION,
-   0, /* write domain */
-   0 /* offset */);
-   brw_load_register_mem64(brw,
-   MI_PREDICATE_SRC1,
-   query->bo,
-   I915_GEM_DOMAIN_INSTRUCTION,
-   0, /* write domain */
-   8 /* offset */);
+   brw_load_register_mem64(brw, MI_PREDICATE_SRC0, query->bo, 0 /* offset */);
+   brw_load_register_mem64(brw, MI_PREDICATE_SRC1, query->bo, 8 /* offset */);
 
if (inverted)
   load_op = MI_PREDICATE_LOADOP_LOAD;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 550eefedcc..77a5f8b879 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1363,15 +1363,13 @@ void brw_init_conditional_render_functions(struct 

[Mesa-dev] [PATCH 5/7] i965: Move pipelined register access to its own file

2016-12-09 Thread Chris Wilson
My ulterior motive is to kill intel_batchbuffer.[ch], and moving
discrete pieces of functionality into their own files is a small step
towards that goal.

Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/Makefile.sources |   2 +
 src/mesa/drivers/dri/i965/brw_compute.c|   1 +
 src/mesa/drivers/dri/i965/brw_conditional_render.c |   2 +
 src/mesa/drivers/dri/i965/brw_context.h|  26 ---
 src/mesa/drivers/dri/i965/brw_draw.c   |   1 +
 .../drivers/dri/i965/brw_performance_monitor.c |   2 +
 src/mesa/drivers/dri/i965/brw_pipelined_register.c | 252 +
 src/mesa/drivers/dri/i965/brw_pipelined_register.h |  76 +++
 src/mesa/drivers/dri/i965/brw_state_upload.c   |   1 +
 src/mesa/drivers/dri/i965/gen6_queryobj.c  |   1 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c  |   2 +
 src/mesa/drivers/dri/i965/gen7_sol_state.c |   2 +
 src/mesa/drivers/dri/i965/gen8_depth_state.c   |   1 +
 src/mesa/drivers/dri/i965/hsw_queryobj.c   |   2 +
 src/mesa/drivers/dri/i965/hsw_sol.c|   2 +
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  | 224 --
 16 files changed, 347 insertions(+), 250 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_pipelined_register.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_pipelined_register.h

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 1c33ea55fa..49044db169 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -137,6 +137,8 @@ i965_FILES = \
brw_object_purgeable.c \
brw_performance_monitor.c \
brw_pipe_control.c \
+   brw_pipelined_register.c \
+   brw_pipelined_register.h \
brw_program.c \
brw_program.h \
brw_program_cache.c \
diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
index 51cd45df7a..d63ebbe588 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -28,6 +28,7 @@
 #include "main/state.h"
 #include "brw_context.h"
 #include "brw_draw.h"
+#include "brw_pipelined_register.h"
 #include "brw_state.h"
 #include "intel_batchbuffer.h"
 #include "intel_buffer_objects.h"
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c
index 8574fc1aeb..6ad218be55 100644
--- a/src/mesa/drivers/dri/i965/brw_conditional_render.c
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -35,6 +35,8 @@
 
 #include "brw_context.h"
 #include "brw_defines.h"
+#include "brw_pipelined_register.h"
+
 #include "intel_batchbuffer.h"
 
 static void
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 77a5f8b879..428f5773c1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1362,32 +1362,6 @@ void hsw_init_queryobj_functions(struct dd_function_table *functions);
 void brw_init_conditional_render_functions(struct dd_function_table *functions);
 bool brw_check_conditional_render(struct brw_context *brw);
 
-/** intel_batchbuffer.c */
-void brw_load_register_mem32(struct brw_context *brw,
- uint32_t reg,
- drm_intel_bo *bo,
- uint32_t offset);
-void brw_load_register_mem64(struct brw_context *brw,
- uint32_t reg,
- drm_intel_bo *bo,
- uint32_t offset);
-void brw_store_register_mem32(struct brw_context *brw,
-  drm_intel_bo *bo, uint32_t reg, uint32_t offset);
-void brw_store_register_mem64(struct brw_context *brw,
-  drm_intel_bo *bo, uint32_t reg, uint32_t offset);
-void brw_load_register_imm32(struct brw_context *brw,
- uint32_t reg, uint32_t imm);
-void brw_load_register_imm64(struct brw_context *brw,
- uint32_t reg, uint64_t imm);
-void brw_load_register_reg(struct brw_context *brw, uint32_t src,
-   uint32_t dest);
-void brw_load_register_reg64(struct brw_context *brw, uint32_t src,
- uint32_t dest);
-void brw_store_data_imm32(struct brw_context *brw, drm_intel_bo *bo,
-  uint32_t offset, uint32_t imm);
-void brw_store_data_imm64(struct brw_context *brw, drm_intel_bo *bo,
-  uint32_t offset, uint64_t imm);
-
 /*==
  * brw_state_dump.c
  */
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index b78e73516e..44d5dac1fc 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -44,6 

[Mesa-dev] [PATCH 6/7] i965: s/brw_load_register_reg/brw_load_register_reg32/

2016-12-09 Thread Chris Wilson
Rename brw_load_register_reg to include the width (32 bits), similar to
all the other register routines.
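Why the width belongs in the name shows up in the one caller, shr_gpr0_by_2_bits(): it relies on the copy moving exactly one dword. Below is a plain-C model of that sequence on a hypothetical register file; SKETCH_CS_GPR(n) is assumed to mirror HSW_CS_GPR(n) (a 64-bit GPR as two dwords at 0x2600 + 8n), and sketch_shl_gpr0_by_30_bits() stands in for the real GPU-side shl_gpr0_by_30_bits().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file indexed by MMIO byte offset / 4. */
static uint32_t sketch_regs[0x3000 / 4];
/* Assumed to mirror HSW_CS_GPR(n): each GPR is a pair of dwords. */
#define SKETCH_CS_GPR(n) (0x2600u + (n) * 8u)

static uint64_t sketch_read64(uint32_t reg)
{
   return sketch_regs[reg / 4] | ((uint64_t)sketch_regs[reg / 4 + 1] << 32);
}

static void sketch_write64(uint32_t reg, uint64_t v)
{
   sketch_regs[reg / 4] = (uint32_t)v;
   sketch_regs[reg / 4 + 1] = (uint32_t)(v >> 32);
}

/* Copies exactly one dword; the width is now part of the name. */
static void sketch_load_register_reg32(uint32_t src, uint32_t dest)
{
   sketch_regs[dest / 4] = sketch_regs[src / 4];
}

/* Copies both dwords of a 64-bit register. */
static void sketch_load_register_reg64(uint32_t src, uint32_t dest)
{
   sketch_load_register_reg32(src, dest);
   sketch_load_register_reg32(src + 4, dest + 4);
}

static void sketch_load_register_imm32(uint32_t reg, uint32_t imm)
{
   sketch_regs[reg / 4] = imm;
}

/* Plain-C stand-in for shl_gpr0_by_30_bits() in hsw_queryobj.c. */
static void sketch_shl_gpr0_by_30_bits(void)
{
   sketch_write64(SKETCH_CS_GPR(0), sketch_read64(SKETCH_CS_GPR(0)) << 30);
}

/* Same sequence as shr_gpr0_by_2_bits(): shift left by 30, move the high
 * dword into the low dword, then clear the high dword. */
static void sketch_shr_gpr0_by_2_bits(void)
{
   sketch_shl_gpr0_by_30_bits();
   sketch_load_register_reg32(SKETCH_CS_GPR(0) + 4, SKETCH_CS_GPR(0));
   sketch_load_register_imm32(SKETCH_CS_GPR(0) + 4, 0);
}
```

The net effect is GPR0 = (GPR0 >> 2) truncated to 32 bits; substituting a 64-bit copy in the middle step would also drag the high dword into the next register and give a different result.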

Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/brw_pipelined_register.c | 2 +-
 src/mesa/drivers/dri/i965/brw_pipelined_register.h | 6 +++---
 src/mesa/drivers/dri/i965/hsw_queryobj.c   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.c b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
index b143bac04e..0b226035e7 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.c
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.c
@@ -175,7 +175,7 @@ brw_load_register_imm64(struct brw_context *brw, uint32_t reg, uint64_t imm)
  * Copies a 32-bit register.
  */
 void
-brw_load_register_reg(struct brw_context *brw, uint32_t src, uint32_t dest)
+brw_load_register_reg32(struct brw_context *brw, uint32_t src, uint32_t dest)
 {
assert(brw->gen >= 8 || brw->is_haswell);
 
diff --git a/src/mesa/drivers/dri/i965/brw_pipelined_register.h b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
index 94d52433a1..7730f4cad7 100644
--- a/src/mesa/drivers/dri/i965/brw_pipelined_register.h
+++ b/src/mesa/drivers/dri/i965/brw_pipelined_register.h
@@ -53,9 +53,9 @@ void brw_load_register_imm64(struct brw_context *brw,
  uint32_t reg,
  uint64_t imm);
 
-void brw_load_register_reg(struct brw_context *brw,
-   uint32_t src,
-   uint32_t dest);
+void brw_load_register_reg32(struct brw_context *brw,
+ uint32_t src,
+ uint32_t dest);
 void brw_load_register_reg64(struct brw_context *brw,
  uint32_t src,
  uint32_t dest);
diff --git a/src/mesa/drivers/dri/i965/hsw_queryobj.c b/src/mesa/drivers/dri/i965/hsw_queryobj.c
index c3eeafc091..e9a6f459a1 100644
--- a/src/mesa/drivers/dri/i965/hsw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/hsw_queryobj.c
@@ -156,7 +156,7 @@ static void
 shr_gpr0_by_2_bits(struct brw_context *brw)
 {
shl_gpr0_by_30_bits(brw);
-   brw_load_register_reg(brw, HSW_CS_GPR(0) + 4, HSW_CS_GPR(0));
+   brw_load_register_reg32(brw, HSW_CS_GPR(0) + 4, HSW_CS_GPR(0));
brw_load_register_imm32(brw, HSW_CS_GPR(0) + 4, 0);
 }
 
-- 
2.11.0



[Mesa-dev] [PATCH 4/7] i965: Replace opencoding of brw_load_register_imm32

2016-12-09 Thread Chris Wilson
There are a few open-coded settings of single registers using
MI_LOAD_REGISTER_IMM; replace those with calls to
brw_load_register_imm32().
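Each BEGIN_BATCH(3) ... ADVANCE_BATCH() block removed below emits the same three dwords: header, register offset, value. A minimal sketch of a helper that does this, assuming MI_LOAD_REGISTER_IMM is opcode 0x22 in bits 28:23 and that REG_MASK(v) shifts v into the upper 16 "write enable" bits of a masked register; the batch struct and sketch_* names are hypothetical, and 0x7004 is just an example offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: opcode 0x22 in bits 28:23; the low bits carry the
 * packet length minus two. */
#define SKETCH_MI_LOAD_REGISTER_IMM (0x22u << 23)
/* Masked registers: the upper 16 bits select which lower bits are written. */
#define SKETCH_REG_MASK(value) ((uint32_t)(value) << 16)

struct sketch_batch { uint32_t dw[32]; unsigned used; };

/* One call replaces an open-coded BEGIN_BATCH(3) ... ADVANCE_BATCH() block. */
static void sketch_load_register_imm32(struct sketch_batch *b, uint32_t reg,
                                       uint32_t imm)
{
   b->dw[b->used++] = SKETCH_MI_LOAD_REGISTER_IMM | (3 - 2);
   b->dw[b->used++] = reg;
   b->dw[b->used++] = imm;
}
```

Usage mirrors the CACHE_MODE_1 hunk: sketch_load_register_imm32(&b, 0x7004, SKETCH_REG_MASK(bit) | bit) sets one bit of a masked register and leaves the others untouched.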

Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/brw_draw.c |  6 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c | 13 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c| 20 
 src/mesa/drivers/dri/i965/gen8_depth_state.c |  8 +++-
 4 files changed, 17 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 52589d0d13..b78e73516e 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -239,11 +239,7 @@ brw_emit_prim(struct brw_context *brw,
   } else {
  brw_load_register_mem32(brw, GEN7_3DPRIM_START_INSTANCE, bo,
  prim->indirect_offset + 12);
- BEGIN_BATCH(3);
- OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
- OUT_BATCH(GEN7_3DPRIM_BASE_VERTEX);
- OUT_BATCH(0);
- ADVANCE_BATCH();
+ brw_load_register_imm32(brw, GEN7_3DPRIM_BASE_VERTEX, 0);
   }
} else {
   indirect_flag = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index b689ae41f6..ea58bf02cf 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -399,14 +399,11 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
brw_upload_invariant_state(brw);
 
/* Recommended optimization for Victim Cache eviction in pixel backend. */
-   if (brw->gen >= 9) {
-  BEGIN_BATCH(3);
-  OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-  OUT_BATCH(GEN7_CACHE_MODE_1);
-  OUT_BATCH(REG_MASK(GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC) |
-GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC);
-  ADVANCE_BATCH();
-   }
+   if (brw->gen >= 9)
+  brw_load_register_imm32(brw,
+  GEN7_CACHE_MODE_1,
+  REG_MASK(GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC) |
+  GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC);
 
if (brw->gen >= 8) {
   gen8_emit_3dstate_sample_pattern(brw);
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index e746b995c1..dd68f036b3 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -117,21 +117,17 @@ setup_l3_config(struct brw_context *brw, const struct gen_l3_config *cfg)
PIPE_CONTROL_CS_STALL);
 
if (brw->gen >= 8) {
-  assert(!cfg->n[GEN_L3P_IS] && !cfg->n[GEN_L3P_C] && !cfg->n[GEN_L3P_T]);
+  uint32_t partition;
 
-  BEGIN_BATCH(3);
-  OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
+  assert(!cfg->n[GEN_L3P_IS] && !cfg->n[GEN_L3P_C] && !cfg->n[GEN_L3P_T]);
 
   /* Set up the L3 partitioning. */
-  OUT_BATCH(GEN8_L3CNTLREG);
-  OUT_BATCH((has_slm ? GEN8_L3CNTLREG_SLM_ENABLE : 0) |
-SET_FIELD(cfg->n[GEN_L3P_URB], GEN8_L3CNTLREG_URB_ALLOC) |
-SET_FIELD(cfg->n[GEN_L3P_RO], GEN8_L3CNTLREG_RO_ALLOC) |
-SET_FIELD(cfg->n[GEN_L3P_DC], GEN8_L3CNTLREG_DC_ALLOC) |
-SET_FIELD(cfg->n[GEN_L3P_ALL], GEN8_L3CNTLREG_ALL_ALLOC));
-
-  ADVANCE_BATCH();
-
+  partition = has_slm ? GEN8_L3CNTLREG_SLM_ENABLE : 0;
+  partition |= SET_FIELD(cfg->n[GEN_L3P_URB], GEN8_L3CNTLREG_URB_ALLOC);
+  partition |= SET_FIELD(cfg->n[GEN_L3P_RO],  GEN8_L3CNTLREG_RO_ALLOC);
+  partition |= SET_FIELD(cfg->n[GEN_L3P_DC],  GEN8_L3CNTLREG_DC_ALLOC);
+  partition |= SET_FIELD(cfg->n[GEN_L3P_ALL], GEN8_L3CNTLREG_ALL_ALLOC);
+  brw_load_register_imm32(brw, GEN8_L3CNTLREG, partition);
} else {
   assert(!cfg->n[GEN_L3P_ALL]);
 
diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c
index 14689f400f..71e5831cf1 100644
--- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
@@ -347,11 +347,9 @@ gen8_write_pma_stall_bits(struct brw_context *brw, uint32_t pma_stall_bits)
render_cache_flush);
 
/* CACHE_MODE_1 is a non-privileged register. */
-   BEGIN_BATCH(3);
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-   OUT_BATCH(GEN7_CACHE_MODE_1);
-   OUT_BATCH(GEN8_HIZ_PMA_MASK_BITS | pma_stall_bits);
-   ADVANCE_BATCH();
+   brw_load_register_imm32(brw,
+   GEN7_CACHE_MODE_1,
+   GEN8_HIZ_PMA_MASK_BITS | pma_stall_bits);
 
/* After the LRI, a PIPE_CONTROL with both the Depth Stall and Depth Cache
 * Flush bits is often necessary.  We do it regardless because it's easier.
-- 
2.11.0


[Mesa-dev] [PATCH 2/7] i965: Replace open-coded store/load SO_WRITE_OFFSET to/from mem

2016-12-09 Thread Chris Wilson
Rather than emitting the instructions directly, make use of the helpers
brw_store_register_mem32() and brw_load_register_mem().
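The pause/resume pair boils down to "save each stream's SO_WRITE_OFFSET register into offset_bo at i * 4, later load it back". A CPU-side sketch of that round trip, assuming BRW_MAX_XFB_STREAMS is 4 and GEN7_SO_WRITE_OFFSET(n) is 0x5280 + 4n; the register file, the byte-array "bo" and the sketch_* names are hypothetical. Argument order follows the calls in this patch (store takes the bo first, before the 7/7 reordering), with the domain arguments left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_XFB_STREAMS 4                        /* assumed BRW_MAX_XFB_STREAMS */
#define SKETCH_SO_WRITE_OFFSET(n) (0x5280u + (n) * 4u)  /* assumed GEN7_SO_WRITE_OFFSET */

/* Hypothetical register file indexed by MMIO byte offset / 4. */
static uint32_t sketch_regs[0x6000 / 4];

static void sketch_store_register_mem32(uint8_t *bo, uint32_t reg, uint32_t offset)
{
   memcpy(bo + offset, &sketch_regs[reg / 4], sizeof(uint32_t));
}

static void sketch_load_register_mem32(uint32_t reg, const uint8_t *bo, uint32_t offset)
{
   memcpy(&sketch_regs[reg / 4], bo + offset, sizeof(uint32_t));
}

/* Pause: save one dword per stream, at offset i * sizeof(uint32_t). */
static void sketch_pause(uint8_t *offset_bo)
{
   for (int i = 0; i < SKETCH_MAX_XFB_STREAMS; i++)
      sketch_store_register_mem32(offset_bo, SKETCH_SO_WRITE_OFFSET(i),
                                  (uint32_t)(i * sizeof(uint32_t)));
}

/* Resume: reload the same dwords into the same registers. */
static void sketch_resume(const uint8_t *offset_bo)
{
   for (int i = 0; i < SKETCH_MAX_XFB_STREAMS; i++)
      sketch_load_register_mem32(SKETCH_SO_WRITE_OFFSET(i), offset_bo,
                                 (uint32_t)(i * sizeof(uint32_t)));
}
```

Since both loops use the same register and the same buffer offset, anything clobbering the registers between pause and resume is undone on resume.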

Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/hsw_sol.c | 27 +--
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/hsw_sol.c b/src/mesa/drivers/dri/i965/hsw_sol.c
index 87d4ab531b..2f1112699b 100644
--- a/src/mesa/drivers/dri/i965/hsw_sol.c
+++ b/src/mesa/drivers/dri/i965/hsw_sol.c
@@ -204,15 +204,10 @@ hsw_pause_transform_feedback(struct gl_context *ctx,
   brw_emit_mi_flush(brw);
 
   /* Save the SOL buffer offset register values. */
-  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++) {
- BEGIN_BATCH(3);
- OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));
- OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
- OUT_RELOC(brw_obj->offset_bo,
-   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-   i * sizeof(uint32_t));
- ADVANCE_BATCH();
-  }
+  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++)
+ brw_store_register_mem32(brw, brw_obj->offset_bo,
+  GEN7_SO_WRITE_OFFSET(i),
+  i * sizeof(uint32_t));
}
 
/* Add any primitives written to our tally */
@@ -232,15 +227,11 @@ hsw_resume_transform_feedback(struct gl_context *ctx,
 
if (brw->is_haswell) {
   /* Reload the SOL buffer offset registers. */
-  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++) {
- BEGIN_BATCH(3);
- OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2));
- OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
- OUT_RELOC(brw_obj->offset_bo,
-   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-   i * sizeof(uint32_t));
- ADVANCE_BATCH();
-  }
+  for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++)
+ brw_load_register_mem(brw, GEN7_SO_WRITE_OFFSET(i),
+   brw_obj->offset_bo,
+   I915_GEM_DOMAIN_INSTRUCTION, 0,
+   i * sizeof(uint32_t));
}
 
/* Store the new starting value of the SO_NUM_PRIMS_WRITTEN counters. */
-- 
2.11.0



[Mesa-dev] [PATCH 1/7] i965: Flush pipeline before saving SO_WRITE_OFFSETS

2016-12-09 Thread Chris Wilson
Before saving the current position of the pipeline for the render
stream, we need to flush.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99030
Testcase: piglit/arb_transform_feedback2-draw-auto
Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/hsw_sol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/hsw_sol.c b/src/mesa/drivers/dri/i965/hsw_sol.c
index e299b02270..87d4ab531b 100644
--- a/src/mesa/drivers/dri/i965/hsw_sol.c
+++ b/src/mesa/drivers/dri/i965/hsw_sol.c
@@ -201,6 +201,8 @@ hsw_pause_transform_feedback(struct gl_context *ctx,
   (struct brw_transform_feedback_object *) obj;
 
if (brw->is_haswell) {
+  brw_emit_mi_flush(brw);
+
   /* Save the SOL buffer offset register values. */
   for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++) {
  BEGIN_BATCH(3);
-- 
2.11.0
