Re: [Mesa-dev] [PATCH 1/2] radv: Add occlusion query shader.

2017-04-09 Thread Edward O'Callaghan
One trivial comment but otherwise 1&2 are,
Reviewed-by: Edward O'Callaghan 

On 04/10/2017 09:34 AM, Bas Nieuwenhuizen wrote:
> Adds a shader for writing occlusion query results to a buffer, as the
> CP packet isn't support on SI or secondary buffers, and doesn't handle
> the availability bit (or partial results) nor truncation to 32-bit.
> 
> Signed-off-by: Bas Nieuwenhuizen 
> ---
>  src/amd/vulkan/radv_meta.c|   7 +
>  src/amd/vulkan/radv_meta.h|   3 +
>  src/amd/vulkan/radv_private.h |   6 +
>  src/amd/vulkan/radv_query.c   | 419 
> ++
>  4 files changed, 435 insertions(+)
> 
> diff --git a/src/amd/vulkan/radv_meta.c b/src/amd/vulkan/radv_meta.c
> index 04fa247dd36..0098e0844c1 100644
> --- a/src/amd/vulkan/radv_meta.c
> +++ b/src/amd/vulkan/radv_meta.c
> @@ -324,6 +324,10 @@ radv_device_init_meta(struct radv_device *device)
>   if (result != VK_SUCCESS)
>   goto fail_buffer;
>  
> + result = radv_device_init_meta_query_state(device);
> + if (result != VK_SUCCESS)
> + goto fail_query;
> +
>   result = radv_device_init_meta_fast_clear_flush_state(device);
>   if (result != VK_SUCCESS)
>   goto fail_fast_clear;
> @@ -337,6 +341,8 @@ fail_resolve_compute:
>   radv_device_finish_meta_fast_clear_flush_state(device);
>  fail_fast_clear:
>   radv_device_finish_meta_buffer_state(device);
> +fail_query:
> + radv_device_finish_meta_query_state(device);
>  fail_buffer:
>   radv_device_finish_meta_depth_decomp_state(device);
>  fail_depth_decomp:
> @@ -363,6 +369,7 @@ radv_device_finish_meta(struct radv_device *device)
>   radv_device_finish_meta_blit2d_state(device);
>   radv_device_finish_meta_bufimage_state(device);
>   radv_device_finish_meta_depth_decomp_state(device);
> + radv_device_finish_meta_query_state(device);
>   radv_device_finish_meta_buffer_state(device);
>   radv_device_finish_meta_fast_clear_flush_state(device);
>   radv_device_finish_meta_resolve_compute_state(device);
> diff --git a/src/amd/vulkan/radv_meta.h b/src/amd/vulkan/radv_meta.h
> index d70fef1e5f1..6cfc6134c53 100644
> --- a/src/amd/vulkan/radv_meta.h
> +++ b/src/amd/vulkan/radv_meta.h
> @@ -85,6 +85,9 @@ void radv_device_finish_meta_blit2d_state(struct 
> radv_device *device);
>  VkResult radv_device_init_meta_buffer_state(struct radv_device *device);
>  void radv_device_finish_meta_buffer_state(struct radv_device *device);
>  
> +VkResult radv_device_init_meta_query_state(struct radv_device *device);
> +void radv_device_finish_meta_query_state(struct radv_device *device);
> +
>  VkResult radv_device_init_meta_resolve_compute_state(struct radv_device 
> *device);
>  void radv_device_finish_meta_resolve_compute_state(struct radv_device 
> *device);
>  void radv_meta_save(struct radv_meta_saved_state *state,
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index 580c1197e64..a03c24c24ac 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -438,6 +438,12 @@ struct radv_meta_state {
>   VkPipeline fill_pipeline;
>   VkPipeline copy_pipeline;
>   } buffer;
> +
> + struct {
> + VkDescriptorSetLayout occlusion_query_ds_layout;
> + VkPipelineLayout occlusion_query_p_layout;
> + VkPipeline occlusion_query_pipeline;
> + } query;
>  };
>  
>  /* queue types */
> diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> index 288bd43a763..5b1fff4eeaa 100644
> --- a/src/amd/vulkan/radv_query.c
> +++ b/src/amd/vulkan/radv_query.c
> @@ -29,6 +29,8 @@
>  #include 
>  #include 
>  
> +#include "nir/nir_builder.h"
> +#include "radv_meta.h"
>  #include "radv_private.h"
>  #include "radv_cs.h"
>  #include "sid.h"
> @@ -49,6 +51,423 @@ static unsigned get_max_db(struct radv_device *device)
>   return num_db;
>  }
>  
> +static void radv_break_on_count(nir_builder *b, nir_variable *var, int count)
> +{
> + nir_ssa_def *counter = nir_load_var(b, var);
> +
> + nir_if *if_stmt = nir_if_create(b->shader);
> + if_stmt->condition = nir_src_for_ssa(nir_uge(b, counter, nir_imm_int(b, 
> count)));
> + nir_cf_node_insert(b->cursor, _stmt->cf_node);
> +
> + b->cursor = nir_after_cf_list(_stmt->then_list);
> +
> + nir_jump_instr *instr = nir_jump_instr_create(b->shader, 
> nir_jump_break);
> + nir_builder_instr_insert(b, >instr);
> +
> + b->cursor = nir_after_cf_node(_stmt->cf_node);
> + counter = nir_iadd(b, counter, nir_imm_int(b, 1));
> + nir_store_var(b, var, counter, 0x1);
> +}
> +
> +static struct nir_ssa_def *
> +radv_load_push_int(nir_builder *b, unsigned offset, const char *name)
> +{
> + nir_intrinsic_instr *flags = nir_intrinsic_instr_create(b->shader, 
> nir_intrinsic_load_push_constant);
> + flags->src[0] = nir_src_for_ssa(nir_imm_int(b, offset));
> + 

Re: [Mesa-dev] [PATCH] amd/addrlib: use correct variable name in header

2017-04-09 Thread Edward O'Callaghan


On 04/10/2017 12:31 PM, Thomas H.P. Andersen wrote:
> On Sun, Apr 9, 2017 at 8:25 PM, Marek Olšák  wrote:
>> Reviewed-by: Marek Olšák 
>>
>> Marek
> 
> Thanks. I do not have commit access, so will need someone to push it for me.
Done, thanks for the fix!
Kind Regards,
Edward.

> 
>> On Sat, Apr 8, 2017 at 8:36 AM, Thomas Hindoe Paaboel Andersen
>>  wrote:
>>> Since the inclusion in 7f160efcde41b52ad78e562316384373dab419e3
>>> the header used x_biased, while the implementation used y_biased.
>>> This changes the header to macth the implementation since the
>>> uses of the function seems to expect y_biased.
>>> ---
>>>  src/amd/addrlib/gfx9/rbmap.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/amd/addrlib/gfx9/rbmap.h b/src/amd/addrlib/gfx9/rbmap.h
>>> index f2f2ca8..89c8922 100644
>>> --- a/src/amd/addrlib/gfx9/rbmap.h
>>> +++ b/src/amd/addrlib/gfx9/rbmap.h
>>> @@ -49,7 +49,7 @@ public:
>>>
>>>  void Get_Comp_Block_Screen_Space( CoordEq& addr, int bytes_log2, int* 
>>> w, int* h, int* d = NULL);
>>>
>>> -void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool 
>>> is_thick, bool x_biased,
>>> +void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool 
>>> is_thick, bool y_biased,
>>>int comp_block_width_log2, int 
>>> comp_block_height_log2, int comp_block_depth_log2,
>>>int& meta_block_width_log2, int& 
>>> meta_block_height_log2, int& meta_block_depth_log2 );
>>>  void cap_pipe( int xmode, bool is_thick, int& num_ses_log2, int 
>>> bpp_log2, int num_samples_log2, int pipe_interleave_log2,
>>> --
>>> 2.9.3
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] amd/addrlib: use correct variable name in header

2017-04-09 Thread Thomas H.P. Andersen
On Sun, Apr 9, 2017 at 8:25 PM, Marek Olšák  wrote:
> Reviewed-by: Marek Olšák 
>
> Marek

Thanks. I do not have commit access, so will need someone to push it for me.

> On Sat, Apr 8, 2017 at 8:36 AM, Thomas Hindoe Paaboel Andersen
>  wrote:
>> Since the inclusion in 7f160efcde41b52ad78e562316384373dab419e3
>> the header used x_biased, while the implementation used y_biased.
>> This changes the header to macth the implementation since the
>> uses of the function seems to expect y_biased.
>> ---
>>  src/amd/addrlib/gfx9/rbmap.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/amd/addrlib/gfx9/rbmap.h b/src/amd/addrlib/gfx9/rbmap.h
>> index f2f2ca8..89c8922 100644
>> --- a/src/amd/addrlib/gfx9/rbmap.h
>> +++ b/src/amd/addrlib/gfx9/rbmap.h
>> @@ -49,7 +49,7 @@ public:
>>
>>  void Get_Comp_Block_Screen_Space( CoordEq& addr, int bytes_log2, int* 
>> w, int* h, int* d = NULL);
>>
>> -void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool 
>> is_thick, bool x_biased,
>> +void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool 
>> is_thick, bool y_biased,
>>int comp_block_width_log2, int 
>> comp_block_height_log2, int comp_block_depth_log2,
>>int& meta_block_width_log2, int& 
>> meta_block_height_log2, int& meta_block_depth_log2 );
>>  void cap_pipe( int xmode, bool is_thick, int& num_ses_log2, int 
>> bpp_log2, int num_samples_log2, int pipe_interleave_log2,
>> --
>> 2.9.3
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: fix memory leak in arb_fragment_program

2017-04-09 Thread Timothy Arceri

Thanks.

Reviewed-by: Timothy Arceri 


On 10/04/17 02:37, Bartosz Tomczyk wrote:

---
 src/mesa/program/arbprogparse.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/program/arbprogparse.c b/src/mesa/program/arbprogparse.c
index 07bdf1603e..83a501eea6 100644
--- a/src/mesa/program/arbprogparse.c
+++ b/src/mesa/program/arbprogparse.c
@@ -78,6 +78,7 @@ _mesa_parse_arb_fragment_program(struct gl_context* ctx, 
GLenum target,
memset(, 0, sizeof(prog));
memset(, 0, sizeof(state));
state.prog = 
+   state.mem_ctx = program;

if (!_mesa_parse_arb_program(ctx, target, (const GLubyte*) str, len,
)) {


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 8/9] nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*

2017-04-09 Thread Boyan Ding
2017-04-10 9:54 GMT+08:00 Ilia Mirkin :
> On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
>> ---
>>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 28 
>> ++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> index 1bd01a9a32..2ce6f29905 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> @@ -978,6 +978,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
>> NV50_IR_OPCODE_CASE(VOTE_ANY, VOTE);
>> NV50_IR_OPCODE_CASE(VOTE_EQ, VOTE);
>>
>> +   NV50_IR_OPCODE_CASE(BALLOT, VOTE);
>> +   NV50_IR_OPCODE_CASE(READ_INVOC, SHFL);
>> +   NV50_IR_OPCODE_CASE(READ_FIRST, SHFL);
>> +
>> NV50_IR_OPCODE_CASE(END, EXIT);
>>
>> default:
>> @@ -3431,6 +3435,30 @@ Converter::handleInstruction(const struct 
>> tgsi_full_instruction *insn)
>>   mkCvt(OP_CVT, TYPE_U32, dst0[c], TYPE_U8, val0);
>>}
>>break;
>> +   case TGSI_OPCODE_BALLOT:
>> +  val0 = new_LValue(func, FILE_PREDICATE);
>> +  mkCmp(OP_SET, CC_NE, TYPE_U32, val0, TYPE_U32, fetchSrc(0, 0), zero);
>> +  mkOp1(op, TYPE_U32, dst0[0], val0)->subOp = NV50_IR_SUBOP_VOTE_ANY;
>> +  mkMov(dst0[1], zero, TYPE_U32);
>
> Check that dst[n] isn't masked though before writing to it.
>
>> +  break;
>> +   case TGSI_OPCODE_READ_FIRST:
>> +  // ReadFirstInvocationARB(src) is implemented as
>> +  // ReadInvocationARB(src, findLSB(ballot(true)))
>> +  val0 = getScratch();
>> +  mkOp1(OP_VOTE, TYPE_U32, val0, mkImm(1))->subOp = 
>> NV50_IR_SUBOP_VOTE_ANY;
>> +  mkOp2(OP_EXTBF, TYPE_U32, val0, val0, mkImm(0x2000))
>> + ->subOp = NV50_IR_SUBOP_EXTBF_REV;
>> +  mkOp1(OP_BFIND, TYPE_U32, val0, val0)->subOp = 
>> NV50_IR_SUBOP_BFIND_SAMT;
>> +  src1 = val0;
>> +  /* fallthrough */
>
> You could, of course, do this as:
>
> if (false)
>
>> +   case TGSI_OPCODE_READ_INVOC:
>> +  if (tgsi.getOpcode() == TGSI_OPCODE_READ_INVOC)
>
> And then remove this if statement. (Ain't C fun.)
>
> But don't actually do that :) I'm more pointing it out due to the crazy 
> factor.

Well, I didn't even think of that ;) But I surely won't take it.

>
> I really do hate that if for somewhat irrational reasons though...
> can't think of a clean way of getting rid of it. Oh well.

Yeah, the 'if' here isnt really great. However, without that, the only
way I could come up with will cause duplication which is even worse.

>
>> + src1 = fetchSrc(1, 0);
>> +  FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
>> + geni = mkOp3(op, dstTy, dst0[c], fetchSrc(0, c), src1, 
>> mkImm(0x1f));
>> + geni->subOp = NV50_IR_SUBOP_SHFL_IDX;
>> +  }
>> +  break;
>> case TGSI_OPCODE_CLOCK:
>>// Stick the 32-bit clock into the high dword of the logical result.
>>if (!tgsi.getDst(0).isMasked(0))
>> --
>> 2.12.1
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 9/9] nvc0: Enable ARB_shader_ballot on Kepler+

2017-04-09 Thread Ilia Mirkin
Reviewed-by: Ilia Mirkin 

On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
> readInvocationARB() and readFirstInvocationARB() need SHFL.IDX
> instruction which is introduced in Kepler.
> ---
>  docs/features.txt  | 2 +-
>  docs/relnotes/17.1.0.html  | 2 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 ++-
>  3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/docs/features.txt b/docs/features.txt
> index edc56842b9..a2d7785827 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -292,7 +292,7 @@ Khronos, ARB, and OES extensions that are not part of any 
> OpenGL or OpenGL ES ve
>GL_ARB_sample_locations   not started
>GL_ARB_seamless_cubemap_per_texture   DONE (i965, nvc0, 
> radeonsi, r600, softpipe, swr)
>GL_ARB_shader_atomic_counter_ops  DONE (i965/gen7+, 
> nvc0, radeonsi, softpipe)
> -  GL_ARB_shader_ballot  DONE (radeonsi)
> +  GL_ARB_shader_ballot  DONE (nvc0, radeonsi)
>GL_ARB_shader_clock   DONE (i965/gen7+, 
> nv50, nvc0, radeonsi)
>GL_ARB_shader_draw_parameters DONE (i965, nvc0, 
> radeonsi)
>GL_ARB_shader_group_vote  DONE (nvc0, radeonsi)
> diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html
> index 0a5cabe4f1..8f237ed527 100644
> --- a/docs/relnotes/17.1.0.html
> +++ b/docs/relnotes/17.1.0.html
> @@ -45,7 +45,7 @@ Note: some of the new features are only available with 
> certain drivers.
>
>  
>  GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, 
> llvmpipe
> -GL_ARB_shader_ballot on radeonsi
> +GL_ARB_shader_ballot on nvc0, radeonsi
>  GL_ARB_shader_clock on nv50, nvc0, radeonsi
>  GL_ARB_shader_group_vote on radeonsi
>  GL_ARB_sparse_buffer on radeonsi/CIK+
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> index 7ef9bf9c9c..8c6712a121 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> @@ -259,6 +259,8 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
>return class_3d >= NVE4_3D_CLASS; /* needs testing on fermi */
> case PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE:
>return class_3d >= GM200_3D_CLASS;
> +   case PIPE_CAP_TGSI_BALLOT:
> +  return class_3d >= NVE4_3D_CLASS;
>
> /* unsupported caps */
> case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
> @@ -289,7 +291,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
> case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
> case PIPE_CAP_INT64_DIVMOD:
> case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
> -   case PIPE_CAP_TGSI_BALLOT:
>return 0;
>
> case PIPE_CAP_VENDOR_ID:
> --
> 2.12.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 100613] Regression in Mesa 17 on s390x (zSystems)

2017-04-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100613

--- Comment #3 from Roland Scheidegger  ---
(In reply to Stefan Dirsch from comment #2)
> Roland, thanks a lot for your prompt reply! Very much appreciated! 
> 
> Seems Richard meanwhile switched companies from IBM to ARM meanwhile. I
> found him on Linkedin. Possibly he's now working on aarch64 (LE). So I'm
> afraid he has no longer access to BE machines any longer.
> 
> Unfortunately I'm not familiar with llvmpipe at all. Would it be an option
> not to change the code there for BE, if developers have no access to such
> machines? Reverse-applying the commit is going to break sooner or later I'm
> sure.
That'll be theoretically possible but I can't say I particularly like that
solution. It doesn't make much sense that the fetch paths for BE and LE are
completely distinct...
Chances are it will break sooner or later anyway - this code really desperately
wants someone who is willing to test it and keep it working on BE.
(That it took 3 months until someone notices it's broken isn't a good sign...)
Otherwise there's probably a build change down the road which just disables
build on BE...

> 
> Of course I'm willing to test any proposed change/patch on s390x, but I'm
> not a Mesa/llvmwpipe developer per se.
> 
> Unfortunately llvmpipe is needed on s390x, since it has become a requirement
> for modern desktops like gdm/gnome-shell. :-(
All the more reason why someone might want to look into it...


> I can't say how fundamental the issue is. gdm and gnome-shell just show a
> black screen. :-( 
I don't know what vertex formats these use, but yes bogus vertex fetch will
make for a very bad experience (it's nearly a miracle glxgears still manages to
draw something in fact I like that new look better :-)).

I've taken a closer look now, and I can see some reasons why it doesn't work.
That said, I never really understood the vector_justify logic, which just looks
odd to me. But in the end the gather really is different for AoS and SoA (and I
didn't understand the differences there neither wrt vector_justify).

So, looking at R32G32B32F format (which glxgears uses) (for this format SoA vs.
AoS should not actually make that much of a difference, since it doesn't
require any actual conversion):
The old code would have called lp_build_fetch_rgba_aos() 4 times - which would
have resulted in 4 lp_build_gather with vector_justify set to TRUE, block_bits
96 and dst type of 1x128bit. The gather would have fetched 96 bits, do a ZEXT
and then (due to vector_justify - this is the stuff guarded with
PIPE_ARCH_BIG_ENDIAN in lp_bld_gather.c) do a left shift of 32 for some reason
I don't quite get (I thought it shouldn't make a difference with those array
formats if they are fetched on BE or LE but it looks like I'm wrong). The
values then would have gone through lp_build_format_swizzle_aos() (and I have
no idea if that swizzle looks different on BE) before finally getting
transposed to SoA.
The new code will now use one lp_build_fetch_rgba_soa() call. This will still
end up with 4 gathers, but in the soa path which always use vector_justify of
false (why? I have no idea but this was like that before), so you don't get the
left shift of 32. Oh and the values will be fetched as 3x32bit instead of a
96bit int (this particular change was one of the changes preceding this commit,
so you could verify independently if it breaks stuff, some piglit texture
format tests for instance could show that - unfortunately lp_test_format only
does (scalar) rgba_aos fetch, so not exactly helpful for that, but you really
want rgba SoA fetches working in general, regardless of vertex fetch), if that
makes any difference (again, I have no idea really) (it will do pad_vector, so
use a shuffle to extend the 3x32bit values to 4x32bit instead of using ZExt to
1x128bit, but I'm not worried about that particular bit). The values will then
be transposed and finally going into lp_build_format_swizzle_soa().
So, my guess is maybe things would work a bit better if you'd hack up the
vector_justify parameter to lp_build_gather() in lp_build_fetch_rgba_soa().
However, this near certainly breaks all the other callers of
lp_build_fetch_rgba_soa(), which is used for just about all texture formats
except the rgba8 ones, so glxgears and desktop compositors might still run but
probably not much else, you don't want to do that...

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 8/9] nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*

2017-04-09 Thread Ilia Mirkin
On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
> ---
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 28 
> ++
>  1 file changed, 28 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> index 1bd01a9a32..2ce6f29905 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> @@ -978,6 +978,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
> NV50_IR_OPCODE_CASE(VOTE_ANY, VOTE);
> NV50_IR_OPCODE_CASE(VOTE_EQ, VOTE);
>
> +   NV50_IR_OPCODE_CASE(BALLOT, VOTE);
> +   NV50_IR_OPCODE_CASE(READ_INVOC, SHFL);
> +   NV50_IR_OPCODE_CASE(READ_FIRST, SHFL);
> +
> NV50_IR_OPCODE_CASE(END, EXIT);
>
> default:
> @@ -3431,6 +3435,30 @@ Converter::handleInstruction(const struct 
> tgsi_full_instruction *insn)
>   mkCvt(OP_CVT, TYPE_U32, dst0[c], TYPE_U8, val0);
>}
>break;
> +   case TGSI_OPCODE_BALLOT:
> +  val0 = new_LValue(func, FILE_PREDICATE);
> +  mkCmp(OP_SET, CC_NE, TYPE_U32, val0, TYPE_U32, fetchSrc(0, 0), zero);
> +  mkOp1(op, TYPE_U32, dst0[0], val0)->subOp = NV50_IR_SUBOP_VOTE_ANY;
> +  mkMov(dst0[1], zero, TYPE_U32);

Check that dst[n] isn't masked though before writing to it.

> +  break;
> +   case TGSI_OPCODE_READ_FIRST:
> +  // ReadFirstInvocationARB(src) is implemented as
> +  // ReadInvocationARB(src, findLSB(ballot(true)))
> +  val0 = getScratch();
> +  mkOp1(OP_VOTE, TYPE_U32, val0, mkImm(1))->subOp = 
> NV50_IR_SUBOP_VOTE_ANY;
> +  mkOp2(OP_EXTBF, TYPE_U32, val0, val0, mkImm(0x2000))
> + ->subOp = NV50_IR_SUBOP_EXTBF_REV;
> +  mkOp1(OP_BFIND, TYPE_U32, val0, val0)->subOp = 
> NV50_IR_SUBOP_BFIND_SAMT;
> +  src1 = val0;
> +  /* fallthrough */

You could, of course, do this as:

if (false)

> +   case TGSI_OPCODE_READ_INVOC:
> +  if (tgsi.getOpcode() == TGSI_OPCODE_READ_INVOC)

And then remove this if statement. (Ain't C fun.)

But don't actually do that :) I'm more pointing it out due to the crazy factor.

I really do hate that if for somewhat irrational reasons though...
can't think of a clean way of getting rid of it. Oh well.

> + src1 = fetchSrc(1, 0);
> +  FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
> + geni = mkOp3(op, dstTy, dst0[c], fetchSrc(0, c), src1, mkImm(0x1f));
> + geni->subOp = NV50_IR_SUBOP_SHFL_IDX;
> +  }
> +  break;
> case TGSI_OPCODE_CLOCK:
>// Stick the 32-bit clock into the high dword of the logical result.
>if (!tgsi.getDst(0).isMasked(0))
> --
> 2.12.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/9] nvc0/ir: Properly handle a "split form" of predicate destination

2017-04-09 Thread Boyan Ding
2017-04-10 9:31 GMT+08:00 Ilia Mirkin :
> Wow, great find!
>
> On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
>> GF100's ISA encoding has a weird form of predicate destination where its
>> 3 bits are split across whole the instruction. Use a dedicated setPDSTL
>> function instead of original defId which is incorrect in this case.
>> ---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> index 5467447e35..d5a310f88c 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> @@ -58,6 +58,7 @@ private:
>> void setImmediateS8(const ValueRef&);
>> void setSUConst16(const Instruction *, const int s);
>> void setSUPred(const Instruction *, const int s);
>> +   inline void setPDSTL(const ValueDef&);
>>
>> void emitCondCode(CondCode cc, int pos);
>> void emitInterpMode(const Instruction *);
>> @@ -375,6 +376,14 @@ void CodeEmitterNVC0::setImmediateS8(const ValueRef 
>> )
>> code[0] |= (s8 >> 6) << 8;
>>  }
>>
>> +void CodeEmitterNVC0::setPDSTL(const ValueDef )
>> +{
>> +   uint32_t pred = (def.get() && def.getFile() != FILE_FLAGS ? 
>> DDATA(def).id : 7);
>
> Why not just == FILE_PREDICATE? Also, I don't think the outer parens do much.

Okay, will fix it.

>
>> +
>> +   code[0] |= (pred & 3) << 8;
>> +   code[1] |= !!(pred & 7) << 26;
>
> This always makes me nervous... how about
>
> (pred & 4) << (26 - 2)
>
> BTW, this should be pred & 4 in either case, no?

Yeah, should be pred & 4.

>
>> +}
>> +
>>  void
>>  CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
>>  {
>> @@ -1873,7 +1882,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
>>if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
>>i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
>>   assert(i->defExists(0));
>> - defId(i->def(0), 8);
>> + setPDSTL(i->def(0));
>>}
>> }
>>
>> @@ -1945,7 +1954,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
>>
>> if (p >= 0) {
>>if (targ->getChipset() >= NVISA_GK104_CHIPSET)
>> - defId(i->def(p), 8);
>> + setPDSTL(i->def(p));
>>else
>>   defId(i->def(p), 32 + 18);
>> }
>> --
>> 2.12.1
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V3 2/2] glsl: don't run the GLSL pre-processor when we are skipping compilation

2017-04-09 Thread Timothy Arceri
Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.

Also fixes the leaking of state.

V2: fix indentation

v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.

Tested-by (v2): Grazvydas Ignotas 
---
 src/compiler/glsl/glsl_parser_extras.cpp | 19 ++-
 src/compiler/glsl/shader_cache.cpp   | 10 ++
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
b/src/compiler/glsl/glsl_parser_extras.cpp
index ca74b55..eb12eff 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -1998,32 +1998,23 @@ opt_shader_and_create_symbol_table(struct gl_context 
*ctx,
   }
}
 
_mesa_glsl_initialize_derived_variables(ctx, shader);
 }
 
 void
 _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
   bool dump_ast, bool dump_hir, bool force_recompile)
 {
-   struct _mesa_glsl_parse_state *state =
-  new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader);
const char *source = force_recompile && shader->FallbackSource ?
   shader->FallbackSource : shader->Source;
 
-   if (ctx->Const.GenerateTemporaryNames)
-  (void) p_atomic_cmpxchg(_variable::temporaries_allocate_names,
-  false, true);
-
-   state->error = glcpp_preprocess(state, , >info_log,
- add_builtin_defines, state, ctx);
-
if (!force_recompile) {
   if (ctx->Cache) {
  char buf[41];
  disk_cache_compute_key(ctx->Cache, source, strlen(source),
 shader->sha1);
  if (disk_cache_has_key(ctx->Cache, shader->sha1)) {
 /* We've seen this shader before and know it compiles */
 if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
_mesa_sha1_format(buf, shader->sha1);
fprintf(stderr, "deferring compile of shader: %s\n", buf);
@@ -2043,20 +2034,30 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, 
struct gl_shader *shader,
   if (shader->CompileStatus == compile_success)
  return;
 
   if (shader->CompileStatus == compiled_no_opts) {
  opt_shader_and_create_symbol_table(ctx, shader);
  shader->CompileStatus = compile_success;
  return;
   }
}
 
+   struct _mesa_glsl_parse_state *state =
+  new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader);
+
+   if (ctx->Const.GenerateTemporaryNames)
+  (void) p_atomic_cmpxchg(_variable::temporaries_allocate_names,
+  false, true);
+
+   state->error = glcpp_preprocess(state, , >info_log,
+   add_builtin_defines, state, ctx);
+
if (!state->error) {
  _mesa_glsl_lexer_ctor(state, source);
  _mesa_glsl_parse(state);
  _mesa_glsl_lexer_dtor(state);
  do_late_parsing_checks(state);
}
 
if (dump_ast) {
   foreach_list_typed(ast_node, ast, link, >translation_unit) {
  ast->print();
diff --git a/src/compiler/glsl/shader_cache.cpp 
b/src/compiler/glsl/shader_cache.cpp
index e51fecd..738e548 100644
--- a/src/compiler/glsl/shader_cache.cpp
+++ b/src/compiler/glsl/shader_cache.cpp
@@ -1312,20 +1312,30 @@ shader_cache_read_program_metadata(struct gl_context 
*ctx,
   prog->SeparateShader ? "T" : "F");
 
/* A shader might end up producing different output depending on the glsl
 * version supported by the compiler. For example a different path might be
 * taken by the preprocessor, so add the version to the hash input.
 */
ralloc_asprintf_append(, "api: %d glsl: %d fglsl: %d\n",
   ctx->API, ctx->Const.GLSLVersion,
   ctx->Const.ForceGLSLVersion);
 
+   /* We run the preprocessor on shaders after hashing them, so we need to
+* add any extension override vars to the hash. If we don't do this the
+* preprocessor could result in different output and we could load the
+* wrong shader.
+*/
+   char *ext_override = getenv("MESA_EXTENSION_OVERRIDE");
+   if (ext_override) {
+  ralloc_asprintf_append(, "ext:%s", ext_override);
+   }
+
/* DRI config options may also change the output from the compiler so
 * include them as an input to sha1 creation.
 */
char sha1buf[41];
_mesa_sha1_format(sha1buf, ctx->Const.dri_config_options_sha1);
ralloc_strcat(, sha1buf);
 
for (unsigned i = 0; i < prog->NumShaders; i++) {
   struct gl_shader *sh = prog->Shaders[i];
   _mesa_sha1_format(sha1buf, sh->sha1);
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 1/2] glsl: delay optimisations on individual shaders when cache is available

2017-04-09 Thread Timothy Arceri
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.

To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.

Improves cold cache start-up times on Deus Ex by ~20 seconds.

When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.

V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.

Tested-by: Grazvydas Ignotas i
Reviewed-by: Nicolai Hähnle 
---
 src/compiler/glsl/glsl_parser_extras.cpp | 166 +--
 src/compiler/glsl/linker.cpp |   3 -
 src/compiler/glsl/shader_cache.cpp   |   2 +-
 src/mesa/main/mtypes.h   |   3 +-
 4 files changed, 96 insertions(+), 78 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
b/src/compiler/glsl/glsl_parser_extras.cpp
index 4629e78..ca74b55 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -1915,20 +1915,99 @@ static void
 do_late_parsing_checks(struct _mesa_glsl_parse_state *state)
 {
if (state->stage == MESA_SHADER_COMPUTE && !state->has_compute_shader()) {
   YYLTYPE loc;
   memset(, 0, sizeof(loc));
   _mesa_glsl_error(, state, "Compute shaders require "
"GLSL 4.30 or GLSL ES 3.10");
}
 }
 
+static void
+opt_shader_and_create_symbol_table(struct gl_context *ctx,
+   struct gl_shader *shader)
+{
+   assert(shader->CompileStatus != compile_failure &&
+  !shader->ir->is_empty());
+
+   struct gl_shader_compiler_options *options =
+  >Const.ShaderCompilerOptions[shader->Stage];
+
+   /* Do some optimization at compile time to reduce shader IR size
+* and reduce later work if the same shader is linked multiple times
+*/
+   if (ctx->Const.GLSLOptimizeConservatively) {
+  /* Run it just once. */
+  do_common_optimization(shader->ir, false, false, options,
+ ctx->Const.NativeIntegers);
+   } else {
+  /* Repeat it until it stops making changes. */
+  while (do_common_optimization(shader->ir, false, false, options,
+ctx->Const.NativeIntegers))
+ ;
+   }
+
+   validate_ir_tree(shader->ir);
+
+   enum ir_variable_mode other;
+   switch (shader->Stage) {
+   case MESA_SHADER_VERTEX:
+  other = ir_var_shader_in;
+  break;
+   case MESA_SHADER_FRAGMENT:
+  other = ir_var_shader_out;
+  break;
+   default:
+  /* Something invalid to ensure optimize_dead_builtin_uniforms
+   * doesn't remove anything other than uniforms or constants.
+   */
+  other = ir_var_mode_count;
+  break;
+   }
+
+   optimize_dead_builtin_variables(shader->ir, other);
+
+   validate_ir_tree(shader->ir);
+
+   /* Retain any live IR, but trash the rest. */
+   reparent_ir(shader->ir, shader->ir);
+
+   /* Destroy the symbol table.  Create a new symbol table that contains only
+* the variables and functions that still exist in the IR.  The symbol
+* table will be used later during linking.
+*
+* There must NOT be any freed objects still referenced by the symbol
+* table.  That could cause the linker to dereference freed memory.
+*
+* We don't have to worry about types or interface-types here because those
+* are fly-weights that are looked up by glsl_type.
+*/
+   foreach_in_list (ir_instruction, ir, shader->ir) {
+  switch (ir->ir_type) {
+  case ir_type_function:
+ shader->symbols->add_function((ir_function *) ir);
+ break;
+  case ir_type_variable: {
+ ir_variable *const var = (ir_variable *) ir;
+
+ if (var->data.mode != ir_var_temporary)
+shader->symbols->add_variable(var);
+ break;
+  }
+  default:
+ break;
+  }
+   }
+
+   _mesa_glsl_initialize_derived_variables(ctx, shader);
+}
+
 void
 _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
   bool dump_ast, bool dump_hir, bool force_recompile)
 {
struct _mesa_glsl_parse_state *state =
   new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader);
const char *source = force_recompile && shader->FallbackSource ?
   shader->FallbackSource : shader->Source;
 
if (ctx->Const.GenerateTemporaryNames)
@@ -1956,20 +2035,26 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, 
struct gl_shader *shader,
 return;
  }
   }
} else {
   /* We should only ever end up here if a re-compile has been forced by a
* shader cache miss. In which case we can skip the compile if its
* already be done by a previous fallback or the 

Re: [Mesa-dev] [PATCH v2 7/9] nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*

2017-04-09 Thread Ilia Mirkin
Reviewed-by: Ilia Mirkin 

On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
> ---
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 27 
> ++
>  1 file changed, 27 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> index 3ed7d345c4..1bd01a9a32 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> @@ -450,6 +450,12 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval)
> case TGSI_SEMANTIC_BASEINSTANCE: return nv50_ir::SV_BASEINSTANCE;
> case TGSI_SEMANTIC_DRAWID: return nv50_ir::SV_DRAWID;
> case TGSI_SEMANTIC_WORK_DIM:   return nv50_ir::SV_WORK_DIM;
> +   case TGSI_SEMANTIC_SUBGROUP_INVOCATION: return nv50_ir::SV_LANEID;
> +   case TGSI_SEMANTIC_SUBGROUP_EQ_MASK: return nv50_ir::SV_LANEMASK_EQ;
> +   case TGSI_SEMANTIC_SUBGROUP_LT_MASK: return nv50_ir::SV_LANEMASK_LT;
> +   case TGSI_SEMANTIC_SUBGROUP_LE_MASK: return nv50_ir::SV_LANEMASK_LE;
> +   case TGSI_SEMANTIC_SUBGROUP_GT_MASK: return nv50_ir::SV_LANEMASK_GT;
> +   case TGSI_SEMANTIC_SUBGROUP_GE_MASK: return nv50_ir::SV_LANEMASK_GE;
> default:
>assert(0);
>return nv50_ir::SV_CLOCK;
> @@ -1667,6 +1673,8 @@ private:
> Symbol *srcToSym(tgsi::Instruction::SrcRegister, int c);
> Symbol *dstToSym(tgsi::Instruction::DstRegister, int c);
>
> +   bool isSubGroupMask(uint8_t semantic);
> +
> bool handleInstruction(const struct tgsi_full_instruction *);
> void exportOutputs();
> inline Subroutine *getSubroutine(unsigned ip);
> @@ -1996,6 +2004,21 @@ Converter::adjustTempIndex(int arrayId, int , int 
> ) const
> idx += it->second;
>  }
>
> +bool
> +Converter::isSubGroupMask(uint8_t semantic)
> +{
> +   switch (semantic) {
> +  case TGSI_SEMANTIC_SUBGROUP_EQ_MASK:
> +  case TGSI_SEMANTIC_SUBGROUP_LT_MASK:
> +  case TGSI_SEMANTIC_SUBGROUP_LE_MASK:
> +  case TGSI_SEMANTIC_SUBGROUP_GT_MASK:
> +  case TGSI_SEMANTIC_SUBGROUP_GE_MASK:
> + return true;
> +  default:
> + return false;
> +   }
> +}
> +
>  Value *
>  Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
>  {
> @@ -2041,6 +2064,10 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister 
> src, int c, Value *ptr)
>if (info->sv[idx].sn == TGSI_SEMANTIC_THREAD_ID &&
>info->prop.cp.numThreads[swz] == 1)
>   return loadImm(NULL, 0u);
> +  if (isSubGroupMask(info->sv[idx].sn) && swz > 0)
> + return loadImm(NULL, 0u);
> +  if (info->sv[idx].sn == TGSI_SEMANTIC_SUBGROUP_SIZE)
> + return loadImm(NULL, 32u);
>ld = mkOp1(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c));
>ld->perPatch = info->sv[idx].patch;
>return ld->getDef(0);
> --
> 2.12.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 5/9] nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE

2017-04-09 Thread Ilia Mirkin
On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
> Implementation of readFirstInvocationARB() on nvidia hardware needs a
> ballotARB(true) used to decide the first active thread. This expressed
> in gm107 asm as (supposing output is $r0):
> vote any $r0 0x1 0x1
>
> To model the always true input, which corresponds to the second 0x1
> above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and
> "not 0x1" in the src field respectively.
>
> v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
> ---
>  .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 24 
> ++
>  .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 22 +---
>  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24 
> ++
>  3 files changed, 59 insertions(+), 11 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> index 58076ba4d5..87976ffebc 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> @@ -1621,7 +1621,8 @@ CodeEmitterGK110::emitSHFL(const Instruction *i)
>  void
>  CodeEmitterGK110::emitVOTE(const Instruction *i)
>  {
> -   assert(i->src(0).getFile() == FILE_PREDICATE);
> +   const ImmediateValue *imm;
> +   uint32_t u32;
>
> code[0] = 0x0002;
> code[1] = 0x86c0 | (i->subOp << 19);
> @@ -1646,9 +1647,24 @@ CodeEmitterGK110::emitVOTE(const Instruction *i)
>code[0] |= 255 << 2;
> if (!(rp & 2))
>code[1] |= 7 << 16;
> -   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> -  code[1] |= 1 << 13;
> -   srcId(i->src(0), 42);
> +
> +   switch (i->src(0).getFile()) {
> +   case FILE_PREDICATE:
> +  if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> + code[0] |= 1 << 13;
> +  srcId(i->src(0), 42);
> +  break;
> +   case FILE_IMMEDIATE:
> +  imm = i->src(0).get()->asImm();
> +  assert(imm);
> +  u32 = imm->reg.data.u32;
> +  assert(u32 == 0 || u32 == 1);
> +  code[1] |= (u32 == 1 ? 0x7 : 0xf) << 10;
> +  break;
> +   default:
> +  assert(!"Unhandled src");
> +  break;
> +   }
>  }
>
>  void
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> index 944563c93c..0382cb3903 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> @@ -2931,7 +2931,8 @@ CodeEmitterGM107::emitMEMBAR()
>  void
>  CodeEmitterGM107::emitVOTE()
>  {
> -   assert(insn->src(0).getFile() == FILE_PREDICATE);
> +   const ImmediateValue *imm;
> +   uint32_t u32;
>
> int r = -1, p = -1;
> for (int i = 0; insn->defExists(i); i++) {
> @@ -2951,8 +2952,23 @@ CodeEmitterGM107::emitVOTE()
>emitPRED (0x2d, insn->def(p));
> else
>emitPRED (0x2d);
> -   emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
> -   emitPRED (0x27, insn->src(0));
> +
> +   switch (insn->src(0).getFile()) {
> +   case FILE_PREDICATE:
> +  emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
> +  emitPRED (0x27, insn->src(0));
> +  break;
> +   case FILE_IMMEDIATE:
> +  imm = insn->src(0).get()->asImm();
> +  assert(imm);
> +  u32 = imm->reg.data.u32;
> +  assert(u32 == 0 || u32 == 1);
> +  emitField(0x27, 4, u32 == 1 ? 0x7 : 0xf);

I'd kinda prefer this to be

emitField(0x2a, 1, u32 == 0);
emitPRED(0x27);

That way you have symmetry with the predicate version. Unfortunately
this is tricky to do in the other emitters -- the helpers in gm107 are
*way* better (well, Ben probably learned from the earlier failures).
So don't worry about trying to do it in the other ones.

> +  break;
> +   default:
> +  assert(!"Unhandled src");
> +  break;
> +   }
>  }
>
>  void
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index ee2d2f06c1..84c3aca1df 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -2587,7 +2587,8 @@ CodeEmitterNVC0::emitSHFL(const Instruction *i)
>  void
>  CodeEmitterNVC0::emitVOTE(const Instruction *i)
>  {
> -   assert(i->src(0).getFile() == FILE_PREDICATE);
> +   const ImmediateValue *imm;
> +   uint32_t u32;
>
> code[0] = 0x0004 | (i->subOp << 5);
> code[1] = 0x4800;
> @@ -2612,9 +2613,24 @@ CodeEmitterNVC0::emitVOTE(const Instruction *i)
>code[0] |= 63 << 14;
> if (!(rp & 2))
>code[1] |= 7 << 22;
> -   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> -  code[0] |= 1 << 23;
> -   srcId(i->src(0), 20);
> +
> +   switch (i->src(0).getFile()) {
> +   case FILE_PREDICATE:
> +  if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> + 

Re: [Mesa-dev] [PATCH v2 6/9] nvc0/ir: Add SV_LANEMASK_* system values.

2017-04-09 Thread Ilia Mirkin
Please add these to nv50_ir_print.cpp's list of names too.

On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h  | 5 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 5 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 5 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 5 +
>  4 files changed, 20 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> index 6e5ffa525d..de6c110536 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> @@ -470,6 +470,11 @@ enum SVSemantic
> SV_BASEINSTANCE,
> SV_DRAWID,
> SV_WORK_DIM,
> +   SV_LANEMASK_EQ,
> +   SV_LANEMASK_LT,
> +   SV_LANEMASK_LE,
> +   SV_LANEMASK_GT,
> +   SV_LANEMASK_GE,
> SV_UNDEFINED,
> SV_LAST
>  };
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> index 87976ffebc..bd4bd118f4 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> @@ -2300,6 +2300,11 @@ CodeEmitterGK110::getSRegEncoding(const ValueRef& ref)
> case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
> case SV_LBASE: return 0x34;
> case SV_SBASE: return 0x30;
> +   case SV_LANEMASK_EQ:   return 0x38;
> +   case SV_LANEMASK_LT:   return 0x39;
> +   case SV_LANEMASK_LE:   return 0x3a;
> +   case SV_LANEMASK_GT:   return 0x3b;
> +   case SV_LANEMASK_GE:   return 0x3c;
> case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
> default:
>assert(!"no sreg for system value");
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> index 0382cb3903..29426c130b 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> @@ -269,6 +269,11 @@ CodeEmitterGM107::emitSYS(int pos, const Value *val)
> case SV_INVOCATION_INFO: id = 0x1d; break;
> case SV_TID: id = 0x21 + val->reg.data.sv.index; break;
> case SV_CTAID  : id = 0x25 + val->reg.data.sv.index; break;
> +   case SV_LANEMASK_EQ: id = 0x38; break;
> +   case SV_LANEMASK_LT: id = 0x39; break;
> +   case SV_LANEMASK_LE: id = 0x3a; break;
> +   case SV_LANEMASK_GT: id = 0x3b; break;
> +   case SV_LANEMASK_GE: id = 0x3c; break;
> case SV_CLOCK  : id = 0x50 + val->reg.data.sv.index; break;
> default:
>assert(!"invalid system value");
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index 84c3aca1df..c549ca1158 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -1989,6 +1989,11 @@ CodeEmitterNVC0::getSRegEncoding(const ValueRef& ref)
> case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
> case SV_LBASE: return 0x34;
> case SV_SBASE: return 0x30;
> +   case SV_LANEMASK_EQ:   return 0x38;
> +   case SV_LANEMASK_LT:   return 0x39;
> +   case SV_LANEMASK_LE:   return 0x3a;
> +   case SV_LANEMASK_GT:   return 0x3b;
> +   case SV_LANEMASK_GE:   return 0x3c;
> case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
> default:
>assert(!"no sreg for system value");
> --
> 2.12.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/9] nvc0/ir: Emit OP_SHFL

2017-04-09 Thread Ilia Mirkin
On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
> v2: (Samuel Pitoiset)
> Add an assertion to check if the target is Kepler
> Make sure that asImm() is not NULL
> ---
>  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 59 
> ++
>  1 file changed, 59 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index d5a310f88c..ee2d2f06c1 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -150,6 +150,8 @@ private:
>
> void emitPIXLD(const Instruction *);
>
> +   void emitSHFL(const Instruction *);
> +
> void emitVOTE(const Instruction *);
>
> inline void defId(const ValueDef&, const int pos);
> @@ -2529,6 +2531,60 @@ CodeEmitterNVC0::emitPIXLD(const Instruction *i)
>  }
>
>  void
> +CodeEmitterNVC0::emitSHFL(const Instruction *i)
> +{
> +   const ImmediateValue *imm;
> +
> +   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
> +
> +   code[0] = 0x0005;
> +   code[1] = 0x8800 | (i->subOp << 23);
> +
> +   emitPredicate(i);
> +
> +   defId(i->def(0), 14);
> +   srcId(i->src(0), 20);
> +
> +   switch (i->src(1).getFile()) {
> +   case FILE_GPR:
> +  srcId(i->src(1), 26);
> +  break;
> +   case FILE_IMMEDIATE:
> +  imm = i->src(1).get()->asImm();

The common thing to do is i->getSrc(1)->asImm(). Should be identical.
Same below.

> +  assert(imm);
> +  code[0] |= (imm->reg.data.u32 & 0x1f) << 26;
> +  code[0] |= 1 << 5;
> +  break;
> +   default:
> +  assert(!"invalid src1 file");
> +  break;
> +   }
> +
> +   switch (i->src(2).getFile()) {
> +   case FILE_GPR:
> +  srcId(i->src(2), 49);
> +  break;
> +   case FILE_IMMEDIATE:
> +  imm = i->src(2).get()->asImm();
> +  assert(imm);

&& imm->reg.data.u32 < 0x2000

> +  code[1] |= (imm->reg.data.u32 & 0x1fff) << 10;
> +  code[0] |= 1 << 6;
> +  break;
> +   default:
> +  assert(!"invalid src2 file");
> +  break;
> +   }
> +
> +   if (!i->defExists(1)) {
> +  code[0] |= 3 << 8;
> +  code[1] |= 1 << 26;
> +   } else {
> +  assert(i->def(1).getFile() == FILE_PREDICATE);
> +  setPDSTL(i->def(1));

setPDSTL should be able to handle the no-exists case too, no? You
might change the API to be

setPDSTL(const Instruction *, int d)

to avoid confusion.

> +   }
> +}
> +
> +void
>  CodeEmitterNVC0::emitVOTE(const Instruction *i)
>  {
> assert(i->src(0).getFile() == FILE_PREDICATE);
> @@ -2837,6 +2893,9 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
> case OP_PIXLD:
>emitPIXLD(insn);
>break;
> +   case OP_SHFL:
> +  emitSHFL(insn);
> +  break;
> case OP_VOTE:
>emitVOTE(insn);
>break;
> --
> 2.12.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/9] nvc0/ir: Properly handle a "split form" of predicate destination

2017-04-09 Thread Ilia Mirkin
Wow, great find!

On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding  wrote:
> GF100's ISA encoding has a weird form of predicate destination where its
> 3 bits are split across whole the instruction. Use a dedicated setPDSTL
> function instead of original defId which is incorrect in this case.
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index 5467447e35..d5a310f88c 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -58,6 +58,7 @@ private:
> void setImmediateS8(const ValueRef&);
> void setSUConst16(const Instruction *, const int s);
> void setSUPred(const Instruction *, const int s);
> +   inline void setPDSTL(const ValueDef&);
>
> void emitCondCode(CondCode cc, int pos);
> void emitInterpMode(const Instruction *);
> @@ -375,6 +376,14 @@ void CodeEmitterNVC0::setImmediateS8(const ValueRef )
> code[0] |= (s8 >> 6) << 8;
>  }
>
> +void CodeEmitterNVC0::setPDSTL(const ValueDef )
> +{
> +   uint32_t pred = (def.get() && def.getFile() != FILE_FLAGS ? DDATA(def).id 
> : 7);

Why not just == FILE_PREDICATE? Also, I don't think the outer parens do much.

> +
> +   code[0] |= (pred & 3) << 8;
> +   code[1] |= !!(pred & 7) << 26;

This always makes me nervous... how about

(pred & 4) << (26 - 2)

BTW, this should be pred & 4 in either case, no?

> +}
> +
>  void
>  CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
>  {
> @@ -1873,7 +1882,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
>if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
>i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
>   assert(i->defExists(0));
> - defId(i->def(0), 8);
> + setPDSTL(i->def(0));
>}
> }
>
> @@ -1945,7 +1954,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
>
> if (p >= 0) {
>if (targ->getChipset() >= NVISA_GK104_CHIPSET)
> - defId(i->def(p), 8);
> + setPDSTL(i->def(p));
>else
>   defId(i->def(p), 32 + 18);
> }
> --
> 2.12.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 9/9] nvc0: Enable ARB_shader_ballot on Kepler+

2017-04-09 Thread Boyan Ding
readInvocationARB() and readFirstInvocationARB() need SHFL.IDX
instruction which is introduced in Kepler.
---
 docs/features.txt  | 2 +-
 docs/relnotes/17.1.0.html  | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index edc56842b9..a2d7785827 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -292,7 +292,7 @@ Khronos, ARB, and OES extensions that are not part of any 
OpenGL or OpenGL ES ve
   GL_ARB_sample_locations   not started
   GL_ARB_seamless_cubemap_per_texture   DONE (i965, nvc0, 
radeonsi, r600, softpipe, swr)
   GL_ARB_shader_atomic_counter_ops  DONE (i965/gen7+, 
nvc0, radeonsi, softpipe)
-  GL_ARB_shader_ballot  DONE (radeonsi)
+  GL_ARB_shader_ballot  DONE (nvc0, radeonsi)
   GL_ARB_shader_clock   DONE (i965/gen7+, 
nv50, nvc0, radeonsi)
   GL_ARB_shader_draw_parameters DONE (i965, nvc0, 
radeonsi)
   GL_ARB_shader_group_vote  DONE (nvc0, radeonsi)
diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html
index 0a5cabe4f1..8f237ed527 100644
--- a/docs/relnotes/17.1.0.html
+++ b/docs/relnotes/17.1.0.html
@@ -45,7 +45,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 
 GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, 
llvmpipe
-GL_ARB_shader_ballot on radeonsi
+GL_ARB_shader_ballot on nvc0, radeonsi
 GL_ARB_shader_clock on nv50, nvc0, radeonsi
 GL_ARB_shader_group_vote on radeonsi
 GL_ARB_sparse_buffer on radeonsi/CIK+
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 7ef9bf9c9c..8c6712a121 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -259,6 +259,8 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
   return class_3d >= NVE4_3D_CLASS; /* needs testing on fermi */
case PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE:
   return class_3d >= GM200_3D_CLASS;
+   case PIPE_CAP_TGSI_BALLOT:
+  return class_3d >= NVE4_3D_CLASS;
 
/* unsupported caps */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -289,7 +291,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
case PIPE_CAP_INT64_DIVMOD:
case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
-   case PIPE_CAP_TGSI_BALLOT:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 6/9] nvc0/ir: Add SV_LANEMASK_* system values.

2017-04-09 Thread Boyan Ding
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  | 5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 5 +
 4 files changed, 20 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 6e5ffa525d..de6c110536 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -470,6 +470,11 @@ enum SVSemantic
SV_BASEINSTANCE,
SV_DRAWID,
SV_WORK_DIM,
+   SV_LANEMASK_EQ,
+   SV_LANEMASK_LT,
+   SV_LANEMASK_LE,
+   SV_LANEMASK_GT,
+   SV_LANEMASK_GE,
SV_UNDEFINED,
SV_LAST
 };
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 87976ffebc..bd4bd118f4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -2300,6 +2300,11 @@ CodeEmitterGK110::getSRegEncoding(const ValueRef& ref)
case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
case SV_LBASE: return 0x34;
case SV_SBASE: return 0x30;
+   case SV_LANEMASK_EQ:   return 0x38;
+   case SV_LANEMASK_LT:   return 0x39;
+   case SV_LANEMASK_LE:   return 0x3a;
+   case SV_LANEMASK_GT:   return 0x3b;
+   case SV_LANEMASK_GE:   return 0x3c;
case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
default:
   assert(!"no sreg for system value");
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 0382cb3903..29426c130b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -269,6 +269,11 @@ CodeEmitterGM107::emitSYS(int pos, const Value *val)
case SV_INVOCATION_INFO: id = 0x1d; break;
case SV_TID: id = 0x21 + val->reg.data.sv.index; break;
case SV_CTAID  : id = 0x25 + val->reg.data.sv.index; break;
+   case SV_LANEMASK_EQ: id = 0x38; break;
+   case SV_LANEMASK_LT: id = 0x39; break;
+   case SV_LANEMASK_LE: id = 0x3a; break;
+   case SV_LANEMASK_GT: id = 0x3b; break;
+   case SV_LANEMASK_GE: id = 0x3c; break;
case SV_CLOCK  : id = 0x50 + val->reg.data.sv.index; break;
default:
   assert(!"invalid system value");
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 84c3aca1df..c549ca1158 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1989,6 +1989,11 @@ CodeEmitterNVC0::getSRegEncoding(const ValueRef& ref)
case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
case SV_LBASE: return 0x34;
case SV_SBASE: return 0x30;
+   case SV_LANEMASK_EQ:   return 0x38;
+   case SV_LANEMASK_LT:   return 0x39;
+   case SV_LANEMASK_LE:   return 0x3a;
+   case SV_LANEMASK_GT:   return 0x3b;
+   case SV_LANEMASK_GE:   return 0x3c;
case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
default:
   assert(!"no sreg for system value");
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 5/9] nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE

2017-04-09 Thread Boyan Ding
Implementation of readFirstInvocationARB() on nvidia hardware needs a
ballotARB(true) used to decide the first active thread. This expressed
in gm107 asm as (supposing output is $r0):
vote any $r0 0x1 0x1

To model the always true input, which corresponds to the second 0x1
above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and
"not 0x1" in the src field respectively.

v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 24 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 22 +---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24 ++
 3 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 58076ba4d5..87976ffebc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -1621,7 +1621,8 @@ CodeEmitterGK110::emitSHFL(const Instruction *i)
 void
 CodeEmitterGK110::emitVOTE(const Instruction *i)
 {
-   assert(i->src(0).getFile() == FILE_PREDICATE);
+   const ImmediateValue *imm;
+   uint32_t u32;
 
code[0] = 0x0002;
code[1] = 0x86c0 | (i->subOp << 19);
@@ -1646,9 +1647,24 @@ CodeEmitterGK110::emitVOTE(const Instruction *i)
   code[0] |= 255 << 2;
if (!(rp & 2))
   code[1] |= 7 << 16;
-   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
-  code[1] |= 1 << 13;
-   srcId(i->src(0), 42);
+
+   switch (i->src(0).getFile()) {
+   case FILE_PREDICATE:
+  if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
+ code[0] |= 1 << 13;
+  srcId(i->src(0), 42);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->src(0).get()->asImm();
+  assert(imm);
+  u32 = imm->reg.data.u32;
+  assert(u32 == 0 || u32 == 1);
+  code[1] |= (u32 == 1 ? 0x7 : 0xf) << 10;
+  break;
+   default:
+  assert(!"Unhandled src");
+  break;
+   }
 }
 
 void
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 944563c93c..0382cb3903 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -2931,7 +2931,8 @@ CodeEmitterGM107::emitMEMBAR()
 void
 CodeEmitterGM107::emitVOTE()
 {
-   assert(insn->src(0).getFile() == FILE_PREDICATE);
+   const ImmediateValue *imm;
+   uint32_t u32;
 
int r = -1, p = -1;
for (int i = 0; insn->defExists(i); i++) {
@@ -2951,8 +2952,23 @@ CodeEmitterGM107::emitVOTE()
   emitPRED (0x2d, insn->def(p));
else
   emitPRED (0x2d);
-   emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
-   emitPRED (0x27, insn->src(0));
+
+   switch (insn->src(0).getFile()) {
+   case FILE_PREDICATE:
+  emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
+  emitPRED (0x27, insn->src(0));
+  break;
+   case FILE_IMMEDIATE:
+  imm = insn->src(0).get()->asImm();
+  assert(imm);
+  u32 = imm->reg.data.u32;
+  assert(u32 == 0 || u32 == 1);
+  emitField(0x27, 4, u32 == 1 ? 0x7 : 0xf);
+  break;
+   default:
+  assert(!"Unhandled src");
+  break;
+   }
 }
 
 void
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index ee2d2f06c1..84c3aca1df 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -2587,7 +2587,8 @@ CodeEmitterNVC0::emitSHFL(const Instruction *i)
 void
 CodeEmitterNVC0::emitVOTE(const Instruction *i)
 {
-   assert(i->src(0).getFile() == FILE_PREDICATE);
+   const ImmediateValue *imm;
+   uint32_t u32;
 
code[0] = 0x0004 | (i->subOp << 5);
code[1] = 0x4800;
@@ -2612,9 +2613,24 @@ CodeEmitterNVC0::emitVOTE(const Instruction *i)
   code[0] |= 63 << 14;
if (!(rp & 2))
   code[1] |= 7 << 22;
-   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
-  code[0] |= 1 << 23;
-   srcId(i->src(0), 20);
+
+   switch (i->src(0).getFile()) {
+   case FILE_PREDICATE:
+  if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
+ code[0] |= 1 << 23;
+  srcId(i->src(0), 20);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->src(0).get()->asImm();
+  assert(imm);
+  u32 = imm->reg.data.u32;
+  assert(u32 == 0 || u32 == 1);
+  code[0] |= (u32 == 1 ? 0x7 : 0xf) << 20;
+  break;
+   default:
+  assert(!"Unhandled src");
+  break;
+   }
 }
 
 bool
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 7/9] nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*

2017-04-09 Thread Boyan Ding
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 27 ++
 1 file changed, 27 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 3ed7d345c4..1bd01a9a32 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -450,6 +450,12 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval)
case TGSI_SEMANTIC_BASEINSTANCE: return nv50_ir::SV_BASEINSTANCE;
case TGSI_SEMANTIC_DRAWID: return nv50_ir::SV_DRAWID;
case TGSI_SEMANTIC_WORK_DIM:   return nv50_ir::SV_WORK_DIM;
+   case TGSI_SEMANTIC_SUBGROUP_INVOCATION: return nv50_ir::SV_LANEID;
+   case TGSI_SEMANTIC_SUBGROUP_EQ_MASK: return nv50_ir::SV_LANEMASK_EQ;
+   case TGSI_SEMANTIC_SUBGROUP_LT_MASK: return nv50_ir::SV_LANEMASK_LT;
+   case TGSI_SEMANTIC_SUBGROUP_LE_MASK: return nv50_ir::SV_LANEMASK_LE;
+   case TGSI_SEMANTIC_SUBGROUP_GT_MASK: return nv50_ir::SV_LANEMASK_GT;
+   case TGSI_SEMANTIC_SUBGROUP_GE_MASK: return nv50_ir::SV_LANEMASK_GE;
default:
   assert(0);
   return nv50_ir::SV_CLOCK;
@@ -1667,6 +1673,8 @@ private:
Symbol *srcToSym(tgsi::Instruction::SrcRegister, int c);
Symbol *dstToSym(tgsi::Instruction::DstRegister, int c);
 
+   bool isSubGroupMask(uint8_t semantic);
+
bool handleInstruction(const struct tgsi_full_instruction *);
void exportOutputs();
inline Subroutine *getSubroutine(unsigned ip);
@@ -1996,6 +2004,21 @@ Converter::adjustTempIndex(int arrayId, int , int 
) const
idx += it->second;
 }
 
+bool
+Converter::isSubGroupMask(uint8_t semantic)
+{
+   switch (semantic) {
+  case TGSI_SEMANTIC_SUBGROUP_EQ_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_LT_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_LE_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_GT_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_GE_MASK:
+ return true;
+  default:
+ return false;
+   }
+}
+
 Value *
 Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
 {
@@ -2041,6 +2064,10 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, 
int c, Value *ptr)
   if (info->sv[idx].sn == TGSI_SEMANTIC_THREAD_ID &&
   info->prop.cp.numThreads[swz] == 1)
  return loadImm(NULL, 0u);
+  if (isSubGroupMask(info->sv[idx].sn) && swz > 0)
+ return loadImm(NULL, 0u);
+  if (info->sv[idx].sn == TGSI_SEMANTIC_SUBGROUP_SIZE)
+ return loadImm(NULL, 32u);
   ld = mkOp1(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c));
   ld->perPatch = info->sv[idx].patch;
   return ld->getDef(0);
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/9] nvc0/ir: Emit OP_SHFL

2017-04-09 Thread Boyan Ding
v2: (Samuel Pitoiset)
Add an assertion to check if the target is Kepler
Make sure that asImm() is not NULL
---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index d5a310f88c..ee2d2f06c1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -150,6 +150,8 @@ private:
 
void emitPIXLD(const Instruction *);
 
+   void emitSHFL(const Instruction *);
+
void emitVOTE(const Instruction *);
 
inline void defId(const ValueDef&, const int pos);
@@ -2529,6 +2531,60 @@ CodeEmitterNVC0::emitPIXLD(const Instruction *i)
 }
 
 void
+CodeEmitterNVC0::emitSHFL(const Instruction *i)
+{
+   const ImmediateValue *imm;
+
+   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
+
+   code[0] = 0x0005;
+   code[1] = 0x8800 | (i->subOp << 23);
+
+   emitPredicate(i);
+
+   defId(i->def(0), 14);
+   srcId(i->src(0), 20);
+
+   switch (i->src(1).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(1), 26);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->src(1).get()->asImm();
+  assert(imm);
+  code[0] |= (imm->reg.data.u32 & 0x1f) << 26;
+  code[0] |= 1 << 5;
+  break;
+   default:
+  assert(!"invalid src1 file");
+  break;
+   }
+
+   switch (i->src(2).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(2), 49);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->src(2).get()->asImm();
+  assert(imm);
+  code[1] |= (imm->reg.data.u32 & 0x1fff) << 10;
+  code[0] |= 1 << 6;
+  break;
+   default:
+  assert(!"invalid src2 file");
+  break;
+   }
+
+   if (!i->defExists(1)) {
+  code[0] |= 3 << 8;
+  code[1] |= 1 << 26;
+   } else {
+  assert(i->def(1).getFile() == FILE_PREDICATE);
+  setPDSTL(i->def(1));
+   }
+}
+
+void
 CodeEmitterNVC0::emitVOTE(const Instruction *i)
 {
assert(i->src(0).getFile() == FILE_PREDICATE);
@@ -2837,6 +2893,9 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
case OP_PIXLD:
   emitPIXLD(insn);
   break;
+   case OP_SHFL:
+  emitSHFL(insn);
+  break;
case OP_VOTE:
   emitVOTE(insn);
   break;
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/9] gm107/ir: Emit third src 'bound' and optional predicate output of SHFL

2017-04-09 Thread Boyan Ding
v2: Emit the original hard-coded 0x1c03 when OP_SHFL is used in gm107's
lowering (Samuel Pitoiset)
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 23 ++
 .../nouveau/codegen/nv50_ir_lowering_gm107.cpp | 15 +-
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index c3c0dcd9fc..944563c93c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -967,11 +967,26 @@ CodeEmitterGM107::emitSHFL()
   break;
}
 
-   /*XXX: what is this arg? hardcode immediate for now */
-   emitField(0x22, 13, 0x1c03);
-   type |= 2;
+   switch (insn->src(2).getFile()) {
+   case FILE_GPR:
+  emitGPR(0x27, insn->src(2));
+  break;
+   case FILE_IMMEDIATE:
+  emitIMMD(0x22, 13, insn->src(2));
+  type |= 2;
+  break;
+   default:
+  assert(!"invalid src2 file");
+  break;
+   }
+
+   if (!insn->defExists(1))
+  emitPRED(0x30);
+   else {
+  assert(insn->def(1).getFile() == FILE_PREDICATE);
+  emitPRED(0x30, insn->def(1));
+   }
 
-   emitPRED (0x30);
emitField(0x1e, 2, insn->subOp);
emitField(0x1c, 2, type);
emitGPR  (0x08, insn->src(0));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp
index 371ebae40c..6b9edd4864 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp
@@ -41,6 +41,8 @@ namespace nv50_ir {
((QOP_##q << 6) | (QOP_##r << 4) |   \
 (QOP_##s << 2) | (QOP_##t << 0))
 
+#define SHFL_BOUND_QUAD 0x1c03
+
 void
 GM107LegalizeSSA::handlePFETCH(Instruction *i)
 {
@@ -120,7 +122,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i)
   // mov coordinates from lane l to all lanes
   bld.mkOp(OP_QUADON, TYPE_NONE, NULL);
   for (c = 0; c < dim; ++c) {
- bld.mkOp2(OP_SHFL, TYPE_F32, crd[c], i->getSrc(c + array), 
bld.mkImm(l));
+ bld.mkOp3(OP_SHFL, TYPE_F32, crd[c], i->getSrc(c + array),
+   bld.mkImm(l), bld.mkImm(SHFL_BOUND_QUAD));
  add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], crd[c], zero);
  add->subOp = 0x00;
  add->lanes = 1; /* abused for .ndv */
@@ -128,7 +131,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i)
 
   // add dPdx from lane l to lanes dx
   for (c = 0; c < dim; ++c) {
- bld.mkOp2(OP_SHFL, TYPE_F32, tmp, i->dPdx[c].get(), bld.mkImm(l));
+ bld.mkOp3(OP_SHFL, TYPE_F32, tmp, i->dPdx[c].get(), bld.mkImm(l),
+   bld.mkImm(SHFL_BOUND_QUAD));
  add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], tmp, crd[c]);
  add->subOp = qOps[l][0];
  add->lanes = 1; /* abused for .ndv */
@@ -136,7 +140,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i)
 
   // add dPdy from lane l to lanes dy
   for (c = 0; c < dim; ++c) {
- bld.mkOp2(OP_SHFL, TYPE_F32, tmp, i->dPdy[c].get(), bld.mkImm(l));
+ bld.mkOp3(OP_SHFL, TYPE_F32, tmp, i->dPdy[c].get(), bld.mkImm(l),
+   bld.mkImm(SHFL_BOUND_QUAD));
  add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], tmp, crd[c]);
  add->subOp = qOps[l][1];
  add->lanes = 1; /* abused for .ndv */
@@ -203,8 +208,8 @@ GM107LoweringPass::handleDFDX(Instruction *insn)
   break;
}
 
-   shfl = bld.mkOp2(OP_SHFL, TYPE_F32, bld.getScratch(),
-insn->getSrc(0), bld.mkImm(xid));
+   shfl = bld.mkOp3(OP_SHFL, TYPE_F32, bld.getScratch(), insn->getSrc(0),
+bld.mkImm(xid), bld.mkImm(SHFL_BOUND_QUAD));
shfl->subOp = NV50_IR_SUBOP_SHFL_BFLY;
insn->op = OP_QUADOP;
insn->subOp = qop;
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/9] nvc0/ir: Properly handle a "split form" of predicate destination

2017-04-09 Thread Boyan Ding
GF100's ISA encoding has a weird form of predicate destination where its
3 bits are split across whole the instruction. Use a dedicated setPDSTL
function instead of original defId which is incorrect in this case.
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5467447e35..d5a310f88c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -58,6 +58,7 @@ private:
void setImmediateS8(const ValueRef&);
void setSUConst16(const Instruction *, const int s);
void setSUPred(const Instruction *, const int s);
+   inline void setPDSTL(const ValueDef&);
 
void emitCondCode(CondCode cc, int pos);
void emitInterpMode(const Instruction *);
@@ -375,6 +376,14 @@ void CodeEmitterNVC0::setImmediateS8(const ValueRef )
code[0] |= (s8 >> 6) << 8;
 }
 
+void CodeEmitterNVC0::setPDSTL(const ValueDef )
+{
+   uint32_t pred = (def.get() && def.getFile() != FILE_FLAGS ? DDATA(def).id : 
7);
+
+   code[0] |= (pred & 3) << 8;
+   code[1] |= !!(pred & 7) << 26;
+}
+
 void
 CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
 {
@@ -1873,7 +1882,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
   if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
   i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
  assert(i->defExists(0));
- defId(i->def(0), 8);
+ setPDSTL(i->def(0));
   }
}
 
@@ -1945,7 +1954,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
 
if (p >= 0) {
   if (targ->getChipset() >= NVISA_GK104_CHIPSET)
- defId(i->def(p), 8);
+ setPDSTL(i->def(p));
   else
  defId(i->def(p), 32 + 18);
}
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] nvc0: ARB_shader_ballot for Kepler+ (v2)

2017-04-09 Thread Boyan Ding
This is v2 series of my ARB_shader_ballot enablement. I added some fixes
based on Samuel Pitoiset's feedback, which mainly include adapting
existing OP_SHFL usage to the new form in gm107's lowering and addition
of several assertion checks. It is also rebased against current master.

Boyan Ding (9):
  gm107/ir: Emit third src 'bound' and optional predicate output of SHFL
  nvc0/ir: Properly handle a "split form" of predicate destination
  nvc0/ir: Emit OP_SHFL
  gk110/ir: Emit OP_SHFL
  nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
  nvc0/ir: Add SV_LANEMASK_* system values.
  nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
  nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
  nvc0: Enable ARB_shader_ballot on Kepler+

 docs/features.txt  |   2 +-
 docs/relnotes/17.1.0.html  |   2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |   5 +
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp |  85 -
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  50 --
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 101 +++--
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  55 +++
 .../nouveau/codegen/nv50_ir_lowering_gm107.cpp |  15 ++-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   3 +-
 9 files changed, 293 insertions(+), 25 deletions(-)

-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 8/9] nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*

2017-04-09 Thread Boyan Ding
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 28 ++
 1 file changed, 28 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 1bd01a9a32..2ce6f29905 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -978,6 +978,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
NV50_IR_OPCODE_CASE(VOTE_ANY, VOTE);
NV50_IR_OPCODE_CASE(VOTE_EQ, VOTE);
 
+   NV50_IR_OPCODE_CASE(BALLOT, VOTE);
+   NV50_IR_OPCODE_CASE(READ_INVOC, SHFL);
+   NV50_IR_OPCODE_CASE(READ_FIRST, SHFL);
+
NV50_IR_OPCODE_CASE(END, EXIT);
 
default:
@@ -3431,6 +3435,30 @@ Converter::handleInstruction(const struct 
tgsi_full_instruction *insn)
  mkCvt(OP_CVT, TYPE_U32, dst0[c], TYPE_U8, val0);
   }
   break;
+   case TGSI_OPCODE_BALLOT:
+  val0 = new_LValue(func, FILE_PREDICATE);
+  mkCmp(OP_SET, CC_NE, TYPE_U32, val0, TYPE_U32, fetchSrc(0, 0), zero);
+  mkOp1(op, TYPE_U32, dst0[0], val0)->subOp = NV50_IR_SUBOP_VOTE_ANY;
+  mkMov(dst0[1], zero, TYPE_U32);
+  break;
+   case TGSI_OPCODE_READ_FIRST:
+  // ReadFirstInvocationARB(src) is implemented as
+  // ReadInvocationARB(src, findLSB(ballot(true)))
+  val0 = getScratch();
+  mkOp1(OP_VOTE, TYPE_U32, val0, mkImm(1))->subOp = NV50_IR_SUBOP_VOTE_ANY;
+  mkOp2(OP_EXTBF, TYPE_U32, val0, val0, mkImm(0x2000))
+ ->subOp = NV50_IR_SUBOP_EXTBF_REV;
+  mkOp1(OP_BFIND, TYPE_U32, val0, val0)->subOp = NV50_IR_SUBOP_BFIND_SAMT;
+  src1 = val0;
+  /* fallthrough */
+   case TGSI_OPCODE_READ_INVOC:
+  if (tgsi.getOpcode() == TGSI_OPCODE_READ_INVOC)
+ src1 = fetchSrc(1, 0);
+  FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
+ geni = mkOp3(op, dstTy, dst0[c], fetchSrc(0, c), src1, mkImm(0x1f));
+ geni->subOp = NV50_IR_SUBOP_SHFL_IDX;
+  }
+  break;
case TGSI_OPCODE_CLOCK:
   // Stick the 32-bit clock into the high dword of the logical result.
   if (!tgsi.getDst(0).isMasked(0))
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 4/9] gk110/ir: Emit OP_SHFL

2017-04-09 Thread Boyan Ding
v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 1121ae0912..58076ba4d5 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -135,6 +135,8 @@ private:
 
void emitFlow(const Instruction *);
 
+   void emitSHFL(const Instruction *);
+
void emitVOTE(const Instruction *);
 
void emitSULDGB(const TexInstruction *);
@@ -1566,6 +1568,57 @@ CodeEmitterGK110::emitFlow(const Instruction *i)
 }
 
 void
+CodeEmitterGK110::emitSHFL(const Instruction *i)
+{
+   const ImmediateValue *imm;
+
+   code[0] = 0x0002;
+   code[1] = 0x7880 | (i->subOp << 1);
+
+   emitPredicate(i);
+
+   defId(i->def(0), 2);
+   srcId(i->src(0), 10);
+
+   switch (i->src(1).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(1), 23);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->src(1).get()->asImm();
+  assert(imm);
+  code[0] |= (imm->reg.data.u32 & 0x1f) << 23;
+  code[0] |= 1 << 31;
+  break;
+   default:
+  assert(!"invalid src1 file");
+  break;
+   }
+
+   switch (i->src(2).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(2), 42);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->src(2).get()->asImm();
+  assert(imm);
+  code[1] |= (imm->reg.data.u32 & 0x1fff) << 5;
+  code[1] |= 1;
+  break;
+   default:
+  assert(!"invalid src2 file");
+  break;
+   }
+
+   if (!i->defExists(1))
+  code[1] |= 7 << 19;
+   else {
+  assert(i->def(1).getFile() == FILE_PREDICATE);
+  defId(i->def(1), 51);
+   }
+}
+
+void
 CodeEmitterGK110::emitVOTE(const Instruction *i)
 {
assert(i->src(0).getFile() == FILE_PREDICATE);
@@ -2642,6 +2695,9 @@ CodeEmitterGK110::emitInstruction(Instruction *insn)
case OP_CCTL:
   emitCCTL(insn);
   break;
+   case OP_SHFL:
+  emitSHFL(insn);
+  break;
case OP_VOTE:
   emitVOTE(insn);
   break;
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radv: Use a shader for occlusion CmdCopyQueryPoolResults.

2017-04-09 Thread Bas Nieuwenhuizen
Use the new occlusion query copy shader.

We don't use the shader for the waiting as a polling loop ineracts badly
with having caching enabled. I noticed on my GPU (Tonga) that the values
are written out in order, so I just use a WAIT_REG_MEM on the last value.

If it turns out other chips don't do that we may need to look a bit more
into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal.

This also restricts the availability word in the pool to timestamp queries
only, as occlusion queries don't use it, and pipeline statistic queries
likely won't either.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_query.c | 138 
 1 file changed, 64 insertions(+), 74 deletions(-)

diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index 5b1fff4eeaa..86be85a5369 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -486,9 +486,7 @@ VkResult radv_CreateQueryPool(
 
switch(pCreateInfo->queryType) {
case VK_QUERY_TYPE_OCCLUSION:
-   /* 16 bytes tmp. buffer as the compute packet writes 64 bits, 
but
-* the app. may have 32 bits of space. */
-   pool->stride = 16 * get_max_db(device) + 16;
+   pool->stride = 16 * get_max_db(device);
break;
case VK_QUERY_TYPE_PIPELINE_STATISTICS:
pool->stride = 16 * 11;
@@ -502,7 +500,9 @@ VkResult radv_CreateQueryPool(
 
pool->type = pCreateInfo->queryType;
pool->availability_offset = pool->stride * pCreateInfo->queryCount;
-   size = pool->availability_offset + 4 * pCreateInfo->queryCount;
+   size = pool->availability_offset;
+   if (pCreateInfo->queryType == VK_QUERY_TYPE_TIMESTAMP)
+   size += 4 * pCreateInfo->queryCount;
 
pool->bo = device->ws->buffer_create(device->ws, size,
 64, RADEON_DOMAIN_GTT, 0);
@@ -649,6 +649,7 @@ void radv_CmdCopyQueryPoolResults(
RADV_FROM_HANDLE(radv_query_pool, pool, queryPool);
RADV_FROM_HANDLE(radv_buffer, dst_buffer, dstBuffer);
struct radeon_winsys_cs *cs = cmd_buffer->cs;
+   unsigned elem_size = (flags & VK_QUERY_RESULT_64_BIT) ? 8 : 4;
uint64_t va = cmd_buffer->device->ws->buffer_get_va(pool->bo);
uint64_t dest_va = 
cmd_buffer->device->ws->buffer_get_va(dst_buffer->bo);
dest_va += dst_buffer->offset + dstOffset;
@@ -656,33 +657,62 @@ void radv_CmdCopyQueryPoolResults(
cmd_buffer->device->ws->cs_add_buffer(cmd_buffer->cs, pool->bo, 8);
cmd_buffer->device->ws->cs_add_buffer(cmd_buffer->cs, dst_buffer->bo, 
8);
 
-   for(unsigned i = 0; i < queryCount; ++i, dest_va += stride) {
-   unsigned query = firstQuery + i;
-   uint64_t local_src_va = va  + query * pool->stride;
-   unsigned elem_size = (flags & VK_QUERY_RESULT_64_BIT) ? 8 : 4;
-
-   MAYBE_UNUSED unsigned cdw_max = 
radeon_check_space(cmd_buffer->device->ws, cs, 26);
-
+   switch (pool->type) {
+   case VK_QUERY_TYPE_OCCLUSION:
if (flags & VK_QUERY_RESULT_WAIT_BIT) {
-   /* TODO, not sure if there is any case where we won't 
always be ready yet */
-   uint64_t avail_va = va + pool->availability_offset + 4 
* query;
-
-
-   /* This waits on the ME. All copies below are done on 
the ME */
-   radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0));
-   radeon_emit(cs, WAIT_REG_MEM_EQUAL | 
WAIT_REG_MEM_MEM_SPACE(1));
-   radeon_emit(cs, avail_va);
-   radeon_emit(cs, avail_va >> 32);
-   radeon_emit(cs, 1); /* reference value */
-   radeon_emit(cs, 0x); /* mask */
-   radeon_emit(cs, 4); /* poll interval */
+   for(unsigned i = 0; i < queryCount; ++i, dest_va += 
stride) {
+   unsigned query = firstQuery + i;
+   uint64_t src_va = va + query * pool->stride + 
pool->stride - 4;
+
+   /* Waits on the upper word of the last DB entry 
*/
+   radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0));
+   radeon_emit(cs, /*WAIT_REG_MEM_EQUAL*/ 5 | 
WAIT_REG_MEM_MEM_SPACE(1));
+   radeon_emit(cs, src_va);
+   radeon_emit(cs, src_va >> 32);
+   radeon_emit(cs, 0x8000); /* reference value 
*/
+   radeon_emit(cs, 0x); /* mask */
+   radeon_emit(cs, 4); /* poll interval */
+   }
}
+   occlusion_query_shader(cmd_buffer, pool->bo, dst_buffer->bo,
+  firstQuery * 

[Mesa-dev] [PATCH 1/2] radv: Add occlusion query shader.

2017-04-09 Thread Bas Nieuwenhuizen
Adds a shader for writing occlusion query results to a buffer, as the
CP packet isn't support on SI or secondary buffers, and doesn't handle
the availability bit (or partial results) nor truncation to 32-bit.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_meta.c|   7 +
 src/amd/vulkan/radv_meta.h|   3 +
 src/amd/vulkan/radv_private.h |   6 +
 src/amd/vulkan/radv_query.c   | 419 ++
 4 files changed, 435 insertions(+)

diff --git a/src/amd/vulkan/radv_meta.c b/src/amd/vulkan/radv_meta.c
index 04fa247dd36..0098e0844c1 100644
--- a/src/amd/vulkan/radv_meta.c
+++ b/src/amd/vulkan/radv_meta.c
@@ -324,6 +324,10 @@ radv_device_init_meta(struct radv_device *device)
if (result != VK_SUCCESS)
goto fail_buffer;
 
+   result = radv_device_init_meta_query_state(device);
+   if (result != VK_SUCCESS)
+   goto fail_query;
+
result = radv_device_init_meta_fast_clear_flush_state(device);
if (result != VK_SUCCESS)
goto fail_fast_clear;
@@ -337,6 +341,8 @@ fail_resolve_compute:
radv_device_finish_meta_fast_clear_flush_state(device);
 fail_fast_clear:
radv_device_finish_meta_buffer_state(device);
+fail_query:
+   radv_device_finish_meta_query_state(device);
 fail_buffer:
radv_device_finish_meta_depth_decomp_state(device);
 fail_depth_decomp:
@@ -363,6 +369,7 @@ radv_device_finish_meta(struct radv_device *device)
radv_device_finish_meta_blit2d_state(device);
radv_device_finish_meta_bufimage_state(device);
radv_device_finish_meta_depth_decomp_state(device);
+   radv_device_finish_meta_query_state(device);
radv_device_finish_meta_buffer_state(device);
radv_device_finish_meta_fast_clear_flush_state(device);
radv_device_finish_meta_resolve_compute_state(device);
diff --git a/src/amd/vulkan/radv_meta.h b/src/amd/vulkan/radv_meta.h
index d70fef1e5f1..6cfc6134c53 100644
--- a/src/amd/vulkan/radv_meta.h
+++ b/src/amd/vulkan/radv_meta.h
@@ -85,6 +85,9 @@ void radv_device_finish_meta_blit2d_state(struct radv_device 
*device);
 VkResult radv_device_init_meta_buffer_state(struct radv_device *device);
 void radv_device_finish_meta_buffer_state(struct radv_device *device);
 
+VkResult radv_device_init_meta_query_state(struct radv_device *device);
+void radv_device_finish_meta_query_state(struct radv_device *device);
+
 VkResult radv_device_init_meta_resolve_compute_state(struct radv_device 
*device);
 void radv_device_finish_meta_resolve_compute_state(struct radv_device *device);
 void radv_meta_save(struct radv_meta_saved_state *state,
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 580c1197e64..a03c24c24ac 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -438,6 +438,12 @@ struct radv_meta_state {
VkPipeline fill_pipeline;
VkPipeline copy_pipeline;
} buffer;
+
+   struct {
+   VkDescriptorSetLayout occlusion_query_ds_layout;
+   VkPipelineLayout occlusion_query_p_layout;
+   VkPipeline occlusion_query_pipeline;
+   } query;
 };
 
 /* queue types */
diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index 288bd43a763..5b1fff4eeaa 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -29,6 +29,8 @@
 #include 
 #include 
 
+#include "nir/nir_builder.h"
+#include "radv_meta.h"
 #include "radv_private.h"
 #include "radv_cs.h"
 #include "sid.h"
@@ -49,6 +51,423 @@ static unsigned get_max_db(struct radv_device *device)
return num_db;
 }
 
+static void radv_break_on_count(nir_builder *b, nir_variable *var, int count)
+{
+   nir_ssa_def *counter = nir_load_var(b, var);
+
+   nir_if *if_stmt = nir_if_create(b->shader);
+   if_stmt->condition = nir_src_for_ssa(nir_uge(b, counter, nir_imm_int(b, 
count)));
+   nir_cf_node_insert(b->cursor, _stmt->cf_node);
+
+   b->cursor = nir_after_cf_list(_stmt->then_list);
+
+   nir_jump_instr *instr = nir_jump_instr_create(b->shader, 
nir_jump_break);
+   nir_builder_instr_insert(b, >instr);
+
+   b->cursor = nir_after_cf_node(_stmt->cf_node);
+   counter = nir_iadd(b, counter, nir_imm_int(b, 1));
+   nir_store_var(b, var, counter, 0x1);
+}
+
+static struct nir_ssa_def *
+radv_load_push_int(nir_builder *b, unsigned offset, const char *name)
+{
+   nir_intrinsic_instr *flags = nir_intrinsic_instr_create(b->shader, 
nir_intrinsic_load_push_constant);
+   flags->src[0] = nir_src_for_ssa(nir_imm_int(b, offset));
+   flags->num_components = 1;
+   nir_ssa_dest_init(>instr, >dest, 1, 32, name);
+   nir_builder_instr_insert(b, >instr);
+   return >dest.ssa;
+}
+
+static nir_shader *
+build_occlusion_query_shader(struct radv_device *device) {
+   /* the shader this builds is roughly
+*
+* push constants {
+

[Mesa-dev] [PATCH 2/3] r600g: add draw_vbo check for a NULL pixel shader

2017-04-09 Thread Constantine Kharlamov
Taken from radeonsi, required to remove dummy pixel shader in the next patch

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/evergreen_state.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.h | 1 +
 src/gallium/drivers/r600/r600_state.c| 3 ++-
 src/gallium/drivers/r600/r600_state_common.c | 7 ++-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 371e7ce212..5697da4af9 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -471,6 +471,7 @@ static void *evergreen_create_rs_state(struct pipe_context 
*ctx,
rs->clip_halfz = state->clip_halfz;
rs->flatshade = state->flatshade;
rs->sprite_coord_enable = state->sprite_coord_enable;
+   rs->rasterizer_discard = state->rasterizer_discard;
rs->two_side = state->light_twoside;
rs->clip_plane_enable = state->clip_plane_enable;
rs->pa_sc_line_stipple = state->line_stipple_enable ?
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 86634b8681..7f1ecc278b 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -279,6 +279,7 @@ struct r600_rasterizer_state {
boolscissor_enable;
boolmultisample_enable;
boolclip_halfz;
+   boolrasterizer_discard;
 };
 
 struct r600_poly_offset_state {
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 1f7e9b3aa5..06100abc4a 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -470,6 +470,7 @@ static void *r600_create_rs_state(struct pipe_context *ctx,
rs->clip_halfz = state->clip_halfz;
rs->flatshade = state->flatshade;
rs->sprite_coord_enable = state->sprite_coord_enable;
+   rs->rasterizer_discard = state->rasterizer_discard;
rs->two_side = state->light_twoside;
rs->clip_plane_enable = state->clip_plane_enable;
rs->pa_sc_line_stipple = state->line_stipple_enable ?
@@ -622,7 +623,7 @@ static void *r600_create_sampler_state(struct pipe_context 
*ctx,
 static struct pipe_sampler_view *
 texture_buffer_sampler_view(struct r600_pipe_sampler_view *view,
unsigned width0, unsigned height0)
-   
+
 {
struct r600_texture *tmp = (struct r600_texture*)view->base.texture;
int stride = util_format_get_blocksize(view->base.format);
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 94f85e6dd3..c9b41517cc 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1708,7 +1708,12 @@ static void r600_draw_vbo(struct pipe_context *ctx, 
const struct pipe_draw_info
return;
}
 
-   if (unlikely(!rctx->vs_shader || !rctx->ps_shader)) {
+   if (unlikely(!rctx->vs_shader)) {
+   assert(0);
+   return;
+   }
+   if (unlikely(!rctx->ps_shader &&
+(!rctx->rasterizer || 
!rctx->rasterizer->rasterizer_discard))) {
assert(0);
return;
}
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3 v2] r600g: skip repeating vs, gs, and tes shader binds

2017-04-09 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.

Some statistics for GTAⅣ:
Average tesselation bind skip per frame: ≈350
Average geometric shaders bind skip per frame: ≈260
Skip of binding vertex ones occurs rarely enough to not get into per-frame
counter at all, so I just gonna say: it happens.

v2: I've occasionally removed an empty line, don't do this.

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/r600_state_common.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..94f85e6dd3 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
+   if (!state || rctx->vs_shader == state)
return;
 
rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,11 +943,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->gs_shader)
+   return;
+
rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
@@ -962,11 +963,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->tes_shader)
+   return;
+
rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
 }
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3 v2] r600g: shader logic improvements

2017-04-09 Thread Constantine Kharlamov
Although I didn't see a statistically significant change in GTAⅣ benchmark, it
seem to have reduced stall for opening the door from a house to the outer world
at the first savepoint.

No changes in gpu.py tests of piglit in gbm mode.

v2: In the 1-st patch was occasionally removed empty line. Don't do that.

To the 3-rd patch added a check I missed because of macros using prefix.
Tbh I'd rather prefer to split ps-related logic out of
r600_update_derived_state(), but after more than hour of looking into 
it,
and with understanding only half of the logic, I gave up.

Constantine Kharlamov (3):
  r600g: skip repeating vs, gs, and tes shader binds
  r600g: add draw_vbo check for a NULL pixel shader
  r600g: get rid of dummy pixel shader

 src/gallium/drivers/r600/evergreen_state.c   |  1 +
 src/gallium/drivers/r600/r600_pipe.c |  9 
 src/gallium/drivers/r600/r600_pipe.h |  4 +-
 src/gallium/drivers/r600/r600_state.c|  3 +-
 src/gallium/drivers/r600/r600_state_common.c | 77 
 5 files changed, 47 insertions(+), 47 deletions(-)

-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3 v2] r600g: get rid of dummy pixel shader

2017-04-09 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code mostly was already checking for null
pixel shader, so little checks had to be added.

Interestingly, acc. to testing with GTAⅣ, though binding of null shader happens
a lot at the start (then just stops), but draw_vbo() never actually sees null
ps.

v2: added a check I missed because of a macros using a prefix to choose
a shader.

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/r600_pipe.c |  9 -
 src/gallium/drivers/r600/r600_pipe.h |  3 --
 src/gallium/drivers/r600/r600_state_common.c | 58 ++--
 3 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 5014f2525c..7d8efd2c9b 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -82,9 +82,6 @@ static void r600_destroy_context(struct pipe_context *context)
if (rctx->fixed_func_tcs_shader)
rctx->b.b.delete_tcs_state(>b.b, 
rctx->fixed_func_tcs_shader);
 
-   if (rctx->dummy_pixel_shader) {
-   rctx->b.b.delete_fs_state(>b.b, rctx->dummy_pixel_shader);
-   }
if (rctx->custom_dsa_flush) {
rctx->b.b.delete_depth_stencil_alpha_state(>b.b, 
rctx->custom_dsa_flush);
}
@@ -209,12 +206,6 @@ static struct pipe_context *r600_create_context(struct 
pipe_screen *screen,
 
r600_begin_new_cs(rctx);
 
-   rctx->dummy_pixel_shader =
-   util_make_fragment_cloneinput_shader(>b.b, 0,
-TGSI_SEMANTIC_GENERIC,
-TGSI_INTERPOLATE_CONSTANT);
-   rctx->b.b.bind_fs_state(>b.b, rctx->dummy_pixel_shader);
-
return >b.b;
 
 fail:
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 7f1ecc278b..e636ef0024 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -432,9 +432,6 @@ struct r600_context {
void*custom_blend_resolve;
void*custom_blend_decompress;
void*custom_blend_fastclear;
-   /* With rasterizer discard, there doesn't have to be a pixel shader.
-* In that case, we bind this one: */
-   void*dummy_pixel_shader;
/* These dummy CMASK and FMASK buffers are used to get around the R6xx 
hardware
 * bug where valid CMASK and FMASK are required to be present to avoid
 * a hardlock in certain operations but aren't actually used
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index c9b41517cc..8d1193360b 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -725,7 +725,8 @@ static inline void r600_shader_selector_key(const struct 
pipe_context *ctx,
if (!key->vs.as_ls)
key->vs.as_es = (rctx->gs_shader != NULL);
 
-   if (rctx->ps_shader->current->shader.gs_prim_id_input && 
!rctx->gs_shader) {
+   if (rctx->ps_shader && 
rctx->ps_shader->current->shader.gs_prim_id_input &&
+   !rctx->gs_shader) {
key->vs.as_gs_a = true;
key->vs.prim_id_out = 
rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid;
}
@@ -909,9 +910,6 @@ static void r600_bind_ps_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
-   state = rctx->dummy_pixel_shader;
-
rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
 }
 
@@ -1474,7 +1472,8 @@ static bool r600_update_derived_state(struct r600_context 
*rctx)
}
}
 
-   SELECT_SHADER_OR_FAIL(ps);
+   if (rctx->ps_shader)
+   SELECT_SHADER_OR_FAIL(ps);
 
r600_mark_atom_dirty(rctx, >shader_stages.atom);
 
@@ -1551,37 +1550,40 @@ static bool r600_update_derived_state(struct 
r600_context *rctx)
rctx->b.streamout.enabled_stream_buffers_mask = 
clip_so_current->enabled_stream_buffers_mask;
}
 
-   if (unlikely(ps_dirty || 
rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current ||
-   rctx->rasterizer->sprite_coord_enable != 
rctx->ps_shader->current->sprite_coord_enable ||
-   rctx->rasterizer->flatshade != 
rctx->ps_shader->current->flatshade)) {
+   if (rctx->ps_shader) {
+   if (unlikely((ps_dirty || 
rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current ||
+ rctx->rasterizer->sprite_coord_enable != 
rctx->ps_shader->current->sprite_coord_enable ||
+   

Re: [Mesa-dev] [PATCH 2/2] genxml: Make BLEND_STATE command support variable length array.

2017-04-09 Thread Lionel Landwerlin

On 09/04/17 17:23, Jason Ekstrand wrote:



On April 9, 2017 8:48:31 AM Lionel Landwerlin 
 wrote:



I have one suggestion at the bottom of the patch, otherwise :

Reviewed-by: Lionel Landwerlin 

On 07/04/17 17:52, Rafael Antognolli wrote:

We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.

For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.

With this change, we also change the code for blorp and anv to emit 
only

the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.

Signed-off-by: Rafael Antognolli 
---
  src/intel/blorp/blorp_genX_exec.h | 35 -
  src/intel/genxml/gen6.xml |  4 +-
  src/intel/genxml/gen7.xml |  4 +-
  src/intel/genxml/gen75.xml|  4 +-
  src/intel/genxml/gen8.xml |  4 +-
  src/intel/genxml/gen9.xml |  4 +-
  src/intel/vulkan/genX_pipeline.c  | 53 


  7 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/src/intel/blorp/blorp_genX_exec.h 
b/src/intel/blorp/blorp_genX_exec.h

index 3791462..fc1856f 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -902,23 +902,30 @@ blorp_emit_blend_state(struct blorp_batch *batch,
 struct GENX(BLEND_STATE) blend;
 memset(, 0, sizeof(blend));

+   uint32_t offset;
+   int size = GENX(BLEND_STATE_length) * 4;
+   size += GENX(BLEND_STATE_ENTRY_length) * 4 * 
params->num_draw_buffers;
+   uint32_t *state = blorp_alloc_dynamic_state(batch, size, 64, 
);

+   uint32_t *pos = state;
+
+   GENX(BLEND_STATE_pack)(NULL, pos, );
+   pos += GENX(BLEND_STATE_length);
+
 for (unsigned i = 0; i < params->num_draw_buffers; ++i) {
-  blend.Entry[i].PreBlendColorClampEnable = true;
-  blend.Entry[i].PostBlendColorClampEnable = true;
-  blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT;
-
-  blend.Entry[i].WriteDisableRed = params->color_write_disable[0];
-  blend.Entry[i].WriteDisableGreen = 
params->color_write_disable[1];
-  blend.Entry[i].WriteDisableBlue = 
params->color_write_disable[2];
-  blend.Entry[i].WriteDisableAlpha = 
params->color_write_disable[3];

+  struct GENX(BLEND_STATE_ENTRY) entry = { 0 };
+  entry.PreBlendColorClampEnable = true;
+  entry.PostBlendColorClampEnable = true;
+  entry.ColorClampRange = COLORCLAMP_RTFORMAT;
+
+  entry.WriteDisableRed = params->color_write_disable[0];
+  entry.WriteDisableGreen = params->color_write_disable[1];
+  entry.WriteDisableBlue = params->color_write_disable[2];
+  entry.WriteDisableAlpha = params->color_write_disable[3];
+  GENX(BLEND_STATE_ENTRY_pack)(NULL, pos, );
+  pos += GENX(BLEND_STATE_ENTRY_length);
 }

-   uint32_t offset;
-   void *state = blorp_alloc_dynamic_state(batch,
- GENX(BLEND_STATE_length) * 4,
-   64, );
-   GENX(BLEND_STATE_pack)(NULL, state, );
-   blorp_flush_range(batch, state, GENX(BLEND_STATE_length) * 4);
+   blorp_flush_range(batch, state, size);

  #if GEN_GEN >= 7
 blorp_emit(batch, GENX(3DSTATE_BLEND_STATE_POINTERS), sp) {
diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml
index 5083f07..3059bfc 100644
--- a/src/intel/genxml/gen6.xml
+++ b/src/intel/genxml/gen6.xml
@@ -452,8 +452,8 @@
  end="32" type="bool"/>



-  
-
+  
+
type="BLEND_STATE_ENTRY"/>

  

diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
index ada8f74..867a1d4 100644
--- a/src/intel/genxml/gen7.xml
+++ b/src/intel/genxml/gen7.xml
@@ -507,8 +507,8 @@
  end="32" type="bool"/>



-  
-
+  
+
type="BLEND_STATE_ENTRY"/>

  

diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index 16d2d74..594e539 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -517,8 +517,8 @@
  end="32" type="bool"/>



-  
-
+  
+
type="BLEND_STATE_ENTRY"/>

  

diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
index 1390fe6..4985342 100644
--- a/src/intel/genxml/gen8.xml
+++ b/src/intel/genxml/gen8.xml
@@ -546,7 +546,7 @@
  


-  
+  
  type="bool"/>
  end="30" type="bool"/>
  type="bool"/>

@@ -556,7 +556,7 @@
  type="bool"/>

  
  
-
+
type="BLEND_STATE_ENTRY"/>

  

diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 4bf0fb6..a620e78 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -555,7 +555,7 @@
  


-  
+  
  type="bool"/>
  end="30" type="bool"/>
  

Re: [Mesa-dev] [PATCH shader-db] Add ".so" shared objects to .gitignore

2017-04-09 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Sat, Apr 8, 2017 at 9:59 PM, Rhys Kidd  wrote:
> For intel_stubs.so
>
> Signed-off-by: Rhys Kidd 
> ---
>
> I don't have commit access, so I would appreciate a reviewer pushing this to
> master.
>
>  .gitignore | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/.gitignore b/.gitignore
> index f69750a..95a04f6 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -1,2 +1,3 @@
>  bin
>  run
> +*.so
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] amd/addrlib: use correct variable name in header

2017-04-09 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Sat, Apr 8, 2017 at 8:36 AM, Thomas Hindoe Paaboel Andersen
 wrote:
> Since the inclusion in 7f160efcde41b52ad78e562316384373dab419e3
> the header used x_biased, while the implementation used y_biased.
> This changes the header to macth the implementation since the
> uses of the function seems to expect y_biased.
> ---
>  src/amd/addrlib/gfx9/rbmap.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/addrlib/gfx9/rbmap.h b/src/amd/addrlib/gfx9/rbmap.h
> index f2f2ca8..89c8922 100644
> --- a/src/amd/addrlib/gfx9/rbmap.h
> +++ b/src/amd/addrlib/gfx9/rbmap.h
> @@ -49,7 +49,7 @@ public:
>
>  void Get_Comp_Block_Screen_Space( CoordEq& addr, int bytes_log2, int* w, 
> int* h, int* d = NULL);
>
> -void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool 
> is_thick, bool x_biased,
> +void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool 
> is_thick, bool y_biased,
>int comp_block_width_log2, int 
> comp_block_height_log2, int comp_block_depth_log2,
>int& meta_block_width_log2, int& 
> meta_block_height_log2, int& meta_block_depth_log2 );
>  void cap_pipe( int xmode, bool is_thick, int& num_ses_log2, int 
> bpp_log2, int num_samples_log2, int pipe_interleave_log2,
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] amd: fix distcheck

2017-04-09 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Wed, Apr 5, 2017 at 1:00 PM, Juan A. Suarez Romero
 wrote:
> Add missing GFX9 files in the EXTRA_DIST.
> ---
>  src/amd/Makefile.sources | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/amd/Makefile.sources b/src/amd/Makefile.sources
> index 46da0fb..816e7e4 100644
> --- a/src/amd/Makefile.sources
> +++ b/src/amd/Makefile.sources
> @@ -21,12 +21,14 @@ ADDRLIB_FILES = \
> addrlib/core/addrlib2.h \
> addrlib/core/addrobject.cpp \
> addrlib/core/addrobject.h \
> +   addrlib/gfx9/chip/gfx9_enum.h \
> addrlib/gfx9/coord.cpp \
> addrlib/gfx9/coord.h \
> addrlib/gfx9/gfx9addrlib.cpp \
> addrlib/gfx9/gfx9addrlib.h \
> addrlib/gfx9/rbmap.cpp \
> addrlib/gfx9/rbmap.h \
> +   addrlib/inc/chip/gfx9/gfx9_gb_reg.h \
> addrlib/inc/chip/r800/si_gb_reg.h \
> addrlib/inc/lnx_common_defs.h \
> addrlib/r800/chip/si_ci_vi_merged_enum.h \
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/3] nv50/ir: handle logops with NOT in AlgebraicOpt

2017-04-09 Thread Ilia Mirkin
On Mon, Apr 3, 2017 at 11:58 AM, Karol Herbst  wrote:
> Signed-off-by: Karol Herbst 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index bd60a84998..0de84fe9fc 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1856,6 +1856,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
>
>set0 = cloneForward(func, set0);
>set1 = cloneShallow(func, set1);
> +
> +  if (logop->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> + set0->asCmp()->setCond = inverseCondCode(set0->asCmp()->setCond);
> +  if (logop->src(1).mod == Modifier(NV50_IR_MOD_NOT))
> + set1->asCmp()->setCond = inverseCondCode(set1->asCmp()->setCond);

set0/set1 may have been swapped further up, so you need to keep track of that.

Also, I don't think this will work if one of the sets is a SET_AND --
the condcode applies to the set bit, not to the AND bit. I think you'd
also have to flip AND <-> OR and flip the neg.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/3] nv50/ir: fix AlgebraicOpt for slcts with mods

2017-04-09 Thread Ilia Mirkin
On Mon, Apr 3, 2017 at 11:58 AM, Karol Herbst  wrote:
> Signed-off-by: Karol Herbst 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 4c92a1efb5..bd60a84998 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1797,10 +1797,10 @@ AlgebraicOpt::handleSLCT(Instruction *slct)
>if (slct->getSrc(2)->asImm()->compare(slct->asCmp()->setCond, 0.0f))
>   slct->setSrc(0, slct->getSrc(1));
> } else
> -   if (slct->getSrc(0) != slct->getSrc(1)) {
> +   if (slct->getSrc(0) != slct->getSrc(1) || slct->src(0).mod != 
> slct->src(1).mod)

SLCT can't have mods on src0/src1. Only on src2. I'd be just as happy
to assert that they're both == 0 here. You can also add a helper to
ValueRef to see if it's == to another ValueRef, which compares both
the Value ptr as well as any modifiers, indirects, etc. But it again
doesn't ultimately need to be used here.

>return;
> -   }
> -   slct->op = OP_MOV;
> +   slct->op = slct->src(0).mod.getOp();
> +   slct->src(0).mod = slct->src(0).mod ^ Modifier(slct->op);
> slct->setSrc(1, NULL);
> slct->setSrc(2, NULL);
>  }
> --
> 2.12.2
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: fix memory leak in arb_fragment_program

2017-04-09 Thread Bartosz Tomczyk
---
 src/mesa/program/arbprogparse.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/program/arbprogparse.c b/src/mesa/program/arbprogparse.c
index 07bdf1603e..83a501eea6 100644
--- a/src/mesa/program/arbprogparse.c
+++ b/src/mesa/program/arbprogparse.c
@@ -78,6 +78,7 @@ _mesa_parse_arb_fragment_program(struct gl_context* ctx, 
GLenum target,
memset(, 0, sizeof(prog));
memset(, 0, sizeof(state));
state.prog = 
+   state.mem_ctx = program;
 
if (!_mesa_parse_arb_program(ctx, target, (const GLubyte*) str, len,
)) {
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] genxml: Make BLEND_STATE command support variable length array.

2017-04-09 Thread Jason Ekstrand



On April 9, 2017 8:48:31 AM Lionel Landwerlin 
 wrote:



I have one suggestion at the bottom of the patch, otherwise :

Reviewed-by: Lionel Landwerlin 

On 07/04/17 17:52, Rafael Antognolli wrote:

We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.

For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.

With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.

Signed-off-by: Rafael Antognolli 
---
  src/intel/blorp/blorp_genX_exec.h | 35 -
  src/intel/genxml/gen6.xml |  4 +-
  src/intel/genxml/gen7.xml |  4 +-
  src/intel/genxml/gen75.xml|  4 +-
  src/intel/genxml/gen8.xml |  4 +-
  src/intel/genxml/gen9.xml |  4 +-
  src/intel/vulkan/genX_pipeline.c  | 53 
  7 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/src/intel/blorp/blorp_genX_exec.h 
b/src/intel/blorp/blorp_genX_exec.h

index 3791462..fc1856f 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -902,23 +902,30 @@ blorp_emit_blend_state(struct blorp_batch *batch,
 struct GENX(BLEND_STATE) blend;
 memset(, 0, sizeof(blend));

+   uint32_t offset;
+   int size = GENX(BLEND_STATE_length) * 4;
+   size += GENX(BLEND_STATE_ENTRY_length) * 4 * params->num_draw_buffers;
+   uint32_t *state = blorp_alloc_dynamic_state(batch, size, 64, );
+   uint32_t *pos = state;
+
+   GENX(BLEND_STATE_pack)(NULL, pos, );
+   pos += GENX(BLEND_STATE_length);
+
 for (unsigned i = 0; i < params->num_draw_buffers; ++i) {
-  blend.Entry[i].PreBlendColorClampEnable = true;
-  blend.Entry[i].PostBlendColorClampEnable = true;
-  blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT;
-
-  blend.Entry[i].WriteDisableRed = params->color_write_disable[0];
-  blend.Entry[i].WriteDisableGreen = params->color_write_disable[1];
-  blend.Entry[i].WriteDisableBlue = params->color_write_disable[2];
-  blend.Entry[i].WriteDisableAlpha = params->color_write_disable[3];
+  struct GENX(BLEND_STATE_ENTRY) entry = { 0 };
+  entry.PreBlendColorClampEnable = true;
+  entry.PostBlendColorClampEnable = true;
+  entry.ColorClampRange = COLORCLAMP_RTFORMAT;
+
+  entry.WriteDisableRed = params->color_write_disable[0];
+  entry.WriteDisableGreen = params->color_write_disable[1];
+  entry.WriteDisableBlue = params->color_write_disable[2];
+  entry.WriteDisableAlpha = params->color_write_disable[3];
+  GENX(BLEND_STATE_ENTRY_pack)(NULL, pos, );
+  pos += GENX(BLEND_STATE_ENTRY_length);
 }

-   uint32_t offset;
-   void *state = blorp_alloc_dynamic_state(batch,
-   GENX(BLEND_STATE_length) * 4,
-   64, );
-   GENX(BLEND_STATE_pack)(NULL, state, );
-   blorp_flush_range(batch, state, GENX(BLEND_STATE_length) * 4);
+   blorp_flush_range(batch, state, size);

  #if GEN_GEN >= 7
 blorp_emit(batch, GENX(3DSTATE_BLEND_STATE_POINTERS), sp) {
diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml
index 5083f07..3059bfc 100644
--- a/src/intel/genxml/gen6.xml
+++ b/src/intel/genxml/gen6.xml
@@ -452,8 +452,8 @@
  


-  
-
+  
+

  

diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
index ada8f74..867a1d4 100644
--- a/src/intel/genxml/gen7.xml
+++ b/src/intel/genxml/gen7.xml
@@ -507,8 +507,8 @@
  


-  
-
+  
+

  

diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index 16d2d74..594e539 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -517,8 +517,8 @@
  


-  
-
+  
+

  

diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
index 1390fe6..4985342 100644
--- a/src/intel/genxml/gen8.xml
+++ b/src/intel/genxml/gen8.xml
@@ -546,7 +546,7 @@
  


-  
+  
  
  
  
@@ -556,7 +556,7 @@
  
  
  
-
+

  

diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 4bf0fb6..a620e78 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -555,7 +555,7 @@
  


-  
+  
  
  
  
@@ -565,7 +565,7 @@
  
  
  
-
+

  

diff --git a/src/intel/vulkan/genX_pipeline.c 
b/src/intel/vulkan/genX_pipeline.c

index 3fd1333..894d584 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -862,28 

Re: [Mesa-dev] [PATCH 2/2] genxml: Make BLEND_STATE command support variable length array.

2017-04-09 Thread Lionel Landwerlin

I have one suggestion at the bottom of the patch, otherwise :

Reviewed-by: Lionel Landwerlin 

On 07/04/17 17:52, Rafael Antognolli wrote:

We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.

For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.

With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.

Signed-off-by: Rafael Antognolli 
---
  src/intel/blorp/blorp_genX_exec.h | 35 -
  src/intel/genxml/gen6.xml |  4 +-
  src/intel/genxml/gen7.xml |  4 +-
  src/intel/genxml/gen75.xml|  4 +-
  src/intel/genxml/gen8.xml |  4 +-
  src/intel/genxml/gen9.xml |  4 +-
  src/intel/vulkan/genX_pipeline.c  | 53 
  7 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/src/intel/blorp/blorp_genX_exec.h 
b/src/intel/blorp/blorp_genX_exec.h
index 3791462..fc1856f 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -902,23 +902,30 @@ blorp_emit_blend_state(struct blorp_batch *batch,
 struct GENX(BLEND_STATE) blend;
 memset(, 0, sizeof(blend));
  
+   uint32_t offset;

+   int size = GENX(BLEND_STATE_length) * 4;
+   size += GENX(BLEND_STATE_ENTRY_length) * 4 * params->num_draw_buffers;
+   uint32_t *state = blorp_alloc_dynamic_state(batch, size, 64, );
+   uint32_t *pos = state;
+
+   GENX(BLEND_STATE_pack)(NULL, pos, );
+   pos += GENX(BLEND_STATE_length);
+
 for (unsigned i = 0; i < params->num_draw_buffers; ++i) {
-  blend.Entry[i].PreBlendColorClampEnable = true;
-  blend.Entry[i].PostBlendColorClampEnable = true;
-  blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT;
-
-  blend.Entry[i].WriteDisableRed = params->color_write_disable[0];
-  blend.Entry[i].WriteDisableGreen = params->color_write_disable[1];
-  blend.Entry[i].WriteDisableBlue = params->color_write_disable[2];
-  blend.Entry[i].WriteDisableAlpha = params->color_write_disable[3];
+  struct GENX(BLEND_STATE_ENTRY) entry = { 0 };
+  entry.PreBlendColorClampEnable = true;
+  entry.PostBlendColorClampEnable = true;
+  entry.ColorClampRange = COLORCLAMP_RTFORMAT;
+
+  entry.WriteDisableRed = params->color_write_disable[0];
+  entry.WriteDisableGreen = params->color_write_disable[1];
+  entry.WriteDisableBlue = params->color_write_disable[2];
+  entry.WriteDisableAlpha = params->color_write_disable[3];
+  GENX(BLEND_STATE_ENTRY_pack)(NULL, pos, );
+  pos += GENX(BLEND_STATE_ENTRY_length);
 }
  
-   uint32_t offset;

-   void *state = blorp_alloc_dynamic_state(batch,
-   GENX(BLEND_STATE_length) * 4,
-   64, );
-   GENX(BLEND_STATE_pack)(NULL, state, );
-   blorp_flush_range(batch, state, GENX(BLEND_STATE_length) * 4);
+   blorp_flush_range(batch, state, size);
  
  #if GEN_GEN >= 7

 blorp_emit(batch, GENX(3DSTATE_BLEND_STATE_POINTERS), sp) {
diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml
index 5083f07..3059bfc 100644
--- a/src/intel/genxml/gen6.xml
+++ b/src/intel/genxml/gen6.xml
@@ -452,8 +452,8 @@
  

  
-  

-
+  
+

  

diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
index ada8f74..867a1d4 100644
--- a/src/intel/genxml/gen7.xml
+++ b/src/intel/genxml/gen7.xml
@@ -507,8 +507,8 @@
  

  
-  

-
+  
+

  

diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index 16d2d74..594e539 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -517,8 +517,8 @@
  

  
-  

-
+  
+

  

diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
index 1390fe6..4985342 100644
--- a/src/intel/genxml/gen8.xml
+++ b/src/intel/genxml/gen8.xml
@@ -546,7 +546,7 @@
  

  
-  

+  
  
  
  
@@ -556,7 +556,7 @@
  
  
  
-
+

  

diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 4bf0fb6..a620e78 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -555,7 +555,7 @@
  

  
-  

+  
  
  
  
@@ -565,7 +565,7 @@
  
  
  
-
+

  

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 3fd1333..894d584 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -862,28 +862,14 @@ emit_cb_state(struct anv_pipeline *pipeline,
  {
 struct 

[Mesa-dev] [PATCH 1/3 v3] r600g: skip repeating vs, gs, and tes shader binds

2017-04-09 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.

Some statistics for GTAⅣ:
Average tesselation bind skip per frame: ≈350
Average geometric shaders bind skip per frame: ≈260
Skip of binding vertex ones occurs rarely enough to not get into per-frame
counter at all, so I just gonna say: it happens.

v2: I've occasionally removed an empty line, don't do this.
v3: fix the title for the mail to get stacked with its series

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/r600_state_common.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..94f85e6dd3 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
+   if (!state || rctx->vs_shader == state)
return;
 
rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,11 +943,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->gs_shader)
+   return;
+
rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
@@ -962,11 +963,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->tes_shader)
+   return;
+
rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
 }
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] r600g: skip repeating vs, gs, and tes shader binds

2017-04-09 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.

Some statistics for GTAⅣ:
Average tesselation bind skip per frame: ≈350
Average geometric shaders bind skip per frame: ≈260
Skip of binding vertex ones occurs rarely enough to not get into per-frame
counter at all, so I just gonna say: it happens.

v2: I've occasionally removed an empty line, don't do this.

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/r600_state_common.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..94f85e6dd3 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
+   if (!state || rctx->vs_shader == state)
return;
 
rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,11 +943,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->gs_shader)
+   return;
+
rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
@@ -962,11 +963,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->tes_shader)
+   return;
+
rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
 }
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 100629] No mans sky renders white screen under wine in linux

2017-04-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100629

--- Comment #2 from Giovanni ongaro  ---
those errors are displayed multiple times
Mesa: User error: GL_INVALID_ENUM in glDrawElements(mode=)
Mesa: User error: GL_INVALID_ENUM in glDrawElementsInstanced(mode=)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] r600g: get rid of dummy pixel shader

2017-04-09 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code mostly was already checking for null
pixel shader, so little checks had to be added.

Interestingly, acc. to testing with GTAⅣ, though binding of null shader happens
a lot at the start (then just stops), but draw_vbo() never actually sees null
ps.

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/r600_pipe.c |  9 -
 src/gallium/drivers/r600/r600_pipe.h |  3 ---
 src/gallium/drivers/r600/r600_state_common.c | 17 -
 3 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 5014f2525c..7d8efd2c9b 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -82,9 +82,6 @@ static void r600_destroy_context(struct pipe_context *context)
if (rctx->fixed_func_tcs_shader)
rctx->b.b.delete_tcs_state(>b.b, 
rctx->fixed_func_tcs_shader);
 
-   if (rctx->dummy_pixel_shader) {
-   rctx->b.b.delete_fs_state(>b.b, rctx->dummy_pixel_shader);
-   }
if (rctx->custom_dsa_flush) {
rctx->b.b.delete_depth_stencil_alpha_state(>b.b, 
rctx->custom_dsa_flush);
}
@@ -209,12 +206,6 @@ static struct pipe_context *r600_create_context(struct 
pipe_screen *screen,
 
r600_begin_new_cs(rctx);
 
-   rctx->dummy_pixel_shader =
-   util_make_fragment_cloneinput_shader(>b.b, 0,
-TGSI_SEMANTIC_GENERIC,
-TGSI_INTERPOLATE_CONSTANT);
-   rctx->b.b.bind_fs_state(>b.b, rctx->dummy_pixel_shader);
-
return >b.b;
 
 fail:
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 7f1ecc278b..e636ef0024 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -432,9 +432,6 @@ struct r600_context {
void*custom_blend_resolve;
void*custom_blend_decompress;
void*custom_blend_fastclear;
-   /* With rasterizer discard, there doesn't have to be a pixel shader.
-* In that case, we bind this one: */
-   void*dummy_pixel_shader;
/* These dummy CMASK and FMASK buffers are used to get around the R6xx 
hardware
 * bug where valid CMASK and FMASK are required to be present to avoid
 * a hardlock in certain operations but aren't actually used
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index c4b1a22d95..be7db361d1 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -725,7 +725,8 @@ static inline void r600_shader_selector_key(const struct 
pipe_context *ctx,
if (!key->vs.as_ls)
key->vs.as_es = (rctx->gs_shader != NULL);
 
-   if (rctx->ps_shader->current->shader.gs_prim_id_input && 
!rctx->gs_shader) {
+   if (rctx->ps_shader && 
rctx->ps_shader->current->shader.gs_prim_id_input &&
+   !rctx->gs_shader) {
key->vs.as_gs_a = true;
key->vs.prim_id_out = 
rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid;
}
@@ -909,9 +910,6 @@ static void r600_bind_ps_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
-   state = rctx->dummy_pixel_shader;
-
rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
 }
 
@@ -1550,9 +1548,10 @@ static bool r600_update_derived_state(struct 
r600_context *rctx)
rctx->b.streamout.enabled_stream_buffers_mask = 
clip_so_current->enabled_stream_buffers_mask;
}
 
-   if (unlikely(ps_dirty || 
rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current ||
-   rctx->rasterizer->sprite_coord_enable != 
rctx->ps_shader->current->sprite_coord_enable ||
-   rctx->rasterizer->flatshade != 
rctx->ps_shader->current->flatshade)) {
+   if (unlikely(rctx->ps_shader &&
+(ps_dirty || 
rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current ||
+ rctx->rasterizer->sprite_coord_enable != 
rctx->ps_shader->current->sprite_coord_enable ||
+ rctx->rasterizer->flatshade != 
rctx->ps_shader->current->flatshade))) {
 
if (rctx->cb_misc_state.nr_ps_color_outputs != 
rctx->ps_shader->current->nr_ps_color_outputs) {
rctx->cb_misc_state.nr_ps_color_outputs = 
rctx->ps_shader->current->nr_ps_color_outputs;
@@ -1568,7 +1567,7 @@ static bool r600_update_derived_state(struct 

[Mesa-dev] [PATCH 2/3] r600g: add draw_vbo check for a NULL pixel shader

2017-04-09 Thread Constantine Kharlamov
Taken from radeonsi, required to remove dummy pixel shader in the next patch

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/evergreen_state.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.h | 1 +
 src/gallium/drivers/r600/r600_state.c| 3 ++-
 src/gallium/drivers/r600/r600_state_common.c | 7 ++-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 371e7ce212..5697da4af9 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -471,6 +471,7 @@ static void *evergreen_create_rs_state(struct pipe_context 
*ctx,
rs->clip_halfz = state->clip_halfz;
rs->flatshade = state->flatshade;
rs->sprite_coord_enable = state->sprite_coord_enable;
+   rs->rasterizer_discard = state->rasterizer_discard;
rs->two_side = state->light_twoside;
rs->clip_plane_enable = state->clip_plane_enable;
rs->pa_sc_line_stipple = state->line_stipple_enable ?
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 86634b8681..7f1ecc278b 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -279,6 +279,7 @@ struct r600_rasterizer_state {
boolscissor_enable;
boolmultisample_enable;
boolclip_halfz;
+   boolrasterizer_discard;
 };
 
 struct r600_poly_offset_state {
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 1f7e9b3aa5..06100abc4a 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -470,6 +470,7 @@ static void *r600_create_rs_state(struct pipe_context *ctx,
rs->clip_halfz = state->clip_halfz;
rs->flatshade = state->flatshade;
rs->sprite_coord_enable = state->sprite_coord_enable;
+   rs->rasterizer_discard = state->rasterizer_discard;
rs->two_side = state->light_twoside;
rs->clip_plane_enable = state->clip_plane_enable;
rs->pa_sc_line_stipple = state->line_stipple_enable ?
@@ -622,7 +623,7 @@ static void *r600_create_sampler_state(struct pipe_context 
*ctx,
 static struct pipe_sampler_view *
 texture_buffer_sampler_view(struct r600_pipe_sampler_view *view,
unsigned width0, unsigned height0)
-   
+
 {
struct r600_texture *tmp = (struct r600_texture*)view->base.texture;
int stride = util_format_get_blocksize(view->base.format);
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index dab39f19e3..c4b1a22d95 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1707,7 +1707,12 @@ static void r600_draw_vbo(struct pipe_context *ctx, 
const struct pipe_draw_info
return;
}
 
-   if (unlikely(!rctx->vs_shader || !rctx->ps_shader)) {
+   if (unlikely(!rctx->vs_shader)) {
+   assert(0);
+   return;
+   }
+   if (unlikely(!rctx->ps_shader &&
+(!rctx->rasterizer || 
!rctx->rasterizer->rasterizer_discard))) {
assert(0);
return;
}
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] r600g: skip repeating vs, gs, and tes shader binds

2017-04-09 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.

Some statistics for GTAⅣ:
Average tesselation shaders bind skip per frame: ≈350
Average geometric shaders bind skip per frame: ≈260
Skip of binding vertex ones occurs rarely enough to not get into per-frame
counter at all, so I just gonna say: it happens.

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/r600_state_common.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..dab39f19e3 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -926,12 +926,11 @@ static struct tgsi_shader_info *r600_get_vs_info(struct 
r600_context *rctx)
else
return NULL;
 }
-
 static void r600_bind_vs_state(struct pipe_context *ctx, void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
+   if (!state || rctx->vs_shader == state)
return;
 
rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,11 +942,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->gs_shader)
+   return;
+
rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
@@ -962,11 +962,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->tes_shader)
+   return;
+
rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx));
 
-   if (!state)
-   return;
rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
 }
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3] r600g: shader logic improvements

2017-04-09 Thread Constantine Kharlamov
Although I didn't see a staticstically significant change in GTAⅣ benchmark, it
seem to have reduced stall for opening the door from a house to the outer world
at the first savepoint.

No changes in gpu.py tests of piglit in gbm mode.

Constantine Kharlamov (3):
  r600g: skip repeating vs, gs, and tes shader binds
  r600g: add draw_vbo check for a NULL pixel shader
  r600g: get rid of dummy pixel shader

 src/gallium/drivers/r600/evergreen_state.c   |  1 +
 src/gallium/drivers/r600/r600_pipe.c |  9 ---
 src/gallium/drivers/r600/r600_pipe.h |  4 +--
 src/gallium/drivers/r600/r600_state.c|  3 ++-
 src/gallium/drivers/r600/r600_state_common.c | 37 
 5 files changed, 25 insertions(+), 29 deletions(-)

-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 100629] No mans sky renders white screen under wine in linux

2017-04-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100629

--- Comment #1 from Giovanni ongaro  ---
Upon starting no man  sky under wine (no man sky  need OGL4.5 ) ingame only a
white screen is displayed

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 100629] No mans sky renders white screen under wine in linux

2017-04-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100629

Bug ID: 100629
   Summary: No mans sky renders white screen under wine in linux
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: giovanni.nic...@ticino.com
QA Contact: mesa-dev@lists.freedesktop.org

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: use single memcpy when strides matches

2017-04-09 Thread Bartosz Tomczyk
---
 src/mesa/main/readpix.c  | 15 ++-
 src/mesa/main/texstore.c | 15 +++
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..14568de497 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
struct gl_renderbuffer *rb =
  _mesa_get_read_renderbuffer_for_format(ctx, format);
GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;
 
/* Fail if memcpy cannot be used. */
if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
}
 
texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;
 
/* memcpy*/
-   for (j = 0; j < height; j++) {
-  memcpy(dst, map, width * texelBytes);
-  dst += dstStride;
-  map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+ memcpy(dst, map, bytesPerRow * height);
+   } else {
+  for (j = 0; j < height; j++) {
+ memcpy(dst, map, bytesPerRow);
+ dst += dstStride;
+ map += stride;
+  }
}
 
ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct gl_context 
*ctx, GLuint dims,
   if (dstMap) {
 
  /* copy rows of blocks */
- for (i = 0; i < store.CopyRowsPerSlice; i++) {
-memcpy(dstMap, src, store.CopyBytesPerRow);
-dstMap += dstRowStride;
-src += store.TotalBytesPerRow;
+ if (dstRowStride == store.TotalBytesPerRow &&
+ dstRowStride == store.CopyBytesPerRow) {
+memcpy(dstMap, src, store.CopyBytesPerRow * 
store.CopyRowsPerSlice);
+src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+ }
+ else {
+for (i = 0; i < store.CopyRowsPerSlice; i++) {
+   memcpy(dstMap, src, store.CopyBytesPerRow);
+   dstMap += dstRowStride;
+   src += store.TotalBytesPerRow;
+}
  }
 
  ctx->Driver.UnmapTextureImage(ctx, texImage, slice + zoffset);
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 100627] EGL fails to fall back to DRI2 when DRI3 is enabled but not available

2017-04-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100627

Bug ID: 100627
   Summary: EGL fails to fall back to DRI2 when DRI3 is enabled
but not available
   Product: Mesa
   Version: 17.0
  Hardware: All
OS: FreeBSD
Status: NEW
  Severity: normal
  Priority: medium
 Component: EGL
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: re...@freebsd.org
QA Contact: mesa-dev@lists.freedesktop.org

DRI2 will be available but DRI3 may or may not depending which kernel and
drivers are running. When Mesa is compiled with DRI3 support enabled and run on
a kernel with only DRI2 support, applications using GLX work fine (modulo scary
messages), but applications that use EGL fail. The cause appears to be
insufficient checking during init in libEGL.

The GLX init code has separate dri3_create_screen and dri2_create_screen
functions which are called from the respective init function. When the former
bails with "libGL error: Version 7 or imageFromFds image extension not
found\nlibGL error: failed to load driver: r600", libGL then tries the DRI2
path and succeeds. As an aside, it would be nice if the first message was only
shown if LIBGL_DEBUG is set, and the second should indicate that DRI3 init has
failed but DRI2 will be attempted instead of saying the driver failed to load
(which is a source of spurious bug reports).

The EGL path on the other hand only has a dri2_create_screen function which is
called from both dri2_initialize_x11_dri2 and dri2_initialize_x11_dri3. Thus
the init path succeeds for DRI3 even though it cannot work, so the application
ultimately fails because the first detectable error is after we are out of the
init routines and it's too late to attempt a fallback to DRI2. Setting
LIBGL_DRI3_DISABLE allows applications using EGL to function correctly, so
there is a work-around until the initialization can be fixed to check
availability of DRI3 support as is done in libGL.

Example of failure running mesa EGL demos:

% LIBGL_DEBUG=verbose EGL_LOG_LEVEL=debug MESA_DEBUG=1 eglgears_x11 
libEGL debug: Native platform type: x11 (autodetected)
libEGL debug: added egl_dri2 to module array
libGL: Can't open configuration file /home/user/.drirc: No such file or
directory.
libEGL debug: DRI2: dlopen(/usr/local/lib/dri/r600_dri.so)
libEGL debug: found extension `DRI_Core'
libEGL info: found extension DRI_Core version 1
libEGL debug: found extension `DRI_IMAGE_DRIVER'
libEGL info: found extension DRI_IMAGE_DRIVER version 1
libEGL debug: found extension `DRI_DRI2'
libEGL debug: found extension `DRI_ConfigOptions'
libEGL debug: found extension `DRI2_Fence'
libGL: Can't open configuration file /home/user/.drirc: No such file or
directory.
libGL: Can't open configuration file /home/user/.drirc: No such file or
directory.
libEGL debug: found extension `DRI_TexBuffer'
libEGL info: found extension DRI_TexBuffer version 2
libEGL debug: found extension `DRI2_Flush'
libEGL info: found extension DRI2_Flush version 4
libEGL debug: found extension `DRI_IMAGE'
libEGL info: found extension DRI_IMAGE version 12
libEGL debug: found extension `DRI_RENDERER_QUERY'
libEGL debug: found extension `DRI_CONFIG_QUERY'
libEGL debug: found extension `DRI2_Throttle'
libEGL debug: found extension `DRI2_Fence'
libEGL debug: found extension `DRI2_Interop'
libEGL debug: found extension `DRI_TexBuffer'
libEGL debug: found extension `DRI2_Flush'
libEGL debug: found extension `DRI_IMAGE'
libEGL debug: found extension `DRI_RENDERER_QUERY'
libEGL info: found extension DRI_RENDERER_QUERY version 1
libEGL debug: found extension `DRI_CONFIG_QUERY'
libEGL info: found extension DRI_CONFIG_QUERY version 1
libEGL debug: found extension `DRI2_Throttle'
libEGL debug: found extension `DRI2_Fence'
libEGL info: found extension DRI2_Fence version 2
libEGL debug: found extension `DRI2_Interop'
libEGL info: found extension DRI2_Interop version 1
libEGL debug: did not find optional extension DRI_Robustness version 1
libEGL info: Using DRI3
libEGL debug: the best driver is DRI2
EGL_VERSION = 1.4 (DRI2)
zsh: segmentation fault (core dumped)

As can be seen, libEGL runs right through the DRI3 init and then crashes when
it tries to draw without having a surface. The backtrace differs according to
the driver in use. This is just an example from my machine:

(lldb) bt
* thread #1
  * frame #0: r600_dri.so`_debug_assert_fail(expr="surface",
file="state_tracker/st_atom_framebuffer.c", line=61,
function="update_framebuffer_size") at u_debug.c:321
frame #1:
r600_dri.so`update_framebuffer_size(framebuffer=0x00080a845028,
surface=0x) at st_atom_framebuffer.c:61
frame #2: r600_dri.so`update_framebuffer_state(st=0x00080a843000) at
st_atom_framebuffer.c:181
frame #3: r600_dri.so`st_validate_state(st=0x00080a843000,
pipeline=ST_PIPELINE_RENDER) at st_atom.c:219
frame #4: 

[Mesa-dev] [Bug 100613] Regression in Mesa 17 on s390x (zSystems)

2017-04-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100613

--- Comment #2 from Stefan Dirsch  ---
Roland, thanks a lot for your prompt reply! Very much appreciated! 

Seems Richard meanwhile switched companies from IBM to ARM meanwhile. I found
him on Linkedin. Possibly he's now working on aarch64 (LE). So I'm afraid he
has no longer access to BE machines any longer.

Unfortunately I'm not familiar with llvmpipe at all. Would it be an option not
to change the code there for BE, if developers have no access to such machines?
Reverse-applying the commit is going to break sooner or later I'm sure.

Of course I'm willing to test any proposed change/patch on s390x, but I'm not a
Mesa/llvmwpipe developer per se.

UNfortunately llvmpipe is needed on s390x, since it has become a requirement
for modern desktops like gdm/gnome-shell. :-(

I can't say how fundamental the issue is. gdm and gnome-shell just show a black
screen. :-( 

I found glxgears as example more useful. ;-)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev