Re: [Mesa-dev] [PATCH 6/9] gallium/radeon: cleanup getting PIPE_QUERY_TIMESTAMP result

2016-09-16 Thread Edward O'Callaghan
Reviewed-by: Edward O'Callaghan 

On 09/16/2016 11:57 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> ---
>  src/gallium/drivers/radeon/r600_query.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c
> index b9041eb..c1c3599 100644
> --- a/src/gallium/drivers/radeon/r600_query.c
> +++ b/src/gallium/drivers/radeon/r600_query.c
> @@ -946,26 +946,22 @@ static void r600_query_hw_add_result(struct r600_common_context *ctx,
>   unsigned results_base = i * 16;
>   result->b = result->b ||
>   r600_query_read_result(buffer + results_base, 0, 2, true) != 0;
>   }
>   break;
>   }
>   case PIPE_QUERY_TIME_ELAPSED:
>   result->u64 += r600_query_read_result(buffer, 0, 2, false);
>   break;
>   case PIPE_QUERY_TIMESTAMP:
> - {
> - uint32_t *current_result = (uint32_t*)buffer;
> - result->u64 = (uint64_t)current_result[0] |
> -   (uint64_t)current_result[1] << 32;
> + result->u64 = *(uint64_t*)buffer;
>   break;
> - }
>   case PIPE_QUERY_PRIMITIVES_EMITTED:
>   /* SAMPLE_STREAMOUTSTATS stores this structure:
>* {
>*u64 NumPrimitivesWritten;
>*u64 PrimitiveStorageNeeded;
>* }
>* We only need NumPrimitivesWritten here. */
>   result->u64 += r600_query_read_result(buffer, 2, 6, true);
>   break;
>   case PIPE_QUERY_PRIMITIVES_GENERATED:
> 
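For reference, here is a small host-side sketch (not part of the patch) of the two forms. Rebuilding the value from two dwords, low dword first, is independent of host byte order; a single 64-bit load gives the same value only on a little-endian host. The sketch uses memcpy for the 64-bit load rather than the (uint64_t*) cast in the patch, so it does not depend on the buffer's alignment. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old form: rebuild the 64-bit value from two 32-bit dwords, low dword
 * first.  This gives the same result on any host byte order. */
static uint64_t timestamp_from_dwords(const void *buffer)
{
   const uint32_t *dw = (const uint32_t *)buffer;
   return (uint64_t)dw[0] | (uint64_t)dw[1] << 32;
}

/* New form: a single 64-bit load.  memcpy avoids relying on 8-byte
 * alignment of the buffer; the result matches timestamp_from_dwords()
 * only when the host is little-endian. */
static uint64_t timestamp_from_qword(const void *buffer)
{
   uint64_t v;
   memcpy(&v, buffer, sizeof(v));
   return v;
}

/* Returns 1 if the lowest-addressed byte of a uint16_t holds its low byte. */
static int host_is_little_endian(void)
{
   const uint16_t probe = 1;
   uint8_t first;
   memcpy(&first, &probe, 1);
   return first == 1;
}
```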



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] spirv/cfg: Detect switch_break after loop_break/continue

2016-09-16 Thread Jason Ekstrand
While the current CFG code is valid in the case where a switch break also
happens to be a loop continue, it's a bit suboptimal.  Since hardware is
capable of handling the continue as a direct jump, it's better to use a
continue instruction when we can than to bother with all of the nasty
switch break lowering.

Signed-off-by: Jason Ekstrand 
---
 src/compiler/spirv/vtn_cfg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 75251a4..62b9056 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -239,12 +239,12 @@ vtn_get_branch_type(struct vtn_block *block,
  swcase->fallthrough == block->switch_case);
   swcase->fallthrough = block->switch_case;
   return vtn_branch_type_switch_fallthrough;
-   } else if (block == switch_break) {
-  return vtn_branch_type_switch_break;
} else if (block == loop_break) {
   return vtn_branch_type_loop_break;
} else if (block == loop_cont) {
   return vtn_branch_type_loop_continue;
+   } else if (block == switch_break) {
+  return vtn_branch_type_switch_break;
} else {
   return vtn_branch_type_none;
}
-- 
2.5.0.400.gff86faf
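A simplified sketch of the reordered checks, using hypothetical stand-in names rather than the real vtn types. It leaves out the switch-case/fallthrough test that precedes these comparisons in vtn_get_branch_type(); the point is only that, once the loop targets are tested first, a block that is both the switch break and the loop continue is classified as a continue.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the vtn branch types. */
enum branch_type {
   BRANCH_NONE,
   BRANCH_SWITCH_BREAK,
   BRANCH_LOOP_BREAK,
   BRANCH_LOOP_CONTINUE,
};

/* Ordering after the patch: loop targets are tested before the switch
 * break, so a block that is both the switch break and the loop continue
 * becomes a continue instead of needing switch-break lowering. */
static enum branch_type classify(int block, int switch_break,
                                 int loop_break, int loop_cont)
{
   if (block == loop_break)
      return BRANCH_LOOP_BREAK;
   else if (block == loop_cont)
      return BRANCH_LOOP_CONTINUE;
   else if (block == switch_break)
      return BRANCH_SWITCH_BREAK;
   else
      return BRANCH_NONE;
}
```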



[Mesa-dev] [PATCH 1/2] nir/spirv: Handle switches whose break block is a loop continue

2016-09-16 Thread Jason Ekstrand
It is possible that the break block of a switch is actually the continue of
the loop containing the switch.  In this case, we need to identify the
break block as a continue and break out of the current level of CFG handling.
If we don't, the continue portion of the loop will get handled twice: once
by following the flow after the break and a second time by the loop-handling
code processing it explicitly.

This fixes 6 of the new Vulkan CTS tests:
 - dEQP-VK.spirv_assembly.instruction.graphics.opphi.out_of_order*
 - dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order*

Signed-off-by: Jason Ekstrand 
---
 src/compiler/spirv/vtn_cfg.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 599ed69..75251a4 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -443,6 +443,19 @@ vtn_cfg_walk_blocks(struct vtn_builder *b, struct list_head *cf_list,
 vtn_order_case(swtch, case_block->switch_case);
  }
 
+ enum vtn_branch_type branch_type =
+vtn_get_branch_type(break_block, switch_case, NULL,
+loop_break, loop_cont);
+
+ if (branch_type != vtn_branch_type_none) {
+/* It is possible that the break is actually the continue block
+ * for the containing loop.  In this case, we need to bail and let
+ * the loop parsing code handle the continue properly.
+ */
+assert(branch_type == vtn_branch_type_loop_continue);
+return;
+ }
+
  block = break_block;
  continue;
   }
-- 
2.5.0.400.gff86faf
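A toy model of the double handling described in the commit message. The names are hypothetical and this is not Mesa code: it only counts how often the loop's continue block would be emitted, with and without the early return added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: blocks are plain integers.  After a switch, the walker would
 * continue at break_block; independently, the loop-handling code always
 * emits the loop's continue block itself.  Returns how many times the
 * continue block ends up being emitted. */
static int times_continue_emitted(int break_block, int loop_cont,
                                  bool bail_on_loop_branch)
{
   int count = 0;

   /* The fix: if the switch's break block is really the loop continue,
    * return and let the loop code handle it instead of walking into it. */
   bool walker_enters_break_block =
      !(bail_on_loop_branch && break_block == loop_cont);

   if (walker_enters_break_block && break_block == loop_cont)
      count++;   /* emitted by following the switch break */

   count++;      /* emitted by the loop-handling code */
   return count;
}
```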



Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Jason Ekstrand
On Fri, Sep 16, 2016 at 5:59 PM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > On Sep 16, 2016 3:04 PM, "Francisco Jerez" 
> wrote:
> >>
> >> Not intended for upstream.  Should cause a GPU hang if some thread is
> >> executed with a non-contiguous dispatch mask breaking assumptions of
> >> brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
> >> Piglit regressions, while replacing brw_stage_has_packed_dispatch()
> >> with a dummy implementation that unconditionally returns true on top
> >> of this patch causes multiple GPU hangs.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
> >>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
> >>  2 files changed, 38 insertions(+)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> index 042203d..b3eec49 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> @@ -33,6 +33,23 @@ using namespace brw::surface_access;
> >>  void
> >>  fs_visitor::emit_nir_code()
> >>  {
> >> +   if (brw_stage_has_packed_dispatch(stage, prog_data)) {
> >
> > Mind adding "0 &&" and merging this patch so we remain aware of the issue,
> > keep it building, and can easily test future hardware.
> >
> I guess that would work -- An alternative that would keep the
> NIR-to-i965 pass tidier would be to assert(devinfo->gen <= 9) in
> brw_stage_has_packed_dispatch() to make sure we don't forget to do a
> full Piglit/DEQP/CTS run with this patch applied when a new generation
> is powered on.  I can also add a comment with a link to this patch.
>

Can we do both?  Maybe something like this instead of an assert:

if (devinfo->gen > 9) {
   static bool warned = false;
   if (!warned) {
      fprintf(stderr, "WARNING: VMask/DMask power-of-two assumptions need "
              "to be verified.  Once verified using emit_fs_mask_pow2_check(), "
              "this warning may be disabled for gen%d\n", devinfo->gen);
      warned = true;
   }
}

where emit_fs_mask_pow2_check() is a helper that we put the check code
into.  That way it's not an assert-failure that will get instantly removed
but something that will constantly bug the bring-up person until they have
verified the assumption.
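A self-contained version of the warn-once idea above, as a sketch: the function name is hypothetical, the gen > 9 cut-off is the one used in the discussion, and the boolean return value (true only on the call that actually printed) exists just to make the behaviour easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Prints the warning at most once per process, and only for gen > 9.
 * Returns true only on the call that actually printed. */
static bool warn_unverified_dispatch_mask(int gen)
{
   static bool warned = false;

   if (gen <= 9 || warned)
      return false;

   fprintf(stderr,
           "WARNING: VMask/DMask power-of-two assumptions need to be "
           "verified.  Once verified using emit_fs_mask_pow2_check(), "
           "this warning may be disabled for gen%d\n", gen);
   warned = true;
   return true;
}
```

Because the flag is a function-local static, later calls stay silent even for a different generation, which matches the "static bool warned" in the snippet above.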


> >> +  const fs_builder ubld = bld.exec_all().group(1, 0);
> >> +  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
> >> +  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() :
> >> +   brw_dmask_reg());
> >> +
> >> +  ubld.ADD(tmp, mask, brw_imm_ud(1));
> >> +  ubld.AND(tmp, mask, tmp);
> >> +
> >> +  /* This will loop forever if the dispatch mask doesn't have the expected
> >> +   * form '2^n-1', in which case tmp will be non-zero.
> >> +   */
> >> +  bld.emit(BRW_OPCODE_DO);
> >> +  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ);
> >> +  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
> >> +   }
> >> +
> >> /* emit the arrays used for inputs and outputs - load/store intrinsics will
> >>  * be converted to reads/writes of these arrays
> >>  */
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> index ba3bbdf..9f7a1f0 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> @@ -35,6 +35,27 @@ namespace brw {
> >>  void
> >>  vec4_visitor::emit_nir_code()
> >>  {
> >> +   if (brw_stage_has_packed_dispatch(stage, _data->base)) {
> >> +  const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type),
> >> +WRITEMASK_X);
> >> +  const src_reg mask =
> >> +     brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, 0),
> >> +   0, 4, 1),
> >> +BRW_REGISTER_TYPE_UD),
> >> + BRW_SWIZZLE_);
> >
> > Can we just do vec4_reg(brw_vmask_reg)?
> >
> That didn't seem to work, because both brw_dmask_reg() and
> brw_vmask_reg() return a register region that is equivalent to
> sr0.0. in Align16 addressing mode (maybe a bug of PATCH 1?  It's
> unlikely to matter a lot in practice though), so the Z component where
> the DMask is stored in is not even part of the returned Align16 region.
>

Right.  As I said in another patch, I'm not terribly concerned about vec4.
We've shown it works and we won't be bringing up more vec4.  I'm fine with
just merging the fs version.


> >> +
> >> +  emit(ADD(tmp, mask, brw_imm_ud(1)));
> >> +  emit(AND(tmp, mask, src_reg(tmp)));
> >> +
> >> +  /* This will loop forever if the dispatch mask doesn't have the expected
> >> +   * form '2^n-1', in which case tmp will be non-zero.
> >> +   */
> >> +  

Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Sep 16, 2016 3:04 PM, "Francisco Jerez"  wrote:
>>
>> Not intended for upstream.  Should cause a GPU hang if some thread is
>> executed with a non-contiguous dispatch mask breaking assumptions of
>> brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
>> Piglit regressions, while replacing brw_stage_has_packed_dispatch()
>> with a dummy implementation that unconditionally returns true on top
>> of this patch causes multiple GPU hangs.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
>>  2 files changed, 38 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index 042203d..b3eec49 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -33,6 +33,23 @@ using namespace brw::surface_access;
>>  void
>>  fs_visitor::emit_nir_code()
>>  {
>> +   if (brw_stage_has_packed_dispatch(stage, prog_data)) {
>
> Mind adding "0 &&" and merging this patch so we remain aware of the issue,
> keep it building, and can easily test future hardware.
>
I guess that would work -- An alternative that would keep the
NIR-to-i965 pass tidier would be to assert(devinfo->gen <= 9) in
brw_stage_has_packed_dispatch() to make sure we don't forget to do a
full Piglit/DEQP/CTS run with this patch applied when a new generation
is powered on.  I can also add a comment with a link to this patch.

>> +  const fs_builder ubld = bld.exec_all().group(1, 0);
>> +  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
>> +  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() :
>> +   brw_dmask_reg());
>> +
>> +  ubld.ADD(tmp, mask, brw_imm_ud(1));
>> +  ubld.AND(tmp, mask, tmp);
>> +
>> +  /* This will loop forever if the dispatch mask doesn't have the expected
>> +   * form '2^n-1', in which case tmp will be non-zero.
>> +   */
>> +  bld.emit(BRW_OPCODE_DO);
>> +  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ);
>> +  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
>> +   }
>> +
>> /* emit the arrays used for inputs and outputs - load/store intrinsics will
>>  * be converted to reads/writes of these arrays
>>  */
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> index ba3bbdf..9f7a1f0 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> @@ -35,6 +35,27 @@ namespace brw {
>>  void
>>  vec4_visitor::emit_nir_code()
>>  {
>> +   if (brw_stage_has_packed_dispatch(stage, _data->base)) {
>> +  const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type),
>> +WRITEMASK_X);
>> +  const src_reg mask =
>> +     brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, 0),
>> +   0, 4, 1),
>> +BRW_REGISTER_TYPE_UD),
>> + BRW_SWIZZLE_);
>
> Can we just do vec4_reg(brw_vmask_reg)?
>
That didn't seem to work, because both brw_dmask_reg() and
brw_vmask_reg() return a register region that is equivalent to
sr0.0. in Align16 addressing mode (maybe a bug of PATCH 1?  It's
unlikely to matter a lot in practice though), so the Z component where
the DMask is stored in is not even part of the returned Align16 region.

>> +
>> +  emit(ADD(tmp, mask, brw_imm_ud(1)));
>> +  emit(AND(tmp, mask, src_reg(tmp)));
>> +
>> +  /* This will loop forever if the dispatch mask doesn't have the expected
>> +   * form '2^n-1', in which case tmp will be non-zero.
>> +   */
>> +  emit(BRW_OPCODE_DO);
>> +  emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0),
>> +   BRW_CONDITIONAL_NZ));
>> +  emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL;
>> +   }
>> +
>> if (nir->num_uniforms > 0)
>>nir_setup_uniforms();
>>
>> --
>> 2.9.0
>>
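The check the HACK emits relies on mask & (mask + 1) being zero exactly when mask has the form 2^n - 1. A host-side C sketch of that arithmetic, with a hypothetical function name, for 32-bit unsigned masks (where ~0u + 1 wraps to 0, so an all-ones mask also counts as packed, as does 0):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the emitted sequence:
 *    tmp = mask + 1;
 *    tmp = mask & tmp;
 * tmp is zero exactly when the set bits of mask are contiguous starting
 * at bit 0, i.e. mask == 2^n - 1.  Unsigned overflow wraps, so
 * mask == 0xffffffff also yields zero. */
static bool dispatch_mask_is_packed(uint32_t mask)
{
   uint32_t tmp = mask + 1u;
   tmp &= mask;
   return tmp == 0;
}
```

In the shader version, a non-zero tmp keeps the predicated WHILE looping forever, which is what turns a broken assumption into a GPU hang.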




Re: [Mesa-dev] [PATCH 3/3] mesa: implement GL_OES_texture_view (V2)

2016-09-16 Thread Ilia Mirkin
Hi Francesco,

Where are you with the piglit tests? I just finished converting the
ARB_viewport_array tests, and was thinking of having a go at the
ARB_texture_view ones. However if you've made significant progress
there already, I have other things I can do too.

  -ilia

On Wed, Aug 31, 2016 at 1:43 AM, Francesco Ansanelli
 wrote:
> Hi,
>
> I sent this series to get some feedback (the RFC prefix failed), and the comments
> will be addressed as soon as the tests are done.
> I started checking the piglit part that you suggested in a previous mail.
>
> I'll ask you about them too, if I'm not bothering you too much :)
>
>
> On 31 Aug 2016 01:44, "Ilia Mirkin"  wrote:
>>
>> On Mon, Aug 29, 2016 at 1:25 AM, Francesco Ansanelli
>>  wrote:
>> > XXX still need to figure out how to treat the removed VIEW_CLASS*
>> > and formats.
>>
>> Can you elaborate what this comment means?
>>
>> You definitely need to add piglit tests for testing ETC2 stuff - it's
>> not supported in hardware for most desktop hw, and so a fallback
>> method is used. I think that's the main thing this patchset is waiting
>> on...
>>
>>   -ilia
>>
>> >
>> > V2: drop the oes suffix in messages
>> > (Ilia Mirkin)
>> >
>> > Signed-off-by: Francesco Ansanelli 
>> > ---
>> >  src/mesa/main/textureview.c |   11 +++
>> >  1 file changed, 7 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/src/mesa/main/textureview.c b/src/mesa/main/textureview.c
>> > index ed66c17..36663cc 100644
>> > --- a/src/mesa/main/textureview.c
>> > +++ b/src/mesa/main/textureview.c
>> > @@ -387,8 +387,10 @@ target_valid(struct gl_context *ctx, GLenum origTarget, GLenum newTarget)
>> > switch (origTarget) {
>> > case GL_TEXTURE_1D:
>> > case GL_TEXTURE_1D_ARRAY:
>> > -  RETURN_IF_SUPPORTED(TEXTURE_1D);
>> > -  RETURN_IF_SUPPORTED(TEXTURE_1D_ARRAY);
>> > +  if (!_mesa_is_gles3(ctx)) {
>> > + RETURN_IF_SUPPORTED(TEXTURE_1D);
>> > + RETURN_IF_SUPPORTED(TEXTURE_1D_ARRAY);
>> > +  }
>> >break;
>> > case GL_TEXTURE_2D:
>> >RETURN_IF_SUPPORTED(TEXTURE_2D);
>> > @@ -398,7 +400,8 @@ target_valid(struct gl_context *ctx, GLenum origTarget, GLenum newTarget)
>> >RETURN_IF_SUPPORTED(TEXTURE_3D);
>> >break;
>> > case GL_TEXTURE_RECTANGLE:
>> > -  RETURN_IF_SUPPORTED(TEXTURE_RECTANGLE);
>> > +  if (!_mesa_is_gles3(ctx))
>> > + RETURN_IF_SUPPORTED(TEXTURE_RECTANGLE);
>> >break;
>> > case GL_TEXTURE_CUBE_MAP:
>> > case GL_TEXTURE_2D_ARRAY:
>> > @@ -514,7 +517,7 @@ _mesa_set_texture_view_state(struct gl_context *ctx,
>> >  }
>> >
>> >  /**
>> > - * glTextureView (ARB_texture_view)
>> > + * glTextureView (ARB_texture_view / OES_texture_view)
>> >   * If an error is found, record it with _mesa_error()
>> >   * \return none.
>> >   */
>> > --
>> > 1.7.9.5
>> >
>> > ___
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
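A simplified sketch of the gating the V2 patch adds to target_valid(), assuming hypothetical enum values instead of the real GLenums. It only models the GLES 3 restriction on the 1D, 1D-array and rectangle targets, which are not core OpenGL ES targets; the real function also checks which newTarget values are compatible with each origTarget.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the texture targets touched by the patch
 * (not the real GLenum values). */
enum tex_target {
   TARGET_1D,
   TARGET_1D_ARRAY,
   TARGET_2D,
   TARGET_3D,
   TARGET_RECTANGLE,
};

/* Models only the API gating: on a GLES 3 context the 1D, 1D-array and
 * rectangle targets are never accepted.  Every other target is treated
 * as allowed here; compatibility between targets is not modelled. */
static bool target_allowed_for_api(enum tex_target t, bool is_gles3)
{
   switch (t) {
   case TARGET_1D:
   case TARGET_1D_ARRAY:
   case TARGET_RECTANGLE:
      return !is_gles3;
   default:
      return true;
   }
}
```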


Re: [Mesa-dev] [PATCH 02/10] nir: Add a loop analysis pass

2016-09-16 Thread Jason Ekstrand
On Fri, Sep 16, 2016 at 5:36 PM, Connor Abbott  wrote:

> On Fri, Sep 16, 2016 at 6:25 PM, Jason Ekstrand 
> wrote:
> > On Thu, Sep 15, 2016 at 12:03 AM, Timothy Arceri
> >  wrote:
> >>
> >> From: Thomas Helland 
> >>
> >> This pass detects induction variables and calculates the
> >> trip count of loops to be used for loop unrolling.
> >>
> >> I've removed support for float induction values for now, for the
> >> simple reason that they don't appear in my shader-db collection,
> >> and so I don't see it as common enough that we want to pollute the
> >> pass with this in the initial version.
> >>
> >> V2: Rebase, adapt to removal of function overloads
> >>
> >> V3: (Timothy Arceri)
> >>  - don't try to find trip count if loop terminator conditional is a phi
> >>  - fix trip count for do-while loops
> >>  - replace conditional type != alu assert with return
> >>  - disable unrolling of loops with continues
> >>  - multiple fixes to memory allocation, stop leaking and don't destroy
> >>structs we want to use for unrolling.
> >>  - fix iteration count bugs when induction var not on RHS of condition
> >>  - add FIXME for && conditions
> >>  - calculate trip count for unsigned induction/limit vars
> >>
> >> V4:
> >> - count instructions in a loop
> >> - set the limiting_terminator even if we can't find the trip count for
> >>  all terminators. This is needed for complex unrolling where we handle
> >>  2 terminators and the trip count is unknown for one of them.
> >> - restruct structs so we don't keep information not required after
> >>  analysis and remove dead fields.
> >> - force unrolling in some cases as per the rules in the GLSL IR pass
> >> ---
> >>  src/compiler/Makefile.sources   |2 +
> >>  src/compiler/nir/nir.h  |   36 +-
> >>  src/compiler/nir/nir_loop_analyze.c | 1012 +++
> >>  src/compiler/nir/nir_metadata.c |8 +-
> >>  4 files changed, 1056 insertions(+), 2 deletions(-)
> >>  create mode 100644 src/compiler/nir/nir_loop_analyze.c
> >>
> >> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> >> index f5b4f9c..7ed26a9 100644
> >> --- a/src/compiler/Makefile.sources
> >> +++ b/src/compiler/Makefile.sources
> >> @@ -190,6 +190,8 @@ NIR_FILES = \
> >> nir/nir_intrinsics.c \
> >> nir/nir_intrinsics.h \
> >> nir/nir_liveness.c \
> >> +   nir/nir_loop_analyze.c \
> >> +   nir/nir_loop_analyze.h \
> >> nir/nir_lower_alu_to_scalar.c \
> >> nir/nir_lower_atomics.c \
> >> nir/nir_lower_bitmap.c \
> >> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> >> index ff7c422..49e8cd8 100644
> >> --- a/src/compiler/nir/nir.h
> >> +++ b/src/compiler/nir/nir.h
> >> @@ -1549,9 +1549,36 @@ nir_if_last_else_node(nir_if *if_stmt)
> >>  }
> >>
> >>  typedef struct {
> >> +   nir_if *nif;
> >> +
> >> +   nir_instr *conditional_instr;
> >> +
> >> +   struct list_head loop_terminator_link;
> >> +} nir_loop_terminator;
> >> +
> >> +typedef struct {
> >> +   /* Number of instructions in the loop */
> >> +   unsigned num_instructions;
> >> +
> >> +   /* How many times the loop is run (if known) */
> >> +   unsigned trip_count;
> >> +   bool is_trip_count_known;
> >
> >
> > We could use 0 or -1 to indicate "I don't know trip count" instead of an
> > extra boolean.  Not sure that it matters much.
> >
> >>
> >> +
> >> +   /* Unroll the loop regardless of its size */
> >> +   bool force_unroll;
> >
> >
> > It seems a bit odd to have this decide to force-unroll.  This is an analysis
> > pass, not a "make decisions" pass.
> >
> >>
> >> +
> >> +   nir_loop_terminator *limiting_terminator;
> >> +
> >> +   /* A list of loop_terminators terminating this loop. */
> >> +   struct list_head loop_terminator_list;
> >> +} nir_loop_info;
> >> +
> >> +typedef struct {
> >> nir_cf_node cf_node;
> >>
> >> struct exec_list body; /** < list of nir_cf_node */
> >> +
> >> +   nir_loop_info *info;
> >>  } nir_loop;
> >>
> >>  static inline nir_cf_node *
> >> @@ -1576,6 +1603,7 @@ typedef enum {
> >> nir_metadata_dominance = 0x2,
> >> nir_metadata_live_ssa_defs = 0x4,
> >> nir_metadata_not_properly_reset = 0x8,
> >> +   nir_metadata_loop_analysis = 0x16,
> >>  } nir_metadata;
> >>
> >>  typedef struct {
> >> @@ -1758,6 +1786,8 @@ typedef struct nir_shader_compiler_options {
> >>  * information must be inferred from the list of input nir_variables.
> >>  */
> >> bool use_interpolated_input_intrinsics;
> >> +
> >> +   unsigned max_unroll_iterations;
> >>  } nir_shader_compiler_options;
> >>
> >>  typedef struct nir_shader_info {
> >> @@ -1962,7 +1992,7 @@ nir_loop *nir_loop_create(nir_shader *shader);
> >>  nir_function_impl *nir_cf_node_get_function(nir_cf_node *node);
> >>
> >>  /** requests that the given pieces of metadata be generated */
> >> -void 
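A sketch of the sentinel alternative Jason suggests for the trip count, with hypothetical names. It uses -1 rather than 0 as the "unknown" value, on the assumption that a trip count of 0 could be a legitimate result of the analysis; the cost is turning the field into a signed int.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel replacing a separate is_trip_count_known flag. */
#define TRIP_COUNT_UNKNOWN (-1)

struct loop_info_sketch {
   unsigned num_instructions;
   int trip_count;   /* TRIP_COUNT_UNKNOWN if the analysis could not tell */
};

static bool trip_count_known(const struct loop_info_sketch *info)
{
   return info->trip_count != TRIP_COUNT_UNKNOWN;
}
```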

Re: [Mesa-dev] [PATCH 02/10] nir: Add a loop analysis pass

2016-09-16 Thread Connor Abbott
On Fri, Sep 16, 2016 at 6:25 PM, Jason Ekstrand  wrote:
> On Thu, Sep 15, 2016 at 12:03 AM, Timothy Arceri
>  wrote:
>>
>> From: Thomas Helland 
>>
>> This pass detects induction variables and calculates the
>> trip count of loops to be used for loop unrolling.
>>
>> I've removed support for float induction values for now, for the
>> simple reason that they don't appear in my shader-db collection,
>> and so I don't see it as common enough that we want to pollute the
>> pass with this in the initial version.
>>
>> V2: Rebase, adapt to removal of function overloads
>>
>> V3: (Timothy Arceri)
>>  - don't try to find trip count if loop terminator conditional is a phi
>>  - fix trip count for do-while loops
>>  - replace conditional type != alu assert with return
>>  - disable unrolling of loops with continues
>>  - multiple fixes to memory allocation, stop leaking and don't destroy
>>structs we want to use for unrolling.
>>  - fix iteration count bugs when induction var not on RHS of condition
>>  - add FIXME for && conditions
>>  - calculate trip count for unsigned induction/limit vars
>>
>> V4:
>> - count instructions in a loop
>> - set the limiting_terminator even if we can't find the trip count for
>>  all terminators. This is needed for complex unrolling where we handle
>>  2 terminators and the trip count is unknown for one of them.
>> - restructure structs so we don't keep information not required after
>>  analysis and remove dead fields.
>> - force unrolling in some cases as per the rules in the GLSL IR pass
>> ---
>>  src/compiler/Makefile.sources   |2 +
>>  src/compiler/nir/nir.h  |   36 +-
>>  src/compiler/nir/nir_loop_analyze.c | 1012 +++
>>  src/compiler/nir/nir_metadata.c |8 +-
>>  4 files changed, 1056 insertions(+), 2 deletions(-)
>>  create mode 100644 src/compiler/nir/nir_loop_analyze.c
>>
>> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> index f5b4f9c..7ed26a9 100644
>> --- a/src/compiler/Makefile.sources
>> +++ b/src/compiler/Makefile.sources
>> @@ -190,6 +190,8 @@ NIR_FILES = \
>> nir/nir_intrinsics.c \
>> nir/nir_intrinsics.h \
>> nir/nir_liveness.c \
>> +   nir/nir_loop_analyze.c \
>> +   nir/nir_loop_analyze.h \
>> nir/nir_lower_alu_to_scalar.c \
>> nir/nir_lower_atomics.c \
>> nir/nir_lower_bitmap.c \
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index ff7c422..49e8cd8 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -1549,9 +1549,36 @@ nir_if_last_else_node(nir_if *if_stmt)
>>  }
>>
>>  typedef struct {
>> +   nir_if *nif;
>> +
>> +   nir_instr *conditional_instr;
>> +
>> +   struct list_head loop_terminator_link;
>> +} nir_loop_terminator;
>> +
>> +typedef struct {
>> +   /* Number of instructions in the loop */
>> +   unsigned num_instructions;
>> +
>> +   /* How many times the loop is run (if known) */
>> +   unsigned trip_count;
>> +   bool is_trip_count_known;
>
>
> We could use 0 or -1 to indicate "I don't know trip count" instead of an
> extra boolean.  Not sure that it matters much.
>
>>
>> +
>> +   /* Unroll the loop regardless of its size */
>> +   bool force_unroll;
>
>
> It seems a bit odd to have this decide to force-unroll.  This is an analysis
> pass, not a "make decisions" pass.
>
>>
>> +
>> +   nir_loop_terminator *limiting_terminator;
>> +
>> +   /* A list of loop_terminators terminating this loop. */
>> +   struct list_head loop_terminator_list;
>> +} nir_loop_info;
>> +
>> +typedef struct {
>> nir_cf_node cf_node;
>>
>> struct exec_list body; /** < list of nir_cf_node */
>> +
>> +   nir_loop_info *info;
>>  } nir_loop;
>>
>>  static inline nir_cf_node *
>> @@ -1576,6 +1603,7 @@ typedef enum {
>> nir_metadata_dominance = 0x2,
>> nir_metadata_live_ssa_defs = 0x4,
>> nir_metadata_not_properly_reset = 0x8,
>> +   nir_metadata_loop_analysis = 0x16,
>>  } nir_metadata;
>>
>>  typedef struct {
>> @@ -1758,6 +1786,8 @@ typedef struct nir_shader_compiler_options {
>>  * information must be inferred from the list of input nir_variables.
>>  */
>> bool use_interpolated_input_intrinsics;
>> +
>> +   unsigned max_unroll_iterations;
>>  } nir_shader_compiler_options;
>>
>>  typedef struct nir_shader_info {
>> @@ -1962,7 +1992,7 @@ nir_loop *nir_loop_create(nir_shader *shader);
>>  nir_function_impl *nir_cf_node_get_function(nir_cf_node *node);
>>
>>  /** requests that the given pieces of metadata be generated */
>> -void nir_metadata_require(nir_function_impl *impl, nir_metadata required);
>> +void nir_metadata_require(nir_function_impl *impl, nir_metadata required, ...);
>>  /** dirties all but the preserved metadata */
>>  void nir_metadata_preserve(nir_function_impl *impl, nir_metadata preserved);
>>
>> @@ -2559,6 +2589,10 @@ void 
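One detail in the nir_metadata hunk: the existing flags shown are single bits (0x2, 0x4, 0x8), but 0x16 is binary 10110, so nir_metadata_loop_analysis as posted also sets the dominance and live-SSA-defs bits. The next unused single bit after 0x8 is 0x10, which may be what was intended; the thread does not say. A quick check, with local names standing in for the enum values from the diff:

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the nir_metadata values shown in the diff. */
enum {
   MD_DOMINANCE               = 0x2,
   MD_LIVE_SSA_DEFS           = 0x4,
   MD_NOT_PROPERLY_RESET      = 0x8,
   MD_LOOP_ANALYSIS_AS_POSTED = 0x16,
};

/* A value used as one flag in a bitmask should have exactly one bit set. */
static bool is_single_bit(unsigned v)
{
   return v != 0 && (v & (v - 1)) == 0;
}
```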

Re: [Mesa-dev] [PATCH 02/10] nir: Add a loop analysis pass

2016-09-16 Thread Timothy Arceri
On Sat, 2016-09-17 at 09:40 +1000, Timothy Arceri wrote:
> On Fri, 2016-09-16 at 15:25 -0700, Jason Ekstrand wrote:
> > On Thu, Sep 15, 2016 at 12:03 AM, Timothy Arceri  wrote:
> > > From: Thomas Helland 

snip

> > > +      return -1;
> > > +
> > > +   /* do-while loops can increment the starting value before the condition is
> > > +    * checked. e.g.
> > > +    *
> > > +    *    do {
> > > +    *        ndx++;
> > > +    *     } while (ndx < 3);
> > > +    *
> > > +    * Here we check if the induction variable is used directly by the loop
> > > +    * condition and if so we assume we need to step the initial value.
> > > +    */
> > > +   bool increment_before = false;
> > > +   if (cond_alu->src[0].src.ssa == alu_def->def ||
> > > +       cond_alu->src[1].src.ssa == alu_def->def) {
> > > +      increment_before = true;
> > 
> > Is there a reason why this can't be handled as "trip_count + 1"?  This 
> > seems way overcomplicated.
> 
> Yes, there is. We don't know that we will increment by 1; it could be by 10.
> Also, if we support more opts we may have to do a mul, etc.
> We could set the initial value here, but I decided to keep it all in
> get_iteration(). I guess I could move this logic there also.
> 
Ignore that. I see what you are getting at, but it would be trip_count - 1.
That should be OK if sub is lowered as you say.
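A small sketch (hypothetical helper names, positive step, signed int values) of why testing the already-incremented value amounts to trip_count - 1: counting how often `i < limit` is true starting from init + step gives one fewer than starting from init, clamped at zero. A brute-force loop is included to cross-check the closed form.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of times `i < limit` is true when i starts at init and advances
 * by step (> 0) after each true test: ceil((limit - init) / step). */
static int count_true_tests(int init, int step, int limit)
{
   assert(step > 0);
   if (init >= limit)
      return 0;
   return (limit - init + step - 1) / step;
}

/* Condition tested after the increment, as in
 *    do { i += step; } while (i < limit);
 * which is the same as starting the count from init + step. */
static int count_true_tests_increment_before(int init, int step, int limit)
{
   return count_true_tests(init + step, step, limit);
}

/* Brute-force reference for the two closed forms above. */
static int count_by_simulation(int init, int step, int limit,
                               bool increment_before)
{
   int i = init, n = 0;
   if (increment_before)
      i += step;
   while (i < limit) {
      n++;
      i += step;
   }
   return n;
}
```

With the do-while example from the patch comment (ndx from 0, step 1, limit 3), the condition is true twice rather than three times.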


Re: [Mesa-dev] [PATCH 4/5] i965/ir: Pass identity mask to brw_find_live_channel() in the packed dispatch case.

2016-09-16 Thread Jason Ekstrand
As I said on patch 5, I would like to see some version of it merged at
least for fs.  The vec4 back-end isn't as much of a problem since we've
verified it now and future hardware won't be using it.

Series is Reviewed-by: Jason Ekstrand 

On Sep 16, 2016 3:04 PM, "Francisco Jerez"  wrote:

> This avoids emitting a few extra instructions required to take the
> dispatch mask into account when it's known to be tightly packed.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 4 +++-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 8 ++--
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index c510f42..bdeda3b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -2045,7 +2045,9 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
>
>case SHADER_OPCODE_FIND_LIVE_CHANNEL: {
>   const struct brw_reg mask =
> -stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() : brw_dmask_reg();
> +brw_stage_has_packed_dispatch(stage, prog_data) ? brw_imm_ud(~0u) :
> +stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() :
> +brw_dmask_reg();
>   brw_find_live_channel(p, dst, mask);
>   break;
>}
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index f9e6d1c..2bef549 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -1862,9 +1862,13 @@ generate_code(struct brw_codegen *p,
>   brw_memory_fence(p, dst);
>   break;
>
> -  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
> - brw_find_live_channel(p, dst, brw_dmask_reg());
> +  case SHADER_OPCODE_FIND_LIVE_CHANNEL: {
> + const struct brw_reg mask =
> +brw_stage_has_packed_dispatch(nir->stage, _data->base) ?
> +brw_imm_ud(~0u) : brw_dmask_reg();
> + brw_find_live_channel(p, dst, mask);
>   break;
> +  }
>
>case SHADER_OPCODE_BROADCAST:
>   assert(inst->force_writemask_all);
> --
> 2.9.0
>
>
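A simplified host-side model (hypothetical, not the instruction sequence brw_find_live_channel() actually emits) of why brw_imm_ud(~0u) works as an identity mask: ANDing with ~0u leaves the other operand unchanged, so in this model only the execution mask decides which channel is found. Whether dropping the real dispatch mask is safe is the hardware assumption that patch 5 tests; this sketch does not model that.

```c
#include <assert.h>
#include <stdint.h>

/* Model: the first live channel is the lowest set bit of
 * (exec_mask & mask), or -1 if no channel is live. */
static int find_live_channel(uint32_t exec_mask, uint32_t mask)
{
   uint32_t live = exec_mask & mask;
   for (int ch = 0; ch < 32; ch++) {
      if (live & (1u << ch))
         return ch;
   }
   return -1;
}
```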


Re: [Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Jason Ekstrand
On Sep 16, 2016 3:04 PM, "Francisco Jerez"  wrote:
>
> Not intended for upstream.  Should cause a GPU hang if some thread is
> executed with a non-contiguous dispatch mask breaking assumptions of
> brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
> Piglit regressions, while replacing brw_stage_has_packed_dispatch()
> with a dummy implementation that unconditionally returns true on top
> of this patch causes multiple GPU hangs.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
>  2 files changed, 38 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 042203d..b3eec49 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -33,6 +33,23 @@ using namespace brw::surface_access;
>  void
>  fs_visitor::emit_nir_code()
>  {
> +   if (brw_stage_has_packed_dispatch(stage, prog_data)) {

Mind adding "0 &&" and merging this patch so we remain aware of the issue,
keep it building, and can easily test future hardware.

> +  const fs_builder ubld = bld.exec_all().group(1, 0);
> +  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
> +  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() :
> +   brw_dmask_reg());
> +
> +  ubld.ADD(tmp, mask, brw_imm_ud(1));
> +  ubld.AND(tmp, mask, tmp);
> +
> +  /* This will loop forever if the dispatch mask doesn't have the expected
> +   * form '2^n-1', in which case tmp will be non-zero.
> +   */
> +  bld.emit(BRW_OPCODE_DO);
> +  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ);
> +  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
> +   }
> +
> /* emit the arrays used for inputs and outputs - load/store intrinsics will
>  * be converted to reads/writes of these arrays
>  */
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index ba3bbdf..9f7a1f0 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -35,6 +35,27 @@ namespace brw {
>  void
>  vec4_visitor::emit_nir_code()
>  {
> +   if (brw_stage_has_packed_dispatch(stage, &prog_data->base)) {
> +  const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type),
> +WRITEMASK_X);
> +  const src_reg mask =
> + brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, 0),
> +   0, 4, 1),
> +BRW_REGISTER_TYPE_UD),
> + BRW_SWIZZLE_);

Can we just do vec4_reg(brw_vmask_reg)?

> +
> +  emit(ADD(tmp, mask, brw_imm_ud(1)));
> +  emit(AND(tmp, mask, src_reg(tmp)));
> +
> +  /* This will loop forever if the dispatch mask doesn't have the expected
> +   * form '2^n-1', in which case tmp will be non-zero.
> +   */
> +  emit(BRW_OPCODE_DO);
> +  emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0),
> +   BRW_CONDITIONAL_NZ));
> +  emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL;
> +   }
> +
> if (nir->num_uniforms > 0)
>nir_setup_uniforms();
>
> --
> 2.9.0
>


Re: [Mesa-dev] [PATCH 3/5] i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch.

2016-09-16 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Sep 16, 2016 at 3:03 PM, Francisco Jerez 
> wrote:
>
>> The eliminate_find_live_channel optimization eliminates
>> FIND_LIVE_CHANNEL instructions in cases where control flow is known to
>> be uniform, and replaces them with 'MOV 0', which in turn unblocks
>> subsequent elimination of the BROADCAST instruction frequently used on
>> the result of FIND_LIVE_CHANNEL.  This is however not correct in
>> per-sample fragment shader dispatch because the PSD can dispatch a
>> fully unlit sample under certain conditions.  Disable the optimization
>> in that case.
>> ---
>>  src/mesa/drivers/dri/i965/brw_compiler.h | 41
>>  src/mesa/drivers/dri/i965/brw_fs.cpp |  8 +++
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp   |  8 +++
>>  3 files changed, 57 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h b/src/mesa/drivers/dri/i965/brw_compiler.h
>> index 84d3dde..1429875 100644
>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
>> @@ -868,6 +868,47 @@ encode_slm_size(unsigned gen, uint32_t bytes)
>> return slm_size;
>>  }
>>
>> +/**
>> + * Return true if the given shader stage is dispatched contiguously by the
>> + * relevant fixed function starting from channel 0 of the SIMD thread, which
>> + * implies that the dispatch mask of a thread can be assumed to have the form
>> + * '2^n - 1' for some n.
>> + */
>> +static inline bool
>> +brw_stage_has_packed_dispatch(gl_shader_stage stage,
>> +  const struct brw_stage_prog_data *prog_data)
>>
>
> Thank you, thank you, thank you for making this a well-documented helper
> function!
>

Given the amount of hardware documentation about this, any documentation
is too little documentation. ;)

>
>> +{
>> +   switch (stage) {
>> +   case MESA_SHADER_FRAGMENT: {
>> +  /* The PSD discards subspans coming in with no lit samples, which in the
>> +   * per-pixel shading case implies that each subspan will either be fully
>> +   * lit (due to the VMask being used to allow derivative computations),
>> +   * or not dispatched at all.  In per-sample dispatch mode individual
>> +   * samples from the same subspan have a fixed relative location within
>> +   * the SIMD thread, so dispatch of unlit samples cannot be avoided in
>> +   * general and we should return false.
>> +   */
>> +  const struct brw_wm_prog_data *wm_prog_data =
>> + (const struct brw_wm_prog_data *)prog_data;
>> +  return !wm_prog_data->persample_dispatch;
>> +   }
>> +   case MESA_SHADER_COMPUTE:
>> +  /* Compute shaders will be spawned with either a fully enabled dispatch
>> +   * mask or with whatever bottom/right execution mask was given to the
>> +   * GPGPU walker command to be used along the workgroup edges -- In both
>> +   * cases the dispatch mask is required to be tightly packed for our
>> +   * invocation index calculations to work.
>> +   */
>> +  return true;
>> +   default:
>> +  /* Most remaining fixed functions are limited to use a packed dispatch
>> +   * mask due to the hardware representation of the dispatch mask as a
>> +   * single counter representing the number of enabled channels.
>> +   */
>> +  return true;
>> +   }
>> +}
>> +
>>  #ifdef __cplusplus
>>  } /* extern "C" */
>>  #endif
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index bb65077..32f7ae2 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -2835,6 +2835,14 @@ fs_visitor::eliminate_find_live_channel()
>> bool progress = false;
>> unsigned depth = 0;
>>
>> +   if (!brw_stage_has_packed_dispatch(stage, stage_prog_data)) {
>> +  /* The optimization below assumes that channel zero is live on thread
>> +   * dispatch, which may not be the case if the fixed function dispatches
>> +   * threads sparsely.
>> +   */
>> +  return progress;
>>
>
> Maybe just return false?
>

Sure, don't have a strong preference, changed locally.

>
>> +   }
>> +
>> foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>>switch (inst->opcode) {
>>case BRW_OPCODE_IF:
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 58c8a8a..d5bb82b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1291,6 +1291,14 @@ vec4_visitor::eliminate_find_live_channel()
>> bool progress = false;
>> unsigned depth = 0;
>>
>> +   if (!brw_stage_has_packed_dispatch(stage, stage_prog_data)) {
>> +  /* The optimization below assumes that channel zero is live on thread
>> +   * dispatch, which may not be the case if the fixed function dispatches
>> + 

Re: [Mesa-dev] [PATCH 02/10] nir: Add a loop analysis pass

2016-09-16 Thread Timothy Arceri
On Fri, 2016-09-16 at 17:01 +0200, Erik Faye-Lund wrote:
> On Thu, Sep 15, 2016 at 9:03 AM, Timothy Arceri
>  wrote:
> > 
> > +   const int bias[] = { -1, 1, 1 };
> > +
> > +   for (unsigned i = 0; i < ARRAY_SIZE(bias); i++) {
> > +  iter_int = iter_int + bias[i];
> > +
> > +  switch (cond_op) {
> > +  case nir_op_ige:
> > +  case nir_op_ilt:
> > +  case nir_op_ieq:
> > +  case nir_op_ine:
> > + if (itest_interations(iter_int, step, limit, cond_op, initial_val,
> > +   limit_rhs)) {
> > +return iter_int;
> > + }
> > + break;
> > +  case nir_op_uge:
> > +  case nir_op_ult:
> > + if (utest_interations(iter_int, step, limit, cond_op,
> > +   (uint32_t) initial_val, limit_rhs)) {
> > +return iter_int;
> > + }
> > + break;
> > +  default:
> > + return -1;
> > +  }
> > +   }
> 
> Can't this be easier written as:

Probably. I believe Thomas just copied this code from the GLSL IR pass.

> 
> for (int i = iter_int - 1; i <= iter_int + 1; ++i)
> {
> switch (cond_op) {
> [...]
> if (itest_interations(i, step, limit, cond_op, initial_val,
> limit_rhs))
>  return i;
> [...]
> ?

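Erik's suggested simplification can be sketched as a standalone C function. The original `bias[] = { -1, 1, 1 }` loop accumulates into `iter_int`, so it probes `iter_int - 1`, `iter_int` and `iter_int + 1` in that order, which is exactly the range the simplified loop walks. `test_iterations()` and `find_trip_count()` are hypothetical names, and the sketch assumes a `for (i = init; i < limit; i += step)` loop with `step > 0`; it is not the logic of the pass's actual `itest_interations()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for itest_interations(): models
 * `for (i = init; i < limit; i += step)` with step > 0 and returns true
 * when the loop body runs exactly `iters` times. */
static bool
test_iterations(int iters, int init, int step, int limit)
{
   assert(step > 0);
   if (iters < 0)
      return false;
   /* The last executed iteration must still satisfy the condition... */
   if (iters > 0 && !(init + (iters - 1) * step < limit))
      return false;
   /* ...and the following value must fail it, ending the loop. */
   return !(init + iters * step < limit);
}

/* Probe the estimate and its two neighbours, as Erik suggests. */
static int
find_trip_count(int estimate, int init, int step, int limit)
{
   for (int i = estimate - 1; i <= estimate + 1; ++i) {
      if (test_iterations(i, init, step, limit))
         return i;
   }
   return -1; /* no candidate matched: trip count unknown */
}
```

For example, `find_trip_count((10 - 0) / 3, 0, 3, 10)` starts from the rounded-down estimate 3 and settles on 4, the number of iterations for i = 0, 3, 6, 9.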

Re: [Mesa-dev] [PATCH 03/10] nir: add helpers to check if we can unroll loops

2016-09-16 Thread Timothy Arceri
On Fri, 2016-09-16 at 16:52 +0200, Erik Faye-Lund wrote:
> On Thu, Sep 15, 2016 at 9:03 AM, Timothy Arceri
>  wrote:
> > 
> > This will be used by the loop unroll and lcssa passes.
> > 
> > V2:
> > - Check instruction count is not too large for unrolling
> > - Add helper for complex loop unrolling
> > ---
> >  src/compiler/nir/nir.h | 31 +++
> >  1 file changed, 31 insertions(+)
> > 
> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > index 49e8cd8..3a2a13a 100644
> > --- a/src/compiler/nir/nir.h
> > +++ b/src/compiler/nir/nir.h
> > @@ -2590,6 +2590,37 @@ bool nir_normalize_cubemap_coords(nir_shader *shader);
> > 
> >  void nir_live_ssa_defs_impl(nir_function_impl *impl);
> > 
> > +static inline bool
> > +is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li)
> > +{
> > +   unsigned max_iter = shader->options->max_unroll_iterations;
> > +
> > +   if (li->trip_count > max_iter)
> > +  return false;
> > +
> > +   if (li->force_unroll)
> > +  return true;
> > +
> > +   bool loop_not_too_large =
> > +  li->num_instructions * li->trip_count <= max_iter * 25;
> 
> 
> "max_iter * 25" seems like a pretty arbitrary limit at first glance.
> How was it found? Perhaps a comment explaining a bit could be added?

Well it is :P I just tried to match it somewhat to the GLSL IR pass. I
don't think there was even a great explanation for the value that was
chosen.

> 
> > 
> > +static inline bool
> > +is_complex_loop(nir_shader *shader, nir_loop_info *li)
> > +{
> > +   unsigned num_lt = list_length(&li->loop_terminator_list);
> > +   return is_loop_small_enough_to_unroll(shader, li) && num_lt == 2;
> 
> Perhaps you could add a comment to explain the "num_lt == 2"-part?

Sure. Basically if we don't know the trip count of all the exits (not a
simple loop) we can only possibly unroll loops with two exit points.
Anything more would be extra code for a scenario that is unlikely to
come up very often.


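The unroll heuristics discussed above can be summarized in a self-contained C sketch. It mirrors only the checks visible in the quoted hunk. The hunk is cut off before `is_loop_small_enough_to_unroll()` returns, so returning the size test directly is an assumption, as are the `loop_info_sketch` struct and its `num_terminators` field, which stands in for `list_length()` on the terminator list. The `* 25` factor is the loosely chosen budget Timothy describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical standalone mirror of the nir_loop_info fields the helpers
 * read; num_terminators replaces list_length() on loop_terminator_list. */
struct loop_info_sketch {
   unsigned num_instructions;
   unsigned trip_count;
   bool force_unroll;
   unsigned num_terminators;
};

static bool
small_enough_to_unroll(const struct loop_info_sketch *li, unsigned max_iter)
{
   /* Never unroll past the driver's iteration limit, even when forced. */
   if (li->trip_count > max_iter)
      return false;

   if (li->force_unroll)
      return true;

   /* Size budget: total unrolled instructions vs. max_iter * 25. */
   return li->num_instructions * li->trip_count <= max_iter * 25;
}

/* Complex unrolling is only attempted for loops with exactly two exits. */
static bool
is_complex_loop_sketch(const struct loop_info_sketch *li, unsigned max_iter)
{
   return small_enough_to_unroll(li, max_iter) && li->num_terminators == 2;
}
```

With `max_iter = 32` the budget is 800 instructions, so a 10-instruction body with 8 iterations fits while a 200-instruction body with 8 iterations does not unless `force_unroll` is set.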

Re: [Mesa-dev] [PATCH 3/5] i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch.

2016-09-16 Thread Jason Ekstrand
On Fri, Sep 16, 2016 at 3:03 PM, Francisco Jerez 
wrote:

> The eliminate_find_live_channel optimization eliminates
> FIND_LIVE_CHANNEL instructions in cases where control flow is known to
> be uniform, and replaces them with 'MOV 0', which in turn unblocks
> subsequent elimination of the BROADCAST instruction frequently used on
> the result of FIND_LIVE_CHANNEL.  This is however not correct in
> per-sample fragment shader dispatch because the PSD can dispatch a
> fully unlit sample under certain conditions.  Disable the optimization
> in that case.
> ---
>  src/mesa/drivers/dri/i965/brw_compiler.h | 41
>  src/mesa/drivers/dri/i965/brw_fs.cpp |  8 +++
>  src/mesa/drivers/dri/i965/brw_vec4.cpp   |  8 +++
>  3 files changed, 57 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h b/src/mesa/drivers/dri/i965/brw_compiler.h
> index 84d3dde..1429875 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
> @@ -868,6 +868,47 @@ encode_slm_size(unsigned gen, uint32_t bytes)
> return slm_size;
>  }
>
> +/**
> + * Return true if the given shader stage is dispatched contiguously by the
> + * relevant fixed function starting from channel 0 of the SIMD thread, which
> + * implies that the dispatch mask of a thread can be assumed to have the form
> + * '2^n - 1' for some n.
> + */
> +static inline bool
> +brw_stage_has_packed_dispatch(gl_shader_stage stage,
> +  const struct brw_stage_prog_data *prog_data)
>

Thank you, thank you, thank you for making this a well-documented helper
function!


> +{
> +   switch (stage) {
> +   case MESA_SHADER_FRAGMENT: {
> +  /* The PSD discards subspans coming in with no lit samples, which in the
> +   * per-pixel shading case implies that each subspan will either be fully
> +   * lit (due to the VMask being used to allow derivative computations),
> +   * or not dispatched at all.  In per-sample dispatch mode individual
> +   * samples from the same subspan have a fixed relative location within
> +   * the SIMD thread, so dispatch of unlit samples cannot be avoided in
> +   * general and we should return false.
> +   */
> +  const struct brw_wm_prog_data *wm_prog_data =
> + (const struct brw_wm_prog_data *)prog_data;
> +  return !wm_prog_data->persample_dispatch;
> +   }
> +   case MESA_SHADER_COMPUTE:
> +  /* Compute shaders will be spawned with either a fully enabled dispatch
> +   * mask or with whatever bottom/right execution mask was given to the
> +   * GPGPU walker command to be used along the workgroup edges -- In both
> +   * cases the dispatch mask is required to be tightly packed for our
> +   * invocation index calculations to work.
> +   */
> +  return true;
> +   default:
> +  /* Most remaining fixed functions are limited to use a packed dispatch
> +   * mask due to the hardware representation of the dispatch mask as a
> +   * single counter representing the number of enabled channels.
> +   */
> +  return true;
> +   }
> +}
> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index bb65077..32f7ae2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2835,6 +2835,14 @@ fs_visitor::eliminate_find_live_channel()
> bool progress = false;
> unsigned depth = 0;
>
> +   if (!brw_stage_has_packed_dispatch(stage, stage_prog_data)) {
> +  /* The optimization below assumes that channel zero is live on thread
> +   * dispatch, which may not be the case if the fixed function dispatches
> +   * threads sparsely.
> +   */
> +  return progress;
>

Maybe just return false?


> +   }
> +
> foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>switch (inst->opcode) {
>case BRW_OPCODE_IF:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 58c8a8a..d5bb82b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1291,6 +1291,14 @@ vec4_visitor::eliminate_find_live_channel()
> bool progress = false;
> unsigned depth = 0;
>
> +   if (!brw_stage_has_packed_dispatch(stage, stage_prog_data)) {
> +  /* The optimization below assumes that channel zero is live on thread
> +   * dispatch, which may not be the case if the fixed function dispatches
> +   * threads sparsely.
> +   */
> +  return progress;
>

Same here.


> +   }
> +
> foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
>switch (inst->opcode) {
>case BRW_OPCODE_IF:
> --
> 2.9.0
>
>

Re: [Mesa-dev] [PATCH 02/10] nir: Add a loop analysis pass

2016-09-16 Thread Jason Ekstrand
On Thu, Sep 15, 2016 at 12:03 AM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> From: Thomas Helland 
>
> This pass detects induction variables and calculates the
> trip count of loops to be used for loop unrolling.
>
> I've removed support for float induction values for now, for the
> simple reason that they don't appear in my shader-db collection,
> and so I don't see it as common enough that we want to pollute the
> pass with this in the initial version.
>
> V2: Rebase, adapt to removal of function overloads
>
> V3: (Timothy Arceri)
>  - don't try to find trip count if loop terminator conditional is a phi
>  - fix trip count for do-while loops
>  - replace conditional type != alu assert with return
>  - disable unrolling of loops with continues
>  - multiple fixes to memory allocation, stop leaking and don't destroy
>structs we want to use for unrolling.
>  - fix iteration count bugs when induction var not on RHS of condition
>  - add FIXME for && conditions
>  - calculate trip count for unsigned induction/limit vars
>
> V4:
> - count instructions in a loop
> - set the limiting_terminator even if we can't find the trip count for
>  all terminators. This is needed for complex unrolling where we handle
>  2 terminators and the trip count is unknown for one of them.
> - restruct structs so we don't keep information not required after
>  analysis and remove dead fields.
> - force unrolling in some cases as per the rules in the GLSL IR pass
> ---
>  src/compiler/Makefile.sources   |2 +
>  src/compiler/nir/nir.h  |   36 +-
>  src/compiler/nir/nir_loop_analyze.c | 1012 +++
>  src/compiler/nir/nir_metadata.c |8 +-
>  4 files changed, 1056 insertions(+), 2 deletions(-)
>  create mode 100644 src/compiler/nir/nir_loop_analyze.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index f5b4f9c..7ed26a9 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -190,6 +190,8 @@ NIR_FILES = \
> nir/nir_intrinsics.c \
> nir/nir_intrinsics.h \
> nir/nir_liveness.c \
> +   nir/nir_loop_analyze.c \
> +   nir/nir_loop_analyze.h \
> nir/nir_lower_alu_to_scalar.c \
> nir/nir_lower_atomics.c \
> nir/nir_lower_bitmap.c \
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index ff7c422..49e8cd8 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -1549,9 +1549,36 @@ nir_if_last_else_node(nir_if *if_stmt)
>  }
>
>  typedef struct {
> +   nir_if *nif;
> +
> +   nir_instr *conditional_instr;
> +
> +   struct list_head loop_terminator_link;
> +} nir_loop_terminator;
> +
> +typedef struct {
> +   /* Number of instructions in the loop */
> +   unsigned num_instructions;
> +
> +   /* How many times the loop is run (if known) */
> +   unsigned trip_count;
> +   bool is_trip_count_known;
>

We could use 0 or -1 to indicate "I don't know trip count" instead of an
extra boolean.  Not sure that it matters much.


> +
> +   /* Unroll the loop regardless of its size */
> +   bool force_unroll;
>

It seems a bit odd to have this decide to force-unroll.  This is an
analysis pass, not a "make decisions" pass.


> +
> +   nir_loop_terminator *limiting_terminator;
> +
> +   /* A list of loop_terminators terminating this loop. */
> +   struct list_head loop_terminator_list;
> +} nir_loop_info;
> +
> +typedef struct {
> nir_cf_node cf_node;
>
> struct exec_list body; /** < list of nir_cf_node */
> +
> +   nir_loop_info *info;
>  } nir_loop;
>
>  static inline nir_cf_node *
> @@ -1576,6 +1603,7 @@ typedef enum {
> nir_metadata_dominance = 0x2,
> nir_metadata_live_ssa_defs = 0x4,
> nir_metadata_not_properly_reset = 0x8,
> +   nir_metadata_loop_analysis = 0x16,
>  } nir_metadata;
>
>  typedef struct {
> @@ -1758,6 +1786,8 @@ typedef struct nir_shader_compiler_options {
>  * information must be inferred from the list of input nir_variables.
>  */
> bool use_interpolated_input_intrinsics;
> +
> +   unsigned max_unroll_iterations;
>  } nir_shader_compiler_options;
>
>  typedef struct nir_shader_info {
> @@ -1962,7 +1992,7 @@ nir_loop *nir_loop_create(nir_shader *shader);
>  nir_function_impl *nir_cf_node_get_function(nir_cf_node *node);
>
>  /** requests that the given pieces of metadata be generated */
> -void nir_metadata_require(nir_function_impl *impl, nir_metadata required);
> +void nir_metadata_require(nir_function_impl *impl, nir_metadata required, ...);
>  /** dirties all but the preserved metadata */
>  void nir_metadata_preserve(nir_function_impl *impl, nir_metadata preserved);
>
> @@ -2559,6 +2589,10 @@ void nir_lower_double_pack(nir_shader *shader);
>  bool nir_normalize_cubemap_coords(nir_shader *shader);
>
>  void nir_live_ssa_defs_impl(nir_function_impl *impl);
> +
> +void nir_loop_analyze_impl(nir_function_impl *impl,
> +   

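Jason's suggestion of a sentinel instead of the extra `is_trip_count_known` boolean could look like the following sketch. The struct and macro names are hypothetical and not part of the posted patch. Using -1 rather than 0 keeps a loop that exits on its first pass distinguishable from one whose count is unknown; the cost is a signed field, which should not matter for counts compared against `max_unroll_iterations`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical alternative to the trip_count/is_trip_count_known pair:
 * a signed count where -1 means "unknown". */
#define TRIP_COUNT_UNKNOWN (-1)

struct loop_info_alt {
   unsigned num_instructions;
   int trip_count;   /* TRIP_COUNT_UNKNOWN if the analysis gave up */
};

static bool
loop_trip_count_known(const struct loop_info_alt *li)
{
   return li->trip_count != TRIP_COUNT_UNKNOWN;
}
```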
[Mesa-dev] [Bug 97230] MATLAB hangs if DRI3 enabled with intel driver

2016-09-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97230

Chris Wilson  changed:

           What|Removed                              |Added
----------------------------------------------------------------------------
     QA Contact|intel-gfx-bugs@lists.freedesktop.org |mesa-dev@lists.freedesktop.org
        Product|xorg                                 |Mesa
       Assignee|ch...@chris-wilson.co.uk             |mesa-dev@lists.freedesktop.org
      Component|Driver/intel                         |GLX

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH 5/5] HACK: i965/ir: Test thread dispatch packing assumptions.

2016-09-16 Thread Francisco Jerez
Not intended for upstream.  Should cause a GPU hang if some thread is
executed with a non-contiguous dispatch mask breaking assumptions of
brw_stage_has_packed_dispatch().  Doesn't cause any CTS, DEQP or
Piglit regressions, while replacing brw_stage_has_packed_dispatch()
with a dummy implementation that unconditionally returns true on top
of this patch causes multiple GPU hangs.
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 17 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +
 2 files changed, 38 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 042203d..b3eec49 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -33,6 +33,23 @@ using namespace brw::surface_access;
 void
 fs_visitor::emit_nir_code()
 {
+   if (brw_stage_has_packed_dispatch(stage, prog_data)) {
+  const fs_builder ubld = bld.exec_all().group(1, 0);
+  const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0);
+  const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() :
+   brw_dmask_reg());
+
+  ubld.ADD(tmp, mask, brw_imm_ud(1));
+  ubld.AND(tmp, mask, tmp);
+
+  /* This will loop forever if the dispatch mask doesn't have the expected
+   * form '2^n-1', in which case tmp will be non-zero.
+   */
+  bld.emit(BRW_OPCODE_DO);
+  bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), BRW_CONDITIONAL_NZ);
+  set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE));
+   }
+
/* emit the arrays used for inputs and outputs - load/store intrinsics will
 * be converted to reads/writes of these arrays
 */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ba3bbdf..9f7a1f0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -35,6 +35,27 @@ namespace brw {
 void
 vec4_visitor::emit_nir_code()
 {
+   if (brw_stage_has_packed_dispatch(stage, &prog_data->base)) {
+  const dst_reg tmp = writemask(dst_reg(this, glsl_type::uint_type),
+WRITEMASK_X);
+  const src_reg mask =
+ brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, 0),
+   0, 4, 1),
+BRW_REGISTER_TYPE_UD),
+ BRW_SWIZZLE_);
+
+  emit(ADD(tmp, mask, brw_imm_ud(1)));
+  emit(AND(tmp, mask, src_reg(tmp)));
+
+  /* This will loop forever if the dispatch mask doesn't have the expected
+   * form '2^n-1', in which case tmp will be non-zero.
+   */
+  emit(BRW_OPCODE_DO);
+  emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0),
+   BRW_CONDITIONAL_NZ));
+  emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL;
+   }
+
if (nir->num_uniforms > 0)
   nir_setup_uniforms();
 
-- 
2.9.0


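The check that the HACK patch emits in both the FS and vec4 backends (ADD 1, AND with the mask, then spin while the result is non-zero) reduces to one line of scalar C. The function name below is hypothetical; it only reproduces the same arithmetic on a 32-bit mask on the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when `mask` has the form 2^n - 1, i.e. the enabled channels
 * are packed contiguously starting at channel 0.  Adding 1 to such a value
 * carries through the whole run of low ones, so the AND comes out zero;
 * any gap leaves at least one bit set.  The HACK patch computes the same
 * value into `tmp` and loops forever while it is non-zero. */
static bool
dispatch_mask_is_packed(uint32_t mask)
{
   return (mask & (mask + 1u)) == 0;
}
```

A fully enabled mask also passes, because `0xFFFFFFFF + 1` wraps to 0 in unsigned arithmetic.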

[Mesa-dev] [PATCH 2/5] i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNEL

2016-09-16 Thread Francisco Jerez
From: Jason Ekstrand 

On at least Sky Lake, ce0 does not contain the full story as far as enabled
channels goes.  It is possible to have completely disabled channels where
the corresponding bits in ce0 are 1.  In order to get the correct execution
mask, you have to mask off those channels which were disabled from the
beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on
the shader stage.  Failure to do so can result in FIND_LIVE_CHANNEL
returning a completely dead channel.

Signed-off-by: Jason Ekstrand 
Cc: Francisco Jerez 
[ Francisco Jerez: Fix a couple of typos, add mask register type
  assertion, clarify reason why ce0 can have bits set for disabled
  channels, clarify that this may only be a problem when thread
  dispatch doesn't pack channels tightly in the SIMD thread.  Apply
  same treatment to Align16 path. ]
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  3 +-
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 39 ++--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  7 +++--
 src/mesa/drivers/dri/i965/brw_reg.h  | 12 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
index 3e52764..737a335 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 
 void
 brw_find_live_channel(struct brw_codegen *p,
-  struct brw_reg dst);
+  struct brw_reg dst,
+  struct brw_reg mask);
 
 void
 brw_broadcast(struct brw_codegen *p,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3b12030..c98867a 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 }
 
 void
-brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
+brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
+  struct brw_reg mask)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
@@ -3369,6 +3370,7 @@ brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
brw_inst *inst;
 
assert(devinfo->gen >= 7);
+   assert(mask.type == BRW_REGISTER_TYPE_UD);
 
brw_push_insn_state(p);
 
@@ -3377,18 +3379,32 @@ brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
 
   if (devinfo->gen >= 8) {
  /* Getting the first active channel index is easy on Gen8: Just find
-  * the first bit set in the mask register.  The same register exists
-  * on HSW already but it reads back as all ones when the current
+  * the first bit set in the execution mask.  The register exists on
+  * HSW already but it reads back as all ones when the current
   * instruction has execution masking disabled, so it's kind of
   * useless.
   */
- inst = brw_FBL(p, vec1(dst),
-retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
+ struct brw_reg exec_mask =
+retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD);
+
+ if (mask.file != BRW_IMMEDIATE_VALUE || mask.ud != 0xffffffff) {
+/* Unfortunately, ce0 does not take into account the thread
+ * dispatch mask, which may be a problem in cases where it's not
+ * tightly packed (i.e. it doesn't have the form '2^n - 1' for
+ * some n).  Combine ce0 with the given dispatch (or vector) mask
+ * to mask off those channels which were never dispatched by the
+ * hardware.
+ */
+brw_SHR(p, vec1(dst), mask, brw_imm_ud(qtr_control * 8));
+brw_AND(p, vec1(dst), exec_mask, vec1(dst));
+exec_mask = vec1(dst);
+ }
 
  /* Quarter control has the effect of magically shifting the value of
-  * this register so you'll get the first active channel relative to
-  * the specified quarter control as result.
+  * ce0 so you'll get the first active channel relative to the
+  * specified quarter control as result.
   */
+ inst = brw_FBL(p, vec1(dst), exec_mask);
   } else {
  const struct brw_reg flag = brw_flag_reg(1, 0);
 
@@ -3422,9 +3438,14 @@ brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
} else {
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
 
-  if (devinfo->gen >= 8) {
+  if (devinfo->gen >= 8 &&
+  mask.file == BRW_IMMEDIATE_VALUE && mask.ud == 0xffffffff) {
  /* In SIMD4x2 mode 

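The Gen8+ path of the patched `brw_find_live_channel()` (SHR the dispatch or vector mask by `qtr_control * 8`, AND it with ce0, then FBL) can be modelled on the CPU as below. This is a hypothetical sketch, not driver code: `ce0` is assumed to already be relative to the current quarter, mirroring the shift the hardware applies, and returning -1 for an empty mask is this model's own convention. Passing the identity mask `~0u`, as patch 4/5 does for packed stages, leaves ce0 unchanged, which is why the extra SHR/AND can then be skipped.

```c
#include <assert.h>
#include <stdint.h>

/* CPU model of: dst = mask >> (qtr_control * 8); dst &= ce0; FBL(dst). */
static int
find_live_channel_model(uint32_t ce0, uint32_t dispatch_mask,
                        unsigned qtr_control)
{
   assert(qtr_control < 4);  /* keeps the shift below 32 bits */
   uint32_t live = ce0 & (dispatch_mask >> (qtr_control * 8));

   /* Index of the lowest set bit, like FBL. */
   for (int ch = 0; ch < 32; ch++) {
      if (live & (1u << ch))
         return ch;
   }
   return -1;  /* no live channel in this model */
}
```

For instance, with ce0 reading back as `0xFF` but a dispatch mask of `0xF0`, the model reports channel 4 rather than the never-dispatched channel 0.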
[Mesa-dev] [PATCH 3/5] i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch.

2016-09-16 Thread Francisco Jerez
The eliminate_find_live_channel optimization eliminates
FIND_LIVE_CHANNEL instructions in cases where control flow is known to
be uniform, and replaces them with 'MOV 0', which in turn unblocks
subsequent elimination of the BROADCAST instruction frequently used on
the result of FIND_LIVE_CHANNEL.  This is however not correct in
per-sample fragment shader dispatch because the PSD can dispatch a
fully unlit sample under certain conditions.  Disable the optimization
in that case.
---
 src/mesa/drivers/dri/i965/brw_compiler.h | 41 
 src/mesa/drivers/dri/i965/brw_fs.cpp |  8 +++
 src/mesa/drivers/dri/i965/brw_vec4.cpp   |  8 +++
 3 files changed, 57 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h b/src/mesa/drivers/dri/i965/brw_compiler.h
index 84d3dde..1429875 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -868,6 +868,47 @@ encode_slm_size(unsigned gen, uint32_t bytes)
return slm_size;
 }
 
+/**
+ * Return true if the given shader stage is dispatched contiguously by the
+ * relevant fixed function starting from channel 0 of the SIMD thread, which
+ * implies that the dispatch mask of a thread can be assumed to have the form
+ * '2^n - 1' for some n.
+ */
+static inline bool
+brw_stage_has_packed_dispatch(gl_shader_stage stage,
+  const struct brw_stage_prog_data *prog_data)
+{
+   switch (stage) {
+   case MESA_SHADER_FRAGMENT: {
+  /* The PSD discards subspans coming in with no lit samples, which in the
+   * per-pixel shading case implies that each subspan will either be fully
+   * lit (due to the VMask being used to allow derivative computations),
+   * or not dispatched at all.  In per-sample dispatch mode individual
+   * samples from the same subspan have a fixed relative location within
+   * the SIMD thread, so dispatch of unlit samples cannot be avoided in
+   * general and we should return false.
+   */
+  const struct brw_wm_prog_data *wm_prog_data =
+ (const struct brw_wm_prog_data *)prog_data;
+  return !wm_prog_data->persample_dispatch;
+   }
+   case MESA_SHADER_COMPUTE:
+  /* Compute shaders will be spawned with either a fully enabled dispatch
+   * mask or with whatever bottom/right execution mask was given to the
+   * GPGPU walker command to be used along the workgroup edges -- In both
+   * cases the dispatch mask is required to be tightly packed for our
+   * invocation index calculations to work.
+   */
+  return true;
+   default:
+  /* Most remaining fixed functions are limited to use a packed dispatch
+   * mask due to the hardware representation of the dispatch mask as a
+   * single counter representing the number of enabled channels.
+   */
+  return true;
+   }
+}
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index bb65077..32f7ae2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2835,6 +2835,14 @@ fs_visitor::eliminate_find_live_channel()
bool progress = false;
unsigned depth = 0;
 
+   if (!brw_stage_has_packed_dispatch(stage, stage_prog_data)) {
+  /* The optimization below assumes that channel zero is live on thread
+   * dispatch, which may not be the case if the fixed function dispatches
+   * threads sparsely.
+   */
+  return progress;
+   }
+
foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
   switch (inst->opcode) {
   case BRW_OPCODE_IF:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 58c8a8a..d5bb82b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1291,6 +1291,14 @@ vec4_visitor::eliminate_find_live_channel()
bool progress = false;
unsigned depth = 0;
 
+   if (!brw_stage_has_packed_dispatch(stage, stage_prog_data)) {
+  /* The optimization below assumes that channel zero is live on thread
+   * dispatch, which may not be the case if the fixed function dispatches
+   * threads sparsely.
+   */
+  return progress;
+   }
+
foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
   switch (inst->opcode) {
   case BRW_OPCODE_IF:
-- 
2.9.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/5] i965/ir: Pass identity mask to brw_find_live_channel() in the packed dispatch case.

2016-09-16 Thread Francisco Jerez
This avoids emitting a few extra instructions required to take the
dispatch mask into account when it's known to be tightly packed.
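Roughly, the reasoning is: when the dispatch mask is tightly packed (the enabled channels fill the low bits with no holes), channel 0 is always live as long as any channel is, so an all-ones immediate yields the same first live channel as the real mask. The plain-C model below illustrates that claim under those assumptions; `mask_is_packed()` and `find_live_channel()` are hypothetical helpers, not i965 functions, and the real brw_find_live_channel() emits EU instructions rather than running a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A mask is "tightly packed" when its set bits are exactly the lowest n
 * bits, i.e. it has the form 2^n - 1.  Adding 1 to such a mask clears all
 * of its set bits, so the AND below is zero.  Any hole makes it nonzero. */
static bool
mask_is_packed(uint32_t mask)
{
   return (mask & (mask + 1u)) == 0;
}

/* Index of the lowest enabled channel, or -1 if none is enabled.
 * This models the result a "find live channel" operation produces. */
static int
find_live_channel(uint32_t mask)
{
   for (int i = 0; i < 32; i++) {
      if (mask & (1u << i))
         return i;
   }
   return -1;
}
```

For any non-empty packed mask, `find_live_channel(mask)` and `find_live_channel(~0u)` both return 0, which is why the generator can skip reading the real mask in that case.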
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 4 +++-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 8 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index c510f42..bdeda3b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2045,7 +2045,9 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 
   case SHADER_OPCODE_FIND_LIVE_CHANNEL: {
  const struct brw_reg mask =
-stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() : brw_dmask_reg();
+brw_stage_has_packed_dispatch(stage, prog_data) ? brw_imm_ud(~0u) :
+stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() :
+brw_dmask_reg();
  brw_find_live_channel(p, dst, mask);
  break;
   }
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index f9e6d1c..2bef549 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1862,9 +1862,13 @@ generate_code(struct brw_codegen *p,
  brw_memory_fence(p, dst);
  break;
 
-  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
- brw_find_live_channel(p, dst, brw_dmask_reg());
+  case SHADER_OPCODE_FIND_LIVE_CHANNEL: {
+ const struct brw_reg mask =
+brw_stage_has_packed_dispatch(nir->stage, &prog_data->base) ?
+brw_imm_ud(~0u) : brw_dmask_reg();
+ brw_find_live_channel(p, dst, mask);
  break;
+  }
 
   case SHADER_OPCODE_BROADCAST:
  assert(inst->force_writemask_all);
-- 
2.9.0



[Mesa-dev] [PATCH 1/5] i965/reg: Make brw_sr0_reg take a subnr and return a vec1 reg

2016-09-16 Thread Francisco Jerez
From: Jason Ekstrand 

The state register sr0 is really a collection of dwords, not a SIMD8
anything.  It's much more convenient for brw_sr0_reg to return the
particular dword you're looking for rather than a giant blob you have to
massage into what you want.
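A small C model of the run_cs() hunk in this patch, treating sr0 as an array of dwords as the commit message suggests. The MOV copies one 16-bit word (UW subregister 1 of g0, i.e. bits 31:16 of g0.0) into the low word of sr0.1, and moving that word down by 16 bits is what takes g0.0[27:24] to sr0.1[11:8]. The struct and helper names are hypothetical, and little-endian word numbering within a dword is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of sr0 as a handful of dwords. */
struct sr0_model {
   uint32_t dw[4];
};

/* Copy the upper 16-bit word of g0.0 into the lower 16-bit word of sr0.1,
 * leaving the upper word of sr0.1 untouched.  Because the word moves down
 * by 16 bits, g0.0 bits [27:24] land in sr0.1 bits [11:8]. */
static void
move_slm_index(struct sr0_model *sr0, uint32_t g0_0)
{
   uint16_t hi_word = (uint16_t)(g0_0 >> 16);
   sr0->dw[1] = (sr0->dw[1] & 0xffff0000u) | hi_word;
}

/* Extract bits [hi:lo] of a dword (assumes hi - lo < 31). */
static uint32_t
bits(uint32_t v, unsigned hi, unsigned lo)
{
   return (v >> lo) & ((1u << (hi - lo + 1)) - 1u);
}
```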

Signed-off-by: Jason Ekstrand 
[ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ]
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_reg.h  | 20 
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index c858f44..bb65077 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6185,7 +6185,7 @@ fs_visitor::run_cs()
if (devinfo->is_haswell && prog_data->total_shared > 0) {
   /* Move SLM index from g0.0[27:24] to sr0.1[11:8] */
   const fs_builder abld = bld.exec_all().group(1, 0);
-  abld.MOV(retype(suboffset(brw_sr0_reg(), 1), BRW_REGISTER_TYPE_UW),
+  abld.MOV(retype(brw_sr0_reg(1), BRW_REGISTER_TYPE_UW),
suboffset(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UW), 1));
}
 
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
index d6f22ed..b71c63b 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -567,6 +567,12 @@ brw_uw1_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
 }
 
 static inline struct brw_reg
+brw_ud1_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
+{
+   return retype(brw_vec1_reg(file, nr, subnr), BRW_REGISTER_TYPE_UD);
+}
+
+static inline struct brw_reg
 brw_imm_reg(enum brw_reg_type type)
 {
return brw_reg(BRW_IMMEDIATE_VALUE,
@@ -789,19 +795,9 @@ brw_notification_reg(void)
 }
 
 static inline struct brw_reg
-brw_sr0_reg(void)
+brw_sr0_reg(unsigned subnr)
 {
-   return brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
-  BRW_ARF_STATE,
-  0,
-  0,
-  0,
-  BRW_REGISTER_TYPE_UD,
-  BRW_VERTICAL_STRIDE_8,
-  BRW_WIDTH_8,
-  BRW_HORIZONTAL_STRIDE_1,
-  BRW_SWIZZLE_XYZW,
-  WRITEMASK_XYZW);
+   return brw_ud1_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, subnr);
 }
 
 static inline struct brw_reg
-- 
2.9.0



Re: [Mesa-dev] [PATCH 1/6] mesa: add new entrypoints for GL_OES_viewport_array

2016-09-16 Thread Anuj Phogat
On Fri, Sep 16, 2016 at 12:55 PM, Ilia Mirkin  wrote:
> Signed-off-by: Ilia Mirkin 
> ---
>  src/mapi/glapi/gen/apiexec.py   | 12 
>  src/mapi/glapi/gen/es_EXT.xml   | 50 +
>  src/mesa/main/tests/dispatch_sanity.cpp | 11 
>  src/mesa/main/viewport.c| 12 
>  src/mesa/main/viewport.h|  6 
>  5 files changed, 85 insertions(+), 6 deletions(-)
>
> diff --git a/src/mapi/glapi/gen/apiexec.py b/src/mapi/glapi/gen/apiexec.py
> index b4f4cf6..4bdc95d 100644
> --- a/src/mapi/glapi/gen/apiexec.py
> +++ b/src/mapi/glapi/gen/apiexec.py
> @@ -133,12 +133,12 @@ functions = {
>  #
>  # Mesa does not support either of the geometry shader extensions, so
>  # OpenGL 3.2 is required.
> -"ViewportArrayv": exec_info(core=32),
> -"ViewportIndexedf": exec_info(core=32),
> -"ViewportIndexedfv": exec_info(core=32),
> -"ScissorArrayv": exec_info(core=32),
> -"ScissorIndexed": exec_info(core=32),
> -"ScissorIndexedv": exec_info(core=32),
> +"ViewportArrayv": exec_info(core=32, es2=31),
> +"ViewportIndexedf": exec_info(core=32, es2=31),
> +"ViewportIndexedfv": exec_info(core=32, es2=31),
> +"ScissorArrayv": exec_info(core=32, es2=31),
> +"ScissorIndexed": exec_info(core=32, es2=31),
> +"ScissorIndexedv": exec_info(core=32, es2=31),
>  "DepthRangeArrayv": exec_info(core=32),
>  "DepthRangeIndexed": exec_info(core=32),
>  # GetFloati_v also GL_ARB_shader_atomic_counters
> diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
> index 332dc5e..3e705eb 100644
> --- a/src/mapi/glapi/gen/es_EXT.xml
> +++ b/src/mapi/glapi/gen/es_EXT.xml
> @@ -1342,4 +1342,54 @@
>
>  
>
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> + alias="ViewportIndexedfv">
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
>  
> diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp
> index c87b1dc..0d3b6ab 100644
> --- a/src/mesa/main/tests/dispatch_sanity.cpp
> +++ b/src/mesa/main/tests/dispatch_sanity.cpp
> @@ -2613,5 +2613,16 @@ const struct function gles31_functions_possible[] = {
> /* GL_OES_primitive_bound_box */
> { "glPrimitiveBoundingBoxOES", 31, -1 },
>
> +   /* GL_OES_viewport_array */
> +   { "glViewportArrayvOES", 31, -1 },
> +   { "glViewportIndexedfOES", 31, -1 },
> +   { "glViewportIndexedfvOES", 31, -1 },
> +   { "glScissorArrayvOES", 31, -1 },
> +   { "glScissorIndexedOES", 31, -1 },
> +   { "glScissorIndexedvOES", 31, -1 },
> +   { "glDepthRangeArrayfvOES", 31, -1 },
> +   { "glDepthRangeIndexedfOES", 31, -1 },
> +   { "glGetFloati_vOES", 31, -1 },
> +
> { NULL, 0, -1 },
>   };
> diff --git a/src/mesa/main/viewport.c b/src/mesa/main/viewport.c
> index 681e46b..f6eaa0f 100644
> --- a/src/mesa/main/viewport.c
> +++ b/src/mesa/main/viewport.c
> @@ -330,6 +330,12 @@ _mesa_DepthRangeArrayv(GLuint first, GLsizei count, const GLclampd *v)
>ctx->Driver.DepthRange(ctx);
>  }
>
> +void GLAPIENTRY
> +_mesa_DepthRangeArrayfvOES(GLuint first, GLsizei count, const GLfloat *v)
> +{
> +
> +}
> +
>  /**
>   * Update a single DepthRange
>   *
> @@ -358,6 +364,12 @@ _mesa_DepthRangeIndexed(GLuint index, GLclampd nearval, GLclampd farval)
> _mesa_set_depth_range(ctx, index, nearval, farval);
>  }
>
> +void GLAPIENTRY
> +_mesa_DepthRangeIndexedfOES(GLuint index, GLfloat nearval, GLfloat farval)
> +{
> +
> +}
> +
>  /**
>   * Initialize the context viewport attribute group.
>   * \param ctx  the GL context.
> diff --git a/src/mesa/main/viewport.h b/src/mesa/main/viewport.h
> index b0675db..3951319 100644
> --- a/src/mesa/main/viewport.h
> +++ b/src/mesa/main/viewport.h
> @@ -58,8 +58,14 @@ extern void GLAPIENTRY
>  _mesa_DepthRangeArrayv(GLuint first, GLsizei count, const GLclampd * v);
>
>  extern void GLAPIENTRY
> +_mesa_DepthRangeArrayfvOES(GLuint first, GLsizei count, const GLfloat * v);
> +
> +extern void GLAPIENTRY
>  _mesa_DepthRangeIndexed(GLuint index, GLclampd n, GLclampd f);
>
> +extern void GLAPIENTRY
> +_mesa_DepthRangeIndexedfOES(GLuint index, GLfloat n, GLfloat f);
> +
>  extern void
>  _mesa_set_depth_range(struct gl_context *ctx, unsigned idx,
>GLclampd nearval, GLclampd farval);
> --
> 2.7.3
>

Patches 1-5 are: Reviewed-by: Anuj Phogat 

Re: [Mesa-dev] [PATCH 3/9] radeonsi: add si_get_shader_buffers/get_pipe_constant_buffers

2016-09-16 Thread Bas Nieuwenhuizen
On Fri, Sep 16, 2016 at 3:57 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> These functions extract the pipe state structure from the current
> descriptors, for state saving.
> ---
>  src/gallium/drivers/radeonsi/si_descriptors.c | 46 +++
>  src/gallium/drivers/radeonsi/si_state.h   |  5 +++
>  2 files changed, 51 insertions(+)
>
> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
> index b1a8594..d82910c 100644
> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
> @@ -830,20 +830,41 @@ static void si_buffer_resources_begin_new_cs(struct si_context *sctx,
> /* Add buffers to the CS. */
> while (mask) {
> int i = u_bit_scan(&mask);
>
> radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx,
>   (struct r600_resource*)buffers->buffers[i],
>   buffers->shader_usage, buffers->priority);
> }
>  }
>
> +static void si_get_buffer_from_descriptors(struct si_buffer_resources *buffers,
> +  struct si_descriptors *descs,
> +  unsigned idx, struct pipe_resource **buf,
> +  unsigned *offset, unsigned *size)
> +{
> +   pipe_resource_reference(buf, buffers->buffers[idx]);
> +   if (*buf) {
> +   struct r600_resource *res = (struct r600_resource *)buf;

I think this has to be *buf.

> +   const uint32_t *desc = descs->list + idx * 4;
> +   uint64_t va;
> +
> +   *size = desc[2];
> +
> +   assert(G_008F04_STRIDE(desc[1]) == 0);
> +   va = ((uint64_t)desc[1] << 32) | desc[0];
> +
> +   assert(va >= res->gpu_address && va + *size <= res->gpu_address + res->bo_size);
> +   *offset = va - res->gpu_address;
> +   }
> +}
> +
>  /* VERTEX BUFFERS */
>
>  static void si_vertex_buffers_begin_new_cs(struct si_context *sctx)
>  {
> struct si_descriptors *desc = &sctx->vertex_buffers;
> int count = sctx->vertex_elements ? sctx->vertex_elements->count : 0;
> int i;
>
> for (i = 0; i < count; i++) {
> int vb = sctx->vertex_elements->elements[i].vertex_buffer_index;
> @@ -1055,20 +1076,30 @@ static void si_pipe_set_constant_buffer(struct pipe_context *ctx,
> struct si_context *sctx = (struct si_context *)ctx;
>
> if (shader >= SI_NUM_SHADERS)
> return;
>
> si_set_constant_buffer(sctx, &sctx->const_buffers[shader],
>si_const_buffer_descriptors_idx(shader),
>slot, input);
>  }
>
> +void si_get_pipe_constant_buffer(struct si_context *sctx, uint shader,
> +uint slot, struct pipe_constant_buffer *cbuf)
> +{
> +   cbuf->user_buffer = NULL;
> +   si_get_buffer_from_descriptors(
> +   &sctx->const_buffers[shader],
> +   si_const_buffer_descriptors(sctx, shader),
> +   slot, &cbuf->buffer, &cbuf->buffer_offset, &cbuf->buffer_size);
> +}
> +
>  /* SHADER BUFFERS */
>
>  static unsigned
>  si_shader_buffer_descriptors_idx(enum pipe_shader_type shader)
>  {
> return SI_DESCS_FIRST_SHADER + shader * SI_NUM_SHADER_DESCS +
>SI_SHADER_DESCS_SHADER_BUFFERS;
>  }
>
>  static struct si_descriptors *
> @@ -1125,20 +1156,35 @@ static void si_set_shader_buffers(struct pipe_context *ctx,
> radeon_add_to_buffer_list_check_mem(&sctx->b, &sctx->b.gfx, buf,
> buffers->shader_usage,
> buffers->priority, true);
> buffers->enabled_mask |= 1u << slot;
> descs->dirty_mask |= 1u << slot;
> sctx->descriptors_dirty |=
> 1u << si_shader_buffer_descriptors_idx(shader);
> }
>  }
>
> +void si_get_shader_buffers(struct si_context *sctx, uint shader,
> +  uint start_slot, uint count,
> +  struct pipe_shader_buffer *sbuf)
> +{
> +   struct si_buffer_resources *buffers = &sctx->shader_buffers[shader];
> +   struct si_descriptors *descs = si_shader_buffer_descriptors(sctx, shader);
> +
> +   for (unsigned i = 0; i < count; ++i) {
> +   si_get_buffer_from_descriptors(
> +   buffers, descs, start_slot + i,
> +   &sbuf[i].buffer, &sbuf[i].buffer_offset,
> +   &sbuf[i].buffer_size);
> +   }
> +}
> +
>  /* RING BUFFERS */
>
>  void si_set_ring_buffer(struct pipe_context *ctx, uint slot,
> struct pipe_resource *buffer,
> unsigned stride, unsigned num_records,
> 
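The arithmetic in the quoted si_get_buffer_from_descriptors() can be checked in isolation. The sketch below is a hypothetical standalone model: `struct fake_resource` stands in for the two r600_resource fields the helper reads, the reference counting done by pipe_resource_reference() is omitted, and, as in the patch, the address is only rebuilt correctly when the stride field in dword 1 is zero. It also applies the dereference suggested in review: `buf` points at the caller's pointer, so it is `*buf` that refers to the resource.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the r600_resource fields used by the helper. */
struct fake_resource {
   uint64_t gpu_address;
   uint64_t bo_size;
};

/* Recover (offset, size) of a buffer binding from its 4-dword descriptor:
 * the size comes from dword 2, and the virtual address is rebuilt from
 * dwords 0 and 1 (valid only when the stride field in dword 1 is zero). */
static void
decode_buffer_desc(struct fake_resource **buf, const uint32_t desc[4],
                   unsigned *offset, unsigned *size)
{
   /* Dereference first: buf is the address of the caller's pointer. */
   const struct fake_resource *res = *buf;
   uint64_t va = ((uint64_t)desc[1] << 32) | desc[0];

   *size = desc[2];
   assert(va >= res->gpu_address &&
          va + *size <= res->gpu_address + res->bo_size);
   *offset = (unsigned)(va - res->gpu_address);
}
```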

[Mesa-dev] [PATCH 5/6] mesa: add implementations for new float depth functions

2016-09-16 Thread Ilia Mirkin
This just up-converts them to doubles. Not great, but this is what all
the other variants also do.
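A minimal model of what the new _mesa_DepthRangeArrayfvOES() does: validate the index range, then widen each float near/far pair to double before storing it. `MAX_VIEWPORTS`, `struct depth_range` and `depth_range_array_fv()` are hypothetical; the real function works on the GL context, uses ctx->Const.MaxViewports and notifies the driver afterwards. Returning -1 stands in for raising GL_INVALID_VALUE, and this sketch rejects a negative count explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VIEWPORTS 16 /* hypothetical limit for this sketch */

struct depth_range {
   double Near, Far;
};

/* Store count float (near, far) pairs starting at index first, widening
 * each value to double.  Returns 0 on success, -1 if the range is invalid. */
static int
depth_range_array_fv(struct depth_range ranges[MAX_VIEWPORTS],
                     unsigned first, int count, const float *v)
{
   if (count < 0 || (uint64_t)first + (uint64_t)count > MAX_VIEWPORTS)
      return -1;

   for (int i = 0; i < count; i++) {
      ranges[first + i].Near = (double)v[i * 2];
      ranges[first + i].Far = (double)v[i * 2 + 1];
   }
   return 0;
}
```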

Signed-off-by: Ilia Mirkin 
---
 src/mesa/main/viewport.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/viewport.c b/src/mesa/main/viewport.c
index f6eaa0f..25a5415 100644
--- a/src/mesa/main/viewport.c
+++ b/src/mesa/main/viewport.c
@@ -333,7 +333,24 @@ _mesa_DepthRangeArrayv(GLuint first, GLsizei count, const GLclampd *v)
 void GLAPIENTRY
 _mesa_DepthRangeArrayfvOES(GLuint first, GLsizei count, const GLfloat *v)
 {
+   int i;
+   GET_CURRENT_CONTEXT(ctx);
+
+   if (MESA_VERBOSE & VERBOSE_API)
+  _mesa_debug(ctx, "glDepthRangeArrayvf %d %d\n", first, count);
 
+   if ((first + count) > ctx->Const.MaxViewports) {
+  _mesa_error(ctx, GL_INVALID_VALUE,
+  "glDepthRangeArrayvf: first (%d) + count (%d) >= MaxViewports (%d)",
+  first, count, ctx->Const.MaxViewports);
+  return;
+   }
+
+   for (i = 0; i < count; i++)
+  set_depth_range_no_notify(ctx, i + first, v[i * 2], v[i * 2 + 1]);
+
+   if (ctx->Driver.DepthRange)
+  ctx->Driver.DepthRange(ctx);
 }
 
 /**
@@ -367,7 +384,7 @@ _mesa_DepthRangeIndexed(GLuint index, GLclampd nearval, GLclampd farval)
 void GLAPIENTRY
 _mesa_DepthRangeIndexedfOES(GLuint index, GLfloat nearval, GLfloat farval)
 {
-
+   _mesa_DepthRangeIndexed(index, nearval, farval);
 }
 
 /** 
-- 
2.7.3



[Mesa-dev] [PATCH 4/6] mesa: move ARB_viewport_array params to a GLES 3.1-accessible section

2016-09-16 Thread Ilia Mirkin
This is needed for GL_OES_viewport_array.

Signed-off-by: Ilia Mirkin 
---
 src/mesa/main/get_hash_params.py | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 1f63dc3..716cb57 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -613,6 +613,12 @@ descriptor=[
 
 # GL_OES_primitive_bounding_box
  [ "PRIMITIVE_BOUNDING_BOX_ARB", "CONTEXT_FLOAT8(PrimitiveBoundingBox), extra_OES_primitive_bounding_box" ],
+
+# GL_ARB_viewport_array / GL_OES_viewport_array
+  [ "MAX_VIEWPORTS", "CONTEXT_INT(Const.MaxViewports), extra_ARB_viewport_array" ],
+  [ "VIEWPORT_SUBPIXEL_BITS", "CONTEXT_INT(Const.ViewportSubpixelBits), extra_ARB_viewport_array" ],
+  [ "VIEWPORT_BOUNDS_RANGE", "CONTEXT_FLOAT2(Const.ViewportBounds), extra_ARB_viewport_array" ],
+  [ "VIEWPORT_INDEX_PROVOKING_VERTEX", "CONTEXT_ENUM(Const.LayerAndVPIndexProvokingVertex), extra_ARB_viewport_array" ],
 ]},
 
 { "apis": ["GL_CORE", "GLES32"], "params": [
@@ -938,12 +944,6 @@ descriptor=[
 
 # Enums restricted to OpenGL Core profile
 { "apis": ["GL_CORE"], "params": [
-# GL_ARB_viewport_array
-  [ "MAX_VIEWPORTS", "CONTEXT_INT(Const.MaxViewports), extra_ARB_viewport_array" ],
-  [ "VIEWPORT_SUBPIXEL_BITS", "CONTEXT_INT(Const.ViewportSubpixelBits), extra_ARB_viewport_array" ],
-  [ "VIEWPORT_BOUNDS_RANGE", "CONTEXT_FLOAT2(Const.ViewportBounds), extra_ARB_viewport_array" ],
-  [ "VIEWPORT_INDEX_PROVOKING_VERTEX", "CONTEXT_ENUM(Const.LayerAndVPIndexProvokingVertex), extra_ARB_viewport_array" ],
-
 # GL_ARB_shader_subroutine
  [ "MAX_SUBROUTINES", "CONST(MAX_SUBROUTINES), extra_ARB_shader_subroutine" ],
  [ "MAX_SUBROUTINE_UNIFORM_LOCATIONS", "CONST(MAX_SUBROUTINE_UNIFORM_LOCATIONS), extra_ARB_shader_subroutine" ],
-- 
2.7.3



[Mesa-dev] [PATCH 6/6] st/mesa: turn on OES_viewport_array when dependencies are met

2016-09-16 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 docs/features.txt  | 2 +-
 docs/relnotes/12.1.0.html  | 1 +
 src/mesa/state_tracker/st_extensions.c | 5 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/features.txt b/docs/features.txt
index df81f91..ed45e10 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -319,7 +319,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
  GL_OES_texture_half_float DONE (i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
  GL_OES_texture_half_float_linear  DONE (i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
  GL_OES_texture_view   not started - based on GL_ARB_texture_view
-  GL_OES_viewport_array not started - based on GL_ARB_viewport_array and GL_ARB_fragment_layer_viewport
+  GL_OES_viewport_array DONE (nvc0, radeonsi)
   GLX_ARB_context_flush_control not started
   GLX_ARB_robustness_application_isolation  not started
   GLX_ARB_robustness_share_group_isolation  not started
diff --git a/docs/relnotes/12.1.0.html b/docs/relnotes/12.1.0.html
index 8e0a84e..75dfb31 100644
--- a/docs/relnotes/12.1.0.html
+++ b/docs/relnotes/12.1.0.html
@@ -63,6 +63,7 @@ Note: some of the new features are only available with certain drivers.
 GL_OES_primitive_bounding_box on i965/gen7+, nvc0, radeonsi
 GL_OES_texture_cube_map_array on i965/gen8+, nvc0, radeonsi
 GL_OES_tessellation_shader on i965/gen7+, nvc0, radeonsi
+GL_OES_viewport_array on nvc0, radeonsi
 GL_ANDROID_extension_pack_es31a on i965/gen9+
 
 
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 8c7be52..2f23e8d 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -1227,6 +1227,11 @@ void st_init_extensions(struct pipe_screen *screen,
   extensions->OES_geometry_shader &&
   extensions->ARB_texture_cube_map_array;
 
+   extensions->OES_viewport_array =
+  extensions->ARB_ES3_1_compatibility &&
+  extensions->OES_geometry_shader &&
+  extensions->ARB_viewport_array;
+
extensions->OES_primitive_bounding_box = 
extensions->ARB_ES3_1_compatibility;
consts->NoPrimitiveBoundingBoxOutput = true;
 }
-- 
2.7.3



[Mesa-dev] [PATCH 2/6] glsl: add OES_viewport_array enables and use them to expose gl_ViewportIndex

2016-09-16 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/compiler/glsl/builtin_variables.cpp | 9 ++---
 src/compiler/glsl/glsl_parser_extras.h  | 2 ++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/compiler/glsl/builtin_variables.cpp b/src/compiler/glsl/builtin_variables.cpp
index f47daab..3bd8c16 100644
--- a/src/compiler/glsl/builtin_variables.cpp
+++ b/src/compiler/glsl/builtin_variables.cpp
@@ -876,7 +876,8 @@ builtin_variable_generator::generate_constants()
}
 
if (state->is_version(410, 0) ||
-   state->ARB_viewport_array_enable)
+   state->ARB_viewport_array_enable ||
+   state->OES_viewport_array_enable)
   add_const("gl_MaxViewports", state->Const.MaxViewports);
 
if (state->has_tessellation_shader()) {
@@ -1086,7 +1087,8 @@ builtin_variable_generator::generate_gs_special_vars()
 
var = add_output(VARYING_SLOT_LAYER, int_t, "gl_Layer");
var->data.interpolation = INTERP_MODE_FLAT;
-   if (state->is_version(410, 0) || state->ARB_viewport_array_enable) {
+   if (state->is_version(410, 0) || state->ARB_viewport_array_enable ||
+   state->OES_viewport_array_enable) {
   var = add_output(VARYING_SLOT_VIEWPORT, int_t, "gl_ViewportIndex");
   var->data.interpolation = INTERP_MODE_FLAT;
}
@@ -1216,7 +1218,8 @@ builtin_variable_generator::generate_fs_special_vars()
}
 
if (state->is_version(430, 0) ||
-   state->ARB_fragment_layer_viewport_enable) {
+   state->ARB_fragment_layer_viewport_enable ||
+   state->OES_viewport_array_enable) {
   var = add_input(VARYING_SLOT_VIEWPORT, int_t, "gl_ViewportIndex");
   var->data.interpolation = INTERP_MODE_FLAT;
}
diff --git a/src/compiler/glsl/glsl_parser_extras.h b/src/compiler/glsl/glsl_parser_extras.h
index 027b97e..f10525c 100644
--- a/src/compiler/glsl/glsl_parser_extras.h
+++ b/src/compiler/glsl/glsl_parser_extras.h
@@ -696,6 +696,8 @@ struct _mesa_glsl_parse_state {
bool OES_texture_cube_map_array_warn;
bool OES_texture_storage_multisample_2d_array_enable;
bool OES_texture_storage_multisample_2d_array_warn;
+   bool OES_viewport_array_enable;
+   bool OES_viewport_array_warn;
 
/* All other extensions go here, sorted alphabetically.
 */
-- 
2.7.3



[Mesa-dev] [PATCH 3/6] mesa: add GL_OES_viewport_array to the extension string

2016-09-16 Thread Ilia Mirkin
The expectation is that drivers will set this based on
OES_geometry_shader and ARB_viewport_array support. This is a separate
enable for the same reason as OES_texture_cube_map_array.

Signed-off-by: Ilia Mirkin 
---
 src/compiler/glsl/glsl_parser_extras.cpp | 1 +
 src/mesa/main/extensions_table.h | 1 +
 src/mesa/main/mtypes.h   | 1 +
 3 files changed, 3 insertions(+)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp
index 0e9bfa7..e5a8e0c 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -653,6 +653,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
EXT(OES_texture_buffer),
EXT(OES_texture_cube_map_array),
EXT_AEP(OES_texture_storage_multisample_2d_array),
+   EXT(OES_viewport_array),
 
/* All other extensions go here, sorted alphabetically.
 */
diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
index 1f8da7e..0ce8d4a 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -389,6 +389,7 @@ EXT(OES_texture_npot, ARB_texture_non_power_of_two
 EXT(OES_texture_stencil8, ARB_texture_stencil8   ,  x ,  x ,  x ,  30, 2014)
 EXT(OES_texture_storage_multisample_2d_array, ARB_texture_multisample  ,  x ,  x ,  x ,  31, 2014)
 EXT(OES_vertex_array_object , dummy_true   ,  x ,  x , ES1, ES2, 2010)
+EXT(OES_viewport_array  , OES_viewport_array   ,  x ,  x ,  x ,  31, 2010)
 
 EXT(S3_s3tc , ANGLE_texture_compression_dxt  , GLL, GLC,  x ,  x , 1999)
 
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index d00829c..88025d6 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3951,6 +3951,7 @@ struct gl_extensions
GLboolean OES_standard_derivatives;
GLboolean OES_texture_buffer;
GLboolean OES_texture_cube_map_array;
+   GLboolean OES_viewport_array;
/* vendor extensions */
GLboolean AMD_performance_monitor;
GLboolean AMD_pinned_memory;
-- 
2.7.3



[Mesa-dev] [PATCH 1/6] mesa: add new entrypoints for GL_OES_viewport_array

2016-09-16 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/mapi/glapi/gen/apiexec.py   | 12 
 src/mapi/glapi/gen/es_EXT.xml   | 50 +
 src/mesa/main/tests/dispatch_sanity.cpp | 11 
 src/mesa/main/viewport.c| 12 
 src/mesa/main/viewport.h|  6 
 5 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/src/mapi/glapi/gen/apiexec.py b/src/mapi/glapi/gen/apiexec.py
index b4f4cf6..4bdc95d 100644
--- a/src/mapi/glapi/gen/apiexec.py
+++ b/src/mapi/glapi/gen/apiexec.py
@@ -133,12 +133,12 @@ functions = {
 #
 # Mesa does not support either of the geometry shader extensions, so
 # OpenGL 3.2 is required.
-"ViewportArrayv": exec_info(core=32),
-"ViewportIndexedf": exec_info(core=32),
-"ViewportIndexedfv": exec_info(core=32),
-"ScissorArrayv": exec_info(core=32),
-"ScissorIndexed": exec_info(core=32),
-"ScissorIndexedv": exec_info(core=32),
+"ViewportArrayv": exec_info(core=32, es2=31),
+"ViewportIndexedf": exec_info(core=32, es2=31),
+"ViewportIndexedfv": exec_info(core=32, es2=31),
+"ScissorArrayv": exec_info(core=32, es2=31),
+"ScissorIndexed": exec_info(core=32, es2=31),
+"ScissorIndexedv": exec_info(core=32, es2=31),
 "DepthRangeArrayv": exec_info(core=32),
 "DepthRangeIndexed": exec_info(core=32),
 # GetFloati_v also GL_ARB_shader_atomic_counters
diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index 332dc5e..3e705eb 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -1342,4 +1342,54 @@
 
 
 
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp
index c87b1dc..0d3b6ab 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -2613,5 +2613,16 @@ const struct function gles31_functions_possible[] = {
/* GL_OES_primitive_bound_box */
{ "glPrimitiveBoundingBoxOES", 31, -1 },
 
+   /* GL_OES_viewport_array */
+   { "glViewportArrayvOES", 31, -1 },
+   { "glViewportIndexedfOES", 31, -1 },
+   { "glViewportIndexedfvOES", 31, -1 },
+   { "glScissorArrayvOES", 31, -1 },
+   { "glScissorIndexedOES", 31, -1 },
+   { "glScissorIndexedvOES", 31, -1 },
+   { "glDepthRangeArrayfvOES", 31, -1 },
+   { "glDepthRangeIndexedfOES", 31, -1 },
+   { "glGetFloati_vOES", 31, -1 },
+
{ NULL, 0, -1 },
  };
diff --git a/src/mesa/main/viewport.c b/src/mesa/main/viewport.c
index 681e46b..f6eaa0f 100644
--- a/src/mesa/main/viewport.c
+++ b/src/mesa/main/viewport.c
@@ -330,6 +330,12 @@ _mesa_DepthRangeArrayv(GLuint first, GLsizei count, const GLclampd *v)
   ctx->Driver.DepthRange(ctx);
 }
 
+void GLAPIENTRY
+_mesa_DepthRangeArrayfvOES(GLuint first, GLsizei count, const GLfloat *v)
+{
+
+}
+
 /**
  * Update a single DepthRange
  *
@@ -358,6 +364,12 @@ _mesa_DepthRangeIndexed(GLuint index, GLclampd nearval, GLclampd farval)
_mesa_set_depth_range(ctx, index, nearval, farval);
 }
 
+void GLAPIENTRY
+_mesa_DepthRangeIndexedfOES(GLuint index, GLfloat nearval, GLfloat farval)
+{
+
+}
+
 /** 
  * Initialize the context viewport attribute group.
  * \param ctx  the GL context.
diff --git a/src/mesa/main/viewport.h b/src/mesa/main/viewport.h
index b0675db..3951319 100644
--- a/src/mesa/main/viewport.h
+++ b/src/mesa/main/viewport.h
@@ -58,8 +58,14 @@ extern void GLAPIENTRY
 _mesa_DepthRangeArrayv(GLuint first, GLsizei count, const GLclampd * v);
 
 extern void GLAPIENTRY
+_mesa_DepthRangeArrayfvOES(GLuint first, GLsizei count, const GLfloat * v);
+
+extern void GLAPIENTRY
 _mesa_DepthRangeIndexed(GLuint index, GLclampd n, GLclampd f);
 
+extern void GLAPIENTRY
+_mesa_DepthRangeIndexedfOES(GLuint index, GLfloat n, GLfloat f);
+
 extern void
 _mesa_set_depth_range(struct gl_context *ctx, unsigned idx,
   GLclampd nearval, GLclampd farval);
-- 
2.7.3



Re: [Mesa-dev] [PATCH] st/mesa: only enable MSAA coverage options when we have a MSAA buffer

2016-09-16 Thread Roland Scheidegger
We can't change how gallium is supposed to behave, since other APIs rely
on alpha-to-coverage working even if MSAA is disabled.

Roland
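The disagreement is about where the multisample condition should live. A minimal sketch of the GL-side rule from the quoted patch, with hypothetical struct and field names (in Mesa the inputs come from ctx->Multisample and ctx->DrawBuffer->Visual.sampleBuffers): gallium's blend state, like D3D10's, applies alpha-to-coverage unconditionally, so under this approach the state tracker folds the GL condition in before filling the blend state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified GL inputs. */
struct gl_ms_state {
   bool multisample_enabled;      /* GL_MULTISAMPLE */
   bool sample_alpha_to_coverage; /* GL_SAMPLE_ALPHA_TO_COVERAGE */
   bool have_draw_buffer;
   int sample_buffers;            /* > 0 for a multisample draw buffer */
};

/* Value to place in the (ungated) gallium blend state: alpha-to-coverage
 * takes effect only with GL_MULTISAMPLE enabled *and* an MSAA draw buffer. */
static bool
effective_alpha_to_coverage(const struct gl_ms_state *s)
{
   return s->multisample_enabled &&
          s->have_draw_buffer &&
          s->sample_buffers > 0 &&
          s->sample_alpha_to_coverage;
}
```

Because the result now depends on the draw buffer, a framebuffer change must also re-validate blend state, which is what adding ST_NEW_BLEND on _NEW_BUFFERS does in the quoted st_context.c hunk.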

Am 16.09.2016 um 18:58 schrieb Ilia Mirkin:
> FTR, the new piglit test passed as-is on NVIDIA hw (at least nv50 and
> nvc0). I'm not opposed to this new state dependency if Marek isn't
> (he's analyzed these things a whole lot more than I suspect anyone
> else), but just wanted to point it out in case the preference is to
> instead change how gallium is supposed to behave.
> 
> Cheers,
> 
>   -ilia
> 
> On Thu, Sep 15, 2016 at 5:20 PM, Brian Paul  wrote:
>> Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default)
>> we should not set the alpha_to_coverage or alpha_to_one flags if the
>> current drawing buffer does not do MSAA.
>>
>> This fixes the new piglit gl-1.3-alpha_to_coverage_nop test.
>> ---
>>  src/mesa/state_tracker/st_atom_blend.c | 9 ++---
>>  src/mesa/state_tracker/st_context.c| 3 ++-
>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_atom_blend.c b/src/mesa/state_tracker/st_atom_blend.c
>> index 65de67b..67e 100644
>> --- a/src/mesa/state_tracker/st_atom_blend.c
>> +++ b/src/mesa/state_tracker/st_atom_blend.c
>> @@ -265,9 +265,12 @@ update_blend( struct st_context *st )
>>
>> blend->dither = ctx->Color.DitherFlag;
>>
>> -   if (ctx->Multisample.Enabled) {
>> -  /* unlike in gallium/d3d10 these operations are only performed
>> - if msaa is enabled */
>> +   if (ctx->Multisample.Enabled &&
>> +   ctx->DrawBuffer &&
>> +   ctx->DrawBuffer->Visual.sampleBuffers > 0) {
>> +  /* Unlike in gallium/d3d10 these operations are only performed
>> +   * if both msaa is enabled and we have a multisample buffer.
>> +   */
>>blend->alpha_to_coverage = ctx->Multisample.SampleAlphaToCoverage;
>>blend->alpha_to_one = ctx->Multisample.SampleAlphaToOne;
>> }
>> diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
>> index ddc11a4..81b3387 100644
>> --- a/src/mesa/state_tracker/st_context.c
>> +++ b/src/mesa/state_tracker/st_context.c
>> @@ -166,7 +166,8 @@ void st_invalidate_state(struct gl_context * ctx, GLbitfield new_state)
>> struct st_context *st = st_context(ctx);
>>
>> if (new_state & _NEW_BUFFERS) {
>> -  st->dirty |= ST_NEW_DSA |
>> +  st->dirty |= ST_NEW_BLEND |
>> +   ST_NEW_DSA |
>> ST_NEW_FB_STATE |
>> ST_NEW_SAMPLE_MASK |
>> ST_NEW_SAMPLE_SHADING |
>> --
>> 1.9.1
>>



Re: [Mesa-dev] [PATCH 2/3] glx/glvnd: Fix dispatch function names and indices

2016-09-16 Thread Emil Velikov
On 14 September 2016 at 19:06, Adam Jackson  wrote:
> As this array was not actually sorted, FindGLXFunction's binary search
> would only sometimes work.
>
This commit message is a bit iffy; then again, most of this and
g_glxglvnddispatchfuncs.c is dead code.

AFAICT the sole reason behind this file is to have the vendor driver
call back into libGLX in order to manage (add/remove) the relevant
fbconfig/drawable/context to vendor mappings. From a quick search, only
~5 of the 30+ functions do that.

Everyone else calls back into the vendor library to get the correct
vendor (dispatch) and then, using it, dives via the vendor-neutral library
_directly_ back into itself (see the generated g_libglglxwrapper.c
file) to get the entry point via the getProcAddress callback; see
__glXFindVendorDispatchAddress(). Thus, as-is, we have at least three
unneeded indirections in some 20+ entry points... right?

Emil
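For readers following along, the failure mode in Adam's commit message can be illustrated with a small sketch: a binary search over a name table only works if the table is sorted under the same comparison the search uses. Everything below is hypothetical (the `glx_func` type, the table contents and `find_glx_function()` are not glvnd's actual code); it uses the standard C `bsearch()` plus a debug-time sortedness check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type; the real glvnd dispatch table differs. */
struct glx_func {
   const char *name;
   int index;
};

/* Must be sorted in strcmp() order for bsearch() to be reliable. */
static const struct glx_func funcs[] = {
   { "glXChooseFBConfig", 0 },
   { "glXCreateContext",  1 },
   { "glXGetFBConfigs",   2 },
   { "glXMakeCurrent",    3 },
};

/* bsearch() passes the key first and the array element second. */
static int
compare_func(const void *key, const void *elem)
{
   const char *name = key;
   const struct glx_func *f = elem;
   return strcmp(name, f->name);
}

/* Debug helper: verify the invariant the binary search relies on. */
static bool
table_is_sorted(const struct glx_func *t, size_t n)
{
   for (size_t i = 1; i < n; i++) {
      if (strcmp(t[i - 1].name, t[i].name) >= 0)
         return false;
   }
   return true;
}

/* Returns the index stored for `name`, or -1 if it is not in the table. */
static int
find_glx_function(const char *name)
{
   const size_t n = sizeof(funcs) / sizeof(funcs[0]);
   const struct glx_func *f;

   assert(table_is_sorted(funcs, n));
   f = bsearch(name, funcs, n, sizeof(funcs[0]), compare_func);
   return f ? f->index : -1;
}
```

With an unsorted table the assert fires in debug builds instead of the lookup silently failing for some names.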


Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v3)

2016-09-16 Thread Emil Velikov
On 14 September 2016 at 14:59, Adam Jackson  wrote:
> From: Kyle Brenneman 
>
> This decorates every EGL entrypoint with _EGL_FUNC_START, which records
> the function name and primary dispatch object label in the current
> thread state. It also adds debug report functions and calls them when
> appropriate.
>
> This would be useful enough for debugging on its own, if the user set a
> breakpoint when the report function was called. We will also need this
> state tracked in order to expose EGL_KHR_debug.
>
> v2:
> - Clear the object label in more cases in _eglSetFuncName
> - Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval
>
> v3:
> - Set dummy thread's CurrentAPI to EGL_OPENGL_ES_API not zero
Maybe (only maybe) we want this as a one-line fix for stable.

But regardless, this and v2 14/14 look a lot better and are
Reviewed-by: Emil Velikov 

Thanks for addressing all the comments, Adam!
Emil
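As a rough illustration of the thread-state tracking described in the quoted commit message, here is a minimal sketch assuming C11 `_Thread_local`. The names (`egl_thread_debug`, `FUNC_START`, `report_error`, `fake_eglSwapInterval`) are hypothetical stand-ins, not Mesa's `_EGL_FUNC_START` or its thread info struct, and the object label is a plain string here for readability. It only shows the idea: record the entry point name and object label on entry, so a later report (a convenient breakpoint target) can use them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread debug state. */
struct egl_thread_debug {
   const char *func_name;      /* entry point currently executing */
   const char *object_label;   /* label of the primary object, if any */
};

static _Thread_local struct egl_thread_debug thread_debug;

/* Hypothetical stand-in for _EGL_FUNC_START: record where we are. */
#define FUNC_START(label)                     \
   do {                                       \
      thread_debug.func_name = __func__;      \
      thread_debug.object_label = (label);    \
   } while (0)

/* Formats a report using the recorded state; returns snprintf()'s result. */
static int
report_error(char *buf, size_t size, const char *msg)
{
   assert(msg != NULL);
   return snprintf(buf, size, "%s: %s (object %s)",
                   thread_debug.func_name ? thread_debug.func_name : "?",
                   msg,
                   thread_debug.object_label ? thread_debug.object_label
                                             : "none");
}

/* Hypothetical entry point showing the intended call pattern. */
static int
fake_eglSwapInterval(const char *draw_label, char *buf, size_t size)
{
   FUNC_START(draw_label);
   return report_error(buf, size, "bad interval");
}
```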


[Mesa-dev] [PATCH v2 5/7] egl/wayland: introduce dri2_wl_add_configs_for_visuals() helper

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Analogous to the previous commits, with an extra bonus.

Apart from not accounting for the lack of 'per visual' and overall
configs, the current code also overwrites the newly added config.

Namely, if the dpy supports two or more of the supported formats
(XRGB, ARGB and RGB565), earlier configs will be overwritten
and only the final one will be stored, since we use the same index
for all three in our dri2_add_config call.
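A toy model of the ID collision described above, under the assumption (taken from the commit message, not from the real dri2_add_config()) that adding a config with an existing ID replaces the earlier entry. `config_store`, `add_config()` and the two helpers are hypothetical:

```c
#include <assert.h>

#define MAX_CONFIGS 16

/* Hypothetical config store keyed by ID: format[id] == 0 means empty. */
struct config_store {
   int format[MAX_CONFIGS + 1];
};

static void
add_config(struct config_store *s, int id, int format)
{
   assert(id > 0 && id <= MAX_CONFIGS);
   s->format[id] = format;   /* same ID: the earlier entry is replaced */
}

static int
count_configs(const struct config_store *s)
{
   int n = 0;
   for (int id = 1; id <= MAX_CONFIGS; id++)
      n += s->format[id] != 0;
   return n;
}

/* Old scheme: the ID depends only on the driver config index i. */
static void
add_with_index_ids(struct config_store *s, int num_driver_configs,
                   int num_formats)
{
   for (int i = 0; i < num_driver_configs; i++)
      for (int f = 1; f <= num_formats; f++)
         add_config(s, i + 1, f);
}

/* New scheme: a running count gives every added config its own ID. */
static void
add_with_sequential_ids(struct config_store *s, int num_driver_configs,
                        int num_formats)
{
   int count = 0;
   for (int i = 0; i < num_driver_configs; i++) {
      for (int f = 1; f <= num_formats; f++) {
         add_config(s, count + 1, f);
         count++;
      }
   }
}
```

With 4 driver configs and 3 formats, the index-based scheme ends up with 4 entries (each holding the last format), while the sequential scheme keeps all 12.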

v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/platform_wayland.c | 83 -
 1 file changed, 51 insertions(+), 32 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_wayland.c b/src/egl/drivers/dri2/platform_wayland.c
index 63edf2e..726a458 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -1083,16 +1083,52 @@ static const __DRIextension *image_loader_extensions[] = {
 };
 
 static EGLBoolean
+dri2_wl_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *disp)
+{
+   struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
+   static const struct {
+  const char *format_name;
+  int has_format;
+  unsigned int rgba_masks[4];
+   } visuals[] = {
+  { "XRGB", HAS_XRGB, { 0xff, 0xff00, 0x00ff, 0xff00 } },
+  { "ARGB", HAS_ARGB, { 0xff, 0xff00, 0x00ff, 0 } },
+  { "RGB565",   HAS_RGB565,   { 0x00f800, 0x07e0, 0x001f, 0 } },
+   };
+   unsigned int format_count[ARRAY_SIZE(visuals)] = { 0 };
+   unsigned int count, i, j;
+
+   count = 0;
+   for (i = 0; dri2_dpy->driver_configs[i]; i++) {
+  for (j = 0; j < ARRAY_SIZE(visuals); j++) {
+ struct dri2_egl_config *dri2_conf;
+
+ if (!(dri2_dpy->formats & visuals[j].has_format))
+continue;
+
+ dri2_conf = dri2_add_config(disp, dri2_dpy->driver_configs[i],
+   count + 1, EGL_WINDOW_BIT, NULL, visuals[j].rgba_masks);
+ if (dri2_conf) {
+count++;
+format_count[j]++;
+ }
+  }
+   }
+
+   for (i = 0; i < ARRAY_SIZE(format_count); i++) {
+  if (!format_count[i]) {
+ _eglLog(_EGL_DEBUG, "No DRI config supports native format %s",
+ visuals[i].format_name);
+  }
+   }
+
+   return (count != 0);
+}
+
+static EGLBoolean
 dri2_initialize_wayland_drm(_EGLDriver *drv, _EGLDisplay *disp)
 {
struct dri2_egl_display *dri2_dpy;
-   const __DRIconfig *config;
-   uint32_t types;
-   int i;
-   static const unsigned int argb_masks[4] =
-  { 0xff, 0xff00, 0xff, 0xff00 };
-   static const unsigned int rgb_masks[4] = { 0xff, 0xff00, 0xff, 0 };
-   static const unsigned int rgb565_masks[4] = { 0xf800, 0x07e0, 0x001f, 0 };
 
loader_set_logger(_eglLog);
 
@@ -1195,15 +1231,9 @@ dri2_initialize_wayland_drm(_EGLDriver *drv, _EGLDisplay *disp)
   goto cleanup_screen;
}
 
-   types = EGL_WINDOW_BIT;
-   for (i = 0; dri2_dpy->driver_configs[i]; i++) {
-  config = dri2_dpy->driver_configs[i];
-  if (dri2_dpy->formats & HAS_XRGB)
-dri2_add_config(disp, config, i + 1, types, NULL, rgb_masks);
-  if (dri2_dpy->formats & HAS_ARGB)
-dri2_add_config(disp, config, i + 1, types, NULL, argb_masks);
-  if (dri2_dpy->formats & HAS_RGB565)
-dri2_add_config(disp, config, i + 1, types, NULL, rgb565_masks);
+   if (!dri2_wl_add_configs_for_visuals(drv, disp)) {
+  _eglError(EGL_NOT_INITIALIZED, "DRI2: failed to add configs");
+  goto cleanup_screen;
}
 
dri2_set_WL_bind_wayland_display(drv, disp);
@@ -1816,13 +1846,6 @@ static EGLBoolean
 dri2_initialize_wayland_swrast(_EGLDriver *drv, _EGLDisplay *disp)
 {
struct dri2_egl_display *dri2_dpy;
-   const __DRIconfig *config;
-   uint32_t types;
-   int i;
-   static const unsigned int argb_masks[4] =
-  { 0xff, 0xff00, 0xff, 0xff00 };
-   static const unsigned int rgb_masks[4] = { 0xff, 0xff00, 0xff, 0 };
-   static const unsigned int rgb565_masks[4] = { 0xf800, 0x07e0, 0x001f, 0 };
 
loader_set_logger(_eglLog);
 
@@ -1869,15 +1892,9 @@ dri2_initialize_wayland_swrast(_EGLDriver *drv, _EGLDisplay *disp)
 
dri2_wl_setup_swap_interval(dri2_dpy);
 
-   types = EGL_WINDOW_BIT;
-   for (i = 0; dri2_dpy->driver_configs[i]; i++) {
-  config = dri2_dpy->driver_configs[i];
-  if (dri2_dpy->formats & HAS_XRGB)
-dri2_add_config(disp, config, i + 1, types, NULL, rgb_masks);
-  if (dri2_dpy->formats & HAS_ARGB)
-dri2_add_config(disp, config, i + 1, types, NULL, argb_masks);
-  if (dri2_dpy->formats & HAS_RGB565)
-dri2_add_config(disp, config, i + 1, types, NULL, rgb565_masks);
+   if (!dri2_wl_add_configs_for_visuals(drv, disp)) {
+  _eglError(EGL_NOT_INITIALIZED, "DRI2: failed to add configs");
+  goto cleanup_screen;
}
 
/* Fill 

[Mesa-dev] [PATCH v2 7/7] egl/drm: set eglError and provide an error message on failure

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

v2: Remove gratuitous newline/semicolon (Eric)

Signed-off-by: Emil Velikov 
Reviewed-by: Eric Engestrom 
---
 src/egl/drivers/dri2/platform_drm.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_drm.c b/src/egl/drivers/dri2/platform_drm.c
index 168c9dc..ea1a7f1 100644
--- a/src/egl/drivers/dri2/platform_drm.c
+++ b/src/egl/drivers/dri2/platform_drm.c
@@ -657,6 +657,7 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
 {
struct dri2_egl_display *dri2_dpy;
struct gbm_device *gbm;
+   const char *err;
int fd = -1;
 
loader_set_logger(_eglLog);
@@ -677,20 +678,28 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
  fd = loader_open_device("/dev/dri/card0");
   dri2_dpy->own_device = 1;
   gbm = gbm_create_device(fd);
-  if (gbm == NULL)
+  if (gbm == NULL) {
+ err = "DRI2: failed to create gbm device";
  goto cleanup;
+  }
} else {
   fd = fcntl(gbm_device_get_fd(gbm), F_DUPFD_CLOEXEC, 3);
-  if (fd < 0)
+  if (fd < 0) {
+ err = "DRI2: failed to fcntl() existing gbm device";
  goto cleanup;
+  }
}
 
-   if (strcmp(gbm_device_get_backend_name(gbm), "drm") != 0)
+   if (strcmp(gbm_device_get_backend_name(gbm), "drm") != 0) {
+  err = "DRI2: gbm device using incorrect/incompatible backend";
   goto cleanup;
+   }
 
dri2_dpy->gbm_dri = gbm_dri_device(gbm);
-   if (dri2_dpy->gbm_dri->base.type != GBM_DRM_DRIVER_TYPE_DRI)
+   if (dri2_dpy->gbm_dri->base.type != GBM_DRM_DRIVER_TYPE_DRI) {
+  err = "DRI2: gbm device using incorrect/incompatible type";
   goto cleanup;
+   }
 
dri2_dpy->fd = fd;
dri2_dpy->driver_name = strdup(dri2_dpy->gbm_dri->base.driver_name);
@@ -721,7 +730,7 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
dri2_setup_screen(disp);
 
if (!drm_add_configs_for_visuals(drv, disp)) {
-  _eglError(EGL_NOT_INITIALIZED, "DRI2: failed to add configs");
+  err = "DRI2: failed to add configs";
   goto cleanup;
}
 
@@ -747,5 +756,5 @@ cleanup:
 
free(dri2_dpy);
disp->DriverData = NULL;
-   return EGL_FALSE;
+   return _eglError(EGL_NOT_INITIALIZED, err);
 }
-- 
2.9.3
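The single-exit error pattern this patch adopts (record a message on each failure path, report it once at the cleanup label) can be sketched as follows. `report_init_error()` is a hypothetical stand-in for `_eglError()`, whose return value the final hunk passes straight back to the caller; `initialize_display()` and its flags are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Last reported message, so callers (and tests) can inspect it. */
static const char *last_error;

/* Hypothetical error sink: records and prints the message, returns false. */
static bool
report_init_error(const char *msg)
{
   assert(msg != NULL);
   last_error = msg;
   fprintf(stderr, "%s\n", msg);
   return false;
}

/* Each failure path sets a specific message; a single exit reports it. */
static bool
initialize_display(bool have_device, bool backend_ok, bool configs_ok)
{
   const char *err;

   if (!have_device) {
      err = "DRI2: failed to create gbm device";
      goto cleanup;
   }
   if (!backend_ok) {
      err = "DRI2: gbm device using incorrect/incompatible backend";
      goto cleanup;
   }
   if (!configs_ok) {
      err = "DRI2: failed to add configs";
      goto cleanup;
   }
   last_error = NULL;
   return true;

cleanup:
   /* release any partially initialized resources here */
   return report_init_error(err);
}
```

The benefit is that every early exit now produces both an error code and a message, without repeating the reporting call at each site.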



[Mesa-dev] [PATCH v2 6/7] egl/x11: attribute for dri2_add_config failure

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

... in dri2_x11_add_configs_for_visuals().

Currently the latter does not take such failures into account, so in
those cases it adds "empty" configs to the list.

Account for this properly and, while doing so, reuse count instead
of calling _eglGetArraySize to determine whether we've added any
configs.
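A minimal sketch of the counting scheme described above, with a hypothetical `try_add_config()` standing in for dri2_add_config() (it pretends odd-numbered driver configs are rejected). IDs come from the success count, so rejected configs leave no gaps, and a zero count signals failure:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for dri2_add_config() returning non-NULL. */
static bool
try_add_config(int driver_config, int id)
{
   assert(id > 0);
   return driver_config % 2 == 0;   /* pretend odd configs are rejected */
}

/* Returns how many configs were actually added and reports the last ID. */
static int
add_configs(int num_driver_configs, int *last_id)
{
   int count = 0;

   for (int j = 0; j < num_driver_configs; j++) {
      if (try_add_config(j, count + 1)) {
         count++;
         *last_id = count;
      }
   }
   return count;   /* 0 means "failed to create any config" */
}
```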

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/platform_x11.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c
index 0b1b514..2921147 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -723,7 +723,7 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
xcb_screen_iterator_t s;
xcb_depth_iterator_t d;
xcb_visualtype_t *visuals;
-   int i, j, id;
+   int i, j, count;
unsigned int rgba_masks[4];
EGLint surface_type;
EGLint config_attrs[] = {
@@ -734,7 +734,7 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
 
s = xcb_setup_roots_iterator(xcb_get_setup(dri2_dpy->conn));
d = xcb_screen_allowed_depths_iterator(get_xcb_screen(s, dri2_dpy->screen));
-   id = 1;
+   count = 0;
 
surface_type =
   EGL_WINDOW_BIT |
@@ -754,6 +754,9 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
 
 class_added[visuals[i]._class] = EGL_TRUE;
 for (j = 0; dri2_dpy->driver_configs[j]; j++) {
+struct dri2_egl_config *dri2_conf;
+const __DRIconfig *config = dri2_dpy->driver_configs[j];
+
 config_attrs[1] = visuals[i].visual_id;
 config_attrs[3] = visuals[i]._class;
 
@@ -761,8 +764,10 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
 rgba_masks[1] = visuals[i].green_mask;
 rgba_masks[2] = visuals[i].blue_mask;
 rgba_masks[3] = 0;
-   dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
-   surface_type, config_attrs, rgba_masks);
+dri2_conf = dri2_add_config(disp, config, count + 1, surface_type,
+config_attrs, rgba_masks);
+if (dri2_conf)
+   count++;
 
 /* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig.
  * Otherwise it will only match a 32-bit RGBA visual.  On a
@@ -774,8 +779,10 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
 if (d.data->depth == 24) {
rgba_masks[3] =
   ~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
-   dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
-   surface_type, config_attrs, rgba_masks);
+   dri2_conf = dri2_add_config(disp, config, count + 1, surface_type,
+   config_attrs, rgba_masks);
+   if (dri2_conf)
+  count++;
 }
 }
   }
@@ -783,7 +790,7 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
   xcb_depth_next();
}
 
-   if (!_eglGetArraySize(disp->Configs)) {
+   if (!count) {
   _eglLog(_EGL_WARNING, "DRI2: failed to create any config");
   return EGL_FALSE;
}
-- 
2.9.3



[Mesa-dev] [PATCH v2 4/7] egl/surfaceless: tweak surfaceless_add_configs_for_visuals()

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Analogous to the previous commit.

v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov 
Reviewed-by: Gurchetan Singh 
---
 src/egl/drivers/dri2/platform_surfaceless.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_surfaceless.c b/src/egl/drivers/dri2/platform_surfaceless.c
index 9e2aa7c..3fc1a68 100644
--- a/src/egl/drivers/dri2/platform_surfaceless.c
+++ b/src/egl/drivers/dri2/platform_surfaceless.c
@@ -189,25 +189,26 @@ surfaceless_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *dpy)
   { "RGB888",   { 0xff, 0xff00, 0xff, 0x0 } },
   { "RGB565",   { 0x00f800, 0x07e0, 0x1f, 0x0 } },
};
+   unsigned int format_count[ARRAY_SIZE(visuals)] = { 0 };
unsigned int count, i, j;
 
count = 0;
-   for (i = 0; i < ARRAY_SIZE(visuals); i++) {
-  int format_count = 0;
-
-  for (j = 0; dri2_dpy->driver_configs[j]; j++) {
+   for (i = 0; dri2_dpy->driver_configs[i]; i++) {
+  for (j = 0; j < ARRAY_SIZE(visuals); j++) {
  struct dri2_egl_config *dri2_conf;
 
- dri2_conf = dri2_add_config(dpy, dri2_dpy->driver_configs[j],
-   count + 1, EGL_PBUFFER_BIT, NULL, visuals[i].rgba_masks);
+ dri2_conf = dri2_add_config(dpy, dri2_dpy->driver_configs[i],
+   count + 1, EGL_PBUFFER_BIT, NULL, visuals[j].rgba_masks);
 
  if (dri2_conf) {
 count++;
-format_count++;
+format_count[j]++;
  }
   }
+   }
 
-  if (!format_count) {
+   for (i = 0; i < ARRAY_SIZE(format_count); i++) {
+  if (!format_count[i]) {
  _eglLog(_EGL_DEBUG, "No DRI config supports native format %s",
visuals[i].format_name);
   }
-- 
2.9.3



[Mesa-dev] [PATCH v2 3/7] egl/android: tweak droid_add_configs_for_visuals()

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Iterate over the driver_configs first in order to cut down the number of
getConfigAttrib() calls by a factor of 5.

While we're here, also drop the sentinel of the visuals array. We
already know its size so we can use that and save a few bytes.
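To make the "factor of 5" concrete, here is a toy model of the loop interchange, assuming 5 visuals and a hypothetical `query_double_buffered()` in place of getConfigAttrib(). It counts the simulated attribute queries for both loop orders:

```c
#include <assert.h>

#define NUM_VISUALS 5

static int attrib_queries;   /* number of simulated getConfigAttrib() calls */

/* Hypothetical stand-in for querying __DRI_ATTRIB_DOUBLE_BUFFER. */
static int
query_double_buffered(int driver_config)
{
   assert(driver_config >= 0);
   attrib_queries++;
   return driver_config % 2;   /* pretend odd configs are double buffered */
}

/* Old order: visuals outer, so the attribute is re-queried per visual. */
static int
count_old_order(int num_configs)
{
   int added = 0;
   for (int v = 0; v < NUM_VISUALS; v++)
      for (int c = 0; c < num_configs; c++)
         if (query_double_buffered(c))
            added++;
   return added;
}

/* New order: configs outer, so each config is queried exactly once and
 * single-buffered configs skip the visuals loop entirely. */
static int
count_new_order(int num_configs)
{
   int added = 0;
   for (int c = 0; c < num_configs; c++) {
      if (!query_double_buffered(c))
         continue;
      for (int v = 0; v < NUM_VISUALS; v++)
         added++;
   }
   return added;
}
```

Both orders add the same number of configs; the old order simply issues NUM_VISUALS times as many queries.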

v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/platform_android.c | 40 +
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_android.c b/src/egl/drivers/dri2/platform_android.c
index c10ae59..e3aac0a 100644
--- a/src/egl/drivers/dri2/platform_android.c
+++ b/src/egl/drivers/dri2/platform_android.c
@@ -759,7 +759,6 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *dpy)
   { HAL_PIXEL_FORMAT_RGB_888,   { 0xff, 0xff00, 0xff, 0x0 } },
   { HAL_PIXEL_FORMAT_RGB_565,   { 0xf800, 0x7e0, 0x1f, 0x0 } },
   { HAL_PIXEL_FORMAT_BGRA_, { 0xff, 0xff00, 0xff, 0xff00 } },
-  { 0, { 0, 0, 0, 0 } }
};
EGLint config_attrs[] = {
  EGL_NATIVE_VISUAL_ID,   0,
@@ -770,38 +769,41 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *dpy)
  EGL_MAX_PBUFFER_HEIGHT, _EGL_MAX_PBUFFER_HEIGHT,
  EGL_NONE
};
+   unsigned int format_count[ARRAY_SIZE(visuals)] = { 0 };
int count, i, j;
 
count = 0;
-   for (i = 0; visuals[i].format; i++) {
-  int format_count = 0;
+   for (i = 0; dri2_dpy->driver_configs[i]; i++) {
+  const EGLint surface_type = EGL_WINDOW_BIT | EGL_PBUFFER_BIT;
+  struct dri2_egl_config *dri2_conf;
+  unsigned int double_buffered = 0;
 
-  config_attrs[1] = visuals[i].format;
-  config_attrs[3] = visuals[i].format;
+  dri2_dpy->core->getConfigAttrib(dri2_dpy->driver_configs[i],
+ __DRI_ATTRIB_DOUBLE_BUFFER, _buffered);
 
-  for (j = 0; dri2_dpy->driver_configs[j]; j++) {
- const EGLint surface_type = EGL_WINDOW_BIT | EGL_PBUFFER_BIT;
- struct dri2_egl_config *dri2_conf;
- unsigned int double_buffered = 0;
+  /* support only double buffered configs */
+  if (!double_buffered)
+ continue;
 
- dri2_dpy->core->getConfigAttrib(dri2_dpy->driver_configs[j],
-__DRI_ATTRIB_DOUBLE_BUFFER, _buffered);
+  for (j = 0; j < ARRAY_SIZE(visuals); j++) {
 
- /* support only double buffered configs */
- if (!double_buffered)
-continue;
+ config_attrs[1] = visuals[j].format;
+ config_attrs[3] = visuals[j].format;
 
- dri2_conf = dri2_add_config(dpy, dri2_dpy->driver_configs[j],
-   count + 1, surface_type, config_attrs, visuals[i].rgba_masks);
+ dri2_conf = dri2_add_config(dpy, dri2_dpy->driver_configs[i],
+   count + 1, surface_type, config_attrs, visuals[j].rgba_masks);
  if (dri2_conf) {
 count++;
-format_count++;
+format_count[j]++;
  }
   }
+   }
 
-  if (!format_count) {
+   for (i = 0; i < ARRAY_SIZE(format_count); i++) {
+  if (!format_count[i]) {
  _eglLog(_EGL_DEBUG, "No DRI config supports native format 0x%x",
-   visuals[i].format);
+ visuals[i].format);
   }
}
 
-- 
2.9.3



[Mesa-dev] [PATCH v2 2/7] egl/drm: introduce drm_add_configs_for_visuals() helper

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Factor out and rework the existing code so that it prints a debug
message if we have zero configs for any visual.

As a nice side effect, we now provide a correct (sequential) ID when
creating a config (via dri2_add_config).
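The if/else chain being replaced is essentially a lookup from (red mask, alpha mask) to a format. A table-driven sketch of that lookup follows; it uses format names as strings instead of the GBM_FORMAT_* codes, and the conventional masks for these packed layouts, so treat the values as illustrative rather than a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Conventional red/alpha masks for these layouts (illustrative). */
static const struct {
   const char *format;
   unsigned int red_mask;
   unsigned int alpha_mask;
} visuals[] = {
   { "XRGB2101010", 0x3ff00000, 0x00000000 },
   { "ARGB2101010", 0x3ff00000, 0xc0000000 },
   { "XRGB8888",    0x00ff0000, 0x00000000 },
   { "ARGB8888",    0x00ff0000, 0xff000000 },
   { "RGB565",      0x0000f800, 0x00000000 },
};

/* Returns the matching visual index, or -1 (the old "else continue"). */
static int
find_visual(unsigned int red, unsigned int alpha)
{
   for (size_t j = 0; j < ARRAY_SIZE(visuals); j++) {
      if (visuals[j].red_mask == red && visuals[j].alpha_mask == alpha)
         return (int) j;
   }
   return -1;
}
```

Note that the table compares both masks for every entry, whereas the removed chain matched RGB565 on the red mask alone.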

v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov 
Reviewed-by: Eric Engestrom 
---
 src/egl/drivers/dri2/platform_drm.c | 89 +
 1 file changed, 61 insertions(+), 28 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_drm.c b/src/egl/drivers/dri2/platform_drm.c
index bb1515f..168c9dc 100644
--- a/src/egl/drivers/dri2/platform_drm.c
+++ b/src/egl/drivers/dri2/platform_drm.c
@@ -575,6 +575,64 @@ swrast_get_image(__DRIdrawable *driDrawable,
gbm_dri_bo_unmap_dumb(bo);
 }
 
+static EGLBoolean
+drm_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *disp)
+{
+   struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
+   static const struct {
+  int format;
+  unsigned int red_mask;
+  unsigned int alpha_mask;
+   } visuals[] = {
+  { GBM_FORMAT_XRGB2101010, 0x3ff0, 0x },
+  { GBM_FORMAT_ARGB2101010, 0x3ff0, 0xc000 },
+  { GBM_FORMAT_XRGB,0x00ff, 0x },
+  { GBM_FORMAT_ARGB,0x00ff, 0xff00 },
+  { GBM_FORMAT_RGB565,  0xf800, 0x },
+   };
+   EGLint attr_list[] = {
+  EGL_NATIVE_VISUAL_ID, 0,
+  EGL_NONE,
+   };
+   unsigned int format_count[ARRAY_SIZE(visuals)] = { 0 };
+   unsigned int count, i, j;
+
+   count = 0;
+   for (i = 0; dri2_dpy->driver_configs[i]; i++) {
+  unsigned int red, alpha;
+
+  dri2_dpy->core->getConfigAttrib(dri2_dpy->driver_configs[i],
+  __DRI_ATTRIB_RED_MASK, );
+  dri2_dpy->core->getConfigAttrib(dri2_dpy->driver_configs[i],
+  __DRI_ATTRIB_ALPHA_MASK, );
+
+  for (j = 0; j < ARRAY_SIZE(visuals); j++) {
+ struct dri2_egl_config *dri2_conf;
+
+ if (visuals[j].red_mask != red || visuals[j].alpha_mask != alpha)
+continue;
+
+ attr_list[1] = visuals[j].format;
+
+ dri2_conf = dri2_add_config(disp, dri2_dpy->driver_configs[i],
+   count + 1, EGL_WINDOW_BIT, attr_list, NULL);
+ if (dri2_conf) {
+count++;
+format_count[j]++;
+ }
+  }
+   }
+
+   for (i = 0; i < ARRAY_SIZE(format_count); i++) {
+  if (!format_count[i]) {
+ _eglLog(_EGL_DEBUG, "No DRI config supports native format 0x%x",
+ visuals[i].format);
+  }
+   }
+
+   return (count != 0);
+}
+
 static struct dri2_egl_display_vtbl dri2_drm_display_vtbl = {
.authenticate = dri2_drm_authenticate,
.create_window_surface = dri2_drm_create_window_surface,
@@ -600,7 +658,6 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
struct dri2_egl_display *dri2_dpy;
struct gbm_device *gbm;
int fd = -1;
-   int i;
 
loader_set_logger(_eglLog);
 
@@ -663,33 +720,9 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
 
dri2_setup_screen(disp);
 
-   for (i = 0; dri2_dpy->driver_configs[i]; i++) {
-  EGLint format, attr_list[3];
-  unsigned int red, alpha;
-
-  dri2_dpy->core->getConfigAttrib(dri2_dpy->driver_configs[i],
-   __DRI_ATTRIB_RED_MASK, );
-  dri2_dpy->core->getConfigAttrib(dri2_dpy->driver_configs[i],
-   __DRI_ATTRIB_ALPHA_MASK, );
-  if (red == 0x3ff0 && alpha == 0x)
- format = GBM_FORMAT_XRGB2101010;
-  else if (red == 0x3ff0 && alpha == 0xc000)
- format = GBM_FORMAT_ARGB2101010;
-  else if (red == 0x00ff && alpha == 0x)
- format = GBM_FORMAT_XRGB;
-  else if (red == 0x00ff && alpha == 0xff00)
- format = GBM_FORMAT_ARGB;
-  else if (red == 0xf800)
- format = GBM_FORMAT_RGB565;
-  else
- continue;
-
-  attr_list[0] = EGL_NATIVE_VISUAL_ID;
-  attr_list[1] = format;
-  attr_list[2] = EGL_NONE;
-
-  dri2_add_config(disp, dri2_dpy->driver_configs[i],
-  i + 1, EGL_WINDOW_BIT, attr_list, NULL);
+   if (!drm_add_configs_for_visuals(drv, disp)) {
+  _eglError(EGL_NOT_INITIALIZED, "DRI2: failed to add configs");
+  goto cleanup;
}
 
disp->Extensions.KHR_image_pixmap = EGL_TRUE;
-- 
2.9.3



[Mesa-dev] [PATCH v2 1/7] egl/surfaceless: print out a message on zero configs for given format

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Currently we print a debug message if the total number of configs is
zero, only to do the same (at an error level) as we return from the
function.

Rework the message to print if we're missing a config for a given
format.
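A small sketch of the per-format diagnostic described above. The `supported` matrix is a hypothetical stand-in for dri2_add_config() returning non-NULL, and the message goes to stderr instead of `_eglLog()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_FORMATS 3

static const char *const format_names[NUM_FORMATS] = {
   "ARGB8888", "RGB888", "RGB565"
};

/* supported[c][f]: can driver config c expose format f? (hypothetical) */
static unsigned int
add_configs(const bool supported[][NUM_FORMATS], size_t num_configs,
            unsigned int format_count[NUM_FORMATS])
{
   unsigned int count = 0;

   assert(format_count != NULL);
   for (size_t c = 0; c < num_configs; c++) {
      for (size_t f = 0; f < NUM_FORMATS; f++) {
         if (supported[c][f]) {
            count++;
            format_count[f]++;
         }
      }
   }

   /* One message per format that ended up with no configs at all. */
   for (size_t f = 0; f < NUM_FORMATS; f++) {
      if (!format_count[f])
         fprintf(stderr, "No DRI config supports native format %s\n",
                 format_names[f]);
   }
   return count;
}
```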

Signed-off-by: Emil Velikov 
Reviewed-by: Gurchetan Singh 
---
 src/egl/drivers/dri2/platform_surfaceless.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_surfaceless.c b/src/egl/drivers/dri2/platform_surfaceless.c
index 386aa7a..9e2aa7c 100644
--- a/src/egl/drivers/dri2/platform_surfaceless.c
+++ b/src/egl/drivers/dri2/platform_surfaceless.c
@@ -181,28 +181,37 @@ static EGLBoolean
 surfaceless_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *dpy)
 {
struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy);
-   static const unsigned int visuals[3][4] = {
-  { 0xff, 0xff00, 0xff, 0xff00 },   // ARGB
-  { 0xff, 0xff00, 0xff, 0x0 },  // RGB888
-  { 0xf800, 0x7e0, 0x1f, 0x0  },// RGB565
+   static const struct {
+  const char *format_name;
+  unsigned int rgba_masks[4];
+   } visuals[] = {
+  { "ARGB", { 0xff, 0xff00, 0xff, 0xff00 } },
+  { "RGB888",   { 0xff, 0xff00, 0xff, 0x0 } },
+  { "RGB565",   { 0x00f800, 0x07e0, 0x1f, 0x0 } },
};
unsigned int count, i, j;
 
count = 0;
for (i = 0; i < ARRAY_SIZE(visuals); i++) {
+  int format_count = 0;
+
   for (j = 0; dri2_dpy->driver_configs[j]; j++) {
  struct dri2_egl_config *dri2_conf;
 
  dri2_conf = dri2_add_config(dpy, dri2_dpy->driver_configs[j],
-   count + 1, EGL_PBUFFER_BIT, NULL, visuals[i]);
+   count + 1, EGL_PBUFFER_BIT, NULL, visuals[i].rgba_masks);
 
- if (dri2_conf)
+ if (dri2_conf) {
 count++;
+format_count++;
+ }
   }
-   }
 
-   if (!count)
-  _eglLog(_EGL_DEBUG, "Can't create surfaceless visuals");
+  if (!format_count) {
+ _eglLog(_EGL_DEBUG, "No DRI config supports native format %s",
+   visuals[i].format_name);
+  }
+   }
 
return (count != 0);
 }
-- 
2.9.3



Re: [Mesa-dev] [PATCH 0/9] gallium/tgsi: 64-bit integer foundations

2016-09-16 Thread Ian Romanick
On 09/16/2016 06:48 AM, Nicolai Hähnle wrote:
> Hi all,
> 
> this is really Dave's work, with a few touch-ups from me that I think make
> sense. I've kept those separate with the intention to squash. I'd like to
> land these in master even before the main ARB_gpu_shader_int64 stuff lands
> (that is currently in Ian's court).

If you guys are comfortable enabling it in radeonsi, I think the rest of
the code is close enough to ready to land.  I'm sure that we'll find
more bugs as more tests become available, but that's always the case.
I've updated my arb_gpu_shader_int64 tree, but it's intertwined with
some other stuff.  I can untangle it easily enough.

> The reason is that radeonsi's ARB_query_buffer_object support needs 64-bit
> integers in shaders, and for that it's convenient to have all the TGSI
> opcodes and gallivm bits in place already.
> 
> Any objections? Reviews?
> Thanks,
> Nicolai
> 


Re: [Mesa-dev] [PATCH 0/9] radeonsi: ARB_query_buffer_object implementation

2016-09-16 Thread Ian Romanick
On 09/16/2016 06:57 AM, Nicolai Hähnle wrote:
> Hi all,
> 
> as the title says. The implementation uses a compute shader to summarize
> data from the query buffers. As long as only one query buffer is in flight
> (the normal case), that compute shader is launched exactly once, on a
> single thread. If multiple buffers were required, then one compute grid is
> launched for each of these buffers, in sequence.
> 
> All of this could be done in much fancier ways using bindless buffers and
> wave-wide computations, but really, the expectation is that most queries
> will be rather simple (though occlusion queries always contain at least 8
> result pairs, so it's not like it would be completely pointless).
> 
> This code also exposes the hilarious lowering of 64-bit integer divides
> in LLVM, since timestamp queries use it. This lowering generates more than
> 2KB of code for a single division, which is excessive even when the division
> *isn't* by a constant. The right place to fix this is in LLVM, and I'm
> already looking into it. For normal queries this is completely irrelevant
> because the code will just be skipped.

Is the division by a constant?  If it is, you might want to use
something like what libdivide would generate.
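For reference, the libdivide-style replacement mentioned above is the multiply-high-and-shift technique for division by an invariant divisor described by Granlund and Montgomery. Below is a minimal 32-bit sketch of their unsigned variant; it is not libdivide's API or what LLVM emits, and a 64-bit version (as the timestamp case would need) requires a 128-bit high multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed data for dividing 32-bit unsigned values by a fixed d >= 2. */
struct udiv32 {
   uint32_t magic;
   unsigned int shift;   /* ceil(log2(d)) - 1 */
};

static struct udiv32
udiv32_init(uint32_t d)
{
   struct udiv32 r;
   unsigned int l = 0;

   assert(d >= 2);
   while (((uint64_t) 1 << l) < d)   /* l = ceil(log2(d)) */
      l++;
   /* magic = floor(2^32 * (2^l - d) / d) + 1; since 2^l - d < d this
    * fits in 32 bits, and the shifted numerator stays below 2^63. */
   r.magic = (uint32_t) (((((uint64_t) 1 << l) - d) << 32) / d + 1);
   r.shift = l - 1;
   return r;
}

/* n / d using one multiply-high, a subtract, an add and two shifts. */
static uint32_t
udiv32_apply(uint32_t n, struct udiv32 dv)
{
   uint32_t t = (uint32_t) (((uint64_t) dv.magic * n) >> 32);
   return (t + ((n - t) >> 1)) >> dv.shift;
}
```

The expensive part (the real division) happens once in udiv32_init(); each subsequent quotient costs only a few ALU operations.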

> Please review!
> Thanks
> Nicolai
> 


Re: [Mesa-dev] [PATCH 00/30] egl: a ton of eglMakeCurrent inspired cleanups

2016-09-16 Thread Emil Velikov
On 25 August 2016 at 17:18, Emil Velikov  wrote:
> Hi all,
>
> With the recent noise in the egl area I decided to do some of the long
> planned cleanup in the area. It spans across the following:
>
>  - glapi missing glFlush and non-shared glapi are not an option
>  - encapsulate/separate disp->DriverData (more?) management
>  - 'unwrap' eglMakeCurrent
>  - don't opencode what can be static const data
>  - unify WL_bind_wayland_display management
>
> Note: some of the patches can be folded or split. Please let me know
> if you have any preferences.
>
Humble ping. A fair few patches haven't been reviewed, so if
anyone is interested, please go ahead.

Alternatively I'll push the lot ~mid next week.

Thanks
Emil


[Mesa-dev] [PATCH v2 30/30] egl/dri2: set WL_bind_wayland_display in a consistent way

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Introduce a helper and use it throughout the platform code. This allows
us to reduce the number of #ifdefs and (potentially) use
kms_swrast_dri.so for !drm platforms (namely wayland and x11).

Note: in the future, as other platforms (android, surfaceless) gain
support for the extension, they can reuse the helper.
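The decision the new helper makes can be summarised as a pure function. This is a sketch over a hypothetical, trimmed-down `display_info` struct (and a made-up `CAP_GLOBAL_NAMES` constant) rather than the real `dri2_egl_display`; note that where it returns false for a missing device name or image extension, the real helper simply leaves the flag untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for __DRI_IMAGE_CAP_GLOBAL_NAMES. */
#define CAP_GLOBAL_NAMES 0x1

/* Hypothetical view of the display state the helper inspects. */
struct display_info {
   const char *device_name;     /* NULL if no DRM device name is known */
   bool has_image_ext;          /* image extension present */
   int image_version;           /* image extension version */
   bool has_get_capabilities;   /* getCapabilities hook present */
   int capabilities;            /* what getCapabilities() would return */
};

/* Mirrors the branch structure of dri2_set_WL_bind_wayland_display(). */
static bool
wl_bind_wayland_display_supported(const struct display_info *d)
{
   assert(d != NULL);
   if (!d->device_name || !d->has_image_ext)
      return false;
   if (d->image_version >= 10 && d->has_get_capabilities)
      return (d->capabilities & CAP_GLOBAL_NAMES) != 0;
   return true;   /* older image extension: assume support */
}
```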

v2: Rebase, check for device_name.
Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.h | 24 
 src/egl/drivers/dri2/platform_drm.c | 16 ++--
 src/egl/drivers/dri2/platform_wayland.c |  2 +-
 src/egl/drivers/dri2/platform_x11.c |  9 ++---
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
index fb170a5..bf18646 100644
--- a/src/egl/drivers/dri2/egl_dri2.h
+++ b/src/egl/drivers/dri2/egl_dri2.h
@@ -397,4 +397,28 @@ const __DRIconfig *
 dri2_get_dri_config(struct dri2_egl_config *conf, EGLint surface_type,
 EGLenum colorspace);
 
+static inline void
+dri2_set_WL_bind_wayland_display(_EGLDriver *drv, _EGLDisplay *disp)
+{
+#ifdef HAVE_WAYLAND_PLATFORM
+   struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
+
+   (void) drv;
+
+   if (dri2_dpy->device_name && dri2_dpy->image) {
+   if (dri2_dpy->image->base.version >= 10 &&
+   dri2_dpy->image->getCapabilities != NULL) {
+   int capabilities;
+
+   capabilities =
+   dri2_dpy->image->getCapabilities(dri2_dpy->dri_screen);
+   disp->Extensions.WL_bind_wayland_display =
+   (capabilities & __DRI_IMAGE_CAP_GLOBAL_NAMES) != 0;
+   } else {
+   disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
+   }
+   }
+#endif
+}
+
 #endif /* EGL_DRI2_INCLUDED */
diff --git a/src/egl/drivers/dri2/platform_drm.c b/src/egl/drivers/dri2/platform_drm.c
index 2668dff..bb1515f 100644
--- a/src/egl/drivers/dri2/platform_drm.c
+++ b/src/egl/drivers/dri2/platform_drm.c
@@ -697,21 +697,9 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
   disp->Extensions.EXT_buffer_age = EGL_TRUE;
 
 #ifdef HAVE_WAYLAND_PLATFORM
-   if (dri2_dpy->image) {
-   dri2_dpy->device_name = loader_get_device_name_for_fd(dri2_dpy->fd);
-
-   if (dri2_dpy->image->base.version >= 10 &&
-   dri2_dpy->image->getCapabilities != NULL) {
-   int capabilities;
-
-   capabilities =
-   dri2_dpy->image->getCapabilities(dri2_dpy->dri_screen);
-   disp->Extensions.WL_bind_wayland_display =
-   (capabilities & __DRI_IMAGE_CAP_GLOBAL_NAMES) != 0;
-   } else
-   disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
-   }
+   dri2_dpy->device_name = loader_get_device_name_for_fd(dri2_dpy->fd);
 #endif
+   dri2_set_WL_bind_wayland_display(drv, disp);
 
/* Fill vtbl last to prevent accidentally calling virtual function during
 * initialization.
diff --git a/src/egl/drivers/dri2/platform_wayland.c b/src/egl/drivers/dri2/platform_wayland.c
index 005d2f3..63edf2e 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -1206,7 +1206,7 @@ dri2_initialize_wayland_drm(_EGLDriver *drv, _EGLDisplay *disp)
 dri2_add_config(disp, config, i + 1, types, NULL, rgb565_masks);
}
 
-   disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
+   dri2_set_WL_bind_wayland_display(drv, disp);
/* When cannot convert EGLImage to wl_buffer when on a different gpu,
 * because the buffer of the EGLImage has likely a tiling mode the server
 * gpu won't support. These is no way to check for now. Thus do not support the
diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c
index c27f289..0b1b514 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -1339,10 +1339,7 @@ dri2_initialize_x11_dri3(_EGLDriver *drv, _EGLDisplay *disp)
disp->Extensions.CHROMIUM_sync_control = EGL_TRUE;
disp->Extensions.EXT_buffer_age = EGL_TRUE;
 
-#ifdef HAVE_WAYLAND_PLATFORM
-   if (dri2_dpy->device_name)
-  disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
-#endif
+   dri2_set_WL_bind_wayland_display(drv, disp);
 
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, false))
   goto cleanup_configs;
@@ -1458,9 +1455,7 @@ dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
disp->Extensions.NV_post_sub_buffer = EGL_TRUE;
disp->Extensions.CHROMIUM_sync_control = EGL_TRUE;
 
-#ifdef HAVE_WAYLAND_PLATFORM
-   disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
-#endif
+   dri2_set_WL_bind_wayland_display(drv, disp);
 
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, true))
   goto cleanup_configs;
-- 
2.9.3



[Mesa-dev] [PATCH v2 26/30] egl/dri2: annotate dri2_extension_match instances as const data

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

v2: Rebase.

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index d2ae25a..75070da 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -356,37 +356,37 @@ struct dri2_extension_match {
int offset;
 };
 
-static struct dri2_extension_match dri3_driver_extensions[] = {
+static const struct dri2_extension_match dri3_driver_extensions[] = {
{ __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
{ __DRI_IMAGE_DRIVER, 1, offsetof(struct dri2_egl_display, image_driver) },
{ NULL, 0, 0 }
 };
 
-static struct dri2_extension_match dri2_driver_extensions[] = {
+static const struct dri2_extension_match dri2_driver_extensions[] = {
{ __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
{ __DRI_DRI2, 2, offsetof(struct dri2_egl_display, dri2) },
{ NULL, 0, 0 }
 };
 
-static struct dri2_extension_match dri2_core_extensions[] = {
+static const struct dri2_extension_match dri2_core_extensions[] = {
{ __DRI2_FLUSH, 1, offsetof(struct dri2_egl_display, flush) },
{ __DRI_TEX_BUFFER, 2, offsetof(struct dri2_egl_display, tex_buffer) },
{ __DRI_IMAGE, 1, offsetof(struct dri2_egl_display, image) },
{ NULL, 0, 0 }
 };
 
-static struct dri2_extension_match swrast_driver_extensions[] = {
+static const struct dri2_extension_match swrast_driver_extensions[] = {
{ __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
{ __DRI_SWRAST, 2, offsetof(struct dri2_egl_display, swrast) },
{ NULL, 0, 0 }
 };
 
-static struct dri2_extension_match swrast_core_extensions[] = {
+static const struct dri2_extension_match swrast_core_extensions[] = {
{ __DRI_TEX_BUFFER, 2, offsetof(struct dri2_egl_display, tex_buffer) },
{ NULL, 0, 0 }
 };
 
-static struct dri2_extension_match optional_core_extensions[] = {
+static const struct dri2_extension_match optional_core_extensions[] = {
{ __DRI2_ROBUSTNESS, 1, offsetof(struct dri2_egl_display, robustness) },
{ __DRI2_CONFIG_QUERY, 1, offsetof(struct dri2_egl_display, config) },
{ __DRI2_FENCE, 1, offsetof(struct dri2_egl_display, fence) },
@@ -397,7 +397,7 @@ static struct dri2_extension_match optional_core_extensions[] = {
 
 static EGLBoolean
 dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
- struct dri2_extension_match *matches,
+ const struct dri2_extension_match *matches,
  const __DRIextension **extensions,
  bool optional)
 {
-- 
2.9.3



[Mesa-dev] [PATCH v2 25/30] egl/dri2: use dri2_bind_extensions to manage the optional extensions

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

v2: dri2_bind_extensions() now takes optional as an argument.

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.c | 28 ++--
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 57c0760..d2ae25a 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -386,6 +386,15 @@ static struct dri2_extension_match swrast_core_extensions[] = {
{ NULL, 0, 0 }
 };
 
+static struct dri2_extension_match optional_core_extensions[] = {
+   { __DRI2_ROBUSTNESS, 1, offsetof(struct dri2_egl_display, robustness) },
+   { __DRI2_CONFIG_QUERY, 1, offsetof(struct dri2_egl_display, config) },
+   { __DRI2_FENCE, 1, offsetof(struct dri2_egl_display, fence) },
+   { __DRI2_RENDERER_QUERY, 1, offsetof(struct dri2_egl_display, rendererQuery) },
+   { __DRI2_INTEROP, 1, offsetof(struct dri2_egl_display, interop) },
+   { NULL, 0, 0 }
+};
+
 static EGLBoolean
 dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
  struct dri2_extension_match *matches,
@@ -677,7 +686,6 @@ dri2_create_screen(_EGLDisplay *disp)
 {
const __DRIextension **extensions;
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
-   unsigned i;
 
if (dri2_dpy->image_driver) {
   dri2_dpy->dri_screen =
@@ -731,23 +739,7 @@ dri2_create_screen(_EGLDisplay *disp)
  goto cleanup_dri_screen;
}
 
-   for (i = 0; extensions[i]; i++) {
-  if (strcmp(extensions[i]->name, __DRI2_ROBUSTNESS) == 0) {
- dri2_dpy->robustness = (__DRIrobustnessExtension *) extensions[i];
-  }
-  if (strcmp(extensions[i]->name, __DRI2_CONFIG_QUERY) == 0) {
- dri2_dpy->config = (__DRI2configQueryExtension *) extensions[i];
-  }
-  if (strcmp(extensions[i]->name, __DRI2_FENCE) == 0) {
- dri2_dpy->fence = (__DRI2fenceExtension *) extensions[i];
-  }
-  if (strcmp(extensions[i]->name, __DRI2_RENDERER_QUERY) == 0) {
- dri2_dpy->rendererQuery = (__DRI2rendererQueryExtension *) extensions[i];
-  }
-  if (strcmp(extensions[i]->name, __DRI2_INTEROP) == 0)
- dri2_dpy->interop = (__DRI2interopExtension *) extensions[i];
-   }
-
+   dri2_bind_extensions(dri2_dpy, optional_core_extensions, extensions, true);
dri2_setup_screen(disp);
 
return EGL_TRUE;
-- 
2.9.3



[Mesa-dev] [PATCH v2 23/30] egl/dri2: add support for optional extensions in dri2_bind_extensions()

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Will allow us to reuse the function for optional extensions and fold a
bit of code.

v2: Make dri2_bind_extensions::optional flag an argument to
dri2_bind_extensions (Kristian).

Cc: Rob Clark 
Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index e29fab0..57c0760 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -389,7 +389,8 @@ static struct dri2_extension_match swrast_core_extensions[] = {
 static EGLBoolean
 dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
  struct dri2_extension_match *matches,
- const __DRIextension **extensions)
+ const __DRIextension **extensions,
+ bool optional)
 {
int i, j, ret = EGL_TRUE;
void *field;
@@ -410,9 +411,14 @@ dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
for (j = 0; matches[j].name; j++) {
   field = ((char *) dri2_dpy + matches[j].offset);
   if (*(const __DRIextension **) field == NULL) {
- _eglLog(_EGL_WARNING, "did not find extension %s version %d",
- matches[j].name, matches[j].version);
- ret = EGL_FALSE;
+ if (optional) {
+_eglLog(_EGL_DEBUG, "did not find optional extension %s version %d",
+matches[j].name, matches[j].version);
+ } else {
+_eglLog(_EGL_WARNING, "did not find extension %s version %d",
+matches[j].name, matches[j].version);
+ret = EGL_FALSE;
+ }
   }
}
 
@@ -513,7 +519,7 @@ dri2_load_driver_dri3(_EGLDisplay *disp)
if (!extensions)
   return EGL_FALSE;
 
-   if (!dri2_bind_extensions(dri2_dpy, dri3_driver_extensions, extensions)) {
+   if (!dri2_bind_extensions(dri2_dpy, dri3_driver_extensions, extensions, false)) {
   dlclose(dri2_dpy->driver);
   return EGL_FALSE;
}
@@ -532,7 +538,7 @@ dri2_load_driver(_EGLDisplay *disp)
if (!extensions)
   return EGL_FALSE;
 
-   if (!dri2_bind_extensions(dri2_dpy, dri2_driver_extensions, extensions)) {
+   if (!dri2_bind_extensions(dri2_dpy, dri2_driver_extensions, extensions, false)) {
   dlclose(dri2_dpy->driver);
   return EGL_FALSE;
}
@@ -551,7 +557,7 @@ dri2_load_driver_swrast(_EGLDisplay *disp)
if (!extensions)
   return EGL_FALSE;
 
-   if (!dri2_bind_extensions(dri2_dpy, swrast_driver_extensions, extensions)) {
+   if (!dri2_bind_extensions(dri2_dpy, swrast_driver_extensions, extensions, false)) {
   dlclose(dri2_dpy->driver);
   return EGL_FALSE;
}
@@ -717,11 +723,11 @@ dri2_create_screen(_EGLDisplay *disp)
extensions = dri2_dpy->core->getExtensions(dri2_dpy->dri_screen);
 
if (dri2_dpy->image_driver || dri2_dpy->dri2) {
-  if (!dri2_bind_extensions(dri2_dpy, dri2_core_extensions, extensions))
+  if (!dri2_bind_extensions(dri2_dpy, dri2_core_extensions, extensions, false))
  goto cleanup_dri_screen;
} else {
   assert(dri2_dpy->swrast);
-  if (!dri2_bind_extensions(dri2_dpy, swrast_core_extensions, extensions))
+  if (!dri2_bind_extensions(dri2_dpy, swrast_core_extensions, extensions, false))
  goto cleanup_dri_screen;
}
 
-- 
2.9.3



[Mesa-dev] [PATCH v2 20/30] egl/dri2: rework dri2_egl_display::extensions storage

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Remove the error-prone fixed-size array.
While we're here, also rename it to loader_extensions, as in the GLX code.

v2: Rebase. Keep image_loader_extension within the wayland_drm
dri2_loader_extensions list.

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.c | 10 
 src/egl/drivers/dri2/egl_dri2.h |  2 +-
 src/egl/drivers/dri2/platform_android.c | 27 ---
 src/egl/drivers/dri2/platform_surfaceless.c | 12 ++---
 src/egl/drivers/dri2/platform_wayland.c | 35 +
 src/egl/drivers/dri2/platform_x11.c | 40 +
 6 files changed, 85 insertions(+), 41 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index eca9f56..7c71d5b 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -676,7 +676,7 @@ dri2_create_screen(_EGLDisplay *disp)
if (dri2_dpy->image_driver) {
   dri2_dpy->dri_screen =
  dri2_dpy->image_driver->createNewScreen2(0, dri2_dpy->fd,
-  dri2_dpy->extensions,
+  dri2_dpy->loader_extensions,
   dri2_dpy->driver_extensions,
   &dri2_dpy->driver_configs,
   disp);
@@ -684,25 +684,25 @@ dri2_create_screen(_EGLDisplay *disp)
   if (dri2_dpy->dri2->base.version >= 4) {
  dri2_dpy->dri_screen =
 dri2_dpy->dri2->createNewScreen2(0, dri2_dpy->fd,
- dri2_dpy->extensions,
+ dri2_dpy->loader_extensions,
  dri2_dpy->driver_extensions,
   &dri2_dpy->driver_configs, disp);
   } else {
  dri2_dpy->dri_screen =
 dri2_dpy->dri2->createNewScreen(0, dri2_dpy->fd,
-dri2_dpy->extensions,
+dri2_dpy->loader_extensions,
 &dri2_dpy->driver_configs, disp);
   }
} else {
   assert(dri2_dpy->swrast);
   if (dri2_dpy->swrast->base.version >= 4) {
  dri2_dpy->dri_screen =
-dri2_dpy->swrast->createNewScreen2(0, dri2_dpy->extensions,
+dri2_dpy->swrast->createNewScreen2(0, dri2_dpy->loader_extensions,
dri2_dpy->driver_extensions,
&dri2_dpy->driver_configs, disp);
   } else {
  dri2_dpy->dri_screen =
-dri2_dpy->swrast->createNewScreen(0, dri2_dpy->extensions,
+dri2_dpy->swrast->createNewScreen(0, dri2_dpy->loader_extensions,
   &dri2_dpy->driver_configs, disp);
   }
}
diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
index b475968..dcb863d 100644
--- a/src/egl/drivers/dri2/egl_dri2.h
+++ b/src/egl/drivers/dri2/egl_dri2.h
@@ -196,7 +196,7 @@ struct dri2_egl_display
 
char *driver_name;
 
-   const __DRIextension *extensions[5];
+   const __DRIextension **loader_extensions;
 const __DRIextension **driver_extensions;
 
 #ifdef HAVE_X11_PLATFORM
diff --git a/src/egl/drivers/dri2/platform_android.c b/src/egl/drivers/dri2/platform_android.c
index 0e43821..f8e9919 100644
--- a/src/egl/drivers/dri2/platform_android.c
+++ b/src/egl/drivers/dri2/platform_android.c
@@ -897,6 +897,20 @@ static const __DRIimageLoaderExtension droid_image_loader_extension = {
.flushFrontBuffer= droid_flush_front_buffer,
 };
 
+static const __DRIextension *droid_dri2_loader_extensions[] = {
+   _dri2_loader_extension.base,
+   _lookup_extension.base,
+   _invalidate.base,
+   NULL,
+};
+
+static const __DRIextension *droid_image_loader_extensions[] = {
+   _image_loader_extension.base,
+   _lookup_extension.base,
+   _invalidate.base,
+   NULL,
+};
+
 EGLBoolean
 dri2_initialize_android(_EGLDriver *drv, _EGLDisplay *dpy)
 {
@@ -934,15 +948,10 @@ dri2_initialize_android(_EGLDriver *drv, _EGLDisplay *dpy)
 
/* render nodes cannot use Gem names, and thus do not support
 * the __DRI_DRI2_LOADER extension */
-   if (!dri2_dpy->is_render_node) {
-  dri2_dpy->extensions[0] = _dri2_loader_extension.base;
-   } else {
-  dri2_dpy->extensions[0] = _image_loader_extension.base;
-   }
-   dri2_dpy->extensions[1] = _invalidate.base;
-   dri2_dpy->extensions[2] = _lookup_extension.base;
-   dri2_dpy->extensions[3] = NULL;
-
+   if (!dri2_dpy->is_render_node)
+  dri2_dpy->loader_extensions = droid_dri2_loader_extensions;
+   else
+  dri2_dpy->loader_extensions = droid_image_loader_extensions;
 
if 

[Mesa-dev] [PATCH v2 16/30] egl/x11: don't populate dri2_dpy->dri2_loader_extension

2016-09-16 Thread Emil Velikov
From: Emil Velikov 

Analogous to the earlier android and wayland patches. While we're here, we
can drop exposing the old version of the extension.

All DRI loader/driver interfaces use lower-bound version checking, so
exposing the v3 DRI2 loader to a v2-capable driver is perfectly normal.

v2: Preserve compat with dri2_minor < 1. The driver does not know whether
there is a protocol to manage getBuffersWithFormat(); it's up to the
loader to expose the vfunc if there is one. (Kristian)

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/platform_x11.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c
index 06b8e1b..7d921f8 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -1374,6 +1374,22 @@ dri2_initialize_x11_dri3(_EGLDriver *drv, _EGLDisplay *disp)
 }
 #endif
 
+static const __DRIdri2LoaderExtension dri2_loader_extension_old = {
+   .base = { __DRI_DRI2_LOADER, 2 },
+
+   .getBuffers   = dri2_x11_get_buffers,
+   .flushFrontBuffer = dri2_x11_flush_front_buffer,
+   .getBuffersWithFormat = NULL,
+};
+
+static const __DRIdri2LoaderExtension dri2_loader_extension = {
+   .base = { __DRI_DRI2_LOADER, 3 },
+
+   .getBuffers   = dri2_x11_get_buffers,
+   .flushFrontBuffer = dri2_x11_flush_front_buffer,
+   .getBuffersWithFormat = dri2_x11_get_buffers_with_format,
+};
+
 static EGLBoolean
 dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
 {
@@ -1405,22 +1421,11 @@ dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
if (!dri2_load_driver(disp))
   goto cleanup_fd;
 
-   if (dri2_dpy->dri2_minor >= 1) {
-  dri2_dpy->dri2_loader_extension.base.name = __DRI_DRI2_LOADER;
-  dri2_dpy->dri2_loader_extension.base.version = 3;
-  dri2_dpy->dri2_loader_extension.getBuffers = dri2_x11_get_buffers;
-  dri2_dpy->dri2_loader_extension.flushFrontBuffer = dri2_x11_flush_front_buffer;
-  dri2_dpy->dri2_loader_extension.getBuffersWithFormat =
-dri2_x11_get_buffers_with_format;
-   } else {
-  dri2_dpy->dri2_loader_extension.base.name = __DRI_DRI2_LOADER;
-  dri2_dpy->dri2_loader_extension.base.version = 2;
-  dri2_dpy->dri2_loader_extension.getBuffers = dri2_x11_get_buffers;
-  dri2_dpy->dri2_loader_extension.flushFrontBuffer = dri2_x11_flush_front_buffer;
-  dri2_dpy->dri2_loader_extension.getBuffersWithFormat = NULL;
-   }
-  
-   dri2_dpy->extensions[0] = &dri2_dpy->dri2_loader_extension.base;
+   if (dri2_dpy->dri2_minor >= 1)
+  dri2_dpy->extensions[0] = &dri2_loader_extension.base;
+   else
+  dri2_dpy->extensions[0] = &dri2_loader_extension_old.base;
+
dri2_dpy->extensions[1] = _lookup_extension.base;
dri2_dpy->extensions[2] = NULL;
 
-- 
2.9.3



Re: [Mesa-dev] [PATCH] st/mesa: only enable MSAA coverage options when we have a MSAA buffer

2016-09-16 Thread Ilia Mirkin
FTR, the new piglit test passed as-is on NVIDIA hw (at least nv50 and
nvc0). I'm not opposed to this new state dependency if Marek isn't
(he's analyzed these things a whole lot more than I suspect anyone
else), but just wanted to point it out in case the preference is to
instead change how gallium is supposed to behave.

Cheers,

  -ilia

On Thu, Sep 15, 2016 at 5:20 PM, Brian Paul  wrote:
> Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default)
> we should not set the alpha_to_coverage or alpha_to_one flags if the
> current drawing buffer does not do MSAA.
>
> This fixes the new piglit gl-1.3-alpha_to_coverage_nop test.
> ---
>  src/mesa/state_tracker/st_atom_blend.c | 9 ++---
>  src/mesa/state_tracker/st_context.c| 3 ++-
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_atom_blend.c b/src/mesa/state_tracker/st_atom_blend.c
> index 65de67b..67e 100644
> --- a/src/mesa/state_tracker/st_atom_blend.c
> +++ b/src/mesa/state_tracker/st_atom_blend.c
> @@ -265,9 +265,12 @@ update_blend( struct st_context *st )
>
> blend->dither = ctx->Color.DitherFlag;
>
> -   if (ctx->Multisample.Enabled) {
> -  /* unlike in gallium/d3d10 these operations are only performed
> - if msaa is enabled */
> +   if (ctx->Multisample.Enabled &&
> +   ctx->DrawBuffer &&
> +   ctx->DrawBuffer->Visual.sampleBuffers > 0) {
> +  /* Unlike in gallium/d3d10 these operations are only performed
> +   * if both msaa is enabled and we have a multisample buffer.
> +   */
>blend->alpha_to_coverage = ctx->Multisample.SampleAlphaToCoverage;
>blend->alpha_to_one = ctx->Multisample.SampleAlphaToOne;
> }
> diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
> index ddc11a4..81b3387 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -166,7 +166,8 @@ void st_invalidate_state(struct gl_context * ctx, GLbitfield new_state)
> struct st_context *st = st_context(ctx);
>
> if (new_state & _NEW_BUFFERS) {
> -  st->dirty |= ST_NEW_DSA |
> +  st->dirty |= ST_NEW_BLEND |
> +   ST_NEW_DSA |
> ST_NEW_FB_STATE |
> ST_NEW_SAMPLE_MASK |
> ST_NEW_SAMPLE_SHADING |
> --
> 1.9.1


Re: [Mesa-dev] [PATCH] gallium: fix struct/class declaration conflicts

2016-09-16 Thread Kollarova, Martina
> Do you need someone to push it for you?

Yeah, I don't have push rights.

Thanks for the review, it didn't occur to me before to look at logs to
see what prefix is correct.

Martina


[Mesa-dev] [PATCH v2] r600g/sb: fix struct/class declaration conflicts

2016-09-16 Thread Martina Kollarova
A couple of forward declarations were causing warnings in clang:
'value' defined as a class here but previously declared as a struct [-Wmismatched-tags]

Signed-off-by: Martina Kollarova 
Reviewed-by: Bas Nieuwenhuizen 
---
 src/gallium/drivers/r600/sb/sb_ir.h | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ir.h b/src/gallium/drivers/r600/sb/sb_ir.h
index c612e6c..4fc4da2 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -263,8 +263,6 @@ public:
}
 };
 
-class value;
-
 enum value_kind {
VLK_REG,
VLK_REL_REG,
@@ -433,8 +431,6 @@ inline value_flags& operator &=(value_flags &l, value_flags r) {
return l;
 }
 
-struct value;
-
 sb_ostream& operator << (sb_ostream , value );
 
 typedef uint32_t value_hash;
@@ -467,7 +463,7 @@ enum constraint_kind {
 
 class shader;
 class sb_value_pool;
-class ra_chunk;
+struct ra_chunk;
 class ra_constraint;
 
 class value {
-- 
1.9.1



Re: [Mesa-dev] [PATCH 1/9] gallium: add opcode and types for 64-bit integers. (v2)

2016-09-16 Thread Roland Scheidegger
On 16.09.2016 at 15:48, Nicolai Hähnle wrote:
> From: Dave Airlie 
> 
> This just adds the basic support for 64-bit opcodes,
> and the new types.
> 
> v2: add conversion opcodes.
> add documentation.
> 
> Reviewed-by: Marek Olšák 
> Reviewed-by: Nicolai Hähnle 
> Signed-off-by: Dave Airlie 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_info.c |  92 +--
>  src/gallium/auxiliary/tgsi/tgsi_info.h |   4 +-
>  src/gallium/docs/source/tgsi.rst   | 246 +
>  src/gallium/include/pipe/p_shader_tokens.h |  46 --
>  4 files changed, 368 insertions(+), 20 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
> index 60e0f2c..e319be1 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
> @@ -52,61 +52,61 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
> { 1, 2, 0, 0, 0, 0, 0, COMP, "MIN", TGSI_OPCODE_MIN },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "MAX", TGSI_OPCODE_MAX },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SLT", TGSI_OPCODE_SLT },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SGE", TGSI_OPCODE_SGE },
> { 1, 3, 0, 0, 0, 0, 0, COMP, "MAD", TGSI_OPCODE_MAD },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SUB", TGSI_OPCODE_SUB },
> { 1, 3, 0, 0, 0, 0, 0, COMP, "LRP", TGSI_OPCODE_LRP },
> { 1, 3, 0, 0, 0, 0, 0, COMP, "FMA", TGSI_OPCODE_FMA },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "SQRT", TGSI_OPCODE_SQRT },
> { 1, 3, 0, 0, 0, 0, 0, REPL, "DP2A", TGSI_OPCODE_DP2A },
> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 22 },  /* removed */
> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 23 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 0, COMP, "F2U64", TGSI_OPCODE_F2U64 },
> +   { 1, 1, 0, 0, 0, 0, 0, COMP, "F2I64", TGSI_OPCODE_F2I64 },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "FRC", TGSI_OPCODE_FRC },
> { 1, 3, 0, 0, 0, 0, 0, COMP, "CLAMP", TGSI_OPCODE_CLAMP },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "FLR", TGSI_OPCODE_FLR },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "ROUND", TGSI_OPCODE_ROUND },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "EX2", TGSI_OPCODE_EX2 },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "LG2", TGSI_OPCODE_LG2 },
> { 1, 2, 0, 0, 0, 0, 0, REPL, "POW", TGSI_OPCODE_POW },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "XPD", TGSI_OPCODE_XPD },
> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 32 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 0, COMP, "I2U64", TGSI_OPCODE_I2U64 },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "ABS", TGSI_OPCODE_ABS },
> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 34 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 0, COMP, "I2I64", TGSI_OPCODE_I2I64 },
> { 1, 2, 0, 0, 0, 0, 0, REPL, "DPH", TGSI_OPCODE_DPH },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "COS", TGSI_OPCODE_COS },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "DDX", TGSI_OPCODE_DDX },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "DDY", TGSI_OPCODE_DDY },
> { 0, 0, 0, 0, 0, 0, 0, NONE, "KILL", TGSI_OPCODE_KILL },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "PK2H", TGSI_OPCODE_PK2H },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "PK2US", TGSI_OPCODE_PK2US },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "PK4B", TGSI_OPCODE_PK4B },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "PK4UB", TGSI_OPCODE_PK4UB },
> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 44 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 1, COMP, "D2U64", TGSI_OPCODE_D2U64 },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SEQ", TGSI_OPCODE_SEQ },
> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 46 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 1, COMP, "D2I64", TGSI_OPCODE_D2I64 },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SGT", TGSI_OPCODE_SGT },
> { 1, 1, 0, 0, 0, 0, 0, REPL, "SIN", TGSI_OPCODE_SIN },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SLE", TGSI_OPCODE_SLE },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SNE", TGSI_OPCODE_SNE },
> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 51 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 1, COMP, "U642D", TGSI_OPCODE_U642D },
> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TEX", TGSI_OPCODE_TEX },
> { 1, 4, 1, 0, 0, 0, 0, OTHR, "TXD", TGSI_OPCODE_TXD },
> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXP", TGSI_OPCODE_TXP },
> { 1, 1, 0, 0, 0, 0, 0, CHAN, "UP2H", TGSI_OPCODE_UP2H },
> { 1, 1, 0, 0, 0, 0, 0, CHAN, "UP2US", TGSI_OPCODE_UP2US },
> { 1, 1, 0, 0, 0, 0, 0, CHAN, "UP4B", TGSI_OPCODE_UP4B },
> { 1, 1, 0, 0, 0, 0, 0, CHAN, "UP4UB", TGSI_OPCODE_UP4UB },
> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 59 },  /* removed */
> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 60 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 1, COMP, "U642F", TGSI_OPCODE_U642F },
> +   { 1, 1, 0, 0, 0, 0, 1, COMP, "I642F", TGSI_OPCODE_I642F },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "ARR", TGSI_OPCODE_ARR },
> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 62 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 1, COMP, "I642D", TGSI_OPCODE_I642D },
> { 0, 0, 0, 0, 1, 0, 0, NONE, "CAL", TGSI_OPCODE_CAL },
> { 0, 0, 0, 0, 0, 0, 0, NONE, "RET", TGSI_OPCODE_RET },
> { 1, 1, 0, 0, 0, 

Re: [Mesa-dev] [PATCH 9/9] radeonsi: enable ARB_query_buffer_object

2016-09-16 Thread Nicolai Hähnle

On 16.09.2016 15:57, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

---
 docs/features.txt  | 2 +-
 docs/relnotes/12.1.0.html  | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index 9850a43..8b87b08 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -194,21 +194,21 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+
   GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_clear_texture  DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_enhanced_layouts   DONE (i965)
   - compile-time constant expressions   DONE
   - explicit byte offsets for blocks DONE
   - forced alignment within blocks  DONE
   - specified vec4-slot component numbers   DONE (i965)
   - specified transform/feedback layout DONE
   - input/output block locations DONE
   GL_ARB_multi_bind DONE (all drivers)
-  GL_ARB_query_buffer_object DONE (i965/hsw+, nvc0)
+  GL_ARB_query_buffer_object DONE (i965/hsw+, nvc0, radeonsi)
   GL_ARB_texture_mirror_clamp_to_edge   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_texture_stencil8   DONE (i965/hsw+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)

 GL 4.5, GLSL 4.50:

   GL_ARB_ES3_1_compatibility DONE (i965/hsw+, nvc0, radeonsi)
   GL_ARB_clip_control   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_cull_distance  DONE (i965, nv50, nvc0, radeonsi, llvmpipe, softpipe, swr)
diff --git a/docs/relnotes/12.1.0.html b/docs/relnotes/12.1.0.html
index 8e0a84e..5a45a1d 100644
--- a/docs/relnotes/12.1.0.html
+++ b/docs/relnotes/12.1.0.html
@@ -44,20 +44,21 @@ Note: some of the new features are only available with certain drivers.
 

 
 OpenGL ES 3.1 on i965/hsw
 GL_ARB_ES3_1_compatibility on i965
 GL_ARB_ES3_2_compatibility on i965/gen8+
 GL_ARB_clear_texture on r600, radeonsi
 GL_ARB_cull_distance on radeonsi
 GL_ARB_enhanced_layouts on i965
 GL_ARB_indirect_parameters on radeonsi
+GL_ARB_query_buffer_object on radeonsi
 GL_ARB_shader_draw_parameters on radeonsi
 GL_ARB_shader_group_vote on nvc0
 GL_ARB_stencil_texturing on i965/hsw
 GL_ARB_texture_stencil8 on i965/hsw
 GL_EXT_window_rectangles on nv50, nvc0
 GL_KHR_blend_equation_advanced on i965
 GL_KHR_texture_compression_astc_sliced_3d on i965
 GL_OES_copy_image on nv50, nvc0, r600, radeonsi, softpipe, llvmpipe
 GL_OES_geometry_shader on i965/gen8+, nvc0, radeonsi
 GL_OES_primitive_bounding_box on i965/gen7+, nvc0, radeonsi
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 730be9d..84f9796 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -393,20 +393,21 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_SURFACE_REINTERPRET_BLOCKS:
case PIPE_CAP_QUERY_MEMORY_INFO:
case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
case PIPE_CAP_GENERATE_MIPMAP:
case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
case PIPE_CAP_STRING_MARKER:
case PIPE_CAP_CLEAR_TEXTURE:
case PIPE_CAP_CULL_DISTANCE:
+   case PIPE_CAP_QUERY_BUFFER_OBJECT:


Actually, this needs to be gated the same way as ARB_compute_shader; the
expression for that is in si_get_shader_param.


Nicolai


return 1;

case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
return !SI_BIG_ENDIAN && sscreen->b.info.has_userptr;

case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
return (sscreen->b.info.drm_major == 2 &&
sscreen->b.info.drm_minor >= 43) ||
   sscreen->b.info.drm_major == 3;

@@ -441,21 +442,20 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY:
return 0;

/* Unsupported features. */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_FAKE_SW_MSAA:
case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
case 

[Mesa-dev] [PATCH] glsl: optimize copy_propagation_elements pass

2016-09-16 Thread Tapani Pälli
These changes make the copy_propagation_elements pass faster, reducing
the link time spent in the test case of bug 94477. This does not fix the
actual issue, but it brings down the total time. No regressions seen in CI.

Signed-off-by: Tapani Pälli 
---

For performance measurements, Martina reported in the bug an 8x speedup
in the test case's shader link time when using this patch together with
commit 2cd02e30d2e1677762d34f1831b8e609970ef0f3

 .../glsl/opt_copy_propagation_elements.cpp | 187 -
 1 file changed, 145 insertions(+), 42 deletions(-)

diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp b/src/compiler/glsl/opt_copy_propagation_elements.cpp
index e4237cc..1c5060a 100644
--- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
+++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
@@ -46,6 +46,7 @@
 #include "ir_basic_block.h"
 #include "ir_optimization.h"
 #include "compiler/glsl_types.h"
+#include "util/hash_table.h"
 
 static bool debug = false;
 
@@ -76,6 +77,18 @@ public:
int swizzle[4];
 };
 
+/* Class that refers to acp_entry in another exec_list. Used
+ * when making removals based on rhs.
+ */
+class acp_ref : public exec_node
+{
+public:
+   acp_ref(acp_entry *e)
+   {
+  entry = e;
+   }
+   acp_entry *entry;
+};
 
 class kill_entry : public exec_node
 {
@@ -98,14 +111,42 @@ public:
   this->killed_all = false;
   this->mem_ctx = ralloc_context(NULL);
   this->shader_mem_ctx = NULL;
-  this->acp = new(mem_ctx) exec_list;
   this->kills = new(mem_ctx) exec_list;
+
+  create_acp();
}
~ir_copy_propagation_elements_visitor()
{
   ralloc_free(mem_ctx);
}
 
+   void create_acp()
+   {
+  lhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+  rhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+   }
+
+   void destroy_acp()
+   {
+  _mesa_hash_table_destroy(lhs_ht, NULL);
+  _mesa_hash_table_destroy(rhs_ht, NULL);
+   }
+
+   void populate_acp(hash_table *lhs, hash_table *rhs)
+   {
+  struct hash_entry *entry;
+  hash_table_foreach(lhs, entry)
+  {
+ _mesa_hash_table_insert(lhs_ht, entry->key, entry->data);
+  }
+  hash_table_foreach(rhs, entry)
+  {
+ _mesa_hash_table_insert(rhs_ht, entry->key, entry->data);
+  }
+   }
+
void handle_loop(ir_loop *, bool keep_acp);
virtual ir_visitor_status visit_enter(class ir_loop *);
virtual ir_visitor_status visit_enter(class ir_function_signature *);
@@ -120,8 +161,10 @@ public:
void kill(kill_entry *k);
void handle_if_block(exec_list *instructions);
 
-   /** List of acp_entry: The available copies to propagate */
-   exec_list *acp;
+   /** Hash of acp_entry: The available copies to propagate */
+   hash_table *lhs_ht;
+   hash_table *rhs_ht;
+
/**
 * List of kill_entry: The variables whose values were killed in this
 * block.
@@ -147,23 +190,29 @@ ir_copy_propagation_elements_visitor::visit_enter(ir_function_signature *ir)
 * block.  Any instructions at global scope will be shuffled into
 * main() at link time, so they're irrelevant to us.
 */
-   exec_list *orig_acp = this->acp;
exec_list *orig_kills = this->kills;
bool orig_killed_all = this->killed_all;
 
-   this->acp = new(mem_ctx) exec_list;
+   hash_table *orig_lhs_ht = lhs_ht;
+   hash_table *orig_rhs_ht = rhs_ht;
+
this->kills = new(mem_ctx) exec_list;
this->killed_all = false;
 
+   create_acp();
+
visit_list_elements(this, &ir->body);
 
-   ralloc_free(this->acp);
ralloc_free(this->kills);
 
+   destroy_acp();
+
this->kills = orig_kills;
-   this->acp = orig_acp;
this->killed_all = orig_killed_all;
 
+   lhs_ht = orig_lhs_ht;
+   rhs_ht = orig_rhs_ht;
+
return visit_continue_with_parent;
 }
 
@@ -249,17 +298,19 @@ 
ir_copy_propagation_elements_visitor::handle_rvalue(ir_rvalue **ir)
/* Try to find ACP entries covering swizzle_chan[], hoping they're
 * the same source variable.
 */
-   foreach_in_list(acp_entry, entry, this->acp) {
-  if (var == entry->lhs) {
-for (int c = 0; c < chans; c++) {
-   if (entry->write_mask & (1 << swizzle_chan[c])) {
-  source[c] = entry->rhs;
-  source_chan[c] = entry->swizzle[swizzle_chan[c]];
+   hash_entry *ht_entry = _mesa_hash_table_search(lhs_ht, var);
+   if (ht_entry) {
+  exec_list *ht_list = (exec_list *) ht_entry->data;
+  foreach_in_list(acp_entry, entry, ht_list) {
+ for (int c = 0; c < chans; c++) {
+if (entry->write_mask & (1 << swizzle_chan[c])) {
+   source[c] = entry->rhs;
+   source_chan[c] = entry->swizzle[swizzle_chan[c]];
 
if (source_chan[c] != swizzle_chan[c])
   noop_swizzle = false;
-   }
-}
+   }
+ 
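
In the ACP change above, the single `exec_list` of available copies is replaced by two hash tables, and `handle_rvalue` now looks up `lhs_ht` by variable and walks only that variable's entry list instead of scanning every available copy. The following is a minimal standalone sketch of that lookup pattern in plain C. It assumes a fixed-size, pointer-keyed open-addressing table. `acp_entry`, `acp_table`, `acp_add` and `acp_lookup` are hypothetical names, not Mesa's `hash_table` API, and the sketch leaves out `rhs_ht` and kill handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ACP entry: "lhs.write_mask = rhs.swizzle". */
struct acp_entry {
   const void *lhs;          /* destination variable, used as the hash key */
   const void *rhs;          /* source variable */
   unsigned write_mask;
   struct acp_entry *next;   /* next entry with the same lhs */
};

#define ACP_BUCKETS 64       /* power of two, fixed size for the sketch */

/* Each occupied slot holds the list of entries for one lhs variable. */
struct acp_table {
   const void *keys[ACP_BUCKETS];        /* NULL marks an empty slot */
   struct acp_entry *lists[ACP_BUCKETS];
};

static size_t acp_hash(const void *key)
{
   uintptr_t k = (uintptr_t)key;
   return (size_t)((k >> 4) ^ (k >> 12)) & (ACP_BUCKETS - 1);
}

/* Linear probing: return the slot holding key, or the empty slot where it
 * would be inserted. Assumes the table never fills up. */
static size_t acp_slot(const struct acp_table *t, const void *key)
{
   size_t i = acp_hash(key);
   while (t->keys[i] != NULL && t->keys[i] != key)
      i = (i + 1) & (ACP_BUCKETS - 1);
   return i;
}

static void acp_add(struct acp_table *t, struct acp_entry *e)
{
   size_t i;
   assert(e->lhs != NULL);   /* NULL is reserved for empty slots */
   i = acp_slot(t, e->lhs);
   t->keys[i] = e->lhs;
   e->next = t->lists[i];    /* push onto this variable's list */
   t->lists[i] = e;
}

/* Only entries whose lhs == var are returned; other copies are never visited. */
static struct acp_entry *acp_lookup(const struct acp_table *t, const void *var)
{
   size_t i = acp_slot(t, var);
   return t->keys[i] == var ? t->lists[i] : NULL;
}
```

The per-variable list plays the role of the `exec_list` stored in each `lhs_ht` entry's `data` in the patch.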

Re: [Mesa-dev] [PATCH 8/9] gallium/radeon: implement get_query_result_resource

2016-09-16 Thread Gustaw Smolarczyk
2016-09-16 15:57 GMT+02:00 Nicolai Hähnle :
> From: Nicolai Hähnle 
>
> ---
>  src/gallium/drivers/radeon/r600_pipe_common.c |   3 +
>  src/gallium/drivers/radeon/r600_pipe_common.h |   2 +
>  src/gallium/drivers/radeon/r600_query.c   | 391 
> +-
>  src/gallium/drivers/radeon/r600_query.h   |   7 +
>  4 files changed, 402 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
> b/src/gallium/drivers/radeon/r600_pipe_common.c
> index 2af4b41..0d2dd6b 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -566,20 +566,23 @@ void r600_common_context_cleanup(struct 
> r600_common_context *rctx)
> assert(!rctx->dcc_stats[i].query_active);
>
> for (j = 0; j < ARRAY_SIZE(rctx->dcc_stats[i].ps_stats); j++)
> if (rctx->dcc_stats[i].ps_stats[j])
> rctx->b.destroy_query(&rctx->b,
>   
> rctx->dcc_stats[i].ps_stats[j]);
>
> r600_texture_reference(&rctx->dcc_stats[i].tex, NULL);
> }
>
> +   if (rctx->query_result_shader)
> +   rctx->b.delete_compute_state(&rctx->b, 
> rctx->query_result_shader);
> +
> if (rctx->gfx.cs)
> rctx->ws->cs_destroy(rctx->gfx.cs);
> if (rctx->dma.cs)
> rctx->ws->cs_destroy(rctx->dma.cs);
> if (rctx->ctx)
> rctx->ws->ctx_destroy(rctx->ctx);
>
> if (rctx->uploader) {
> u_upload_destroy(rctx->uploader);
> }
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
> b/src/gallium/drivers/radeon/r600_pipe_common.h
> index 32acca5..f23f1c4 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -614,20 +614,22 @@ struct r600_common_context {
> boolquery_active;
> } dcc_stats[5];
>
> /* The list of all texture buffer objects in this context.
>  * This list is walked when a buffer is invalidated/reallocated and
>  * the GPU addresses are updated. */
> struct list_headtexture_buffers;
>
> struct pipe_debug_callback  debug;
>
> +   void*query_result_shader;
> +
> /* Copy one resource to another using async DMA. */
> void (*dma_copy)(struct pipe_context *ctx,
>  struct pipe_resource *dst,
>  unsigned dst_level,
>  unsigned dst_x, unsigned dst_y, unsigned dst_z,
>  struct pipe_resource *src,
>  unsigned src_level,
>  const struct pipe_box *src_box);
>
> void (*clear_buffer)(struct pipe_context *ctx, struct pipe_resource 
> *dst,
> diff --git a/src/gallium/drivers/radeon/r600_query.c 
> b/src/gallium/drivers/radeon/r600_query.c
> index d96f9fc..8aa1c888 100644
> --- a/src/gallium/drivers/radeon/r600_query.c
> +++ b/src/gallium/drivers/radeon/r600_query.c
> @@ -18,20 +18,23 @@
>   * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
>   * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
>   * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>   * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>   * USE OR OTHER DEALINGS IN THE SOFTWARE.
>   */
>
>  #include "r600_query.h"
>  #include "r600_cs.h"
>  #include "util/u_memory.h"
> +#include "util/u_upload_mgr.h"
> +
> +#include "tgsi/tgsi_text.h"
>
>  struct r600_hw_query_params {
> unsigned start_offset;
> unsigned end_offset;
> unsigned fence_offset;
> unsigned pair_stride;
> unsigned pair_count;
>  };
>
>  /* Queries without buffer handling or suspend/resume. */
> @@ -275,25 +278,27 @@ static bool r600_query_sw_get_result(struct 
> r600_common_context *rctx,
> break;
> case R600_QUERY_CURRENT_GPU_SCLK:
> case R600_QUERY_CURRENT_GPU_MCLK:
> result->u64 *= 100;
> break;
> }
>
> return true;
>  }
>
> +
>  static struct r600_query_ops sw_query_ops = {
> .destroy = r600_query_sw_destroy,
> .begin = r600_query_sw_begin,
> .end = r600_query_sw_end,
> -   .get_result = r600_query_sw_get_result
> +   .get_result = r600_query_sw_get_result,
> +   .get_result_resource = NULL
>  };
>
>  static struct pipe_query *r600_query_sw_create(struct pipe_context *ctx,
>unsigned query_type)
>  {
> struct r600_query_sw *query;
>
> query = CALLOC_STRUCT(r600_query_sw);
> if (!query)
> return NULL;
> @@ -373,25 +378,34 @@ static bool 

Re: [Mesa-dev] [PATCH] gallium: fix struct/class declaration conflicts

2016-09-16 Thread Bas Nieuwenhuizen
I don't think the "gallium:" commit message prefix is correct here.
Looking at the logs it should be "r600g/sb:".

With that change:

Reviewed-by: Bas Nieuwenhuizen 

Do you need someone to push it for you?

- Bas

On Fri, Sep 16, 2016 at 4:58 PM, Martina Kollarova
 wrote:
> A couple of forward-declarations were causing warnings in clang:
> 'value' defined as a class here but previously declared as a struct
> [-Wmismatched-tags]
>
> Signed-off-by: Martina Kollarova 
> ---
>  src/gallium/drivers/r600/sb/sb_ir.h | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_ir.h 
> b/src/gallium/drivers/r600/sb/sb_ir.h
> index c612e6c..4fc4da2 100644
> --- a/src/gallium/drivers/r600/sb/sb_ir.h
> +++ b/src/gallium/drivers/r600/sb/sb_ir.h
> @@ -263,8 +263,6 @@ public:
> }
>  };
>
> -class value;
> -
>  enum value_kind {
> VLK_REG,
> VLK_REL_REG,
> @@ -433,8 +431,6 @@ inline value_flags& operator &=(value_flags &l, 
> value_flags r) {
> return l;
>  }
>
> -struct value;
> -
>  sb_ostream& operator << (sb_ostream , value );
>
>  typedef uint32_t value_hash;
> @@ -467,7 +463,7 @@ enum constraint_kind {
>
>  class shader;
>  class sb_value_pool;
> -class ra_chunk;
> +struct ra_chunk;
>  class ra_constraint;
>
>  class value {
> --
> 1.9.1
>


Re: [Mesa-dev] [PATCH 02/10] nir: Add a loop analysis pass

2016-09-16 Thread Erik Faye-Lund
On Thu, Sep 15, 2016 at 9:03 AM, Timothy Arceri
 wrote:
> +   const int bias[] = { -1, 1, 1 };
> +
> +   for (unsigned i = 0; i < ARRAY_SIZE(bias); i++) {
> +  iter_int = iter_int + bias[i];
> +
> +  switch (cond_op) {
> +  case nir_op_ige:
> +  case nir_op_ilt:
> +  case nir_op_ieq:
> +  case nir_op_ine:
> + if (itest_interations(iter_int, step, limit, cond_op, initial_val,
> +   limit_rhs)) {
> +return iter_int;
> + }
> + break;
> +  case nir_op_uge:
> +  case nir_op_ult:
> + if (utest_interations(iter_int, step, limit, cond_op,
> +   (uint32_t) initial_val, limit_rhs)) {
> +return iter_int;
> + }
> + break;
> +  default:
> + return -1;
> +  }
> +   }

Can't this be easier written as:

for (int i = iter_int - 1; i <= iter_int + 1; ++i)
{
switch (cond_op) {
[...]
if (itest_interations(i, step, limit, cond_op, initial_val, limit_rhs))
 return i;
[...]
?
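
Erik's rewrite visits the same candidates as the `bias[]` walk: applying -1, +1, +1 cumulatively to `iter_int` yields iter_int - 1, iter_int, iter_int + 1, in that order. The small standalone check below compares both loops. `accept_equal` is a hypothetical predicate standing in for `itest_interations()`/`utest_interations()`, and the switch on `cond_op` is left out.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical predicate standing in for itest_interations()/utest_interations(). */
typedef int (*accept_fn)(int candidate, int target);

static int accept_equal(int candidate, int target)
{
   return candidate == target;
}

/* Loop as posted: step iter_int by the cumulative biases -1, +1, +1. */
static int find_with_bias(int iter_int, accept_fn accept, int target)
{
   const int bias[] = { -1, 1, 1 };

   for (unsigned i = 0; i < ARRAY_SIZE(bias); i++) {
      iter_int = iter_int + bias[i];
      if (accept(iter_int, target))
         return iter_int;
   }
   return -1;
}

/* Suggested form: iterate the window [iter_int - 1, iter_int + 1] directly. */
static int find_with_range(int iter_int, accept_fn accept, int target)
{
   for (int i = iter_int - 1; i <= iter_int + 1; ++i) {
      if (accept(i, target))
         return i;
   }
   return -1;
}
```

The range form also avoids mutating `iter_int` in place, which makes the candidate order easier to see at a glance.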


[Mesa-dev] [PATCH] gallium: fix struct/class declaration conflicts

2016-09-16 Thread Martina Kollarova
A couple of forward-declarations were causing warnings in clang:
'value' defined as a class here but previously declared as a struct
[-Wmismatched-tags]

Signed-off-by: Martina Kollarova 
---
 src/gallium/drivers/r600/sb/sb_ir.h | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ir.h 
b/src/gallium/drivers/r600/sb/sb_ir.h
index c612e6c..4fc4da2 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -263,8 +263,6 @@ public:
}
 };
 
-class value;
-
 enum value_kind {
VLK_REG,
VLK_REL_REG,
@@ -433,8 +431,6 @@ inline value_flags& operator &=(value_flags &l, value_flags 
r) {
return l;
 }
 
-struct value;
-
 sb_ostream& operator << (sb_ostream , value );
 
 typedef uint32_t value_hash;
@@ -467,7 +463,7 @@ enum constraint_kind {
 
 class shader;
 class sb_value_pool;
-class ra_chunk;
+struct ra_chunk;
 class ra_constraint;
 
 class value {
-- 
1.9.1



Re: [Mesa-dev] [PATCH 03/10] nir: add helpers to check if we can unroll loops

2016-09-16 Thread Erik Faye-Lund
On Thu, Sep 15, 2016 at 9:03 AM, Timothy Arceri
 wrote:
> This will be used by the loop unroll and lcssa passes.
>
> V2:
> - Check instruction count is not too large for unrolling
> - Add helper for complex loop unrolling
> ---
>  src/compiler/nir/nir.h | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 49e8cd8..3a2a13a 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2590,6 +2590,37 @@ bool nir_normalize_cubemap_coords(nir_shader *shader);
>
>  void nir_live_ssa_defs_impl(nir_function_impl *impl);
>
> +static inline bool
> +is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li)
> +{
> +   unsigned max_iter = shader->options->max_unroll_iterations;
> +
> +   if (li->trip_count > max_iter)
> +  return false;
> +
> +   if (li->force_unroll)
> +  return true;
> +
> +   bool loop_not_too_large =
> +  li->num_instructions * li->trip_count <= max_iter * 25;


"max_iter * 25" seems like a pretty arbitrary limit at first glance.
How was it found? Perhaps a comment explaining a bit could be added?

> +static inline bool
> +is_complex_loop(nir_shader *shader, nir_loop_info *li)
> +{
> +   unsigned num_lt = list_length(&li->loop_terminator_list);
> +   return is_loop_small_enough_to_unroll(shader, li) && num_lt == 2;

Perhaps you could add a comment to explain the "num_lt == 2"-part?
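
The quoted `is_loop_small_enough_to_unroll()` is cut off before its return statement. Assuming it returns `loop_not_too_large`, the heuristic reduces to three checks. The standalone sketch below uses a hypothetical `loop_info` struct that mirrors only the fields the helper reads. It also gives the factor 25 a name, which is one way to hang the explanatory comment the review asks for. The name is invented here and the rationale for 25 is still unknown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the nir_loop_info fields used by the helper. */
struct loop_info {
   unsigned trip_count;
   unsigned num_instructions;
   bool force_unroll;
};

/* The factor from the patch, named so a comment can explain its origin. */
#define UNROLL_INSTR_BUDGET_PER_ITER 25u

static bool
loop_small_enough_to_unroll(unsigned max_iter, const struct loop_info *li)
{
   /* Too many iterations: never unroll, even if forced. */
   if (li->trip_count > max_iter)
      return false;

   if (li->force_unroll)
      return true;

   /* Total unrolled size must stay within max_iter * 25 instructions. */
   return li->num_instructions * li->trip_count <=
          max_iter * UNROLL_INSTR_BUDGET_PER_ITER;
}
```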


Re: [Mesa-dev] [PATCH] st/mesa: only enable MSAA coverage options when we have a MSAA buffer

2016-09-16 Thread Brian Paul

On 09/16/2016 08:07 AM, Marek Olšák wrote:

On Thu, Sep 15, 2016 at 11:20 PM, Brian Paul  wrote:

Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default)
we should not set the alpha_to_coverage or alpha_to_one flags if the
current drawing buffer does not do MSAA.

This fixes the new piglit gl-1.3-alpha_to_coverage_nop test.
---
  src/mesa/state_tracker/st_atom_blend.c | 9 ++---
  src/mesa/state_tracker/st_context.c| 3 ++-
  2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_blend.c 
b/src/mesa/state_tracker/st_atom_blend.c
index 65de67b..67e 100644
--- a/src/mesa/state_tracker/st_atom_blend.c
+++ b/src/mesa/state_tracker/st_atom_blend.c
@@ -265,9 +265,12 @@ update_blend( struct st_context *st )

 blend->dither = ctx->Color.DitherFlag;

-   if (ctx->Multisample.Enabled) {
-  /* unlike in gallium/d3d10 these operations are only performed
- if msaa is enabled */
+   if (ctx->Multisample.Enabled &&
+   ctx->DrawBuffer &&


Is it possible for ctx->DrawBuffer to be NULL?


Probably not, but I'm not 100% sure.  I have some memory of an extension 
to allow MakeCurrent(ctx!=null, fb==null) but I can't find it now.  Or 
maybe MakeCurrent(ctx!=null, fb==null) is supposed to be generally 
supported now.  I don't remember and will have to look.


I think the use case for MakeCurrent(ctx!=null, fb==null) is to have a 
context just to compile shaders, etc.


Actually, looking again now, I found the IncompleteFramebuffer object 
and the _mesa_get_incomplete_framebuffer() function.  So it looks like 
that should be used to prevent the null pointer.


And we're not checking for DrawBuffer==NULL elsewhere.  I can remove the 
check.






+   ctx->DrawBuffer->Visual.sampleBuffers > 0) {
+  /* Unlike in gallium/d3d10 these operations are only performed
+   * if both msaa is enabled and we have a multisample buffer.
+   */
blend->alpha_to_coverage = ctx->Multisample.SampleAlphaToCoverage;
blend->alpha_to_one = ctx->Multisample.SampleAlphaToOne;
 }
diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index ddc11a4..81b3387 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -166,7 +166,8 @@ void st_invalidate_state(struct gl_context * ctx, 
GLbitfield new_state)
 struct st_context *st = st_context(ctx);

 if (new_state & _NEW_BUFFERS) {
-  st->dirty |= ST_NEW_DSA |
+  st->dirty |= ST_NEW_BLEND |
+   ST_NEW_DSA |
 ST_NEW_FB_STATE |
 ST_NEW_SAMPLE_MASK |
 ST_NEW_SAMPLE_SHADING |


I guess it's OK to add a dependency on _NEW_BUFFERS, because that flag
is set the least often.

Reviewed-by: Marek Olšák 


Thanks!

PS: I'm also updating the check-in comment with remarks about how I 
stumbled across this in ETQW.


-Brian



Re: [Mesa-dev] [PATCH] vl/dri3: handle the case of different GPU(v4.1)

2016-09-16 Thread Leo Liu

This Patch is Reviewed-by: Leo Liu 

On 09/16/2016 08:51 AM, Nayan Deshmukh wrote:

In the PRIME case, when rendering is done on a GPU other than the
server GPU, use a separate linear buffer for each back buffer
which will be displayed using present extension.
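
The allocation choice described above can be sketched as follows, with hypothetical bind-flag values and a fake texture type so it runs standalone. Only the decision logic follows the patch. On a different GPU, the back buffer is rendered into a regular texture and a separate linear texture is exported. Otherwise, the render texture itself gets the scanout/shared bits and is exported. The flag names echo Gallium's `PIPE_BIND_*`, but their values here are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; only the names follow Gallium's PIPE_BIND_*. */
enum {
   BIND_RENDER_TARGET = 1u << 0,
   BIND_SAMPLER_VIEW  = 1u << 1,
   BIND_SCANOUT       = 1u << 2,
   BIND_SHARED        = 1u << 3,
   BIND_LINEAR        = 1u << 4,
};

struct fake_texture { unsigned bind; };

struct back_buffer {
   struct fake_texture texture_storage, linear_storage;
   struct fake_texture *texture;        /* what the decoder renders into */
   struct fake_texture *linear_texture; /* only allocated in the PRIME case */
};

/* Returns the texture whose handle would be passed to the X server. */
static struct fake_texture *
alloc_back_buffer(struct back_buffer *buf, bool is_different_gpu)
{
   unsigned bind = BIND_RENDER_TARGET | BIND_SAMPLER_VIEW;

   buf->linear_texture = NULL;
   if (is_different_gpu) {
      /* Render into a texture local to the rendering GPU ... */
      buf->texture_storage.bind = bind;
      buf->texture = &buf->texture_storage;
      /* ... and share a separate linear texture with the server GPU. */
      buf->linear_storage.bind = bind | BIND_SCANOUT | BIND_SHARED | BIND_LINEAR;
      buf->linear_texture = &buf->linear_storage;
      return buf->linear_texture;
   }

   buf->texture_storage.bind = bind | BIND_SCANOUT | BIND_SHARED;
   buf->texture = &buf->texture_storage;
   return buf->texture;
}
```

The sketch does not cover copying `texture` into `linear_texture` before presenting, which happens in the `vl_dri3_flush_frontbuffer` part of the patch.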

v2: Use a separate linear buffer for each back buffer (Michel)
v3: Change variable names and fix coding style (Leo and Emil)
v4: Use PIPE_BIND_SAMPLER_VIEW for the back buffer when
 a separate linear buffer is used (Michel)
v4.1: remove empty line

Signed-off-by: Nayan Deshmukh 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 61 ---
  1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..e0aaad8 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -49,6 +49,7 @@
  struct vl_dri3_buffer
  {
 struct pipe_resource *texture;
+   struct pipe_resource *linear_texture;
  
 uint32_t pixmap;
 uint32_t sync_fence;
@@ -69,6 +70,8 @@ struct vl_dri3_screen
 xcb_present_event_t eid;
 xcb_special_event_t *special_event;
  
+   struct pipe_context *pipe;
+
 struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
 int cur_back;
  
@@ -82,6 +85,7 @@ struct vl_dri3_screen
 int64_t last_ust, ns_frame, last_msc, next_msc;
  
 bool flushed;
+   bool is_different_gpu;
  };
  
  static void
@@ -102,6 +106,8 @@ dri3_free_back_buffer(struct vl_dri3_screen *scrn,
 xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
 xshmfence_unmap_shm(buffer->shm_fence);
 pipe_resource_reference(&buffer->texture, NULL);
+   if (buffer->linear_texture)
+   pipe_resource_reference(&buffer->linear_texture, NULL);
 FREE(buffer);
  }
  
@@ -209,7 +215,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
 xcb_sync_fence_t sync_fence;
 struct xshmfence *shm_fence;
 int buffer_fd, fence_fd;
-   struct pipe_resource templ;
+   struct pipe_resource templ, *pixmap_buffer_texture;
 struct winsys_handle whandle;
 unsigned usage;
  
@@ -226,8 +232,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
goto close_fd;
  
 memset(&templ, 0, sizeof(templ));
-   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
-PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
 templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
 templ.target = PIPE_TEXTURE_2D;
 templ.last_level = 0;
@@ -235,16 +240,34 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
 templ.height0 = scrn->height;
 templ.depth0 = 1;
 templ.array_size = 1;
-   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
- &templ);
-   if (!buffer->texture)
-  goto unmap_shm;
  
+   if (scrn->is_different_gpu) {
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+&templ);
+  if (!buffer->texture)
+ goto unmap_shm;
+
+  templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED |
+PIPE_BIND_LINEAR;
+  buffer->linear_texture = 
scrn->base.pscreen->resource_create(scrn->base.pscreen,
+  &templ);
+  pixmap_buffer_texture = buffer->linear_texture;
+
+  if (!buffer->linear_texture)
+ goto no_linear_texture;
+   } else {
+  templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+&templ);
+  if (!buffer->texture)
+ goto unmap_shm;
+  pixmap_buffer_texture = buffer->texture;
+   }
 memset(&whandle, 0, sizeof(whandle));
 whandle.type= DRM_API_HANDLE_TYPE_FD;
 usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
 scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL,
-   buffer->texture, &whandle,
+   pixmap_buffer_texture, &whandle,
 usage);
 buffer_fd = whandle.handle;
 buffer->pitch = whandle.stride;
@@ -271,6 +294,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
  
 return buffer;
  
+no_linear_texture:
+   pipe_resource_reference(&buffer->texture, NULL);
  unmap_shm:
 xshmfence_unmap_shm(shm_fence);
  close_fd:
@@ -474,6 +499,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
 struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private;
 uint32_t options = XCB_PRESENT_OPTION_NONE;
 struct vl_dri3_buffer *back;
+   struct pipe_box src_box;
  
 back = scrn->back_buffers[scrn->cur_back];
 if (!back)
@@ -485,6 +511,16 @@ vl_dri3_flush_frontbuffer(struct 

Re: [Mesa-dev] [PATCH] st/mesa: only enable MSAA coverage options when we have a MSAA buffer

2016-09-16 Thread Marek Olšák
On Thu, Sep 15, 2016 at 11:20 PM, Brian Paul  wrote:
> Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default)
> we should not set the alpha_to_coverage or alpha_to_one flags if the
> current drawing buffer does not do MSAA.
>
> This fixes the new piglit gl-1.3-alpha_to_coverage_nop test.
> ---
>  src/mesa/state_tracker/st_atom_blend.c | 9 ++---
>  src/mesa/state_tracker/st_context.c| 3 ++-
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_atom_blend.c 
> b/src/mesa/state_tracker/st_atom_blend.c
> index 65de67b..67e 100644
> --- a/src/mesa/state_tracker/st_atom_blend.c
> +++ b/src/mesa/state_tracker/st_atom_blend.c
> @@ -265,9 +265,12 @@ update_blend( struct st_context *st )
>
> blend->dither = ctx->Color.DitherFlag;
>
> -   if (ctx->Multisample.Enabled) {
> -  /* unlike in gallium/d3d10 these operations are only performed
> - if msaa is enabled */
> +   if (ctx->Multisample.Enabled &&
> +   ctx->DrawBuffer &&

Is it possible for ctx->DrawBuffer to be NULL?

> +   ctx->DrawBuffer->Visual.sampleBuffers > 0) {
> +  /* Unlike in gallium/d3d10 these operations are only performed
> +   * if both msaa is enabled and we have a multisample buffer.
> +   */
>blend->alpha_to_coverage = ctx->Multisample.SampleAlphaToCoverage;
>blend->alpha_to_one = ctx->Multisample.SampleAlphaToOne;
> }
> diff --git a/src/mesa/state_tracker/st_context.c 
> b/src/mesa/state_tracker/st_context.c
> index ddc11a4..81b3387 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -166,7 +166,8 @@ void st_invalidate_state(struct gl_context * ctx, 
> GLbitfield new_state)
> struct st_context *st = st_context(ctx);
>
> if (new_state & _NEW_BUFFERS) {
> -  st->dirty |= ST_NEW_DSA |
> +  st->dirty |= ST_NEW_BLEND |
> +   ST_NEW_DSA |
> ST_NEW_FB_STATE |
> ST_NEW_SAMPLE_MASK |
> ST_NEW_SAMPLE_SHADING |

I guess it's OK to add a dependency on _NEW_BUFFERS, because that flag
is set the least often.

Reviewed-by: Marek Olšák 

Marek
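
The new condition in `update_blend` amounts to a small predicate. The alpha-to-coverage and alpha-to-one bits are passed through only when GL_MULTISAMPLE is enabled *and* the draw buffer has sample buffers. Below is a standalone sketch with hypothetical, trimmed-down structs (not Mesa's `gl_context`/`gl_framebuffer`). It keeps the NULL check on the draw buffer that the thread discusses possibly dropping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the GL state that is consulted. */
struct fb_visual   { int sampleBuffers; };
struct framebuffer { struct fb_visual Visual; };
struct ms_state    { bool Enabled, SampleAlphaToCoverage, SampleAlphaToOne; };

struct blend_bits  { bool alpha_to_coverage, alpha_to_one; };

static struct blend_bits
compute_msaa_blend_bits(const struct ms_state *ms, const struct framebuffer *draw)
{
   struct blend_bits out = { false, false };

   /* Unlike gallium/d3d10, GL applies these only when multisampling is
    * enabled AND the current draw buffer actually has sample buffers. */
   if (ms->Enabled && draw && draw->Visual.sampleBuffers > 0) {
      out.alpha_to_coverage = ms->SampleAlphaToCoverage;
      out.alpha_to_one = ms->SampleAlphaToOne;
   }
   return out;
}
```

Because the result now depends on the draw buffer, the patch also sets `ST_NEW_BLEND` on `_NEW_BUFFERS`, so blend state is revalidated whenever the framebuffer changes.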


[Mesa-dev] [PATCH 2/9] gallium/radeon: add r600_gfx_{write, wait}_fence

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

For bottom-of-pipe fences inside the gfx command stream.
---
 src/gallium/drivers/radeon/r600_pipe_common.c | 52 +++
 src/gallium/drivers/radeon/r600_pipe_common.h |  5 +++
 src/gallium/drivers/radeonsi/si_perfcounter.c | 41 ++---
 3 files changed, 60 insertions(+), 38 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index f0fdc9b..2af4b41 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -73,20 +73,72 @@ void radeon_shader_binary_clean(struct radeon_shader_binary 
*b)
FREE(b->global_symbol_offsets);
FREE(b->relocs);
FREE(b->disasm_string);
FREE(b->llvm_ir_string);
 }
 
 /*
  * pipe_context
  */
 
+void r600_gfx_write_fence(struct r600_common_context *ctx,
+ uint64_t va, uint32_t old_value, uint32_t new_value)
+{
+   struct radeon_winsys_cs *cs = ctx->gfx.cs;
+
+   if (ctx->chip_class == CIK) {
+   /* Two EOP events are required to make all engines go idle
+* (and optional cache flushes executed) before the timestamp
+* is written.
+*/
+   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
+   radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_BOTTOM_OF_PIPE_TS) |
+   EVENT_INDEX(5));
+   radeon_emit(cs, va);
+   radeon_emit(cs, (va >> 32) | EOP_DATA_SEL(1));
+   radeon_emit(cs, old_value); /* immediate data */
+   radeon_emit(cs, 0); /* unused */
+   }
+
+   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
+   radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_BOTTOM_OF_PIPE_TS) |
+   EVENT_INDEX(5));
+   radeon_emit(cs, va);
+   radeon_emit(cs, (va >> 32) | EOP_DATA_SEL(1));
+   radeon_emit(cs, new_value); /* immediate data */
+   radeon_emit(cs, 0); /* unused */
+}
+
+unsigned r600_gfx_write_fence_dwords(struct r600_common_screen *screen)
+{
+   unsigned dwords = 6;
+
+   if (screen->chip_class == CIK)
+   dwords *= 2;
+
+   return dwords;
+}
+
+void r600_gfx_wait_fence(struct r600_common_context *ctx,
+uint64_t va, uint32_t ref, uint32_t mask)
+{
+   struct radeon_winsys_cs *cs = ctx->gfx.cs;
+
+   radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0));
+   radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1));
+   radeon_emit(cs, va);
+   radeon_emit(cs, va >> 32);
+   radeon_emit(cs, ref); /* reference value */
+   radeon_emit(cs, mask); /* mask */
+   radeon_emit(cs, 4); /* poll interval */
+}
+
 void r600_draw_rectangle(struct blitter_context *blitter,
 int x1, int y1, int x2, int y2, float depth,
 enum blitter_attrib_type type,
 const union pipe_color_union *attrib)
 {
struct r600_common_context *rctx =
(struct r600_common_context*)util_blitter_get_pipe(blitter);
struct pipe_viewport_state viewport;
struct pipe_resource *buf = NULL;
unsigned offset = 0;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index dd33eab..96b23b2 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -688,20 +688,25 @@ struct pipe_resource * r600_aligned_buffer_create(struct 
pipe_screen *screen,
  unsigned alignment);
 struct pipe_resource *
 r600_buffer_from_user_memory(struct pipe_screen *screen,
 const struct pipe_resource *templ,
 void *user_memory);
 void
 r600_invalidate_resource(struct pipe_context *ctx,
 struct pipe_resource *resource);
 
 /* r600_common_pipe.c */
+void r600_gfx_write_fence(struct r600_common_context *ctx,
+ uint64_t va, uint32_t old_value, uint32_t new_value);
+unsigned r600_gfx_write_fence_dwords(struct r600_common_screen *screen);
+void r600_gfx_wait_fence(struct r600_common_context *ctx,
+uint64_t va, uint32_t ref, uint32_t mask);
 void r600_draw_rectangle(struct blitter_context *blitter,
 int x1, int y1, int x2, int y2, float depth,
 enum blitter_attrib_type type,
 const union pipe_color_union *attrib);
 bool r600_common_screen_init(struct r600_common_screen *rscreen,
 struct radeon_winsys *ws);
 void r600_destroy_common_screen(struct r600_common_screen *rscreen);
 void r600_preflush_suspend_features(struct r600_common_context *ctx);
 void r600_postflush_resume_features(struct r600_common_context *ctx);
 bool r600_common_context_init(struct 
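
`r600_gfx_write_fence()` and `r600_gfx_write_fence_dwords()` must agree on the packet size, because callers reserve command-stream space with the latter (patch 5/9 adds it to `num_cs_dw_end`). The sketch below mimics the emission with a fake command buffer and checks that the two stay in sync: 6 dwords per EVENT_WRITE_EOP, emitted twice on CIK. The header and event words are placeholders; the real `PKT3()`/`EVENT_TYPE()` encodings are not reproduced here. The enum and helper names are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum chip_class { SI, CIK, VI }; /* hypothetical subset of chip classes */

struct fake_cs {
   uint32_t buf[32];
   unsigned cdw;                 /* dwords emitted so far */
};

static void emit(struct fake_cs *cs, uint32_t v)
{
   assert(cs->cdw < 32);
   cs->buf[cs->cdw++] = v;
}

/* Placeholders, NOT the real PKT3()/EVENT_TYPE() encodings. */
#define FAKE_EOP_HEADER 0xEE000000u
#define FAKE_EOP_EVENT  0x00000005u

static void emit_eop(struct fake_cs *cs, uint64_t va, uint32_t value)
{
   emit(cs, FAKE_EOP_HEADER);
   emit(cs, FAKE_EOP_EVENT);
   emit(cs, (uint32_t)va);          /* address, low 32 bits */
   emit(cs, (uint32_t)(va >> 32));  /* address, high bits (plus data select in the real packet) */
   emit(cs, value);                 /* immediate data */
   emit(cs, 0);                     /* unused */
}

static void write_fence(struct fake_cs *cs, enum chip_class chip,
                        uint64_t va, uint32_t old_value, uint32_t new_value)
{
   /* On CIK, a first EOP (writing old_value) is needed so that all engines
    * go idle before the EOP carrying new_value is written. */
   if (chip == CIK)
      emit_eop(cs, va, old_value);
   emit_eop(cs, va, new_value);
}

static unsigned write_fence_dwords(enum chip_class chip)
{
   unsigned dwords = 6;

   if (chip == CIK)
      dwords *= 2;
   return dwords;
}
```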

[Mesa-dev] [PATCH 1/9] gallium/radeon: add barrier_flags to r600_common_screen

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

There are driver-specific context flags for barriers that are not covered
by the Gallium barrier interfaces.

The R600 settings of these flags may not be optimal, but we're not going
to use them yet anyway.
---
 src/gallium/drivers/r600/r600_pipe.c  |  6 ++
 src/gallium/drivers/radeon/r600_pipe_common.h | 12 
 src/gallium/drivers/radeonsi/si_pipe.c|  5 +
 3 files changed, 23 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c09821d..0799ba2 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -673,20 +673,26 @@ struct pipe_screen *r600_screen_create(struct 
radeon_winsys *ws)
rscreen->has_compressed_msaa_texturing = true;
break;
default:
rscreen->has_msaa = FALSE;
rscreen->has_compressed_msaa_texturing = false;
}
 
rscreen->b.has_cp_dma = rscreen->b.info.drm_minor >= 27 &&
  !(rscreen->b.debug_flags & DBG_NO_CP_DMA);
 
+   rscreen->b.barrier_flags.cp_to_L2 =
+   R600_CONTEXT_INV_VERTEX_CACHE |
+   R600_CONTEXT_INV_TEX_CACHE |
+   R600_CONTEXT_INV_CONST_CACHE;
+   rscreen->b.barrier_flags.compute_to_L2 = R600_CONTEXT_PS_PARTIAL_FLUSH;
+
rscreen->global_pool = compute_memory_pool_new(rscreen);
 
/* Create the auxiliary context. This must be done last. */
rscreen->b.aux_context = rscreen->b.b.context_create(&rscreen->b.b, 
NULL, 0);
 
 #if 0 /* This is for testing whether aux_context and buffer clearing work 
correctly. */
struct pipe_resource templ = {};
 
templ.width0 = 4;
templ.height0 = 2048;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index d9f22e4..dd33eab 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -408,20 +408,32 @@ struct r600_common_screen {
 * contexts' compressed texture binding masks.
 */
unsignedcompressed_colortex_counter;
 
/* Atomically increment this counter when an existing texture's
 * backing buffer or tile mode parameters have changed that requires
 * recomputation of shader descriptors.
 */
unsigneddirty_tex_descriptor_counter;
 
+   struct {
+   /* Context flags to set so that all writes from earlier jobs
+* in the CP are seen by L2 clients.
+*/
+   unsigned cp_to_L2;
+
+   /* Context flags to set so that all writes from earlier
+* compute jobs are seen by L2 clients.
+*/
+   unsigned compute_to_L2;
+   } barrier_flags;
+
void (*query_opaque_metadata)(struct r600_common_screen *rscreen,
  struct r600_texture *rtex,
  struct radeon_bo_metadata *md);
 
void (*apply_opaque_metadata)(struct r600_common_screen *rscreen,
struct r600_texture *rtex,
struct radeon_bo_metadata *md);
 };
 
 /* This encapsulates a state or an operation which can emitted into the GPU
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 8f9e6f5..730be9d 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -811,20 +811,25 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws)
 sscreen->b.info.pfp_fw_version >= 121 &&
 sscreen->b.info.me_fw_version >= 87);
 
sscreen->b.has_cp_dma = true;
sscreen->b.has_streamout = true;
pipe_mutex_init(sscreen->shader_parts_mutex);
sscreen->use_monolithic_shaders =
HAVE_LLVM < 0x0308 ||
(sscreen->b.debug_flags & DBG_MONOLITHIC_SHADERS) != 0;
 
+   sscreen->b.barrier_flags.cp_to_L2 = SI_CONTEXT_INV_SMEM_L1 |
+   SI_CONTEXT_INV_VMEM_L1 |
+   SI_CONTEXT_INV_GLOBAL_L2;
+   sscreen->b.barrier_flags.compute_to_L2 = SI_CONTEXT_CS_PARTIAL_FLUSH;
+
if (debug_get_bool_option("RADEON_DUMP_SHADERS", false))
sscreen->b.debug_flags |= DBG_FS | DBG_VS | DBG_GS | DBG_PS | 
DBG_CS;
 
/* Only enable as many threads as we have target machines and CPUs. */
num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
num_compiler_threads = MIN2(num_cpus, ARRAY_SIZE(sscreen->tm));
 
for (i = 0; i < num_compiler_threads; i++)
sscreen->tm[i] = si_create_llvm_target_machine(sscreen);
 
-- 
2.7.4
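
The commit message says these flags are not used yet. A hypothetical consumer might look like the sketch below: driver-neutral code asks for a barrier by name (`cp_to_L2` or `compute_to_L2`), and each driver supplies its own context bits through the screen. The flag names and values are placeholders for the `SI_CONTEXT_*`/`R600_CONTEXT_*` bits, and the helpers are invented. Only the shape of `barrier_flags` follows the patch.

```c
#include <assert.h>

/* Placeholder bits standing in for SI_CONTEXT_* / R600_CONTEXT_* flags. */
enum {
   CTX_INV_SMEM_L1      = 1u << 0,
   CTX_INV_VMEM_L1      = 1u << 1,
   CTX_INV_GLOBAL_L2    = 1u << 2,
   CTX_CS_PARTIAL_FLUSH = 1u << 3,
};

/* Same shape as the struct added to r600_common_screen. */
struct barrier_flags {
   unsigned cp_to_L2;      /* make earlier CP writes visible to L2 clients */
   unsigned compute_to_L2; /* make earlier compute writes visible to L2 clients */
};

struct fake_ctx {
   const struct barrier_flags *barrier; /* per-screen, driver-specific */
   unsigned flags;                      /* pending flush/invalidate bits */
};

/* Hypothetical common-code helpers: they never need to know which
 * driver-specific bits are involved, only which barrier is wanted. */
static void need_cp_to_L2(struct fake_ctx *ctx)
{
   ctx->flags |= ctx->barrier->cp_to_L2;
}

static void need_compute_to_L2(struct fake_ctx *ctx)
{
   ctx->flags |= ctx->barrier->compute_to_L2;
}
```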


[Mesa-dev] [PATCH 5/9] gallium/radeon: add query fences and r600_get_hw_query_params

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We will support the waiting option in ARB_query_buffer_object using
WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently
write their results with the highest bit set, and we can just use that;
for others, we have to write a fence explicitly.

ZPASS_DONE for occlusion queries writes its results with the high bit
set, but it writes up to 8 pairs of results (one for each DB). We have
to wait for all of these results, so let's just add an explicit fence.

The new function provides summary information to be used by subsequent
patches.
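
The reasoning above can be made concrete with a CPU-side sketch of one occlusion-query result slot, sized by the patch's `16 * max_db + 16` arithmetic. Without a fence, a waiter would have to check the high bit of every begin/end value for every DB. With the fence, a single dword is enough, and a single dword is what WAIT_REG_MEM can poll. The following are assumptions, not taken from the patch:
- each per-DB pair is two 64-bit values whose bit 63 marks "written";
- the fence sits directly after the pairs;
- a signalled fence has its top bit set.

The fence value the patch actually writes is not visible in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DB 8

/* Assumed layout of one occlusion-query result slot:
 *   max_db pairs of { u64 begin; u64 end; }  (16 bytes per DB)
 *   followed by a 32-bit fence, padded to 16 bytes. */
struct occlusion_slot {
   uint64_t pairs[MAX_DB][2];
   uint32_t fence;
   uint32_t pad[3];
};

static unsigned occlusion_result_size(unsigned max_db)
{
   return 16 * max_db + 16; /* same arithmetic as the patch */
}

#define WRITTEN_BIT (UINT64_C(1) << 63)

/* Without a fence: every begin/end value of every DB must be checked. */
static bool pairs_ready(const struct occlusion_slot *s, unsigned max_db)
{
   for (unsigned i = 0; i < max_db; i++)
      if (!(s->pairs[i][0] & WRITTEN_BIT) || !(s->pairs[i][1] & WRITTEN_BIT))
         return false;
   return true;
}

/* With the fence: one dword, which is what a WAIT_REG_MEM can poll. */
static bool fence_ready(const struct occlusion_slot *s)
{
   return (s->fence & 0x80000000u) != 0;
}
```

This is also why `r600_query_hw_prepare_buffer` presets the top bits for unused backends: a per-pair check would otherwise wait forever on DBs that never write.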
---
 src/gallium/drivers/radeon/r600_query.c | 107 +++-
 1 file changed, 91 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_query.c 
b/src/gallium/drivers/radeon/r600_query.c
index 2c3d530..b9041eb 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -19,20 +19,28 @@
  * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
  * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
  * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 
 #include "r600_query.h"
 #include "r600_cs.h"
 #include "util/u_memory.h"
 
+struct r600_hw_query_params {
+   unsigned start_offset;
+   unsigned end_offset;
+   unsigned fence_offset;
+   unsigned pair_stride;
+   unsigned pair_count;
+};
+
 /* Queries without buffer handling or suspend/resume. */
 struct r600_query_sw {
struct r600_query b;
 
uint64_t begin_result;
uint64_t end_result;
/* Fence for GPU_FINISHED. */
struct pipe_fence_handle *fence;
 };
 
@@ -352,21 +360,21 @@ static bool r600_query_hw_prepare_buffer(struct 
r600_common_context *ctx,
return false;
 
memset(results, 0, buffer->b.b.width0);
 
if (query->b.type == PIPE_QUERY_OCCLUSION_COUNTER ||
query->b.type == PIPE_QUERY_OCCLUSION_PREDICATE) {
unsigned num_results;
unsigned i, j;
 
/* Set top bits for unused backends. */
-   num_results = buffer->b.b.width0 / (16 * ctx->max_db);
+   num_results = buffer->b.b.width0 / query->result_size;
for (j = 0; j < num_results; j++) {
for (i = 0; i < ctx->max_db; i++) {
if (!(ctx->backend_mask & (1<max_db;
}
}
@@ -422,50 +430,52 @@ static struct pipe_query *r600_query_hw_create(struct 
r600_common_context *rctx,
return NULL;
 
query->b.type = query_type;
query->b.ops = &query_hw_ops;
query->ops = &query_hw_default_hw_ops;
 
switch (query_type) {
case PIPE_QUERY_OCCLUSION_COUNTER:
case PIPE_QUERY_OCCLUSION_PREDICATE:
query->result_size = 16 * rctx->max_db;
+   query->result_size += 16; /* for the fence + alignment */
query->num_cs_dw_begin = 6;
-   query->num_cs_dw_end = 6;
+   query->num_cs_dw_end = 6 + 
r600_gfx_write_fence_dwords(rctx->screen);
query->flags |= R600_QUERY_HW_FLAG_PREDICATE;
break;
case PIPE_QUERY_TIME_ELAPSED:
-   query->result_size = 16;
+   query->result_size = 24;
query->num_cs_dw_begin = 8;
-   query->num_cs_dw_end = 8;
+   query->num_cs_dw_end = 8 + r600_gfx_write_fence_dwords(rctx->screen);
break;
case PIPE_QUERY_TIMESTAMP:
-   query->result_size = 8;
-   query->num_cs_dw_end = 8;
+   query->result_size = 16;
+   query->num_cs_dw_end = 8 + r600_gfx_write_fence_dwords(rctx->screen);
query->flags = R600_QUERY_HW_FLAG_NO_START;
break;
case PIPE_QUERY_PRIMITIVES_EMITTED:
case PIPE_QUERY_PRIMITIVES_GENERATED:
case PIPE_QUERY_SO_STATISTICS:
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
/* NumPrimitivesWritten, PrimitiveStorageNeeded. */
query->result_size = 32;
query->num_cs_dw_begin = 6;
query->num_cs_dw_end = 6;
query->stream = index;
query->flags |= R600_QUERY_HW_FLAG_PREDICATE;
break;
case PIPE_QUERY_PIPELINE_STATISTICS:
/* 11 values on EG, 8 on R600. */
query->result_size = (rctx->chip_class >= EVERGREEN ? 11 : 8) * 16;
+   query->result_size += 8; /* for the fence + alignment */
query->num_cs_dw_begin = 6;
- 

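The hunk above adds `struct r600_hw_query_params`, which describes where the begin/end values and the fence sit inside one result slot. As a rough CPU-side illustration, here is a sketch of how such offsets could drive the accumulation of a query result. It assumes each value is a 64-bit number stored as two little-endian dwords and that bit 63 marks a value as written by the GPU; `struct hw_query_params`, `RESULT_WRITTEN_BIT` and `sum_query_pairs` are hypothetical names, and the fence offset is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the pair-related fields of r600_hw_query_params
 * (all values are byte offsets/strides within one result slot). */
struct hw_query_params {
	unsigned start_offset;
	unsigned end_offset;
	unsigned pair_stride;
	unsigned pair_count;
};

/* Assumption: the GPU sets bit 63 once it has written a value. */
#define RESULT_WRITTEN_BIT (UINT64_C(1) << 63)

/* Read a 64-bit value stored as two dwords, low dword first. */
static uint64_t read_u64(const uint32_t *dw)
{
	return (uint64_t)dw[0] | (uint64_t)dw[1] << 32;
}

/* Accumulate (end - begin) over all pairs of one result slot. With
 * test_status set, a pair whose begin or end lacks the written bit
 * contributes nothing. */
static uint64_t sum_query_pairs(const uint32_t *slot,
				const struct hw_query_params *p,
				bool test_status)
{
	uint64_t sum = 0;

	assert(p->start_offset % 4 == 0 && p->end_offset % 4 == 0);
	assert(p->pair_stride % 4 == 0);

	for (unsigned i = 0; i < p->pair_count; i++) {
		const uint32_t *pair = slot + i * p->pair_stride / 4;
		uint64_t begin = read_u64(pair + p->start_offset / 4);
		uint64_t end = read_u64(pair + p->end_offset / 4);

		if (test_status &&
		    (!(begin & RESULT_WRITTEN_BIT) || !(end & RESULT_WRITTEN_BIT)))
			continue;
		/* When both values carry bit 63, it cancels in the difference. */
		sum += end - begin;
	}
	return sum;
}
```

For an occlusion query this would correspond to one pair per depth backend with a 16-byte stride, begin at offset 0 and end at offset 8.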
[Mesa-dev] [PATCH 7/9] gallium/radeon: zero all query buffers

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

To ensure that fences are properly initialized.
---
 src/gallium/drivers/radeon/r600_query.c | 26 ++
 src/gallium/drivers/radeon/r600_query.h |  2 +-
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c
index c1c3599..d96f9fc 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -331,25 +331,23 @@ static struct r600_resource *r600_new_query_buffer(struct r600_common_context *c
/* Queries are normally read by the CPU after
 * being written by the gpu, hence staging is probably a good
 * usage pattern.
 */
struct r600_resource *buf = (struct r600_resource*)
pipe_buffer_create(ctx->b.screen, PIPE_BIND_CUSTOM,
   PIPE_USAGE_STAGING, buf_size);
if (!buf)
return NULL;
 
-   if (query->flags & R600_QUERY_HW_FLAG_PREDICATE) {
-   if (!query->ops->prepare_buffer(ctx, query, buf)) {
-   r600_resource_reference(, NULL);
-   return NULL;
-   }
+   if (!query->ops->prepare_buffer(ctx, query, buf)) {
+   r600_resource_reference(, NULL);
+   return NULL;
}
 
return buf;
 }
 
 static bool r600_query_hw_prepare_buffer(struct r600_common_context *ctx,
 struct r600_query_hw *query,
 struct r600_resource *buffer)
 {
/* Callers ensure that the buffer is currently unused by the GPU. */
@@ -433,42 +431,40 @@ static struct pipe_query *r600_query_hw_create(struct r600_common_context *rctx,
query->b.ops = _hw_ops;
query->ops = _hw_default_hw_ops;
 
switch (query_type) {
case PIPE_QUERY_OCCLUSION_COUNTER:
case PIPE_QUERY_OCCLUSION_PREDICATE:
query->result_size = 16 * rctx->max_db;
query->result_size += 16; /* for the fence + alignment */
query->num_cs_dw_begin = 6;
query->num_cs_dw_end = 6 + r600_gfx_write_fence_dwords(rctx->screen);
-   query->flags |= R600_QUERY_HW_FLAG_PREDICATE;
break;
case PIPE_QUERY_TIME_ELAPSED:
query->result_size = 24;
query->num_cs_dw_begin = 8;
query->num_cs_dw_end = 8 + r600_gfx_write_fence_dwords(rctx->screen);
break;
case PIPE_QUERY_TIMESTAMP:
query->result_size = 16;
query->num_cs_dw_end = 8 + r600_gfx_write_fence_dwords(rctx->screen);
query->flags = R600_QUERY_HW_FLAG_NO_START;
break;
case PIPE_QUERY_PRIMITIVES_EMITTED:
case PIPE_QUERY_PRIMITIVES_GENERATED:
case PIPE_QUERY_SO_STATISTICS:
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
/* NumPrimitivesWritten, PrimitiveStorageNeeded. */
query->result_size = 32;
query->num_cs_dw_begin = 6;
query->num_cs_dw_end = 6;
query->stream = index;
-   query->flags |= R600_QUERY_HW_FLAG_PREDICATE;
break;
case PIPE_QUERY_PIPELINE_STATISTICS:
/* 11 values on EG, 8 on R600. */
query->result_size = (rctx->chip_class >= EVERGREEN ? 11 : 8) * 16;
query->result_size += 8; /* for the fence + alignment */
query->num_cs_dw_begin = 6;
query->num_cs_dw_end = 6 + r600_gfx_write_fence_dwords(rctx->screen);
break;
default:
assert(0);
@@ -786,30 +782,28 @@ void r600_query_hw_reset_buffers(struct r600_common_context *rctx,
while (prev) {
struct r600_query_buffer *qbuf = prev;
prev = prev->previous;
r600_resource_reference(>buf, NULL);
FREE(qbuf);
}
 
query->buffer.results_end = 0;
query->buffer.previous = NULL;
 
-   if (query->flags & R600_QUERY_HW_FLAG_PREDICATE) {
-   /* Obtain a new buffer if the current one can't be mapped without a stall. */
-   if (r600_rings_is_buffer_referenced(rctx, query->buffer.buf->buf, RADEON_USAGE_READWRITE) ||
-   !rctx->ws->buffer_wait(query->buffer.buf->buf, 0, RADEON_USAGE_READWRITE)) {
+   /* Obtain a new buffer if the current one can't be mapped without a stall. */
+   if (r600_rings_is_buffer_referenced(rctx, query->buffer.buf->buf, RADEON_USAGE_READWRITE) ||
+   !rctx->ws->buffer_wait(query->buffer.buf->buf, 0, RADEON_USAGE_READWRITE)) {
+   r600_resource_reference(>buffer.buf, NULL);
+   query->buffer.buf = r600_new_query_buffer(rctx, query);
+   } else {
+   if 

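PATCH 7/9 makes `prepare_buffer` run for every query type, not only predicate-capable ones, so that fences always start out zeroed. The following sketch combines that zeroing with the "set top bits for unused backends" step of `r600_query_hw_prepare_buffer` for an occlusion-query buffer. It assumes each result slot holds `num_db` 16-byte begin/end pairs followed by fence space, and it steps through the buffer by the full slot size; `prepare_occlusion_buffer` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: initialize a mapped occlusion-query buffer.
 * Assumptions: every slot is slot_size bytes and begins with num_db
 * begin/end pairs (begin in dwords 0-1, end in dwords 2-3 of each pair);
 * bit 31 of a high dword marks the 64-bit value as written. */
static void prepare_occlusion_buffer(uint32_t *results, size_t buf_size,
				     size_t slot_size, unsigned num_db,
				     uint32_t backend_mask)
{
	assert(slot_size % 4 == 0 && slot_size >= 16 * (size_t)num_db);

	/* Zero everything, including fence dwords, so stale memory can never
	 * look like a signalled fence or a valid counter. */
	memset(results, 0, buf_size);

	for (size_t slot = 0; slot < buf_size / slot_size; slot++) {
		uint32_t *r = results + slot * (slot_size / 4);

		for (unsigned i = 0; i < num_db; i++) {
			if (backend_mask & (1u << i))
				continue;
			/* Disabled backend: mark begin and end as written with a
			 * zero count, so the pair adds nothing to the result. */
			r[i * 4 + 1] = 0x80000000u;
			r[i * 4 + 3] = 0x80000000u;
		}
	}
}
```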
[Mesa-dev] [PATCH 6/9] gallium/radeon: cleanup getting PIPE_QUERY_TIMESTAMP result

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeon/r600_query.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c
index b9041eb..c1c3599 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -946,26 +946,22 @@ static void r600_query_hw_add_result(struct r600_common_context *ctx,
unsigned results_base = i * 16;
result->b = result->b ||
r600_query_read_result(buffer + results_base, 0, 2, true) != 0;
}
break;
}
case PIPE_QUERY_TIME_ELAPSED:
result->u64 += r600_query_read_result(buffer, 0, 2, false);
break;
case PIPE_QUERY_TIMESTAMP:
-   {
-   uint32_t *current_result = (uint32_t*)buffer;
-   result->u64 = (uint64_t)current_result[0] |
- (uint64_t)current_result[1] << 32;
+   result->u64 = *(uint64_t*)buffer;
break;
-   }
case PIPE_QUERY_PRIMITIVES_EMITTED:
/* SAMPLE_STREAMOUTSTATS stores this structure:
 * {
 *u64 NumPrimitivesWritten;
 *u64 PrimitiveStorageNeeded;
 * }
 * We only need NumPrimitivesWritten here. */
result->u64 += r600_query_read_result(buffer, 2, 6, true);
break;
case PIPE_QUERY_PRIMITIVES_GENERATED:
-- 
2.7.4


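This cleanup replaces assembling the timestamp from two 32-bit halves with a single 64-bit load. The two forms agree only when the host is little-endian (the low dword is stored first) and the pointer is suitably aligned for a `uint64_t`. The sketch below, with hypothetical helper names, contrasts the shift-based read, which does not depend on host byte order, with a `memcpy`-based load, which avoids the alignment and strict-aliasing questions of a pointer cast but still follows host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-order independent: dword 0 is the low half, dword 1 the high half. */
static uint64_t read_u64_from_dwords(const uint32_t *dw)
{
	return (uint64_t)dw[0] | (uint64_t)dw[1] << 32;
}

/* Host-order 64-bit load. memcpy avoids the alignment and aliasing issues
 * of *(uint64_t *)ptr, but the value matches read_u64_from_dwords() only
 * on little-endian hosts. */
static uint64_t read_u64_native(const void *ptr)
{
	uint64_t v;
	memcpy(&v, ptr, sizeof(v));
	return v;
}

/* Returns nonzero if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;
	memcpy(&first, &probe, 1);
	return first == 1;
}
```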

[Mesa-dev] [PATCH 9/9] radeonsi: enable ARB_query_buffer_object

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 docs/features.txt  | 2 +-
 docs/relnotes/12.1.0.html  | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index 9850a43..8b87b08 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -194,21 +194,21 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+
   GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_clear_texture  DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_enhanced_layouts   DONE (i965)
   - compile-time constant expressions   DONE
   - explicit byte offsets for blocks DONE
   - forced alignment within blocks  DONE
   - specified vec4-slot component numbers   DONE (i965)
   - specified transform/feedback layout DONE
   - input/output block locations DONE
   GL_ARB_multi_bind DONE (all drivers)
-  GL_ARB_query_buffer_object DONE (i965/hsw+, nvc0)
+  GL_ARB_query_buffer_object DONE (i965/hsw+, nvc0, radeonsi)
   GL_ARB_texture_mirror_clamp_to_edge   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_texture_stencil8   DONE (i965/hsw+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
 
 GL 4.5, GLSL 4.50:
 
   GL_ARB_ES3_1_compatibility DONE (i965/hsw+, nvc0, radeonsi)
   GL_ARB_clip_control   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_cull_distance  DONE (i965, nv50, nvc0, radeonsi, llvmpipe, softpipe, swr)
diff --git a/docs/relnotes/12.1.0.html b/docs/relnotes/12.1.0.html
index 8e0a84e..5a45a1d 100644
--- a/docs/relnotes/12.1.0.html
+++ b/docs/relnotes/12.1.0.html
@@ -44,20 +44,21 @@ Note: some of the new features are only available with certain drivers.
 
 
 
 OpenGL ES 3.1 on i965/hsw
 GL_ARB_ES3_1_compatibility on i965
 GL_ARB_ES3_2_compatibility on i965/gen8+
 GL_ARB_clear_texture on r600, radeonsi
 GL_ARB_cull_distance on radeonsi
 GL_ARB_enhanced_layouts on i965
 GL_ARB_indirect_parameters on radeonsi
+GL_ARB_query_buffer_object on radeonsi
 GL_ARB_shader_draw_parameters on radeonsi
 GL_ARB_shader_group_vote on nvc0
 GL_ARB_stencil_texturing on i965/hsw
 GL_ARB_texture_stencil8 on i965/hsw
 GL_EXT_window_rectangles on nv50, nvc0
 GL_KHR_blend_equation_advanced on i965
 GL_KHR_texture_compression_astc_sliced_3d on i965
 GL_OES_copy_image on nv50, nvc0, r600, radeonsi, softpipe, llvmpipe
 GL_OES_geometry_shader on i965/gen8+, nvc0, radeonsi
 GL_OES_primitive_bounding_box on i965/gen7+, nvc0, radeonsi
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 730be9d..84f9796 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -393,20 +393,21 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_SURFACE_REINTERPRET_BLOCKS:
case PIPE_CAP_QUERY_MEMORY_INFO:
case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
case PIPE_CAP_GENERATE_MIPMAP:
case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
case PIPE_CAP_STRING_MARKER:
case PIPE_CAP_CLEAR_TEXTURE:
case PIPE_CAP_CULL_DISTANCE:
+   case PIPE_CAP_QUERY_BUFFER_OBJECT:
return 1;
 
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
return !SI_BIG_ENDIAN && sscreen->b.info.has_userptr;
 
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
return (sscreen->b.info.drm_major == 2 &&
sscreen->b.info.drm_minor >= 43) ||
   sscreen->b.info.drm_major == 3;
 
@@ -441,21 +442,20 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY:
return 0;
 
/* Unsupported features. */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_FAKE_SW_MSAA:
case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
case PIPE_CAP_VERTEXID_NOBASE:
-   case PIPE_CAP_QUERY_BUFFER_OBJECT:
case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
case PIPE_CAP_TGSI_VOTE:
case 

[Mesa-dev] [PATCH 0/9] radeonsi: ARB_query_buffer_object implementation

2016-09-16 Thread Nicolai Hähnle
Hi all,

as the title says. The implementation uses a compute shader to summarize
data from the query buffers. As long as only one query buffer is in flight
(the normal case), that compute shader is launched exactly once, on a
single thread. If multiple buffers were required, then one compute grid is
launched for each of these buffers, in sequence.

All of this could be done in much fancier ways using bindless buffers and
wave-wide computations, but really, the expectation is that most queries
will be rather simple (though occlusion queries always contain at least 8
result pairs, so it's not like it would be completely pointless).

This code also exposes the hilarious lowering of 64-bit integer divides
in LLVM, since timestamp queries use it. This lowering generates more than
2KB of code for a single division, which is excessive even when the division
*isn't* by a constant. The right place to fix this is in LLVM, and I'm
already looking into it. For normal queries this is completely irrelevant
because the code will just be skipped.

Please review!
Thanks
Nicolai


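The cover letter attributes the 64-bit integer divide in the result shader to timestamp queries. A plausible source is the conversion of raw counter ticks to nanoseconds. The sketch below assumes the counter frequency is known in kHz (the parameter name `freq_khz` and the function `ticks_to_ns` are hypothetical), so that ns = ticks * 1000000 / freq_khz. The divisor is a runtime value, which fits the remark that the division isn't by a constant.

```c
#include <assert.h>
#include <stdint.h>

/* Convert GPU counter ticks to nanoseconds.
 * Assumption: freq_khz is the counter frequency in kHz, so one tick lasts
 * 1000000 / freq_khz ns. Splitting ticks into quotient and remainder keeps
 * the result exact while avoiding the overflow of ticks * 1000000. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_khz)
{
	assert(freq_khz != 0);

	uint64_t q = ticks / freq_khz;  /* whole milliseconds */
	uint64_t r = ticks % freq_khz;  /* leftover ticks */

	return q * 1000000u + r * 1000000u / freq_khz;
}
```

Computing `ticks * 1000000` first would overflow 64 bits once `ticks` exceeds roughly 1.8e13; the quotient/remainder split is a design choice of this sketch, not necessarily what the driver's shader does.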

[Mesa-dev] [PATCH 3/9] radeonsi: add si_get_shader_buffers/get_pipe_constant_buffers

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

These functions extract the pipe state structure from the current
descriptors, for state saving.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 46 +++
 src/gallium/drivers/radeonsi/si_state.h   |  5 +++
 2 files changed, 51 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index b1a8594..d82910c 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -830,20 +830,41 @@ static void si_buffer_resources_begin_new_cs(struct si_context *sctx,
/* Add buffers to the CS. */
while (mask) {
int i = u_bit_scan();
 
radeon_add_to_buffer_list(>b, >b.gfx,
  (struct r600_resource*)buffers->buffers[i],
  buffers->shader_usage, buffers->priority);
}
 }
 
+static void si_get_buffer_from_descriptors(struct si_buffer_resources *buffers,
+  struct si_descriptors *descs,
+  unsigned idx, struct pipe_resource **buf,
+  unsigned *offset, unsigned *size)
+{
+   pipe_resource_reference(buf, buffers->buffers[idx]);
+   if (*buf) {
+   struct r600_resource *res = (struct r600_resource *)buf;
+   const uint32_t *desc = descs->list + idx * 4;
+   uint64_t va;
+
+   *size = desc[2];
+
+   assert(G_008F04_STRIDE(desc[1]) == 0);
+   va = ((uint64_t)desc[1] << 32) | desc[0];
+
+   assert(va >= res->gpu_address && va + *size <= res->gpu_address + res->bo_size);
+   *offset = va - res->gpu_address;
+   }
+}
+
 /* VERTEX BUFFERS */
 
 static void si_vertex_buffers_begin_new_cs(struct si_context *sctx)
 {
struct si_descriptors *desc = >vertex_buffers;
int count = sctx->vertex_elements ? sctx->vertex_elements->count : 0;
int i;
 
for (i = 0; i < count; i++) {
int vb = sctx->vertex_elements->elements[i].vertex_buffer_index;
@@ -1055,20 +1076,30 @@ static void si_pipe_set_constant_buffer(struct pipe_context *ctx,
struct si_context *sctx = (struct si_context *)ctx;
 
if (shader >= SI_NUM_SHADERS)
return;
 
si_set_constant_buffer(sctx, >const_buffers[shader],
   si_const_buffer_descriptors_idx(shader),
   slot, input);
 }
 
+void si_get_pipe_constant_buffer(struct si_context *sctx, uint shader,
+uint slot, struct pipe_constant_buffer *cbuf)
+{
+   cbuf->user_buffer = NULL;
+   si_get_buffer_from_descriptors(
+   >const_buffers[shader],
+   si_const_buffer_descriptors(sctx, shader),
+   slot, >buffer, >buffer_offset, >buffer_size);
+}
+
 /* SHADER BUFFERS */
 
 static unsigned
 si_shader_buffer_descriptors_idx(enum pipe_shader_type shader)
 {
return SI_DESCS_FIRST_SHADER + shader * SI_NUM_SHADER_DESCS +
   SI_SHADER_DESCS_SHADER_BUFFERS;
 }
 
 static struct si_descriptors *
@@ -1125,20 +1156,35 @@ static void si_set_shader_buffers(struct pipe_context *ctx,
radeon_add_to_buffer_list_check_mem(>b, >b.gfx, buf,
buffers->shader_usage,
buffers->priority, true);
buffers->enabled_mask |= 1u << slot;
descs->dirty_mask |= 1u << slot;
sctx->descriptors_dirty |=
1u << si_shader_buffer_descriptors_idx(shader);
}
 }
 
+void si_get_shader_buffers(struct si_context *sctx, uint shader,
+  uint start_slot, uint count,
+  struct pipe_shader_buffer *sbuf)
+{
+   struct si_buffer_resources *buffers = >shader_buffers[shader];
+   struct si_descriptors *descs = si_shader_buffer_descriptors(sctx, shader);
+
+   for (unsigned i = 0; i < count; ++i) {
+   si_get_buffer_from_descriptors(
+   buffers, descs, start_slot + i,
+   [i].buffer, [i].buffer_offset,
+   [i].buffer_size);
+   }
+}
+
 /* RING BUFFERS */
 
 void si_set_ring_buffer(struct pipe_context *ctx, uint slot,
struct pipe_resource *buffer,
unsigned stride, unsigned num_records,
bool add_tid, bool swizzle,
unsigned element_size, unsigned index_stride, uint64_t offset)
 {
struct si_context *sctx = (struct si_context *)ctx;
struct si_buffer_resources *buffers = >rw_buffers;
diff --git a/src/gallium/drivers/radeonsi/si_state.h b/src/gallium/drivers/radeonsi/si_state.h
index 

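`si_get_buffer_from_descriptors` recovers a binding's offset and size from its 4-dword buffer descriptor: dword 2 holds the size, dwords 0 and 1 hold the GPU virtual address, and the offset is that address minus the resource's base address. Note that the base address must come from the referenced resource `*buf`, not from `buf` itself, which is a pointer to the caller's pointer. A standalone sketch of the arithmetic follows. It assumes, in line with the patch's `G_008F04_STRIDE` assertion, that the upper bits of dword 1 carry other fields such as the stride, and therefore masks dword 1 to its low 16 bits; `struct fake_resource` and `decode_buffer_descriptor` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the r600_resource fields used here. */
struct fake_resource {
	uint64_t gpu_address;
	uint64_t bo_size;
};

/* Decode the offset and size of a buffer binding from its descriptor.
 * Assumption: dword 0 = VA[31:0], dword 1 bits [15:0] = VA[47:32],
 * dword 2 = size in bytes. */
static void decode_buffer_descriptor(const uint32_t desc[4],
				     const struct fake_resource *res,
				     unsigned *offset, unsigned *size)
{
	uint64_t va = ((uint64_t)(desc[1] & 0xffff) << 32) | desc[0];

	*size = desc[2];
	/* The bound range must lie inside the buffer object. */
	assert(va >= res->gpu_address &&
	       va + *size <= res->gpu_address + res->bo_size);
	*offset = (unsigned)(va - res->gpu_address);
}
```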
[Mesa-dev] [PATCH 8/9] gallium/radeon: implement get_query_result_resource

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeon/r600_pipe_common.c |   3 +
 src/gallium/drivers/radeon/r600_pipe_common.h |   2 +
 src/gallium/drivers/radeon/r600_query.c   | 391 +-
 src/gallium/drivers/radeon/r600_query.h   |   7 +
 4 files changed, 402 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 2af4b41..0d2dd6b 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -566,20 +566,23 @@ void r600_common_context_cleanup(struct r600_common_context *rctx)
assert(!rctx->dcc_stats[i].query_active);
 
for (j = 0; j < ARRAY_SIZE(rctx->dcc_stats[i].ps_stats); j++)
if (rctx->dcc_stats[i].ps_stats[j])
rctx->b.destroy_query(>b,
rctx->dcc_stats[i].ps_stats[j]);
 
r600_texture_reference(>dcc_stats[i].tex, NULL);
}
 
+   if (rctx->query_result_shader)
+   rctx->b.delete_compute_state(>b, rctx->query_result_shader);
+
if (rctx->gfx.cs)
rctx->ws->cs_destroy(rctx->gfx.cs);
if (rctx->dma.cs)
rctx->ws->cs_destroy(rctx->dma.cs);
if (rctx->ctx)
rctx->ws->ctx_destroy(rctx->ctx);
 
if (rctx->uploader) {
u_upload_destroy(rctx->uploader);
}
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
index 32acca5..f23f1c4 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -614,20 +614,22 @@ struct r600_common_context {
boolquery_active;
} dcc_stats[5];
 
/* The list of all texture buffer objects in this context.
 * This list is walked when a buffer is invalidated/reallocated and
 * the GPU addresses are updated. */
struct list_headtexture_buffers;
 
struct pipe_debug_callback  debug;
 
+   void*query_result_shader;
+
/* Copy one resource to another using async DMA. */
void (*dma_copy)(struct pipe_context *ctx,
 struct pipe_resource *dst,
 unsigned dst_level,
 unsigned dst_x, unsigned dst_y, unsigned dst_z,
 struct pipe_resource *src,
 unsigned src_level,
 const struct pipe_box *src_box);
 
void (*clear_buffer)(struct pipe_context *ctx, struct pipe_resource *dst,
diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c
index d96f9fc..8aa1c888 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -18,20 +18,23 @@
  * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
  * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
  * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
  * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 
 #include "r600_query.h"
 #include "r600_cs.h"
 #include "util/u_memory.h"
+#include "util/u_upload_mgr.h"
+
+#include "tgsi/tgsi_text.h"
 
 struct r600_hw_query_params {
unsigned start_offset;
unsigned end_offset;
unsigned fence_offset;
unsigned pair_stride;
unsigned pair_count;
 };
 
 /* Queries without buffer handling or suspend/resume. */
@@ -275,25 +278,27 @@ static bool r600_query_sw_get_result(struct r600_common_context *rctx,
break;
case R600_QUERY_CURRENT_GPU_SCLK:
case R600_QUERY_CURRENT_GPU_MCLK:
result->u64 *= 100;
break;
}
 
return true;
 }
 
+
 static struct r600_query_ops sw_query_ops = {
.destroy = r600_query_sw_destroy,
.begin = r600_query_sw_begin,
.end = r600_query_sw_end,
-   .get_result = r600_query_sw_get_result
+   .get_result = r600_query_sw_get_result,
+   .get_result_resource = NULL
 };
 
 static struct pipe_query *r600_query_sw_create(struct pipe_context *ctx,
   unsigned query_type)
 {
struct r600_query_sw *query;
 
query = CALLOC_STRUCT(r600_query_sw);
if (!query)
return NULL;
@@ -373,25 +378,34 @@ static bool r600_query_hw_prepare_buffer(struct r600_common_context *ctx,
results[(i * 4)+3] = 0x8000;
}
}
results += 4 * ctx->max_db;
}
}
 
return true;
 }
 

[Mesa-dev] [PATCH 4/9] radeonsi: add save_qbo_state

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Save compute shader state that will be used for the ARB_query_buffer_object
implementation.
---
 src/gallium/drivers/radeon/r600_pipe_common.h |  3 +++
 src/gallium/drivers/radeon/r600_query.h   |  7 +++
 src/gallium/drivers/radeonsi/si_state.c   | 12 
 3 files changed, 22 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
index 96b23b2..32acca5 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -113,20 +113,21 @@ enum r600_coherency {
 
 #ifdef PIPE_ARCH_BIG_ENDIAN
 #define R600_BIG_ENDIAN 1
 #else
 #define R600_BIG_ENDIAN 0
 #endif
 
 struct r600_common_context;
 struct r600_perfcounters;
 struct tgsi_shader_info;
+struct r600_qbo_state;
 
 struct radeon_shader_reloc {
char name[32];
uint64_t offset;
 };
 
 struct radeon_shader_binary {
/** Shader code */
unsigned char *code;
unsigned code_size;
@@ -643,20 +644,22 @@ struct r600_common_context {
void (*decompress_dcc)(struct pipe_context *ctx,
   struct r600_texture *rtex);
 
/* Reallocate the buffer and update all resource bindings where
 * the buffer is bound, including all resource descriptors. */
void (*invalidate_buffer)(struct pipe_context *ctx, struct pipe_resource *buf);
 
/* Enable or disable occlusion queries. */
void (*set_occlusion_query_state)(struct pipe_context *ctx, bool enable);
 
+   void (*save_qbo_state)(struct pipe_context *ctx, struct r600_qbo_state *st);
+
/* This ensures there is enough space in the command stream. */
void (*need_gfx_cs_space)(struct pipe_context *ctx, unsigned num_dw,
  bool include_draw_vbo);
 
void (*set_atom_dirty)(struct r600_common_context *ctx,
   struct r600_atom *atom, bool dirty);
 
void (*check_vm_faults)(struct r600_common_context *ctx,
struct radeon_saved_cs *saved,
enum ring_type ring);
diff --git a/src/gallium/drivers/radeon/r600_query.h b/src/gallium/drivers/radeon/r600_query.h
index 0cd1a02..4f5aa3a 100644
--- a/src/gallium/drivers/radeon/r600_query.h
+++ b/src/gallium/drivers/radeon/r600_query.h
@@ -22,20 +22,21 @@
  *
  * Authors:
  *  Nicolai Hähnle 
  *
  */
 
 #ifndef R600_QUERY_H
 #define R600_QUERY_H
 
 #include "pipe/p_defines.h"
+#include "pipe/p_state.h"
 #include "util/list.h"
 
 struct pipe_context;
 struct pipe_query;
 
 struct r600_common_context;
 struct r600_common_screen;
 struct r600_query;
 struct r600_query_hw;
 struct r600_resource;
@@ -260,11 +261,17 @@ int r600_get_perfcounter_group_info(struct r600_common_screen *,
 bool r600_perfcounters_init(struct r600_perfcounters *, unsigned num_blocks);
 void r600_perfcounters_add_block(struct r600_common_screen *,
 struct r600_perfcounters *,
 const char *name, unsigned flags,
 unsigned counters, unsigned selectors,
 unsigned instances, void *data);
 void r600_perfcounters_do_destroy(struct r600_perfcounters *);
 void r600_query_hw_reset_buffers(struct r600_common_context *rctx,
 struct r600_query_hw *query);
 
+struct r600_qbo_state {
+   void *saved_compute;
+   struct pipe_constant_buffer saved_const0;
+   struct pipe_shader_buffer saved_ssbo[3];
+};
+
 #endif /* R600_QUERY_H */
diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index 1703e42..443dc37 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -21,20 +21,21 @@
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  *
  * Authors:
  *  Christian König 
  */
 
 #include "si_pipe.h"
 #include "si_shader.h"
 #include "sid.h"
 #include "radeon/r600_cs.h"
+#include "radeon/r600_query.h"
 
 #include "util/u_dual_blend.h"
 #include "util/u_format.h"
 #include "util/u_format_s3tc.h"
 #include "util/u_memory.h"
 #include "util/u_pstipple.h"
 #include "util/u_resource.h"
 
 /* Initialize an external atom (owned by ../radeon). */
 static void
@@ -1067,20 +1068,30 @@ static void si_set_active_query_state(struct pipe_context *ctx, boolean enable)
}
 }
 
 static void si_set_occlusion_query_state(struct pipe_context *ctx, bool enable)
 {
struct si_context *sctx = (struct si_context*)ctx;
 
si_mark_atom_dirty(sctx, >db_render_state);
 }
 
+static void si_save_qbo_state(struct pipe_context *ctx, struct r600_qbo_state *st)
+{
+   struct si_context *sctx = (struct si_context*)ctx;
+
+   st->saved_compute = sctx->cs_shader_state.program;
+
+   

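PATCH 4/9 adds `struct r600_qbo_state` so that the query code can save the current compute shader, constant buffer 0 and three shader buffers before dispatching its own compute shader, then restore them afterwards. Since the getters from PATCH 3/9 take references via `pipe_resource_reference`, every saved reference has to be dropped again on restore. The save/dispatch/restore pattern can be sketched in plain C as follows; all types and functions here are hypothetical stand-ins, not Gallium API, and reference counting is modelled with a bare counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binding: a shared reference count plus an offset. */
struct binding {
	int *refcount;
	unsigned offset;
};

/* Hypothetical subset of compute state that the query code overwrites. */
struct ctx_state {
	const void *compute_shader;
	struct binding const0;
	struct binding ssbo[3];
};

static void ref(struct binding *b)   { if (b->refcount) ++*b->refcount; }
static void unref(struct binding *b) { if (b->refcount) --*b->refcount; }

/* Save: copy the bindings and take a reference on each bound resource so
 * it stays alive while the query's own dispatch replaces the slots. */
static void save_state(const struct ctx_state *cur, struct ctx_state *saved)
{
	*saved = *cur;
	ref(&saved->const0);
	for (int i = 0; i < 3; i++)
		ref(&saved->ssbo[i]);
}

/* Restore: rebind the saved state, then release the saved references. */
static void restore_state(struct ctx_state *cur, struct ctx_state *saved)
{
	*cur = *saved;
	unref(&saved->const0);
	for (int i = 0; i < 3; i++)
		unref(&saved->ssbo[i]);
}
```

The point of the sketch is only that references taken at save time are balanced at restore time, leaving the reference counts unchanged across the query dispatch.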
[Mesa-dev] [PATCH 01/11] i965: allow sampler indirects on all gens

2016-09-16 Thread Timothy Arceri
Without this we will regress the max-samplers piglit test on Gen6
and lower when loop unrolling is done in NIR. There is a check
in the GLSL IR linker that errors when it finds indirects and
EmitNoIndirectSampler is set.

As far as I can tell there is no reason for not enabling this for
all gens regardless of whether they fully support ARB_gpu_shader5
or not.
---
 src/mesa/drivers/dri/i965/brw_compiler.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c b/src/mesa/drivers/dri/i965/brw_compiler.c
index 86b1eaa..9318aa6 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -135,10 +135,6 @@ brw_compiler_create(void *mem_ctx, const struct gen_device_info *devinfo)
   compiler->glsl_compiler_options[i].EmitNoIndirectTemp = is_scalar;
   compiler->glsl_compiler_options[i].OptimizeForAOS = !is_scalar;
 
-  /* !ARB_gpu_shader5 */
-  if (devinfo->gen < 7)
- compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true;
-
   if (is_scalar) {
  compiler->glsl_compiler_options[i].NirOptions = _nir_options;
   } else {
-- 
2.7.4



[Mesa-dev] [PATCH 8/9] squash! [rfc] radeonsi: enable 64-bit integer support.

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

- PIPE_CAP_INT64 is not there yet
- emit DIV/MOD without the divide-by-zero workaround
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 5 +
 src/gallium/drivers/radeonsi/si_pipe.c  | 1 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 9216143..bcb3143 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -2063,20 +2063,25 @@ void radeon_llvm_context_init(struct radeon_llvm_context *ctx, const char *tripl
bld_base->op_actions[TGSI_OPCODE_U64SNE].emit = emit_icmp;
bld_base->op_actions[TGSI_OPCODE_U64SGE].emit = emit_icmp;
bld_base->op_actions[TGSI_OPCODE_U64SLT].emit = emit_icmp;
bld_base->op_actions[TGSI_OPCODE_I64SGE].emit = emit_icmp;
bld_base->op_actions[TGSI_OPCODE_I64SLT].emit = emit_icmp;
 
bld_base->op_actions[TGSI_OPCODE_U64ADD].emit = emit_uadd;
bld_base->op_actions[TGSI_OPCODE_U64SHL].emit = emit_shl;
bld_base->op_actions[TGSI_OPCODE_U64SHR].emit = emit_ushr;
bld_base->op_actions[TGSI_OPCODE_I64SHR].emit = emit_ishr;
+
+   bld_base->op_actions[TGSI_OPCODE_U64MOD].emit = emit_umod;
+   bld_base->op_actions[TGSI_OPCODE_I64MOD].emit = emit_mod;
+   bld_base->op_actions[TGSI_OPCODE_U64DIV].emit = emit_udiv;
+   bld_base->op_actions[TGSI_OPCODE_I64DIV].emit = emit_idiv;
 }
 
 void radeon_llvm_create_func(struct radeon_llvm_context *ctx,
LLVMTypeRef *return_types, unsigned num_return_elems,
 LLVMTypeRef *ParamTypes, unsigned ParamCount)
 {
LLVMTypeRef main_fn_type, ret_type;
LLVMBasicBlockRef main_fn_body;
 
if (num_return_elems)
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index ca930fe..8f9e6f5 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -393,21 +393,20 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_SURFACE_REINTERPRET_BLOCKS:
case PIPE_CAP_QUERY_MEMORY_INFO:
case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
case PIPE_CAP_GENERATE_MIPMAP:
case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
case PIPE_CAP_STRING_MARKER:
case PIPE_CAP_CLEAR_TEXTURE:
case PIPE_CAP_CULL_DISTANCE:
-   case PIPE_CAP_INT64:
return 1;
 
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
return !SI_BIG_ENDIAN && sscreen->b.info.has_userptr;
 
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
return (sscreen->b.info.drm_major == 2 &&
sscreen->b.info.drm_minor >= 43) ||
   sscreen->b.info.drm_major == 3;
 
-- 
2.7.4



[Mesa-dev] [PATCH 2/9] gallium/tgsi: add support for 64-bit integer immediates.

2016-09-16 Thread Nicolai Hähnle
From: Dave Airlie 

This adds support to TGSI for 64-bit integer immediates.

Reviewed-by: Marek Olšák 
Reviewed-by: Nicolai Hähnle 
Signed-off-by: Dave Airlie 
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 14 ++
 src/gallium/auxiliary/tgsi/tgsi_exec.c |  2 ++
 src/gallium/auxiliary/tgsi/tgsi_parse.c|  2 ++
 src/gallium/auxiliary/tgsi/tgsi_text.c | 44 +
 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 45 --
 src/gallium/auxiliary/tgsi/tgsi_ureg.h | 10 +++
 src/gallium/include/pipe/p_shader_tokens.h |  2 ++
 7 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index d59b7ff..614bcb2 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -247,20 +247,34 @@ dump_imm_data(struct tgsi_iterate_context *iter,
assert( num_tokens <= 4 );
for (i = 0; i < num_tokens; i++) {
   switch (data_type) {
   case TGSI_IMM_FLOAT64: {
  union di d;
  d.ui = data[i].Uint | (uint64_t)data[i+1].Uint << 32;
  DBL( d.d );
  i++;
  break;
   }
+  case TGSI_IMM_INT64: {
+ union di d;
+ d.i = data[i].Uint | (uint64_t)data[i+1].Uint << 32;
+ UID( d.i );
+ i++;
+ break;
+  }
+  case TGSI_IMM_UINT64: {
+ union di d;
+ d.ui = data[i].Uint | (uint64_t)data[i+1].Uint << 32;
+ UID( d.ui );
+ i++;
+ break;
+  }
   case TGSI_IMM_FLOAT32:
  if (ctx->dump_float_as_hex)
 HFLT( data[i].Float );
  else
 FLT( data[i].Float );
  break;
   case TGSI_IMM_UINT32:
  UID(data[i].Uint);
  break;
   case TGSI_IMM_INT32:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 1457c06..e99caeb 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -70,20 +70,22 @@
 #define FAST_MATH 0
 
 #define TILE_TOP_LEFT 0
 #define TILE_TOP_RIGHT1
 #define TILE_BOTTOM_LEFT  2
 #define TILE_BOTTOM_RIGHT 3
 
 union tgsi_double_channel {
double d[TGSI_QUAD_SIZE];
unsigned u[TGSI_QUAD_SIZE][2];
+   uint64_t u64[TGSI_QUAD_SIZE];
+   int64_t i64[TGSI_QUAD_SIZE];
 };
 
 struct tgsi_double_vector {
union tgsi_double_channel xy;
union tgsi_double_channel zw;
 };
 
 static void
 micro_abs(union tgsi_exec_channel *dst,
   const union tgsi_exec_channel *src)
diff --git a/src/gallium/auxiliary/tgsi/tgsi_parse.c b/src/gallium/auxiliary/tgsi/tgsi_parse.c
index 16564dd..940af7d 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_parse.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_parse.c
@@ -148,26 +148,28 @@ tgsi_parse_token(
 
   switch (imm->Immediate.DataType) {
   case TGSI_IMM_FLOAT32:
   case TGSI_IMM_FLOAT64:
  for (i = 0; i < imm_count; i++) {
 next_token(ctx, >u[i].Float);
  }
  break;
 
   case TGSI_IMM_UINT32:
+  case TGSI_IMM_UINT64:
  for (i = 0; i < imm_count; i++) {
 next_token(ctx, >u[i].Uint);
  }
  break;
 
   case TGSI_IMM_INT32:
+  case TGSI_IMM_INT64:
  for (i = 0; i < imm_count; i++) {
 next_token(ctx, >u[i].Int);
  }
  break;
 
   default:
  assert( 0 );
   }
 
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 8bdec06..be80842 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -288,20 +288,56 @@ static boolean parse_double( const char **pcur, uint32_t *val0, uint32_t *val1)
v.dval = strtod(cur, (char**)pcur);
if (*pcur == cur)
   return FALSE;
 
*val0 = v.uval[0];
*val1 = v.uval[1];
 
return TRUE;
 }
 
+static boolean parse_int64( const char **pcur, uint32_t *val0, uint32_t *val1)
+{
+   const char *cur = *pcur;
+   union {
+  int64_t i64val;
+  uint32_t uval[2];
+   } v;
+
+   v.i64val = strtoll(cur, (char**)pcur, 0);
+   if (*pcur == cur)
+  return FALSE;
+
+   *val0 = v.uval[0];
+   *val1 = v.uval[1];
+
+   return TRUE;
+}
+
+static boolean parse_uint64( const char **pcur, uint32_t *val0, uint32_t *val1)
+{
+   const char *cur = *pcur;
+   union {
+  uint64_t u64val;
+  uint32_t uval[2];
+   } v;
+
+   v.u64val = strtoull(cur, (char**)pcur, 0);
+   if (*pcur == cur)
+  return FALSE;
+
+   *val0 = v.uval[0];
+   *val1 = v.uval[1];
+
+   return TRUE;
+}
+
 struct translate_ctx
 {
const char *text;
const char *cur;
struct tgsi_token *tokens;
struct tgsi_token *tokens_cur;
struct tgsi_token *tokens_end;
struct tgsi_header *header;
unsigned processor : 4;
unsigned implied_array_size : 6;

[Mesa-dev] [PATCH 4/9] squash! tgsi/softpipe: enable ARB_gpu_shader_int64 support. (v2)

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/softpipe/sp_screen.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_screen.c b/src/gallium/drivers/softpipe/sp_screen.c
index 01d7e8a..cd4269f 100644
--- a/src/gallium/drivers/softpipe/sp_screen.c
+++ b/src/gallium/drivers/softpipe/sp_screen.c
@@ -276,22 +276,20 @@ softpipe_get_param(struct pipe_screen *screen, enum pipe_cap param)
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
case PIPE_CAP_TGSI_VOTE:
case PIPE_CAP_MAX_WINDOW_RECTANGLES:
case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
   return 0;
case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:
   return 4;
-   case PIPE_CAP_INT64:
-  return 1;
}
/* should only get here on unhandled cases */
debug_printf("Unexpected PIPE_CAP %d query\n", param);
return 0;
 }
 
 static int
 softpipe_get_shader_param(struct pipe_screen *screen, unsigned shader, enum pipe_shader_cap param)
 {
struct softpipe_screen *sp_screen = softpipe_screen(screen);
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/9] tgsi/softpipe: enable ARB_gpu_shader_int64 support. (v2)

2016-09-16 Thread Nicolai Hähnle
From: Dave Airlie 

This adds all the opcodes to tgsi_exec for softpipe to use.

It also enables the cap.

v2: add conversion opcodes.

Reviewed-by: Nicolai Hähnle 
Signed-off-by: Dave Airlie 
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c   | 673 +--
 src/gallium/drivers/softpipe/sp_screen.c |   2 +
 2 files changed, 543 insertions(+), 132 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index e99caeb..ef3c077 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -687,25 +687,265 @@ micro_trunc(union tgsi_exec_channel *dst,
 static void
 micro_u2d(union tgsi_double_channel *dst,
   const union tgsi_exec_channel *src)
 {
dst->d[0] = (double)src->u[0];
dst->d[1] = (double)src->u[1];
dst->d[2] = (double)src->u[2];
dst->d[3] = (double)src->u[3];
 }
 
+static void
+micro_i64abs(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->i64[0] = src->i64[0] >= 0.0 ? src->i64[0] : -src->i64[0];
+   dst->i64[1] = src->i64[1] >= 0.0 ? src->i64[1] : -src->i64[1];
+   dst->i64[2] = src->i64[2] >= 0.0 ? src->i64[2] : -src->i64[2];
+   dst->i64[3] = src->i64[3] >= 0.0 ? src->i64[3] : -src->i64[3];
+}
+
+static void
+micro_i64sgn(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->i64[0] = src->i64[0] < 0 ? -1 : src->i64[0] > 0 ? 1 : 0;
+   dst->i64[1] = src->i64[1] < 0 ? -1 : src->i64[1] > 0 ? 1 : 0;
+   dst->i64[2] = src->i64[2] < 0 ? -1 : src->i64[2] > 0 ? 1 : 0;
+   dst->i64[3] = src->i64[3] < 0 ? -1 : src->i64[3] > 0 ? 1 : 0;
+}
+
+static void
+micro_i64neg(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->i64[0] = -src->i64[0];
+   dst->i64[1] = -src->i64[1];
+   dst->i64[2] = -src->i64[2];
+   dst->i64[3] = -src->i64[3];
+}
+
+static void
+micro_u64seq(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].u64[0] == src[1].u64[0] ? ~0U : 0U;
+   dst->u[1][0] = src[0].u64[1] == src[1].u64[1] ? ~0U : 0U;
+   dst->u[2][0] = src[0].u64[2] == src[1].u64[2] ? ~0U : 0U;
+   dst->u[3][0] = src[0].u64[3] == src[1].u64[3] ? ~0U : 0U;
+}
+
+static void
+micro_u64sne(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].u64[0] != src[1].u64[0] ? ~0U : 0U;
+   dst->u[1][0] = src[0].u64[1] != src[1].u64[1] ? ~0U : 0U;
+   dst->u[2][0] = src[0].u64[2] != src[1].u64[2] ? ~0U : 0U;
+   dst->u[3][0] = src[0].u64[3] != src[1].u64[3] ? ~0U : 0U;
+}
+
+static void
+micro_i64slt(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].i64[0] < src[1].i64[0] ? ~0U : 0U;
+   dst->u[1][0] = src[0].i64[1] < src[1].i64[1] ? ~0U : 0U;
+   dst->u[2][0] = src[0].i64[2] < src[1].i64[2] ? ~0U : 0U;
+   dst->u[3][0] = src[0].i64[3] < src[1].i64[3] ? ~0U : 0U;
+}
+
+static void
+micro_u64slt(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].u64[0] < src[1].u64[0] ? ~0U : 0U;
+   dst->u[1][0] = src[0].u64[1] < src[1].u64[1] ? ~0U : 0U;
+   dst->u[2][0] = src[0].u64[2] < src[1].u64[2] ? ~0U : 0U;
+   dst->u[3][0] = src[0].u64[3] < src[1].u64[3] ? ~0U : 0U;
+}
+
+static void
+micro_i64sge(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].i64[0] >= src[1].i64[0] ? ~0U : 0U;
+   dst->u[1][0] = src[0].i64[1] >= src[1].i64[1] ? ~0U : 0U;
+   dst->u[2][0] = src[0].i64[2] >= src[1].i64[2] ? ~0U : 0U;
+   dst->u[3][0] = src[0].i64[3] >= src[1].i64[3] ? ~0U : 0U;
+}
+
+static void
+micro_u64sge(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].u64[0] >= src[1].u64[0] ? ~0U : 0U;
+   dst->u[1][0] = src[0].u64[1] >= src[1].u64[1] ? ~0U : 0U;
+   dst->u[2][0] = src[0].u64[2] >= src[1].u64[2] ? ~0U : 0U;
+   dst->u[3][0] = src[0].u64[3] >= src[1].u64[3] ? ~0U : 0U;
+}
+
+static void
+micro_u64max(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->u64[0] = src[0].u64[0] > src[1].u64[0] ? src[0].u64[0] : src[1].u64[0];
+   dst->u64[1] = src[0].u64[1] > src[1].u64[1] ? src[0].u64[1] : src[1].u64[1];
+   dst->u64[2] = src[0].u64[2] > src[1].u64[2] ? src[0].u64[2] : src[1].u64[2];
+   dst->u64[3] = src[0].u64[3] > src[1].u64[3] ? src[0].u64[3] : src[1].u64[3];
+}
+
+static void
+micro_i64max(union tgsi_double_channel *dst,
+ const union tgsi_double_channel *src)
+{
+   dst->i64[0] = src[0].i64[0] > src[1].i64[0] ? src[0].i64[0] : src[1].i64[0];
+   dst->i64[1] = src[0].i64[1] > src[1].i64[1] ? src[0].i64[1] : src[1].i64[1];
+   dst->i64[2] = src[0].i64[2] > src[1].i64[2] ? src[0].i64[2] : src[1].i64[2];
+   

[Mesa-dev] [PATCH 5/9] gallivm/llvmpipe: add support for ARB_gpu_shader_int64.

2016-09-16 Thread Nicolai Hähnle
From: Dave Airlie 

This enables 64-bit integer support in gallivm and
llvmpipe.

v2: add conversion opcodes.
Signed-off-by: Dave Airlie 
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c|   2 +
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h|   4 +
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 473 +
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c|  40 +-
 src/gallium/auxiliary/tgsi/tgsi_info.h |   3 +-
 src/gallium/drivers/llvmpipe/lp_screen.c   |   1 +
 6 files changed, 518 insertions(+), 5 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index 1ef6ae4..b397261 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -357,20 +357,22 @@ lp_build_emit_fetch(
if (reg->Register.Absolute) {
   switch (stype) {
   case TGSI_TYPE_FLOAT:
   case TGSI_TYPE_DOUBLE:
   case TGSI_TYPE_UNTYPED:
   /* modifiers on movs assume data is float */
  res = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_ABS, res);
  break;
   case TGSI_TYPE_UNSIGNED:
   case TGSI_TYPE_SIGNED:
+  case TGSI_TYPE_UNSIGNED64:
+  case TGSI_TYPE_SIGNED64:
   case TGSI_TYPE_VOID:
   default:
  /* abs modifier is only legal on floating point types */
  assert(0);
  break;
   }
}
 
if (reg->Register.Negate) {
   switch (stype) {
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index de1150c..b6b3fe3 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -330,20 +330,24 @@ typedef LLVMValueRef (*lp_build_emit_fetch_fn)(struct lp_build_tgsi_context *,
 unsigned);
 
 struct lp_build_tgsi_context
 {
struct lp_build_context base;
 
struct lp_build_context uint_bld;
struct lp_build_context int_bld;
 
struct lp_build_context dbl_bld;
+
+   struct lp_build_context uint64_bld;
+   struct lp_build_context int64_bld;
+
/** This array stores functions that are used to transform TGSI opcodes to
  * LLVM instructions.
  */
struct lp_build_tgsi_action op_actions[TGSI_OPCODE_LAST];
 
/* TGSI_OPCODE_RSQ is defined as 1 / sqrt( abs(src0.x) ), rsq_action
 * should compute 1 / sqrt (src0.x) */
struct lp_build_tgsi_action rsq_action;
 
struct lp_build_tgsi_action sqrt_action;
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 1ee9704..ad512e9 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -1086,20 +1086,231 @@ static void dfrac_emit(
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
LLVMValueRef tmp;
tmp = lp_build_floor(&bld_base->dbl_bld,
emit_data->args[0]);
emit_data->output[emit_data->chan] = LLVMBuildFSub(bld_base->base.gallivm->builder,
emit_data->args[0], tmp, "");
 }
 
+/* TGSI_OPCODE_U64MUL */
+static void
+u64mul_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   emit_data->output[emit_data->chan] = lp_build_mul(&bld_base->uint64_bld,
+   emit_data->args[0], emit_data->args[1]);
+}
+
+/* TGSI_OPCODE_U64MOD  */
+static void
+u64mod_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   LLVMValueRef div_mask = lp_build_cmp(&bld_base->uint64_bld,
+PIPE_FUNC_EQUAL, emit_data->args[1],
+bld_base->uint64_bld.zero);
+   /* We want to make sure that we never divide/mod by zero to not
+* generate sigfpe. We don't want to crash just because the
+* shader is doing something weird. */
+   LLVMValueRef divisor = LLVMBuildOr(builder,
+  div_mask,
+  emit_data->args[1], "");
+   LLVMValueRef result = lp_build_mod(&bld_base->uint64_bld,
+  emit_data->args[0], divisor);
+   /* umod by zero doesn't have a guaranteed return value chose -1 for now. */
+   emit_data->output[emit_data->chan] = LLVMBuildOr(builder,
+div_mask,
+result, "");
+}
+
+/* TGSI_OPCODE_MOD (CPU Only) */
+static void
+i64mod_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct 

[Mesa-dev] [PATCH 7/9] [rfc] radeonsi: enable 64-bit integer support.

2016-09-16 Thread Nicolai Hähnle
From: Dave Airlie 

This passes all my current piglit tests except the variants on:
fs-op-div-int64_t-i64vec3

I'm guessing this is probably a backend bug.

[rfc: this needs more testing - just posting to show I've done
it]

Reviewed-by: Marek Olšák 
Signed-off-by: Dave Airlie 
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 64 +++---
 src/gallium/drivers/radeonsi/si_pipe.c |  1 +
 2 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 4fa43cd..9216143 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -44,20 +44,23 @@
 
 LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
  enum tgsi_opcode_type type)
 {
LLVMContextRef ctx = bld_base->base.gallivm->context;
 
switch (type) {
case TGSI_TYPE_UNSIGNED:
case TGSI_TYPE_SIGNED:
return LLVMInt32TypeInContext(ctx);
+   case TGSI_TYPE_UNSIGNED64:
+   case TGSI_TYPE_SIGNED64:
+   return LLVMInt64TypeInContext(ctx);
case TGSI_TYPE_DOUBLE:
return LLVMDoubleTypeInContext(ctx);
case TGSI_TYPE_UNTYPED:
case TGSI_TYPE_FLOAT:
return LLVMFloatTypeInContext(ctx);
default: break;
}
return 0;
 }
 
@@ -1173,26 +1176,32 @@ void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context *bld_base
 
 static void emit_icmp(const struct lp_build_tgsi_action *action,
  struct lp_build_tgsi_context *bld_base,
  struct lp_build_emit_data *emit_data)
 {
unsigned pred;
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMContextRef context = bld_base->base.gallivm->context;
 
switch (emit_data->inst->Instruction.Opcode) {
-   case TGSI_OPCODE_USEQ: pred = LLVMIntEQ; break;
-   case TGSI_OPCODE_USNE: pred = LLVMIntNE; break;
-   case TGSI_OPCODE_USGE: pred = LLVMIntUGE; break;
-   case TGSI_OPCODE_USLT: pred = LLVMIntULT; break;
-   case TGSI_OPCODE_ISGE: pred = LLVMIntSGE; break;
-   case TGSI_OPCODE_ISLT: pred = LLVMIntSLT; break;
+   case TGSI_OPCODE_USEQ:
+   case TGSI_OPCODE_U64SEQ: pred = LLVMIntEQ; break;
+   case TGSI_OPCODE_USNE:
+   case TGSI_OPCODE_U64SNE: pred = LLVMIntNE; break;
+   case TGSI_OPCODE_USGE:
+   case TGSI_OPCODE_U64SGE: pred = LLVMIntUGE; break;
+   case TGSI_OPCODE_USLT:
+   case TGSI_OPCODE_U64SLT: pred = LLVMIntULT; break;
+   case TGSI_OPCODE_ISGE:
+   case TGSI_OPCODE_I64SGE: pred = LLVMIntSGE; break;
+   case TGSI_OPCODE_ISLT:
+   case TGSI_OPCODE_I64SLT: pred = LLVMIntSLT; break;
default:
assert(!"unknown instruction");
pred = 0;
break;
}
 
LLVMValueRef v = LLVMBuildICmp(builder, pred,
emit_data->args[0], emit_data->args[1],"");
 
v = LLVMBuildSExtOrBitCast(builder, v,
@@ -1434,21 +1443,26 @@ static void emit_xor(const struct lp_build_tgsi_action *action,
 }
 
 static void emit_ssg(const struct lp_build_tgsi_action *action,
 struct lp_build_tgsi_context *bld_base,
 struct lp_build_emit_data *emit_data)
 {
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
 
LLVMValueRef cmp, val;
 
-   if (emit_data->inst->Instruction.Opcode == TGSI_OPCODE_ISSG) {
+   if (emit_data->inst->Instruction.Opcode == TGSI_OPCODE_I64SSG) {
+   cmp = LLVMBuildICmp(builder, LLVMIntSGT, emit_data->args[0], bld_base->int64_bld.zero, "");
+   val = LLVMBuildSelect(builder, cmp, bld_base->int64_bld.one, emit_data->args[0], "");
+   cmp = LLVMBuildICmp(builder, LLVMIntSGE, val, bld_base->int64_bld.zero, "");
+   val = LLVMBuildSelect(builder, cmp, val, LLVMConstInt(bld_base->int64_bld.elem_type, -1, true), "");
+   } else if (emit_data->inst->Instruction.Opcode == TGSI_OPCODE_ISSG) {
cmp = LLVMBuildICmp(builder, LLVMIntSGT, emit_data->args[0], bld_base->int_bld.zero, "");
val = LLVMBuildSelect(builder, cmp, bld_base->int_bld.one, emit_data->args[0], "");
cmp = LLVMBuildICmp(builder, LLVMIntSGE, val, bld_base->int_bld.zero, "");
val = LLVMBuildSelect(builder, cmp, val, LLVMConstInt(bld_base->int_bld.elem_type, -1, true), "");
} else { // float SSG
cmp = LLVMBuildFCmp(builder, LLVMRealOGT, emit_data->args[0], bld_base->base.zero, "");
val = LLVMBuildSelect(builder, cmp, bld_base->base.one, emit_data->args[0], "");
cmp = LLVMBuildFCmp(builder, LLVMRealOGE, val, bld_base->base.zero, "");

[Mesa-dev] [PATCH 1/9] gallium: add opcode and types for 64-bit integers. (v2)

2016-09-16 Thread Nicolai Hähnle
From: Dave Airlie 

This just adds the basic support for 64-bit opcodes,
and the new types.

v2: add conversion opcodes.
add documentation.

Reviewed-by: Marek Olšák 
Reviewed-by: Nicolai Hähnle 
Signed-off-by: Dave Airlie 
---
 src/gallium/auxiliary/tgsi/tgsi_info.c |  92 +--
 src/gallium/auxiliary/tgsi/tgsi_info.h |   4 +-
 src/gallium/docs/source/tgsi.rst   | 246 +
 src/gallium/include/pipe/p_shader_tokens.h |  46 --
 4 files changed, 368 insertions(+), 20 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 60e0f2c..e319be1 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -52,61 +52,61 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
{ 1, 2, 0, 0, 0, 0, 0, COMP, "MIN", TGSI_OPCODE_MIN },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "MAX", TGSI_OPCODE_MAX },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "SLT", TGSI_OPCODE_SLT },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "SGE", TGSI_OPCODE_SGE },
{ 1, 3, 0, 0, 0, 0, 0, COMP, "MAD", TGSI_OPCODE_MAD },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "SUB", TGSI_OPCODE_SUB },
{ 1, 3, 0, 0, 0, 0, 0, COMP, "LRP", TGSI_OPCODE_LRP },
{ 1, 3, 0, 0, 0, 0, 0, COMP, "FMA", TGSI_OPCODE_FMA },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "SQRT", TGSI_OPCODE_SQRT },
{ 1, 3, 0, 0, 0, 0, 0, REPL, "DP2A", TGSI_OPCODE_DP2A },
-   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 22 },  /* removed */
-   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 23 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 0, COMP, "F2U64", TGSI_OPCODE_F2U64 },
+   { 1, 1, 0, 0, 0, 0, 0, COMP, "F2I64", TGSI_OPCODE_F2I64 },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "FRC", TGSI_OPCODE_FRC },
{ 1, 3, 0, 0, 0, 0, 0, COMP, "CLAMP", TGSI_OPCODE_CLAMP },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "FLR", TGSI_OPCODE_FLR },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "ROUND", TGSI_OPCODE_ROUND },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "EX2", TGSI_OPCODE_EX2 },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "LG2", TGSI_OPCODE_LG2 },
{ 1, 2, 0, 0, 0, 0, 0, REPL, "POW", TGSI_OPCODE_POW },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "XPD", TGSI_OPCODE_XPD },
-   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 32 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 0, COMP, "I2U64", TGSI_OPCODE_I2U64 },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "ABS", TGSI_OPCODE_ABS },
-   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 34 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 0, COMP, "I2I64", TGSI_OPCODE_I2I64 },
{ 1, 2, 0, 0, 0, 0, 0, REPL, "DPH", TGSI_OPCODE_DPH },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "COS", TGSI_OPCODE_COS },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "DDX", TGSI_OPCODE_DDX },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "DDY", TGSI_OPCODE_DDY },
{ 0, 0, 0, 0, 0, 0, 0, NONE, "KILL", TGSI_OPCODE_KILL },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "PK2H", TGSI_OPCODE_PK2H },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "PK2US", TGSI_OPCODE_PK2US },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "PK4B", TGSI_OPCODE_PK4B },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "PK4UB", TGSI_OPCODE_PK4UB },
-   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 44 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 1, COMP, "D2U64", TGSI_OPCODE_D2U64 },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "SEQ", TGSI_OPCODE_SEQ },
-   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 46 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 1, COMP, "D2I64", TGSI_OPCODE_D2I64 },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "SGT", TGSI_OPCODE_SGT },
{ 1, 1, 0, 0, 0, 0, 0, REPL, "SIN", TGSI_OPCODE_SIN },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "SLE", TGSI_OPCODE_SLE },
{ 1, 2, 0, 0, 0, 0, 0, COMP, "SNE", TGSI_OPCODE_SNE },
-   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 51 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 1, COMP, "U642D", TGSI_OPCODE_U642D },
{ 1, 2, 1, 0, 0, 0, 0, OTHR, "TEX", TGSI_OPCODE_TEX },
{ 1, 4, 1, 0, 0, 0, 0, OTHR, "TXD", TGSI_OPCODE_TXD },
{ 1, 2, 1, 0, 0, 0, 0, OTHR, "TXP", TGSI_OPCODE_TXP },
{ 1, 1, 0, 0, 0, 0, 0, CHAN, "UP2H", TGSI_OPCODE_UP2H },
{ 1, 1, 0, 0, 0, 0, 0, CHAN, "UP2US", TGSI_OPCODE_UP2US },
{ 1, 1, 0, 0, 0, 0, 0, CHAN, "UP4B", TGSI_OPCODE_UP4B },
{ 1, 1, 0, 0, 0, 0, 0, CHAN, "UP4UB", TGSI_OPCODE_UP4UB },
-   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 59 },  /* removed */
-   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 60 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 1, COMP, "U642F", TGSI_OPCODE_U642F },
+   { 1, 1, 0, 0, 0, 0, 1, COMP, "I642F", TGSI_OPCODE_I642F },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "ARR", TGSI_OPCODE_ARR },
-   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 62 },  /* removed */
+   { 1, 1, 0, 0, 0, 0, 1, COMP, "I642D", TGSI_OPCODE_I642D },
{ 0, 0, 0, 0, 1, 0, 0, NONE, "CAL", TGSI_OPCODE_CAL },
{ 0, 0, 0, 0, 0, 0, 0, NONE, "RET", TGSI_OPCODE_RET },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "SSG", TGSI_OPCODE_SSG },
{ 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
{ 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
{ 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
{ 0, 1, 

[Mesa-dev] [PATCH 6/9] squash! gallivm/llvmpipe: add support for ARB_gpu_shader_int64.

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

- PIPE_CAP_INT64 is not there yet
- restrict DIV/MOD defaults to the CPU, as for 32 bits
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 17 -
 src/gallium/drivers/llvmpipe/lp_screen.c   |  1 -
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index ad512e9..010ad9d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -1099,21 +1099,21 @@ u64mul_emit(
const struct lp_build_tgsi_action * action,
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
emit_data->output[emit_data->chan] = lp_build_mul(&bld_base->uint64_bld,
emit_data->args[0], emit_data->args[1]);
 }
 
 /* TGSI_OPCODE_U64MOD  */
 static void
-u64mod_emit(
+u64mod_emit_cpu(
const struct lp_build_tgsi_action * action,
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMValueRef div_mask = lp_build_cmp(&bld_base->uint64_bld,
 PIPE_FUNC_EQUAL, emit_data->args[1],
 bld_base->uint64_bld.zero);
/* We want to make sure that we never divide/mod by zero to not
 * generate sigfpe. We don't want to crash just because the
@@ -1124,21 +1124,21 @@ u64mod_emit(
LLVMValueRef result = lp_build_mod(&bld_base->uint64_bld,
   emit_data->args[0], divisor);
/* umod by zero doesn't have a guaranteed return value chose -1 for now. */
emit_data->output[emit_data->chan] = LLVMBuildOr(builder,
 div_mask,
 result, "");
 }
 
 /* TGSI_OPCODE_MOD (CPU Only) */
 static void
-i64mod_emit(
+i64mod_emit_cpu(
const struct lp_build_tgsi_action * action,
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMValueRef div_mask = lp_build_cmp(&bld_base->uint64_bld,
 PIPE_FUNC_EQUAL, emit_data->args[1],
 bld_base->uint64_bld.zero);
/* We want to make sure that we never divide/mod by zero to not
 * generate sigfpe. We don't want to crash just because the
@@ -1149,21 +1149,21 @@ i64mod_emit(
LLVMValueRef result = lp_build_mod(&bld_base->int64_bld,
   emit_data->args[0], divisor);
/* umod by zero doesn't have a guaranteed return value chose -1 for now. */
emit_data->output[emit_data->chan] = LLVMBuildOr(builder,
 div_mask,
 result, "");
 }
 
 /* TGSI_OPCODE_U64DIV */
 static void
-u64div_emit(
+u64div_emit_cpu(
const struct lp_build_tgsi_action * action,
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
 
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMValueRef div_mask = lp_build_cmp(&bld_base->uint64_bld,
 PIPE_FUNC_EQUAL, emit_data->args[1],
 bld_base->uint64_bld.zero);
/* We want to make sure that we never divide/mod by zero to not
@@ -1175,21 +1175,21 @@ u64div_emit(
LLVMValueRef result = LLVMBuildUDiv(builder,
   emit_data->args[0], divisor, "");
/* udiv by zero is guaranteed to return 0x at least with d3d10 */
emit_data->output[emit_data->chan] = LLVMBuildOr(builder,
 div_mask,
 result, "");
 }
 
 /* TGSI_OPCODE_I64DIV */
 static void
-i64div_emit(
+i64div_emit_cpu(
const struct lp_build_tgsi_action * action,
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
 
LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMValueRef div_mask = lp_build_cmp(&bld_base->int64_bld,
 PIPE_FUNC_EQUAL, emit_data->args[1],
 bld_base->int64_bld.zero);
/* We want to make sure that we never divide/mod by zero to not
@@ -1373,24 +1373,20 @@ lp_set_default_actions(struct lp_build_tgsi_context * bld_base)
bld_base->op_actions[TGSI_OPCODE_F2D].emit = f2d_emit;
bld_base->op_actions[TGSI_OPCODE_I2D].emit = i2d_emit;
bld_base->op_actions[TGSI_OPCODE_U2D].emit = u2d_emit;
 
bld_base->op_actions[TGSI_OPCODE_DMAD].emit = dmad_emit;
 
bld_base->op_actions[TGSI_OPCODE_DRCP].emit = drcp_emit;

[Mesa-dev] [PATCH 9/9] gallivm: support negation on 64-bit integers

2016-09-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This should be analogous to 32-bit integers.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index b397261..68ac695 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -382,20 +382,24 @@ lp_build_emit_fetch(
  res = lp_build_negate( &bld_base->base, res );
  break;
   case TGSI_TYPE_DOUBLE:
  /* no double build context */
  assert(0);
  break;
   case TGSI_TYPE_SIGNED:
   case TGSI_TYPE_UNSIGNED:
  res = lp_build_negate( &bld_base->int_bld, res );
  break;
+  case TGSI_TYPE_SIGNED64:
+  case TGSI_TYPE_UNSIGNED64:
+ res = lp_build_negate( &bld_base->int64_bld, res );
+ break;
   case TGSI_TYPE_VOID:
   default:
  assert(0);
  break;
   }
}
 
/*
 * Swizzle the argument
 */
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] gallium/tgsi: 64-bit integer foundations

2016-09-16 Thread Nicolai Hähnle
Hi all,

This is really Dave's work, with a few touch-ups from me that I think make
sense. I've kept those separate with the intention to squash. I'd like to
land these in master even before the main ARB_gpu_shader_int64 stuff lands
(that is currently in Ian's court).

The reason is that radeonsi's ARB_query_buffer_object support needs 64-bit
integers in shaders, and for that it's convenient to have all the TGSI
opcodes and gallivm bits in place already.

Any objections? Reviews?
Thanks,
Nicolai

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Revert "st/vdpau: use linear layout for output surfaces"

2016-09-16 Thread Alex Deucher
On Fri, Sep 16, 2016 at 4:03 AM, Christian König
 wrote:
> Am 16.09.2016 um 09:50 schrieb Michel Dänzer:
>>
>> On 16/09/16 04:33 PM, Christian König wrote:
>>>
>>> Am 15.09.2016 um 21:43 schrieb Dave Airlie:

 On 15 September 2016 at 17:43, Christian König
  wrote:
>
> Am 15.09.2016 um 06:00 schrieb Ilia Mirkin:
>>
>> On Wed, Sep 14, 2016 at 11:58 PM, Dave Airlie 
>> wrote:
>>>
>>> From: Dave Airlie 
>>>
>>> This reverts commit d180de35320eafa3df3d76f0e82b332656530126.
>>>
>>> This is a radeon specific hack that causes problems on nouveau
>>> when combined with the SHARED flag later. If radeonsi needs a fix
>>> for this, please fix it in the driver.
>
> Actually it isn't radeon specific. Using linear surfaces for this makes
> sense because tiling isn't beneficial and the surfaces can
> potentially be
> shared with other GPUs using the VDPAU OpenGL interop.

 Who says tiling isn't beneficial though? Maybe on other GPUs tiling
 might be, it
 still seems like a radeon centric view to me.
>>>
>>> Tiling helps with the memory throughput because it makes pixels which
>>> are rendered together appear near to each other in the memory layout as
>>> well.
>>>
>>> Since multimedia as well as compute applications usually always render
>>> to the whole texture/array/matrix it usually makes no sense at all to
>>> enable it for those tasks.
>>
>> Are you sure about that? Tiling also affects the order of memory accesses,
>> which could affect performance even when all pixels of a surface are
>> written.
>
>
> I can't 100% rule that out, but the hardware I've encountered so far orders
> the execution by the memory layout of the output buffer which is written to
> maximize throughput.
>
> On the other hand I never double checked how the MC on AMD hardware really
> works in the documentation, just took some measurements and it didn't seem
> to be beneficial at all.
>
> Tiling/shuffling can actually hurt performance quite a bit when the whole
> buffer is written and the execution order doesn't follow the memory pattern,
> so I think we would have noticed that.

Tiling also hurts with GTT buffers due to the way pcie transactions work.

Alex

>
> Where tiling could help quite a bit is with the video surfaces, because the
> deinterlacing shaders need to read them quite extensively, but unfortunately
> our decoding hardware can't fill it in the way it is needed.
>
> Regards,
> Christian.
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Revert "st/vdpau: use linear layout for output surfaces"

2016-09-16 Thread Marek Olšák
On Fri, Sep 16, 2016 at 10:03 AM, Christian König
 wrote:
> Am 16.09.2016 um 09:50 schrieb Michel Dänzer:
>>
>> On 16/09/16 04:33 PM, Christian König wrote:
>>>
>>> Am 15.09.2016 um 21:43 schrieb Dave Airlie:

 On 15 September 2016 at 17:43, Christian König
  wrote:
>
> Am 15.09.2016 um 06:00 schrieb Ilia Mirkin:
>>
>> On Wed, Sep 14, 2016 at 11:58 PM, Dave Airlie 
>> wrote:
>>>
>>> From: Dave Airlie 
>>>
>>> This reverts commit d180de35320eafa3df3d76f0e82b332656530126.
>>>
>>> This is a radeon specific hack that causes problems on nouveau
>>> when combined with the SHARED flag later. If radeonsi needs a fix
>>> for this, please fix it in the driver.
>
> Actually it isn't radeon specific. Using linear surfaces for this makes
> sense because tiling isn't beneficial and the surfaces can
> potentially be
> shared with other GPUs using the VDPAU OpenGL interop.

 Who says tiling isn't beneficial though? Maybe on other GPUs tiling
 might be, it
 still seems like a radeon centric view to me.
>>>
>>> Tiling helps with the memory throughput because it makes pixels which
>>> are rendered together appear near to each other in the memory layout as
>>> well.
>>>
>>> Since multimedia as well as compute applications usually always render
>>> to the whole texture/array/matrix it usually makes no sense at all to
>>> enable it for those tasks.
>>
>> Are you sure about that? Tiling also affects the order of memory accesses,
>> which could affect performance even when all pixels of a surface are
>> written.
>
>
> I can't 100% rule that out, but the hardware I've encountered so far orders
> the execution by the memory layout of the output buffer which is written to
> maximize throughput.
>
> On the other hand I never double checked how the MC on AMD hardware really
> works in the documentation, just took some measurements and it didn't seem
> to be beneficial at all.
>
> Tiling/shuffling can actually hurt performance quite a bit when the whole
> buffer is written and the execution order doesn't follow the memory pattern,
> so I think we would have noticed that.
>
> Where tiling could help quite a bit is with the video surfaces, because the
> deinterlacing shaders need to read them quite extensively, but unfortunately
> our decoding hardware can't fill it in the way it is needed.

Tiling is mainly used for drawing triangles whose average bounding box
is a square and since you want to touch as few cache lines as
possible, the tiles should also be squares.

Tiling doesn't make any sense for video decoding and also isn't
desirable for many compute applications.

Marek


[Mesa-dev] [PATCH 10/11] nir: pass compiler rather than devinfo to functions that call nir_optimize

2016-09-16 Thread Timothy Arceri
Later we will pass the compiler to nir_optimize() for use by the loop unroll
pass.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 10 --
 src/mesa/drivers/dri/i965/brw_nir.c   |  7 ---
 src/mesa/drivers/dri/i965/brw_nir.h   |  4 ++--
 src/mesa/drivers/dri/i965/brw_shader.cpp  |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  5 ++---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  5 ++---
 src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp|  4 ++--
 7 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d026bbd..a686ade 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6427,14 +6427,13 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
char **error_str)
 {
nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
-   shader = brw_nir_apply_sampler_key(shader, compiler->devinfo, &key->tex,
-  true);
+   shader = brw_nir_apply_sampler_key(shader, compiler, &key->tex, true);
brw_nir_lower_fs_inputs(shader, compiler->devinfo, key);
brw_nir_lower_fs_outputs(shader);
if (!key->multisample_fbo)
   NIR_PASS_V(shader, demote_sample_qualifiers);
NIR_PASS_V(shader, move_interpolation_to_top);
-   shader = brw_postprocess_nir(shader, compiler->devinfo, true);
+   shader = brw_postprocess_nir(shader, compiler, true);
 
/* key->alpha_test_func means simulating alpha testing via discards,
 * so the shader definitely kills pixels.
@@ -6657,8 +6656,7 @@ brw_compile_cs(const struct brw_compiler *compiler, void *log_data,
char **error_str)
 {
nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
-   shader = brw_nir_apply_sampler_key(shader, compiler->devinfo, &key->tex,
-  true);
+   shader = brw_nir_apply_sampler_key(shader, compiler, &key->tex, true);
brw_nir_lower_cs_shared(shader);
prog_data->base.total_shared += shader->num_shared;
 
@@ -6671,7 +6669,7 @@ brw_compile_cs(const struct brw_compiler *compiler, void *log_data,
(unsigned)4 * (prog_data->thread_local_id_index + 1));
 
brw_nir_lower_intrinsics(shader, &prog_data->base);
-   shader = brw_postprocess_nir(shader, compiler->devinfo, true);
+   shader = brw_postprocess_nir(shader, compiler, true);
 
prog_data->local_size[0] = shader->info.cs.local_size[0];
prog_data->local_size[1] = shader->info.cs.local_size[1];
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index 264d812..b75140b 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -508,10 +508,10 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
  * will not work.
  */
 nir_shader *
-brw_postprocess_nir(nir_shader *nir,
-const struct gen_device_info *devinfo,
+brw_postprocess_nir(nir_shader *nir, const struct brw_compiler *compiler,
 bool is_scalar)
 {
+   const struct gen_device_info *devinfo = compiler->devinfo;
bool debug_enabled =
   (INTEL_DEBUG & intel_debug_flag_for_shader_stage(nir->stage));
 
@@ -573,10 +573,11 @@ brw_postprocess_nir(nir_shader *nir,
 
 nir_shader *
 brw_nir_apply_sampler_key(nir_shader *nir,
-  const struct gen_device_info *devinfo,
+  const struct brw_compiler *compiler,
   const struct brw_sampler_prog_key_data *key_tex,
   bool is_scalar)
 {
+   const struct gen_device_info *devinfo = compiler->devinfo;
nir_lower_tex_options tex_options = { 0 };
 
/* Iron Lake and prior require lowering of all rectangle textures */
diff --git a/src/mesa/drivers/dri/i965/brw_nir.h b/src/mesa/drivers/dri/i965/brw_nir.h
index 425d6ce..f110e16 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.h
+++ b/src/mesa/drivers/dri/i965/brw_nir.h
@@ -115,7 +115,7 @@ void brw_nir_lower_fs_outputs(nir_shader *nir);
 void brw_nir_lower_cs_shared(nir_shader *nir);
 
 nir_shader *brw_postprocess_nir(nir_shader *nir,
-const struct gen_device_info *devinfo,
+const struct brw_compiler *compiler,
 bool is_scalar);
 
 bool brw_nir_apply_attribute_workarounds(nir_shader *nir,
@@ -127,7 +127,7 @@ bool brw_nir_apply_trig_workarounds(nir_shader *nir);
 void brw_nir_apply_tcs_quads_workaround(nir_shader *nir);
 
 nir_shader *brw_nir_apply_sampler_key(nir_shader *nir,
-  const struct gen_device_info *devinfo,
+  const struct brw_compiler *compiler,
  const struct brw_sampler_prog_key_data *key,
   bool is_scalar);
 
diff --git 

[Mesa-dev] [PATCH 11/11] i965: use nir loop unrolling pass

2016-09-16 Thread Timothy Arceri
V2:
- enable on all gens
---
 src/compiler/glsl/glsl_parser_extras.cpp | 12 +++-
 src/mesa/drivers/dri/i965/brw_compiler.c |  5 -
 src/mesa/drivers/dri/i965/brw_nir.c  | 23 ++-
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp
index 0e9bfa7..6bf7f1d 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -2083,12 +2083,14 @@ do_common_optimization(exec_list *ir, bool linked,
OPT(optimize_split_arrays, ir, linked);
OPT(optimize_redundant_jumps, ir);
 
-   loop_state *ls = analyze_loop_variables(ir);
-   if (ls->loop_found) {
-  OPT(set_loop_controls, ir, ls);
-  OPT(unroll_loops, ir, ls, options);
+   if (options->MaxUnrollIterations != 0) {
+  loop_state *ls = analyze_loop_variables(ir);
+  if (ls->loop_found) {
+ OPT(set_loop_controls, ir, ls);
+ OPT(unroll_loops, ir, ls, options);
+  }
+  delete ls;
}
-   delete ls;
 
 #undef OPT
 
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c b/src/mesa/drivers/dri/i965/brw_compiler.c
index 9318aa6..6d3f41a 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -55,6 +55,7 @@ static const struct nir_shader_compiler_options scalar_nir_options = {
.lower_unpack_snorm_4x8 = true,
.lower_unpack_unorm_2x16 = true,
.lower_unpack_unorm_4x8 = true,
+   .max_unroll_iterations = 32,
 };
 
 static const struct nir_shader_compiler_options vector_nir_options = {
@@ -75,6 +76,7 @@ static const struct nir_shader_compiler_options vector_nir_options = {
.lower_unpack_unorm_2x16 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
+   .max_unroll_iterations = 32,
 };
 
 static const struct nir_shader_compiler_options vector_nir_options_gen6 = {
@@ -92,6 +94,7 @@ static const struct nir_shader_compiler_options vector_nir_options_gen6 = {
.lower_unpack_unorm_2x16 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
+   .max_unroll_iterations = 32,
 };
 
 struct brw_compiler *
@@ -119,7 +122,7 @@ brw_compiler_create(void *mem_ctx, const struct gen_device_info *devinfo)
 
/* We want the GLSL compiler to emit code that uses condition codes */
for (int i = 0; i < MESA_SHADER_STAGES; i++) {
-  compiler->glsl_compiler_options[i].MaxUnrollIterations = 32;
+  compiler->glsl_compiler_options[i].MaxUnrollIterations = 0;
   compiler->glsl_compiler_options[i].MaxIfDepth =
  devinfo->gen < 6 ? 16 : UINT_MAX;
 
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index b75140b..f433d73 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -396,8 +396,17 @@ brw_nir_lower_cs_shared(nir_shader *nir)
 #define OPT_V(pass, ...) NIR_PASS_V(nir, pass, ##__VA_ARGS__)
 
 static nir_shader *
-nir_optimize(nir_shader *nir, bool is_scalar)
+nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
+ bool is_scalar)
 {
+   nir_variable_mode indirect_mask = 0;
+   if (compiler->glsl_compiler_options[nir->stage].EmitNoIndirectInput)
+  indirect_mask |= nir_var_shader_in;
+   if (compiler->glsl_compiler_options[nir->stage].EmitNoIndirectOutput)
+  indirect_mask |= nir_var_shader_out;
+   if (compiler->glsl_compiler_options[nir->stage].EmitNoIndirectTemp)
+  indirect_mask |= nir_var_local;
+
bool progress;
do {
   progress = false;
@@ -420,6 +429,10 @@ nir_optimize(nir_shader *nir, bool is_scalar)
   OPT(nir_opt_algebraic);
   OPT(nir_opt_constant_folding);
   OPT(nir_opt_dead_cf);
+  if (nir->options->max_unroll_iterations != 0) {
+ OPT_V(nir_to_lcssa, indirect_mask);
+ OPT(nir_opt_loop_unroll, indirect_mask);
+  }
   OPT(nir_opt_remove_phis);
   OPT(nir_opt_undef);
   OPT_V(nir_lower_doubles, nir_lower_drcp |
@@ -473,7 +486,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
 
OPT(nir_split_var_copies);
 
-   nir = nir_optimize(nir, is_scalar);
+   nir = nir_optimize(nir, compiler, is_scalar);
 
if (is_scalar) {
   OPT_V(nir_lower_load_const_to_scalar);
@@ -493,7 +506,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
nir_lower_indirect_derefs(nir, indirect_mask);
 
/* Get rid of split copies */
-   nir = nir_optimize(nir, is_scalar);
+   nir = nir_optimize(nir, compiler, is_scalar);
 
OPT(nir_remove_dead_variables, nir_var_local);
 
@@ -518,7 +531,7 @@ brw_postprocess_nir(nir_shader *nir, const struct brw_compiler *compiler,
bool progress; /* Written by OPT and OPT_V */
(void)progress;
 
-   nir = nir_optimize(nir, is_scalar);
+   nir = nir_optimize(nir, compiler, is_scalar);
 
if (devinfo->gen >= 6) {
   /* Try and fuse multiply-adds */
@@ -607,7 +620,7 @@ 

Re: [Mesa-dev] [PATCH] gallium/docs: document alpha_to_coverage and alpha_to_one blend state

2016-09-16 Thread Marek Olšák
On Thu, Sep 15, 2016 at 11:35 PM, Ilia Mirkin  wrote:
> What about integer RTs? I had to add a hack in nouveau to make it
> disable those when RT0 is an integer. It'd be more convenient if they
> were turned off in the first place.

Deriving one hw state from 2 states isn't hacking. That's normal.

I'm generally not in favor of adding more dependencies to st/mesa
states, because it would increase the amount of state processing for
everybody.

Marek


[Mesa-dev] [PATCH 08/11] nir: add helper for cloning loops

2016-09-16 Thread Timothy Arceri
---
 src/compiler/nir/nir.h   |  2 ++
 src/compiler/nir/nir_clone.c | 41 ++---
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 29a6f45..d052cad 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2338,6 +2338,8 @@ void nir_print_instr(const nir_instr *instr, FILE *fp);
 
 nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
 nir_function_impl *nir_function_impl_clone(const nir_function_impl *fi);
+void nir_clone_loop_list(struct exec_list *dst, const struct exec_list *list,
+ struct hash_table *remap_table, nir_shader *ns);
 nir_constant *nir_constant_clone(const nir_constant *c, nir_variable *var);
 nir_variable *nir_variable_clone(const nir_variable *c, nir_shader *shader);
 
diff --git a/src/compiler/nir/nir_clone.c b/src/compiler/nir/nir_clone.c
index 8808333..7afccbb 100644
--- a/src/compiler/nir/nir_clone.c
+++ b/src/compiler/nir/nir_clone.c
@@ -35,6 +35,11 @@ typedef struct {
/* True if we are cloning an entire shader. */
bool global_clone;
 
+   /* This allows us to clone a loop body without having to add srcs from
+* outside the loop to the remap table. This is useful for loop unrolling.
+*/
+   bool allow_remap_fallback;
+
/* maps orig ptr -> cloned ptr: */
struct hash_table *remap_table;
 
@@ -46,11 +51,19 @@ typedef struct {
 } clone_state;
 
 static void
-init_clone_state(clone_state *state, bool global)
+init_clone_state(clone_state *state, struct hash_table *remap_table,
+ bool global, bool allow_remap_fallback)
 {
state->global_clone = global;
-   state->remap_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
-_mesa_key_pointer_equal);
+   state->allow_remap_fallback = allow_remap_fallback;
+
+   if (remap_table) {
+  state->remap_table = remap_table;
+   } else {
+  state->remap_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+   }
+
list_inithead(&state->phi_srcs);
 }
 
@@ -72,9 +85,8 @@ _lookup_ptr(clone_state *state, const void *ptr, bool global)
   return (void *)ptr;
 
entry = _mesa_hash_table_search(state->remap_table, ptr);
-   assert(entry && "Failed to find pointer!");
if (!entry)
-  return NULL;
+  return state->allow_remap_fallback ? (void *)ptr : NULL;
 
return entry->data;
 }
@@ -613,6 +625,21 @@ fixup_phi_srcs(clone_state *state)
assert(list_empty(&state->phi_srcs));
 }
 
+void
+nir_clone_loop_list(struct exec_list *dst, const struct exec_list *list,
+struct hash_table *remap_table, nir_shader *ns)
+{
+   clone_state state;
+   init_clone_state(&state, remap_table, false, true);
+
+   /* We use the same shader */
+   state.ns = ns;
+
+   clone_cf_list(&state, dst, list);
+
+   fixup_phi_srcs(&state);
+}
+
 static nir_function_impl *
 clone_function_impl(clone_state *state, const nir_function_impl *fi)
 {
@@ -646,7 +673,7 @@ nir_function_impl *
 nir_function_impl_clone(const nir_function_impl *fi)
 {
clone_state state;
-   init_clone_state(&state, false);
+   init_clone_state(&state, NULL, false, false);
 
/* We use the same shader */
state.ns = fi->function->shader;
@@ -686,7 +713,7 @@ nir_shader *
 nir_shader_clone(void *mem_ctx, const nir_shader *s)
 {
clone_state state;
-   init_clone_state(&state, true);
+   init_clone_state(&state, NULL, true, false);
 
nir_shader *ns = nir_shader_create(mem_ctx, s->stage, s->options);
state.ns = ns;
-- 
2.7.4



[Mesa-dev] [PATCH 09/11] nir: add a loop unrolling pass

2016-09-16 Thread Timothy Arceri
V2:
- tidy-ups suggested by Connor.
- tidy up cloning logic and handle copy propagation,
  based on a suggestion by Connor.
- use nir_ssa_def_rewrite_uses to fix up lcssa phis,
  as suggested by Connor.
- add support for complex loop unrolling (two terminators)
- handle the case where the ssa def's use outside the loop is already a phi
- support unrolling loops with multiple terminators when the trip count
  is known for each terminator

V3:
- set correct num_components when creating phi in complex unroll
---
 src/compiler/Makefile.sources  |   1 +
 src/compiler/nir/nir.h |   2 +
 src/compiler/nir/nir_opt_loop_unroll.c | 821 +
 3 files changed, 824 insertions(+)
 create mode 100644 src/compiler/nir/nir_opt_loop_unroll.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index 8ef6080..b3512bb 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -233,6 +233,7 @@ NIR_FILES = \
nir/nir_opt_dead_cf.c \
nir/nir_opt_gcm.c \
nir/nir_opt_global_to_local.c \
+   nir/nir_opt_loop_unroll.c \
nir/nir_opt_peephole_select.c \
nir/nir_opt_remove_phis.c \
nir/nir_opt_undef.c \
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index d052cad..c287809 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2673,6 +2673,8 @@ bool nir_opt_dead_cf(nir_shader *shader);
 
 bool nir_opt_gcm(nir_shader *shader, bool value_number);
 
+bool nir_opt_loop_unroll(nir_shader *shader, nir_variable_mode indirect_mask);
+
 bool nir_opt_peephole_select(nir_shader *shader);
 
 bool nir_opt_remove_phis(nir_shader *shader);
diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c
new file mode 100644
index 000..bd0135c
--- /dev/null
+++ b/src/compiler/nir/nir_opt_loop_unroll.c
@@ -0,0 +1,821 @@
+/*
+ * Copyright © 2016 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+#include "nir_builder.h"
+#include "nir_control_flow.h"
+
+static void
+extract_loop_body(nir_cf_list *extracted, nir_cf_node *node)
+{
+   nir_cf_node *end = node;
+   while (!nir_cf_node_is_last(end))
+  end = nir_cf_node_next(end);
+
+   nir_cf_extract(extracted, nir_before_cf_node(node),
+  nir_after_cf_node(end));
+}
+
+static void
+clone_list(nir_shader *ns, nir_loop *loop, nir_cf_list *src_cf_list,
+   nir_cf_list *cloned_cf_list, struct hash_table *remap_table)
+{
+   /* Dest list needs to at least have one block */
+   nir_block *nblk = nir_block_create(ns);
+   nblk->cf_node.parent = loop->cf_node.parent;
+   exec_list_push_tail(&cloned_cf_list->list, &nblk->cf_node.node);
+
+   nir_clone_loop_list(&cloned_cf_list->list, &src_cf_list->list,
+   remap_table, ns);
+}
+
+static void
+move_cf_list_into_if(nir_cf_list *lst, nir_cf_node *if_node,
+ nir_cf_node *last_node, bool continue_from_then_branch)
+{
+   nir_if *if_stmt = nir_cf_node_as_if(if_node);
+   if (continue_from_then_branch) {
+  /* Move the rest of the loop inside the then */
+  nir_cf_reinsert(lst, nir_after_cf_node(nir_if_last_then_node(if_stmt)));
+   } else {
+  /* Move the rest of the loop inside the else */
+  nir_cf_reinsert(lst, nir_after_cf_node(nir_if_last_else_node(if_stmt)));
+   }
+
+   /* Remove the break */
+   nir_instr_remove(nir_block_last_instr(nir_cf_node_as_block(last_node)));
+}
+
+static bool
+is_phi_src_phi_from_loop_header(nir_ssa_def *def, nir_ssa_def *src)
+{
+   return def->parent_instr->type == nir_instr_type_phi &&
+  src->parent_instr->type == nir_instr_type_phi &&
+  nir_instr_as_phi(def->parent_instr)->instr.block->index ==
+  nir_instr_as_phi(src->parent_instr)->instr.block->index;
+}
+
+static void
+get_table_of_lcssa_and_loop_term_phis(nir_cf_node *loop,
+  struct 

[Mesa-dev] [PATCH 07/11] nir: create helper for fixing phi srcs when cloning

2016-09-16 Thread Timothy Arceri
This will be useful for fixing phi srcs when cloning a loop body
during loop unrolling.
---
 src/compiler/nir/nir_clone.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/src/compiler/nir/nir_clone.c b/src/compiler/nir/nir_clone.c
index 0e397b0..8808333 100644
--- a/src/compiler/nir/nir_clone.c
+++ b/src/compiler/nir/nir_clone.c
@@ -593,6 +593,26 @@ clone_cf_list(clone_state *state, struct exec_list *dst,
}
 }
 
+/* After we've cloned almost everything, we have to walk the list of phi
+ * sources and fix them up.  Thanks to loops, the block and SSA value for a
+ * phi source may not be defined when we first encounter it.  Instead, we
+ * add it to the phi_srcs list and we fix it up here.
+ */
+static void
+fixup_phi_srcs(clone_state *state)
+{
+   list_for_each_entry_safe(nir_phi_src, src, &state->phi_srcs, src.use_link) {
+  src->pred = remap_local(state, src->pred);
+  assert(src->src.is_ssa);
+  src->src.ssa = remap_local(state, src->src.ssa);
+
+  /* Remove from this list and place in the uses of the SSA def */
+  list_del(&src->src.use_link);
+  list_addtail(&src->src.use_link, &src->src.ssa->uses);
+   }
+   assert(list_empty(&state->phi_srcs));
+}
+
 static nir_function_impl *
 clone_function_impl(clone_state *state, const nir_function_impl *fi)
 {
@@ -614,21 +634,7 @@ clone_function_impl(clone_state *state, const nir_function_impl *fi)
 
clone_cf_list(state, &nfi->body, &fi->body);
 
-   /* After we've cloned almost everything, we have to walk the list of phi
-* sources and fix them up.  Thanks to loops, the block and SSA value for a
-* phi source may not be defined when we first encounter it.  Instead, we
-* add it to the phi_srcs list and we fix it up here.
-*/
-   list_for_each_entry_safe(nir_phi_src, src, &state->phi_srcs, src.use_link) {
-  src->pred = remap_local(state, src->pred);
-  assert(src->src.is_ssa);
-  src->src.ssa = remap_local(state, src->src.ssa);
-
-  /* Remove from this list and place in the uses of the SSA def */
-  list_del(&src->src.use_link);
-  list_addtail(&src->src.use_link, &src->src.ssa->uses);
-   }
-   assert(list_empty(&state->phi_srcs));
+   fixup_phi_srcs(state);
 
/* All metadata is invalidated in the cloning process */
nfi->valid_metadata = 0;
-- 
2.7.4



[Mesa-dev] [PATCH 05/11] nir: Add a LCSAA-pass

2016-09-16 Thread Timothy Arceri
From: Thomas Helland 

V2: Do a "depth first search" to convert to LCSSA

V3: Small comment fixup

V4: Rebase, adapt to removal of function overloads

V5: Rebase, adapt to relocation of nir to compiler/nir
Still need to adapt to potential if-uses
Work around nir_validate issue

V6 (Timothy):
 - tidy lcssa and stop leaking memory
 - don't rewrite the src for the lcssa phi node
 - validate lcssa phi srcs to avoid postvalidate assert
 - don't add new phi if one already exists
 - more lcssa phi validation fixes
 - Rather than marking ssa defs inside a loop, just mark blocks inside
   a loop. This is simpler and fixes lcssa for intrinsics which do
   not have a destination.
 - don't create LCSSA phis for loops we won't unroll
 - require loop metadata for lcssa pass
 - handle the case where the ssa def's use outside the loop is already a phi

V7: (Timothy)
- pass indirect mask to metadata call
---
 src/compiler/Makefile.sources   |   1 +
 src/compiler/nir/nir.h  |   6 ++
 src/compiler/nir/nir_to_lcssa.c | 227 
 src/compiler/nir/nir_validate.c |  11 +-
 4 files changed, 242 insertions(+), 3 deletions(-)
 create mode 100644 src/compiler/nir/nir_to_lcssa.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index 7ed26a9..8ef6080 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -247,6 +247,7 @@ NIR_FILES = \
nir/nir_search_helpers.h \
nir/nir_split_var_copies.c \
nir/nir_sweep.c \
+   nir/nir_to_lcssa.c \
nir/nir_to_ssa.c \
nir/nir_validate.c \
nir/nir_vla.h \
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index cc8f4b6..29a6f45 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1387,6 +1387,8 @@ typedef struct {
struct exec_list srcs; /** < list of nir_phi_src */
 
nir_dest dest;
+
+   bool is_lcssa_phi;
 } nir_phi_instr;
 
 typedef struct {
@@ -2643,6 +2645,10 @@ void nir_convert_to_ssa(nir_shader *shader);
 bool nir_repair_ssa_impl(nir_function_impl *impl);
 bool nir_repair_ssa(nir_shader *shader);
 
+void nir_to_lcssa_impl(nir_function_impl *impl,
+   nir_variable_mode indirect_mask);
+void nir_to_lcssa(nir_shader *shader, nir_variable_mode indirect_mask);
+
 /* If phi_webs_only is true, only convert SSA values involved in phi nodes to
  * registers.  If false, convert all values (even those not involved in a phi
  * node) to registers.
diff --git a/src/compiler/nir/nir_to_lcssa.c b/src/compiler/nir/nir_to_lcssa.c
new file mode 100644
index 000..25d0bdb
--- /dev/null
+++ b/src/compiler/nir/nir_to_lcssa.c
@@ -0,0 +1,227 @@
+/*
+ * Copyright © 2015 Thomas Helland
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/*
+ * This pass converts the ssa-graph into "Loop Closed SSA form". This is
+ * done by placing phi nodes at the exits of the loop for all values
+ * that are used outside the loop. The result is it transforms:
+ *
+ * loop {->  loop {
+ *ssa2 = ->  ssa2 = ...
+ *if (cond)  ->  if (cond) {
+ *   break;  -> break;
+ *ssa3 = ssa2 * ssa4 ->  }
+ * } ->  ssa3 = ssa2 * ssa4
+ * ssa6 = ssa2 + 4   ->   }
+ *ssa5 = lcssa_phi(ssa2)
+ *ssa6 = ssa5 + 4
+ */
+
+#include "nir.h"
+
+typedef struct {
+   /* The nir_shader we are transforming */
+   nir_shader *shader;
+
+   /* The loop we store information for */
+   nir_loop *loop;
+
+   /* Keep track of which defs are in the loop */
+   BITSET_WORD *is_in_loop;
+
+   /* General purpose bool */
+   bool flag;
+} lcssa_state;
+
+static void
+mark_block_as_in_loop(nir_block *blk, void *state)
+{
+   lcssa_state *state_cast = 

[Mesa-dev] [PATCH 03/11] nir: Add a loop analysis pass

2016-09-16 Thread Timothy Arceri
From: Thomas Helland 

This pass detects induction variables and calculates the
trip count of loops to be used for loop unrolling.

I've removed support for float induction variables for now, simply
because they don't appear in my shader-db collection, so I don't consider
them common enough to complicate the initial version of the pass.

V2: Rebase, adapt to removal of function overloads

V3: (Timothy Arceri)
 - don't try to find trip count if loop terminator conditional is a phi
 - fix trip count for do-while loops
 - replace conditional type != alu assert with return
 - disable unrolling of loops with continues
 - multiple fixes to memory allocation, stop leaking and don't destroy
   structs we want to use for unrolling.
 - fix iteration count bugs when induction var not on RHS of condition
 - add FIXME for && conditions
 - calculate trip count for unsigned induction/limit vars

V4:
- count instructions in a loop
- set the limiting_terminator even if we can't find the trip count for
 all terminators. This is needed for complex unrolling where we handle
 2 terminators and the trip count is unknown for one of them.
- restructure structs so we don't keep information that is not required
 after analysis, and remove dead fields.
- force unrolling in some cases as per the rules in the GLSL IR pass

V5:
- fix metadata mask value 0x10 vs 0x16
---
 src/compiler/Makefile.sources   |2 +
 src/compiler/nir/nir.h  |   36 +-
 src/compiler/nir/nir_loop_analyze.c | 1012 +++
 src/compiler/nir/nir_metadata.c |8 +-
 4 files changed, 1056 insertions(+), 2 deletions(-)
 create mode 100644 src/compiler/nir/nir_loop_analyze.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index f5b4f9c..7ed26a9 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -190,6 +190,8 @@ NIR_FILES = \
nir/nir_intrinsics.c \
nir/nir_intrinsics.h \
nir/nir_liveness.c \
+   nir/nir_loop_analyze.c \
+   nir/nir_loop_analyze.h \
nir/nir_lower_alu_to_scalar.c \
nir/nir_lower_atomics.c \
nir/nir_lower_bitmap.c \
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index aac247c..4272051 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1552,9 +1552,36 @@ nir_if_last_else_node(nir_if *if_stmt)
 }
 
 typedef struct {
+   nir_if *nif;
+
+   nir_instr *conditional_instr;
+
+   struct list_head loop_terminator_link;
+} nir_loop_terminator;
+
+typedef struct {
+   /* Number of instructions in the loop */
+   unsigned num_instructions;
+
+   /* How many times the loop is run (if known) */
+   unsigned trip_count;
+   bool is_trip_count_known;
+
+   /* Unroll the loop regardless of its size */
+   bool force_unroll;
+
+   nir_loop_terminator *limiting_terminator;
+
+   /* A list of loop_terminators terminating this loop. */
+   struct list_head loop_terminator_list;
+} nir_loop_info;
+
+typedef struct {
nir_cf_node cf_node;
 
struct exec_list body; /** < list of nir_cf_node */
+
+   nir_loop_info *info;
 } nir_loop;
 
 static inline nir_cf_node *
@@ -1579,6 +1606,7 @@ typedef enum {
nir_metadata_dominance = 0x2,
nir_metadata_live_ssa_defs = 0x4,
nir_metadata_not_properly_reset = 0x8,
+   nir_metadata_loop_analysis = 0x10,
 } nir_metadata;
 
 typedef struct {
@@ -1761,6 +1789,8 @@ typedef struct nir_shader_compiler_options {
 * information must be inferred from the list of input nir_variables.
 */
bool use_interpolated_input_intrinsics;
+
+   unsigned max_unroll_iterations;
 } nir_shader_compiler_options;
 
 typedef struct nir_shader_info {
@@ -1965,7 +1995,7 @@ nir_loop *nir_loop_create(nir_shader *shader);
 nir_function_impl *nir_cf_node_get_function(nir_cf_node *node);
 
 /** requests that the given pieces of metadata be generated */
-void nir_metadata_require(nir_function_impl *impl, nir_metadata required);
+void nir_metadata_require(nir_function_impl *impl, nir_metadata required, ...);
 /** dirties all but the preserved metadata */
 void nir_metadata_preserve(nir_function_impl *impl, nir_metadata preserved);
 
@@ -2570,6 +2600,10 @@ void nir_lower_double_pack(nir_shader *shader);
 bool nir_normalize_cubemap_coords(nir_shader *shader);
 
 void nir_live_ssa_defs_impl(nir_function_impl *impl);
+
+void nir_loop_analyze_impl(nir_function_impl *impl,
+   nir_variable_mode indirect_mask);
+
 bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b);
 
 void nir_convert_to_ssa_impl(nir_function_impl *impl);
diff --git a/src/compiler/nir/nir_loop_analyze.c b/src/compiler/nir/nir_loop_analyze.c
new file mode 100644
index 000..6bea9e5
--- /dev/null
+++ b/src/compiler/nir/nir_loop_analyze.c
@@ -0,0 +1,1012 @@
+/*
+ * Copyright © 2015 Thomas Helland
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated 

[Mesa-dev] [PATCH 02/11] i965: use nir_lower_indirect_derefs() for GLSL

2016-09-16 Thread Timothy Arceri
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so that it is called by both OpenGL and Vulkan,
and removes the call to the old GLSL IR pass
lower_variable_index_to_cond_assign().

We want to do this pass in nir to be able to move loop unrolling
to nir.

There is an increase of 1-3 instructions in a small number of shaders,
and two Kerbal Space Program shaders increase by 32 instructions.

Shader-db results BDW:

total instructions in shared programs: 8705873 -> 8706194 (0.00%)
instructions in affected programs: 32515 -> 32836 (0.99%)
helped: 3
HURT: 79

total cycles in shared programs: 74618120 -> 74583476 (-0.05%)
cycles in affected programs: 528104 -> 493460 (-6.56%)
helped: 47
HURT: 37

LOST:   2
GAINED: 0
---
 src/intel/vulkan/anv_pipeline.c| 10 --
 src/mesa/drivers/dri/i965/brw_link.cpp | 14 --
 src/mesa/drivers/dri/i965/brw_nir.c| 10 ++
 3 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index f96fe22..f292f0b 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -183,16 +183,6 @@ anv_shader_compile_to_nir(struct anv_device *device,
 
nir_shader_gather_info(nir, entry_point->impl);
 
-   nir_variable_mode indirect_mask = 0;
-   if (compiler->glsl_compiler_options[stage].EmitNoIndirectInput)
-  indirect_mask |= nir_var_shader_in;
-   if (compiler->glsl_compiler_options[stage].EmitNoIndirectOutput)
-  indirect_mask |= nir_var_shader_out;
-   if (compiler->glsl_compiler_options[stage].EmitNoIndirectTemp)
-  indirect_mask |= nir_var_local;
-
-   nir_lower_indirect_derefs(nir, indirect_mask);
-
return nir;
 }
 
diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp b/src/mesa/drivers/dri/i965/brw_link.cpp
index 2b1fa61..abb997b 100644
--- a/src/mesa/drivers/dri/i965/brw_link.cpp
+++ b/src/mesa/drivers/dri/i965/brw_link.cpp
@@ -139,20 +139,6 @@ process_glsl_ir(gl_shader_stage stage,
 
do_copy_propagation(shader->ir);
 
-   bool lowered_variable_indexing =
-  lower_variable_index_to_cond_assign((gl_shader_stage)stage,
-  shader->ir,
-  options->EmitNoIndirectInput,
-  options->EmitNoIndirectOutput,
-  options->EmitNoIndirectTemp,
-  options->EmitNoIndirectUniform);
-
-   if (unlikely(brw->perf_debug && lowered_variable_indexing)) {
-  perf_debug("Unsupported form of variable indexing in %s; falling "
- "back to very inefficient code generation\n",
- _mesa_shader_stage_to_abbrev(shader->Stage));
-   }
-
bool progress;
do {
   progress = false;
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index fbc84c4..264d812 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -482,6 +482,16 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
/* Lower a bunch of stuff */
OPT_V(nir_lower_var_copies);
 
+   nir_variable_mode indirect_mask = 0;
+   if (compiler->glsl_compiler_options[nir->stage].EmitNoIndirectInput)
+  indirect_mask |= nir_var_shader_in;
+   if (compiler->glsl_compiler_options[nir->stage].EmitNoIndirectOutput)
+  indirect_mask |= nir_var_shader_out;
+   if (compiler->glsl_compiler_options[nir->stage].EmitNoIndirectTemp)
+  indirect_mask |= nir_var_local;
+
+   nir_lower_indirect_derefs(nir, indirect_mask);
+
/* Get rid of split copies */
nir = nir_optimize(nir, is_scalar);
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/11] nir: don't count removal of lcssa_phi as progress

2016-09-16 Thread Timothy Arceri
---
 src/compiler/nir/nir_opt_remove_phis.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_opt_remove_phis.c b/src/compiler/nir/nir_opt_remove_phis.c
index acaa6e1..d4344b0 100644
--- a/src/compiler/nir/nir_opt_remove_phis.c
+++ b/src/compiler/nir/nir_opt_remove_phis.c
@@ -73,6 +73,7 @@ remove_phis_block(nir_block *block, nir_builder *b)
  break;
 
   nir_phi_instr *phi = nir_instr_as_phi(instr);
+  bool is_lcssa_phi = phi->is_lcssa_phi;
 
   nir_ssa_def *def = NULL;
   nir_alu_instr *mov = NULL;
@@ -133,7 +134,8 @@ remove_phis_block(nir_block *block, nir_builder *b)
   nir_ssa_def_rewrite_uses(&phi->dest.ssa, nir_src_for_ssa(def));
   nir_instr_remove(instr);
 
-  progress = true;
+  if (!is_lcssa_phi)
+ progress = true;
}
 
return progress;
-- 
2.7.4
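This patch has no commit message, so its motivation is not stated in this thread. A plausible reading is that a fixed-point optimization loop, which repeats while any pass reports progress, should not go around again only because loop-closed SSA bookkeeping phis were cleaned up. The sketch below shows only the effect of the change: trivial phis are still removed, but removing an LCSSA phi alone does not set progress. The struct and function are hypothetical simplifications, not nir_phi_instr or remove_phis_block().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified phi record (not nir_phi_instr). */
struct sketch_phi {
   bool is_trivial;   /* every source resolves to one value: removable */
   bool is_lcssa_phi; /* inserted only to keep the loop in LCSSA form */
   bool removed;
};

/* Removes each trivial phi, but reports progress only when a phi that is
 * not an LCSSA phi was removed, mirroring the patched behavior. */
static bool
sketch_remove_phis(struct sketch_phi *phis, size_t count)
{
   bool progress = false;

   assert(phis != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (phis[i].removed || !phis[i].is_trivial)
         continue;

      /* Like the patch, record the flag before the phi is removed. */
      bool is_lcssa_phi = phis[i].is_lcssa_phi;
      phis[i].removed = true;

      if (!is_lcssa_phi)
         progress = true;
   }
   return progress;
}
```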



[Mesa-dev] [PATCH 04/11] nir: add helpers to check if we can unroll loops

2016-09-16 Thread Timothy Arceri
This will be used by the loop unroll and lcssa passes.

V2:
- Check instruction count is not too large for unrolling
- Add helper for complex loop unrolling
---
 src/compiler/nir/nir.h | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 4272051..cc8f4b6 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2601,6 +2601,37 @@ bool nir_normalize_cubemap_coords(nir_shader *shader);
 
 void nir_live_ssa_defs_impl(nir_function_impl *impl);
 
+static inline bool
+is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li)
+{
+   unsigned max_iter = shader->options->max_unroll_iterations;
+
+   if (li->trip_count > max_iter)
+  return false;
+
+   if (li->force_unroll)
+  return true;
+
+   bool loop_not_too_large =
+  li->num_instructions * li->trip_count <= max_iter * 25;
+
+   return loop_not_too_large;
+}
+
+static inline bool
+is_complex_loop(nir_shader *shader, nir_loop_info *li)
+{
+   unsigned num_lt = list_length(&li->loop_terminator_list);
+   return is_loop_small_enough_to_unroll(shader, li) && num_lt == 2;
+}
+
+static inline bool
+is_simple_loop(nir_shader *shader, nir_loop_info *li)
+{
+   return li->is_trip_count_known &&
+  is_loop_small_enough_to_unroll(shader, li);
+}
+
 void nir_loop_analyze_impl(nir_function_impl *impl,
nir_variable_mode indirect_mask);
 
-- 
2.7.4



[Mesa-dev] V4 Loop unrolling in NIR

2016-09-16 Thread Timothy Arceri
Sorry for the noise. Connor pointed out that some of my assumptions
for not enabling this on all gens were wrong; this led to finding
a subtle bug where loop analysis was being run when it shouldn't have been,
due to an error with the loop analysis flag (0x10 vs 0x16).

This version enables unrolling for all gens. As well as the bug fix
and dropping the restrictions, patch 1 is also new.

Note: I ran this through Jenkins and it seems OK, but I was getting a lot
of noise from failing GLES 3.1 tests on HSW; it seems 3.1 was
enabled on master but not on my branch for some reason.
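The cover letter does not say how the loop analysis flag is consumed. Assuming it is tested with a bitwise AND (an assumption, not something stated here), the arithmetic shows how 0x16 can trigger analysis where 0x10 would not: 0x16 = 0x10 | 0x04 | 0x02, so it also matches any flags word with bit 1 or bit 2 set, even when the 0x10 bit is clear. All names below are hypothetical; only the two values come from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names; only 0x10 and 0x16 come from the cover letter. */
#define SKETCH_OTHER_FLAG_A      0x02u
#define SKETCH_OTHER_FLAG_B      0x04u
#define SKETCH_LOOP_ANALYSIS     0x10u /* intended mask */
#define SKETCH_LOOP_ANALYSIS_BAD 0x16u /* 0x10 | 0x04 | 0x02 */

/* Assumed usage: the analysis runs when any bit of `mask` is in `flags`. */
static bool
sketch_should_run_loop_analysis(unsigned flags, unsigned mask)
{
   assert(mask != 0);
   return (flags & mask) != 0;
}
```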



[Mesa-dev] [PATCH] vl/dri3: handle the case of different GPU(v4.1)

2016-09-16 Thread Nayan Deshmukh
In the PRIME case, when rendering is done on a GPU other than the
server GPU, use a separate linear buffer for each back buffer,
which will be displayed using the Present extension.

v2: Use a separate linear buffer for each back buffer (Michel)
v3: Change variable names and fix coding style (Leo and Emil)
v4: Use PIPE_BIND_SAMPLER_VIEW for the back buffer when
a separate linear buffer is used (Michel)
v4.1: remove empty line
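The allocation change comes down to which resource backs the exported pixmap, and how the error path unwinds when the second allocation fails. Below is a minimal sketch of that control flow, assuming nothing beyond what the diff shows. The sketch_* types, flag values and allocator are hypothetical stand-ins, not Gallium API. In the different-GPU case, the render texture keeps only render-target and sampler-view binds, and a second linear, shared, scanout-capable resource is exported instead. If that second allocation fails, the first is released before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the PIPE_BIND_* flags; values are
 * illustrative only and do not match Gallium's. */
enum {
   SKETCH_BIND_RENDER_TARGET = 1u << 0,
   SKETCH_BIND_SAMPLER_VIEW  = 1u << 1,
   SKETCH_BIND_SCANOUT       = 1u << 2,
   SKETCH_BIND_SHARED        = 1u << 3,
   SKETCH_BIND_LINEAR        = 1u << 4,
};

struct sketch_resource { unsigned bind; };

struct sketch_buffer {
   struct sketch_resource *texture;        /* what the client renders to */
   struct sketch_resource *linear_texture; /* different-GPU case only */
   struct sketch_resource *pixmap_texture; /* what gets exported as the pixmap */
};

static int sketch_live_resources; /* outstanding allocations, for testing */

/* Hypothetical allocator; `fail` simulates resource_create() returning NULL. */
static struct sketch_resource *
sketch_resource_create(unsigned bind, bool fail)
{
   if (fail)
      return NULL;
   struct sketch_resource *res = malloc(sizeof(*res));
   if (res) {
      res->bind = bind;
      sketch_live_resources++;
   }
   return res;
}

static void
sketch_resource_destroy(struct sketch_resource *res)
{
   if (res) {
      free(res);
      sketch_live_resources--;
   }
}

/* Follows the branch and unwind structure of the patched
 * dri3_alloc_back_buffer(), minus the X11/DRI3 plumbing. */
static bool
sketch_alloc_back_buffer(struct sketch_buffer *buf, bool is_different_gpu,
                         bool fail_linear)
{
   unsigned bind = SKETCH_BIND_RENDER_TARGET | SKETCH_BIND_SAMPLER_VIEW;

   assert(buf != NULL);
   buf->texture = buf->linear_texture = buf->pixmap_texture = NULL;

   if (is_different_gpu) {
      /* Render texture is not shared with the server. */
      buf->texture = sketch_resource_create(bind, false);
      if (!buf->texture)
         goto fail;

      /* The linear resource is the one shared with the server GPU. */
      bind |= SKETCH_BIND_SCANOUT | SKETCH_BIND_SHARED | SKETCH_BIND_LINEAR;
      buf->linear_texture = sketch_resource_create(bind, fail_linear);
      if (!buf->linear_texture)
         goto no_linear_texture;
      buf->pixmap_texture = buf->linear_texture;
   } else {
      /* Same GPU: the back buffer itself is shared and scanout-capable. */
      bind |= SKETCH_BIND_SCANOUT | SKETCH_BIND_SHARED;
      buf->texture = sketch_resource_create(bind, false);
      if (!buf->texture)
         goto fail;
      buf->pixmap_texture = buf->texture;
   }
   return true;

no_linear_texture:
   /* Release what was created before the failure, then fall through. */
   sketch_resource_destroy(buf->texture);
   buf->texture = NULL;
fail:
   return false;
}
```

Keeping the extra resource behind is_different_gpu means the same-GPU path allocates exactly what it did before the patch.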

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 61 ---
 1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..e0aaad8 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -49,6 +49,7 @@
 struct vl_dri3_buffer
 {
struct pipe_resource *texture;
+   struct pipe_resource *linear_texture;
 
uint32_t pixmap;
uint32_t sync_fence;
@@ -69,6 +70,8 @@ struct vl_dri3_screen
xcb_present_event_t eid;
xcb_special_event_t *special_event;
 
+   struct pipe_context *pipe;
+
struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
int cur_back;
 
@@ -82,6 +85,7 @@ struct vl_dri3_screen
int64_t last_ust, ns_frame, last_msc, next_msc;
 
bool flushed;
+   bool is_different_gpu;
 };
 
 static void
@@ -102,6 +106,8 @@ dri3_free_back_buffer(struct vl_dri3_screen *scrn,
xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
xshmfence_unmap_shm(buffer->shm_fence);
pipe_resource_reference(&buffer->texture, NULL);
+   if (buffer->linear_texture)
+   pipe_resource_reference(&buffer->linear_texture, NULL);
FREE(buffer);
 }
 
@@ -209,7 +215,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
xcb_sync_fence_t sync_fence;
struct xshmfence *shm_fence;
int buffer_fd, fence_fd;
-   struct pipe_resource templ;
+   struct pipe_resource templ, *pixmap_buffer_texture;
struct winsys_handle whandle;
unsigned usage;
 
@@ -226,8 +232,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
   goto close_fd;
 
memset(&templ, 0, sizeof(templ));
-   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
-PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
templ.target = PIPE_TEXTURE_2D;
templ.last_level = 0;
@@ -235,16 +240,34 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
templ.height0 = scrn->height;
templ.depth0 = 1;
templ.array_size = 1;
-   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
- &templ);
-   if (!buffer->texture)
-  goto unmap_shm;
 
+   if (scrn->is_different_gpu) {
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+ &templ);
+  if (!buffer->texture)
+ goto unmap_shm;
+
+  templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED |
+PIPE_BIND_LINEAR;
+  buffer->linear_texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+  &templ);
+  pixmap_buffer_texture = buffer->linear_texture;
+
+  if (!buffer->linear_texture)
+ goto no_linear_texture;
+   } else {
+  templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+ &templ);
+  if (!buffer->texture)
+ goto unmap_shm;
+  pixmap_buffer_texture = buffer->texture;
+   }
memset(&whandle, 0, sizeof(whandle));
whandle.type= DRM_API_HANDLE_TYPE_FD;
usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL,
-   buffer->texture, &whandle,
+   pixmap_buffer_texture, &whandle,
usage);
buffer_fd = whandle.handle;
buffer->pitch = whandle.stride;
@@ -271,6 +294,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
 
return buffer;
 
+no_linear_texture:
+   pipe_resource_reference(&buffer->texture, NULL);
 unmap_shm:
xshmfence_unmap_shm(shm_fence);
 close_fd:
@@ -474,6 +499,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private;
uint32_t options = XCB_PRESENT_OPTION_NONE;
struct vl_dri3_buffer *back;
+   struct pipe_box src_box;
 
back = scrn->back_buffers[scrn->cur_back];
if (!back)
@@ -485,6 +511,16 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
 return;
}
 
+   if (scrn->is_different_gpu) {
+  u_box_origin_2d(scrn->width, scrn->height, &src_box);
+  

Re: [Mesa-dev] [PATCH 1/3] vl/dri3: handle the case of different GPU(v4)

2016-09-16 Thread Nayan Deshmukh
Hi Michel,

Thanks for the review.

On Fri, Sep 16, 2016 at 1:47 PM, Christian König 
wrote:

> Am 16.09.2016 um 10:07 schrieb Michel Dänzer:
>
>> On 14/09/16 02:34 PM, Nayan Deshmukh wrote:
>>
>>> In the PRIME case, when rendering is done on a GPU other than the
>>> server GPU, use a separate linear buffer for each back buffer,
>>> which will be displayed using the Present extension.
>>>
>>> v2: Use a separate linear buffer for each back buffer (Michel)
>>> v3: Change variable names and fix coding style (Leo and Emil)
>>> v4: Use PIPE_BIND_SAMPLER_VIEW for the back buffer when
>>>  a separate linear buffer is used (Michel)
>>>
>>> Signed-off-by: Nayan Deshmukh 
>>>
>> Looks mostly good to me, but Leo should probably also take a look.
>>
>> Acked-by: Michel Dänzer 
>>
>
> I was just about to ping you guys for comments on this.
>
> Leo, anything more you want to add, or should I commit it? I'm not so deep
> into this code either.
>
> Christian.
>
>
>
>>
>> @@ -235,16 +240,35 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
>>>  templ.height0 = scrn->height;
>>>  templ.depth0 = 1;
>>>  templ.array_size = 1;
>>> -   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
>>> - &templ);
>>> -   if (!buffer->texture)
>>> -  goto unmap_shm;
>>>
>>> +   if (scrn->is_different_gpu) {
>>> +  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
>>> + &templ);
>>> +  if (!buffer->texture)
>>> + goto unmap_shm;
>>> +
>>> +  templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED |
>>> +PIPE_BIND_LINEAR;
>>> +  buffer->linear_texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
>>> +  &templ);
>>> +  pixmap_buffer_texture = buffer->linear_texture;
>>> +
>>> +  if (!buffer->linear_texture)
>>> + goto no_linear_texture;
>>> +
>>> +   } else {
>>>
>> Bonus points for not adding the empty line at the end of the block here.
>> :)
>>

I have submitted another patch without the empty line for the bonus points. :)

Regards,
Nayan.

