Re: [Mesa-dev] [PATCH 02/17] i965: enable component packing for vs and fs

2016-07-07 Thread Timothy Arceri
On Thu, 2016-07-07 at 20:12 -0700, Jason Ekstrand wrote:
> 
> On Jul 6, 2016 6:59 PM, "Timothy Arceri" wrote:
> >
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp     | 20 
> >  src/mesa/drivers/dri/i965/brw_fs.h       |  5 +++--
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 29
> -
> >  3 files changed, 35 insertions(+), 19 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 2f473cc..9e7223e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1109,7 +1109,8 @@ fs_visitor::emit_general_interpolation(fs_reg
> *attr, const char *name,
> >                                         const glsl_type *type,
> >                                         glsl_interp_qualifier
> interpolation_mode,
> >                                         int *location, bool
> mod_centroid,
> > -                                       bool mod_sample)
> > +                                       bool mod_sample,
> > +                                       unsigned
> num_packed_components)
> >  {
> >     assert(stage == MESA_SHADER_FRAGMENT);
> >     brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this-
> >prog_data;
> > @@ -1131,22 +1132,26 @@
> fs_visitor::emit_general_interpolation(fs_reg *attr, const char
> *name,
> >
> >        for (unsigned i = 0; i < length; i++) {
> >           emit_general_interpolation(attr, name, elem_type,
> interpolation_mode,
> > -                                    location, mod_centroid,
> mod_sample);
> > +                                    location, mod_centroid,
> mod_sample,
> > +                                    num_packed_components);
> >        }
> >     } else if (type->is_record()) {
> >        for (unsigned i = 0; i < type->length; i++) {
> >           const glsl_type *field_type = type-
> >fields.structure[i].type;
> >           emit_general_interpolation(attr, name, field_type,
> interpolation_mode,
> > -                                    location, mod_centroid,
> mod_sample);
> > +                                    location, mod_centroid,
> mod_sample,
> > +                                    num_packed_components);
> >        }
> >     } else {
> >        assert(type->is_scalar() || type->is_vector());
> > +      unsigned num_components = num_packed_components ?
> > +         num_packed_components : type->vector_elements;
> >
> >        if (prog_data->urb_setup[*location] == -1) {
> >           /* If there's no incoming setup data for this slot, don't
> >            * emit interpolation for it.
> >            */
> > -         *attr = offset(*attr, bld, type->vector_elements);
> > +         *attr = offset(*attr, bld, num_components);

> This appears to be the only interesting use of num_components. 
> Pardon me while I ask a really stupid question:  why can't we just
> make it always 4 and call it a day?

The size of *attr ultimately comes from nir_assign_var_locations() so
this might work fine for the vec4 backend but for the scalar backend we
would need to count the size differently meaning some kind of hack.

However, as I pointed out in my other reply, patch 4 also makes use of
num_packed_components for calculating the location of array elements,
so there is still that to deal with as well.
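
To make the sizing concern concrete, here is a minimal sketch, not Mesa
code: Input and both functions are hypothetical stand-ins. It assumes the
attribute register is sized by summing per-location component counts, and
compares that with advancing a fixed 4 per location.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one input location (not a Mesa type).
struct Input {
   unsigned vector_elements;        // 1..4, from the variable's own type
   unsigned num_packed_components;  // 0 when nothing is packed here
};

// Mirrors the fallback in the patch: prefer the packed count when set.
unsigned effective_components(const Input &in)
{
   assert(in.vector_elements >= 1 && in.vector_elements <= 4);
   return in.num_packed_components ? in.num_packed_components
                                   : in.vector_elements;
}

// How far the attribute offset advances over `count` locations, either by
// the per-location component count or by a fixed 4.
unsigned total_scalar_slots(const Input *inputs, size_t count, bool always_vec4)
{
   unsigned total = 0;
   for (size_t i = 0; i < count; i++)
      total += always_vec4 ? 4 : effective_components(inputs[i]);
   return total;
}
```

With a vec2 in one location and three packed components in another, the
component-counted size is 5 while a fixed stride of 4 walks 8 slots, so
the two schemes only agree if the register is sized with the same rule.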


> >           (*location)++;
> >           return;
> >        }
> > @@ -1158,7 +1163,6 @@ fs_visitor::emit_general_interpolation(fs_reg
> *attr, const char *name,
> >            * handed us defined values in only the constant offset
> >            * field of the setup reg.
> >            */
> > -         unsigned vector_elements = type->vector_elements;
> >
> >           /* Data starts at suboffet 3 in 32-bit units (12 bytes),
> so it is not
> >            * 64-bit aligned and the current implementation fails to
> read the
> > @@ -1166,10 +1170,10 @@
> fs_visitor::emit_general_interpolation(fs_reg *attr, const char
> *name,
> >            * read it as vector of floats with twice the number of
> components.
> >            */
> >           if (attr->type == BRW_REGISTER_TYPE_DF) {
> > -            vector_elements *= 2;
> > +            num_components *= 2;
> >              attr->type = BRW_REGISTER_TYPE_F;
> >           }
> > -         for (unsigned int i = 0; i < vector_elements; i++) {
> > +         for (unsigned int i = 0; i < num_components; i++) {
> >              struct brw_reg interp = interp_reg(*location, i);
> >              interp = suboffset(interp, 3);
> >              interp.type = attr->type;
> > @@ -1178,7 +1182,7 @@ fs_visitor::emit_general_interpolation(fs_reg
> *attr, const char *name,
> >           }
> >        } else {
> >           /* Smooth/noperspective interpolation case. */
> > -         for (unsigned int i = 0; i < type->vector_elements; i++)
> {
> > +         for (unsigned int i = 0; i < num_components; i++) {
> >              struct brw_reg interp = 

Re: [Mesa-dev] [PATCH 01/17] glsl/nir: add new num_packed_components field

2016-07-07 Thread Timothy Arceri
On Thu, 2016-07-07 at 20:12 -0700, Jason Ekstrand wrote:
> 
> On Jul 7, 2016 7:47 PM, wrote:
> >
> >
> > On Jul 7, 2016 9:59 AM, "Kenneth Graunke" 
> wrote:
> > >
> > > On Thursday, July 7, 2016 11:58:43 AM PDT Timothy Arceri wrote:
> > > > This will be used to store the total number of components used
> at this location
> > > > when packing via ARB_enhanced_layouts.
> > > > ---
> > > >  src/compiler/glsl/glsl_to_nir.cpp   |  1 +
> > > >  src/compiler/glsl/ir.h              |  5 +++
> > > >  src/compiler/glsl/link_varyings.cpp | 74
> -
> > > >  src/compiler/glsl/linker.cpp        |  2 +
> > > >  src/compiler/glsl/linker.h          |  4 ++
> > > >  src/compiler/nir/nir.h              |  5 +++
> > > >  6 files changed, 89 insertions(+), 2 deletions(-)
> > >
> > > I still hate this field.  I'm going to try and come up with an
> alternate
> > > solution.  I'll keep you posted.

I look forward to any suggestions :)

> >
> > On a first brush, me too.
> I still haven't finished reading but here's a maybe useful (don't
> count on it; I'm working on half a brain right now) suggestion:  It
> seems rather easy to declare an array uint8_t
> components[MAX_LOCATION] and do a real quick walk of the inputs to
> populate it.  Then we can just use that when we set up interpolation.

I believe the problem I was having was with arrays; see patch 4.

On top of that, as far as I can tell the spec allows arrays starting at
different locations and of different sizes, etc. to be packed together.
I believe there are likely still corner cases my series doesn't catch
yet to do with arrays, but I haven't yet created piglit tests to cover
every scenario.
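
A minimal sketch of the per-location table suggested above, extended to
arrays. Everything here is an assumption for illustration: InputDecl,
count_components and the MAX_LOCATION value are hypothetical, and each
array element is assumed to occupy exactly one location (which would not
hold for types such as dvec4 or matrices).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr unsigned MAX_LOCATION = 32;  // assumed bound for this sketch

// Hypothetical flattened view of one shader input.
struct InputDecl {
   unsigned location;         // first location used
   unsigned location_frac;    // first component within the location (0..3)
   unsigned vector_elements;  // 1..4
   unsigned array_length;     // 1 for non-arrays
};

// One quick walk over the inputs, recording the highest component used
// at every location the input touches.
std::array<uint8_t, MAX_LOCATION>
count_components(const std::vector<InputDecl> &inputs)
{
   std::array<uint8_t, MAX_LOCATION> components{};  // zero-initialized

   for (const InputDecl &in : inputs) {
      const unsigned used = in.location_frac + in.vector_elements;
      assert(used <= 4);
      assert(in.location + in.array_length <= MAX_LOCATION);

      for (unsigned i = 0; i < in.array_length; i++) {
         uint8_t &slot = components[in.location + i];
         slot = std::max<uint8_t>(slot, static_cast<uint8_t>(used));
      }
   }
   return components;
}
```

The array loop is where the corner cases mentioned above would show up:
two arrays of different lengths that overlap only partially leave
different counts at different locations, so the table has to be indexed
per element rather than per variable.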


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/17] glsl/nir: add new num_packed_components field

2016-07-07 Thread Jason Ekstrand
On Jul 7, 2016 9:59 AM, "Kenneth Graunke"  wrote:
>
> On Thursday, July 7, 2016 11:58:43 AM PDT Timothy Arceri wrote:
> > This will be used to store the total number of components used at this
location
> > when packing via ARB_enhanced_layouts.
> > ---
> >  src/compiler/glsl/glsl_to_nir.cpp   |  1 +
> >  src/compiler/glsl/ir.h  |  5 +++
> >  src/compiler/glsl/link_varyings.cpp | 74
-
> >  src/compiler/glsl/linker.cpp|  2 +
> >  src/compiler/glsl/linker.h  |  4 ++
> >  src/compiler/nir/nir.h  |  5 +++
> >  6 files changed, 89 insertions(+), 2 deletions(-)
>
> I still hate this field.  I'm going to try and come up with an alternate
> solution.  I'll keep you posted.

On a first brush, me too.



Re: [Mesa-dev] [PATCH 02/17] i965: enable component packing for vs and fs

2016-07-07 Thread Jason Ekstrand
On Jul 6, 2016 6:59 PM, "Timothy Arceri" 
wrote:
>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 20 
>  src/mesa/drivers/dri/i965/brw_fs.h   |  5 +++--
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 29
-
>  3 files changed, 35 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 2f473cc..9e7223e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1109,7 +1109,8 @@ fs_visitor::emit_general_interpolation(fs_reg
*attr, const char *name,
> const glsl_type *type,
> glsl_interp_qualifier
interpolation_mode,
> int *location, bool mod_centroid,
> -   bool mod_sample)
> +   bool mod_sample,
> +   unsigned num_packed_components)
>  {
> assert(stage == MESA_SHADER_FRAGMENT);
> brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
> @@ -1131,22 +1132,26 @@ fs_visitor::emit_general_interpolation(fs_reg
*attr, const char *name,
>
>for (unsigned i = 0; i < length; i++) {
>   emit_general_interpolation(attr, name, elem_type,
interpolation_mode,
> -location, mod_centroid, mod_sample);
> +location, mod_centroid, mod_sample,
> +num_packed_components);
>}
> } else if (type->is_record()) {
>for (unsigned i = 0; i < type->length; i++) {
>   const glsl_type *field_type = type->fields.structure[i].type;
>   emit_general_interpolation(attr, name, field_type,
interpolation_mode,
> -location, mod_centroid, mod_sample);
> +location, mod_centroid, mod_sample,
> +num_packed_components);
>}
> } else {
>assert(type->is_scalar() || type->is_vector());
> +  unsigned num_components = num_packed_components ?
> + num_packed_components : type->vector_elements;
>
>if (prog_data->urb_setup[*location] == -1) {
>   /* If there's no incoming setup data for this slot, don't
>* emit interpolation for it.
>*/
> - *attr = offset(*attr, bld, type->vector_elements);
> + *attr = offset(*attr, bld, num_components);

This appears to be the only interesting use of num_components.  Pardon me
while I ask a really stupid question:  why can't we just make it always 4
and call it a day?

>   (*location)++;
>   return;
>}
> @@ -1158,7 +1163,6 @@ fs_visitor::emit_general_interpolation(fs_reg
*attr, const char *name,
>* handed us defined values in only the constant offset
>* field of the setup reg.
>*/
> - unsigned vector_elements = type->vector_elements;
>
>   /* Data starts at suboffet 3 in 32-bit units (12 bytes), so it
is not
>* 64-bit aligned and the current implementation fails to read
the
> @@ -1166,10 +1170,10 @@ fs_visitor::emit_general_interpolation(fs_reg
*attr, const char *name,
>* read it as vector of floats with twice the number of
components.
>*/
>   if (attr->type == BRW_REGISTER_TYPE_DF) {
> -vector_elements *= 2;
> +num_components *= 2;
>  attr->type = BRW_REGISTER_TYPE_F;
>   }
> - for (unsigned int i = 0; i < vector_elements; i++) {
> + for (unsigned int i = 0; i < num_components; i++) {
>  struct brw_reg interp = interp_reg(*location, i);
>  interp = suboffset(interp, 3);
>  interp.type = attr->type;
> @@ -1178,7 +1182,7 @@ fs_visitor::emit_general_interpolation(fs_reg
*attr, const char *name,
>   }
>} else {
>   /* Smooth/noperspective interpolation case. */
> - for (unsigned int i = 0; i < type->vector_elements; i++) {
> + for (unsigned int i = 0; i < num_components; i++) {
>  struct brw_reg interp = interp_reg(*location, i);
>  if (devinfo->needs_unlit_centroid_workaround &&
mod_centroid) {
> /* Get the pixel/sample mask into f0 so that we know
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
b/src/mesa/drivers/dri/i965/brw_fs.h
> index 1f88f8f..0c72802 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -181,7 +181,7 @@ public:
> const glsl_type *type,
> glsl_interp_qualifier
interpolation_mode,
> int *location, bool mod_centroid,
> -   bool mod_sample);
> +

Re: [Mesa-dev] [PATCH 01/17] glsl/nir: add new num_packed_components field

2016-07-07 Thread Jason Ekstrand
On Jul 7, 2016 7:47 PM, wrote:
>
>
> On Jul 7, 2016 9:59 AM, "Kenneth Graunke"  wrote:
> >
> > On Thursday, July 7, 2016 11:58:43 AM PDT Timothy Arceri wrote:
> > > This will be used to store the total number of components used at
this location
> > > when packing via ARB_enhanced_layouts.
> > > ---
> > >  src/compiler/glsl/glsl_to_nir.cpp   |  1 +
> > >  src/compiler/glsl/ir.h  |  5 +++
> > >  src/compiler/glsl/link_varyings.cpp | 74
-
> > >  src/compiler/glsl/linker.cpp|  2 +
> > >  src/compiler/glsl/linker.h  |  4 ++
> > >  src/compiler/nir/nir.h  |  5 +++
> > >  6 files changed, 89 insertions(+), 2 deletions(-)
> >
> > I still hate this field.  I'm going to try and come up with an alternate
> > solution.  I'll keep you posted.
>
> On a first brush, me too.

I still haven't finished reading but here's a maybe useful (don't count on
it; I'm working on half a brain right now) suggestion:  It seems rather
easy to declare an array uint8_t components[MAX_LOCATION] and do a real
quick walk of the inputs to populate it.  Then we can just use that when we
set up interpolation.


Re: [Mesa-dev] [PATCH 2/2] anv/dump: Fix post-blit memory barrier

2016-07-07 Thread Jason Ekstrand
Drp...

Reviewed-by: Jason Ekstrand 
On Jul 7, 2016 4:06 PM, "Chad Versace"  wrote:

> Swap srcAccessMask and dstAccessMask.
> ---
>  src/intel/vulkan/anv_dump.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_dump.c b/src/intel/vulkan/anv_dump.c
> index 49a5ae2..4a5a44f 100644
> --- a/src/intel/vulkan/anv_dump.c
> +++ b/src/intel/vulkan/anv_dump.c
> @@ -158,8 +158,8 @@ dump_image_do_blit(struct anv_device *device, struct
> dump_image *image,
>0, 0, NULL, 0, NULL, 1,
>&(VkImageMemoryBarrier) {
>   .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
> - .srcAccessMask = VK_ACCESS_HOST_READ_BIT,
> - .dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
> + .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
> + .dstAccessMask = VK_ACCESS_HOST_READ_BIT,
>   .oldLayout = VK_IMAGE_LAYOUT_GENERAL,
>   .newLayout = VK_IMAGE_LAYOUT_GENERAL,
>   .srcQueueFamilyIndex = 0,
> --
> 2.9.0.rc2
>
>


Re: [Mesa-dev] [PATCH 6/6] i965/fs: don't copy propagate if the instruction writes to more than two adjacent GRFs

2016-07-07 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> This is not allowed by the HW, and copy propagation can hide this issue from
> the lower_simd_width pass, which is going to fix it.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 438f681..c7f7628 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -752,6 +752,7 @@ can_propagate_from(fs_inst *inst)
>  inst->src[0].file == UNIFORM ||
>  inst->src[0].file == IMM) &&
> inst->src[0].type == inst->dst.type &&
> +   inst->regs_written <= 2 &&

This doesn't look right to me, why should copy propagation care whether
the SIMD width of a MOV instruction it's going to propagate away is
allowed by the hardware or not?  The "illegal" copy instruction is
already there anyway, and preventing copy propagation from doing its job
in that case can only increase the likelihood that the unsupported
instruction will remain in the program which implies more work for the
SIMD lowering pass at a later point.

> !inst->is_partial_write());
>  }
>  
> -- 
> 2.7.4
>


Re: [Mesa-dev] [PATCH 01/17] glsl/nir: add new num_packed_components field

2016-07-07 Thread Edward O'Callaghan
Patches 7, 8, 10-13 & 15-17 are simple enough,

Reviewed-by: Edward O'Callaghan 

I would perhaps suggest squashing 16+17 together, up to you.

Kind Regards,
Edward.

On 07/08/2016 10:38 AM, Timothy Arceri wrote:
> On Thu, 2016-07-07 at 17:50 +1000, Edward O'Callaghan wrote:
>> Hi,
>>
>> There is a typing issue in this patch, in that you are converting
>> ‘gl_linked_shader*’ to ‘gl_shader*’ for the first argument to
>> function
>> ‘void set_num_packed_components(gl_shader*, ir_variable_mode,
>> unsigned
>> int)’ at the various call sites.
> 
> Whoops, not sure how I missed that when rebasing. Fix pushed to the
> branch in my repo. Thanks.
> 
>>
>> Cheers,
>> Edward.
>>
>>
>> On 07/07/2016 11:58 AM, Timothy Arceri wrote:
>>> This will be used to store the total number of components used at
>>> this location
>>> when packing via ARB_enhanced_layouts.
>>> ---
>>>  src/compiler/glsl/glsl_to_nir.cpp   |  1 +
>>>  src/compiler/glsl/ir.h  |  5 +++
>>>  src/compiler/glsl/link_varyings.cpp | 74
>>> -
>>>  src/compiler/glsl/linker.cpp|  2 +
>>>  src/compiler/glsl/linker.h  |  4 ++
>>>  src/compiler/nir/nir.h  |  5 +++
>>>  6 files changed, 89 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/compiler/glsl/glsl_to_nir.cpp
>>> b/src/compiler/glsl/glsl_to_nir.cpp
>>> index 20302e3..baba624 100644
>>> --- a/src/compiler/glsl/glsl_to_nir.cpp
>>> +++ b/src/compiler/glsl/glsl_to_nir.cpp
>>> @@ -375,6 +375,7 @@ nir_visitor::visit(ir_variable *ir)
>>> var->data.explicit_binding = ir->data.explicit_binding;
>>> var->data.has_initializer = ir->data.has_initializer;
>>> var->data.location_frac = ir->data.location_frac;
>>> +   var->data.num_packed_components = ir->data.num_packed_components;
>>>  
>>> switch (ir->data.depth_layout) {
>>> case ir_depth_layout_none:
>>> diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
>>> index 1325e35..637b53c 100644
>>> --- a/src/compiler/glsl/ir.h
>>> +++ b/src/compiler/glsl/ir.h
>>> @@ -770,6 +770,11 @@ public:
>>>unsigned location_frac:2;
>>>  
>>>/**
>>> +   * The total number of components packed into this location.
>>> +   */
>>> +  unsigned num_packed_components:4;
>>> +
>>> +  /**
>>> * Layout of the matrix.  Uses glsl_matrix_layout values.
>>> */
>>>unsigned matrix_layout:2;
>>> diff --git a/src/compiler/glsl/link_varyings.cpp
>>> b/src/compiler/glsl/link_varyings.cpp
>>> index 76d0be1..35f97a9 100644
>>> --- a/src/compiler/glsl/link_varyings.cpp
>>> +++ b/src/compiler/glsl/link_varyings.cpp
>>> @@ -1975,6 +1975,70 @@ reserved_varying_slot(struct
>>> gl_linked_shader *stage,
>>> return slots;
>>>  }
>>>  
>>> +void
>>> +set_num_packed_components(struct gl_shader *shader,
>>> ir_variable_mode io_mode,
>>> +  unsigned base_offset)
>>> +{
>>> +   /* Find the max number of components used at this location */
>>> +   unsigned num_components[MAX_VARYINGS_INCL_PATCH] = { 0 };
>>> +
>>> +   foreach_in_list(ir_instruction, node, shader->ir) {
>>> +  ir_variable *const var = node->as_variable();
>>> +
>>> +  if (var == NULL || var->data.mode != io_mode ||
>>> +  !var->data.explicit_location)
>>> + continue;
>>> +
>>> +  int idx = var->data.location - base_offset;
>>> +  if (idx < 0 || idx >= MAX_VARYINGS_INCL_PATCH ||
>>> +  var->type->without_array()->is_record() ||
>>> +  var->type->without_array()->is_matrix())
>>> + continue;
>>> +
>>> +  if (var->type->is_array()) {
>>> + const glsl_type *type = get_varying_type(var, shader->Stage);
>>> + unsigned array_components = type->without_array()->vector_elements +
>>> +var->data.location_frac;
>>> + assert(type->arrays_of_arrays_size() + idx <=
>>> +ARRAY_SIZE(num_components));
>>> + for (unsigned i = idx; i < type->arrays_of_arrays_size(); 
>>> i++) {
>>> +num_components[i] = MAX2(array_components,
>>> num_components[i]);
>>> + }
>>> +  } else {
>>> + unsigned comps = var->type->vector_elements +
>>> +var->data.location_frac;
>>> + num_components[idx] = MAX2(comps, num_components[idx]);
>>> +  }
>>> +   }
>>> +
>>> +   foreach_in_list(ir_instruction, node, shader->ir) {
>>> +  ir_variable *const var = node->as_variable();
>>> +
>>> +  if (var == NULL || var->data.mode != io_mode ||
>>> +  !var->data.explicit_location)
>>> + continue;
>>> +
>>> +  int idx = var->data.location - base_offset;
>>> +  if (idx < 0 || idx >= MAX_VARYINGS_INCL_PATCH ||
>>> +  var->type->without_array()->is_record() ||
>>> +  var->type->without_array()->is_matrix())
>>> + continue;
>>> +
>>> +  /* For arrays we need to check all elements in order to find
>>> the max
>>> +   * number of components 

Re: [Mesa-dev] [PATCH 3/6] i965/fs/gen7: split instructions that run into exec masking bugs

2016-07-07 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> In fp64 we can produce code like this:
>
> mov(16) vgrf2<2>:UD, vgrf3<2>:UD
>
> Our SIMD lowering pass would typically split this into instructions with a
> width of 8, each writing to two consecutive registers. Unfortunately, gen7
> hardware has a bug affecting execution masking and, as a result, the
> second GRF register write won't work properly. Curro verified this:
>
> "The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1
>  (where QtrCtrl is the 8-bit quarter of the execution mask signals
>  specified in the instruction control fields) for the second
>  compressed half of any single-precision instruction (for
>  double-precision instructions it's hardwired to use NibCtrl+1),
>  which means that the EU will apply the wrong execution controls
>  for the second sequential GRF write if the number of channels per
>  GRF is not exactly eight in single-precision mode (or four in
>  double-float mode)."
>
> In practice, this means that we cannot write more than one
> consecutive GRF in a single instruction if the number of channels
> per GRF is not exactly eight in single-precision mode (or four
> in double-float mode).
>
> This patch makes our SIMD lowering pass split this kind of instruction
> so that the split versions only write to a single register. In the
> example above this means that we split the write into 4 instructions, each
> one writing 4 UD elements (width = 4) to a single register.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 2f473cc..caf88d1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4677,6 +4677,26 @@ static unsigned
>  get_fpu_lowered_simd_width(const struct brw_device_info *devinfo,
> const fs_inst *inst)
>  {
> +   /* Pre-Gen8 EUs are hardwired to use the QtrCtrl+1 (where QtrCtrl is
> +* the 8-bit quarter of the execution mask signals specified in the
> +* instruction control fields) for the second compressed half of any
> +* single-precision instruction (for double-precision instructions
> +* it's hardwired to use NibCtrl+1), which means that the EU will

When I found out the hardware issue you describe in this comment I only
had a HSW at hand, so I looked into this again today in order to verify
whether IVB/VLV behave in the same way, and I'm afraid they don't...  I
haven't tried it on real IVB hardware, but at least the simulator
behaves the same as HSW for single precision execution types, while for
double precision types instruction decompression seems to be completely
busted.  AFAICT it applies the same channel enable signals to both
halves of the compressed instruction which will be just wrong under
non-uniform control flow.  Can you clarify in the comment above that the
text in parentheses referring to double-precision instructions may only
apply to HSW?

Have you been able to get any of the FP64 non-uniform control flow tests
to pass on IVB?  If you have I guess this may be a simulator-only bug,
although I'm not sure the FS tests you have written will be non-uniform
enough to reproduce the issue.  If you haven't, we may have to add
another check here in order to lower all non-force_writemask_all DF
instructions to SIMD4 on IVB/VLV...  :(

> +* apply the wrong execution controls for the second sequential GRF
> +* write if the number of channels per GRF is not exactly eight in
> +* single-precision mode (or four in double-float mode).
> +*
> +* In this situation we calculate the maximum size of the split
> +* instructions so they only ever write to a single register.
> +*/
> +   unsigned type_size = type_sz(inst->dst.type);
> +   unsigned channels_per_grf = inst->exec_size / inst->regs_written;

This will cause a division by zero if the instruction doesn't write any
registers.  Strictly speaking you'd need to check the source types too
in order to find out whether the instruction is compressed...

> +   assert(channels_per_grf > 0);
> +   if (devinfo->gen < 8 && inst->regs_written > 1 &&
> +   channels_per_grf != REG_SIZE / type_size) {

I believe the hardware is more stupid than that: it doesn't really
calculate the number of components that fit in a single GRF and then
shift QtrCtrl based on that, but rather it's hardwired to shift by four
channels in DF mode (at least on HSW) or by eight channels for any other
execution type, so you need to find out what the execution type of the
instruction is (which is not necessarily the same as the destination
type).
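
One way the two review comments above (the division by zero and the
hardwired shift) could be combined is sketched here. This is an
illustration under stated assumptions, not the eventual patch: Inst and
max_width_for_exec_mask_bug are hypothetical names, the execution type is
reduced to a single "is DF" flag, and the DF shift of four channels is
taken from the HSW behaviour described above.

```cpp
#include <cassert>

// Hypothetical stand-in for the few fs_inst fields this check needs.
struct Inst {
   unsigned exec_size;      // channels executed, e.g. 8 or 16
   unsigned regs_written;   // GRFs written by the destination; may be 0
   bool exec_type_is_df;    // execution type, not necessarily the dst type
};

// Widest SIMD width this one restriction allows for the instruction.
unsigned max_width_for_exec_mask_bug(unsigned gen, const Inst &inst)
{
   // Gen8+ is unaffected, and an instruction writing zero or one GRF never
   // reaches the "second half" path. Returning early also avoids dividing
   // by regs_written == 0.
   if (gen >= 8 || inst.regs_written <= 1)
      return inst.exec_size;

   const unsigned channels_per_grf = inst.exec_size / inst.regs_written;
   assert(channels_per_grf > 0);

   // The EU shifts the execution mask by a fixed channel count chosen by
   // execution type, not by how many channels actually fit in a GRF.
   const unsigned hw_channels = inst.exec_type_is_df ? 4 : 8;

   return channels_per_grf == hw_channels ? inst.exec_size : channels_per_grf;
}
```

For the mov(16) with a stride of 2 on UD from the commit message, the
destination spans 4 GRFs, so channels_per_grf is 4, which differs from
the fixed 8 and yields a lowered width of 4, matching the four-way split
described there.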

> +  return channels_per_grf;

For this to interact nicely with the other restrictions implemented in
the same function in case several of them ever apply at the same time,
move the check down and have it 

[Mesa-dev] [PATCH 3/3] docs: Mark KHR_texture_compression_astc_sliced_3d done on i965

2016-07-07 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 docs/GL3.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index ce34869..883604a 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -255,6 +255,7 @@ GLES3.2, GLSL ES 3.2
   GL_KHR_debug  DONE (all drivers)
   GL_KHR_robustness DONE (i965)
   GL_KHR_texture_compression_astc_ldr   DONE (i965/gen9+)
+  GL_KHR_texture_compression_astc_sliced_3d DONE (i965/gen9+)
   GL_OES_copy_image DONE (i965)
   GL_OES_draw_buffers_indexed   DONE (all drivers that support GL_ARB_draw_buffers_blend)
   GL_OES_draw_elements_base_vertex  DONE (all drivers)
-- 
2.5.5



[Mesa-dev] [PATCH 1/3] mesa: Add the infrastructure for KHR_texture_compression_astc_sliced_3d

2016-07-07 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 src/mapi/glapi/registry/gl.xml   | 1 +
 src/mesa/main/extensions_table.h | 1 +
 src/mesa/main/mtypes.h   | 1 +
 src/mesa/main/teximage.c | 5 +++--
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mapi/glapi/registry/gl.xml b/src/mapi/glapi/registry/gl.xml
index 0e12acc..dda9345 100755
--- a/src/mapi/glapi/registry/gl.xml
+++ b/src/mapi/glapi/registry/gl.xml
@@ -41757,6 +41757,7 @@ typedef unsigned int GLhandleARB;
 
 
 
+
 
 
 
diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
index ad3bffc..6c47b3b 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -285,6 +285,7 @@ EXT(KHR_robust_buffer_access_behavior   , ARB_robust_buffer_access_behavior
 EXT(KHR_robustness                          , KHR_robustness                         , GLL, GLC,  x , ES2, 2012)
 EXT(KHR_texture_compression_astc_hdr        , KHR_texture_compression_astc_hdr       , GLL, GLC,  x , ES2, 2012)
 EXT(KHR_texture_compression_astc_ldr        , KHR_texture_compression_astc_ldr       , GLL, GLC,  x , ES2, 2012)
+EXT(KHR_texture_compression_astc_sliced_3d  , KHR_texture_compression_astc_sliced_3d , GLL, GLC,  x , ES2, 2015)
 
 EXT(MESA_pack_invert                        , MESA_pack_invert                       , GLL, GLC,  x ,  x , 2002)
 EXT(MESA_texture_signed_rgba                , EXT_texture_snorm                      , GLL, GLC,  x ,  x , 2009)
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 29e47de..d490c25 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3946,6 +3946,7 @@ struct gl_extensions
GLboolean KHR_robustness;
GLboolean KHR_texture_compression_astc_hdr;
GLboolean KHR_texture_compression_astc_ldr;
+   GLboolean KHR_texture_compression_astc_sliced_3d;
GLboolean MESA_pack_invert;
GLboolean MESA_ycbcr_texture;
GLboolean NV_conditional_render;
diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 26a6c21..b546888 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -1407,10 +1407,11 @@ _mesa_target_can_be_compressed(const struct gl_context *ctx, GLenum target,
  break;
   case MESA_FORMAT_LAYOUT_ASTC:
  target_can_be_compresed =
- ctx->Extensions.KHR_texture_compression_astc_hdr;
+ctx->Extensions.KHR_texture_compression_astc_hdr ||
+ctx->Extensions.KHR_texture_compression_astc_sliced_3d;
 
  /* Throw an INVALID_OPERATION error if the target is TEXTURE_3D and
-  * and the hdr extension is not supported.
+  * and either of above extensions are not supported.
   * See comment in switch case GL_TEXTURE_CUBE_MAP_ARRAY for more info.
   */
  if (!target_can_be_compresed)
-- 
2.5.5



[Mesa-dev] [PATCH 2/3] i965/gen9: Enable KHR_texture_compression_astc_sliced_3d

2016-07-07 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 27dfb0c..c557137 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -391,6 +391,7 @@ intelInitExtensions(struct gl_context *ctx)
 
if (brw->gen >= 9) {
   ctx->Extensions.KHR_texture_compression_astc_ldr = true;
+  ctx->Extensions.KHR_texture_compression_astc_sliced_3d = true;
   ctx->Extensions.ARB_shader_stencil_export = true;
}
 
-- 
2.5.5



Re: [Mesa-dev] [PATCH 7.5/11] glsl: Kill __intrinsic_atomic_sub

2016-07-07 Thread Ian Romanick
On 07/07/2016 06:34 PM, Ilia Mirkin wrote:
> On Thu, Jul 7, 2016 at 9:26 PM, Ian Romanick  wrote:
>> On 07/07/2016 04:58 PM, Ilia Mirkin wrote:
>>> On Thu, Jul 7, 2016 at 5:02 PM, Ian Romanick  wrote:
>>>> From: Ian Romanick
>>>>
>>>> Just generate an __intrinsic_atomic_add with a negated parameter.
>>>>
>>>> Signed-off-by: Ian Romanick
>>>> ---
>>>>  src/compiler/glsl/builtin_functions.cpp    | 50
>>>> +++---
>>>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  8 -
>>>>  2 files changed, 46 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/src/compiler/glsl/builtin_functions.cpp
>>>> b/src/compiler/glsl/builtin_functions.cpp
>>>> index 941ea12..ef3b2b0 100644
>>>> --- a/src/compiler/glsl/builtin_functions.cpp
>>>> +++ b/src/compiler/glsl/builtin_functions.cpp
>>>> @@ -3310,13 +3310,29 @@ builtin_builder::asin_expr(ir_variable *x, float
>>>> p0, float p1)
>>>>  mul(abs(x), imm(p1));
>>>>  }
>>>>
>>>> +/**
>>>> + * Generate a ir_call to a function with a set of parameters
>>>> + *
>>>> + * The input \c params can either be a list of \c ir_variable or a list of
>>>> + * \c ir_dereference_variable.  In the latter case, all nodes will be
>>>> removed
>>>> + * from \c params and used directly as the parameters to the generated
>>>> + * \c ir_call.
>>>> + */
>>>>  ir_call *
>>>>  builtin_builder::call(ir_function *f, ir_variable *ret, exec_list params)
>>>>  {
>>>> exec_list actual_params;
>>>>
>>>> -   foreach_in_list(ir_variable, var, &params) {
>>>> -  actual_params.push_tail(var_ref(var));
>>>> +   foreach_in_list_safe(ir_instruction, ir, &params) {
>>>> +  ir_dereference_variable *d = ir->as_dereference_variable();
>>>> +  if (d != NULL) {
>>>> + d->remove();
>>>> + actual_params.push_tail(d);
>>>> +  } else {
>>>> + ir_variable *var = ir->as_variable();
>>>> + assert(var != NULL);
>>>> + actual_params.push_tail(var_ref(var));
>>>> +  }
>>>> }
>>>>
>>>> ir_function_signature *sig =
>>>> @@ -5292,8 +5308,34 @@ builtin_builder::_atomic_counter_op1(const char
>>>> *intrinsic,
>>>> MAKE_SIG(glsl_type::uint_type, avail, 2, counter, data);
>>>>
>>>> ir_variable *retval = body.make_temp(glsl_type::uint_type,
>>>> "atomic_retval");
>>>> -   body.emit(call(shader->symbols->get_function(intrinsic), retval,
>>>> -  sig->parameters));
>>>> +
>>>> +   /* Instead of generating an __intrinsic_atomic_sub, generate an
>>>> +* __intrinsic_atomic_add with the data parameter negated.
>>>> +*/
>>>> +   if (strcmp("__intrinsic_atomic_sub", intrinsic) == 0) {
>>>> +  ir_variable *const neg_data =
>>>> + body.make_temp(glsl_type::uint_type, "neg_data");
>>>> +
>>>> +  body.emit(assign(neg_data, neg(data)));
>>>> +
>>>> +  exec_list parameters;
>>>> +
>>>> +  parameters.push_tail(new(mem_ctx) ir_dereference_variable(counter));
>>>> +  parameters.push_tail(new(mem_ctx)
>>>> ir_dereference_variable(neg_data));
>>>
>>> I don't get it ... why change call() to allow taking dereferences and
>>> create them here rather than just feeding in the ir_variables
>>> directly?
>>
>> Oh, I already went down that path.  :)  neg_data would have to be in two
>> lists at the same time:  the instruction stream and parameters.
>> Restructuring the code so that the ir_variables could be in parameters
>> then move them to the instruction stream was... enough to make a grown
>> Mick Jagger cry.
>>
>> I'm not terribly enamored with this solution either, but I didn't see a
>> better way.
> 
> How does it work in the "normal" case, i.e. if I just write GLSL that looks 
> like
> 
> int foo = 1;
> bar(foo)
> 
> Is there a separate ir_variable created to hold the foo inside the
> call? If so, that seems easy enough too ... perhaps there's a
> non-obvious reason why that turns into a pile of sadness?

ir_call in the instruction stream has an exec_list that contains
ir_dereference_variable nodes.

The builtin_builder::call method previously took an exec_list of
ir_variables and created a list of ir_dereference_variable.  All of the
original users of that method wanted to make a function call using
exactly the set of parameters passed to the built-in function (i.e.,
call __intrinsic_atomic_add using the parameters to atomicAdd).  For
these users, the list of ir_variables already existed:  the list of
parameters in the built-in function signature.

This new caller doesn't do that.  It wants to call a function with a
parameter from the function and a value calculated in the function.  So,
I changed builtin_builder::call to take a list that could either be a
list of ir_variable or a list of ir_dereference_variable.  In the former
case it behaves just as it previously did.  In the latter case, it uses
(and removes from the input list) the ir_dereference_variable nodes

Re: [Mesa-dev] [PATCH 05/11] glsl: Replace the linear search in get_intrinsic_opcode with a radix trie

2016-07-07 Thread Ian Romanick
On 07/07/2016 04:03 PM, Dylan Baker wrote:
> Quoting Ian Romanick (2016-07-05 17:46:13)
>> From: Ian Romanick 
>>
>> If there is a way to do this cleanly in mako, I'm very interested to
>> hear about it.
>>
>>    text    data     bss     dec     hex filename
>> 7529003  273096   28584 7830683  777c9b /tmp/i965_dri-64bit-before.so
>> 7528883  273096   28584 7830563  777c23 /tmp/i965_dri-64bit-after.so
>>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/compiler/glsl/nir_intrinsic_map.py | 131 
>> ++---
>>  1 file changed, 119 insertions(+), 12 deletions(-)
>>
>> diff --git a/src/compiler/glsl/nir_intrinsic_map.py 
>> b/src/compiler/glsl/nir_intrinsic_map.py
>> index 7f13c6c..5962d4b 100644
>> --- a/src/compiler/glsl/nir_intrinsic_map.py
>> +++ b/src/compiler/glsl/nir_intrinsic_map.py
>> @@ -66,6 +66,123 @@ intrinsics = [("__intrinsic_atomic_read", 
>> ("nir_intrinsic_atomic_counter_read_va
>>("__intrinsic_atomic_exchange_shared", 
>> ("nir_intrinsic_shared_atomic_exchange", None)),
>>("__intrinsic_atomic_comp_swap_shared", 
>> ("nir_intrinsic_shared_atomic_comp_swap", None))]
>>  
>> +def remove_prefix(table, prefix_length):
>> +    """Strip prefix_length characters off the name of each entry in table."""
>> +
>> +    return [(s[prefix_length:], d) for (s, d) in table]
>> +
>> +
>> +def generate_trie(table):
>> +    """table is a list of (string, data) tuples.  It is assumed to be sorted by
>> +    string.
>> +
>> +    A radix trie (or compact prefix trie) is recursively generated from the
>> +    list of names.  Names are partitioned into groups that have at least
>> +    prefix_thresh (tunable parameter) common prefix characters.  Each of these
>> +    groups becomes the branches at the current level of the tree.  The
>> +    matching prefix characters from each group is removed, and the group is
>> +    recursively operated on in the same fashion.
>> +
>> +    The recursion terminates when no groups can be formed with at least
>> +    prefix_thresh matching characters.
>> +
>> +    Each node in the trie is a 3-element tuple:
>> +
>> +        (prefix_string, [child_nodes], client_data)
>> +
>> +    One of [child_nodes] or client_data will be None.
>> +
>> +    See https://en.wikipedia.org/wiki/Radix_tree for more background details
>> +    on the data structure.
>> +
>> +    """
>> +
>> +    # Threshold for considering two strings to have the same prefix.
>> +    prefix_thresh = 1
>> +
>> +    if len(table) == 1 and table[0][0] == "":
>> +        return [("", None, table[0][1])]
>> +
>> +    trie_level = []
>> +
>> +    (s, d) = table[0]
>> +    candidates = [(s, d)]
>> +    base = s
>> +    prefix_length = len(s)
>> +
>> +    for (s, d) in table[1:]:
>> +        if s[:prefix_thresh] == base[:prefix_thresh]:
>> +            candidates.append((s, d))
>> +
>> +            l = len(s[:([x[0]==x[1] for x in zip(s, base)]+[0]).index(0)])
>> +            if l < prefix_length:
>> +                prefix_length = l
>> +        else:
>> +            trie_level.append((base[:prefix_length], generate_trie(remove_prefix(candidates, prefix_length)), None))
>> +
>> +            candidates = [(s, d)]
>> +            base = s
>> +            prefix_length = len(s)
>> +
>> +    trie_level.append((base[:prefix_length], generate_trie(remove_prefix(candidates, prefix_length)), None))
>> +
>> +    return trie_level
>> +
>> +
>> +def emit_trie_leaf(indent, d):
>> +    if d[1] is None:
>> +        return "{}return {};\n".format(indent, d[0])
>> +    else:
>> +        c_code = "{}int_op = {};\n".format(indent, d[0])
>> +        c_code += "{}uint_op = {};\n".format(indent, d[1])
>> +        return c_code
>> +
>> +
>> +def trie_as_C_code(trie, indent="   ", prefix_string="__intrinsic_"):
>> +    conditional = "if"
>> +
>> +    c_code = ""
>> +    for (s, t, d) in trie:
>> +        if d is not None:
>> +            c_code +=  "{}{} (name[0] == '\\0') {{\n".format(indent, conditional)
>> +            c_code += "{}   /* {} */\n".format(indent, prefix_string)
>> +            c_code += emit_trie_leaf(indent + "   ", d);
>> +
>> +        else:
>> +            # Before emitting the string comparison, check to see if the
>> +            # subtree has a single element with an empty string.  In that
>> +            # case, use strcmp() instead of strncmp() and don't advance the
>> +            # name pointer.
>> +
>> +            if len(t) == 1 and t[0][2] is not None:
>> +                if s == "":
>> +                    c_code += "{}{} (name[0] == '\\0') {{\n".format(indent, conditional, s)
>> +                else:
>> +                    c_code += "{}{} (strcmp(name, \"{}\") == 0) {{\n".format(indent, conditional, s)
>> +
>> +                c_code += "{}   /* {} */\n".format(indent, prefix_string + s)
>> +                c_code += emit_trie_leaf(indent + "   ", t[0][2]);
>> +
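
For anyone following the grouping idea in the docstring, here is a minimal standalone sketch of it. This is not the patch's code: build_trie, common_prefix_len and lookup are hypothetical names, the prefix threshold is fixed at one character, and the sample names are made up. Nodes use the same (prefix, children, data) tuple layout that the docstring describes.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_trie(table):
    """Build a radix trie from a list of (name, data) tuples sorted by name.

    Each node is (prefix, children, data); exactly one of children/data
    is None.
    """
    if len(table) == 1 and table[0][0] == "":
        return [("", None, table[0][1])]

    level = []
    i = 0
    while i < len(table):
        base = table[i][0]
        if base == "":
            # A name that ends exactly at this node becomes a leaf.
            level.append(("", None, table[i][1]))
            i += 1
            continue
        # Collect the run of names sharing base's first character and
        # shrink the prefix to what the whole run has in common.
        j = i + 1
        plen = len(base)
        while j < len(table) and table[j][0][:1] == base[:1]:
            plen = min(plen, common_prefix_len(base, table[j][0]))
            j += 1
        group = [(s[plen:], d) for (s, d) in table[i:j]]
        level.append((base[:plen], build_trie(group), None))
        i = j
    return level


def lookup(trie, name):
    """Return the data stored for name, or None if name is not in the trie."""
    for prefix, children, data in trie:
        if children is None:
            if name == "":
                return data
        elif name.startswith(prefix):
            return lookup(children, name[len(prefix):])
    return None


# Hypothetical sample names, not the real intrinsic table.
names = sorted(["add", "and", "and_shared", "or"])
trie = build_trie([(n, n.upper()) for n in names])
```

With these inputs the top level has two branches, "a" and "or", and "and" / "and_shared" share the "nd" node under "a". Unlike the generated C code discussed in this thread, lookup() compares every character, so unknown names are rejected.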

Re: [Mesa-dev] [PATCH 7.5/11] glsl: Kill __intrinsic_atomic_sub

2016-07-07 Thread Ilia Mirkin
On Thu, Jul 7, 2016 at 9:26 PM, Ian Romanick  wrote:
> On 07/07/2016 04:58 PM, Ilia Mirkin wrote:
>> On Thu, Jul 7, 2016 at 5:02 PM, Ian Romanick  wrote:
>>> From: Ian Romanick 
>>>
>>> Just generate an __intrinsic_atomic_add with a negated parameter.
>>>
>>> Signed-off-by: Ian Romanick 
>>> ---
>>>  src/compiler/glsl/builtin_functions.cpp| 50 
>>> +++---
>>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  8 -
>>>  2 files changed, 46 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/src/compiler/glsl/builtin_functions.cpp 
>>> b/src/compiler/glsl/builtin_functions.cpp
>>> index 941ea12..ef3b2b0 100644
>>> --- a/src/compiler/glsl/builtin_functions.cpp
>>> +++ b/src/compiler/glsl/builtin_functions.cpp
>>> @@ -3310,13 +3310,29 @@ builtin_builder::asin_expr(ir_variable *x, float 
>>> p0, float p1)
>>>mul(abs(x), imm(p1));
>>>  }
>>>
>>> +/**
>>> + * Generate a ir_call to a function with a set of parameters
>>> + *
>>> + * The input \c params can either be a list of \c ir_variable or a list of
>>> + * \c ir_dereference_variable.  In the latter case, all nodes will be 
>>> removed
>>> + * from \c params and used directly as the parameters to the generated
>>> + * \c ir_call.
>>> + */
>>>  ir_call *
>>>  builtin_builder::call(ir_function *f, ir_variable *ret, exec_list params)
>>>  {
>>> exec_list actual_params;
>>>
>>> -   foreach_in_list(ir_variable, var, &params) {
>>> -  actual_params.push_tail(var_ref(var));
>>> +   foreach_in_list_safe(ir_instruction, ir, &params) {
>>> +  ir_dereference_variable *d = ir->as_dereference_variable();
>>> +  if (d != NULL) {
>>> + d->remove();
>>> + actual_params.push_tail(d);
>>> +  } else {
>>> + ir_variable *var = ir->as_variable();
>>> + assert(var != NULL);
>>> + actual_params.push_tail(var_ref(var));
>>> +  }
>>> }
>>>
>>> ir_function_signature *sig =
>>> @@ -5292,8 +5308,34 @@ builtin_builder::_atomic_counter_op1(const char 
>>> *intrinsic,
>>> MAKE_SIG(glsl_type::uint_type, avail, 2, counter, data);
>>>
>>> ir_variable *retval = body.make_temp(glsl_type::uint_type, 
>>> "atomic_retval");
>>> -   body.emit(call(shader->symbols->get_function(intrinsic), retval,
>>> -  sig->parameters));
>>> +
>>> +   /* Instead of generating an __intrinsic_atomic_sub, generate an
>>> +* __intrinsic_atomic_add with the data parameter negated.
>>> +*/
>>> +   if (strcmp("__intrinsic_atomic_sub", intrinsic) == 0) {
>>> +  ir_variable *const neg_data =
>>> + body.make_temp(glsl_type::uint_type, "neg_data");
>>> +
>>> +  body.emit(assign(neg_data, neg(data)));
>>> +
>>> +  exec_list parameters;
>>> +
>>> +  parameters.push_tail(new(mem_ctx) ir_dereference_variable(counter));
>>> +  parameters.push_tail(new(mem_ctx) ir_dereference_variable(neg_data));
>>
>> I don't get it ... why change call() to allow taking dereferences and
>> create them here rather than just feeding in the ir_variables
>> directly?
>
> Oh, I already went down that path.  :)  neg_data would have to be in two
> lists at the same time:  the instruction stream and parameters.
> Restructuring the code so that the ir_variables could be in parameters
> then move them to the instruction stream was... enough to make a grown
> Mick Jagger cry.
>
> I'm not terribly enamored with this solution either, but I didn't see a
> better way.

How does it work in the "normal" case, i.e. if I just write GLSL that looks like

int foo = 1;
bar(foo)

Is there a separate ir_variable created to hold the foo inside the
call? If so, that seems easy enough too ... perhaps there's a
non-obvious reason why that turns into a pile of sadness?


Re: [Mesa-dev] [PATCH 7.5/11] glsl: Kill __intrinsic_atomic_sub

2016-07-07 Thread Ian Romanick
On 07/07/2016 04:58 PM, Ilia Mirkin wrote:
> On Thu, Jul 7, 2016 at 5:02 PM, Ian Romanick  wrote:
>> From: Ian Romanick 
>>
>> Just generate an __intrinsic_atomic_add with a negated parameter.
>>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/compiler/glsl/builtin_functions.cpp| 50 
>> +++---
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  8 -
>>  2 files changed, 46 insertions(+), 12 deletions(-)
>>
>> diff --git a/src/compiler/glsl/builtin_functions.cpp 
>> b/src/compiler/glsl/builtin_functions.cpp
>> index 941ea12..ef3b2b0 100644
>> --- a/src/compiler/glsl/builtin_functions.cpp
>> +++ b/src/compiler/glsl/builtin_functions.cpp
>> @@ -3310,13 +3310,29 @@ builtin_builder::asin_expr(ir_variable *x, float p0, 
>> float p1)
>>mul(abs(x), imm(p1));
>>  }
>>
>> +/**
>> + * Generate a ir_call to a function with a set of parameters
>> + *
>> + * The input \c params can either be a list of \c ir_variable or a list of
>> + * \c ir_dereference_variable.  In the latter case, all nodes will be 
>> removed
>> + * from \c params and used directly as the parameters to the generated
>> + * \c ir_call.
>> + */
>>  ir_call *
>>  builtin_builder::call(ir_function *f, ir_variable *ret, exec_list params)
>>  {
>> exec_list actual_params;
>>
>> -   foreach_in_list(ir_variable, var, &params) {
>> -  actual_params.push_tail(var_ref(var));
>> +   foreach_in_list_safe(ir_instruction, ir, &params) {
>> +  ir_dereference_variable *d = ir->as_dereference_variable();
>> +  if (d != NULL) {
>> + d->remove();
>> + actual_params.push_tail(d);
>> +  } else {
>> + ir_variable *var = ir->as_variable();
>> + assert(var != NULL);
>> + actual_params.push_tail(var_ref(var));
>> +  }
>> }
>>
>> ir_function_signature *sig =
>> @@ -5292,8 +5308,34 @@ builtin_builder::_atomic_counter_op1(const char 
>> *intrinsic,
>> MAKE_SIG(glsl_type::uint_type, avail, 2, counter, data);
>>
>> ir_variable *retval = body.make_temp(glsl_type::uint_type, 
>> "atomic_retval");
>> -   body.emit(call(shader->symbols->get_function(intrinsic), retval,
>> -  sig->parameters));
>> +
>> +   /* Instead of generating an __intrinsic_atomic_sub, generate an
>> +* __intrinsic_atomic_add with the data parameter negated.
>> +*/
>> +   if (strcmp("__intrinsic_atomic_sub", intrinsic) == 0) {
>> +  ir_variable *const neg_data =
>> + body.make_temp(glsl_type::uint_type, "neg_data");
>> +
>> +  body.emit(assign(neg_data, neg(data)));
>> +
>> +  exec_list parameters;
>> +
>> +  parameters.push_tail(new(mem_ctx) ir_dereference_variable(counter));
>> +  parameters.push_tail(new(mem_ctx) ir_dereference_variable(neg_data));
> 
> I don't get it ... why change call() to allow taking dereferences and
> create them here rather than just feeding in the ir_variables
> directly?

Oh, I already went down that path.  :)  neg_data would have to be in two
lists at the same time:  the instruction stream and parameters.
Restructuring the code so that the ir_variables could be in parameters
then move them to the instruction stream was... enough to make a grown
Mick Jagger cry.

I'm not terribly enamored with this solution either, but I didn't see a
better way.
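
To make the "two lists at the same time" constraint concrete, here is a small model of it. Everything in it is an assumption for illustration: Node, IntrusiveList, Variable, Deref and call are hypothetical Python stand-ins, not the Mesa classes, and the ValueError is just the sketch's way of making the constraint visible. The only idea taken from the discussion is that a node carries its own links, so it can belong to one list only, while any number of dereference nodes can point at the same variable.

```python
class Node:
    """List node that carries its own links, so it can live in one list only."""

    def __init__(self, name):
        self.name = name
        self.owner = None  # the single list this node currently belongs to


class IntrusiveList:
    def __init__(self):
        self.nodes = []

    def push_tail(self, node):
        # One set of links per node: a second membership is rejected here.
        if node.owner is not None:
            raise ValueError(node.name + " is already linked into a list")
        node.owner = self
        self.nodes.append(node)

    def remove(self, node):
        self.nodes.remove(node)
        node.owner = None

    def is_empty(self):
        return not self.nodes


class Variable(Node):
    pass


class Deref(Node):
    """A reference to a Variable; many may exist for the same variable."""

    def __init__(self, var):
        super().__init__("deref(" + var.name + ")")
        self.var = var


def call(params):
    """Build the actual-parameter list from variables or dereferences."""
    actual = IntrusiveList()
    for node in list(params.nodes):  # iterate over a copy: we may unlink nodes
        if isinstance(node, Deref):
            params.remove(node)      # move the caller's node over directly
            actual.push_tail(node)
        else:
            actual.push_tail(Deref(node))  # the variable stays where it is
    return actual


body = IntrusiveList()           # stands in for the instruction stream
neg_data = Variable("neg_data")
body.push_tail(neg_data)

parameters = IntrusiveList()
try:
    parameters.push_tail(neg_data)   # would be a second membership
    double_link_ok = True
except ValueError:
    double_link_ok = False

parameters.push_tail(Deref(neg_data))  # a fresh dereference is fine
actual = call(parameters)
```

After call() returns, parameters is empty, which is the same condition the patch checks with assert(parameters.is_empty()).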

>> +
>> +  ir_function *const func =
>> + shader->symbols->get_function("__intrinsic_atomic_add");
>> +  ir_instruction *const c = call(func, retval, parameters);
>> +
>> +  assert(c != NULL);
>> +  assert(parameters.is_empty());
>> +
>> +  body.emit(c);
>> +   } else {
>> +  body.emit(call(shader->symbols->get_function(intrinsic), retval,
>> + sig->parameters));
>> +   }
>> +
>> body.emit(ret(retval));
>> return sig;
>>  }
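
The reason a subtract can be expressed as an add of the negated operand is ordinary modulo-2^32 arithmetic on uint values. A minimal sketch, assuming 32-bit wrapping; the helper names are hypothetical and Python integers are masked by hand to imitate the wrap:

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit unsigned integer


def neg_u32(x):
    """Two's-complement negation of a 32-bit unsigned value."""
    return (-x) & MASK32


def add_u32(a, b):
    """32-bit unsigned addition with wraparound."""
    return (a + b) & MASK32


def sub_u32(a, b):
    """32-bit unsigned subtraction with wraparound."""
    return (a - b) & MASK32


counter, data = 10, 3
via_add = add_u32(counter, neg_u32(data))   # counter + (-data)
via_sub = sub_u32(counter, data)            # counter - data
```

Both paths give the same result for every pair of inputs, including the cases where the subtraction wraps below zero.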
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
>> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index 197b3af..3320f2a 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -3208,13 +3208,6 @@ 
>> glsl_to_tgsi_visitor::visit_atomic_counter_intrinsic(ir_call *ir)
>>   val = ((ir_instruction *)param)->as_rvalue();
>>   val->accept(this);
>>   data2 = this->result;
>> -  } else if (!strcmp("__intrinsic_atomic_sub", callee)) {
>> - opcode = TGSI_OPCODE_ATOMUADD;
>> - st_src_reg res = get_temp(glsl_type::uvec4_type);
>> - st_dst_reg dstres = st_dst_reg(res);
>> - dstres.writemask = dst.writemask;
>> - emit_asm(ir, TGSI_OPCODE_INEG, dstres, data);
>> - data = res;
>>} else {
>>   assert(!"Unexpected intrinsic");
>>   return;
>> @@ -3625,7 +3618,6 @@ glsl_to_tgsi_visitor::visit(ir_call *ir)
>> !strcmp("__intrinsic_atomic_increment", callee) ||
>> !strcmp("__intrinsic_atomic_predecrement", callee) ||
>> 

Re: [Mesa-dev] [PATCH 1/4] glx: Call __glXInitVertexArrayState() with a usable gc.

2016-07-07 Thread Ian Romanick
On 07/07/2016 04:49 PM, Matt Turner wrote:
> On Thu, Jul 7, 2016 at 2:34 PM, Ian Romanick  wrote:
>> On 07/07/2016 09:44 AM, Matt Turner wrote:
>>> On Wed, Jun 29, 2016 at 2:16 PM, Ian Romanick  wrote:
 On 06/29/2016 02:04 AM, Colin McDonald wrote:
> I'm not familiar with the code, other than diving in to fix these
> indirect multi-texture problems, so you will know much more about it
> than me.
>
> But, my understanding is that __glXInitVertexArrayState needs info
> from the server, obtained by calls to _indirect_glGetString &
> __indirect_glGetIntegerv. Those routines need the current context
> from __glXGetCurrentContext, so __glXSetCurrentContext(gc) must have
> been called first.
>
> I see your point about a "layering violation".  I think that to avoid
> that would require a more substantial restructuring, so that the
> indirect layer can run some initialisation code (ie
> __glXInitVertexArrayState or similar) separate from the bind
> callback, once a usable context has been setup.

 Maybe...  *If* __glXGetCurrentContext is the only problem, then I think
 a small refactor of __indirect_glGetString could also solve the problem.
  Just make a new function

 const GLubyte *do_GetString(Display *dpy, struct glx_context *gc,
 GLenum name);

 that both __indirect_glGetString and indirect_bind_context call.  It
 might even be worth folding the contents of __glXGetString into the new
 function... though that's probably a follow-up patch.
>>>
>>> I tried that (see attached p.patch)... and I get another segfault.
>>
>> I think it should be possible to elide the __glXFlushRenderBuffer call.
>> Since the context is being made current, its buffer of rendering
>> commands must be empty.  Does the attached patch help?
> 
> Thanks. That gets us past that problem... and on to the next!

"Alex, I'll take yak shaving for 400."

"You get a daily double!"

"What is #fml?"

I think all of these problems are fixable, but at some point it's not
worth the effort when a fix for the original problem already exists.  I
retract my initial NAK.

> Fun, fun. __glXInitVertexArrayState(gc) contains
> 
>if (__glExtensionBitIsEnabled(gc, GL_ARB_vertex_program_bit)) {
>   __indirect_glGetProgramivARB(GL_VERTEX_PROGRAM_ARB,
>GL_MAX_PROGRAM_ATTRIBS_ARB,
>_program_attribs);
>}
> 
> Notice __glExtensionBitIsEnabled takes an explicit gc, passed into
> __glXInitVertexArrayState. __indirect_glGetProgramivARB on the other
> hand calls __glXGetCurrentContext and gets the dummyContext. I've
> plumbed gc into a do_GetProgramivARB, and I see a similar call to
> __indirect_glGetIntegerv that has the same problem. That looks like
> the only other one at least.
> 
> __indirect_glGetProgramivARB is code-gen'd, so I just want to add
> 'handcode="client"' to GetProgramivARB's XML in gl_API.xml. That in
> turn necessitates adding a dpy parameter to
> __glXInitVertexArrayState().
> 
> Ultimately, we segfault in __glXSetupVendorRequest() because it tries
> to get dpy from gc->currentDpy.
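
The pattern being chased here can be shown in miniature. This is only a model under stated assumptions: Context, DUMMY, get_string, do_get_string and bind_context are hypothetical Python stand-ins for the GLX functions named above, not their real signatures. The point is that a helper taking the display and context explicitly works during bind, while the entry point that reads the current-context global still sees the dummy context.

```python
class Context:
    def __init__(self, name, display=None):
        self.name = name
        self.current_display = display


DUMMY = Context("dummy")
current_context = DUMMY   # what the "get current context" global returns


def get_current_context():
    return current_context


def do_get_string(display, ctx):
    """Helper with explicit arguments; never consults the global."""
    if display is None:
        raise RuntimeError("no display for context " + ctx.name)
    return "string for " + ctx.name + " via " + display


def get_string():
    """Public entry point: relies on the current-context global."""
    ctx = get_current_context()
    return do_get_string(ctx.current_display, ctx)


def bind_context(display, ctx):
    """Query state for ctx before it has been made current."""
    global current_context
    info = do_get_string(display, ctx)   # works: everything is passed in
    ctx.current_display = display
    current_context = ctx
    return info


gc = Context("gc")
try:
    get_string()                 # still sees the dummy context
    failed_before_bind = False
except RuntimeError:
    failed_before_bind = True

info = bind_context("dpy0", gc)
after_bind = get_string()
```

Every helper reached from the bind path has to be converted the same way, which is why each fix in this thread uncovers the next call that still reads the global or a field that is not set yet.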
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x77667eb9 in __glXSetupVendorRequest (gc=0x662080, code=17,
> vop=1307, cmdlen=8) at indirect.c:191
> 191LockDisplay(dpy);
> (gdb) p dpy
> $40 = (Display * const) 0x0
> (gdb) bt
> #0  0x77667eb9 in __glXSetupVendorRequest (gc=0x662080,
> code=17, vop=1307, cmdlen=8) at indirect.c:191
> #1  0x7769aacb in do_GetProgramivARB (dpy=0x64fc60,
> gc=0x662080, target=34336, pname=34989, params=0x7fffdab4) at
> ../../../mesa/src/glx/single2.c:537
> #2  0x77691944 in __glXInitVertexArrayState (dpy=0x64fc60,
> gc=0x662080) at ../../../mesa/src/glx/indirect_vertex_array.c:198
> #3  0x77688c97 in indirect_bind_context (gc=0x662080,
> old=0x778cf6c0 , draw=27262978, read=27262978) at
> ../../../mesa/src/glx/indirect_glx.c:160
> #4  0x7766278a in MakeContextCurrent (dpy=0x64fc60,
> draw=27262978, read=27262978, gc_user=0x662080) at
> ../../../mesa/src/glx/glxcurrent.c:228
> #5  0x770c1af0 in fgPlatformOpenWindow () from /usr/lib64/libglut.so.3
> #6  0x770bbb06 in fgOpenWindow () from /usr/lib64/libglut.so.3
> #7  0x770ba42b in fgCreateWindow () from /usr/lib64/libglut.so.3
> #8  0x770bbc00 in glutCreateWindow () from /usr/lib64/libglut.so.3
> #9  0x004022c8 in main (argc=1, argv=0x7fffe0f8) at 
> arbvparray.c:294
> 
> I'm not sure this is tractable.
> 



Re: [Mesa-dev] [PATCH 0/4] RadeonSI: Multithreaded shader compilation

2016-07-07 Thread Timothy Arceri
On Wed, 2016-06-29 at 18:32 +0200, Marek Olšák wrote:
> Hi,
> 
> This series implements basic multithreaded LLVM shader compilation
> in a minimally invasive way. (+51 lines of code in the main patch)
> 
> It doesn't help on-demand shader compilation, but it does improve
> loading and startup times by being able to saturate up to 4 CPU cores
> if given enough shaders to compile. A proper shader cache might make
> this redundant, but we don't have that now.

Have you had a chance to take a look at my recent shader cache work
[1]? The glsl work is mostly done; I'm now just cleaning up and fixing
the fallback paths for when we have a cache miss. I'm not sure whether
you guys will need the fallback path, since you already have a way of
dealing with variants. You might be OK without it, which would make
things much simpler.

Anyway, I don't think that getting something up and running would be a
large amount of work. Most of your time would likely be spent tweaking
the glsl to tgsi path to have it skip over the IR conversion and just
grab the required state.

The two files you would find most interesting would be:

src/compiler/glsl/shader_cache.cpp
src/mesa/drivers/dri/i965/brw_shader_cache.c

Everything else (besides the cache code itself) is pretty much just
wiring things up to be called at the right time, or skipped.


[1] https://github.com/tarceri/Mesa_arrays_of_arrays.git shader-cache20

> 
> Implementation:
> 
>    si_create_shader_selector doesn't compile main shader parts and
>    geometry shaders, but instead schedules an async job that does
> that.
> 
>    si_shader_select (in a draw call) waits for the job to complete
>    before assembling a shader variant from multiple binaries.
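
The scheme described above (schedule the compile at creation, wait for it at draw time) can be sketched with a thread pool. This is an illustration under assumptions, not the radeonsi code: ShaderSelector, shader_select and compile_main_part are hypothetical Python stand-ins, and the sleep merely imitates a slow compile.

```python
from concurrent.futures import ThreadPoolExecutor
import time

pool = ThreadPoolExecutor(max_workers=4)   # up to 4 compiles in flight


def compile_main_part(source):
    """Stand-in for the expensive compile of the main shader part."""
    time.sleep(0.01)
    return "binary(" + source + ")"


class ShaderSelector:
    def __init__(self, source):
        # Creation returns immediately; the compile runs on a worker thread.
        self.job = pool.submit(compile_main_part, source)


def shader_select(selector, variant_key):
    # The draw-time path blocks until the async job has finished, then
    # combines the main part with whatever the variant needs.
    main_part = selector.job.result()
    return main_part + "+" + variant_key


selectors = [ShaderSelector("shader%d" % i) for i in range(8)]
variants = [shader_select(s, "key") for s in selectors]
pool.shutdown()
```

This helps when many shaders are created up front, as in the loading-time numbers below. A shader created right before its first draw still waits for the whole compile, which matches the on-demand case.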
> 
> Results:
> 
> Loading times (using a stopwatch and multiple tries):
>    Elemental Demo (black screen time): 10.45 -> 8.16 seconds (-2.29
> s)
>    Unigine Heaven (loading time): 13.4 -> 12.2 seconds (-1.2 s)
> 
> Apitrace running times:
>    DiRT Showdown: 101.3 -> 94 seconds (-7.3 seconds)
>    Left 4 Dead 2: 37.5 -> 30 seconds (-7.5 seconds)
>    Borderlands 2: 36.9 -> 36.1 seconds (-0.8 seconds)
>  (the last one is an example of on-demand compilation)
> 
> Please review.
> 
> Marek


Re: [Mesa-dev] [PATCH 01/17] glsl/nir: add new num_packed_components field

2016-07-07 Thread Timothy Arceri
On Thu, 2016-07-07 at 17:50 +1000, Edward O'Callaghan wrote:
> Hi,
> 
> There is a typing issue in this patch: at the various call sites you
> are passing a ‘gl_linked_shader*’ where a ‘gl_shader*’ is expected,
> as the first argument to
> ‘void set_num_packed_components(gl_shader*, ir_variable_mode, unsigned int)’.
Whoops, not sure how I missed that when rebasing. Fix pushed to the
branch in my repo. Thanks.

> 
> Cheers,
> Edward.
> 
> 
> On 07/07/2016 11:58 AM, Timothy Arceri wrote:
> > This will be used to store the total number of components used at
> > this location
> > when packing via ARB_enhanced_layouts.
> > ---
> >  src/compiler/glsl/glsl_to_nir.cpp   |  1 +
> >  src/compiler/glsl/ir.h  |  5 +++
> >  src/compiler/glsl/link_varyings.cpp | 74
> > -
> >  src/compiler/glsl/linker.cpp|  2 +
> >  src/compiler/glsl/linker.h  |  4 ++
> >  src/compiler/nir/nir.h  |  5 +++
> >  6 files changed, 89 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/compiler/glsl/glsl_to_nir.cpp
> > b/src/compiler/glsl/glsl_to_nir.cpp
> > index 20302e3..baba624 100644
> > --- a/src/compiler/glsl/glsl_to_nir.cpp
> > +++ b/src/compiler/glsl/glsl_to_nir.cpp
> > @@ -375,6 +375,7 @@ nir_visitor::visit(ir_variable *ir)
> > var->data.explicit_binding = ir->data.explicit_binding;
> > var->data.has_initializer = ir->data.has_initializer;
> > var->data.location_frac = ir->data.location_frac;
> > +   var->data.num_packed_components = ir-
> > >data.num_packed_components;
> >  
> > switch (ir->data.depth_layout) {
> > case ir_depth_layout_none:
> > diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
> > index 1325e35..637b53c 100644
> > --- a/src/compiler/glsl/ir.h
> > +++ b/src/compiler/glsl/ir.h
> > @@ -770,6 +770,11 @@ public:
> >    unsigned location_frac:2;
> >  
> >    /**
> > +   * The total number of components packed into this location.
> > +   */
> > +  unsigned num_packed_components:4;
> > +
> > +  /**
> > * Layout of the matrix.  Uses glsl_matrix_layout values.
> > */
> >    unsigned matrix_layout:2;
> > diff --git a/src/compiler/glsl/link_varyings.cpp
> > b/src/compiler/glsl/link_varyings.cpp
> > index 76d0be1..35f97a9 100644
> > --- a/src/compiler/glsl/link_varyings.cpp
> > +++ b/src/compiler/glsl/link_varyings.cpp
> > @@ -1975,6 +1975,70 @@ reserved_varying_slot(struct
> > gl_linked_shader *stage,
> > return slots;
> >  }
> >  
> > +void
> > +set_num_packed_components(struct gl_shader *shader,
> > ir_variable_mode io_mode,
> > +  unsigned base_offset)
> > +{
> > +   /* Find the max number of components used at this location */
> > +   unsigned num_components[MAX_VARYINGS_INCL_PATCH] = { 0 };
> > +
> > +   foreach_in_list(ir_instruction, node, shader->ir) {
> > +  ir_variable *const var = node->as_variable();
> > +
> > +  if (var == NULL || var->data.mode != io_mode ||
> > +  !var->data.explicit_location)
> > + continue;
> > +
> > +  int idx = var->data.location - base_offset;
> > +  if (idx < 0 || idx >= MAX_VARYINGS_INCL_PATCH ||
> > +  var->type->without_array()->is_record() ||
> > +  var->type->without_array()->is_matrix())
> > + continue;
> > +
> > +  if (var->type->is_array()) {
> > + const glsl_type *type = get_varying_type(var, shader-
> > >Stage);
> > + unsigned array_components = type->without_array()-
> > >vector_elements +
> > +var->data.location_frac;
> > + assert(type->arrays_of_arrays_size() + idx <=
> > +ARRAY_SIZE(num_components));
> > + for (unsigned i = idx; i < type->arrays_of_arrays_size(); 
> > i++) {
> > +num_components[i] = MAX2(array_components,
> > num_components[i]);
> > + }
> > +  } else {
> > + unsigned comps = var->type->vector_elements +
> > +var->data.location_frac;
> > + num_components[idx] = MAX2(comps, num_components[idx]);
> > +  }
> > +   }
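
The first pass above boils down to "for each location, keep the largest value of first component plus vector width". A minimal sketch of that idea follows. The tuple layout and the name max_packed_components are hypothetical, and the sketch walks slots location .. location + length - 1 for arrays, which is my reading of the intent rather than a transcription of the loop bounds in the hunk.

```python
def max_packed_components(variables, num_locations):
    """Return the number of components in use at each location.

    variables is an iterable of
    (location, location_frac, vector_elements, array_length) tuples,
    with array_length == 0 for non-array variables.
    """
    num_components = [0] * num_locations
    for location, frac, elems, length in variables:
        used = frac + elems              # last component touched, plus one
        slots = max(length, 1)           # a non-array occupies one slot
        for slot in range(location, location + slots):
            num_components[slot] = max(num_components[slot], used)
    return num_components


inputs = [
    (0, 0, 2, 0),   # vec2 at location 0, components 0..1
    (0, 3, 1, 0),   # float at location 0, component 3
    (1, 0, 3, 2),   # vec3[2] starting at location 1
]
packed = max_packed_components(inputs, 4)
```

Here location 0 ends up with 4 components in use, locations 1 and 2 with 3 each, and location 3 stays at 0.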
> > +
> > +   foreach_in_list(ir_instruction, node, shader->ir) {
> > +  ir_variable *const var = node->as_variable();
> > +
> > +  if (var == NULL || var->data.mode != io_mode ||
> > +  !var->data.explicit_location)
> > + continue;
> > +
> > +  int idx = var->data.location - base_offset;
> > +  if (idx < 0 || idx >= MAX_VARYINGS_INCL_PATCH ||
> > +  var->type->without_array()->is_record() ||
> > +  var->type->without_array()->is_matrix())
> > + continue;
> > +
> > +  /* For arrays we need to check all elements in order to find
> > the max
> > +   * number of components used.
> > +   */
> > +  unsigned c = 0;
> > +  if (var->type->is_array()) {
> > + const glsl_type *type = get_varying_type(var, shader-
> > >Stage);
> > + for (unsigned i = idx; i < type->arrays_of_arrays_size(); 
> > i++) {
> > +   

[Mesa-dev] [Bug 96853] gl_PrimitiveID is zero when rendering points of size > 1

2016-07-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=96853

--- Comment #1 from Roland Scheidegger  ---
I think this would be a bug in the emulation code used for large points.
Specifically, a gs is used to generate two tris out of a point, which means the
prim id cannot first be used in the fs; it has to be passed through from the gs.

Maybe something like this would fix it, but don't quote me on that...

diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c
b/src/gallium/auxiliary/util/u_simple_shaders.c
index 1220e18..9527d96 100644
--- a/src/gallium/auxiliary/util/u_simple_shaders.c
+++ b/src/gallium/auxiliary/util/u_simple_shaders.c
@@ -874,11 +874,14 @@ util_make_geometry_passthrough_shader(struct pipe_context
*pipe,
   src[i] = ureg_src_dimension(src[i], 0);
   dst[i] = ureg_DECL_output(ureg, semantic_names[i], semantic_indexes[i]);
}
+   src[i] = ureg_DECL_input(ureg, TGSI_SEMANTIC_PRIM_ID, 0, 0, 1);
+   dst[i] = ureg_DECL_output(ureg, TGSI_SEMANTIC_PRIM_ID, 0);

/* MOV dst[i] src[i] */
for (i = 0; i < num_attribs; i++) {
   ureg_MOV(ureg, dst[i], src[i]);
}
+   ureg_MOV(ureg, dst[i], src[i]);

/* EMIT IMM[0] */
ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 05/11] glsl: Replace the linear search in get_intrinsic_opcode with a radix trie

2016-07-07 Thread Jason Ekstrand
On Thu, Jul 7, 2016 at 10:36 AM, Ian Romanick  wrote:

> On 07/07/2016 04:15 AM, Jason Ekstrand wrote:
> > I can't help but think that we keep solving this problem... We have a
> > low-collision hash table for vkGetProcAddress, something for
> > glxGetProcAddress and eglGetProcAddress (hopefully the same?) and now
> > this.  Can we pick a method, make it a little Python helper, and use it
> > everywhere?
> >
> > Not being critical; this is probably a fine solution for compile-time
> > string -> int mappings and possibly the best to date.  It just seems
> > like kind of a big hammer especially if glsl_to_nir is its only user.
>
> Well... I went to this extreme partially for fun. :)  I think there are
> other places that could use this, and I mentioned one in the intro
> message.  I haven't gone looking through the code to find others, but I
> suspect there may be.  That might even be a cool newbie project.
>

I wondered about that... This kind of smelled like show-off code.  :-)  Not
that there's anything wrong with that...


> I don't think this particular solution is useful for GetProcAddress
> kinds of things because (by the end) it depends on callers only ever
> querying things that are in the set.  The other problem with
> glXGetProcAddress and eglGetProcAddress is the driver can add new things
> to the set.  I'm not sure if anything can add functions to the set used
> by vkGetProcAddress.
>

Right.  I hadn't considered implementations adding things to the list.  For
vkGetProcAddress, we don't do anything dynamic with it yet.  That may
change in the future. I don't know.
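
To illustrate the "callers only ever query things that are in the set" caveat quoted above: once a lookup inspects only the characters needed to tell the members apart, a name outside the set is silently mapped to some member. A toy sketch, with made-up names and return codes; neither function is from the patch series.

```python
MEMBERS = {"atomic_add": 1, "atomic_and": 2, "atomic_or": 3}


def lookup_exact(name):
    """Full comparison: unknown names are rejected."""
    return MEMBERS.get(name)


def lookup_minimal(name):
    """Inspect only the characters that distinguish the three members.

    Only correct when the caller guarantees that name is one of
    "atomic_add", "atomic_and" or "atomic_or".
    """
    if name[7] == "o":
        return 3
    return 1 if name[8] == "d" else 2


hits = [lookup_minimal(n) for n in sorted(MEMBERS)]
bogus_exact = lookup_exact("atomic_xor")      # rejected
bogus_minimal = lookup_minimal("atomic_xor")  # mistaken for a member
```

That is fine for a compiler-internal table whose inputs are generated by the compiler itself, and it is the reason such a lookup does not fit a GetProcAddress-style entry point that receives arbitrary strings.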


>
> > On Jul 5, 2016 5:46 PM, "Ian Romanick" wrote:
> >
> > From: Ian Romanick
> >
> > If there is a way to do this cleanly in mako, I'm very interested to
> > hear about it.
> >
> >    text    data     bss     dec     hex filename
> > 7529003  273096   28584 7830683  777c9b /tmp/i965_dri-64bit-before.so
> > 7528883  273096   28584 7830563  777c23 /tmp/i965_dri-64bit-after.so
> >
> > Signed-off-by: Ian Romanick
> > ---
> >  src/compiler/glsl/nir_intrinsic_map.py | 131
> > ++---
> >  1 file changed, 119 insertions(+), 12 deletions(-)
> >
> > diff --git a/src/compiler/glsl/nir_intrinsic_map.py
> > b/src/compiler/glsl/nir_intrinsic_map.py
> > index 7f13c6c..5962d4b 100644
> > --- a/src/compiler/glsl/nir_intrinsic_map.py
> > +++ b/src/compiler/glsl/nir_intrinsic_map.py
> > @@ -66,6 +66,123 @@ intrinsics = [("__intrinsic_atomic_read",
> > ("nir_intrinsic_atomic_counter_read_va
> >("__intrinsic_atomic_exchange_shared",
> > ("nir_intrinsic_shared_atomic_exchange", None)),
> >("__intrinsic_atomic_comp_swap_shared",
> > ("nir_intrinsic_shared_atomic_comp_swap", None))]
> >
> > +def remove_prefix(table, prefix_length):
> > +"""Strip prefix_length characters off the name of each entry in
> > table."""
> > +
> > +return [(s[prefix_length:], d) for (s, d) in table]
> > +
> > +
> > +def generate_trie(table):
> > +"""table is a list of (string, data) tuples.  It is assumed to
> > be sorted by
> > +string.
> > +
> > +A radix trie (or compact prefix trie) is recursively generated
> > from the
> > +list of names.  Names are partitioned into groups that have at
> least
> > +prefix_thresh (tunable parameter) common prefix characters.
> > Each of these
> > +groups becomes the branches at the current level of the tree.
> The
> > +matching prefix characters from each group is removed, and the
> > group is
> > +recursively operated on in the same fashion.
> > +
> > +The recursion terminates when no groups can be formed with at
> least
> > +prefix_thresh matching characters.
> > +
> > +Each node in the trie is a 3-element tuple:
> > +
> > +(prefix_string, [child_nodes], client_data)
> > +
> > +One of [child_nodes] or client_data will be None.
> > +
> > +See https://en.wikipedia.org/wiki/Radix_tree for more
> > background details
> > +on the data structure.
> > +
> > +"""
> > +
> > +# Threshold for considering two strings to have the same prefix.
> > +prefix_thresh = 1
> > +
> > +if len(table) == 1 and table[0][0] == "":
> > +return [("", None, table[0][1])]
> > +
> > +trie_level = []
> > +
> > +(s, d) = table[0]
> > +candidates = [(s, d)]
> > +base = s
> > +prefix_length = len(s)
> > +
> > +for (s, d) in table[1:]:
> > +if s[:prefix_thresh] == base[:prefix_thresh]:
> >   

Re: [Mesa-dev] [PATCH 7.5/11] glsl: Kill __intrinsic_atomic_sub

2016-07-07 Thread Ilia Mirkin
On Thu, Jul 7, 2016 at 5:02 PM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> Just generate an __intrinsic_atomic_add with a negated parameter.
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/glsl/builtin_functions.cpp| 50 
> +++---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  8 -
>  2 files changed, 46 insertions(+), 12 deletions(-)
>
> diff --git a/src/compiler/glsl/builtin_functions.cpp 
> b/src/compiler/glsl/builtin_functions.cpp
> index 941ea12..ef3b2b0 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -3310,13 +3310,29 @@ builtin_builder::asin_expr(ir_variable *x, float p0, 
> float p1)
>mul(abs(x), imm(p1));
>  }
>
> +/**
> + * Generate a ir_call to a function with a set of parameters
> + *
> + * The input \c params can either be a list of \c ir_variable or a list of
> + * \c ir_dereference_variable.  In the latter case, all nodes will be removed
> + * from \c params and used directly as the parameters to the generated
> + * \c ir_call.
> + */
>  ir_call *
>  builtin_builder::call(ir_function *f, ir_variable *ret, exec_list params)
>  {
> exec_list actual_params;
>
> -   foreach_in_list(ir_variable, var, &params) {
> -  actual_params.push_tail(var_ref(var));
> +   foreach_in_list_safe(ir_instruction, ir, &params) {
> +  ir_dereference_variable *d = ir->as_dereference_variable();
> +  if (d != NULL) {
> + d->remove();
> + actual_params.push_tail(d);
> +  } else {
> + ir_variable *var = ir->as_variable();
> + assert(var != NULL);
> + actual_params.push_tail(var_ref(var));
> +  }
> }
>
> ir_function_signature *sig =
> @@ -5292,8 +5308,34 @@ builtin_builder::_atomic_counter_op1(const char 
> *intrinsic,
> MAKE_SIG(glsl_type::uint_type, avail, 2, counter, data);
>
> ir_variable *retval = body.make_temp(glsl_type::uint_type, 
> "atomic_retval");
> -   body.emit(call(shader->symbols->get_function(intrinsic), retval,
> -  sig->parameters));
> +
> +   /* Instead of generating an __intrinsic_atomic_sub, generate an
> +* __intrinsic_atomic_add with the data parameter negated.
> +*/
> +   if (strcmp("__intrinsic_atomic_sub", intrinsic) == 0) {
> +  ir_variable *const neg_data =
> + body.make_temp(glsl_type::uint_type, "neg_data");
> +
> +  body.emit(assign(neg_data, neg(data)));
> +
> +  exec_list parameters;
> +
> +  parameters.push_tail(new(mem_ctx) ir_dereference_variable(counter));
> +  parameters.push_tail(new(mem_ctx) ir_dereference_variable(neg_data));

I don't get it ... why change call() to allow taking dereferences and
create them here rather than just feeding in the ir_variables
directly?
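
For context, the substitution the patch relies on (a subtract expressed as an add of the negated operand) holds for 32-bit unsigned values because the arithmetic wraps modulo 2^32. The following is a small Python model of that identity only, not of the GLSL IR builder; MASK32, neg_u32, atomic_add and atomic_sub are illustrative names, and the functions return the new counter value rather than the old one an atomic would return.

```python
MASK32 = 0xFFFFFFFF  # GLSL uint is 32 bits wide


def neg_u32(x):
    """Two's-complement negation of a 32-bit unsigned value."""
    return (-x) & MASK32


def atomic_add(counter, data):
    """Model of a 32-bit wrapping add; returns the new counter value."""
    return (counter + data) & MASK32


def atomic_sub(counter, data):
    """Subtraction expressed as an add of the negated operand."""
    return atomic_add(counter, neg_u32(data))
```

For example, atomic_sub(10, 3) gives 7, and atomic_sub(0, 1) wraps to 0xFFFFFFFF, matching what an unsigned subtract would produce.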

> +
> +  ir_function *const func =
> + shader->symbols->get_function("__intrinsic_atomic_add");
> +  ir_instruction *const c = call(func, retval, parameters);
> +
> +  assert(c != NULL);
> +  assert(parameters.is_empty());
> +
> +  body.emit(c);
> +   } else {
> +  body.emit(call(shader->symbols->get_function(intrinsic), retval,
> + sig->parameters));
> +   }
> +
> body.emit(ret(retval));
> return sig;
>  }
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 197b3af..3320f2a 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -3208,13 +3208,6 @@ 
> glsl_to_tgsi_visitor::visit_atomic_counter_intrinsic(ir_call *ir)
>   val = ((ir_instruction *)param)->as_rvalue();
>   val->accept(this);
>   data2 = this->result;
> -  } else if (!strcmp("__intrinsic_atomic_sub", callee)) {
> - opcode = TGSI_OPCODE_ATOMUADD;
> - st_src_reg res = get_temp(glsl_type::uvec4_type);
> - st_dst_reg dstres = st_dst_reg(res);
> - dstres.writemask = dst.writemask;
> - emit_asm(ir, TGSI_OPCODE_INEG, dstres, data);
> - data = res;
>} else {
>   assert(!"Unexpected intrinsic");
>   return;
> @@ -3625,7 +3618,6 @@ glsl_to_tgsi_visitor::visit(ir_call *ir)
> !strcmp("__intrinsic_atomic_increment", callee) ||
> !strcmp("__intrinsic_atomic_predecrement", callee) ||
> !strcmp("__intrinsic_atomic_add", callee) ||
> -   !strcmp("__intrinsic_atomic_sub", callee) ||
> !strcmp("__intrinsic_atomic_min", callee) ||
> !strcmp("__intrinsic_atomic_max", callee) ||
> !strcmp("__intrinsic_atomic_and", callee) ||
> --
> 2.5.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

Re: [Mesa-dev] [PATCH 1/4] glx: Call __glXInitVertexArrayState() with a usable gc.

2016-07-07 Thread Matt Turner
On Thu, Jul 7, 2016 at 2:34 PM, Ian Romanick  wrote:
> On 07/07/2016 09:44 AM, Matt Turner wrote:
>> On Wed, Jun 29, 2016 at 2:16 PM, Ian Romanick  wrote:
>>> On 06/29/2016 02:04 AM, Colin McDonald wrote:
>>>> I'm not familiar with the code, other than diving in to fix these
>>>> indirect multi-texture problems, so you will know much more about it
>>>> than me.
>>>>
>>>> But, my understanding is that __glXInitVertexArrayState needs info
>>>> from the server, obtained by calls to __indirect_glGetString &
>>>> __indirect_glGetIntegerv. Those routines need the current context
>>>> from __glXGetCurrentContext, so __glXSetCurrentContext(gc) must have
>>>> been called first.
>>>>
>>>> I see your point about a "layering violation".  I think that to avoid
>>>> that would require a more substantial restructuring, so that the
>>>> indirect layer can run some initialisation code (ie
>>>> __glXInitVertexArrayState or similar) separate from the bind
>>>> callback, once a usable context has been setup.
>>>
>>> Maybe...  *If* __glXGetCurrentContext is the only problem, then I think
>>> a small refactor of __indirect_glGetString could also solve the problem.
>>>  Just make a new function
>>>
>>> const GLubyte *do_GetString(Display *dpy, struct glx_context *gc,
>>> GLenum name);
>>>
>>> that both __indirect_glGetString and indirect_bind_context call.  It
>>> might even be worth folding the contents of __glXGetString into the new
>>> function... though that's probably a follow-up patch.
>>
>> I tried that (see attached p.patch)... and I get another segfault.
>
> I think it should be possible to elide the __glXFlushRenderBuffer call.
> Since the context is being made current, its buffer of rendering
> commands must be empty.  Does the attached patch help?

Thanks. That gets us past that problem... and on to the next!

Fun, fun. __glXInitVertexArrayState(gc) contains

   if (__glExtensionBitIsEnabled(gc, GL_ARB_vertex_program_bit)) {
  __indirect_glGetProgramivARB(GL_VERTEX_PROGRAM_ARB,
   GL_MAX_PROGRAM_ATTRIBS_ARB,
   _program_attribs);
   }

Notice __glExtensionBitIsEnabled takes an explicit gc, passed into
__glXInitVertexArrayState. __indirect_glGetProgramivARB on the other
hand calls __glXGetCurrentContext and gets the dummyContext. I've
plumbed gc into a do_GetProgramivARB, and I see a similar call to
__indirect_glGetIntegerv that has the same problem. That looks like
the only other one at least.
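
The mismatch described above can be modelled in a few lines of Python. This is only a sketch of the pattern, not of the GLX code: Context, DUMMY, get_string_implicit, do_get_string and bind_context are hypothetical stand-ins. A helper that reads the implicit "current" context sees the dummy during bind, while one that is handed the context explicitly sees the real one.

```python
class Context:
    """Hypothetical stand-in for a GLX context and its server-side strings."""

    def __init__(self, name, server_strings=None):
        self.name = name
        self.server_strings = server_strings or {}


DUMMY = Context("dummy")
_current = DUMMY  # plays the role of whatever the "get current" call returns


def get_string_implicit(key):
    """Looks the value up through the implicit current context."""
    return _current.server_strings.get(key)


def do_get_string(ctx, key):
    """Same query, but the context is plumbed in explicitly."""
    return ctx.server_strings.get(key)


def bind_context(ctx):
    """Run initialisation queries first, then make ctx current."""
    global _current
    seen_implicit = get_string_implicit("GL_VERSION")  # still the dummy
    seen_explicit = do_get_string(ctx, "GL_VERSION")   # the real context
    _current = ctx
    return seen_implicit, seen_explicit
```

Binding a context whose GL_VERSION is "1.4" returns (None, "1.4"): every query reached during initialisation has to take the explicit path, which is why each additional implicit call shows up as a new failure.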

__indirect_glGetProgramivARB is code-gen'd, so I just want to add
'handcode="client"' to GetProgramivARB's XML in gl_API.xml. That in
turn necessitates adding a dpy parameter to
__glXInitVertexArrayState().

Ultimately, we segfault in __glXSetupVendorRequest() because it tries
to get dpy from gc->currentDpy.

Program received signal SIGSEGV, Segmentation fault.
0x77667eb9 in __glXSetupVendorRequest (gc=0x662080, code=17,
vop=1307, cmdlen=8) at indirect.c:191
191LockDisplay(dpy);
(gdb) p dpy
$40 = (Display * const) 0x0
(gdb) bt
#0  0x77667eb9 in __glXSetupVendorRequest (gc=0x662080,
code=17, vop=1307, cmdlen=8) at indirect.c:191
#1  0x7769aacb in do_GetProgramivARB (dpy=0x64fc60,
gc=0x662080, target=34336, pname=34989, params=0x7fffdab4) at
../../../mesa/src/glx/single2.c:537
#2  0x77691944 in __glXInitVertexArrayState (dpy=0x64fc60,
gc=0x662080) at ../../../mesa/src/glx/indirect_vertex_array.c:198
#3  0x77688c97 in indirect_bind_context (gc=0x662080,
old=0x778cf6c0 , draw=27262978, read=27262978) at
../../../mesa/src/glx/indirect_glx.c:160
#4  0x7766278a in MakeContextCurrent (dpy=0x64fc60,
draw=27262978, read=27262978, gc_user=0x662080) at
../../../mesa/src/glx/glxcurrent.c:228
#5  0x770c1af0 in fgPlatformOpenWindow () from /usr/lib64/libglut.so.3
#6  0x770bbb06 in fgOpenWindow () from /usr/lib64/libglut.so.3
#7  0x770ba42b in fgCreateWindow () from /usr/lib64/libglut.so.3
#8  0x770bbc00 in glutCreateWindow () from /usr/lib64/libglut.so.3
#9  0x004022c8 in main (argc=1, argv=0x7fffe0f8) at arbvparray.c:294

I'm not sure this is tractable.
diff --git a/src/glx/glxclient.h b/src/glx/glxclient.h
index ed57a29..224537a 100644
--- a/src/glx/glxclient.h
+++ b/src/glx/glxclient.h
@@ -733,7 +733,7 @@ extern void __glEmptyImage(struct glx_context *, GLint, GLint, GLint, GLint, GLe
 /*
 ** Allocate and Initialize Vertex Array client state, and free.
 */
-extern void __glXInitVertexArrayState(struct glx_context *);
+extern void __glXInitVertexArrayState(Display *dpy, struct glx_context *);
 extern void __glXFreeVertexArrayState(struct glx_context *);
 
 /*
diff --git a/src/glx/glxext.c b/src/glx/glxext.c
index dc87fb9..462b89a 100644
--- a/src/glx/glxext.c
+++ b/src/glx/glxext.c
@@ -984,10 +984,11 @@ _X_HIDDEN GLubyte *
 __glXFlushRenderBuffer(struct glx_context * ctx, GLubyte * pc)
 {
Display 

[Mesa-dev] [PATCH 1/2] anv/dump: Fix vkCmdPipelineBarrier flags

2016-07-07 Thread Chad Versace
'true' is not valid for VkDependencyFlags.
---
 src/intel/vulkan/anv_dump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_dump.c b/src/intel/vulkan/anv_dump.c
index 1dc5079..49a5ae2 100644
--- a/src/intel/vulkan/anv_dump.c
+++ b/src/intel/vulkan/anv_dump.c
@@ -155,7 +155,7 @@ dump_image_do_blit(struct anv_device *device, struct 
dump_image *image,
ANV_CALL(CmdPipelineBarrier)(anv_cmd_buffer_to_handle(cmd_buffer),
   VK_PIPELINE_STAGE_TRANSFER_BIT,
   VK_PIPELINE_STAGE_TRANSFER_BIT,
-  true, 0, NULL, 0, NULL, 1,
+  0, 0, NULL, 0, NULL, 1,
   &(VkImageMemoryBarrier) {
  .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
  .srcAccessMask = VK_ACCESS_HOST_READ_BIT,
-- 
2.9.0.rc2
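
A note on why the old argument was misleading rather than harmless: a C boolean true converts to the integer 1, and in the Vulkan headers VK_DEPENDENCY_BY_REGION_BIT is 0x1. The Python sketch below mirrors only that conversion; DependencyFlags and describe are illustrative names, not part of any Vulkan binding.

```python
from enum import IntFlag


class DependencyFlags(IntFlag):
    # Value as defined in the Vulkan headers for VK_DEPENDENCY_BY_REGION_BIT.
    BY_REGION = 0x1


def describe(flags):
    """Interpret an integer (or boolean) as a set of dependency flags."""
    return DependencyFlags(int(flags))
```

Here describe(True) yields DependencyFlags.BY_REGION, so the old call appears to have requested a by-region dependency by accident, while describe(0) yields the empty flag set the patch switches to.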



[Mesa-dev] [PATCH 2/2] anv/dump: Fix post-blit memory barrier

2016-07-07 Thread Chad Versace
Swap srcAccessMask and dstAccessMask.
---
 src/intel/vulkan/anv_dump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_dump.c b/src/intel/vulkan/anv_dump.c
index 49a5ae2..4a5a44f 100644
--- a/src/intel/vulkan/anv_dump.c
+++ b/src/intel/vulkan/anv_dump.c
@@ -158,8 +158,8 @@ dump_image_do_blit(struct anv_device *device, struct 
dump_image *image,
   0, 0, NULL, 0, NULL, 1,
   &(VkImageMemoryBarrier) {
  .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
- .srcAccessMask = VK_ACCESS_HOST_READ_BIT,
- .dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
+ .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
+ .dstAccessMask = VK_ACCESS_HOST_READ_BIT,
  .oldLayout = VK_IMAGE_LAYOUT_GENERAL,
  .newLayout = VK_IMAGE_LAYOUT_GENERAL,
  .srcQueueFamilyIndex = 0,
-- 
2.9.0.rc2



Re: [Mesa-dev] [PATCH 5/5] anv/dump: Add support for dumping framebuffers

2016-07-07 Thread Chad Versace
On Fri 17 Jun 2016, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/anv_dump.c| 127 
> +
>  src/intel/vulkan/anv_private.h |  10 +++
>  src/intel/vulkan/genX_cmd_buffer.c |   4 ++
>  3 files changed, 141 insertions(+)

This patch looks very helpful.

> diff --git a/src/intel/vulkan/anv_dump.c b/src/intel/vulkan/anv_dump.c
> index 59a6f2a..1dc5079 100644
> --- a/src/intel/vulkan/anv_dump.c
> +++ b/src/intel/vulkan/anv_dump.c
> @@ -23,11 +23,16 @@
>  
>  #include "anv_private.h"
>  
> +#include "util/list.h"
> +#include "util/ralloc.h"
> +
>  /* This file contains utility functions for help debugging.  They can be
>   * called from GDB or similar to help inspect images and buffers.
>   */

I'd like to see the instructions from your cover letter here at the top
of the file.

>  
>  struct dump_image {
> +   struct list_head link;
> +
> const char *filename;
>  
> VkExtent2D extent;



> +  uint32_t b;
> +  for_each_bit(b, iview->image->aspects) {
> + VkImageAspectFlagBits aspect = (1 << b);
> + char suffix;
> + switch (aspect) {
> + case VK_IMAGE_ASPECT_COLOR_BIT: suffix = 'c'; break;
> + case VK_IMAGE_ASPECT_DEPTH_BIT: suffix = 'd'; break;
> + case VK_IMAGE_ASPECT_STENCIL_BIT:   suffix = 's'; break;
> + default:
> +unreachable("Invalid aspect");
> + }
> +
> + char *filename = ralloc_asprintf(dump_ctx, 
> "framebuffer%04d-%d%c.ppm",
> +  dump_idx, i, suffix);
> +
> + dump_add_image(cmd_buffer, (struct anv_image *)iview->image, aspect,
> +iview->base_mip, iview->base_layer, filename);

The cast is unneeded. The type is already (struct anv_image *).

Drop the cast and this is
Reviewed-by: Chad Versace 


Re: [Mesa-dev] [PATCH 05/11] glsl: Replace the linear search in get_intrinsic_opcode with a radix trie

2016-07-07 Thread Dylan Baker
Quoting Ian Romanick (2016-07-05 17:46:13)
> From: Ian Romanick 
> 
> If there is a way to do this cleanly in mako, I'm very interested to
> hear about it.
> 
>    text    data     bss     dec     hex filename
> 7529003  273096   28584 7830683  777c9b /tmp/i965_dri-64bit-before.so
> 7528883  273096   28584 7830563  777c23 /tmp/i965_dri-64bit-after.so
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/glsl/nir_intrinsic_map.py | 131 
> ++---
>  1 file changed, 119 insertions(+), 12 deletions(-)
> 
> diff --git a/src/compiler/glsl/nir_intrinsic_map.py 
> b/src/compiler/glsl/nir_intrinsic_map.py
> index 7f13c6c..5962d4b 100644
> --- a/src/compiler/glsl/nir_intrinsic_map.py
> +++ b/src/compiler/glsl/nir_intrinsic_map.py
> @@ -66,6 +66,123 @@ intrinsics = [("__intrinsic_atomic_read", 
> ("nir_intrinsic_atomic_counter_read_va
>("__intrinsic_atomic_exchange_shared", 
> ("nir_intrinsic_shared_atomic_exchange", None)),
>("__intrinsic_atomic_comp_swap_shared", 
> ("nir_intrinsic_shared_atomic_comp_swap", None))]
>  
> +def remove_prefix(table, prefix_length):
> +"""Strip prefix_length characters off the name of each entry in table."""
> +
> +return [(s[prefix_length:], d) for (s, d) in table]
> +
> +
> +def generate_trie(table):
> +"""table is a list of (string, data) tuples.  It is assumed to be sorted 
> by
> +string.
> +
> +A radix trie (or compact prefix trie) is recursively generated from the
> +list of names.  Names are partitioned into groups that have at least
> +prefix_thresh (tunable parameter) common prefix characters.  Each of 
> these
> +groups becomes the branches at the current level of the tree.  The
> +matching prefix characters from each group is removed, and the group is
> +recursively operated on in the same fashion.
> +
> +The recursion terminates when no groups can be formed with at least
> +prefix_thresh matching characters.
> +
> +Each node in the trie is a 3-element tuple:
> +
> +(prefix_string, [child_nodes], client_data)
> +
> +One of [child_nodes] or client_data will be None.
> +
> +See https://en.wikipedia.org/wiki/Radix_tree for more background details
> +on the data structure.
> +
> +"""
> +
> +# Threshold for considering two strings to have the same prefix.
> +prefix_thresh = 1
> +
> +if len(table) == 1 and table[0][0] == "":
> +return [("", None, table[0][1])]
> +
> +trie_level = []
> +
> +(s, d) = table[0]
> +candidates = [(s, d)]
> +base = s
> +prefix_length = len(s)
> +
> +for (s, d) in table[1:]:
> +if s[:prefix_thresh] == base[:prefix_thresh]:
> +candidates.append((s, d))
> +
> +l = len(s[:([x[0]==x[1] for x in zip(s, base)]+[0]).index(0)])
> +if l < prefix_length:
> +prefix_length = l
> +else:
> +trie_level.append((base[:prefix_length], 
> generate_trie(remove_prefix(candidates, prefix_length)), None))
> +
> +candidates = [(s, d)]
> +base = s
> +prefix_length = len(s)
> +
> +trie_level.append((base[:prefix_length], 
> generate_trie(remove_prefix(candidates, prefix_length)), None))
> +
> +return trie_level
> +
> +
> +def emit_trie_leaf(indent, d):
> +if d[1] is None:
> +return "{}return {};\n".format(indent, d[0])
> +else:
> +c_code = "{}int_op = {};\n".format(indent, d[0])
> +c_code += "{}uint_op = {};\n".format(indent, d[1])
> +return c_code
> +
> +
> +def trie_as_C_code(trie, indent="   ", prefix_string="__intrinsic_"):
> +conditional = "if"
> +
> +c_code = ""
> +for (s, t, d) in trie:
> +if d is not None:
> +c_code +=  "{}{} (name[0] == '\\0') {{\n".format(indent, 
> conditional)
> +c_code += "{}   /* {} */\n".format(indent, prefix_string)
> +c_code += emit_trie_leaf(indent + "   ", d);
> +
> +else:
> +# Before emitting the string comparison, check to see if the
> +# subtree has a single element with an empty string.  In that
> +# case, use strcmp() instead of strncmp() and don't advance the
> +# name pointer.
> +
> +if len(t) == 1 and t[0][2] is not None:
> +if s == "":
> +c_code += "{}{} (name[0] == '\\0') {{\n".format(indent, 
> conditional, s)
> +else:
> +c_code += "{}{} (strcmp(name, \"{}\") == 0) 
> {{\n".format(indent, conditional, s)
> +
> +c_code += "{}   /* {} */\n".format(indent, prefix_string + s)
> +c_code += emit_trie_leaf(indent + "   ", t[0][2]);
> +else:
> +c_code += "{}{} (strncmp(name, \"{}\", {}) == 0) 
> {{\n".format(indent, conditional, s, len(s))
> +c_code += "{}   name += 

[Mesa-dev] [PATCH v2 1/5] swr: [rasterizer] add support for llvm-3.9

2016-07-07 Thread Tim Rowley
v2: use signed compare, remove unneeded vmask
---
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 38 --
 .../jitter/scripts/gen_llvm_ir_macros.py   |  5 ---
 2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index 671178f..da77f60 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -700,20 +700,22 @@ Value *Builder::PSHUFB(Value* a, Value* b)
 /// lower 8 values are used.
 Value *Builder::PMOVSXBD(Value* a)
 {
-Value* res;
+// llvm-3.9 removed the pmovsxbd intrinsic
+#if HAVE_LLVM < 0x309
 // use avx2 byte sign extend instruction if available
 if(JM()->mArch.AVX2())
 {
-res = VPMOVSXBD(a);
+Function *pmovsxbd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
Intrinsic::x86_avx2_pmovsxbd);
+return CALL(pmovsxbd, std::initializer_list<Value*>{a});
 }
 else
+#endif
 {
 // VPMOVSXBD output type
 Type* v8x32Ty = VectorType::get(mInt32Ty, 8);
 // Extract 8 values from 128bit lane and sign extend
-res = S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), v8x32Ty);
+return S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
v8x32Ty);
 }
-return res;
 }
 
 //
@@ -722,20 +724,22 @@ Value *Builder::PMOVSXBD(Value* a)
 /// @param a - 128bit SIMD lane(8x16bit) of 16bit integer values.
 Value *Builder::PMOVSXWD(Value* a)
 {
-Value* res;
+// llvm-3.9 removed the pmovsxwd intrinsic
+#if HAVE_LLVM < 0x309
 // use avx2 word sign extend if available
 if(JM()->mArch.AVX2())
 {
-res = VPMOVSXWD(a);
+Function *pmovsxwd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
Intrinsic::x86_avx2_pmovsxwd);
+return CALL(pmovsxwd, std::initializer_list<Value*>{a});
 }
 else
+#endif
 {
 // VPMOVSXWD output type
 Type* v8x32Ty = VectorType::get(mInt32Ty, 8);
 // Extract 8 values from 128bit lane and sign extend
-res = S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), v8x32Ty);
+return S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
v8x32Ty);
 }
-return res;
 }
 
 //
@@ -875,9 +879,15 @@ Value *Builder::CVTPS2PH(Value* a, Value* rounding)
 
 Value *Builder::PMAXSD(Value* a, Value* b)
 {
+// llvm-3.9 removed the pmax intrinsics
+#if HAVE_LLVM >= 0x309
+Value* cmp = ICMP_SGT(a, b);
+return SELECT(cmp, a, b);
+#else
 if (JM()->mArch.AVX2())
 {
-return VPMAXSD(a, b);
+Function* pmaxsd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
Intrinsic::x86_avx2_pmaxs_d);
+return CALL(pmaxsd, {a, b});
 }
 else
 {
@@ -900,13 +910,20 @@ Value *Builder::PMAXSD(Value* a, Value* b)
 
 return result;
 }
+#endif
 }
 
 Value *Builder::PMINSD(Value* a, Value* b)
 {
+// llvm-3.9 removed the pmin intrinsics
+#if HAVE_LLVM >= 0x309
+Value* cmp = ICMP_SLT(a, b);
+return SELECT(cmp, a, b);
+#else
 if (JM()->mArch.AVX2())
 {
-return VPMINSD(a, b);
+Function* pminsd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
Intrinsic::x86_avx2_pmins_d);
+return CALL(pminsd, {a, b});
 }
 else
 {
@@ -929,6 +946,7 @@ Value *Builder::PMINSD(Value* a, Value* b)
 
 return result;
 }
+#endif
 }
 
 void Builder::Gather4(const SWR_FORMAT format, Value* pSrcBase, Value* 
byteOffsets, 
diff --git 
a/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py 
b/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
index 4963c5e..234889b 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
+++ b/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
@@ -91,8 +91,6 @@ intrinsics = [
 ["VRCPPS", "x86_avx_rcp_ps_256", ["a"]],
 ["VMINPS", "x86_avx_min_ps_256", ["a", "b"]],
 ["VMAXPS", "x86_avx_max_ps_256", ["a", "b"]],
-["VPMINSD", "x86_avx2_pmins_d", ["a", "b"]],
-["VPMAXSD", "x86_avx2_pmaxs_d", ["a", "b"]],
 ["VROUND", "x86_avx_round_ps_256", ["a", "rounding"]],
 ["VCMPPS", "x86_avx_cmp_ps_256", ["a", "b", "cmpop"]],
 ["VBLENDVPS", "x86_avx_blendv_ps_256", ["a", "b", "mask"]],
@@ -100,8 +98,6 @@ intrinsics = [
 ["VMASKLOADD", "x86_avx2_maskload_d_256", ["src", "mask"]],
 ["VMASKMOVPS", "x86_avx_maskload_ps_256", ["src", "mask"]],
 ["VPSHUFB", "x86_avx2_pshuf_b", ["a", "b"]],
-["VPMOVSXBD", "x86_avx2_pmovsxbd", ["a"]],  # sign extend packed 8bit 
components
-["VPMOVSXWD", "x86_avx2_pmovsxwd", ["a"]],  # sign extend packed 16bit 
components
 ["VPERMD", 

Re: [Mesa-dev] [PATCH 1/5] swr: [rasterizer] add support for llvm-3.9

2016-07-07 Thread Rowley, Timothy O

> On Jul 6, 2016, at 7:32 PM, Roland Scheidegger  wrote:
> 
> Am 06.07.2016 um 23:51 schrieb Tim Rowley:
>> ---
>> .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 38 
>> --
>> .../jitter/scripts/gen_llvm_ir_macros.py   |  5 ---
>> 2 files changed, 28 insertions(+), 15 deletions(-)
>> 
>> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp 
>> b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
>> index 671178f..b23a10d 100644
>> --- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
>> +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
>> @@ -700,20 +700,22 @@ Value *Builder::PSHUFB(Value* a, Value* b)
>> /// lower 8 values are used.
>> Value *Builder::PMOVSXBD(Value* a)
>> {
>> -Value* res;
>> +// llvm-3.9 removed the pmovsxbd intrinsic
>> +#if HAVE_LLVM < 0x309
>> // use avx2 byte sign extend instruction if available
>> if(JM()->mArch.AVX2())
>> {
>> -res = VPMOVSXBD(a);
>> +Function *pmovsxbd = 
>> Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmovsxbd);
>> +return CALL(pmovsxbd, std::initializer_list<Value*>{a});
>> }
>> else
>> +#endif
>> {
>> // VPMOVSXBD output type
>> Type* v8x32Ty = VectorType::get(mInt32Ty, 8);
>> // Extract 8 values from 128bit lane and sign extend
>> -res = S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> +return S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> }
>> -return res;
>> }
>> 
>> //
>> @@ -722,20 +724,22 @@ Value *Builder::PMOVSXBD(Value* a)
>> /// @param a - 128bit SIMD lane(8x16bit) of 16bit integer values.
>> Value *Builder::PMOVSXWD(Value* a)
>> {
>> -Value* res;
>> +// llvm-3.9 removed the pmovsxwd intrinsic
>> +#if HAVE_LLVM < 0x309
>> // use avx2 word sign extend if available
>> if(JM()->mArch.AVX2())
>> {
>> -res = VPMOVSXWD(a);
>> +Function *pmovsxwd = 
>> Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmovsxwd);
>> +return CALL(pmovsxwd, std::initializer_list<Value*>{a});
>> }
>> else
>> +#endif
>> {
>> // VPMOVSXWD output type
>> Type* v8x32Ty = VectorType::get(mInt32Ty, 8);
>> // Extract 8 values from 128bit lane and sign extend
>> -res = S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> +return S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> }
>> -return res;
>> }
>> 
>> //
>> @@ -875,9 +879,15 @@ Value *Builder::CVTPS2PH(Value* a, Value* rounding)
>> 
>> Value *Builder::PMAXSD(Value* a, Value* b)
>> {
>> +// llvm-3.9 removed the pmax intrinsics
>> +#if HAVE_LLVM >= 0x309
>> +Value* cmp = ICMP_UGT(a, b);
>> +return SELECT(VMASK(cmp), a, b);
>> +#else
>> if (JM()->mArch.AVX2())
>> {
>> -return VPMAXSD(a, b);
>> +Function* pmaxsd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmaxs_d);
>> +return CALL(pmaxsd, {a, b});
>> }
>> else
>> {
>> @@ -900,13 +910,20 @@ Value *Builder::PMAXSD(Value* a, Value* b)
>> 
>> return result;
>> }
>> +#endif
>> }
>> 
>> Value *Builder::PMINSD(Value* a, Value* b)
>> {
>> +// llvm-3.9 removed the pmin intrinsics
>> +#if HAVE_LLVM >= 0x309
>> +Value* cmp = ICMP_ULT(a, b);
>> +return SELECT(VMASK(cmp), a, b);
>> +#else
> Yep, had to deal with that in gallivm as well...
> That said, these were signed min/max here. I think you wanted to use
> ICMP_SLT/ICMP_SGT…

llvm developers do seem intent on pruning the list of x86 intrinsics.  Thanks 
for spotting the mistake - updated patch coming.
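
To illustrate the difference Roland points out, here is a small Python model of a compare-and-select max on 32-bit lanes, with each lane held as a raw bit pattern. The helper names as_signed32, max_signed and max_unsigned are made up for this sketch and do not correspond to builder methods.

```python
def as_signed32(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def max_signed(a, b):
    """Select a when a > b under a signed comparison, else b."""
    return a if as_signed32(a) > as_signed32(b) else b


def max_unsigned(a, b):
    """Select a when a > b under an unsigned comparison, else b."""
    return a if (a & 0xFFFFFFFF) > (b & 0xFFFFFFFF) else b
```

The two agree for non-negative inputs but diverge as soon as a lane is negative: for the patterns 0xFFFFFFFF (that is, -1) and 1, max_signed picks 1 while max_unsigned picks 0xFFFFFFFF.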

-Tim

> Roland
> 
> 
> 
> 
>> if (JM()->mArch.AVX2())
>> {
>> -return VPMINSD(a, b);
>> +Function* pminsd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmins_d);
>> +return CALL(pminsd, {a, b});
>> }
>> else
>> {
>> @@ -929,6 +946,7 @@ Value *Builder::PMINSD(Value* a, Value* b)
>> 
>> return result;
>> }
>> +#endif
>> }
>> 
>> void Builder::Gather4(const SWR_FORMAT format, Value* pSrcBase, Value* 
>> byteOffsets, 
>> diff --git 
>> a/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py 
>> b/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
>> index 4963c5e..234889b 100644
>> --- a/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
>> +++ b/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
>> @@ -91,8 +91,6 @@ intrinsics = [
>> ["VRCPPS", "x86_avx_rcp_ps_256", ["a"]],
>> ["VMINPS", "x86_avx_min_ps_256", ["a", "b"]],
>> ["VMAXPS", "x86_avx_max_ps_256", ["a", "b"]],
>> -  

Re: [Mesa-dev] [PATCH 1/6] i965/fs: add a helper function to create double immediates

2016-07-07 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> Gen7 hardware does not support double immediates so these need
> to be moved in 32-bit chunks to a regular vgrf instead. Instead
> of doing this every time we need to create a DF immediate,
> create a helper function that does the right thing depending
> on the hardware generation.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  2 ++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 43 
> 
>  2 files changed, 45 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index 4237197..dd7ce7d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -167,6 +167,8 @@ public:
> bool lower_simd_width();
> bool opt_combine_constants();
>  
> +   fs_reg setup_imm_df(double v);
> +
> void emit_dummy_fs();
> void emit_repclear_shader();
> fs_reg *emit_fragcoord_interpolation();
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index b3f5dfd..268c847 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -616,6 +616,49 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
> *instr,
> return true;
>  }
>  
> +fs_reg
> +fs_visitor::setup_imm_df(double v)

Mainly nitpicking here, but because this function only needs an i965 IR
builder and doesn't otherwise care about the fs_visitor class, it would
make more sense for it to be a stand-alone function independent from
fs_visitor taking an fs_builder as argument instead.

> +{
> +   assert(devinfo->gen >= 7);
> +
> +   if (devinfo->gen >= 8)
> +  return brw_imm_df(v);
> +
> +   /* gen7 does not support DF immediates, so we generate a 64-bit constant 
> by
> +* writing the low 32-bit of the constant to suboffset 0 of a VGRF and
> +* the high 32-bit to suboffset 4 and then applying a stride of 0.
> +*
> +* Alternatively, we could also produce a normal VGRF (without stride 0)
> +* by writing to all the channels in the VGRF, however, that would hit the
> +* gen7 bug where we have to split writes that span more than 1 register
> +* into instructions with a width of 4 (otherwise the write to the second
> +* register written runs into an execmask hardware bug) which isn't very
> +* nice.
> +*/
> +   union {
> +  double d;
> +  struct {
> + uint32_t i1;
> + uint32_t i2;
> +  };
> +   } di;
> +
> +   di.d = v;
> +

You can declare a scalar builder here for convenience like:

| const fs_builder ubld = bld.exec_all().group(1, 0);

then use it below instead of 'bld' so you can get rid of the six inst
field assignments.

> +   fs_reg tmp = vgrf(glsl_type::uint_type);

On e.g. SIMD32 mode this will allocate 32 components worth of registers
even though you only need two.  Once you have a scalar builder at hand
you can do it as follows instead:

| const fs_reg tmp = ubld.vgrf(BRW_REGISTER_TYPE_UD, 2);

Other than that looks okay to me.
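
As a side illustration of what the gen7 path in the quoted patch does with the union, the Python sketch below splits the IEEE-754 encoding of a double into its low and high 32-bit words and reassembles it. It assumes little-endian layout, as on x86, where the first union member would hold the low word; split_double and join_double are hypothetical helper names.

```python
import struct


def split_double(v):
    """Return (low, high) 32-bit words of the little-endian encoding of v."""
    low, high = struct.unpack("<II", struct.pack("<d", v))
    return low, high


def join_double(low, high):
    """Rebuild the double from its low and high 32-bit words."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]
```

For 1.0, whose encoding is 0x3FF0000000000000, the split gives a low word of 0 and a high word of 0x3FF00000. Those are the two values the pair of 32-bit MOVs would write to suboffsets 0 and 4.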

> +   fs_inst *inst = bld.MOV(tmp, brw_imm_ud(di.i1));
> +   inst->force_writemask_all = true;
> +   inst->exec_size = 1;
> +   inst->regs_written = 1;
> +
> +   inst = bld.MOV(horiz_offset(tmp, 1), brw_imm_ud(di.i2));
> +   inst->force_writemask_all = true;
> +   inst->exec_size = 1;
> +   inst->regs_written = 1;
> +
> +   return component(retype(tmp, BRW_REGISTER_TYPE_DF), 0);
> +}
> +
>  void
>  fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>  {
> -- 
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev




Re: [Mesa-dev] [PATCH 4/5] anv/dump: Add a barrier for the source image

2016-07-07 Thread Chad Versace
On Fri 17 Jun 2016, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/anv_dump.c | 22 ++
>  1 file changed, 22 insertions(+)

Patch 4 is
Reviewed-by: Chad Versace 





Re: [Mesa-dev] [PATCH 3/5] anv/dump: Refactor the guts into helpers

2016-07-07 Thread Chad Versace
On Fri 17 Jun 2016, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/anv_dump.c | 224 
> +++-
>  1 file changed, 139 insertions(+), 85 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_dump.c b/src/intel/vulkan/anv_dump.c
> index ffb892c..0fee93c 100644
> --- a/src/intel/vulkan/anv_dump.c
> +++ b/src/intel/vulkan/anv_dump.c

Patch 3 is
Reviewed-by: Chad Versace 



Re: [Mesa-dev] [PATCH 5/6] i965/fs: do pack lowering before simd splitting

2016-07-07 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> So that we can have gen7 split large writes produced by the pack lowering.

Reviewed-by: Francisco Jerez 

> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index caf88d1..0d4eb51 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -5830,6 +5830,11 @@ fs_visitor::optimize()
> progress = false;
> pass_num = 0;
>  
> +   if (OPT(lower_pack)) {
> +  OPT(register_coalesce);
> +  OPT(dead_code_eliminate);
> +   }
> +
> OPT(lower_simd_width);
>  
> /* After SIMD lowering just in case we had to unroll the EOT send. */
> @@ -5866,11 +5871,6 @@ fs_visitor::optimize()
>OPT(dead_code_eliminate);
> }
>  
> -   if (OPT(lower_pack)) {
> -  OPT(register_coalesce);
> -  OPT(dead_code_eliminate);
> -   }
> -
> if (OPT(lower_d2x)) {
>OPT(opt_copy_propagate);
>OPT(dead_code_eliminate);
> -- 
> 2.7.4
>




Re: [Mesa-dev] [PATCH 4/4] radeonsi: disable multi-threading when shader dumps are enabled

2016-07-07 Thread Marek Olšák
With that typo in patch 1 fixed, the series is:

Reviewed-by: Marek Olšák 

Marek

On Thu, Jul 7, 2016 at 9:39 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> Otherwise, shader dumps can become interleaved and unusable.
> ---
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index 94587b2..c24130d 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -1325,6 +1325,7 @@ static void *si_create_shader_selector(struct 
> pipe_context *ctx,
> util_queue_fence_init(>ready);
>
> if ((sctx->b.debug.debug_message && !sctx->b.debug.async) ||
> +   r600_can_dump_shader(>b, sel->info.processor) ||
> !util_queue_is_initialized(>shader_compiler_queue))
> si_init_shader_selector_async(sel, -1);
> else
> --
> 2.7.4
>


Re: [Mesa-dev] [PATCH 3/4] radeonsi: use multi-threaded compilation in debug contexts

2016-07-07 Thread Marek Olšák
On Thu, Jul 7, 2016 at 9:39 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> We only have to stay single-threaded when debug output must be synchronous.
> This yields better parallelism in shader-db runs for me.

shader-db should already get the CPU load to 100% for all cores. It
doesn't seem to be possible to get better parallelism than that.

Marek


Re: [Mesa-dev] [PATCH] mesa: Mark R*32F formats as filterable when an extension is present.

2016-07-07 Thread Ilia Mirkin
Reviewed-by: Ilia Mirkin 

On Thu, Jul 7, 2016 at 5:36 PM, Kenneth Graunke  wrote:
> GL_OES_texture_float_linear marks R32F, RG32F, RGB32F, and RGBA32F
> as texture filterable.
>
> Fixes glGenerateMipmap GL errors when visiting a WebGL demo in Chromium:
> http://www.iamnop.com/particles
>
> Cc: Matt Atwood 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/main/genmipmap.c |  2 +-
>  src/mesa/main/glformats.c | 17 -
>  src/mesa/main/glformats.h |  3 ++-
>  3 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/main/genmipmap.c b/src/mesa/main/genmipmap.c
> index d917220..5e780c9 100644
> --- a/src/mesa/main/genmipmap.c
> +++ b/src/mesa/main/genmipmap.c
> @@ -90,7 +90,7 @@ 
> _mesa_is_valid_generate_texture_mipmap_internalformat(struct gl_context *ctx,
>   internalformat == GL_LUMINANCE_ALPHA ||
>   internalformat == GL_LUMINANCE || internalformat == GL_ALPHA ||
>   (_mesa_is_es3_color_renderable(internalformat) &&
> -  _mesa_is_es3_texture_filterable(internalformat));
> +  _mesa_is_es3_texture_filterable(ctx, internalformat));
> }
>
> return (!_mesa_is_enum_format_integer(internalformat) &&
> diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
> index 24ce7b0..448577e 100644
> --- a/src/mesa/main/glformats.c
> +++ b/src/mesa/main/glformats.c
> @@ -3656,7 +3656,8 @@ _mesa_is_es3_color_renderable(GLenum internal_format)
>   * is marked "Texture Filterable" in Table 8.10 of the ES 3.2 specification.
>   */
>  bool
> -_mesa_is_es3_texture_filterable(GLenum internal_format)
> +_mesa_is_es3_texture_filterable(const struct gl_context *ctx,
> +GLenum internal_format)
>  {
> switch (internal_format) {
> case GL_R8:
> @@ -3680,6 +3681,20 @@ _mesa_is_es3_texture_filterable(GLenum internal_format)
> case GL_R11F_G11F_B10F:
> case GL_RGB9_E5:
>return true;
> +   case GL_R32F:
> +   case GL_RG32F:
> +   case GL_RGB32F:
> +   case GL_RGBA32F:
> +  /* The OES_texture_float_linear spec says:
> +   *
> +   *"When implemented against OpenGL ES 3.0 or later versions, sized
> +   * 32-bit floating-point formats become texture-filterable. This
> +   * should be noted by, for example, checking the ``TF'' column of
> +   * table 8.13 in the ES 3.1 Specification (``Correspondence of 
> sized
> +   * internal formats to base internal formats ... and use cases 
> ...'')
> +   * for the R32F, RG32F, RGB32F, and RGBA32F formats."
> +   */
> +  return ctx->Extensions.OES_texture_float_linear;
> default:
>return false;
> }
> diff --git a/src/mesa/main/glformats.h b/src/mesa/main/glformats.h
> index c73f464..782e0f2 100644
> --- a/src/mesa/main/glformats.h
> +++ b/src/mesa/main/glformats.h
> @@ -149,7 +149,8 @@ extern bool
>  _mesa_is_es3_color_renderable(GLenum internal_format);
>
>  extern bool
> -_mesa_is_es3_texture_filterable(GLenum internal_format);
> +_mesa_is_es3_texture_filterable(const struct gl_context *ctx,
> +GLenum internal_format);
>
>  #ifdef __cplusplus
>  }
> --
> 2.9.0
>


Re: [Mesa-dev] [PATCH 4/6] i965/fs: do not require force_writemask_all with exec_size 4

2016-07-07 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> So far we only used instructions with this size in situations where we
> did not operate per-channel and we wanted to ignore the execution mask,
> but gen7 fp64 will need to emit code with a width of 4 that needs
> normal execution masking.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index d25d26a..07581d2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1649,7 +1649,6 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>brw_set_default_acc_write_control(p, inst->writes_accumulator);
>brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
>  
> -  assert(inst->force_writemask_all || inst->exec_size >= 8);

Another possibility would be to relax the assertion to check that
"inst->force_writemask_all || inst->exec_size >= 4", because you can
only control the channel enable group with nibble granularity at best,
so it's impractical to split instructions into chunks of execution size
less than four.  SIMD4 though definitely makes sense because of FP64.
Either way patch is:

Reviewed-by: Francisco Jerez 
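
For illustration only, the relaxed check suggested above could be
expressed as a small predicate.  This is a sketch: exec_size_is_valid is
a hypothetical name and the function is a stand-in for the assertion in
the generator, not code from the tree.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the generator's sanity check: anything
 * narrower than four channels must ignore the execution mask, because
 * channel enables can only be selected per nibble. */
static bool
exec_size_is_valid(bool force_writemask_all, unsigned exec_size)
{
   return force_writemask_all || exec_size >= 4;
}
```

With this form a masked SIMD4 instruction passes, while a masked SIMD2
or scalar instruction would still trip the check.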

>assert(inst->force_writemask_all || inst->group % inst->exec_size == 
> 0);
>assert(inst->base_mrf + inst->mlen <= BRW_MAX_MRF(devinfo->gen));
>assert(inst->mlen <= BRW_MAX_MSG_LENGTH);
> -- 
> 2.7.4
>




[Mesa-dev] [PATCH] mesa: Mark R*32F formats as filterable when an extension is present.

2016-07-07 Thread Kenneth Graunke
GL_OES_texture_float_linear marks R32F, RG32F, RGB32F, and RGBA32F
as texture filterable.

Fixes glGenerateMipmap GL errors when visiting a WebGL demo in Chromium:
http://www.iamnop.com/particles

Cc: Matt Atwood 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/main/genmipmap.c |  2 +-
 src/mesa/main/glformats.c | 17 -
 src/mesa/main/glformats.h |  3 ++-
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/genmipmap.c b/src/mesa/main/genmipmap.c
index d917220..5e780c9 100644
--- a/src/mesa/main/genmipmap.c
+++ b/src/mesa/main/genmipmap.c
@@ -90,7 +90,7 @@ _mesa_is_valid_generate_texture_mipmap_internalformat(struct 
gl_context *ctx,
  internalformat == GL_LUMINANCE_ALPHA ||
  internalformat == GL_LUMINANCE || internalformat == GL_ALPHA ||
  (_mesa_is_es3_color_renderable(internalformat) &&
-  _mesa_is_es3_texture_filterable(internalformat));
+  _mesa_is_es3_texture_filterable(ctx, internalformat));
}
 
return (!_mesa_is_enum_format_integer(internalformat) &&
diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
index 24ce7b0..448577e 100644
--- a/src/mesa/main/glformats.c
+++ b/src/mesa/main/glformats.c
@@ -3656,7 +3656,8 @@ _mesa_is_es3_color_renderable(GLenum internal_format)
  * is marked "Texture Filterable" in Table 8.10 of the ES 3.2 specification.
  */
 bool
-_mesa_is_es3_texture_filterable(GLenum internal_format)
+_mesa_is_es3_texture_filterable(const struct gl_context *ctx,
+GLenum internal_format)
 {
switch (internal_format) {
case GL_R8:
@@ -3680,6 +3681,20 @@ _mesa_is_es3_texture_filterable(GLenum internal_format)
case GL_R11F_G11F_B10F:
case GL_RGB9_E5:
   return true;
+   case GL_R32F:
+   case GL_RG32F:
+   case GL_RGB32F:
+   case GL_RGBA32F:
+  /* The OES_texture_float_linear spec says:
+   *
+   *"When implemented against OpenGL ES 3.0 or later versions, sized
+   * 32-bit floating-point formats become texture-filterable. This
+   * should be noted by, for example, checking the ``TF'' column of
+   * table 8.13 in the ES 3.1 Specification (``Correspondence of sized
+   * internal formats to base internal formats ... and use cases ...'')
+   * for the R32F, RG32F, RGB32F, and RGBA32F formats."
+   */
+  return ctx->Extensions.OES_texture_float_linear;
default:
   return false;
}
diff --git a/src/mesa/main/glformats.h b/src/mesa/main/glformats.h
index c73f464..782e0f2 100644
--- a/src/mesa/main/glformats.h
+++ b/src/mesa/main/glformats.h
@@ -149,7 +149,8 @@ extern bool
 _mesa_is_es3_color_renderable(GLenum internal_format);
 
 extern bool
-_mesa_is_es3_texture_filterable(GLenum internal_format);
+_mesa_is_es3_texture_filterable(const struct gl_context *ctx,
+GLenum internal_format);
 
 #ifdef __cplusplus
 }
-- 
2.9.0



Re: [Mesa-dev] [PATCH 1/4] glx: Call __glXInitVertexArrayState() with a usable gc.

2016-07-07 Thread Ian Romanick
On 07/07/2016 09:44 AM, Matt Turner wrote:
> On Wed, Jun 29, 2016 at 2:16 PM, Ian Romanick  wrote:
>> On 06/29/2016 02:04 AM, Colin McDonald wrote:
>>> I'm not familiar with the code, other than diving in to fix these
>>> indirect multi-texture problems, so you will know much more about it
>>> than me.
>>>
>>> But, my understanding is that __glXInitVertexArrayState needs info
>>> from the server, obtained by calls to _indirect_glGetString &
>>> __indirect_glGetIntegerv. Those routines need the current context
>>> from __glXGetCurrentContext, so __glXSetCurrentContext(gc) must have
>>> been called first.
>>>
>>> I see your point about a "layering violation".  I think that to avoid
>>> that would require a more substantial restructuring, so that the
>>> indirect layer can run some initialisation code (ie
>>> __glXInitVertexArrayState or similar) separate from the bind
>>> callback, once a usable context has been setup.
>>
>> Maybe...  *If* __glXGetCurrentContext is the only problem, then I think
>> a small refactor of __indirect_glGetString could also solve the problem.
>>  Just make a new function
>>
>> const GLubyte *do_GetString(Display *dpy, struct glx_context *gc,
>> GLenum name);
>>
>> that both __indirect_glGetString and indirect_bind_context call.  It
>> might even be worth folding the contents of __glXGetString into the new
>> function... though that's probably a follow-up patch.
> 
> I tried that (see attached p.patch)... and I get another segfault.

I think it should be possible to elide the __glXFlushRenderBuffer call.
Since the context is being made current, its buffer of rendering
commands must be empty.  Does the attached patch help?

> (gdb) bt
> #0  0x74a97700 in XGetXCBConnection () from /usr/lib64/libX11-xcb.so.1
> #1  0x77664a3d in __glXFlushRenderBuffer (ctx=0x662080,
> pc=0x77f76010 "") at ../../../mesa/src/glx/glxext.c:987
> #2  0x7769b429 in do_GetString (dpy=0x64fc60, gc=0x662080,
> name=7939) at ../../../mesa/src/glx/single2.c:678
> #3  0x77688d55 in indirect_bind_context (gc=0x662080,
> old=0x778cf6c0 , draw=27262978, read=27262978) at
> ../../../mesa/src/glx/indirect_glx.c:158
> #4  0x7766278a in MakeContextCurrent (dpy=0x64fc60,
> draw=27262978, read=27262978, gc_user=0x662080) at
> ../../../mesa/src/glx/glxcurrent.c:228
> #5  0x770c1af0 in fgPlatformOpenWindow () from /usr/lib64/libglut.so.3
> #6  0x770bbb06 in fgOpenWindow () from /usr/lib64/libglut.so.3
> #7  0x770ba42b in fgCreateWindow () from /usr/lib64/libglut.so.3
> #8  0x770bbc00 in glutCreateWindow () from /usr/lib64/libglut.so.3
> #9  0x004022c8 in main (argc=1, argv=0x7fffe0f8) at 
> arbvparray.c:294
> 
> gc->currentDpy is NULL. Sigh. I don't know what any of this code is doing.



move-XGetXCBConnection-call.patch
Description: application/pgp-keys


Re: [Mesa-dev] [PATCH 02/11] glsl: Replace big pile of hand-written code with a generator

2016-07-07 Thread Dylan Baker
Quoting Ian Romanick (2016-07-05 17:46:10)
> From: Ian Romanick 
> 
> Right now the generator generates nearly identical code.  There is no
> change in the binary size.
> 
>    text    data     bss     dec     hex filename
> 7529283  273096   28584 7830963  777db3 /tmp/i965_dri-64bit-before.so
> 7529283  273096   28584 7830963  777db3 /tmp/i965_dri-64bit-after.so
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/Makefile.glsl.am  |  11 ++-
>  src/compiler/glsl/glsl_to_nir.cpp  | 123 
> +
>  src/compiler/glsl/nir_intrinsic_map.py |  95 +
>  3 files changed, 105 insertions(+), 124 deletions(-)
>  create mode 100644 src/compiler/glsl/nir_intrinsic_map.py
> 
> diff --git a/src/compiler/Makefile.glsl.am b/src/compiler/Makefile.glsl.am
> index 1132aae..af80e60 100644
> --- a/src/compiler/Makefile.glsl.am
> +++ b/src/compiler/Makefile.glsl.am
> @@ -27,6 +27,7 @@ EXTRA_DIST += glsl/tests glsl/glcpp/tests glsl/README \
> glsl/glsl_parser.yy \
> glsl/glcpp/glcpp-lex.l  \
> glsl/glcpp/glcpp-parse.y\
> +   glsl/nir_intrinsic_map.cpp  \
> SConscript.glsl
>  
>  TESTS += glsl/glcpp/tests/glcpp-test   \
> @@ -208,6 +209,10 @@ glsl/glcpp/glcpp-lex.c: glsl/glcpp/glcpp-lex.l
> $(MKDIR_GEN)
> $(LEX_GEN) -o $@ $(srcdir)/glsl/glcpp/glcpp-lex.l
>  
> +glsl/nir_intrinsic_map.cpp: glsl/nir_intrinsic_map.py
> +   $(MKDIR_GEN)
> +   $(PYTHON_GEN) $(srcdir)/glsl/nir_intrinsic_map.py > $@ || ($(RM) $@; 
> false)
> +
>  # Only the parsers (specifically the header files generated at the same time)
>  # need to be in BUILT_SOURCES. Though if we list the parser headers YACC is
>  # called for the .c/.cpp file and the .h files. By listing the .c/.cpp files
> @@ -218,14 +223,16 @@ BUILT_SOURCES +=  \
> glsl/glsl_parser.cpp\
> glsl/glsl_lexer.cpp \
> glsl/glcpp/glcpp-parse.c\
> -   glsl/glcpp/glcpp-lex.c
> +   glsl/glcpp/glcpp-lex.c  \
> +   glsl/nir_intrinsic_map.cpp
>  CLEANFILES +=  \
> glsl/glcpp/glcpp-parse.h\
> glsl/glsl_parser.h  \
> glsl/glsl_parser.cpp\
> glsl/glsl_lexer.cpp \
> glsl/glcpp/glcpp-parse.c\
> -   glsl/glcpp/glcpp-lex.c
> +   glsl/glcpp/glcpp-lex.c  \
> +   glsl/nir_intrinsic_map.cpp
>  
>  clean-local:
> $(RM) -r subtest-cr subtest-cr-lf subtest-lf subtest-lf-cr
> diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
> b/src/compiler/glsl/glsl_to_nir.cpp
> index 266f150..3b8424e 100644
> --- a/src/compiler/glsl/glsl_to_nir.cpp
> +++ b/src/compiler/glsl/glsl_to_nir.cpp
> @@ -602,128 +602,7 @@ nir_visitor::visit(ir_return *ir)
> nir_builder_instr_insert(, >instr);
>  }
>  
> -namespace _glsl_to_nir {
> -
> -nir_intrinsic_op
> -get_intrinsic_opcode(const char *name, const ir_dereference *return_deref)
> -{
> -   nir_intrinsic_op op;
> -
> -   if (strcmp(name, "__intrinsic_atomic_read") == 0) {
> -  op = nir_intrinsic_atomic_counter_read_var;
> -   } else if (strcmp(name, "__intrinsic_atomic_increment") == 0) {
> -  op = nir_intrinsic_atomic_counter_inc_var;
> -   } else if (strcmp(name, "__intrinsic_atomic_predecrement") == 0) {
> -  op = nir_intrinsic_atomic_counter_dec_var;
> -   } else if (strcmp(name, "__intrinsic_image_load") == 0) {
> -  op = nir_intrinsic_image_load;
> -   } else if (strcmp(name, "__intrinsic_image_store") == 0) {
> -  op = nir_intrinsic_image_store;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_add") == 0) {
> -  op = nir_intrinsic_image_atomic_add;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_min") == 0) {
> -  op = nir_intrinsic_image_atomic_min;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_max") == 0) {
> -  op = nir_intrinsic_image_atomic_max;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_and") == 0) {
> -  op = nir_intrinsic_image_atomic_and;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_or") == 0) {
> -  op = nir_intrinsic_image_atomic_or;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_xor") == 0) {
> -  op = nir_intrinsic_image_atomic_xor;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_exchange") == 0) {
> -  op = nir_intrinsic_image_atomic_exchange;
> -   } else if (strcmp(name, "__intrinsic_image_atomic_comp_swap") == 0) {
> -  op = nir_intrinsic_image_atomic_comp_swap;
> -   } else if (strcmp(name, "__intrinsic_memory_barrier") == 0) {
> -  op = 

[Mesa-dev] [Bug 96853] gl_PrimitiveID is zero when rendering points of size > 1

2016-07-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=96853

            Bug ID: 96853
           Summary: gl_PrimitiveID is zero when rendering points of size > 1
           Product: Mesa
           Version: git
          Hardware: Other
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: Mesa core
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: denis.fisse...@tu-dortmund.de
        QA Contact: mesa-dev@lists.freedesktop.org

Using Mesa3d 12.1.0-devel with Ubuntu 16.04 LTS in VMware, I stumbled upon a
bug regarding the contents of gl_PrimitiveID in the fragment shader.
I am using gl_PrimitiveID for color generation. When rendering a VAO as
GL_POINTS with the point size set by glPointSize() larger than 1,
gl_PrimitiveID constantly has a value of 0. As soon as the point size is
reduced to 1 or the type of primitive is changed to GL_TRIANGLES,
gl_PrimitiveID contains correct values. For rasterized points, the value of
gl_PrimitiveID in the fragment shader should equal the value of a
variable (declared using "flat") set to gl_VertexID in the vertex shader. Using
this workaround I was able to produce the desired values.
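
The workaround described above might look roughly like the following
shader pair, shown here as C string constants.  This is only a sketch:
the GLSL version, the attribute name "position", the varying name
"vertex_id" and the color mapping are all assumptions, since the report
does not include the actual shaders.

```c
/* Vertex shader: forward gl_VertexID through a flat (non-interpolated)
 * varying.  All names here are hypothetical. */
static const char *const vs_source =
   "#version 150\n"
   "in vec4 position;\n"
   "flat out int vertex_id;\n"
   "void main()\n"
   "{\n"
   "   vertex_id = gl_VertexID;\n"
   "   gl_Position = position;\n"
   "}\n";

/* Fragment shader: use the flat varying where gl_PrimitiveID would
 * otherwise have been read. */
static const char *const fs_source =
   "#version 150\n"
   "flat in int vertex_id;\n"
   "out vec4 color;\n"
   "void main()\n"
   "{\n"
   "   color = vec4(float(vertex_id & 0xff) / 255.0, 0.0, 0.0, 1.0);\n"
   "}\n";
```

When drawing GL_POINTS each primitive consists of exactly one vertex,
which is why the forwarded vertex index can stand in for the primitive
index in this case.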



[Mesa-dev] [PATCH 7.5/11] glsl: Kill __intrinsic_atomic_sub

2016-07-07 Thread Ian Romanick
From: Ian Romanick 

Just generate an __intrinsic_atomic_add with a negated parameter.

Signed-off-by: Ian Romanick 
---
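
Not part of the commit: a minimal sketch of why the negation is enough.
Unsigned arithmetic wraps modulo 2^32, so adding the negated operand
gives the same result as subtracting it.  sub_via_add is a hypothetical
name used only for this illustration.

```c
#include <stdint.h>

/* Subtract by adding the two's-complement negation of the operand.
 * uint32_t arithmetic wraps modulo 2^32, so this equals counter - data
 * for every pair of inputs. */
static uint32_t
sub_via_add(uint32_t counter, uint32_t data)
{
   const uint32_t neg_data = 0u - data;

   return counter + neg_data;
}
```

This also covers the wrap-around case, e.g. subtracting 1 from 0 gives
0xffffffff either way.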
 src/compiler/glsl/builtin_functions.cpp| 50 +++---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  8 -
 2 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp 
b/src/compiler/glsl/builtin_functions.cpp
index 941ea12..ef3b2b0 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -3310,13 +3310,29 @@ builtin_builder::asin_expr(ir_variable *x, float p0, 
float p1)
   mul(abs(x), imm(p1));
 }
 
+/**
+ * Generate a ir_call to a function with a set of parameters
+ *
+ * The input \c params can either be a list of \c ir_variable or a list of
+ * \c ir_dereference_variable.  In the latter case, all nodes will be removed
+ * from \c params and used directly as the parameters to the generated
+ * \c ir_call.
+ */
 ir_call *
 builtin_builder::call(ir_function *f, ir_variable *ret, exec_list params)
 {
exec_list actual_params;
 
-   foreach_in_list(ir_variable, var, &params) {
-  actual_params.push_tail(var_ref(var));
+   foreach_in_list_safe(ir_instruction, ir, &params) {
+  ir_dereference_variable *d = ir->as_dereference_variable();
+  if (d != NULL) {
+ d->remove();
+ actual_params.push_tail(d);
+  } else {
+ ir_variable *var = ir->as_variable();
+ assert(var != NULL);
+ actual_params.push_tail(var_ref(var));
+  }
}
 
ir_function_signature *sig =
@@ -5292,8 +5308,34 @@ builtin_builder::_atomic_counter_op1(const char 
*intrinsic,
MAKE_SIG(glsl_type::uint_type, avail, 2, counter, data);
 
ir_variable *retval = body.make_temp(glsl_type::uint_type, "atomic_retval");
-   body.emit(call(shader->symbols->get_function(intrinsic), retval,
-  sig->parameters));
+
+   /* Instead of generating an __intrinsic_atomic_sub, generate an
+* __intrinsic_atomic_add with the data parameter negated.
+*/
+   if (strcmp("__intrinsic_atomic_sub", intrinsic) == 0) {
+  ir_variable *const neg_data =
+ body.make_temp(glsl_type::uint_type, "neg_data");
+
+  body.emit(assign(neg_data, neg(data)));
+
+  exec_list parameters;
+
+  parameters.push_tail(new(mem_ctx) ir_dereference_variable(counter));
+  parameters.push_tail(new(mem_ctx) ir_dereference_variable(neg_data));
+
+  ir_function *const func =
+ shader->symbols->get_function("__intrinsic_atomic_add");
+  ir_instruction *const c = call(func, retval, parameters);
+
+  assert(c != NULL);
+  assert(parameters.is_empty());
+
+  body.emit(c);
+   } else {
+  body.emit(call(shader->symbols->get_function(intrinsic), retval,
+ sig->parameters));
+   }
+
body.emit(ret(retval));
return sig;
 }
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 197b3af..3320f2a 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -3208,13 +3208,6 @@ 
glsl_to_tgsi_visitor::visit_atomic_counter_intrinsic(ir_call *ir)
  val = ((ir_instruction *)param)->as_rvalue();
  val->accept(this);
  data2 = this->result;
-  } else if (!strcmp("__intrinsic_atomic_sub", callee)) {
- opcode = TGSI_OPCODE_ATOMUADD;
- st_src_reg res = get_temp(glsl_type::uvec4_type);
- st_dst_reg dstres = st_dst_reg(res);
- dstres.writemask = dst.writemask;
- emit_asm(ir, TGSI_OPCODE_INEG, dstres, data);
- data = res;
   } else {
  assert(!"Unexpected intrinsic");
  return;
@@ -3625,7 +3618,6 @@ glsl_to_tgsi_visitor::visit(ir_call *ir)
!strcmp("__intrinsic_atomic_increment", callee) ||
!strcmp("__intrinsic_atomic_predecrement", callee) ||
!strcmp("__intrinsic_atomic_add", callee) ||
-   !strcmp("__intrinsic_atomic_sub", callee) ||
!strcmp("__intrinsic_atomic_min", callee) ||
!strcmp("__intrinsic_atomic_max", callee) ||
!strcmp("__intrinsic_atomic_and", callee) ||
-- 
2.5.5



Re: [Mesa-dev] [PATCH 00/11] ARB_shader_atomic_counter_ops for NIR and i965

2016-07-07 Thread Ian Romanick
This series (with the updates) is on my fd.o
ARB_shader_atomic_counter_ops-i965 branch.

On 07/05/2016 05:46 PM, Ian Romanick wrote:
> The first 7 patches in this series put GLSL-to-NIR on a small diet.  I
> looked at the giant sequence of 'if (strcmp(...) == 0) { ... } else if
> (strcmp(...) == 0) { ...' and said, "Oh hell no."  I don't think we care
> much about the performance of this code, so I opted to tune for size.
> Using an in-code radix trie gets it about as small as I think it can
> get.  The result is -784 bytes in a single function.  All 41 strings
> just disappear.
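
To give a rough idea of the approach, here is a hand-written sketch of a
prefix-then-branch lookup for three of the intrinsic names.  It is not
the output of nir_intrinsic_map.py: enum op, its values and lookup_op
are hypothetical, and unlike the generated trie this simplified version
still keeps the suffix strings around.

```c
#include <string.h>

/* Hypothetical opcode enum, standing in for nir_intrinsic_op. */
enum op { OP_INVALID = -1, OP_READ, OP_INC, OP_PREDEC };

static enum op
lookup_op(const char *name)
{
   static const char prefix[] = "__intrinsic_atomic_";
   const size_t prefix_len = sizeof(prefix) - 1;
   const char *rest;

   /* Compare the shared prefix once instead of once per candidate. */
   if (strncmp(name, prefix, prefix_len) != 0)
      return OP_INVALID;

   /* Branch on the first character that differs between candidates,
    * then confirm the remainder of the name. */
   rest = name + prefix_len;
   switch (rest[0]) {
   case 'r': return strcmp(rest, "read") == 0 ? OP_READ : OP_INVALID;
   case 'i': return strcmp(rest, "increment") == 0 ? OP_INC : OP_INVALID;
   case 'p': return strcmp(rest, "predecrement") == 0 ? OP_PREDEC : OP_INVALID;
   default:  return OP_INVALID;
   }
}
```

The point of the structure is that each lookup does a bounded number of
comparisons rather than walking a long if/else chain.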
> 
> It looks like src/mesa/state_tracker/st_glsl_to_tgsi.cpp could get
> similar treatment, and the savings there should be even larger.  My
> recommendation would be to copy src/compiler/glsl/nir_intrinsic_map.py
> into src/mesa/state_tracker and change it to suit the needs of that
> code.  The hard part is already done. :)
> 
> The rest of the series adds the new intrinsics to NIR and to the i965
> driver.
> 
> What we don't have is a good set of piglit tests for the new intrinsics.
> We also might not have tests for the existing flavors of the new
> intrinsics on, for example, SSBOs.  There is a test for
> atomicCounterAddARB.  I think it's going to be fairly difficult to come
> up with good tests for the other functions.  I'll have to think about it
> some more.
> 
> 



Re: [Mesa-dev] [PATCH 2/5] anv/dump: Use anv_minify instead of hand-rolling it

2016-07-07 Thread Chad Versace
Patches 1 and 2 are
Reviewed-by: Chad Versace 



Re: [Mesa-dev] [PATCH] egl: restrict swap_available dri2_egl_display field to X11

2016-07-07 Thread Chad Versace
On Mon 20 Jun 2016, Frank Binns wrote:
> This field is only ever set and read by the X11 platform.
> 
> Signed-off-by: Frank Binns 
> ---
>  src/egl/drivers/dri2/egl_dri2.h | 2 +-
>  src/egl/drivers/dri2/platform_wayland.c | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)

Reviewed-by: Chad Versace 

And pushed.


Re: [Mesa-dev] [PATCH] egl: Fix the bad surface attributes combination checking for pbuffers. (v2)

2016-07-07 Thread Chad Versace
On Mon 20 Jun 2016, Guillaume Charifi wrote:
> Fixes a regression induced by commit a0674ce5c41903ccd161e89abb149621bfbc40d2:
> When EGL_TEXTURE_FORMAT and EGL_TEXTURE_TARGET were both specified (and
> both != EGL_NO_TEXTURE), an error was instantly triggered, before the
> other one had even a chance to be checked, which is obviously not the
> intended behaviour.
> 
> v2: Full commit hash, remove useless variables.
> 
> Signed-off-by: Guillaume Charifi 
> Reviewed-by: Frank Binns 

I added this snippet to the commit message:

Fixes: piglit "spec/egl 1.4/eglcreatepbuffersurface and then glclear"
Fixes: piglit "spec/egl 1.4/largest possible eglcreatepbuffersurface and 
then glclear"

I pushed and gave it my
Reviewed-by: Chad Versace 




Re: [Mesa-dev] [PATCH 3/3] radeonsi: catch a potential state tracker error with non-MSAA FBs

2016-07-07 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Wed, Jul 6, 2016 at 6:07 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> At least st/mesa ensures this, so I'd rather not handle deviations in 
> radeonsi.
> ---
>  src/gallium/drivers/radeonsi/si_state.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state.c 
> b/src/gallium/drivers/radeonsi/si_state.c
> index ee92f15..df6b610 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -3193,6 +3193,12 @@ static void si_emit_sample_mask(struct si_context 
> *sctx, struct r600_atom *atom)
> struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
> unsigned mask = sctx->sample_mask.sample_mask;
>
> +   /* Needed for line and polygon smoothing as well as for the Polaris
> +* small primitive filter. We expect the state tracker to take care of
> +* this for us.
> +*/
> +   assert(mask == 0x || sctx->framebuffer.nr_samples > 1);
> +
> radeon_set_context_reg_seq(cs, R_028C38_PA_SC_AA_MASK_X0Y0_X1Y0, 2);
> radeon_emit(cs, mask | (mask << 16));
> radeon_emit(cs, mask | (mask << 16));
> --
> 2.7.4
>


Re: [Mesa-dev] [PATCH mesa] egl/display: remove unnecessary code and make it easier to read

2016-07-07 Thread Chad Versace
On Wed 06 Jul 2016, Eric Engestrom wrote:
> Remove the two first level `if` as they will always be true, and
> flatten the two remaining `if`.
> No functional change.
> 
> Signed-off-by: Eric Engestrom 
> ---
>  src/egl/main/egldisplay.c | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)

Reviewed-by: Chad Versace 

And pushed.


Re: [Mesa-dev] [PATCH] Make single-buffered GLES representation internally consistent

2016-07-07 Thread Chad Versace
On Fri 01 Jul 2016, Chad Versace wrote:
> On Thu 30 Jun 2016, Stéphane Marchesin wrote:
> > On Thu, Jun 30, 2016 at 3:20 PM, Gurchetan Singh
> >  wrote:
> > > There are a few places in the code where clearing and reading are done on
> > > incorrect buffers for GLES contexts.  See comments for details.  This
> > > fixes 75 GLES3 dEQP tests on the surfaceless platform with no regressions.
> > >
> > > v2: Corrected unclear comment
> > > v3: Make the change in context.c instead of get.c
> > > v4: Removed whitespace
> > 
> > I looked for a better way than initializing from makecurrent, but
> > there doesn't seem to be one, so...
> > 
> > Reviewed-by: 
> 
> This looks like a difficult problem to fix.  I also looked for a better
> way to fix the problem, but I gave up after concluding that any
> thoroughly correct fix seemed to require endless yak shaving.
> 
> If this fixes dEQP regressions, without regressing other things, then
> I approve the patch. But, I'm not convinced yet that it doesn't regress
> X11 pixmaps. I want to run some tests and step through the code with gdb
> before I give a reviewed-by.

I spent this morning trying to solve this the right way, and I gave up.
And I couldn't find any tests for X11 pixmaps that hit these codepaths.
So I pushed the patch to master.

Reviewed-by: Chad Versace 





Re: [Mesa-dev] [PATCH 09/11] nir/intrinsics: Add more atomic_counter ops

2016-07-07 Thread Ian Romanick
On 07/07/2016 02:40 AM, Iago Toral wrote:
> On Tue, 2016-07-05 at 17:46 -0700, Ian Romanick wrote:
>> From: Ian Romanick 
>>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/compiler/glsl/glsl_to_nir.cpp  | 43
>> +++---
>>  src/compiler/glsl/nir_intrinsic_map.py |  8 
>>  .../glsl/tests/get_intrinsic_opcode_test.cpp   |  8 
>>  src/compiler/nir/nir.c |  1 +
>>  src/compiler/nir/nir_intrinsics.h  | 14 +++
>>  src/compiler/nir/nir_lower_atomics.c   | 38
>> +++
>>  6 files changed, 107 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/compiler/glsl/glsl_to_nir.cpp
>> b/src/compiler/glsl/glsl_to_nir.cpp
>> index 3b8424e..ab7200b 100644
>> --- a/src/compiler/glsl/glsl_to_nir.cpp
>> +++ b/src/compiler/glsl/glsl_to_nir.cpp
>> @@ -616,11 +616,44 @@ nir_visitor::visit(ir_call *ir)
>>switch (op) {
>>case nir_intrinsic_atomic_counter_read_var:
>>case nir_intrinsic_atomic_counter_inc_var:
>> -  case nir_intrinsic_atomic_counter_dec_var: {
>> - ir_dereference *param =
>> -(ir_dereference *) ir->actual_parameters.get_head();
>> - instr->variables[0] = evaluate_deref(&instr->instr, param);
>> - nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32,
>> NULL);
>> +  case nir_intrinsic_atomic_counter_dec_var:
>> +  case nir_intrinsic_atomic_counter_add_var:
>> +  case nir_intrinsic_atomic_counter_min_var:
>> +  case nir_intrinsic_atomic_counter_max_var:
>> +  case nir_intrinsic_atomic_counter_and_var:
>> +  case nir_intrinsic_atomic_counter_or_var:
>> +  case nir_intrinsic_atomic_counter_xor_var:
>> +  case nir_intrinsic_atomic_counter_exchange_var:
>> +  case nir_intrinsic_atomic_counter_comp_swap_var: {
>> + nir_ssa_undef_instr *instr_undef =
>> +nir_ssa_undef_instr_create(shader, 1, 32);
>> + nir_builder_instr_insert(&b, &instr_undef->instr);
>>
> 
> I guess you did not mean to include the undef instruction hunk above?

Arg... this and the stray list_inithead are left over from trying to fix
a bug that turned out to be a typo.  I'll remove both.  Thanks.

>> + /* Set the counter variable dereference. */
>> + exec_node *param = ir->actual_parameters.get_head();
>> + ir_dereference *counter = (ir_dereference *)param;
>> +
>> + instr->variables[0] = evaluate_deref(&instr->instr,
>> counter);
>> + param = param->get_next();
>> +
>> + /* Set the intrinsic destination. */
>> + if (ir->return_deref) {
>> +nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32,
>> NULL);
>> + }
>> +
>> + /* Set the intrinsic parameters. */
>> + if (!param->is_tail_sentinel()) {
>> +instr->src[0] =
>> +   nir_src_for_ssa(evaluate_rvalue((ir_dereference
>> *)param));
>> +param = param->get_next();
>> + }
>> +
>> + if (!param->is_tail_sentinel()) {
>> +instr->src[1] =
>> +   nir_src_for_ssa(evaluate_rvalue((ir_dereference
>> *)param));
>> +param = param->get_next();
>> + }
>> +
>>   nir_builder_instr_insert(&b, &instr->instr);
>>   break;
>>}
>> diff --git a/src/compiler/glsl/nir_intrinsic_map.py
>> b/src/compiler/glsl/nir_intrinsic_map.py
>> index 07b2d0d..5abc3cb 100644
>> --- a/src/compiler/glsl/nir_intrinsic_map.py
>> +++ b/src/compiler/glsl/nir_intrinsic_map.py
>> @@ -26,6 +26,14 @@ from mako.template import Template
>>  intrinsics = [("__intrinsic_atomic_read",
>> ("nir_intrinsic_atomic_counter_read_var", None)),
>>("__intrinsic_atomic_increment",
>> ("nir_intrinsic_atomic_counter_inc_var", None)),
>>("__intrinsic_atomic_predecrement",
>> ("nir_intrinsic_atomic_counter_dec_var", None)),
>> +  ("__intrinsic_atomic_add",
>> ("nir_intrinsic_atomic_counter_add_var", None)),
>> +  ("__intrinsic_atomic_min",
>> ("nir_intrinsic_atomic_counter_min_var", None)),
>> +  ("__intrinsic_atomic_max",
>> ("nir_intrinsic_atomic_counter_max_var", None)),
>> +  ("__intrinsic_atomic_and",
>> ("nir_intrinsic_atomic_counter_and_var", None)),
>> +  ("__intrinsic_atomic_or",
>> ("nir_intrinsic_atomic_counter_or_var", None)),
>> +  ("__intrinsic_atomic_xor",
>> ("nir_intrinsic_atomic_counter_xor_var", None)),
>> +  ("__intrinsic_atomic_exchange",
>> ("nir_intrinsic_atomic_counter_exchange_var", None)),
>> +  ("__intrinsic_atomic_comp_swap",
>> ("nir_intrinsic_atomic_counter_comp_swap_var", None)),
>>("__intrinsic_image_load",
>> ("nir_intrinsic_image_load", None)),
>>("__intrinsic_image_store",
>> ("nir_intrinsic_image_store", None)),
>>("__intrinsic_image_atomic_add",
>> ("nir_intrinsic_image_atomic_add", None)),
>> diff --git 

Re: [Mesa-dev] [PATCH 05/11] glsl: Replace the linear search in get_intrinsic_opcode with a radix trie

2016-07-07 Thread Ian Romanick
On 07/07/2016 04:15 AM, Jason Ekstrand wrote:
> I can't help but think that we keep solving this problem... We have a
> low-collision hash table for vkGetProcAddress, something for
> glxGetProcAddress and eglGetProcAddress (hopefully the same?) and now
> this.  Can we pick a method, make it a little Python helper, and use it
> everywhere?
> 
> Not being critical; this is probably a fine solution for compile-time
> string -> int mappings and possibly the best to date.  It just seems
> like kind of a big hammer especially if glsl_to_nir is its only user.

Well... I went to this extreme partially for fun. :)  I think there are
other places that could use this, and I mentioned one in the intro
message.  I haven't gone looking through the code to find others, but I
suspect there may be.  That might even be a cool newbie project.

I don't think this particular solution is useful for GetProcAddress
kinds of things because (by the end) it depends on callers only ever
querying things that are in the set.  The other problem with
glXGetProcAddress and eglGetProcAddress is the driver can add new things
to the set.  I'm not sure if anything can add functions to the set used
by vkGetProcAddress.
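
For anyone following along, here is a rough Python sketch of the general
idea: build a compact prefix trie from a sorted list of (name, data)
pairs, then walk it to look a name up.  This is not the code from the
patch.  build_trie and lookup are hypothetical names, the grouping is by
first character only, and unlike the approach described above this
lookup also rejects names that are not in the set.

```python
import os


def build_trie(table):
    """Build a radix trie from (string, data) pairs sorted by string.

    Each node is a (prefix, children, data) tuple; exactly one of
    children or data is None.  Names are assumed to be distinct.
    """
    if len(table) == 1:
        (s, d) = table[0]
        return [(s, None, d)]

    nodes = []
    i = 0
    while i < len(table):
        # Collect the run of consecutive entries sharing a first character.
        j = i + 1
        while j < len(table) and table[j][0][:1] == table[i][0][:1]:
            j += 1
        group = table[i:j]

        if len(group) == 1:
            # Nothing to share a prefix with: emit a leaf.
            nodes.append((group[0][0], None, group[0][1]))
        else:
            # Strip the longest common prefix and recurse on the remainders.
            prefix = os.path.commonprefix([s for (s, _) in group])
            rest = [(s[len(prefix):], d) for (s, d) in group]
            nodes.append((prefix, build_trie(rest), None))
        i = j
    return nodes


def lookup(trie, name):
    """Return the data stored for name, or None if name is not present."""
    for (prefix, children, data) in trie:
        if children is None:
            if name == prefix:
                return data
        elif name.startswith(prefix):
            found = lookup(children, name[len(prefix):])
            if found is not None:
                return found
    return None


table = sorted([
    ("__intrinsic_atomic_read", "nir_intrinsic_atomic_counter_read_var"),
    ("__intrinsic_atomic_increment", "nir_intrinsic_atomic_counter_inc_var"),
    ("__intrinsic_image_load", "nir_intrinsic_image_load"),
    ("__intrinsic_image_store", "nir_intrinsic_image_store"),
])
trie = build_trie(table)
print(lookup(trie, "__intrinsic_image_store"))
```

If callers are guaranteed to pass only names from the set, the final
string comparison at the leaves can be dropped, which is the property
that makes this a poor fit for GetProcAddress-style lookups.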

> On Jul 5, 2016 5:46 PM, "Ian Romanick" wrote:
> 
> From: Ian Romanick
> 
> If there is a way to do this cleanly in mako, I'm very interested to
> hear about it.
> 
>    text    data     bss     dec     hex filename
> 7529003  273096   28584 7830683  777c9b /tmp/i965_dri-64bit-before.so
> 7528883  273096   28584 7830563  777c23 /tmp/i965_dri-64bit-after.so
> 
> Signed-off-by: Ian Romanick
> ---
>  src/compiler/glsl/nir_intrinsic_map.py | 131
> ++---
>  1 file changed, 119 insertions(+), 12 deletions(-)
> 
> diff --git a/src/compiler/glsl/nir_intrinsic_map.py
> b/src/compiler/glsl/nir_intrinsic_map.py
> index 7f13c6c..5962d4b 100644
> --- a/src/compiler/glsl/nir_intrinsic_map.py
> +++ b/src/compiler/glsl/nir_intrinsic_map.py
> @@ -66,6 +66,123 @@ intrinsics = [("__intrinsic_atomic_read",
> ("nir_intrinsic_atomic_counter_read_va
>("__intrinsic_atomic_exchange_shared",
> ("nir_intrinsic_shared_atomic_exchange", None)),
>("__intrinsic_atomic_comp_swap_shared",
> ("nir_intrinsic_shared_atomic_comp_swap", None))]
> 
> +def remove_prefix(table, prefix_length):
> +"""Strip prefix_length characters off the name of each entry in
> table."""
> +
> +return [(s[prefix_length:], d) for (s, d) in table]
> +
> +
> +def generate_trie(table):
> +"""table is a list of (string, data) tuples.  It is assumed to
> be sorted by
> +string.
> +
> +A radix trie (or compact prefix trie) is recursively generated
> from the
> +list of names.  Names are partitioned into groups that have at least
> +prefix_thresh (tunable parameter) common prefix characters. 
> Each of these
> +groups becomes the branches at the current level of the tree.  The
> +matching prefix characters from each group is removed, and the
> group is
> +recursively operated on in the same fashion.
> +
> +The recursion terminates when no groups can be formed with at least
> +prefix_thresh matching characters.
> +
> +Each node in the trie is a 3-element tuple:
> +
> +(prefix_string, [child_nodes], client_data)
> +
> +One of [child_nodes] or client_data will be None.
> +
> +See https://en.wikipedia.org/wiki/Radix_tree for more
> background details
> +on the data structure.
> +
> +"""
> +
> +# Threshold for considering two strings to have the same prefix.
> +prefix_thresh = 1
> +
> +if len(table) == 1 and table[0][0] == "":
> +return [("", None, table[0][1])]
> +
> +trie_level = []
> +
> +(s, d) = table[0]
> +candidates = [(s, d)]
> +base = s
> +prefix_length = len(s)
> +
> +for (s, d) in table[1:]:
> +if s[:prefix_thresh] == base[:prefix_thresh]:
> +candidates.append((s, d))
> +
> +l = len(s[:([x[0]==x[1] for x in zip(s,
> base)]+[0]).index(0)])
> +if l < prefix_length:
> +prefix_length = l
> +else:
> +trie_level.append((base[:prefix_length],
> generate_trie(remove_prefix(candidates, prefix_length)), None))
> +
> +candidates = [(s, d)]
> +base = s
> +prefix_length = len(s)
> +
> +trie_level.append((base[:prefix_length],
> 

[Mesa-dev] [PATCH v2 17/24] i965: Use LZD to implement nir_op_find_lsb on Gen < 7

2016-07-07 Thread Ian Romanick
From: Ian Romanick 

v2: Rebase on changes to previous two patches.

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 22 +-
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 26 --
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 93d5e9d..fbbfd96 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1389,7 +1389,27 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
 
case nir_op_find_lsb:
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
-  bld.FBL(result, op[0]);
+
+  if (devinfo->gen < 7) {
+ fs_reg temp = vgrf(glsl_type::int_type);
+
+ /* (x & -x) generates a value that consists of only the LSB of x.
+  * For all powers of 2, findMSB(y) == findLSB(y).
+  */
+ fs_reg src = retype(op[0], BRW_REGISTER_TYPE_D);
+ fs_reg negated_src = src;
+
+ /* One must be negated, and the other must be non-negated.  It
+  * doesn't matter which is which.
+  */
+ negated_src.negate = true;
+ src.negate = false;
+
+ bld.AND(temp, src, negated_src);
+ emit_find_msb_using_lzd(bld, result, temp, false);
+  } else {
+ bld.FBL(result, op[0]);
+  }
   break;
 
case nir_op_ubitfield_extract:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 85fa775..3b20508 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1535,9 +1535,31 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
}
 
-   case nir_op_find_lsb:
-  emit(FBL(dst, op[0]));
+   case nir_op_find_lsb: {
+  vec4_builder bld = vec4_builder(this).at_end();
+
+  if (devinfo->gen < 7) {
+ dst_reg temp = bld.vgrf(BRW_REGISTER_TYPE_D);
+
+ /* (x & -x) generates a value that consists of only the LSB of x.
+  * For all powers of 2, findMSB(y) == findLSB(y).
+  */
+ src_reg src = src_reg(retype(op[0], BRW_REGISTER_TYPE_D));
+ src_reg negated_src = src;
+
+ /* One must be negated, and the other must be non-negated.  It
+  * doesn't matter which is which.
+  */
+ negated_src.negate = true;
+ src.negate = false;
+
+ bld.AND(temp, src, negated_src);
+ emit_find_msb_using_lzd(bld, dst, src_reg(temp), false);
+  } else {
+ bld.FBL(dst, op[0]);
+  }
   break;
+   }
 
case nir_op_ubitfield_extract:
case nir_op_ibitfield_extract:
-- 
2.5.5
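
As an illustration of the comment in the patch, here is a small Python
model of the arithmetic, assuming 32-bit values.  lzd() stands in for
the LZD instruction and the function names are made up for this sketch;
it models the math only, not the i965 builder calls.

```python
MASK32 = 0xFFFFFFFF


def lzd(x):
    """Leading zero count of a 32-bit value; 32 when no bits are set."""
    return 32 - (x & MASK32).bit_length()


def find_msb_using_lzd(x):
    """findMSB for an unsigned value: 31 - LZD(x), which is -1 for zero."""
    return 31 - lzd(x)


def find_lsb(x):
    """findLSB via (x & -x), which keeps only the lowest set bit of x.

    The result is zero or a power of two, and for a power of two the
    MSB and the LSB are the same bit.
    """
    x &= MASK32
    isolated = x & (-x & MASK32)
    return find_msb_using_lzd(isolated)


for value in (0, 1, 0b10100, 0x80000000):
    print(hex(value), find_lsb(value))
```

The zero case needs no special handling: (0 & -0) is 0, LZD gives 32,
and 31 - 32 is the -1 that findLSB() is expected to return.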



[Mesa-dev] [PATCH v2 16/24] i965: Use LZD to implement nir_op_ifind_msb on Gen < 7

2016-07-07 Thread Ian Romanick
From: Ian Romanick 

v2: Retype LZD source as UD to avoid potential problems with 0x80000000.
Suggested by Matt.  Also update comment about problem values with
LZD(abs(x)).  Suggested by Curro.
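
To make the problem values concrete, here is a Python sketch that
compares a naive 31 - LZD(abs(x)) against a reference signed findMSB,
assuming 32-bit two's complement.  The function names are hypothetical,
and the sketch deliberately does not show the workaround the patch uses;
it only demonstrates where the naive form goes wrong.

```python
MASK32 = 0xFFFFFFFF


def lzd(x):
    """Leading zero count of a 32-bit value; 32 when no bits are set."""
    return 32 - (x & MASK32).bit_length()


def naive_ifind_msb(x):
    """31 - LZD(abs(x)), with abs() wrapping the way 32-bit hardware does."""
    return 31 - lzd(abs(x) & MASK32)


def reference_ifind_msb(x):
    """Signed findMSB: most significant 1 bit for non-negative x, most
    significant 0 bit for negative x, and -1 for 0 or -1."""
    if x < 0:
        x = ~x
    return x.bit_length() - 1


for value in (-5, -1, -8, -2**31):
    print(value, naive_ifind_msb(value), reference_ifind_msb(value))
```

Ordinary negative values such as -5 agree, while -1, negative powers of
two, and the most negative integer do not.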

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 54 ++--
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 57 --
 2 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 65f6406..93d5e9d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -623,8 +623,36 @@ emit_find_msb_using_lzd(const fs_builder &bld,
 bool is_signed)
 {
fs_inst *inst;
+   fs_reg temp = src;
 
-   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
+   if (is_signed) {
+  /* LZD of an absolute value source almost always does the right
+   * thing.  There are two problem values:
+   *
+   * * 0x80000000.  Since abs(0x80000000) == 0x80000000, LZD returns
+   *   0.  However, findMSB(int(0x80000000)) == 30.
+   *
+   * * 0xffffffff.  Since abs(0xffffffff) == 1, LZD returns
+   *   31.  Section 8.8 (Integer Functions) of the GLSL 4.50 spec says:
+   *
+   *For a value of zero or negative one, -1 will be returned.
+   *
+   * * Negative powers of two.  LZD(abs(-(1<dest.dest) < 64);
-  bld.FBH(retype(result, BRW_REGISTER_TYPE_UD), op[0]);
 
-  /* FBH counts from the MSB side, while GLSL's findMSB() wants the count
-   * from the LSB side. If FBH didn't return an error (0xFFFFFFFF), then
-   * subtract the result from 31 to convert the MSB count into an LSB 
count.
-   */
-  bld.CMP(bld.null_reg_d(), result, brw_imm_d(-1), BRW_CONDITIONAL_NZ);
+  if (devinfo->gen < 7) {
+ emit_find_msb_using_lzd(bld, result, op[0], true);
+  } else {
+ bld.FBH(retype(result, BRW_REGISTER_TYPE_UD), op[0]);
 
-  inst = bld.ADD(result, result, brw_imm_d(31));
-  inst->predicate = BRW_PREDICATE_NORMAL;
-  inst->src[0].negate = true;
+ /* FBH counts from the MSB side, while GLSL's findMSB() wants the
+  * count from the LSB side. If FBH didn't return an error
+  * (0xFFFFFFFF), then subtract the result from 31 to convert the MSB
+  * count into an LSB count.
+  */
+ bld.CMP(bld.null_reg_d(), result, brw_imm_d(-1), BRW_CONDITIONAL_NZ);
+
+ inst = bld.ADD(result, result, brw_imm_d(31));
+ inst->predicate = BRW_PREDICATE_NORMAL;
+ inst->src[0].negate = true;
+  }
   break;
}
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 352d88a..85fa775 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1000,8 +1000,36 @@ emit_find_msb_using_lzd(const vec4_builder &bld,
 bool is_signed)
 {
vec4_instruction *inst;
+   src_reg temp = src;
 
-   bld.LZD(retype(dst, BRW_REGISTER_TYPE_UD), src);
+   if (is_signed) {
+  /* LZD of an absolute value source almost always does the right
+   * thing.  There are two problem values:
+   *
+   * * 0x80000000.  Since abs(0x80000000) == 0x80000000, LZD returns
+   *   0.  However, findMSB(int(0x80000000)) == 30.
+   *
+   * * 0xffffffff.  Since abs(0xffffffff) == 1, LZD returns
+   *   31.  Section 8.8 (Integer Functions) of the GLSL 4.50 spec says:
+   *
+   *For a value of zero or negative one, -1 will be returned.
+   *
+   * * Negative powers of two.  LZD(abs(-(1<

Re: [Mesa-dev] [PATCH 4/4] st/omx/dec: make decoder video buffer progressive

2016-07-07 Thread Leo Liu

Hi Emil,

I had another look, and I think there is nothing wrong with the logic.

My reasoning is inline below.


On 07/07/2016 11:39 AM, Emil Velikov wrote:

On 6 July 2016 at 19:03, Leo Liu  wrote:

The idea of encode tunneling is to use the video buffer directly in the
encoder. The encoder currently doesn't support interlaced surfaces, which
is why the OMX decoder previously set up progressive surfaces.

Now that we poll the driver for interlacing information for the decoder,
we get interlaced as the preferred format, as with the other APIs (VDPAU,
VA-API), and that breaks transcoding with tunneling.

The solution is, when a tunnel is detected, to re-allocate progressive
target buffers and then convert the interlaced decoder results into them.

This has been tested: the transcode results match bit for bit with the
earlier progressive-to-progressive results.

Signed-off-by: Leo Liu 
---
  src/gallium/state_trackers/omx/vid_dec.c | 65 +++-
  src/gallium/state_trackers/omx/vid_dec.h |  6 ++-
  2 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/omx/vid_dec.c 
b/src/gallium/state_trackers/omx/vid_dec.c
index a989c10..7842966 100644
--- a/src/gallium/state_trackers/omx/vid_dec.c
+++ b/src/gallium/state_trackers/omx/vid_dec.c
@@ -167,6 +167,19 @@ static OMX_ERRORTYPE vid_dec_Constructor(OMX_COMPONENTTYPE 
*comp, OMX_STRING nam
 if (!priv->pipe)
return OMX_ErrorInsufficientResources;

+   if (!vl_compositor_init(&priv->compositor, priv->pipe)) {
+  priv->pipe->destroy(priv->pipe);
+  priv->pipe = NULL;
+  return OMX_ErrorInsufficientResources;
+   }
+
+   if (!vl_compositor_init_state(&priv->cstate, priv->pipe)) {
+  vl_compositor_cleanup(&priv->compositor);
+  priv->pipe->destroy(priv->pipe);
+  priv->pipe = NULL;
+  return OMX_ErrorInsufficientResources;
+   }
+()

IIRC as vid_dec_Constructor() fails, the caller (bellagio?) explicitly
calls the destructor vid_dec_Destructor(). Thus the above teardown
should not be needed.



We pass references to priv->compositor and priv->cstate to the init
functions. Each init returns true or false, and neither compositor nor
cstate carries a clear flag that reflects whether its init succeeded.
So we use priv->pipe as that flag in order to clean them up in the
destructor.




 priv->sPortTypesParam[OMX_PortDomainVideo].nStartPortNumber = 0;
 priv->sPortTypesParam[OMX_PortDomainVideo].nPorts = 2;
 priv->ports = CALLOC(2, sizeof(omx_base_PortType *));
@@ -218,8 +231,11 @@ static OMX_ERRORTYPE vid_dec_Destructor(OMX_COMPONENTTYPE 
*comp)
priv->ports=NULL;
 }

-   if (priv->pipe)
+   if (priv->pipe) {
+  vl_compositor_cleanup_state(&priv->cstate);
+  vl_compositor_cleanup(&priv->compositor);

Neither vl_compositor_cleanup_state() nor vl_compositor_cleanup() is
happy if upon deref. the value (pointer again) is NULL.

omx/vid_enc.c could use similar cleanups ?

So if priv->pipe is non-NULL, the cstate and compositor are sure to have
been initialized.


Am I right?

Thanks,
Leo



Thanks,
Emil




[Mesa-dev] [PATCH v2 15/24] i965: Use LZD to implement nir_op_ufind_msb

2016-07-07 Thread Ian Romanick
From: Ian Romanick 

This uses one less instruction.

v2: Move emit_find_msb_using_lzd out of the visitor classes.  Suggested
by Curro.

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  3 +++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 26 +++-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  3 +++
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 23 +
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index d25d26a..bda4a26 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1761,6 +1761,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
  /* FBL only supports UD type for dst. */
  brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
  break;
+  case BRW_OPCODE_LZD:
+ brw_LZD(p, dst, src[0]);
+ break;
   case BRW_OPCODE_CBIT:
  assert(devinfo->gen >= 7);
  /* CBIT only supports UD type for dst. */
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 04ed42e..65f6406 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -616,6 +616,25 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
*instr,
return true;
 }
 
+static void
+emit_find_msb_using_lzd(const fs_builder &bld,
+                        const fs_reg &result,
+                        const fs_reg &src,
+                        bool is_signed)
+{
+   fs_inst *inst;
+
+   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
+
+   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
+* from the LSB side. Subtract the result from 31 to convert the MSB
+* count into an LSB count.  If no bits are set, LZD will return 32.
+* 31-32 = -1, which is exactly what findMSB() is supposed to return.
+*/
+   inst = bld.ADD(result, retype(result, BRW_REGISTER_TYPE_D), brw_imm_d(31));
+   inst->src[0].negate = true;
+}
+
 void
 fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
 {
@@ -1312,7 +1331,12 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   bld.CBIT(result, op[0]);
   break;
 
-   case nir_op_ufind_msb:
+   case nir_op_ufind_msb: {
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
+  emit_find_msb_using_lzd(bld, result, op[0], false);
+  break;
+   }
+
case nir_op_ifind_msb: {
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
   bld.FBH(retype(result, BRW_REGISTER_TYPE_UD), op[0]);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index bb0254e..193e748 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1637,6 +1637,9 @@ generate_code(struct brw_codegen *p,
  /* FBL only supports UD type for dst. */
  brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
  break;
+  case BRW_OPCODE_LZD:
+ brw_LZD(p, dst, src[0]);
+ break;
   case BRW_OPCODE_CBIT:
  assert(devinfo->gen >= 7);
  /* CBIT only supports UD type for dst. */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index f3b4528..352d88a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -993,6 +993,26 @@ vec4_visitor::optimize_predicate(nir_alu_instr *instr,
return true;
 }
 
+static void
+emit_find_msb_using_lzd(const vec4_builder &bld,
+                        const dst_reg &dst,
+                        const src_reg &src,
+                        bool is_signed)
+{
+   vec4_instruction *inst;
+
+   bld.LZD(retype(dst, BRW_REGISTER_TYPE_UD), src);
+
+   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
+* from the LSB side. Subtract the result from 31 to convert the MSB count
+* into an LSB count.  If no bits are set, LZD will return 32.  31-32 = -1,
+* which is exactly what findMSB() is supposed to return.
+*/
+   inst = bld.ADD(dst, retype(src_reg(dst), BRW_REGISTER_TYPE_D),
+  brw_imm_d(31));
+   inst->src[0].negate = true;
+}
+
 void
 vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 {
@@ -1461,6 +1481,9 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
 
case nir_op_ufind_msb:
+  emit_find_msb_using_lzd(vec4_builder(this).at_end(), dst, op[0], false);
+  break;
+
case nir_op_ifind_msb: {
   emit(FBH(retype(dst, BRW_REGISTER_TYPE_UD), op[0]));
 
-- 
2.5.5
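
To show why the LZD form needs one less instruction, here is a Python
model of both sequences for the unsigned case, assuming 32-bit values.
fbh_unsigned() is a hypothetical stand-in based only on the comments in
the patch (count from the MSB side, 0xFFFFFFFF when no bit is set); the
point is just that LZD's result for zero already falls out of the
subtraction, so no compare or predication is needed.

```python
MASK32 = 0xFFFFFFFF


def lzd(x):
    """Leading zero count of a 32-bit value; 32 when no bits are set."""
    return 32 - (x & MASK32).bit_length()


def fbh_unsigned(x):
    """Position of the first set bit counted from the MSB side, or
    0xFFFFFFFF when no bit is set (assumed model, see lead-in)."""
    x &= MASK32
    return MASK32 if x == 0 else 32 - x.bit_length()


def ufind_msb_fbh(x):
    """Old sequence: FBH, a compare against -1, then a predicated ADD."""
    result = fbh_unsigned(x)
    if result != MASK32:        # CMP sets the predicate
        return 31 - result      # predicated ADD with a negated source
    return -1                   # 0xFFFFFFFF read as a signed value


def ufind_msb_lzd(x):
    """New sequence: LZD followed by an unconditional 31 - result."""
    return 31 - lzd(x)


for value in (0, 1, 0x00010000, 0x80000000):
    print(hex(value), ufind_msb_fbh(value), ufind_msb_lzd(value))
```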



Re: [Mesa-dev] [PATCH] swrast: fix active attribs with atifragshader

2016-07-07 Thread Miklós Máté

On 06/26/2016 09:48 PM, Miklós Máté wrote:

Only include the ones that can be used by the shader.

This fixes texture coordinates, which were completely wrong,
because WPOS was included in the list of attribs. It also
increases performance noticeably.

Signed-off-by: Miklós Máté 
---
  src/mesa/swrast/s_context.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/swrast/s_context.c b/src/mesa/swrast/s_context.c
index 0a5fc7e..a63179c 100644
--- a/src/mesa/swrast/s_context.c
+++ b/src/mesa/swrast/s_context.c
@@ -504,7 +504,8 @@ _swrast_update_active_attribs(struct gl_context *ctx)
attribsMask &= ~VARYING_BIT_POS; /* WPOS is always handled specially */
 }
 else if (ctx->ATIFragmentShader._Enabled) {
-  attribsMask = ~0;  /* XXX fix me */
+  attribsMask = VARYING_BIT_COL0 | VARYING_BIT_COL1 |
+VARYING_BIT_FOGC | VARYING_BITS_TEX_ANY;
 }
 else {
/* fixed function */


ping?



Re: [Mesa-dev] [PATCH 1/2] nir: Add optimization for (a || True == True)

2016-07-07 Thread Erik Faye-Lund
On Thu, Jul 7, 2016 at 7:01 PM, Connor Abbott  wrote:
> On Thu, Jul 7, 2016 at 12:57 PM, Erik Faye-Lund  wrote:
>> On Thu, Jul 7, 2016 at 2:12 AM, Eric Anholt  wrote:
>>> This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
>>> producing some constants that were getting compared.
>>>
>>> total instructions in shared programs: 112276 -> 112198 (-0.07%)
>>> instructions in affected programs: 2239 -> 2161 (-3.48%)
>>> total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
>>> estimated cycles in affected programs: 2365 -> 2301 (-2.71%)
>>> ---
>>>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>>> b/src/compiler/nir/nir_opt_algebraic.py
>>> index fd228017c54e..7d04ef941b73 100644
>>> --- a/src/compiler/nir/nir_opt_algebraic.py
>>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>>> @@ -197,6 +197,7 @@ optimizations = [
>>> (('iand', a, 0), 0),
>>> (('ior', a, a), a),
>>> (('ior', a, 0), a),
>>> +   (('ior', a, True), True),
>>
>> Is it guaranteed that evaluating 'a' doesn't have side-effects at this point?
>
> Yes, since "a" is just an arbitrary SSA value being matched. The
> short-circuiting behavior of || and && is handled very early on,
> during the AST -> GLSL IR translation, by turning them into if
> statements.

Cool, thanks for clarifying. Short-circuiting is indeed what I was
worried about, and I guess as you say SSA makes this trivial to do.
Great stuff :)


Re: [Mesa-dev] [PATCH 2/2] nir: Optimize away IF statements with no body on either side.

2016-07-07 Thread Eric Anholt
Eric Anholt  writes:

> Due to the rampant dead code elimination in coordinate shaders for vc4, we
> often end up with IFs that do nothing on either side.  In the
> loops-enabled build, shader-db gives:
>
> total instructions in shared programs: 125192 -> 119693 (-4.39%)
> instructions in affected programs: 30649 -> 25150 (-17.94%)
> total uniforms in shared programs: 38436 -> 37632 (-2.09%)
> uniforms in affected programs: 6168 -> 5364 (-13.04%)

This is totally broken because it doesn't consider the phi nodes after
the if, and I think the thing I was thinking of would be cleaned up by
peephole_select, anyway.




Re: [Mesa-dev] [PATCH 1/2] nir: Add optimization for (a || True == True)

2016-07-07 Thread Connor Abbott
On Thu, Jul 7, 2016 at 12:57 PM, Erik Faye-Lund  wrote:
> On Thu, Jul 7, 2016 at 2:12 AM, Eric Anholt  wrote:
>> This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
>> producing some constants that were getting compared.
>>
>> total instructions in shared programs: 112276 -> 112198 (-0.07%)
>> instructions in affected programs: 2239 -> 2161 (-3.48%)
>> total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
>> estimated cycles in affected programs: 2365 -> 2301 (-2.71%)
>> ---
>>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index fd228017c54e..7d04ef941b73 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -197,6 +197,7 @@ optimizations = [
>> (('iand', a, 0), 0),
>> (('ior', a, a), a),
>> (('ior', a, 0), a),
>> +   (('ior', a, True), True),
>
> Is it guaranteed that evaluating 'a' doesn't have side-effects at this point?

Yes, since "a" is just an arbitrary SSA value being matched. The
short-circuiting behavior of || and && is handled very early on,
during the AST -> GLSL IR translation, by turning them into if
statements.
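
A toy Python sketch of why this is safe: the operands of a pattern are
already-computed values, so replacing the whole expression with a
constant cannot skip any evaluation.  This is not nir_algebraic; the
simplify() helper and the tuple encoding below are made up for
illustration, and the all-ones encoding of True is an assumption.

```python
NIR_TRUE = 0xFFFFFFFF  # assumption: 32-bit booleans, true is all ones


def is_const(value, const):
    """Match a constant exactly, keeping Python's 0 and False distinct."""
    return type(value) is type(const) and value == const


def simplify(expr):
    """Rewrite ('ior', lhs, rhs) trees using the three ior rules quoted
    above.  Leaves are value names (strings) or constants."""
    if not isinstance(expr, tuple):
        return expr
    (op, lhs, rhs) = expr
    lhs = simplify(lhs)
    rhs = simplify(rhs)
    if op == 'ior':
        if is_const(rhs, True):     # (('ior', a, True), True)
            return True
        if is_const(rhs, 0):        # (('ior', a, 0), a)
            return lhs
        if lhs == rhs:              # (('ior', a, a), a)
            return lhs
    return (op, lhs, rhs)


print(simplify(('ior', ('ior', 'x', 0), True)))
print(simplify(('ior', 'x', 'y')))
```

The value named 'x' is still defined wherever it was computed; it simply
has one fewer use, and dead code elimination can remove it later if
nothing else reads it.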



Re: [Mesa-dev] [PATCH 01/17] glsl/nir: add new num_packed_components field

2016-07-07 Thread Kenneth Graunke
On Thursday, July 7, 2016 11:58:43 AM PDT Timothy Arceri wrote:
> This will be used to store the total number of components used at this 
> location
> when packing via ARB_enhanced_layouts.
> ---
>  src/compiler/glsl/glsl_to_nir.cpp   |  1 +
>  src/compiler/glsl/ir.h  |  5 +++
>  src/compiler/glsl/link_varyings.cpp | 74 
> -
>  src/compiler/glsl/linker.cpp|  2 +
>  src/compiler/glsl/linker.h  |  4 ++
>  src/compiler/nir/nir.h  |  5 +++
>  6 files changed, 89 insertions(+), 2 deletions(-)

I still hate this field.  I'm going to try and come up with an alternate
solution.  I'll keep you posted.




Re: [Mesa-dev] [PATCH 1/2] nir: Add optimization for (a || True == True)

2016-07-07 Thread Erik Faye-Lund
On Thu, Jul 7, 2016 at 2:12 AM, Eric Anholt  wrote:
> This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
> producing some constants that were getting compared.
>
> total instructions in shared programs: 112276 -> 112198 (-0.07%)
> instructions in affected programs: 2239 -> 2161 (-3.48%)
> total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
> estimated cycles in affected programs: 2365 -> 2301 (-2.71%)
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index fd228017c54e..7d04ef941b73 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -197,6 +197,7 @@ optimizations = [
> (('iand', a, 0), 0),
> (('ior', a, a), a),
> (('ior', a, 0), a),
> +   (('ior', a, True), True),

Is it guaranteed that evaluating 'a' doesn't have side-effects at this point?


Re: [Mesa-dev] [PATCH 1/4] glx: Call __glXInitVertexArrayState() with a usable gc.

2016-07-07 Thread Matt Turner
On Wed, Jun 29, 2016 at 2:16 PM, Ian Romanick  wrote:
> On 06/29/2016 02:04 AM, Colin McDonald wrote:
>> I'm not familiar with the code, other than diving in to fix these
>> indirect multi-texture problems, so you will know much more about it
>> than me.
>>
>> But, my understanding is that __glXInitVertexArrayState needs info
>> from the server, obtained by calls to _indirect_glGetString &
>> __indirect_glGetIntegerv. Those routines need the current context
>> from __glXGetCurrentContext, so __glXSetCurrentContext(gc) must have
>> been called first.
>>
>> I see your point about a "layering violation".  I think that to avoid
>> that would require a more substantial restructuring, so that the
>> indirect layer can run some initialisation code (ie
>> __glXInitVertexArrayState or similar) separate from the bind
>> callback, once a usable context has been setup.
>
> Maybe...  *If* __glXGetCurrentContext is the only problem, then I think
> a small refactor of __indirect_glGetString could also solve the problem.
>  Just make a new function
>
> const GLubyte *do_GetString(Display *dpy, struct glx_context *gc,
> GLenum name);
>
> that both __indirect_glGetString and indirect_bind_context call.  It
> might even be worth folding the contents of __glXGetString into the new
> function... though that's probably a follow-up patch.

I tried that (see attached p.patch)... and I get another segfault.

(gdb) bt
#0  0x74a97700 in XGetXCBConnection () from /usr/lib64/libX11-xcb.so.1
#1  0x77664a3d in __glXFlushRenderBuffer (ctx=0x662080,
pc=0x77f76010 "") at ../../../mesa/src/glx/glxext.c:987
#2  0x7769b429 in do_GetString (dpy=0x64fc60, gc=0x662080,
name=7939) at ../../../mesa/src/glx/single2.c:678
#3  0x77688d55 in indirect_bind_context (gc=0x662080,
old=0x778cf6c0 , draw=27262978, read=27262978) at
../../../mesa/src/glx/indirect_glx.c:158
#4  0x7766278a in MakeContextCurrent (dpy=0x64fc60,
draw=27262978, read=27262978, gc_user=0x662080) at
../../../mesa/src/glx/glxcurrent.c:228
#5  0x770c1af0 in fgPlatformOpenWindow () from /usr/lib64/libglut.so.3
#6  0x770bbb06 in fgOpenWindow () from /usr/lib64/libglut.so.3
#7  0x770ba42b in fgCreateWindow () from /usr/lib64/libglut.so.3
#8  0x770bbc00 in glutCreateWindow () from /usr/lib64/libglut.so.3
#9  0x004022c8 in main (argc=1, argv=0x7fffe0f8) at arbvparray.c:294

gc->currentDpy is NULL. Sigh. I don't know what any of this code is doing.
diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
index bb121f8..7df8d8e 100644
--- a/src/glx/indirect_glx.c
+++ b/src/glx/indirect_glx.c
@@ -126,6 +126,9 @@ SendMakeCurrentRequest(Display * dpy, CARD8 opcode,
return ret;
 }
 
+const GLubyte *
+do_GetString(Display *dpy, struct glx_context *gc, GLenum name);
+
 static int
 indirect_bind_context(struct glx_context *gc, struct glx_context *old,
 		  GLXDrawable draw, GLXDrawable read)
@@ -152,8 +155,8 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
 
state = gc->client_state_private;
if (state->array_state == NULL) {
-  glGetString(GL_EXTENSIONS);
-  glGetString(GL_VERSION);
+  do_GetString(dpy, gc, GL_EXTENSIONS);
+  do_GetString(dpy, gc, GL_VERSION);
   __glXInitVertexArrayState(gc);
}
 
diff --git a/src/glx/single2.c b/src/glx/single2.c
index 2a1bf06..12141e8 100644
--- a/src/glx/single2.c
+++ b/src/glx/single2.c
@@ -638,17 +638,14 @@ version_from_string(const char *ver, int *major_version, int *minor_version)
*minor_version = minor;
 }
 
+const GLubyte *
+do_GetString(Display *dpy, struct glx_context *gc, GLenum name);
 
 const GLubyte *
-__indirect_glGetString(GLenum name)
+do_GetString(Display *dpy, struct glx_context *gc, GLenum name)
 {
-   struct glx_context *gc = __glXGetCurrentContext();
-   Display *dpy = gc->currentDpy;
GLubyte *s = NULL;
 
-   if (!dpy)
-  return 0;
-
/*
 ** Return the cached copy if the string has already been fetched
 */
@@ -789,6 +786,19 @@ __indirect_glGetString(GLenum name)
return s;
 }
 
+
+const GLubyte *
+__indirect_glGetString(GLenum name)
+{
+   struct glx_context *gc = __glXGetCurrentContext();
+   Display *dpy = gc->currentDpy;
+
+   if (!dpy)
+  return 0;
+
+   return do_GetString(dpy, gc, name);
+}
+
 GLboolean
 __indirect_glIsEnabled(GLenum cap)
 {
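
For readers following the backtrace: the crash is consistent with a helper further down the call chain still reading the display from the context, even though the refactored function now receives dpy explicitly. Below is a minimal sketch of that situation. All names in it (display_sketch, context_sketch, flush_sketch, get_string_sketch) are hypothetical stand-ins, not Mesa API, and the sketch returns an error where the real code would dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real GLX structures. */
struct display_sketch { int connection; };
struct context_sketch { struct display_sketch *current_dpy; };

/* Models a flush helper that takes only the context and looks the
 * display up through it. Returns -1 where real code would end up
 * dereferencing a NULL display. */
static int flush_sketch(const struct context_sketch *gc)
{
   assert(gc != NULL);
   return gc->current_dpy ? 0 : -1;
}

/* Models the refactored query: dpy is passed in explicitly, but the
 * flush below still depends on gc->current_dpy having been set. */
static const char *get_string_sketch(struct display_sketch *dpy,
                                     struct context_sketch *gc)
{
   if (dpy == NULL || flush_sketch(gc) != 0)
      return NULL;
   return "sketch-version";
}
```

Under these assumptions, passing dpy as a parameter only helps once every helper on the path takes it as a parameter too, or once the context's display field is set before the query runs.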


Re: [Mesa-dev] [PATCH 4/4] st/omx/dec: make decoder video buffer progressive

2016-07-07 Thread Leo Liu



On 07/07/2016 11:39 AM, Emil Velikov wrote:
> On 6 July 2016 at 19:03, Leo Liu  wrote:
>> The idea of encode tunneling is to use video buffer directly for encoder,
>> but currently the encoder doesn’t support interlaced surface, the OMX
>> decoder set progressive surface before on that purpose.
>>
>> Since now we are polling the driver for interlacing information for
>> decoder, we got the interlaced as preferred as other APIs(VDPAU, VA-API),
>> thus breaking the transcode with tunneling.
>>
>> The solution is when with tunnel detected, re-allocate progressive target
>> buffers, and then converting the interlaced decoder results to there.
>>
>> This has been tested with transcode results bit to bit matching as before
>> with surface from progressive to progressive.
>>
>> Signed-off-by: Leo Liu
>> ---
>>   src/gallium/state_trackers/omx/vid_dec.c | 65 +++-
>>   src/gallium/state_trackers/omx/vid_dec.h |  6 ++-
>>   2 files changed, 68 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/state_trackers/omx/vid_dec.c
>> b/src/gallium/state_trackers/omx/vid_dec.c
>> index a989c10..7842966 100644
>> --- a/src/gallium/state_trackers/omx/vid_dec.c
>> +++ b/src/gallium/state_trackers/omx/vid_dec.c
>> @@ -167,6 +167,19 @@ static OMX_ERRORTYPE vid_dec_Constructor(OMX_COMPONENTTYPE
>> *comp, OMX_STRING nam
>>  if (!priv->pipe)
>> return OMX_ErrorInsufficientResources;
>>
>> +   if (!vl_compositor_init(&priv->compositor, priv->pipe)) {
>> +  priv->pipe->destroy(priv->pipe);
>> +  priv->pipe = NULL;
>> +  return OMX_ErrorInsufficientResources;
>> +   }
>> +
>> +   if (!vl_compositor_init_state(&priv->cstate, priv->pipe)) {
>> +  vl_compositor_cleanup(&priv->compositor);
>> +  priv->pipe->destroy(priv->pipe);
>> +  priv->pipe = NULL;
>> +  return OMX_ErrorInsufficientResources;
>> +   }
>> +()
> IIRC as vid_dec_Constructor() fails, the caller (bellagio?)

Right. Currently it's bellagio.

> explicitly
> calls the destructor vid_dec_Destructor(). Thus the above teardown
> should not be needed.

Right.

>>  priv->sPortTypesParam[OMX_PortDomainVideo].nStartPortNumber = 0;
>>  priv->sPortTypesParam[OMX_PortDomainVideo].nPorts = 2;
>>  priv->ports = CALLOC(2, sizeof(omx_base_PortType *));
>> @@ -218,8 +231,11 @@ static OMX_ERRORTYPE vid_dec_Destructor(OMX_COMPONENTTYPE
>> *comp)
>> priv->ports=NULL;
>>  }
>>
>> -   if (priv->pipe)
>> +   if (priv->pipe) {
>> +  vl_compositor_cleanup_state(&priv->cstate);
>> +  vl_compositor_cleanup(&priv->compositor);
> Neither vl_compositor_cleanup_state() nor vl_compositor_cleanup() is
> happy if upon deref. the value (pointer again) is NULL.
>
> omx/vid_enc.c could use similar cleanups ?

I'll come up with a patch to fix that on vid_enc.c

Thanks,
Leo

> Thanks,
> Emil




Re: [Mesa-dev] [PATCH 4/4] st/omx/dec: make decoder video buffer progressive

2016-07-07 Thread Leo Liu



On 07/07/2016 11:28 AM, Julien Isorce wrote:

No encoder with nouveau driver so I cannot do a tunneling test
but at least this does not break the decoder part so the 4 patches are:
Tested-by: Julien Isorce

Thx!


Thanks for that.

Leo



On 6 July 2016 at 19:03, Leo Liu wrote:


The idea of encode tunneling is to use video buffer directly for
encoder,
but currently the encoder doesn’t support interlaced surface, the OMX
decoder set progressive surface before on that purpose.

Since now we are polling the driver for interlacing information for
decoder, we got the interlaced as preferred as other APIs(VDPAU,
VA-API),
thus breaking the transcode with tunneling.

The solution is when with tunnel detected, re-allocate progressive
target
buffers, and then converting the interlaced decoder results to there.

This has been tested with transcode results bit to bit matching as
before
with surface from progressive to progressive.

Signed-off-by: Leo Liu
---
 src/gallium/state_trackers/omx/vid_dec.c | 65
+++-
 src/gallium/state_trackers/omx/vid_dec.h |  6 ++-
 2 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/omx/vid_dec.c
b/src/gallium/state_trackers/omx/vid_dec.c
index a989c10..7842966 100644
--- a/src/gallium/state_trackers/omx/vid_dec.c
+++ b/src/gallium/state_trackers/omx/vid_dec.c
@@ -167,6 +167,19 @@ static OMX_ERRORTYPE
vid_dec_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam
if (!priv->pipe)
   return OMX_ErrorInsufficientResources;

+   if (!vl_compositor_init(&priv->compositor, priv->pipe)) {
+  priv->pipe->destroy(priv->pipe);
+  priv->pipe = NULL;
+  return OMX_ErrorInsufficientResources;
+   }
+
+   if (!vl_compositor_init_state(&priv->cstate, priv->pipe)) {
+  vl_compositor_cleanup(&priv->compositor);
+  priv->pipe->destroy(priv->pipe);
+  priv->pipe = NULL;
+  return OMX_ErrorInsufficientResources;
+   }
+
priv->sPortTypesParam[OMX_PortDomainVideo].nStartPortNumber = 0;
priv->sPortTypesParam[OMX_PortDomainVideo].nPorts = 2;
priv->ports = CALLOC(2, sizeof(omx_base_PortType *));
@@ -218,8 +231,11 @@ static OMX_ERRORTYPE
vid_dec_Destructor(OMX_COMPONENTTYPE *comp)
   priv->ports=NULL;
}

-   if (priv->pipe)
+   if (priv->pipe) {
+  vl_compositor_cleanup_state(&priv->cstate);
+  vl_compositor_cleanup(&priv->compositor);
   priv->pipe->destroy(priv->pipe);
+   }

if (priv->screen)
   omx_put_screen();
@@ -547,6 +563,25 @@ static void
vid_dec_FillOutput(vid_dec_PrivateType *priv, struct pipe_video_buff
}
 }

+static void vid_dec_deint(vid_dec_PrivateType *priv, struct
pipe_video_buffer *src_buf,
+  struct pipe_video_buffer *dst_buf)
+{
+   struct vl_compositor *compositor = &priv->compositor;
+   struct vl_compositor_state *s = &priv->cstate;
+   struct pipe_surface **dst_surface;
+
+   dst_surface = dst_buf->get_surfaces(dst_buf);
+   vl_compositor_clear_layers(s);
+
+   vl_compositor_set_yuv_layer(s, compositor, 0, src_buf, NULL,
NULL, true);
+   vl_compositor_set_layer_dst_area(s, 0, NULL);
+   vl_compositor_render(s, compositor, dst_surface[0], NULL, false);
+
+   vl_compositor_set_yuv_layer(s, compositor, 0, src_buf, NULL,
NULL, false);
+   vl_compositor_set_layer_dst_area(s, 0, NULL);
+   vl_compositor_render(s, compositor, dst_surface[1], NULL, false);
+}
+
 static void vid_dec_FrameDecoded(OMX_COMPONENTTYPE *comp,
OMX_BUFFERHEADERTYPE* input,
  OMX_BUFFERHEADERTYPE* output)
 {
@@ -562,7 +597,33 @@ static void
vid_dec_FrameDecoded(OMX_COMPONENTTYPE *comp, OMX_BUFFERHEADERTYPE*

if (input->pInputPortPrivate) {
   if (output->pInputPortPrivate) {
- struct pipe_video_buffer *tmp = output->pOutputPortPrivate;
+ struct pipe_video_buffer *tmp, *vbuf, *new_vbuf;
+
+ tmp = output->pOutputPortPrivate;
+ vbuf = input->pInputPortPrivate;
+ if (vbuf->interlaced) {
+/* re-allocate the progressive buffer */
+omx_base_video_PortType *port;
+struct pipe_video_buffer templat = {};
+
+port = (omx_base_video_PortType *)
+ priv->ports[OMX_BASE_FILTER_INPUTPORT_INDEX];
+memset(&templat, 0, sizeof(templat));
+templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
+templat.width =
port->sPortParam.format.video.nFrameWidth;
+templat.height =

Re: [Mesa-dev] [PATCH] st/omx: fix crash when vid_enc_Constructor fails

2016-07-07 Thread Leo Liu



On 07/07/2016 11:17 AM, Julien Isorce wrote:

It happens when trying to use omxh264enc with nouveau driver
because it does not provide any encoder at the moment.

It crashes on enc_ReleaseTasks(&priv->free_tasks) because
at this time the list is not initialized.
So this patch makes sure the lists are initialized.

Another way to fix this would be to do an early return in
enc_ReleaseTasks if head->next is null.


This way could make more sense.
all the tasks list should be initialized after everything gets settled.

Regards,
Leo


---
  src/gallium/state_trackers/omx/vid_enc.c | 17 ++---
  1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/gallium/state_trackers/omx/vid_enc.c 
b/src/gallium/state_trackers/omx/vid_enc.c
index d70439a..7df5565 100644
--- a/src/gallium/state_trackers/omx/vid_enc.c
+++ b/src/gallium/state_trackers/omx/vid_enc.c
@@ -158,9 +158,14 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE 
*comp, OMX_STRING nam
 if (!priv)
return OMX_ErrorInsufficientResources;
  
+   LIST_INITHEAD(&priv->free_tasks);

+   LIST_INITHEAD(&priv->used_tasks);
+   LIST_INITHEAD(&priv->b_frames);
+   LIST_INITHEAD(&priv->stacked_tasks);
+
 r = omx_base_filter_Constructor(comp, name);
 if (r)
-   return r;
+  return r;
  
 priv->BufferMgmtCallback = vid_enc_BufferEncoded;

 priv->messageHandler = vid_enc_MessageHandler;
@@ -256,11 +261,6 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE 
*comp, OMX_STRING nam
 priv->scale.xWidth = OMX_VID_ENC_SCALING_WIDTH_DEFAULT;
 priv->scale.xHeight = OMX_VID_ENC_SCALING_WIDTH_DEFAULT;
  
-   LIST_INITHEAD(&priv->free_tasks);

-   LIST_INITHEAD(&priv->used_tasks);
-   LIST_INITHEAD(&priv->b_frames);
-   LIST_INITHEAD(&priv->stacked_tasks);
-
 return OMX_ErrorNone;
  }
  
@@ -269,6 +269,9 @@ static OMX_ERRORTYPE vid_enc_Destructor(OMX_COMPONENTTYPE *comp)

 vid_enc_PrivateType* priv = comp->pComponentPrivate;
 int i;
  
+   if (!priv)

+  return OMX_ErrorBadParameter;
+
 enc_ReleaseTasks(&priv->free_tasks);
 enc_ReleaseTasks(&priv->used_tasks);
 enc_ReleaseTasks(&priv->b_frames);
@@ -875,7 +878,7 @@ static void enc_ReleaseTasks(struct list_head *head)
 struct encode_task *i, *next;
  
 if (!head)

-  return;
+  return;
  
 LIST_FOR_EACH_ENTRY_SAFE(i, next, head, list) {

pipe_resource_reference(&i->bitstream, NULL);




Re: [Mesa-dev] [PATCH 4/4] st/omx/dec: make decoder video buffer progressive

2016-07-07 Thread Emil Velikov
On 6 July 2016 at 19:03, Leo Liu  wrote:
> The idea of encode tunneling is to use video buffer directly for encoder,
> but currently the encoder doesn’t support interlaced surface, the OMX
> decoder set progressive surface before on that purpose.
>
> Since now we are polling the driver for interlacing information for
> decoder, we got the interlaced as preferred as other APIs(VDPAU, VA-API),
> thus breaking the transcode with tunneling.
>
> The solution is when with tunnel detected, re-allocate progressive target
> buffers, and then converting the interlaced decoder results to there.
>
> This has been tested with transcode results bit to bit matching as before
> with surface from progressive to progressive.
>
> Signed-off-by: Leo Liu 
> ---
>  src/gallium/state_trackers/omx/vid_dec.c | 65 
> +++-
>  src/gallium/state_trackers/omx/vid_dec.h |  6 ++-
>  2 files changed, 68 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/omx/vid_dec.c 
> b/src/gallium/state_trackers/omx/vid_dec.c
> index a989c10..7842966 100644
> --- a/src/gallium/state_trackers/omx/vid_dec.c
> +++ b/src/gallium/state_trackers/omx/vid_dec.c
> @@ -167,6 +167,19 @@ static OMX_ERRORTYPE 
> vid_dec_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam
> if (!priv->pipe)
>return OMX_ErrorInsufficientResources;
>
> +   if (!vl_compositor_init(&priv->compositor, priv->pipe)) {
> +  priv->pipe->destroy(priv->pipe);
> +  priv->pipe = NULL;
> +  return OMX_ErrorInsufficientResources;
> +   }
> +
> +   if (!vl_compositor_init_state(&priv->cstate, priv->pipe)) {
> +  vl_compositor_cleanup(&priv->compositor);
> +  priv->pipe->destroy(priv->pipe);
> +  priv->pipe = NULL;
> +  return OMX_ErrorInsufficientResources;
> +   }
> +()
IIRC as vid_dec_Constructor() fails, the caller (bellagio?) explicitly
calls the destructor vid_dec_Destructor(). Thus the above teardown
should not be needed.

> priv->sPortTypesParam[OMX_PortDomainVideo].nStartPortNumber = 0;
> priv->sPortTypesParam[OMX_PortDomainVideo].nPorts = 2;
> priv->ports = CALLOC(2, sizeof(omx_base_PortType *));
> @@ -218,8 +231,11 @@ static OMX_ERRORTYPE 
> vid_dec_Destructor(OMX_COMPONENTTYPE *comp)
>priv->ports=NULL;
> }
>
> -   if (priv->pipe)
> +   if (priv->pipe) {
> +  vl_compositor_cleanup_state(&priv->cstate);
> +  vl_compositor_cleanup(&priv->compositor);
Neither vl_compositor_cleanup_state() nor vl_compositor_cleanup() is
happy if upon deref. the value (pointer again) is NULL.

omx/vid_enc.c could use similar cleanups ?

Thanks,
Emil
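
The two review points above (the caller runs the destructor after a failed constructor, and the cleanup helpers do not tolerate uninitialized state) can be illustrated with a minimal sketch. It assumes a made-up component type that records what was initialized; decoder_sketch, decoder_construct and decoder_destroy are hypothetical names and do not correspond to the vid_dec.c or vl_compositor API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical component state; the flags record which init steps ran. */
struct decoder_sketch {
   void *pipe;
   bool compositor_ready;
   bool state_ready;
};

/* On failure this returns without tearing anything down, on the
 * assumption that the caller invokes decoder_destroy() afterwards. */
static int decoder_construct(struct decoder_sketch *d, bool fail_state_init)
{
   d->pipe = malloc(1);
   if (!d->pipe)
      return -1;
   d->compositor_ready = true;
   if (fail_state_init)
      return -1;
   d->state_ready = true;
   return 0;
}

/* Only undoes the steps that actually completed, so it is safe after a
 * partial construction and safe to call twice. */
static void decoder_destroy(struct decoder_sketch *d)
{
   assert(d != NULL);
   if (d->state_ready)
      d->state_ready = false;
   if (d->compositor_ready)
      d->compositor_ready = false;
   free(d->pipe);
   d->pipe = NULL;
}
```

The design choice in this sketch is to keep all teardown in one place and guard each step, rather than duplicating it on every constructor error path.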


Re: [Mesa-dev] [PATCH] st/omx: fix crash when vid_enc_Constructor fails

2016-07-07 Thread Emil Velikov
Hi Julien,

On 7 July 2016 at 16:17, Julien Isorce  wrote:
> It happens when trying to use omxh264enc with nouveau driver
> because it does not provide any encoder at the moment.
>
> It crashes on enc_ReleaseTasks(&priv->free_tasks) because
> at this time the list is not initialized.
> So this patch makes sure the lists are initialized.
>
> Another way to fix this would be to do an early return in
> enc_ReleaseTasks if head->next is null.
IMHO your patch makes sense as-is.

Please add the stable tag before committing. and (nit) keep the
whitespace changes as separate patch.

With that, the (resulting) two patches are
Reviewed-by: Emil Velikov 

-Emil


Re: [Mesa-dev] [PATCH 4/4] st/omx/dec: make decoder video buffer progressive

2016-07-07 Thread Julien Isorce
No encoder with nouveau driver so I cannot do a tunneling test
but at least this does not break the decoder part so the 4 patches are:
Tested-by: Julien Isorce 
Thx!


On 6 July 2016 at 19:03, Leo Liu  wrote:

> The idea of encode tunneling is to use video buffer directly for encoder,
> but currently the encoder doesn’t support interlaced surface, the OMX
> decoder set progressive surface before on that purpose.
>
> Since now we are polling the driver for interlacing information for
> decoder, we got the interlaced as preferred as other APIs(VDPAU, VA-API),
> thus breaking the transcode with tunneling.
>
> The solution is when with tunnel detected, re-allocate progressive target
> buffers, and then converting the interlaced decoder results to there.
>
> This has been tested with transcode results bit to bit matching as before
> with surface from progressive to progressive.
>
> Signed-off-by: Leo Liu 
> ---
>  src/gallium/state_trackers/omx/vid_dec.c | 65
> +++-
>  src/gallium/state_trackers/omx/vid_dec.h |  6 ++-
>  2 files changed, 68 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/omx/vid_dec.c
> b/src/gallium/state_trackers/omx/vid_dec.c
> index a989c10..7842966 100644
> --- a/src/gallium/state_trackers/omx/vid_dec.c
> +++ b/src/gallium/state_trackers/omx/vid_dec.c
> @@ -167,6 +167,19 @@ static OMX_ERRORTYPE
> vid_dec_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam
> if (!priv->pipe)
>return OMX_ErrorInsufficientResources;
>
> +   if (!vl_compositor_init(&priv->compositor, priv->pipe)) {
> +  priv->pipe->destroy(priv->pipe);
> +  priv->pipe = NULL;
> +  return OMX_ErrorInsufficientResources;
> +   }
> +
> +   if (!vl_compositor_init_state(&priv->cstate, priv->pipe)) {
> +  vl_compositor_cleanup(&priv->compositor);
> +  priv->pipe->destroy(priv->pipe);
> +  priv->pipe = NULL;
> +  return OMX_ErrorInsufficientResources;
> +   }
> +
> priv->sPortTypesParam[OMX_PortDomainVideo].nStartPortNumber = 0;
> priv->sPortTypesParam[OMX_PortDomainVideo].nPorts = 2;
> priv->ports = CALLOC(2, sizeof(omx_base_PortType *));
> @@ -218,8 +231,11 @@ static OMX_ERRORTYPE
> vid_dec_Destructor(OMX_COMPONENTTYPE *comp)
>priv->ports=NULL;
> }
>
> -   if (priv->pipe)
> +   if (priv->pipe) {
> +  vl_compositor_cleanup_state(&priv->cstate);
> +  vl_compositor_cleanup(&priv->compositor);
>priv->pipe->destroy(priv->pipe);
> +   }
>
> if (priv->screen)
>omx_put_screen();
> @@ -547,6 +563,25 @@ static void vid_dec_FillOutput(vid_dec_PrivateType
> *priv, struct pipe_video_buff
> }
>  }
>
> +static void vid_dec_deint(vid_dec_PrivateType *priv, struct
> pipe_video_buffer *src_buf,
> +  struct pipe_video_buffer *dst_buf)
> +{
> +   struct vl_compositor *compositor = &priv->compositor;
> +   struct vl_compositor_state *s = &priv->cstate;
> +   struct pipe_surface **dst_surface;
> +
> +   dst_surface = dst_buf->get_surfaces(dst_buf);
> +   vl_compositor_clear_layers(s);
> +
> +   vl_compositor_set_yuv_layer(s, compositor, 0, src_buf, NULL, NULL,
> true);
> +   vl_compositor_set_layer_dst_area(s, 0, NULL);
> +   vl_compositor_render(s, compositor, dst_surface[0], NULL, false);
> +
> +   vl_compositor_set_yuv_layer(s, compositor, 0, src_buf, NULL, NULL,
> false);
> +   vl_compositor_set_layer_dst_area(s, 0, NULL);
> +   vl_compositor_render(s, compositor, dst_surface[1], NULL, false);
> +}
> +
>  static void vid_dec_FrameDecoded(OMX_COMPONENTTYPE *comp,
> OMX_BUFFERHEADERTYPE* input,
>   OMX_BUFFERHEADERTYPE* output)
>  {
> @@ -562,7 +597,33 @@ static void vid_dec_FrameDecoded(OMX_COMPONENTTYPE
> *comp, OMX_BUFFERHEADERTYPE*
>
> if (input->pInputPortPrivate) {
>if (output->pInputPortPrivate) {
> - struct pipe_video_buffer *tmp = output->pOutputPortPrivate;
> + struct pipe_video_buffer *tmp, *vbuf, *new_vbuf;
> +
> + tmp = output->pOutputPortPrivate;
> + vbuf = input->pInputPortPrivate;
> + if (vbuf->interlaced) {
> +/* re-allocate the progressive buffer */
> +omx_base_video_PortType *port;
> +struct pipe_video_buffer templat = {};
> +
> +port = (omx_base_video_PortType *)
> +priv->ports[OMX_BASE_FILTER_INPUTPORT_INDEX];
> +memset(&templat, 0, sizeof(templat));
> +templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
> +templat.width = port->sPortParam.format.video.nFrameWidth;
> +templat.height = port->sPortParam.format.video.nFrameHeight;
> +templat.buffer_format = PIPE_FORMAT_NV12;
> +templat.interlaced = false;
> +new_vbuf = priv->pipe->create_video_buffer(priv->pipe,
> &templat);
> +
> +/* convert the interlaced to the progressive */
> +vid_dec_deint(priv, input->pInputPortPrivate, new_vbuf);
> +

[Mesa-dev] [PATCH] st/omx: fix crash when vid_enc_Constructor fails

2016-07-07 Thread Julien Isorce
It happens when trying to use omxh264enc with nouveau driver
because it does not provide any encoder at the moment.

It crashes on enc_ReleaseTasks(&priv->free_tasks) because
at this time the list is not initialized.
So this patch makes sure the lists are initialized.

Another way to fix this would be to do an early return in
enc_ReleaseTasks if head->next is null.
---
 src/gallium/state_trackers/omx/vid_enc.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/gallium/state_trackers/omx/vid_enc.c 
b/src/gallium/state_trackers/omx/vid_enc.c
index d70439a..7df5565 100644
--- a/src/gallium/state_trackers/omx/vid_enc.c
+++ b/src/gallium/state_trackers/omx/vid_enc.c
@@ -158,9 +158,14 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE 
*comp, OMX_STRING nam
if (!priv)
   return OMX_ErrorInsufficientResources;
 
+   LIST_INITHEAD(&priv->free_tasks);
+   LIST_INITHEAD(&priv->used_tasks);
+   LIST_INITHEAD(&priv->b_frames);
+   LIST_INITHEAD(&priv->stacked_tasks);
+
r = omx_base_filter_Constructor(comp, name);
if (r)
-   return r;
+  return r;
 
priv->BufferMgmtCallback = vid_enc_BufferEncoded;
priv->messageHandler = vid_enc_MessageHandler;
@@ -256,11 +261,6 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE 
*comp, OMX_STRING nam
priv->scale.xWidth = OMX_VID_ENC_SCALING_WIDTH_DEFAULT;
priv->scale.xHeight = OMX_VID_ENC_SCALING_WIDTH_DEFAULT;
 
-   LIST_INITHEAD(&priv->free_tasks);
-   LIST_INITHEAD(&priv->used_tasks);
-   LIST_INITHEAD(&priv->b_frames);
-   LIST_INITHEAD(&priv->stacked_tasks);
-
return OMX_ErrorNone;
 }
 
@@ -269,6 +269,9 @@ static OMX_ERRORTYPE vid_enc_Destructor(OMX_COMPONENTTYPE 
*comp)
vid_enc_PrivateType* priv = comp->pComponentPrivate;
int i;
 
+   if (!priv)
+  return OMX_ErrorBadParameter;
+
enc_ReleaseTasks(&priv->free_tasks);
enc_ReleaseTasks(&priv->used_tasks);
enc_ReleaseTasks(&priv->b_frames);
@@ -875,7 +878,7 @@ static void enc_ReleaseTasks(struct list_head *head)
struct encode_task *i, *next;
 
if (!head)
-  return;
+  return;
 
LIST_FOR_EACH_ENTRY_SAFE(i, next, head, list) {
   pipe_resource_reference(&i->bitstream, NULL);
-- 
1.9.1
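
A minimal sketch of the two options described in the commit message: initializing the list heads before any early return, or returning early from the release function when head->next is NULL. The list type and the helper names (list_node, list_init, list_add, release_tasks) are simplified stand-ins written for this illustration and are not Mesa's list.h macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified circular doubly linked list, for illustration only. */
struct list_node { struct list_node *prev, *next; };

/* An initialized empty list points at itself. */
static void list_init(struct list_node *head)
{
   head->prev = head;
   head->next = head;
}

static void list_add(struct list_node *item, struct list_node *head)
{
   assert(item != NULL && head != NULL);
   item->prev = head;
   item->next = head->next;
   head->next->prev = item;
   head->next = item;
}

/* The list node is the first member, so a node pointer can be cast
 * back to the containing task. */
struct task { struct list_node list; int id; };

/* Frees every task on the list and returns how many were released.
 * The head->next check is the alternative fix: a zero-filled head
 * that was never initialized is treated as empty. */
static int release_tasks(struct list_node *head)
{
   int released = 0;
   if (!head || !head->next)
      return 0;
   struct list_node *it = head->next;
   while (it != head) {
      struct list_node *next = it->next;
      free((struct task *)it);
      released++;
      it = next;
   }
   list_init(head);
   return released;
}
```

With the head initialized up front, the loop body never runs for an empty list, so the destructor is safe even when the constructor bailed out early. The head->next check covers the remaining case of a zero-filled structure.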



Re: [Mesa-dev] mesa: don't install GLX files if GLX is not built

2016-07-07 Thread Emil Velikov
On 7 July 2016 at 15:14, Akihiko Odaki  wrote:
> Hi Emil,
>
> On 2016-07-07 19:11, Emil Velikov wrote:
>>
>> [Adding back mesa-dev]
>
>
> Sorry, I mistakenly clicked "Reply" instead of "Reply All".
>
>> Hi Akihiko Odaki
>>
>> Before anything, let me say a couple of things about DRI.
>> The DRI interface is an abstraction layer where you have winsys
>> (GLX/EGL/other) agnostic DRI module and different DRI loaders, each
>> implementing different winsys' API. Thus DRI modules do/should not
>> have any GLX/EGL/other dependencies.
>>
>> As the location suggests, the file is not meant for developers using
>> GL/GLX/..., but it's an internal interface. This way loaders/modules
>> living outside of the mesa tree can reuse it. One such example is the
>> loader in xserver.
>
>
> That's what I wanted to know. Thank you for answering.
>
>> On 7 July 2016 at 00:38, Akihiko Odaki 
>> wrote:
>>>
>>> I forgot to note that other GL header files are included by the internal
>>> DRI
>>> files.
>>
>> The dri_interface.h file includes only drm.h afaict, due to historical
>> reasons. For ease of use (development) the DRI modules can reuse
>> numerical values from GL/GLES.
>
>
> They seems. They include GL/gl.h, which provide those types.
>
>>> But they are not necessary when developing softwares and some of them
>>> don't work as expected if the installation lacks GLX.
>>>
>> Yes, as mentioned above - DRI is internal and not something meant for
>> GL/GLES/EGL/... developers.
>>

 I assumed the header files in GL/internal are not supposed to be
 included by users directly, but they can be included by other public
 header files. If so, GL/internal/dri_interface.h is not necessary to be
 installed since it is not included by such headers.

 But, after receiving your email I investigated again to find that
 GL/internal/dri_interface.h is not included even if GLX is enabled. Now
 I'm not sure what is the expected usage of GL/internal/dri_interface.h.
 Should it be installed anyway?

>> Yes it should. Unfortunately one cannot know whether the
>> header/interface will be used, during mesa build/install stage.
>> If you know it's not needed, feel free to nuke it.
>>
>> Emil
>>
>
> I see. Now I also think the hunk to patch src/mesa/drivers/dri/Makefile.am
> should be removed.
>
> Sorry for bothering and thank you for answering my questions.
>
There's nothing to apologise for, one cannot expect that you'll know this :-)
I'll drop the hunk and push the patch in a second.

Thanks again,
Emil


Re: [Mesa-dev] Mesa 12.0.0 release candidate 4

2016-07-07 Thread Emil Velikov
On 6 July 2016 at 20:37, Rob Clark  wrote:
> On Thu, Jun 23, 2016 at 9:35 AM, Emil Velikov  
> wrote:
>> Hi all,
>>
>> On 21 June 2016 at 15:35, Emil Velikov  wrote:
>>> The fourth release candidate for Mesa 12.0.0 is now available.
>>>
>>> Note: this is the final release candidate, with Mesa 12.0.0 expected in a 
>>> couple of days.
>>>
>> Considering the requests, from different parties, the final release
>> will be out tomorrow Friday after 20:00 GMT.
>>
>> All your nominations (that have landed in master, if
>> applicable) will be included, but do let me know if certain patch(es)
>> should be included/excluded from the release.
>
> btw, in case you missed my note on IRC, these would be good to have on
> the 12.0 branch:
>
> 7295428 freedreno: fix crash on smaller gpus and higher resolutions
> 01ccb0d i965: don't drop const initializers in vector splitting
> f78a6b1 glsl: add driconf to zero-init unintialized vars
>
Thanks. Will pick those up as well.

-Emil


Re: [Mesa-dev] Mesa 12.0.0 release candidate 4

2016-07-07 Thread Emil Velikov
On 6 July 2016 at 20:30, Jason Ekstrand  wrote:
>
>
> On Thu, Jun 23, 2016 at 6:35 AM, Emil Velikov 
> wrote:
>>
>> Hi all,
>>
>> On 21 June 2016 at 15:35, Emil Velikov  wrote:
>> > The fourth release candidate for Mesa 12.0.0 is now available.
>> >
>> > Note: this is the final release candidate, with Mesa 12.0.0 expected in
>> > a couple of days.
>> >
>> Considering the requests, from different parties, the final release
>> will be out tomorrow Friday after 20:00 GMT.
>
>
> What's going on here?  I don't think I missed the release but "tomorrow" was
> 12 days ago according to my e-mail client.
I'm afraid a bunch of ugly things have been 'distracting' me recently
:-\ I'll double-check things and ping you wrt Talos/Dota2 + Vulkan
very shortly.

Thanks
Emil


Re: [Mesa-dev] [PATCH 1/4] gallium: add async flag to pipe_debug_callback

2016-07-07 Thread Jan Vesely
On Thu, 2016-07-07 at 09:39 +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> ---
>  src/gallium/include/pipe/p_state.h   | 6 ++
>  src/gallium/state_trackers/clover/core/queue.cpp | 5 -
>  src/mesa/state_tracker/st_debug.c| 5 -
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/include/pipe/p_state.h
> b/src/gallium/include/pipe/p_state.h
> index f4bee38..f7bf402 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -809,6 +809,12 @@ struct pipe_compute_state
>  struct pipe_debug_callback
>  {
> /**
> +* When set to \c true, the callback may be called asynchronously
> from a
> +* driver-created thread.
> +*/
> +   bool async;
> +
> +   /**
>  * Callback for the driver to report debug/performance/etc
> information back
>  * to the state tracker.
>  *
> diff --git a/src/gallium/state_trackers/clover/core/queue.cpp
> b/src/gallium/state_trackers/clover/core/queue.cpp
> index 24d71f1..00afdb6 100644
> --- a/src/gallium/state_trackers/clover/core/queue.cpp
> +++ b/src/gallium/state_trackers/clover/core/queue.cpp
> @@ -50,7 +50,10 @@ command_queue::command_queue(clover::context ,
> clover::device ,
>    throw error(CL_INVALID_DEVICE);
>  
> if (ctx.notify) {
> -  struct pipe_debug_callback cb = { _notify_callback, this
> };
> +  struct pipe_debug_callback cb;
> +  memset(&cb, 0, sizeof(db));
> +  cb.debug_message = _notify_callback;
> +  cb.data = this;

I don't think this is necessary; elements that are not mentioned are
initialized to zero, in both C (below) and C++.

Jan

>    if (pipe->set_debug_callback)
>   pipe->set_debug_callback(pipe, &cb);
> }
> diff --git a/src/mesa/state_tracker/st_debug.c
> b/src/mesa/state_tracker/st_debug.c
> index eaf2549..214e223 100644
> --- a/src/mesa/state_tracker/st_debug.c
> +++ b/src/mesa/state_tracker/st_debug.c
> @@ -172,7 +172,10 @@ st_enable_debug_output(struct st_context *st,
> boolean enable)
>    return;
>  
> if (enable) {
> -  struct pipe_debug_callback cb = { st_debug_message, st };
> +  struct pipe_debug_callback cb;
> +  memset(&cb, 0, sizeof(cb));
> +  cb.debug_message = st_debug_message;
> +  cb.data = st;
>    pipe->set_debug_callback(pipe, &cb);
> } else {
>    pipe->set_debug_callback(pipe, NULL);
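
To illustrate the point about implicit zero initialization: in C, members left out of an initializer list are zero-initialized, so a newly added field such as async ends up false without a memset. The sketch below uses a designated initializer so that it does not depend on member order. The struct and function names (debug_callback_sketch, noop_message, make_callback) are made up for this example, and the struct is a simplified stand-in for pipe_debug_callback, not its real definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in with the new flag placed first, as in the patch. */
struct debug_callback_sketch {
   bool async;
   void (*debug_message)(void *data, const char *msg);
   void *data;
};

static void noop_message(void *data, const char *msg)
{
   (void)data;
   (void)msg;
}

/* Designated initializers name the members explicitly; every member
 * that is not named (here: async) is zero-initialized. */
static struct debug_callback_sketch
make_callback(void (*fn)(void *, const char *), void *data)
{
   assert(fn != NULL);
   struct debug_callback_sketch cb = {
      .debug_message = fn,
      .data = data,
   };
   return cb;
}
```

Note that a positional initializer such as { fn, data } would now start filling at async, because the new member was added at the front. That is a reason to prefer either designated initializers (C99) or explicit assignments when the member order changes.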




Re: [Mesa-dev] mesa: don't install GLX files if GLX is not built

2016-07-07 Thread Akihiko Odaki

Hi Emil,

On 2016-07-07 19:11, Emil Velikov wrote:

[Adding back mesa-dev]


Sorry, I mistakenly clicked "Reply" instead of "Reply All".


Hi Akihiko Odaki

Before anything, let me say a couple of things about DRI.
The DRI interface is an abstraction layer where you have winsys
(GLX/EGL/other) agnostic DRI module and different DRI loaders, each
implementing different winsys' API. Thus DRI modules do/should not
have any GLX/EGL/other dependencies.

As the location suggests, the file is not meant for developers using
GL/GLX/..., but it's an internal interface. This way loaders/modules
living outside of the mesa tree can reuse it. One such example is the
loader in xserver.


That's what I wanted to know. Thank you for answering.


On 7 July 2016 at 00:38, Akihiko Odaki  wrote:

I forgot to note that other GL header files are included by the internal DRI
files.

The dri_interface.h file includes only drm.h afaict, due to historical
reasons. For ease of use (development) the DRI modules can reuse
numerical values from GL/GLES.


They seems. They include GL/gl.h, which provide those types.


But they are not necessary when developing softwares and some of them
don't work as expected if the installation lacks GLX.


Yes, as mentioned above - DRI is internal and not something meant for
GL/GLES/EGL/... developers.



I assumed the header files in GL/internal are not supposed to be
included by users directly, but they can be included by other public
header files. If so, GL/internal/dri_interface.h is not necessary to be
installed since it is not included by such headers.

But, after receiving your email I investigated again to find that
GL/internal/dri_interface.h is not included even if GLX is enabled. Now
I'm not sure what is the expected usage of GL/internal/dri_interface.h.
Should it be installed anyway?


Yes it should. Unfortunately one cannot know whether the
header/interface will be used, during mesa build/install stage.
If you know it's not needed, feel free to nuke it.

Emil



I see. Now I also think the hunk to patch 
src/mesa/drivers/dri/Makefile.am should be removed.


Sorry for bothering you, and thank you for answering my questions.

Regards,
Akihiko Odaki
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] radeon/uvd: move polaris fw check into radeon_video.c

2016-07-07 Thread Christian König

On 07.07.2016 at 15:33, Leo Liu wrote:



On 07/07/2016 05:57 AM, Christian König wrote:

From: Christian König 

It's actually not very clever to claim to support H.264
and then fail to create a decoder.

Signed-off-by: Christian König 
---
  src/gallium/drivers/radeon/radeon_uvd.c   |  8 
  src/gallium/drivers/radeon/radeon_video.c | 16 +---
  2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c

index 7223417..52658fa 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -60,8 +60,6 @@
  #define FB_BUFFER_SIZE_TONGA (2048 * 64)
  #define IT_SCALING_TABLE_SIZE 992
  -#define FW_1_66_16 ((1 << 24) | (66 << 16) | (16 << 8))
-
  /* UVD decoder representation */
  struct ruvd_decoder {
  struct pipe_video_codec base;
@@ -1185,12 +1183,6 @@ struct pipe_video_codec 
*ruvd_create_decoder(struct pipe_context *context,

  height = align(height, VL_MACROBLOCK_HEIGHT);
  break;
  case PIPE_VIDEO_FORMAT_MPEG4_AVC:
-if ((info.family == CHIP_POLARIS10 || info.family == 
CHIP_POLARIS11) &&

-info.uvd_fw_version < FW_1_66_16 ) {
-RVID_ERR("POLARIS10/11 firmware version need to be 
updated.\n");

-return NULL;
-}
-
  width = align(width, VL_MACROBLOCK_WIDTH);
  height = align(height, VL_MACROBLOCK_HEIGHT);
  break;
diff --git a/src/gallium/drivers/radeon/radeon_video.c 
b/src/gallium/drivers/radeon/radeon_video.c

index 69e4416..a26668b 100644
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -43,6 +43,8 @@
  #include "radeon_video.h"
  #include "radeon_vce.h"
  +#define FW_1_66_16 ((1 << 24) | (66 << 16) | (16 << 8))
+


Please add "UVD_" in front of the macro name, because in radeon/video we are
dealing with VCE as well.

With that fixed, the patch series is:


Good point, thanks for the review. Going to fix that and push the patches.

Christian.


Reviewed-by: Leo Liu 

Regards,
Leo



  /* generate an stream handle */
  unsigned rvid_alloc_stream_handle()
  {
@@ -206,6 +208,9 @@ int rvid_get_video_param(struct pipe_screen *screen,
  {
  struct r600_common_screen *rscreen = (struct r600_common_screen 
*)screen;

  enum pipe_video_format codec = u_reduce_video_profile(profile);
+struct radeon_info info;
+
+rscreen->ws->query_info(rscreen->ws, &info);
if (entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
  switch (param) {
@@ -239,10 +244,15 @@ int rvid_get_video_param(struct pipe_screen 
*screen,

  case PIPE_VIDEO_FORMAT_MPEG12:
  return profile != PIPE_VIDEO_PROFILE_MPEG1;
  case PIPE_VIDEO_FORMAT_MPEG4:
+/* no support for MPEG4 on older hw */
+return rscreen->family >= CHIP_PALM;
  case PIPE_VIDEO_FORMAT_MPEG4_AVC:
-if (rscreen->family < CHIP_PALM)
-/* no support for MPEG4 */
-return codec != PIPE_VIDEO_FORMAT_MPEG4;
+if ((rscreen->family == CHIP_POLARIS10 ||
+ rscreen->family == CHIP_POLARIS11) &&
+info.uvd_fw_version < FW_1_66_16 ) {
+RVID_ERR("POLARIS10/11 firmware version need to be 
updated.\n");

+return false;
+}
  return true;
  case PIPE_VIDEO_FORMAT_VC1:
  return true;






Re: [Mesa-dev] [PATCH 1/4] radeon/video: fix coding style in radeon_video.c

2016-07-07 Thread Christian König

On 07.07.2016 at 15:51, Aaron Watry wrote:



On Thu, Jul 7, 2016 at 4:57 AM, Christian König wrote:


From: Christian König

Signed-off-by: Christian König
---
 src/gallium/drivers/radeon/radeon_video.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_video.c
b/src/gallium/drivers/radeon/radeon_video.c
index 5b29c78..69e4416 100644
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -213,23 +213,23 @@ int rvid_get_video_param(struct pipe_screen
*screen,
return codec == PIPE_VIDEO_FORMAT_MPEG4_AVC &&
rvce_is_fw_version_supported(rscreen);
case PIPE_VIDEO_CAP_NPOT_TEXTURES:
-   return 1;
+   return 1;


Did you really mean to only replace the first set of spaces with tabs
on these lines, or did you also mean to replace the second set?


Right now, you've got 2 tabs, and then spaces for the last indentation 
level... which seems weird.


Indeed, looks like my regular expression didn't work as expected.

Thanks for pointing this out.

Christian.



--Aaron

case PIPE_VIDEO_CAP_MAX_WIDTH:
return (rscreen->family < CHIP_TONGA) ?
2048 : 4096;
case PIPE_VIDEO_CAP_MAX_HEIGHT:
return (rscreen->family < CHIP_TONGA) ?
1152 : 2304;
case PIPE_VIDEO_CAP_PREFERED_FORMAT:
-   return PIPE_FORMAT_NV12;
+   return PIPE_FORMAT_NV12;
case PIPE_VIDEO_CAP_PREFERS_INTERLACED:
-   return false;
+   return false;
case PIPE_VIDEO_CAP_SUPPORTS_INTERLACED:
-   return false;
+   return false;
case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
-   return true;
+   return true;
case PIPE_VIDEO_CAP_STACKED_FRAMES:
return (rscreen->family < CHIP_TONGA) ? 1 : 2;
default:
-   return 0;
+   return 0;
}
}

--
2.5.0



Re: [Mesa-dev] [PATCH 1/4] radeon/video: fix coding style in radeon_video.c

2016-07-07 Thread Aaron Watry
On Thu, Jul 7, 2016 at 4:57 AM, Christian König 
wrote:

> From: Christian König 
>
> Signed-off-by: Christian König 
> ---
>  src/gallium/drivers/radeon/radeon_video.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/radeon_video.c
> b/src/gallium/drivers/radeon/radeon_video.c
> index 5b29c78..69e4416 100644
> --- a/src/gallium/drivers/radeon/radeon_video.c
> +++ b/src/gallium/drivers/radeon/radeon_video.c
> @@ -213,23 +213,23 @@ int rvid_get_video_param(struct pipe_screen *screen,
> return codec == PIPE_VIDEO_FORMAT_MPEG4_AVC &&
> rvce_is_fw_version_supported(rscreen);
> case PIPE_VIDEO_CAP_NPOT_TEXTURES:
> -   return 1;
> +   return 1;
>

Did you really mean to only replace the first set of spaces with tabs on
these lines, or did you also mean to replace the second set?

Right now, you've got 2 tabs, and then spaces for the last indentation
level... which seems weird.
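
A minimal sketch of one way to avoid the partial conversion described above, not the script that was actually used for the patch. It rewrites the whole leading indentation of a line in one step instead of substituting individual runs of spaces. The helper name `leading_spaces_to_tabs` and the tab width of 8 are assumptions made for illustration.

```python
import re


def leading_spaces_to_tabs(line, tab_width=8):
    """Rewrite the leading whitespace of `line` using as many tabs as fit.

    tab_width=8 is an assumption; adjust it to the project's convention.
    """
    indent = re.match(r"[ \t]*", line).group(0)
    rest = line[len(indent):]

    # Compute the visual column reached by the existing indentation.
    col = 0
    for ch in indent:
        if ch == "\t":
            col = (col // tab_width + 1) * tab_width
        else:
            col += 1

    # Full tab stops become tabs; any remainder stays as spaces.
    return "\t" * (col // tab_width) + " " * (col % tab_width) + rest


if __name__ == "__main__":
    # Two tabs followed by eight spaces: the case pointed out in the review.
    print(repr(leading_spaces_to_tabs("\t\t        return 1;")))
```

Because the indentation is measured as a visual column first, mixed input such as "two tabs, then eight spaces" collapses to three tabs rather than being left half converted.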

--Aaron


> case PIPE_VIDEO_CAP_MAX_WIDTH:
> return (rscreen->family < CHIP_TONGA) ? 2048 :
> 4096;
> case PIPE_VIDEO_CAP_MAX_HEIGHT:
> return (rscreen->family < CHIP_TONGA) ? 1152 :
> 2304;
> case PIPE_VIDEO_CAP_PREFERED_FORMAT:
> -   return PIPE_FORMAT_NV12;
> +   return PIPE_FORMAT_NV12;
> case PIPE_VIDEO_CAP_PREFERS_INTERLACED:
> -   return false;
> +   return false;
> case PIPE_VIDEO_CAP_SUPPORTS_INTERLACED:
> -   return false;
> +   return false;
> case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
> -   return true;
> +   return true;
> case PIPE_VIDEO_CAP_STACKED_FRAMES:
> return (rscreen->family < CHIP_TONGA) ? 1 : 2;
> default:
> -   return 0;
> +   return 0;
> }
> }
>
> --
> 2.5.0
>


Re: [Mesa-dev] [PATCH 2/4] radeon/uvd: move polaris fw check into radeon_video.c

2016-07-07 Thread Leo Liu



On 07/07/2016 05:57 AM, Christian König wrote:

From: Christian König 

It's actually not very clever to claim to support H.264
and then fail to create a decoder.

Signed-off-by: Christian König 
---
  src/gallium/drivers/radeon/radeon_uvd.c   |  8 
  src/gallium/drivers/radeon/radeon_video.c | 16 +---
  2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 7223417..52658fa 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -60,8 +60,6 @@
  #define FB_BUFFER_SIZE_TONGA (2048 * 64)
  #define IT_SCALING_TABLE_SIZE 992
  
-#define FW_1_66_16 ((1 << 24) | (66 << 16) | (16 << 8))

-
  /* UVD decoder representation */
  struct ruvd_decoder {
struct pipe_video_codec base;
@@ -1185,12 +1183,6 @@ struct pipe_video_codec *ruvd_create_decoder(struct 
pipe_context *context,
height = align(height, VL_MACROBLOCK_HEIGHT);
break;
case PIPE_VIDEO_FORMAT_MPEG4_AVC:
-   if ((info.family == CHIP_POLARIS10 || info.family == CHIP_POLARIS11) 
&&
-   info.uvd_fw_version < FW_1_66_16 ) {
-   RVID_ERR("POLARIS10/11 firmware version need to be 
updated.\n");
-   return NULL;
-   }
-
width = align(width, VL_MACROBLOCK_WIDTH);
height = align(height, VL_MACROBLOCK_HEIGHT);
break;
diff --git a/src/gallium/drivers/radeon/radeon_video.c 
b/src/gallium/drivers/radeon/radeon_video.c
index 69e4416..a26668b 100644
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -43,6 +43,8 @@
  #include "radeon_video.h"
  #include "radeon_vce.h"
  
+#define FW_1_66_16 ((1 << 24) | (66 << 16) | (16 << 8))

+


Please add "UVD_" in front of the macro name, because in radeon/video we are
dealing with VCE as well.

With that fixed, the patch series is:
Reviewed-by: Leo Liu 

Regards,
Leo



  /* generate an stream handle */
  unsigned rvid_alloc_stream_handle()
  {
@@ -206,6 +208,9 @@ int rvid_get_video_param(struct pipe_screen *screen,
  {
struct r600_common_screen *rscreen = (struct r600_common_screen 
*)screen;
enum pipe_video_format codec = u_reduce_video_profile(profile);
+   struct radeon_info info;
+
+   rscreen->ws->query_info(rscreen->ws, &info);
  
  	if (entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {

switch (param) {
@@ -239,10 +244,15 @@ int rvid_get_video_param(struct pipe_screen *screen,
case PIPE_VIDEO_FORMAT_MPEG12:
return profile != PIPE_VIDEO_PROFILE_MPEG1;
case PIPE_VIDEO_FORMAT_MPEG4:
+   /* no support for MPEG4 on older hw */
+   return rscreen->family >= CHIP_PALM;
case PIPE_VIDEO_FORMAT_MPEG4_AVC:
-   if (rscreen->family < CHIP_PALM)
-   /* no support for MPEG4 */
-   return codec != PIPE_VIDEO_FORMAT_MPEG4;
+   if ((rscreen->family == CHIP_POLARIS10 ||
+rscreen->family == CHIP_POLARIS11) &&
+   info.uvd_fw_version < FW_1_66_16 ) {
+   RVID_ERR("POLARIS10/11 firmware version need to be 
updated.\n");
+   return false;
+   }
return true;
case PIPE_VIDEO_FORMAT_VC1:
return true;




Re: [Mesa-dev] [PATCH 05/11] glsl: Replace the linear search in get_intrinsic_opcode with a radix trie

2016-07-07 Thread Jason Ekstrand
I can't help but think that we keep solving this problem... We have a
low-collision hash table for vkGetProcAddress, something for
glxGetProcAddress and eglGetProcAddress (hopefully the same?) and now
this.  Can we pick a method, make it a little Python helper, and use it
everywhere?

Not being critical; this is probably a fine solution for compile-time
string -> int mappings and possibly the best to date.  It just seems like
kind of a big hammer especially if glsl_to_nir is its only user.
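
For readers following along, here is a minimal sketch of the radix-trie idea in the quoted patch. It is not the patch's code. Assumptions: names are unique and already have the `__intrinsic_` prefix stripped, each name maps to a single opcode string rather than the (int_op, uint_op) pair used in nir_intrinsic_map.py, and `build_trie` and `lookup` are hypothetical helper names. Nodes use the same (prefix, children, data) shape that the patch's docstring describes.

```python
import os.path


def build_trie(table):
    """Build a radix trie from a list of (name, value) pairs sorted by name.

    Each node is (prefix, children, value); exactly one of children/value
    is None. Names are assumed to be unique.
    """
    if len(table) == 1 and table[0][0] == "":
        return [("", None, table[0][1])]

    nodes = []
    i = 0
    while i < len(table):
        # Group consecutive names that share the same first character.
        first = table[i][0][:1]
        j = i
        while j < len(table) and table[j][0][:1] == first:
            j += 1
        group = table[i:j]

        # The longest prefix common to the whole group labels this branch.
        prefix = os.path.commonprefix([name for name, _ in group])
        stripped = [(name[len(prefix):], value) for name, value in group]
        nodes.append((prefix, build_trie(stripped), None))
        i = j
    return nodes


def lookup(trie, name):
    """Return the value stored for `name`, or None if it is not present."""
    for prefix, children, value in trie:
        if children is None:
            if name == "":
                return value
        elif name.startswith(prefix):
            found = lookup(children, name[len(prefix):])
            if found is not None:
                return found
    return None


# A few mappings taken from the intrinsics table in the patch.
table = sorted([
    ("atomic_increment", "nir_intrinsic_atomic_counter_inc_var"),
    ("atomic_predecrement", "nir_intrinsic_atomic_counter_dec_var"),
    ("atomic_read", "nir_intrinsic_atomic_counter_read_var"),
    ("image_load", "nir_intrinsic_image_load"),
    ("image_store", "nir_intrinsic_image_store"),
])
trie = build_trie(table)

if __name__ == "__main__":
    print(lookup(trie, "atomic_read"))
```

The patch emits the trie as nested C string comparisons at build time; the sketch walks the same structure at run time only to show which comparisons the generated code would perform.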
On Jul 5, 2016 5:46 PM, "Ian Romanick"  wrote:

> From: Ian Romanick 
>
> If there is a way to do this cleanly in mako, I'm very interested to
> hear about it.
>
>    text    data     bss     dec     hex filename
> 7529003  273096   28584 7830683  777c9b /tmp/i965_dri-64bit-before.so
> 7528883  273096   28584 7830563  777c23 /tmp/i965_dri-64bit-after.so
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/glsl/nir_intrinsic_map.py | 131
> ++---
>  1 file changed, 119 insertions(+), 12 deletions(-)
>
> diff --git a/src/compiler/glsl/nir_intrinsic_map.py
> b/src/compiler/glsl/nir_intrinsic_map.py
> index 7f13c6c..5962d4b 100644
> --- a/src/compiler/glsl/nir_intrinsic_map.py
> +++ b/src/compiler/glsl/nir_intrinsic_map.py
> @@ -66,6 +66,123 @@ intrinsics = [("__intrinsic_atomic_read",
> ("nir_intrinsic_atomic_counter_read_va
>("__intrinsic_atomic_exchange_shared",
> ("nir_intrinsic_shared_atomic_exchange", None)),
>("__intrinsic_atomic_comp_swap_shared",
> ("nir_intrinsic_shared_atomic_comp_swap", None))]
>
> +def remove_prefix(table, prefix_length):
> +"""Strip prefix_length characters off the name of each entry in
> table."""
> +
> +return [(s[prefix_length:], d) for (s, d) in table]
> +
> +
> +def generate_trie(table):
> +"""table is a list of (string, data) tuples.  It is assumed to be
> sorted by
> +string.
> +
> +A radix trie (or compact prefix trie) is recursively generated from
> the
> +list of names.  Names are partitioned into groups that have at least
> +prefix_thresh (tunable parameter) common prefix characters.  Each of
> these
> +groups becomes the branches at the current level of the tree.  The
> +matching prefix characters from each group is removed, and the group
> is
> +recursively operated on in the same fashion.
> +
> +The recursion terminates when no groups can be formed with at least
> +prefix_thresh matching characters.
> +
> +Each node in the trie is a 3-element tuple:
> +
> +(prefix_string, [child_nodes], client_data)
> +
> +One of [child_nodes] or client_data will be None.
> +
> +See https://en.wikipedia.org/wiki/Radix_tree for more background
> details
> +on the data structure.
> +
> +"""
> +
> +# Threshold for considering two strings to have the same prefix.
> +prefix_thresh = 1
> +
> +if len(table) == 1 and table[0][0] == "":
> +return [("", None, table[0][1])]
> +
> +trie_level = []
> +
> +(s, d) = table[0]
> +candidates = [(s, d)]
> +base = s
> +prefix_length = len(s)
> +
> +for (s, d) in table[1:]:
> +if s[:prefix_thresh] == base[:prefix_thresh]:
> +candidates.append((s, d))
> +
> +l = len(s[:([x[0]==x[1] for x in zip(s, base)]+[0]).index(0)])
> +if l < prefix_length:
> +prefix_length = l
> +else:
> +trie_level.append((base[:prefix_length],
> generate_trie(remove_prefix(candidates, prefix_length)), None))
> +
> +candidates = [(s, d)]
> +base = s
> +prefix_length = len(s)
> +
> +trie_level.append((base[:prefix_length],
> generate_trie(remove_prefix(candidates, prefix_length)), None))
> +
> +return trie_level
> +
> +
> +def emit_trie_leaf(indent, d):
> +if d[1] is None:
> +return "{}return {};\n".format(indent, d[0])
> +else:
> +c_code = "{}int_op = {};\n".format(indent, d[0])
> +c_code += "{}uint_op = {};\n".format(indent, d[1])
> +return c_code
> +
> +
> +def trie_as_C_code(trie, indent="   ", prefix_string="__intrinsic_"):
> +conditional = "if"
> +
> +c_code = ""
> +for (s, t, d) in trie:
> +if d is not None:
> +c_code +=  "{}{} (name[0] == '\\0') {{\n".format(indent,
> conditional)
> +c_code += "{}   /* {} */\n".format(indent, prefix_string)
> +c_code += emit_trie_leaf(indent + "   ", d);
> +
> +else:
> +# Before emitting the string comparison, check to see if the
> +# subtree has a single element with an empty string.  In that
> +# case, use strcmp() instead of strncmp() and don't advance
> the
> +# name pointer.
> +
> +if len(t) == 1 and t[0][2] is not None:
> +if s == "":
> +c_code += "{}{} (name[0] == '\\0')
> 

Re: [Mesa-dev] mesa: don't install GLX files if GLX is not built

2016-07-07 Thread Emil Velikov
[Adding back mesa-dev]

Hi Akihiko Odaki

Before anything, let me say a couple of things about DRI.
The DRI interface is an abstraction layer where you have winsys
(GLX/EGL/other) agnostic DRI module and different DRI loaders, each
implementing different winsys' API. Thus DRI modules do/should not
have any GLX/EGL/other dependencies.

As the location suggests, the file is not meant for developers using
GL/GLX/..., but it's an internal interface. This way loaders/modules
living outside of the mesa tree can reuse it. One such example is the
loader in the xserver.

On 7 July 2016 at 00:38, Akihiko Odaki  wrote:
> I forgot to note that other GL header files are included by the internal DRI
> files.
The dri_interface.h file includes only drm.h afaict, due to historical
reasons. For ease of use (development) the DRI modules can reuse
numerical values from GL/GLES.

> But they are not necessary when developing software and some of them
> don't work as expected if the installation lacks GLX.
>
Yes, as mentioned above - DRI is internal and not something meant for
GL/GLES/EGL/... developers.

>>
>> I assumed the header files in GL/internal are not supposed to be
>> included by users directly, but they can be included by other public
>> header files. If so, GL/internal/dri_interface.h does not need to be
>> installed since it is not included by such headers.
>>
>> But, after receiving your email I investigated again to find that
>> GL/internal/dri_interface.h is not included even if GLX is enabled. Now
>> I'm not sure what is the expected usage of GL/internal/dri_interface.h.
>> Should it be installed anyway?
>>
Yes it should. Unfortunately one cannot know whether the
header/interface will be used, during mesa build/install stage.
If you know it's not needed, feel free to nuke it.

Emil


Re: [Mesa-dev] [PATCH mesa] i965/blorp: fix indentation level

2016-07-07 Thread Eric Engestrom
On Thu, Jul 07, 2016 at 08:36:35AM +0300, Pohjolainen, Topi wrote:
> On Wed, Jul 06, 2016 at 10:02:42PM +0100, Eric Engestrom wrote:
> > Signed-off-by: Eric Engestrom 
> > ---
> >  src/mesa/drivers/dri/i965/gen7_blorp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Thanks for fixing this!
> 
> Reviewed-by: Topi Pohjolainen 
> 
> Do you need me to push this for you?

Yes please :)

> 
> > 
> > diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.c 
> > b/src/mesa/drivers/dri/i965/gen7_blorp.c
> > index 7201549..0afd76b 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_blorp.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_blorp.c
> > @@ -797,7 +797,7 @@ gen7_blorp_exec(struct brw_context *brw,
> > if (params->wm_prog_data)
> >gen7_blorp_emit_binding_table_pointers_ps(brw, wm_bind_bo_offset);
> >  
> > -  gen7_blorp_emit_constant_ps_disable(brw);
> > +   gen7_blorp_emit_constant_ps_disable(brw);
> >  
> > if (params->src.mt) {
> >const uint32_t sampler_offset =
> > -- 
> > 2.9.0
> > 


Re: [Mesa-dev] [PATCH 00/11] ARB_shader_atomic_counter_ops for NIR and i965

2016-07-07 Thread Iago Toral
On Wed, 2016-07-06 at 09:36 +0200, Iago Toral wrote:
> On Tue, 2016-07-05 at 17:46 -0700, Ian Romanick wrote:
> > 
> > The first 7 patches in this series put GLSL-to-NIR on a small
> > diet.  I
> > looked at the giant sequence of 'if (strcmp(...) == 0) { ... } else
> > if
> > (strcmp(...) == 0) { ...' and said, "Oh hell no."  I don't think we
> > care
> > much about the performance of this code, so I opted to tune for
> > size.
> > Using an in-code radix trie gets it about as small as I think it
> > can
> > get.  The result is -784 bytes in a single function.  All 41
> > strings
> > just disappear.
> Yeah, this looks like a nice clean-up. Patches 1-4 are:
> Reviewed-by: Iago Toral Quiroga 

I dropped a few minor comments for patch 9, but otherwise patches 8-11
are:

Reviewed-by: Iago Toral Quiroga 

Patches 5-7 should probably be reviewed by someone more experienced
with Python/mako, although I am not sure if the extra complexity added
with those patches pays off :-/. In any case, I would be happy to
review those too if you can't find someone more appropriate, just let
me know if that is the case.

> > 
> > It looks like src/mesa/state_tracker/st_glsl_to_tgsi.cpp could get
> > similar treatment, and the savings there should be even larger.  My
> > recommendation would be to copy
> > src/compiler/glsl/nir_intrinsic_map.py
> > into src/mesa/state_tracker and change it to suit the needs of that
> > code.  The hard part is already done. :)
> > 
> > The rest of the series adds the new intrinsics to NIR and to the
> > i965
> > driver.
> > 
> > What we don't have is a good set of piglit tests for the new
> > intrinsics.
> > We also might not have tests for the existing flavors of the new
> > intrinsics on, for example, SSBOs.  There is a test for
> > atomicCounterAddARB.  I think it's going to be fairly difficult to
> > come
> > up with good tests for the other functions.  I'll have to think
> > about
> > it
> > some more.
> > 


[Mesa-dev] [PATCH 1/4] radeon/video: fix coding style in radeon_video.c

2016-07-07 Thread Christian König
From: Christian König 

Signed-off-by: Christian König 
---
 src/gallium/drivers/radeon/radeon_video.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_video.c 
b/src/gallium/drivers/radeon/radeon_video.c
index 5b29c78..69e4416 100644
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -213,23 +213,23 @@ int rvid_get_video_param(struct pipe_screen *screen,
return codec == PIPE_VIDEO_FORMAT_MPEG4_AVC &&
rvce_is_fw_version_supported(rscreen);
case PIPE_VIDEO_CAP_NPOT_TEXTURES:
-   return 1;
+   return 1;
case PIPE_VIDEO_CAP_MAX_WIDTH:
return (rscreen->family < CHIP_TONGA) ? 2048 : 4096;
case PIPE_VIDEO_CAP_MAX_HEIGHT:
return (rscreen->family < CHIP_TONGA) ? 1152 : 2304;
case PIPE_VIDEO_CAP_PREFERED_FORMAT:
-   return PIPE_FORMAT_NV12;
+   return PIPE_FORMAT_NV12;
case PIPE_VIDEO_CAP_PREFERS_INTERLACED:
-   return false;
+   return false;
case PIPE_VIDEO_CAP_SUPPORTS_INTERLACED:
-   return false;
+   return false;
case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
-   return true;
+   return true;
case PIPE_VIDEO_CAP_STACKED_FRAMES:
return (rscreen->family < CHIP_TONGA) ? 1 : 2;
default:
-   return 0;
+   return 0;
}
}
 
-- 
2.5.0



[Mesa-dev] [PATCH 4/4] radeon/uvd: simplify sending context buffer message

2016-07-07 Thread Christian König
From: Christian König 

Just send it whenever it is allocated.

Signed-off-by: Christian König 
---
 src/gallium/drivers/radeon/radeon_uvd.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 275d826..c693b79 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -1123,12 +1123,9 @@ static void ruvd_end_frame(struct pipe_video_codec 
*decoder,
 
send_cmd(dec, RUVD_CMD_DPB_BUFFER, dec->dpb.res->buf, 0,
 RADEON_USAGE_READWRITE, RADEON_DOMAIN_VRAM);
-   if ((u_reduce_video_profile(picture->profile) == 
PIPE_VIDEO_FORMAT_HEVC) ||
-   (dec->stream_type == RUVD_CODEC_H264_PERF &&
-   ((struct r600_common_screen*)dec->screen)->family >= 
CHIP_POLARIS10)) {
+   if (dec->ctx.res)
send_cmd(dec, RUVD_CMD_CONTEXT_BUFFER, dec->ctx.res->buf, 0,
RADEON_USAGE_READWRITE, RADEON_DOMAIN_VRAM);
-   }
send_cmd(dec, RUVD_CMD_BITSTREAM_BUFFER, bs_buf->res->buf,
 0, RADEON_USAGE_READ, RADEON_DOMAIN_GTT);
send_cmd(dec, RUVD_CMD_DECODING_TARGET_BUFFER, dt, 0,
-- 
2.5.0



[Mesa-dev] [PATCH 2/4] radeon/uvd: move polaris fw check into radeon_video.c

2016-07-07 Thread Christian König
From: Christian König 

It's actually not very clever to claim to support H.264
and then fail to create a decoder.

Signed-off-by: Christian König 
---
 src/gallium/drivers/radeon/radeon_uvd.c   |  8 
 src/gallium/drivers/radeon/radeon_video.c | 16 +---
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 7223417..52658fa 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -60,8 +60,6 @@
 #define FB_BUFFER_SIZE_TONGA (2048 * 64)
 #define IT_SCALING_TABLE_SIZE 992
 
-#define FW_1_66_16 ((1 << 24) | (66 << 16) | (16 << 8))
-
 /* UVD decoder representation */
 struct ruvd_decoder {
struct pipe_video_codec base;
@@ -1185,12 +1183,6 @@ struct pipe_video_codec *ruvd_create_decoder(struct 
pipe_context *context,
height = align(height, VL_MACROBLOCK_HEIGHT);
break;
case PIPE_VIDEO_FORMAT_MPEG4_AVC:
-   if ((info.family == CHIP_POLARIS10 || info.family == 
CHIP_POLARIS11) &&
-   info.uvd_fw_version < FW_1_66_16 ) {
-   RVID_ERR("POLARIS10/11 firmware version need to be 
updated.\n");
-   return NULL;
-   }
-
width = align(width, VL_MACROBLOCK_WIDTH);
height = align(height, VL_MACROBLOCK_HEIGHT);
break;
diff --git a/src/gallium/drivers/radeon/radeon_video.c 
b/src/gallium/drivers/radeon/radeon_video.c
index 69e4416..a26668b 100644
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -43,6 +43,8 @@
 #include "radeon_video.h"
 #include "radeon_vce.h"
 
+#define FW_1_66_16 ((1 << 24) | (66 << 16) | (16 << 8))
+
 /* generate an stream handle */
 unsigned rvid_alloc_stream_handle()
 {
@@ -206,6 +208,9 @@ int rvid_get_video_param(struct pipe_screen *screen,
 {
struct r600_common_screen *rscreen = (struct r600_common_screen 
*)screen;
enum pipe_video_format codec = u_reduce_video_profile(profile);
+   struct radeon_info info;
+
+   rscreen->ws->query_info(rscreen->ws, &info);
 
if (entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
switch (param) {
@@ -239,10 +244,15 @@ int rvid_get_video_param(struct pipe_screen *screen,
case PIPE_VIDEO_FORMAT_MPEG12:
return profile != PIPE_VIDEO_PROFILE_MPEG1;
case PIPE_VIDEO_FORMAT_MPEG4:
+   /* no support for MPEG4 on older hw */
+   return rscreen->family >= CHIP_PALM;
case PIPE_VIDEO_FORMAT_MPEG4_AVC:
-   if (rscreen->family < CHIP_PALM)
-   /* no support for MPEG4 */
-   return codec != PIPE_VIDEO_FORMAT_MPEG4;
+   if ((rscreen->family == CHIP_POLARIS10 ||
+rscreen->family == CHIP_POLARIS11) &&
+   info.uvd_fw_version < FW_1_66_16 ) {
+   RVID_ERR("POLARIS10/11 firmware version need to 
be updated.\n");
+   return false;
+   }
return true;
case PIPE_VIDEO_FORMAT_VC1:
return true;
-- 
2.5.0
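
As an illustration of the check this patch moves, here is a small sketch of how the `FW_1_66_16` value is packed and compared. It is a Python rendering of the C macro's arithmetic, not driver code. Reading the three fields as major, minor and revision is an assumption based only on the macro name. `pack_fw_version` and `h264_decode_supported` are hypothetical names, and chip families are plain strings here instead of the driver's enum.

```python
def pack_fw_version(major, minor, revision):
    """Pack three version fields the way the FW_1_66_16 macro does.

    The field names are an assumption; the macro only shows the shifts.
    """
    return (major << 24) | (minor << 16) | (revision << 8)


# Same value as ((1 << 24) | (66 << 16) | (16 << 8)) in the patch.
UVD_FW_1_66_16 = pack_fw_version(1, 66, 16)


def h264_decode_supported(family, uvd_fw_version):
    """Report H.264 support only if a decoder could actually be created."""
    if family in ("POLARIS10", "POLARIS11") and uvd_fw_version < UVD_FW_1_66_16:
        return False
    return True


if __name__ == "__main__":
    print(hex(UVD_FW_1_66_16))
    print(h264_decode_supported("POLARIS10", pack_fw_version(1, 66, 15)))
```

Since the fields sit in descending bit positions, a plain integer comparison orders versions field by field, which is why the patch can use a single `<` test.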



[Mesa-dev] [PATCH 3/4] radeon/uvd: fix contex buffer destruction in the error path

2016-07-07 Thread Christian König
From: Christian König 

Destroying a buffer that was never allocated is harmless.

Signed-off-by: Christian König 
---
 src/gallium/drivers/radeon/radeon_uvd.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 52658fa..275d826 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -937,10 +937,7 @@ static void ruvd_destroy(struct pipe_video_codec *decoder)
}
 
rvid_destroy_buffer(&dec->dpb);
-   if ((u_reduce_video_profile(dec->base.profile) == 
PIPE_VIDEO_FORMAT_HEVC) ||
-   (dec->stream_type == RUVD_CODEC_H264_PERF &&
-   ((struct r600_common_screen*)dec->screen)->family >= 
CHIP_POLARIS10))
-   rvid_destroy_buffer(&dec->ctx);
+   rvid_destroy_buffer(&dec->ctx);
 
FREE(dec);
 }
@@ -1288,8 +1285,7 @@ error:
}
 
rvid_destroy_buffer(&dec->dpb);
-   if (dec->stream_type == RUVD_CODEC_H264_PERF && info.family >= 
CHIP_POLARIS10)
-   rvid_destroy_buffer(&dec->ctx);
+   rvid_destroy_buffer(&dec->ctx);
 
FREE(dec);
 
-- 
2.5.0



Re: [Mesa-dev] [PATCH 09/11] nir/intrinsics: Add more atomic_counter ops

2016-07-07 Thread Iago Toral
On Tue, 2016-07-05 at 17:46 -0700, Ian Romanick wrote:
> From: Ian Romanick 
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/glsl/glsl_to_nir.cpp  | 43
> +++---
>  src/compiler/glsl/nir_intrinsic_map.py |  8 
>  .../glsl/tests/get_intrinsic_opcode_test.cpp   |  8 
>  src/compiler/nir/nir.c |  1 +
>  src/compiler/nir/nir_intrinsics.h  | 14 +++
>  src/compiler/nir/nir_lower_atomics.c   | 38
> +++
>  6 files changed, 107 insertions(+), 5 deletions(-)
> 
> diff --git a/src/compiler/glsl/glsl_to_nir.cpp
> b/src/compiler/glsl/glsl_to_nir.cpp
> index 3b8424e..ab7200b 100644
> --- a/src/compiler/glsl/glsl_to_nir.cpp
> +++ b/src/compiler/glsl/glsl_to_nir.cpp
> @@ -616,11 +616,44 @@ nir_visitor::visit(ir_call *ir)
>    switch (op) {
>    case nir_intrinsic_atomic_counter_read_var:
>    case nir_intrinsic_atomic_counter_inc_var:
> -  case nir_intrinsic_atomic_counter_dec_var: {
> - ir_dereference *param =
> -(ir_dereference *) ir->actual_parameters.get_head();
> - instr->variables[0] = evaluate_deref(&instr->instr, param);
> - nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32,
> NULL);
> +  case nir_intrinsic_atomic_counter_dec_var:
> +  case nir_intrinsic_atomic_counter_add_var:
> +  case nir_intrinsic_atomic_counter_min_var:
> +  case nir_intrinsic_atomic_counter_max_var:
> +  case nir_intrinsic_atomic_counter_and_var:
> +  case nir_intrinsic_atomic_counter_or_var:
> +  case nir_intrinsic_atomic_counter_xor_var:
> +  case nir_intrinsic_atomic_counter_exchange_var:
> +  case nir_intrinsic_atomic_counter_comp_swap_var: {
> + nir_ssa_undef_instr *instr_undef =
> +nir_ssa_undef_instr_create(shader, 1, 32);
> + nir_builder_instr_insert(&b, &instr_undef->instr);
> +
> + /* Set the counter variable dereference. */
> + exec_node *param = ir->actual_parameters.get_head();
> + ir_dereference *counter = (ir_dereference *)param;
> +
> + instr->variables[0] = evaluate_deref(&instr->instr,
> counter);
> + param = param->get_next();
> +
> + /* Set the intrinsic destination. */
> + if (ir->return_deref) {
> +nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32,
> NULL);
> + }
> +
> + /* Set the intrinsic parameters. */
> + if (!param->is_tail_sentinel()) {
> +instr->src[0] =
> +   nir_src_for_ssa(evaluate_rvalue((ir_dereference
> *)param));
> +param = param->get_next();
> + }
> +
> + if (!param->is_tail_sentinel()) {
> +instr->src[1] =
> +   nir_src_for_ssa(evaluate_rvalue((ir_dereference
> *)param));
> +param = param->get_next();
> + }
> +
>   nir_builder_instr_insert(&b, &instr->instr);
>   break;
>    }
> diff --git a/src/compiler/glsl/nir_intrinsic_map.py
> b/src/compiler/glsl/nir_intrinsic_map.py
> index 07b2d0d..5abc3cb 100644
> --- a/src/compiler/glsl/nir_intrinsic_map.py
> +++ b/src/compiler/glsl/nir_intrinsic_map.py
> @@ -26,6 +26,14 @@ from mako.template import Template
>  intrinsics = [("__intrinsic_atomic_read",
> ("nir_intrinsic_atomic_counter_read_var", None)),
>    ("__intrinsic_atomic_increment",
> ("nir_intrinsic_atomic_counter_inc_var", None)),
>    ("__intrinsic_atomic_predecrement",
> ("nir_intrinsic_atomic_counter_dec_var", None)),
> +  ("__intrinsic_atomic_add",
> ("nir_intrinsic_atomic_counter_add_var", None)),
> +  ("__intrinsic_atomic_min",
> ("nir_intrinsic_atomic_counter_min_var", None)),
> +  ("__intrinsic_atomic_max",
> ("nir_intrinsic_atomic_counter_max_var", None)),
> +  ("__intrinsic_atomic_and",
> ("nir_intrinsic_atomic_counter_and_var", None)),
> +  ("__intrinsic_atomic_or",
> ("nir_intrinsic_atomic_counter_or_var", None)),
> +  ("__intrinsic_atomic_xor",
> ("nir_intrinsic_atomic_counter_xor_var", None)),
> +  ("__intrinsic_atomic_exchange",
> ("nir_intrinsic_atomic_counter_exchange_var", None)),
> +  ("__intrinsic_atomic_comp_swap",
> ("nir_intrinsic_atomic_counter_comp_swap_var", None)),
>    ("__intrinsic_image_load",
> ("nir_intrinsic_image_load", None)),
>    ("__intrinsic_image_store",
> ("nir_intrinsic_image_store", None)),
>    ("__intrinsic_image_atomic_add",
> ("nir_intrinsic_image_atomic_add", None)),
> diff --git a/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> b/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> index aeecf32..d270a03 100644
> --- a/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> +++ b/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> @@ -45,6 +45,14 @@ static const struct test_vector {
> test_vector("__intrinsic_atomic_read",
> 

Re: [Mesa-dev] [PATCH 09/11] nir/intrinsics: Add more atomic_counter ops

2016-07-07 Thread Iago Toral
On Tue, 2016-07-05 at 17:46 -0700, Ian Romanick wrote:
> From: Ian Romanick 
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/glsl/glsl_to_nir.cpp  | 43
> +++---
>  src/compiler/glsl/nir_intrinsic_map.py |  8 
>  .../glsl/tests/get_intrinsic_opcode_test.cpp   |  8 
>  src/compiler/nir/nir.c |  1 +
>  src/compiler/nir/nir_intrinsics.h  | 14 +++
>  src/compiler/nir/nir_lower_atomics.c   | 38
> +++
>  6 files changed, 107 insertions(+), 5 deletions(-)
> 
> diff --git a/src/compiler/glsl/glsl_to_nir.cpp
> b/src/compiler/glsl/glsl_to_nir.cpp
> index 3b8424e..ab7200b 100644
> --- a/src/compiler/glsl/glsl_to_nir.cpp
> +++ b/src/compiler/glsl/glsl_to_nir.cpp
> @@ -616,11 +616,44 @@ nir_visitor::visit(ir_call *ir)
>    switch (op) {
>    case nir_intrinsic_atomic_counter_read_var:
>    case nir_intrinsic_atomic_counter_inc_var:
> -  case nir_intrinsic_atomic_counter_dec_var: {
> - ir_dereference *param =
> -(ir_dereference *) ir->actual_parameters.get_head();
> - instr->variables[0] = evaluate_deref(&instr->instr, param);
> - nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32,
> NULL);
> +  case nir_intrinsic_atomic_counter_dec_var:
> +  case nir_intrinsic_atomic_counter_add_var:
> +  case nir_intrinsic_atomic_counter_min_var:
> +  case nir_intrinsic_atomic_counter_max_var:
> +  case nir_intrinsic_atomic_counter_and_var:
> +  case nir_intrinsic_atomic_counter_or_var:
> +  case nir_intrinsic_atomic_counter_xor_var:
> +  case nir_intrinsic_atomic_counter_exchange_var:
> +  case nir_intrinsic_atomic_counter_comp_swap_var: {
> + nir_ssa_undef_instr *instr_undef =
> +nir_ssa_undef_instr_create(shader, 1, 32);
> + nir_builder_instr_insert(&b, &instr_undef->instr);
> 

I guess you did not mean to include the undef instruction hunk above?

> + /* Set the counter variable dereference. */
> + exec_node *param = ir->actual_parameters.get_head();
> + ir_dereference *counter = (ir_dereference *)param;
> +
> + instr->variables[0] = evaluate_deref(&instr->instr,
> counter);
> + param = param->get_next();
> +
> + /* Set the intrinsic destination. */
> + if (ir->return_deref) {
> +nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32,
> NULL);
> + }
> +
> + /* Set the intrinsic parameters. */
> + if (!param->is_tail_sentinel()) {
> +instr->src[0] =
> +   nir_src_for_ssa(evaluate_rvalue((ir_dereference
> *)param));
> +param = param->get_next();
> + }
> +
> + if (!param->is_tail_sentinel()) {
> +instr->src[1] =
> +   nir_src_for_ssa(evaluate_rvalue((ir_dereference
> *)param));
> +param = param->get_next();
> + }
> +
>   nir_builder_instr_insert(&b, &instr->instr);
>   break;
>    }
> diff --git a/src/compiler/glsl/nir_intrinsic_map.py
> b/src/compiler/glsl/nir_intrinsic_map.py
> index 07b2d0d..5abc3cb 100644
> --- a/src/compiler/glsl/nir_intrinsic_map.py
> +++ b/src/compiler/glsl/nir_intrinsic_map.py
> @@ -26,6 +26,14 @@ from mako.template import Template
>  intrinsics = [("__intrinsic_atomic_read",
> ("nir_intrinsic_atomic_counter_read_var", None)),
>    ("__intrinsic_atomic_increment",
> ("nir_intrinsic_atomic_counter_inc_var", None)),
>    ("__intrinsic_atomic_predecrement",
> ("nir_intrinsic_atomic_counter_dec_var", None)),
> +  ("__intrinsic_atomic_add",
> ("nir_intrinsic_atomic_counter_add_var", None)),
> +  ("__intrinsic_atomic_min",
> ("nir_intrinsic_atomic_counter_min_var", None)),
> +  ("__intrinsic_atomic_max",
> ("nir_intrinsic_atomic_counter_max_var", None)),
> +  ("__intrinsic_atomic_and",
> ("nir_intrinsic_atomic_counter_and_var", None)),
> +  ("__intrinsic_atomic_or",
> ("nir_intrinsic_atomic_counter_or_var", None)),
> +  ("__intrinsic_atomic_xor",
> ("nir_intrinsic_atomic_counter_xor_var", None)),
> +  ("__intrinsic_atomic_exchange",
> ("nir_intrinsic_atomic_counter_exchange_var", None)),
> +  ("__intrinsic_atomic_comp_swap",
> ("nir_intrinsic_atomic_counter_comp_swap_var", None)),
>    ("__intrinsic_image_load",
> ("nir_intrinsic_image_load", None)),
>    ("__intrinsic_image_store",
> ("nir_intrinsic_image_store", None)),
>    ("__intrinsic_image_atomic_add",
> ("nir_intrinsic_image_atomic_add", None)),
> diff --git a/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> b/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> index aeecf32..d270a03 100644
> --- a/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> +++ b/src/compiler/glsl/tests/get_intrinsic_opcode_test.cpp
> @@ -45,6 +45,14 @@ static const 

Re: [Mesa-dev] [PATCH 2/6] i965/fs: use the new helper function to create double immediates

2016-07-07 Thread Kenneth Graunke
On Thursday, July 7, 2016 9:33:01 AM PDT Samuel Iglesias Gonsálvez wrote:
> 
> On 06/07/16 22:32, Kenneth Graunke wrote:
> > On Wednesday, July 6, 2016 12:09:58 PM PDT Samuel Iglesias Gonsálvez wrote:
> >> From: Iago Toral Quiroga 
> >>
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> index 268c847..d805d95 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> @@ -832,7 +832,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> >> nir_alu_instr *instr)
> >>* a register and compare with that.
> >>*/
> >>   fs_reg tmp = vgrf(glsl_type::double_type);
> >> - bld.MOV(tmp, brw_imm_df(0.0));
> >> + bld.MOV(tmp, setup_imm_df(0.0));
> > 
> > Does this need to be splatted out to a full SIMD-width?
> > Why not just do:
> > 
> >fs_reg tmp = setup_imm_df(0.0);
> > 
> > and let the CMP compare against the stride 0 register?
> > 
> 
> Right, I will do the change.
> 
> >>  
> >>   /* A direct DF CMP using the flag register (null dst) won't work 
> >> in
> >>* SIMD16 because the CMP will be split in two by 
> >> lower_simd_width,
> >> @@ -1171,7 +1171,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> >> nir_alu_instr *instr)
> >> case nir_op_d2b: {
> >>/* two-argument instructions can't take 64-bit immediates */
> >>fs_reg zero = vgrf(glsl_type::double_type);
> >> -  bld.MOV(zero, brw_imm_df(0.0));
> >> +  bld.MOV(zero, setup_imm_df(0.0));
> >>/* A SIMD16 execution needs to be split in two instructions, so use
> >> * a vgrf instead of the flag register as dst so instruction 
> >> splitting
> >> * works
> > 
> > Likewise, I don't think you need to splat here.
> > 
> 
> Here too.
> 
> Does it have your R-b?
> 
> Sam

Yup, with those changes, this patch gets a:
Reviewed-by: Kenneth Graunke 

> 
> >> @@ -1483,7 +1483,7 @@ fs_visitor::nir_emit_load_const(const fs_builder &bld,
> >>  
> >> case 64:
> >>for (unsigned i = 0; i < instr->def.num_components; i++)
> >> - bld.MOV(offset(reg, bld, i), brw_imm_df(instr->value.f64[i]));
> >> + bld.MOV(offset(reg, bld, i), setup_imm_df(instr->value.f64[i]));
> >>break;
> >>  
> >> default:
> >>
> > 
> > This hunk looks good.
> > 
> 
> 
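
A note on the stride-0 suggestion in this review: the point is that a
comparison does not need the zero copied into every channel if the source
region can broadcast a single element. The following is a toy model in plain
C, not i965 code. `struct region`, `read_channel` and `compare_nonzero` are
hypothetical names invented for illustration, and the only hardware behaviour
it tries to mimic is the assumption that a stride of 0 re-reads the same
element for every channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a register region: base pointer plus element stride. */
struct region {
   const double *base;
   size_t stride;   /* 0 means every channel reads the same element */
};

/* Fetch the element that a given channel sees through the region. */
static double read_channel(struct region r, size_t channel)
{
   return r.base[channel * r.stride];
}

/* Per-channel "value != zero" comparison; returns one mask bit per channel. */
static unsigned compare_nonzero(struct region value, struct region zero,
                                size_t width)
{
   unsigned mask = 0;

   assert(width <= 8 * sizeof(mask));
   for (size_t ch = 0; ch < width; ch++) {
      if (read_channel(value, ch) != read_channel(zero, ch))
         mask |= 1u << ch;
   }
   return mask;
}
```

In this model a zero operand with stride 1 needs one element per channel,
while a zero operand with stride 0 needs a single element for all channels.
Both give the same mask, which is the reason the extra MOV looks avoidable.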



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] gallium: add async flag to pipe_debug_callback

2016-07-07 Thread Nicolai Hähnle

On 07.07.2016 09:53, Edward O'Callaghan wrote:



On 07/07/2016 05:39 PM, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

---
  src/gallium/include/pipe/p_state.h   | 6 ++
  src/gallium/state_trackers/clover/core/queue.cpp | 5 -
  src/mesa/state_tracker/st_debug.c| 5 -
  3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/gallium/include/pipe/p_state.h 
b/src/gallium/include/pipe/p_state.h
index f4bee38..f7bf402 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -809,6 +809,12 @@ struct pipe_compute_state
  struct pipe_debug_callback
  {
 /**
+* When set to \c true, the callback may be called asynchronously from a
+* driver-created thread.
+*/
+   bool async;
+
+   /**
  * Callback for the driver to report debug/performance/etc information back
  * to the state tracker.
  *
diff --git a/src/gallium/state_trackers/clover/core/queue.cpp 
b/src/gallium/state_trackers/clover/core/queue.cpp
index 24d71f1..00afdb6 100644
--- a/src/gallium/state_trackers/clover/core/queue.cpp
+++ b/src/gallium/state_trackers/clover/core/queue.cpp
@@ -50,7 +50,10 @@ command_queue::command_queue(clover::context &ctx, 
clover::device &dev,
throw error(CL_INVALID_DEVICE);

 if (ctx.notify) {
-  struct pipe_debug_callback cb = { &debug_notify_callback, this };
+  struct pipe_debug_callback cb;
+  memset(&cb, 0, sizeof(db));

sizeof(cb) ?


Clearly, my compile setup is missing clover. Thanks!

Nicolai




+  cb.debug_message = &debug_notify_callback;
+  cb.data = this;
if (pipe->set_debug_callback)
   pipe->set_debug_callback(pipe, &cb);
 }
diff --git a/src/mesa/state_tracker/st_debug.c 
b/src/mesa/state_tracker/st_debug.c
index eaf2549..214e223 100644
--- a/src/mesa/state_tracker/st_debug.c
+++ b/src/mesa/state_tracker/st_debug.c
@@ -172,7 +172,10 @@ st_enable_debug_output(struct st_context *st, boolean 
enable)
return;

 if (enable) {
-  struct pipe_debug_callback cb = { st_debug_message, st };
+  struct pipe_debug_callback cb;
+  memset(&cb, 0, sizeof(cb));
+  cb.debug_message = st_debug_message;
+  cb.data = st;
pipe->set_debug_callback(pipe, &cb);
 } else {
pipe->set_debug_callback(pipe, NULL);
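
A note on why the patch moves from a positional initializer to memset plus
named assignments, and why the sizeof operand matters: once a new field is
added at the front of the struct, `{ callback, data }` would initialize the
wrong members, and `sizeof(db)` names an object that does not exist. The
following is a minimal sketch under stated assumptions. `struct
debug_callback`, `noop_message` and `make_callback` are hypothetical
stand-ins, not the gallium definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a callback descriptor with a new leading field. */
struct debug_callback {
   bool async;
   void (*debug_message)(void *data, const char *msg);
   void *data;
};

/* Placeholder message handler used only to have a function to point at. */
static void noop_message(void *data, const char *msg)
{
   (void)data;
   (void)msg;
}

/* Fill in a descriptor without depending on the order of the fields. */
static struct debug_callback make_callback(void *data)
{
   struct debug_callback cb;

   /* sizeof(cb) is tied to the object being cleared, so it stays correct
    * if the variable is renamed or the struct grows. */
   memset(&cb, 0, sizeof(cb));

   /* The zeroed flag leaves the callback synchronous until a caller opts in. */
   assert(!cb.async);

   cb.debug_message = noop_message;
   cb.data = data;
   return cb;
}
```

Writing the size as `sizeof(cb)`, or `sizeof cb`, also means a typo in the
operand is a compile error instead of a silently wrong length.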






Re: [Mesa-dev] [PATCH 1/4] gallium: add async flag to pipe_debug_callback

2016-07-07 Thread Edward O'Callaghan


On 07/07/2016 05:39 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> ---
>  src/gallium/include/pipe/p_state.h   | 6 ++
>  src/gallium/state_trackers/clover/core/queue.cpp | 5 -
>  src/mesa/state_tracker/st_debug.c| 5 -
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/include/pipe/p_state.h 
> b/src/gallium/include/pipe/p_state.h
> index f4bee38..f7bf402 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -809,6 +809,12 @@ struct pipe_compute_state
>  struct pipe_debug_callback
>  {
> /**
> +* When set to \c true, the callback may be called asynchronously from a
> +* driver-created thread.
> +*/
> +   bool async;
> +
> +   /**
>  * Callback for the driver to report debug/performance/etc information 
> back
>  * to the state tracker.
>  *
> diff --git a/src/gallium/state_trackers/clover/core/queue.cpp 
> b/src/gallium/state_trackers/clover/core/queue.cpp
> index 24d71f1..00afdb6 100644
> --- a/src/gallium/state_trackers/clover/core/queue.cpp
> +++ b/src/gallium/state_trackers/clover/core/queue.cpp
> @@ -50,7 +50,10 @@ command_queue::command_queue(clover::context &ctx, 
> clover::device &dev,
>throw error(CL_INVALID_DEVICE);
>  
> if (ctx.notify) {
> -  struct pipe_debug_callback cb = { &debug_notify_callback, this };
> +  struct pipe_debug_callback cb;
> +  memset(&cb, 0, sizeof(db));
sizeof(cb) ?

> +  cb.debug_message = &debug_notify_callback;
> +  cb.data = this;
>if (pipe->set_debug_callback)
>   pipe->set_debug_callback(pipe, &cb);
> }
> diff --git a/src/mesa/state_tracker/st_debug.c 
> b/src/mesa/state_tracker/st_debug.c
> index eaf2549..214e223 100644
> --- a/src/mesa/state_tracker/st_debug.c
> +++ b/src/mesa/state_tracker/st_debug.c
> @@ -172,7 +172,10 @@ st_enable_debug_output(struct st_context *st, boolean 
> enable)
>return;
>  
> if (enable) {
> -  struct pipe_debug_callback cb = { st_debug_message, st };
> +  struct pipe_debug_callback cb;
> +  memset(&cb, 0, sizeof(cb));
> +  cb.debug_message = st_debug_message;
> +  cb.data = st;
>pipe->set_debug_callback(pipe, &cb);
> } else {
>pipe->set_debug_callback(pipe, NULL);
> 





Re: [Mesa-dev] [PATCH 01/17] glsl/nir: add new num_packed_components field

2016-07-07 Thread Edward O'Callaghan
Hi,

There is a typing issue in this patch: the call sites pass a
‘gl_linked_shader*’ as the first argument of
‘void set_num_packed_components(gl_shader*, ir_variable_mode, unsigned
int)’, which expects a ‘gl_shader*’.

Cheers,
Edward.


On 07/07/2016 11:58 AM, Timothy Arceri wrote:
> This will be used to store the total number of components used at this 
> location
> when packing via ARB_enhanced_layouts.
> ---
>  src/compiler/glsl/glsl_to_nir.cpp   |  1 +
>  src/compiler/glsl/ir.h  |  5 +++
>  src/compiler/glsl/link_varyings.cpp | 74 
> -
>  src/compiler/glsl/linker.cpp|  2 +
>  src/compiler/glsl/linker.h  |  4 ++
>  src/compiler/nir/nir.h  |  5 +++
>  6 files changed, 89 insertions(+), 2 deletions(-)
> 
> diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
> b/src/compiler/glsl/glsl_to_nir.cpp
> index 20302e3..baba624 100644
> --- a/src/compiler/glsl/glsl_to_nir.cpp
> +++ b/src/compiler/glsl/glsl_to_nir.cpp
> @@ -375,6 +375,7 @@ nir_visitor::visit(ir_variable *ir)
> var->data.explicit_binding = ir->data.explicit_binding;
> var->data.has_initializer = ir->data.has_initializer;
> var->data.location_frac = ir->data.location_frac;
> +   var->data.num_packed_components = ir->data.num_packed_components;
>  
> switch (ir->data.depth_layout) {
> case ir_depth_layout_none:
> diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
> index 1325e35..637b53c 100644
> --- a/src/compiler/glsl/ir.h
> +++ b/src/compiler/glsl/ir.h
> @@ -770,6 +770,11 @@ public:
>unsigned location_frac:2;
>  
>/**
> +   * The total number of components packed into this location.
> +   */
> +  unsigned num_packed_components:4;
> +
> +  /**
> * Layout of the matrix.  Uses glsl_matrix_layout values.
> */
>unsigned matrix_layout:2;
> diff --git a/src/compiler/glsl/link_varyings.cpp 
> b/src/compiler/glsl/link_varyings.cpp
> index 76d0be1..35f97a9 100644
> --- a/src/compiler/glsl/link_varyings.cpp
> +++ b/src/compiler/glsl/link_varyings.cpp
> @@ -1975,6 +1975,70 @@ reserved_varying_slot(struct gl_linked_shader *stage,
> return slots;
>  }
>  
> +void
> +set_num_packed_components(struct gl_shader *shader, ir_variable_mode io_mode,
> +  unsigned base_offset)
> +{
> +   /* Find the max number of components used at this location */
> +   unsigned num_components[MAX_VARYINGS_INCL_PATCH] = { 0 };
> +
> +   foreach_in_list(ir_instruction, node, shader->ir) {
> +  ir_variable *const var = node->as_variable();
> +
> +  if (var == NULL || var->data.mode != io_mode ||
> +  !var->data.explicit_location)
> + continue;
> +
> +  int idx = var->data.location - base_offset;
> +  if (idx < 0 || idx >= MAX_VARYINGS_INCL_PATCH ||
> +  var->type->without_array()->is_record() ||
> +  var->type->without_array()->is_matrix())
> + continue;
> +
> +  if (var->type->is_array()) {
> + const glsl_type *type = get_varying_type(var, shader->Stage);
> + unsigned array_components = type->without_array()->vector_elements +
> +var->data.location_frac;
> + assert(type->arrays_of_arrays_size() + idx <=
> +ARRAY_SIZE(num_components));
> + for (unsigned i = idx; i < type->arrays_of_arrays_size(); i++) {
> +num_components[i] = MAX2(array_components, num_components[i]);
> + }
> +  } else {
> + unsigned comps = var->type->vector_elements +
> +var->data.location_frac;
> + num_components[idx] = MAX2(comps, num_components[idx]);
> +  }
> +   }
> +
> +   foreach_in_list(ir_instruction, node, shader->ir) {
> +  ir_variable *const var = node->as_variable();
> +
> +  if (var == NULL || var->data.mode != io_mode ||
> +  !var->data.explicit_location)
> + continue;
> +
> +  int idx = var->data.location - base_offset;
> +  if (idx < 0 || idx >= MAX_VARYINGS_INCL_PATCH ||
> +  var->type->without_array()->is_record() ||
> +  var->type->without_array()->is_matrix())
> + continue;
> +
> +  /* For arrays we need to check all elements in order to find the max
> +   * number of components used.
> +   */
> +  unsigned c = 0;
> +  if (var->type->is_array()) {
> + const glsl_type *type = get_varying_type(var, shader->Stage);
> + for (unsigned i = idx; i < type->arrays_of_arrays_size(); i++) {
> +c = MAX2(c, num_components[i]);
> + }
> +  } else {
> + c = num_components[idx];
> +  }
> +  var->data.num_packed_components = c;
> +   }
> +}
>  
>  /**
>   * Assign locations for all variables that are produced in one pipeline stage
> @@ -2091,11 +2155,17 @@ assign_varying_locations(struct gl_context *ctx,
>  * 4. Mark input variables in the consumer that do not have locations as
>  *not being 
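
A note on the set_num_packed_components() hunk quoted above: it works in two
passes, first recording the highest component used in each location, then
giving every variable the maximum over the locations it covers. The following
is a simplified sketch in plain C under stated assumptions. `struct varying`,
`MAX_SLOTS`, `max_u` and `set_num_packed` are hypothetical names, the filters
for records, matrices and non-explicit locations are left out, and the sketch
walks slots `location` through `location + slots - 1`, which is an assumption
about the intent and not a statement about the loop bounds in the patch.

```c
#include <assert.h>

#define MAX_SLOTS 8u   /* hypothetical; the patch uses MAX_VARYINGS_INCL_PATCH */

/* Hypothetical, simplified description of one varying. */
struct varying {
   int location;            /* first slot used, relative to the base offset */
   unsigned slots;          /* 1 for non-arrays, array length otherwise */
   unsigned components;     /* vector width, 1..4 */
   unsigned location_frac;  /* first component used within the slot, 0..3 */
   unsigned num_packed;     /* output: max components over the covered slots */
};

static unsigned max_u(unsigned a, unsigned b)
{
   return a > b ? a : b;
}

static void set_num_packed(struct varying *vars, unsigned count)
{
   unsigned per_slot[MAX_SLOTS] = { 0 };

   /* Pass 1: record the highest component in use for every slot. */
   for (unsigned v = 0; v < count; v++) {
      unsigned used = vars[v].components + vars[v].location_frac;

      assert(vars[v].location >= 0 &&
             vars[v].location + vars[v].slots <= MAX_SLOTS);
      for (unsigned s = 0; s < vars[v].slots; s++) {
         unsigned slot = vars[v].location + s;
         per_slot[slot] = max_u(per_slot[slot], used);
      }
   }

   /* Pass 2: each variable takes the maximum over the slots it occupies. */
   for (unsigned v = 0; v < count; v++) {
      unsigned packed = 0;

      for (unsigned s = 0; s < vars[v].slots; s++)
         packed = max_u(packed, per_slot[vars[v].location + s]);
      vars[v].num_packed = packed;
   }
}
```

The second pass is needed because a later variable can raise the count of a
slot that an earlier variable already looked at.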

[Mesa-dev] [PATCH 2/4] st/mesa: set debug callback async flag

2016-07-07 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/mesa/state_tracker/st_context.c | 3 ++-
 src/mesa/state_tracker/st_debug.c   | 6 --
 src/mesa/state_tracker/st_debug.h   | 2 +-
 src/mesa/state_tracker/st_manager.c | 2 +-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index 4721215..f1d2084 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -90,7 +90,8 @@ static void st_Enable(struct gl_context * ctx, GLenum cap, 
GLboolean state)
 
switch (cap) {
case GL_DEBUG_OUTPUT:
-  st_enable_debug_output(st, state);
+   case GL_DEBUG_OUTPUT_SYNCHRONOUS:
+  st_update_debug_callback(st);
   break;
default:
   break;
diff --git a/src/mesa/state_tracker/st_debug.c 
b/src/mesa/state_tracker/st_debug.c
index 214e223..b51f350 100644
--- a/src/mesa/state_tracker/st_debug.c
+++ b/src/mesa/state_tracker/st_debug.c
@@ -27,6 +27,7 @@
 
 
 #include "main/context.h"
+#include "main/debug_output.h"
 #include "program/prog_print.h"
 
 #include "pipe/p_state.h"
@@ -164,16 +165,17 @@ st_debug_message(void *data,
 }
 
 void
-st_enable_debug_output(struct st_context *st, boolean enable)
+st_update_debug_callback(struct st_context *st)
 {
struct pipe_context *pipe = st->pipe;
 
if (!pipe->set_debug_callback)
   return;
 
-   if (enable) {
+   if (_mesa_get_debug_state_int(st->ctx, GL_DEBUG_OUTPUT)) {
   struct pipe_debug_callback cb;
   memset(&cb, 0, sizeof(cb));
+  cb.async = !_mesa_get_debug_state_int(st->ctx, 
GL_DEBUG_OUTPUT_SYNCHRONOUS);
   cb.debug_message = st_debug_message;
   cb.data = st;
   pipe->set_debug_callback(pipe, &cb);
diff --git a/src/mesa/state_tracker/st_debug.h 
b/src/mesa/state_tracker/st_debug.h
index e143609..6c1e915 100644
--- a/src/mesa/state_tracker/st_debug.h
+++ b/src/mesa/state_tracker/st_debug.h
@@ -63,7 +63,7 @@ extern int ST_DEBUG;
 
 void st_debug_init( void );
 
-void st_enable_debug_output(struct st_context *st, boolean enable);
+void st_update_debug_callback(struct st_context *st);
 
 static inline void
 ST_DBG( unsigned flag, const char *fmt, ... )
diff --git a/src/mesa/state_tracker/st_manager.c 
b/src/mesa/state_tracker/st_manager.c
index 997d428..d323c87 100644
--- a/src/mesa/state_tracker/st_manager.c
+++ b/src/mesa/state_tracker/st_manager.c
@@ -681,7 +681,7 @@ st_api_create_context(struct st_api *stapi, struct 
st_manager *smapi,
 
   st->ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
 
-  st_enable_debug_output(st, TRUE);
+  st_update_debug_callback(st);
}
 
if (attribs->flags & ST_CONTEXT_FLAG_FORWARD_COMPATIBLE)
-- 
2.7.4



[Mesa-dev] [PATCH 4/4] radeonsi: disable multi-threading when shader dumps are enabled

2016-07-07 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Otherwise, shader dumps can become interleaved and unusable.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 94587b2..c24130d 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1325,6 +1325,7 @@ static void *si_create_shader_selector(struct 
pipe_context *ctx,
util_queue_fence_init(&sel->ready);
 
if ((sctx->b.debug.debug_message && !sctx->b.debug.async) ||
+   r600_can_dump_shader(&sscreen->b, sel->info.processor) ||
!util_queue_is_initialized(&sscreen->shader_compiler_queue))
si_init_shader_selector_async(sel, -1);
else
-- 
2.7.4
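
A note on the condition this hunk ends up with: it can be read as a small
predicate that answers "must this shader be compiled on the calling thread?".
The following is a sketch in plain C under stated assumptions. `struct
debug_state` and `must_compile_inline` are hypothetical names, and the three
inputs stand in for the debug callback state, the shader dump check and the
compiler queue check from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the debug callback state. */
struct debug_state {
   bool has_callback;   /* a debug_message callback is installed */
   bool async;          /* the callback may be called from another thread */
};

/* Returns true when compilation has to stay on the calling thread. */
static bool must_compile_inline(const struct debug_state *dbg,
                                bool dumps_enabled, bool queue_ready)
{
   assert(dbg != NULL);

   /* A synchronous callback, enabled shader dumps, or a missing compiler
    * queue each rule out handing the job to a worker thread. */
   return (dbg->has_callback && !dbg->async) || dumps_enabled || !queue_ready;
}
```

Only a callback that is both installed and synchronous forces inline
compilation; an asynchronous callback leaves the worker queue usable unless
shader dumps are enabled.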



[Mesa-dev] [PATCH 3/4] radeonsi: use multi-threaded compilation in debug contexts

2016-07-07 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We only have to stay single-threaded when debug output must be synchronous.
This yields better parallelism in shader-db runs for me.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index abbe451..94587b2 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1103,16 +1103,16 @@ void si_init_shader_selector_async(void *job, int 
thread_index)
struct si_shader_selector *sel = (struct si_shader_selector *)job;
struct si_screen *sscreen = sel->screen;
LLVMTargetMachineRef tm;
-   struct pipe_debug_callback *debug;
+   struct pipe_debug_callback *debug = &sel->debug;
unsigned i;
 
if (thread_index >= 0) {
assert(thread_index < ARRAY_SIZE(sscreen->tm));
tm = sscreen->tm[thread_index];
-   debug = NULL;
+   if (!debug->async)
+   debug = NULL;
} else {
tm = sel->tm;
-   debug = &sel->debug;
}
 
/* Compile the main shader part for use with a prolog and/or epilog.
@@ -1324,7 +1324,7 @@ static void *si_create_shader_selector(struct 
pipe_context *ctx,
pipe_mutex_init(sel->mutex);
util_queue_fence_init(&sel->ready);
 
-   if (sctx->b.debug.debug_message ||
+   if ((sctx->b.debug.debug_message && !sctx->b.debug.async) ||
!util_queue_is_initialized(&sscreen->shader_compiler_queue))
si_init_shader_selector_async(sel, -1);
else
-- 
2.7.4



[Mesa-dev] [PATCH 1/4] gallium: add async flag to pipe_debug_callback

2016-07-07 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/include/pipe/p_state.h   | 6 ++
 src/gallium/state_trackers/clover/core/queue.cpp | 5 -
 src/mesa/state_tracker/st_debug.c| 5 -
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/gallium/include/pipe/p_state.h 
b/src/gallium/include/pipe/p_state.h
index f4bee38..f7bf402 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -809,6 +809,12 @@ struct pipe_compute_state
 struct pipe_debug_callback
 {
/**
+* When set to \c true, the callback may be called asynchronously from a
+* driver-created thread.
+*/
+   bool async;
+
+   /**
 * Callback for the driver to report debug/performance/etc information back
 * to the state tracker.
 *
diff --git a/src/gallium/state_trackers/clover/core/queue.cpp 
b/src/gallium/state_trackers/clover/core/queue.cpp
index 24d71f1..00afdb6 100644
--- a/src/gallium/state_trackers/clover/core/queue.cpp
+++ b/src/gallium/state_trackers/clover/core/queue.cpp
@@ -50,7 +50,10 @@ command_queue::command_queue(clover::context &ctx, 
clover::device &dev,
   throw error(CL_INVALID_DEVICE);
 
if (ctx.notify) {
-  struct pipe_debug_callback cb = { &debug_notify_callback, this };
+  struct pipe_debug_callback cb;
+  memset(&cb, 0, sizeof(db));
+  cb.debug_message = &debug_notify_callback;
+  cb.data = this;
   if (pipe->set_debug_callback)
      pipe->set_debug_callback(pipe, &cb);
}
diff --git a/src/mesa/state_tracker/st_debug.c 
b/src/mesa/state_tracker/st_debug.c
index eaf2549..214e223 100644
--- a/src/mesa/state_tracker/st_debug.c
+++ b/src/mesa/state_tracker/st_debug.c
@@ -172,7 +172,10 @@ st_enable_debug_output(struct st_context *st, boolean 
enable)
   return;
 
if (enable) {
-  struct pipe_debug_callback cb = { st_debug_message, st };
+  struct pipe_debug_callback cb;
+  memset(&cb, 0, sizeof(cb));
+  cb.debug_message = st_debug_message;
+  cb.data = st;
   pipe->set_debug_callback(pipe, &cb);
} else {
   pipe->set_debug_callback(pipe, NULL);
-- 
2.7.4



Re: [Mesa-dev] [PATCH 2/6] i965/fs: use the new helper function to create double immediates

2016-07-07 Thread Samuel Iglesias Gonsálvez


On 06/07/16 22:32, Kenneth Graunke wrote:
> On Wednesday, July 6, 2016 12:09:58 PM PDT Samuel Iglesias Gonsálvez wrote:
>> From: Iago Toral Quiroga 
>>
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index 268c847..d805d95 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -832,7 +832,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
>> nir_alu_instr *instr)
>>* a register and compare with that.
>>*/
>>   fs_reg tmp = vgrf(glsl_type::double_type);
>> - bld.MOV(tmp, brw_imm_df(0.0));
>> + bld.MOV(tmp, setup_imm_df(0.0));
> 
> Does this need to be splatted out to a full SIMD-width?
> Why not just do:
> 
>fs_reg tmp = setup_imm_df(0.0);
> 
> and let the CMP compare against the stride 0 register?
> 

Right, I will do the change.

>>  
>>   /* A direct DF CMP using the flag register (null dst) won't work in
>>* SIMD16 because the CMP will be split in two by lower_simd_width,
>> @@ -1171,7 +1171,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
>> nir_alu_instr *instr)
>> case nir_op_d2b: {
>>/* two-argument instructions can't take 64-bit immediates */
>>fs_reg zero = vgrf(glsl_type::double_type);
>> -  bld.MOV(zero, brw_imm_df(0.0));
>> +  bld.MOV(zero, setup_imm_df(0.0));
>>/* A SIMD16 execution needs to be split in two instructions, so use
>> * a vgrf instead of the flag register as dst so instruction splitting
>> * works
> 
> Likewise, I don't think you need to splat here.
> 

Here too.

Does it have your R-b?

Sam

>> @@ -1483,7 +1483,7 @@ fs_visitor::nir_emit_load_const(const fs_builder &bld,
>>  
>> case 64:
>>for (unsigned i = 0; i < instr->def.num_components; i++)
>> - bld.MOV(offset(reg, bld, i), brw_imm_df(instr->value.f64[i]));
>> + bld.MOV(offset(reg, bld, i), setup_imm_df(instr->value.f64[i]));
>>break;
>>  
>> default:
>>
> 
> This hunk looks good.
> 





Re: [Mesa-dev] [PATCH] radeonsi: don't interleave R600_DEBUG-enabled shader dumps

2016-07-07 Thread Nicolai Hähnle

On 07.07.2016 03:33, Tom Stellard wrote:

On Wed, Jul 06, 2016 at 11:55:03PM +0200, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

Only setting R600_DEBUG doesn't set any debug callback. Conversely, the debug
callback is only called when R600_DEBUG is set.


I don't get any output from shader-db with this patch.


Yeah, I forgot that the code is conservative wrt debug callbacks, which 
can't use multithreading when GL_DEBUG_OUTPUT_SYNCHRONOUS is set. The 
quick fix would be to keep the sctx->b.debug.debug_message setting.


Nicolai


-Tom


---
  src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index abbe451..059ff70 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1324,7 +1324,7 @@ static void *si_create_shader_selector(struct 
pipe_context *ctx,
pipe_mutex_init(sel->mutex);
util_queue_fence_init(&sel->ready);

-   if (sctx->b.debug.debug_message ||
+   if (r600_can_dump_shader(&sscreen->b, sel->info.processor) ||
!util_queue_is_initialized(&sscreen->shader_compiler_queue))
si_init_shader_selector_async(sel, -1);
else
--
2.7.4



[Mesa-dev] [Bug 96410] [Perf] Pre validate _mesa_sampler_uniforms_pipeline_are_valid like _mesa_sampler_uniforms_are_valid

2016-07-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=96410

gregory.hain...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from gregory.hain...@gmail.com ---
Fixed by commit 6a524c76f502fe15bb3612065a23ece693aed237

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [Bug 96358] SSO: wrong interface validation between GS and VS (regresion due to latest gles 3.1)

2016-07-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=96358

Timothy Arceri  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #12 from Timothy Arceri  ---
commit  73a6a4ce4975016d4f86d644b31d30bb6d3a38f8

mesa: Strip arrayness from interface block names in some IO validation
Outputs from the vertex shader need to be able to match
per-vertex-arrayed inputs of later stages.  Accomplish this by stripping
one level of arrayness from the names and types of outputs going to a
per-vertex-arrayed stage.

v2: Add missing checks for TESS_EVAL->GEOMETRY.  Noticed by Timothy
Arceri.

v3: Use a slightly simpler stage check suggested by Ilia.

Signed-off-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Reviewed-by: Kenneth Graunke 
Cc: "12.0" 
Cc: Gregory Hainaut 
Cc: Ilia Mirkin 

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

