Re: [Mesa-dev] [PATCH v2] i965: fix release build unused variable warning

2016-12-11 Thread Eduardo Lima Mitev
On 12/11/2016 08:57 PM, Grazvydas Ignotas wrote:
> Signed-off-by: Grazvydas Ignotas 
> ---
> v2: take Eduardo Lima's suggestion further - remove the variable completely
> no commit access
> 

I just pushed the patch. Thanks!

Eduardo

>  src/mesa/drivers/dri/i965/brw_blorp.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 4c1d858..43ac3be 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -813,8 +813,6 @@ do_single_blorp_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>can_fast_clear = false;
>  
> const unsigned logical_layer = irb_logical_mt_layer(irb);
> -   const bool is_lossless_compressed = intel_miptree_is_lossless_compressed(
> -  brw, irb->mt);
> const enum intel_fast_clear_state fast_clear_state =
>intel_miptree_get_fast_clear_state(irb->mt, irb->mt_level,
>   logical_layer);
> @@ -850,7 +848,7 @@ do_single_blorp_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
> * it now.
> */
>if (!irb->mt->mcs_buf) {
> - assert(!is_lossless_compressed);
> + assert(!intel_miptree_is_lossless_compressed(brw, irb->mt));
>   if (!intel_miptree_alloc_non_msrt_mcs(brw, irb->mt, false)) {
>  /* MCS allocation failed--probably this will only happen in
>   * out-of-memory conditions.  But in any case, try to recover
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] i965: Factor out oword block read and write message control calculation.

2016-12-11 Thread Kenneth Graunke
On Sunday, December 11, 2016 5:44:03 PM PST Francisco Jerez wrote:
> Kenneth Graunke  writes:
> 
> > On Friday, December 9, 2016 11:03:27 AM PST Francisco Jerez wrote:
> >> We'll need roughly the same logic in other places and it would be
> >> annoying to duplicate it.  Instead factor it out into a function-like
> >> macro that takes the number of dwords per block (which will prove more
> >> convenient than taking the same value in owords or some other unit).
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_defines.h |  6 ++
> >>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 ++
> >>  2 files changed, 8 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> >> b/src/mesa/drivers/dri/i965/brw_defines.h
> >> index cae8e9a..1c638a0 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> @@ -1669,6 +1669,12 @@ enum brw_message_target {
> >>  #define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS 2
> >>  #define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS 3
> >>  #define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS 4
> >> +#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n)  \
> >> +   ((n) == 4 ? BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW :\
> >> +(n) == 8 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :  \
> >> +(n) == 16 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : \
> >> +(n) == 32 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : \
> >> +(abort(), ~0))
> >
> > How about:
> >
> > #define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \
> >(assert(n == 4 || n == 8 || n == 16 || n == 32), ffs(n) - 3)
> >
> 
> I don't think that would work for n > 4 due to the rather unfortunate
> hardware encoding, e.g. BRW_DATAPORT_OWORD_BLOCK_2_OWORDS is supposed to
> be encoded as 2 but your macro would give 1 as result. :|

Whoops.  I suppose you could do:

#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \
   ((n) == 4 ? 0 : (assert(n == 8 || n == 16 || n == 32), ffs(n) - 2))

but I'm not sure that's that much better...


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.

2016-12-11 Thread Timothy Arceri
On Mon, 2016-12-12 at 11:35 +1100, Timothy Arceri wrote:
> On Sun, 2016-12-11 at 00:00 -0800, Kenneth Graunke wrote:
> > 
> > On Saturday, December 10, 2016 12:37:16 PM PST Matt Turner wrote:
> > > 
> > > 
> > > On Fri, Dec 9, 2016 at 8:28 PM, Kenneth Graunke  > > e.
> > > org> wrote:
> > > > 
> > > > 
> > > > A number of games have large arrays of constants, which we
> > > > promote to
> > > > uniforms.  This introduces copies from the uniform array to the
> > > > original
> > > > temporary array.  Normally, copy propagation eliminates those
> > > > copies,
> > > > making everything refer to the uniform array directly.
> > > > 
> > > > A number of shaders in "Deus Ex: Mankind Divided" recently
> > > > exposed a
> > > > limitation of copy propagation - if we had any intrinsics (i.e.
> > > > image
> > > > access in a compute shader), we weren't able to get rid of
> > > > these
> > > > copies.
> > > > 
> > > > That meant that any variable indexing remained on the temporary
> > > > array
> > > > rather being moved to the uniform array.  i965's scalar backend
> > > > currently doesn't support indirect addressing of temporary
> > > > arrays,
> > > > which meant lowering it to if-ladders.  This was horrible.
> > > > 
> > > > On Skylake:
> > > > 
> > > > total instructions in shared programs: 13700090 -> 13654519 (-
> > > > 0.33%)
> > > > instructions in affected programs: 56438 -> 10867 (-80.75%)
> > > 
> > > Wow!
> > > 
> > > > 
> > > > 
> > > > helped: 14
> > > > HURT: 0
> > > > 
> > > > total cycles in shared programs: 288879704 -> 291270232 (0.83%)
> > > > cycles in affected programs: 12758080 -> 15148608 (18.74%)
> > > 
> > > ... that seems nuts?
> > > 
> > > Any idea what's going on with the cycle counts?
> > 
> > Good point...I glossed over the cycle counts when I saw the -80%
> > reduction in instructions with 0 shaders hurt.  But they do look
> > pretty bad, so let's take a closer look...
> > 
> > There are two nearly identical shaders that are the worst
> > offenders:
> > 
> > shaders/closed/steam/deus-ex-mankind-divided/256.shader_test CS
> > SIMD16:
> > 
> > instructions: 2770 -> 253 (-2,517 instructions or -90.87%)
> > spills: 25 -> 0
> > fills:  29 -> 0
> > cycles: 923266 -> 1420534 (+497,268 cycles or +53.86%)
> > compile time: 2.73 seconds -> 0.17 seconds
> > 
> > There are three loops in the program, each of which contains two
> > indirect reads of the uvec4[98] constant array.
> > 
> > Before this patch, there were:
> >  - 67 UNIFORM_PULL_CONSTANT_LOADs at the top of the program
> >  - 1 UNIFORM_PULL_CONSTANT_LOAD in the first (cheap) loop
> >  - 1 UNIFORM_PULL_CONSTANT_LOAD in the second (expensive) loop
> >  - 1 UNIFORM_PULL_CONSTANT_LOAD in the third (very expensive) loop
> > 
> > After this patch, there are:
> >  - 0 loads at the top of the program
> >  - 1 VARYING_PULL_CONSTANT_LOAD in the first (cheap) loop
> >  - 2 VARYING_PULL_CONSTANT_LOAD in the second (expensive) loop
> >  - 2 VARYING_PULL_CONSTANT_LOAD in the third (very expensive) loop
> > 
> > The array indexes in the expensive loop are a[foo] and a[foo + 1].
> > foo is modified in the loop, so they can't be hoisted out.  I don't
> > think we can determine the number of loop iterations.
> 
> The first loop has a trip count of 49. If I force it to unroll I get:
> 
> CS SIMD16 shader: 682 inst, 2 loops, 1009856 cycles.
> 
> The current unrolling rules are:
> 
> static bool
> is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li)
> {
>    unsigned max_iter = shader->options->max_unroll_iterations;
> 
>    if (li->trip_count > max_iter)
>   return false;
> 
>    if (li->force_unroll) // this is for consecutive indirect array
> access
>   return true;
> 
>    bool loop_not_too_large =
>   li->num_instructions * li->trip_count <= max_iter * 25;
> 
>    return loop_not_too_large;
> }
> 
> 
> Maybe we should just drop the first trip count check? Checking the
> number of intructions * trip count makes a lot more sense. Dropping
> the
> check allows the first loop to unroll.

I tried to get this working for GLSL IR loop unrolling and fixed a
couple of the other problems, panicing at intrinsics (partialy fixed
-well I just ignore them for now), and nequal condition. 

But I also ran into futher issues:

1) Not being able to detect an induction variable because we end up
with: 

(assign  (x) (var_ref compiler_temp@19)  (expression uint bitcast_f2u
(swiz y (var_ref r1) )) ) 
(if (expression bool != (var_ref compiler_temp@19)
(constant uint (0)) ) (
  break
)

2) The GLSL IR analysis pass bails if the first if in the loop is not a
loop terminator.

It would have been nice to fix the GLSL IR pass so I could compare the
NIR pass to the improved version and try to avoid regressing shaders
like this but I'm stating to think it's not worth all the effort. What
do you think?

[1] 

Re: [Mesa-dev] [PATCH v3 2/3] svga: Fix a strict-aliasing violation in shader dumper

2016-12-11 Thread Edward O'Callaghan
Brian/Roland ping?

On 12/07/2016 10:30 AM, Edward O'Callaghan wrote:
> As per the C spec, it is illegal to alias pointers to different
> types. This results in undefined behaviour after optimization
> passes, resulting in very subtle bugs that happen only on a
> full moon..
> 
> Use a memcpy() as a well defined coercion between the isomorphic
> bit-field interpretations of memory.
> 
> V.2: Use C99 compat STATIC_ASSERT() over C11 static_assert().
> 
> Signed-off-by: Edward O'Callaghan 
> ---
>  src/gallium/drivers/svga/svgadump/svga_shader_dump.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/svga/svgadump/svga_shader_dump.c 
> b/src/gallium/drivers/svga/svgadump/svga_shader_dump.c
> index 4ee1bf2..46126a5 100644
> --- a/src/gallium/drivers/svga/svgadump/svga_shader_dump.c
> +++ b/src/gallium/drivers/svga/svgadump/svga_shader_dump.c
> @@ -30,6 +30,9 @@
>   * @author Michal Krol 
>   */
>  
> +#include 
> +#include 
> +
>  #include "svga_shader.h"
>  #include "svga_shader_dump.h"
>  #include "svga_shader_op.h"
> @@ -413,6 +416,11 @@ dump_dstreg(struct sh_dstreg dstreg,
>  
>  static void dump_srcreg( struct sh_srcreg srcreg, struct sh_srcreg *indreg, 
> const struct dump_info *di )
>  {
> +   struct sh_reg srcreg_sh = {0};
> +   /* bit-fields carefully aligned, ensure they stay that way. */
> +   STATIC_ASSERT(sizeof(struct sh_reg) == sizeof(struct sh_srcreg));
> +   memcpy(_sh, , sizeof(srcreg_sh));
> +
> switch (srcreg.modifier) {
> case SVGA3DSRCMOD_NEG:
> case SVGA3DSRCMOD_BIASNEG:
> @@ -427,7 +435,7 @@ static void dump_srcreg( struct sh_srcreg srcreg, struct 
> sh_srcreg *indreg, cons
> case SVGA3DSRCMOD_NOT:
>_debug_printf( "!" );
> }
> -   dump_reg( *(struct sh_reg *) , indreg, di );
> +   dump_reg(srcreg_sh, indreg, di );
> switch (srcreg.modifier) {
> case SVGA3DSRCMOD_NONE:
> case SVGA3DSRCMOD_NEG:
> 



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] EGL/android: Enhance pbuffer implementation

2016-12-11 Thread Tomasz Figa
Hi,

On Fri, Dec 9, 2016 at 8:29 PM, Liu Zhiquan  wrote:
> Some dri drivers will pass multiple bits in buffer_mask parameter
> to droid_image_get_buffer(), more than the actual supported buffer
> type combination. For such case, will go through all the bits, and
> will not return error when unsupported buffer is requested, only
> return error when the allocation for supported buffer failed.
>
> v2: coding style and log changes
> v3: coding style changes and update patch format
>
> Signed-off-by: Liu Zhiquan 
> Signed-off-by: Long, Zhifang 
> Reviewed-by: Tomasz Figa 
> ---
>  src/egl/drivers/dri2/platform_android.c | 177 
> +---
>  1 file changed, 96 insertions(+), 81 deletions(-)
>

Looks good, thanks for patiently addressing all the nitpicks. Re-affirming:

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] vulkan: fix 'statement with no effect' warning

2016-12-11 Thread Jason Ekstrand
I have a patch in my tree that fixes that warning by making us actually
return the error code in that function rather than simply using it as a
fancy print function. I think that's the better fix.

On Dec 11, 2016 9:06 AM, "Grazvydas Ignotas"  wrote:

> Emitted on release build in case vk_error() return is not used.
>
> Signed-off-by: Grazvydas Ignotas 
> ---
> no commit access
>
>  src/vulkan/vk_util.c |  4 
>  src/vulkan/vk_util.h | 13 -
>  2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/src/vulkan/vk_util.c b/src/vulkan/vk_util.c
> index 5d38117..716b8e5 100644
> --- a/src/vulkan/vk_util.c
> +++ b/src/vulkan/vk_util.c
> @@ -60,6 +60,8 @@ __vk_finishme(const char *file, int line, const char
> *format, ...)
> fprintf(stderr, "%s:%d: FINISHME: %s\n", file, line, buffer);
>  }
>
> +#ifdef DEBUG
> +
>  VkResult
>  __vk_errorf(VkResult error, const char *file, int line, const char
> *format, ...)
>  {
> @@ -103,3 +105,5 @@ __vk_errorf(VkResult error, const char *file, int
> line, const char *format, ...)
>
> return error;
>  }
> +
> +#endif /* DEBUG */
> diff --git a/src/vulkan/vk_util.h b/src/vulkan/vk_util.h
> index c384838..5ce206c 100644
> --- a/src/vulkan/vk_util.h
> +++ b/src/vulkan/vk_util.h
> @@ -72,15 +72,18 @@ align_i32(int32_t v, int32_t a)
>   * propagating errors. Might be useful to plug in a stack trace here.
>   */
>
> +#ifdef DEBUG
>  VkResult __vk_errorf(VkResult error, const char *file, int line, const
> char *format, ...);
> +#else
> +static inline VkResult
> +__vk_errorf(VkResult error, const char *file, int line, const char
> *format, ...)
> +{
> +   return error;
> +}
> +#endif
>
> -#ifdef DEBUG
>  #define vk_error(error) __vk_errorf(error, __FILE__, __LINE__, NULL);
>  #define vk_errorf(error, format, ...) __vk_errorf(error, __FILE__,
> __LINE__, format, ## __VA_ARGS__);
> -#else
> -#define vk_error(error) error
> -#define vk_errorf(error, format, ...) error
> -#endif
>
>  void __vk_finishme(const char *file, int line, const char *format, ...)
> vk_printflike(3, 4);
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] clover: Implement 'CL_MEM_OBJECT_IMAGE1D'

2016-12-11 Thread Francisco Jerez
Edward O'Callaghan  writes:

> Hi Francisco, thanks for the review..
>
> On 11/25/2016 03:39 PM, Francisco Jerez wrote:
>> Edward O'Callaghan  writes:
>> 
>>> Signed-off-by: Edward O'Callaghan 
>>> ---
>>>  src/gallium/state_trackers/clover/api/memory.cpp  |  9 -
>>>  src/gallium/state_trackers/clover/core/memory.cpp | 13 +
>>>  src/gallium/state_trackers/clover/core/memory.hpp | 10 ++
>>>  3 files changed, 31 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/state_trackers/clover/api/memory.cpp 
>>> b/src/gallium/state_trackers/clover/api/memory.cpp
>>> index 58f56d1..41e344e 100644
>>> --- a/src/gallium/state_trackers/clover/api/memory.cpp
>>> +++ b/src/gallium/state_trackers/clover/api/memory.cpp
>>> @@ -166,6 +166,14 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags,
>>> ret_error(r_errcode, CL_SUCCESS);
>>>  
>>> switch (desc->image_type) {
>>> +   case CL_MEM_OBJECT_IMAGE1D:
>>> +  if (!desc->image_width)
>>> + throw error(CL_INVALID_IMAGE_SIZE);
>>> +
>> 
>> We probably need to check that the specified image width is within the
>> device limits -- There's no device::max_image_levels_1d() query but
>> max_image_levels_2d() should work as well.
>> 
>>> +  return new image1d(ctx, flags, format,
>>> + desc->image_width,
>>> + desc->image_row_pitch, host_ptr);
>>> +
>>> case CL_MEM_OBJECT_IMAGE2D:
>>>if (!desc->image_width || !desc->image_height)
>>>   throw error(CL_INVALID_IMAGE_SIZE);
>>> @@ -214,7 +222,6 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags,
>>>   desc->image_depth, desc->image_row_pitch,
>>>   desc->image_slice_pitch, host_ptr);
>>>  
>>> -   case CL_MEM_OBJECT_IMAGE1D:
>>> case CL_MEM_OBJECT_IMAGE1D_ARRAY:
>>> case CL_MEM_OBJECT_IMAGE1D_BUFFER:
>>>// XXX - Not implemented.
>>> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp 
>>> b/src/gallium/state_trackers/clover/core/memory.cpp
>>> index de1862b..246bd2d 100644
>>> --- a/src/gallium/state_trackers/clover/core/memory.cpp
>>> +++ b/src/gallium/state_trackers/clover/core/memory.cpp
>>> @@ -185,6 +185,19 @@ image::slice_pitch() const {
>>> return _slice_pitch;
>>>  }
>>>  
>>> +image1d::image1d(clover::context , cl_mem_flags flags,
>>> + const cl_image_format *format,
>>> + size_t width, size_t row_pitch,
>>> + void *host_ptr) :
>>> +   image(ctx, flags, format, width, 0, 1,
>> 
>> All surface dimension fields (width, height, depth) of a clover::image
>> object are considered valid regardless of the image type, so we should
>> set unused dimensions to 1 in order to avoid unexpected results while
>> doing arithmetic or various error checking with them.
>> 
>>> + row_pitch, 0, row_pitch, host_ptr) {
>> 
>> I don't think you can rely on the row pitch to be meaningful for 1D
>> images, it's probably necessary to pass it as argument to the
>> constructor, we're probably better off calculating the size by hand.
> Why not and what do you propose here exactly?
>

Because technically 1D images are a single row, so the row pitch is kind
of meaningless...  How about
'util_format_get_blocksize(translate_format(format)) * width'?

>> 
>>> +}
>>> +
>>> +cl_mem_object_type
>>> +image1d::type() const {
>>> +   return CL_MEM_OBJECT_IMAGE1D;
>>> +}
>>> +
>>>  image2d::image2d(clover::context , cl_mem_flags flags,
>>>   const cl_image_format *format, size_t width,
>>>   size_t height, size_t row_pitch,
>>> diff --git a/src/gallium/state_trackers/clover/core/memory.hpp 
>>> b/src/gallium/state_trackers/clover/core/memory.hpp
>>> index 1a3e8c9..ad9052b 100644
>>> --- a/src/gallium/state_trackers/clover/core/memory.hpp
>>> +++ b/src/gallium/state_trackers/clover/core/memory.hpp
>>> @@ -134,6 +134,16 @@ namespace clover {
>>> std::unique_ptr> resources;
>>> };
>>>  
>>> +   class image1d : public image {
>>> +   public:
>>> +  image1d(clover::context , cl_mem_flags flags,
>>> +  const cl_image_format *format,
>>> +  size_t width, size_t row_pitch,
>>> +  void *host_ptr);
>>> +
>>> +  virtual cl_mem_object_type type() const;
>>> +   };
>>> +
>>> class image2d : public image {
>>> public:
>>>image2d(clover::context , cl_mem_flags flags,
>>> -- 
>>> 2.7.4
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] clover: Implement 'CL_MEM_OBJECT_IMAGE2D_ARRAY'

2016-12-11 Thread Francisco Jerez
Edward O'Callaghan  writes:

> On 11/25/2016 03:38 PM, Francisco Jerez wrote:
>> Edward O'Callaghan  writes:
>> 
>>> Signed-off-by: Edward O'Callaghan 
>>> ---
>>>  src/gallium/state_trackers/clover/api/memory.cpp  | 17 -
>>>  src/gallium/state_trackers/clover/core/memory.cpp | 14 ++
>>>  src/gallium/state_trackers/clover/core/memory.hpp | 11 +++
>>>  3 files changed, 41 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/state_trackers/clover/api/memory.cpp 
>>> b/src/gallium/state_trackers/clover/api/memory.cpp
>>> index 9b3cd8b..58f56d1 100644
>>> --- a/src/gallium/state_trackers/clover/api/memory.cpp
>>> +++ b/src/gallium/state_trackers/clover/api/memory.cpp
>>> @@ -181,6 +181,22 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags,
>>>   desc->image_width, desc->image_height,
>>>   desc->image_row_pitch, host_ptr);
>>>  
>>> +   case CL_MEM_OBJECT_IMAGE2D_ARRAY:
>>> +  if (!desc->image_width || !desc->image_height)
>>> + throw error(CL_INVALID_IMAGE_SIZE);
>>> +
>>> +  if (all_of([=](const device ) {
>>> +   const size_t max = 1 << dev.max_image_levels_2d();
>>> +   return (desc->image_width > max ||
>>> +   desc->image_height > max);
>>> +}, ctx.devices()))
>>> + throw error(CL_INVALID_IMAGE_SIZE);
>>> +
>>> +  return new image2d_array(ctx, flags, format,
>>> +   desc->image_width, desc->image_height,
>>> +   desc->image_array_size, 
>>> desc->image_slice_pitch,
>>> +   host_ptr);
>>> +
>>> case CL_MEM_OBJECT_IMAGE3D:
>>>if (!desc->image_width || !desc->image_height || !desc->image_depth)
>>>   throw error(CL_INVALID_IMAGE_SIZE);
>>> @@ -201,7 +217,6 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags,
>>> case CL_MEM_OBJECT_IMAGE1D:
>>> case CL_MEM_OBJECT_IMAGE1D_ARRAY:
>>> case CL_MEM_OBJECT_IMAGE1D_BUFFER:
>>> -   case CL_MEM_OBJECT_IMAGE2D_ARRAY:
>>>// XXX - Not implemented.
>>>throw error(CL_IMAGE_FORMAT_NOT_SUPPORTED);
>>>  
>>> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp 
>>> b/src/gallium/state_trackers/clover/core/memory.cpp
>>> index b852e68..de1862b 100644
>>> --- a/src/gallium/state_trackers/clover/core/memory.cpp
>>> +++ b/src/gallium/state_trackers/clover/core/memory.cpp
>>> @@ -198,6 +198,20 @@ image2d::type() const {
>>> return CL_MEM_OBJECT_IMAGE2D;
>>>  }
>>>  
>>> +image2d_array::image2d_array(clover::context , cl_mem_flags flags,
>>> + const cl_image_format *format,
>>> + size_t width, size_t height,
>>> + size_t array_size, size_t slice_pitch,
>>> + void *host_ptr) :
>>> +   image(ctx, flags, format, width, height, 1,
>>> + 0, slice_pitch, slice_pitch * array_size, host_ptr) {
>>> +}
>> 
>> Seems like you aren't passing the row pitch as argument, and the array
>> size is getting lost so you won't be able to recover it in order to
>> implement memory transfer functions and CL image queries.
> So I should extend the image base class with a private _array_size state
> and a const getter method also, would that be reasonable?
>

Yeah, that sounds reasonable to me.

>> 
>>> +
>>> +cl_mem_object_type
>>> +image2d_array::type() const {
>>> +   return CL_MEM_OBJECT_IMAGE2D_ARRAY;
>>> +}
>>> +
>>>  image3d::image3d(clover::context , cl_mem_flags flags,
>>>   const cl_image_format *format,
>>>   size_t width, size_t height, size_t depth,
>>> diff --git a/src/gallium/state_trackers/clover/core/memory.hpp 
>>> b/src/gallium/state_trackers/clover/core/memory.hpp
>>> index bd6da6b..1a3e8c9 100644
>>> --- a/src/gallium/state_trackers/clover/core/memory.hpp
>>> +++ b/src/gallium/state_trackers/clover/core/memory.hpp
>>> @@ -144,6 +144,17 @@ namespace clover {
>>>virtual cl_mem_object_type type() const;
>>> };
>>>  
>>> +   class image2d_array : public image {
>>> +   public:
>>> +  image2d_array(clover::context , cl_mem_flags flags,
>>> +const cl_image_format *format,
>>> +size_t width, size_t height,
>>> +size_t array_size, size_t slice_pitch,
>>> +void *host_ptr);
>>> +
>>> +  virtual cl_mem_object_type type() const;
>>> +   };
>>> +
>>> class image3d : public image {
>>> public:
>>>image3d(clover::context , cl_mem_flags flags,
>>> -- 
>>> 2.7.4
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___

Re: [Mesa-dev] [PATCH v3 2/2] clover: add GetKernelArgInfo (CL 1.2)

2016-12-11 Thread Francisco Jerez
Serge Martin  writes:

> ---
>  src/gallium/state_trackers/clover/api/kernel.cpp   | 92 
> +-
>  src/gallium/state_trackers/clover/core/module.cpp  | 14 
>  src/gallium/state_trackers/clover/core/module.hpp  | 11 +++
>  .../state_trackers/clover/llvm/codegen/common.cpp  | 11 +++
>  4 files changed, 125 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp 
> b/src/gallium/state_trackers/clover/api/kernel.cpp
> index 73ba34a..81a4a38 100644
> --- a/src/gallium/state_trackers/clover/api/kernel.cpp
> +++ b/src/gallium/state_trackers/clover/api/kernel.cpp
> @@ -192,9 +192,95 @@ clGetKernelWorkGroupInfo(cl_kernel d_kern, cl_device_id 
> d_dev,
>  CLOVER_API cl_int
>  clGetKernelArgInfo(cl_kernel d_kern,
> cl_uint idx, cl_kernel_arg_info param,
> -   size_t size, void *r_buf, size_t *r_size) {
> -   CLOVER_NOT_SUPPORTED_UNTIL("1.2");
> -   return CL_KERNEL_ARG_INFO_NOT_AVAILABLE;
> +   size_t size, void *r_buf, size_t *r_size) try {
> +   property_buffer buf { r_buf, size, r_size };
> +   const auto  = obj(d_kern);
> +
> +   const auto args =
> +  find(name_equals(kern.name()), kern.program().symbols()).args;
> +
> +   if (args.at(0).arg_info.arg_name.empty())
> +  throw error(CL_KERNEL_ARG_INFO_NOT_AVAILABLE);
> +
> +   const auto  = find([&](const clover::module::argument ) {
> +   return arg.arg_info.index == idx;
> +}, args);
> +
> +   switch (param) {
> +   case CL_KERNEL_ARG_ADDRESS_QUALIFIER: {
> +  cl_kernel_arg_address_qualifier v;
> +  switch (arg.type) {
> + case module::argument::local:
> +v = CL_KERNEL_ARG_ADDRESS_LOCAL;
> +break;
> + case module::argument::constant:
> +v = CL_KERNEL_ARG_ADDRESS_CONSTANT;
> +break;
> + case module::argument::global:
> + case module::argument::image2d_rd:
> + case module::argument::image2d_wr:
> + case module::argument::image3d_rd:
> + case module::argument::image3d_wr:
> +v = CL_KERNEL_ARG_ADDRESS_GLOBAL;
> +break;
> + default:
> +v = CL_KERNEL_ARG_ADDRESS_PRIVATE;
> + }
> +  buf.as_scalar() = v;
> +  break;
> +   }
> +
> +   case CL_KERNEL_ARG_ACCESS_QUALIFIER: {
> +  cl_kernel_arg_access_qualifier v;
> +  switch (arg.type) {
> + case module::argument::image2d_rd:
> + case module::argument::image3d_rd:
> +v = CL_KERNEL_ARG_ACCESS_READ_ONLY;
> +break;
> + case module::argument::image2d_wr:
> + case module::argument::image3d_wr:
> +v = CL_KERNEL_ARG_ACCESS_WRITE_ONLY;
> +break;
> + default:
> +v = CL_KERNEL_ARG_ACCESS_NONE;
> + }
> +  buf.as_scalar() = v;
> +  break;
> +   }
> +
> +   case CL_KERNEL_ARG_TYPE_NAME:
> +  buf.as_string() = arg.arg_info.type_name;
> +  break;
> +
> +   case CL_KERNEL_ARG_TYPE_QUALIFIER: {
> +  cl_kernel_arg_type_qualifier v = CL_KERNEL_ARG_TYPE_NONE;
> +
> +  if (arg.arg_info.type_qualifier.find("const") != std::string::npos)
> + v |= CL_KERNEL_ARG_TYPE_CONST;
> +  if (arg.arg_info.type_qualifier.find("restrict") != std::string::npos)
> + v |= CL_KERNEL_ARG_TYPE_RESTRICT;
> +  if (arg.arg_info.type_qualifier.find("volatile") != std::string::npos)
> + v |= CL_KERNEL_ARG_TYPE_VOLATILE;
> +
> +  buf.as_scalar() = v;
> +  break;
> +   }
> +
> +   case CL_KERNEL_ARG_NAME:
> +  buf.as_string() = arg.arg_info.arg_name;
> +  break;
> +
> +   default:
> +  throw error(CL_INVALID_VALUE);
> +   }
> +
> +   return CL_SUCCESS;
> +
> +} catch (std::out_of_range ) {
> +   return CL_INVALID_ARG_INDEX;
> +
> +} catch (error ) {
> +   return e.get();
>  }
>  
>  namespace {
> diff --git a/src/gallium/state_trackers/clover/core/module.cpp 
> b/src/gallium/state_trackers/clover/core/module.cpp
> index a6c5b98..1b6b642 100644
> --- a/src/gallium/state_trackers/clover/core/module.cpp
> +++ b/src/gallium/state_trackers/clover/core/module.cpp
> @@ -168,6 +168,19 @@ namespace {
>}
> };
>  
> +   /// (De)serialize a module::argument::info
> +   template<>
> +   struct _serializer {
> +  template
> +  static void
> +  proc(S , QT ) {
> + _proc(s, x.index);
> + _proc(s, x.type_name);
> + _proc(s, x.type_qualifier);
> + _proc(s, x.arg_name);
> +  }
> +   };
> +
> /// (De)serialize a module::argument.
> template<>
> struct _serializer {
> @@ -180,6 +193,7 @@ namespace {
>   _proc(s, x.target_align);
>   _proc(s, x.ext_type);
>   _proc(s, x.semantic);
> + _proc(s, x.arg_info);
>}
> };
>  
> diff --git a/src/gallium/state_trackers/clover/core/module.hpp 
> 

Re: [Mesa-dev] [PATCH 4/9] i965: Factor out oword block read and write message control calculation.

2016-12-11 Thread Francisco Jerez
Kenneth Graunke  writes:

> On Friday, December 9, 2016 11:03:27 AM PST Francisco Jerez wrote:
>> We'll need roughly the same logic in other places and it would be
>> annoying to duplicate it.  Instead factor it out into a function-like
>> macro that takes the number of dwords per block (which will prove more
>> convenient than taking the same value in owords or some other unit).
>> ---
>>  src/mesa/drivers/dri/i965/brw_defines.h |  6 ++
>>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 ++
>>  2 files changed, 8 insertions(+), 12 deletions(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
>> b/src/mesa/drivers/dri/i965/brw_defines.h
>> index cae8e9a..1c638a0 100644
>> --- a/src/mesa/drivers/dri/i965/brw_defines.h
>> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
>> @@ -1669,6 +1669,12 @@ enum brw_message_target {
>>  #define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS 2
>>  #define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS 3
>>  #define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS 4
>> +#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n)  \
>> +   ((n) == 4 ? BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW :\
>> +(n) == 8 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :  \
>> +(n) == 16 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : \
>> +(n) == 32 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : \
>> +(abort(), ~0))
>
> How about:
>
> #define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \
>(assert(n == 4 || n == 8 || n == 16 || n == 32), ffs(n) - 3)
>

I don't think that would work for n > 4 due to the rather unfortunate
hardware encoding, e.g. BRW_DATAPORT_OWORD_BLOCK_2_OWORDS is supposed to
be encoded as 2 but your macro would give 1 as result. :|

>>  
>>  #define BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD 0
>>  #define BRW_DATAPORT_OWORD_DUAL_BLOCK_4OWORDS2
>> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
>> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> index 341f543..6141bfb 100644
>> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> @@ -2056,11 +2056,6 @@ void brw_oword_block_write_scratch(struct brw_codegen 
>> *p,
>> mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
>>  
>> const unsigned mlen = 1 + num_regs;
>> -   const unsigned msg_control =
>> -  (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :
>> -   num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS :
>> -   num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0);
>> -   assert(msg_control);
>>  
>> /* Set up the message header.  This is g0, with g0.2 filled with
>>  * the offset.  We don't want to leave our offset around in g0 or
>> @@ -2134,7 +2129,7 @@ void brw_oword_block_write_scratch(struct brw_codegen 
>> *p,
>>brw_set_dp_write_message(p,
>> insn,
>> brw_scratch_surface_idx(p),
>> -   msg_control,
>> +   BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8),
>> msg_type,
>> target_cache,
>> mlen,
>> @@ -2181,11 +2176,6 @@ brw_oword_block_read_scratch(struct brw_codegen *p,
>> dest = retype(dest, BRW_REGISTER_TYPE_UW);
>>  
>> const unsigned rlen = num_regs;
>> -   const unsigned msg_control =
>> -  (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :
>> -   num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS :
>> -   num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0);
>> -   assert(msg_control);
>> const unsigned target_cache =
>>(devinfo->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE :
>> devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE :
>> @@ -,7 +2212,7 @@ brw_oword_block_read_scratch(struct brw_codegen *p,
>>brw_set_dp_read_message(p,
>>insn,
>>brw_scratch_surface_idx(p),
>> -  msg_control,
>> +  BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8),
>>BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ, /* 
>> msg_type */
>>target_cache,
>>1, /* msg_length */
>> 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] fp64 support for Intel HSW

2016-12-11 Thread Darius Spitznagel

Hello devs,

could someone please ack the patches from the Igalia guys.
It is really frustrating that their patches are still not in master.

By the way. I use this branch now for months...
https://github.com/Igalia/mesa/tree/i965-va64-gen7-scalar-vec4
and it also enables va64 for HSW and it works! > OpenGL 4.2 (4.3 could 
also be enabled I think).


Thank you all for you great work and a happy Advent season from

Darius Spitznagel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex

2016-12-11 Thread Jonathan Gray
Previously there was no _mesa_unlock_debug_state() call at all,
one is still retained here for the case where _mesa_lock_debug_state()
is called and did not return NULL (which is documented as being
unlocked).

On Mon, Dec 12, 2016 at 11:40:32AM +1100, Edward O'Callaghan wrote:
> Hold up..
> 
> Does this reintroduce the hang in glsl-fs-loop piglit test with
> MESA_DEBUG=context though? Was that tested? I'm interested to know how
> this got so muddled up in the first place.
> 
> Kind Regards,
> Edward.
> 
> On 12/12/2016 05:21 AM, Eduardo Lima Mitev wrote:
> > Looks good.
> > 
> > Reviewed-by: Eduardo Lima Mitev 
> > 
> > On 12/11/2016 04:42 PM, Jonathan Gray wrote:
> >> Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that
> >> attempts to unlock an unlocked mutex which is undefined behaviour.
> >>
> >> On OpenBSD this leads to an abort:
> >>
> >> 0  0x124dadfa96ba in thrkill () at :2
> >> 1  0x124dadf3da39 in *_libc_abort () at 
> >> /usr/src/lib/libc/stdlib/abort.c:52
> >> 2  0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock 
> >> (mutexp=)
> >> at /usr/src/lib/librthread/rthread_sync.c:221
> >> 3  0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at 
> >> main/context.c:825
> >> 4  _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, 
> >> api=api@entry=API_OPENGL_CORE,
> >> visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0,
> >> driverFunctions=driverFunctions@entry=0x7f7bda60) at 
> >> main/context.c:1204
> >> 5  0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE,
> >> pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0,
> >> share=share@entry=0x0, options=options@entry=0x7f7be128)
> >> at state_tracker/st_context.c:545
> >> 6  0x124d27b8639f in st_api_create_context (stapi=,
> >> smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, 
> >> shared_stctxi=0x0)
> >> at state_tracker/st_manager.c:669
> >> 7  0x124d27cc5b9c in dri_create_context (api=, 
> >> visual=0x124d8a0f8a00,
> >> cPriv=0x124de473f240, major_version=, 
> >> minor_version=,
> >> flags=, notify_reset=false, error=0x7f7be2b4,
> >> sharedContextPrivate=0x0) at dri_context.c:123
> >> 8  0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400,
> >> api=, config=0x124d8a0f8a00, shared=,
> >> num_attribs=, attribs=, 
> >> error=0x7f7be2b4,
> >> data=0x124d77814a00) at dri_util.c:448
> >> 9  0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700,
> >> config_base=0x124d7a0e7300, shareList=, 
> >> num_attribs=,
> >> attribs=, error=0x7f7be2b4) at drisw_glx.c:476
> >> 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f,
> >> config=0x124d7a0e7300, share_context=0x0, direct=1, 
> >> attrib_list=0x7f7be300)
> >> at create_context.c:78
> >>
> >> Signed-off-by: Jonathan Gray 
> >> ---
> >>  src/mesa/main/debug_output.c | 5 ++---
> >>  1 file changed, 2 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c
> >> index 48dbbb3..bc933db 100644
> >> --- a/src/mesa/main/debug_output.c
> >> +++ b/src/mesa/main/debug_output.c
> >> @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx)
> >> */
> >>struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
> >>if (!debug) {
> >> - goto done;
> >> + return;
> >>}
> >>debug->DebugOutput = GL_TRUE;
> >>debug->LogToStderr = GL_TRUE;
> >>ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
> >> +  _mesa_unlock_debug_state(ctx);
> >> }
> >> -done:
> >> -   _mesa_unlock_debug_state(ctx);
> >>  }
> >>  
> >>  
> >>
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> 



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] i965: Factor out oword block read and write message control calculation.

2016-12-11 Thread Kenneth Graunke
On Friday, December 9, 2016 11:03:27 AM PST Francisco Jerez wrote:
> We'll need roughly the same logic in other places and it would be
> annoying to duplicate it.  Instead factor it out into a function-like
> macro that takes the number of dwords per block (which will prove more
> convenient than taking the same value in owords or some other unit).
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h |  6 ++
>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 ++
>  2 files changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index cae8e9a..1c638a0 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1669,6 +1669,12 @@ enum brw_message_target {
>  #define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS 2
>  #define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS 3
>  #define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS 4
> +#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n)  \
> +   ((n) == 4 ? BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW :\
> +(n) == 8 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :  \
> +(n) == 16 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : \
> +(n) == 32 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : \
> +(abort(), ~0))

How about:

#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \
   (assert(n == 4 || n == 8 || n == 16 || n == 32), ffs(n) - 3)

>  
>  #define BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD 0
>  #define BRW_DATAPORT_OWORD_DUAL_BLOCK_4OWORDS2
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 341f543..6141bfb 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2056,11 +2056,6 @@ void brw_oword_block_write_scratch(struct brw_codegen 
> *p,
> mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
>  
> const unsigned mlen = 1 + num_regs;
> -   const unsigned msg_control =
> -  (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :
> -   num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS :
> -   num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0);
> -   assert(msg_control);
>  
> /* Set up the message header.  This is g0, with g0.2 filled with
>  * the offset.  We don't want to leave our offset around in g0 or
> @@ -2134,7 +2129,7 @@ void brw_oword_block_write_scratch(struct brw_codegen 
> *p,
>brw_set_dp_write_message(p,
>  insn,
> brw_scratch_surface_idx(p),
> -msg_control,
> +BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8),
>  msg_type,
> target_cache,
>  mlen,
> @@ -2181,11 +2176,6 @@ brw_oword_block_read_scratch(struct brw_codegen *p,
> dest = retype(dest, BRW_REGISTER_TYPE_UW);
>  
> const unsigned rlen = num_regs;
> -   const unsigned msg_control =
> -  (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS :
> -   num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS :
> -   num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0);
> -   assert(msg_control);
> const unsigned target_cache =
>(devinfo->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE :
> devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE :
> @@ -,7 +2212,7 @@ brw_oword_block_read_scratch(struct brw_codegen *p,
>brw_set_dp_read_message(p,
> insn,
>brw_scratch_surface_idx(p),
> -   msg_control,
> +   BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8),
> BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ, /* 
> msg_type */
> target_cache,
> 1, /* msg_length */
> 



signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 1/2] clover: fix sampler argument type detection

2016-12-11 Thread Francisco Jerez
Serge Martin  writes:

> ---
>  src/gallium/state_trackers/clover/llvm/codegen/common.cpp | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp 
> b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> index 13ccd59..aa6ca50 100644
> --- a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> @@ -118,6 +118,11 @@ namespace {
>module::argument::zero_ext,
>module::argument::image_format);
>  
> + } else if (type_name == "sampler_t") {
> +args.emplace_back(module::argument::sampler, arg_api_size,
> +  target_size, target_align,
> +  module::argument::zero_ext);
> +

Change looks reasonable, but, do you expect this to write anything into
the kernel's input buffer?  Currently it won't (see the implementation
of sampler_argument::bind in core/kernel.cpp).  If you didn't expect it
to, you should probably pass zero instead of 'target_size' and
'target_align' for consistency.  If you did, you'll need to fix
core/kernel.cpp in addition.

>   } else {
>  // Other types.
>  const auto actual_type =
> -- 
> 2.5.5


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex

2016-12-11 Thread Edward O'Callaghan
Hold up..

Does this reintroduce the hang in glsl-fs-loop piglit test with
MESA_DEBUG=context though? Was that tested? I'm interested to know how
this got so muddled up in the first place.

Kind Regards,
Edward.

On 12/12/2016 05:21 AM, Eduardo Lima Mitev wrote:
> Looks good.
> 
> Reviewed-by: Eduardo Lima Mitev 
> 
> On 12/11/2016 04:42 PM, Jonathan Gray wrote:
>> Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that
>> attempts to unlock an unlocked mutex which is undefined behaviour.
>>
>> On OpenBSD this leads to an abort:
>>
>> 0  0x124dadfa96ba in thrkill () at :2
>> 1  0x124dadf3da39 in *_libc_abort () at 
>> /usr/src/lib/libc/stdlib/abort.c:52
>> 2  0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock 
>> (mutexp=)
>> at /usr/src/lib/librthread/rthread_sync.c:221
>> 3  0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at 
>> main/context.c:825
>> 4  _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, 
>> api=api@entry=API_OPENGL_CORE,
>> visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0,
>> driverFunctions=driverFunctions@entry=0x7f7bda60) at 
>> main/context.c:1204
>> 5  0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE,
>> pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0,
>> share=share@entry=0x0, options=options@entry=0x7f7be128)
>> at state_tracker/st_context.c:545
>> 6  0x124d27b8639f in st_api_create_context (stapi=,
>> smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, 
>> shared_stctxi=0x0)
>> at state_tracker/st_manager.c:669
>> 7  0x124d27cc5b9c in dri_create_context (api=, 
>> visual=0x124d8a0f8a00,
>> cPriv=0x124de473f240, major_version=, 
>> minor_version=,
>> flags=, notify_reset=false, error=0x7f7be2b4,
>> sharedContextPrivate=0x0) at dri_context.c:123
>> 8  0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400,
>> api=, config=0x124d8a0f8a00, shared=,
>> num_attribs=, attribs=, 
>> error=0x7f7be2b4,
>> data=0x124d77814a00) at dri_util.c:448
>> 9  0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700,
>> config_base=0x124d7a0e7300, shareList=, 
>> num_attribs=,
>> attribs=, error=0x7f7be2b4) at drisw_glx.c:476
>> 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f,
>> config=0x124d7a0e7300, share_context=0x0, direct=1, 
>> attrib_list=0x7f7be300)
>> at create_context.c:78
>>
>> Signed-off-by: Jonathan Gray 
>> ---
>>  src/mesa/main/debug_output.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c
>> index 48dbbb3..bc933db 100644
>> --- a/src/mesa/main/debug_output.c
>> +++ b/src/mesa/main/debug_output.c
>> @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx)
>> */
>>struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
>>if (!debug) {
>> - goto done;
>> + return;
>>}
>>debug->DebugOutput = GL_TRUE;
>>debug->LogToStderr = GL_TRUE;
>>ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
>> +  _mesa_unlock_debug_state(ctx);
>> }
>> -done:
>> -   _mesa_unlock_debug_state(ctx);
>>  }
>>  
>>  
>>
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98869] Electronic Super Joy graphic artefacts (regression)

2016-12-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98869

--- Comment #5 from Mark Janes  ---
If you can't bisect mesa, then providing an apitrace of the workload will allow
someone else to bisect it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.

2016-12-11 Thread Timothy Arceri
On Sun, 2016-12-11 at 00:00 -0800, Kenneth Graunke wrote:
> On Saturday, December 10, 2016 12:37:16 PM PST Matt Turner wrote:
> > 
> > On Fri, Dec 9, 2016 at 8:28 PM, Kenneth Graunke  > org> wrote:
> > > 
> > > A number of games have large arrays of constants, which we
> > > promote to
> > > uniforms.  This introduces copies from the uniform array to the
> > > original
> > > temporary array.  Normally, copy propagation eliminates those
> > > copies,
> > > making everything refer to the uniform array directly.
> > > 
> > > A number of shaders in "Deus Ex: Mankind Divided" recently
> > > exposed a
> > > limitation of copy propagation - if we had any intrinsics (i.e.
> > > image
> > > access in a compute shader), we weren't able to get rid of these
> > > copies.
> > > 
> > > That meant that any variable indexing remained on the temporary
> > > array
> > > rather being moved to the uniform array.  i965's scalar backend
> > > currently doesn't support indirect addressing of temporary
> > > arrays,
> > > which meant lowering it to if-ladders.  This was horrible.
> > > 
> > > On Skylake:
> > > 
> > > total instructions in shared programs: 13700090 -> 13654519 (-
> > > 0.33%)
> > > instructions in affected programs: 56438 -> 10867 (-80.75%)
> > 
> > Wow!
> > 
> > > 
> > > helped: 14
> > > HURT: 0
> > > 
> > > total cycles in shared programs: 288879704 -> 291270232 (0.83%)
> > > cycles in affected programs: 12758080 -> 15148608 (18.74%)
> > 
> > ... that seems nuts?
> > 
> > Any idea what's going on with the cycle counts?
> 
> Good point...I glossed over the cycle counts when I saw the -80%
> reduction in instructions with 0 shaders hurt.  But they do look
> pretty bad, so let's take a closer look...
> 
> There are two nearly identical shaders that are the worst offenders:
> 
> shaders/closed/steam/deus-ex-mankind-divided/256.shader_test CS
> SIMD16:
> 
> instructions: 2770 -> 253 (-2,517 instructions or -90.87%)
> spills: 25 -> 0
> fills:  29 -> 0
> cycles: 923266 -> 1420534 (+497,268 cycles or +53.86%)
> compile time: 2.73 seconds -> 0.17 seconds
> 
> There are three loops in the program, each of which contains two
> indirect reads of the uvec4[98] constant array.
> 
> Before this patch, there were:
>  - 67 UNIFORM_PULL_CONSTANT_LOADs at the top of the program
>  - 1 UNIFORM_PULL_CONSTANT_LOAD in the first (cheap) loop
>  - 1 UNIFORM_PULL_CONSTANT_LOAD in the second (expensive) loop
>  - 1 UNIFORM_PULL_CONSTANT_LOAD in the third (very expensive) loop
> 
> After this patch, there are:
>  - 0 loads at the top of the program
>  - 1 VARYING_PULL_CONSTANT_LOAD in the first (cheap) loop
>  - 2 VARYING_PULL_CONSTANT_LOAD in the second (expensive) loop
>  - 2 VARYING_PULL_CONSTANT_LOAD in the third (very expensive) loop
> 
> The array indexes in the expensive loop are a[foo] and a[foo + 1].
> foo is modified in the loop, so they can't be hoisted out.  I don't
> think we can determine the number of loop iterations.

The first loop has a trip count of 49. If I force it to unroll I get:

CS SIMD16 shader: 682 inst, 2 loops, 1009856 cycles.

The current unrolling rules are:

static bool
is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li)
{
   unsigned max_iter = shader->options->max_unroll_iterations;

   if (li->trip_count > max_iter)
  return false;

   if (li->force_unroll) // this is for consecutive indirect array
access
  return true;

   bool loop_not_too_large =
  li->num_instructions * li->trip_count <= max_iter * 25;

   return loop_not_too_large;
}


Maybe we should just drop the first trip count check? Checking the
number of intructions * trip count makes a lot more sense. Dropping the
check allows the first loop to unroll.

> 
> The two expensive loops look to be twice as expensive after this
> patch.
> The numbers aren't quite adding up for me - it looks like we should
> spend 200 more cycles per loop iteration, but the loops are like
> 40,000
> -> 90,000 cycles.
> 
> I'm not sure what to do with this information.  Eliminating 90% of
> the
> instructions seems good.  Requiring no scratch access seems good.
> Eliminating the 67 memory loads outside of the loops seems good.
> Doing two memory loads per loop doesn't seem too crazy, given that
> it matches the GLSL source code.  Burning 49 registers to store the
> entire array for the lifetime of the program seems pretty crazy...
> 
> --Ken
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/6] gallivm: optimize gather a bit, by using supplied destination type

2016-12-11 Thread sroland
From: Roland Scheidegger 

By using a dst_type in the the gather interface, gather has some more
knowledge about how values should be fetched.
E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather
will no longer do a ZExt with a 96bit scalar value to 128bit, but
just fetch the 96bit as 3x32bit vector (this is still going to be
2 loads of course, but the loads can be done directly to simd vector
that way).
Also, we can now do some try to use the right int/float type. This should
make no difference really since there's typically no domain transition
penalties for such simd loads, however it actually makes a difference
since llvm will use different shuffle lowering afterwards so the caller
can use this to trick llvm into using sane shuffle afterwards (and yes
llvm is really stupid there - nothing against using the shuffle
instruction from the correct domain, but not at the cost of doing 3 times
more shuffles, the case which actually matters is refusal to use shufps
for integer values).
Also do some attempt to avoid things which look great on paper but llvm
doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors
which is simply disastrous - I suspect type legalizer is to blame trying
to extend these vectors to 128bit types somehow, so fetching these with
scalars like before which is suboptimal due to the ZExt).

Remove the ability for truncation (no point, this is gather, not conversion)
as it is complex enough already.

While here also implement not just the float, but also the 64bit avx2
gathers (disabled though since based on the theoretical numbers the benefit
just isn't there at all until Skylake at least).
---
 src/gallium/auxiliary/draw/draw_llvm.c |   2 +-
 src/gallium/auxiliary/gallivm/lp_bld_format_aos.c  |   5 +-
 .../auxiliary/gallivm/lp_bld_format_aos_array.c|   9 +-
 src/gallium/auxiliary/gallivm/lp_bld_format_soa.c  |  23 +-
 src/gallium/auxiliary/gallivm/lp_bld_format_yuv.c  |   4 +-
 src/gallium/auxiliary/gallivm/lp_bld_gather.c  | 359 +
 src/gallium/auxiliary/gallivm/lp_bld_gather.h  |   2 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c  |   8 +-
 8 files changed, 333 insertions(+), 79 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index c548572..19b75a5 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1864,7 +1864,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct 
draw_llvm_variant *variant)

LLVMPointerType(LLVMInt8TypeInContext(context),
0), "");
  tmp = lp_build_gather(gallivm, vs_type.length,
-   32, 32, TRUE,
+   32, bld.type, TRUE,
fetch_elts, tmp, FALSE);
  LLVMBuildStore(builder, tmp, index_store);
   }
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c
index 9f6b9e9..322e7b8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c
@@ -224,6 +224,7 @@ lp_build_unpack_arith_rgba_aos(struct gallivm_state 
*gallivm,
/* Ex: convert packed = {XYZW, XYZW, XYZW, XYZW}
 * into masked = {X, Y, Z, W}
 */
+   /* Note: we cannot do this shift on x86 natively until AVX2. */
shifted = LLVMBuildLShr(builder, packed, LLVMConstVector(shifts, 4), "");
masked = LLVMBuildAnd(builder, shifted, LLVMConstVector(masks, 4), "");
 
@@ -394,6 +395,7 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
util_is_power_of_two(format_desc->block.bits)) {
   LLVMValueRef packed;
   LLVMTypeRef dst_vec_type = lp_build_vec_type(gallivm, type);
+  struct lp_type fetch_type;
   unsigned vec_len = type.width * type.length;
 
   /*
@@ -401,8 +403,9 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
* scaling or converting.
*/
 
+  fetch_type = lp_type_uint(type.width*4);
   packed = lp_build_gather(gallivm, type.length/4,
-   format_desc->block.bits, type.width*4,
+   format_desc->block.bits, fetch_type,
aligned, base_ptr, offset, TRUE);
 
   assert(format_desc->block.bits <= vec_len);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c 
b/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c
index 8cad3a6..636a4a6 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c
@@ -70,7 +70,14 @@ lp_build_fetch_rgba_aos_array(struct gallivm_state *gallivm,
 
src_vec_type  = lp_build_vec_type(gallivm,  src_type);
 
-   /* Read whole vector from memory, unaligned */
+   /*
+* Read whole vector from memory, unaligned.

[Mesa-dev] [PATCH 2/6] gallivm: optimize SoA AoS fallback fetch path a little

2016-12-11 Thread sroland
From: Roland Scheidegger 

We should do transpose, not extract/insert, at least with "sufficient" amount
of channels (for 4 channels, extract/insert shuffles generated otherwise look
truly terrifying). Albeit we shouldn't fallback to that so often in any case.
---
 src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 83 +++
 1 file changed, 70 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
index 389bfa0..902c763 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
@@ -40,6 +40,39 @@
 #include "lp_bld_debug.h"
 #include "lp_bld_format.h"
 #include "lp_bld_arit.h"
+#include "lp_bld_pack.h"
+
+
+static void
+convert_to_soa(struct gallivm_state *gallivm,
+   LLVMValueRef src_aos[LP_MAX_VECTOR_WIDTH / 32],
+   LLVMValueRef dst_soa[4],
+   const struct lp_type soa_type)
+{
+   unsigned j, k;
+   struct lp_type aos_channel_type = soa_type;
+
+   LLVMValueRef aos_channels[4];
+   unsigned pixels_per_channel = soa_type.length / 4;
+
+   debug_assert((soa_type.length % 4) == 0);
+
+   aos_channel_type.length >>= 1;
+
+   for (j = 0; j < 4; ++j) {
+  LLVMValueRef channel[LP_MAX_VECTOR_LENGTH] = { 0 };
+
+  assert(pixels_per_channel <= LP_MAX_VECTOR_LENGTH);
+
+  for (k = 0; k < pixels_per_channel; ++k) {
+ channel[k] = src_aos[j + 4 * k];
+  }
+
+  aos_channels[j] = lp_build_concat(gallivm, channel, aos_channel_type, 
pixels_per_channel);
+   }
+
+   lp_build_transpose_aos(gallivm, soa_type, aos_channels, dst_soa);
+}
 
 
 void
@@ -48,9 +81,6 @@ lp_build_format_swizzle_soa(const struct 
util_format_description *format_desc,
 const LLVMValueRef *unswizzled,
 LLVMValueRef swizzled_out[4])
 {
-   assert(PIPE_SWIZZLE_0 == (int)PIPE_SWIZZLE_0);
-   assert(PIPE_SWIZZLE_1 == (int)PIPE_SWIZZLE_1);
-
if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_ZS) {
   enum pipe_swizzle swizzle;
   LLVMValueRef depth_or_stencil;
@@ -547,9 +577,11 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
{
   unsigned k, chan;
   struct lp_type tmp_type;
+  LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
+  boolean vec_transpose = FALSE;
 
   if (gallivm_debug & GALLIVM_DEBUG_PERF) {
- debug_printf("%s: scalar unpacking of %s\n",
+ debug_printf("%s: AoS fetch fallback for %s\n",
   __FUNCTION__, format_desc->short_name);
   }
 
@@ -560,12 +592,31 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
  rgba_out[chan] = lp_build_undef(gallivm, type);
   }
 
+  if (format_desc->nr_channels > 2 ||
+  format_desc->layout != UTIL_FORMAT_LAYOUT_PLAIN) {
+ /*
+  * Note that vector transpose can be worse. This is because
+  * llvm will ensure the missing channels have the correct
+  * values, in particular typically 1.0 for the last channel
+  * (if they are used or not doesn't matter, usually llvm can't
+  * figure this out here probably due to the transpose).
+  * But with the extract/insert path, since those missing elements
+  * were just directly inserted/extracted llvm can optimize this
+  * somewhat (though it still doesn't look great - and not for
+  * the compressed formats due to their external fetch funcs).
+  * So restrict to cases where we are sure it helps (albeit
+  * with 2 channels it MIGHT be worth it at least with AVX).
+  * In any case, this is just a bandaid, it does NOT replace proper
+  * SoA format unpack.
+  */
+ vec_transpose = TRUE;
+  }
+
   /* loop over number of pixels */
   for(k = 0; k < type.length; ++k) {
  LLVMValueRef index = lp_build_const_int32(gallivm, k);
  LLVMValueRef offset_elem;
  LLVMValueRef i_elem, j_elem;
- LLVMValueRef tmp;
 
  offset_elem = LLVMBuildExtractElement(builder, offset,
index, "");
@@ -574,20 +625,26 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
  j_elem = LLVMBuildExtractElement(builder, j, index, "");
 
  /* Get a single float[4]={R,G,B,A} pixel */
- tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
-   aligned, base_ptr, offset_elem,
-   i_elem, j_elem, cache);
+ aos_fetch[k] = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
+aligned, base_ptr, offset_elem,
+i_elem, j_elem, cache);
 
  /*
   * Insert the AoS tmp value channels into the SoA result vectors at
   * position = 

[Mesa-dev] [PATCH 5/6] gallivm: generalize the compressed format soa fetch a bit

2016-12-11 Thread sroland
From: Roland Scheidegger 

This can now handle rgtc (unorm) too - this path no longer handles plain
formats, but that's unnecessary they now all have their proper SoA unpack
(this will still be dog-slow though due to the actual fetch being per-pixel
util fallbacks).
---
 src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 86 +--
 1 file changed, 49 insertions(+), 37 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
index 9550f26..68cbb10 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
@@ -733,64 +733,69 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
 
/*
 * Try calling lp_build_fetch_rgba_aos for all pixels.
+* Should only really hit subsampled, compressed
+* (for s3tc srgb too, for rgtc the unorm ones only) by now.
+* (This is invalid for plain 8unorm formats because we're lazy with
+* the swizzle since some results would arrive swizzled, some not.)
 */
 
-   if (util_format_fits_8unorm(format_desc) &&
+   if ((format_desc->layout != UTIL_FORMAT_LAYOUT_PLAIN) &&
+   (util_format_fits_8unorm(format_desc) ||
+format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC) &&
type.floating && type.width == 32 &&
(type.length == 1 || (type.length % 4 == 0))) {
   struct lp_type tmp_type;
-  LLVMValueRef tmp;
+  struct lp_build_context bld;
+  LLVMValueRef packed, rgba[4];
+  const struct util_format_description *flinear_desc;
+  const struct util_format_description *frgba8_desc;
+  unsigned chan;
 
+  lp_build_context_init(, gallivm, type);
+
+  /*
+   * Make sure the conversion in aos really only does convert to rgba8
+   * and not anything more (so use linear format, adjust type).
+   */
+  flinear_desc = util_format_description(util_format_linear(format));
   memset(_type, 0, sizeof tmp_type);
   tmp_type.width = 8;
   tmp_type.length = type.length * 4;
   tmp_type.norm = TRUE;
 
-  tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
-aligned, base_ptr, offset, i, j, cache);
+  packed = lp_build_fetch_rgba_aos(gallivm, flinear_desc, tmp_type,
+   aligned, base_ptr, offset, i, j, cache);
+  packed = LLVMBuildBitCast(builder, packed, bld.int_vec_type, "");
 
-  lp_build_rgba8_to_fi32_soa(gallivm,
-type,
-tmp,
-rgba_out);
-
-  return;
-   }
-
-   if (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC &&
-   /* non-srgb case is already handled above */
-   format_desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB &&
-   type.floating && type.width == 32 &&
-   (type.length == 1 || (type.length % 4 == 0)) &&
-   cache) {
-  const struct util_format_description *format_decompressed;
-  const struct util_format_description *flinear_desc;
-  LLVMValueRef packed;
-  flinear_desc = 
util_format_description(util_format_linear(format_desc->format));
-  /* This probably only works with aligned data */
-  packed = lp_build_fetch_cached_texels(gallivm,
-flinear_desc,
-type.length,
-base_ptr,
-offset,
-i, j,
-cache);
-  packed = LLVMBuildBitCast(builder, packed,
-lp_build_int_vec_type(gallivm, type), "");
   /*
-   * The values are now packed so they match ordinary srgb RGBA8 format,
+   * The values are now packed so they match ordinary (srgb) RGBA8 format,
* hence need to use matching format for unpack.
*/
-  format_decompressed = util_format_description(PIPE_FORMAT_R8G8B8A8_SRGB);
-
+  frgba8_desc = util_format_description(PIPE_FORMAT_R8G8B8A8_UNORM);
+  if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB) {
+ assert(format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC);
+ frgba8_desc = util_format_description(PIPE_FORMAT_R8G8B8A8_SRGB);
+  }
   lp_build_unpack_rgba_soa(gallivm,
-   format_decompressed,
+   frgba8_desc,
type,
-   packed, rgba_out);
+   packed, rgba);
 
+  /*
+   * We converted 4 channels. Make sure llvm can drop unneeded ones
+   * (luckily the rgba order is fixed, only la needs special case).
+   */
+  for (chan = 0; chan < 4; chan++) {
+ enum pipe_swizzle swizzle = format_desc->swizzle[chan];
+ if (chan == 3 && 

[Mesa-dev] [PATCH 4/6] gallivm: provide soa fetch path handling formats with more than 32bit

2016-12-11 Thread sroland
From: Roland Scheidegger 

This previously always fell back to AoS conversion. Even for 4-float formats
(which is the optimal case by far for that fallback case) this was suboptimal,
since it meant the conversion couldn't be done with 256bit vectors. While this
may still only be partly possible for some formats, (unless there's AVX2
support) at least the transpose can be done with half the unpacks
(and before using the transpose for AoS fallbacks, it was worse still).
With less than 4 channels, things got way worse with the AoS fallback
quickly even with 128bit vectors.
The strategy is pretty much the same as the existing one for formats
which fit into 32 bits, except there's now multiple vectors to be
fetched (2 or 4 to be exact), which need to be shuffled first (if it's 4
vectors, this amounts to a transpose, for 2 it's a bit different),
then the unpack is done the same (with the exception that the shift
of the channels is now modulo 32, and we need to select the right
vector).
In fact the most complex part about it is to get the shuffles right
for separating into lo/hi parts for AVX/AVX2...
This also makes use of the new ability of gather to use provided type
information, which we abuse to outsmart llvm so we get decent shuffles,
and to fetch 3x32bit vectors without having to ZExt the scalar.
And just because we can, we handle double formats too, albeit they are
a bit different (draw sometimes needs to handle that).
---
 src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 529 +++---
 1 file changed, 375 insertions(+), 154 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
index b3ea709..9550f26 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
@@ -31,6 +31,7 @@
 #include "util/u_format.h"
 #include "util/u_memory.h"
 #include "util/u_string.h"
+#include "util/u_math.h"
 
 #include "lp_bld_type.h"
 #include "lp_bld_const.h"
@@ -113,6 +114,166 @@ lp_build_format_swizzle_soa(const struct 
util_format_description *format_desc,
 }
 
 
+
+static LLVMValueRef
+lp_build_extract_soa_chan(struct lp_build_context *bld,
+  unsigned blockbits,
+  boolean srgb_chan,
+  struct util_format_channel_description chan_desc,
+  LLVMValueRef packed)
+{
+   struct gallivm_state *gallivm = bld->gallivm;
+   LLVMBuilderRef builder = gallivm->builder;
+   struct lp_type type = bld->type;
+   LLVMValueRef input = packed;
+   const unsigned width = chan_desc.size;
+   const unsigned start = chan_desc.shift;
+   const unsigned stop = start + width;
+
+   /* Decode the input vector component */
+
+   switch(chan_desc.type) {
+   case UTIL_FORMAT_TYPE_VOID:
+  input = bld->undef;
+  break;
+
+   case UTIL_FORMAT_TYPE_UNSIGNED:
+  /*
+   * Align the LSB
+   */
+  if (start) {
+ input = LLVMBuildLShr(builder, input,
+   lp_build_const_int_vec(gallivm, type, start), 
"");
+  }
+
+  /*
+   * Zero the MSBs
+   */
+  if (stop < blockbits) {
+ unsigned mask = ((unsigned long long)1 << width) - 1;
+ input = LLVMBuildAnd(builder, input,
+  lp_build_const_int_vec(gallivm, type, mask), "");
+  }
+
+  /*
+   * Type conversion
+   */
+  if (type.floating) {
+ if (srgb_chan) {
+struct lp_type conv_type = lp_uint_type(type);
+input = lp_build_srgb_to_linear(gallivm, conv_type, width, input);
+ }
+ else {
+if(chan_desc.normalized)
+   input = lp_build_unsigned_norm_to_float(gallivm, width, type, 
input);
+else
+   input = LLVMBuildSIToFP(builder, input, bld->vec_type, "");
+ }
+  }
+  else if (chan_desc.pure_integer) {
+ /* Nothing to do */
+  } else {
+  /* FIXME */
+  assert(0);
+  }
+  break;
+
+   case UTIL_FORMAT_TYPE_SIGNED:
+  /*
+   * Align the sign bit first.
+   */
+  if (stop < type.width) {
+ unsigned bits = type.width - stop;
+ LLVMValueRef bits_val = lp_build_const_int_vec(gallivm, type, bits);
+ input = LLVMBuildShl(builder, input, bits_val, "");
+  }
+
+  /*
+   * Align the LSB (with an arithmetic shift to preserve the sign)
+   */
+  if (chan_desc.size < type.width) {
+ unsigned bits = type.width - chan_desc.size;
+ LLVMValueRef bits_val = lp_build_const_int_vec(gallivm, type, bits);
+ input = LLVMBuildAShr(builder, input, bits_val, "");
+  }
+
+  /*
+   * Type conversion
+   */
+  if (type.floating) {
+ input = LLVMBuildSIToFP(builder, input, bld->vec_type, "");
+ if (chan_desc.normalized) {
+double scale = 1.0 / ((1 << (chan_desc.size - 

[Mesa-dev] [PATCH 6/6] draw: use SoA fetch, not AoS one

2016-12-11 Thread sroland
From: Roland Scheidegger 

Now that there's some SoA fetch which never falls back, we should usually get
results which are better or at least not worse (something like rgba32f will
stay the same). I suppose though it might be worse in some cases where the
format doesn't require conversion (e.g. rg32f) and goes straight to output -
if llvm was able to see through all shuffles then it might have been able
to do away with the aos->soa->aos transpose entirely which can no longer work
possibly except for 4-channel formats (due to replacing the undef channels
with 0/1 before the second transpose and not the first - llvm will
definitely not be able to figure that out). That might actually be quite
common, but I'm not sure llvm really could optimize it in the first place,
and if it's a problem we should just special case such inputs (though note
that if conversion is needed, it isn't obvious if it's better to skip
the transpose or do the conversion AoS-style).

For cases which get way better, think something like R16_UNORM with 8-wide
vectors: this was 8 sign-extend fetches, 8 cvt, 8 muls, followed by
a couple of shuffles to stitch things together (if it is smart enough,
6 unpacks) and then a (8-wide) transpose (not sure if llvm could even
optimize the shuffles + transpose, since the 16bit values were actually
sign-extended to 128bit before being cast to a float vec, so that would be
another 8 unpacks). Now that is just 8 fetches (directly inserted into
vector, albeit there's one 128bit insert needed), 1 cvt, 1 mul.
---
 src/gallium/auxiliary/draw/draw_llvm.c | 54 +-
 1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 19b75a5..f895b76 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -755,11 +755,9 @@ fetch_vector(struct gallivm_state *gallivm,
  LLVMValueRef *inputs,
  LLVMValueRef indices)
 {
-   LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
LLVMBuilderRef builder = gallivm->builder;
struct lp_build_context blduivec;
LLVMValueRef offset, valid_mask;
-   LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
unsigned i;
 
lp_build_context_init(, gallivm, lp_uint_type(vs_type));
@@ -783,21 +781,49 @@ fetch_vector(struct gallivm_state *gallivm,
}
 
/*
-* Note: we probably really want to use SoA fetch, not AoS one (albeit
-* for most formats it will amount to the same as this isn't very
-* optimized). But looks dangerous since it assumes alignment.
+* Use SoA fetch. This should produce better code usually.
+* Albeit it's possible there's exceptions (in particular if the fetched
+* value is going directly to output if it's something like RG32F).
 */
-   for (i = 0; i < vs_type.length; i++) {
-  LLVMValueRef offset1, elem;
-  elem = lp_build_const_int32(gallivm, i);
-  offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
+   if (1) {
+  struct lp_type res_type = vs_type;
+  /* The type handling is annoying here... */
+  if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_RGB &&
+  format_desc->channel[0].pure_integer) {
+ if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_SIGNED) {
+res_type = lp_type_int_vec(vs_type.width, vs_type.width * 
vs_type.length);
+ }
+ else if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED) {
+res_type = lp_type_uint_vec(vs_type.width, vs_type.width * 
vs_type.length);
+ }
+  }
 
-  aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
- lp_float32_vec4_type(),
- FALSE, map_ptr, offset1,
- zero, zero, NULL);
+  lp_build_fetch_rgba_soa(gallivm, format_desc,
+  res_type, FALSE, map_ptr, offset,
+  blduivec.zero, blduivec.zero,
+  NULL, inputs);
+
+  for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
+ inputs[i] = LLVMBuildBitCast(builder, inputs[i],
+  lp_build_vec_type(gallivm, vs_type), "");
+  }
+
+   }
+   else {
+  LLVMValueRef zero = 
LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
+  LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
+  for (i = 0; i < vs_type.length; i++) {
+ LLVMValueRef offset1, elem;
+ elem = lp_build_const_int32(gallivm, i);
+ offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
+
+ aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
+lp_float32_vec4_type(),
+FALSE, map_ptr, offset1,
+   

[Mesa-dev] [PATCH 1/6] gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soa

2016-12-11 Thread sroland
From: Roland Scheidegger 

soa fetch so far always assumed that data was aligned. However, we want to
use this for vertex fetch, and data might not be aligned there, so handle
it in this path too (basically just pass through alignment through to other
functions). (It looks like it wouldn't work for for cached s3tc but this is
no different than with AoS fetch.)
---
 src/gallium/auxiliary/gallivm/lp_bld_format.h |  1 +
 src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 15 +--
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  4 ++--
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format.h 
b/src/gallium/auxiliary/gallivm/lp_bld_format.h
index 5c866f4..6540caa 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format.h
@@ -143,6 +143,7 @@ void
 lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
 const struct util_format_description *format_desc,
 struct lp_type type,
+boolean aligned,
 LLVMValueRef base_ptr,
 LLVMValueRef offsets,
 LLVMValueRef i,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
index 7444c51..389bfa0 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c
@@ -349,6 +349,7 @@ lp_build_rgba8_to_fi32_soa(struct gallivm_state *gallivm,
  *
  * \param type  the desired return type for 'rgba'.  The vector length
  *  is the number of texels to fetch
+ * \param aligned if the offset is guaranteed to be aligned to element width
  *
  * \param base_ptr  points to the base of the texture mip tree.
  * \param offsetoffset to start of the texture image block.  For non-
@@ -365,6 +366,7 @@ void
 lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
 const struct util_format_description *format_desc,
 struct lp_type type,
+boolean aligned,
 LLVMValueRef base_ptr,
 LLVMValueRef offset,
 LLVMValueRef i,
@@ -402,7 +404,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
type.length,
format_desc->block.bits,
type.width,
-   TRUE,
+   aligned,
base_ptr, offset, FALSE);
 
   /*
@@ -428,7 +430,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
 
   packed = lp_build_gather(gallivm, type.length,
format_desc->block.bits,
-   type.width, TRUE,
+   type.width, aligned,
base_ptr, offset, FALSE);
   if (format_desc->format == PIPE_FORMAT_R11G11B10_FLOAT) {
  lp_build_r11g11b10_to_float(gallivm, packed, rgba_out);
@@ -456,14 +458,14 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
  LLVMValueRef s_offset = lp_build_const_int_vec(gallivm, type, 4);
  offset = LLVMBuildAdd(builder, offset, s_offset, "");
  packed = lp_build_gather(gallivm, type.length, 32, type.width,
-  TRUE, base_ptr, offset, FALSE);
+  aligned, base_ptr, offset, FALSE);
  packed = LLVMBuildAnd(builder, packed,
lp_build_const_int_vec(gallivm, type, mask), 
"");
   }
   else {
  assert (format_desc->format == PIPE_FORMAT_Z32_FLOAT_S8X24_UINT);
  packed = lp_build_gather(gallivm, type.length, 32, type.width,
-  TRUE, base_ptr, offset, TRUE);
+  aligned, base_ptr, offset, TRUE);
  packed = LLVMBuildBitCast(builder, packed,
lp_build_vec_type(gallivm, type), "");
   }
@@ -489,7 +491,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
   tmp_type.norm = TRUE;
 
   tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
-TRUE, base_ptr, offset, i, j, cache);
+aligned, base_ptr, offset, i, j, cache);
 
   lp_build_rgba8_to_fi32_soa(gallivm,
 type,
@@ -509,6 +511,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
   const struct util_format_description *flinear_desc;
   LLVMValueRef packed;
   flinear_desc = 
util_format_description(util_format_linear(format_desc->format));
+  /* This probably only works with aligned data */
   packed = lp_build_fetch_cached_texels(gallivm,
 

[Mesa-dev] [Bug 99055] Games hang / freeze completely

2016-12-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99055

--- Comment #2 from Timothy Arceri  ---
(In reply to Etienne Bruines from comment #0)
> 
> I'd be happy to provide any additional information.

The most helpful thing you could do is build mesa from source and bisect the
commit that introduces the hang.

I'd recommend building mesa from the master branch to start with to see if its
still in master of if this is something introduced in the stable branch only.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Clamp GetUniformuiv values to be >= 0.

2016-12-11 Thread Kenneth Graunke
Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 October 24th 2016 specification says:

"If a command returning unsigned integer data is called, such as
 GetSamplerParameterIuiv, negative values are clamped to zero."

Fixes GL44-CTS.gpu_shader_fp64.state_query.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/main/uniform_query.cpp | 48 +
 1 file changed, 39 insertions(+), 9 deletions(-)

Hey Nicolai,

I wrote a similar patch a while back, but never got around to sending it,
since I realized that the gl45release branch expects our current behavior,
and the change to make the CTS expect clamping is only on the master branch.

Apparently I made some additional changes, compared to yours.  I figured
I'd send this along and let you see if you think any of my extra changes
are still necessary.  If so, feel free to fold them into your patch.

I also think we need to fix several other glGet* commands...it's just that
this is the only one currently tested.  A bunch work because the values
returned can't be negative.

 --Ken

diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index db700df..f05a29f 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -347,14 +347,10 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, 
GLint location,
* just memcpy the data.  If the types are not compatible, perform a
* slower convert-and-copy process.
*/
-  if (returnType == uni->type->base_type
- || ((returnType == GLSL_TYPE_INT
-  || returnType == GLSL_TYPE_UINT)
- &&
- (uni->type->base_type == GLSL_TYPE_INT
-  || uni->type->base_type == GLSL_TYPE_UINT
-   || uni->type->base_type == GLSL_TYPE_SAMPLER
-   || uni->type->base_type == GLSL_TYPE_IMAGE))) {
+  if (returnType == uni->type->base_type ||
+  ((returnType == GLSL_TYPE_INT || returnType == GLSL_TYPE_UINT) &&
+   (uni->type->base_type == GLSL_TYPE_SAMPLER ||
+uni->type->base_type == GLSL_TYPE_IMAGE))) {
 memcpy(paramsOut, src, bytes);
   } else {
 union gl_constant_value *const dst =
@@ -422,7 +418,6 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, 
GLint location,
   }
   break;
case GLSL_TYPE_INT:
-   case GLSL_TYPE_UINT:
   switch (uni->type->base_type) {
   case GLSL_TYPE_FLOAT:
  /* While the GL 3.2 core spec doesn't explicitly
@@ -447,6 +442,9 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, 
GLint location,
   case GLSL_TYPE_BOOL:
  dst[didx].i = src[sidx].i ? 1 : 0;
  break;
+   case GLSL_TYPE_UINT:
+  dst[didx].i = src[sidx].i;
+  break;
case GLSL_TYPE_DOUBLE: {
   double tmp;
   memcpy(, [sidx].f, sizeof(tmp));
@@ -458,6 +456,38 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, 
GLint location,
  break;
   }
   break;
+case GLSL_TYPE_UINT:
+   switch (uni->type->base_type) {
+   case GLSL_TYPE_FLOAT:
+  /* The spec isn't terribly clear how to handle negative
+   * values with an unsigned return type.
+   *
+   * GL 4.5 section 2.2.2 ("Data Conversions for State
+   * Query Commands") says:
+   *
+   * "If a value is so large in magnitude that it cannot be
+   *  represented by the returned data type, then the nearest
+   *  value representable using the requested type is
+   *  returned."
+   */
+  dst[didx].i = src[sidx].f < 0.0f ? 0 : IROUND(src[sidx].f);
+  break;
+   case GLSL_TYPE_BOOL:
+  dst[didx].i = src[sidx].i ? 1 : 0;
+  break;
+   case GLSL_TYPE_INT:
+  dst[didx].i = MAX2(src[sidx].i, 0);
+  break;
+   case GLSL_TYPE_DOUBLE: {
+  double tmp;
+  memcpy(, [sidx].f, sizeof(tmp));
+  dst[didx].i = tmp < 0.0 ? 0 : IROUNDD(tmp);
+  break;
+   }
+   default:
+  unreachable("invalid uniform type");
+   }
+   break;
 
default:
   assert(!"Should not get here.");
-- 
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Print out cycle estimates at the start of block annotations.

2016-12-11 Thread Matt Turner
Good plan.

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-12-11 Thread Matt Turner

i965/vec4: handle 32 and 64 bit channels in liveness analysis

Please indent the returned multiline expressions in
var_from_reg() like we do elsewhere, so that the second line
begins on the same column as the first line.

*/ goes on its own line.

I'm having a hard time reviewing this one. The logic is rather
complex. I'll ask someone to help me review it on Monday at the
office.

i965/vec4: add a horiz_offset() helper
i965: move the group field from fs_inst to backend_instruction.
i965/vec4: add a SIMD lowering pass

In the commit message, you say

For now the pass only handles the gen7 restriction where any
instruction that writes 2 registers also needs to read 2
registers.  This affects double-precision instructions
reading uniforms, for example. Later patches will extend the
lowering pass adding a few more cases.

But the rule about if-writing-two-regs, must-read-two-regs
says that scalar sources are an exception:

"When source is scalar, the source registers are not
 incremented."

I don't see any code that allows us to avoid splitting an
instruction if it's writing two registers but sourcing a scalar
uniform. Maybe this doesn't apply because we have to use a non
scalar swizzle (.xy) to access a single fp64 component?

i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
i965/vec4: dump NibCtrl for instructions with execsize != 8
i965/disasm: print NibCtrl for instructions with execsize < 8
i965/vec4: teach CSE about exec_size, group and doubles
i965/vec4: teach cmod propagation about different execution sizes
i965/vec4: split double-precision bcsel

bcsel is the NIR opcode. I'd change references to bcsel to SEL.

Very interesting find...

i965/vec4: add a scalarization pass for double-precision instructions

Don't indent case inside a switch.

i965/vec4: translate 64-bit swizzles to 32-bit
i965/vec4: implement access to DF source components Z/W

Wow, bien hecho!

i965/disasm: fix subreg for dst in Align16 mode
i965/vec4: teach register coalescing about 64-bit
i965/vec4: fix pack_uniform_registers for doubles
i965/vec4: fix indentation in pack_uniform_registers
i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands

s/need/needs/ in the comment.

i965/vec4/nir: do not emit 64-bit MAD
i965/vec4: do not emit 64-bit MAD

I might change the name of this commit to "i965/vec4: Lower
64-bit MAD" or "i965/vec4: Lower DF MAD"

I think I'd change the name of the function as well, maybe to
lower_64bit_mad[_to_mul_add] or something.

i965/vec4: support multiple dispatch widths and groups in the IR builder.
i965/vec4: Add a shuffle_64bit_data helper

I was initially confused by r0.0:DF/r0.1:DF, thinking that .1 in
r0.1:DF was a subreg offset. But I think it's actually the
register offset (i.e., .offset)?

If that's the case, I think it would be clearer just to
increment the register number in the example:

r0.0:DF  x0 y0 z0 w0
r1.0:DF  x1 y1 z1 w1

s/opperation/operation/ in the comment.

On the multiline bld.group(...), I think Curro's style is to
align with the '.'. For instance,

inst = bld.group(4, for_write ? 1 : 0)
  .MOV(writemask(dst, WRITEMASK_ZW),
   swizzle(byte_offset(src, REG_SIZE), BRW_SWIZZLE_XYXY));

so that group and MOV align, with the '.' on the same line as
the MOV.

i965/vec4: Fix UBO loads for 64-bit data
i965/vec4: Fix SSBO loads for 64-bit data
i965/vec4: Fix SSBO stores for 64-bit data
i965/vec4: don't constant propagate 64-bit immediates
i965/vec4: prevent copy-propagation from values with a different type size
i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions

Similar comment as before about being allowed to write two
registers while sourcing a scalar. Maybe doesn't apply because
of the double swizzle.

i965/vec4: don't propagate single-precision uniforms into 4-wide instructions
i965/vec4: don't copy propagate misaligned registers
i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms

I don't see this in the BDW PRMs, and the internal documentation
says that it applies to "CHV, BXT"

I suggest dropping this patch (or replacing it with one that
adds || devinfo->is_broxton).

i965/vec4: Do not use DepCtrl with 64-bit instructions
i965/vec4: do not split scratch read/write opcodes
i965/vec4: fix scratch offset for 64bit data
i965/vec4: fix scratch reads for 64bit data
i965/vec4: fix scratch writes for 64bit data
i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data
i965/vec4: fix indentation in 

[Mesa-dev] [PATCH] i965: Print out cycle estimates at the start of block annotations.

2016-12-11 Thread Kenneth Graunke
We now print

   START B15 <-B14 (42774 cycles)

indicating that we estimate B15 will take 42,774 cycles.  Printing
this should make it easier where time is spent in the program.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_asm_annotation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c 
b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
index b01490a..bb24e5e 100644
--- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c
+++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
@@ -45,7 +45,7 @@ dump_assembly(void *assembly, int num_annotations, struct 
annotation *annotation
 struct bblock_t *predecessor_block = predecessor_link->block;
 fprintf(stderr, " <-B%d", predecessor_block->num);
  }
- fprintf(stderr, "\n");
+ fprintf(stderr, " (%u cycles)\n", 
annotation[i].block_start->cycle_count);
   }
 
   if (last_annotation_ir != annotation[i].ir) {
-- 
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 84/84 v3] st/nine: Implement new buffer upload path

2016-12-11 Thread Axel Davy
This new buffer upload path enables to lock
faster than the normal path when using
DISCARD/NOOVERWRITE.

v2: Diverse cleanups and fixes.
v3: Fix allocation size for 'lone' buffers and
add more debug info.

Signed-off-by: Axel Davy 
---
This fixes the reported 'leak' (which was not a leak
but the code allocating buffers too big).
I'm still interested in feedback though.
 src/gallium/state_trackers/nine/Makefile.sources   |   2 +
 src/gallium/state_trackers/nine/buffer9.c  |  75 +-
 src/gallium/state_trackers/nine/buffer9.h  |   9 +-
 src/gallium/state_trackers/nine/device9.c  |   5 +
 src/gallium/state_trackers/nine/device9.h  |   3 +
 src/gallium/state_trackers/nine/indexbuffer9.c |  10 +-
 src/gallium/state_trackers/nine/indexbuffer9.h |   2 -
 .../state_trackers/nine/nine_buffer_upload.c   | 293 +
 .../state_trackers/nine/nine_buffer_upload.h   |  59 +
 src/gallium/state_trackers/nine/vertexbuffer9.c|   3 +-
 10 files changed, 440 insertions(+), 21 deletions(-)
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h

diff --git a/src/gallium/state_trackers/nine/Makefile.sources 
b/src/gallium/state_trackers/nine/Makefile.sources
index 2bb08a2..56698a1 100644
--- a/src/gallium/state_trackers/nine/Makefile.sources
+++ b/src/gallium/state_trackers/nine/Makefile.sources
@@ -23,6 +23,8 @@ C_SOURCES := \
indexbuffer9.h \
iunknown.c \
iunknown.h \
+   nine_buffer_upload.c \
+   nine_buffer_upload.h \
nine_csmt_helper.h \
nine_debug.c \
nine_debug.h \
diff --git a/src/gallium/state_trackers/nine/buffer9.c 
b/src/gallium/state_trackers/nine/buffer9.c
index 9a7c30c..203e1d9 100644
--- a/src/gallium/state_trackers/nine/buffer9.c
+++ b/src/gallium/state_trackers/nine/buffer9.c
@@ -23,6 +23,7 @@
 
 #include "buffer9.h"
 #include "device9.h"
+#include "nine_buffer_upload.h"
 #include "nine_helpers.h"
 #include "nine_pipe.h"
 
@@ -100,6 +101,9 @@ NineBuffer9_ctor( struct NineBuffer9 *This,
 else
 info->usage = PIPE_USAGE_DYNAMIC;
 
+/* When Writeonly is not set, we don't want to enable the
+ * optimizations */
+This->discard_nooverwrite_only = !!(Usage & D3DUSAGE_WRITEONLY);
 /* if (pDesc->Usage & D3DUSAGE_DONOTCLIP) { } */
 /* if (pDesc->Usage & D3DUSAGE_NONSECURE) { } */
 /* if (pDesc->Usage & D3DUSAGE_NPATCHES) { } */
@@ -161,12 +165,18 @@ NineBuffer9_dtor( struct NineBuffer9 *This )
 list_del(>managed.list2);
 }
 
+if (This->buf)
+nine_upload_release_buffer(This->base.base.device->buffer_upload, 
This->buf);
+
 NineResource9_dtor(>base);
 }
 
 struct pipe_resource *
-NineBuffer9_GetResource( struct NineBuffer9 *This )
+NineBuffer9_GetResource( struct NineBuffer9 *This, unsigned *offset )
 {
+if (This->buf)
+return nine_upload_buffer_resource_and_offset(This->buf, offset);
+*offset = 0;
 return NineResource9_GetResource(>base);
 }
 
@@ -260,6 +270,8 @@ NineBuffer9_Lock( struct NineBuffer9 *This,
 if (Flags & D3DLOCK_DONOTWAIT && !(This->base.usage & D3DUSAGE_DYNAMIC))
 usage |= PIPE_TRANSFER_DONTBLOCK;
 
+This->discard_nooverwrite_only &= !!(Flags & (D3DLOCK_DISCARD | 
D3DLOCK_NOOVERWRITE));
+
 if (This->nmaps == This->maxmaps) {
 struct NineTransfer *newmaps =
 REALLOC(This->maps, sizeof(struct NineTransfer)*This->maxmaps,
@@ -271,8 +283,52 @@ NineBuffer9_Lock( struct NineBuffer9 *This,
 This->maps = newmaps;
 }
 
-This->maps[This->nmaps].is_pipe_secondary = FALSE;
+if (This->buf &&
+(!This->discard_nooverwrite_only ||
+ (This->discard_nooverwrite_only && (Flags & D3DLOCK_DISCARD {
+/* Release previous buffer */
+if (This->nmaps >= 1)
+This->maps[This->nmaps-1].should_destroy_buf = true;
+else
+nine_upload_release_buffer(device->buffer_upload, This->buf);
+This->buf = NULL;
+/* Mapping a nine_subbuffer without discard/nooverwrite.
+ * Since that will hurt (we'll wait the entire buffer
+ * to be idle), we'll won't nine_subbuffer for this buffer
+ * in fear that happens again. */
+if (!This->discard_nooverwrite_only) {
+DBG("Disabling nine_subbuffer for a buffer having"
+"used a nine_subbuffer buffer\n");
+/* Rebind previous resource */
+NineBuffer9_RebindIfRequired(This, device);
+}
+}
+
+This->maps[This->nmaps].transfer = NULL;
+This->maps[This->nmaps].is_pipe_secondary = false;
+This->maps[This->nmaps].buf = NULL;
+This->maps[This->nmaps].should_destroy_buf = false;
+
+if (This->discard_nooverwrite_only) {
+if (!This->buf) {
+This->buf = nine_upload_create_buffer(device->buffer_upload, 
This->base.info.width0);
+ 

Re: [Mesa-dev] [PATCH] nv50/ir: use OPCLASS_SURFACE for SUSTB

2016-12-11 Thread Ilia Mirkin
I don't think we use SUSTB anyways... it was used in the old
handleSTORE logic in the tgsi -> nvir converter, but wtvr. I have a
feeling it's the sort of thing that might matter for nv50 (vs nvc0),
but for now this is fine. (Esp since the opclasses don't matter on
nv50.)

On Sun, Dec 11, 2016 at 5:44 PM, Samuel Pitoiset
 wrote:
> Found by inspection, probably a typo because a surface store is
> definitely not an atomic operation.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> index 273ec34..789dba4 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> @@ -109,7 +109,7 @@ const OpClass Target::operationClass[] =
> OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE,
> OPCLASS_TEXTURE, OPCLASS_TEXTURE,
> // SULDB, SULDP, SUSTB, SUSTP; SUREDB, SUREDP, SULEA
> -   OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_ATOMIC, OPCLASS_SURFACE,
> +   OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE,
> OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE,
> // SUBFM, SUCLAMP, SUEAU, SUQ, MADSP
> OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_ARITH,
> --
> 2.10.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nv50/ir: use OPCLASS_SURFACE for SUSTB

2016-12-11 Thread Samuel Pitoiset
Found by inspection, probably a typo because a surface store is
definitely not an atomic operation.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index 273ec34..789dba4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -109,7 +109,7 @@ const OpClass Target::operationClass[] =
OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE,
OPCLASS_TEXTURE, OPCLASS_TEXTURE,
// SULDB, SULDP, SUSTB, SUSTP; SUREDB, SUREDP, SULEA
-   OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_ATOMIC, OPCLASS_SURFACE,
+   OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE,
OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE,
// SUBFM, SUCLAMP, SUEAU, SUQ, MADSP
OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_ARITH,
-- 
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 99055] Games hang / freeze completely

2016-12-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99055

--- Comment #1 from Etienne Bruines  ---
Update: apitrace did not magically fix it, but it did delay the symptom by a
few hours.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965: fix release build unused variable warning

2016-12-11 Thread Grazvydas Ignotas
Signed-off-by: Grazvydas Ignotas 
---
v2: take Eduardo Lima's suggestion further - remove the variable completely
no commit access

 src/mesa/drivers/dri/i965/brw_blorp.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 4c1d858..43ac3be 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -813,8 +813,6 @@ do_single_blorp_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   can_fast_clear = false;
 
const unsigned logical_layer = irb_logical_mt_layer(irb);
-   const bool is_lossless_compressed = intel_miptree_is_lossless_compressed(
-  brw, irb->mt);
const enum intel_fast_clear_state fast_clear_state =
   intel_miptree_get_fast_clear_state(irb->mt, irb->mt_level,
  logical_layer);
@@ -850,7 +848,7 @@ do_single_blorp_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
* it now.
*/
   if (!irb->mt->mcs_buf) {
- assert(!is_lossless_compressed);
+ assert(!intel_miptree_is_lossless_compressed(brw, irb->mt));
  if (!intel_miptree_alloc_non_msrt_mcs(brw, irb->mt, false)) {
 /* MCS allocation failed--probably this will only happen in
  * out-of-memory conditions.  But in any case, try to recover
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/aubinator: fix 32bit shift overflow warning

2016-12-11 Thread Eduardo Lima Mitev
Looks good.

Reviewed-by: Eduardo Lima Mitev 

Will push this and the other ones I reviewed.

Thanks!

Eduardo

On 12/10/2016 08:10 PM, Grazvydas Ignotas wrote:
> Doesn't look like this can work on 32bit, just rids of annoying
> warning.
> 
> Signed-off-by: Grazvydas Ignotas 
> ---
> no commit access, somebody please push
> 
>  src/intel/tools/aubinator.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 5e3a684..e2bec8f 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -1297,7 +1297,7 @@ int main(int argc, char *argv[])
>file = aub_file_open(input_file);
>  
> /* mmap a terabyte for our gtt space. */
> -   gtt_size = 1ul << 40;
> +   gtt_size = 1ull << 40;
> gtt = mmap(NULL, gtt_size, PROT_READ | PROT_WRITE,
>MAP_PRIVATE | MAP_ANONYMOUS |  MAP_NORESERVE, -1, 0);
> if (gtt == MAP_FAILED) {
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: fix release build unused variable warnings

2016-12-11 Thread Eduardo Lima Mitev
Looks good. I don't remember the MAYBE_UNUSED macro being used that much
in mesa opengl, but it seems it is trendy in vulkan code.

Reviewed-by: Eduardo Lima Mitev 

On 12/10/2016 08:10 PM, Grazvydas Ignotas wrote:
> Signed-off-by: Grazvydas Ignotas 
> ---
> no commit access, somebody please push
> 
>  src/intel/vulkan/anv_blorp.c   | 3 ++-
>  src/intel/vulkan/genX_cmd_buffer.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 159e4a0..b431d6a 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -909,7 +909,8 @@ anv_cmd_buffer_alloc_blorp_binding_table(struct 
> anv_cmd_buffer *cmd_buffer,
>   state_offset);
> if (bt_state.map == NULL) {
>/* We ran out of space.  Grab a new binding table block. */
> -  VkResult result = anv_cmd_buffer_new_binding_table_block(cmd_buffer);
> +  MAYBE_UNUSED VkResult result =
> + anv_cmd_buffer_new_binding_table_block(cmd_buffer);
>assert(result == VK_SUCCESS);
>  
>/* Re-emit state base addresses so we get the new surface state base
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index f761d9a..6131cfb 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -697,7 +697,7 @@ genX(cmd_buffer_config_l3)(struct anv_cmd_buffer 
> *cmd_buffer,
> assert(!urb_low_bw || cfg->n[GEN_L3P_URB] == cfg->n[GEN_L3P_SLM]);
>  
> /* Minimum number of ways that can be allocated to the URB. */
> -   const unsigned n0_urb = (devinfo->is_baytrail ? 32 : 0);
> +   MAYBE_UNUSED const unsigned n0_urb = devinfo->is_baytrail ? 32 : 0;
> assert(cfg->n[GEN_L3P_URB] >= n0_urb);
>  
> uint32_t l3sqcr1, l3cr2, l3cr3;
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: fix release build unused variable warning

2016-12-11 Thread Eduardo Lima Mitev
The patch looks good but now that we are at it, I would move the
definition and assignment of 'is_lossless_compressed' to the block where
it is used, right before the assert. Right now we are always defining
and assigning the variable, but it is consumed conditionally.

With that changed, patch is:

Reviewed-by: Eduardo Lima Mitev 

Thanks!

Should I push this and the other patches you submitted and I reviewed?

Eduardo

On 12/10/2016 08:10 PM, Grazvydas Ignotas wrote:
> Signed-off-by: Grazvydas Ignotas 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 4c1d858..8b5750c 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -813,8 +813,8 @@ do_single_blorp_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>can_fast_clear = false;
>  
> const unsigned logical_layer = irb_logical_mt_layer(irb);
> -   const bool is_lossless_compressed = intel_miptree_is_lossless_compressed(
> -  brw, irb->mt);
> +   MAYBE_UNUSED const bool is_lossless_compressed =
> +  intel_miptree_is_lossless_compressed(brw, irb->mt);
> const enum intel_fast_clear_state fast_clear_state =
>intel_miptree_get_fast_clear_state(irb->mt, irb->mt_level,
>   logical_layer);
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex

2016-12-11 Thread Eduardo Lima Mitev
Looks good.

Reviewed-by: Eduardo Lima Mitev 

On 12/11/2016 04:42 PM, Jonathan Gray wrote:
> Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that
> attempts to unlock an unlocked mutex which is undefined behaviour.
> 
> On OpenBSD this leads to an abort:
> 
> 0  0x124dadfa96ba in thrkill () at :2
> 1  0x124dadf3da39 in *_libc_abort () at 
> /usr/src/lib/libc/stdlib/abort.c:52
> 2  0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock (mutexp= out>)
> at /usr/src/lib/librthread/rthread_sync.c:221
> 3  0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at 
> main/context.c:825
> 4  _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, 
> api=api@entry=API_OPENGL_CORE,
> visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0,
> driverFunctions=driverFunctions@entry=0x7f7bda60) at 
> main/context.c:1204
> 5  0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE,
> pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0,
> share=share@entry=0x0, options=options@entry=0x7f7be128)
> at state_tracker/st_context.c:545
> 6  0x124d27b8639f in st_api_create_context (stapi=,
> smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, 
> shared_stctxi=0x0)
> at state_tracker/st_manager.c:669
> 7  0x124d27cc5b9c in dri_create_context (api=, 
> visual=0x124d8a0f8a00,
> cPriv=0x124de473f240, major_version=, 
> minor_version=,
> flags=, notify_reset=false, error=0x7f7be2b4,
> sharedContextPrivate=0x0) at dri_context.c:123
> 8  0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400,
> api=, config=0x124d8a0f8a00, shared=,
> num_attribs=, attribs=, 
> error=0x7f7be2b4,
> data=0x124d77814a00) at dri_util.c:448
> 9  0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700,
> config_base=0x124d7a0e7300, shareList=, 
> num_attribs=,
> attribs=, error=0x7f7be2b4) at drisw_glx.c:476
> 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f,
> config=0x124d7a0e7300, share_context=0x0, direct=1, 
> attrib_list=0x7f7be300)
> at create_context.c:78
> 
> Signed-off-by: Jonathan Gray 
> ---
>  src/mesa/main/debug_output.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c
> index 48dbbb3..bc933db 100644
> --- a/src/mesa/main/debug_output.c
> +++ b/src/mesa/main/debug_output.c
> @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx)
> */
>struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
>if (!debug) {
> - goto done;
> + return;
>}
>debug->DebugOutput = GL_TRUE;
>debug->LogToStderr = GL_TRUE;
>ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
> +  _mesa_unlock_debug_state(ctx);
> }
> -done:
> -   _mesa_unlock_debug_state(ctx);
>  }
>  
>  
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965/blit: Fix the src dimension sanity check in miptree_copy

2016-12-11 Thread Pohjolainen, Topi
On Sun, Dec 11, 2016 at 07:56:59AM -0800, Jason Ekstrand wrote:
>On Dec 11, 2016 12:28 AM, "Pohjolainen, Topi"
><[1]topi.pohjolai...@gmail.com> wrote:
> 
>On Tue, Dec 06, 2016 at 12:37:45PM -0800, Jason Ekstrand wrote:
>> Cc: "13.0" <[2]mesa-sta...@lists.freedesktop.org>
>> ---
>>  src/mesa/drivers/dri/i965/intel_blit.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_blit.c
>b/src/mesa/drivers/dri/i965/intel_blit.c
>> index 03a35ee..5f0cf74 100644
>> --- a/src/mesa/drivers/dri/i965/intel_blit.c
>> +++ b/src/mesa/drivers/dri/i965/intel_blit.c
>> @@ -421,8 +421,10 @@ intel_miptree_copy(struct brw_context *brw,
>>
>>assert(src_x % bw == 0);
>>assert(src_y % bh == 0);
>> -  assert(src_width % bw == 0);
>> -  assert(src_height % bh == 0);
>> +  assert(src_width % bw == 0 ||
>> + src_x + src_width == minify(src_mt->logical_width0,
>src_level));
>> +  assert(src_height % bh == 0 ||
>> + src_y + src_height == minify(src_mt->logical_height0,
>src_level));
> 
>  Can you given example how we can blit sub-block worth of data? Above
>  there is
>  check for src_x/y being block aligned don't all mipmap levels need
>  to be
>  aligned as well?
> 
>For compressed textures, the width and height of the entire texture
>don't have to be block-aligned.  When computing the size of a miplevel
>in blocks, you just round up to the nearest block and part of the last
>block may be outside the texture.  This means that every time you have
>a subrectangle of a compressed textures, the upper and left edges have
>to be block-aligned and the lower and right edges have to be either
>block-aligned or be the left or right edge of the miplevel.  Make
>sense?  It's really annoying.

Ah, that makes sense. Would you mind leaving some of that as comment?

Anyway, both patches:

Reviewed-by: Topi Pohjolainen 

> 
>>
>>src_x /= (int)bw;
>>src_y /= (int)bh;
>> --
>> 2.5.0.400.gff86faf
>>
> 
>  > ___
>  > mesa-dev mailing list
>  > [3]mesa-dev@lists.freedesktop.org
>  > [4]https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> References
> 
>1. mailto:topi.pohjolai...@gmail.com
>2. mailto:mesa-sta...@lists.freedesktop.org
>3. mailto:mesa-dev@lists.freedesktop.org
>4. https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] vulkan: deduplicate the util code

2016-12-11 Thread Grazvydas Ignotas
anv_util.c and radv_util.c are essentially duplicates, so create a new
vk_util.c to be shared by both vulkan drivers. Some other content from
anv_private.h and radv_private.h was moved to vk_util.h too.

Signed-off-by: Grazvydas Ignotas 
---
no commit access

 configure.ac  |   1 +
 src/Makefile.am   |   1 +
 src/amd/vulkan/Makefile.am|   1 +
 src/amd/vulkan/Makefile.sources   |   1 -
 src/amd/vulkan/radv_device.c  |   2 +-
 src/amd/vulkan/radv_private.h |  60 +-
 src/amd/vulkan/radv_util.c| 130 --
 src/intel/vulkan/Makefile.am  |   1 +
 src/intel/vulkan/Makefile.sources |   1 -
 src/intel/vulkan/anv_private.h|  60 +-
 src/intel/vulkan/anv_util.c   | 127 -
 src/vulkan/Makefile.am|  17 +
 src/vulkan/Makefile.sources   |   4 ++
 src/vulkan/vk_util.c  | 105 ++
 src/vulkan/vk_util.h  |  94 +++
 15 files changed, 229 insertions(+), 376 deletions(-)
 delete mode 100644 src/amd/vulkan/radv_util.c
 delete mode 100644 src/intel/vulkan/anv_util.c
 create mode 100644 src/vulkan/Makefile.am
 create mode 100644 src/vulkan/Makefile.sources
 create mode 100644 src/vulkan/vk_util.c
 create mode 100644 src/vulkan/vk_util.h

diff --git a/configure.ac b/configure.ac
index 799f5eb..715a0ff 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2855,6 +2855,7 @@ AC_CONFIG_FILES([Makefile
src/mesa/main/tests/Makefile
src/util/Makefile
src/util/tests/hash_table/Makefile
+   src/vulkan/Makefile
src/vulkan/wsi/Makefile])
 
 AC_OUTPUT
diff --git a/src/Makefile.am b/src/Makefile.am
index ad54356..2c36dbd 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -116,6 +116,7 @@ SUBDIRS += intel/tools
 endif
 
 if HAVE_VULKAN_COMMON
+SUBDIRS += vulkan
 SUBDIRS += vulkan/wsi
 endif
 
diff --git a/src/amd/vulkan/Makefile.am b/src/amd/vulkan/Makefile.am
index 6e184c0..2f746dd 100644
--- a/src/amd/vulkan/Makefile.am
+++ b/src/amd/vulkan/Makefile.am
@@ -94,6 +94,7 @@ libvulkan_common_la_SOURCES = $(VULKAN_SOURCES)
 
 VULKAN_LIB_DEPS += \
libvulkan_common.la \
+   $(top_builddir)/src/vulkan/libvulkan_shared.la \
$(top_builddir)/src/vulkan/wsi/libvulkan_wsi.la \
$(top_builddir)/src/amd/common/libamd_common.la \
$(top_builddir)/src/amd/addrlib/libamdgpu_addrlib.la \
diff --git a/src/amd/vulkan/Makefile.sources b/src/amd/vulkan/Makefile.sources
index 425a00f..139209b 100644
--- a/src/amd/vulkan/Makefile.sources
+++ b/src/amd/vulkan/Makefile.sources
@@ -56,7 +56,6 @@ VULKAN_FILES := \
radv_private.h \
radv_radeon_winsys.h \
radv_query.c \
-   radv_util.c \
radv_util.h \
radv_wsi.c \
si_cmd_buffer.c \
diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 75b7af1..3ecc24f 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -829,7 +829,7 @@ VkResult radv_QueueSubmit(
   
pSubmits[i].commandBufferCount,
   can_patch, base_fence);
if (ret)
-   radv_loge("failed to submit CS %d\n", i);
+   vk_loge("failed to submit CS %d\n", i);
free(cs_array);
}
 
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 3d4b111..0abbe0b 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -49,6 +49,7 @@
 #include "util/list.h"
 #include "util/vk_alloc.h"
 #include "main/macros.h"
+#include "vulkan/vk_util.h"
 
 #include "radv_radeon_winsys.h"
 #include "ac_binary.h"
@@ -100,36 +101,12 @@ enum radv_mem_type {
RADV_MEM_TYPE_COUNT
 };
 
-#define radv_noreturn __attribute__((__noreturn__))
-#define radv_printflike(a, b) __attribute__((__format__(__printf__, a, b)))
-
-static inline uint32_t
-align_u32(uint32_t v, uint32_t a)
-{
-   assert(a != 0 && a == (a & -a));
-   return (v + a - 1) & ~(a - 1);
-}
-
 static inline uint32_t
 align_u32_npot(uint32_t v, uint32_t a)
 {
return (v + a - 1) / a * a;
 }
 
-static inline uint64_t
-align_u64(uint64_t v, uint64_t a)
-{
-   assert(a != 0 && a == (a & -a));
-   return (v + a - 1) & ~(a - 1);
-}
-
-static inline int32_t
-align_i32(int32_t v, int32_t a)
-{
-   assert(a != 0 && a == (a & -a));
-   return (v + a - 1) & ~(a - 1);
-}
-
 /** Alignment must be a power of 2. */
 static inline bool
 radv_is_aligned(uintmax_t n, uintmax_t a)
@@ -182,38 +159,8 @@ radv_clear_mask(uint32_t *inout_mask, uint32_t clear_mask)
}
 }
 
-#define for_each_bit(b, dword)  \
-   for (uint32_t __dword = (dword);\
-(b) = 

[Mesa-dev] [PATCH 2/2] vulkan: fix 'statement with no effect' warning

2016-12-11 Thread Grazvydas Ignotas
Emitted on release build in case vk_error() return is not used.

Signed-off-by: Grazvydas Ignotas 
---
no commit access

 src/vulkan/vk_util.c |  4 
 src/vulkan/vk_util.h | 13 -
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/vulkan/vk_util.c b/src/vulkan/vk_util.c
index 5d38117..716b8e5 100644
--- a/src/vulkan/vk_util.c
+++ b/src/vulkan/vk_util.c
@@ -60,6 +60,8 @@ __vk_finishme(const char *file, int line, const char *format, 
...)
fprintf(stderr, "%s:%d: FINISHME: %s\n", file, line, buffer);
 }
 
+#ifdef DEBUG
+
 VkResult
 __vk_errorf(VkResult error, const char *file, int line, const char *format, 
...)
 {
@@ -103,3 +105,5 @@ __vk_errorf(VkResult error, const char *file, int line, 
const char *format, ...)
 
return error;
 }
+
+#endif /* DEBUG */
diff --git a/src/vulkan/vk_util.h b/src/vulkan/vk_util.h
index c384838..5ce206c 100644
--- a/src/vulkan/vk_util.h
+++ b/src/vulkan/vk_util.h
@@ -72,15 +72,18 @@ align_i32(int32_t v, int32_t a)
  * propagating errors. Might be useful to plug in a stack trace here.
  */
 
+#ifdef DEBUG
 VkResult __vk_errorf(VkResult error, const char *file, int line, const char 
*format, ...);
+#else
+static inline VkResult
+__vk_errorf(VkResult error, const char *file, int line, const char *format, 
...)
+{
+   return error;
+}
+#endif
 
-#ifdef DEBUG
 #define vk_error(error) __vk_errorf(error, __FILE__, __LINE__, NULL);
 #define vk_errorf(error, format, ...) __vk_errorf(error, __FILE__, __LINE__, 
format, ## __VA_ARGS__);
-#else
-#define vk_error(error) error
-#define vk_errorf(error, format, ...) error
-#endif
 
 void __vk_finishme(const char *file, int line, const char *format, ...)
vk_printflike(3, 4);
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98869] Electronic Super Joy graphic artefacts (regression)

2016-12-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98869

cosiek...@o2.pl changed:

   What|Removed |Added

   Priority|medium  |high

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965/blit: Fix the src dimension sanity check in miptree_copy

2016-12-11 Thread Jason Ekstrand
On Dec 11, 2016 12:28 AM, "Pohjolainen, Topi" 
wrote:

On Tue, Dec 06, 2016 at 12:37:45PM -0800, Jason Ekstrand wrote:
> Cc: "13.0" 
> ---
>  src/mesa/drivers/dri/i965/intel_blit.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_blit.c
b/src/mesa/drivers/dri/i965/intel_blit.c
> index 03a35ee..5f0cf74 100644
> --- a/src/mesa/drivers/dri/i965/intel_blit.c
> +++ b/src/mesa/drivers/dri/i965/intel_blit.c
> @@ -421,8 +421,10 @@ intel_miptree_copy(struct brw_context *brw,
>
>assert(src_x % bw == 0);
>assert(src_y % bh == 0);
> -  assert(src_width % bw == 0);
> -  assert(src_height % bh == 0);
> +  assert(src_width % bw == 0 ||
> + src_x + src_width == minify(src_mt->logical_width0,
src_level));
> +  assert(src_height % bh == 0 ||
> + src_y + src_height == minify(src_mt->logical_height0,
src_level));

Can you given example how we can blit sub-block worth of data? Above there
is
check for src_x/y being block aligned don't all mipmap levels need to be
aligned as well?


For compressed textures, the width and height of the entire texture don't
have to be block-aligned.  When computing the size of a miplevel in blocks,
you just round up to the nearest block and part of the last block may be
outside the texture.  This means that every time you have a subrectangle of
a compressed textures, the upper and left edges have to be block-aligned
and the lower and right edges have to be either block-aligned or be the
left or right edge of the miplevel.  Make sense?  It's really annoying.

>
>src_x /= (int)bw;
>src_y /= (int)bh;
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex

2016-12-11 Thread Jonathan Gray
Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that
attempts to unlock an unlocked mutex which is undefined behaviour.

On OpenBSD this leads to an abort:

0  0x124dadfa96ba in thrkill () at :2
1  0x124dadf3da39 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
2  0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock (mutexp=)
at /usr/src/lib/librthread/rthread_sync.c:221
3  0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at 
main/context.c:825
4  _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, 
api=api@entry=API_OPENGL_CORE,
visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0,
driverFunctions=driverFunctions@entry=0x7f7bda60) at main/context.c:1204
5  0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE,
pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0,
share=share@entry=0x0, options=options@entry=0x7f7be128)
at state_tracker/st_context.c:545
6  0x124d27b8639f in st_api_create_context (stapi=,
smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, 
shared_stctxi=0x0)
at state_tracker/st_manager.c:669
7  0x124d27cc5b9c in dri_create_context (api=, 
visual=0x124d8a0f8a00,
cPriv=0x124de473f240, major_version=, 
minor_version=,
flags=, notify_reset=false, error=0x7f7be2b4,
sharedContextPrivate=0x0) at dri_context.c:123
8  0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400,
api=, config=0x124d8a0f8a00, shared=,
num_attribs=, attribs=, error=0x7f7be2b4,
data=0x124d77814a00) at dri_util.c:448
9  0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700,
config_base=0x124d7a0e7300, shareList=, 
num_attribs=,
attribs=, error=0x7f7be2b4) at drisw_glx.c:476
10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f,
config=0x124d7a0e7300, share_context=0x0, direct=1, 
attrib_list=0x7f7be300)
at create_context.c:78

Signed-off-by: Jonathan Gray 
---
 src/mesa/main/debug_output.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c
index 48dbbb3..bc933db 100644
--- a/src/mesa/main/debug_output.c
+++ b/src/mesa/main/debug_output.c
@@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx)
*/
   struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
   if (!debug) {
- goto done;
+ return;
   }
   debug->DebugOutput = GL_TRUE;
   debug->LogToStderr = GL_TRUE;
   ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
+  _mesa_unlock_debug_state(ctx);
}
-done:
-   _mesa_unlock_debug_state(ctx);
 }
 
 
-- 
2.10.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 99055] Games hang / freeze completely

2016-12-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99055

Bug ID: 99055
   Summary: Games hang / freeze completely
   Product: Mesa
   Version: 13.0
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: major
  Priority: medium
 Component: Other
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: etie...@bruines.com
QA Contact: mesa-dev@lists.freedesktop.org

I recently updated to 13.0.2-1, and since then, any game will hang after about
a few minutes of gameplay (sometimes even less than a minute). I am not
entirely sure this is related to Mesa, but I'm confident that you'll be able to
point me in the right direction. 

Games tested: Factorio, Democracy 3. 
Things that do work: Google Chrome (Netflix / YouTube). 

-
  GL_VERSION:  3.0 Mesa 13.0.2
  GL_VENDOR:   Intel Open Source Technology Center
  GL_RENDERER: Mesa DRI Intel(R) Sandybridge Mobile x86/MMX/SSE2

-
No logs available at (looking around the moment of the 'hanging')
/var/log/syslog
/var/log/messages
/var/log/Xorg.0.log
dmesg

The games do not output anything either. The game just freezes, as if the
drawing-thread is hanging. Meaning after switching workspaces (screens), it'll
refuse to load anything new and show the "old workspace" at that location. 

At this stage I have no idea how to debug this. SIGINT won't stop the program,
I have to use SIGKILL. 

I have tried using `apitrace trace`. This "fixed" the bug: not able to
reproduce when using apitrace. The current workaround I'm using right now is
`apitrace trace -o /dev/null factorio` to start the game; it's a bit slower
because of the tracing, but it hasn't crashed so far. (It often looks like it's
going to crash, but it seems to recover from it)

I'd be happy to provide any additional information.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 84/84 v2] st/nine: Implement new buffer upload path

2016-12-11 Thread Axel Davy
This new buffer upload path enables to lock
faster than the normal path when using
DISCARD/NOOVERWRITE.

v2: Diverse cleanups and fixes.

Signed-off-by: Axel Davy 
---
This patch needed a few cleanups, whih this new version adds.
With this version, I wasn't able to reproduce the memory leak.
I'm not sure though it is fixed, I will ask the ones who reported
the leak to check.
I'm still interested in comments. For example, is the pool big enough ?
 src/gallium/state_trackers/nine/Makefile.sources   |   2 +
 src/gallium/state_trackers/nine/buffer9.c  |  75 +-
 src/gallium/state_trackers/nine/buffer9.h  |   9 +-
 src/gallium/state_trackers/nine/device9.c  |   5 +
 src/gallium/state_trackers/nine/device9.h  |   3 +
 src/gallium/state_trackers/nine/indexbuffer9.c |  10 +-
 src/gallium/state_trackers/nine/indexbuffer9.h |   2 -
 .../state_trackers/nine/nine_buffer_upload.c   | 290 +
 .../state_trackers/nine/nine_buffer_upload.h   |  59 +
 src/gallium/state_trackers/nine/vertexbuffer9.c|   3 +-
 10 files changed, 437 insertions(+), 21 deletions(-)
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h

diff --git a/src/gallium/state_trackers/nine/Makefile.sources 
b/src/gallium/state_trackers/nine/Makefile.sources
index 2bb08a2..56698a1 100644
--- a/src/gallium/state_trackers/nine/Makefile.sources
+++ b/src/gallium/state_trackers/nine/Makefile.sources
@@ -23,6 +23,8 @@ C_SOURCES := \
indexbuffer9.h \
iunknown.c \
iunknown.h \
+   nine_buffer_upload.c \
+   nine_buffer_upload.h \
nine_csmt_helper.h \
nine_debug.c \
nine_debug.h \
diff --git a/src/gallium/state_trackers/nine/buffer9.c 
b/src/gallium/state_trackers/nine/buffer9.c
index 9a7c30c..203e1d9 100644
--- a/src/gallium/state_trackers/nine/buffer9.c
+++ b/src/gallium/state_trackers/nine/buffer9.c
@@ -23,6 +23,7 @@
 
 #include "buffer9.h"
 #include "device9.h"
+#include "nine_buffer_upload.h"
 #include "nine_helpers.h"
 #include "nine_pipe.h"
 
@@ -100,6 +101,9 @@ NineBuffer9_ctor( struct NineBuffer9 *This,
 else
 info->usage = PIPE_USAGE_DYNAMIC;
 
+/* When Writeonly is not set, we don't want to enable the
+ * optimizations */
+This->discard_nooverwrite_only = !!(Usage & D3DUSAGE_WRITEONLY);
 /* if (pDesc->Usage & D3DUSAGE_DONOTCLIP) { } */
 /* if (pDesc->Usage & D3DUSAGE_NONSECURE) { } */
 /* if (pDesc->Usage & D3DUSAGE_NPATCHES) { } */
@@ -161,12 +165,18 @@ NineBuffer9_dtor( struct NineBuffer9 *This )
 list_del(>managed.list2);
 }
 
+if (This->buf)
+nine_upload_release_buffer(This->base.base.device->buffer_upload, 
This->buf);
+
 NineResource9_dtor(>base);
 }
 
 struct pipe_resource *
-NineBuffer9_GetResource( struct NineBuffer9 *This )
+NineBuffer9_GetResource( struct NineBuffer9 *This, unsigned *offset )
 {
+if (This->buf)
+return nine_upload_buffer_resource_and_offset(This->buf, offset);
+*offset = 0;
 return NineResource9_GetResource(>base);
 }
 
@@ -260,6 +270,8 @@ NineBuffer9_Lock( struct NineBuffer9 *This,
 if (Flags & D3DLOCK_DONOTWAIT && !(This->base.usage & D3DUSAGE_DYNAMIC))
 usage |= PIPE_TRANSFER_DONTBLOCK;
 
+This->discard_nooverwrite_only &= !!(Flags & (D3DLOCK_DISCARD | 
D3DLOCK_NOOVERWRITE));
+
 if (This->nmaps == This->maxmaps) {
 struct NineTransfer *newmaps =
 REALLOC(This->maps, sizeof(struct NineTransfer)*This->maxmaps,
@@ -271,8 +283,52 @@ NineBuffer9_Lock( struct NineBuffer9 *This,
 This->maps = newmaps;
 }
 
-This->maps[This->nmaps].is_pipe_secondary = FALSE;
+if (This->buf &&
+(!This->discard_nooverwrite_only ||
+ (This->discard_nooverwrite_only && (Flags & D3DLOCK_DISCARD {
+/* Release previous buffer */
+if (This->nmaps >= 1)
+This->maps[This->nmaps-1].should_destroy_buf = true;
+else
+nine_upload_release_buffer(device->buffer_upload, This->buf);
+This->buf = NULL;
+/* Mapping a nine_subbuffer without discard/nooverwrite.
+ * Since that will hurt (we'll wait the entire buffer
+ * to be idle), we'll won't nine_subbuffer for this buffer
+ * in fear that happens again. */
+if (!This->discard_nooverwrite_only) {
+DBG("Disabling nine_subbuffer for a buffer having"
+"used a nine_subbuffer buffer\n");
+/* Rebind previous resource */
+NineBuffer9_RebindIfRequired(This, device);
+}
+}
+
+This->maps[This->nmaps].transfer = NULL;
+This->maps[This->nmaps].is_pipe_secondary = false;
+This->maps[This->nmaps].buf = NULL;
+This->maps[This->nmaps].should_destroy_buf = false;
+
+if (This->discard_nooverwrite_only) {
+if (!This->buf) {
+This->buf 

Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.

2016-12-11 Thread Marek Olšák
I don't see any difference in Deus Ex, but F1 2015 uses 64% less
spilled-temp-array memory with this on GCN.

Marek

On Sat, Dec 10, 2016 at 5:28 AM, Kenneth Graunke  wrote:
> A number of games have large arrays of constants, which we promote to
> uniforms.  This introduces copies from the uniform array to the original
> temporary array.  Normally, copy propagation eliminates those copies,
> making everything refer to the uniform array directly.
>
> A number of shaders in "Deus Ex: Mankind Divided" recently exposed a
> limitation of copy propagation - if we had any intrinsics (i.e. image
> access in a compute shader), we weren't able to get rid of these copies.
>
> That meant that any variable indexing remained on the temporary array
> rather being moved to the uniform array.  i965's scalar backend
> currently doesn't support indirect addressing of temporary arrays,
> which meant lowering it to if-ladders.  This was horrible.
>
> On Skylake:
>
> total instructions in shared programs: 13700090 -> 13654519 (-0.33%)
> instructions in affected programs: 56438 -> 10867 (-80.75%)
> helped: 14
> HURT: 0
>
> total cycles in shared programs: 288879704 -> 291270232 (0.83%)
> cycles in affected programs: 12758080 -> 15148608 (18.74%)
> helped: 6
> HURT: 8
>
> All shaders helped are compute shaders in Tomb Raider or Deus Ex.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/glsl/opt_copy_propagation.cpp | 31 
> ++
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/src/compiler/glsl/opt_copy_propagation.cpp 
> b/src/compiler/glsl/opt_copy_propagation.cpp
> index 247c498..2240421 100644
> --- a/src/compiler/glsl/opt_copy_propagation.cpp
> +++ b/src/compiler/glsl/opt_copy_propagation.cpp
> @@ -186,11 +186,34 @@ ir_copy_propagation_visitor::visit_enter(ir_call *ir)
>}
> }
>
> -   /* Since we're unlinked, we don't (necessarily) know the side effects of
> -* this call.  So kill all copies.
> +   /* Since this pass can run when unlinked, we don't (necessarily) know
> +* the side effects of calls.  (When linked, most calls are inlined
> +* anyway, so it doesn't matter much.)
> +*
> +* One place where this does matter is IR intrinsics.  They're never
> +* inlined.  We also know what they do - while some have side effects
> +* (such as image writes), none edit random global variables.  So we
> +* can assume they're side-effect free (other than the return value
> +* and out parameters).
>  */
> -   _mesa_hash_table_clear(acp, NULL);
> -   this->killed_all = true;
> +   if (!ir->callee->is_intrinsic()) {
> +  _mesa_hash_table_clear(acp, NULL);
> +  this->killed_all = true;
> +   } else {
> +  if (ir->return_deref)
> + kill(ir->return_deref->var);
> +
> +  foreach_two_lists(formal_node, >callee->parameters,
> +actual_node, >actual_parameters) {
> + ir_variable *sig_param = (ir_variable *) formal_node;
> + if (sig_param->data.mode == ir_var_function_out ||
> + sig_param->data.mode == ir_var_function_inout) {
> +ir_rvalue *ir = (ir_rvalue *) actual_node;
> +ir_variable *var = ir->variable_referenced();
> +kill(var);
> + }
> +  }
> +   }
>
> return visit_continue_with_parent;
>  }
> --
> 2.10.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965/blit: Fix the src dimension sanity check in miptree_copy

2016-12-11 Thread Pohjolainen, Topi
On Tue, Dec 06, 2016 at 12:37:45PM -0800, Jason Ekstrand wrote:
> Cc: "13.0" 
> ---
>  src/mesa/drivers/dri/i965/intel_blit.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
> b/src/mesa/drivers/dri/i965/intel_blit.c
> index 03a35ee..5f0cf74 100644
> --- a/src/mesa/drivers/dri/i965/intel_blit.c
> +++ b/src/mesa/drivers/dri/i965/intel_blit.c
> @@ -421,8 +421,10 @@ intel_miptree_copy(struct brw_context *brw,
>  
>assert(src_x % bw == 0);
>assert(src_y % bh == 0);
> -  assert(src_width % bw == 0);
> -  assert(src_height % bh == 0);
> +  assert(src_width % bw == 0 ||
> + src_x + src_width == minify(src_mt->logical_width0, src_level));
> +  assert(src_height % bh == 0 ||
> + src_y + src_height == minify(src_mt->logical_height0, 
> src_level));

Can you given example how we can blit sub-block worth of data? Above there is
check for src_x/y being block aligned don't all mipmap levels need to be
aligned as well?

>  
>src_x /= (int)bw;
>src_y /= (int)bh;
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 3/7] i965/fs: Add a CHANNEL_IDS opcode

2016-12-11 Thread Pohjolainen, Topi
On Mon, Dec 05, 2016 at 11:59:54AM -0800, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h|  6 ++
>  src/mesa/drivers/dri/i965/brw_fs.h |  3 +++
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 28 
> ++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 14 +++--
>  src/mesa/drivers/dri/i965/brw_shader.cpp   |  2 ++
>  6 files changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index f22a52f..16a72c4 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1095,6 +1095,12 @@ enum opcode {
>  */
> SHADER_OPCODE_BROADCAST,
>  
> +   /* Takes no sources and returns the index of each channel in its 
> respective
> +* SIMD channel in the destination register.  The instruction must have
> +* WE_all set and the destination must have type BRW_REGISTER_TYPE_UW.

In the implementation further down you actually pass BRW_REGISTER_TYPE_W and
assert for W or UW.

While I haven't actually worked much on the compiler and just been reading
patches in order to keep up, to me this patch looks to make things better
structured its own - generator emitting the actual trick of instructions.

> +*/
> +   SHADER_OPCODE_CHANNEL_IDS,
> +
> VEC4_OPCODE_MOV_BYTES,
> VEC4_OPCODE_PACK_BYTES,
> VEC4_OPCODE_UNPACK_UNIFORM,
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index 002cee8..4ce0f56 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -437,6 +437,9 @@ private:
>struct brw_reg msg_data,
>unsigned msg_type);
>  
> +   void generate_channel_ids(fs_inst *inst,
> + struct brw_reg dst);
> +
> void generate_set_sample_id(fs_inst *inst,
> struct brw_reg dst,
> struct brw_reg src0,
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> index 48220ef..02ccda1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> @@ -79,6 +79,7 @@ is_expression(const fs_visitor *v, const fs_inst *const 
> inst)
> case FS_OPCODE_LINTERP:
> case SHADER_OPCODE_FIND_LIVE_CHANNEL:
> case SHADER_OPCODE_BROADCAST:
> +   case SHADER_OPCODE_CHANNEL_IDS:
> case SHADER_OPCODE_MOV_INDIRECT:
> case SHADER_OPCODE_TEX_LOGICAL:
> case SHADER_OPCODE_TXD_LOGICAL:
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index c5b50e1..3ef2d5b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1435,6 +1435,30 @@ fs_generator::generate_set_simd4x2_offset(fs_inst 
> *inst,
> brw_pop_insn_state(p);
>  }
>  
> +void
> +fs_generator::generate_channel_ids(fs_inst *inst, struct brw_reg dst)
> +{
> +   assert(dst.type == BRW_REGISTER_TYPE_UW ||
> +  dst.type == BRW_REGISTER_TYPE_W);
> +   assert(inst->exec_size >= 8);
> +
> +   brw_push_insn_state(p);
> +   brw_set_default_exec_size(p, BRW_EXECUTE_8);
> +   brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
> +   brw_set_default_mask_control(p, true);
> +   brw_MOV(p, dst, brw_imm_v(0x76543210));
> +
> +   if (inst->exec_size > 8) {
> +  assert(inst->exec_size == 16);
> +  brw_ADD(p, vec8(suboffset(dst, 8)), vec8(dst), brw_imm_w(8));
> +   }
> +
> +   brw_pop_insn_state(p);
> +
> +   if (inst->group > 0)
> +  brw_ADD(p, dst, dst, brw_imm_w(inst->group));
> +}
> +
>  /* Sets vstride=1, width=4, hstride=0 of register src1 during
>   * the ADD instruction.
>   */
> @@ -2056,6 +2080,10 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>   brw_broadcast(p, dst, src[0], src[1]);
>   break;
>  
> +  case SHADER_OPCODE_CHANNEL_IDS:
> + generate_channel_ids(inst, dst);
> + break;
> +
>case FS_OPCODE_SET_SAMPLE_ID:
>   generate_set_sample_id(inst, dst, src[0], src[1]);
>   break;
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 855266f..9478bb8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -4310,17 +4310,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
> nir_intrinsic_instr *instr
> }
>  
> case nir_intrinsic_load_channel_num: {
> -  fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UW);
> -  dest = retype(dest, BRW_REGISTER_TYPE_UD);
> -  const fs_builder allbld8 = bld.group(8, 0).exec_all();
> -  allbld8.MOV(tmp, brw_imm_v(0x76543210));
> -  

Re: [Mesa-dev] [PATCH 26/27] i965: Remove scanout restriction from lossless compression

2016-12-11 Thread Pohjolainen, Topi
On Thu, Dec 01, 2016 at 02:10:07PM -0800, Ben Widawsky wrote:
> From: Ben Widawsky 
> 
> Cc: Topi Pohjolainen 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index b79de08..b297f79 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -157,7 +157,7 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
> brw_context *brw,
> if (mt->disable_aux_buffers)
>return false;
>  
> -   if (mt->is_scanout)
> +   if (mt->is_scanout && mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS)

I guess the same question I had in previous becomes more general. How do we
know that the external consumer is prepared for compression (mt->is_scanout
just tells that the buffer goes out from the driver, right)?

>return false;
>  
> /* This function applies only to non-multisampled render targets. */
> @@ -528,10 +528,6 @@ intel_miptree_create_layout(struct brw_context *brw,
>const UNUSED bool is_lossless_compressed_aux =
>   brw->gen >= 9 && num_samples == 1 &&
>   mt->format == MESA_FORMAT_R_UINT32;
> -
> -  /* For now, nothing else has this requirement */
> -  assert(is_lossless_compressed_aux ||
> - (layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);

Why do we need to drop this? And if we do, then we can drop the entire
else-branch, "is_lossless_compressed_aux" is only used in the assertion.

> }
>  
> brw_miptree_layout(brw, mt, layout_flags);
> @@ -752,11 +748,9 @@ intel_miptree_create(struct brw_context *brw,
> * resolves.
> */
>const bool lossless_compression_disabled = INTEL_DEBUG & DEBUG_NO_RBC;
> -  assert(!mt->is_scanout);
>const bool is_lossless_compressed =
>   unlikely(!lossless_compression_disabled) &&
> - brw->gen >= 9 && !mt->is_scanout &&
> - intel_miptree_supports_lossless_compressed(brw, mt);
> + brw->gen >= 9 && intel_miptree_supports_lossless_compressed(brw, 
> mt);
>  
>if (is_lossless_compressed) {
>   intel_miptree_alloc_non_msrt_mcs(brw, mt, is_lossless_compressed);
> @@ -1043,7 +1037,7 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
>  drm_intel_bo_unreference((*mt)->hiz_buf->aux_base.bo);
>   free((*mt)->hiz_buf);
>}
> -  if ((*mt)->mcs_buf && !(*mt)->is_scanout) {
> +  if ((*mt)->mcs_buf) {
>   drm_intel_bo_unreference((*mt)->mcs_buf->bo);
>   free((*mt)->mcs_buf);
>}
> -- 
> 2.10.2
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.

2016-12-11 Thread Kenneth Graunke
On Saturday, December 10, 2016 12:37:16 PM PST Matt Turner wrote:
> On Fri, Dec 9, 2016 at 8:28 PM, Kenneth Graunke  wrote:
> > A number of games have large arrays of constants, which we promote to
> > uniforms.  This introduces copies from the uniform array to the original
> > temporary array.  Normally, copy propagation eliminates those copies,
> > making everything refer to the uniform array directly.
> >
> > A number of shaders in "Deus Ex: Mankind Divided" recently exposed a
> > limitation of copy propagation - if we had any intrinsics (i.e. image
> > access in a compute shader), we weren't able to get rid of these copies.
> >
> > That meant that any variable indexing remained on the temporary array
> > rather being moved to the uniform array.  i965's scalar backend
> > currently doesn't support indirect addressing of temporary arrays,
> > which meant lowering it to if-ladders.  This was horrible.
> >
> > On Skylake:
> >
> > total instructions in shared programs: 13700090 -> 13654519 (-0.33%)
> > instructions in affected programs: 56438 -> 10867 (-80.75%)
> 
> Wow!
> 
> > helped: 14
> > HURT: 0
> >
> > total cycles in shared programs: 288879704 -> 291270232 (0.83%)
> > cycles in affected programs: 12758080 -> 15148608 (18.74%)
> 
> ... that seems nuts?
> 
> Any idea what's going on with the cycle counts?

Good point...I glossed over the cycle counts when I saw the -80%
reduction in instructions with 0 shaders hurt.  But they do look
pretty bad, so let's take a closer look...

There are two nearly identical shaders that are the worst offenders:

shaders/closed/steam/deus-ex-mankind-divided/256.shader_test CS SIMD16:

instructions: 2770 -> 253 (-2,517 instructions or -90.87%)
spills: 25 -> 0
fills:  29 -> 0
cycles: 923266 -> 1420534 (+497,268 cycles or +53.86%)
compile time: 2.73 seconds -> 0.17 seconds

There are three loops in the program, each of which contains two
indirect reads of the uvec4[98] constant array.

Before this patch, there were:
 - 67 UNIFORM_PULL_CONSTANT_LOADs at the top of the program
 - 1 UNIFORM_PULL_CONSTANT_LOAD in the first (cheap) loop
 - 1 UNIFORM_PULL_CONSTANT_LOAD in the second (expensive) loop
 - 1 UNIFORM_PULL_CONSTANT_LOAD in the third (very expensive) loop

After this patch, there are:
 - 0 loads at the top of the program
 - 1 VARYING_PULL_CONSTANT_LOAD in the first (cheap) loop
 - 2 VARYING_PULL_CONSTANT_LOAD in the second (expensive) loop
 - 2 VARYING_PULL_CONSTANT_LOAD in the third (very expensive) loop

The array indexes in the expensive loop are a[foo] and a[foo + 1].
foo is modified in the loop, so they can't be hoisted out.  I don't
think we can determine the number of loop iterations.

The two expensive loops look to be twice as expensive after this patch.
The numbers aren't quite adding up for me - it looks like we should
spend 200 more cycles per loop iteration, but the loops are like 40,000
-> 90,000 cycles.

I'm not sure what to do with this information.  Eliminating 90% of the
instructions seems good.  Requiring no scratch access seems good.
Eliminating the 67 memory loads outside of the loops seems good.
Doing two memory loads per loop doesn't seem too crazy, given that
it matches the GLSL source code.  Burning 49 registers to store the
entire array for the lifetime of the program seems pretty crazy...

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev