date:20180723

Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

2018-07-23 Thread Timothy Arceri


On 24/07/18 14:24, Marek Olšák wrote:

On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri  wrote:

On 24/07/18 11:15, Marek Olšák wrote:


On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie  wrote:


On 20 July 2018 at 13:12, Marek Olšák  wrote:


From: Marek Olšák 

To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
finish sooner on the older CPUs. (otherwise it gets killed and we fail
the test)



I think this is possibly a bad idea, since it's clear LLVM has some
pathalogical
behaviour the AMDGPU backend for this shader and we are just papering
over it.

A quick dig into LLVM shows horrible misuse of a SmallVector data
structure
for what ends up having 2000 entries in it.

I'm not going to out right NAK this, but it would be nice to have it
accompanied
by a pointer to an llvm bug against the amdgpu backend for the
pathalogical case.



Even if I comment out the push_back call in LLVM, it's still too slow.
(the dEQP test times out and fails) LLVMCodeGenLevelLess is faster,
but I don't know yet if it's enough for the test.



I hard-coded the second buffer block to column_major rather than row_major
which reduced total run time from 15 -> 9 seconds on my machine. So it seems
temps would definitely help. Proper packing support would also likely help a
little more but not as much.


Can you please describe how temps would help?


I already have :)

https://lists.freedesktop.org/archives/mesa-dev/2018-July/200710.html



Thanks,
Marek


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

2018-07-23 Thread Marek Olšák

On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri  wrote:
> On 24/07/18 11:15, Marek Olšák wrote:
>>
>> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie  wrote:
>>>
>>> On 20 July 2018 at 13:12, Marek Olšák  wrote:

 From: Marek Olšák 

 To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
 finish sooner on the older CPUs. (otherwise it gets killed and we fail
 the test)
>>>
>>>
>>> I think this is possibly a bad idea, since it's clear LLVM has some
>>> pathalogical
>>> behaviour the AMDGPU backend for this shader and we are just papering
>>> over it.
>>>
>>> A quick dig into LLVM shows horrible misuse of a SmallVector data
>>> structure
>>> for what ends up having 2000 entries in it.
>>>
>>> I'm not going to out right NAK this, but it would be nice to have it
>>> accompanied
>>> by a pointer to an llvm bug against the amdgpu backend for the
>>> pathalogical case.
>>
>>
>> Even if I comment out the push_back call in LLVM, it's still too slow.
>> (the dEQP test times out and fails) LLVMCodeGenLevelLess is faster,
>> but I don't know yet if it's enough for the test.
>
>
> I hard-coded the second buffer block to column_major rather than row_major
> which reduced total run time from 15 -> 9 seconds on my machine. So it seems
> temps would definitely help. Proper packing support would also likely help a
> little more but not as much.

Can you please describe how temps would help?

Thanks,
Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] gallium/u_vbuf: handle indirect multidraws correctly and efficiently (v2)

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

v2: need to do MAX{start+count} instead of MAX{count}
added piglit tests
---
 src/gallium/auxiliary/util/u_vbuf.c | 199 
 1 file changed, 175 insertions(+), 24 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_vbuf.c 
b/src/gallium/auxiliary/util/u_vbuf.c
index 746ff1085ce..ca53e6218fd 100644
--- a/src/gallium/auxiliary/util/u_vbuf.c
+++ b/src/gallium/auxiliary/util/u_vbuf.c
@@ -1124,20 +1124,45 @@ static void u_vbuf_set_driver_vertex_buffers(struct 
u_vbuf *mgr)
unsigned start_slot, count;
 
start_slot = ffs(mgr->dirty_real_vb_mask) - 1;
count = util_last_bit(mgr->dirty_real_vb_mask >> start_slot);
 
pipe->set_vertex_buffers(pipe, start_slot, count,
 mgr->real_vertex_buffer + start_slot);
mgr->dirty_real_vb_mask = 0;
 }
 
+static void
+u_vbuf_split_indexed_multidraw(struct u_vbuf *mgr, struct pipe_draw_info *info,
+   unsigned *indirect_data, unsigned stride,
+   unsigned draw_count)
+{
+   assert(info->index_size);
+   info->indirect = NULL;
+
+   for (unsigned i = 0; i < draw_count; i++) {
+  unsigned offset = i * stride / 4;
+
+  info->count = indirect_data[offset + 0];
+  info->instance_count = indirect_data[offset + 1];
+
+  if (!info->count || !info->instance_count)
+ continue;
+
+  info->start = indirect_data[offset + 2];
+  info->index_bias = indirect_data[offset + 3];
+  info->start_instance = indirect_data[offset + 4];
+
+  u_vbuf_draw_vbo(mgr, info);
+   }
+}
+
 void u_vbuf_draw_vbo(struct u_vbuf *mgr, const struct pipe_draw_info *info)
 {
struct pipe_context *pipe = mgr->pipe;
int start_vertex;
unsigned min_index;
unsigned num_vertices;
boolean unroll_indices = FALSE;
const uint32_t used_vb_mask = mgr->ve->used_vb_mask;
uint32_t user_vb_mask = mgr->user_vb_mask & used_vb_mask;
const uint32_t incompatible_vb_mask =
@@ -1153,47 +1178,172 @@ void u_vbuf_draw_vbo(struct u_vbuf *mgr, const struct 
pipe_draw_info *info)
   if (mgr->dirty_real_vb_mask & used_vb_mask) {
  u_vbuf_set_driver_vertex_buffers(mgr);
   }
 
   pipe->draw_vbo(pipe, info);
   return;
}
 
new_info = *info;
 
-   /* Fallback. We need to know all the parameters. */
+   /* Handle indirect (multi)draws. */
if (new_info.indirect) {
-  struct pipe_transfer *transfer = NULL;
-  int *data;
-
-  if (new_info.index_size) {
- data = pipe_buffer_map_range(pipe, new_info.indirect->buffer,
-  new_info.indirect->offset, 20,
-  PIPE_TRANSFER_READ, );
- new_info.index_bias = data[3];
- new_info.start_instance = data[4];
-  }
-  else {
- data = pipe_buffer_map_range(pipe, new_info.indirect->buffer,
-  new_info.indirect->offset, 16,
-  PIPE_TRANSFER_READ, );
- new_info.start_instance = data[3];
+  const struct pipe_draw_indirect_info *indirect = new_info.indirect;
+  unsigned draw_count = 0;
+
+  /* Get the number of draws. */
+  if (indirect->indirect_draw_count) {
+ pipe_buffer_read(pipe, indirect->indirect_draw_count,
+  indirect->indirect_draw_count_offset,
+  4, _count);
+  } else {
+ draw_count = indirect->draw_count;
   }
 
-  new_info.count = data[0];
-  new_info.instance_count = data[1];
-  new_info.start = data[2];
-  pipe_buffer_unmap(pipe, transfer);
-  new_info.indirect = NULL;
-
-  if (!new_info.count)
+  if (!draw_count)
  return;
+
+  unsigned data_size = (draw_count - 1) * indirect->stride +
+   (new_info.index_size ? 20 : 16);
+  unsigned *data = alloca(data_size);
+
+  /* Read the used buffer range only once, because the read can be
+   * uncached.
+   */
+  pipe_buffer_read(pipe, indirect->buffer, indirect->offset, data_size,
+   data);
+
+  if (info->index_size) {
+ /* Indexed multidraw. */
+ unsigned index_bias0 = data[3];
+ bool index_bias_same = true;
+
+ /* If we invoke the translate path, we have to split the multidraw. */
+ if (incompatible_vb_mask ||
+ mgr->ve->incompatible_elem_mask) {
+u_vbuf_split_indexed_multidraw(mgr, _info, data,
+   indirect->stride, draw_count);
+return;
+ }
+
+ /* See if index_bias is the same for all draws. */
+ for (unsigned i = 1; i < draw_count; i++) {
+if (data[i * indirect->stride / 4 + 3] != index_bias0) {
+   index_bias_same = false;
+   break;
+}
+ }
+
+ /* Split the multidraw if index_bias is different. */
+ if

[Mesa-dev] [PATCH] mesa: allow indirect draws with the default VAO and compatibility profile

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

---
 src/mesa/main/draw_validate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/draw_validate.c b/src/mesa/main/draw_validate.c
index c0a234a2bc2..29304bd5144 100644
--- a/src/mesa/main/draw_validate.c
+++ b/src/mesa/main/draw_validate.c
@@ -1078,21 +1078,22 @@ valid_draw_indirect(struct gl_context *ctx,
 {
const uint64_t end = (uint64_t) (uintptr_t) indirect + size;
 
/* OpenGL ES 3.1 spec. section 10.5:
 *
 *  "DrawArraysIndirect requires that all data sourced for the
 *  command, including the DrawArraysIndirectCommand
 *  structure,  be in buffer objects,  and may not be called when
 *  the default vertex array object is bound."
 */
-   if (ctx->Array.VAO == ctx->Array.DefaultVAO) {
+   if (ctx->API != API_OPENGL_COMPAT &&
+   ctx->Array.VAO == ctx->Array.DefaultVAO) {
   _mesa_error(ctx, GL_INVALID_OPERATION, "(no VAO bound)");
   return GL_FALSE;
}
 
/* From OpenGL ES 3.1 spec. section 10.5:
 * "An INVALID_OPERATION error is generated if zero is bound to
 * VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled
 * vertex array."
 *
 * Here we check that for each enabled vertex array we have a vertex
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] gallium/u_vbuf: split u_vbuf_get_minmax_index function (v2)

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

This will be used by indirect multidraws.

v2: clean up the function further, change return types to unsigned
---
 src/gallium/auxiliary/util/u_vbuf.c | 101 ++--
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_vbuf.c 
b/src/gallium/auxiliary/util/u_vbuf.c
index c04a18a6764..746ff1085ce 100644
--- a/src/gallium/auxiliary/util/u_vbuf.c
+++ b/src/gallium/auxiliary/util/u_vbuf.c
@@ -1015,111 +1015,111 @@ static boolean 
u_vbuf_mapping_vertex_buffer_blocks(const struct u_vbuf *mgr)
 * We could query whether each buffer is busy, but that would
 * be way more costly than this. */
return (mgr->ve->used_vb_mask &
(~mgr->user_vb_mask &
 ~mgr->incompatible_vb_mask &
 mgr->ve->compatible_vb_mask_all &
 mgr->ve->noninstance_vb_mask_any &
 mgr->nonzero_stride_vb_mask)) != 0;
 }
 
-static void u_vbuf_get_minmax_index(struct pipe_context *pipe,
-const struct pipe_draw_info *info,
-int *out_min_index, int *out_max_index)
+static void
+u_vbuf_get_minmax_index_mapped(const struct pipe_draw_info *info,
+   const void *indices, unsigned *out_min_index,
+   unsigned *out_max_index)
 {
-   struct pipe_transfer *transfer = NULL;
-   const void *indices;
-   unsigned i;
-
-   if (info->has_user_indices) {
-  indices = (uint8_t*)info->index.user +
-info->start * info->index_size;
-   } else {
-  indices = pipe_buffer_map_range(pipe, info->index.resource,
-  info->start * info->index_size,
-  info->count * info->index_size,
-  PIPE_TRANSFER_READ, );
-   }
+   unsigned max = 0;
+   unsigned min = ~0u;
 
switch (info->index_size) {
case 4: {
   const unsigned *ui_indices = (const unsigned*)indices;
-  unsigned max_ui = 0;
-  unsigned min_ui = ~0U;
   if (info->primitive_restart) {
- for (i = 0; i < info->count; i++) {
+ for (unsigned i = 0; i < info->count; i++) {
 if (ui_indices[i] != info->restart_index) {
-   if (ui_indices[i] > max_ui) max_ui = ui_indices[i];
-   if (ui_indices[i] < min_ui) min_ui = ui_indices[i];
+   if (ui_indices[i] > max) max = ui_indices[i];
+   if (ui_indices[i] < min) min = ui_indices[i];
 }
  }
   }
   else {
- for (i = 0; i < info->count; i++) {
-if (ui_indices[i] > max_ui) max_ui = ui_indices[i];
-if (ui_indices[i] < min_ui) min_ui = ui_indices[i];
+ for (unsigned i = 0; i < info->count; i++) {
+if (ui_indices[i] > max) max = ui_indices[i];
+if (ui_indices[i] < min) min = ui_indices[i];
  }
   }
-  *out_min_index = min_ui;
-  *out_max_index = max_ui;
   break;
}
case 2: {
   const unsigned short *us_indices = (const unsigned short*)indices;
-  unsigned max_us = 0;
-  unsigned min_us = ~0U;
   if (info->primitive_restart) {
- for (i = 0; i < info->count; i++) {
+ for (unsigned i = 0; i < info->count; i++) {
 if (us_indices[i] != info->restart_index) {
-   if (us_indices[i] > max_us) max_us = us_indices[i];
-   if (us_indices[i] < min_us) min_us = us_indices[i];
+   if (us_indices[i] > max) max = us_indices[i];
+   if (us_indices[i] < min) min = us_indices[i];
 }
  }
   }
   else {
- for (i = 0; i < info->count; i++) {
-if (us_indices[i] > max_us) max_us = us_indices[i];
-if (us_indices[i] < min_us) min_us = us_indices[i];
+ for (unsigned i = 0; i < info->count; i++) {
+if (us_indices[i] > max) max = us_indices[i];
+if (us_indices[i] < min) min = us_indices[i];
  }
   }
-  *out_min_index = min_us;
-  *out_max_index = max_us;
   break;
}
case 1: {
   const unsigned char *ub_indices = (const unsigned char*)indices;
-  unsigned max_ub = 0;
-  unsigned min_ub = ~0U;
   if (info->primitive_restart) {
- for (i = 0; i < info->count; i++) {
+ for (unsigned i = 0; i < info->count; i++) {
 if (ub_indices[i] != info->restart_index) {
-   if (ub_indices[i] > max_ub) max_ub = ub_indices[i];
-   if (ub_indices[i] < min_ub) min_ub = ub_indices[i];
+   if (ub_indices[i] > max) max = ub_indices[i];
+   if (ub_indices[i] < min) min = ub_indices[i];
 }
  }
   }
   else {
- for (i = 0; i < info->count; i++) {
-if (ub_indices[i] > max_ub) max_ub = ub_indices[i];
-if (ub_indices[i] < min_ub) min_ub = ub_indices[i];
+ for

Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

2018-07-23 Thread Timothy Arceri


On 24/07/18 11:15, Marek Olšák wrote:

On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie  wrote:

On 20 July 2018 at 13:12, Marek Olšák  wrote:

From: Marek Olšák 

To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
finish sooner on the older CPUs. (otherwise it gets killed and we fail
the test)


I think this is possibly a bad idea, since it's clear LLVM has some pathalogical
behaviour the AMDGPU backend for this shader and we are just papering over it.

A quick dig into LLVM shows horrible misuse of a SmallVector data structure
for what ends up having 2000 entries in it.

I'm not going to out right NAK this, but it would be nice to have it accompanied
by a pointer to an llvm bug against the amdgpu backend for the
pathalogical case.


Even if I comment out the push_back call in LLVM, it's still too slow.
(the dEQP test times out and fails) LLVMCodeGenLevelLess is faster,
but I don't know yet if it's enough for the test.


I hard-coded the second buffer block to column_major rather than 
row_major which reduced total run time from 15 -> 9 seconds on my 
machine. So it seems temps would definitely help. Proper packing support 
would also likely help a little more but not as much.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

2018-07-23 Thread Marek Olšák

On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie  wrote:
> On 20 July 2018 at 13:12, Marek Olšák  wrote:
>> From: Marek Olšák 
>>
>> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
>> finish sooner on the older CPUs. (otherwise it gets killed and we fail
>> the test)
>
> I think this is possibly a bad idea, since it's clear LLVM has some 
> pathalogical
> behaviour the AMDGPU backend for this shader and we are just papering over it.
>
> A quick dig into LLVM shows horrible misuse of a SmallVector data structure
> for what ends up having 2000 entries in it.
>
> I'm not going to out right NAK this, but it would be nice to have it 
> accompanied
> by a pointer to an llvm bug against the amdgpu backend for the
> pathalogical case.

Even if I comment out the push_back call in LLVM, it's still too slow.
(the dEQP test times out and fails) LLVMCodeGenLevelLess is faster,
but I don't know yet if it's enough for the test.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/8] ac: add support for 16bit UBO loads

2018-07-23 Thread Marek Olšák

On Thu, Jul 19, 2018 at 8:48 AM, Daniel Schürmann
 wrote:
> Signed-off-by: Daniel Schürmann 
> ---
>  src/amd/common/ac_llvm_build.c  | 25 +
>  src/amd/common/ac_llvm_build.h  |  8 
>  src/amd/common/ac_nir_to_llvm.c | 21 ++---
>  3 files changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 4078b005e5..54b7e98701 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -1103,6 +1103,31 @@ LLVMValueRef 
> ac_build_buffer_load_format_gfx9_safe(struct ac_llvm_context *ctx,
>can_speculate, true);
>  }
>
> +LLVMValueRef
> +ac_build_tbuffer_load_short(struct ac_llvm_context *ctx,
> +   LLVMValueRef rsrc,
> +   LLVMValueRef vindex,
> +   LLVMValueRef voffset,
> +   LLVMValueRef soffset,
> +   LLVMValueRef immoffset)
> +{
> +   const char *name = "llvm.amdgcn.tbuffer.load.i32";
> +   LLVMTypeRef type = ctx->i32;
> +   LLVMValueRef params[] = {
> +   rsrc,
> +   vindex,
> +   voffset,
> +   soffset,
> +   immoffset,
> +   LLVMConstInt(ctx->i32, 
> V_008F0C_BUF_DATA_FORMAT_16, false),
> +   LLVMConstInt(ctx->i32, 
> V_008F0C_BUF_NUM_FORMAT_UINT, false),
> +   ctx->i1false,
> +   ctx->i1false,
> +   };
> +   LLVMValueRef res = ac_build_intrinsic(ctx, name, type, params, 9, 0);
> +   return LLVMBuildTrunc(ctx->builder, res, ctx->i16, "");
> +}
> +
>  /**
>   * Set range metadata on an instruction.  This can only be used on load and
>   * call instructions.  If you know an instruction can only produce the values
> diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
> index 4e7cbcd5fa..c5753037e7 100644
> --- a/src/amd/common/ac_llvm_build.h
> +++ b/src/amd/common/ac_llvm_build.h
> @@ -252,6 +252,14 @@ LLVMValueRef 
> ac_build_buffer_load_format_gfx9_safe(struct ac_llvm_context *ctx,
>bool glc,
>bool can_speculate);
>
> +LLVMValueRef
> +ac_build_tbuffer_load_short(struct ac_llvm_context *ctx,
> +   LLVMValueRef rsrc,
> +   LLVMValueRef vindex,
> +   LLVMValueRef voffset,
> +   LLVMValueRef soffset,
> +   LLVMValueRef immoffset);
> +
>  LLVMValueRef
>  ac_get_thread_id(struct ac_llvm_context *ctx);
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 43a0b86420..d7a52a536c 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1663,9 +1663,24 @@ static LLVMValueRef visit_load_ubo_buffer(struct 
> ac_nir_context *ctx,
> if (instr->dest.ssa.bit_size == 64)
> num_components *= 2;
>
> -   ret = ac_build_buffer_load(>ac, rsrc, num_components, NULL, 
> offset,
> -  NULL, 0, false, false, true, true);
> -   ret = ac_trim_vector(>ac, ret, num_components);
> +   if (instr->dest.ssa.bit_size == 16) {
> +   LLVMValueRef results[num_components];
> +   for (unsigned i = 0; i < num_components; ++i) {
> +   results[i] = ac_build_tbuffer_load_short(>ac,
> +rsrc,
> +
> ctx->ac.i32_0,
> +offset,
> +
> ctx->ac.i32_0,
> +
> LLVMConstInt(ctx->ac.i32, 2 * i, 0));
> +   }

FYI, tbuffer.load is significantly slower than SI.load.const. If the
offset is aligned to 4, SI.load.const + FPExt or ZExt/SExt would be
faster.

Marek

> +   ret = ac_build_gather_values(>ac, results, 
> num_components);
> +   } else {
> +   ret = ac_build_buffer_load(>ac, rsrc, num_components, 
> NULL, offset,
> +  NULL, 0, false, false, true, true);
> +
> +   ret = ac_trim_vector(>ac, ret, num_components);
> +   }
> +
> return LLVMBuildBitCast(ctx->ac.builder, ret,
> get_def_type(ctx, >dest.ssa), "");
>  }
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/7] st/mesa: generalize code for the compressed texture map/unmap fallback

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

in order to support ASTC
---
 src/mesa/state_tracker/st_cb_texture.c | 41 +++---
 src/mesa/state_tracker/st_texture.h| 13 
 2 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_texture.c 
b/src/mesa/state_tracker/st_cb_texture.c
index b345b2c6d8b..ecd1f4ef339 100644
--- a/src/mesa/state_tracker/st_cb_texture.c
+++ b/src/mesa/state_tracker/st_cb_texture.c
@@ -200,23 +200,23 @@ st_FreeTextureImageBuffer(struct gl_context *ctx,
DBG("%s\n", __func__);
 
if (stImage->pt) {
   pipe_resource_reference(>pt, NULL);
}
 
free(stImage->transfer);
stImage->transfer = NULL;
stImage->num_transfers = 0;
 
-   if (stImage->etc_data) {
-  free(stImage->etc_data);
-  stImage->etc_data = NULL;
+   if (stImage->compressed_data) {
+  free(stImage->compressed_data);
+  stImage->compressed_data = NULL;
}
 
/* if the texture image is being deallocated, the structure of the
 * texture is changing so we'll likely need a new sampler view.
 */
st_texture_release_all_sampler_views(st, stObj);
 }
 
 bool
 st_compressed_format_fallback(struct st_context *st, mesa_format format)
@@ -224,36 +224,37 @@ st_compressed_format_fallback(struct st_context *st, 
mesa_format format)
if (format == MESA_FORMAT_ETC1_RGB8)
   return !st->has_etc1;
 
if (_mesa_is_format_etc2(format))
   return !st->has_etc2;
 
return false;
 }
 
 static void
-etc_fallback_allocate(struct st_context *st, struct st_texture_image *stImage)
+compressed_tex_fallback_allocate(struct st_context *st,
+ struct st_texture_image *stImage)
 {
struct gl_texture_image *texImage = >base;
 
if (!st_compressed_format_fallback(st, texImage->TexFormat))
   return;
 
-   if (stImage->etc_data)
-  free(stImage->etc_data);
+   if (stImage->compressed_data)
+  free(stImage->compressed_data);
 
unsigned data_size = _mesa_format_image_size(texImage->TexFormat,
 texImage->Width2,
 texImage->Height2,
 texImage->Depth2);
 
-   stImage->etc_data =
+   stImage->compressed_data =
   malloc(data_size * _mesa_num_tex_faces(texImage->TexObject->Target));
 }
 
 /** called via ctx->Driver.MapTextureImage() */
 static void
 st_MapTextureImage(struct gl_context *ctx,
struct gl_texture_image *texImage,
GLuint slice, GLuint x, GLuint y, GLuint w, GLuint h,
GLbitfield mode,
GLubyte **mapOut, GLint *rowStrideOut)
@@ -268,36 +269,42 @@ st_MapTextureImage(struct gl_context *ctx,
 GL_MAP_WRITE_BIT |
 GL_MAP_INVALIDATE_RANGE_BIT)) == 0);
 
const enum pipe_transfer_usage transfer_flags =
   st_access_flags_to_transfer_flags(mode, false);
 
map = st_texture_image_map(st, stImage, transfer_flags, x, y, slice, w, h, 
1,
   );
if (map) {
   if (st_compressed_format_fallback(st, texImage->TexFormat)) {
- /* ETC isn't supported by all gallium drivers, where it's represented
-  * by uncompressed formats. We store the compressed data (as it's
-  * needed for image copies in OES_copy_image), and decompress as
-  * necessary in Unmap.
+ /* Some compressed formats don't have to be supported by drivers,
+  * and st/mesa transparently handles decompression on upload (Unmap),
+  * so that drivers don't see the compressed formats.
   *
-  * Note: all ETC1/ETC2 formats have 4x4 block sizes.
+  * We store the compressed data (it's needed for glGetCompressedTex-
+  * Image and image copies in OES_copy_image).
   */
  unsigned z = transfer->box.z;
  struct st_texture_image_transfer *itransfer = >transfer[z];
 
- unsigned bytes = _mesa_get_format_bytes(texImage->TexFormat);
+ unsigned blk_w, blk_h;
+ _mesa_get_format_block_size(texImage->TexFormat, _w, _h);
+
+ unsigned y_blocks = DIV_ROUND_UP(texImage->Height2, blk_h);
  unsigned stride = *rowStrideOut = itransfer->temp_stride =
 _mesa_format_row_stride(texImage->TexFormat, texImage->Width2);
+ unsigned block_size = _mesa_get_format_bytes(texImage->TexFormat);
+
  *mapOut = itransfer->temp_data =
-stImage->etc_data + ((x / 4) * bytes + (y / 4) * stride) +
-z * stride * texImage->Height2 / 4;
+stImage->compressed_data +
+(z * y_blocks + (y / blk_h)) * stride +
+(x / blk_w) * block_size;
  itransfer->map = map;
   }
   else {
  /* supported mapping */
  *mapOut = map;
  *rowStrideOut = transfer->stride;
   }
}
else {
   *mapOut = NULL;
@@ -624,21 +631,21 @@

[Mesa-dev] [PATCH 1/7] mesa: add ASTC 2D LDR decoder

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

---
 src/mesa/Makefile.sources  |2 +
 src/mesa/main/formats.c|   42 +
 src/mesa/main/formats.h|3 +
 src/mesa/main/texcompress_astc.cpp | 1871 
 src/mesa/main/texcompress_astc.h   |   47 +
 src/util/half_float.c  |   59 +
 src/util/half_float.h  |5 +
 7 files changed, 2029 insertions(+)
 create mode 100644 src/mesa/main/texcompress_astc.cpp
 create mode 100644 src/mesa/main/texcompress_astc.h

diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 63f3734c322..ae8934e2830 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -207,20 +207,22 @@ MAIN_FILES = \
main/shader_query.cpp \
main/shared.c \
main/shared.h \
main/state.c \
main/state.h \
main/stencil.c \
main/stencil.h \
main/syncobj.c \
main/syncobj.h \
main/texcompress.c \
+   main/texcompress_astc.cpp \
+   main/texcompress_astc.h \
main/texcompress_bptc.c \
main/texcompress_bptc.h \
main/texcompress_bptc_tmp.h \
main/texcompress_cpal.c \
main/texcompress_cpal.h \
main/texcompress_etc.c \
main/texcompress_etc.h \
main/texcompress_etc_tmp.h \
main/texcompress_fxt1.c \
main/texcompress_fxt1.h \
diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index fdb53afd570..d4cd5d2182c 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -620,20 +620,62 @@ _mesa_is_format_etc2(mesa_format format)
case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1:
case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
   return GL_TRUE;
default:
   return GL_FALSE;
}
 }
 
 
+/**
+ * Return TRUE if format is an ASTC 2D compressed format.
+ */
+bool
+_mesa_is_format_astc_2d(mesa_format format)
+{
+   switch (format) {
+   case MESA_FORMAT_RGBA_ASTC_4x4:
+   case MESA_FORMAT_RGBA_ASTC_5x4:
+   case MESA_FORMAT_RGBA_ASTC_5x5:
+   case MESA_FORMAT_RGBA_ASTC_6x5:
+   case MESA_FORMAT_RGBA_ASTC_6x6:
+   case MESA_FORMAT_RGBA_ASTC_8x5:
+   case MESA_FORMAT_RGBA_ASTC_8x6:
+   case MESA_FORMAT_RGBA_ASTC_8x8:
+   case MESA_FORMAT_RGBA_ASTC_10x5:
+   case MESA_FORMAT_RGBA_ASTC_10x6:
+   case MESA_FORMAT_RGBA_ASTC_10x8:
+   case MESA_FORMAT_RGBA_ASTC_10x10:
+   case MESA_FORMAT_RGBA_ASTC_12x10:
+   case MESA_FORMAT_RGBA_ASTC_12x12:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x4:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x5:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_8x5:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_8x6:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_8x8:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x5:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x6:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x8:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x10:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_12x10:
+   case MESA_FORMAT_SRGB8_ALPHA8_ASTC_12x12:
+  return true;
+   default:
+  return false;
+   }
+}
+
+
 /**
  * If the given format is a compressed format, return a corresponding
  * uncompressed format.
  */
 mesa_format
 _mesa_get_uncompressed_format(mesa_format format)
 {
switch (format) {
case MESA_FORMAT_RGB_FXT1:
   return MESA_FORMAT_BGR_UNORM8;
diff --git a/src/mesa/main/formats.h b/src/mesa/main/formats.h
index 2afa886782b..335e4de9955 100644
--- a/src/mesa/main/formats.h
+++ b/src/mesa/main/formats.h
@@ -714,20 +714,23 @@ _mesa_is_format_unsigned(mesa_format format);
 
 extern GLboolean
 _mesa_is_format_signed(mesa_format format);
 
 extern GLboolean
 _mesa_is_format_integer(mesa_format format);
 
 extern bool
 _mesa_is_format_etc2(mesa_format format);
 
+bool
+_mesa_is_format_astc_2d(mesa_format format);
+
 GLenum
 _mesa_is_format_color_format(mesa_format format);
 
 extern GLenum
 _mesa_get_format_color_encoding(mesa_format format);
 
 extern GLuint
 _mesa_format_image_size(mesa_format format, GLsizei width,
 GLsizei height, GLsizei depth);
 
diff --git a/src/mesa/main/texcompress_astc.cpp 
b/src/mesa/main/texcompress_astc.cpp
new file mode 100644
index 000..996e8ea28d6
--- /dev/null
+++ b/src/mesa/main/texcompress_astc.cpp
@@ -0,0 +1,1871 @@
+/*
+ * Copyright 2015 Philip Taylor 
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this

[Mesa-dev] [PATCH 5/7] st/mesa: generalize fallback_copy_image for compressed textures

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

in order to support ASTC
---
 src/mesa/state_tracker/st_cb_copyimage.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_copyimage.c 
b/src/mesa/state_tracker/st_cb_copyimage.c
index 5230b61877f..f5d8c047800 100644
--- a/src/mesa/state_tracker/st_cb_copyimage.c
+++ b/src/mesa/state_tracker/st_cb_copyimage.c
@@ -525,52 +525,57 @@ copy_image(struct pipe_context *pipe,
PIPE_FORMAT_R16G16_UNORM))
   return;
 
/* Only allow non-identity swizzling on RGBA8 formats. */
 
/* Simple copy, memcpy with swizzling, no format conversion. */
swizzled_copy(pipe, dst, dst_level, dstx, dsty, dstz, src, src_level,
  src_box);
 }
 
-/* Note, the only allowable compressed format for this function is ETC */
 static void
 fallback_copy_image(struct st_context *st,
 struct gl_texture_image *dst_image,
 struct pipe_resource *dst_res,
 int dst_x, int dst_y, int dst_z,
 struct gl_texture_image *src_image,
 struct pipe_resource *src_res,
 int src_x, int src_y, int src_z,
 int src_w, int src_h)
 {
uint8_t *dst, *src;
int dst_stride, src_stride;
struct pipe_transfer *dst_transfer, *src_transfer;
unsigned line_bytes;
 
bool dst_is_compressed = dst_image && 
_mesa_is_format_compressed(dst_image->TexFormat);
bool src_is_compressed = src_image && 
_mesa_is_format_compressed(src_image->TexFormat);
 
+   unsigned dst_blk_w = 1, dst_blk_h = 1, src_blk_w = 1, src_blk_h = 1;
+   if (dst_image)
+  _mesa_get_format_block_size(dst_image->TexFormat, _blk_w, 
_blk_h);
+   if (src_image)
+  _mesa_get_format_block_size(src_image->TexFormat, _blk_w, 
_blk_h);
+
unsigned dst_w = src_w;
unsigned dst_h = src_h;
unsigned lines = src_h;
 
if (src_is_compressed && !dst_is_compressed) {
-  dst_w = DIV_ROUND_UP(dst_w, 4);
-  dst_h = DIV_ROUND_UP(dst_h, 4);
+  dst_w = DIV_ROUND_UP(dst_w, src_blk_w);
+  dst_h = DIV_ROUND_UP(dst_h, src_blk_h);
} else if (!src_is_compressed && dst_is_compressed) {
-  dst_w *= 4;
-  dst_h *= 4;
+  dst_w *= dst_blk_w;
+  dst_h *= dst_blk_h;
}
if (src_is_compressed) {
-  lines = DIV_ROUND_UP(lines, 4);
+  lines = DIV_ROUND_UP(lines, src_blk_h);
}
 
if (src_image)
   line_bytes = _mesa_format_row_stride(src_image->TexFormat, src_w);
else
   line_bytes = _mesa_format_row_stride(dst_image->TexFormat, dst_w);
 
if (dst_image) {
   st->ctx->Driver.MapTextureImage(
 st->ctx, dst_image, dst_z,
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/7] st/mesa: use st_compressed_format_fallback more

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

---
 src/mesa/state_tracker/st_format.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index 9ae796eca9e..45513e8683e 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -45,20 +45,21 @@
 #include "main/macros.h"
 #include "main/formatquery.h"
 
 #include "pipe/p_context.h"
 #include "pipe/p_defines.h"
 #include "pipe/p_screen.h"
 #include "util/u_format.h"
 #include "st_cb_texture.h"
 #include "st_context.h"
 #include "st_format.h"
+#include "st_texture.h"
 
 
 /**
  * Translate Mesa format to Gallium format.
  */
 enum pipe_format
 st_mesa_format_to_pipe_format(const struct st_context *st,
   mesa_format mesaFormat)
 {
struct pipe_screen *screen = st->pipe->screen;
@@ -1039,41 +1040,34 @@ st_pipe_format_to_mesa_format(enum pipe_format format)
  */
 static void
 test_format_conversion(struct st_context *st)
 {
GLuint i;
 
/* test all Mesa formats */
for (i = 1; i < MESA_FORMAT_COUNT; i++) {
   enum pipe_format pf;
 
-  /* ETC formats are translated differently, skip them. */
-  if (_mesa_is_format_etc2(i))
- continue;
-  if (i == MESA_FORMAT_ETC1_RGB8 && !st->has_etc1)
+  if (st_compressed_format_fallback(st, i))
  continue;
 
   pf = st_mesa_format_to_pipe_format(st, i);
   if (pf != PIPE_FORMAT_NONE) {
  mesa_format MAYBE_UNUSED mf = st_pipe_format_to_mesa_format(pf);
  assert(mf == i);
   }
}
 
/* Test all Gallium formats */
for (i = 1; i < PIPE_FORMAT_COUNT; i++) {
-  /* ETC formats are translated differently, skip them. */
-  if (i == PIPE_FORMAT_ETC1_RGB8 && !st->has_etc1)
- continue;
-
   mesa_format mf = st_pipe_format_to_mesa_format(i);
-  if (_mesa_is_format_etc2(mf) && !st->has_etc2)
+  if (st_compressed_format_fallback(st, mf))
  continue;
 
   if (mf != MESA_FORMAT_NONE) {
  enum pipe_format MAYBE_UNUSED pf =
 st_mesa_format_to_pipe_format(st, mf);
  assert(pf == i);
   }
}
 }
 
@@ -2340,24 +2334,22 @@ st_ChooseTextureFormat(struct gl_context *ctx, GLenum 
target,
   pTarget, 0, bindings, GL_TRUE);
 
if (pFormat == PIPE_FORMAT_NONE && !is_renderbuffer) {
   /* try choosing format again, this time without render target bindings */
   pFormat = st_choose_format(st, internalFormat, format, type,
  pTarget, 0, PIPE_BIND_SAMPLER_VIEW,
  GL_TRUE);
}
 
if (pFormat == PIPE_FORMAT_NONE) {
-  /* lie about using etc1/etc2 natively if we do decoding tricks */
   mFormat = _mesa_glenum_to_compressed_format(internalFormat);
-  if ((mFormat == MESA_FORMAT_ETC1_RGB8 && !st->has_etc1) ||
-  (_mesa_is_format_etc2(mFormat) && !st->has_etc2))
+  if (st_compressed_format_fallback(st, mFormat))
   return mFormat;
 
   /* no luck at all */
   return MESA_FORMAT_NONE;
}
 
mFormat = st_pipe_format_to_mesa_format(pFormat);
 
/* Debugging aid */
if (0) {
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 7/7] st/mesa: implement ASTC 2D LDR fallback for all drivers

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

---
 src/mesa/state_tracker/st_cb_texture.c | 17 +++-
 src/mesa/state_tracker/st_context.c|  3 ++
 src/mesa/state_tracker/st_context.h|  1 +
 src/mesa/state_tracker/st_extensions.c |  5 +++
 src/mesa/state_tracker/st_format.c | 56 ++
 5 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_texture.c 
b/src/mesa/state_tracker/st_cb_texture.c
index 80eb171dfd8..56d8c411472 100644
--- a/src/mesa/state_tracker/st_cb_texture.c
+++ b/src/mesa/state_tracker/st_cb_texture.c
@@ -34,20 +34,21 @@
 #include "main/format_utils.h"
 #include "main/glformats.h"
 #include "main/image.h"
 #include "main/imports.h"
 #include "main/macros.h"
 #include "main/mipmap.h"
 #include "main/pack.h"
 #include "main/pbo.h"
 #include "main/pixeltransfer.h"
 #include "main/texcompress.h"
+#include "main/texcompress_astc.h"
 #include "main/texcompress_etc.h"
 #include "main/texgetimage.h"
 #include "main/teximage.h"
 #include "main/texobj.h"
 #include "main/texstore.h"
 
 #include "state_tracker/st_debug.h"
 #include "state_tracker/st_context.h"
 #include "state_tracker/st_cb_bitmap.h"
 #include "state_tracker/st_cb_fbo.h"
@@ -220,20 +221,23 @@ st_FreeTextureImageBuffer(struct gl_context *ctx,
 
 bool
 st_compressed_format_fallback(struct st_context *st, mesa_format format)
 {
if (format == MESA_FORMAT_ETC1_RGB8)
   return !st->has_etc1;
 
if (_mesa_is_format_etc2(format))
   return !st->has_etc2;
 
+   if (_mesa_is_format_astc_2d(format))
+  return !st->has_astc_2d_ldr;
+
return false;
 }
 
 static void
 compressed_tex_fallback_allocate(struct st_context *st,
  struct st_texture_image *stImage)
 {
struct gl_texture_image *texImage = >base;
 
if (!st_compressed_format_fallback(st, texImage->TexFormat))
@@ -330,28 +334,34 @@ st_UnmapTextureImage(struct gl_context *ctx,
   struct pipe_transfer *transfer = itransfer->transfer;
 
   assert(z == transfer->box.z);
 
   if (transfer->usage & PIPE_TRANSFER_WRITE) {
  if (texImage->TexFormat == MESA_FORMAT_ETC1_RGB8) {
 _mesa_etc1_unpack_rgba(itransfer->map, transfer->stride,
itransfer->temp_data,
itransfer->temp_stride,
transfer->box.width, 
transfer->box.height);
- }
- else {
+ } else if (_mesa_is_format_etc2(texImage->TexFormat)) {
bool bgra = stImage->pt->format == PIPE_FORMAT_B8G8R8A8_SRGB;
 _mesa_unpack_etc2_format(itransfer->map, transfer->stride,
  itransfer->temp_data, 
itransfer->temp_stride,
  transfer->box.width, transfer->box.height,
 texImage->TexFormat,
 bgra);
+ } else if (_mesa_is_format_astc_2d(texImage->TexFormat)) {
+_mesa_unpack_astc_2d_ldr(itransfer->map, transfer->stride,
+ itransfer->temp_data, 
itransfer->temp_stride,
+ transfer->box.width, transfer->box.height,
+ texImage->TexFormat);
+ } else {
+unreachable("unexpected format for a compressed format fallback");
  }
   }
 
   itransfer->temp_data = NULL;
   itransfer->temp_stride = 0;
   itransfer->map = 0;
}
 
st_texture_image_unmap(st, stImage, slice);
 }
@@ -1408,20 +1418,21 @@ st_TexSubImage(struct gl_context *ctx, GLuint dims,
unsigned dstz = texImage->Face + texImage->TexObject->MinLayer;
unsigned dst_level = 0;
 
st_flush_bitmap_cache(st);
st_invalidate_readpix_cache(st);
 
if (stObj->pt == stImage->pt)
   dst_level = texImage->TexObject->MinLevel + texImage->Level;
 
assert(!_mesa_is_format_etc2(texImage->TexFormat) &&
+  !_mesa_is_format_astc_2d(texImage->TexFormat) &&
   texImage->TexFormat != MESA_FORMAT_ETC1_RGB8);
 
if (!dst)
   goto fallback;
 
/* Try texture_subdata, which should be the fastest memcpy path. */
if (pixels &&
!_mesa_is_bufferobj(unpack->BufferObj) &&
_mesa_texstore_can_use_memcpy(ctx, texImage->_BaseFormat,
  texImage->TexFormat, format, type,
@@ -1868,20 +1879,21 @@ st_GetTexSubImage(struct gl_context * ctx,
GLenum gl_target = texImage->TexObject->Target;
enum pipe_texture_target pipe_target;
unsigned dims;
struct pipe_blit_info blit;
unsigned bind;
struct pipe_transfer *tex_xfer;
ubyte *map = NULL;
boolean done = FALSE;
 
assert(!_mesa_is_format_etc2(texImage->TexFormat) &&
+  !_mesa_is_format_astc_2d(texImage->TexFormat) &&
   texImage->TexFormat != MESA_FORMAT_ETC1_RGB8);
 
st_flush_bitmap_cache(st);
 
if

[Mesa-dev] [PATCH 6/7] st/mesa: add ETC2 & ASTC fast path for GetTex(Sub)Image

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

Not sure if GL/GLES can hit this path, but it's just decompression.
---
 src/mesa/state_tracker/st_cb_texture.c | 41 ++
 1 file changed, 41 insertions(+)

diff --git a/src/mesa/state_tracker/st_cb_texture.c 
b/src/mesa/state_tracker/st_cb_texture.c
index ecd1f4ef339..80eb171dfd8 100644
--- a/src/mesa/state_tracker/st_cb_texture.c
+++ b/src/mesa/state_tracker/st_cb_texture.c
@@ -1960,35 +1960,76 @@ st_GetTexSubImage(struct gl_context * ctx,
   /* Set the appropriate format for the decompressed texture.
* Luminance and sRGB formats shouldn't appear here.*/
   switch (src_format) {
   case PIPE_FORMAT_DXT1_RGB:
   case PIPE_FORMAT_DXT1_RGBA:
   case PIPE_FORMAT_DXT3_RGBA:
   case PIPE_FORMAT_DXT5_RGBA:
   case PIPE_FORMAT_RGTC1_UNORM:
   case PIPE_FORMAT_RGTC2_UNORM:
   case PIPE_FORMAT_ETC1_RGB8:
+  case PIPE_FORMAT_ETC2_RGB8:
+  case PIPE_FORMAT_ETC2_RGB8A1:
+  case PIPE_FORMAT_ETC2_RGBA8:
+  case PIPE_FORMAT_ASTC_4x4:
+  case PIPE_FORMAT_ASTC_5x4:
+  case PIPE_FORMAT_ASTC_5x5:
+  case PIPE_FORMAT_ASTC_6x5:
+  case PIPE_FORMAT_ASTC_6x6:
+  case PIPE_FORMAT_ASTC_8x5:
+  case PIPE_FORMAT_ASTC_8x6:
+  case PIPE_FORMAT_ASTC_8x8:
+  case PIPE_FORMAT_ASTC_10x5:
+  case PIPE_FORMAT_ASTC_10x6:
+  case PIPE_FORMAT_ASTC_10x8:
+  case PIPE_FORMAT_ASTC_10x10:
+  case PIPE_FORMAT_ASTC_12x10:
+  case PIPE_FORMAT_ASTC_12x12:
   case PIPE_FORMAT_BPTC_RGBA_UNORM:
  dst_glformat = GL_RGBA8;
  break;
   case PIPE_FORMAT_RGTC1_SNORM:
   case PIPE_FORMAT_RGTC2_SNORM:
  if (!ctx->Extensions.EXT_texture_snorm)
 goto fallback;
  dst_glformat = GL_RGBA8_SNORM;
  break;
   case PIPE_FORMAT_BPTC_RGB_FLOAT:
   case PIPE_FORMAT_BPTC_RGB_UFLOAT:
  if (!ctx->Extensions.ARB_texture_float)
 goto fallback;
  dst_glformat = GL_RGBA32F;
  break;
+  case PIPE_FORMAT_ETC2_R11_UNORM:
+ if (!screen->is_format_supported(screen, PIPE_FORMAT_R16_UNORM,
+  pipe_target, 0, bind))
+goto fallback;
+ dst_glformat = GL_R16;
+ break;
+  case PIPE_FORMAT_ETC2_R11_SNORM:
+ if (!screen->is_format_supported(screen, PIPE_FORMAT_R16_SNORM,
+  pipe_target, 0, bind))
+goto fallback;
+ dst_glformat = GL_R16_SNORM;
+ break;
+  case PIPE_FORMAT_ETC2_RG11_UNORM:
+ if (!screen->is_format_supported(screen, PIPE_FORMAT_R16G16_UNORM,
+  pipe_target, 0, bind))
+goto fallback;
+ dst_glformat = GL_RG16;
+ break;
+  case PIPE_FORMAT_ETC2_RG11_SNORM:
+ if (!screen->is_format_supported(screen, PIPE_FORMAT_R16G16_SNORM,
+  pipe_target, 0, bind))
+goto fallback;
+ dst_glformat = GL_RG16_SNORM;
+ break;
   default:
  assert(0);
  goto fallback;
   }
 
   dst_format = st_choose_format(st, dst_glformat, format, type,
 pipe_target, 0, bind, FALSE);
 
   if (dst_format == PIPE_FORMAT_NONE) {
  /* unable to get an rgba format!?! */
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/7] ASTC texture compression for all Gallium drivers

2018-07-23 Thread Marek Olšák

Hi,

This series enables ASTC texture compression for all Gallium drivers
that don't support it in hardware. The works the same as the ETC2
fallback, i.e. it decompresses ASTC inside glCompressedTexImage to
a supported uncompressed format.

RadeonSI now finally supports the following:
- GL_KHR_texture_compression_astc_ldr
- GL_ANDROID_extension_pack_es31a
- OpenGL ES 3.2 !!!

All ASTC dEQP tests pass.

Please review.

Thanks,
Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/7] st/mesa: generalize st_etc_fallback -> st_compressed_format_fallback

2018-07-23 Thread Marek Olšák

From: Marek Olšák 

for ASTC support later
---
 src/mesa/state_tracker/st_cb_copyimage.c |  4 ++--
 src/mesa/state_tracker/st_cb_texture.c   | 24 ++--
 src/mesa/state_tracker/st_texture.h  |  2 +-
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_copyimage.c 
b/src/mesa/state_tracker/st_cb_copyimage.c
index d160c8c8d30..5230b61877f 100644
--- a/src/mesa/state_tracker/st_cb_copyimage.c
+++ b/src/mesa/state_tracker/st_cb_copyimage.c
@@ -661,22 +661,22 @@ st_CopyImageSubData(struct gl_context *ctx,
  dst_z += dst_image->TexObject->MinLayer;
   }
} else {
   struct st_renderbuffer *dst = st_renderbuffer(dst_renderbuffer);
   dst_res = dst->texture;
   dst_level = 0;
}
 
u_box_2d_zslice(src_x, src_y, src_z, src_width, src_height, );
 
-   if ((src_image && st_etc_fallback(st, src_image)) ||
-   (dst_image && st_etc_fallback(st, dst_image))) {
+   if ((src_image && st_compressed_format_fallback(st, src_image->TexFormat)) 
||
+   (dst_image && st_compressed_format_fallback(st, dst_image->TexFormat))) 
{
   fallback_copy_image(st, dst_image, dst_res, dst_x, dst_y, orig_dst_z,
   src_image, src_res, src_x, src_y, orig_src_z,
   src_width, src_height);
} else {
   copy_image(pipe, dst_res, dst_level, dst_x, dst_y, dst_z,
  src_res, src_level, );
}
 }
 
 void
diff --git a/src/mesa/state_tracker/st_cb_texture.c 
b/src/mesa/state_tracker/st_cb_texture.c
index 99209abcd62..b345b2c6d8b 100644
--- a/src/mesa/state_tracker/st_cb_texture.c
+++ b/src/mesa/state_tracker/st_cb_texture.c
@@ -212,32 +212,37 @@ st_FreeTextureImageBuffer(struct gl_context *ctx,
   stImage->etc_data = NULL;
}
 
/* if the texture image is being deallocated, the structure of the
 * texture is changing so we'll likely need a new sampler view.
 */
st_texture_release_all_sampler_views(st, stObj);
 }
 
 bool
-st_etc_fallback(struct st_context *st, struct gl_texture_image *texImage)
+st_compressed_format_fallback(struct st_context *st, mesa_format format)
 {
-   return (_mesa_is_format_etc2(texImage->TexFormat) && !st->has_etc2) ||
-  (texImage->TexFormat == MESA_FORMAT_ETC1_RGB8 && !st->has_etc1);
+   if (format == MESA_FORMAT_ETC1_RGB8)
+  return !st->has_etc1;
+
+   if (_mesa_is_format_etc2(format))
+  return !st->has_etc2;
+
+   return false;
 }
 
 static void
 etc_fallback_allocate(struct st_context *st, struct st_texture_image *stImage)
 {
struct gl_texture_image *texImage = >base;
 
-   if (!st_etc_fallback(st, texImage))
+   if (!st_compressed_format_fallback(st, texImage->TexFormat))
   return;
 
if (stImage->etc_data)
   free(stImage->etc_data);
 
unsigned data_size = _mesa_format_image_size(texImage->TexFormat,
 texImage->Width2,
 texImage->Height2,
 texImage->Depth2);
 
@@ -262,21 +267,21 @@ st_MapTextureImage(struct gl_context *ctx,
assert((mode & ~(GL_MAP_READ_BIT |
 GL_MAP_WRITE_BIT |
 GL_MAP_INVALIDATE_RANGE_BIT)) == 0);
 
const enum pipe_transfer_usage transfer_flags =
   st_access_flags_to_transfer_flags(mode, false);
 
map = st_texture_image_map(st, stImage, transfer_flags, x, y, slice, w, h, 
1,
   );
if (map) {
-  if (st_etc_fallback(st, texImage)) {
+  if (st_compressed_format_fallback(st, texImage->TexFormat)) {
  /* ETC isn't supported by all gallium drivers, where it's represented
   * by uncompressed formats. We store the compressed data (as it's
   * needed for image copies in OES_copy_image), and decompress as
   * necessary in Unmap.
   *
   * Note: all ETC1/ETC2 formats have 4x4 block sizes.
   */
  unsigned z = transfer->box.z;
  struct st_texture_image_transfer *itransfer = >transfer[z];
 
@@ -303,22 +308,23 @@ st_MapTextureImage(struct gl_context *ctx,
 
 /** called via ctx->Driver.UnmapTextureImage() */
 static void
 st_UnmapTextureImage(struct gl_context *ctx,
  struct gl_texture_image *texImage,
  GLuint slice)
 {
struct st_context *st = st_context(ctx);
struct st_texture_image *stImage  = st_texture_image(texImage);
 
-   if (st_etc_fallback(st, texImage)) {
-  /* Decompress the ETC texture to the mapped one. */
+   if (st_compressed_format_fallback(st, texImage->TexFormat)) {
+  /* Decompress the compressed image on upload if the driver doesn't
+   * support the compressed format. */
   unsigned z = slice + stImage->base.Face;
   struct st_texture_image_transfer *itransfer = >transfer[z];
   struct pipe_transfer *transfer = itransfer->transfer;
 
   assert(z == transfer->box.z);

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #10 from Bas Nieuwenhuizen  ---
If you're in a position to get more useful info, it would be great if you could
dump the *pCreateInfo arg of the crashing radv_CreateImage.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #9 from Bas Nieuwenhuizen  ---
Yes, images not explicitly using the path for Android cross-api interop should
just follow the normal path.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #8 from Mauro Rossi  ---
In the code between ANDROID braces, is it correct that code falls through in
case of !gralloc_info or should it bail out?

Just a question as I may be very wrong and confused


#ifdef ANDROID
   const VkNativeBufferANDROID *gralloc_info =
  vk_find_struct_const(pCreateInfo->pNext, NATIVE_BUFFER_ANDROID);

if (gralloc_info)/*  this line 1252  */
   return radv_image_from_gralloc(device, pCreateInfo, gralloc_info,
  pAllocator, pImage);
#endif

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #7 from Bas Nieuwenhuizen  ---
Looking at the full backtrace it looks like plain texture creation, nothing
Android specific so I'll need to debug the specific app sometime.

07-22 21:31:16.071  5007  5007 F DEBUG   : backtrace:
07-22 21:31:16.071  5007  5007 F DEBUG   : #00 pc 000f1d47 
/system/vendor/lib/hw/vulkan.radv.so (radv_image_create+199)
07-22 21:31:16.071  5007  5007 F DEBUG   : #01 pc 000f2f26 
/system/vendor/lib/hw/vulkan.radv.so (radv_CreateImage+166)
07-22 21:31:16.071  5007  5007 F DEBUG   : #02 pc b59a 
/android/system/lib/libvulkan.so
07-22 21:31:16.071  5007  5007 F DEBUG   : #03 pc 001488fe 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(eva::graphics::vulkan::native_memory_utility::create_texture(VkImageCreateInfo
const&)+78)
07-22 21:31:16.071  5007  5007 F DEBUG   : #04 pc 0013ffe9 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(eva::graphics::vulkan::vulkan_device::create_texture(eva::graphics::texture_desc
const&, eva::graphics::subresource_data const*)+233)
07-22 21:31:16.071  5007  5007 F DEBUG   : #05 pc 001458cf 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
07-22 21:31:16.071  5007  5007 F DEBUG   : #06 pc 0013e474 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(eva::graphics::vulkan::create_dds_texture_from_memory(std::function,
eva::graphics::texture_desc*, unsigned char const*, DXGI_FORMAT, unsigned int,
unsigned int, unsigned int)+2484)
07-22 21:31:16.071  5007  5007 F DEBUG   : #07 pc 00145b8c 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(eva::graphics::vulkan::vulkan_resource_factory::load_texture(boost::filesystem::path
const&, DXGI_FORMAT, int)+316)
07-22 21:31:16.071  5007  5007 F DEBUG   : #08 pc 001ad794 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(eva::graphics::assets::texture_loader::load(std::string const&) const+1412)
07-22 21:31:16.071  5007  5007 F DEBUG   : #09 pc 00153e69 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(eva::ui::loading_screen_contents::loading_screen_contents(eva::ui::loading_screen_creation_parameters
const&, eva::globals&)+649)
07-22 21:31:16.071  5007  5007 F DEBUG   : #10 pc 0011f1a1 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(mobile_player::player::create_loading_screen()+161)
07-22 21:31:16.071  5007  5007 F DEBUG   : #11 pc 00125563 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(_ZN13mobile_player6playerC1ERN3eva7globalsESsNS1_4math6vectorIiLj2EEESsRNS1_2ui13invalidatableERNS7_12message_pumpEP13ANativeWindowiNS1_8graphics8platformE+6083)
07-22 21:31:16.071  5007  5007 F DEBUG   : #12 pc 00118e37 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(android_native::native_application::create_player(std::string, bool,
ANativeWindow*)+375)
07-22 21:31:16.071  5007  5007 F DEBUG   : #13 pc 0011cd0e 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(activity::handle_cmd(int)+206)
07-22 21:31:16.071  5007  5007 F DEBUG   : #14 pc 0011ce15 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(activity::handle_cmd(android_app*, int)+37)
07-22 21:31:16.071  5007  5007 F DEBUG   : #15 pc 0025e46c 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
07-22 21:31:16.071  5007  5007 F DEBUG   : #16 pc 0011cb9e 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(activity::run()+126)
07-22 21:31:16.071  5007  5007 F DEBUG   : #17 pc 0011cf02 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so
(android_main+98)
07-22 21:31:16.071  5007  5007 F DEBUG   : #18 pc 0025ddd4 
/data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 12/15] nir/types: Add a wrapper to access gl_type

2018-07-23 Thread Timothy Arceri


On 23/07/18 22:48, Alejandro Piñeiro wrote:

On 21/07/18 05:15, Timothy Arceri wrote:

On 21/07/18 13:09, Timothy Arceri wrote:

Reviewed-by: Timothy Arceri 


Actually I take that back. This introduces a dependency on GL in NIR,


Hmm, but that dependency is already there. nir.h includes GL/gl.h, and
in fact, there is a comment pointing that is there for GLenum.
nir_variable includes a "Glenum format" as part of
ARB_shader_image_load_store.


If it's already there then for now this patch is:

Reviewed-by: Timothy Arceri 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #6 from Mauro Rossi  ---
Created attachment 140802
  --> https://bugs.freedesktop.org/attachment.cgi?id=140802=edit
extended logcat to see what happens before and after

extended logcat to see what happens before and after

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #5 from Mauro Rossi  ---
>Is ANDROID properly defined?

Yes, because it is defined by the build system itself, 
without need of passing the LOCAL_CFLAGS := -DANDROID

Mauro

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107295] Access violation on glDrawArrays with count >= 2048

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107295

Roland Scheidegger  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Roland Scheidegger  ---
Fixed by 09828feab0bdcb1feb0865fc66e5304f6164c3b8.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #4 from Bas Nieuwenhuizen  ---
No, on ANDROID, the call to radv_image_from_gralloc should happen for wsi
images and wsi_info should be always NULL.

Is ANDROID properly defined?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] virgl: remove unused stride-arguments

2018-07-23 Thread Erik Faye-Lund

On Jul 23, 2018 22:05, Dave Airlie  wrote:On 20 July 2018 at 13:39, Gurchetan Singh  wrote:

> Reviewed-by: Gurchetan Singh 

> On Wed, Jul 18, 2018 at 4:06 AM Erik Faye-Lund

>  wrote:

>>

>> The IOCTLs doesn't pass this along, so computing them in the first

>> place is kinda pointless.

>>

>> Signed-off-by: Erik Faye-Lund 

>> ---

>>

>> This is just a cleanup I noticed based on some discussion with Gert.

>>

>> A question is, what code here expects this stride to be respected? The

>> call-sites in virgl_*_transfer_map and virgl_*_transfer_unmap kinda

>> looks like they do... They'll get a bit of a surprise here, no?



>>

>> Anyway, this is already broken, so I think this should be OK. But

>> perhaps this patch shows some code-paths that need some love?



I reverted this as it didn't fixup vtest, and it introduced build time warnings.



I agree this should get some more investigation but make sure vtest and

drm backends don't regress.



Dave.


Good call! ___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #3 from Mauro Rossi  ---
Hi Bas, 
the problem was obeserved in the last month, but the relevant parts
radv_image.c and radv_android.c did not change,
so anything between 3da693b and HEAD

the addr2line output is not always reliable in terms of line of code,
looking at the logcat the call causing the segfault is radv_image_create() at
line 1261, the symbols resolved by crash dump should be reliable,
based on previous experiences.

I had a look at differences between radv and anv and I saw two of them:

1. wsi extension implementation is different, so I tried to revert 
   a50f93e (radv/image: Implement the wsi "extension")
   but the problem was not solved

2. radv_image_create() passes .scanout parameter and anv_image_create()
   does not. I have not tried yet to remove .scanout in radv_image_create(), 
   but is it worth an attempt?

Mauro

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.

2018-07-23 Thread Kenneth Graunke

On Monday, July 23, 2018 1:24:08 PM PDT Rob Clark wrote:
> On Mon, Jul 23, 2018 at 1:25 PM, Eric Anholt  wrote:
> > Eric Anholt  writes:
> >
> >> This is controlled by a new nir_shader_compiler_options flag, and fixes
> >> dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
> >> ---
> >
> > Ken, Rob, any chance you could take a quick look at this?  I think this
> > is the last change I need in Mesa for GLES2 conformance.
> >
> 
> I was kinda wondering if there was any driver where we *didn't* want
> this?  I had meant to try that deqp w/ freedreno but got distracted..
> I'll try to give this a look this evening..
> 
> BR,
> -R

i965 can configure the point coordinate origin to be either upper left
or lower left, so I don't think we need Y-flipping in the shader.  But
we do need this pass for gl_FragCoord.  So I think making it optional
is probably a good idea.

Looks good to me, Eric.  Congratulations on finishing the conformance
test bugfixing slog! :)

Reviewed-by: Kenneth Graunke 

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] intel/compiler: Account for built-in uniforms in analyze_ubo_ranges

2018-07-23 Thread Kenneth Graunke

On Monday, July 23, 2018 10:46:31 AM PDT Jason Ekstrand wrote:
> The original pass only looked for load_uniform intrinsics but there are
> a number of other places that could end up loading a push constant.  One
> obvious omission was images which always implicitly use a push constant.
> Legacy VS clip planes also get pushed into the shader.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> Cc: Kenneth Graunke 

Ugh.  Thanks.

Reviewed-by: Kenneth Graunke 

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/miptree: Fix can_blit_slice()

2018-07-23 Thread Nanley Chery

On Mon, Jul 23, 2018 at 08:20:15PM +0100, Chris Wilson wrote:
> Quoting Nanley Chery (2018-07-23 18:17:15)
> > Satisfy the BLT engine's row pitch limitation on the destination
> > miptree. The destination miptree is untiled, so its row_pitch will be
> > slightly less than or equal to the source miptree's row_pitch. Use the
> > source miptree's row_pitch in can_blit_slice instead of its blt_pitch.
> > 
> > Fixes 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3
> > ("i965/miptree: Use the correct BLT pitch")
> > 
> > Cc: 
> > Reported-by: Dylan Baker 
> > ---
> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > index a18d5ac3624..d8e823e4826 100644
> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > @@ -3544,8 +3544,13 @@ static bool
> >  can_blit_slice(struct intel_mipmap_tree *mt,
> > unsigned int level, unsigned int slice)
> >  {
> > -   /* See intel_miptree_blit() for details on the 32k pitch limit. */
> > -   if (intel_miptree_blt_pitch(mt) >= 32768)
> > +   /* The blit destination is untiled, so its row_pitch will be slightly 
> > less
> > +* than or equal to the source's row_pitch. The BLT engine only supports
> > +* linear row pitches up to but not including 32k.
> > +*
> > +* See intel_miptree_blit() for details on the 32k pitch limit.
> > +*/
> > +   if (mt->surf.row_pitch >= 32768)
> >return false;
> 
> I see the difference, but do we copy the whole slice or a region of it?

I think it depends on the caller.

-Nanley

> -Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #2 from Bas Nieuwenhuizen  ---
Do you have a hash of the upstream git version you based this on? (just so I
can see how the lines correlate in radv_image.c)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.

2018-07-23 Thread Rob Clark

On Mon, Jul 23, 2018 at 1:25 PM, Eric Anholt  wrote:
> Eric Anholt  writes:
>
>> This is controlled by a new nir_shader_compiler_options flag, and fixes
>> dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
>> ---
>
> Ken, Rob, any chance you could take a quick look at this?  I think this
> is the last change I need in Mesa for GLES2 conformance.
>

I was kinda wondering if there was any driver where we *didn't* want
this?  I had meant to try that deqp w/ freedreno but got distracted..
I'll try to give this a look this evening..

BR,
-R


> (https://gerrit.khronos.org/#/c/2745/ and
> https://gitlab.khronos.org/Tracker/vk-gl-cts/issues/1278 being my CTS
> issues)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v3 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-23 Thread Jose Maria Casanova Crespo

These new methods return for a instruction register source/destination
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation of this functions if to know the read/written bytes
of the instructions to improve the liveness analysis for partial read/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
parameter they have a different read pattern.

v2: (Francisco Jerez)
- Split original register_byte_use_pattern into one read and other
  write.
- Check for send like instructions using this->mlen != 0
- Pass functions src number and offset.
- Use periodic_mask function with code written by Francisco Jerez
  to simplify pattern generation.
- Avoid breaking silently if source straddles multiple GRFs.

v3: (Francisco Jerez)
- A SEND could be this->mlen != 0 or this->is_send_from_grf
- We only assume that a periodic mask with offset could be applied
  to reg_offset == 0.
- We can assure that for MOVs operations for any offset (Chema)

Cc: Francisco Jerez 

---
 src/intel/compiler/brw_fs.cpp  | 119 +
 src/intel/compiler/brw_ir_fs.h |   2 +
 2 files changed, 121 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..4fa0f154c44 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "compiler/glsl_types.h"
 #include "compiler/nir/nir_builder.h"
 #include "program/prog_parameter.h"
+#include 
 
 using namespace brw;
 
@@ -687,6 +688,124 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x
+ *  periodic_mask(8, 4, 2, 2)  would return 0x
+ *  periodic_masc(8, 2, 2, 16) would return 0x
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+  m |= m << shift;
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the
+ * stride of the destination register.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are defined for operations that don't write a full register. So we
+ * we can identify in live range variable analysis if a partial write has
+ * completelly defined the data used by a partial read.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+  return 0u;
+   /* We assume that send destinations are completelly defined */
+   if (this->is_send_from_grf() || this->mlen != 0) {
+  return ~0u;
+   }
+
+   /* The byte pattern is calculated using a periodic mask for reg_offset == 0
+* because the internal offset will match how the register is written.
+*
+* We can for any reg_offset on MOV operations. We could add in the future
+* other opcodes, but we didn't include them until we have evidences of
+* them being used in partial write situations that ensure that the pattern
+* is repeated of any reg_offset.
+*/
+   if (reg_offset == 0 || this->opcode == BRW_OPCODE_MOV) {
+  return periodic_mask(this->exec_size,
+   this->dst.stride * type_sz(this->dst.type),
+   type_sz(this->dst.type),
+   this->dst.offset % REG_SIZE);
+   }
+   /* This shouldn't be reached by in liveness range calcluation but if
+* function is other context we know that we write a complete register.
+*/
+   if (!this->is_partial_write())
+  return ~0u;
+
+   /* By default we don't know what is written */
+   return 0u;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and a register offset.
+ *
+ * The objective of this function is to identify

Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-23 Thread Chema Casanova

El 20/07/18 a las 22:10, Francisco Jerez escribió:
> Chema Casanova  writes:
> 
>> El 20/07/18 a las 00:34, Francisco Jerez escribió:
>>> Chema Casanova  writes:
>>>
 El 14/07/18 a las 00:14, Francisco Jerez escribió:
> Jose Maria Casanova Crespo  writes:
>
>> For a register source/destination of an instruction the function returns
>> the read/write byte pattern of a 32-byte registers as a unsigned int.
>>
>> The returned pattern takes into account the exec_size of the instruction,
>> the type bitsize, the stride and if the register is source or 
>> destination.
>>
>> The objective of the functions if to help to know the read/written bytes
>> of the instructions to improve the liveness analysis for partial 
>> read/writes.
>>
>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
>> parameter they have a different read pattern.
>> ---
>>  src/intel/compiler/brw_fs.cpp  | 183 +
>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>  2 files changed, 184 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp 
>> b/src/intel/compiler/brw_fs.cpp
>> index 2b8363ca362..f3045c4ff6c 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>> this->dst.offset % REG_SIZE != 0);
>>  }
>>  
>> +/**
>> + * Returns a 32-bit uint whose bits represent if the associated 
>> register byte
>> + * has been read/written by the instruction. The returned pattern takes 
>> into
>> + * account the exec_size of the instruction, the type bitsize and the 
>> register
>> + * stride and the register is source or destination for the instruction.
>> + *
>> + * The objective of this function is to identify which parts of the 
>> register
>> + * are read or written for operations that don't read/write a full 
>> register.
>> + * So we can identify in live range variable analysis if a partial 
>> write has
>> + * completelly defined the part of the register used by a partial read. 
>> So we
>> + * avoid extending the liveness range because all data read was already
>> + * defined although the wasn't completely written.
>> + */
>> +unsigned
>> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) 
>> const
>> +{
>> +   if (is_dst) {

> Please split into two functions (like fs_inst::src_read and
> ::src_written) since that would make the call-sites of this method more
> self-documenting than a boolean parameter.  You should be able to share
> code by refactoring the common logic into a separate function (see below
> for some suggestions on how that could be achieved).

 Sure, it would improve readability and simplifies the logic, I've chosen
 dst_write_pattern and src_read_pattern.

>
>> +  /* We don't know what is written so we return the worts case */
>
> "worst"

 Fixed.

>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>> + return 0;
>> +  /* We assume that send destinations are completely written */
>> +  if (this->is_send_from_grf())
>> + return ~0u;
>
> Some send-like instructions won't be caught by this condition, you
> should check for this->mlen != 0 in addition.

 Would it be enough to check for (this->mlen > 0) and forget about
 is_send_from_grf? I am using this approach in v2 I am sending.

>>>
>>> I don't think the mlen > 0 condition would catch all cases either...
>>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
>>> conditions.  Sucks...
>>
>> That is true, so now we have the:
>>  (this->is_send_from_grf() || this->mlen != 0)
>>
>> +   } else {
>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit 
>> aligned
>> +   * so the read pattern depends on the bitsize stored at src[4]
>> +   */
>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
>> +  this->src[1].nr == r.nr) {

> I feel uncomfortable about attempting to guess the source the caller is
> referring to by comparing the registers for equality.  E.g.  you could
> potentially end up with two sources that compare equal but have
> different semantics (e.g. as a result of CSE) which might cause it to
> get the wrong answer.  It would probably be better to pass a source
> index and a byte offset as argument instead of an fs_reg.

 I've didn't thought about CSE, I'm now receiving the number of source
 and the reg_offset. I'm using reg_offset instead of byte offsets as it
 simplifies the logic. Now we are using always

Re: [Mesa-dev] [PATCH 1/2] nir: add const_index parameters to system value builder function

2018-07-23 Thread Eric Anholt

Karol Herbst  writes:

> On Mon, Jul 23, 2018 at 7:02 PM, Eric Anholt  wrote:
>> Karol Herbst  writes:
>>
>>> this allows to replace some nir_load_system_value calls with the specific
>>> system value constructor
>>>
>>> Reviewed-by: Jason Ekstrand 
>>> Signed-off-by: Karol Herbst 
>>> ---
>>>  src/compiler/nir/nir_builder_opcodes_h.py | 21 +++--
>>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/compiler/nir/nir_builder_opcodes_h.py 
>>> b/src/compiler/nir/nir_builder_opcodes_h.py
>>> index 72cf5b4549d..0440875dade 100644
>>> --- a/src/compiler/nir/nir_builder_opcodes_h.py
>>> +++ b/src/compiler/nir/nir_builder_opcodes_h.py
>>> @@ -55,11 +55,28 @@ nir_load_system_value(nir_builder *build, 
>>> nir_intrinsic_op op, int index)
>>> return >dest.ssa;
>>>  }
>>>
>>> +<%
>>> +def sysval_decl_list(opcode):
>>> +   res = ''
>>> +   if opcode.indices:
>>> +  res += ', unsigned ' + opcode.indices[0].lower()
>>> +   return res
>>> +
>>> +def sysval_arg_list(opcode):
>>> +   args = []
>>> +   if opcode.indices:
>>> +  args.append(opcode.indices[0].lower())
>>> +   else:
>>> +  args.append('0')
>>> +   return ', '.join(args)
>>> +%>
>>
>> I was confused why only indices[0] was used, but it looks like system
>> values can only have one index.  Maybe assert(len(opcode.indices) <= 1)?
>> Other than that, these are:
>>
>> Reviewed-by: Eric Anholt 
>
> maybe a static assert that verifies this instead? We could put it
> inside nir_intrinsics_c.py and add a note that nir_load_system_value
> needs to be adjusted in case we want more.

Either way is fine with me.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] virgl: remove unused stride-arguments

2018-07-23 Thread Dave Airlie

On 20 July 2018 at 13:39, Gurchetan Singh  wrote:
> Reviewed-by: Gurchetan Singh 
> On Wed, Jul 18, 2018 at 4:06 AM Erik Faye-Lund
>  wrote:
>>
>> The IOCTLs doesn't pass this along, so computing them in the first
>> place is kinda pointless.
>>
>> Signed-off-by: Erik Faye-Lund 
>> ---
>>
>> This is just a cleanup I noticed based on some discussion with Gert.
>>
>> A question is, what code here expects this stride to be respected? The
>> call-sites in virgl_*_transfer_map and virgl_*_transfer_unmap kinda
>> looks like they do... They'll get a bit of a surprise here, no?

>>
>> Anyway, this is already broken, so I think this should be OK. But
>> perhaps this patch shows some code-paths that need some love?

I reverted this as it didn't fixup vtest, and it introduced build time warnings.

I agree this should get some more investigation but make sure vtest and
drm backends don't regress.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] virgl: add initial shader_storage_buffer_object support. (v2)

2018-07-23 Thread Dave Airlie

On 24 July 2018 at 03:14, Gert Wollny  wrote:
> Am Montag, den 23.07.2018, 21:17 +1000 schrieb Dave Airlie:
>> On 23 July 2018 at 17:28, Gert Wollny 
>> wrote:
>> > Am Montag, den 23.07.2018, 17:13 +1000 schrieb Dave Airlie:
>> > > On 23 July 2018 at 16:46, Gert Wollny 
>> > > wrote:
>> > > > Am Montag, den 23.07.2018, 09:04 +1000 schrieb Dave Airlie:
>> > > > >
>> > > > > > > +   uint32_t max_shader_buffer = shader ==
>> > > > > > > PIPE_SHADER_FRAGMENT ?
>> > > > > > > +  rs->caps.caps.v2.max_shader_buffer_frag_compute :
>> > > > > >
>> > > > > > Shouldn't max_shader_buffer_frag_compute also be used for
>> > > > > > PIPE_SHADER_COMPUTE?
>> > > > >
>> > > > > virgl doesn't support compute shaders yet. Adding support for
>> > > > > them is
>> > > > > in the future patches.
>> > > >
>> > > > I somehow suspected this, but then one should probably
>> > > > explicitely
>> > > > return zero instead of returning
>> > > > max_shader_buffer_other_stages.
>> > >
>> > > You can never reach this point of code with shader set to
>> > > PIPE_SHADER_COMPUTE.
>> > >
>> > > The outer switch statement defaults to return 0 for compute
>> > > shaders.
>> >
>> > I see, that was not visible in the patch, sorry for the noise.
>>
>> Is that an r-b in disguise? :-)
>
> Well, I had to take a closer look for this, and there is one thing that
>  I'm wondering about: Are the last five macros in virgl_protocol.h for
> the shader buffers actually used? I don't see it, but if they supposed
> to be used somewhere then the offsets for the last three entries seem
> to be off by one, because currently
> VIRGL_SET_SHADER_BUFFER_OFFSET(0) == VIRGL_SET_SHADER_BUFFER_START_SLOT
> == 2

Those macros are meant to be used on the decode side in virglrenderer,
and I just realised they weren't and are wrong, just sent a patch to fix them
up there. I'll fixup these for consistency

Dave.

>
> When this is clarified the patch is
> Reviewed-By: Gert Wollny 
>
> Best,
> Gert
>>
>> Dave.
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

--- Comment #1 from Mauro Rossi  ---
Created attachment 140797
  --> https://bugs.freedesktop.org/attachment.cgi?id=140797=edit
logcat of slingshot extreeme vulkan test

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107351

Bug ID: 107351
   Summary: Android 8.1: radv segfault with 3Dmark vulkan
benchmarks
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: other
Status: NEW
  Severity: major
  Priority: medium
 Component: Drivers/Vulkan/radeon
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: issor.or...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 140796
  --> https://bugs.freedesktop.org/attachment.cgi?id=140796=edit
logcat of api_overhead test

Hi,

while performing some test on vulkan radv HAL on Android 8.1 (oreo-x86)
build a segfault was noticed and traced with 3Dmark vulkan benchmark.

The build is experimental and based on hwcomposer.drm + gralloc.gbm
and tested on HD7790 gpu (Bonaire)

The same 3Dmark vulkan benchmarks work without segfault on Intel anv running on
Skylake GT2, so it is better to investigate why radv crashes.

Inspection of crash with addr2line -Cfe returns last line of code:
src/amd/vulkan/radv_image.c: 1289

Mauro Rossi
android-x86 team


- beginning of crash
07-22 21:31:15.938  4949  4995 F libc: Fatal signal 11 (SIGSEGV), code 1,
fault addr 0x1c in tid 4995 (Thread-9), pid 4949 (cation:workload)
...
07-22 21:31:15.956  5007  5007 F DEBUG   : *** *** *** *** *** *** *** *** ***
*** *** *** *** *** *** ***
07-22 21:31:15.956  5007  5007 F DEBUG   : Build fingerprint:
'Android-x86/android_x86_64/x86_64:8.1.0/OPM6.171019.030.E1/maur07141051:userdebug/test-keys'
07-22 21:31:15.956  5007  5007 F DEBUG   : Revision: '0'
07-22 21:31:15.956  5007  5007 F DEBUG   : ABI: 'x86'
07-22 21:31:15.956  5007  5007 F DEBUG   : pid: 4949, tid: 4995, name: Thread-9
 >>> com.futuremark.dmandroid.application:workload <<<
07-22 21:31:15.956  5007  5007 F DEBUG   : signal 11 (SIGSEGV), code 1
(SEGV_MAPERR), fault addr 0x1c
07-22 21:31:15.956  5007  5007 F DEBUG   : Cause: null pointer dereference
07-22 21:31:15.956  5007  5007 F DEBUG   : eax   ebx c65b690c  ecx
c2a6d0ac  edx 0097
07-22 21:31:15.956  5007  5007 F DEBUG   : esi c2a6d0ac  edi c1fbbb00
07-22 21:31:15.956  5007  5007 F DEBUG   : xcs 0023  xds 002b  xes
002b  xfs 006b  xss 002b
07-22 21:31:15.956  5007  5007 F DEBUG   : eip c64e0d47  ebp c2b7eb38  esp
c2b7eaf0  flags 00010286
07-22 21:31:16.071  5007  5007 F DEBUG   : 
07-22 21:31:16.071  5007  5007 F DEBUG   : backtrace:
07-22 21:31:16.071  5007  5007 F DEBUG   : #00 pc 000f1d47 
/system/vendor/lib/hw/vulkan.radv.so (radv_image_create+199)
07-22 21:31:16.071  5007  5007 F DEBUG   : #01 pc 000f2f26 
/system/vendor/lib/hw/vulkan.radv.so (radv_CreateImage+166)
07-22 21:31:16.071  5007  5007 F DEBUG   : #02 pc b59a 
/android/system/lib/libvulkan.so

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/miptree: Fix can_blit_slice()

2018-07-23 Thread Chris Wilson

Quoting Nanley Chery (2018-07-23 18:17:15)
> Satisfy the BLT engine's row pitch limitation on the destination
> miptree. The destination miptree is untiled, so its row_pitch will be
> slightly less than or equal to the source miptree's row_pitch. Use the
> source miptree's row_pitch in can_blit_slice instead of its blt_pitch.
> 
> Fixes 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3
> ("i965/miptree: Use the correct BLT pitch")
> 
> Cc: 
> Reported-by: Dylan Baker 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index a18d5ac3624..d8e823e4826 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -3544,8 +3544,13 @@ static bool
>  can_blit_slice(struct intel_mipmap_tree *mt,
> unsigned int level, unsigned int slice)
>  {
> -   /* See intel_miptree_blit() for details on the 32k pitch limit. */
> -   if (intel_miptree_blt_pitch(mt) >= 32768)
> +   /* The blit destination is untiled, so its row_pitch will be slightly less
> +* than or equal to the source's row_pitch. The BLT engine only supports
> +* linear row pitches up to but not including 32k.
> +*
> +* See intel_miptree_blit() for details on the 32k pitch limit.
> +*/
> +   if (mt->surf.row_pitch >= 32768)
>return false;

I see the difference, but do we copy the whole slice or a region of it?
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107305] glsl/opt_copy_propagation_elements.cpp:72:9: error: delegating constructors are permitted only in C++11

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107305

Caio Marcelo de Oliveira Filho  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #3 from Caio Marcelo de Oliveira Filho  ---
Fixed by

commit 52d831ff83036773978aabf52dde3bb73bb211c7
Author: Caio Marcelo de Oliveira Filho 
Date:   Fri Jul 20 13:21:33 2018 -0700

glsl: remove delegating constructors to allow build with C++98

Delegating constructors is a C++11 feature, so this was breaking when
compiling with C++98. Change the copy_propagation_state() calls that
used the convenience constructor to use a static member function
instead.

Since copy_propagation_state is expected to be heap allocated, this
change is a good fit.

Tested-by: Vinson Lee 
Reviewed-by: Matt Turner 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107305

-- 
You are receiving this mail because:
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] intel/compiler: Account for built-in uniforms in analyze_ubo_ranges

2018-07-23 Thread Lionel Landwerlin


On 23/07/18 18:46, Jason Ekstrand wrote:

The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant.  One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader.

Cc: mesa-sta...@lists.freedesktop.org
Cc: Kenneth Graunke 


Tested-by: Lionel Landwerlin 


---
  src/intel/compiler/brw_nir.h  |  1 +
  .../compiler/brw_nir_analyze_ubo_ranges.c | 41 +--
  src/intel/vulkan/anv_pipeline.c   |  2 +-
  src/mesa/drivers/dri/i965/brw_gs.c|  2 +-
  src/mesa/drivers/dri/i965/brw_tcs.c   |  2 +-
  src/mesa/drivers/dri/i965/brw_tes.c   |  2 +-
  src/mesa/drivers/dri/i965/brw_vs.c|  2 +-
  src/mesa/drivers/dri/i965/brw_wm.c|  2 +-
  8 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h
index 19442b47eae..7d82edafe46 100644
--- a/src/intel/compiler/brw_nir.h
+++ b/src/intel/compiler/brw_nir.h
@@ -148,6 +148,7 @@ void brw_nir_lower_patch_vertices_in_to_uniform(nir_shader 
*nir);
  
  void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,

  nir_shader *nir,
+const struct brw_vs_prog_key *vs_key,
  struct brw_ubo_range out_ranges[4]);
  
  bool brw_nir_opt_peephole_ffma(nir_shader *shader);

diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c 
b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
index cd5137da06e..cfa531675fc 100644
--- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
+++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
@@ -124,12 +124,29 @@ analyze_ubos_block(struct ubo_analysis_state *state, 
nir_block *block)
   continue;
  
nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);

-  if (intrin->intrinsic == nir_intrinsic_load_uniform)
+  switch (intrin->intrinsic) {
+  case nir_intrinsic_load_uniform:
+  case nir_intrinsic_image_deref_load:
+  case nir_intrinsic_image_deref_store:
+  case nir_intrinsic_image_deref_atomic_add:
+  case nir_intrinsic_image_deref_atomic_min:
+  case nir_intrinsic_image_deref_atomic_max:
+  case nir_intrinsic_image_deref_atomic_and:
+  case nir_intrinsic_image_deref_atomic_or:
+  case nir_intrinsic_image_deref_atomic_xor:
+  case nir_intrinsic_image_deref_atomic_exchange:
+  case nir_intrinsic_image_deref_atomic_comp_swap:
+  case nir_intrinsic_image_deref_size:
   state->uses_regular_uniforms = true;
-
-  if (intrin->intrinsic != nir_intrinsic_load_ubo)
   continue;
  
+  case nir_intrinsic_load_ubo:

+ break; /* Fall through to the analysis below */
+
+  default:
+ continue; /* Not a uniform or UBO intrinsic */
+  }
+
nir_const_value *block_const = nir_src_as_const_value(intrin->src[0]);
nir_const_value *offset_const = nir_src_as_const_value(intrin->src[1]);
  
@@ -179,6 +196,7 @@ print_ubo_entry(FILE *file,

  void
  brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
 nir_shader *nir,
+   const struct brw_vs_prog_key *vs_key,
 struct brw_ubo_range out_ranges[4])
  {
 const struct gen_device_info *devinfo = compiler->devinfo;
@@ -197,6 +215,23 @@ brw_nir_analyze_ubo_ranges(const struct brw_compiler 
*compiler,
   _mesa_hash_table_create(mem_ctx, NULL, _mesa_key_pointer_equal),
 };
  
+   switch (nir->info.stage) {

+   case MESA_SHADER_VERTEX:
+  if (vs_key && vs_key->nr_userclip_plane_consts > 0)
+ state.uses_regular_uniforms = true;
+  break;
+
+   case MESA_SHADER_COMPUTE:
+  /* Compute shaders use push constants to get the subgroup ID so it's
+   * best to just assume some system values are pushed.
+   */
+  state.uses_regular_uniforms = true;
+  break;
+
+   default:
+  break;
+   }
+
 /* Walk the IR, recording how many times each UBO block/offset is used. */
 nir_foreach_function(function, nir) {
if (function->impl) {
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 211cee788b8..1e6bd12b87d 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -472,7 +472,7 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,
anv_nir_apply_pipeline_layout(pipeline, layout, nir, prog_data, map);
  
 if (stage != MESA_SHADER_COMPUTE)

-  brw_nir_analyze_ubo_ranges(compiler, nir, prog_data->ubo_ranges);
+  brw_nir_analyze_ubo_ranges(compiler, nir, NULL, prog_data->ubo_ranges);
  
 assert(nir->num_uniforms == prog_data->nr_params * 4);
  
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c

index 9acb0337e20..7263f6351e9 100644

Re: [Mesa-dev] [PATCH] r600: enable tess_input_info for TES

2018-07-23 Thread Gert Wollny

Looks reasonable, 
Reviewed-By: Gert Wollny  

Am Donnerstag, den 19.07.2018, 14:42 +1000 schrieb Dave Airlie:
> From: Dave Airlie 
> 
> There might be a nicer way to do this, but this is at least correct.
> 
> This fixes:
> KHR-GL44.tessellation_shader.single.max_patch_vertices
> KHR-
> GL44.tessellation_shader.tessellation_control_to_tessellation_evaluat
> ion.gl_PatchVerticesIn
> ---
>  src/gallium/drivers/r600/r600_shader.c | 20 ++--
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/r600_shader.c
> b/src/gallium/drivers/r600/r600_shader.c
> index 4b91da3..6effa31be7b 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -1673,19 +1673,11 @@ static void tgsi_src(struct r600_shader_ctx
> *ctx,
>   } else if (ctx-
> >info.system_value_semantic_name[tgsi_src->Register.Index] ==
> TGSI_SEMANTIC_TESSOUTER) {
>   r600_src->sel = 2;
>   } else if (ctx-
> >info.system_value_semantic_name[tgsi_src->Register.Index] ==
> TGSI_SEMANTIC_VERTICESIN) {
> - if (ctx->type == PIPE_SHADER_TESS_CTRL) {
> - r600_src->sel = ctx-
> >tess_input_info;
> - r600_src->swizzle[0] = 2;
> - r600_src->swizzle[1] = 2;
> - r600_src->swizzle[2] = 2;
> - r600_src->swizzle[3] = 2;
> - } else {
> - r600_src->sel = ctx-
> >tess_input_info;
> - r600_src->swizzle[0] = 3;
> - r600_src->swizzle[1] = 3;
> - r600_src->swizzle[2] = 3;
> - r600_src->swizzle[3] = 3;
> - }
> + r600_src->sel = ctx->tess_input_info;
> + r600_src->swizzle[0] = 2;
> + r600_src->swizzle[1] = 2;
> + r600_src->swizzle[2] = 2;
> + r600_src->swizzle[3] = 2;
>   } else if (ctx->type == PIPE_SHADER_TESS_CTRL &&
> ctx->info.system_value_semantic_name[tgsi_src->Register.Index] ==
> TGSI_SEMANTIC_PRIMID) {
>   r600_src->sel = 0;
>   r600_src->swizzle[0] = 0;
> @@ -3559,7 +3551,7 @@ static int r600_shader_from_tgsi(struct
> r600_context *rctx,
>   ctx.tess_input_info = ++regno;
>   ctx.tess_output_info = ++regno;
>   } else if (ctx.type == PIPE_SHADER_TESS_EVAL) {
> - ctx.tess_input_info = 0;
> + ctx.tess_input_info = ++regno;
>   ctx.tess_output_info = ++regno;
>   } else if (ctx.type == PIPE_SHADER_GEOMETRY) {
>   ctx.gs_export_gpr_tregs[0] = ++regno;
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] intel/compiler: Account for built-in uniforms in analyze_ubo_ranges

2018-07-23 Thread Jason Ekstrand

The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant.  One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader.

Cc: mesa-sta...@lists.freedesktop.org
Cc: Kenneth Graunke 
---
 src/intel/compiler/brw_nir.h  |  1 +
 .../compiler/brw_nir_analyze_ubo_ranges.c | 41 +--
 src/intel/vulkan/anv_pipeline.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_gs.c|  2 +-
 src/mesa/drivers/dri/i965/brw_tcs.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_tes.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_vs.c|  2 +-
 src/mesa/drivers/dri/i965/brw_wm.c|  2 +-
 8 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h
index 19442b47eae..7d82edafe46 100644
--- a/src/intel/compiler/brw_nir.h
+++ b/src/intel/compiler/brw_nir.h
@@ -148,6 +148,7 @@ void brw_nir_lower_patch_vertices_in_to_uniform(nir_shader 
*nir);
 
 void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
 nir_shader *nir,
+const struct brw_vs_prog_key *vs_key,
 struct brw_ubo_range out_ranges[4]);
 
 bool brw_nir_opt_peephole_ffma(nir_shader *shader);
diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c 
b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
index cd5137da06e..cfa531675fc 100644
--- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
+++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
@@ -124,12 +124,29 @@ analyze_ubos_block(struct ubo_analysis_state *state, 
nir_block *block)
  continue;
 
   nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
-  if (intrin->intrinsic == nir_intrinsic_load_uniform)
+  switch (intrin->intrinsic) {
+  case nir_intrinsic_load_uniform:
+  case nir_intrinsic_image_deref_load:
+  case nir_intrinsic_image_deref_store:
+  case nir_intrinsic_image_deref_atomic_add:
+  case nir_intrinsic_image_deref_atomic_min:
+  case nir_intrinsic_image_deref_atomic_max:
+  case nir_intrinsic_image_deref_atomic_and:
+  case nir_intrinsic_image_deref_atomic_or:
+  case nir_intrinsic_image_deref_atomic_xor:
+  case nir_intrinsic_image_deref_atomic_exchange:
+  case nir_intrinsic_image_deref_atomic_comp_swap:
+  case nir_intrinsic_image_deref_size:
  state->uses_regular_uniforms = true;
-
-  if (intrin->intrinsic != nir_intrinsic_load_ubo)
  continue;
 
+  case nir_intrinsic_load_ubo:
+ break; /* Fall through to the analysis below */
+
+  default:
+ continue; /* Not a uniform or UBO intrinsic */
+  }
+
   nir_const_value *block_const = nir_src_as_const_value(intrin->src[0]);
   nir_const_value *offset_const = nir_src_as_const_value(intrin->src[1]);
 
@@ -179,6 +196,7 @@ print_ubo_entry(FILE *file,
 void
 brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
nir_shader *nir,
+   const struct brw_vs_prog_key *vs_key,
struct brw_ubo_range out_ranges[4])
 {
const struct gen_device_info *devinfo = compiler->devinfo;
@@ -197,6 +215,23 @@ brw_nir_analyze_ubo_ranges(const struct brw_compiler 
*compiler,
  _mesa_hash_table_create(mem_ctx, NULL, _mesa_key_pointer_equal),
};
 
+   switch (nir->info.stage) {
+   case MESA_SHADER_VERTEX:
+  if (vs_key && vs_key->nr_userclip_plane_consts > 0)
+ state.uses_regular_uniforms = true;
+  break;
+
+   case MESA_SHADER_COMPUTE:
+  /* Compute shaders use push constants to get the subgroup ID so it's
+   * best to just assume some system values are pushed.
+   */
+  state.uses_regular_uniforms = true;
+  break;
+
+   default:
+  break;
+   }
+
/* Walk the IR, recording how many times each UBO block/offset is used. */
nir_foreach_function(function, nir) {
   if (function->impl) {
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 211cee788b8..1e6bd12b87d 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -472,7 +472,7 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,
   anv_nir_apply_pipeline_layout(pipeline, layout, nir, prog_data, map);
 
if (stage != MESA_SHADER_COMPUTE)
-  brw_nir_analyze_ubo_ranges(compiler, nir, prog_data->ubo_ranges);
+  brw_nir_analyze_ubo_ranges(compiler, nir, NULL, prog_data->ubo_ranges);
 
assert(nir->num_uniforms == prog_data->nr_params * 4);
 
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index 9acb0337e20..7263f6351e9 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -94,7 +94,7 @@ brw_codegen_gs_prog(struct

Re: [Mesa-dev] [PATCH] nir: Add a couple trivial abs optimizations

2018-07-23 Thread Ian Romanick

Funny thing about this... this is how i915 implements ABS. :)

Reviewed-by: Ian Romanick 

On 07/23/2018 12:02 AM, Jason Ekstrand wrote:
> Spotted in a shader in Batman: Arkham City.
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index f49b43f3213..ba277fdfd0e 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -229,6 +229,8 @@ optimizations = [
> (('imax', a, a), a),
> (('umin', a, a), a),
> (('umax', a, a), a),
> +   (('fmax', a, ('fneg', a)), ('fabs', a)),
> +   (('imax', a, ('ineg', a)), ('iabs', a)),
> (('fmin', a, ('fneg', a)), ('fneg', ('fabs', a))),
> (('imin', a, ('ineg', a)), ('ineg', ('iabs', a))),
> (('fmin', a, ('fneg', ('fabs', a))), ('fneg', ('fabs', a))),
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: remove delegating constructors to allow build with C++98

2018-07-23 Thread Matt Turner

Reviewed-by: Matt Turner 

and pushed.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.

2018-07-23 Thread Eric Anholt

Eric Anholt  writes:

> This is controlled by a new nir_shader_compiler_options flag, and fixes
> dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
> ---

Ken, Rob, any chance you could take a quick look at this?  I think this
is the last change I need in Mesa for GLES2 conformance.

(https://gerrit.khronos.org/#/c/2745/ and
https://gitlab.khronos.org/Tracker/vk-gl-cts/issues/1278 being my CTS
issues)


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965/miptree: Fix can_blit_slice()

2018-07-23 Thread Nanley Chery

Satisfy the BLT engine's row pitch limitation on the destination
miptree. The destination miptree is untiled, so its row_pitch will be
slightly less than or equal to the source miptree's row_pitch. Use the
source miptree's row_pitch in can_blit_slice instead of its blt_pitch.

Fixes 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3
("i965/miptree: Use the correct BLT pitch")

Cc: 
Reported-by: Dylan Baker 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index a18d5ac3624..d8e823e4826 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -3544,8 +3544,13 @@ static bool
 can_blit_slice(struct intel_mipmap_tree *mt,
unsigned int level, unsigned int slice)
 {
-   /* See intel_miptree_blit() for details on the 32k pitch limit. */
-   if (intel_miptree_blt_pitch(mt) >= 32768)
+   /* The blit destination is untiled, so its row_pitch will be slightly less
+* than or equal to the source's row_pitch. The BLT engine only supports
+* linear row pitches up to but not including 32k.
+*
+* See intel_miptree_blit() for details on the 32k pitch limit.
+*/
+   if (mt->surf.row_pitch >= 32768)
   return false;
 
return true;
-- 
2.18.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] nir: add const_index parameters to system value builder function

2018-07-23 Thread Karol Herbst

On Mon, Jul 23, 2018 at 7:02 PM, Eric Anholt  wrote:
> Karol Herbst  writes:
>
>> this allows to replace some nir_load_system_value calls with the specific
>> system value constructor
>>
>> Reviewed-by: Jason Ekstrand 
>> Signed-off-by: Karol Herbst 
>> ---
>>  src/compiler/nir/nir_builder_opcodes_h.py | 21 +++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir_builder_opcodes_h.py 
>> b/src/compiler/nir/nir_builder_opcodes_h.py
>> index 72cf5b4549d..0440875dade 100644
>> --- a/src/compiler/nir/nir_builder_opcodes_h.py
>> +++ b/src/compiler/nir/nir_builder_opcodes_h.py
>> @@ -55,11 +55,28 @@ nir_load_system_value(nir_builder *build, 
>> nir_intrinsic_op op, int index)
>> return >dest.ssa;
>>  }
>>
>> +<%
>> +def sysval_decl_list(opcode):
>> +   res = ''
>> +   if opcode.indices:
>> +  res += ', unsigned ' + opcode.indices[0].lower()
>> +   return res
>> +
>> +def sysval_arg_list(opcode):
>> +   args = []
>> +   if opcode.indices:
>> +  args.append(opcode.indices[0].lower())
>> +   else:
>> +  args.append('0')
>> +   return ', '.join(args)
>> +%>
>
> I was confused why only indices[0] was used, but it looks like system
> values can only have one index.  Maybe assert(len(opcode.indices) <= 1)?
> Other than that, these are:
>
> Reviewed-by: Eric Anholt 

maybe a static assert that verifies this instead? We could put it
inside nir_intrinsics_c.py and add a note that nir_load_system_value
needs to be adjusted in case we want more.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] virgl: add initial shader_storage_buffer_object support. (v2)

2018-07-23 Thread Gert Wollny

Am Montag, den 23.07.2018, 21:17 +1000 schrieb Dave Airlie:
> On 23 July 2018 at 17:28, Gert Wollny 
> wrote:
> > Am Montag, den 23.07.2018, 17:13 +1000 schrieb Dave Airlie:
> > > On 23 July 2018 at 16:46, Gert Wollny 
> > > wrote:
> > > > Am Montag, den 23.07.2018, 09:04 +1000 schrieb Dave Airlie:
> > > > > 
> > > > > > > +   uint32_t max_shader_buffer = shader ==
> > > > > > > PIPE_SHADER_FRAGMENT ?
> > > > > > > +  rs->caps.caps.v2.max_shader_buffer_frag_compute :
> > > > > > 
> > > > > > Shouldn't max_shader_buffer_frag_compute also be used for
> > > > > > PIPE_SHADER_COMPUTE?
> > > > > 
> > > > > virgl doesn't support compute shaders yet. Adding support for
> > > > > them is
> > > > > in the future patches.
> > > > 
> > > > I somehow suspected this, but then one should probably
> > > > explicitely
> > > > return zero instead of returning
> > > > max_shader_buffer_other_stages.
> > > 
> > > You can never reach this point of code with shader set to
> > > PIPE_SHADER_COMPUTE.
> > > 
> > > The outer switch statement defaults to return 0 for compute
> > > shaders.
> > 
> > I see, that was not visible in the patch, sorry for the noise.
> 
> Is that an r-b in disguise? :-)

Well, I had to take a closer look for this, and there is one thing that
 I'm wondering about: Are the last five macros in virgl_protocol.h for
the shader buffers actually used? I don't see it, but if they supposed
to be used somewhere then the offsets for the last three entries seem
to be off by one, because currently 
VIRGL_SET_SHADER_BUFFER_OFFSET(0) == VIRGL_SET_SHADER_BUFFER_START_SLOT
== 2

When this is clarified the patch is 
Reviewed-By: Gert Wollny 

Best, 
Gert 
> 
> Dave.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965: implement MESA_framebuffer_flip_y [v3]

2018-07-23 Thread Fritz Koenig

Instead of using _mesa_is_winsys_fbo or
_mesa_is_user_fbo to infer if an fbo is
flipped use the InvertedY flag.

v2:
* additional window-system framebuffer checks [for jason]
v3:
* s/inverted_y/flip_y/g [for chadv]
* s/InvertedY/FlipY/g [for chadv]
---
 src/mesa/drivers/dri/i965/brw_blorp.c |  2 +-
 src/mesa/drivers/dri/i965/brw_meta_util.c |  4 +-
 src/mesa/drivers/dri/i965/brw_sf.c|  6 +--
 src/mesa/drivers/dri/i965/genX_state_upload.c | 50 +--
 src/mesa/drivers/dri/i965/intel_extensions.c  |  1 +
 src/mesa/drivers/dri/i965/intel_fbo.c | 11 ++--
 .../drivers/dri/i965/intel_pixel_bitmap.c |  8 +--
 src/mesa/drivers/dri/i965/intel_pixel_copy.c  |  4 +-
 src/mesa/drivers/dri/i965/intel_pixel_draw.c  |  2 +-
 9 files changed, 43 insertions(+), 45 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index b66ee18ba4..7476cee43a 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -740,7 +740,7 @@ brw_blorp_copytexsubimage(struct brw_context *brw,
/* Account for the fact that in the system framebuffer, the origin is at
 * the lower left.
 */
-   bool mirror_y = _mesa_is_winsys_fbo(ctx->ReadBuffer);
+   bool mirror_y = ctx->ReadBuffer->FlipY;
if (mirror_y)
   apply_y_flip(, , src_rb->Height);
 
diff --git a/src/mesa/drivers/dri/i965/brw_meta_util.c 
b/src/mesa/drivers/dri/i965/brw_meta_util.c
index d292f5a8e2..908b098976 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_util.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_util.c
@@ -250,13 +250,13 @@ brw_meta_mirror_clip_and_scissor(const struct gl_context 
*ctx,
/* Account for the fact that in the system framebuffer, the origin is at
 * the lower left.
 */
-   if (_mesa_is_winsys_fbo(read_fb)) {
+   if (read_fb->FlipY) {
   GLint tmp = read_fb->Height - *srcY0;
   *srcY0 = read_fb->Height - *srcY1;
   *srcY1 = tmp;
   *mirror_y = !*mirror_y;
}
-   if (_mesa_is_winsys_fbo(draw_fb)) {
+   if (draw_fb->FlipY) {
   GLint tmp = draw_fb->Height - *dstY0;
   *dstY0 = draw_fb->Height - *dstY1;
   *dstY1 = tmp;
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c 
b/src/mesa/drivers/dri/i965/brw_sf.c
index 73bc663f29..f4073fa6cf 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -90,7 +90,7 @@ brw_upload_sf_prog(struct brw_context *brw)
   return;
 
/* _NEW_BUFFERS */
-   bool render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
+   bool flip_y = ctx->DrawBuffer->FlipY;
 
memset(, 0, sizeof(key));
 
@@ -137,7 +137,7 @@ brw_upload_sf_prog(struct brw_context *brw)
 * Window coordinates in a FBO are inverted, which means point
 * sprite origin must be inverted, too.
 */
-   if ((ctx->Point.SpriteOrigin == GL_LOWER_LEFT) != render_to_fbo)
+   if ((ctx->Point.SpriteOrigin == GL_LOWER_LEFT) == flip_y)
   key.sprite_origin_lower_left = true;
 
/* BRW_NEW_FS_PROG_DATA */
@@ -161,7 +161,7 @@ brw_upload_sf_prog(struct brw_context *brw)
* face orientation, just as we invert the viewport in
* sf_unit_create_from_key().
*/
-  key.frontface_ccw = brw->polygon_front_bit == render_to_fbo;
+  key.frontface_ccw = brw->polygon_front_bit != flip_y;
}
 
if (!brw_search_cache(>cache, BRW_CACHE_SF_PROG, , sizeof(key),
diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 9e0a17b9d9..cfafbb0c37 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -217,7 +217,7 @@ genX(upload_polygon_stipple)(struct brw_context *brw)
* to a FBO (i.e. any named frame buffer object), we *don't*
* need to invert - we already match the layout.
*/
-  if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) {
+  if (ctx->DrawBuffer->FlipY) {
  for (unsigned i = 0; i < 32; i++)
 poly.PatternRow[i] = ctx->PolygonStipple[31 - i]; /* invert */
   } else {
@@ -257,7 +257,7 @@ genX(upload_polygon_stipple_offset)(struct brw_context *brw)
* to a user-created FBO then our native pixel coordinate system
* works just fine, and there's no window system to worry about.
*/
-  if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) {
+  if (ctx->DrawBuffer->FlipY) {
  poly.PolygonStippleYOffset =
 (32 - (_mesa_geometric_height(ctx->DrawBuffer) & 31)) & 31;
   }
@@ -1468,7 +1468,7 @@ genX(upload_clip_state)(struct brw_context *brw)
 #endif
 
 #if GEN_GEN == 7
-  clip.FrontWinding = brw->polygon_front_bit == _mesa_is_user_fbo(fb);
+  clip.FrontWinding = brw->polygon_front_bit != fb->FlipY;
 
   if (ctx->Polygon.CullFlag) {
  switch (ctx->Polygon.CullFaceMode) {
@@ -1583,7 +1583,7 @@ genX(upload_sf)(struct brw_context *brw)
 
 #if GEN_GEN <= 7
/* _NEW_BUFFERS */
-   bool render_to_fbo =

[Mesa-dev] [PATCH 1/2] mesa: MESA_framebuffer_flip_y extension [v4]

2018-07-23 Thread Fritz Koenig

Adds an extension to glFramebufferParameteri
that will specify if the framebuffer is vertically
flipped. Historically system framebuffers are
vertically flipped and user framebuffers are not.
Checking to see the state was done by looking at
the name field.  This adds an explicit field.

v2:
* updated spec language [for chadv]
* correctly specifying ES 3.1 [for chadv]
* refactor access to rb->Name [for jason]
* handle GetFramebufferParameteriv [for chadv]
v3:
* correct _mesa_GetMultisamplefv [for kusmabite]
v4:
* update spec language [for chadv]
* s/GLboolean/bool/g [for chadv]
* s/InvertedY/FlipY/g [for chadv]
* s/inverted_y/flip_y/g [for chadv]
* assert changes [for chadv]
---
 docs/specs/MESA_framebuffer_flip_y.txt | 81 ++
 docs/specs/enums.txt   |  3 +
 include/GLES2/gl2ext.h |  5 ++
 src/mapi/glapi/registry/gl.xml |  6 ++
 src/mesa/drivers/dri/i915/intel_fbo.c  |  6 +-
 src/mesa/drivers/dri/i965/intel_fbo.c  |  6 +-
 src/mesa/drivers/dri/nouveau/nouveau_fbo.c |  6 +-
 src/mesa/drivers/dri/radeon/radeon_fbo.c   |  6 +-
 src/mesa/drivers/dri/radeon/radeon_span.c  |  9 ++-
 src/mesa/drivers/dri/swrast/swrast.c   |  6 +-
 src/mesa/drivers/osmesa/osmesa.c   |  5 +-
 src/mesa/drivers/x11/xm_buffer.c   |  3 +-
 src/mesa/drivers/x11/xmesaP.h  |  3 +-
 src/mesa/main/accum.c  | 17 +++--
 src/mesa/main/dd.h |  3 +-
 src/mesa/main/extensions_table.h   |  1 +
 src/mesa/main/fbobject.c   | 16 +
 src/mesa/main/framebuffer.c|  1 +
 src/mesa/main/glheader.h   |  3 +
 src/mesa/main/mtypes.h |  4 ++
 src/mesa/main/readpix.c| 20 +++---
 src/mesa/state_tracker/st_cb_fbo.c |  6 +-
 src/mesa/swrast/s_blit.c   | 17 +++--
 src/mesa/swrast/s_clear.c  |  3 +-
 src/mesa/swrast/s_copypix.c| 11 +--
 src/mesa/swrast/s_depth.c  |  6 +-
 src/mesa/swrast/s_drawpix.c| 26 ---
 src/mesa/swrast/s_renderbuffer.c   |  6 +-
 src/mesa/swrast/s_renderbuffer.h   |  3 +-
 src/mesa/swrast/s_stencil.c|  3 +-
 30 files changed, 235 insertions(+), 56 deletions(-)
 create mode 100644 docs/specs/MESA_framebuffer_flip_y.txt

diff --git a/docs/specs/MESA_framebuffer_flip_y.txt 
b/docs/specs/MESA_framebuffer_flip_y.txt
new file mode 100644
index 00..697ab4e75d
--- /dev/null
+++ b/docs/specs/MESA_framebuffer_flip_y.txt
@@ -0,0 +1,81 @@
+Name
+
+MESA_framebuffer_flip_y
+
+Name Strings
+
+GL_MESA_framebuffer_flip_y
+
+Contact
+
+Fritz Koenig 
+
+Contributors
+
+Fritz Koenig, Google
+Kristian Høgsberg, Google
+Chad Versace, Google
+
+Status
+
+Proposal
+
+Version
+
+Version 1, June 7, 2018
+
+Number
+
+302
+
+Dependencies
+
+OpenGL ES 3.1 is required, for FramebufferParameteri.
+
+Overview
+
+This extension defines a new framebuffer parameter,
+GL_FRAMEBUFFER_FLIP_Y_MESA, that changes the behavior of the reads and
+writes to the framebuffer attachment points. When 
GL_FRAMEBUFFER_FLIP_Y_MESA
+is GL_TRUE, render commands and pixel transfer operations access the
+backing store of each attachment point with an y-inverted coordinate
+system. This y-inversion is relative to the coordinate system set when
+GL_FRAMEBUFFER_FLIP_Y_MESA is GL_FALSE.
+
+Access through TexSubImage2D and similar calls will notice the effect of
+the flip when they are not attached to framebuffer objects because
+GL_FRAMEBUFFER_FLIP_Y_MESA is associated with the framebuffer object and
+not the attachment points.
+
+IP Status
+
+None
+
+Issues
+
+None
+
+New Procedures and Functions
+
+None
+
+New Types
+
+None
+
+New Tokens
+
+Accepted by the  argument of FramebufferParameteri and
+GetFramebufferParameteriv:
+
+GL_FRAMEBUFFER_FLIP_Y_MESA  0x8BBB
+
+Errors
+
+An INVALID_OPERATION error is generated by GetFramebufferParameteriv if the
+default framebuffer is bound to  and  is 
FRAMEBUFFER_FLIP_Y_MESA.
+
+Revision History
+
+Version 1, June, 2018
+Initial draft (Fritz Koenig)
diff --git a/docs/specs/enums.txt b/docs/specs/enums.txt
index bf3ca9c176..e1b95ec874 100644
--- a/docs/specs/enums.txt
+++ b/docs/specs/enums.txt
@@ -71,6 +71,9 @@ GL_MESA_tile_raster_order
GL_TILE_RASTER_ORDER_INCREASING_X_MESA  0x8BB9
GL_TILE_RASTER_ORDER_INCREASING_Y_MESA  0x8BBA
 
+GL_MESA_framebuffer_flip_y
+   GL_FRAMEBUFFER_FLIP_Y_MESA   0x8BBB
+
 EGL_MESA_drm_image
 EGL_DRM_BUFFER_FORMAT_MESA 0x31D0
 EGL_DRM_BUFFER_USE_MESA0x31D1
diff --git a/include/GLES2/gl2ext.h b/include/GLES2/gl2ext.h
index a7d19a1fc8..0a93bfb865 100644
--- a/include/GLES2/gl2ext.h
+++ b/include/GLES2/gl2ext.h
@@ -2334,6

Re: [Mesa-dev] i915/swrast vertex array regression

2018-07-23 Thread Mathias Fröhlich

Hi Ville,

> I noticed a while back that xonotic had started to misrender the gun
> models on i915. Yesterday I bisected it down to commit 64d2a2048054
> ("mesa: Make gl_vertex_array contain pointers to first order VAO
> members."). Actually that commit broke things even worse (and the
> game would even crash after a while), but a later commit (presumably
> 98f35ad63c23 ("vbo: Correctly handle source arrays in vbo_split_copy.")
> fixed things a little bit. But even with current master the guns are
> still being misrendered. I also verified that the same breakage was
> visible with swrast, whereas llvmpipe and i965 seemed ok.
> 
> To reproduce you can run 'xonotic-glx -benchmark demos/the-big-keybench.dem' 
> and wait until the guy picks up the mortar (I think that's what its
> called) which is maybe the third gun he picks up in that demo. The gun 
> is supposed to have some red color on it but now it has green.
> 
> It looked like there's already a bunch of stuff piled on top of the
> regression so reverting didn't seem entirely trivial, and thus I didn't
> bother trying.

I'll take a look!
And no - reverting depends on too much.

Mathias


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] nir: add const_index parameters to system value builder function

2018-07-23 Thread Eric Anholt

Karol Herbst  writes:

> this allows to replace some nir_load_system_value calls with the specific
> system value constructor
>
> Reviewed-by: Jason Ekstrand 
> Signed-off-by: Karol Herbst 
> ---
>  src/compiler/nir/nir_builder_opcodes_h.py | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/nir/nir_builder_opcodes_h.py 
> b/src/compiler/nir/nir_builder_opcodes_h.py
> index 72cf5b4549d..0440875dade 100644
> --- a/src/compiler/nir/nir_builder_opcodes_h.py
> +++ b/src/compiler/nir/nir_builder_opcodes_h.py
> @@ -55,11 +55,28 @@ nir_load_system_value(nir_builder *build, 
> nir_intrinsic_op op, int index)
> return >dest.ssa;
>  }
>  
> +<%
> +def sysval_decl_list(opcode):
> +   res = ''
> +   if opcode.indices:
> +  res += ', unsigned ' + opcode.indices[0].lower()
> +   return res
> +
> +def sysval_arg_list(opcode):
> +   args = []
> +   if opcode.indices:
> +  args.append(opcode.indices[0].lower())
> +   else:
> +  args.append('0')
> +   return ', '.join(args)
> +%>

I was confused why only indices[0] was used, but it looks like system
values can only have one index.  Maybe assert(len(opcode.indices) <= 1)?
Other than that, these are:

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/2] i965, anv: Use INTEL_DEBUG for disk_cache driver flags

2018-07-23 Thread Jordan Justen

On 2018-07-23 03:29:21, Eero Tamminen wrote:
> Hi,
> 
> Should this fix also:
> https://bugs.freedesktop.org/show_bug.cgi?id=106382
> ?
> 

It helped a bit. For me, on a simple test program, it allowed shader
time to work once, even when the cache had a non-shader-time program.
But, running wit shader-time a second time, it printed 'No shader time
collected yet'.

Without this patch, the test program would crash if the cache had a
non-shader-time program, and this patch prevented that.

Regarding the bisect info in that bug, if you happen to run with
INTEL_DEBUG=shader_time first after a fresh build, then shader time
would probably work, since it would generate the shader time program
and put it in the cache. So, the bisect info could be unreliable.

I'll take at what is happening on the '2nd shader-time run' issue that
I described above.

-Jordan

> 
> On 23.07.2018 07:27, Jordan Justen wrote:
> > Since various options within INTEL_DEBUG could impact code generation,
> > we need to set the disk cache driver_flags parameter based on the
> > INTEL_DEBUG flags in use.
> > 
> > An example that will affect the program generated by i965 is the
> > INTEL_DEBUG=nocompact option.
> > 
> > The DEBUG_DISK_CACHE_MASK value is added to mask the settings of
> > INTEL_DEBUG that can affect program generation.
> > 
> > v2:
> >   * Use driver_flags (Tim)
> >   * Also update Anvil (Jason)
> > 
> > Signed-off-by: Jordan Justen 
> > Reviewed-by: Lionel Landwerlin  (v1)
> > ---
> >   src/intel/common/gen_debug.h   | 5 +
> >   src/intel/vulkan/anv_device.c  | 3 ++-
> >   src/mesa/drivers/dri/i965/brw_disk_cache.c | 3 ++-
> >   3 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h
> > index f6c44eeb912..aa9f3cf80d7 100644
> > --- a/src/intel/common/gen_debug.h
> > +++ b/src/intel/common/gen_debug.h
> > @@ -84,6 +84,11 @@ extern uint64_t INTEL_DEBUG;
> >   #define DEBUG_COLOR   (1ull << 40)
> >   #define DEBUG_REEMIT  (1ull << 41)
> >   
> > +/* These flags may affect program generation */
> > +#define DEBUG_DISK_CACHE_MASK \
> > +   (DEBUG_SHADER_TIME | DEBUG_NO16 | DEBUG_NO_DUAL_OBJECT_GS | DEBUG_NO8 | 
> > \
> > +   DEBUG_SPILL_FS | DEBUG_SPILL_VEC4 | DEBUG_NO_COMPACTION | DEBUG_DO32)
> > +
> >   #ifdef HAVE_ANDROID_PLATFORM
> >   #define LOG_TAG "INTEL-MESA"
> >   #if ANDROID_API_LEVEL >= 26
> > diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> > index 247ba641336..97a71563b8a 100644
> > --- a/src/intel/vulkan/anv_device.c
> > +++ b/src/intel/vulkan/anv_device.c
> > @@ -286,7 +286,8 @@ anv_physical_device_init_disk_cache(struct 
> > anv_physical_device *device)
> >  char timestamp[41];
> >  _mesa_sha1_format(timestamp, device->driver_build_sha1);
> >   
> > -   device->disk_cache = disk_cache_create(renderer, timestamp, 0);
> > +   const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK;
> > +   device->disk_cache = disk_cache_create(renderer, timestamp, 
> > driver_flags);
> >   #else
> >  device->disk_cache = NULL;
> >   #endif
> > diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c 
> > b/src/mesa/drivers/dri/i965/brw_disk_cache.c
> > index a678c355b9d..8f1b064fd61 100644
> > --- a/src/mesa/drivers/dri/i965/brw_disk_cache.c
> > +++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c
> > @@ -393,6 +393,7 @@ brw_disk_cache_init(struct intel_screen *screen)
> >  char timestamp[41];
> >  _mesa_sha1_format(timestamp, id_sha1);
> >   
> > -   screen->disk_cache = disk_cache_create(renderer, timestamp, 0);
> > +   const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK;
> > +   screen->disk_cache = disk_cache_create(renderer, timestamp, 
> > driver_flags);
> >   #endif
> >   }
> > 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 5/8] nir: add 16bit type information to glsl types

2018-07-23 Thread Bas Nieuwenhuizen

On Thu, Jul 19, 2018 at 2:48 PM, Daniel Schürmann
 wrote:
> Signed-off-by: Daniel Schürmann 
> ---
>  src/compiler/glsl_types.h  | 15 +++
>  src/compiler/nir_types.cpp | 12 
>  src/compiler/nir_types.h   |  1 +
>  3 files changed, 28 insertions(+)
>
> diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h
> index efc6324865..8cc8177f2d 100644
> --- a/src/compiler/glsl_types.h
> +++ b/src/compiler/glsl_types.h
> @@ -87,6 +87,13 @@ enum glsl_base_type {
> GLSL_TYPE_ERROR
>  };
>
> +static inline bool glsl_base_type_is_16bit(enum glsl_base_type type)
> +{
> +   return type == GLSL_TYPE_FLOAT16 ||
> +  type == GLSL_TYPE_UINT16 ||
> +  type == GLSL_TYPE_INT16;
> +}
> +
>  static inline bool glsl_base_type_is_64bit(enum glsl_base_type type)
>  {
> return type == GLSL_TYPE_DOUBLE ||
> @@ -551,6 +558,14 @@ public:
>return glsl_base_type_is_64bit(base_type);
> }
>
> +   /**
> +* Query whether or not a type is 64-bit

Update comment?

> +*/
> +   bool is_16bit() const
> +   {
> +  return glsl_base_type_is_16bit(base_type);
> +   }
> +
> /**
>  * Query whether or not a type is a non-array boolean type
>  */
> diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp
> index 6f1182b742..3a3864414f 100644
> --- a/src/compiler/nir_types.cpp
> +++ b/src/compiler/nir_types.cpp
> @@ -164,6 +164,12 @@ glsl_get_record_location_offset(const struct glsl_type 
> *type,
> return type->record_location_offset(length);
>  }
>
> +bool
> +glsl_type_is_16bit(const glsl_type *type)
> +{
> +   return type->is_16bit();
> +}
> +
>  bool
>  glsl_type_is_64bit(const glsl_type *type)
>  {
> @@ -473,6 +479,12 @@ glsl_channel_type(const glsl_type *t)
>return glsl_uint64_t_type();
> case GLSL_TYPE_INT64:
>return glsl_int64_t_type();
> +   case GLSL_TYPE_FLOAT16:
> +  return glsl_float16_t_type();
> +   case GLSL_TYPE_UINT16:
> +  return glsl_uint16_t_type();
> +   case GLSL_TYPE_INT16:
> +  return glsl_int16_t_type();
> default:
>unreachable("Unhandled base type glsl_channel_type()");
> }
> diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h
> index c128250c7d..817b7a9b34 100644
> --- a/src/compiler/nir_types.h
> +++ b/src/compiler/nir_types.h
> @@ -121,6 +121,7 @@ glsl_get_bit_size(const struct glsl_type *type)
> return 0;
>  }
>
> +bool glsl_type_is_16bit(const struct glsl_type *type);
>  bool glsl_type_is_64bit(const struct glsl_type *type);
>  bool glsl_type_is_void(const struct glsl_type *type);
>  bool glsl_type_is_error(const struct glsl_type *type);
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] docs: move releases from Fridays to Wednesdays

2018-07-23 Thread Dylan Baker

Acked-by: Dylan Baker 

Quoting Juan A. Suarez Romero (2018-07-23 02:22:06)
> Reviewed-by: Juan A. Suarez 
> 
> Reviewed/Acked/Agreed, whatever makes more sense :)
> 
> 
> 
> J.A.
> 
> On Thu, 2018-07-19 at 16:00 +0300, Andres Gomez wrote:
> > As discussed at:
> > https://lists.freedesktop.org/archives/mesa-dev/2018-March/188525.html
> > 
> > Cc: Emil Velikov 
> > Cc: Juan A. Suarez Romero 
> > Cc: Dylan Baker 
> > Cc: Ian Romanick 
> > Cc: Carl Worth 
> > Cc: Mark Janes 
> > Signed-off-by: Andres Gomez 
> > ---
> >  docs/releasing.html | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/releasing.html b/docs/releasing.html
> > index a022d0c484b..14315e7a8e4 100644
> > --- a/docs/releasing.html
> > +++ b/docs/releasing.html
> > @@ -54,8 +54,8 @@ For example:
> >  Release schedule
> >  
> >  
> > -Releases should happen on Fridays. Delays can occur although those should 
> > be keep
> > -to a minimum.
> > +Releases should happen on Wednesdays. Delays can occur although those
> > +should be keep to a minimum.
> >  
> >  See our calendar for 
> > the
> >  date and other details for individual releases.
> 


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 54971] __glXInitialize can initialize same display multiple times

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=54971

Julien Isorce  changed:

   What|Removed |Added

 CC||julien.iso...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] i915/swrast vertex array regression

2018-07-23 Thread Ville Syrjälä

Hi Mathias,

I noticed a while back that xonotic had started to misrender the gun
models on i915. Yesterday I bisected it down to commit 64d2a2048054
("mesa: Make gl_vertex_array contain pointers to first order VAO
members."). Actually that commit broke things even worse (and the
game would even crash after a while), but a later commit (presumably
98f35ad63c23 ("vbo: Correctly handle source arrays in vbo_split_copy.")
fixed things a little bit. But even with current master the guns are
still being misrendered. I also verified that the same breakage was
visible with swrast, whereas llvmpipe and i965 seemed ok.

To reproduce you can run 'xonotic-glx -benchmark demos/the-big-keybench.dem' 
and wait until the guy picks up the mortar (I think that's what its
called) which is maybe the third gun he picks up in that demo. The gun 
is supposed to have some red color on it but now it has green.

It looked like there's already a bunch of stuff piled on top of the
regression so reverting didn't seem entirely trivial, and thus I didn't
bother trying.

-- 
Ville Syrjälä
Intel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: fix a little memleak when we keep shader info

2018-07-23 Thread Samuel Pitoiset




On 07/23/2018 04:02 PM, Bas Nieuwenhuizen wrote:

On Mon, Jul 23, 2018 at 1:50 PM, Samuel Pitoiset
 wrote:

When we keep shader info for eg. RADV_TRACE_FILE, we don't
free the NIR when creating the pipelines. Though, we have to
free it when destroying the shader module.

Signed-off-by: Samuel Pitoiset 
---
  src/amd/vulkan/radv_shader.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index aac5b8a21a..ef7c41af9e 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -599,8 +599,8 @@ shader_variant_create(struct radv_device *device,
 if (device->keep_shader_info) {
 variant->disasm_string = binary.disasm_string;
 variant->llvm_ir_string = binary.llvm_ir_string;
+   variant->nir = *shaders;


How does this work with merged shaders?


You are right. I figured this on friday but I totally forgot during the 
weekend. :)


This still leaks memory for merged shaders.




 if (!gs_copy_shader && !module->nir) {
-   variant->nir = *shaders;
 variant->spirv = (uint32_t *)module->data;
 variant->spirv_size = module->size;
 }
--
2.18.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radv: Update to new VK_EXT_vertex_attribute_divisor to version 2.

2018-07-23 Thread Bas Nieuwenhuizen

Behavior wrt firstInstance got changed, and a divisor of 0 has been
disallowed.

The new version of the ext got published in specification 1.1.81.
---
 src/amd/vulkan/radv_extensions.py | 2 +-
 src/amd/vulkan/radv_nir_to_llvm.c | 7 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_extensions.py 
b/src/amd/vulkan/radv_extensions.py
index a5fbffac33b..d02042e5647 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -104,7 +104,7 @@ EXTENSIONS = [
 Extension('VK_EXT_sampler_filter_minmax', 1, 
'device->rad_info.chip_class >= CIK'),
 Extension('VK_EXT_shader_viewport_index_layer',   1, True),
 Extension('VK_EXT_shader_stencil_export', 1, True),
-Extension('VK_EXT_vertex_attribute_divisor',  1, True),
+Extension('VK_EXT_vertex_attribute_divisor',  2, True),
 Extension('VK_AMD_draw_indirect_count',   1, True),
 Extension('VK_AMD_gcn_shader',1, True),
 Extension('VK_AMD_rasterization_order',   1, 
'device->has_out_of_order_rast'),
diff --git a/src/amd/vulkan/radv_nir_to_llvm.c 
b/src/amd/vulkan/radv_nir_to_llvm.c
index c7d772fa652..d12ef09c9f3 100644
--- a/src/amd/vulkan/radv_nir_to_llvm.c
+++ b/src/amd/vulkan/radv_nir_to_llvm.c
@@ -1984,8 +1984,7 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
uint32_t divisor = 
ctx->options->key.vs.instance_rate_divisors[attrib_index];
 
if (divisor) {
-   buffer_index = LLVMBuildAdd(ctx->ac.builder, 
ctx->abi.instance_id,
-   
ctx->abi.start_instance, "");
+   buffer_index = ctx->abi.instance_id;
 
if (divisor != 1) {
buffer_index = 
LLVMBuildUDiv(ctx->ac.builder, buffer_index,
@@ -2000,8 +1999,10 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
MAX2(1, 
ctx->shader_info->vs.vgpr_comp_cnt);
}
} else {
-   buffer_index = ctx->ac.i32_0;
+   unreachable("Invalid vertex attribute divisor 
of 0.");
}
+
+   buffer_index = LLVMBuildAdd(ctx->ac.builder, 
ctx->abi.start_instance, buffer_index, "");
} else
buffer_index = LLVMBuildAdd(ctx->ac.builder, 
ctx->abi.vertex_id,
ctx->abi.base_vertex, "");
-- 
2.18.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: Add a couple of iand/ior optimizations

2018-07-23 Thread Jason Ekstrand

On Mon, Jul 23, 2018 at 1:08 AM Timothy Arceri 
wrote:

> Ian and I have been looking at these type of things recently. Ian has
> started work on a pass to cover this stuff without having to add dozens
> of these types of opts.
>
> https://lists.freedesktop.org/archives/mesa-dev/2018-July/200583.html
>

Does said pass work only on logical operations with booleans or also
regular bitwise operations?  This patch is for bitwise operations though it
naturally works with booleans.

--Jason


> On 23/07/18 17:36, Jason Ekstrand wrote:
> > Spotted in a shader in Batman: Arkham City.
> > ---
> >   src/compiler/nir/nir_opt_algebraic.py | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> > index ba277fdfd0e..f2007852b21 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -377,6 +377,8 @@ optimizations = [
> >  (('ixor', a, a), 0),
> >  (('ixor', a, 0), a),
> >  (('inot', ('inot', a)), a),
> > +   (('ior', ('iand', a, b), b), b),
> > +   (('iand', ('ior', a, b), b), b),
> >  # DeMorgan's Laws
> >  (('iand', ('inot', a), ('inot', b)), ('inot', ('ior',  a, b))),
> >  (('ior',  ('inot', a), ('inot', b)), ('inot', ('iand', a, b))),
> >
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

2018-07-23 Thread Jason Ekstrand

Reviewed-by: Jason Ekstrand 

On Mon, Jul 23, 2018 at 4:05 AM Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:

> Thanks Alex!
>
> Reviewed-by: Lionel Landwerlin 
>
> On 23/07/18 09:51, Alex Smith wrote:
> > According to the spec, these should apply to all read/write access
> > types (so would be equivalent to specifying all other access types
> > individually). Currently, they were doing nothing.
> >
> > v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.
> >
> > Signed-off-by: Alex Smith 
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> >   src/intel/vulkan/anv_private.h | 9 +
> >   1 file changed, 9 insertions(+)
> >
> > diff --git a/src/intel/vulkan/anv_private.h
> b/src/intel/vulkan/anv_private.h
> > index cec2842792..1660fcbbc8 100644
> > --- a/src/intel/vulkan/anv_private.h
> > +++ b/src/intel/vulkan/anv_private.h
> > @@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags
> flags)
> >pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
> >pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
> >break;
> > +  case VK_ACCESS_MEMORY_WRITE_BIT:
> > + pipe_bits |= ANV_PIPE_FLUSH_BITS;
> > + break;
> > default:
> >break; /* Nothing to do */
> > }
> > @@ -1761,6 +1764,12 @@
> anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags)
> > case VK_ACCESS_TRANSFER_READ_BIT:
> >pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
> >break;
> > +  case VK_ACCESS_MEMORY_READ_BIT:
> > + pipe_bits |= ANV_PIPE_INVALIDATE_BITS;
> > + break;
> > +  case VK_ACCESS_MEMORY_WRITE_BIT:
> > + pipe_bits |= ANV_PIPE_FLUSH_BITS;
> > + break;
> > default:
> >break; /* Nothing to do */
> > }
>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: fix a little memleak when we keep shader info

2018-07-23 Thread Bas Nieuwenhuizen

On Mon, Jul 23, 2018 at 1:50 PM, Samuel Pitoiset
 wrote:
> When we keep shader info for eg. RADV_TRACE_FILE, we don't
> free the NIR when creating the pipelines. Though, we have to
> free it when destroying the shader module.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_shader.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index aac5b8a21a..ef7c41af9e 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -599,8 +599,8 @@ shader_variant_create(struct radv_device *device,
> if (device->keep_shader_info) {
> variant->disasm_string = binary.disasm_string;
> variant->llvm_ir_string = binary.llvm_ir_string;
> +   variant->nir = *shaders;

How does this work with merged shaders?

> if (!gs_copy_shader && !module->nir) {
> -   variant->nir = *shaders;
> variant->spirv = (uint32_t *)module->data;
> variant->spirv_size = module->size;
> }
> --
> 2.18.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 12/15] nir/types: Add a wrapper to access gl_type

2018-07-23 Thread Alejandro Piñeiro

On 21/07/18 05:15, Timothy Arceri wrote:
> On 21/07/18 13:09, Timothy Arceri wrote:
>> Reviewed-by: Timothy Arceri 
>
> Actually I take that back. This introduces a dependency on GL in NIR, 

Hmm, but that dependency is already there. nir.h includes GL/gl.h, and
in fact, there is a comment pointing that is there for GLenum.
nir_variable includes a "Glenum format" as part of
ARB_shader_image_load_store.

Just in case you want to see it on code:
https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir.h#n33
https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir.h#n338

> maybe better to put this one in gl_nir_linker.h? 

I can't move it directly there, as gl_nir_linker is C-code. After all,
that was the reason we placed it at nir_types.[h/cpp], because we need a
C-wrapper for gl_nir_link* stuff. Perhaps I could put it at
link_util.cpp, but sounds weird. And as I mentioned before, NIR already
have a dependency on GL.

> If you agree that change would be:
>
> Reviewed-by: Timothy Arceri 
>
>>
>> On 21/07/18 01:08, Alejandro Piñeiro wrote:
>>> From: Neil Roberts 
>>>
>>> ---
>>>   src/compiler/nir_types.cpp | 6 ++
>>>   src/compiler/nir_types.h   | 2 ++
>>>   2 files changed, 8 insertions(+)
>>>
>>> diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp
>>> index 6f1182b742c..2085138c407 100644
>>> --- a/src/compiler/nir_types.cpp
>>> +++ b/src/compiler/nir_types.cpp
>>> @@ -81,6 +81,12 @@ glsl_get_column_type(const struct glsl_type *type)
>>>  return type->column_type();
>>>   }
>>> +GLenum
>>> +glsl_get_gl_type(const struct glsl_type *type)
>>> +{
>>> +   return type->gl_type;
>>> +}
>>> +
>>>   enum glsl_base_type
>>>   glsl_get_base_type(const struct glsl_type *type)
>>>   {
>>> diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h
>>> index c128250c7d3..ec5bf2cbc76 100644
>>> --- a/src/compiler/nir_types.h
>>> +++ b/src/compiler/nir_types.h
>>> @@ -59,6 +59,8 @@ glsl_get_function_return_type(const struct
>>> glsl_type *type);
>>>   const struct glsl_function_param *
>>>   glsl_get_function_param(const struct glsl_type *type, unsigned
>>> index);
>>> +GLenum glsl_get_gl_type(const struct glsl_type *type);
>>> +
>>>   enum glsl_base_type glsl_get_base_type(const struct glsl_type *type);
>>>   unsigned glsl_get_vector_elements(const struct glsl_type *type);
>>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 106893] Potential mem leak with radv, linked to RADV_TRACE_FILE

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106893

--- Comment #14 from Alex Smith  ---
If ever there is a need to get a trace, I've given Samuel details of a game
option that can be set to disable the background pipeline preloading.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 106893] Potential mem leak with radv, linked to RADV_TRACE_FILE

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106893

John  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #13 from John  ---
I'm surely wrong but shouldn't that be flushed to a file or something when
memory is becoming a problem? It seems like in such a case the trace cannot be
used, isn't that problematic?

Either way, thank you for looking into it!

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] intel/ppgtt: memory address alignment

2018-07-23 Thread Sergii Romantsov

Kernel (for ppgtt) requires memory address to be
aligned to page size (4096).
Added such alignment for buffers marked with EXEC_OBJECT_PINNED.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106997
Fixes: a363bb2cd0e2 (i965: Allocate VMA in userspace for full-PPGTT systems.)
Signed-off-by: Sergii Romantsov 
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c 
b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index 09d45e3..8383735 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -643,7 +643,7 @@ retry:
bo->kflags = bufmgr->initial_kflags;
 
if ((bo->kflags & EXEC_OBJECT_PINNED) && bo->gtt_offset == 0ull) {
-  bo->gtt_offset = vma_alloc(bufmgr, memzone, bo->size, 1);
+  bo->gtt_offset = vma_alloc(bufmgr, memzone, bo->size, 4096);
 
   if (bo->gtt_offset == 0ull)
  goto err_free;
@@ -784,7 +784,7 @@ brw_bo_gem_create_from_name(struct brw_bufmgr *bufmgr,
bo->kflags = bufmgr->initial_kflags;
 
if (bo->kflags & EXEC_OBJECT_PINNED)
-  bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 1);
+  bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 4096);
 
_mesa_hash_table_insert(bufmgr->handle_table, >gem_handle, bo);
_mesa_hash_table_insert(bufmgr->name_table, >global_name, bo);
@@ -1424,7 +1424,7 @@ brw_bo_gem_create_from_prime_internal(struct brw_bufmgr 
*bufmgr, int prime_fd,
 
if (bo->kflags & EXEC_OBJECT_PINNED) {
   assert(bo->size > 0);
-  bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 1);
+  bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 4096);
}
 
if (tiling_mode < 0) {
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radv: fix a little memleak when we keep shader info

2018-07-23 Thread Samuel Pitoiset

When we keep shader info for eg. RADV_TRACE_FILE, we don't
free the NIR when creating the pipelines. Though, we have to
free it when destroying the shader module.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index aac5b8a21a..ef7c41af9e 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -599,8 +599,8 @@ shader_variant_create(struct radv_device *device,
if (device->keep_shader_info) {
variant->disasm_string = binary.disasm_string;
variant->llvm_ir_string = binary.llvm_ir_string;
+   variant->nir = *shaders;
if (!gs_copy_shader && !module->nir) {
-   variant->nir = *shaders;
variant->spirv = (uint32_t *)module->data;
variant->spirv_size = module->size;
}
-- 
2.18.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] draw: force draw pipeline if there's more than 65535 vertices

2018-07-23 Thread Jose Fonseca


On 22/07/18 00:09, srol...@vmware.com wrote:

From: Roland Scheidegger 

The pt emit path can only handle 65535 - the number of vertices is
truncated to a ushort, resulting in a too small buffer allocation, which
will crash.

Forcing the pipeline path looks suboptimal, then again this bug is
probably there ever since GS is supported, so it seems it's not
happening often. (Note that the vertex_id in the vertex header is 16
bit too, however this is only used by the draw pipeline, and it denotes
the emit vertex nr, and that uses vbuf code, which will only emit smaller
chunks, so should be fine I think.)
Other solutions would be to simply allow 32bit counts for vertex
allocation, however 65535 is already larger than this was intended for
(the idea being it should be more cache friendly). Or could try to teach
the pt emit path to split the emit in smaller chunks (only the non-index
path can be affected, since gs output is always linear), but it's a bit
tricky (we don't know the primitive boundaries up-front).

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107295

Cc: 
---
  src/gallium/auxiliary/draw/draw_pt_emit.c  |  2 ++
  src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c  | 10 ++
  src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c |  9 +
  3 files changed, 21 insertions(+)

diff --git a/src/gallium/auxiliary/draw/draw_pt_emit.c 
b/src/gallium/auxiliary/draw/draw_pt_emit.c
index 6fb630b549..984c76fdf9 100644
--- a/src/gallium/auxiliary/draw/draw_pt_emit.c
+++ b/src/gallium/auxiliary/draw/draw_pt_emit.c
@@ -158,6 +158,7 @@ draw_pt_emit(struct pt_emit *emit,
  */
 render->set_primitive(draw->render, prim_info->prim);
  
+   assert(vertex_count <= 65535);

 render->allocate_vertices(render,
   (ushort)translate->key.output_stride,
   (ushort)vertex_count);
@@ -229,6 +230,7 @@ draw_pt_emit_linear(struct pt_emit *emit,
  */
 render->set_primitive(draw->render, prim_info->prim);
  
+   assert(count <= 65535);

 if (!render->allocate_vertices(render,
(ushort)translate->key.output_stride,
(ushort)count))
diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c 
b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
index aa20b918f5..f76e022994 100644
--- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
+++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
@@ -299,6 +299,16 @@ fetch_pipeline_generic(struct draw_pt_middle_end *middle,
FREE(vert_info->verts);
vert_info = _vert_info;
prim_info = _prim_info;
+
+  /*
+   * pt emit can only handle ushort number of vertices (see
+   * render->allocate_vertices).
+   * vsplit guarantees there's never more than 4096, however GS can
+   * easily blow this up (by a factor of 256 (or even 1024) max).
+   */
+  if (vert_info->count > 65535) {
+ opt |= PT_PIPELINE;
+  }
 } else {
if (draw_prim_assembler_is_required(draw, prim_info, vert_info)) {
   draw_prim_assembler_run(draw, prim_info, vert_info,
diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c 
b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
index 5e0c562256..91c9360cce 100644
--- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
@@ -428,6 +428,15 @@ llvm_pipeline_generic(struct draw_pt_middle_end *middle,
FREE(vert_info->verts);
vert_info = _vert_info;
prim_info = _prim_info;
+  /*
+   * pt emit can only handle ushort number of vertices (see
+   * render->allocate_vertices).
+   * vsplit guarantees there's never more than 4096, however GS can
+   * easily blow this up (by a factor of 256 (or even 1024) max).
+   */
+  if (vert_info->count > 65535) {
+ opt |= PT_PIPELINE;
+  }
 } else {
if (draw_prim_assembler_is_required(draw, prim_info, vert_info)) {
   draw_prim_assembler_run(draw, prim_info, vert_info,



Sounds good to me.

Reviewed-by: Jose Fonseca 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 106893] Potential mem leak with radv, linked to RADV_TRACE_FILE

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106893

Samuel Pitoiset  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |NOTABUG

--- Comment #12 from Samuel Pitoiset  ---
We don't have any memory leaks with RADV_TRACE_FILE actually. The thing is that
RoTR creates a TON of pipelines in the background menu for the whole game. When
RADV_TRACE_FILE is set we keep all shader info, including SPIRV, NIR and
assembly.

After few minutes the game will crash because we are out of memory. We can't do
anything useful for that.

Thanks for the report!

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] virgl: add initial shader_storage_buffer_object support. (v2)

2018-07-23 Thread Dave Airlie

On 23 July 2018 at 17:28, Gert Wollny  wrote:
> Am Montag, den 23.07.2018, 17:13 +1000 schrieb Dave Airlie:
>> On 23 July 2018 at 16:46, Gert Wollny 
>> wrote:
>> > Am Montag, den 23.07.2018, 09:04 +1000 schrieb Dave Airlie:
>> > >
>> > > > > +   uint32_t max_shader_buffer = shader ==
>> > > > > PIPE_SHADER_FRAGMENT ?
>> > > > > +  rs->caps.caps.v2.max_shader_buffer_frag_compute :
>> > > >
>> > > > Shouldn't max_shader_buffer_frag_compute also be used for
>> > > > PIPE_SHADER_COMPUTE?
>> > >
>> > > virgl doesn't support compute shaders yet. Adding support for
>> > > them is
>> > > in the future patches.
>> >
>> > I somehow suspected this, but then one should probably explicitely
>> > return zero instead of returning max_shader_buffer_other_stages.
>>
>> You can never reach this point of code with shader set to
>> PIPE_SHADER_COMPUTE.
>>
>> The outer switch statement defaults to return 0 for compute shaders.
>
> I see, that was not visible in the patch, sorry for the noise.

Is that an r-b in disguise? :-)

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] forward precise-flag if supported

2018-07-23 Thread Dave Airlie

On 23 July 2018 at 20:48, Erik Faye-Lund  wrote:
> New versions of virglrenderer supports the precise-flag, so let's
> forward it from TGSI if that's the case.

Reviewed-by: Dave Airlie 
>
> This fixes a few dEQP-GLES31 tests:
> - dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise
> - 
> dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise
> - 
> dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise
> - 
> dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise
> - 
> dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise
> - 
> dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise
>
> Signed-off-by: Erik Faye-Lund 
> ---
>
> This matches this virglrenderer-series, which has been merged upstrea:
> https://patchwork.freedesktop.org/series/46361/
>
>  src/gallium/drivers/virgl/virgl_hw.h   | 1 +
>  src/gallium/drivers/virgl/virgl_tgsi.c | 5 -
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/virgl/virgl_hw.h 
> b/src/gallium/drivers/virgl/virgl_hw.h
> index ee6aa68c5a..8f4f0ab6d8 100644
> --- a/src/gallium/drivers/virgl/virgl_hw.h
> +++ b/src/gallium/drivers/virgl/virgl_hw.h
> @@ -203,6 +203,7 @@ enum virgl_formats {
>  #define VIRGL_CAP_TEXTURE_VIEW (1 << 1)
>  #define VIRGL_CAP_SET_MIN_SAMPLES  (1 << 2)
>  #define VIRGL_CAP_COPY_IMAGE   (1 << 3)
> +#define VIRGL_CAP_TGSI_PRECISE (1 << 4)
>
>  #define VIRGL_BIND_DEPTH_STENCIL (1 << 0)
>  #define VIRGL_BIND_RENDER_TARGET (1 << 1)
> diff --git a/src/gallium/drivers/virgl/virgl_tgsi.c 
> b/src/gallium/drivers/virgl/virgl_tgsi.c
> index ff5abf6ddb..d1f785d4d2 100644
> --- a/src/gallium/drivers/virgl/virgl_tgsi.c
> +++ b/src/gallium/drivers/virgl/virgl_tgsi.c
> @@ -31,6 +31,7 @@
>  struct virgl_transform_context {
> struct tgsi_transform_context base;
> bool cull_enabled;
> +   bool has_precise;
>  };
>
>  static void
> @@ -76,7 +77,8 @@ static void
>  virgl_tgsi_transform_instruction(struct tgsi_transform_context *ctx,
>  struct tgsi_full_instruction *inst)
>  {
> -   if (inst->Instruction.Precise)
> +   struct virgl_transform_context *vtctx = (struct virgl_transform_context 
> *)ctx;
> +   if (!vtctx->has_precise && inst->Instruction.Precise)
>inst->Instruction.Precise = 0;
>
> for (unsigned i = 0; i < inst->Instruction.NumSrcRegs; i++) {
> @@ -104,6 +106,7 @@ struct tgsi_token *virgl_tgsi_transform(struct 
> virgl_context *vctx, const struct
> transform.base.transform_property = virgl_tgsi_transform_property;
> transform.base.transform_instruction = virgl_tgsi_transform_instruction;
> transform.cull_enabled = vscreen->caps.caps.v1.bset.has_cull;
> +   transform.has_precise = vscreen->caps.caps.v2.capability_bits & 
> VIRGL_CAP_TGSI_PRECISE;
> tgsi_transform_shader(tokens_in, new_tokens, newLen, );
>
> return new_tokens;
> --
> 2.18.0
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

2018-07-23 Thread Lionel Landwerlin


Thanks Alex!

Reviewed-by: Lionel Landwerlin 

On 23/07/18 09:51, Alex Smith wrote:

According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.

v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.

Signed-off-by: Alex Smith 
Cc: mesa-sta...@lists.freedesktop.org
---
  src/intel/vulkan/anv_private.h | 9 +
  1 file changed, 9 insertions(+)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index cec2842792..1660fcbbc8 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags flags)
   pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
   pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
   break;
+  case VK_ACCESS_MEMORY_WRITE_BIT:
+ pipe_bits |= ANV_PIPE_FLUSH_BITS;
+ break;
default:
   break; /* Nothing to do */
}
@@ -1761,6 +1764,12 @@ anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags 
flags)
case VK_ACCESS_TRANSFER_READ_BIT:
   pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
   break;
+  case VK_ACCESS_MEMORY_READ_BIT:
+ pipe_bits |= ANV_PIPE_INVALIDATE_BITS;
+ break;
+  case VK_ACCESS_MEMORY_WRITE_BIT:
+ pipe_bits |= ANV_PIPE_FLUSH_BITS;
+ break;
default:
   break; /* Nothing to do */
}



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] forward precise-flag if supported

2018-07-23 Thread Erik Faye-Lund

New versions of virglrenderer supports the precise-flag, so let's
forward it from TGSI if that's the case.

This fixes a few dEQP-GLES31 tests:
- dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise
- 
dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise
- 
dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise
- 
dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise
- 
dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise
- 
dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise

Signed-off-by: Erik Faye-Lund 
---

This matches this virglrenderer-series, which has been merged upstrea:
https://patchwork.freedesktop.org/series/46361/

 src/gallium/drivers/virgl/virgl_hw.h   | 1 +
 src/gallium/drivers/virgl/virgl_tgsi.c | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/virgl/virgl_hw.h 
b/src/gallium/drivers/virgl/virgl_hw.h
index ee6aa68c5a..8f4f0ab6d8 100644
--- a/src/gallium/drivers/virgl/virgl_hw.h
+++ b/src/gallium/drivers/virgl/virgl_hw.h
@@ -203,6 +203,7 @@ enum virgl_formats {
 #define VIRGL_CAP_TEXTURE_VIEW (1 << 1)
 #define VIRGL_CAP_SET_MIN_SAMPLES  (1 << 2)
 #define VIRGL_CAP_COPY_IMAGE   (1 << 3)
+#define VIRGL_CAP_TGSI_PRECISE (1 << 4)
 
 #define VIRGL_BIND_DEPTH_STENCIL (1 << 0)
 #define VIRGL_BIND_RENDER_TARGET (1 << 1)
diff --git a/src/gallium/drivers/virgl/virgl_tgsi.c 
b/src/gallium/drivers/virgl/virgl_tgsi.c
index ff5abf6ddb..d1f785d4d2 100644
--- a/src/gallium/drivers/virgl/virgl_tgsi.c
+++ b/src/gallium/drivers/virgl/virgl_tgsi.c
@@ -31,6 +31,7 @@
 struct virgl_transform_context {
struct tgsi_transform_context base;
bool cull_enabled;
+   bool has_precise;
 };
 
 static void
@@ -76,7 +77,8 @@ static void
 virgl_tgsi_transform_instruction(struct tgsi_transform_context *ctx,
 struct tgsi_full_instruction *inst)
 {
-   if (inst->Instruction.Precise)
+   struct virgl_transform_context *vtctx = (struct virgl_transform_context 
*)ctx;
+   if (!vtctx->has_precise && inst->Instruction.Precise)
   inst->Instruction.Precise = 0;
 
for (unsigned i = 0; i < inst->Instruction.NumSrcRegs; i++) {
@@ -104,6 +106,7 @@ struct tgsi_token *virgl_tgsi_transform(struct 
virgl_context *vctx, const struct
transform.base.transform_property = virgl_tgsi_transform_property;
transform.base.transform_instruction = virgl_tgsi_transform_instruction;
transform.cull_enabled = vscreen->caps.caps.v1.bset.has_cull;
+   transform.has_precise = vscreen->caps.caps.v2.capability_bits & 
VIRGL_CAP_TGSI_PRECISE;
tgsi_transform_shader(tokens_in, new_tokens, newLen, );
 
return new_tokens;
-- 
2.18.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v3 3/4] nv50/ir: optimize imul/imad to xmads

2018-07-23 Thread Rhys Perry

This patch is:

Reviewied-By: Karol Herbst 

forgot to add that

On Mon, Jul 23, 2018 at 11:40 AM, Rhys Perry  wrote:
> This hits the shader-db numbers a good bit, though a few xmads is way
> faster than an imul or imad and the cost is mitigated by the next commit,
> which optimizes many multiplications by immediates into shorter and less
> register heavy instructions than the xmads.
>
> total instructions in shared programs : 5787704 -> 5839715 (0.90%)
> total gprs used in shared programs: 669878 -> 670553 (0.10%)
> total shared used in shared programs  : 548832 -> 548832 (0.00%)
> total local used in shared programs   : 21068 -> 21164 (0.46%)
>
> local sharedgpr   inst  bytes
> helped   0   0  39   0   0
>   hurt   1   0 36530763076
>
> Signed-off-by: Rhys Perry 
> ---
>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 56 
> ++
>  .../nouveau/codegen/nv50_ir_target_gm107.cpp   |  1 -
>  2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 6deea7a360..a6ddb284b8 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -2292,13 +2292,18 @@ AlgebraicOpt::visit(BasicBlock *bb)
>  // 
> =
>
>  // ADD(SHL(a, b), c) -> SHLADD(a, b, c)
> +// MUL(a, b) -> a few XMADs
> +// MAD/FMA(a, b, c) -> a few XMADs
>  class LateAlgebraicOpt : public Pass
>  {
>  private:
> virtual bool visit(Instruction *);
>
> void handleADD(Instruction *);
> +   void handleMULMAD(Instruction *);
> bool tryADDToSHLADD(Instruction *);
> +
> +   BuildUtil bld;
>  };
>
>  void
> @@ -2359,6 +2364,52 @@ LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
> return true;
>  }
>
> +// MUL(a, b) -> a few XMADs
> +// MAD/FMA(a, b, c) -> a few XMADs
> +void
> +LateAlgebraicOpt::handleMULMAD(Instruction *i)
> +{
> +   // TODO: handle NV50_IR_SUBOP_MUL_HIGH
> +   if (!prog->getTarget()->isOpSupported(OP_XMAD, TYPE_U32))
> +  return;
> +   if (isFloatType(i->dType) || typeSizeof(i->dType) != 4)
> +  return;
> +   if (i->subOp || i->usesFlags() || i->flagsDef >= 0)
> +  return;
> +
> +   assert(!i->src(0).mod);
> +   assert(!i->src(1).mod);
> +   assert(i->op == OP_MUL ? 1 : !i->src(2).mod);
> +
> +   bld.setPosition(i, false);
> +
> +   Value *a = i->getSrc(0);
> +   Value *b = i->getSrc(1);
> +   Value *c = i->op == OP_MUL ? bld.mkImm(0) : i->getSrc(2);
> +
> +   Value *tmp0 = bld.getSSA();
> +   Value *tmp1 = bld.getSSA();
> +
> +   Instruction *insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp0, b, a, c);
> +   insn->setPredicate(i->cc, i->getPredicate());
> +
> +   insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp1, b, a, bld.mkImm(0));
> +   insn->setPredicate(i->cc, i->getPredicate());
> +   insn->subOp = NV50_IR_SUBOP_XMAD_MRG | NV50_IR_SUBOP_XMAD_H1(1);
> +
> +   Value *pred = i->getPredicate();
> +   i->setPredicate(i->cc, NULL);
> +
> +   i->op = OP_XMAD;
> +   i->setSrc(0, b);
> +   i->setSrc(1, tmp1);
> +   i->setSrc(2, tmp0);
> +   i->subOp = NV50_IR_SUBOP_XMAD_PSL | NV50_IR_SUBOP_XMAD_CBCC;
> +   i->subOp |= NV50_IR_SUBOP_XMAD_H1(0) | NV50_IR_SUBOP_XMAD_H1(1);
> +
> +   i->setPredicate(i->cc, pred);
> +}
> +
>  bool
>  LateAlgebraicOpt::visit(Instruction *i)
>  {
> @@ -2366,6 +2417,11 @@ LateAlgebraicOpt::visit(Instruction *i)
> case OP_ADD:
>handleADD(i);
>break;
> +   case OP_MUL:
> +   case OP_MAD:
> +   case OP_FMA:
> +  handleMULMAD(i);
> +  break;
> default:
>break;
> }
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> index bb1c234c43..edb823afb4 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> @@ -166,7 +166,6 @@ TargetGM107::isBarrierRequired(const Instruction *insn) 
> const
>}
>break;
> case OPCLASS_ARITH:
> -  // TODO: IMUL/IMAD require barriers too, use of XMAD instead!
>if ((insn->op == OP_MUL || insn->op == OP_MAD) &&
>!isFloatType(insn->dType))
>   return true;
> --
> 2.14.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v3 1/4] nv50/ir: add preliminary support for OP_XMAD

2018-07-23 Thread Rhys Perry

Signed-off-by: Rhys Perry 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  | 26 ++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 18 +--
 .../drivers/nouveau/codegen/nv50_ir_print.cpp  | 19 
 .../drivers/nouveau/codegen/nv50_ir_target.cpp |  7 +++---
 .../nouveau/codegen/nv50_ir_target_gm107.cpp   |  1 +
 .../nouveau/codegen/nv50_ir_target_nv50.cpp|  1 +
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp| 15 +
 7 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 0b220cc48d..13822a08c3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -58,6 +58,9 @@ enum operation
OP_FMA,
OP_SAD, // abs(src0 - src1) + src2
OP_SHLADD,
+   // extended multiply-add (GM107+), does a lot of things.
+   // see envytools for detailed documentation
+   OP_XMAD,
OP_ABS,
OP_NEG,
OP_NOT,
@@ -256,6 +259,29 @@ enum operation
 #define NV50_IR_SUBOP_MINMAX_MED  2
 #define NV50_IR_SUBOP_MINMAX_HIGH 3
 
+// xmad(src0, src1, 0) << 16 + src2
+#define NV50_IR_SUBOP_XMAD_PSL (1 << 0)
+// (xmad(src0, src1, src2) & 0x) | (src1 << 16)
+#define NV50_IR_SUBOP_XMAD_MRG (1 << 1)
+// xmad(src0, src1, src2.lo)
+#define NV50_IR_SUBOP_XMAD_CLO (1 << 2)
+// xmad(src0, src1, src2.hi)
+#define NV50_IR_SUBOP_XMAD_CHI (2 << 2)
+// if both operands to the multiplication are non-zero, subtract 65536 for each
+// negative operand
+#define NV50_IR_SUBOP_XMAD_CSFU (3 << 2)
+// xmad(src0, src1, src2) + src1 << 16
+#define NV50_IR_SUBOP_XMAD_CBCC (4 << 2)
+#define NV50_IR_SUBOP_XMAD_CMODE_SHIFT 2
+#define NV50_IR_SUBOP_XMAD_CMODE_MASK uint16_t(0x7 << 
NV50_IR_SUBOP_XMAD_CMODE_SHIFT)
+
+// use the high 16 bits instead of the low 16 bits for the multiplication.
+// if the instruction's sType is signed, sign extend the operand from 16 bits
+// to 32 before multiplication.
+#define NV50_IR_SUBOP_XMAD_H1_SHIFT 5
+#define NV50_IR_SUBOP_XMAD_H1(i) (1 << (NV50_IR_SUBOP_XMAD_H1_SHIFT + (i)))
+#define NV50_IR_SUBOP_XMAD_H1_MASK uint16_t(0x3 << NV50_IR_SUBOP_XMAD_H1_SHIFT)
+
 enum DataType
 {
TYPE_NONE,
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 16022e6f23..6deea7a360 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -191,9 +191,17 @@ void
 LoadPropagation::checkSwapSrc01(Instruction *insn)
 {
const Target *targ = prog->getTarget();
-   if (!targ->getOpInfo(insn).commutative)
-  if (insn->op != OP_SET && insn->op != OP_SLCT && insn->op != OP_SUB)
+   if (!targ->getOpInfo(insn).commutative) {
+  if (insn->op != OP_SET && insn->op != OP_SLCT &&
+  insn->op != OP_SUB && insn->op != OP_XMAD)
  return;
+  // XMAD is only commutative if both the CBCC and MRG flags are not set.
+  if (insn->op == OP_XMAD &&
+  (insn->subOp & NV50_IR_SUBOP_XMAD_CMODE_MASK) == 
NV50_IR_SUBOP_XMAD_CBCC)
+ return;
+  if (insn->op == OP_XMAD && (insn->subOp & NV50_IR_SUBOP_XMAD_MRG))
+ return;
+   }
if (insn->src(1).getFile() != FILE_GPR)
   return;
// This is the special OP_SET used for alphatesting, we can't reverse its
@@ -236,6 +244,12 @@ LoadPropagation::checkSwapSrc01(Instruction *insn)
if (insn->op == OP_SUB) {
   insn->src(0).mod = insn->src(0).mod ^ Modifier(NV50_IR_MOD_NEG);
   insn->src(1).mod = insn->src(1).mod ^ Modifier(NV50_IR_MOD_NEG);
+   } else
+   if (insn->op == OP_XMAD) {
+  // swap h1 flags
+  uint16_t h1 = (insn->subOp >> 1 & NV50_IR_SUBOP_XMAD_H1(0)) |
+(insn->subOp << 1 & NV50_IR_SUBOP_XMAD_H1(1));
+  insn->subOp = (insn->subOp & ~NV50_IR_SUBOP_XMAD_H1_MASK) | h1;
}
 }
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index ee3506fbae..7eab8b8d70 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -86,6 +86,7 @@ const char *operationStr[OP_LAST + 1] =
"fma",
"sad",
"shladd",
+   "xmad",
"abs",
"neg",
"not",
@@ -240,6 +241,11 @@ static const char *barOpStr[] =
"sync", "arrive", "red and", "red or", "red popc"
 };
 
+static const char *xmadOpCModeStr[] =
+{
+   "clo", "chi", "csfu", "cbcc"
+};
+
 static const char *DataTypeStr[] =
 {
"-",
@@ -625,6 +631,19 @@ void Instruction::print() const
  if (subOp < ARRAY_SIZE(barOpStr))
 PRINT("%s ", barOpStr[subOp]);
  break;
+  case OP_XMAD: {
+ if (subOp & NV50_IR_SUBOP_XMAD_PSL)
+PRINT("psl ");
+ if (subOp & NV50_IR_SUBOP_XMAD_MRG)
+PRINT("mrg ");
+ unsigned cmode = (subOp &

[Mesa-dev] [PATCH v3 4/4] nv50/ir: further optimize multiplication by immediates

2018-07-23 Thread Rhys Perry

Strongly mitigates the harm from the previous commit, which made many
integer multiplications much more heavy on the register and instruction
count.

total instructions in shared programs : 5839715 -> 5801926 (-0.65%)
total gprs used in shared programs: 670553 -> 669853 (-0.10%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21068 (-0.45%)

local sharedgpr   inst  bytes
helped   1   0 40825222522
  hurt   0   0 232  23  23

Signed-off-by: Rhys Perry 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 126 ++---
 1 file changed, 112 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index a6ddb284b8..e5d033c9b0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -379,6 +379,10 @@ private:
 
CmpInstruction *findOriginForTestWithZero(Value *);
 
+   Value *createMulMethod1(Value *a, unsigned b, Value *c);
+   Value *createMulMethod2(Value *a, unsigned b, Value *c);
+   Value *createMul(Value *a, unsigned b, Value *c);
+
unsigned int foldCount;
 
BuildUtil bld;
@@ -953,6 +957,88 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue 
)
}
 }
 
+Value *
+ConstantFolding::createMulMethod1(Value *a, unsigned b, Value *c)
+{
+   // basically constant folded shift and add multiplication.
+   Value *res = c ? c : bld.loadImm(NULL, 0u);
+   bool resZero = !c;
+   unsigned ashift = 0;
+   while (b) {
+  if ((b & 1) && ashift) {
+ Value *sh = bld.loadImm(NULL, ashift);
+ if (resZero)
+res = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(), a, sh);
+ else
+res = bld.mkOp3v(OP_SHLADD, TYPE_U32, bld.getSSA(), a, sh, res);
+ resZero = false;
+  } else if (b & 1) {
+ if (resZero)
+res = a;
+ else
+res = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(), res, a);
+ resZero = false;
+  }
+  b >>= 1;
+  ashift++;
+   }
+   return res;
+}
+
+Value *
+ConstantFolding::createMulMethod2(Value *a, unsigned b, Value *c)
+{
+   // basically does a * b2 - a * (b2 - b) + c
+   uint64_t b2 = util_next_power_of_two64(b);
+   unsigned b2shift = ffsll(b2) - 1;
+
+   Value *mul1 = createMulMethod1(a, b2 - b, NULL);
+
+   Value *res;
+   if (b2shift < 32) {
+  Instruction *i = bld.mkOp3(OP_SHLADD, TYPE_U32, bld.getSSA(),
+ a, bld.loadImm(NULL, b2shift), mul1);
+  res = i->getDef(0);
+
+  // all targets supporting OP_SHLADD should pass this
+  assert(bld.getProgram()->getTarget()->isModSupported(i, 2, 
NV50_IR_MOD_NEG));
+  i->src(2).mod *= Modifier(NV50_IR_MOD_NEG);
+   } else {
+  res = bld.mkOp1v(OP_NEG, TYPE_U32, bld.getSSA(), mul1);
+   }
+
+   if (c)
+  res = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(), res, c);
+
+   return res;
+}
+
+Value *
+ConstantFolding::createMul(Value *a, unsigned b, Value *c)
+{
+   unsigned cost[2];
+
+   // estimate cost for first method (a << i) + (b << j) + ...
+   cost[0] = util_bitcount64(b >> 1);
+
+   // estimate cost for second method (a << i) - ((a << j) + (a << k) + ...)
+   uint64_t rounded_b = util_next_power_of_two64(b);
+   cost[1] = rounded_b == b ? 1 : (util_bitcount64((rounded_b - b) >> 1) + 1);
+   if (c) cost[1]++;
+
+   // The general method, multiplication by XMADs, costs three instructions.
+   // So nothing much larger than that or it could be making things worse.
+   if (cost[0] > 3 && cost[1] > 3)
+  return NULL;
+
+   // the cost is the same for both methods with powers of twos
+   // but method 1 creates more optimizable code
+   if (cost[0] <= cost[1])
+  return createMulMethod1(a, b, c);
+   else
+  return createMulMethod2(a, b, c);
+}
+
 void
 ConstantFolding::opnd(Instruction *i, ImmediateValue , int s)
 {
@@ -1040,13 +1126,25 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
, int s)
  i->setSrc(s, i->getSrc(t));
  i->src(s).mod = i->src(t).mod;
   } else
-  if (!isFloatType(i->sType) && !imm0.isNegative() && imm0.isPow2()) {
- i->op = OP_SHL;
- imm0.applyLog2();
- i->setSrc(0, i->getSrc(t));
- i->src(0).mod = i->src(t).mod;
- i->setSrc(1, new_ImmediateValue(prog, imm0.reg.data.u32));
- i->src(1).mod = 0;
+  if (!isFloatType(i->dType)) {
+ bool optimized = false;
+ if (target->isOpSupported(OP_SHLADD, TYPE_U32)) {
+bld.setPosition(i, false);
+Value *val = createMul(i->getSrc(t), imm0.reg.data.u32, NULL);
+if (val) {
+   i->def(0).replace(val, false);
+   delete_Instruction(prog, i);
+   optimized = true;
+

[Mesa-dev] [PATCH v3 0/4] nv50/ir: Improve Performance of Integer Multiplication

2018-07-23 Thread Rhys Perry

Changes in v3:
- stylistic changes
- simplify createMulMethod2()
- update shader-db statistics
- use util_bitcount64 and util_next_power_of_two64 instead of
  reimplementing them
Changes in v2:
- rebase
- bring back constant folding for multiplication by power-of-twos for nv50
- remove TODO in nv50_ir_target_gm107.cpp
- document XMAD's flags
- change how XMAD's per-operand flags are represented
- move util/bitscan.h stuff into a new patch
- stylistic changes

This series improve the performance of integer multiplication by removing
much usage of the very slow IMAD and IMUL on Maxwell+ and improving
multiplication by immediates on Fermi+.

The first and second patch add support for the XMAD instruction in codegen

The third patch replaces most IMADs and IMULs with a sequence of XMADs on
Maxwell+. This is far faster but increases the total instructions in the
shader-db by 0.90%, gpr count by 0.10% and local memory by 0.46%.

The next patch significantly lowers this number. It replaces many
multiplications by immediates with instructions that should be as fast or
faster than the XMAD approach. They are also typically smaller and less
register heavy, so they decrease the total instruction count by -0.65% and
bring the gpr count and local memory back to normal.

This series gives about a ~50% speedup in fragment-heavy scenaries with
Dolphin 5.0 on my GTX 1060. All timings were made with interesting looking
fifos from Dolphin's bugtracker:
 Wind Waker: 18 FPS -> 26 FPS at 3x internal resolution
 Wind Waker:  8 FPS -> 11 FPS at 5x internal resolution
   Paper Mario?: 26 FPS -> 42 FPS at 5x internal resolution
SpongeBob Movie: 19 FPS -> 30 FPS at 5x internal resolution

Unigine Heaven and Unigine Valley seems to run the same at low quality with
no anti-aliasing and no tessellation. SuperTuxKart and 0 A.D. also show no
change.

It's possible these patches may break something. Piglit shows no functionality
regressions though they should probably be tested for improvements or breakage
with actual applications.

These patches can also be found on my github:
https://github.com/pendingchaos/mesa/tree/nv-xmad-v3

The final changes in shader-db are as follows:

total instructions in shared programs : 5787704 -> 5801926 (0.25%)
total gprs used in shared programs: 669878 -> 669853 (-0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21068 (0.00%)

local sharedgpr   inst  bytes
helped   0   0 280 717 717
  hurt   0   0 29821712171

Rhys Perry (4):
  nv50/ir: add preliminary support for OP_XMAD
  gm107/ir: add support for OP_XMAD on GM107+
  nv50/ir: optimize imul/imad to xmads
  nv50/ir: further optimize multiplication by immediates

 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  26 +++
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  65 +++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 200 +++--
 .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  19 ++
 .../drivers/nouveau/codegen/nv50_ir_target.cpp |   7 +-
 .../nouveau/codegen/nv50_ir_target_gm107.cpp   |   6 +-
 .../nouveau/codegen/nv50_ir_target_nv50.cpp|   1 +
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp|  16 ++
 8 files changed, 320 insertions(+), 20 deletions(-)

-- 
2.14.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v3 3/4] nv50/ir: optimize imul/imad to xmads

2018-07-23 Thread Rhys Perry

This hits the shader-db numbers a good bit, though a few xmads is way
faster than an imul or imad and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register heavy instructions than the xmads.

total instructions in shared programs : 5787704 -> 5839715 (0.90%)
total gprs used in shared programs: 669878 -> 670553 (0.10%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21164 (0.46%)

local sharedgpr   inst  bytes
helped   0   0  39   0   0
  hurt   1   0 36530763076

Signed-off-by: Rhys Perry 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 56 ++
 .../nouveau/codegen/nv50_ir_target_gm107.cpp   |  1 -
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 6deea7a360..a6ddb284b8 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -2292,13 +2292,18 @@ AlgebraicOpt::visit(BasicBlock *bb)
 // 
=
 
 // ADD(SHL(a, b), c) -> SHLADD(a, b, c)
+// MUL(a, b) -> a few XMADs
+// MAD/FMA(a, b, c) -> a few XMADs
 class LateAlgebraicOpt : public Pass
 {
 private:
virtual bool visit(Instruction *);
 
void handleADD(Instruction *);
+   void handleMULMAD(Instruction *);
bool tryADDToSHLADD(Instruction *);
+
+   BuildUtil bld;
 };
 
 void
@@ -2359,6 +2364,52 @@ LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
return true;
 }
 
+// MUL(a, b) -> a few XMADs
+// MAD/FMA(a, b, c) -> a few XMADs
+void
+LateAlgebraicOpt::handleMULMAD(Instruction *i)
+{
+   // TODO: handle NV50_IR_SUBOP_MUL_HIGH
+   if (!prog->getTarget()->isOpSupported(OP_XMAD, TYPE_U32))
+  return;
+   if (isFloatType(i->dType) || typeSizeof(i->dType) != 4)
+  return;
+   if (i->subOp || i->usesFlags() || i->flagsDef >= 0)
+  return;
+
+   assert(!i->src(0).mod);
+   assert(!i->src(1).mod);
+   assert(i->op == OP_MUL ? 1 : !i->src(2).mod);
+
+   bld.setPosition(i, false);
+
+   Value *a = i->getSrc(0);
+   Value *b = i->getSrc(1);
+   Value *c = i->op == OP_MUL ? bld.mkImm(0) : i->getSrc(2);
+
+   Value *tmp0 = bld.getSSA();
+   Value *tmp1 = bld.getSSA();
+
+   Instruction *insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp0, b, a, c);
+   insn->setPredicate(i->cc, i->getPredicate());
+
+   insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp1, b, a, bld.mkImm(0));
+   insn->setPredicate(i->cc, i->getPredicate());
+   insn->subOp = NV50_IR_SUBOP_XMAD_MRG | NV50_IR_SUBOP_XMAD_H1(1);
+
+   Value *pred = i->getPredicate();
+   i->setPredicate(i->cc, NULL);
+
+   i->op = OP_XMAD;
+   i->setSrc(0, b);
+   i->setSrc(1, tmp1);
+   i->setSrc(2, tmp0);
+   i->subOp = NV50_IR_SUBOP_XMAD_PSL | NV50_IR_SUBOP_XMAD_CBCC;
+   i->subOp |= NV50_IR_SUBOP_XMAD_H1(0) | NV50_IR_SUBOP_XMAD_H1(1);
+
+   i->setPredicate(i->cc, pred);
+}
+
 bool
 LateAlgebraicOpt::visit(Instruction *i)
 {
@@ -2366,6 +2417,11 @@ LateAlgebraicOpt::visit(Instruction *i)
case OP_ADD:
   handleADD(i);
   break;
+   case OP_MUL:
+   case OP_MAD:
+   case OP_FMA:
+  handleMULMAD(i);
+  break;
default:
   break;
}
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
index bb1c234c43..edb823afb4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
@@ -166,7 +166,6 @@ TargetGM107::isBarrierRequired(const Instruction *insn) 
const
   }
   break;
case OPCLASS_ARITH:
-  // TODO: IMUL/IMAD require barriers too, use of XMAD instead!
   if ((insn->op == OP_MUL || insn->op == OP_MAD) &&
   !isFloatType(insn->dType))
  return true;
-- 
2.14.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v3 2/4] gm107/ir: add support for OP_XMAD on GM107+

2018-07-23 Thread Rhys Perry

Signed-off-by: Rhys Perry 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 65 ++
 .../nouveau/codegen/nv50_ir_target_gm107.cpp   |  6 +-
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp|  1 +
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 1d31f181e4..c3d7be0f0e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -156,6 +156,7 @@ private:
void emitIMUL();
void emitIMAD();
void emitISCADD();
+   void emitXMAD();
void emitIMNMX();
void emitICMP();
void emitISET();
@@ -1892,6 +1893,67 @@ CodeEmitterGM107::emitISCADD()
emitGPR (0x00, insn->def(0));
 }
 
+void
+CodeEmitterGM107::emitXMAD()
+{
+   assert(insn->src(0).getFile() == FILE_GPR);
+
+   bool constbuf = false;
+   bool psl_mrg = true;
+   bool immediate = false;
+   if (insn->src(2).getFile() == FILE_MEMORY_CONST) {
+  assert(insn->src(1).getFile() == FILE_GPR);
+  constbuf = true;
+  psl_mrg = false;
+  emitInsn(0x5100);
+  emitGPR(0x27, insn->src(1));
+  emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(2));
+   } else if (insn->src(1).getFile() == FILE_MEMORY_CONST) {
+  assert(insn->src(2).getFile() == FILE_GPR);
+  constbuf = true;
+  emitInsn(0x4e00);
+  emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(1));
+  emitGPR(0x27, insn->src(2));
+   } else if (insn->src(1).getFile() == FILE_IMMEDIATE) {
+  assert(insn->src(2).getFile() == FILE_GPR);
+  assert(!(insn->subOp & NV50_IR_SUBOP_XMAD_H1(1)));
+  immediate = false;
+  emitInsn(0x3600);
+  emitIMMD(0x14, 19, insn->src(1));
+  emitGPR(0x27, insn->src(2));
+   } else {
+  assert(insn->src(1).getFile() == FILE_GPR);
+  assert(insn->src(2).getFile() == FILE_GPR);
+  emitInsn(0x5b00);
+  emitGPR(0x14, insn->src(1));
+  emitGPR(0x27, insn->src(2));
+   }
+
+   if (psl_mrg)
+  emitField(constbuf ? 0x37 : 0x24, 2, insn->subOp & 0x3);
+
+   unsigned cmode = (insn->subOp & NV50_IR_SUBOP_XMAD_CMODE_MASK);
+   cmode >>= NV50_IR_SUBOP_XMAD_CMODE_SHIFT;
+   emitField(0x32, constbuf ? 2 : 3, cmode);
+
+   emitX(constbuf ? 0x36 : 0x26);
+   emitCC(0x2f);
+
+   emitGPR(0x0, insn->def(0));
+   emitGPR(0x8, insn->src(0));
+
+   // source flags
+   if (isSignedType(insn->sType)) {
+  uint16_t h1s = insn->subOp & NV50_IR_SUBOP_XMAD_H1_MASK;
+  emitField(0x30, 2, h1s >> NV50_IR_SUBOP_XMAD_H1_SHIFT);
+   }
+   emitField(0x35, 1, insn->subOp & NV50_IR_SUBOP_XMAD_H1(0) ? 1 : 0);
+   if (!immediate) {
+  bool h1 = insn->subOp & NV50_IR_SUBOP_XMAD_H1(1);
+  emitField(constbuf ? 0x34 : 0x23, 1, h1);
+   }
+}
+
 void
 CodeEmitterGM107::emitIMNMX()
 {
@@ -3266,6 +3328,9 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
case OP_SHLADD:
   emitISCADD();
   break;
+   case OP_XMAD:
+  emitXMAD();
+  break;
case OP_MIN:
case OP_MAX:
   if (isFloatType(insn->dType)) {
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
index 7293fb27dd..bb1c234c43 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
@@ -60,8 +60,11 @@ TargetGM107::isOpSupported(operation op, DataType ty) const
case OP_SQRT:
case OP_DIV:
case OP_MOD:
-   case OP_XMAD:
   return false;
+   case OP_XMAD:
+  if (isFloatType(ty))
+ return false;
+  break;
default:
   break;
}
@@ -231,6 +234,7 @@ TargetGM107::getLatency(const Instruction *insn) const
case OP_SUB:
case OP_VOTE:
case OP_XOR:
+   case OP_XMAD:
   if (insn->dType != TYPE_F64)
  return 6;
   break;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index 7e66d2950b..5257f353e4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -161,6 +161,7 @@ static const struct opProperties _initPropsGM107[] = {
{ OP_SUSTP,   0x0, 0x0, 0x0, 0x0, 0x0, 0x4 },
{ OP_SUREDB,  0x0, 0x0, 0x0, 0x0, 0x0, 0x4 },
{ OP_SUREDP,  0x0, 0x0, 0x0, 0x0, 0x0, 0x4 },
+   { OP_XMAD,0x0, 0x0, 0x0, 0x0, 0x6, 0x2 },
 };
 
 void TargetNVC0::initProps(const struct opProperties *props, int size)
-- 
2.14.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 07/15] spirv/nir: Set info.gs.uses_end_primitive

2018-07-23 Thread Alejandro Piñeiro

On 20/07/18 18:32, Jason Ekstrand wrote:
> This should already be handled by nir_gather_info.  Is there some
> reason why we need to do it here as well?

Seems so. I have just tried to remove this patch, and I got no
regressions with run our specific tests, and a full run with the
borrowed tests. So lets drop it for now.
>
> On July 20, 2018 08:09:26 Alejandro Piñeiro  wrote:
>
>> From: Neil Roberts 
>>
>> Whenever SpvOpEndPrimitive or SpvOpEndStreamPrimitive is encountered
>> it now sets uses_end_primitive to true. This reflects the code in
>> validate_geometry_shader_emissions for GLSL.
>> ---
>> src/compiler/spirv/spirv_to_nir.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/spirv/spirv_to_nir.c
>> b/src/compiler/spirv/spirv_to_nir.c
>> index 238298a8340..4d297e60b60 100644
>> --- a/src/compiler/spirv/spirv_to_nir.c
>> +++ b/src/compiler/spirv/spirv_to_nir.c
>> @@ -3208,6 +3208,7 @@ vtn_handle_barrier(struct vtn_builder *b, SpvOp
>> opcode,
>>   case SpvOpEndPrimitive:
>>   case SpvOpEndStreamPrimitive:
>>  intrinsic_op = nir_intrinsic_end_primitive;
>> + b->shader->info.gs.uses_end_primitive = true;
>>  break;
>>   default:
>>  unreachable("Invalid opcode");
>> -- 
>> 2.14.1
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
>
>

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/2] i965, anv: Use INTEL_DEBUG for disk_cache driver flags

2018-07-23 Thread Eero Tamminen


Hi,

Should this fix also:
https://bugs.freedesktop.org/show_bug.cgi?id=106382
?

- Eero

On 23.07.2018 07:27, Jordan Justen wrote:

Since various options within INTEL_DEBUG could impact code generation,
we need to set the disk cache driver_flags parameter based on the
INTEL_DEBUG flags in use.

An example that will affect the program generated by i965 is the
INTEL_DEBUG=nocompact option.

The DEBUG_DISK_CACHE_MASK value is added to mask the settings of
INTEL_DEBUG that can affect program generation.

v2:
  * Use driver_flags (Tim)
  * Also update Anvil (Jason)

Signed-off-by: Jordan Justen 
Reviewed-by: Lionel Landwerlin  (v1)
---
  src/intel/common/gen_debug.h   | 5 +
  src/intel/vulkan/anv_device.c  | 3 ++-
  src/mesa/drivers/dri/i965/brw_disk_cache.c | 3 ++-
  3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h
index f6c44eeb912..aa9f3cf80d7 100644
--- a/src/intel/common/gen_debug.h
+++ b/src/intel/common/gen_debug.h
@@ -84,6 +84,11 @@ extern uint64_t INTEL_DEBUG;
  #define DEBUG_COLOR   (1ull << 40)
  #define DEBUG_REEMIT  (1ull << 41)
  
+/* These flags may affect program generation */

+#define DEBUG_DISK_CACHE_MASK \
+   (DEBUG_SHADER_TIME | DEBUG_NO16 | DEBUG_NO_DUAL_OBJECT_GS | DEBUG_NO8 | \
+   DEBUG_SPILL_FS | DEBUG_SPILL_VEC4 | DEBUG_NO_COMPACTION | DEBUG_DO32)
+
  #ifdef HAVE_ANDROID_PLATFORM
  #define LOG_TAG "INTEL-MESA"
  #if ANDROID_API_LEVEL >= 26
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 247ba641336..97a71563b8a 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -286,7 +286,8 @@ anv_physical_device_init_disk_cache(struct 
anv_physical_device *device)
 char timestamp[41];
 _mesa_sha1_format(timestamp, device->driver_build_sha1);
  
-   device->disk_cache = disk_cache_create(renderer, timestamp, 0);

+   const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK;
+   device->disk_cache = disk_cache_create(renderer, timestamp, driver_flags);
  #else
 device->disk_cache = NULL;
  #endif
diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c 
b/src/mesa/drivers/dri/i965/brw_disk_cache.c
index a678c355b9d..8f1b064fd61 100644
--- a/src/mesa/drivers/dri/i965/brw_disk_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c
@@ -393,6 +393,7 @@ brw_disk_cache_init(struct intel_screen *screen)
 char timestamp[41];
 _mesa_sha1_format(timestamp, id_sha1);
  
-   screen->disk_cache = disk_cache_create(renderer, timestamp, 0);

+   const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK;
+   screen->disk_cache = disk_cache_create(renderer, timestamp, driver_flags);
  #endif
  }



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] RFC : Context aware user space Resource control

2018-07-23 Thread Kedar J Karanje

On Fri, Jul 20, 2018 at 10:59:03AM +0100, Lionel Landwerlin wrote:
> On 20/07/18 09:32, aravindan.muthuku...@intel.com wrote:
> >diff --git a/src/egl/generate/egl.xml b/src/egl/generate/egl.xml
> >index 9250f93..52b0c9f 100644
> >--- a/src/egl/generate/egl.xml
> >+++ b/src/egl/generate/egl.xml
> >@@ -460,6 +460,14 @@
> >  
> >  
> >+ >comment="Reserved for context aware load type">
> >+
> >+
> >+
> >+
> >+
> >+
> >+
> >   > comment="Reserved for Tim Renouf, Antix (Khronos bug 4949)">
> >  
> >  
> 
> Are you defining a new extension here?
> The chunk above seems to imply this is from IMG, but still commented as
> reserved.
>
> As far as I understand enums need to be allocated through Khronos group
> before you can make use of them.
> Otherwise this will be unusable by applications as soon as another extension
> reuses the same enums...

Hello Lionel, 
  Thank you for the comments, yes we used IMG extension as a template and were
  wondering how the xml is modified , when we found the procedure for proposing
  an extension : https://www.khronos.org/registry/OpenGL/docs/rules.html , 
  just wondering how does it work for RFC ,
  Do we need to get through the Khronos process first then send for RFC or 
  is the other way round ok ? I shall dig into the mail archives for any
  existing extension and how it came into being.

Thanks,
Kedar
> 
> Thanks,
> 
> -
> Lionel
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] docs: update calendar to match the 18.2 plan with the one announced

2018-07-23 Thread Juan A. Suarez Romero

Reviewed-by: Juan A. Suarez 


J.A.

On Thu, 2018-07-19 at 15:21 -0700, Dylan Baker wrote:
> Quoting Andres Gomez (2018-07-19 06:03:11)
> > Additionally, I've extended the 18.1 cycle by one more release,
> > tentatively assigned to Dylan, due to the ~2 weeks delay for 18.2.
> 
> This seems like a good plan to me, and I have no problem making an additional
> release.
> 
> Acked-by: Dylan Baker 
> 
> > 
> > Cc: Dylan Baker 
> > Cc: Juan A. Suarez 
> > Cc: Emil Velikov 
> > Signed-off-by: Andres Gomez 
> > ---
> >  docs/release-calendar.html | 16 +++-
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> > 
> > diff --git a/docs/release-calendar.html b/docs/release-calendar.html
> > index 7f029b5b396..8e23b7f9e05 100644
> > --- a/docs/release-calendar.html
> > +++ b/docs/release-calendar.html
> > @@ -39,7 +39,7 @@ if you'd like to nominate a patch in the next stable 
> > release.
> >  Notes
> >  
> >  
> > -18.1
> > +18.1
> >  2018-07-27
> >  18.1.5
> >  Dylan Baker
> > @@ -55,29 +55,35 @@ if you'd like to nominate a patch in the next stable 
> > release.
> >  2018-08-24
> >  18.1.7
> >  Dylan Baker
> > +
> > +
> > +
> > +2018-09-07
> > +18.1.8
> > +Dylan Baker
> >  Last planned 18.1.x release
> >  
> >  
> >  18.2
> > -2018-07-20
> > +2018-08-01
> >  18.2.0rc1
> >  Andres Gomez
> >  
> >  
> >  
> > -2018-07-27
> > +2018-08-08
> >  18.2.0rc2
> >  Andres Gomez
> >  
> >  
> >  
> > -2018-08-03
> > +2018-08-15
> >  18.2.0rc3
> >  Andres Gomez
> >  
> >  
> >  
> > -2018-08-10
> > +2018-08-22
> >  18.2.0rc4
> >  Andres Gomez
> >  Last planned RC/Final release
> > -- 
> > 2.18.0
> > 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] docs: move releases from Fridays to Wednesdays

2018-07-23 Thread Juan A. Suarez Romero

Reviewed-by: Juan A. Suarez 

Reviewed/Acked/Agreed, whatever makes more sense :)



J.A.

On Thu, 2018-07-19 at 16:00 +0300, Andres Gomez wrote:
> As discussed at:
> https://lists.freedesktop.org/archives/mesa-dev/2018-March/188525.html
> 
> Cc: Emil Velikov 
> Cc: Juan A. Suarez Romero 
> Cc: Dylan Baker 
> Cc: Ian Romanick 
> Cc: Carl Worth 
> Cc: Mark Janes 
> Signed-off-by: Andres Gomez 
> ---
>  docs/releasing.html | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/releasing.html b/docs/releasing.html
> index a022d0c484b..14315e7a8e4 100644
> --- a/docs/releasing.html
> +++ b/docs/releasing.html
> @@ -54,8 +54,8 @@ For example:
>  Release schedule
>  
>  
> -Releases should happen on Fridays. Delays can occur although those should be 
> keep
> -to a minimum.
> +Releases should happen on Wednesdays. Delays can occur although those
> +should be keep to a minimum.
>  
>  See our calendar for the
>  date and other details for individual releases.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 106590] Wrong line numbers expanded while compiling shaders

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106590

--- Comment #3 from Zhaowei Yuan  ---
I posted patch v2 to cc more maintainer:
https://patchwork.freedesktop.org/patch/240324/

-- 
You are receiving this mail because:
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107156] earth tessellation bug

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107156

Gregor Münch  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|REOPENED|RESOLVED

--- Comment #10 from Gregor Münch  ---


*** This bug has been marked as a duplicate of bug 107276 ***

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 107276] radv: OpBitfieldUExtract returns incorrect result when count is zero

2018-07-23 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107276

Gregor Münch  changed:

   What|Removed |Added

 CC||ximi...@mail.ru

--- Comment #6 from Gregor Münch  ---
*** Bug 107156 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

2018-07-23 Thread Alex Smith

According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.

v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.

Signed-off-by: Alex Smith 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/vulkan/anv_private.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index cec2842792..1660fcbbc8 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags flags)
  pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
  pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
  break;
+  case VK_ACCESS_MEMORY_WRITE_BIT:
+ pipe_bits |= ANV_PIPE_FLUSH_BITS;
+ break;
   default:
  break; /* Nothing to do */
   }
@@ -1761,6 +1764,12 @@ anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags 
flags)
   case VK_ACCESS_TRANSFER_READ_BIT:
  pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
  break;
+  case VK_ACCESS_MEMORY_READ_BIT:
+ pipe_bits |= ANV_PIPE_INVALIDATE_BITS;
+ break;
+  case VK_ACCESS_MEMORY_WRITE_BIT:
+ pipe_bits |= ANV_PIPE_FLUSH_BITS;
+ break;
   default:
  break; /* Nothing to do */
   }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2] glcpp: Sync line number for macro

2018-07-23 Thread zhaowei yuan

Line number of a predefined macro should be set
as where it is referenced rather than declared

Patch v2 does nothing more except ccing more maintainers

Signed-off-by: zhaowei yuan 
Bugzilla: https://patchwork.freedesktop.org/patch/225818/
---
 src/compiler/glsl/glcpp/glcpp-lex.l   |  1 +
 src/compiler/glsl/glcpp/glcpp-parse.y | 55 ++-
 src/compiler/glsl/glcpp/glcpp.h   |  4 ++-
 src/compiler/glsl/glcpp/pp.c  |  3 +-
 4 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/src/compiler/glsl/glcpp/glcpp-lex.l 
b/src/compiler/glsl/glcpp/glcpp-lex.l
index 9cfcc12..86b82c2 100644
--- a/src/compiler/glsl/glcpp/glcpp-lex.l
+++ b/src/compiler/glsl/glcpp/glcpp-lex.l
@@ -50,6 +50,7 @@ void glcpp_set_column (int  column_no , yyscan_t yyscanner);
yylloc->first_line = yylloc->last_line = yylineno;  \
yycolumn += yyleng; \
yylloc->last_column = yycolumn + 1; \
+   yylloc->position = (yytext - 
YY_CURRENT_BUFFER_LVALUE->yy_ch_buf); \
parser->has_new_line_number = 0;\
parser->has_new_source_number = 0;  \
} while(0);
diff --git a/src/compiler/glsl/glcpp/glcpp-parse.y 
b/src/compiler/glsl/glcpp/glcpp-parse.y
index ccb3aa1..68f8477 100644
--- a/src/compiler/glsl/glcpp/glcpp-parse.y
+++ b/src/compiler/glsl/glcpp/glcpp-parse.y
@@ -1021,7 +1021,7 @@ _token_list_append_list(token_list_t *list, token_list_t 
*tail)
 }
 
 static token_list_t *
-_token_list_copy(glcpp_parser_t *parser, token_list_t *other)
+_token_list_copy(glcpp_parser_t *parser, token_list_t *other, token_node_t 
*macro_node)
 {
token_list_t *copy;
token_node_t *node;
@@ -1033,6 +1033,12 @@ _token_list_copy(glcpp_parser_t *parser, token_list_t 
*other)
for (node = other->head; node; node = node->next) {
   token_t *new_token = linear_alloc_child(parser->linalloc, 
sizeof(token_t));
   *new_token = *node->token;
+
+  if(macro_node) {
+ new_token->location.first_line = 
macro_node->token->location.first_line;
+ new_token->location.last_line = macro_node->token->location.last_line;
+  }
+
   _token_list_append (parser, copy, new_token);
}
 
@@ -1349,7 +1355,7 @@ add_builtin_define(glcpp_parser_t *parser, const char 
*name, int value)
 
 glcpp_parser_t *
 glcpp_parser_create(const struct gl_extensions *extension_list,
-glcpp_extension_iterator extensions, void *state, gl_api 
api)
+glcpp_extension_iterator extensions, void *state, gl_api 
api, const char *input)
 {
glcpp_parser_t *parser;
 
@@ -1377,6 +1383,11 @@ glcpp_parser_create(const struct gl_extensions 
*extension_list,
parser->lex_from_list = NULL;
parser->lex_from_node = NULL;
 
+   parser->input = _mesa_string_buffer_create(parser, strlen(input) + 1);
+   strcpy(parser->input->buf, input);
+   parser->input->buf[strlen(input)] = '\0';
+   parser->input->length = strlen(input);
+
parser->output = _mesa_string_buffer_create(parser,
INITIAL_PP_OUTPUT_BUF_SIZE);
parser->info_log = _mesa_string_buffer_create(parser,
@@ -1441,7 +1452,7 @@ typedef enum function_status
 static function_status_t
 _arguments_parse(glcpp_parser_t *parser,
  argument_list_t *arguments, token_node_t *node,
- token_node_t **last)
+ token_node_t **last, int *end_position)
 {
token_list_t *argument;
int paren_count;
@@ -1465,8 +1476,10 @@ _arguments_parse(glcpp_parser_t *parser,
  paren_count++;
   } else if (node->token->type == ')') {
  paren_count--;
- if (paren_count == 0)
+ if (paren_count == 0) {
+*end_position = node->token->location.position;
 break;
+ }
   }
 
   if (node->token->type == ',' && paren_count == 1) {
@@ -1702,6 +1715,28 @@ _glcpp_parser_apply_pastes(glcpp_parser_t *parser, 
token_list_t *list)
list->non_space_tail = list->tail;
 }
 
+static int
+_glcpp_parser_get_line(glcpp_parser_t *parser, int offset)
+{
+   int line = 1;
+   int i;
+
+   for(i = 0; parser->input->buf[i] && i <= offset; i++) {
+  if(parser->input->buf[i] == '\n' || parser->input->buf[i] == '\r')
+ line++;
+   }
+
+   return line;
+}
+
+static void
+_glcpp_sync_location(glcpp_parser_t *parser, token_t *token, token_t 
*macro_token)
+{
+   token->location.source = macro_token->location.source;
+   token->location.first_line = _glcpp_parser_get_line(parser, 
token->location.position);
+   token->location.last_line = token->location.first_line;
+}
+
 /* This is a helper function that's essentially part of the
  * implementation of _glcpp_parser_expand_node. It shouldn't be called
  * except for by that function.
@@ -1732,7 +1767,9 @@ _glcpp_parser_expand_function(glcpp_parser_t *parser,

Re: [Mesa-dev] [PATCH] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

2018-07-23 Thread Alex Smith

On 20 July 2018 at 19:01, Jason Ekstrand  wrote:

> On Fri, Jul 20, 2018 at 8:37 AM Lionel Landwerlin <
> lionel.g.landwer...@intel.com> wrote:
>
>> On 20/07/18 11:44, Alex Smith wrote:
>>
>> According to the spec, these should apply to all read/write access
>> types (so would be equivalent to specifying all other access types
>> individually). Currently, they were doing nothing.
>>
>> Signed-off-by: Alex Smith  
>> 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/intel/vulkan/anv_private.h | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
>> index cec2842792..775bacaff2 100644
>> --- a/src/intel/vulkan/anv_private.h
>> +++ b/src/intel/vulkan/anv_private.h
>> @@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags 
>> flags)
>>   pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
>>   pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
>>   break;
>> +  case VK_ACCESS_MEMORY_WRITE_BIT:
>> + pipe_bits |= ANV_PIPE_FLUSH_BITS;
>> + break;
>>default:
>>   break; /* Nothing to do */
>>}
>> @@ -1761,6 +1764,9 @@ 
>> anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags)
>>case VK_ACCESS_TRANSFER_READ_BIT:
>>   pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
>>   break;
>> +  case VK_ACCESS_MEMORY_READ_BIT:
>> + pipe_bits |= ANV_PIPE_INVALIDATE_BITS;
>> + break;
>>
>>
>> I know this function is a bit oddly named for that, but with this part of
>> the spec regarding VK_ACCESS_MEMORY_WRITE_BIT :
>>
>> "
>>
>>-
>>
>>When included in a destination access mask, makes all available
>>writes visible to all future write accesses on entities known to the 
>> Vulkan
>>device.
>>
>> "
>>
>> I would also add :
>>
>> case VK_ACCESS_MEMORY_WRITE_BIT:
>> pipe_bits |= ANV_PIPE_FLUSH_BITS;
>> break;
>>
>> Does that sound fair?
>>
>
> That's quite the heavy hammer But I think it's the right thing to do.
>

Yes - these bits are supposed to be the heaviest hammer there is. I'll add
that.

Alex
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: Add a couple of iand/ior optimizations

2018-07-23 Thread Timothy Arceri

Ian and I have been looking at these type of things recently. Ian has 
started work on a pass to cover this stuff without having to add dozens 
of these types of opts.


https://lists.freedesktop.org/archives/mesa-dev/2018-July/200583.html

On 23/07/18 17:36, Jason Ekstrand wrote:

Spotted in a shader in Batman: Arkham City.
---
  src/compiler/nir/nir_opt_algebraic.py | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index ba277fdfd0e..f2007852b21 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -377,6 +377,8 @@ optimizations = [
 (('ixor', a, a), 0),
 (('ixor', a, 0), a),
 (('inot', ('inot', a)), a),
+   (('ior', ('iand', a, b), b), b),
+   (('iand', ('ior', a, b), b), b),
 # DeMorgan's Laws
 (('iand', ('inot', a), ('inot', b)), ('inot', ('ior',  a, b))),
 (('ior',  ('inot', a), ('inot', b)), ('inot', ('iand', a, b))),


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 5/8] nir: add complex_loop bool to loop info

2018-07-23 Thread Timothy Arceri

In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.

This will be used in the following patch.
---
 src/compiler/nir/nir.h  | 6 ++
 src/compiler/nir/nir_loop_analyze.c | 8 ++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 3bfe7d7f7bf..336a2ca5ac4 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1745,6 +1745,12 @@ typedef struct {
/* Unroll the loop regardless of its size */
bool force_unroll;
 
+   /* Does the loop contain complex loop terminators, continues or other
+* complex behaviours? If this is true we can't rely on
+* loop_terminator_list to be complete or accurate.
+*/
+   bool complex_loop;
+
nir_loop_terminator *limiting_terminator;
 
/* A list of loop_terminators terminating this loop. */
diff --git a/src/compiler/nir/nir_loop_analyze.c 
b/src/compiler/nir/nir_loop_analyze.c
index 5454b7691ba..9c3fd2f286f 100644
--- a/src/compiler/nir/nir_loop_analyze.c
+++ b/src/compiler/nir/nir_loop_analyze.c
@@ -317,15 +317,19 @@ find_loop_terminators(loop_info_state *state)
   * not find a loop terminator, but there is a break-statement then
   * we should return false so that we do not try to find trip-count
   */
- if (!nir_is_trivial_loop_if(nif, break_blk))
+ if (!nir_is_trivial_loop_if(nif, break_blk)) {
+state->loop->info->complex_loop = true;
 return false;
+ }
 
  /* Continue if the if contained no jumps at all */
  if (!break_blk)
 continue;
 
- if (nif->condition.ssa->parent_instr->type == nir_instr_type_phi)
+ if (nif->condition.ssa->parent_instr->type == nir_instr_type_phi) {
+state->loop->info->complex_loop = true;
 return false;
+ }
 
  nir_loop_terminator *terminator =
 rzalloc(state->loop->info, nir_loop_terminator);
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 8/8] nir: add loop unroll support for complex wrapper loops

2018-07-23 Thread Timothy Arceri

In GLSL IR we cheat with switch statements and simply convert them
into loops with a single iteration. This allowed us to make use of
the existing jump instruction handling provided by the loop handing
code, it also allows dead code to be cleaned up once we have
wrapped the code in a loop.

However using loops in this way created previously unrollable loops
which limits further optimisations. Here we provide a way to unroll
loops that end in a break and have multiple other exits.

All shader-db changes are from the dolphin uber shaders. There is a
small amount of HURT shaders but in general the improvements far
exceed the HURT.

shader-db results IVB:

total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
instructions in affected programs: 104080 -> 102361 (-1.65%)
helped: 36
HURT: 15

total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
cycles in affected programs: 126063017 -> 60527608 (-51.99%)
helped: 51
HURT: 0

total loops in shared programs: 2515 -> 2308 (-8.23%)
loops in affected programs: 903 -> 696 (-22.92%)
helped: 51
HURT: 0

total spills in shared programs: 4370 -> 4124 (-5.63%)
spills in affected programs: 1397 -> 1151 (-17.61%)
helped: 9
HURT: 12

total fills in shared programs: 4581 -> 4419 (-3.54%)
fills in affected programs: 2201 -> 2039 (-7.36%)
helped: 9
HURT: 15
---
 src/compiler/nir/nir_opt_loop_unroll.c | 113 +
 1 file changed, 76 insertions(+), 37 deletions(-)

diff --git a/src/compiler/nir/nir_opt_loop_unroll.c 
b/src/compiler/nir/nir_opt_loop_unroll.c
index 9c33267cb72..fa60523aac7 100644
--- a/src/compiler/nir/nir_opt_loop_unroll.c
+++ b/src/compiler/nir/nir_opt_loop_unroll.c
@@ -67,7 +67,6 @@ loop_prepare_for_unroll(nir_loop *loop)
/* Remove continue if its the last instruction in the loop */
nir_instr *last_instr = nir_block_last_instr(nir_loop_last_block(loop));
if (last_instr && last_instr->type == nir_instr_type_jump) {
-  assert(nir_instr_as_jump(last_instr)->type == nir_jump_continue);
   nir_instr_remove(last_instr);
}
 }
@@ -474,54 +473,91 @@ complex_unroll(nir_loop *loop, nir_loop_terminator 
*unlimit_term,
 static bool
 wrapper_unroll(nir_loop *loop)
 {
-   bool progress = false;
-
-   nir_block *blk_after_loop =
-  nir_cursor_current_block(nir_after_cf_node(>cf_node));
-
-   /* There may still be some single src phis following the loop that
-* have not yet been cleaned up by another pass. Tidy those up before
-* unrolling the loop.
-*/
-   nir_foreach_instr_safe(instr, blk_after_loop) {
-  if (instr->type != nir_instr_type_phi)
- break;
+   if (!list_empty(>info->loop_terminator_list)) {
 
-  nir_phi_instr *phi = nir_instr_as_phi(instr);
-  assert(exec_list_length(>srcs) == 1);
+  /* Unrolling a loop with a large number of exits can result in a
+   * large inrease in register pressure. For now we just skip
+   * unrolling if we have more than 3 exits (not including the break
+   * at the end of the loop).
+   *
+   * TODO: Most loops that fit this pattern are simply switch
+   * statements that are converted to a loop to take advantage of
+   * exiting jump instruction handling. In this case we could make
+   * use of a binary seach pattern like we do in
+   * nir_lower_indirect_derefs(), this should allow us to unroll the
+   * loops in an optimal way and should also avoid some of the
+   * register pressure that comes from simply nesting the
+   * terminators one after the other.
+   */
+  if (list_length(>info->loop_terminator_list) > 3)
+ return false;
+
+  loop_prepare_for_unroll(loop);
+
+  nir_cursor cursor = nir_after_block(nir_loop_last_block(loop));
+  list_for_each_entry(nir_loop_terminator, terminator,
+  >info->loop_terminator_list,
+  loop_terminator_link) {
+
+ /* Remove break from the terminator */
+ nir_instr *break_instr =
+nir_block_last_instr(terminator->break_block);
+ nir_instr_remove(break_instr);
+
+ /* Pluck out the loop body. */
+ nir_cf_list loop_body;
+ nir_cf_extract(_body,
+nir_after_cf_node(>nif->cf_node),
+cursor);
+
+ /* Reinsert loop body into continue from block */
+ nir_cf_reinsert(_body,
+ nir_after_block(terminator->continue_from_block));
+
+ cursor = terminator->continue_from_then ?
+   nir_after_block(nir_if_last_then_block(terminator->nif)) :
+   nir_after_block(nir_if_last_else_block(terminator->nif));
+  }
+   } else {
+  nir_block *blk_after_loop =
+ nir_cursor_current_block(nir_after_cf_node(>cf_node));
 
-  nir_phi_src *phi_src = exec_node_data(nir_phi_src,
-exec_list_get_head(>srcs),
-node);
+  /* There may

[Mesa-dev] [PATCH v2 6/8] nir/opt_loop_unroll: Remove unneeded phis if we make progress

2018-07-23 Thread Timothy Arceri

Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis.  In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but its possible this we could also see the same problem with
the existing unrolling passes.
---
 src/compiler/nir/nir_opt_loop_unroll.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_opt_loop_unroll.c 
b/src/compiler/nir/nir_opt_loop_unroll.c
index a1ad0e59814..e0e0b754716 100644
--- a/src/compiler/nir/nir_opt_loop_unroll.c
+++ b/src/compiler/nir/nir_opt_loop_unroll.c
@@ -575,9 +575,17 @@ nir_opt_loop_unroll_impl(nir_function_impl *impl,
 _nested_loop);
}
 
-   if (progress)
+   if (progress) {
   nir_lower_regs_to_ssa_impl(impl);
 
+  /* Calling nir_convert_loop_to_lcssa() adds extra phi nodes which may
+   * not be valid if they're used for something such as a deref.
+   *  Remove any unneeded phis.
+   */
+  nir_copy_prop(impl->function->shader);
+  nir_opt_remove_phis_impl(impl);
+   }
+
return progress;
 }
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 3/8] nir: allow more nested loops to be unrolled

2018-07-23 Thread Timothy Arceri

The innermost check was added to stop us from unrolling multiple
loops in a single pass, and to stop outer loops from unrolling.

When we successfully unroll a loop we need to run the analysis
pass again before deciding if we want to go ahead an unroll a
second loop.

However the logic was flawed because it never tried to unroll any
nested loops other than the first innermost loop it found.
If this innermost loop is not unrolled we end up skipping all
other nested loops.

This unrolls a loop in a Deus Ex: MD shader on ultra settings and
also unrolls a loop in a shader from the game Prey when running
on DXVK.
---
 src/compiler/nir/nir_opt_loop_unroll.c | 31 ++
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/src/compiler/nir/nir_opt_loop_unroll.c 
b/src/compiler/nir/nir_opt_loop_unroll.c
index 955dfede694..a1ad0e59814 100644
--- a/src/compiler/nir/nir_opt_loop_unroll.c
+++ b/src/compiler/nir/nir_opt_loop_unroll.c
@@ -483,9 +483,10 @@ is_loop_small_enough_to_unroll(nir_shader *shader, 
nir_loop_info *li)
 }
 
 static bool
-process_loops(nir_shader *sh, nir_cf_node *cf_node, bool *innermost_loop)
+process_loops(nir_shader *sh, nir_cf_node *cf_node, bool *has_nested_loop_out)
 {
bool progress = false;
+   bool has_nested_loop = false;
nir_loop *loop;
 
switch (cf_node->type) {
@@ -494,32 +495,32 @@ process_loops(nir_shader *sh, nir_cf_node *cf_node, bool 
*innermost_loop)
case nir_cf_node_if: {
   nir_if *if_stmt = nir_cf_node_as_if(cf_node);
   foreach_list_typed_safe(nir_cf_node, nested_node, node, 
_stmt->then_list)
- progress |= process_loops(sh, nested_node, innermost_loop);
+ progress |= process_loops(sh, nested_node, has_nested_loop_out);
   foreach_list_typed_safe(nir_cf_node, nested_node, node, 
_stmt->else_list)
- progress |= process_loops(sh, nested_node, innermost_loop);
+ progress |= process_loops(sh, nested_node, has_nested_loop_out);
   return progress;
}
case nir_cf_node_loop: {
   loop = nir_cf_node_as_loop(cf_node);
   foreach_list_typed_safe(nir_cf_node, nested_node, node, >body)
- progress |= process_loops(sh, nested_node, innermost_loop);
+ progress |= process_loops(sh, nested_node, _nested_loop);
+
   break;
}
default:
   unreachable("unknown cf node type");
}
 
-   if (*innermost_loop) {
-  /* Don't attempt to unroll outer loops or a second inner loop in
-   * this pass wait until the next pass as we have altered the cf.
-   */
-  *innermost_loop = false;
+   /* Don't attempt to unroll a second inner loop in this pass, wait until the
+* next pass as we have altered the cf.
+*/
+   if (!progress) {
 
-  if (loop->info->limiting_terminator == NULL)
- return progress;
+  if (has_nested_loop || loop->info->limiting_terminator == NULL)
+ goto exit;
 
   if (!is_loop_small_enough_to_unroll(sh, loop->info))
- return progress;
+ goto exit;
 
   if (loop->info->is_trip_count_known) {
  simple_unroll(loop);
@@ -555,6 +556,8 @@ process_loops(nir_shader *sh, nir_cf_node *cf_node, bool 
*innermost_loop)
   }
}
 
+exit:
+   *has_nested_loop_out = true;
return progress;
 }
 
@@ -567,9 +570,9 @@ nir_opt_loop_unroll_impl(nir_function_impl *impl,
nir_metadata_require(impl, nir_metadata_block_index);
 
foreach_list_typed_safe(nir_cf_node, node, node, >body) {
-  bool innermost_loop = true;
+  bool has_nested_loop = false;
   progress |= process_loops(impl->function->shader, node,
-_loop);
+_nested_loop);
}
 
if (progress)
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 7/8] nir: add loop unroll support for wrapper loops

2018-07-23 Thread Timothy Arceri

This adds support for unrolling the classic

do {
// ...
} while (false)

that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.

shader-db results IVB:

total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0
---
 src/compiler/nir/nir_opt_loop_unroll.c | 77 ++
 1 file changed, 77 insertions(+)

diff --git a/src/compiler/nir/nir_opt_loop_unroll.c 
b/src/compiler/nir/nir_opt_loop_unroll.c
index e0e0b754716..9c33267cb72 100644
--- a/src/compiler/nir/nir_opt_loop_unroll.c
+++ b/src/compiler/nir/nir_opt_loop_unroll.c
@@ -465,6 +465,65 @@ complex_unroll(nir_loop *loop, nir_loop_terminator 
*unlimit_term,
_mesa_hash_table_destroy(remap_table, NULL);
 }
 
+/* Unrolls the classic wrapper loops e.g
+ *
+ *do {
+ *// ...
+ *} while (false)
+ */
+static bool
+wrapper_unroll(nir_loop *loop)
+{
+   bool progress = false;
+
+   nir_block *blk_after_loop =
+  nir_cursor_current_block(nir_after_cf_node(>cf_node));
+
+   /* There may still be some single src phis following the loop that
+* have not yet been cleaned up by another pass. Tidy those up before
+* unrolling the loop.
+*/
+   nir_foreach_instr_safe(instr, blk_after_loop) {
+  if (instr->type != nir_instr_type_phi)
+ break;
+
+  nir_phi_instr *phi = nir_instr_as_phi(instr);
+  assert(exec_list_length(>srcs) == 1);
+
+  nir_phi_src *phi_src = exec_node_data(nir_phi_src,
+exec_list_get_head(>srcs),
+node);
+
+  nir_ssa_def_rewrite_uses(>dest.ssa, phi_src->src);
+  nir_instr_remove(instr);
+
+  progress = true;
+   }
+
+   nir_block *last_loop_blk = nir_loop_last_block(loop);
+   if (nir_block_ends_in_break(last_loop_blk)) {
+
+  /* Remove break at end of the loop */
+  nir_instr *break_instr = nir_block_last_instr(last_loop_blk);
+  nir_instr_remove(break_instr);
+
+  /* Pluck out the loop body. */
+  nir_cf_list loop_body;
+  nir_cf_extract(_body, nir_before_block(nir_loop_first_block(loop)),
+ nir_after_block(nir_loop_last_block(loop)));
+
+  /* Reinsert loop body after the loop */
+  nir_cf_reinsert(_body, nir_after_cf_node(>cf_node));
+
+  /* The loop has been unrolled so remove it. */
+  nir_cf_node_remove(>cf_node);
+
+  progress = true;
+   }
+
+   return progress;
+}
+
 static bool
 is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li)
 {
@@ -516,6 +575,24 @@ process_loops(nir_shader *sh, nir_cf_node *cf_node, bool 
*has_nested_loop_out)
 */
if (!progress) {
 
+  /* Check for the classic
+   *
+   *do {
+   *// ...
+   *} while (false)
+   *
+   * that is used to wrap multi-line macros. GLSL IR also wraps switch
+   * statements in a loop like this.
+   */
+  if (loop->info->limiting_terminator == NULL &&
+  list_empty(>info->loop_terminator_list) &&
+  !loop->info->complex_loop) {
+
+ progress = wrapper_unroll(loop);
+
+ goto exit;
+  }
+
   if (has_nested_loop || loop->info->limiting_terminator == NULL)
  goto exit;
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 4/8] nir: always attempt to find loop terminators

2018-07-23 Thread Timothy Arceri

This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.
---
 src/compiler/nir/nir_loop_analyze.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/compiler/nir/nir_loop_analyze.c 
b/src/compiler/nir/nir_loop_analyze.c
index d564296aa67..5454b7691ba 100644
--- a/src/compiler/nir/nir_loop_analyze.c
+++ b/src/compiler/nir/nir_loop_analyze.c
@@ -717,13 +717,6 @@ get_loop_info(loop_info_state *state, nir_function_impl 
*impl)
   }
}
 
-   /* Induction analysis needs invariance information so get that first */
-   compute_invariance_information(state);
-
-   /* We have invariance information so try to find induction variables */
-   if (!compute_induction_information(state))
-  return;
-
/* Try to find all simple terminators of the loop. If we can't find any,
 * or we find possible terminators that have side effects then bail.
 */
@@ -737,6 +730,13 @@ get_loop_info(loop_info_state *state, nir_function_impl 
*impl)
   return;
}
 
+   /* Induction analysis needs invariance information so get that first */
+   compute_invariance_information(state);
+
+   /* We have invariance information so try to find induction variables */
+   if (!compute_induction_information(state))
+  return;
+
/* Run through each of the terminators and try to compute a trip-count */
find_trip_count(state);
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] v2 Loop unrolling and if statement opts

2018-07-23 Thread Timothy Arceri

v2:

- make use of nir_block_dominates() in the if-statement opts as suggested
by Connor. This allowed for a reduction in code required but also allowed
me to implemented some extra optimisations.

- implement feedback from Jason on loop unrolling opts.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

1 2 >

1 - 100 of 118 matches

Mail list logo