date:20180719

Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

2018-07-19 Thread Dave Airlie

On 20 July 2018 at 13:12, Marek Olšák  wrote:
> From: Marek Olšák 
>
> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
> finish sooner on the older CPUs. (otherwise it gets killed and we fail
> the test)

I think this is possibly a bad idea, since it's clear LLVM has some pathological
behaviour in the AMDGPU backend for this shader and we are just papering over it.

A quick dig into LLVM shows horrible misuse of a SmallVector data structure
for what ends up having 2000 entries in it.

I'm not going to outright NAK this, but it would be nice to have it accompanied
by a pointer to an LLVM bug against the AMDGPU backend for the
pathological case.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] virgl: remove unused stride-arguments

2018-07-19 Thread Gurchetan Singh

Reviewed-by: Gurchetan Singh 
On Wed, Jul 18, 2018 at 4:06 AM Erik Faye-Lund
 wrote:
>
> The IOCTLs don't pass these along, so computing them in the first
> place is kinda pointless.
>
> Signed-off-by: Erik Faye-Lund 
> ---
>
> This is just a cleanup I noticed based on some discussion with Gert.
>
> A question is, what code here expects this stride to be respected? The
> call-sites in virgl_*_transfer_map and virgl_*_transfer_unmap kinda
> look like they do... They'll get a bit of a surprise here, no?
>
> Anyway, this is already broken, so I think this should be OK. But
> perhaps this patch shows some code-paths that need some love?
>
>  src/gallium/drivers/virgl/virgl_buffer.c  |  4 ++--
>  src/gallium/drivers/virgl/virgl_context.c |  2 +-
>  src/gallium/drivers/virgl/virgl_texture.c | 24 ++-
>  src/gallium/drivers/virgl/virgl_winsys.h  |  2 --
>  .../winsys/virgl/drm/virgl_drm_winsys.c   |  6 -
>  5 files changed, 5 insertions(+), 33 deletions(-)
>
> diff --git a/src/gallium/drivers/virgl/virgl_buffer.c 
> b/src/gallium/drivers/virgl/virgl_buffer.c
> index 2e63aebc72..97b2854b9c 100644
> --- a/src/gallium/drivers/virgl/virgl_buffer.c
> +++ b/src/gallium/drivers/virgl/virgl_buffer.c
> @@ -77,7 +77,7 @@ static void *virgl_buffer_transfer_map(struct pipe_context 
> *ctx,
>
> readback = virgl_res_needs_readback(vctx, &vbuf->base, usage);
> if (readback)
> -  vs->vws->transfer_get(vs->vws, vbuf->base.hw_res, box, 
> trans->base.stride, trans->base.layer_stride, offset, level);
> +  vs->vws->transfer_get(vs->vws, vbuf->base.hw_res, box, offset, level);
>
> if (!(usage & PIPE_TRANSFER_UNSYNCHRONIZED))
>doflushwait = true;
> @@ -109,7 +109,7 @@ static void virgl_buffer_transfer_unmap(struct 
> pipe_context *ctx,
>   vbuf->base.clean = FALSE;
>   vctx->num_transfers++;
>   vs->vws->transfer_put(vs->vws, vbuf->base.hw_res,
> -   &transfer->box, trans->base.stride, 
> trans->base.layer_stride, trans->offset, transfer->level);
> +   &transfer->box, trans->offset, 
> transfer->level);
>
>}
> }
> diff --git a/src/gallium/drivers/virgl/virgl_context.c 
> b/src/gallium/drivers/virgl/virgl_context.c
> index ee28680b8f..19bc23dd1e 100644
> --- a/src/gallium/drivers/virgl/virgl_context.c
> +++ b/src/gallium/drivers/virgl/virgl_context.c
> @@ -71,7 +71,7 @@ static void virgl_buffer_flush(struct virgl_context *vctx,
>
> vctx->num_transfers++;
> rs->vws->transfer_put(rs->vws, vbuf->base.hw_res,
> - &box, 0, 0, box.x, 0);
> + &box, box.x, 0);
>
> util_range_set_empty(&vbuf->valid_buffer_range);
>  }
> diff --git a/src/gallium/drivers/virgl/virgl_texture.c 
> b/src/gallium/drivers/virgl/virgl_texture.c
> index 150a5ebd8c..485b7cf1a7 100644
> --- a/src/gallium/drivers/virgl/virgl_texture.c
> +++ b/src/gallium/drivers/virgl/virgl_texture.c
> @@ -138,7 +138,6 @@ static void *virgl_texture_transfer_map(struct 
> pipe_context *ctx,
> const unsigned h = u_minify(vtex->base.u.b.height0, level);
> const unsigned nblocksy = util_format_get_nblocksy(format, h);
> bool is_depth = 
> util_format_has_depth(util_format_description(resource->format));
> -   uint32_t l_stride;
> bool doflushwait;
>
> doflushwait = virgl_res_needs_flush_wait(vctx, &vtex->base, usage);
> @@ -156,15 +155,6 @@ static void *virgl_texture_transfer_map(struct 
> pipe_context *ctx,
> trans->base.stride = vtex->stride[level];
> trans->base.layer_stride = trans->base.stride * nblocksy;
>
> -   if (resource->target != PIPE_TEXTURE_3D &&
> -   resource->target != PIPE_TEXTURE_CUBE &&
> -   resource->target != PIPE_TEXTURE_1D_ARRAY &&
> -   resource->target != PIPE_TEXTURE_2D_ARRAY &&
> -   resource->target != PIPE_TEXTURE_CUBE_ARRAY)
> -  l_stride = 0;
> -   else
> -  l_stride = trans->base.layer_stride;
> -
> if (is_depth && resource->nr_samples > 1) {
>struct pipe_resource tmp_resource;
>virgl_init_temp_resource_from_box(&tmp_resource, resource, box,
> @@ -188,7 +178,7 @@ static void *virgl_texture_transfer_map(struct 
> pipe_context *ctx,
>
> readback = virgl_res_needs_readback(vctx, &vtex->base, usage);
> if (readback)
> -  vs->vws->transfer_get(vs->vws, hw_res, box, trans->base.stride, 
> l_stride, offset, level);
> +  vs->vws->transfer_get(vs->vws, hw_res, box, offset, level);
>
> if (doflushwait || readback)
>vs->vws->resource_wait(vs->vws, vtex->base.hw_res);
> @@ -210,16 +200,6 @@ static void virgl_texture_transfer_unmap(struct 
> pipe_context *ctx,
> struct virgl_context *vctx = virgl_context(ctx);
> struct virgl_transfer *trans = virgl_transfer(transfer);
> struct virgl_texture *vtex = virgl_texture(transfer->resource);
> -   uint32_t l_stride;
> -
> -   if (transfer->resource->target != PIPE_TEXTURE_3D &&
> -   transfer->resource->target != PIPE_TEXTURE_CUBE &&
> -

Re: [Mesa-dev] [PATCH] radeonsi: Add debug option to enable LLVM GlobalISel (v2)

2018-07-19 Thread Tom Stellard

On 07/19/2018 08:18 PM, Marek Olšák wrote:
> From: Tom Stellard 
> 
> R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
> SelectionDAG for instruction selection.
> 
> v2: mareko: move the helper to src/amd/common
> 

Thanks for picking this up.
Reviewed-by: Tom Stellard 

> Signed-off-by: Marek Olšák 
> ---
>  src/amd/common/ac_llvm_helper.cpp  |  7 +++
>  src/amd/common/ac_llvm_util.c  | 11 +--
>  src/amd/common/ac_llvm_util.h  |  2 ++
>  src/gallium/drivers/radeonsi/si_pipe.c |  3 +++
>  src/gallium/drivers/radeonsi/si_pipe.h |  1 +
>  5 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/src/amd/common/ac_llvm_helper.cpp 
> b/src/amd/common/ac_llvm_helper.cpp
> index e0943135fad..a4b2fde786a 100644
> --- a/src/amd/common/ac_llvm_helper.cpp
> +++ b/src/amd/common/ac_llvm_helper.cpp
> @@ -164,10 +164,17 @@ bool ac_compile_module_to_binary(struct 
> ac_compiler_passes *p, LLVMModuleRef mod
>  
>   if (!success)
>   fprintf(stderr, "amd: cannot read an ELF shader binary\n");
>   return success;
>  }
>  
>  void ac_llvm_add_barrier_noop_pass(LLVMPassManagerRef passmgr)
>  {
>   llvm::unwrap(passmgr)->add(llvm::createBarrierNoopPass());
>  }
> +
> +void ac_enable_global_isel(LLVMTargetMachineRef tm)
> +{
> +#if HAVE_LLVM >= 0x0700
> +  reinterpret_cast<llvm::TargetMachine*>(tm)->setGlobalISel(true);
> +#endif
> +}
> diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c
> index b6960f7382d..10e1ca99d41 100644
> --- a/src/amd/common/ac_llvm_util.c
> +++ b/src/amd/common/ac_llvm_util.c
> @@ -27,20 +27,21 @@
>  #include "ac_llvm_build.h"
>  #include "util/bitscan.h"
>  #include 
>  #include 
>  #include 
>  #include 
>  #if HAVE_LLVM >= 0x0700
>  #include 
>  #endif
>  #include "c11/threads.h"
> +#include "gallivm/lp_bld_misc.h"
>  #include "util/u_math.h"
>  
>  #include 
>  #include 
>  #include 
>  
>  static void ac_init_llvm_target()
>  {
>   LLVMInitializeAMDGPUTargetInfo();
>   LLVMInitializeAMDGPUTarget();
> @@ -48,23 +49,27 @@ static void ac_init_llvm_target()
>   LLVMInitializeAMDGPUAsmPrinter();
>  
>   /* For inline assembly. */
>   LLVMInitializeAMDGPUAsmParser();
>  
>   /* Workaround for bug in llvm 4.0 that causes image intrinsics
>* to disappear.
>* https://reviews.llvm.org/D26348
>*
>* "mesa" is the prefix for error messages.
> +  *
> +  * -global-isel-abort=2 is a no-op unless global isel has been enabled.
> +  * This option tells the backend to fall-back to SelectionDAG and print
> +  * a diagnostic message if global isel fails.
>*/
> - const char *argv[2] = { "mesa", "-simplifycfg-sink-common=false" };
> - LLVMParseCommandLineOptions(2, argv, NULL);
> + const char *argv[3] = { "mesa", "-simplifycfg-sink-common=false", 
> "-global-isel-abort=2" };
> + LLVMParseCommandLineOptions(3, argv, NULL);
>  }
>  
>  static once_flag ac_init_llvm_target_once_flag = ONCE_FLAG_INIT;
>  
>  void ac_init_llvm_once(void)
>  {
>   call_once(&ac_init_llvm_target_once_flag, ac_init_llvm_target);
>  }
>  
>  static LLVMTargetRef ac_get_llvm_target(const char *triple)
> @@ -158,20 +163,22 @@ static LLVMTargetMachineRef 
> ac_create_target_machine(enum radeon_family family,
>target,
>triple,
>ac_get_llvm_processor_name(family),
>features,
>level,
>LLVMRelocDefault,
>LLVMCodeModelDefault);
>  
>   if (out_triple)
>   *out_triple = triple;
> + if (tm_options & AC_TM_ENABLE_GLOBAL_ISEL)
> + ac_enable_global_isel(tm);
>   return tm;
>  }
>  
>  static LLVMPassManagerRef ac_create_passmgr(LLVMTargetLibraryInfoRef 
> target_library_info,
>   bool check_ir)
>  {
>   LLVMPassManagerRef passmgr = LLVMCreatePassManager();
>   if (!passmgr)
>   return NULL;
>  
> diff --git a/src/amd/common/ac_llvm_util.h b/src/amd/common/ac_llvm_util.h
> index c0e759b8836..a3adf4fe458 100644
> --- a/src/amd/common/ac_llvm_util.h
> +++ b/src/amd/common/ac_llvm_util.h
> @@ -57,20 +57,21 @@ enum ac_func_attr {
>  };
>  
>  enum ac_target_machine_options {
>   AC_TM_SUPPORTS_SPILL = (1 << 0),
>   AC_TM_SISCHED = (1 << 1),
>   AC_TM_FORCE_ENABLE_XNACK = (1 << 2),
>   AC_TM_FORCE_DISABLE_XNACK = (1 << 3),
>   AC_TM_PROMOTE_ALLOCA_TO_SCRATCH = (1 << 4),
>   AC_TM_CHECK_IR = (1 << 5),
>   AC_TM_CREATE_LOW_OPT = (1 << 6),
> + AC_TM_ENABLE_GLOBAL_ISEL = (1 << 7),
>  };
>  
>  enum ac_float_mode {
>   AC_FLOAT_MODE_DEFAULT,
>   AC_FLOAT_MODE_NO_SIGNED_ZEROS_FP_MATH,
>   AC_FLOAT_MODE_UNSAFE_FP_MATH,
>  };
>  
>  /* Per-thread persistent LLVM objects. */
>  struct ac_llvm_compiler {
> @@

[Mesa-dev] [PATCH] radeonsi: Add debug option to enable LLVM GlobalISel (v2)

2018-07-19 Thread Marek Olšák

From: Tom Stellard 

R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
SelectionDAG for instruction selection.

v2: mareko: move the helper to src/amd/common

Signed-off-by: Marek Olšák 
---
 src/amd/common/ac_llvm_helper.cpp  |  7 +++
 src/amd/common/ac_llvm_util.c  | 11 +--
 src/amd/common/ac_llvm_util.h  |  2 ++
 src/gallium/drivers/radeonsi/si_pipe.c |  3 +++
 src/gallium/drivers/radeonsi/si_pipe.h |  1 +
 5 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_llvm_helper.cpp 
b/src/amd/common/ac_llvm_helper.cpp
index e0943135fad..a4b2fde786a 100644
--- a/src/amd/common/ac_llvm_helper.cpp
+++ b/src/amd/common/ac_llvm_helper.cpp
@@ -164,10 +164,17 @@ bool ac_compile_module_to_binary(struct 
ac_compiler_passes *p, LLVMModuleRef mod
 
if (!success)
fprintf(stderr, "amd: cannot read an ELF shader binary\n");
return success;
 }
 
 void ac_llvm_add_barrier_noop_pass(LLVMPassManagerRef passmgr)
 {
llvm::unwrap(passmgr)->add(llvm::createBarrierNoopPass());
 }
+
+void ac_enable_global_isel(LLVMTargetMachineRef tm)
+{
+#if HAVE_LLVM >= 0x0700
+  reinterpret_cast<llvm::TargetMachine*>(tm)->setGlobalISel(true);
+#endif
+}
diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c
index b6960f7382d..10e1ca99d41 100644
--- a/src/amd/common/ac_llvm_util.c
+++ b/src/amd/common/ac_llvm_util.c
@@ -27,20 +27,21 @@
 #include "ac_llvm_build.h"
 #include "util/bitscan.h"
 #include 
 #include 
 #include 
 #include 
 #if HAVE_LLVM >= 0x0700
 #include 
 #endif
 #include "c11/threads.h"
+#include "gallivm/lp_bld_misc.h"
 #include "util/u_math.h"
 
 #include 
 #include 
 #include 
 
 static void ac_init_llvm_target()
 {
LLVMInitializeAMDGPUTargetInfo();
LLVMInitializeAMDGPUTarget();
@@ -48,23 +49,27 @@ static void ac_init_llvm_target()
LLVMInitializeAMDGPUAsmPrinter();
 
/* For inline assembly. */
LLVMInitializeAMDGPUAsmParser();
 
/* Workaround for bug in llvm 4.0 that causes image intrinsics
 * to disappear.
 * https://reviews.llvm.org/D26348
 *
 * "mesa" is the prefix for error messages.
+*
+* -global-isel-abort=2 is a no-op unless global isel has been enabled.
+* This option tells the backend to fall-back to SelectionDAG and print
+* a diagnostic message if global isel fails.
 */
-   const char *argv[2] = { "mesa", "-simplifycfg-sink-common=false" };
-   LLVMParseCommandLineOptions(2, argv, NULL);
+   const char *argv[3] = { "mesa", "-simplifycfg-sink-common=false", 
"-global-isel-abort=2" };
+   LLVMParseCommandLineOptions(3, argv, NULL);
 }
 
 static once_flag ac_init_llvm_target_once_flag = ONCE_FLAG_INIT;
 
 void ac_init_llvm_once(void)
 {
call_once(&ac_init_llvm_target_once_flag, ac_init_llvm_target);
 }
 
 static LLVMTargetRef ac_get_llvm_target(const char *triple)
@@ -158,20 +163,22 @@ static LLVMTargetMachineRef ac_create_target_machine(enum 
radeon_family family,
 target,
 triple,
 ac_get_llvm_processor_name(family),
 features,
 level,
 LLVMRelocDefault,
 LLVMCodeModelDefault);
 
if (out_triple)
*out_triple = triple;
+   if (tm_options & AC_TM_ENABLE_GLOBAL_ISEL)
+   ac_enable_global_isel(tm);
return tm;
 }
 
 static LLVMPassManagerRef ac_create_passmgr(LLVMTargetLibraryInfoRef 
target_library_info,
bool check_ir)
 {
LLVMPassManagerRef passmgr = LLVMCreatePassManager();
if (!passmgr)
return NULL;
 
diff --git a/src/amd/common/ac_llvm_util.h b/src/amd/common/ac_llvm_util.h
index c0e759b8836..a3adf4fe458 100644
--- a/src/amd/common/ac_llvm_util.h
+++ b/src/amd/common/ac_llvm_util.h
@@ -57,20 +57,21 @@ enum ac_func_attr {
 };
 
 enum ac_target_machine_options {
AC_TM_SUPPORTS_SPILL = (1 << 0),
AC_TM_SISCHED = (1 << 1),
AC_TM_FORCE_ENABLE_XNACK = (1 << 2),
AC_TM_FORCE_DISABLE_XNACK = (1 << 3),
AC_TM_PROMOTE_ALLOCA_TO_SCRATCH = (1 << 4),
AC_TM_CHECK_IR = (1 << 5),
AC_TM_CREATE_LOW_OPT = (1 << 6),
+   AC_TM_ENABLE_GLOBAL_ISEL = (1 << 7),
 };
 
 enum ac_float_mode {
AC_FLOAT_MODE_DEFAULT,
AC_FLOAT_MODE_NO_SIGNED_ZEROS_FP_MATH,
AC_FLOAT_MODE_UNSAFE_FP_MATH,
 };
 
 /* Per-thread persistent LLVM objects. */
 struct ac_llvm_compiler {
@@ -136,16 +137,17 @@ bool ac_init_llvm_compiler(struct ac_llvm_compiler 
*compiler,
   bool okay_to_leak_target_library_info,
   enum radeon_family family,
   enum ac_target_machine_options tm_options);
 void
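
For readers following the thread, the path from a debug flag such as
R600_DEBUG=gisel to the AC_TM_ENABLE_GLOBAL_ISEL option bit can be sketched
roughly as below. This is only an illustration: the two enum values are taken
from the patch, but DBG_GISEL, struct fake_target_machine,
tm_options_from_debug_flags() and apply_tm_options() are hypothetical names
invented for the sketch, and the real code calls into LLVM instead of setting
a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Option bits as listed in the patch (only the last two are shown). */
enum ac_target_machine_options {
    AC_TM_CREATE_LOW_OPT     = (1 << 6),
    AC_TM_ENABLE_GLOBAL_ISEL = (1 << 7),
};

/* Hypothetical debug-flag bit standing in for R600_DEBUG=gisel. */
#define DBG_GISEL (UINT64_C(1) << 0)

/* Stand-in for a target machine; the real one is an LLVMTargetMachineRef. */
struct fake_target_machine {
    bool global_isel;
};

/* Translate driver debug flags into target machine option bits. */
static unsigned tm_options_from_debug_flags(uint64_t debug_flags)
{
    unsigned tm_options = 0;

    if (debug_flags & DBG_GISEL)
        tm_options |= AC_TM_ENABLE_GLOBAL_ISEL;
    return tm_options;
}

/* Apply the option once the machine exists, mirroring the order in the patch. */
static void apply_tm_options(struct fake_target_machine *tm, unsigned tm_options)
{
    assert(tm != NULL);

    if (tm_options & AC_TM_ENABLE_GLOBAL_ISEL)
        tm->global_isel = true;
}
```

Keeping the option as a plain bit in the existing enum means callers that
never set it are unaffected.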

[Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs

2018-07-19 Thread Marek Olšák

From: Marek Olšák 

To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
finish sooner on the older CPUs. (otherwise it gets killed and we fail
the test)
---
 src/amd/common/ac_llvm_util.c | 18 ++--
 src/amd/common/ac_llvm_util.h | 11 ++-
 src/gallium/drivers/radeonsi/si_pipe.c| 12 +++-
 src/gallium/drivers/radeonsi/si_shader.c  | 29 +++
 .../drivers/radeonsi/si_shader_internal.h |  3 +-
 .../drivers/radeonsi/si_shader_tgsi_setup.c   |  8 +++--
 6 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c
index 0c8dbf1ec51..b6960f7382d 100644
--- a/src/amd/common/ac_llvm_util.c
+++ b/src/amd/common/ac_llvm_util.c
@@ -130,20 +130,21 @@ const char *ac_get_llvm_processor_name(enum radeon_family 
family)
return HAVE_LLVM >= 0x0700 ? "gfx904" : "gfx902";
case CHIP_VEGA20:
return HAVE_LLVM >= 0x0700 ? "gfx906" : "gfx902";
default:
return "";
}
 }
 
 static LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family,
 enum 
ac_target_machine_options tm_options,
+LLVMCodeGenOptLevel level,
 const char **out_triple)
 {
assert(family >= CHIP_TAHITI);
char features[256];
const char *triple = (tm_options & AC_TM_SUPPORTS_SPILL) ? 
"amdgcn-mesa-mesa3d" : "amdgcn--";
LLVMTargetRef target = ac_get_llvm_target(triple);
bool barrier_does_waitcnt = family != CHIP_VEGA20;
 
snprintf(features, sizeof(features),
 
"+DumpCode,+vgpr-spilling,-fp32-denormals,+fp64-denormals%s%s%s%s%s",
@@ -151,21 +152,21 @@ static LLVMTargetMachineRef ac_create_target_machine(enum 
radeon_family family,
 tm_options & AC_TM_FORCE_ENABLE_XNACK ? ",+xnack" : "",
 tm_options & AC_TM_FORCE_DISABLE_XNACK ? ",-xnack" : "",
 tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? 
",-promote-alloca" : "",
 barrier_does_waitcnt ? ",+auto-waitcnt-before-barrier" : "");

LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
 target,
 triple,
 ac_get_llvm_processor_name(family),
 features,
-LLVMCodeGenLevelDefault,
+level,
 LLVMRelocDefault,
 LLVMCodeModelDefault);
 
if (out_triple)
*out_triple = triple;
return tm;
 }
 
 static LLVMPassManagerRef ac_create_passmgr(LLVMTargetLibraryInfoRef 
target_library_info,
bool check_ir)
@@ -294,25 +295,34 @@ ac_count_scratch_private_memory(LLVMValueRef function)
 
 bool
 ac_init_llvm_compiler(struct ac_llvm_compiler *compiler,
  bool okay_to_leak_target_library_info,
  enum radeon_family family,
  enum ac_target_machine_options tm_options)
 {
const char *triple;
memset(compiler, 0, sizeof(*compiler));
 
-   compiler->tm = ac_create_target_machine(family,
-   tm_options, &triple);
+   compiler->tm = ac_create_target_machine(family, tm_options,
+   LLVMCodeGenLevelDefault,
+   &triple);
if (!compiler->tm)
return false;
 
+   if (tm_options & AC_TM_CREATE_LOW_OPT) {
+   compiler->low_opt_tm =
+   ac_create_target_machine(family, tm_options,
+LLVMCodeGenLevelLess, NULL);
+   if (!compiler->low_opt_tm)
+   goto fail;
+   }
+
if (okay_to_leak_target_library_info || (HAVE_LLVM >= 0x0700)) {
compiler->target_library_info =
ac_create_target_library_info(triple);
if (!compiler->target_library_info)
goto fail;
}
 
compiler->passmgr = ac_create_passmgr(compiler->target_library_info,
  tm_options & AC_TM_CHECK_IR);
if (!compiler->passmgr)
@@ -327,13 +337,15 @@ fail:
 void
 ac_destroy_llvm_compiler(struct ac_llvm_compiler *compiler)
 {
if (compiler->passmgr)
LLVMDisposePassManager(compiler->passmgr);
 #if HAVE_LLVM >= 0x0700
/* This crashes on LLVM 5.0 and 6.0 and Ubuntu 18.04, so leak it there. 
*/
if (compiler->target_library_info)
ac_dispose_target_library_info(compiler->target_library_info);
 #endif
+
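
The part of this patch that decides when to use the second, low-optimization
target machine is not included in this excerpt. A minimal sketch of what such
a selection could look like follows. Everything in it is an assumption made
for illustration: the struct, the function name, the boolean inputs and
especially the instruction-count threshold are hypothetical and are not taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the two LLVM target machines kept per compiler. */
struct sketch_compiler {
    const void *tm;         /* default optimization level */
    const void *low_opt_tm; /* reduced optimization level; may be NULL */
};

/* Hypothetical cut-off; the real criterion is not shown in this excerpt. */
#define SKETCH_COMPLEX_SHADER_INSTRUCTIONS 1000u

/* Pick the low-opt machine only for large compute shaders on older APUs,
 * and only if that machine was actually created. */
static const void *
sketch_pick_target_machine(const struct sketch_compiler *c,
                           bool is_compute, bool is_older_apu,
                           unsigned num_instructions)
{
    assert(c != NULL && c->tm != NULL);

    if (c->low_opt_tm && is_compute && is_older_apu &&
        num_instructions > SKETCH_COMPLEX_SHADER_INSTRUCTIONS)
        return c->low_opt_tm;

    return c->tm;
}
```

Falling back to the default machine when low_opt_tm is NULL matches the patch
in spirit, since the second machine is only created when AC_TM_CREATE_LOW_OPT
is set.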

[Mesa-dev] [PATCH 1/2] radeonsi: handle SI_FORCE_FAMILY early

2018-07-19 Thread Marek Olšák

From: Marek Olšák 

before LLVM target machines are created
---
 src/gallium/drivers/radeonsi/si_pipe.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index d3d0c0ef075..22e333aec77 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -781,20 +781,21 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws,
 {
struct si_screen *sscreen = CALLOC_STRUCT(si_screen);
unsigned hw_threads, num_comp_hi_threads, num_comp_lo_threads, i;
 
if (!sscreen) {
return NULL;
}
 
sscreen->ws = ws;
ws->query_info(ws, &sscreen->info);
+   si_handle_env_var_force_family(sscreen);
 
sscreen->debug_flags = debug_get_flags_option("R600_DEBUG",
debug_options, 0);
 
/* Set functions first. */
sscreen->b.context_create = si_pipe_create_context;
sscreen->b.destroy = si_destroy_screen;
 
si_init_screen_get_functions(sscreen);
si_init_screen_buffer_functions(sscreen);
@@ -870,22 +871,20 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws,
if (!util_queue_init(&sscreen->shader_compiler_queue_low_priority,
 "shlo",
 64, num_comp_lo_threads,
 UTIL_QUEUE_INIT_RESIZE_IF_FULL |
 UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY)) {
   si_destroy_shader_cache(sscreen);
   FREE(sscreen);
   return NULL;
}
 
-   si_handle_env_var_force_family(sscreen);
-
if (!debug_get_bool_option("RADEON_DISABLE_PERFCOUNTERS", false))
si_init_perfcounters(sscreen);
 
/* Determine tessellation ring info. */
bool double_offchip_buffers = sscreen->info.chip_class >= CIK &&
  sscreen->info.family != CHIP_CARRIZO &&
  sscreen->info.family != CHIP_STONEY;
/* This must be one less than the maximum number due to a hw limitation.
 * Various hardware bugs in SI, CIK, and GFX9 need this.
 */
-- 
2.17.1


Re: [Mesa-dev] [PATCH 2/2] nir: Teach src_is_type about bcsel

2018-07-19 Thread Ilia Mirkin

On Thu, Jul 19, 2018 at 8:39 PM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> ---
>  src/compiler/nir/nir_search.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index c727e9c70b7..99b4ac6ddd2 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -75,9 +75,13 @@ src_is_type(nir_src src, nir_alu_type type, unsigned 
> num_components,
> src_is_type(src_alu->src[1].src, nir_type_bool,
> num_components, swizzle);
>   case nir_op_inot:
> -return src_is_type(src_alu->src[0].src, nir_type_bool);

Oh, and there it is. Should probably go into 1/2.

>  return src_is_type(src_alu->src[0].src, nir_type_bool,
> num_components, swizzle);
> + case nir_op_bcsel:
> +return src_is_type(src_alu->src[1].src, nir_type_bool,
> +   num_components, swizzle) &&
> +   src_is_type(src_alu->src[2].src, nir_type_bool,
> +   num_components, swizzle);
>   default:
>  break;
>   }
> --
> 2.14.4
>

Re: [Mesa-dev] [PATCH 1/2] nir: Teach src_is_type about nir_instr_type_load_const

2018-07-19 Thread Ilia Mirkin

On Thu, Jul 19, 2018 at 8:39 PM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> ---
> After looking back at Tim's patches, I realized that I had done my
> implementation slightly differently.  I didn't include these in my branch
> because they caused a bunch of regressions on i965.  I'm just sending
> these out in case we end up going this route...
>
>
>  src/compiler/nir/nir_search.c | 28 
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index 28b36b2b863..c727e9c70b7 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -49,7 +49,8 @@ static const uint8_t identity_swizzle[] = { 0, 1, 2, 3 };
>   * Used for satisfying 'a@type' constraints.
>   */
>  static bool
> -src_is_type(nir_src src, nir_alu_type type)
> +src_is_type(nir_src src, nir_alu_type type, unsigned num_components,
> +const uint8_t *swizzle)
>  {
> assert(type != nir_type_invalid);
>
> @@ -69,10 +70,14 @@ src_is_type(nir_src src, nir_alu_type type)
>   case nir_op_iand:
>   case nir_op_ior:
>   case nir_op_ixor:
> -return src_is_type(src_alu->src[0].src, nir_type_bool) &&
> -   src_is_type(src_alu->src[1].src, nir_type_bool);
> +return src_is_type(src_alu->src[0].src, nir_type_bool,
> +   num_components, swizzle) &&
> +   src_is_type(src_alu->src[1].src, nir_type_bool,
> +   num_components, swizzle);
>   case nir_op_inot:
>  return src_is_type(src_alu->src[0].src, nir_type_bool);

Was there supposed to be a - here? I don't think this will compile as-is.

> +return src_is_type(src_alu->src[0].src, nir_type_bool,
> +   num_components, swizzle);
>   default:
>  break;
>   }
> @@ -86,6 +91,20 @@ src_is_type(nir_src src, nir_alu_type type)
>   return intr->intrinsic == nir_intrinsic_load_front_face ||
>  intr->intrinsic == nir_intrinsic_load_helper_invocation;
>}
> +   } else if (src.ssa->parent_instr->type == nir_instr_type_load_const) {
> +  if (type == nir_type_bool) {
> + const nir_const_value *const val = nir_src_as_const_value(src);
> +
> + assert(val != NULL);
> +
> + for (unsigned i = 0; i < num_components; i++) {
> +if (val->u32[swizzle[i]] != NIR_FALSE &&
> +val->u32[swizzle[i]] != NIR_TRUE)
> +   return false;
> + }
> +
> + return true;
> +  }
> }
>
> /* don't know */
> @@ -159,7 +178,8 @@ match_value(const nir_search_value *value, nir_alu_instr 
> *instr, unsigned src,
>  return false;
>
>   if (var->type != nir_type_invalid &&
> - !src_is_type(instr->src[src].src, var->type))
> + !src_is_type(instr->src[src].src, var->type, num_components,
> +  new_swizzle))
>  return false;
>
>   state->variables_seen |= (1 << var->variable);
> --
> 2.14.4
>

[Mesa-dev] [PATCH 1/2] nir: Teach src_is_type about nir_instr_type_load_const

2018-07-19 Thread Ian Romanick

From: Ian Romanick 

---
After looking back at Tim's patches, I realized that I had done my
implementation slightly differently.  I didn't include these in my branch
because they caused a bunch of regressions on i965.  I'm just sending
these out in case we end up going this route...


 src/compiler/nir/nir_search.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
index 28b36b2b863..c727e9c70b7 100644
--- a/src/compiler/nir/nir_search.c
+++ b/src/compiler/nir/nir_search.c
@@ -49,7 +49,8 @@ static const uint8_t identity_swizzle[] = { 0, 1, 2, 3 };
  * Used for satisfying 'a@type' constraints.
  */
 static bool
-src_is_type(nir_src src, nir_alu_type type)
+src_is_type(nir_src src, nir_alu_type type, unsigned num_components,
+const uint8_t *swizzle)
 {
assert(type != nir_type_invalid);
 
@@ -69,10 +70,14 @@ src_is_type(nir_src src, nir_alu_type type)
  case nir_op_iand:
  case nir_op_ior:
  case nir_op_ixor:
-return src_is_type(src_alu->src[0].src, nir_type_bool) &&
-   src_is_type(src_alu->src[1].src, nir_type_bool);
+return src_is_type(src_alu->src[0].src, nir_type_bool,
+   num_components, swizzle) &&
+   src_is_type(src_alu->src[1].src, nir_type_bool,
+   num_components, swizzle);
  case nir_op_inot:
 return src_is_type(src_alu->src[0].src, nir_type_bool);
+return src_is_type(src_alu->src[0].src, nir_type_bool,
+   num_components, swizzle);
  default:
 break;
  }
@@ -86,6 +91,20 @@ src_is_type(nir_src src, nir_alu_type type)
  return intr->intrinsic == nir_intrinsic_load_front_face ||
 intr->intrinsic == nir_intrinsic_load_helper_invocation;
   }
+   } else if (src.ssa->parent_instr->type == nir_instr_type_load_const) {
+  if (type == nir_type_bool) {
+ const nir_const_value *const val = nir_src_as_const_value(src);
+
+ assert(val != NULL);
+
+ for (unsigned i = 0; i < num_components; i++) {
+if (val->u32[swizzle[i]] != NIR_FALSE &&
+val->u32[swizzle[i]] != NIR_TRUE)
+   return false;
+ }
+
+ return true;
+  }
}
 
/* don't know */
@@ -159,7 +178,8 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
 return false;
 
  if (var->type != nir_type_invalid &&
- !src_is_type(instr->src[src].src, var->type))
+ !src_is_type(instr->src[src].src, var->type, num_components,
+  new_swizzle))
 return false;
 
  state->variables_seen |= (1 << var->variable);
-- 
2.14.4
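
The load_const case added above can be read in isolation as "every component
selected by the swizzle holds one of the two boolean encodings". A standalone
sketch of just that check is given below. It assumes the 32-bit encodings of
the time (false as 0, true as all bits set); SKETCH_FALSE, SKETCH_TRUE and
const_is_bool() are hypothetical names for this sketch and not NIR API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed boolean encodings: all bits clear or all bits set. */
#define SKETCH_FALSE UINT32_C(0)
#define SKETCH_TRUE  UINT32_C(0xffffffff)

/* Return true if every component read through the swizzle is a boolean.
 * Components that the swizzle never selects are ignored. */
static bool
const_is_bool(const uint32_t *values, unsigned num_components,
              const uint8_t *swizzle)
{
    assert(values != NULL && swizzle != NULL);

    for (unsigned i = 0; i < num_components; i++) {
        uint32_t v = values[swizzle[i]];

        if (v != SKETCH_FALSE && v != SKETCH_TRUE)
            return false;
    }
    return true;
}
```

This is why the patch threads num_components and swizzle through
src_is_type(): a constant vector may hold a non-boolean value in a component
that the instruction never reads.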


[Mesa-dev] [PATCH 2/2] nir: Teach src_is_type about bcsel

2018-07-19 Thread Ian Romanick

From: Ian Romanick 

---
 src/compiler/nir/nir_search.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
index c727e9c70b7..99b4ac6ddd2 100644
--- a/src/compiler/nir/nir_search.c
+++ b/src/compiler/nir/nir_search.c
@@ -75,9 +75,13 @@ src_is_type(nir_src src, nir_alu_type type, unsigned 
num_components,
src_is_type(src_alu->src[1].src, nir_type_bool,
num_components, swizzle);
  case nir_op_inot:
-return src_is_type(src_alu->src[0].src, nir_type_bool);
 return src_is_type(src_alu->src[0].src, nir_type_bool,
num_components, swizzle);
+ case nir_op_bcsel:
+return src_is_type(src_alu->src[1].src, nir_type_bool,
+   num_components, swizzle) &&
+   src_is_type(src_alu->src[2].src, nir_type_bool,
+   num_components, swizzle);
  default:
 break;
  }
-- 
2.14.4
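
To illustrate the bcsel rule on its own: the result of a select is boolean
when both selectable sources are, and the condition in src[0] plays no part
in the answer. The sketch below models that on a toy expression tree. The
sketch_op enum, struct sketch_node and sketch_is_bool() are hypothetical and
much simpler than the real nir_search.c code, which also carries the swizzle
and component count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_op {
    SKETCH_OP_CMP,   /* comparison: always yields a boolean */
    SKETCH_OP_IADD,  /* integer add: not known to be boolean */
    SKETCH_OP_IAND,
    SKETCH_OP_INOT,
    SKETCH_OP_BCSEL, /* src[0] ? src[1] : src[2] */
};

struct sketch_node {
    enum sketch_op op;
    const struct sketch_node *src[3];
};

/* Conservative check: true only when the value is provably boolean. */
static bool
sketch_is_bool(const struct sketch_node *n)
{
    assert(n != NULL);

    switch (n->op) {
    case SKETCH_OP_CMP:
        return true;
    case SKETCH_OP_IAND:
        return sketch_is_bool(n->src[0]) && sketch_is_bool(n->src[1]);
    case SKETCH_OP_INOT:
        return sketch_is_bool(n->src[0]);
    case SKETCH_OP_BCSEL:
        /* The condition only selects; check the two selectable values. */
        return sketch_is_bool(n->src[1]) && sketch_is_bool(n->src[2]);
    default:
        return false; /* don't know */
    }
}
```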


Re: [Mesa-dev] [PATCH 1/4] radv: make use of radv_subpass_barrier() when resolving subpasses

2018-07-19 Thread Bas Nieuwenhuizen

Reviewed-by: Bas Nieuwenhuizen 

for the series.

On Wed, Jul 18, 2018 at 4:19 PM, Samuel Pitoiset
 wrote:
> The goal is to use radv_barrier()/radv_subpass_barrier() as
> much as possible for further optimizations.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c  |  3 ++-
>  src/amd/vulkan/radv_meta_resolve_cs.c | 16 +---
>  src/amd/vulkan/radv_meta_resolve_fs.c | 13 ++---
>  src/amd/vulkan/radv_private.h |  3 +++
>  4 files changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index 041ebf0ca3..b67f0ffdbe 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -2048,7 +2048,8 @@ radv_dst_access_flush(struct radv_cmd_buffer 
> *cmd_buffer,
> return flush_bits;
>  }
>
> -static void radv_subpass_barrier(struct radv_cmd_buffer *cmd_buffer, const 
> struct radv_subpass_barrier *barrier)
> +void radv_subpass_barrier(struct radv_cmd_buffer *cmd_buffer,
> + const struct radv_subpass_barrier *barrier)
>  {
> cmd_buffer->state.flush_bits |= radv_src_access_flush(cmd_buffer, 
> barrier->src_access_mask,
>   NULL);
> diff --git a/src/amd/vulkan/radv_meta_resolve_cs.c 
> b/src/amd/vulkan/radv_meta_resolve_cs.c
> index daf11e0576..ad02594614 100644
> --- a/src/amd/vulkan/radv_meta_resolve_cs.c
> +++ b/src/amd/vulkan/radv_meta_resolve_cs.c
> @@ -473,6 +473,8 @@ radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer 
> *cmd_buffer)
> struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
> const struct radv_subpass *subpass = cmd_buffer->state.subpass;
> struct radv_meta_saved_state saved_state;
> +   struct radv_subpass_barrier barrier;
> +
> /* FINISHME(perf): Skip clears for resolve attachments.
>  *
>  * From the Vulkan 1.0 spec:
> @@ -485,13 +487,13 @@ radv_cmd_buffer_resolve_subpass_cs(struct 
> radv_cmd_buffer *cmd_buffer)
> if (!subpass->has_resolve)
> return;
>
> -   /* Resolves happen before the end-of-subpass barriers get executed,
> -* so we have to make the attachment shader-readable */
> -   cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
> -   RADV_CMD_FLAG_FLUSH_AND_INV_CB |
> -   RADV_CMD_FLAG_FLUSH_AND_INV_CB_META |
> -   RADV_CMD_FLAG_INV_GLOBAL_L2 |
> -   RADV_CMD_FLAG_INV_VMEM_L1;
> +   /* Resolves happen before the end-of-subpass barriers get executed, so
> +* we have to make the attachment shader-readable.
> +*/
> +   barrier.src_stage_mask = 
> VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
> +   barrier.src_access_mask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
> +   barrier.dst_access_mask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
> +   radv_subpass_barrier(cmd_buffer, &barrier);
>
> radv_decompress_resolve_subpass_src(cmd_buffer);
>
> diff --git a/src/amd/vulkan/radv_meta_resolve_fs.c 
> b/src/amd/vulkan/radv_meta_resolve_fs.c
> index 5f4f241893..0e4957b163 100644
> --- a/src/amd/vulkan/radv_meta_resolve_fs.c
> +++ b/src/amd/vulkan/radv_meta_resolve_fs.c
> @@ -580,6 +580,7 @@ radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer 
> *cmd_buffer)
> struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
> const struct radv_subpass *subpass = cmd_buffer->state.subpass;
> struct radv_meta_saved_state saved_state;
> +   struct radv_subpass_barrier barrier;
>
> /* FINISHME(perf): Skip clears for resolve attachments.
>  *
> @@ -600,13 +601,11 @@ radv_cmd_buffer_resolve_subpass_fs(struct 
> radv_cmd_buffer *cmd_buffer)
>
> /* Resolves happen before the end-of-subpass barriers get executed,
>  * so we have to make the attachment shader-readable */
> -   cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
> -   RADV_CMD_FLAG_FLUSH_AND_INV_CB |
> -   RADV_CMD_FLAG_FLUSH_AND_INV_CB_META |
> -   RADV_CMD_FLAG_FLUSH_AND_INV_DB |
> -   RADV_CMD_FLAG_FLUSH_AND_INV_DB_META |
> -   RADV_CMD_FLAG_INV_GLOBAL_L2 |
> -   RADV_CMD_FLAG_INV_VMEM_L1;
> +   barrier.src_stage_mask = 
> VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
> +   barrier.src_access_mask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
> + 
> VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
> +   barrier.dst_access_mask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
> +   radv_subpass_barrier(cmd_buffer, &barrier);
>
>

Re: [Mesa-dev] [PATCH 6/6] glsl: fix function inlining with opaque parameters

2018-07-19 Thread Marek Olšák

For patches 3-6:

Reviewed-by: Marek Olšák 

Marek

On Wed, Jun 6, 2018 at 3:55 PM, Rhys Perry  wrote:
> Signed-off-by: Rhys Perry 
> ---
>  src/compiler/glsl/opt_function_inlining.cpp | 52 
> -
>  1 file changed, 44 insertions(+), 8 deletions(-)
>
> diff --git a/src/compiler/glsl/opt_function_inlining.cpp 
> b/src/compiler/glsl/opt_function_inlining.cpp
> index 04690b6cf4..52f57da936 100644
> --- a/src/compiler/glsl/opt_function_inlining.cpp
> +++ b/src/compiler/glsl/opt_function_inlining.cpp
> @@ -131,6 +131,18 @@ ir_save_lvalue_visitor::visit_enter(ir_dereference_array 
> *deref)
> return visit_stop;
>  }
>
> +static bool
> +should_replace_variable(ir_variable *sig_param, ir_rvalue *param) {
> +   /* For opaque types, we want the inlined variable references
> +* referencing the passed in variable, since that will have
> +* the location information, which an assignment of an opaque
> +* variable wouldn't.
> +*/
> +   return sig_param->type->contains_opaque() &&
> +  param->is_dereference() &&
> +  sig_param->data.mode == ir_var_function_in;
> +}
> +
>  void
>  ir_call::generate_inline(ir_instruction *next_ir)
>  {
> @@ -155,12 +167,8 @@ ir_call::generate_inline(ir_instruction *next_ir)
>ir_rvalue *param = (ir_rvalue *) actual_node;
>
>/* Generate a new variable for the parameter. */
> -  if (sig_param->type->contains_opaque()) {
> -/* For opaque types, we want the inlined variable references
> - * referencing the passed in variable, since that will have
> - * the location information, which an assignment of an opaque
> - * variable wouldn't.  Fix it up below.
> - */
> +  if (should_replace_variable(sig_param, param)) {
> + /* Actual replacement happens below */
>  parameters[i] = NULL;
>} else {
>  parameters[i] = sig_param->clone(ctx, ht);
> @@ -242,10 +250,9 @@ ir_call::generate_inline(ir_instruction *next_ir)
>ir_rvalue *const param = (ir_rvalue *) actual_node;
>ir_variable *sig_param = (ir_variable *) formal_node;
>
> -  if (sig_param->type->contains_opaque()) {
> +  if (should_replace_variable(sig_param, param)) {
>  ir_dereference *deref = param->as_dereference();
>
> -assert(deref);
>  do_variable_replacement(&new_instructions, sig_param, deref);
>}
> }
> @@ -351,6 +358,9 @@ public:
> virtual ir_visitor_status visit_leave(ir_dereference_array *);
> virtual ir_visitor_status visit_leave(ir_dereference_record *);
> virtual ir_visitor_status visit_leave(ir_texture *);
> +   virtual ir_visitor_status visit_leave(ir_assignment *);
> +   virtual ir_visitor_status visit_leave(ir_expression *);
> +   virtual ir_visitor_status visit_leave(ir_return *);
>
> void replace_deref(ir_dereference **deref);
> void replace_rvalue(ir_rvalue **rvalue);
> @@ -391,6 +401,32 @@ ir_variable_replacement_visitor::visit_leave(ir_texture 
> *ir)
> return visit_continue;
>  }
>
> +ir_visitor_status
> +ir_variable_replacement_visitor::visit_leave(ir_assignment *ir)
> +{
> +   replace_deref(&ir->lhs);
> +   replace_rvalue(&ir->rhs);
> +
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +ir_variable_replacement_visitor::visit_leave(ir_expression *ir)
> +{
> +   for (uint8_t i = 0; i < ir->num_operands; i++)
> +  replace_rvalue(&ir->operands[i]);
> +
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +ir_variable_replacement_visitor::visit_leave(ir_return *ir)
> +{
> +   replace_rvalue(&ir->value);
> +
> +   return visit_continue;
> +}
> +
>  ir_visitor_status
>  ir_variable_replacement_visitor::visit_leave(ir_dereference_array *ir)
>  {
> --
> 2.14.4
>
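
As a rough illustration of the condition that should_replace_variable()
checks in the patch above, here is a minimal sketch in plain C. The structs
and the enum are hypothetical stand-ins for ir_variable and ir_rvalue, not
Mesa's actual types; only the three-way test itself is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GLSL IR types; NOT Mesa's definitions. */
enum var_mode { VAR_FUNCTION_IN, VAR_FUNCTION_OUT, VAR_FUNCTION_INOUT };

struct formal_param {
   bool contains_opaque;   /* type contains a sampler or image */
   enum var_mode mode;     /* parameter qualifier */
};

struct actual_param {
   bool is_dereference;    /* argument is a variable dereference */
};

/* Mirrors the condition in the patch: substitute the caller's variable
 * only for opaque 'in' parameters whose argument is a dereference.
 * Everything else gets a cloned parameter variable instead.
 */
static bool
should_replace(const struct formal_param *formal,
               const struct actual_param *actual)
{
   return formal->contains_opaque &&
          actual->is_dereference &&
          formal->mode == VAR_FUNCTION_IN;
}
```

With this model, an opaque 'in' parameter fed by a non-dereference rvalue
falls through to the normal clone-and-assign path, which appears to be why
the patch can drop the assert(deref).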

[Mesa-dev] [AppVeyor] mesa staging/18.1 #8356 failed

2018-07-19 Thread AppVeyor




Build mesa 8356 failed


Commit 6561cc8f86 by Jason Ekstrand on 7/12/2018 9:05 PM:

intel/blorp: Handle 3-component formats in clears

This fixes a nasty hang in Batman: Arkham City which apparently calls
vkCmdClearColorImage on a linear RGB image.

cc: mesa-sta...@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez
(cherry picked from commit daa78f30b6dbd95ca5838aa666000f8fd628c92c)




Re: [Mesa-dev] [PATCH 1/6] gallium: add new SAMP2HND and IMG2HND opcodes

2018-07-19 Thread Marek Olšák

On Wed, Jun 6, 2018 at 6:42 PM, Ilia Mirkin  wrote:
> On Wed, Jun 6, 2018 at 3:55 PM, Rhys Perry  wrote:
>> This commit does not add support for the opcodes in gallivm or tgsi_to_nir.c
>>
>> Signed-off-by: Rhys Perry 
>> ---
>>  src/gallium/auxiliary/tgsi/tgsi_info.c |  2 ++
>>  src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h |  4 ++--
>>  src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h   |  3 +++
>>  src/gallium/docs/source/tgsi.rst   | 25 
>> +
>>  src/gallium/include/pipe/p_shader_tokens.h |  2 ++
>>  5 files changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
>> b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> index 4aa658785c..bbe1a21e43 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> @@ -153,6 +153,8 @@ tgsi_opcode_infer_type(enum tgsi_opcode opcode)
>> case TGSI_OPCODE_POPC:
>> case TGSI_OPCODE_LSB:
>> case TGSI_OPCODE_UMSB:
>> +   case TGSI_OPCODE_IMG2HND:
>> +   case TGSI_OPCODE_SAMP2HND:
>>return TGSI_TYPE_UNSIGNED;
>> case TGSI_OPCODE_ARL:
>> case TGSI_OPCODE_ARR:
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h 
>> b/src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h
>> index 1b2803cf3f..c3787c2fbb 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h
>> @@ -162,8 +162,8 @@ OPCODE(1, 1, COMP, IABS)
>>  OPCODE(1, 1, COMP, ISSG)
>>  OPCODE(1, 2, OTHR, LOAD)
>>  OPCODE(1, 2, OTHR, STORE, .is_store = 1)
>> -OPCODE_GAP(163) /* removed */
>> -OPCODE_GAP(164) /* removed */
>> +OPCODE(1, 1, OTHR, IMG2HND)
>> +OPCODE(1, 1, OTHR, SAMP2HND, .is_tex = 1)
>>  OPCODE_GAP(165) /* removed */
>>  OPCODE(0, 0, OTHR, BARRIER)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h 
>> b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
>> index 9a13fa6684..54a1ee15b6 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
>> @@ -160,6 +160,9 @@ OP13(UCMP)
>>  OP11(IABS)
>>  OP11(ISSG)
>>
>> +OP11(IMG2HND)
>> +OP11(SAMP2HND)
>> +
>>  OP12(IMUL_HI)
>>  OP12(UMUL_HI)
>>
>> diff --git a/src/gallium/docs/source/tgsi.rst 
>> b/src/gallium/docs/source/tgsi.rst
>> index 9e956586c4..a4a78e6267 100644
>> --- a/src/gallium/docs/source/tgsi.rst
>> +++ b/src/gallium/docs/source/tgsi.rst
>> @@ -2592,6 +2592,31 @@ For these opcodes, the resource can be a BUFFER, 
>> IMAGE, or MEMORY.
>>barrier in between.
>>
>>
>> +.. _bindlessopcodes:
>> +
>> +Bindless Opcodes
>> +
>> +
>> +These opcodes are for working with bindless sampler or image handles and
>> +require PIPE_CAP_BINDLESS_TEXTURE.
>> +
>> +.. opcode:: IMG2HND - Get a bindless handle for an image
>> +
>> +  Syntax: ``IMG2HND dst, image``
>> +
>> +  Example: ``IMG2HND TEMP[0], IMAGE[0]``
>> +
>> +  Sets 'dst' to a bindless handle for 'image'.
>> +
>> +.. opcode:: SAMP2HND - Get a bindless handle for a sampler view
>> +
>> +  Syntax: ``SAMP2HND dst, sampler``
>> +
>> +  Example: ``SAMP2HND TEMP[0], SVIEW[0]``
>> +
>> +  Sets 'dst' to a bindless handle for 'sampler'.
>
> You want SAMP[0] here, not SVIEW[0].
>
> Handles are defined to be 64-bit, so you should mention that only the
> first 2 channels are set, the zw channels are set to zero. (Or
> undefined ... although I prefer less undefined stuff.)

With those done:

Reviewed-by: Marek Olšák 

Marek
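
To make the channel layout from the review comment concrete, here is a
minimal sketch of a 64-bit handle spread over a four-channel 32-bit
register. The uvec4 struct and both helpers are hypothetical, and putting
the low word in x is an assumption; the thread only says that the first two
channels carry the handle and suggests zeroing the rest.

```c
#include <stdint.h>

/* Hypothetical four-channel 32-bit register, standing in for a TGSI temp. */
struct uvec4 {
   uint32_t x, y, z, w;
};

/* Assumption: low 32 bits go in x, high 32 bits in y.
 * The remaining two channels are zeroed rather than left undefined.
 */
static struct uvec4
pack_handle(uint64_t handle)
{
   struct uvec4 v;
   v.x = (uint32_t)(handle & 0xffffffffu);
   v.y = (uint32_t)(handle >> 32);
   v.z = 0;
   v.w = 0;
   return v;
}

/* Rebuilds the 64-bit handle from the first two channels. */
static uint64_t
unpack_handle(struct uvec4 v)
{
   return ((uint64_t)v.y << 32) | v.x;
}
```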

[Mesa-dev] [PATCH] i965/icl: Disable binding table prefetching

2018-07-19 Thread Anuj Phogat

From: Topi Pohjolainen 

Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to
disable prefetching of binding tables for ICLLP A0 and B0
steppings. It fixes multiple gpu hangs in
ext_framebuffer_multisample* tests on ICLLP B0 h/w.

Anuj: Add comments and commit message.
  Add gen 11 checks in the code.

Signed-off-by: Anuj Phogat 
---
 src/intel/blorp/blorp_genX_exec.h |  7 +++
 src/mesa/drivers/dri/i965/genX_state_upload.c | 14 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/intel/blorp/blorp_genX_exec.h 
b/src/intel/blorp/blorp_genX_exec.h
index 8bd9174b677..50341ab0ecf 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -762,6 +762,13 @@ blorp_emit_ps_config(struct blorp_batch *batch,
  ps.BindingTableEntryCount = 1;
   }
 
+ /* Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to
+  * disable prefetching of binding tables on A0 and B0 steppings.
+  * TODO: Revisit this WA on C0 stepping.
+  */
+  if (GEN_GEN == 11)
+ ps.BindingTableEntryCount = 0;
+
   if (prog_data) {
  ps._8PixelDispatchEnable = prog_data->dispatch_8;
  ps._16PixelDispatchEnable = prog_data->dispatch_16;
diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 9e0a17b9d93..b02acaf40e5 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -2165,7 +2165,13 @@ static const struct brw_tracked_state genX(wm_state) = {
pkt.KernelStartPointer = KSP(brw, stage_state->prog_offset);   \
pkt.SamplerCount   =   \
   DIV_ROUND_UP(CLAMP(stage_state->sampler_count, 0, 16), 4);  \
+   /* Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to \
+* disable prefetching of binding tables in A0 and B0 steppings.   \
+* TODO: Revisit this WA on C0 stepping.   \
+*/\
pkt.BindingTableEntryCount =   \
+  GEN_GEN == 11 ? \
+  0 : \
   stage_prog_data->binding_table.size_bytes / 4;  \
pkt.FloatingPointMode  = stage_prog_data->use_alt_mode;\
   \
@@ -3954,7 +3960,13 @@ genX(upload_ps)(struct brw_context *brw)
  DIV_ROUND_UP(CLAMP(stage_state->sampler_count, 0, 16), 4);
 
   /* BRW_NEW_FS_PROG_DATA */
-  ps.BindingTableEntryCount = prog_data->base.binding_table.size_bytes / 4;
+  /* Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to 
disable
+   * prefetching of binding tables in A0 and B0 steppings.
+   * TODO: Revisit this workaround on C0 stepping.
+   */
+  ps.BindingTableEntryCount = GEN_GEN == 11 ?
+  0 :
+  prog_data->base.binding_table.size_bytes / 4;
 
   if (prog_data->base.use_alt_mode)
  ps.FloatingPointMode = Alternate;
-- 
2.17.0
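
The selection made in the genX_state_upload.c hunks can be summarized as a
small helper. This is only a sketch: binding_table_entry_count() is a
hypothetical name, and the patch itself open-codes the ternary on the
compile-time GEN_GEN value rather than taking the generation as an argument.

```c
/* Hypothetical helper; the patch open-codes this with GEN_GEN.
 * On Gen 11 the count is forced to 0, which disables binding table
 * prefetching. Other generations keep one entry per 4 bytes of table.
 */
static unsigned
binding_table_entry_count(int gen, unsigned size_bytes)
{
   if (gen == 11)
      return 0;

   return size_bytes / 4;
}
```

For a 64-byte binding table this gives 16 entries on Gen 9 and 0 on Gen 11.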


Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-19 Thread Francisco Jerez

Chema Casanova  writes:

> El 14/07/18 a las 00:14, Francisco Jerez escribió:
>> Jose Maria Casanova Crespo  writes:
>> 
>>> For a register source/destination of an instruction the function returns
>>> the read/write byte pattern of a 32-byte register as an unsigned int.
>>>
>>> The returned pattern takes into account the exec_size of the instruction,
>>> the type bitsize, the stride and whether the register is a source or a
>>> destination.
>>>
>>> The objective of the function is to help determine which bytes an
>>> instruction reads/writes, to improve the liveness analysis for partial
>>> read/writes.
>>>
>>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because, depending on the bitsize
>>> parameter, they have a different read pattern.
>>> ---
>>>  src/intel/compiler/brw_fs.cpp  | 183 +
>>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>>  2 files changed, 184 insertions(+)
>>>
>>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>>> index 2b8363ca362..f3045c4ff6c 100644
>>> --- a/src/intel/compiler/brw_fs.cpp
>>> +++ b/src/intel/compiler/brw_fs.cpp
>>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>>> this->dst.offset % REG_SIZE != 0);
>>>  }
>>>  
>>> +/**
>>> + * Returns a 32-bit uint whose bits represent whether the associated
>>> + * register byte has been read/written by the instruction. The returned
>>> + * pattern takes into account the exec_size of the instruction, the type
>>> + * bitsize, the register stride and whether the register is a source or a
>>> + * destination for the instruction.
>>> + *
>>> + * The objective of this function is to identify which parts of the
>>> + * register are read or written by operations that don't read/write a full
>>> + * register, so that live range analysis can tell whether a partial write
>>> + * has completely defined the part of the register used by a partial read.
>>> + * This avoids extending the live range when all the data read was already
>>> + * defined, even though the register wasn't completely written.
>>> + */
>>> +unsigned
>>> +fs_inst::register_byte_use_pattern(const fs_reg &r, boolean is_dst) const
>>> +{
>>> +   if (is_dst) {
>
>> Please split into two functions (like fs_inst::src_read and
>> ::src_written) since that would make the call-sites of this method more
>> self-documenting than a boolean parameter.  You should be able to share
>> code by refactoring the common logic into a separate function (see below
>> for some suggestions on how that could be achieved).
>
> Sure, it would improve readability and simplify the logic. I've chosen
> dst_write_pattern and src_read_pattern.
>
>> 
>>> +  /* We don't know what is written so we return the worts case */
>> 
>> "worst"
>
> Fixed.
>
>>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>>> + return 0;
>>> +  /* We assume that send destinations are completely written */
>>> +  if (this->is_send_from_grf())
>>> + return ~0u;
>> 
>> Some send-like instructions won't be caught by this condition, you
>> should check for this->mlen != 0 in addition.
>
> Would it be enough to check for (this->mlen > 0) and forget about
> is_send_from_grf? I am using this approach in v2 I am sending.
>

I don't think the mlen > 0 condition would catch all cases either...
E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC.  You probably need both
conditions.  Sucks...

>>> +   } else {
>>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
>>> +   * so the read pattern depends on the bitsize stored at src[4]
>>> +   */
>>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
>>> +  this->src[1].nr == r.nr) {
>
>> I feel uncomfortable about attempting to guess the source the caller is
>> referring to by comparing the registers for equality.  E.g.  you could
>> potentially end up with two sources that compare equal but have
>> different semantics (e.g. as a result of CSE) which might cause it to
>> get the wrong answer.  It would probably be better to pass a source
>> index and a byte offset as argument instead of an fs_reg.
>
> I hadn't thought about CSE. I'm now receiving the source index and the
> reg_offset. I'm using reg_offset instead of a byte offset as it
> simplifies the logic. Now we always use the base src register to
> do all the calculations.
>>> + switch (this->src[4].ud) {
>>> + case 32:
>>> +return ~0u;
>>> + case 16:
>>> +return 0x;
>>> + case 8:
>>> +return 0x;
>>> + default:
>>> +unreachable("Unsupported bitsize at 
>>> byte_scattered_write_logical");
>>> + }
>> 
>> Replace the above switch statement with a call to "periodic_mask(8, 4,
>> this->src[4].ud / 8)" (see below for the definition).
>
> Ok.
>
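
The definition of periodic_mask() referred to in the review is not part of
this excerpt. As a rough sketch of what such a helper might look like,
assuming the arguments mean (count, step, bits), i.e. `count` groups spaced
`step` bits apart, each with its low `bits` bits set:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical definition; the one proposed in the review is not shown
 * here. Builds a mask of `count` groups, `step` bits apart, where the
 * lowest `bits` bits of every group are set.
 */
static uint32_t
periodic_mask(unsigned count, unsigned step, unsigned bits)
{
   uint32_t mask = 0;

   /* Each group must fit in its period and the whole mask in 32 bits. */
   assert(bits <= step && count * step <= 32);

   for (unsigned i = 0; i < count; i++) {
      for (unsigned b = 0; b < bits; b++)
         mask |= UINT32_C(1) << (i * step + b);
   }

   return mask;
}
```

Under this reading, periodic_mask(8, 4, bitsize / 8) covers all 32 bytes for
a 32-bit write (matching the ~0u case in the switch), the low two bytes of
every dword for a 16-bit write, and the low byte of every dword for an
8-bit write.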
>>> +  }
>>> +  /* As for

Re: [Mesa-dev] [PATCH 0/6] Fix Various Compilation Issues With Bindless

2018-07-19 Thread Rhys Perry

I think it needs more review and for the feedback on the first and
second patches to be addressed before being pushed?

I'm not a member of the mesa group on GitLab, so I can't anyway.


On Thu, Jul 19, 2018 at 11:14 PM, Marek Olšák  wrote:
> Hi,
>
> Do you plan to push this?
>
> Marek
>
> On Wed, Jun 6, 2018 at 3:55 PM, Rhys Perry  wrote:
>> Previously, there were some errors in the compiler's implementation of
>> ARB_bindless_texture, mostly related to usage of bound image or sampler
>> handles allowed by ARB_bindless_texture, resulting in assertions or
>> compilation errors. This series fixes the following issues found in Mesa:
>> - Assertions when casting bound handles to uvec2
>> - Compilation errors when using the ?: operator with bound handles
>> - Assertions creating a constant image/sampler handle
>>- For example: image2D(uvec2(5, 6))
>> - Inlining of function calls with rvalues other than dereferences to
>>   handle uniforms passed into them creates assertion failures
>> - Usage of bound handles as l-values
>>
>> In order to create bindless handles from bound images or samplers, two new
>> TGSI opcodes needed to be added: SAMP2HND and IMG2HND. These are used when
>> casting bound handles or when using them as l-values (e.g. using them with
>> the ?: operator).
>>
>> This series has the following limitations because I don't have the
>> hardware needed to test the needed changes:
>> - radeonsi and gallivm do not handle SAMP2HND and IMG2HND
>> - similar instructions/intrinsics for nir have not been added
>> - the tgsi to nir conversion code does not handle SAMP2HND and IMG2HND
>> - IMG2HND with Kepler is not implemented
>> Usage of bound handles as l-values and casting them is handled better than
>> before though.
>>
>> Some tests for these changes have been posted on the piglit mailing list.
>>
>> Rhys Perry (6):
>>   gallium: add new SAMP2HND and IMG2HND opcodes
>>   nv50/ir: add support for SAMP2HND on gk104+ and IMG2HND on gm107+
>>   glsl_to_tgsi: allow bound samplers and images to be used as l-values
>>   glsl: allow ?: operator with images and samplers when bindless is enabled
>>   glsl,glsl_to_tgsi: fix sampler/image constants
>>   glsl: fix function inlining with opaque parameters
>>
>>  src/compiler/glsl/ast_to_hir.cpp   |  8 ++-
>>  src/compiler/glsl/ir.cpp   | 32 +-
>>  src/compiler/glsl/opt_function_inlining.cpp| 52 +---
>>  src/gallium/auxiliary/tgsi/tgsi_info.c |  2 +
>>  src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h |  4 +-
>>  src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h   |  3 +
>>  src/gallium/docs/source/tgsi.rst   | 25 
>>  src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  2 +
>>  src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  2 +
>>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 22 +++
>>  .../drivers/nouveau/codegen/nv50_ir_inlines.h  |  4 +-
>>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 25 
>>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
>>  .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  2 +
>>  .../drivers/nouveau/codegen/nv50_ir_target.cpp |  7 ++-
>>  src/gallium/include/pipe/p_shader_tokens.h |  2 +
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 69 
>> --
>>  src/mesa/state_tracker/st_glsl_to_tgsi_private.h   |  1 +
>>  18 files changed, 239 insertions(+), 24 deletions(-)
>>
>> --
>> 2.14.4
>>

Re: [Mesa-dev] [PATCH 0/6] Fix Various Compilation Issues With Bindless

2018-07-19 Thread Marek Olšák

Hi,

Do you plan to push this?

Marek

On Wed, Jun 6, 2018 at 3:55 PM, Rhys Perry  wrote:
> Previously, there were some errors in the compiler's implementation of
> ARB_bindless_texture, mostly related to usage of bound image or sampler
> handles allowed by ARB_bindless_texture, resulting in assertions or
> compilation errors. This series fixes the following issues found in Mesa:
> - Assertions when casting bound handles to uvec2
> - Compilation errors when using the ?: operator with bound handles
> - Assertions creating a constant image/sampler handle
>- For example: image2D(uvec2(5, 6))
> - Inlining of function calls with rvalues other than dereferences to
>   handle uniforms passed into them creates assertion failures
> - Usage of bound handles as l-values
>
> In order to create bindless handles from bound images or samplers, two new
> TGSI opcodes needed to be added: SAMP2HND and IMG2HND. These are used when
> casting bound handles or when using them as l-values (e.g. using them with
> the ?: operator).
>
> This series has the following limitations because I don't have the
> hardware needed to test the needed changes:
> - radeonsi and gallivm do not handle SAMP2HND and IMG2HND
> - similar instructions/intrinsics for nir have not been added
> - the tgsi to nir conversion code does not handle SAMP2HND and IMG2HND
> - IMG2HND with Kepler is not implemented
> Usage of bound handles as l-values and casting them is handled better than
> before though.
>
> Some tests for these changes have been posted on the piglit mailing list.
>
> Rhys Perry (6):
>   gallium: add new SAMP2HND and IMG2HND opcodes
>   nv50/ir: add support for SAMP2HND on gk104+ and IMG2HND on gm107+
>   glsl_to_tgsi: allow bound samplers and images to be used as l-values
>   glsl: allow ?: operator with images and samplers when bindless is enabled
>   glsl,glsl_to_tgsi: fix sampler/image constants
>   glsl: fix function inlining with opaque parameters
>
>  src/compiler/glsl/ast_to_hir.cpp   |  8 ++-
>  src/compiler/glsl/ir.cpp   | 32 +-
>  src/compiler/glsl/opt_function_inlining.cpp| 52 +---
>  src/gallium/auxiliary/tgsi/tgsi_info.c |  2 +
>  src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h |  4 +-
>  src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h   |  3 +
>  src/gallium/docs/source/tgsi.rst   | 25 
>  src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  2 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  2 +
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 22 +++
>  .../drivers/nouveau/codegen/nv50_ir_inlines.h  |  4 +-
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 25 
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
>  .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  2 +
>  .../drivers/nouveau/codegen/nv50_ir_target.cpp |  7 ++-
>  src/gallium/include/pipe/p_shader_tokens.h |  2 +
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 69 
> --
>  src/mesa/state_tracker/st_glsl_to_tgsi_private.h   |  1 +
>  18 files changed, 239 insertions(+), 24 deletions(-)
>
> --
> 2.14.4
>

[Mesa-dev] [PATCH] virgl: add initial shader_storage_buffer_object support. (v2)

2018-07-19 Thread Dave Airlie

From: Dave Airlie 

This adds the guest side support for ARB_shader_storage_buffer_object.

Co-authors: Gurchetan Singh 

v2: move to using separate maximums
---
 src/gallium/drivers/virgl/virgl_buffer.c   |  2 ++
 src/gallium/drivers/virgl/virgl_context.c  | 44 ++
 src/gallium/drivers/virgl/virgl_context.h  |  2 ++
 src/gallium/drivers/virgl/virgl_encode.c   | 25 +
 src/gallium/drivers/virgl/virgl_encode.h   |  5 
 src/gallium/drivers/virgl/virgl_hw.h   |  3 ++
 src/gallium/drivers/virgl/virgl_protocol.h | 10 +++
 src/gallium/drivers/virgl/virgl_resource.h |  2 ++
 src/gallium/drivers/virgl/virgl_screen.c   |  5 
 9 files changed, 98 insertions(+)

diff --git a/src/gallium/drivers/virgl/virgl_buffer.c 
b/src/gallium/drivers/virgl/virgl_buffer.c
index 2e63aebc72c..3288bb20bd1 100644
--- a/src/gallium/drivers/virgl/virgl_buffer.c
+++ b/src/gallium/drivers/virgl/virgl_buffer.c
@@ -164,6 +164,8 @@ struct pipe_resource *virgl_buffer_create(struct 
virgl_screen *vs,
vbind = pipe_to_virgl_bind(template->bind);
size = template->width0;
 
+   if (vbind == VIRGL_BIND_SHADER_BUFFER)
+  buf->base.clean = FALSE;
buf->base.hw_res = vs->vws->resource_create(vs->vws, template->target, 
template->format, vbind, template->width0, 1, 1, 1, 0, 0, size);
 
util_range_set_empty(&buf->valid_buffer_range);
diff --git a/src/gallium/drivers/virgl/virgl_context.c 
b/src/gallium/drivers/virgl/virgl_context.c
index ee28680b8fc..74b232fe6cf 100644
--- a/src/gallium/drivers/virgl/virgl_context.c
+++ b/src/gallium/drivers/virgl/virgl_context.c
@@ -168,6 +168,20 @@ static void virgl_attach_res_uniform_buffers(struct 
virgl_context *vctx,
}
 }
 
+static void virgl_attach_res_shader_buffers(struct virgl_context *vctx,
+enum pipe_shader_type shader_type)
+{
+   struct virgl_winsys *vws = virgl_screen(vctx->base.screen)->vws;
+   struct virgl_resource *res;
+   unsigned i;
+   for (i = 0; i < PIPE_MAX_SHADER_BUFFERS; i++) {
+  res = virgl_resource(vctx->ssbos[shader_type][i]);
+  if (res) {
+ vws->emit_res(vws, vctx->cbuf, res->hw_res, FALSE);
+  }
+   }
+}
+
 /*
  * after flushing, the hw context still has a bunch of
  * resources bound, so we need to rebind those here.
@@ -183,6 +197,7 @@ static void virgl_reemit_res(struct virgl_context *vctx)
for (shader_type = 0; shader_type < PIPE_SHADER_TYPES; shader_type++) {
   virgl_attach_res_sampler_views(vctx, shader_type);
   virgl_attach_res_uniform_buffers(vctx, shader_type);
+  virgl_attach_res_shader_buffers(vctx, shader_type);
}
virgl_attach_res_vertex_buffers(vctx);
virgl_attach_res_so_targets(vctx);
@@ -911,6 +926,34 @@ static void virgl_blit(struct pipe_context *ctx,
 blit);
 }
 
+static void virgl_set_shader_buffers(struct pipe_context *ctx,
+ enum pipe_shader_type shader,
+ unsigned start_slot, unsigned count,
+ const struct pipe_shader_buffer *buffers)
+{
+   struct virgl_context *vctx = virgl_context(ctx);
+   struct virgl_screen *rs = virgl_screen(ctx->screen);
+
+   for (unsigned i = 0; i < count; i++) {
+  unsigned idx = start_slot + i;
+
+  if (buffers) {
+ if (buffers[i].buffer) {
+pipe_resource_reference(&vctx->ssbos[shader][idx], buffers[i].buffer);
+continue;
+ }
+  }
+  pipe_resource_reference(&vctx->ssbos[shader][idx], NULL);
+   }
+
+   uint32_t max_shader_buffer = shader == PIPE_SHADER_FRAGMENT ?
+  rs->caps.caps.v2.max_shader_buffer_frag_compute :
+  rs->caps.caps.v2.max_shader_buffer_other_stages;
+   if (!max_shader_buffer)
+  return;
+   virgl_encode_set_shader_buffers(vctx, shader, start_slot, count, buffers);
+}
+
 static void
 virgl_context_destroy( struct pipe_context *ctx )
 {
@@ -1048,6 +1091,7 @@ struct pipe_context *virgl_context_create(struct 
pipe_screen *pscreen,
vctx->base.flush_resource = virgl_flush_resource;
vctx->base.blit =  virgl_blit;
 
+   vctx->base.set_shader_buffers = virgl_set_shader_buffers;
virgl_init_context_resource_functions(&vctx->base);
virgl_init_query_functions(vctx);
virgl_init_so_functions(vctx);
diff --git a/src/gallium/drivers/virgl/virgl_context.h 
b/src/gallium/drivers/virgl/virgl_context.h
index 3492dcfa494..5747654ea82 100644
--- a/src/gallium/drivers/virgl/virgl_context.h
+++ b/src/gallium/drivers/virgl/virgl_context.h
@@ -68,6 +68,8 @@ struct virgl_context {
unsigned num_so_targets;
 
struct pipe_resource *ubos[PIPE_SHADER_TYPES][PIPE_MAX_CONSTANT_BUFFERS];
+
+   struct pipe_resource *ssbos[PIPE_SHADER_TYPES][PIPE_MAX_SHADER_BUFFERS];
int num_transfers;
int num_draws;
struct list_head to_flush_bufs;
diff --git a/src/gallium/drivers/virgl/virgl_encode.c 
b/src/gallium/drivers/virgl/virgl_encode.c
index c1af01b6fdf..b09366dcee6 100644
---

Re: [Mesa-dev] [PATCH 1/2] mesa: MESA_framebuffer_flip_y extension [v3]

2018-07-19 Thread Fritz Koenig

On Wed, Jul 11, 2018 at 3:54 PM Chad Versace  wrote:
>
> +Ken, I had a question about GLboolean. I call you by name in the
> comments below.
>
> On Fri 29 Jun 2018, Fritz Koenig wrote:
> > Adds an extension to glFramebufferParameteri
> > that will specify if the framebuffer is vertically
> > flipped. Historically system framebuffers are
> > vertically flipped and user framebuffers are not.
> > Checking to see the state was done by looking at
> > the name field.  This adds an explicit field.
> >
> > v2:
> > * updated spec language [for chadv]
> > * correctly specifying ES 3.1 [for chadv]
> > * refactor access to rb->Name [for jason]
> > * handle GetFramebufferParameteriv [for chadv]
> > v3:
> > * correct _mesa_GetMultisamplefv [for kusmabite]
> > ---
>
> >  docs/specs/MESA_framebuffer_flip_y.spec| 84 ++
>
> Use file extension '.txt'. Khronos no longer uses the '.spec' extension.
>
> File docs/specs/enums.txt needs an update too.
>
> >  include/GLES2/gl2ext.h |  5 ++
> >  src/mapi/glapi/registry/gl.xml |  6 ++
> >  src/mesa/drivers/dri/i915/intel_fbo.c  |  7 +-
> >  src/mesa/drivers/dri/i965/intel_fbo.c  |  7 +-
> >  src/mesa/drivers/dri/nouveau/nouveau_fbo.c |  7 +-
> >  src/mesa/drivers/dri/radeon/radeon_fbo.c   |  7 +-
> >  src/mesa/drivers/dri/radeon/radeon_span.c  |  9 ++-
> >  src/mesa/drivers/dri/swrast/swrast.c   |  7 +-
> >  src/mesa/drivers/osmesa/osmesa.c   |  5 +-
> >  src/mesa/drivers/x11/xm_buffer.c   |  3 +-
> >  src/mesa/drivers/x11/xmesaP.h  |  3 +-
> >  src/mesa/main/accum.c  | 17 +++--
> >  src/mesa/main/dd.h |  3 +-
> >  src/mesa/main/extensions_table.h   |  1 +
> >  src/mesa/main/fbobject.c   | 18 -
> >  src/mesa/main/framebuffer.c|  1 +
> >  src/mesa/main/glheader.h   |  3 +
> >  src/mesa/main/mtypes.h |  3 +
> >  src/mesa/main/readpix.c| 20 +++---
> >  src/mesa/state_tracker/st_cb_fbo.c |  7 +-
> >  src/mesa/swrast/s_blit.c   | 17 +++--
> >  src/mesa/swrast/s_clear.c  |  3 +-
> >  src/mesa/swrast/s_copypix.c| 11 +--
> >  src/mesa/swrast/s_depth.c  |  6 +-
> >  src/mesa/swrast/s_drawpix.c| 26 ---
> >  src/mesa/swrast/s_renderbuffer.c   |  6 +-
> >  src/mesa/swrast/s_renderbuffer.h   |  3 +-
> >  src/mesa/swrast/s_stencil.c|  3 +-
> >  29 files changed, 241 insertions(+), 57 deletions(-)
> >  create mode 100644 docs/specs/MESA_framebuffer_flip_y.spec
> >
> > diff --git a/docs/specs/MESA_framebuffer_flip_y.spec 
> > b/docs/specs/MESA_framebuffer_flip_y.spec
> > new file mode 100644
> > index 00..dca77a9541
> > --- /dev/null
> > +++ b/docs/specs/MESA_framebuffer_flip_y.spec
> > @@ -0,0 +1,84 @@
> > +Name
> > +
> > +MESA_framebuffer_flip_y
> > +
> > +Name Strings
> > +
> > +GL_MESA_framebuffer_flip_y
> > +
> > +Contact
> > +
> > +Fritz Koenig 
> > +
> > +Contributors
> > +
> > +Fritz Koenig, Google
> > +Kristian Høgsberg, Google
> > +Chad Versace, Google
> > +
> > +Status
> > +
> > +Proposal
> > +
> > +Version
> > +
> > +Version 1, June 7, 2018
> > +
> > +Number
> > +
> > +TBD
> > +
> > +Dependencies
> > +
> > +OpenGL ES 3.1 is required, for FramebufferParameteri.
> > +
> > +Overview
> > +
> > +Rendered buffers are normally returned right side up, as accessed
> > +top to bottom.  This extension allows those buffers to be upside down
> > +when accessed top to bottom.
> > +
> > +This extension defines a new framebuffer parameter,
> > +GL_FRAMEBUFFER_FLIP_Y_MESA, that changes the behavior of the reads and
> > +writes to the framebuffer attachment points. When 
> > GL_FRAMEBUFFER_FLIP_Y_MESA
> > +is GL_TRUE, render commands and pixel transfer operations access the
> > +backing store of each attachment point with a y-inverted coordinate
> > +system. This y-inversion is relative to the coordinate system set when
> > +GL_FRAMEBUFFER_FLIP_Y_MESA is GL_FALSE.
> > +
> > +Access through TexSubImage2D and similar calls will notice the effect 
> > of
> > +the flip when they are not attached to framebuffer objects because
> > +GL_FRAMEBUFFER_FLIP_Y_MESA is associated with the framebuffer object 
> > and
> > +not the attachment points.
> > +
> > +IP Status
> > +
> > +None
> > +
> > +Issues
> > +
> > +None
> > +
> > +New Procedures and Functions
> > +
> > +None
> > +
> > +New Types
> > +
> > +None
> > +
> > +New Tokens
> > +
> > +Accepted by the <pname> argument of FramebufferParameteri and
> > +GetFramebufferParameteriv:
> > +
> > +GL_FRAMEBUFFER_FLIP_Y_MESA  0x8BBB
> > +
> > +Errors
> > +GL_INVALID_OPERATION is returned from  GetFramebufferParameteriv if 
> > this
> > +is called on a winsys
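
The y-inversion described in the Overview of the quoted spec can be pictured
as a row remapping. The following is only a minimal sketch under that reading:
the helper name backing_row and its parameters are hypothetical and do not
come from the patch, and a real driver would apply the flip in its own
coordinate setup rather than per row.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Mesa: map a row index as seen by the
 * client to the row that is actually touched in the attachment's backing
 * store. */
static size_t
backing_row(size_t y, size_t height, bool flip_y)
{
   /* With the flag off, rows map one-to-one. With it on, row 0 becomes the
    * last row, row 1 the second to last, and so on. */
   return flip_y ? (height - 1 - y) : y;
}
```

For a 4-row attachment, row 0 maps to row 3 when the flag is set and stays at
row 0 when it is not.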

Re: [Mesa-dev] [PATCH] util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.

2018-07-19 Thread Bas Nieuwenhuizen

On Thu, Jul 19, 2018 at 6:48 PM, Eric Engestrom
 wrote:
> On Wednesday, 2018-07-18 14:01:49 +0200, Bas Nieuwenhuizen wrote:
>> radv always needs it, so just check the header instead. Also
>> do not declare the function if the variable is not set, so we
>> get a nice compile error instead of failing to open a device
>> at runtime.
>
> If that's the goal, why have any guards? Just #include the header,
> that's your compilation error if it's missing :)

Well, the goal is only to introduce a compile error if we do not have
it but one of the users of the function gets built. Notably, this
function is only used in radv, radeonsi, r600g and nouveau, while
disk_cache.h is also included from e.g. the GLSL compiler, which is
an issue for e.g. Windows.

radv works fine without the disk cache enabled, but still needs this
function or it fails device creation (a non-disk cache is part of the
vulkan API).

>
>>
>> Fixes: b87ef9e606a "util: fix MSVC build issue in disk_cache.h"
>> ---
>>  configure.ac  | 1 +
>>  meson.build   | 2 +-
>>  src/util/disk_cache.h | 8 +++-
>>  3 files changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/configure.ac b/configure.ac
>> index c946454cfae..ffb8424a07b 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -872,6 +872,7 @@ AC_HEADER_MAJOR
>>  AC_CHECK_HEADER([xlocale.h], [DEFINES="$DEFINES -DHAVE_XLOCALE_H"])
>>  AC_CHECK_HEADER([sys/sysctl.h], [DEFINES="$DEFINES -DHAVE_SYS_SYSCTL_H"])
>>  AC_CHECK_HEADERS([endian.h])
>> +AC_CHECK_HEADER([dlfcn.h], [DEFINES="$DEFINES -DHAVE_DLFCN_H"])
>>  AC_CHECK_FUNC([strtof], [DEFINES="$DEFINES -DHAVE_STRTOF"])
>>  AC_CHECK_FUNC([mkostemp], [DEFINES="$DEFINES -DHAVE_MKOSTEMP"])
>>  AC_CHECK_FUNC([timespec_get], [DEFINES="$DEFINES -DHAVE_TIMESPEC_GET"])
>> diff --git a/meson.build b/meson.build
>> index e05645cbf39..86a4a4ce6da 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -960,7 +960,7 @@ elif cc.has_header_symbol('sys/mkdev.h', 'major')
>>pre_args += '-DMAJOR_IN_MKDEV'
>>  endif
>>
>> -foreach h : ['xlocale.h', 'sys/sysctl.h', 'linux/futex.h', 'endian.h']
>> +foreach h : ['xlocale.h', 'sys/sysctl.h', 'linux/futex.h', 'endian.h', 
>> 'dlfcn.h']
>>if cc.compiles('#include <@0@>'.format(h), name : '@0@'.format(h))
>>  pre_args += '-DHAVE_@0@'.format(h.to_upper().underscorify())
>>endif
>> diff --git a/src/util/disk_cache.h b/src/util/disk_cache.h
>> index f84840fb5ca..50bd9f41ac4 100644
>> --- a/src/util/disk_cache.h
>> +++ b/src/util/disk_cache.h
>> @@ -24,7 +24,7 @@
>>  #ifndef DISK_CACHE_H
>>  #define DISK_CACHE_H
>>
>> -#ifdef ENABLE_SHADER_CACHE
>> +#ifdef HAVE_DLFCN_H
>>  #include <dlfcn.h>
>>  #endif
>>  #include 
>> @@ -88,10 +88,10 @@ disk_cache_format_hex_id(char *buf, const uint8_t 
>> *hex_id, unsigned size)
>> return buf;
>>  }
>>
>> +#ifdef HAVE_DLFCN_H
>>  static inline bool
>>  disk_cache_get_function_timestamp(void *ptr, uint32_t* timestamp)
>>  {
>> -#ifdef ENABLE_SHADER_CACHE
>> Dl_info info;
>> struct stat st;
>> if (!dladdr(ptr, &info) || !info.dli_fname) {
>> @@ -102,10 +102,8 @@ disk_cache_get_function_timestamp(void *ptr, uint32_t* 
>> timestamp)
>> }
>> *timestamp = st.st_mtime;
>> return true;
>> -#else
>> -   return false;
>> -#endif
>>  }
>> +#endif
>>
>>  /* Provide inlined stub functions if the shader cache is disabled. */
>>
>> --
>> 2.18.0
>>

Re: [Mesa-dev] [PATCH v2 0/4] Android kms_swrast support

2018-07-19 Thread Rob Herring

On Thu, Jul 19, 2018 at 9:52 AM Robert Foss  wrote:
>
> Hey Rob,
>
> On 2018-07-19 09:26, Tomasz Figa wrote:
> > On Thu, Jul 19, 2018 at 12:08 AM Robert Foss  
> > wrote:
> >>
> >> Hey Rob,
> >>
> >> On 2018-07-18 15:30, Rob Herring wrote:
> >>> On Tue, Jul 17, 2018 at 4:33 AM Robert Foss  
> >>> wrote:
> 
>  This series implements kms_swrast support for the Android
>  platform. And, having had to debug a null pointer dereference,
>  it also simplifies that process for the next guy.
> >>>
> >>> So is this working for you now?
> >>
> >> I'm seeing page-flips happen in the logs, but have no graphical output on 
> >> the
> >> Qemu-based setup I'm using now.
> >>
> >> When using virgl I'm seeing the same page-flipping in the logs, but no 
> >> graphical
> >> output.
> >>
> >>>
>  As it stands now, any kernel must have the following ioctls flagged with
>  DRM_RENDER_ALLOW[1], which isn't the case in the mainline kernel.
> 
>  DRM_IOCTL_MODE_CREATE_DUMB
>  DRM_IOCTL_MODE_MAP_DUMB
> >>>
> >>> Ah, sorry. I should have mentioned this. We have discussed this issue
> >>> in the past, but to no further conclusion.
> >>>
> >>> But as I recall, I thought the issue was also allowing import and
> >>> export of dumb buffers?
> >>
> >> Yeah, it's a two-parter for any AOSP Treble build.
> >> 1) Allow dumb buffer ioctls from render nodes
> >> 2) Support moving buffers across processes.
> >
> > Wouldn't 2) be automatically solved by 1), since we should be able to
> > run drmPrimeHandleToFD for dumb buffers already?
> >
>
> I thought, perhaps wrongly, that drmPrimeHandleToFD was only applicable to
> dmabufs.
>
> I think I've misunderstood the restrictions of dumb buffers, if they're
> shareable across processes using drmPrimeHandleToFD then only 1) is in our 
> way.

I thought the issue was either:
1) a card node can't export dumb buffers
2) a render node can't allocate dumb buffers

My approach had been disabling the permission check for 1. Tomasz's
approach had been to allow VGEM to allocate dumb buffers.

Either way, we either need more permissions on dumb buffers or define
some new s/w rendering scanout buffers which are just dumb buffers,
but not called that because upstream doesn't want to extend dumb
buffers.

>  While it would be possible to open a non-render node to pass the
>  authentication check, this would still cause authentication issues
>  when the /dev/dri/cardX node needs to be opened as master by both mesa
>  and the compositor.
> >>>
> >>> Right. We've pretty much stripped the support that was there out. Plus
> >>> I don't think it will work with Treble.
> >>>
>  I don't know how acceptable this series is for upstreaming, while 
>  relying on
>  a non-mainline kernel. I think the policy is to not accept changes that
>  don't have both a user and kernel space solution in place.
> 
>  Like I noted yesterday[2] the alternative to using dumb buffers and 
>  having
>  authentication issues is using VGEM, which is new territory to me, and 
>  it would
>  take me a little bit of time to figure exactly how it fits into the 
>  current
>  kms_swrast approach.
>  Input, like noted before, is very much welcome.
> >>>
> >>> I'm very much in favor of the former approach. VGEM seems like an
> >>> overly complicated solution when there's a very simple solution.
> >>>
> >>
> >> The former solution being what we have now, dumb buffers?
> >> I don't think dumb buffers are a viable path due to 2) listed above.
> >
> > I don't understand what 2) is about. Could you elaborate on it?
>
> See above!
>
> >
> > I'd personally be for dropping those strange restrictions from render
> > nodes. I don't see why a render node couldn't allocate and map a dumb
> > buffer (for software rendering) and share it with another process that
> > opened a control node (to display it).
>
>  From my understanding, the wider community's idea is to minimize the use of
> dumb buffers.
> Part of the reason for not allowing render nodes to map dumb buffers is to
> incentivize proprietary drivers not to do the simplest thing that could
> possibly work, as far as I understand.
>
> So while I'm happy to push that change upstream, if for no other reason than
> to generate a dialogue, maybe it's not all that likely that it will be
> accepted.

I think we have a valid use case, and it's worth the discussion, at least
to force some guidance on the direction upstream would like.

> >> If there are any other options I'm not aware of, I'm very much listening.
> >
> > One could just call mmap() on DMA-buf FDs directly rather than
> > importing them, but that could open another can of worms, because FDs
> > don't give us any way to deduplicate buffers (you might be given
> > several FDs pointing to the same buffer, which in case of importing to
> > DRM would end up with the same GEM handle every time).
> >
>
> So mmap()ing dmabuf FDs dealing with that can of worms is preferable

Re: [Mesa-dev] [PATCH 1/3] nir: allow nir search type check to see through bcsel

2018-07-19 Thread Ian Romanick

On 07/18/2018 11:17 PM, Ian Romanick wrote:
> Oh man... I was also recently looking at that same compute shader, and I
> wrote nearly identical patches the early part of last week.  The bcsel
> patches caused a bit of pain for i965.  I came up with a different way
> to handle that particular problem... either way, I eventually abandoned
> the whole approach.  Adding a bunch of one-off cases for weird
> combinations of logic expressions (and that shader has some doozies!)
> just isn't scalable.
> 
> I've pushed a branch logic-expression-frobbing to my cgit with all that
> work.
> 
> In the mean time, I have been working on code that generically optimizes
> logical expressions.  I'm hoping to get that sent out next week.  So
> far, it looks like it should be able to achieve the same effect on this
> particular shader.  This new pass should make most, if not all, of the
> logic expression algebraic optimizations in nir_opt_algebraic unnecessary.
> 
> As soon as I can run shader-db, I'll post a branch.

It can't run shader-db, but opt-minimize-Boolean has what I've done so
far.  You may find the TEST_F(quine_mccluskey_test, real_world_shader)
test interesting. :)

Re: [Mesa-dev] [PATCH v2 3/4] intel: tools: aubwrite: fix invalid frees on finish

2018-07-19 Thread Jordan Justen

Reviewed-by: Jordan Justen 

On 2018-07-18 10:21:31, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aub_write.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
> index 1224e8f6b7f..de4ce33 100644
> --- a/src/intel/tools/aub_write.c
> +++ b/src/intel/tools/aub_write.c
> @@ -255,11 +255,16 @@ align_u32(uint32_t v, uint32_t a)
>  }
>  
>  static void
> -aub_ppgtt_table_finish(struct aub_ppgtt_table *table)
> +aub_ppgtt_table_finish(struct aub_ppgtt_table *table, int level)
>  {
> +   if (level == 1)
> +  return;
> +
> for (unsigned i = 0; i < ARRAY_SIZE(table->subtables); i++) {
> -  aub_ppgtt_table_finish(table->subtables[i]);
> -  free(table->subtables[i]);
> +  if (table->subtables[i]) {
> + aub_ppgtt_table_finish(table->subtables[i], level - 1);
> + free(table->subtables[i]);
> +  }
> }
>  }
>  
> @@ -280,7 +285,7 @@ aub_file_init(struct aub_file *aub, FILE *file, uint16_t 
> pci_id)
>  void
>  aub_file_finish(struct aub_file *aub)
>  {
> -   aub_ppgtt_table_finish(&aub->pml4);
> +   aub_ppgtt_table_finish(&aub->pml4, 4);
> fclose(aub->file);
>  }
>  
> -- 
> 2.18.0
> 
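
The shape of the fix in the quoted patch (stop at the lowest level, skip empty
slots) can be shown on a toy structure. This is a sketch only: struct ex_table,
EX_ENTRIES and ex_table_finish are hypothetical names, and it assumes that
level-1 entries are not heap-allocated tables and therefore must not be passed
to free(). The return value (number of tables freed) is added purely to make
the behaviour observable.

```c
#include <stdint.h>
#include <stdlib.h>

#define EX_ENTRIES 4 /* illustrative; real page tables have far more slots */

struct ex_table {
   struct ex_table *subtables[EX_ENTRIES];
};

/* Free every table below `table`, but not `table` itself. `level` is the
 * level of `table`; at level 1 the entries are assumed not to be child
 * tables, so they are left alone. Returns the number of tables freed. */
static int
ex_table_finish(struct ex_table *table, int level)
{
   int freed = 0;

   if (level == 1)
      return 0;

   for (size_t i = 0; i < EX_ENTRIES; i++) {
      if (table->subtables[i]) {
         /* Release the grandchildren first, then the child itself. */
         freed += ex_table_finish(table->subtables[i], level - 1);
         free(table->subtables[i]);
         table->subtables[i] = NULL;
         freed++;
      }
   }
   return freed;
}
```

Without the level check, the recursion would treat level-1 entries as
pointers to further tables; without the NULL check, it would recurse into
slots that were never populated.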

Re: [Mesa-dev] [PATCH] glsl/ast: Fix loss of error_emitted value due to reassignment

2018-07-19 Thread Ian Romanick

On 07/18/2018 01:53 AM, Danylo Piliaiev wrote:
> Signed-off-by: Danylo Piliaiev 
> ---
>  src/compiler/glsl/ast_to_hir.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/compiler/glsl/ast_to_hir.cpp 
> b/src/compiler/glsl/ast_to_hir.cpp
> index dd60a2a87f..8a4cc56511 100644
> --- a/src/compiler/glsl/ast_to_hir.cpp
> +++ b/src/compiler/glsl/ast_to_hir.cpp
> @@ -1938,7 +1938,7 @@ ast_expression::do_hir(exec_list *instructions,
>result = get_lvalue_copy(instructions, op[0]->clone(ctx, NULL));
>  
>ir_rvalue *junk_rvalue;
> -  error_emitted =
> +  error_emitted |=
>   do_assignment(instructions, state,
> this->subexpressions[0]->non_lvalue_description,
> op[0]->clone(ctx, NULL), temp_rhs,
> 

Are there any tests that encounter this?  It seems like do_assignment
should always generate an error if either of the operands have errors.

I notice that the ast_pre_inc / ast_pre_dec case just before this only
sets error_emitted once.  Whatever we decide, ast_pre_* and ast_post_*
should do it the same way.  Intuitively, I think the ast_pre_inc /
ast_pre_dec method is correct, but you'll have to confirm or refute that
with testing. :)

Since this is a Boolean value, I have a mild preference for

      error_emitted =
         do_assignment(instructions, state,
                       this->subexpressions[0]->non_lvalue_description,
                       op[0]->clone(ctx, NULL), temp_rhs,
                       &junk_rvalue, false, false,
                       this->subexpressions[0]->get_location()) ||
         error_emitted;

or (as is done in the ast_array_index case just below this):

      if (do_assignment(instructions, state,
                        this->subexpressions[0]->non_lvalue_description,
                        op[0]->clone(ctx, NULL), temp_rhs,
                        &junk_rvalue, false, false,
                        this->subexpressions[0]->get_location()))
         error_emitted = true;
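
The difference between overwriting and accumulating the flag can be seen in
isolation. The sketch below is plain C rather than the GLSL compiler's C++,
and ex_step, ex_overwrite and ex_accumulate are hypothetical stand-ins; it
makes no claim about whether do_assignment can actually succeed after an
earlier error, which is the open question above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a step such as do_assignment(): returns true
 * if it emitted an error. */
static bool
ex_step(bool fails)
{
   return fails;
}

/* Plain assignment: whatever was recorded earlier is replaced. */
static bool
ex_overwrite(bool earlier_error, bool step_fails)
{
   bool error_emitted = earlier_error;
   error_emitted = ex_step(step_fails); /* an earlier error is lost here */
   return error_emitted;
}

/* Accumulating form: the flag can only go from false to true. */
static bool
ex_accumulate(bool earlier_error, bool step_fails)
{
   bool error_emitted = earlier_error;
   if (ex_step(step_fails))
      error_emitted = true;
   return error_emitted;
}
```

The two forms only differ when an error was already recorded and the step
itself succeeds.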

[Mesa-dev] [Bug 107276] radv: OpBitfieldUExtract returns incorrect result when count is zero

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107276

Samuel Pitoiset  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Samuel Pitoiset  ---
Should be fixed with
https://cgit.freedesktop.org/mesa/mesa/commit/?id=3d41757788aca774e64297bed962696cc0c9b262

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

Re: [Mesa-dev] [PATCH] glsl: Allow ES2 function parameters to be hidden by variable declarations.

2018-07-19 Thread Ian Romanick

On 07/18/2018 03:03 PM, Eric Anholt wrote:
> Ian Romanick  writes:
> 
>> On 07/16/2018 02:46 PM, Eric Anholt wrote:
>>> This fixes dEQP case:
>>>
>>> dEQP-GLES2.functional.shaders.scoping.valid.local_variable_hides_function_parameter_fragment
>>
>> Are we sure that test is correct?  I'm sure I already know the answer,
>> but does the test contain any justification or spec references?  I just
>> re-read section 4.2 (Scoping) of the ESSL 1.00 spec, and I don't see
>> anything to support this.  Did I miss something?
>>
>> In fact, the grammar says:
>>
>> function_definition:
>> function_prototype compound_statement_no_new_scope
>>
>> So... I think this test is just wrong.
> 
> OK, so I'm confused why this test still exists, if people have managed
> to get conformance on Mesa.  I'm on master of VK-GL-CTS, and it's still
> in the mustpass file:
> 
> external/openglcts/data/mustpass/gles/aosp_mustpass/master/gles2-master.txt:dEQP-GLES2.functional.shaders.scoping.valid.local_variable_hides_function_parameter_fragment

There are a huge pile of test lists, and I have never really understood
the whole mess.  There are some that only matter for some kind of
Android conformance runs.  There are some that only matter for Khronos
conformance runs.  And there are some that don't seem to matter for
anything at all.

> I don't see anything that would exclude the test -- there's
> gles2-driver-issues.txt, but that appears to only be used to exclude
> tests from AOSP DEQP usage.
> 
> Could whoever on the Intel side submitted a conformance package for Mesa
> send me a copy?  I haven't been able to find it on the Khronos site, and
> I suspect it would help me understand how to achieve conformance with
> Mesa.
> dEQP-GLES3.functional.shaders.preprocessor.predefined_macros.line_2_vertex
> is another one that fails on Mesa with i965, and seems to have been in
> the testsuite forever.

When we do conformance submissions, we don't run off master.  We use
whatever is tip of the per-API release branch.  We then do the
"official" run using 'cd external/openglcts/modules; ./cts-runner
--type='.  I haven't pulled any of the repos since last
year, so this information may be out of date.




Re: [Mesa-dev] [PATCH 2/2] intel/isl/gen4: Make depth/stencil buffers Y-Tiled

2018-07-19 Thread Nanley Chery

On Thu, Jul 19, 2018 at 10:16:47AM -0700, Kenneth Graunke wrote:
> On Tuesday, July 17, 2018 10:45:28 AM PDT Nanley Chery wrote:
> > On Tue, Jul 17, 2018 at 08:19:30AM -0700, Kenneth Graunke wrote:
> > > Personally, I'd be inclined to simply make this
> > > 
> > >*flags &= ISL_TILING_Y0_BIT;
> 
> While I still think the above is simpler and perhaps safer, your
> patches seem correct to me, and are:
> 
> Reviewed-by: Kenneth Graunke 

Thank you for the review.

Re: [Mesa-dev] [PATCH 2/2] intel/isl/gen4: Make depth/stencil buffers Y-Tiled

2018-07-19 Thread Kenneth Graunke

On Tuesday, July 17, 2018 10:45:28 AM PDT Nanley Chery wrote:
> On Tue, Jul 17, 2018 at 08:19:30AM -0700, Kenneth Graunke wrote:
> > Personally, I'd be inclined to simply make this
> > 
> >*flags &= ISL_TILING_Y0_BIT;

While I still think the above is simpler and perhaps safer, your
patches seem correct to me, and are:

Reviewed-by: Kenneth Graunke 
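
For readers less used to the idiom, the one-liner suggested above is a mask
that keeps a single candidate and drops all others. A minimal sketch, with
made-up bit values: the EX_TILING_* constants and ex_filter_depth_stencil are
hypothetical and are not ISL's definitions.

```c
#include <stdint.h>

/* Illustrative bit values only; these are not ISL's definitions. */
enum {
   EX_TILING_LINEAR_BIT = 1u << 0,
   EX_TILING_X_BIT      = 1u << 1,
   EX_TILING_Y0_BIT     = 1u << 2,
};

/* Restrict a set of candidate tilings to Y0. */
static void
ex_filter_depth_stencil(uint32_t *flags)
{
   /* Keep Y0 if the caller allowed it; clear every other candidate. */
   *flags &= EX_TILING_Y0_BIT;
}
```

Note that if Y0 was not among the candidates to begin with, the result is an
empty set, which the caller has to treat as "no valid tiling".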


Re: [Mesa-dev] [PATCH 1/3] egl/android: Delete set_damage_region from egl dri vtbl

2018-07-19 Thread Harish Krupo


Eric Anholt  writes:

> Harish Krupo  writes:
>
>> Eric Anholt  writes:
>>
>>> Harish Krupo  writes:
>>>
 Hi Eric,

 Eric Anholt  writes:

> Harish Krupo  writes:
>
>> The intention of KHR_partial_update was not to send the damage back
>> to the platform but to send the damage to the driver to ensure that the
>> following rendering could be restricted to those regions.
>> This patch removes set_damage_region from the egl_dri vtbl and all
>> the platform_*.c files.
>> Upcoming patches then add a new dri2 interface for the drivers to
>> implement.
>>
>> Signed-off-by: Harish Krupo 
>
> Why shouldn't the platform know about the damage region in a swap, if
> it's available?  It looks like it was successfully used for Android, and
> we should be using it for Present as well.

 From the spec [1], the damage region referred to by the partial_update spec
 is the damaged part of the buffer when it is used again. The damage that the
 compositor/platform needs to know is the damage between the (n-1)th
 frame and the nth frame. Quoting from the spec:
 "   The surface damage for frame n is the difference between frame n and
 frame (n-1), and represents the area that a compositor must recompose."
 This is the damage referred to by the swap_buffers_with_damage spec [2],
 whereas the partial_update damage region's objective is to restrict the
 subsequent rendering operations on the back buffer to only those regions
 which have changed since that buffer was last used. This information is
 available as the buffer age. Some more information: [3].
>>>
>>> OK, let's document that in the new internal API you're adding then.
>>> Things I'd want to know as an implementer of the hook:
>>>
>>> 1) Am I guaranteed that it's called before the frame is started?
>>>
>>
>> No. When no damage region is set, the whole surface should be considered
>> damaged. As a matter of fact, the damage region is set to full surface
>> when the frame boundary is reached (i.e. swapbuffersXXX is called).
>
> The spec citation I was looking for was:
>
> If any client API commands resulting in rendering to <surface> have been
> issued since eglSwapBuffers was last called with <surface>, or since the
> surface was created in case eglSwapBuffers has not yet been called on it,
> attempting to set the damage region will result in undefined framebuffer
> contents for the entire framebuffer.
>
> So, the driver should expect the partial damage region to be set before
> any rendering has happened, and doesn't need to worry about doing things
> right if it shows up later.
>
>>> 2) Is the behavior if the client draws outside of the partial update
>>> damage region defined?  (is it "the driver must not change pixels
>>> outside of the partial region" or "the driver might not change pixels
>>> outside of the partial region")
>>>
>>
>> If I have understood the spec correctly, then the damage regions set are
>> a hint to the driver so that it can optimize the rendering by
>> restricting the client's drawing commands to only the damaged region.
>> In the current implementation, although the damage regions are sent back
>> to the compositor instead of sending it to the driver, no issues are
>> observed with the rendered output and it passes deqp tests. This
>> supports the argument that the damages are only a hint.
>
> I was looking for documentation of this part of the spec in the method's
> comment:
>
> At all times, any client API rendering which falls outside of the damage
> region results in undefined framebuffer contents for the entire 
> framebuffer.
> It is the client's responsibility to ensure that rendering is confined to
> the current damage area.
>
>>> 3) Is the client guaranteed to fully initialize pixels in the partial
>>> update region, or might it depend on previous contents?
>>
>> If the above argument is right then it means that the client would
>> actually initialize the pixels of the full buffer but expect that the
>> driver renders only the damaged regions.
>
> The client can't initialize the full buffer, because of the spec quote
> above.
>
> The actual relevant spec quote for this is:
>
> If EGL_EXT_buffer_age is supported, the contents of the buffer inside the
> damage region may also be relied upon to contain the same content as the
> last time they were defined for the current back buffer.

Thank you, understood it. I should have read the spec better :(.
Also, generalizing Android/deqp's usage seems to be wrong. Android's
deqp passed previously even when the driver wasn't restricting the
rendering to only the damaged regions.
Should I update these in the comments section of the extension?

> It is the client's responsibility to ensure that rendering is confined to
> the current damage area.
quick question: How can the client ensure this?

Thank you
Regards
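
One possible answer to the question above, offered as an assumption rather
than something stated in this thread: a client can clip each draw to the
damage rectangle, for example by setting a scissor rectangle computed from
it. The sketch below shows only the rectangle arithmetic; struct ex_rect and
ex_clip_to_damage are hypothetical names, and a real client would have to
handle a list of damage rectangles.

```c
#include <stdbool.h>

struct ex_rect {
   int x, y, w, h;
};

/* Intersect the area a draw would touch with one damage rectangle.
 * Returns false if nothing is left, in which case the draw can be skipped;
 * otherwise writes the clipped area to `out`. */
static bool
ex_clip_to_damage(struct ex_rect draw, struct ex_rect damage,
                  struct ex_rect *out)
{
   int x0 = draw.x > damage.x ? draw.x : damage.x;
   int y0 = draw.y > damage.y ? draw.y : damage.y;
   int draw_x1 = draw.x + draw.w;
   int draw_y1 = draw.y + draw.h;
   int dmg_x1 = damage.x + damage.w;
   int dmg_y1 = damage.y + damage.h;
   int x1 = draw_x1 < dmg_x1 ? draw_x1 : dmg_x1;
   int y1 = draw_y1 < dmg_y1 ? draw_y1 : dmg_y1;

   if (x1 <= x0 || y1 <= y0)
      return false; /* the draw lies entirely outside the damage area */

   out->x = x0;
   out->y = y0;
   out->w = x1 - x0;
   out->h = y1 - y0;
   return true;
}
```

The resulting rectangle could then serve as the scissor box for the draw.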

Re: [Mesa-dev] [PATCH] util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.

2018-07-19 Thread Eric Engestrom

On Wednesday, 2018-07-18 14:01:49 +0200, Bas Nieuwenhuizen wrote:
> radv always needs it, so just check the header instead. Also
> do not declare the function if the variable is not set, so we
> get a nice compile error instead of failing to open a device
> at runtime.

If that's the goal, why have any guards? Just #include the header,
that's your compilation error if it's missing :)

> 
> Fixes: b87ef9e606a "util: fix MSVC build issue in disk_cache.h"
> ---
>  configure.ac  | 1 +
>  meson.build   | 2 +-
>  src/util/disk_cache.h | 8 +++-
>  3 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index c946454cfae..ffb8424a07b 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -872,6 +872,7 @@ AC_HEADER_MAJOR
>  AC_CHECK_HEADER([xlocale.h], [DEFINES="$DEFINES -DHAVE_XLOCALE_H"])
>  AC_CHECK_HEADER([sys/sysctl.h], [DEFINES="$DEFINES -DHAVE_SYS_SYSCTL_H"])
>  AC_CHECK_HEADERS([endian.h])
> +AC_CHECK_HEADER([dlfcn.h], [DEFINES="$DEFINES -DHAVE_DLFCN_H"])
>  AC_CHECK_FUNC([strtof], [DEFINES="$DEFINES -DHAVE_STRTOF"])
>  AC_CHECK_FUNC([mkostemp], [DEFINES="$DEFINES -DHAVE_MKOSTEMP"])
>  AC_CHECK_FUNC([timespec_get], [DEFINES="$DEFINES -DHAVE_TIMESPEC_GET"])
> diff --git a/meson.build b/meson.build
> index e05645cbf39..86a4a4ce6da 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -960,7 +960,7 @@ elif cc.has_header_symbol('sys/mkdev.h', 'major')
>pre_args += '-DMAJOR_IN_MKDEV'
>  endif
>  
> -foreach h : ['xlocale.h', 'sys/sysctl.h', 'linux/futex.h', 'endian.h']
> +foreach h : ['xlocale.h', 'sys/sysctl.h', 'linux/futex.h', 'endian.h', 
> 'dlfcn.h']
>if cc.compiles('#include <@0@>'.format(h), name : '@0@'.format(h))
>  pre_args += '-DHAVE_@0@'.format(h.to_upper().underscorify())
>endif
> diff --git a/src/util/disk_cache.h b/src/util/disk_cache.h
> index f84840fb5ca..50bd9f41ac4 100644
> --- a/src/util/disk_cache.h
> +++ b/src/util/disk_cache.h
> @@ -24,7 +24,7 @@
>  #ifndef DISK_CACHE_H
>  #define DISK_CACHE_H
>  
> -#ifdef ENABLE_SHADER_CACHE
> +#ifdef HAVE_DLFCN_H
>  #include <dlfcn.h>
>  #endif
>  #include 
> @@ -88,10 +88,10 @@ disk_cache_format_hex_id(char *buf, const uint8_t 
> *hex_id, unsigned size)
> return buf;
>  }
>  
> +#ifdef HAVE_DLFCN_H
>  static inline bool
>  disk_cache_get_function_timestamp(void *ptr, uint32_t* timestamp)
>  {
> -#ifdef ENABLE_SHADER_CACHE
> Dl_info info;
> struct stat st;
> if (!dladdr(ptr, &info) || !info.dli_fname) {
> @@ -102,10 +102,8 @@ disk_cache_get_function_timestamp(void *ptr, uint32_t* 
> timestamp)
> }
> *timestamp = st.st_mtime;
> return true;
> -#else
> -   return false;
> -#endif
>  }
> +#endif
>  
>  /* Provide inlined stub functions if the shader cache is disabled. */
>  
> -- 
> 2.18.0
> 

[Mesa-dev] [Bug 107295] Access violation on glDrawArrays with count >= 2048

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107295

--- Comment #1 from Roland Scheidegger  ---
To debug this we'd need some more information.
Backtrace might be a good start but probably not sufficient.
Sample code would be great, as would be an apitrace.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH] gles: report maximum vertex-attrib stride to guest

2018-07-19 Thread Erik Faye-Lund


Sorry, sent by mistake. Please ignore.


On 19 July 2018 17:58, Erik Faye-Lund wrote:

Similar to e387116, we also need to report this for GLES hosts.

Signed-off-by: Erik Faye-Lund 
---

Without this, there's no chance for GLES hosts to get GLES3.1 support.

  src/vrend_renderer.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/src/vrend_renderer.c b/src/vrend_renderer.c
index 9fb2f92..405993a 100644
--- a/src/vrend_renderer.c
+++ b/src/vrend_renderer.c
@@ -7396,6 +7396,9 @@ static void vrend_renderer_fill_caps_gles(uint32_t set, 
UNUSED uint32_t version,
 if (gles_ver >= 31)
glGetIntegerv(GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT, 
(GLint*)&caps->v2.shader_buffer_offset_alignment);
  
+   if (gles_ver >= 31)
+  glGetIntegerv(GL_MAX_VERTEX_ATTRIB_STRIDE, 
(GLint*)&caps->v2.max_vertex_attrib_stride);
+
 /* Not available on GLES */
 caps->v2.texture_buffer_offset_alignment = 0;
  



[Mesa-dev] [PATCH] gles: report maximum vertex-attrib stride to guest

2018-07-19 Thread Erik Faye-Lund

Similar to e387116, we also need to report this for GLES hosts.

Signed-off-by: Erik Faye-Lund 
---

Without this, there's no chance for GLES hosts to get GLES3.1 support.

 src/vrend_renderer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/vrend_renderer.c b/src/vrend_renderer.c
index 9fb2f92..405993a 100644
--- a/src/vrend_renderer.c
+++ b/src/vrend_renderer.c
@@ -7396,6 +7396,9 @@ static void vrend_renderer_fill_caps_gles(uint32_t set, 
UNUSED uint32_t version,
if (gles_ver >= 31)
   glGetIntegerv(GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT, 
(GLint*)&caps->v2.shader_buffer_offset_alignment);
 
+   if (gles_ver >= 31)
+  glGetIntegerv(GL_MAX_VERTEX_ATTRIB_STRIDE, 
(GLint*)&caps->v2.max_vertex_attrib_stride);
+
/* Not available on GLES */
caps->v2.texture_buffer_offset_alignment = 0;
 
-- 
2.18.0


Re: [Mesa-dev] [PATCH v2 1/4] intel: tools: dump: remove command execution feature

2018-07-19 Thread Rafael Antognolli

I was thinking about the patch. I didn't look deeply into the one that
removes the command execution stuff, but for the rest of the series,

Acked-by: Rafael Antognolli 

Sorry for being ambiguous :P

On Thu, Jul 19, 2018 at 10:12:57AM +0100, Lionel Landwerlin wrote:
> Was that for the whole series, or just this patch? :)
> 
> Thanks,
> 
> -
> Lionel
> 
> On 18/07/18 21:42, Jason Ekstrand wrote:
> 
> Very sketchily
> 
> Reviewed-by: Jason Ekstrand 
> 
> On Wed, Jul 18, 2018 at 10:21 AM Lionel Landwerlin <
> lionel.g.landwer...@intel.com> wrote:
> 
> In commit 86cb05a6d35a52 ("intel: aubinator: remove standard input
> processing option") we removed the ability to process aub as an input
> stream because we now rely on mmapping the aub file to back the
> buffers aubinator is parsing.
> 
> intel_aubdump was the provider of the standard input data and since
> we've copied/reworked intel_aubdump into intel_dump_gpu within Mesa,
> we don't need that code anymore.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/intel_dump_gpu.c  | 121 
> +++---
>  src/intel/tools/intel_dump_gpu.in |  27 +--
>  2 files changed, 29 insertions(+), 119 deletions(-)
> 
> diff --git a/src/intel/tools/intel_dump_gpu.c b/src/intel/tools/
> intel_dump_gpu.c
> index 6d2c4b7f983..5fd2c8ea723 100644
> --- a/src/intel/tools/intel_dump_gpu.c
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -53,8 +53,8 @@ static int (*libc_close)(int fd) = 
> close_init_helper;
>  static int (*libc_ioctl)(int fd, unsigned long request, ...) =
> ioctl_init_helper;
> 
>  static int drm_fd = -1;
> -static char *filename = NULL;
> -static FILE *files[2] = { NULL, NULL };
> +static char *output_filename = NULL;
> +static FILE *output_file = NULL;
>  static int verbose = 0;
>  static bool device_override;
> 
> @@ -111,7 +111,7 @@ align_u32(uint32_t v, uint32_t a)
> 
>  static struct gen_device_info devinfo = {0};
>  static uint32_t device;
> -static struct aub_file aubs[2];
> +static struct aub_file aub_file;
> 
>  static void *
>  relocate_bo(struct bo *bo, const struct drm_i915_gem_execbuffer2
> *execbuffer2,
> @@ -205,28 +205,21 @@ dump_execbuffer2(int fd, struct
> drm_i915_gem_execbuffer2 *execbuffer2)
>fail_if(!gen_get_device_info(device, &devinfo),
>"failed to identify chipset=0x%x\n", device);
> 
> -  for (int i = 0; i < ARRAY_SIZE(files); i++) {
> - if (files[i] != NULL) {
> -aub_file_init(&aubs[i], files[i], device);
> -if (verbose == 2)
> -   aubs[i].verbose_log_file = stdout;
> -aub_write_header(&aubs[i], 
> program_invocation_short_name);
> - }
> -  }
> +  aub_file_init(&aub_file, output_file, device);
> +  if (verbose == 2)
> + aub_file.verbose_log_file = stdout;
> +  aub_write_header(&aub_file, program_invocation_short_name);
> 
>if (verbose)
>   printf("[intel_aubdump: running, "
>  "output file %s, chipset id 0x%04x, gen %d]\n",
> -filename, device, devinfo.gen);
> +output_filename, device, devinfo.gen);
> }
> 
> -   /* Any aub */
> -   struct aub_file *any_aub = files[0] ? &aubs[0] : &aubs[1];;
> -
> -   if (aub_use_execlists(any_aub))
> +   if (aub_use_execlists(&aub_file))
>offset = 0x1000;
> else
> -  offset = aub_gtt_size(any_aub);
> +  offset = aub_gtt_size(&aub_file);
> 
> if (verbose)
>printf("Dumping execbuffer2:\n");
> @@ -263,13 +256,8 @@ dump_execbuffer2(int fd, struct
> drm_i915_gem_execbuffer2 *execbuffer2)
>   bo->map = gem_mmap(fd, obj->handle, 0, bo->size);
>fail_if(bo->map == MAP_FAILED, "intel_aubdump: bo mmap failed\
> n");
> 
> -  for (int i = 0; i < ARRAY_SIZE(files); i++) {
> - if (files[i] == NULL)
> -continue;
> -
> - if (aub_use_execlists(&aubs[i]))
> -aub_map_ppgtt(&aubs[i], bo->offset, bo->size);
> -  }
> +  if (aub_use_execlists(&aub_file))
> + aub_map_ppgtt(&aub_file, bo->offset, bo->size);
> }
> 
> batch_index = (execbuffer2->flags & I915_EXEC_BATCH_FIRST) ? 0 :
> @@ -284,30 +272,21 @@ dump_execbuffer2(int fd, struct
> drm_i915_gem_execbuffer2 *execbuffer2)
>else
>   data = bo->map;
> 
> -  for (int i = 0; i <

Re: [Mesa-dev] [PATCH v2 0/4] Android kms_swrast support

2018-07-19 Thread Robert Foss


Hey Rob,

On 2018-07-19 09:26, Tomasz Figa wrote:

On Thu, Jul 19, 2018 at 12:08 AM Robert Foss  wrote:


Hey Rob,

On 2018-07-18 15:30, Rob Herring wrote:

On Tue, Jul 17, 2018 at 4:33 AM Robert Foss  wrote:


This series implements kms_swrast support for the Android
platform. And since I had to debug a null pointer dereference,
it also simplifies that process for the next guy.


So is this working for you now?


I'm seeing page-flips happen in the logs, but have no graphical output on the
Qemu-based setup I'm using now.

When using virgl I'm seeing the same page-flipping in the logs, but no graphical
output.




As it stands now, any kernel must have the following ioctls flagged with
DRM_RENDER_ALLOW[1], which isn't the case in the mainline kernel.

DRM_IOCTL_MODE_CREATE_DUMB
DRM_IOCTL_MODE_MAP_DUMB


Ah, sorry. I should have mentioned this. We have discussed this issue
in the past, but to no further conclusion.

But as I recall, I thought the issue was also allowing import and
export of dumb buffers?


Yeah, it's a two-parter for any AOSP Treble build.
1) Allow dumb buffer ioctls from render nodes
2) Support moving buffers across processes.


Wouldn't 2) be automatically solved by 1), since we should be able to
run drmPrimeHandleToFD for dumb buffers already?



I thought, perhaps wrongly, that drmPrimeHandleToFD was only applicable to 
dmabufs.

I think I've misunderstood the restrictions of dumb buffers, if they're 
shareable across processes using drmPrimeHandleToFD then only 1) is in our way.







While it would be possible to open a non-render node to pass the
authentication check, this would still cause authentication issues
when the /dev/dri/cardX node needs to be opened as master by both mesa
and the compositor.


Right. We've pretty much stripped the support that was there out. Plus
I don't think it will work with Treble.


I don't know how acceptable this series is for upstreaming, while relying on
a non-mainline kernel. I think the policy is to not accept changes that
don't have both a user and kernel space solution in place.

Like I noted yesterday[2] the alternative to using dumb buffers and having
authentication issues is using VGEM, which is new territory to me, and it would
take me a little bit of time to figure exactly how it fits into the current
kms_swrast approach.
Input, like noted before, is very much welcome.


I'm very much in favor of the former approach. VGEM seems like an
overly complicated solution when there's a very simple solution.



The former solution being what we have now, dumb buffers?
I don't think dumb buffers are a viable path due to 2) listed above.


I don't understand what 2) is about. Could you elaborate on it?


See above!



I'd personally be for dropping those strange restrictions from render
nodes. I don't see why a render node couldn't allocate and map a dumb
buffer (for software rendering) and share it with another process that
opened a control node (to display it).


From my understanding the wider community's idea is to minimize the use of dumb 
buffers.
Part of the reason for not allowing render nodes to map dumb buffers is to 
discourage proprietary drivers from doing the simplest thing that could possibly 
work, as far as I understand.


So while I'm happy to push that change upstream, if for no other reason than to 
generate a dialogue, maybe it's not all that likely that it will be accepted.






If there are any other options I'm not aware of, I'm very much listening.


One could just call mmap() on DMA-buf FDs directly rather than
importing them, but that could open another can of worms, because FDs
don't give us any way to deduplicate buffers (you might be given
several FDs pointing to the same buffer, which in case of importing to
DRM would end up with the same GEM handle every time).
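
The direct mmap() route mentioned above can be sketched roughly as follows.
This is only an illustration: map_buffer_fd, unmap_buffer and demo_roundtrip
are hypothetical names, the demo uses an ordinary temporary file as a stand-in
for a DMA-buf FD, and it assumes the exporter allows mmap() on the FD at all.
The demo maps the same FD twice to show the deduplication problem: the two
mappings alias the same memory but nothing ties them together.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of the buffer behind `fd`.
 * Returns NULL on failure. */
static void *
map_buffer_fd(int fd, size_t size)
{
   assert(size > 0);
   void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
   return ptr == MAP_FAILED ? NULL : ptr;
}

static void
unmap_buffer(void *ptr, size_t size)
{
   if (ptr)
      munmap(ptr, size);
}

/* Stand-in demo: a temporary file plays the role of the DMA-buf FD.
 * Returns 0 if a write through one mapping is visible through the other. */
static int
demo_roundtrip(void)
{
   const size_t size = 4096;
   FILE *f = tmpfile();
   if (!f)
      return -1;

   int fd = fileno(f);
   if (ftruncate(fd, (off_t)size) != 0) {
      fclose(f);
      return -1;
   }

   unsigned char *p = map_buffer_fd(fd, size);
   unsigned char *q = map_buffer_fd(fd, size);
   int ok = 0;
   if (p && q) {
      p[0] = 0xab;
      /* Distinct addresses, same underlying storage. */
      ok = (p != q) && (q[0] == 0xab);
   }

   unmap_buffer(p, size);
   unmap_buffer(q, size);
   fclose(f);
   return ok ? 0 : -1;
}
```

Unlike importing into DRM, where the same buffer resolves to the same GEM
handle, nothing here tells the caller that p and q refer to one buffer.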



So mmap()ing dmabuf FDs, and dealing with that can of worms, is preferable to 
looking into a VGEM-based approach?


Rob Herring, presumably rightly, seems to think going down the VGEM route is 
opening a pretty bad can of worms too.



Rob.

[Mesa-dev] [Bug 107295] Access violation on glDrawArrays with count >= 2048

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107295

Bug ID: 107295
   Summary: Access violation on glDrawArrays with count >= 2048
   Product: Mesa
   Version: 18.0
  Hardware: x86 (IA32)
OS: Windows (All)
Status: NEW
  Severity: major
  Priority: medium
 Component: Drivers/Gallium/llvmpipe
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: p...@binksoftware.nl
QA Contact: mesa-dev@lists.freedesktop.org

I found this issue first on a 10.x version. The issue still remains in version
18.0.

I use an OpenGL 3.3 Core context and a set of shaders that converts points to
simple flat shaded cones using a geometry shader given the following:

* Position, normal and color as vertex attributes
* Base radius and length as uniforms

The cones are made out of 8 sides (16 triangles total including the base).

Calling glDrawArrays(GL_POINTS, 0, 3000) will cause an access violation in the
DLL (opengl32). All points are stored into a single buffer object described by
a VAO. Further testing shows that the maximum count that will not crash is
2047. Splitting the call works fine, i.e.:

glDrawArrays(GL_POINTS, 0, 1500)
glDrawArrays(GL_POINTS, 1500, 1500)

According to llvmpipe, the maximum recommended vertex count for draw range
elements is 3000 (there's no specific value for just glDrawArrays). This is
just a recommended value, where larger values may only decrease performance,
but it's already crashing way before 3000 is reached.
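
The splitting workaround described above can be sketched as a small helper.
This is an illustration only, not a fix for the crash: draw_in_chunks and the
callback type are hypothetical names, and the callback is assumed to wrap the
real call, e.g. glDrawArrays(GL_POINTS, first, count). Plain chunking like this
is only safe for GL_POINTS; other primitive types would need chunk sizes that
are a multiple of the primitive's vertex count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: wraps the real draw call, e.g.
 * glDrawArrays(GL_POINTS, first, count). */
typedef void (*draw_range_fn)(int first, int count, void *user);

/* Issue [first, first + count) as consecutive ranges of at most max_chunk
 * vertices. Returns the number of draw calls issued. */
static int
draw_in_chunks(int first, int count, int max_chunk,
               draw_range_fn draw, void *user)
{
   int calls = 0;

   assert(max_chunk > 0);

   while (count > 0) {
      int n = count < max_chunk ? count : max_chunk;
      draw(first, n, user);
      first += n;
      count -= n;
      calls++;
   }
   return calls;
}

/* Stand-in callback that only totals the vertices it was asked to draw. */
static void
count_vertices(int first, int count, void *user)
{
   (void)first;
   *(int *)user += count;
}
```

With max_chunk set to 2047 (the largest count reported not to crash), a
3000-point draw becomes two calls.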

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [PATCH v2 2/2] intel/fs: Improve liveness range calculation for partial writes

2018-07-19 Thread Jose Maria Casanova Crespo

We use the information of the registers read/write patterns
to improve variable liveness analysis avoiding extending the
liveness range of a variable to the beginning of the block so
it always reaches the beginning of the shader.

This optimization analyses inside each block if a partial write
defines completely the bytes used by a following instruction
in the block. So we are not in the case of the use of an undefined
value in the block.
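
The per-block check described above can be sketched in isolation. This is a
simplified model under stated assumptions, not the code in the patch: one bit
per byte of a 32-byte register, a single variable, and hypothetical names
(var_state, record_write, record_read) standing in for the def/use/defpartial
bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-variable state for one basic block: one bit per byte of a 32-byte GRF. */
struct var_state {
   uint32_t defpartial;   /* bytes written so far in this block */
   bool use;              /* read before those bytes were defined in this block */
};

/* Accumulate the bytes written, unless the variable was already marked as
 * used before being defined. */
static void
record_write(struct var_state *v, uint32_t write_pattern)
{
   if (!v->use)
      v->defpartial |= write_pattern;
}

/* A read only needs the value from outside the block if it touches bytes
 * that earlier writes in the block did not cover. */
static void
record_read(struct var_state *v, uint32_t read_pattern)
{
   if ((v->defpartial & read_pattern) != read_pattern)
      v->use = true;
}
```

A read whose pattern is a subset of the accumulated write pattern leaves use
unset, so the live range is not extended to the start of the block.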

This avoids almost all the spilling that happens with 8bit/16bit
storage tests, with no net compilation performance impact for shader-db
execution, as the extra cost is compensated by the spilling reductions.

At this moment we don't extend the logic to intra-block calculations
of livein/liveout, to avoid hurting performance in the general case, because
it would not take advantage of BITWORD operations.

The execution time for running dEQP-VK.*8bit_storage.* tests is reduced
from 7m27.966s to 13.015s.

shader-db on SKL shows improvements, reducing spilling on
deus-ex-mankind-divided and dolphin without increasing execution time.

total instructions in shared programs: 14867218 -> 14863959 (-0.02%)
instructions in affected programs: 121570 -> 118311 (-2.68%)
helped: 38
HURT: 0

total cycles in shared programs: 537923248 -> 537720965 (-0.04%)
cycles in affected programs: 63154229 -> 62951946 (-0.32%)
helped: 61
HURT: 26

total loops in shared programs: 4828 -> 4828 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 7790 -> 7375 (-5.33%)
spills in affected programs: 2824 -> 2409 (-14.70%)
helped: 35
HURT: 0

total fills in shared programs: 10557 -> 10024 (-5.05%)
fills in affected programs: 3752 -> 3219 (-14.21%)
helped: 38
HURT: 0

v2: - Use functions dst_write_pattern and src_read_pattern
  introduced in previous patch at v2.
- Avoid calculating read_pattern if defpartial is 0

Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs_live_variables.cpp | 61 
 src/intel/compiler/brw_fs_live_variables.h   | 13 -
 2 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
b/src/intel/compiler/brw_fs_live_variables.cpp
index 059f076fa51..d3559e3114f 100644
--- a/src/intel/compiler/brw_fs_live_variables.cpp
+++ b/src/intel/compiler/brw_fs_live_variables.cpp
@@ -54,9 +54,9 @@ using namespace brw;
 
 void
 fs_live_variables::setup_one_read(struct block_data *bd, fs_inst *inst,
-  int ip, const fs_reg &reg)
+  int ip, int src, int reg_offset)
 {
-   int var = var_from_reg(reg);
+   int var = var_from_reg(inst->src[src]) + reg_offset;
assert(var < num_vars);
 
start[var] = MIN2(start[var], ip);
@@ -64,31 +64,48 @@ fs_live_variables::setup_one_read(struct block_data *bd, 
fs_inst *inst,
 
/* The use[] bitset marks when the block makes use of a variable (VGRF
 * channel) without having completely defined that variable within the
-* block.
+* block. We take into account that a partial write could have defined
+* completely the read bytes in the block.
 */
-   if (!BITSET_TEST(bd->def, var))
-  BITSET_SET(bd->use, var);
+   if (!BITSET_TEST(bd->def, var)) {
+  if (!bd->defpartial[var]) {
+ BITSET_SET(bd->use, var);
+  } else {
+ unsigned read_pattern = inst->src_read_pattern(src, reg_offset);
+ if ((bd->defpartial[var] & read_pattern) != read_pattern)
+BITSET_SET(bd->use, var);
+  }
+   }
 }
 
 void
 fs_live_variables::setup_one_write(struct block_data *bd, fs_inst *inst,
-   int ip, const fs_reg &reg)
+   int ip, int reg_offset)
 {
-   int var = var_from_reg(reg);
+   int var = var_from_reg(inst->dst) + reg_offset;
assert(var < num_vars);
 
start[var] = MIN2(start[var], ip);
end[var] = MAX2(end[var], ip);
 
/* The def[] bitset marks when an initialization in a block completely
-* screens off previous updates of that variable (VGRF channel).
+* screens off previous updates of that variable (VGRF channel). If
+* we have a partial write now we store the write pattern so next
+* reads in the block can check if what they read was completelly screened
+* of by this partial write.
 */
-   if (inst->dst.file == VGRF) {
-  if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
+   assert(inst->dst.file == VGRF);
+   if(!BITSET_TEST(bd->use, var)) {
+  if (!inst->is_partial_write()) {
  BITSET_SET(bd->def, var);
-
-  BITSET_SET(bd->defout, var);
+ bd->defpartial[var] = ~0u;
+  } else {
+ bd->defpartial[var] |= inst->dst_write_pattern(reg_offset);
+ if (bd->defpartial[var] == ~0u)
+BITSET_SET(bd->def, var);
+  }
}
+   BITSET_SET(bd->defout, var);
 }
 
 /**
@@ -115,14 +132,9 @@ fs_live_variables::setup_def_use()
   foreach_inst_in_block(fs_inst, inst, block) {
 /* Set use[] for this

[Mesa-dev] [PATCH v2 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-19 Thread Jose Maria Casanova Crespo

These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation for these functions is to know the bytes read/written
by the instructions, to improve the liveness analysis for partial reads/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because, depending on the bitsize
parameter, they have a different read pattern.

v2: (Francisco Jerez)
- Split original register_byte_use_pattern into one read and other
  write.
- Check for send like instructions using this->mlen != 0
- Pass functions src number and offset.
- Use periodic_mask function with code written by Francisco Jerez
  to simplify pattern generation.
- Avoid breaking silently if source straddles multiple GRFs.

Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs.cpp  | 87 ++
 src/intel/compiler/brw_ir_fs.h |  2 +
 2 files changed, 89 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..d06b057cdbf 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -687,6 +687,93 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x33333333
+ *  periodic_mask(8, 4, 2, 2)  would return 0xcccccccc
+ *  periodic_mask(8, 2, 2, 16) would return 0xffff0000
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, REG_SIZE);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+  m |= m << shift;
+
+   assert(offset + max - (step - bits) <= REG_SIZE);
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the
+ * stride of the destination register.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are defined for operations that don't write a full register. So we
+ * we can identify in live range variable analysis if a partial write has
+ * completelly defined the data used by a partial read.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+  return 0u;
+   /* We assume that send destinations are completelly defined */
+   if (this->mlen > 0)
+  return ~0u;
+
+   return periodic_mask(this->exec_size,
+this->dst.stride * type_sz(this->dst.type),
+type_sz(this->dst.type),
+this->dst.offset % REG_SIZE);
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and a register offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are used for operations that don't read a full register.
+ */
+unsigned
+fs_inst::src_read_pattern(int i, unsigned reg_offset) const
+{
+   assert(src[i].file == VGRF);
+   /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
+* so the read pattern depends on the bitsize stored at src[4].
+*/
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && i == 1)
+  return periodic_mask(8, 4, this->src[4].ud / 8, 0);
+
+   /* As for byte_scattered_write_logical but we need to take into account
+* that data written in the payload(src[0]) are now on reg_offset 1 on SIMD8
+* and reg_offset 2 and 3 on SIMD16.
+*/
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && i == 0) {
+  if (DIV_ROUND_UP(reg_offset, (this->exec_size / 8)) == 1)
+ return periodic_mask(8, 4, this->src[2].ud / 8, 0);
+   }
+
+   return periodic_mask(this->exec_size,
+this->src[i].stride * type_sz(this->src[i].type),
+type_sz(this->src[i].type),
+this->src[i].offset % REG_SIZE);
+}
+
 unsigned
 fs_inst::components_read(unsigned i) const
 {
diff --git

Re: [Mesa-dev] [PATCH v2 4/4] intel: tools: dump: trace memory writes

2018-07-19 Thread Rafael Antognolli

On Thu, Jul 19, 2018 at 10:14:32AM +0100, Lionel Landwerlin wrote:
> On 18/07/18 21:58, Rafael Antognolli wrote:
> > On Wed, Jul 18, 2018 at 06:21:32PM +0100, Lionel Landwerlin wrote:
> > > Signed-off-by: Lionel Landwerlin 
> > > ---
> > >   src/intel/tools/aub_write.c | 45 ++---
> > >   1 file changed, 32 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
> > > index de4ce33..9c140553542 100644
> > > --- a/src/intel/tools/aub_write.c
> > > +++ b/src/intel/tools/aub_write.c
> > > @@ -313,10 +313,17 @@ dword_out(struct aub_file *aub, uint32_t data)
> > >   static void
> > >   mem_trace_memory_write_header_out(struct aub_file *aub, uint64_t addr,
> > > -  uint32_t len, uint32_t addr_space)
> > > +  uint32_t len, uint32_t addr_space,
> > > +  const char *desc)
> > Looks like you are not using desc anywhere...
> > 
> > Other than that, things look good.
> 
> Duh! Fixed locally.
> Counts as Rb?

Yeah, sure :)

> Thanks,
> 
> -
> Lionel
> 
> > 
> > >   {
> > >  uint32_t dwords = ALIGN(len, sizeof(uint32_t)) / sizeof(uint32_t);
> > > +   if (aub->verbose_log_file) {
> > > +  fprintf(aub->verbose_log_file,
> > > +  "  MEM WRITE (0x%016" PRIx64 "-0x%016" PRIx64 ")\n",
> > > +  addr, addr + len);
> > > +   }
> > > +
> > >  dword_out(aub, CMD_MEM_TRACE_MEMORY_WRITE | (5 + dwords - 1));
> > >  dword_out(aub, addr & 0xffffffff);   /* addr lo */
> > >  dword_out(aub, addr >> 32);   /* addr hi */
> > > @@ -387,7 +394,8 @@ populate_ppgtt_table(struct aub_file *aub, struct 
> > > aub_ppgtt_table *table,
> > > uint64_t write_size = (dirty_end - dirty_start + 1) *
> > >sizeof(uint64_t);
> > > mem_trace_memory_write_header_out(aub, write_addr, write_size,
> > > -
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL);
> > > +
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL,
> > > +"PPGTT update");
> > > data_out(aub, entries + dirty_start, write_size);
> > >  }
> > >   }
> > > @@ -476,7 +484,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  mem_trace_memory_write_header_out(aub, STATIC_GGTT_MAP_START >> 12,
> > >ggtt_ptes * GEN8_PTE_SIZE,
> > > - 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY);
> > > + 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY,
> > > + "GGTT PT");
> > >  for (uint32_t i = 0; i < ggtt_ptes; i++) {
> > > dword_out(aub, 1 + 0x1000 * i + STATIC_GGTT_MAP_START);
> > > dword_out(aub, 0);
> > > @@ -484,7 +493,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  /* RENDER_RING */
> > >  mem_trace_memory_write_header_out(aub, RENDER_RING_ADDR, RING_SIZE,
> > > - 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> > > + 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> > > + "RENDER RING");
> > >  for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
> > > dword_out(aub, 0);
> > > @@ -492,7 +502,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  mem_trace_memory_write_header_out(aub, RENDER_CONTEXT_ADDR,
> > >PPHWSP_SIZE +
> > >sizeof(render_context_init),
> > > - 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> > > + 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> > > + "RENDER PPHWSP");
> > >  for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
> > > dword_out(aub, 0);
> > > @@ -501,7 +512,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  /* BLITTER_RING */
> > >  mem_trace_memory_write_header_out(aub, BLITTER_RING_ADDR, RING_SIZE,
> > > - 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> > > + 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> > > + "BLITTER RING");
> > >  for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
> > > dword_out(aub, 0);
> > > @@ -509,7 +521,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  mem_trace_memory_write_header_out(aub, BLITTER_CONTEXT_ADDR,
> > >PPHWSP_SIZE +
> > >

[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-19 Thread Chema Casanova

El 14/07/18 a las 00:14, Francisco Jerez escribió:
> Jose Maria Casanova Crespo  writes:
> 
>> For a register source/destination of an instruction the function returns
>> the read/write byte pattern of a 32-byte registers as a unsigned int.
>>
>> The returned pattern takes into account the exec_size of the instruction,
>> the type bitsize, the stride and if the register is source or destination.
>>
>> The objective of the functions if to help to know the read/written bytes
>> of the instructions to improve the liveness analysis for partial read/writes.
>>
>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
>> parameter they have a different read pattern.
>> ---
>>  src/intel/compiler/brw_fs.cpp  | 183 +
>>  src/intel/compiler/brw_ir_fs.h |   1 +
>>  2 files changed, 184 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>> index 2b8363ca362..f3045c4ff6c 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
>> this->dst.offset % REG_SIZE != 0);
>>  }
>>  
>> +/**
>> + * Returns a 32-bit uint whose bits represent if the associated register 
>> byte
>> + * has been read/written by the instruction. The returned pattern takes into
>> + * account the exec_size of the instruction, the type bitsize and the 
>> register
>> + * stride and the register is source or destination for the instruction.
>> + *
>> + * The objective of this function is to identify which parts of the register
>> + * are read or written for operations that don't read/write a full register.
>> + * So we can identify in live range variable analysis if a partial write has
>> + * completelly defined the part of the register used by a partial read. So 
>> we
>> + * avoid extending the liveness range because all data read was already
>> + * defined although the wasn't completely written.
>> + */
>> +unsigned
>> +fs_inst::register_byte_use_pattern(const fs_reg &r, boolean is_dst) const
>> +{
>> +   if (is_dst) {

> Please split into two functions (like fs_inst::src_read and
> ::src_written) since that would make the call-sites of this method more
> self-documenting than a boolean parameter.  You should be able to share
> code by refactoring the common logic into a separate function (see below
> for some suggestions on how that could be achieved).

Sure, it would improve readability and simplify the logic. I've chosen
dst_write_pattern and src_read_pattern.

> 
>> +  /* We don't know what is written so we return the worts case */
> 
> "worst"

Fixed.

>> +  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
>> + return 0;
>> +  /* We assume that send destinations are completely written */
>> +  if (this->is_send_from_grf())
>> + return ~0u;
> 
> Some send-like instructions won't be caught by this condition, you
> should check for this->mlen != 0 in addition.

Would it be enough to check for (this->mlen > 0) and forget about
is_send_from_grf? I am using this approach in v2 I am sending.

>> +   } else {
>> +  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
>> +   * so the read pattern depends on the bitsize stored at src[4]
>> +   */
>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
>> +  this->src[1].nr == r.nr) {

> I feel uncomfortable about attempting to guess the source the caller is
> referring to by comparing the registers for equality.  E.g.  you could
> potentially end up with two sources that compare equal but have
> different semantics (e.g. as a result of CSE) which might cause it to
> get the wrong answer.  It would probably be better to pass a source
> index and a byte offset as argument instead of an fs_reg.

I hadn't thought about CSE. I'm now receiving the source number
and the reg_offset. I'm using reg_offset instead of byte offsets as it
simplifies the logic. Now we always use the base src register to
do all the calculations.

>> + switch (this->src[4].ud) {
>> + case 32:
>> +return ~0u;
>> + case 16:
>> +return 0x33333333;
>> + case 8:
>> +return 0x11111111;
>> + default:
>> +unreachable("Unsupported bitsize at 
>> byte_scattered_write_logical");
>> + }
> 
> Replace the above switch statement with a call to "periodic_mask(8, 4,
> this->src[4].ud / 8)" (see below for the definition).

Ok.

>> +  }
>> +  /* As for byte_scattered_write_logical but we need to take into 
>> account
>> +   * that data written are in the payload offset 32 with SIMD8 and 
>> offset
>> +   * 64 with SIMD16.
>> +   */
>> +  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE &&
>> +  this->src[0].nr == r.nr) {
>> + fs_reg payload =

Re: [Mesa-dev] [PATCH] gm107/ir: use CS2R for SV_CLOCK

2018-07-19 Thread Karol Herbst

So yeah, I am a bit blind today, as this was already handled in the
patch. Moving the SvSemantic check inside something like
isCS2RSV(SvSemantic) might make it simpler for future system values
to be used with cs2r, but it's not really required right now. In either
case: Reviewed-by: Karol Herbst 
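
The helper suggested above could look roughly like this. It is a sketch in
plain C with hypothetical names (is_cs2r_sv, rdsv_needs_barrier); the enum is
a stand-in listing only a few system values, not the real nv50_ir definition.
Per the thread, only SV_CLOCK is currently known to work with CS2R.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a few system-value semantics; the real nv50_ir enum is
 * larger and these numeric values are illustrative only. */
enum sv_semantic {
   SV_CLOCK,
   SV_LANEID,
   SV_INVOCATION_ID,
};

/* Single place to list the system values that can be read with CS2R. */
static bool
is_cs2r_sv(enum sv_semantic sv)
{
   switch (sv) {
   case SV_CLOCK:
      return true;
   default:
      return false;
   }
}

/* Mirrors the barrier decision in the patch: RDSV needs a barrier only when
 * it is emitted as S2R. */
static bool
rdsv_needs_barrier(enum sv_semantic sv)
{
   return !is_cs2r_sv(sv);
}
```

Both the emitter and isBarrierRequired could then share the same predicate, so
adding another system value later would be a one-line change.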

On Thu, Jul 19, 2018 at 5:28 PM, Karol Herbst  wrote:
> playing a bit around with nvdisasm, there seems to be some
> complications with certain sched opcodes. I think we should figure out
> if this is simply nvdisasm crashing or if this is a real hardware
> thing.
>
> On Tue, Jul 17, 2018 at 9:26 PM, Rhys Perry  wrote:
>> After some testing and looking at traces of the blob or nvcc output,
>> it seems the only system value CS2R is useful for is SV_CLOCK.
>>
>> On Tue, Jul 17, 2018 at 1:09 PM, Karol Herbst  wrote:
>>> that seems like a good enough improvement. I think looking into other
>>> sysvals would be worthwhile as SV_CLOCK isn't used that often. The
>>> invocation ID and related ones would be interesting to look into as
>>> they are much more common.
>>>
>>> On Tue, Jul 17, 2018 at 1:59 PM, Rhys Perry  
>>> wrote:
>>>> I'm getting ~28 cycles for the S2R and ~6 cycles (unsurprisingly) for the 
>>>> CS2R.
>>>>
>>>> nvcc with SM30 seems to use the same instruction as the nvc0 emission code.
>>>>
>>>> The SV_LANE* system values don't work with CS2R and I haven't looked
>>>> too deeply into the others.
>>>>
>>>> On Tue, Jul 17, 2018 at 12:13 PM, Karol Herbst  wrote:
>>>>> interesting, do you have some numbers on that? Wondering if we could
>>>>> switch more sysvals over to it and what about older gens?
>>>>>
>>>>> On Tue, Jul 17, 2018 at 12:46 PM, Rhys Perry  
>>>>> wrote:
>>>>>> This instruction seems to be faster than S2R and requires no barrier,
>>>>>> though the range of special registers it can read from is limited.
>>>>>>
>>>>>> Signed-off-by: Rhys Perry 
>>>>>> ---
>>>>>>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 14 
>>>>>> +-
>>>>>>  .../drivers/nouveau/codegen/nv50_ir_target_gm107.cpp   |  4 +++-
>>>>>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
>>>>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>>>>>> index 694d1b10a3..c306a4680b 100644
>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>>>>>> @@ -124,6 +124,7 @@ private:
>>>>>>
>>>>>> void emitMOV();
>>>>>> void emitS2R();
>>>>>> +   void emitCS2R();
>>>>>> void emitF2F();
>>>>>> void emitF2I();
>>>>>> void emitI2F();
>>>>>> @@ -749,6 +750,14 @@ CodeEmitterGM107::emitS2R()
>>>>>> emitGPR (0x00, insn->def(0));
>>>>>>  }
>>>>>>
>>>>>> +void
>>>>>> +CodeEmitterGM107::emitCS2R()
>>>>>> +{
>>>>>> +   emitInsn(0x50c8);
>>>>>> +   emitSYS (0x14, insn->src(0));
>>>>>> +   emitGPR (0x00, insn->def(0));
>>>>>> +}
>>>>>> +
>>>>>>  void
>>>>>>  CodeEmitterGM107::emitF2F()
>>>>>>  {
>>>>>> @@ -3192,7 +3201,10 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
>>>>>>emitMOV();
>>>>>>break;
>>>>>> case OP_RDSV:
>>>>>> -  emitS2R();
>>>>>> +  if (insn->getSrc(0)->reg.data.id == SV_CLOCK)
>>>>>> + emitCS2R();
>>>>>> +  else
>>>>>> + emitS2R();
>>>>>>break;
>>>>>> case OP_ABS:
>>>>>> case OP_NEG:
>>>>>> diff --git 
>>>>>> a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
>>>>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
>>>>>> index 04cbd402a1..009470fb93 100644
>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
>>>>>> @@ -153,9 +153,10 @@ TargetGM107::isBarrierRequired(const Instruction 
>>>>>> *insn) const
>>>>>>case OP_AFETCH:
>>>>>>case OP_PFETCH:
>>>>>>case OP_PIXLD:
>>>>>> -  case OP_RDSV:
>>>>>>case OP_SHFL:
>>>>>>   return true;
>>>>>> +  case OP_RDSV:
>>>>>> + return insn->getSrc(0)->reg.data.id != SV_CLOCK;
>>>>>>default:
>>>>>>   break;
>>>>>>}
>>>>>> @@ -229,6 +230,7 @@ TargetGM107::getLatency(const Instruction *insn) 
>>>>>> const
>>>>>> case OP_SUB:
>>>>>> case OP_VOTE:
>>>>>> case OP_XOR:
>>>>>> +   case OP_RDSV:
>>>>>>if (insn->dType != TYPE_F64)
>>>>>>   return 6;
>>>>>>break;
>>>>>> --
>>>>>> 2.14.4
>>>>>>

Re: [Mesa-dev] [PATCH] gm107/ir: use CS2R for SV_CLOCK

2018-07-19 Thread Karol Herbst

playing a bit around with nvdisasm, there seems to be some
complications with certain sched opcodes. I think we should figure out
if this is simply nvdisasm crashing or if this is a real hardware
thing.

On Tue, Jul 17, 2018 at 9:26 PM, Rhys Perry  wrote:
> After some testing and looking at traces of the blob or nvcc output,
> it seems the only system value CS2R is useful for is SV_CLOCK.
>
> On Tue, Jul 17, 2018 at 1:09 PM, Karol Herbst  wrote:
>> that seems like a good enough improvement. I think looking into other
>> sysvals would be worthwhile as SV_CLOCK isn't used that often. The
>> invocation ID and related ones would be interesting to look into as
>> they are much more common.
>>
>> On Tue, Jul 17, 2018 at 1:59 PM, Rhys Perry  wrote:
>>> I'm getting ~28 cycles for the S2R and ~6 cycles (unsurprisingly) for the 
>>> CS2R.
>>>
>>> nvcc with SM30 seems to use the same instruction as the nvc0 emission code.
>>>
>>> The SV_LANE* system values don't work with CS2R and I haven't looked
>>> too deeply into the others.
>>>
>>> On Tue, Jul 17, 2018 at 12:13 PM, Karol Herbst  wrote:
 interesting, do you have some numbers on that? Wondering if we could
 switch more sysvals over to it and what about older gens?

 On Tue, Jul 17, 2018 at 12:46 PM, Rhys Perry  
 wrote:
> This instruction seems to be faster than S2R and requires no barrier,
> though the range of special registers it can read from is limited.
>
> Signed-off-by: Rhys Perry 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 14 
> +-
>  .../drivers/nouveau/codegen/nv50_ir_target_gm107.cpp   |  4 +++-
>  2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> index 694d1b10a3..c306a4680b 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> @@ -124,6 +124,7 @@ private:
>
> void emitMOV();
> void emitS2R();
> +   void emitCS2R();
> void emitF2F();
> void emitF2I();
> void emitI2F();
> @@ -749,6 +750,14 @@ CodeEmitterGM107::emitS2R()
> emitGPR (0x00, insn->def(0));
>  }
>
> +void
> +CodeEmitterGM107::emitCS2R()
> +{
> +   emitInsn(0x50c8);
> +   emitSYS (0x14, insn->src(0));
> +   emitGPR (0x00, insn->def(0));
> +}
> +
>  void
>  CodeEmitterGM107::emitF2F()
>  {
> @@ -3192,7 +3201,10 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
>emitMOV();
>break;
> case OP_RDSV:
> -  emitS2R();
> +  if (insn->getSrc(0)->reg.data.id == SV_CLOCK)
> + emitCS2R();
> +  else
> + emitS2R();
>break;
> case OP_ABS:
> case OP_NEG:
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> index 04cbd402a1..009470fb93 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> @@ -153,9 +153,10 @@ TargetGM107::isBarrierRequired(const Instruction 
> *insn) const
>case OP_AFETCH:
>case OP_PFETCH:
>case OP_PIXLD:
> -  case OP_RDSV:
>case OP_SHFL:
>   return true;
> +  case OP_RDSV:
> + return insn->getSrc(0)->reg.data.id != SV_CLOCK;
>default:
>   break;
>}
> @@ -229,6 +230,7 @@ TargetGM107::getLatency(const Instruction *insn) const
> case OP_SUB:
> case OP_VOTE:
> case OP_XOR:
> +   case OP_RDSV:
>if (insn->dType != TYPE_F64)
>   return 6;
>break;
> --
> 2.14.4
>
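For readers skimming the thread, the selection logic in the quoted patch can be sketched in plain C. This is an illustration only, not the nouveau code: every `demo_`/`DEMO_` name is hypothetical and merely stands in for the `OP_RDSV` case in `emitInstruction()` and in `TargetGM107::isBarrierRequired()`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few system values mentioned in the thread. */
enum demo_sysval { DEMO_SV_CLOCK, DEMO_SV_LANEID, DEMO_SV_INVOCATION_ID };

/* Hypothetical opcode tags; the real emitter writes encoded instructions. */
enum demo_rdsv_op { DEMO_OP_S2R, DEMO_OP_CS2R };

/* As in the patch: only SV_CLOCK is read through CS2R, everything else
 * keeps using S2R. */
static enum demo_rdsv_op demo_pick_rdsv_op(enum demo_sysval sv)
{
    return sv == DEMO_SV_CLOCK ? DEMO_OP_CS2R : DEMO_OP_S2R;
}

/* As in the patch: a system-value read needs a barrier unless it is the
 * SV_CLOCK read that goes through CS2R. */
static bool demo_rdsv_needs_barrier(enum demo_sysval sv)
{
    return sv != DEMO_SV_CLOCK;
}
```

Keeping both decisions keyed on the same condition matters: if another system value were later moved to CS2R, both places would need to change together.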

[Mesa-dev] [Bug 107283] Dispatch sanity tests broken

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107283

Mark Janes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Mark Janes  ---
fixed by dcbcc83003f397e207873bcc379e1f5bb24e3d11

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

[Mesa-dev] [Bug 77449] Tracker bug for all bugs related to Steam titles

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=77449
Bug 77449 depends on bug 98848, which changed state.

Bug 98848 Summary: Master of Orion flickering / window destroy
https://bugs.freedesktop.org/show_bug.cgi?id=98848

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Mesa-dev] [Bug 106404] Hang in draw function caused by interaction of glutInitWindowPosition, glutFullScreen , glDrawBuffer(GL_FRONT)

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106404

Michel Dänzer  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jwy...@feralinteractive.com

--- Comment #5 from Michel Dänzer  ---
*** Bug 107290 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 107290] Hang reading from GL_FRONT in fullscreen

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107290

Michel Dänzer  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Michel Dänzer  ---


*** This bug has been marked as a duplicate of bug 106404 ***

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 100916] Multiple definition of `glwMDrawingAreaWidgetClass'

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=100916

Stefan Dirsch  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #7 from Stefan Dirsch  ---
Brian meanwhile pushed this.

commit b060a0782f09ebe4f60c8fd4564c11ba043c331f (HEAD -> master, origin/master,
origin/HEAD)
Author: Stefan Dirsch 
Date:   Tue Jul 17 12:40:00 2018 -0600

libGLw: Use newly introduced GLAPIVAR for variables

GLAPI doesn't have an 'extern' in some circumstances. This way,
variable declarations become definitions (fdo #100916).

Signed-off-by: Stefan Dirsch 
Reviewed-by: Brian Paul 
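The declaration/definition point in the commit message can be illustrated with a small C sketch. The names below are hypothetical, and `DEMO_APIVAR` is only a stand-in: the actual definition of `GLAPIVAR` is not shown in this message. The idea is that a file-scope variable in a header needs `extern` to stay a declaration; without it, every file that includes the header gets its own (tentative) definition, which can fail at link time with "multiple definition" errors.

```c
/* Hypothetical macro that always carries 'extern', so anything declared
 * with it in a header is a declaration only. */
#define DEMO_APIVAR extern

/* What a shared header would contain: declares the variable, allocates
 * no storage. */
DEMO_APIVAR int demo_widget_class;

/* What exactly one .c file would contain: the single definition. */
int demo_widget_class = 42;

/* Any file that includes the header can refer to the variable. */
int demo_read_widget_class(void)
{
    return demo_widget_class;
}
```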

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [PATCH] nvc0: add missing increments to gpu_serialize_count

2018-07-19 Thread Rhys Perry

Signed-off-by: Rhys Perry 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 4 
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 2 ++
 src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
index 2e4490b8d9..43545d3dfd 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
@@ -51,6 +51,8 @@ nvc0_texture_barrier(struct pipe_context *pipe, unsigned 
flags)
 
IMMED_NVC0(push, NVC0_3D(SERIALIZE), 0);
IMMED_NVC0(push, NVC0_3D(TEX_CACHE_CTL), 0);
+
+   NOUVEAU_DRV_STAT(nouveau_context(pipe)->screen, gpu_serialize_count, 1);
 }
 
 static void
@@ -93,6 +95,8 @@ nvc0_memory_barrier(struct pipe_context *pipe, unsigned flags)
* without that.
*/
   IMMED_NVC0(push, NVC0_3D(SERIALIZE), 0);
+
+  NOUVEAU_DRV_STAT(&nvc0->screen->base, gpu_serialize_count, 1);
}
 
/* If we're going to texture from a buffer/image written by a shader, we
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 57d98753f4..36f2ba4ba8 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -825,6 +825,8 @@ nvc0_program_upload(struct nvc0_context *nvc0, struct 
nvc0_program *prog)
   /* Make sure to synchronize before deleting the code segment. */
   IMMED_NVC0(nvc0->base.pushbuf, NVC0_3D(SERIALIZE), 0);
 
+  NOUVEAU_DRV_STAT(&screen->base, gpu_serialize_count, 1);
+
   if ((screen->text->size << 1) <= (1 << 23)) {
  ret = nvc0_screen_resize_text_area(screen, screen->text->size << 1);
  if (ret) {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index df5723dc37..5963fc2577 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -2613,6 +2613,8 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
  PUSH_DATA (push, (cfg->ctr[i].func << 4) | cfg->ctr[i].mode);
   }
}
+
+   NOUVEAU_DRV_STAT(&screen->base, gpu_serialize_count, 1);
 }
 
 static inline bool
diff --git a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
index 28460f8cbe..4bedacd9ec 100644
--- a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
@@ -129,6 +129,8 @@ nve4_screen_compute_setup(struct nvc0_screen *screen,
   for (i = 63; i >= 0; i--)
  PUSH_DATA(push, 0x38000 | i);
   IMMED_NVC0(push, SUBC_CP(NV50_GRAPH_SERIALIZE), 0);
+
+  NOUVEAU_DRV_STAT(&screen->base, gpu_serialize_count, 1);
}
 
BEGIN_NVC0(push, NVE4_CP(TEX_CB_INDEX), 1);
-- 
2.14.4


[Mesa-dev] [Bug 106893] Potential mem leak with radv

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106893

John  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #6 from John  ---
Of course!

But now I am quite confused, I tried with a more standard system (standard
Linux kernel and standard mesa-git packages), no difference.

I then rolled back Linux to 4.17 as it's what I used when I created this, and
moved to stable llvm 6.0.1 /Mesa 18.1.4, but still no difference.

So either it's another package I'm not thinking of, since I'm using Arch
there's an update to something every day, something that happened between Mesa
18.1 and 18.1.4 or LLVM 6.0.1, or something obvious that I'm missing. But since
it's working, oh well!

Thanks!

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [PATCH] docs: update calendar to match the 18.2 plan with the one announced

2018-07-19 Thread Andres Gomez

Additionally, I've extended the 18.1 cycle by one more release,
tentatively assigned to Dylan, due to the ~2 weeks delay for 18.2.

Cc: Dylan Baker 
Cc: Juan A. Suarez 
Cc: Emil Velikov 
Signed-off-by: Andres Gomez 
---
 docs/release-calendar.html | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/docs/release-calendar.html b/docs/release-calendar.html
index 7f029b5b396..8e23b7f9e05 100644
--- a/docs/release-calendar.html
+++ b/docs/release-calendar.html
@@ -39,7 +39,7 @@ if you'd like to nominate a patch in the next stable release.
 Notes
 
 
-18.1
+18.1
 2018-07-27
 18.1.5
 Dylan Baker
@@ -55,29 +55,35 @@ if you'd like to nominate a patch in the next stable 
release.
 2018-08-24
 18.1.7
 Dylan Baker
+
+
+
+2018-09-07
+18.1.8
+Dylan Baker
 Last planned 18.1.x release
 
 
 18.2
-2018-07-20
+2018-08-01
 18.2.0rc1
 Andres Gomez
 
 
 
-2018-07-27
+2018-08-08
 18.2.0rc2
 Andres Gomez
 
 
 
-2018-08-03
+2018-08-15
 18.2.0rc3
 Andres Gomez
 
 
 
-2018-08-10
+2018-08-22
 18.2.0rc4
 Andres Gomez
 Last planned RC/Final release
-- 
2.18.0


[Mesa-dev] [PATCH] docs: correct typo in the submitting patches instructions

2018-07-19 Thread Andres Gomez

Cc: Emil Velikov 
Cc: Eric Engestrom 
Signed-off-by: Andres Gomez 
---
 docs/submittingpatches.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/submittingpatches.html b/docs/submittingpatches.html
index 178ec5f89ec..e5350bdb2cf 100644
--- a/docs/submittingpatches.html
+++ b/docs/submittingpatches.html
@@ -36,7 +36,7 @@
 perhaps, in very trivial cases.)
 Code patches should follow Mesa
 coding conventions.
-Whenever possible, patches should only effect individual Mesa/Gallium
+Whenever possible, patches should only affect individual Mesa/Gallium
 components.
 Patches should never introduce build breaks and should be bisectable (see
 git bisect.)
-- 
2.18.0


[Mesa-dev] [PATCH] docs: move releases from Fridays to Wednesdays

2018-07-19 Thread Andres Gomez

As discussed at:
https://lists.freedesktop.org/archives/mesa-dev/2018-March/188525.html

Cc: Emil Velikov 
Cc: Juan A. Suarez Romero 
Cc: Dylan Baker 
Cc: Ian Romanick 
Cc: Carl Worth 
Cc: Mark Janes 
Signed-off-by: Andres Gomez 
---
 docs/releasing.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/releasing.html b/docs/releasing.html
index a022d0c484b..14315e7a8e4 100644
--- a/docs/releasing.html
+++ b/docs/releasing.html
@@ -54,8 +54,8 @@ For example:
 Release schedule
 
 
-Releases should happen on Fridays. Delays can occur although those should be 
keep
-to a minimum.
+Releases should happen on Wednesdays. Delays can occur although those
+should be keep to a minimum.
 
 See our calendar for the
 date and other details for individual releases.
-- 
2.18.0


[Mesa-dev] [PATCH 4/8] ac: add support for 16bit buffer loads

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/amd/common/ac_nir_to_llvm.c | 96 ++---
 1 file changed, 53 insertions(+), 43 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index d7a52a536c..770a025c82 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1586,63 +1586,73 @@ static LLVMValueRef visit_load_buffer(struct 
ac_nir_context *ctx,
   const nir_intrinsic_instr *instr)
 {
LLVMValueRef results[2];
-   int load_components;
+   int load_bytes;
+   int elem_size_bytes = instr->dest.ssa.bit_size / 8;
int num_components = instr->num_components;
-   if (instr->dest.ssa.bit_size == 64)
-   num_components *= 2;
+   int num_bytes = num_components * elem_size_bytes;
 
-   for (int i = 0; i < num_components; i += load_components) {
-   load_components = MIN2(num_components - i, 4);
+   for (int i = 0; i < num_bytes; i += load_bytes) {
+   load_bytes = MIN2(num_bytes - i, 16);
const char *load_name;
-   LLVMTypeRef data_type = ctx->ac.f32;
-   LLVMValueRef offset = LLVMConstInt(ctx->ac.i32, i * 4, false);
-   offset = LLVMBuildAdd(ctx->ac.builder, get_src(ctx, 
instr->src[1]), offset, "");
-
-   if (load_components == 3)
-   data_type = LLVMVectorType(ctx->ac.f32, 4);
-   else if (load_components > 1)
-   data_type = LLVMVectorType(ctx->ac.f32, 
load_components);
-
-   if (load_components >= 3)
-   load_name = "llvm.amdgcn.buffer.load.v4f32";
-   else if (load_components == 2)
-   load_name = "llvm.amdgcn.buffer.load.v2f32";
-   else if (load_components == 1)
-   load_name = "llvm.amdgcn.buffer.load.f32";
-   else
-   unreachable("unhandled number of components");
-
-   LLVMValueRef params[] = {
-   ctx->abi->load_ssbo(ctx->abi,
-   get_src(ctx, instr->src[0]),
-   false),
-   ctx->ac.i32_0,
-   offset,
-   ctx->ac.i1false,
-   ctx->ac.i1false,
-   };
-
-   results[i > 0 ? 1 : 0] = ac_build_intrinsic(&ctx->ac, 
load_name, data_type, params, 5, 0);
+   LLVMTypeRef data_type;
+   LLVMValueRef offset = get_src(ctx, instr->src[1]);
+   LLVMValueRef immoffset = LLVMConstInt(ctx->ac.i32, i, false);
+   LLVMValueRef rsrc = ctx->abi->load_ssbo(ctx->abi,
+   get_src(ctx, 
instr->src[0]), false);
+   LLVMValueRef vindex = ctx->ac.i32_0;
+
+   int idx = i ? 1 : 0;
+   if (load_bytes == 2) {
+   results[idx] = ac_build_tbuffer_load_short(&ctx->ac,
+  rsrc,
+  vindex,
+  offset,
+  
ctx->ac.i32_0,
+  immoffset);
+   } else {
+   switch (load_bytes) {
+   case 16:
+   case 12:
+   load_name = "llvm.amdgcn.buffer.load.v4f32";
+   data_type = ctx->ac.v4f32;
+   break;
+   case 8:
+   case 6:
+   load_name = "llvm.amdgcn.buffer.load.v2f32";
+   data_type = ctx->ac.v2f32;
+   break;
+   case 4:
+   load_name = "llvm.amdgcn.buffer.load.f32";
+   data_type = ctx->ac.f32;
+   break;
+   default:
+   unreachable("Malformed load buffer.");
+   }
+   LLVMValueRef params[] = {
+   rsrc,
+   vindex,
+   LLVMBuildAdd(ctx->ac.builder, offset, 
immoffset, ""),
+   ctx->ac.i1false,
+   ctx->ac.i1false,
+   };
+   results[idx] = ac_build_intrinsic(&ctx->ac, load_name, 
data_type, params, 5, 0);
+   unsigned num_elems = ac_get_type_size(data_type) / 
elem_size_bytes;
+   LLVMTypeRef resTy = 
LLVMVectorType(LLVMIntType(instr->dest.ssa.bit_size),

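The new loop in this patch walks the load in bytes rather than in 32-bit components and issues at most 16 bytes per load. A rough C sketch of just that arithmetic follows; it models sizes only, not the LLVM IR being built, and `demo_chunk_bytes` and `DEMO_MIN2` are hypothetical names.

```c
#define DEMO_MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Returns the size in bytes of chunk number chunk_index (0-based) when a
 * load of num_components elements of bit_size bits is split into pieces of
 * at most 16 bytes, or 0 if there is no such chunk. */
static int demo_chunk_bytes(int num_components, int bit_size, int chunk_index)
{
    int elem_size_bytes = bit_size / 8;
    int num_bytes = num_components * elem_size_bytes;
    int load_bytes = 0;
    int chunk = 0;

    for (int i = 0; i < num_bytes; i += load_bytes, chunk++) {
        load_bytes = DEMO_MIN2(num_bytes - i, 16);
        if (chunk == chunk_index)
            return load_bytes;
    }
    return 0;
}
```

With these rules a vector of four 64-bit values becomes two 16-byte loads, three 16-bit values become a single 6-byte load (handled by the `case 6` arm of the switch), and a single 16-bit value becomes the 2-byte load that the patch routes to `ac_build_tbuffer_load_short()`.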
[Mesa-dev] [PATCH 8/8] radv: enable VK_KHR_16bit_storage extension / 16bit storage features

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/amd/vulkan/radv_device.c  | 10 ++
 src/amd/vulkan/radv_extensions.py |  1 +
 src/amd/vulkan/radv_shader.c  |  1 +
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 80ddb65480..34159bcc5b 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -740,6 +740,7 @@ void radv_GetPhysicalDeviceFeatures2(
 VkPhysicalDevice physicalDevice,
VkPhysicalDeviceFeatures2KHR   *pFeatures)
 {
+   RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice);
vk_foreach_struct(ext, pFeatures->pNext) {
switch (ext->sType) {
case 
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VARIABLE_POINTER_FEATURES_KHR: {
@@ -770,10 +771,11 @@ void radv_GetPhysicalDeviceFeatures2(
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES: {
VkPhysicalDevice16BitStorageFeatures *features =
(VkPhysicalDevice16BitStorageFeatures*)ext;
-   features->storageBuffer16BitAccess = false;
-   features->uniformAndStorageBuffer16BitAccess = false;
-   features->storagePushConstant16 = false;
-   features->storageInputOutput16 = false;
+   bool enabled = HAVE_LLVM >= 0x0700 && 
pdevice->rad_info.chip_class >= VI;
+   features->storageBuffer16BitAccess = enabled;
+   features->uniformAndStorageBuffer16BitAccess = enabled;
+   features->storagePushConstant16 = enabled;
+   features->storageInputOutput16 = enabled;
break;
}
case 
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SAMPLER_YCBCR_CONVERSION_FEATURES: {
diff --git a/src/amd/vulkan/radv_extensions.py 
b/src/amd/vulkan/radv_extensions.py
index a5fbffac33..15d29becfd 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -51,6 +51,7 @@ class Extension:
 # and dEQP-VK.api.info.device fail due to the duplicated strings.
 EXTENSIONS = [
 Extension('VK_ANDROID_native_buffer', 5, 'ANDROID && 
device->rad_info.has_syncobj_wait_for_submit'),
+Extension('VK_KHR_16bit_storage', 1, 'HAVE_LLVM >= 
0x0700'),
 Extension('VK_KHR_bind_memory2',  1, True),
 Extension('VK_KHR_create_renderpass2',1, True),
 Extension('VK_KHR_dedicated_allocation',  1, True),
diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index aac5b8a21a..634e35e1d9 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -224,6 +224,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
.descriptor_array_dynamic_indexing = true,
.runtime_descriptor_array = true,
.stencil_export = true,
+   .storage_16bit = true,
},
};
entry_point = spirv_to_nir(spirv, module->size / 4,
-- 
2.17.1
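The gating condition added to `radv_GetPhysicalDeviceFeatures2()` can be written as a small predicate. This is a sketch only: the enum below is a hypothetical stand-in for the driver's chip class enum, and only the relative order of its values matters here.

```c
#include <stdbool.h>

/* Hypothetical ordering of chip generations, oldest first. */
enum demo_chip_class { DEMO_SI, DEMO_CIK, DEMO_VI, DEMO_GFX9 };

/* As in the patch: all four 16-bit storage features are reported together,
 * and only with LLVM 7.0 or newer on VI or newer hardware. */
static bool demo_16bit_storage_enabled(unsigned llvm_version,
                                       enum demo_chip_class chip)
{
    return llvm_version >= 0x0700 && chip >= DEMO_VI;
}
```

Note that the entry added to `radv_extensions.py` only checks the LLVM version, while the feature bits also check the chip class, so on older chips the extension would be advertised with every feature reported as false.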


[Mesa-dev] [PATCH 6/8] radv: add support for 16bit input/output

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/amd/common/ac_nir_to_llvm.c   |  8 ++-
 src/amd/vulkan/radv_nir_to_llvm.c | 90 +--
 2 files changed, 80 insertions(+), 18 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 770a025c82..babcb9de44 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1798,6 +1798,10 @@ static LLVMValueRef load_tess_varyings(struct 
ac_nir_context *ctx,
  var->data.location_frac,
  instr->num_components,
  is_patch, is_compact, 
load_inputs);
+   if (instr->dest.ssa.bit_size == 16) {
+   result = ac_to_integer(&ctx->ac, result);
+   result = LLVMBuildTrunc(ctx->ac.builder, result, dest_type, "");
+   }
return LLVMBuildBitCast(ctx->ac.builder, result, dest_type, "");
 }
 
@@ -3798,10 +3802,12 @@ ac_handle_shader_output_decl(struct ac_llvm_context 
*ctx,
}
}
 
+   bool is_16bit = glsl_type_is_16bit(variable->type);
+   LLVMTypeRef type = is_16bit ? ctx->f16 : ctx->f32;
for (unsigned i = 0; i < attrib_count; ++i) {
for (unsigned chan = 0; chan < 4; chan++) {
abi->outputs[ac_llvm_reg_index_soa(output_loc + i, 
chan)] =
-  ac_build_alloca_undef(ctx, ctx->f32, "");
+  ac_build_alloca_undef(ctx, type, "");
}
}
 }
diff --git a/src/amd/vulkan/radv_nir_to_llvm.c 
b/src/amd/vulkan/radv_nir_to_llvm.c
index c7d772fa65..9f9dc0d4fe 100644
--- a/src/amd/vulkan/radv_nir_to_llvm.c
+++ b/src/amd/vulkan/radv_nir_to_llvm.c
@@ -1479,6 +1479,8 @@ store_tcs_output(struct ac_shader_abi *abi,
if (!(writemask & (1 << chan)))
continue;
LLVMValueRef value = ac_llvm_extract_elem(&ctx->ac, src, chan - 
component);
+   value = ac_to_integer(&ctx->ac, value);
+   value = LLVMBuildZExtOrBitCast(ctx->ac.builder, value, 
ctx->ac.i32, "");
 
if (store_lds || is_tess_factor) {
LLVMValueRef dw_addr_chan =
@@ -1575,10 +1577,13 @@ load_gs_input(struct ac_shader_abi *abi,
ctx->ac.i32_0,
vtx_offset, soffset,
0, 1, 0, true, false);
+   }
 
-   value[i] = LLVMBuildBitCast(ctx->ac.builder, value[i],
-   type, "");
+   if (ac_get_type_size(type) == 2) {
+   value[i] = LLVMBuildBitCast(ctx->ac.builder, value[i], 
ctx->ac.i32, "");
+   value[i] = LLVMBuildTrunc(ctx->ac.builder, value[i], 
ctx->ac.i16, "");
}
+   value[i] = LLVMBuildBitCast(ctx->ac.builder, value[i], type, 
"");
}
result = ac_build_varying_gather_values(&ctx->ac, value, 
num_components, component);
result = ac_to_integer(&ctx->ac, result);
@@ -1757,7 +1762,8 @@ visit_emit_vertex(struct ac_shader_abi *abi, unsigned 
stream, LLVMValueRef *addr
voffset = LLVMBuildAdd(ctx->ac.builder, voffset, 
gs_next_vertex, "");
voffset = LLVMBuildMul(ctx->ac.builder, voffset, 
LLVMConstInt(ctx->ac.i32, 4, false), "");
 
-   out_val = LLVMBuildBitCast(ctx->ac.builder, out_val, 
ctx->ac.i32, "");
+   out_val = ac_to_integer(&ctx->ac, out_val);
+   out_val = LLVMBuildZExtOrBitCast(ctx->ac.builder, 
out_val, ctx->ac.i32, "");
 
ac_build_buffer_store_dword(&ctx->ac, ctx->gsvs_ring,
out_val, 1,
@@ -1976,6 +1982,7 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
 
variable->data.driver_location = variable->data.location * 4;
 
+   enum glsl_base_type type = glsl_get_base_type(variable->type);
for (unsigned i = 0; i < attrib_count; ++i) {
LLVMValueRef output[4];
unsigned attrib_index = variable->data.location + i - 
VERT_ATTRIB_GENERIC0;
@@ -2019,14 +2026,20 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
for (unsigned chan = 0; chan < 4; chan++) {
LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, 
chan, false);
output[chan] = LLVMBuildExtractElement(ctx->ac.builder, 
input, llvm_chan, "");
+   if (type == GLSL_TYPE_FLOAT16) {
+   output[chan] = 
LLVMBuildBitCast(ctx->ac.builder, output[chan], ctx->ac.f32, "");
+   output[chan] = 
LLVMBuildFPTrunc(ctx->ac.builder, output[chan], ctx->ac.f16, "");
+   }
+

[Mesa-dev] [PATCH 2/8] ac: add support for 16bit ssbo stores

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/amd/common/ac_nir_to_llvm.c | 144 +++-
 1 file changed, 84 insertions(+), 60 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index f46744e8ca..43a0b86420 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1406,31 +1406,24 @@ static uint32_t widen_mask(uint32_t mask, unsigned 
multiplier)
 static LLVMValueRef extract_vector_range(struct ac_llvm_context *ctx, 
LLVMValueRef src,
  unsigned start, unsigned count)
 {
-   LLVMTypeRef type = LLVMTypeOf(src);
+   LLVMValueRef mask[] = {
+   LLVMConstInt(ctx->i32, 0, false), LLVMConstInt(ctx->i32, 1, false),
+   LLVMConstInt(ctx->i32, 2, false), LLVMConstInt(ctx->i32, 3, false) };
 
-   if (LLVMGetTypeKind(type) != LLVMVectorTypeKind) {
+   unsigned src_elements = ac_get_llvm_num_components(src);
+
+   if (count == src_elements) {
assert(start == 0);
-   assert(count == 1);
return src;
+   } else if (count == 1) {
+   assert(start < src_elements);
+   return LLVMBuildExtractElement(ctx->builder, src, mask[start],  
"");
+   } else {
+   assert(start + count <= src_elements);
+   assert(count <= 4);
+   LLVMValueRef swizzle = LLVMConstVector(&mask[start], count);
+   return LLVMBuildShuffleVector(ctx->builder, src, src, swizzle, 
"");
}
-
-   unsigned src_elements = LLVMGetVectorSize(type);
-   assert(start < src_elements);
-   assert(start + count <= src_elements);
-
-   if (start == 0 && count == src_elements)
-   return src;
-
-   if (count == 1)
-   return LLVMBuildExtractElement(ctx->builder, src, 
LLVMConstInt(ctx->i32, start, false), "");
-
-   assert(count <= 8);
-   LLVMValueRef indices[8];
-   for (unsigned i = 0; i < count; ++i)
-   indices[i] = LLVMConstInt(ctx->i32, start + i, false);
-
-   LLVMValueRef swizzle = LLVMConstVector(indices, count);
-   return LLVMBuildShuffleVector(ctx->builder, src, src, swizzle, "");
 }
 
 static void visit_store_ssbo(struct ac_nir_context *ctx,
@@ -1438,33 +1431,19 @@ static void visit_store_ssbo(struct ac_nir_context *ctx,
 {
const char *store_name;
LLVMValueRef src_data = get_src(ctx, instr->src[0]);
-   LLVMTypeRef data_type = ctx->ac.f32;
-   int elem_size_mult = ac_get_elem_bits(&ctx->ac, LLVMTypeOf(src_data)) / 
32;
-   int components_32bit = elem_size_mult * instr->num_components;
+   int elem_size_bytes = ac_get_elem_bits(&ctx->ac, LLVMTypeOf(src_data)) 
/ 8;
unsigned writemask = nir_intrinsic_write_mask(instr);
-   LLVMValueRef base_data, base_offset;
-   LLVMValueRef params[6];
 
-   params[1] = ctx->abi->load_ssbo(ctx->abi,
+   LLVMValueRef rsrc = ctx->abi->load_ssbo(ctx->abi,
get_src(ctx, instr->src[1]), true);
-   params[2] = ctx->ac.i32_0; /* vindex */
-   params[4] = ctx->ac.i1false;  /* glc */
-   params[5] = ctx->ac.i1false;  /* slc */
-
-   if (components_32bit > 1)
-   data_type = LLVMVectorType(ctx->ac.f32, components_32bit);
-
-   writemask = widen_mask(writemask, elem_size_mult);
-
-   base_data = ac_to_float(&ctx->ac, src_data);
+   LLVMValueRef base_data = ac_to_float(&ctx->ac, src_data);
base_data = ac_trim_vector(&ctx->ac, base_data, instr->num_components);
-   base_data = LLVMBuildBitCast(ctx->ac.builder, base_data,
-data_type, "");
-   base_offset = get_src(ctx, instr->src[2]);  /* voffset */
+   LLVMValueRef base_offset = get_src(ctx, instr->src[2]);
+
while (writemask) {
int start, count;
-   LLVMValueRef data;
-   LLVMValueRef offset;
+   LLVMValueRef data, offset;
+   LLVMTypeRef data_type;
 
u_bit_scan_consecutive_range(&writemask, &start, &count);
 
@@ -1474,31 +1453,76 @@ static void visit_store_ssbo(struct ac_nir_context *ctx,
writemask |= 1 << (start + 2);
count = 2;
}
+   int num_bytes = count * elem_size_bytes; /* count in bytes */
 
-   if (count > 4) {
-   writemask |= ((1u << (count - 4)) - 1u) << (start + 4);
-   count = 4;
+   /* we can only store 4 DWords at the same time.
+* can only happen for 64 Bit vectors. */
+   if (num_bytes > 16) {
+   writemask |= ((1u << (count - 2)) - 1u) << (start + 2);
+   count = 2;
+   num_bytes = 16;
}
 
-   if (count == 4) {
-   store_name = "llvm.amdgcn.buffer.store.v4f32";
-   } else if (count == 2) {
-

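The store loop above consumes the writemask one run of consecutive components at a time and caps each store at 16 bytes (four dwords). The sketch below reproduces only that splitting, counting how many stores would result. It is an illustration under assumptions: `demo_scan_consecutive_range` is a portable stand-in for the helper the patch calls, not its real implementation, and the separate rule for three-component runs is left out because its condition is not visible in the hunk.

```c
/* Portable stand-in for the range-scanning helper called in the patch:
 * finds the lowest run of consecutive set bits in *mask, reports it via
 * *start and *count, and clears those bits. *mask must be non-zero. */
static void demo_scan_consecutive_range(unsigned *mask, int *start, int *count)
{
    int s = 0, c = 0;

    while (!((*mask >> s) & 1u))            /* skip to the lowest set bit */
        s++;
    while (s + c < 32 && ((*mask >> (s + c)) & 1u))
        c++;                                /* measure the run of set bits */
    for (int i = 0; i < c; i++)
        *mask &= ~(1u << (s + i));          /* consume the run */

    *start = s;
    *count = c;
}

/* Counts the stores needed for a writemask, given the element size in
 * bytes. Assumes elem_size_bytes <= 8, so an oversized run always has more
 * than two components. */
static int demo_count_stores(unsigned writemask, int elem_size_bytes)
{
    int stores = 0;

    while (writemask) {
        int start, count;

        demo_scan_consecutive_range(&writemask, &start, &count);

        /* At most 16 bytes per store: keep two components and push the
         * rest of the run back into the mask for the next iteration. */
        if (count * elem_size_bytes > 16) {
            writemask |= ((1u << (count - 2)) - 1u) << (start + 2);
            count = 2;
        }
        stores++;
    }
    return stores;
}
```

For example, a full four-component mask needs two stores with 64-bit elements but only one with 16-bit elements, which matches the patch comment that the 16-byte limit can only be hit by 64-bit vectors.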
[Mesa-dev] [PATCH 3/8] ac: add support for 16bit UBO loads

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/amd/common/ac_llvm_build.c  | 25 +
 src/amd/common/ac_llvm_build.h  |  8 
 src/amd/common/ac_nir_to_llvm.c | 21 ++---
 3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 4078b005e5..54b7e98701 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1103,6 +1103,31 @@ LLVMValueRef 
ac_build_buffer_load_format_gfx9_safe(struct ac_llvm_context *ctx,
   can_speculate, true);
 }
 
+LLVMValueRef
+ac_build_tbuffer_load_short(struct ac_llvm_context *ctx,
+   LLVMValueRef rsrc,
+   LLVMValueRef vindex,
+   LLVMValueRef voffset,
+   LLVMValueRef soffset,
+   LLVMValueRef immoffset)
+{
+   const char *name = "llvm.amdgcn.tbuffer.load.i32";
+   LLVMTypeRef type = ctx->i32;
+   LLVMValueRef params[] = {
+   rsrc,
+   vindex,
+   voffset,
+   soffset,
+   immoffset,
+   LLVMConstInt(ctx->i32, 
V_008F0C_BUF_DATA_FORMAT_16, false),
+   LLVMConstInt(ctx->i32, 
V_008F0C_BUF_NUM_FORMAT_UINT, false),
+   ctx->i1false,
+   ctx->i1false,
+   };
+   LLVMValueRef res = ac_build_intrinsic(ctx, name, type, params, 9, 0);
+   return LLVMBuildTrunc(ctx->builder, res, ctx->i16, "");
+}
+
 /**
  * Set range metadata on an instruction.  This can only be used on load and
  * call instructions.  If you know an instruction can only produce the values
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 4e7cbcd5fa..c5753037e7 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -252,6 +252,14 @@ LLVMValueRef ac_build_buffer_load_format_gfx9_safe(struct 
ac_llvm_context *ctx,
   bool glc,
   bool can_speculate);
 
+LLVMValueRef
+ac_build_tbuffer_load_short(struct ac_llvm_context *ctx,
+   LLVMValueRef rsrc,
+   LLVMValueRef vindex,
+   LLVMValueRef voffset,
+   LLVMValueRef soffset,
+   LLVMValueRef immoffset);
+
 LLVMValueRef
 ac_get_thread_id(struct ac_llvm_context *ctx);
 
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 43a0b86420..d7a52a536c 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1663,9 +1663,24 @@ static LLVMValueRef visit_load_ubo_buffer(struct 
ac_nir_context *ctx,
if (instr->dest.ssa.bit_size == 64)
num_components *= 2;
 
-   ret = ac_build_buffer_load(&ctx->ac, rsrc, num_components, NULL, offset,
-  NULL, 0, false, false, true, true);
-   ret = ac_trim_vector(&ctx->ac, ret, num_components);
+   if (instr->dest.ssa.bit_size == 16) {
+   LLVMValueRef results[num_components];
+   for (unsigned i = 0; i < num_components; ++i) {
+   results[i] = ac_build_tbuffer_load_short(&ctx->ac,
+rsrc,
+ctx->ac.i32_0,
+offset,
+ctx->ac.i32_0,
+
LLVMConstInt(ctx->ac.i32, 2 * i, 0));
+   }
+   ret = ac_build_gather_values(&ctx->ac, results, num_components);
+   } else {
+   ret = ac_build_buffer_load(&ctx->ac, rsrc, num_components, 
NULL, offset,
+  NULL, 0, false, false, true, true);
+
+   ret = ac_trim_vector(&ctx->ac, ret, num_components);
+   }
+
return LLVMBuildBitCast(ctx->ac.builder, ret,
get_def_type(ctx, &instr->dest.ssa), "");
 }
-- 
2.17.1
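As a reading aid, the 16-bit UBO path can be modelled on a plain byte buffer: one load per component, each at immediate offset 2 * i, with the 32-bit result truncated to 16 bits. This is a sketch under assumptions, not the driver code. The function names are hypothetical, little-endian byte order is assumed, and the zero-extension of the loaded value is inferred from the UINT number format used in the patch.

```c
#include <stdint.h>

/* Models one typed load of a 16-bit unsigned value: the result arrives in
 * a 32-bit register with the upper half zero (assumption, see above). */
static uint32_t demo_tbuffer_load_u16(const uint8_t *buf, uint32_t byte_offset)
{
    return (uint32_t)buf[byte_offset] |
           ((uint32_t)buf[byte_offset + 1] << 8);
}

/* Component i of a 16-bit vector at byte `offset`: load at offset + 2 * i,
 * then truncate to 16 bits, as the LLVMBuildTrunc() in the patch does. */
static uint16_t demo_load_ubo_short(const uint8_t *buf, uint32_t offset,
                                    unsigned i)
{
    return (uint16_t)demo_tbuffer_load_u16(buf, offset + 2u * i);
}
```

The cost of this approach is one load instruction per component, where the 32-bit path fetches the whole vector with a single `ac_build_buffer_load()` call.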


[Mesa-dev] [PATCH 7/8] ac: add support for 16bit load_push_constant

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/amd/common/ac_nir_to_llvm.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index babcb9de44..598e129aad 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1381,6 +1381,26 @@ static LLVMValueRef visit_load_push_constant(struct 
ac_nir_context *ctx,
get_src(ctx, instr->src[0]), "");
 
ptr = ac_build_gep0(&ctx->ac, ctx->abi->push_constants, addr);
+
+   if (instr->dest.ssa.bit_size == 16) {
+   unsigned load_dwords = instr->dest.ssa.num_components / 2 + 1;
+   LLVMTypeRef vec_type = LLVMVectorType(LLVMInt16Type(), 2 * 
load_dwords);
+   ptr = ac_cast_ptr(&ctx->ac, ptr, vec_type);
+   LLVMValueRef res = LLVMBuildLoad(ctx->ac.builder, ptr, "");
+   res = LLVMBuildBitCast(ctx->ac.builder, res, vec_type, "");
+   LLVMValueRef cond = LLVMBuildLShr(ctx->ac.builder, addr, 
ctx->ac.i32_1, "");
+   cond = LLVMBuildTrunc(ctx->ac.builder, cond, LLVMInt1Type(), 
"");
+   LLVMValueRef mask[] = { LLVMConstInt(ctx->ac.i32, 0, false), 
LLVMConstInt(ctx->ac.i32, 1, false),
+   LLVMConstInt(ctx->ac.i32, 2, false), 
LLVMConstInt(ctx->ac.i32, 3, false),
+   LLVMConstInt(ctx->ac.i32, 4, false)};
+   LLVMValueRef swizzle_aligned = LLVMConstVector(&mask[0], 
instr->dest.ssa.num_components);
+   LLVMValueRef swizzle_unaligned = LLVMConstVector(&mask[1], 
instr->dest.ssa.num_components);
+   LLVMValueRef shuffle_aligned = 
LLVMBuildShuffleVector(ctx->ac.builder, res, res, swizzle_aligned, "");
+   LLVMValueRef shuffle_unaligned = 
LLVMBuildShuffleVector(ctx->ac.builder, res, res, swizzle_unaligned, "");
+   res = LLVMBuildSelect(ctx->ac.builder, cond, shuffle_unaligned, 
shuffle_aligned, "");
+   return LLVMBuildBitCast(ctx->ac.builder, res, get_def_type(ctx, 
&instr->dest.ssa), "");
+   }
+
ptr = ac_cast_ptr(&ctx->ac, ptr, get_def_type(ctx, &instr->dest.ssa));
 
return LLVMBuildLoad(ctx->ac.builder, ptr, "");
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
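
The 16-bit path in the patch above can be read as "load whole dwords, then pick the lanes starting at half 0 or half 1 depending on bit 1 of the byte address". The following is only a CPU-side sketch of that reading, not Mesa code: the helper name load_push_constant_16, the 16-bit view of the storage and the rounding of the base to a dword boundary are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Mesa: models one reading of the
 * shuffle/select sequence on the CPU.
 * `halves` is the push-constant storage viewed as 16-bit values,
 * `byte_addr` is a byte offset that is a multiple of 2. */
static void load_push_constant_16(const uint16_t *halves, uint32_t byte_addr,
                                  unsigned num_components, uint16_t *out)
{
    /* Index of the first 16-bit half of the dword containing byte_addr. */
    uint32_t base = (byte_addr / 4) * 2;

    /* Bit 1 of the byte address: 0 selects the "aligned" swizzle
     * (lanes 0..n-1), 1 selects the "unaligned" one (lanes 1..n). */
    unsigned unaligned = (byte_addr >> 1) & 1;

    for (unsigned i = 0; i < num_components; i++)
        out[i] = halves[base + i + unaligned];
}
```

With the storage holding the halves 10, 11, 12, ... a two-component load at byte offset 2 returns the second and third half, which is why the patch loads num_components / 2 + 1 dwords rather than just enough for the aligned case.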

[Mesa-dev] [PATCH 1/8] ac: add 16bit conversion operations

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/amd/common/ac_llvm_build.c  | 13 -
 src/amd/common/ac_nir_to_llvm.c | 27 +++
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index a77c29270d..4078b005e5 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -175,6 +175,8 @@ ac_get_type_size(LLVMTypeRef type)
switch (kind) {
case LLVMIntegerTypeKind:
return LLVMGetIntTypeWidth(type) / 8;
+   case LLVMHalfTypeKind:
+   return 2;
case LLVMFloatTypeKind:
return 4;
case LLVMDoubleTypeKind:
@@ -320,6 +322,9 @@ void ac_build_type_name_for_intr(LLVMTypeRef type, char 
*buf, unsigned bufsize)
case LLVMIntegerTypeKind:
snprintf(buf, bufsize, "i%d", LLVMGetIntTypeWidth(elem_type));
break;
+   case LLVMHalfTypeKind:
+   snprintf(buf, bufsize, "f16");
+   break;
case LLVMFloatTypeKind:
snprintf(buf, bufsize, "f32");
break;
@@ -1819,11 +1824,9 @@ LLVMValueRef ac_build_cvt_pkrtz_f16(struct 
ac_llvm_context *ctx,
 {
LLVMTypeRef v2f16 =
LLVMVectorType(LLVMHalfTypeInContext(ctx->context), 2);
-   LLVMValueRef res =
-   ac_build_intrinsic(ctx, "llvm.amdgcn.cvt.pkrtz",
-  v2f16, args, 2,
-  AC_FUNC_ATTR_READNONE);
-   return LLVMBuildBitCast(ctx->builder, res, ctx->i32, "");
+
+   return ac_build_intrinsic(ctx, "llvm.amdgcn.cvt.pkrtz", v2f16,
+ args, 2, AC_FUNC_ATTR_READNONE);
 }
 
 /* Upper 16 bits must be zero. */
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 83d8b9a442..f46744e8ca 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -464,7 +464,8 @@ static LLVMValueRef emit_pack_half_2x16(struct 
ac_llvm_context *ctx,
comp[0] = LLVMBuildExtractElement(ctx->builder, src0, ctx->i32_0, "");
comp[1] = LLVMBuildExtractElement(ctx->builder, src0, ctx->i32_1, "");
 
-   return ac_build_cvt_pkrtz_f16(ctx, comp);
+   return LLVMBuildBitCast(ctx->builder, ac_build_cvt_pkrtz_f16(ctx, comp),
+   ctx->i32, "");
 }
 
 static LLVMValueRef emit_unpack_half_2x16(struct ac_llvm_context *ctx,
@@ -843,34 +844,47 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
       src[i] = ac_to_integer(&ctx->ac, src[i]);
       result = ac_build_gather_values(&ctx->ac, src, num_components);
       break;
+   case nir_op_f2i16:
    case nir_op_f2i32:
    case nir_op_f2i64:
       src[0] = ac_to_float(&ctx->ac, src[0]);
       result = LLVMBuildFPToSI(ctx->ac.builder, src[0], def_type, "");
       break;
+   case nir_op_f2u16:
    case nir_op_f2u32:
    case nir_op_f2u64:
       src[0] = ac_to_float(&ctx->ac, src[0]);
       result = LLVMBuildFPToUI(ctx->ac.builder, src[0], def_type, "");
       break;
+   case nir_op_i2f16:
    case nir_op_i2f32:
    case nir_op_i2f64:
       src[0] = ac_to_integer(&ctx->ac, src[0]);
       result = LLVMBuildSIToFP(ctx->ac.builder, src[0], ac_to_float_type(&ctx->ac, def_type), "");
       break;
+   case nir_op_u2f16:
    case nir_op_u2f32:
    case nir_op_u2f64:
       src[0] = ac_to_integer(&ctx->ac, src[0]);
       result = LLVMBuildUIToFP(ctx->ac.builder, src[0], ac_to_float_type(&ctx->ac, def_type), "");
       break;
-   case nir_op_f2f64:
+   case nir_op_f2f16_rtz:
       src[0] = ac_to_float(&ctx->ac, src[0]);
-      result = LLVMBuildFPExt(ctx->ac.builder, src[0], ac_to_float_type(&ctx->ac, def_type), "");
+      LLVMValueRef param[2] = { src[0], ctx->ac.f32_0 };
+      result = ac_build_cvt_pkrtz_f16(&ctx->ac, param);
+      result = LLVMBuildExtractElement(ctx->ac.builder, result, ctx->ac.i32_0, "");
       break;
+   case nir_op_f2f16_rtne:
+   case nir_op_f2f16_undef:
    case nir_op_f2f32:
+   case nir_op_f2f64:
       src[0] = ac_to_float(&ctx->ac, src[0]);
-      result = LLVMBuildFPTrunc(ctx->ac.builder, src[0], ac_to_float_type(&ctx->ac, def_type), "");
+      if (ac_get_elem_bits(&ctx->ac, LLVMTypeOf(src[0])) < ac_get_elem_bits(&ctx->ac, def_type))
+         result = LLVMBuildFPExt(ctx->ac.builder, src[0], ac_to_float_type(&ctx->ac, def_type), "");
+      else
+         result = LLVMBuildFPTrunc(ctx->ac.builder, src[0], ac_to_float_type(&ctx->ac, def_type), "");
       break;
+   case nir_op_u2u16:
    case nir_op_u2u32:
    case nir_op_u2u64:
       src[0] = ac_to_integer(&ctx->ac, src[0]);
@@ -879,6 +893,7 @@

[Mesa-dev] [PATCH 5/8] nir: add 16bit type information to glsl types

2018-07-19 Thread Daniel Schürmann

Signed-off-by: Daniel Schürmann 
---
 src/compiler/glsl_types.h  | 15 +++
 src/compiler/nir_types.cpp | 12 
 src/compiler/nir_types.h   |  1 +
 3 files changed, 28 insertions(+)

diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h
index efc6324865..8cc8177f2d 100644
--- a/src/compiler/glsl_types.h
+++ b/src/compiler/glsl_types.h
@@ -87,6 +87,13 @@ enum glsl_base_type {
GLSL_TYPE_ERROR
 };
 
+static inline bool glsl_base_type_is_16bit(enum glsl_base_type type)
+{
+   return type == GLSL_TYPE_FLOAT16 ||
+  type == GLSL_TYPE_UINT16 ||
+  type == GLSL_TYPE_INT16;
+}
+
 static inline bool glsl_base_type_is_64bit(enum glsl_base_type type)
 {
return type == GLSL_TYPE_DOUBLE ||
@@ -551,6 +558,14 @@ public:
   return glsl_base_type_is_64bit(base_type);
}
 
+   /**
+* Query whether or not a type is 16-bit
+*/
+   bool is_16bit() const
+   {
+  return glsl_base_type_is_16bit(base_type);
+   }
+
/**
 * Query whether or not a type is a non-array boolean type
 */
diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp
index 6f1182b742..3a3864414f 100644
--- a/src/compiler/nir_types.cpp
+++ b/src/compiler/nir_types.cpp
@@ -164,6 +164,12 @@ glsl_get_record_location_offset(const struct glsl_type 
*type,
return type->record_location_offset(length);
 }
 
+bool
+glsl_type_is_16bit(const glsl_type *type)
+{
+   return type->is_16bit();
+}
+
 bool
 glsl_type_is_64bit(const glsl_type *type)
 {
@@ -473,6 +479,12 @@ glsl_channel_type(const glsl_type *t)
   return glsl_uint64_t_type();
case GLSL_TYPE_INT64:
   return glsl_int64_t_type();
+   case GLSL_TYPE_FLOAT16:
+  return glsl_float16_t_type();
+   case GLSL_TYPE_UINT16:
+  return glsl_uint16_t_type();
+   case GLSL_TYPE_INT16:
+  return glsl_int16_t_type();
default:
   unreachable("Unhandled base type glsl_channel_type()");
}
diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h
index c128250c7d..817b7a9b34 100644
--- a/src/compiler/nir_types.h
+++ b/src/compiler/nir_types.h
@@ -121,6 +121,7 @@ glsl_get_bit_size(const struct glsl_type *type)
return 0;
 }
 
+bool glsl_type_is_16bit(const struct glsl_type *type);
 bool glsl_type_is_64bit(const struct glsl_type *type);
 bool glsl_type_is_void(const struct glsl_type *type);
 bool glsl_type_is_error(const struct glsl_type *type);
-- 
2.17.1


[Mesa-dev] VK_KHR_16bit_storage extension / 16bit storage features

2018-07-19 Thread Daniel Schürmann

This series implements and enables the VK_KHR_16bit_storage extension 
(respectively the VK 1.1 features) for VI+.
LLVM >= 7 is required and it is unlikely that workarounds for LLVM-6 get 
implemented for this rarely used ext.
This first implementation does not yet make use of packing 16bit values between 
shader stages.



[Mesa-dev] [PATCH] travis: manually generate sys/syscall.h

2018-07-19 Thread Andres Gomez

Until now, the needed bits were wrongly included in linux/memfd.h

Since Travis' sys/syscall.h doesn't provide the SYS_memfd_create, we
generate that header manually, including the needed bits to avoid
compilation problems, as the ones observed after:
3228335b55c ("intel: aubinator: handle GGTT mappings")

Fixes: 7e2af374742 ("travis: split the make target to three separate ones")

Cc: Emil Velikov 
Cc: Juan A. Suarez Romero 
Cc: Dylan Baker 
Cc: Eric Engestrom 
Signed-off-by: Andres Gomez 
---
 .travis.yml | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 012cc9139e0..8b1730bec69 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -584,13 +584,34 @@ install:
"#ifndef _LINUX_MEMFD_H" \
"#define _LINUX_MEMFD_H" \
"" \
-   "#define __NR_memfd_create 319" \
-   "#define SYS_memfd_create __NR_memfd_create" \
-   "" \
"#define MFD_CLOEXEC 0x0001U" \
"#define MFD_ALLOW_SEALING   0x0002U" \
"" \
"#endif /* _LINUX_MEMFD_H */" > linux/memfd.h
+
+  # Generate this header, including the missing SYS_memfd_create
+  # macro, which is not provided by the header in the Travis
+  # instance
+  mkdir -p sys
+  printf "%s\n" \
+   "#ifndef _SYSCALL_H" \
+   "#define _SYSCALL_H  1" \
+   "" \
+   "#include " \
+   "" \
+   "#ifndef _LIBC" \
+   "# include " \
+   "#endif" \
+   "" \
+   "#ifndef __NR_memfd_create" \
+   "# define __NR_memfd_create 319 /* Taken from  */" 
\
+   "#endif" \
+   "" \
+   "#ifndef SYS_memfd_create" \
+   "# define SYS_memfd_create __NR_memfd_create" \
+   "#endif" \
+   "" \
+   "#endif" > sys/syscall.h
 fi
 
 script:
-- 
2.18.0


[Mesa-dev] [Bug 107290] Hang reading from GL_FRONT in fullscreen

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107290

Bug ID: 107290
   Summary: Hang reading from GL_FRONT in fullscreen
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: jwy...@feralinteractive.com
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 140709
  --> https://bugs.freedesktop.org/attachment.cgi?id=140709&action=edit
Demonstration for hang

If GL_FRONT is bound using glReadBuffer in a fullscreen application, the driver
can hang in xshmfence_await.

This only appears to happen if the application is the only thing being
displayed - even bringing up the alt-tab menu will cause rendering to resume as
normal. Once this bug has occurred though, returning to the app will cause it
to hang again.

Possibly this happens when the else if (buffer_type ==
loader_dri3_buffer_front) path of dri3_get_buffer is used.

The attached sample should demonstrate this issue - it reports a duration for
the swap/read back on the order of the amount of time that the application was
the sole thing rendering to the screen.

Tested radeonsi, git revision 5ba3e5c358, Fedora 27, gnome 3.26.2.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [Bug 106893] Potential mem leak with radv

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106893

Bas Nieuwenhuizen  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Bas Nieuwenhuizen  ---
Thanks for the follow up!

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [Bug 106893] Potential mem leak with radv

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=106893

--- Comment #4 from John  ---
I have just tried again and since I was within Alex' numbers in the menu, I
tried the benchmark and was able to complete it with no issue for the 1st time!
I'd say the game used about 4Gb of RAM during that time, so all good!

A few things have changed on my system so I'll dig a little more to see what
helped.

- I'm building mesa myself instead of using mesa-git to use this patch:
https://bugs.freedesktop.org/attachment.cgi?id=139672&action=edit (I cannot
play Hitman in good conditions without it).

- I'm using the CK patchset for my kernel with MUQSS.

- Then the obvious new code in linux 4.17.6/llvm-svn/radv-git, but I'd rather
not bisect these to find what fixed my issue...

Thanks to whoever fixed this issue!

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 107288] REGoth/bgfx: util_blitter_generate_mipmap: Assertion `!util_format_has_stencil(desc)' failed.

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107288

Bug ID: 107288
   Summary: REGoth/bgfx: util_blitter_generate_mipmap: Assertion
`!util_format_has_stencil(desc)' failed.
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: haa...@frickel.club
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 140708
  --> https://bugs.freedesktop.org/attachment.cgi?id=140708&action=edit
application output

REGoth: ../src/gallium/auxiliary/util/u_blitter.c:2048:
util_blitter_generate_mipmap: Assertion `!util_format_has_stencil(desc)'
failed.


REGoth is a project reimplementing the game engine used for Gothic 1 and 2.
They use bgfx for rendering and the assert is caused by whatever that library
is doing. Its startup output is attached.

This may be an application bug, but commenting out that assert makes ReGoth
work with seemingly no ill effect, so it would be nice if it could be verified
that it's a valid assert and report it downstream to bgfx if it is.

Only tried with radeonsi latest git on RX 480 but the location of the assert
probably applies to all drivers.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH] libGLw: Use newly introduced GLAPIVAR for variables

2018-07-19 Thread Stefan Dirsch

On Wed, Jul 18, 2018 at 11:07:01AM -0600, Brian Paul wrote:
> On 07/17/2018 06:47 PM, Stefan Dirsch wrote:
> > On Tue, Jul 17, 2018 at 04:57:26PM -0600, Brian Paul wrote:
> > > Reviewed-by: Brian Paul 
> > > 
> > > Do you need me to push this for you?
> > 
> > I'm afraid the answer is yes. Tried it but push hangs forever after this
> > 
> > # git push --verbose
> > Pushing to ssh://git.freedesktop.org/git/mesa/glw.git
> > Counting objects: 4, done.
> > Delta compression using up to 8 threads.
> > Compressing objects: 100% (4/4), done.
> > Writing objects: 100% (4/4), 700 bytes | 350.00 KiB/s, done.
> > Total 4 (delta 3), reused 0 (delta 0)
> 
> Worked for me.

Wonderful! Thanks a bunch.

Stefan

Public Key available
--
Stefan Dirsch (Res. & Dev.)   SUSE LINUX GmbH
Tel: 0911-740 53 0            Maxfeldstraße 5
FAX: 0911-740 53 479          D-90409 Nürnberg
http://www.suse.de            Germany
---
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham
Norton, HRB 21284 (AG Nürnberg)
---

Re: [Mesa-dev] [PATCH 2/3] st: Sweep NIR after linking phase to free held memory

2018-07-19 Thread Danylo Piliaiev



On 19.07.18 12:30, Timothy Arceri wrote:

On 19.07.18 11:47, Timothy Arceri wrote:

On 19/07/18 08:31, Eric Anholt wrote:

Danylo Piliaiev  writes:


After optimization passes and many trasfromations most of memory


"transformations"

NIR holds is a garbage which was being freed only after shader 
deletion.


"is garbage"

Freeing it at the end of linking will save memory which would be 
useful

in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.

The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.

Signed-off-by: Danylo Piliaiev 


This seems good, and I'm running it through the CTS now.



The problem is this does the sweep too early. We still do lots of 
work on NIR after this, I've thought about this a few times and it 
really seems we should be able to call a driver specific function 
from the st and pass it the IR so it can do what ever it wants 
(lowering opts etc) and spits it back out. Once this is done we 
should then call sweep and cache the IR.


At the very least we should call sweep before we cache NIR rather 
than where this patch places it.
I debugged mesa once more and cannot see where else memory is 
allocated for NIR after this sweep. Could you point it to me? After 
this sweep the memory NIR holds never grows. Later this NIR is cloned 
from and these cloned NIRs are being sweeped in other place.


Ah yes you are right. We do clone it when creating variants I'd 
forgotten about that. In that case please ignore my comment.


Good, I thought I really missed something...

Also, during my investigation of Mesa memory usage I wrote a gdb pretty 
printer which shows how much memory a variable holds in its ralloc context 
(with all its children). It was crudely written and at the moment has 
severe limitations: it is x64 only and depends on the internal malloc 
implementation and other hardcoded things; also I wasn't able to nicely 
display the children of a variable. The reason the pretty printer was done 
this way is that calling a C function (e.g. malloc_usable_size) corrupts 
the backtrace somehow.
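
The core of the size computation described here is small enough to sketch in C: walk the header tree and add up each allocation's malloc chunk size with the low flag bits masked off. This is an illustration only. The struct toy_header layout and the function names are assumptions, and the three flag bits are the ones glibc keeps in the chunk size field (PREV_INUSE, IS_MMAPPED, NON_MAIN_ARENA).

```c
#include <stddef.h>

/* glibc stores three flag bits in the low bits of a chunk's size field. */
#define CHUNK_FLAG_BITS ((size_t)(0x1 | 0x2 | 0x4))

/* Toy stand-in for a ralloc header: first child, next sibling, and the
 * raw malloc size field that the real script reads from memory just
 * before the allocation. */
struct toy_header {
    const struct toy_header *child;
    const struct toy_header *next;
    size_t size_field;
};

/* Chunk size in bytes (including malloc overhead) with flags removed. */
static size_t chunk_size_from_field(size_t size_field)
{
    return size_field & ~CHUNK_FLAG_BITS;
}

/* Total bytes held by a context and all of its descendants.
 * Recursive for brevity; the script uses an explicit stack instead. */
static size_t total_alloc_size(const struct toy_header *header)
{
    size_t total = chunk_size_from_field(header->size_field);

    for (const struct toy_header *c = header->child; c != NULL; c = c->next)
        total += total_alloc_size(c);

    return total;
}
```

Reading the size field directly is what ties the approach to one malloc implementation, which matches the limitation mentioned above.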


Example usage:

(gdb) source ralloc_info_pretty_printer.py
(gdb) backtrace
#0  brw_link_shader (ctx=0x558ca0a0 , 
shProg=0x55d850c0 ) at brw_link.cpp:320
#1  0x72732b6f in _mesa_glsl_link_shader (ctx=0x558ca0a0 
, prog=0x55d850c0 ) at 
program/ir_to_mesa.cpp:3174
#2  0x725a1862 in link_program (no_error=false, 
shProg=0x55d850c0 , ctx=0x558ca0a0 ) at 
main/shaderapi.c:1206
#3  link_program_error (ctx=0x558ca0a0 , 
shProg=0x55d850c0 ) at main/shaderapi.c:1286
#4  0x725a2f00 in _mesa_LinkProgram (programObj=3) at 
main/shaderapi.c:1778
#5  0x6de1 in main () at 
/home/danylo/Projects/shader_compilation_memory_test/test.cpp:421

(gdb) p prog->nir
$1 = 0x55dc2b20 

If there is any interest in having it in Mesa I can clean it up. You can 
find its code in attachment.


import sys

import gdb
import gdb.types
import gdb.printing

have_python_2 = (sys.version_info[0] == 2)
have_python_3 = (sys.version_info[0] == 3)

if have_python_3:
    intptr = int
elif have_python_2:
    intptr = long

def get_ralloc_header(val):
    # Return a pointer to the ralloc header preceding val, or None if
    # val does not look like a ralloc allocation.
    try:
        ralloc_header_type = gdb.lookup_type("ralloc_header")
        ralloc_header_type_pointer = ralloc_header_type.pointer()

        val_as_header_ptr = val.cast(ralloc_header_type_pointer)

        if not val_as_header_ptr:
            return None

        header_ptr = val_as_header_ptr - 1

        if not header_ptr:
            return None

        canary = header_ptr["canary"]

        if canary and int(canary) == 0x5A1106:
            return header_ptr
        else:
            return None
    except Exception:
        return None


class RAllocPrinter:
    "Pretty Printer for ralloc"
    printer_name = 'ralloc'

    def __init__(self, val, header):
        self.val = val
        self.header = header
        self.has_child = True

    def calc_alloc_size(self, header_ptr):
        headers = [header_ptr]
        size = 0

        current_inferior = gdb.selected_inferior()

        # The low three bits of the malloc size field are flags.
        size_bits_mask = ~(0x1 | 0x2 | 0x4)

        while headers:
            current_header_ptr = headers.pop()

            child_ptr = current_header_ptr["child"]
            while intptr(child_ptr) != 0:
                canary = child_ptr["canary"]
                if not canary or int(canary) != 0x5A1106:
                    return -1

                headers.append(child_ptr)
                child_ptr = child_ptr["next"]

            #struct malloc_chunk {
            #INTERNAL_SIZE_T prev_size;  /* Size of previous chunk (if free).  */
            #INTERNAL_SIZE_T size;   /* Size in bytes, including overhead. */   <--
            # --USABLE MEMORY

            mem = current_inferior.read_memory(int(current_header_ptr) - 8, 8)

            size += mem.cast('l')[0] & size_bits_mask

        return size

    def to_string(self):

Re: [Mesa-dev] [PATCH] i965/tex: ignore the diff between GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE

2018-07-19 Thread andrey simiklit

Hi,

> Ugh... not so good.  According to Oliver on the bug, this just make the
> assert go away and doesn't actually fix anything.  Likely this is needed
> but not sufficient.

So as far as I understand, Olivier found the bad commit in xorg glamor:
https://bugs.freedesktop.org/show_bug.cgi?id=107287

So at the moment we should fix just this "assertion" issue for Intel,
because the "rendering" issue came from xorg/glamor and there is no
"rendering" issue in the Intel part.
Please correct me if I am incorrect.

Regards,
Andrii.

On Tue, Jul 10, 2018 at 8:04 PM, Olivier Fourdan  wrote:

> Hi,
>
> On Tue, 10 Jul 2018 at 18:56, Jason Ekstrand  wrote:
> >
> > Ugh... not so good.  According to Oliver on the bug, this just make the
> > assert go away and doesn't actually fix anything.  Likely this is needed
> > but not sufficient.
>
> Well, maybe not even needed, at least in my case I don't hit that
> assert() with the current Mesa code (from either master and 18.1),
> I've just seen that happening with past commits while doing the
> bisection in git... So adding this now wouldn't help with bisection
> (again, in my case, not sure about others).
>
> All I meant pointing at this assert() failure is that it breaks the
> bisection in git as I am unable to tell if the rendering is correct or
> not as it makes the test program abort.
>
> Cheers,
> Olivier
>

Re: [Mesa-dev] [PATCH 2/3] st: Sweep NIR after linking phase to free held memory

2018-07-19 Thread Timothy Arceri


On 19.07.18 11:47, Timothy Arceri wrote:

On 19/07/18 08:31, Eric Anholt wrote:

Danylo Piliaiev  writes:


After optimization passes and many trasfromations most of memory


"transformations"

NIR holds is a garbage which was being freed only after shader 
deletion.


"is garbage"


Freeing it at the end of linking will save memory which would be useful
in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.

The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.

Signed-off-by: Danylo Piliaiev 


This seems good, and I'm running it through the CTS now.



The problem is this does the sweep too early. We still do lots of work 
on NIR after this, I've thought about this a few times and it really 
seems we should be able to call a driver specific function from the st 
and pass it the IR so it can do what ever it wants (lowering opts etc) 
and spits it back out. Once this is done we should then call sweep and 
cache the IR.


At the very least we should call sweep before we cache NIR rather than 
where this patch places it.
I debugged mesa once more and cannot see where else memory is allocated 
for NIR after this sweep. Could you point it to me? After this sweep the 
memory NIR holds never grows. Later this NIR is cloned from and these 
cloned NIRs are being sweeped in other place.


Ah yes you are right. We do clone it when creating variants I'd 
forgotten about that. In that case please ignore my comment.


Re: [Mesa-dev] [PATCH 2/3] st: Sweep NIR after linking phase to free held memory

2018-07-19 Thread Danylo Piliaiev




On 19.07.18 11:47, Timothy Arceri wrote:

On 19/07/18 08:31, Eric Anholt wrote:

Danylo Piliaiev  writes:


After optimization passes and many trasfromations most of memory


"transformations"

NIR holds is a garbage which was being freed only after shader 
deletion.


"is garbage"


Freeing it at the end of linking will save memory which would be useful
in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.

The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.

Signed-off-by: Danylo Piliaiev 


This seems good, and I'm running it through the CTS now.



The problem is this does the sweep too early. We still do lots of work 
on NIR after this, I've thought about this a few times and it really 
seems we should be able to call a driver specific function from the st 
and pass it the IR so it can do what ever it wants (lowering opts etc) 
and spits it back out. Once this is done we should then call sweep and 
cache the IR.


At the very least we should call sweep before we cache NIR rather than 
where this patch places it.
I debugged mesa once more and cannot see where else memory is allocated 
for NIR after this sweep. Could you point it to me? After this sweep the 
memory NIR holds never grows. Later this NIR is cloned from and these 
cloned NIRs are being sweeped in other place.


Re: [Mesa-dev] [PATCH v2 4/4] intel: tools: dump: trace memory writes

2018-07-19 Thread Lionel Landwerlin


On 18/07/18 21:58, Rafael Antognolli wrote:

On Wed, Jul 18, 2018 at 06:21:32PM +0100, Lionel Landwerlin wrote:

Signed-off-by: Lionel Landwerlin 
---
  src/intel/tools/aub_write.c | 45 ++---
  1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
index de4ce33..9c140553542 100644
--- a/src/intel/tools/aub_write.c
+++ b/src/intel/tools/aub_write.c
@@ -313,10 +313,17 @@ dword_out(struct aub_file *aub, uint32_t data)
  
  static void

  mem_trace_memory_write_header_out(struct aub_file *aub, uint64_t addr,
-  uint32_t len, uint32_t addr_space)
+  uint32_t len, uint32_t addr_space,
+  const char *desc)

Looks like you are not using desc anywhere...

Other than that, things look good.


Duh! Fixed locally.
Counts as Rb?

Thanks,

-
Lionel




  {
 uint32_t dwords = ALIGN(len, sizeof(uint32_t)) / sizeof(uint32_t);
  
+   if (aub->verbose_log_file) {

+  fprintf(aub->verbose_log_file,
+  "  MEM WRITE (0x%016" PRIx64 "-0x%016" PRIx64 ")\n",
+  addr, addr + len);
+   }
+
 dword_out(aub, CMD_MEM_TRACE_MEMORY_WRITE | (5 + dwords - 1));
 dword_out(aub, addr & 0x);   /* addr lo */
 dword_out(aub, addr >> 32);   /* addr hi */
@@ -387,7 +394,8 @@ populate_ppgtt_table(struct aub_file *aub, struct 
aub_ppgtt_table *table,
uint64_t write_size = (dirty_end - dirty_start + 1) *
   sizeof(uint64_t);
mem_trace_memory_write_header_out(aub, write_addr, write_size,
-
AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL);
+
AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL,
+"PPGTT update");
data_out(aub, entries + dirty_start, write_size);
 }
  }
@@ -476,7 +484,8 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
  
 mem_trace_memory_write_header_out(aub, STATIC_GGTT_MAP_START >> 12,

   ggtt_ptes * GEN8_PTE_SIZE,
- 
AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY);
+ 
AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY,
+ "GGTT PT");
 for (uint32_t i = 0; i < ggtt_ptes; i++) {
dword_out(aub, 1 + 0x1000 * i + STATIC_GGTT_MAP_START);
dword_out(aub, 0);
@@ -484,7 +493,8 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
  
 /* RENDER_RING */

 mem_trace_memory_write_header_out(aub, RENDER_RING_ADDR, RING_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
+ AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ "RENDER RING");
 for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
dword_out(aub, 0);
  
@@ -492,7 +502,8 @@ write_execlists_header(struct aub_file *aub, const char *name)

 mem_trace_memory_write_header_out(aub, RENDER_CONTEXT_ADDR,
   PPHWSP_SIZE +
   sizeof(render_context_init),
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
+ AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ "RENDER PPHWSP");
 for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
dword_out(aub, 0);
  
@@ -501,7 +512,8 @@ write_execlists_header(struct aub_file *aub, const char *name)
  
 /* BLITTER_RING */

 mem_trace_memory_write_header_out(aub, BLITTER_RING_ADDR, RING_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
+ AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ "BLITTER RING");
 for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
dword_out(aub, 0);
  
@@ -509,7 +521,8 @@ write_execlists_header(struct aub_file *aub, const char *name)

 mem_trace_memory_write_header_out(aub, BLITTER_CONTEXT_ADDR,
   PPHWSP_SIZE +
   sizeof(blitter_context_init),
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
+ AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ "BLITTER PPHWSP");
 for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
dword_out(aub, 0);
  
@@ -518,7 +531,8 @@ write_execlists_header(struct aub_file *aub, const char *name)
  
 /* VIDEO_RING */

 mem_trace_memory_write_header_out(aub, VIDEO_RING_ADDR, RING_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
+
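
The review comment above points out that the new desc argument is never used. How it was fixed locally is not shown in the thread, so the following is only a guess at one plausible shape: include the description in the verbose log line. The names mem_write_dwords and log_mem_write and the ALIGN_POT macro are hypothetical; only the dword computation and the address-range format string follow the quoted patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed power-of-two round-up, standing in for Mesa's ALIGN(). */
#define ALIGN_POT(v, a) (((v) + (a) - 1) & ~((a) - 1))

/* Number of dwords needed to hold len bytes of payload. */
static uint32_t mem_write_dwords(uint32_t len)
{
    uint32_t dword = (uint32_t) sizeof(uint32_t);
    return ALIGN_POT(len, dword) / dword;
}

/* Hypothetical logging helper: prints the description together with
 * the written address range. Returns the fprintf() result, or 0 when
 * no log file is set. */
static int log_mem_write(FILE *log, const char *desc,
                         uint64_t addr, uint32_t len)
{
    if (log == NULL)
        return 0;

    return fprintf(log,
                   "  MEM WRITE %s (0x%016" PRIx64 "-0x%016" PRIx64 ")\n",
                   desc, addr, addr + len);
}
```

Passing a short label such as "GGTT PT" or "RENDER RING" at each call site, as the patch already does, then makes the verbose log self-describing.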

Re: [Mesa-dev] [PATCH v2 1/4] intel: tools: dump: remove command execution feature

2018-07-19 Thread Lionel Landwerlin


Was that for the whole series, or just this patch? :)

Thanks,

-
Lionel

On 18/07/18 21:42, Jason Ekstrand wrote:

Very sketchily

Reviewed-by: Jason Ekstrand


On Wed, Jul 18, 2018 at 10:21 AM Lionel Landwerlin wrote:


In commit 86cb05a6d35a52 ("intel: aubinator: remove standard input
processing option") we removed the ability to process aub as an input
stream because we now rely on mmapping the aub file to back the
buffers aubinator is parsing.

intel_aubdump was the provider of the standard input data and since
we've copied/reworked intel_aubdump into intel_dump_gpu within Mesa,
we don't need that code anymore.

Signed-off-by: Lionel Landwerlin
---
 src/intel/tools/intel_dump_gpu.c  | 121 +++---
 src/intel/tools/intel_dump_gpu.in |  27 +--

diff --git a/src/intel/tools/intel_dump_gpu.c
b/src/intel/tools/intel_dump_gpu.c
index 6d2c4b7f983..5fd2c8ea723 100644
--- a/src/intel/tools/intel_dump_gpu.c
+++ b/src/intel/tools/intel_dump_gpu.c
@@ -53,8 +53,8 @@ static int (*libc_close)(int fd) =
close_init_helper;
 static int (*libc_ioctl)(int fd, unsigned long request, ...) =
ioctl_init_helper;

 static int drm_fd = -1;
-static char *filename = NULL;
-static FILE *files[2] = { NULL, NULL };
+static char *output_filename = NULL;
+static FILE *output_file = NULL;
 static int verbose = 0;
 static bool device_override;

@@ -111,7 +111,7 @@ align_u32(uint32_t v, uint32_t a)

 static struct gen_device_info devinfo = {0};
 static uint32_t device;
-static struct aub_file aubs[2];
+static struct aub_file aub_file;

 static void *
 relocate_bo(struct bo *bo, const struct drm_i915_gem_execbuffer2
*execbuffer2,
@@ -205,28 +205,21 @@ dump_execbuffer2(int fd, struct
drm_i915_gem_execbuffer2 *execbuffer2)
       fail_if(!gen_get_device_info(device, &devinfo),
               "failed to identify chipset=0x%x\n", device);

-      for (int i = 0; i < ARRAY_SIZE(files); i++) {
-         if (files[i] != NULL) {
-            aub_file_init(&aubs[i], files[i], device);
-            if (verbose == 2)
-               aubs[i].verbose_log_file = stdout;
-            aub_write_header(&aubs[i], program_invocation_short_name);
-         }
-      }
+      aub_file_init(&aub_file, output_file, device);
+      if (verbose == 2)
+         aub_file.verbose_log_file = stdout;
+      aub_write_header(&aub_file, program_invocation_short_name);

       if (verbose)
          printf("[intel_aubdump: running, "
                 "output file %s, chipset id 0x%04x, gen %d]\n",
-                filename, device, devinfo.gen);
+                output_filename, device, devinfo.gen);
    }

-   /* Any aub */
-   struct aub_file *any_aub = files[0] ? &aubs[0] : &aubs[1];;
-
-   if (aub_use_execlists(any_aub))
+   if (aub_use_execlists(&aub_file))
       offset = 0x1000;
    else
-      offset = aub_gtt_size(any_aub);
+      offset = aub_gtt_size(&aub_file);

    if (verbose)
       printf("Dumping execbuffer2:\n");
@@ -263,13 +256,8 @@ dump_execbuffer2(int fd, struct
drm_i915_gem_execbuffer2 *execbuffer2)
          bo->map = gem_mmap(fd, obj->handle, 0, bo->size);
       fail_if(bo->map == MAP_FAILED, "intel_aubdump: bo mmap
failed\n");

-      for (int i = 0; i < ARRAY_SIZE(files); i++) {
-         if (files[i] == NULL)
-            continue;
-
-         if (aub_use_execlists(&aubs[i]))
-            aub_map_ppgtt(&aubs[i], bo->offset, bo->size);
-      }
+      if (aub_use_execlists(&aub_file))
+         aub_map_ppgtt(&aub_file, bo->offset, bo->size);
    }

    batch_index = (execbuffer2->flags & I915_EXEC_BATCH_FIRST) ? 0 :
@@ -284,30 +272,21 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *execbuffer2)
       else
          data = bo->map;

-      for (int i = 0; i < ARRAY_SIZE(files); i++) {
-         if (files[i] == NULL)
-            continue;
-
-         if (bo == batch_bo) {
-            aub_write_trace_block(&aubs[i], AUB_TRACE_TYPE_BATCH,
-                                  GET_PTR(data), bo->size, bo->offset);
-         } else {
-            aub_write_trace_block(&aubs[i], AUB_TRACE_TYPE_NOTYPE,
-                                  GET_PTR(data), bo->size, bo->offset);
-         }
+      if (bo == batch_bo) {
+         aub_write_trace_block(&aub_file, AUB_TRACE_TYPE_BATCH,
+                               GET_PTR(data), bo->size, bo->offset);
+      } else {
+         aub_write_trace_block(&aub_file, AUB_TRACE_TYPE_NOTYPE,
+

Re: [Mesa-dev] [PATCH] intel/tools: fix segfault with intel_dump_gpu

2018-07-19 Thread Lionel Landwerlin


Hey Jordan,

I have a patch that removes this for loop, reviewed by Jason.
I'm landing it right now; that should fix this problem.

Thanks,

-
Lionel

On 19/07/18 09:52, Jordan Justen wrote:

Cc: Jason Ekstrand 
Cc: Lionel Landwerlin 
Fixes: 0a457d987ee "intel/tools: Refactor aub dumping to remove singletons"
Signed-off-by: Jordan Justen 
---
  src/intel/tools/intel_dump_gpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/tools/intel_dump_gpu.c b/src/intel/tools/intel_dump_gpu.c
index 6d2c4b7f983..e0ff1245925 100644
--- a/src/intel/tools/intel_dump_gpu.c
+++ b/src/intel/tools/intel_dump_gpu.c
@@ -301,7 +301,7 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *execbuffer2)
    }

    for (int i = 0; i < ARRAY_SIZE(files); i++) {
-      if (files[i] != NULL)
+      if (files[i] == NULL)
          continue;

       aub_write_exec(&aubs[i],




[Mesa-dev] [PATCH] intel/tools: fix segfault with intel_dump_gpu

2018-07-19 Thread Jordan Justen

Cc: Jason Ekstrand 
Cc: Lionel Landwerlin 
Fixes: 0a457d987ee "intel/tools: Refactor aub dumping to remove singletons"
Signed-off-by: Jordan Justen 
---
 src/intel/tools/intel_dump_gpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/tools/intel_dump_gpu.c b/src/intel/tools/intel_dump_gpu.c
index 6d2c4b7f983..e0ff1245925 100644
--- a/src/intel/tools/intel_dump_gpu.c
+++ b/src/intel/tools/intel_dump_gpu.c
@@ -301,7 +301,7 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *execbuffer2)
    }

    for (int i = 0; i < ARRAY_SIZE(files); i++) {
-      if (files[i] != NULL)
+      if (files[i] == NULL)
          continue;

       aub_write_exec(&aubs[i],
-- 
2.18.0
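
The one-character fix above inverts a guard in a loop over optional output files. A minimal sketch of the pattern, assuming a hypothetical helper `write_to_open_files` (not part of intel_dump_gpu.c): unused slots are NULL and must be skipped, whereas the `!= NULL` form would skip every live slot and go on to use the NULL ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: writes msg to every open
 * stream in files[] and returns how many streams were written to. */
static int
write_to_open_files(FILE *files[], size_t n, const char *msg)
{
   int written = 0;

   assert(msg != NULL);

   for (size_t i = 0; i < n; i++) {
      /* Skip unused slots. Testing "!= NULL" here instead would skip the
       * live streams and pass NULL to fputs() for the unused ones. */
      if (files[i] == NULL)
         continue;

      fputs(msg, files[i]);
      written++;
   }

   return written;
}
```

With one slot unused and one pointing at stdout, the helper should report a single write and never touch the NULL entry.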


Re: [Mesa-dev] [PATCH 2/3] st: Sweep NIR after linking phase to free held memory

2018-07-19 Thread Timothy Arceri


On 19/07/18 08:31, Eric Anholt wrote:

Danylo Piliaiev  writes:


After optimization passes and many trasfromations most of memory


"transformations"


NIR holds is a garbage which was being freed only after shader deletion.


"is garbage"


Freeing it at the end of linking will save memory which would be useful
in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.

The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.

Signed-off-by: Danylo Piliaiev 


This seems good, and I'm running it through the CTS now.



The problem is that this does the sweep too early: we still do lots of work
on NIR after this. I've thought about this a few times, and it really
seems we should be able to call a driver-specific function from the st
and pass it the IR so it can do whatever it wants (lowering, opts, etc.)
and spit it back out. Once this is done, we should then call sweep and
cache the IR.


At the very least, we should call sweep before we cache NIR rather than
where this patch places it.
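
To illustrate why the ordering matters, here is a toy mark-and-sweep pool. It is only a sketch under stated assumptions: `toy_pool`, `toy_alloc` and `toy_sweep` are hypothetical names, and this is not how nir_sweep or ralloc are implemented. The point is that a sweep only reclaims what is unreachable at the moment it runs, so garbage produced by later passes survives until the next sweep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy allocator: every block is linked into its pool. */
struct toy_block {
   struct toy_block *next;
   bool reachable;   /* set by the caller for blocks that are still in use */
};

struct toy_pool {
   struct toy_block *head;
   size_t count;
};

static struct toy_block *
toy_alloc(struct toy_pool *pool)
{
   struct toy_block *b = calloc(1, sizeof(*b));
   assert(b != NULL);
   b->next = pool->head;
   pool->head = b;
   pool->count++;
   return b;
}

/* Free every block not marked reachable; return how many were freed.
 * Marks are cleared so the next sweep starts from a clean state. */
static size_t
toy_sweep(struct toy_pool *pool)
{
   size_t freed = 0;
   struct toy_block **link = &pool->head;

   while (*link != NULL) {
      struct toy_block *b = *link;
      if (b->reachable) {
         b->reachable = false;
         link = &b->next;
      } else {
         *link = b->next;
         free(b);
         pool->count--;
         freed++;
      }
   }

   return freed;
}
```

In this model, sweeping right after linking and then running more lowering passes would leave the blocks those passes orphan in the pool, which is the argument for sweeping just before the IR is cached.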


[Mesa-dev] [PATCH v2 00/49] meson for windows

2018-07-19 Thread Liviu Prodea

Review for Meson windows support v2

1. The osmesa Scons build doesn't use the lib prefix on the Windows platform.
I don't know much about cygwin, though. I sent a merge request to ensure parity:
https://gitlab.freedesktop.org/dbaker/mesa/merge_requests/2

2. Problem linking LLVM.

a. The LLVM wrap example in the docs doesn't specify which llvm-config
command that libraries array comes from. With Scons this is documented in the
code itself:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/scons/llvm.py#L106

b. I wrote an LLVM wrap generator based on that documentation, on the
assumption that the needed libraries are the same as with Scons. Unfortunately
I get static/dynamic link mismatches for all LLVM libraries. Maybe I am doing
something wrong, or I need to build LLVM differently (hopefully not), or it's a
bug.

llvm wrap meson.build:
project('llvm', ['cpp']) 
 
cpp = meson.get_compiler('cpp') 
 
_deps = [] 
_search = join_paths(meson.current_source_dir(), '../../../llvm/x64/lib') 
foreach d : ['LLVMIRReader', 'LLVMAsmParser', 'LLVMX86Disassembler', 
'LLVMX86AsmParser', 'LLVMX86CodeGen', 'LLVMGlobalISel', 'LLVMSelectionDAG', 
'LLVMAsmPrinter', 'LLVMDebugInfoCodeView', 'LLVMDebugInfoMSF', 'LLVMCodeGen', 
'LLVMScalarOpts', 'LLVMInstCombine', 'LLVMTransformUtils', 'LLVMBitWriter', 
'LLVMX86Desc', 'LLVMMCDisassembler', 'LLVMX86Info', 'LLVMX86AsmPrinter', 
'LLVMX86Utils', 'LLVMMCJIT', 'LLVMExecutionEngine', 'LLVMTarget', 
'LLVMAnalysis', 'LLVMProfileData', 'LLVMRuntimeDyld', 'LLVMObject', 
'LLVMMCParser', 'LLVMBitReader', 'LLVMMC', 'LLVMCore', 'LLVMBinaryFormat', 
'LLVMSupport', 'LLVMDemangle'] 
  _deps += cpp.find_library(d, dirs : _search) 
endforeach 
 
ext_llvm = declare_dependency( 
  include_directories : 
include_directories(join_paths(meson.current_source_dir(), 
'../../../llvm/x64/include')), 
  dependencies : _deps, 
  version : '6.0.1', 
) 
 
irbuilder_h = files(join_paths(meson.current_source_dir(), 
'../../../llvm/x64/include/llvm/IR/IRBuilder.h')) 




LLVM build config: cmake -G "Ninja" -Thost=x64 -DLLVM_TARGETS_TO_BUILD=X86 
-DCMAKE_BUILD_TYPE=Release -DLLVM_USE_CRT_RELEASE=MT -DLLVM_ENABLE_RTTI=1 
-DLLVM_ENABLE_TERMINFO=OFF -DCMAKE_INSTALL_PREFIX=../x64

Mesa3D build config command: C:\Software\DEVELO~1\projects\mesa\Py3\python.exe 
C:\Software\DEVELO~1\projects\mesa\Py3\Scripts\meson.py . 
.\build\windows-x86_64 --backend=ninja --buildtype=release -Dllvm-wrap=llvm 
-Dosmesa=gallium


Re: [Mesa-dev] [PATCH v2 0/4] Android kms_swrast support

2018-07-19 Thread Tomasz Figa

On Thu, Jul 19, 2018 at 12:08 AM Robert Foss  wrote:
>
> Hey Rob,
>
> On 2018-07-18 15:30, Rob Herring wrote:
> > On Tue, Jul 17, 2018 at 4:33 AM Robert Foss  wrote:
> >>
> >> This series implements kms_swrast support for the Android
> >> platform. And, having had to debug a null pointer dereference,
> >> it also simplifies that process for the next guy.
> >
> > So is this working for you now?
>
> I'm seeing page-flips happen in the logs, but have no graphical output on the
> Qemu-based setup I'm using now.
>
> When using virgl I'm seeing the same page-flipping in the logs, but no
> graphical output.
>
> >
> >> As it stands now, any kernel must have the following ioctls flagged with
> >> DRM_RENDER_ALLOW[1], which isn't the case in the mainline kernel.
> >>
> >> DRM_IOCTL_MODE_CREATE_DUMB
> >> DRM_IOCTL_MODE_MAP_DUMB
> >
> > Ah, sorry. I should have mentioned this. We have discussed this issue
> > in the past, but to no further conclusion.
> >
> > But as I recall, I thought the issue was also allowing import and
> > export of dumb buffers?
>
> Yeah, it's a two-parter for any AOSP Treble build.
> 1) Allow dumb buffer ioctls from render nodes
> 2) Support moving buffers across processes.

Wouldn't 2) be automatically solved by 1), since we should be able to
run drmPrimeHandleToFD for dumb buffers already?

>
> >
> >> While it would be possible to open a non-render node to pass the
> >> authentication check, this would still cause authentication issues
> >> when the /dev/dri/cardX node needs to be opened as master by both mesa
> >> and the compositor.
> >
> > Right. We've pretty much stripped the support that was there out. Plus
> > I don't think it will work with Treble.
> >
> >> I don't know how acceptable this series is for upstreaming, while
> >> relying on a non-mainline kernel. I think the policy is to not accept
> >> changes that don't have both a user and kernel space solution in place.
> >>
> >> Like I noted yesterday[2] the alternative to using dumb buffers and having
> >> authentication issues is using VGEM, which is new territory to me, and it
> >> would take me a little bit of time to figure exactly how it fits into the
> >> current kms_swrast approach.
> >> Input, like noted before, is very much welcome.
> >
> > I'm very much in favor of the former approach. VGEM seems like an
> > overly complicated solution when there's a very simple solution.
> >
>
> The former solution being what we have now, dumb buffers?
> I don't think dumb buffers are a viable path due to 2) listed above.

I don't understand what 2) is about. Could you elaborate on it?

I'd personally be for dropping those strange restrictions from render
nodes. I don't see why a render node couldn't allocate and map a dumb
buffer (for software rendering) and share it with another process that
opened a control node (to display it).

>
> If there are any other options I'm not aware of, I'm very much listening.

One could just call mmap() on DMA-buf FDs directly rather than
importing them, but that could open another can of worms, because FDs
don't give us any way to deduplicate buffers (you might be given
several FDs pointing to the same buffer, which in case of importing to
DRM would end up with the same GEM handle every time).

Best regards,
Tomasz

[Mesa-dev] [Bug 107224] Incorrect Rendering in Deus Ex: Mankind Divided in-game menu

2018-07-19 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=107224

--- Comment #2 from network...@rkmail.ru ---
(In reply to Timothy Arceri from comment #1)

> Are you sure it used to work? I've tested all the way back to Mesa
> 17.0-branchpoint (d1efa09d342bff) on Intel's i965 driver and I still see the
> issue.
>
> For now moving this to Mesa core as it seems to be a general Mesa issue (or
> game bug?) rather than a radeonsi problem.

I'm sure I played the game with no issues before. It was in March 2017, I
think. Maybe the game was updated since then. I have no capable hardware from
other vendors to test with.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

Re: [Mesa-dev] [PATCH 1/3] nir: allow nir search type check to see through bcsel

2018-07-19 Thread Ian Romanick

Oh man... I was also recently looking at that same compute shader, and I
wrote nearly identical patches the early part of last week.  The bcsel
patches caused a bit of pain for i965.  I came up with a different way
to handle that particular problem... either way, I eventually abandoned
the whole approach.  Adding a bunch of one-off cases for weird
combinations of logic expressions (and that shader has some doozies!)
just isn't scalable.

I've pushed a branch logic-expression-frobbing to my cgit with all that
work.

In the meantime, I have been working on code that generically optimizes
logical expressions.  I'm hoping to get that sent out next week.  So
far, it looks like it should be able to achieve the same effect on this
particular shader.  This new pass should make most, if not all, of the
logic expression algebraic optimizations in nir_opt_algebraic unnecessary.

As soon as I can run shader-db, I'll post a branch.

On 07/18/2018 08:29 PM, Timothy Arceri wrote:
> ---
>  src/compiler/nir/nir_search.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index 28b36b2b863..743ffdf232c 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -73,6 +73,9 @@ src_is_type(nir_src src, nir_alu_type type)
> src_is_type(src_alu->src[1].src, nir_type_bool);
>   case nir_op_inot:
>  return src_is_type(src_alu->src[0].src, nir_type_bool);
> + case nir_op_bcsel:
> +return src_is_type(src_alu->src[1].src, nir_type_bool) &&
> +   src_is_type(src_alu->src[2].src, nir_type_bool);
>   default:
>  break;
>   }
> 
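
The idea in the quoted patch can be sketched on a toy expression tree. Everything here (`toy_op`, `toy_expr`, `toy_is_bool`) is a hypothetical stand-in and not the NIR API: a select is treated as boolean-typed only when both of its value operands are, while the condition operand (index 0) does not affect the result type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy opcodes, loosely mirroring the cases in src_is_type(). */
enum toy_op { TOY_CONST_INT, TOY_CMP_LT, TOY_AND, TOY_NOT, TOY_BCSEL };

struct toy_expr {
   enum toy_op op;
   const struct toy_expr *src[3];
};

/* Returns true if the expression is known to produce a boolean. */
static bool
toy_is_bool(const struct toy_expr *e)
{
   assert(e != NULL);

   switch (e->op) {
   case TOY_CMP_LT:
      /* Comparisons always produce a boolean. */
      return true;
   case TOY_AND:
      return toy_is_bool(e->src[0]) && toy_is_bool(e->src[1]);
   case TOY_NOT:
      return toy_is_bool(e->src[0]);
   case TOY_BCSEL:
      /* src[0] is the condition; the result is one of src[1] or src[2],
       * so it is boolean only if both arms are boolean. */
      return toy_is_bool(e->src[1]) && toy_is_bool(e->src[2]);
   default:
      return false;
   }
}
```

A select between two comparisons is then reported as boolean, while a select with an integer constant in either arm is not, which is also why each extra opcode needs its own case and the approach grows one-off rules.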
