[Mesa-dev] [PATCH] gallivm: Make sure module has the correct data layout when pass manager runs

2017-05-07 Thread Tom Stellard
The datalayout for modules was purposely not being set in order to work around
the fact that the ExecutionEngine requires that the module's datalayout
matches the datalayout of the TargetMachine that the ExecutionEngine is
using.

When the pass manager runs on a module with no datalayout, it uses
the default datalayout, which is little-endian.  This causes problems
on big-endian targets, because some optimizations that are legal on
little-endian are illegal on big-endian.
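As a rough illustration of that failure mode (not code from Mesa or LLVM), the C sketch below reads the byte order out of a datalayout string. An "E" spec selects big-endian, "e" selects little-endian, and an empty string keeps the little-endian default. It then shows one endian-dependent decision a pass could make: the byte offset of the low half of a 32-bit value. The helper names and the example layout strings are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if an LLVM datalayout string selects big-endian byte order.
 * Specs are separated by '-': "E" selects big-endian, "e" little-endian.
 * An empty string, or one with neither spec, keeps LLVM's default, which
 * is little-endian. */
static bool datalayout_is_big_endian(const char *dl)
{
    bool big = false;                       /* LLVM default */
    const char *spec = dl;

    while (*spec) {
        size_t len = strcspn(spec, "-");    /* length of this spec */
        if (len == 1 && spec[0] == 'E')
            big = true;
        else if (len == 1 && spec[0] == 'e')
            big = false;
        spec += len;
        if (*spec == '-')
            spec++;                         /* skip the separator */
    }
    return big;
}

/* Byte offset of the low 16 bits inside a 32-bit value in memory.  A pass
 * that narrows a 32-bit load followed by a truncation into a 16-bit load
 * must pick this offset; 0 is only right on little-endian targets. */
static unsigned low_half_byte_offset(bool big_endian)
{
    return big_endian ? 2 : 0;
}
```

With an empty layout the sketch answers "little-endian" and offset 0, which is exactly the assumption that goes wrong on big-endian ppc64.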

To resolve this, we set the datalayout prior to running the pass
manager, and then clear it before creating the ExecutionEngine.

This patch fixes a lot of piglit tests on big-endian ppc64.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/auxiliary/gallivm/lp_bld_init.c | 34 +++--
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index ef2580e..9f1ade6 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -125,19 +125,6 @@ create_pass_manager(struct gallivm_state *gallivm)
LLVMAddTargetData(gallivm->target, gallivm->passmgr);
 #endif
 
-   /* Setting the module's DataLayout to an empty string will cause the
-* ExecutionEngine to copy to the DataLayout string from its target
-* machine to the module.  As of LLVM 3.8 the module and the execution
-* engine are required to have the same DataLayout.
-*
-* TODO: This is just a temporary work-around.  The correct solution is
-* for gallivm_init_state() to create a TargetMachine and pull the
-* DataLayout from there.  Currently, the TargetMachine used by llvmpipe
-* is being implicitly created by the EngineBuilder in
-* lp_build_create_jit_compiler_for_module()
-*/
-
-#if HAVE_LLVM < 0x0308
{
   char *td_str;
   // New ones from the Module.
@@ -145,9 +132,6 @@ create_pass_manager(struct gallivm_state *gallivm)
   LLVMSetDataLayout(gallivm->module, td_str);
   free(td_str);
}
-#else
-   LLVMSetDataLayout(gallivm->module, "");
-#endif
 
if ((gallivm_debug & GALLIVM_DEBUG_NO_OPT) == 0) {
   /* These are the passes currently listed in llvm-c/Transforms/Scalar.h,
@@ -628,6 +612,24 @@ gallivm_compile_module(struct gallivm_state *gallivm)
}
 
if (use_mcjit) {
+  /* Setting the module's DataLayout to an empty string will cause the
+   * ExecutionEngine to copy the DataLayout string from its target
+   * machine to the module.  As of LLVM 3.8 the module and the execution
+   * engine are required to have the same DataLayout.
+   *
+   * We must make sure we do this after running the optimization passes,
+   * because those passes need a correct datalayout string.  For example,
+   * if those optimization passes see an empty datalayout, they will assume
+   * this is a little endian target and will do optimizations that break big
+   * endian machines.
+   *
+   * TODO: This is just a temporary work-around.  The correct solution is
+   * for gallivm_init_state() to create a TargetMachine and pull the
+   * DataLayout from there.  Currently, the TargetMachine used by llvmpipe
+   * is being implicitly created by the EngineBuilder in
+   * lp_build_create_jit_compiler_for_module()
+   */
+  LLVMSetDataLayout(gallivm->module, "");
   assert(!gallivm->engine);
   if (!init_gallivm_engine(gallivm)) {
  assert(0);
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: setup llvm target data layout

2017-03-13 Thread Tom Stellard
On Tue, Mar 14, 2017 at 06:52:19AM +1000, Dave Airlie wrote:
> From: Dave Airlie <airl...@redhat.com>
> 
> Ported from radeonsi, pointed out by Tom.
> 
> "This prevents LLVM from using sext instructions for local memory
> offsets and allows the backend to fold immediate offsets into the
> instruction. This also prevents some incorrect code generation for
> ptrtoint and inttoptr instructions."
> 
> Signed-off-by: Dave Airlie <airl...@redhat.com>

Reviewed-by: Tom Stellard <tstel...@redhat.com>
> ---
>  src/amd/common/ac_nir_to_llvm.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 47091a2..417b34e 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4911,6 +4911,13 @@ LLVMModuleRef ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
>   memset(shader_info, 0, sizeof(*shader_info));
>  
>   LLVMSetTarget(ctx.module, options->supports_spill ? "amdgcn-mesa-mesa3d" : "amdgcn--");
> +
> + LLVMTargetDataRef data_layout = LLVMCreateTargetDataLayout(tm);
> + char *data_layout_str = LLVMCopyStringRepOfTargetData(data_layout);
> + LLVMSetDataLayout(ctx.module, data_layout_str);
> + LLVMDisposeTargetData(data_layout);
> + LLVMDisposeMessage(data_layout_str);
> +
>   setup_types();
>  
>   ctx.builder = LLVMCreateBuilderInContext(ctx.context);
> -- 
> 2.9.3
> 


Re: [Mesa-dev] [PATCH] radv/ac: enable loop unrolling.

2017-02-27 Thread Tom Stellard
On Fri, Feb 24, 2017 at 03:30:50PM -0800, Matt Arsenault wrote:
> 
> > On Feb 24, 2017, at 14:39, Marek Olšák  wrote:
> > 
> > On Fri, Feb 24, 2017 at 7:20 PM, Matt Arsenault  wrote:
> >> 
> >> On Feb 24, 2017, at 01:45, Marek Olšák  wrote:
> >> 
> >> The main requirement is that if there is indirect indexing inside a
> >> loop, we always want to unroll the whole loop to get rid of the
> >> indexing, which can decrease scratch usage.
> >> 
> >> Marek
> >> 
> >> We boost the unroll thresholds when there is private memory indexed by the
> >> induction variable. See AMDGPUTTIImpl::getUnrollingPreferences
> > 
> > When Samuel Pitoiset was experimenting with the same code as this
> > patch but for radeonsi, getUnrollingPreferences wasn't even getting
> > called when unrolling. I guess he eventually gave up or didn't see any
> > positive effect from it.
> > 
> > Marek
> 
> Then there’s a bug somewhere. It should be getting called

It's possible TargetTransformInfo isn't being set up correctly by the
Mesa pass pipeline.

-Tom



[Mesa-dev] [PATCH] radeonsi: Fix build on LLVM < 3.9 v2

2017-01-31 Thread Tom Stellard
This was broken by: e0cc0a614c96011958bc3a1b84da9168e0e1ccbb

v2:
  - Use preprocessor macro
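The switch from a runtime `if` to the preprocessor matters because a C `if` still compiles the guarded branch, so a call to a function that older LLVM headers do not declare breaks the build even when that branch can never run. A minimal sketch, with a hypothetical stand-in function and a hard-coded HAVE_LLVM value:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: pretend this build uses LLVM 3.8 headers. */
#define HAVE_LLVM 0x0308

#if HAVE_LLVM >= 0x0309
/* Stand-in for an API that only newer headers provide. */
static const char *data_layout_from_target_machine(void)
{
    return "layout from the TargetMachine";
}
#endif

static const char *module_data_layout(void)
{
#if HAVE_LLVM >= 0x0309
    return data_layout_from_target_machine();
#else
    /* A runtime check such as
     *     if (HAVE_LLVM >= 0x0309)
     *         return data_layout_from_target_machine();
     * would still be compiled here and break the build with an
     * undeclared-function error (or an unresolved symbol at link time). */
    return "";
#endif
}
```

Because `#if` removes the guarded code before the compiler sees it, only the fallback branch exists in a 3.8 build.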
---
 src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
index 205686a..c7445e0 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
@@ -1256,8 +1256,6 @@ void si_llvm_context_init(struct si_shader_context *ctx,
  const struct tgsi_token *tokens)
 {
struct lp_type type;
-   LLVMTargetDataRef data_layout = LLVMCreateTargetDataLayout(tm);
-   char *data_layout_str = LLVMCopyStringRepOfTargetData(data_layout);
 
/* Initialize the gallivm object:
 * We are only using the module, context, and builder fields of this struct.
@@ -1275,9 +1273,13 @@ void si_llvm_context_init(struct si_shader_context *ctx,
ctx->gallivm.context);
LLVMSetTarget(ctx->gallivm.module, "amdgcn--");
 
+#if HAVE_LLVM >= 0x0309
+   LLVMTargetDataRef data_layout = LLVMCreateTargetDataLayout(tm);
+   char *data_layout_str = LLVMCopyStringRepOfTargetData(data_layout);
LLVMSetDataLayout(ctx->gallivm.module, data_layout_str);
LLVMDisposeTargetData(data_layout);
LLVMDisposeMessage(data_layout_str);
+#endif
 
bool unsafe_fpmath = (sscreen->b.debug_flags & DBG_UNSAFE_MATH) != 0;
ctx->gallivm.builder = lp_create_builder(ctx->gallivm.context,
-- 
2.7.4



[Mesa-dev] [PATCH] radeonsi: Fix build on LLVM < 3.9

2017-01-31 Thread Tom Stellard
This was broken by: e0cc0a614c96011958bc3a1b84da9168e0e1ccbb
---
 src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
index 205686a..897faae 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
@@ -1257,7 +1257,11 @@ void si_llvm_context_init(struct si_shader_context *ctx,
 {
struct lp_type type;
LLVMTargetDataRef data_layout = LLVMCreateTargetDataLayout(tm);
-   char *data_layout_str = LLVMCopyStringRepOfTargetData(data_layout);
+   char *data_layout_str = NULL;
+
+   if (HAVE_LLVM >= 0x0309) {
+   data_layout_str = LLVMCopyStringRepOfTargetData(data_layout);
+   }
 
/* Initialize the gallivm object:
 * We are only using the module, context, and builder fields of this struct.
@@ -1275,9 +1279,11 @@ void si_llvm_context_init(struct si_shader_context *ctx,
ctx->gallivm.context);
LLVMSetTarget(ctx->gallivm.module, "amdgcn--");
 
-   LLVMSetDataLayout(ctx->gallivm.module, data_layout_str);
-   LLVMDisposeTargetData(data_layout);
-   LLVMDisposeMessage(data_layout_str);
+   if (data_layout_str) {
+   LLVMSetDataLayout(ctx->gallivm.module, data_layout_str);
+   LLVMDisposeTargetData(data_layout);
+   LLVMDisposeMessage(data_layout_str);
+   }
 
bool unsafe_fpmath = (sscreen->b.debug_flags & DBG_UNSAFE_MATH) != 0;
ctx->gallivm.builder = lp_create_builder(ctx->gallivm.context,
-- 
2.7.4



[Mesa-dev] [PATCH 1/2] radeonsi: Use build_buffer_load helper function for geometry shaders

2017-01-31 Thread Tom Stellard
Also modify build_buffer_load to always pass soffset to an intrinsic
if it is set.  This is required to avoid failing buffer range checks
in some cases.
---
 src/gallium/drivers/radeonsi/si_shader.c | 67 ++--
 1 file changed, 20 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 5c5f2e6..a6de7c4 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -890,7 +890,7 @@ static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
struct gallivm_state *gallivm = &ctx->gallivm;
unsigned func = CLAMP(num_channels, 1, 3) - 1;
 
-   if (HAVE_LLVM >= 0x309) {
+   if (!soffset && HAVE_LLVM >= 0x309) {
LLVMValueRef args[] = {
LLVMBuildBitCast(gallivm->builder, rsrc, ctx->v4i32, ""),
vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
@@ -909,11 +909,6 @@ static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
   "");
}
 
-   if (soffset) {
-   args[2] = LLVMBuildAdd(gallivm->builder, args[2], soffset,
-  "");
-   }
-
snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
 type_names[func]);
 
@@ -1185,13 +1180,12 @@ static LLVMValueRef fetch_input_gs(
struct lp_build_context *uint = &ctx->bld_base.uint_bld;
struct gallivm_state *gallivm = base->gallivm;
LLVMValueRef vtx_offset;
-   LLVMValueRef args[9];
unsigned vtx_offset_param;
struct tgsi_shader_info *info = >selector->info;
unsigned semantic_name = info->input_semantic_name[reg->Register.Index];
unsigned semantic_index = info->input_semantic_index[reg->Register.Index];
unsigned param;
-   LLVMValueRef value;
+   LLVMValueRef soffset, value;
 
if (swizzle != ~0 && semantic_name == TGSI_SEMANTIC_PRIMID)
return get_primitive_id(bld_base, swizzle);
@@ -1223,27 +1217,15 @@ static LLVMValueRef fetch_input_gs(
  4);
 
param = si_shader_io_get_unique_index(semantic_name, semantic_index);
-   args[0] = ctx->esgs_ring;
-   args[1] = vtx_offset;
-   args[2] = lp_build_const_int32(gallivm, (param * 4 + swizzle) * 256);
-   args[3] = uint->zero;
-   args[4] = uint->one;  /* OFFEN */
-   args[5] = uint->zero; /* IDXEN */
-   args[6] = uint->one;  /* GLC */
-   args[7] = uint->zero; /* SLC */
-   args[8] = uint->zero; /* TFE */
-
-   value = lp_build_intrinsic(gallivm->builder,
-  "llvm.SI.buffer.load.dword.i32.i32",
-  ctx->i32, args, 9,
-  LP_FUNC_ATTR_READONLY);
+   soffset = lp_build_const_int32(gallivm, (param * 4 + swizzle) * 256);
+
+   value = build_buffer_load(ctx, ctx->esgs_ring, 1, NULL,
+ vtx_offset, soffset, 0, 1, 0);
if (tgsi_type_is_64bit(type)) {
LLVMValueRef value2;
-   args[2] = lp_build_const_int32(gallivm, (param * 4 + swizzle + 1) * 256);
-   value2 = lp_build_intrinsic(gallivm->builder,
-   "llvm.SI.buffer.load.dword.i32.i32",
-   ctx->i32, args, 9,
-   LP_FUNC_ATTR_READONLY);
+   soffset = lp_build_const_int32(gallivm, (param * 4 + swizzle + 1) * 256);
+   value2 = build_buffer_load(ctx, ctx->esgs_ring, 1, NULL,
+  vtx_offset, soffset, 0, 1, 0);
return si_llvm_emit_fetch_64bit(bld_base, type,
value, value2);
}
@@ -6476,7 +6458,7 @@ si_generate_gs_copy_shader(struct si_screen *sscreen,
struct lp_build_context *uint = _base->uint_bld;
struct si_shader_output_values *outputs;
struct tgsi_shader_info *gsinfo = _selector->info;
-   LLVMValueRef args[9];
+   LLVMValueRef voffset;
int i, r;
 
outputs = MALLOC(gsinfo->num_outputs * sizeof(outputs[0]));
@@ -6503,18 +6485,6 @@ si_generate_gs_copy_shader(struct si_screen *sscreen,
create_function();
preload_ring_buffers();
 
-   args[0] = ctx.gsvs_ring[0];
-   args[1] = lp_build_mul_imm(uint,
-  LLVMGetParam(ctx.main_fn,
-   ctx.param_vertex_id),
-  4);
-   args[3] = uint->zero;
-   args[4] = uint->one;  /* OFFEN */
-   args[5] = uint->zero; /* IDXEN */
-   args[6] = uint->one;  /* GLC */
-   args[7] = uint->one;  /* SLC */
-   args[8] = uint->zero; /* TFE 

[Mesa-dev] [PATCH 2/2] radeonsi: Use llvm.amdgcn.s.buffer.load instead of llvm.SI.load.const

2017-01-31 Thread Tom Stellard
Advantages of using llvm.amdgcn.s.buffer.load:

- We can use a real pointer type, which LLVM can better reason about and do
  alias analysis on.  This will also ease the transition to using fat pointers
  and LLVM IR loads.

- llvm.amdgcn.s.buffer.load is defined in IntrinsicsAMDGPU.td so passes can
  query information about it other than just its attributes.
---
 src/gallium/auxiliary/gallivm/lp_bld_intr.c|  1 +
 src/gallium/auxiliary/gallivm/lp_bld_intr.h|  3 +-
 src/gallium/drivers/radeonsi/si_shader.c   | 48 +-
 src/gallium/drivers/radeonsi/si_shader_internal.h  |  8 
 .../drivers/radeonsi/si_shader_tgsi_setup.c|  6 +++
 5 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
index 049671a..dc8de55 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
@@ -144,6 +144,7 @@ static const char *attr_to_str(enum lp_func_attr attr)
 {
switch (attr) {
case LP_FUNC_ATTR_ALWAYSINLINE: return "alwaysinline";
+   case LP_FUNC_ATTR_ARGMEMONLY: return "argmemonly";
case LP_FUNC_ATTR_BYVAL: return "byval";
case LP_FUNC_ATTR_INREG: return "inreg";
case LP_FUNC_ATTR_NOALIAS: return "noalias";
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.h b/src/gallium/auxiliary/gallivm/lp_bld_intr.h
index f1e075a..7c8f09b 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_intr.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.h
@@ -54,7 +54,8 @@ enum lp_func_attr {
LP_FUNC_ATTR_NOUNWIND = (1 << 4),
LP_FUNC_ATTR_READNONE = (1 << 5),
LP_FUNC_ATTR_READONLY = (1 << 6),
-   LP_FUNC_ATTR_LAST = (1 << 7)
+   LP_FUNC_ATTR_ARGMEMONLY   = (1 << 7),
+   LP_FUNC_ATTR_LAST = (1 << 8)
 };
 
 void
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index a6de7c4..cf13cb5 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -93,11 +93,6 @@ static void si_build_ps_epilog_function(struct si_shader_context *ctx,
  */
 #define VS_EPILOG_PRIMID_LOC 2
 
-enum {
-   CONST_ADDR_SPACE = 2,
-   LOCAL_ADDR_SPACE = 3,
-};
-
 #define SENDMSG_GS 2
 #define SENDMSG_GS_DONE 3
 
@@ -360,8 +355,21 @@ static LLVMValueRef build_indexed_load_const(
struct si_shader_context *ctx,
LLVMValueRef base_ptr, LLVMValueRef index)
 {
+   LLVMTypeRef ptr_type = LLVMTypeOf(base_ptr);
+   LLVMTypeRef elem_type = LLVMGetElementType(ptr_type);
+   LLVMTypeKind elem_kind = LLVMGetTypeKind(elem_type);
LLVMValueRef result = build_indexed_load(ctx, base_ptr, index, true);
LLVMSetMetadata(result, ctx->invariant_load_md_kind, ctx->empty_md);
+
+   /* Set !dereferenceable metadata */
+   if (elem_kind == LLVMPointerTypeKind ||
+   (elem_kind == LLVMArrayTypeKind && LLVMGetTypeKind(LLVMGetElementType(elem_type)) == LLVMPointerTypeKind)) {
+   LLVMValueRef deref_bytes, deref_md;
+   deref_bytes = LLVMConstInt(ctx->i64, UINT64_MAX, 0);
+   deref_md = LLVMMDNodeInContext(LLVMGetTypeContext(ptr_type),
+   &deref_bytes, 1);
+   LLVMSetMetadata(result, ctx->dereferenceable_md_kind, deref_md);
+   }
return result;
 }
 
@@ -1571,16 +1579,34 @@ static LLVMValueRef get_thread_id(struct si_shader_context *ctx)
 
 /**
  * Load a dword from a constant buffer.
+ * @param offset This is a byte offset.
+ * @returns An LLVMValueRef with f32 type.
  */
 static LLVMValueRef buffer_load_const(struct si_shader_context *ctx,
  LLVMValueRef resource,
  LLVMValueRef offset)
 {
LLVMBuilderRef builder = ctx->gallivm.builder;
-   LLVMValueRef args[2] = {resource, offset};
+   LLVMValueRef load;
+   LLVMValueRef args[3] = {resource, offset, LLVMConstInt(ctx->i1, 0, 0) };
+   LLVMTypeRef resource_type = LLVMTypeOf(resource);
+   LLVMTypeKind resource_kind = LLVMGetTypeKind(resource_type);
+
+   /* XXX: We can have a non-pointer resource if we do a constant load
+ * from the RW_BUFFERS which are still represented using the <16 x i8>
+ * type. We can eliminate this once we start using pointer types for
+* those buffers.
+*/
+   if (resource_kind != LLVMPointerTypeKind) {
+   return lp_build_intrinsic(builder, "llvm.SI.load.const",
+ ctx->f32, args, 2,
+ LP_FUNC_ATTR_READNONE);
+   }
 
-   return lp_build_intrinsic(builder, "llvm.SI.load.const", ctx->f32, args, 2,
-  LP_FUNC_ATTR_READNONE);
+   load = lp_build_intrinsic(builder, "llvm.amdgcn.s.buffer.load.i32",
+ ctx->i32, args, 3,
+

Re: [Mesa-dev] [PATCH 1/2] radeonsi: add Polaris12 support

2016-12-19 Thread Tom Stellard
On Mon, Dec 19, 2016 at 02:04:05PM -0500, Alex Deucher wrote:
> From: Junwei Zhang 
> 
> Signed-off-by: Junwei Zhang 
> Reviewed-by: Nicolai Hähnle 
> Acked-by: Christian König 
> ---
>  src/amd/addrlib/r800/ciaddrlib.cpp| 3 ++-
>  src/amd/addrlib/r800/ciaddrlib.h  | 1 +
>  src/amd/common/amd_family.h   | 1 +
>  src/amd/common/amdgpu_id.h| 4 
>  src/gallium/drivers/radeon/r600_pipe_common.c | 6 ++
>  src/gallium/drivers/radeon/radeon_vce.c   | 3 ++-
>  src/gallium/drivers/radeonsi/si_pipe.c| 1 +
>  src/gallium/drivers/radeonsi/si_state.c   | 1 +
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 4 
>  9 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/src/amd/addrlib/r800/ciaddrlib.cpp b/src/amd/addrlib/r800/ciaddrlib.cpp
> index 7c5d29a..c726c4d 100644
> --- a/src/amd/addrlib/r800/ciaddrlib.cpp
> +++ b/src/amd/addrlib/r800/ciaddrlib.cpp
> @@ -353,6 +353,7 @@ AddrChipFamily CIAddrLib::HwlConvertChipFamily(
>  m_settings.isFiji= ASICREV_IS_FIJI_P(uChipRevision);
>  m_settings.isPolaris10   = ASICREV_IS_POLARIS10_P(uChipRevision);
>  m_settings.isPolaris11   = ASICREV_IS_POLARIS11_M(uChipRevision);
> +m_settings.isPolaris12   = ASICREV_IS_POLARIS12_V(uChipRevision);
>  break;
>  case FAMILY_CZ:
>  m_settings.isCarrizo = 1;
> @@ -417,7 +418,7 @@ BOOL_32 CIAddrLib::HwlInitGlobalParams(
>  {
>  m_pipes = 16;
>  }
> -else if (m_settings.isPolaris11)
> +else if (m_settings.isPolaris11 || m_settings.isPolaris12)
>  {
>  m_pipes = 4;
>  }
> diff --git a/src/amd/addrlib/r800/ciaddrlib.h b/src/amd/addrlib/r800/ciaddrlib.h
> index de995fa..2c9a4cc 100644
> --- a/src/amd/addrlib/r800/ciaddrlib.h
> +++ b/src/amd/addrlib/r800/ciaddrlib.h
> @@ -62,6 +62,7 @@ struct CIChipSettings
>  UINT_32 isFiji: 1;
>  UINT_32 isPolaris10   : 1;
>  UINT_32 isPolaris11   : 1;
> +UINT_32 isPolaris12   : 1;
>  // VI fusion (Carrizo)
>  UINT_32 isCarrizo : 1;
>  };
> diff --git a/src/amd/common/amd_family.h b/src/amd/common/amd_family.h
> index 6a713ad..b09bbb8 100644
> --- a/src/amd/common/amd_family.h
> +++ b/src/amd/common/amd_family.h
> @@ -91,6 +91,7 @@ enum radeon_family {
>  CHIP_STONEY,
>  CHIP_POLARIS10,
>  CHIP_POLARIS11,
> +CHIP_POLARIS12,
>  CHIP_LAST,
>  };
>  
> diff --git a/src/amd/common/amdgpu_id.h b/src/amd/common/amdgpu_id.h
> index f91df55..1683a5a 100644
> --- a/src/amd/common/amdgpu_id.h
> +++ b/src/amd/common/amdgpu_id.h
> @@ -142,6 +142,8 @@ enum {
>  
>   VI_POLARIS11_M_A0 = 90,
>  
> + VI_POLARIS12_V_A0 = 100,
> +
>   VI_UNKNOWN= 0xFF
>  };
>  
> @@ -156,6 +158,8 @@ enum {
>   ((eChipRev >= VI_POLARIS10_P_A0) && (eChipRev < VI_POLARIS11_M_A0))
>  #define ASICREV_IS_POLARIS11_M(eChipRev)   \
>   (eChipRev >= VI_POLARIS11_M_A0)
> +#define ASICREV_IS_POLARIS12_V(eChipRev)\
> + (eChipRev >= VI_POLARIS12_V_A0)
>  
>  /* CZ specific rev IDs */
>  enum {
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> index 0b5c6dc..033e59c 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -755,6 +755,7 @@ static const char* r600_get_chip_name(struct r600_common_screen *rscreen)
>   case CHIP_FIJI: return "AMD FIJI";
>   case CHIP_POLARIS10: return "AMD POLARIS10";
>   case CHIP_POLARIS11: return "AMD POLARIS11";
> + case CHIP_POLARIS12: return "AMD POLARIS12";
>   case CHIP_STONEY: return "AMD STONEY";
>   default: return "AMD unknown";
>   }
> @@ -893,6 +894,11 @@ const char *r600_get_llvm_processor_name(enum radeon_family family)
>   case CHIP_POLARIS10: return "polaris10";
>   case CHIP_POLARIS11: return "polaris11";
>  #endif
> +#if HAVE_LLVM <= 0x0309
> + case CHIP_POLARIS12: return "polaris11";
> +#else
> + case CHIP_POLARIS12: return "polaris12";

There is a preference to move away from code names and use
gfxip names in the compiler. For LLVM >= 4.0 this should be "gfx803"

-Tom
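A sketch of what that suggestion could look like, modeled on the quoted r600_get_llvm_processor_name() hunk. The reduced enum, the function name, and passing the LLVM version as a parameter instead of testing HAVE_LLVM are assumptions made so the example is self-contained; the "polaris11" fallback and the "gfx803" name come from the patch and the comment above.

```c
#include <assert.h>
#include <string.h>

/* Reduced, hypothetical copy of the radeon_family enum. */
enum radeon_family { CHIP_POLARIS10, CHIP_POLARIS11, CHIP_POLARIS12 };

/* LLVM processor name for `family`; `llvm_version` uses the HAVE_LLVM
 * encoding (0x0309 for LLVM 3.9, 0x0400 for LLVM 4.0). */
static const char *llvm_processor_name(enum radeon_family family,
                                       unsigned llvm_version)
{
    switch (family) {
    case CHIP_POLARIS10: return "polaris10";
    case CHIP_POLARIS11: return "polaris11";
    case CHIP_POLARIS12:
        /* The patch treats LLVM <= 3.9 as having no Polaris12 name and
         * reuses Polaris11; newer LLVM gets the gfxip name. */
        return llvm_version <= 0x0309 ? "polaris11" : "gfx803";
    default: return "";
    }
}
```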

> +#endif
>   default: return "";
>   }
>  }
> diff --git a/src/gallium/drivers/radeon/radeon_vce.c b/src/gallium/drivers/radeon/radeon_vce.c
> index aad2ec1..dcd56ea 100644
> --- a/src/gallium/drivers/radeon/radeon_vce.c
> +++ b/src/gallium/drivers/radeon/radeon_vce.c
> @@ -413,7 +413,8 @@ struct pipe_video_codec *rvce_create_encoder(struct pipe_context *context,
>   enc->use_vui = true;
>   if (rscreen->info.family >= CHIP_TONGA &&
>   rscreen->info.family != CHIP_STONEY &&
> - rscreen->info.family != CHIP_POLARIS11)
> + 

[Mesa-dev] [PATCH] radeonsi: Use build_buffer_load helper function for geometry shaders

2016-12-15 Thread Tom Stellard
Also add a need_range_checks parameter to this function, which can be
set to false to enable some additional optimizations.  Currently, this
will cause the compiler to emit the llvm.SI.buffer.load.dword intrinsic
instead of llvm.amdgcn.buffer.load.  Eventually, this information
will be passed to LLVM to enable more aggressive addressing mode optimizations.
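A minimal sketch of the choice that need_range_checks controls. The CLAMP(num_channels, 1, 3) - 1 index and the "llvm.amdgcn.buffer.load.%s" name format are taken from build_buffer_load() as it appears in these patches; the contents of type_names, the helper's name and signature, and reporting the legacy llvm.SI.buffer.load.dword path as a false return are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CLAMP(x, lo, hi) ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

/* Write the name of the range-checked buffer-load intrinsic for
 * `num_channels` dwords into `buf` and return true, or return false when
 * the caller should take the legacy llvm.SI.buffer.load.dword path.
 * `llvm_version` uses the HAVE_LLVM encoding (0x309 for LLVM 3.9). */
static bool buffer_load_intrinsic_name(char *buf, size_t size,
                                       unsigned num_channels,
                                       bool need_range_checks,
                                       unsigned llvm_version)
{
    /* Assumed suffixes for 1, 2 and 3-or-more channels. */
    static const char *type_names[] = {"f32", "v2f32", "v4f32"};
    unsigned func = CLAMP(num_channels, 1, 3) - 1;

    if (!need_range_checks || llvm_version < 0x309)
        return false;

    snprintf(buf, size, "llvm.amdgcn.buffer.load.%s", type_names[func]);
    return true;
}
```

Keeping the decision in one helper means callers such as fetch_input_gs only pass a flag, and the intrinsic choice can later change without touching them.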
---
 src/gallium/drivers/radeonsi/si_shader.c | 79 
 1 file changed, 29 insertions(+), 50 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 72cf827..5b15ad4 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -825,12 +825,13 @@ static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
   LLVMValueRef soffset,
   unsigned inst_offset,
   unsigned glc,
-  unsigned slc)
+  unsigned slc,
+  bool need_range_checks)
 {
struct gallivm_state *gallivm = &ctx->gallivm;
unsigned func = CLAMP(num_channels, 1, 3) - 1;
 
-   if (HAVE_LLVM >= 0x309) {
+   if (need_range_checks && HAVE_LLVM >= 0x309) {
LLVMValueRef args[] = {
LLVMBuildBitCast(gallivm->builder, rsrc, ctx->v4i32, ""),
vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
@@ -896,7 +897,7 @@ static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
 static LLVMValueRef buffer_load(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type, unsigned swizzle,
 LLVMValueRef buffer, LLVMValueRef offset,
-LLVMValueRef base)
+LLVMValueRef base, bool need_range_checks)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
struct gallivm_state *gallivm = bld_base->base.gallivm;
@@ -906,14 +907,14 @@ static LLVMValueRef buffer_load(struct lp_build_tgsi_context *bld_base,
 
if (swizzle == ~0) {
value = build_buffer_load(ctx, buffer, 4, NULL, base, offset,
- 0, 1, 0);
+ 0, 1, 0, need_range_checks);
 
return LLVMBuildBitCast(gallivm->builder, value, vec_type, "");
}
 
if (!tgsi_type_is_64bit(type)) {
value = build_buffer_load(ctx, buffer, 4, NULL, base, offset,
- 0, 1, 0);
+ 0, 1, 0, need_range_checks);
 
value = LLVMBuildBitCast(gallivm->builder, value, vec_type, "");
return LLVMBuildExtractElement(gallivm->builder, value,
@@ -921,10 +922,10 @@ static LLVMValueRef buffer_load(struct lp_build_tgsi_context *bld_base,
}
 
value = build_buffer_load(ctx, buffer, 1, NULL, base, offset,
- swizzle * 4, 1, 0);
+ swizzle * 4, 1, 0, need_range_checks);
 
value2 = build_buffer_load(ctx, buffer, 1, NULL, base, offset,
-  swizzle * 4 + 4, 1, 0);
+  swizzle * 4 + 4, 1, 0, need_range_checks);
 
return si_llvm_emit_fetch_64bit(bld_base, type, value, value2);
 }
@@ -1044,7 +1045,7 @@ static LLVMValueRef fetch_input_tes(
base = LLVMGetParam(ctx->main_fn, ctx->param_oc_lds);
addr = get_tcs_tes_buffer_address_from_reg(ctx, NULL, reg);
 
-   return buffer_load(bld_base, type, swizzle, buffer, base, addr);
+   return buffer_load(bld_base, type, swizzle, buffer, base, addr, true);
 }
 
 static void store_output_tcs(struct lp_build_tgsi_context *bld_base,
@@ -1125,13 +1126,12 @@ static LLVMValueRef fetch_input_gs(
struct lp_build_context *uint = &ctx->soa.bld_base.uint_bld;
struct gallivm_state *gallivm = base->gallivm;
LLVMValueRef vtx_offset;
-   LLVMValueRef args[9];
unsigned vtx_offset_param;
struct tgsi_shader_info *info = >selector->info;
unsigned semantic_name = info->input_semantic_name[reg->Register.Index];
unsigned semantic_index = info->input_semantic_index[reg->Register.Index];
unsigned param;
-   LLVMValueRef value;
+   LLVMValueRef soffset, value;
 
if (swizzle != ~0 && semantic_name == TGSI_SEMANTIC_PRIMID)
return get_primitive_id(bld_base, swizzle);
@@ -1163,27 +1163,15 @@ static LLVMValueRef fetch_input_gs(
  4);
 
param = si_shader_io_get_unique_index(semantic_name, semantic_index);
-   args[0] = ctx->esgs_ring;
-   args[1] = vtx_offset;
-   args[2] = lp_build_const_int32(gallivm, (param * 4 + swizzle) * 256);
-  

[Mesa-dev] [PATCH] radeonsi: Set datalayout on the llvm module

2016-12-15 Thread Tom Stellard
This prevents LLVM from using sext instructions for local memory offsets
and allows the backend to fold immediate offsets into the instruction.

This also prevents some incorrect code generation for ptrtoint and
inttoptr instructions.
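One plausible reading of the sext part, offered as background rather than taken from the patch: with no datalayout, LLVM assumes 64-bit pointers in every address space, so a 32-bit offset used to index memory in a 32-bit address space, such as AMDGPU local memory, is sign-extended to 64 bits first. The C sketch below computes the pointer width a datalayout string implies; the function name and the example layout strings are hypothetical and are not the actual AMDGPU layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pointer width in bits for address space `as` according to an LLVM
 * datalayout string.  "p:<size>:..." describes address space 0 and
 * "p<N>:<size>:..." address space N.  An address space without its own
 * spec uses address space 0's value, which defaults to 64 bits. */
static unsigned datalayout_pointer_bits(const char *dl, unsigned as)
{
    unsigned as0_bits = 64, as_bits = 0;
    const char *spec = dl;

    while (*spec) {
        size_t len = strcspn(spec, "-");    /* length of this spec */
        if (spec[0] == 'p') {
            const char *p = spec + 1;
            unsigned long n = 0;            /* address space number */
            if (*p != ':') {
                char *end;
                n = strtoul(p, &end, 10);
                p = end;
            }
            if (*p == ':') {
                unsigned bits = (unsigned)strtoul(p + 1, NULL, 10);
                if (n == 0)
                    as0_bits = bits;
                if (n == as)
                    as_bits = bits;
            }
        }
        spec += len;
        if (*spec == '-')
            spec++;                         /* skip the separator */
    }
    return as_bits ? as_bits : as0_bits;
}
```

Under an empty layout the sketch reports 64 bits for address space 3; once a spec like "p3:32:32" is present it reports 32, so a 32-bit offset already has the pointer's width and needs no extension.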
---
 src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
index 2f38949..b6cb00f 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
@@ -1231,6 +1231,8 @@ void si_llvm_context_init(struct si_shader_context *ctx,
  const struct tgsi_token *tokens)
 {
struct lp_type type;
+   LLVMTargetDataRef data_layout = LLVMCreateTargetDataLayout(tm);
+   char *data_layout_str = LLVMCopyStringRepOfTargetData(data_layout);
 
/* Initialize the gallivm object:
 * We are only using the module, context, and builder fields of this struct.
@@ -1248,6 +1250,10 @@ void si_llvm_context_init(struct si_shader_context *ctx,
ctx->gallivm.context);
LLVMSetTarget(ctx->gallivm.module, "amdgcn--");
 
+   LLVMSetDataLayout(ctx->gallivm.module, data_layout_str);
+   LLVMDisposeTargetData(data_layout);
+   LLVMDisposeMessage(data_layout_str);
+
bool unsafe_fpmath = (sscreen->b.debug_flags & DBG_UNSAFE_MATH) != 0;
ctx->gallivm.builder = lp_create_builder(ctx->gallivm.context,
 unsafe_fpmath);
-- 
2.7.4



Re: [Mesa-dev] [PATCH 2/2] radeonsi: Use buffer_load intrinsics instead of llvm.SI.vs.load.input

2016-11-18 Thread Tom Stellard
On Sat, Nov 19, 2016 at 01:09:00AM +0100, Marek Olšák wrote:
> On Wed, Nov 16, 2016 at 4:38 PM, Tom Stellard <t...@stellard.net> wrote:
> > On Wed, Nov 16, 2016 at 11:13:45AM +0100, Nicolai Hähnle wrote:
> >> Have you looked at the shader-db impact?
> >>
> >
> > shader-db is mostly unchanged.  There are a few decreases in SGPR usage and
> > code size, and a 4 byte increase in code size for one shader.
> >
> >> I do think we should eventually do this, but llvm.SI.vs.load.input is
> >> ReadNone while llvm.amdgcn.buffer.load.* is only ReadOnly, so as long as we
> >> can't teach LLVM properly about no-aliasing and speculability, there may be
> >> performance regressions.
> >>
> >
> > Ideally llvm.amdgcn.buffer.load.* would be ReadOnly and ArgMemOnly, but I think
> > as long as it has non-pointer arguments this combination behaves the same as
> > ReadNone, which would be incorrect.
> 
> Why would it be incorrect?
> 

Because llvm.amdgcn.buffer.load.* can be used in a lot of different
ways, so it is possible that the memory it is reading from has been
modified by the shader.

-Tom

> Marek


[Mesa-dev] [PATCH] mesa: Add missing call to _mesa_unlock_debug_state(ctx);

2016-11-16 Thread Tom Stellard
cd724208d3e1e3307f84a794f2c1fc83b69ccf8a added a call to
_mesa_lock_debug_state(ctx) but wasn't unlocking the debug state.

This fixes a hang in the glsl-fs-loop piglit test with MESA_DEBUG=context.
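The bug pattern here is a lock taken on one path and never released, so the next attempt to take it can hang. A minimal sketch of keeping lock and unlock paired, using a pthread mutex and hypothetical names rather than Mesa's types. It assumes, for illustration, a lock helper that releases the mutex itself before returning NULL, so the caller unlocks only after a successful lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a GL context and its debug state. */
struct debug_state {
    bool output;
    bool log_to_stderr;
};

struct context {
    pthread_mutex_t debug_mutex;
    struct debug_state *debug;
    bool fail_alloc;                 /* simulate an allocation failure */
};

/* Lock the debug mutex and return the (lazily created) debug state.
 * On failure the mutex is released here and NULL is returned. */
static struct debug_state *lock_debug_state(struct context *ctx)
{
    pthread_mutex_lock(&ctx->debug_mutex);
    if (!ctx->debug) {
        ctx->debug = ctx->fail_alloc ? NULL : calloc(1, sizeof(*ctx->debug));
        if (!ctx->debug) {
            pthread_mutex_unlock(&ctx->debug_mutex);
            return NULL;
        }
    }
    return ctx->debug;
}

static void unlock_debug_state(struct context *ctx)
{
    pthread_mutex_unlock(&ctx->debug_mutex);
}

/* Enable debug output.  Every path that holds the lock releases it once;
 * paths that never took the lock do not touch it. */
static void init_debug_output(struct context *ctx, bool debug_flag)
{
    if (debug_flag) {
        struct debug_state *debug = lock_debug_state(ctx);
        if (!debug)
            return;                  /* helper already dropped the lock */
        debug->output = true;
        debug->log_to_stderr = true;
        unlock_debug_state(ctx);     /* the release the bug was missing */
    }
}
```

Keeping the unlock inside the same block as the lock also avoids releasing a mutex on a path that never acquired it.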
---
 src/gallium/drivers/radeonsi/si_pipe.c | 8 +---
 src/mesa/main/debug_output.c   | 5 +++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 1737e23..b086f0e 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -128,9 +128,11 @@ si_create_llvm_target_machine(struct si_screen *sscreen)
 {
const char *triple = "amdgcn--";
 
-   if (sscreen->b.debug_flags & DBG_GLOBAL_ISEL) {
-   const char *options[1] = {"-global-isel"};
-   LLVMParseCommandLineOptions(1, options, NULL);
+   static bool cl_set = false;
+   if (!cl_set && sscreen->b.debug_flags & DBG_GLOBAL_ISEL) {
+   const char *options[4] = {"radeonsi", "-global-isel","-global-isel-abort=2", "-debug-only=instruction-select"};
+   LLVMParseCommandLineOptions(3, options, NULL);
+   cl_set = true;
}
 
return LLVMCreateTargetMachine(si_llvm_get_amdgpu_target(triple), triple,
diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c
index 4e9209b..b3d9398 100644
--- a/src/mesa/main/debug_output.c
+++ b/src/mesa/main/debug_output.c
@@ -1282,15 +1282,16 @@ _mesa_init_debug_output(struct gl_context *ctx)
*/
   struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
   if (!debug) {
- return;
+ goto done;
   }
   debug->DebugOutput = GL_TRUE;
   debug->LogToStderr = GL_TRUE;
   ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
}
+done:
+   _mesa_unlock_debug_state(ctx);
 }
 
-
 void
 _mesa_free_errors_data(struct gl_context *ctx)
 {
-- 
2.7.4



[Mesa-dev] [PATCH] mesa: Add missing call to _mesa_unlock_debug_state(ctx); v2

2016-11-16 Thread Tom Stellard
cd724208d3e1e3307f84a794f2c1fc83b69ccf8a added a call to
_mesa_lock_debug_state(ctx) but wasn't unlocking the debug state.

This fixes a hang in the glsl-fs-loop piglit test with MESA_DEBUG=context.

v2:
  - Remove unrelated changes.
---
 src/mesa/main/debug_output.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
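The invariant this fix restores is that every path which leaves the debug-state lock held must reach exactly one unlock. The sketch below illustrates that invariant with a counter standing in for the mutex; `lock_state()`, `unlock_state()` and `struct fake_ctx` are hypothetical names, not Mesa's API, and the sketch assumes the lock helper releases the lock itself before returning NULL. If a real helper keeps the lock held on failure, the unlock has to run on that path too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct state { bool output; bool log_to_stderr; };

/* Hypothetical context: lock_depth counts how often the fake lock is held. */
struct fake_ctx {
   int lock_depth;
   bool fail_alloc;   /* forces lock_state() to fail */
   struct state st;
};

/* Stand-in for a "lock and fetch" helper. Assumption: on failure it
 * releases the lock before returning NULL. */
static struct state *lock_state(struct fake_ctx *ctx)
{
   ctx->lock_depth++;
   if (ctx->fail_alloc) {
      ctx->lock_depth--;
      return NULL;
   }
   return &ctx->st;
}

static void unlock_state(struct fake_ctx *ctx)
{
   assert(ctx->lock_depth > 0);   /* catches a double unlock */
   ctx->lock_depth--;
}

static void init_output(struct fake_ctx *ctx)
{
   struct state *st = lock_state(ctx);
   if (!st)
      return;                     /* lock already released by lock_state() */
   st->output = true;
   st->log_to_stderr = true;
   unlock_state(ctx);             /* omitting this leaves the lock held */
}
```

With a counter in place of a real mutex, a missing unlock shows up as a non-zero `lock_depth` instead of a hang on the next lock attempt, which makes the pairing easy to check.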

diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c
index 4e9209b..48dbbb3 100644
--- a/src/mesa/main/debug_output.c
+++ b/src/mesa/main/debug_output.c
@@ -1282,12 +1282,14 @@ _mesa_init_debug_output(struct gl_context *ctx)
*/
   struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
   if (!debug) {
- return;
+ goto done;
   }
   debug->DebugOutput = GL_TRUE;
   debug->LogToStderr = GL_TRUE;
   ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
}
+done:
+   _mesa_unlock_debug_state(ctx);
 }
 
 
-- 
2.7.4



Re: [Mesa-dev] [PATCH 2/2] radeonsi: Use buffer_load intrinsics instead of llvm.SI.vs.load.input

2016-11-16 Thread Tom Stellard
On Wed, Nov 16, 2016 at 11:13:45AM +0100, Nicolai Hähnle wrote:
> Have you looked at the shader-db impact?
> 

shader-db is mostly unchanged.  There are a few decreases in SGPR usage and
code size, and a 4 byte increase in code size for one shader.

> I do think we should eventually do this, but llvm.SI.vs.load.input is
> ReadNone while llvm.amdgcn.buffer.load.* is only ReadOnly, so as long as we
> can't teach LLVM properly about no-aliasing and speculability, there may be
> performance regressions.
> 

Ideally llvm.amdgcn.buffer.load.* would be ReadOnly and ArgMemOnly, but I think
as long as it has non-pointer arguments this combination behaves the same as
ReadNone, which would be incorrect.
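At the source level, GCC/Clang's `const` and `pure` function attributes are rough analogues of LLVM's `readnone` and `readonly`. The hypothetical sketch below (not radeonsi code) shows why the distinction matters for optimization: a `const` call depends only on its arguments, while a `pure` call may read memory, so the compiler must re-evaluate it after an intervening store instead of reusing the earlier result. It assumes a GCC-compatible compiler.

```c
#include <assert.h>

static float buffer[4];

/* Depends only on its arguments: may be marked const (~readnone). */
__attribute__((const)) static float scale(float x, float k) { return x * k; }

/* Reads global memory, so only pure (~readonly) is correct. Two calls with
 * no store in between may be merged; a store in between forbids that. */
__attribute__((pure)) static float load(int i) { return buffer[i]; }

static float load_twice_with_store(void)
{
   buffer[0] = 1.0f;
   float a = load(0);
   buffer[0] = 2.0f;   /* intervening store: second call must be re-run */
   float b = load(0);
   return scale(a, 1.0f) + b;
}
```

Marking `load()` as `const` would let the compiler reuse the first result across the store, which is the kind of incorrect behaviour a ReadNone-equivalent attribute set would risk for a real memory load.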

-Tom

> Cheers,
> Nicolai
> 
> On 16.11.2016 03:14, Tom Stellard wrote:
> >---
> > src/gallium/drivers/radeonsi/si_shader.c | 69 +++-
> > 1 file changed, 50 insertions(+), 19 deletions(-)
> >
> >diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >index 306e12f..ee4fe2f 100644
> >--- a/src/gallium/drivers/radeonsi/si_shader.c
> >+++ b/src/gallium/drivers/radeonsi/si_shader.c
> >@@ -82,6 +82,17 @@ static void si_build_ps_prolog_function(struct si_shader_context *ctx,
> > static void si_build_ps_epilog_function(struct si_shader_context *ctx,
> > union si_shader_part_key *key);
> >
> >+static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
> >+  LLVMValueRef rsrc,
> >+  int num_channels,
> >+  LLVMValueRef vindex,
> >+  LLVMValueRef voffset,
> >+  LLVMValueRef soffset,
> >+  unsigned inst_offset,
> >+  unsigned glc,
> >+  unsigned slc,
> >+  bool is_format);
> >+
> > /* Ideally pass the sample mask input to the PS epilog as v13, which
> >  * is its usual location, so that the shader doesn't have to add v_mov.
> >  */
> >@@ -368,6 +379,31 @@ static LLVMValueRef get_instance_index_for_fetch(
> > LLVMGetParam(radeon_bld->main_fn, param_start_instance), "");
> > }
> >
> >+static LLVMValueRef build_vs_load_input(struct si_shader_context *ctx,
> >+LLVMValueRef rsrc,
> >+LLVMValueRef index,
> >+LLVMValueRef offset) {
> >+
> >+struct lp_build_context *base = &ctx->soa.bld_base.base;
> >+struct lp_build_context *uint = &ctx->soa.bld_base.uint_bld;
> >+struct gallivm_state *gallivm = base->gallivm;
> >+
> >+LLVMValueRef args[8];
> >+
> >+if (HAVE_LLVM < 0x0400) {
> >+args[0] = rsrc;
> >+args[1] = offset;
> >+args[2] = index;
> >+
> >+return lp_build_intrinsic(gallivm->builder,
> >+"llvm.SI.vs.load.input", ctx->v4f32, args, 3,
> >+LP_FUNC_ATTR_READNONE);
> >+}
> >+
> >+return build_buffer_load(ctx, rsrc, 4, index, offset,
> >+ uint->zero, 0, 0, 0, true);
> >+}
> >+
> > static void declare_input_vs(
> > struct si_shader_context *ctx,
> > unsigned input_index,
> >@@ -385,7 +421,6 @@ static void declare_input_vs(
> > LLVMValueRef t_list;
> > LLVMValueRef attribute_offset;
> > LLVMValueRef buffer_index;
> >-LLVMValueRef args[3];
> > LLVMValueRef input;
> >
> > /* Load the T list */
> >@@ -402,12 +437,8 @@ static void declare_input_vs(
> > ctx->param_vertex_index0 +
> > input_index);
> >
> >-args[0] = t_list;
> >-args[1] = attribute_offset;
> >-args[2] = buffer_index;
> >-input = lp_build_intrinsic(gallivm->builder,
> >-"llvm.SI.vs.load.input", ctx->v4f32, args, 3,
> >-LP_FUNC_ATTR_READNONE);
> >+input = build_vs_load_input(ctx, t_list, buffer_index,
> >+attribute_offset);
> >
> > /* Break up the vec4 into individual components */
> > for (chan = 0; chan < 4; chan++) {
> >@@ -808,7 +839,8 @@ static LLVMVa

[Mesa-dev] [PATCH 2/2] radeonsi: Use buffer_load intrinsics instead of llvm.SI.vs.load.input

2016-11-15 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_shader.c | 69 +++-
 1 file changed, 50 insertions(+), 19 deletions(-)
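The patch adds an `is_format` flag to `build_buffer_load()` and uses it to pick between the `llvm.amdgcn.buffer.load.*` and `llvm.amdgcn.buffer.load.format.*` overloads. The sketch below reproduces only that name construction in isolation; the `type_names` table (`f32`, `v2f32`, `v4f32`) is an assumption, since the patch refers to it without showing its contents, and `buffer_load_name()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CLAMP(x, lo, hi) ((x) < (lo) ? (lo) : (x) > (hi) ? (hi) : (x))

/* Assumed overload suffixes, indexed by CLAMP(num_channels, 1, 3) - 1. */
static const char *const type_names[] = {"f32", "v2f32", "v4f32"};

static void buffer_load_name(char *name, size_t size,
                             int num_channels, bool is_format)
{
   unsigned func = CLAMP(num_channels, 1, 3) - 1;
   snprintf(name, size, "llvm.amdgcn.buffer.load.%s%s",
            is_format ? "format." : "", type_names[func]);
}
```

For example, `buffer_load_name(n, sizeof(n), 3, true)` produces `"llvm.amdgcn.buffer.load.format.v4f32"`, because three channels are rounded up to the four-wide overload by the clamp.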

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 306e12f..ee4fe2f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -82,6 +82,17 @@ static void si_build_ps_prolog_function(struct si_shader_context *ctx,
 static void si_build_ps_epilog_function(struct si_shader_context *ctx,
union si_shader_part_key *key);
 
+static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
+  LLVMValueRef rsrc,
+  int num_channels,
+  LLVMValueRef vindex,
+  LLVMValueRef voffset,
+  LLVMValueRef soffset,
+  unsigned inst_offset,
+  unsigned glc,
+  unsigned slc,
+ bool is_format);
+
 /* Ideally pass the sample mask input to the PS epilog as v13, which
  * is its usual location, so that the shader doesn't have to add v_mov.
  */
@@ -368,6 +379,31 @@ static LLVMValueRef get_instance_index_for_fetch(
LLVMGetParam(radeon_bld->main_fn, param_start_instance), "");
 }
 
+static LLVMValueRef build_vs_load_input(struct si_shader_context *ctx,
+   LLVMValueRef rsrc,
+   LLVMValueRef index,
+   LLVMValueRef offset) {
+
+   struct lp_build_context *base = &ctx->soa.bld_base.base;
+   struct lp_build_context *uint = &ctx->soa.bld_base.uint_bld;
+   struct gallivm_state *gallivm = base->gallivm;
+
+   LLVMValueRef args[8];
+
+   if (HAVE_LLVM < 0x0400) {
+   args[0] = rsrc;
+   args[1] = offset;
+   args[2] = index;
+
+   return lp_build_intrinsic(gallivm->builder,
+   "llvm.SI.vs.load.input", ctx->v4f32, args, 3,
+   LP_FUNC_ATTR_READNONE);
+   }
+
+   return build_buffer_load(ctx, rsrc, 4, index, offset,
+uint->zero, 0, 0, 0, true);
+}
+
 static void declare_input_vs(
struct si_shader_context *ctx,
unsigned input_index,
@@ -385,7 +421,6 @@ static void declare_input_vs(
LLVMValueRef t_list;
LLVMValueRef attribute_offset;
LLVMValueRef buffer_index;
-   LLVMValueRef args[3];
LLVMValueRef input;
 
/* Load the T list */
@@ -402,12 +437,8 @@ static void declare_input_vs(
ctx->param_vertex_index0 +
input_index);
 
-   args[0] = t_list;
-   args[1] = attribute_offset;
-   args[2] = buffer_index;
-   input = lp_build_intrinsic(gallivm->builder,
-   "llvm.SI.vs.load.input", ctx->v4f32, args, 3,
-   LP_FUNC_ATTR_READNONE);
+   input = build_vs_load_input(ctx, t_list, buffer_index,
+   attribute_offset);
 
/* Break up the vec4 into individual components */
for (chan = 0; chan < 4; chan++) {
@@ -808,7 +839,8 @@ static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
   LLVMValueRef soffset,
   unsigned inst_offset,
   unsigned glc,
-  unsigned slc)
+  unsigned slc,
+ bool is_format)
 {
struct gallivm_state *gallivm = &ctx->gallivm;
unsigned func = CLAMP(num_channels, 1, 3) - 1;
@@ -837,8 +869,8 @@ static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
   "");
}
 
-   snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
-type_names[func]);
+   snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s%s",
+is_format ? "format." : "", type_names[func]);
 
return lp_build_intrinsic(gallivm->builder, name, types[func], args,
  ARRAY_SIZE(args), LP_FUNC_ATTR_READONLY);
@@ -889,14 +921,14 @@ static LLVMValueRef buffer_load(struct lp_build_tgsi_context *bld_base,
 
if (swizzle == ~0) {
value = build_buffer_load(ctx, buffer, 4, NULL, base, offset,
- 0, 1, 0);
+ 0, 1, 0, false);
 
return LLVMBuildBitCast(gallivm->builder, value, vec_type, "");
}
 
if (!tgsi_type_is_64bit(type)) {
  

[Mesa-dev] [PATCH 1/2] radeonsi: Use amdgcn intrinsics for fs interpolation

2016-11-15 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_shader.c | 197 ++-
 1 file changed, 143 insertions(+), 54 deletions(-)
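In `build_fs_interp()` below, the single `llvm.fs.interp` call is split into `llvm.amdgcn.interp.p1` (which takes `i`) followed by `llvm.amdgcn.interp.p2` (which takes the p1 result and `j`). The scalar sketch that follows models that two-step barycentric interpolation for one channel, assuming the usual formulation `P0 + i*(P1 - P0) + j*(P2 - P0)`; the struct and helper names are hypothetical. The `interp_mov()` helper stands in for the `fs.constant` / `interp.mov` path, which returns one vertex's value without arithmetic so integer bit patterns (which may look like NaN) survive; which vertex the hardware selects is not modelled, and `p0` is an arbitrary choice here.

```c
#include <assert.h>

/* One attribute channel at the triangle's three vertices. */
struct attr_vertices { float p0, p1, p2; };

/* Step 1: apply the i barycentric coordinate. */
static float interp_p1(const struct attr_vertices *v, float i)
{
   return v->p0 + i * (v->p1 - v->p0);
}

/* Step 2: add the j term to the partial result from step 1. */
static float interp_p2(const struct attr_vertices *v, float partial, float j)
{
   return partial + j * (v->p2 - v->p0);
}

static float interp(const struct attr_vertices *v, float i, float j)
{
   return interp_p2(v, interp_p1(v, i), j);
}

/* Flat/constant path: no arithmetic, value passes through unchanged. */
static float interp_mov(const struct attr_vertices *v)
{
   return v->p0;
}
```

At `(i, j) = (0, 0)`, `(1, 0)` and `(0, 1)` the interpolated value equals the attribute at the corresponding vertex, and points in between blend linearly.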

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 0410a32..306e12f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1204,6 +1204,81 @@ static int lookup_interp_param_index(unsigned interpolate, unsigned location)
}
 }
 
+static LLVMValueRef build_fs_interp(
+   struct lp_build_tgsi_context *bld_base,
+   LLVMValueRef llvm_chan,
+   LLVMValueRef attr_number,
+   LLVMValueRef params,
+   LLVMValueRef i,
+   LLVMValueRef j) {
+
+   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMValueRef args[5];
+   LLVMValueRef p1;
+   if (HAVE_LLVM < 0x0400) {
+   LLVMValueRef ij[2];
+   ij[0] = LLVMBuildBitCast(gallivm->builder, i, ctx->i32, "");
+   ij[1] = LLVMBuildBitCast(gallivm->builder, j, ctx->i32, "");
+
+   args[0] = llvm_chan;
+   args[1] = attr_number;
+   args[2] = params;
+   args[3] = lp_build_gather_values(gallivm, ij, 2);
+   return lp_build_intrinsic(gallivm->builder, "llvm.fs.interp",
+ ctx->f32, args, 4,
+ LP_FUNC_ATTR_READNONE);
+
+   }
+
+   args[0] = i;
+   args[1] = llvm_chan;
+   args[2] = attr_number;
+   args[3] = params;
+
+   p1 = lp_build_intrinsic(gallivm->builder, "llvm.amdgcn.interp.p1",
+   ctx->f32, args, 4, LP_FUNC_ATTR_READNONE);
+
+   args[0] = p1;
+   args[1] = j;
+   args[2] = llvm_chan;
+   args[3] = attr_number;
+   args[4] = params;
+
+   return lp_build_intrinsic(gallivm->builder, "llvm.amdgcn.interp.p2",
+ ctx->f32, args, 5, LP_FUNC_ATTR_READNONE);
+}
+
+static LLVMValueRef build_fs_interp_mov(
+   struct lp_build_tgsi_context *bld_base,
+   LLVMValueRef parameter,
+   LLVMValueRef llvm_chan,
+   LLVMValueRef attr_number,
+   LLVMValueRef params) {
+
+   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMValueRef args[4];
+   if (HAVE_LLVM < 0x0400) {
+   args[0] = llvm_chan;
+   args[1] = attr_number;
+   args[2] = params;
+
+   return lp_build_intrinsic(gallivm->builder,
+ "llvm.SI.fs.constant",
+ ctx->f32, args, 3,
+ LP_FUNC_ATTR_READNONE);
+   }
+
+   args[0] = parameter;
+   args[1] = llvm_chan;
+   args[2] = attr_number;
+   args[3] = params;
+
+   return lp_build_intrinsic(gallivm->builder, "llvm.amdgcn.interp.mov",
+ ctx->f32, args, 4, LP_FUNC_ATTR_READNONE);
+}
+
 /**
  * Interpolate a fragment shader input.
  *
@@ -1229,16 +1304,15 @@ static void interp_fs_input(struct si_shader_context *ctx,
LLVMValueRef face,
LLVMValueRef result[4])
 {
-   struct lp_build_context *base = &ctx->soa.bld_base.base;
-   struct lp_build_context *uint = &ctx->soa.bld_base.uint_bld;
+   struct lp_build_tgsi_context *bld_base = &ctx->soa.bld_base;
+   struct lp_build_context *base = &bld_base->base;
+   struct lp_build_context *uint = &bld_base->uint_bld;
struct gallivm_state *gallivm = base->gallivm;
-   const char *intr_name;
LLVMValueRef attr_number;
+   LLVMValueRef i, j;
 
unsigned chan;
 
-   attr_number = lp_build_const_int32(gallivm, input_index);
-
/* fs.constant returns the param from the middle vertex, so it's not
 * really useful for flat shading. It's meant to be used for custom
 * interpolation (but the intrinsic can't fetch from the other two
@@ -1248,12 +1322,26 @@ static void interp_fs_input(struct si_shader_context *ctx,
 * to do the right thing. The only reason we use fs.constant is that
 * fs.interp cannot be used on integers, because they can be equal
 * to NaN.
+*
+* When interp is true we will use fs.constant or for newer llvm,
+ * amdgcn.interp.mov.
 */
-   intr_name = interp_param ? "llvm.SI.fs.interp" : "llvm.SI.fs.constant";
+   bool interp = interp_param != NULL;
+
+   attr_number = lp_build_const_int32(gallivm, input_index);
+
+   if (interp) {
+   interp_param = LLVMBuildBitCast(gallivm->builder, interp_param,
+   LLVMVectorType(ctx->f32, 2), "");
+
+   i = LLVMBuildExtractElement(gallivm->builder, interp_param,
+  

Re: [Mesa-dev] [PATCH] clover: adapt to new error API since LLVM r286752

2016-11-14 Thread Tom Stellard
On Mon, Nov 14, 2016 at 01:44:18PM +0100, Dieter Nützel wrote:
> Tested-by: Dieter Nützel 
> 
> Thanks Vedran!
> 

Pushed, thanks!

-Tom

> Dieter
> 
> Am 14.11.2016 12:17, schrieb Vedran Miletić:
> > ---
> >  src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp | 10 --
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> > index 8e89a49..5dcc4f8 100644
> > --- a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> > +++ b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> > @@ -98,8 +98,14 @@ clover::llvm::parse_module_library(const module &m,
> > ::llvm::LLVMContext &ctx,
> > std::string &r_log) {
> > auto mod = ::llvm::parseBitcodeFile(::llvm::MemoryBufferRef(
> >  as_string(m.secs[0].data), " "), ctx);
> > -   if (!mod)
> > -  fail(r_log, error(CL_INVALID_PROGRAM), mod.getError().message());
> > +
> > +   if (::llvm::Error err = mod.takeError()) {
> > +  std::string msg;
> > +  ::llvm::handleAllErrors(std::move(err), [&](::llvm::ErrorInfoBase &EIB) {
> > + msg = EIB.message();
> > + fail(r_log, error(CL_INVALID_PROGRAM), msg.c_str());
> > +  });
> > +   }
> > 
> > return std::unique_ptr<::llvm::Module>(std::move(*mod));
> >  }


Re: [Mesa-dev] [PATCH] clover: fix building since llvm r286566

2016-11-11 Thread Tom Stellard
On Fri, Nov 11, 2016 at 02:00:26PM +0100, Laurent Carlier wrote:

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>

> pretty trivial fix
> ---
>  src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp 
> b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> index 108f8d5..8e89a49 100644
> --- a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> @@ -37,7 +37,12 @@
>  #include "util/algorithm.hpp"
>  
>  #include 
> +#if HAVE_LLVM < 0x0400
>  #include 
> +#else
> +#include 
> +#include 
> +#endif
>  #include 
>  
>  using namespace clover;
> -- 
> 2.10.2
> 


Re: [Mesa-dev] [PATCH 1/2] gallivm: Fix build after removal of deprecated attribute API v3

2016-11-10 Thread Tom Stellard
On Wed, Nov 09, 2016 at 11:45:38PM +0100, Roland Scheidegger wrote:
> Am 09.11.2016 um 16:22 schrieb Tom Stellard:
> > v2:
> >   Fix adding parameter attributes with LLVM < 4.0.
> > 
> > v3:
> >   Fix typo.
> >   Fix parameter index.
> >   Add a gallivm enum for function attributes.
> > ---
> >  src/gallium/auxiliary/draw/draw_llvm.c|  6 +-
> >  src/gallium/auxiliary/gallivm/lp_bld_intr.c   | 70 ++-
> >  src/gallium/auxiliary/gallivm/lp_bld_intr.h   | 22 ++-
> >  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  4 +-
> >  src/gallium/drivers/radeonsi/si_shader.c  | 69 +++---
> >  src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c | 24 
> >  6 files changed, 143 insertions(+), 52 deletions(-)
> > 
> > diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
> > index 5b4e2a1..ba86b11 100644
> > --- a/src/gallium/auxiliary/draw/draw_llvm.c
> > +++ b/src/gallium/auxiliary/draw/draw_llvm.c
> > @@ -1568,8 +1568,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant,
> > LLVMSetFunctionCallConv(variant_func, LLVMCCallConv);
> > for (i = 0; i < num_arg_types; ++i)
> >if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
> > - LLVMAddAttribute(LLVMGetParam(variant_func, i),
> > -  LLVMNoAliasAttribute);
> > + lp_add_function_attr(variant_func, i + 1, LP_FUNC_ATTR_NOALIAS);
> >  
> > context_ptr   = LLVMGetParam(variant_func, 0);
> > io_ptr= LLVMGetParam(variant_func, 1);
> > @@ -2193,8 +2192,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm,
> >  
> > for (i = 0; i < ARRAY_SIZE(arg_types); ++i)
> >if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
> > - LLVMAddAttribute(LLVMGetParam(variant_func, i),
> > -  LLVMNoAliasAttribute);
> > + lp_add_function_attr(variant_func, i + 1, LP_FUNC_ATTR_NOALIAS);
> >  
> > context_ptr   = LLVMGetParam(variant_func, 0);
> > input_array   = LLVMGetParam(variant_func, 1);
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
> > index f12e735..049671a 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
> > @@ -46,6 +46,7 @@
> >  
> >  #include "util/u_debug.h"
> >  #include "util/u_string.h"
> > +#include "util/bitscan.h"
> >  
> >  #include "lp_bld_const.h"
> >  #include "lp_bld_intr.h"
> > @@ -120,13 +121,73 @@ lp_declare_intrinsic(LLVMModuleRef module,
> >  }
> >  
> >  
> > +#if HAVE_LLVM < 0x0400
> > +static LLVMAttribute lp_attr_to_llvm_attr(enum lp_func_attr attr)
> > +{
> > +   switch (attr) {
> > +   case LP_FUNC_ATTR_ALWAYSINLINE: return LLVMAlwaysInlineAttribute;
> > +   case LP_FUNC_ATTR_BYVAL: return LLVMByValAttribute;
> > +   case LP_FUNC_ATTR_INREG: return LLVMInRegAttribute;
> > +   case LP_FUNC_ATTR_NOALIAS: return LLVMNoAliasAttribute;
> > +   case LP_FUNC_ATTR_NOUNWIND: return LLVMNoUnwindAttribute;
> > +   case LP_FUNC_ATTR_READNONE: return LLVMReadNoneAttribute;
> > +   case LP_FUNC_ATTR_READONLY: return LLVMReadOnlyAttribute;
> > +   default:
> > +  _debug_printf("Unhandled function attribute: %x\n", attr);
> > +  return 0;
> > +   }
> > +}
> > +
> > +#else
> > +
> > +static const char *attr_to_str(enum lp_func_attr attr)
> > +{
> > +   switch (attr) {
> > +   case LP_FUNC_ATTR_ALWAYSINLINE: return "alwaysinline";
> > +   case LP_FUNC_ATTR_BYVAL: return "byval";
> > +   case LP_FUNC_ATTR_INREG: return "inreg";
> > +   case LP_FUNC_ATTR_NOALIAS: return "noalias";
> > +   case LP_FUNC_ATTR_NOUNWIND: return "nounwind";
> > +   case LP_FUNC_ATTR_READNONE: return "readnone";
> > +   case LP_FUNC_ATTR_READONLY: return "readonly";
> > +   default:
> > +  _debug_printf("Unhandled function attribute: %x\n", attr);
> > +  return 0;
> > +   }
> > +}
> > +
> > +#endif
> > +
> > +void
> > +lp_add_function_attr(LLVMValueRef function,
> > + int attr_idx,
> > + enum lp_func_

[Mesa-dev] [PATCH 1/2] gallivm: Fix build after removal of deprecated attribute API v3

2016-11-09 Thread Tom Stellard
v2:
  Fix adding parameter attributes with LLVM < 4.0.

v3:
  Fix typo.
  Fix parameter index.
  Add a gallivm enum for function attributes.
---
 src/gallium/auxiliary/draw/draw_llvm.c|  6 +-
 src/gallium/auxiliary/gallivm/lp_bld_intr.c   | 70 ++-
 src/gallium/auxiliary/gallivm/lp_bld_intr.h   | 22 ++-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  4 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 69 +++---
 src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c | 24 
 6 files changed, 143 insertions(+), 52 deletions(-)
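v3 replaces attribute strings with an `enum lp_func_attr`, includes `util/bitscan.h`, and changes `lp_build_intrinsic()` to take an `unsigned attr_mask`, which suggests the mask is walked one set bit at a time and each bit mapped to a name via `attr_to_str()`. The sketch below shows that pattern with a portable lowest-set-bit loop; the bit values and the `FUNC_ATTR_*` / `mask_to_names()` names are hypothetical, and this is not the patch's actual loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values; the real enum lives in lp_bld_intr.h. */
enum func_attr {
   FUNC_ATTR_NOALIAS  = 1u << 0,
   FUNC_ATTR_NOUNWIND = 1u << 1,
   FUNC_ATTR_READNONE = 1u << 2,
   FUNC_ATTR_READONLY = 1u << 3,
};

static const char *attr_to_str(enum func_attr attr)
{
   switch (attr) {
   case FUNC_ATTR_NOALIAS:  return "noalias";
   case FUNC_ATTR_NOUNWIND: return "nounwind";
   case FUNC_ATTR_READNONE: return "readnone";
   case FUNC_ATTR_READONLY: return "readonly";
   default:                 return NULL;   /* unknown bit */
   }
}

/* Visit each set bit, lowest first, and collect the known attribute names. */
static unsigned mask_to_names(unsigned mask, const char **out, unsigned max)
{
   unsigned n = 0;
   while (mask && n < max) {
      unsigned bit = mask & (~mask + 1u);   /* isolate lowest set bit */
      mask &= ~bit;
      const char *name = attr_to_str((enum func_attr)bit);
      if (name)
         out[n++] = name;
   }
   return n;
}
```

Using one bit per attribute lets a caller pass several attributes in a single argument (for example `READNONE | NOUNWIND`), which the old `LLVMAttribute` bitmask allowed and a single string did not.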

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 5b4e2a1..ba86b11 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1568,8 +1568,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant,
LLVMSetFunctionCallConv(variant_func, LLVMCCallConv);
for (i = 0; i < num_arg_types; ++i)
   if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(variant_func, i),
-  LLVMNoAliasAttribute);
+ lp_add_function_attr(variant_func, i + 1, LP_FUNC_ATTR_NOALIAS);
 
context_ptr   = LLVMGetParam(variant_func, 0);
io_ptr= LLVMGetParam(variant_func, 1);
@@ -2193,8 +2192,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm,
 
for (i = 0; i < ARRAY_SIZE(arg_types); ++i)
   if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(variant_func, i),
-  LLVMNoAliasAttribute);
+ lp_add_function_attr(variant_func, i + 1, LP_FUNC_ATTR_NOALIAS);
 
context_ptr   = LLVMGetParam(variant_func, 0);
input_array   = LLVMGetParam(variant_func, 1);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
index f12e735..049671a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
@@ -46,6 +46,7 @@
 
 #include "util/u_debug.h"
 #include "util/u_string.h"
+#include "util/bitscan.h"
 
 #include "lp_bld_const.h"
 #include "lp_bld_intr.h"
@@ -120,13 +121,73 @@ lp_declare_intrinsic(LLVMModuleRef module,
 }
 
 
+#if HAVE_LLVM < 0x0400
+static LLVMAttribute lp_attr_to_llvm_attr(enum lp_func_attr attr)
+{
+   switch (attr) {
+   case LP_FUNC_ATTR_ALWAYSINLINE: return LLVMAlwaysInlineAttribute;
+   case LP_FUNC_ATTR_BYVAL: return LLVMByValAttribute;
+   case LP_FUNC_ATTR_INREG: return LLVMInRegAttribute;
+   case LP_FUNC_ATTR_NOALIAS: return LLVMNoAliasAttribute;
+   case LP_FUNC_ATTR_NOUNWIND: return LLVMNoUnwindAttribute;
+   case LP_FUNC_ATTR_READNONE: return LLVMReadNoneAttribute;
+   case LP_FUNC_ATTR_READONLY: return LLVMReadOnlyAttribute;
+   default:
+  _debug_printf("Unhandled function attribute: %x\n", attr);
+  return 0;
+   }
+}
+
+#else
+
+static const char *attr_to_str(enum lp_func_attr attr)
+{
+   switch (attr) {
+   case LP_FUNC_ATTR_ALWAYSINLINE: return "alwaysinline";
+   case LP_FUNC_ATTR_BYVAL: return "byval";
+   case LP_FUNC_ATTR_INREG: return "inreg";
+   case LP_FUNC_ATTR_NOALIAS: return "noalias";
+   case LP_FUNC_ATTR_NOUNWIND: return "nounwind";
+   case LP_FUNC_ATTR_READNONE: return "readnone";
+   case LP_FUNC_ATTR_READONLY: return "readonly";
+   default:
+  _debug_printf("Unhandled function attribute: %x\n", attr);
+  return 0;
+   }
+}
+
+#endif
+
+void
+lp_add_function_attr(LLVMValueRef function,
+ int attr_idx,
+ enum lp_func_attr attr)
+{
+
+#if HAVE_LLVM < 0x0400
+   LLVMAttribute llvm_attr = lp_attr_to_llvm_attr(attr);
+   if (attr_idx == -1) {
+  LLVMAddFunctionAttr(function, llvm_attr);
+   } else {
+  LLVMAddAttribute(LLVMGetParam(function, attr_idx - 1), llvm_attr);
+   }
+#else
+   LLVMContextRef context = LLVMGetModuleContext(LLVMGetGlobalParent(function));
+   const char *attr_name = attr_to_str(attr);
+   unsigned kind_id = LLVMGetEnumAttributeKindForName(attr_name,
+  strlen(attr_name));
+   LLVMAttributeRef llvm_attr = LLVMCreateEnumAttribute(context, kind_id, 0);
+   LLVMAddAttributeAtIndex(function, attr_idx, llvm_attr);
+#endif
+}
+
 LLVMValueRef
 lp_build_intrinsic(LLVMBuilderRef builder,
const char *name,
LLVMTypeRef ret_type,
LLVMValueRef *args,
unsigned num_args,
-   LLVMAttribute attr)
+   unsigned attr_mask)
 {
LLVMModuleRef module = LLVMGetGlobalParent(LLVMGetBasicBlockParent(LLVMGetInsertBlock(builder)));
LLVMValueRef function;
@@ -148,7 +209,12 @@ lp_build_intrinsic(LLVMBuilderRef builder,
   /* NoUnwind indicates that the intrinsic never raises a C++ exception.
* Set it for all intrinsics.
*/
-

[Mesa-dev] [PATCH 2/2] llvmpipe: Fix build after removal of deprecated attribute API v2

2016-11-09 Thread Tom Stellard
From: Aaron Watry <awa...@gmail.com>

Applies on top of v3 of Tom's gallivm change.

v2:
  - Tom Stellard: Use enums instead of strings.

Signed-off-by: Aaron Watry <awa...@gmail.com>
CC: Tom Stellard <thomas.stell...@amd.com>
CC: Jan Vesely <jan.ves...@rutgers.edu>
---
 src/gallium/drivers/llvmpipe/lp_state_fs.c| 2 +-
 src/gallium/drivers/llvmpipe/lp_state_setup.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)
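The change from `LLVMGetParam(function, i)` to `lp_add_function_attr(function, i + 1, ...)` follows the LLVM-C attribute-index convention used by `LLVMAddAttributeAtIndex()`: index 0 (`LLVMAttributeReturnIndex`) is the return value, -1 (`LLVMAttributeFunctionIndex`) is the function itself, and parameter `i` lives at `i + 1`. The pre-4.0 path in `lp_add_function_attr()` undoes this with `attr_idx - 1`. The sketch below models only that mapping; the helper names and local enum are hypothetical stand-ins, not LLVM's headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins mirroring LLVMAttributeReturnIndex / FunctionIndex. */
enum { ATTR_RETURN_INDEX = 0, ATTR_FUNCTION_INDEX = -1 };

/* 0-based parameter number (as for LLVMGetParam) -> attribute index. */
static int param_to_attr_index(int param)
{
   return param + 1;
}

/* Attribute index -> 0-based parameter number; false for the function
 * (-1) and return-value (0) slots, which name no parameter. */
static bool attr_index_to_param(int attr_idx, int *param)
{
   if (attr_idx < 1)
      return false;
   *param = attr_idx - 1;
   return true;
}
```

Passing a raw parameter number where an attribute index is expected shifts every attribute by one slot, so parameter 0 would silently become the return value.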

diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index 3428eed..0910815 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -2296,7 +2296,7 @@ generate_fragment(struct llvmpipe_context *lp,
 */
for(i = 0; i < ARRAY_SIZE(arg_types); ++i)
   if(LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(function, i), LLVMNoAliasAttribute);
+ lp_add_function_attr(function, i + 1, LP_FUNC_ATTR_NOALIAS);
 
context_ptr  = LLVMGetParam(function, 0);
x= LLVMGetParam(function, 1);
diff --git a/src/gallium/drivers/llvmpipe/lp_state_setup.c b/src/gallium/drivers/llvmpipe/lp_state_setup.c
index a57e2f0..6b0df21 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_setup.c
@@ -624,8 +624,7 @@ set_noalias(LLVMBuilderRef builder,
int i;
for(i = 0; i < nr_args; ++i)
   if(LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(function, i),
-LLVMNoAliasAttribute);
+ lp_add_function_attr(function, i + 1, LP_FUNC_ATTR_NOALIAS);
 }
 
 static void
-- 
2.7.4



[Mesa-dev] [PATCH] gallivm: Fix build after removal of deprecated attribute API v2

2016-11-07 Thread Tom Stellard
v2:
  Fix adding parameter attributes with LLVM < 4.0.
---
 src/gallium/auxiliary/draw/draw_llvm.c|  6 +-
 src/gallium/auxiliary/gallivm/lp_bld_intr.c   | 52 -
 src/gallium/auxiliary/gallivm/lp_bld_intr.h   | 13 -
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  4 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 69 ---
 src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c | 24 
 6 files changed, 116 insertions(+), 52 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 5b4e2a1..5d87318 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1568,8 +1568,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant,
LLVMSetFunctionCallConv(variant_func, LLVMCCallConv);
for (i = 0; i < num_arg_types; ++i)
   if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(variant_func, i),
-  LLVMNoAliasAttribute);
+ lp_add_function_attr(variant_func, i + 1, "noalias", 7);
 
context_ptr   = LLVMGetParam(variant_func, 0);
io_ptr= LLVMGetParam(variant_func, 1);
@@ -2193,8 +2192,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm,
 
for (i = 0; i < ARRAY_SIZE(arg_types); ++i)
   if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(variant_func, i),
-  LLVMNoAliasAttribute);
+ lp_add_function_attr(variant_func, i + 1, "noalias", 7);
 
context_ptr   = LLVMGetParam(variant_func, 0);
input_array   = LLVMGetParam(variant_func, 1);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
index f12e735..401e9a2 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
@@ -120,13 +120,57 @@ lp_declare_intrinsic(LLVMModuleRef module,
 }
 
 
+#if HAVE_LLVM < 0x0400
+static LLVMAttribute str_to_attr(const char *attr_name, unsigned attr_len)
+{
+   if (!strncmp("alwaysinline", attr_name, attr_len)) {
+  return LLVMAlwaysInlineAttribute;
+   } else if (!strncmp("byval", attr_name, attr_len)) {
+  return LLVMByValAttribute;
+   } else if (!strncmp("inreg", attr_name, attr_len)) {
+  return LLVMInRegAttribute;
+   } else if (!strncmp("noalias", attr_name, attr_len)) {
+  return LLVMNoAlliasAttribute;
+   } else if (!strncmp("readnone", attr_name, attr_len)) {
+  return LLVMReadNoneAttribute;
+   } else if (!strncmp("readonly", attr_name, attr_len)) {
+  return LLVMReadOnlyAttribute;
+   } else {
+  _debug_printf("Unhandled function attribute: %s\n", attr_name);
+  return 0;
+   }
+}
+#endif
+
+void
+lp_add_function_attr(LLVMValueRef function,
+ int attr_idx,
+ const char *attr_name,
+ unsigned attr_len)
+{
+
+#if HAVE_LLVM < 0x0400
+   LLVMAttribute attr = str_to_attr(attr_name, attr_len);
+   if (attr_idx == -1) {
+  LLVMAddFunctionAttr(function, attr);
+   } else {
+  LLVMAddAttribute(LLVMGetParam(function, attr_idx), attr);
+   }
+#else
+   LLVMContextRef context = LLVMGetModuleContext(LLVMGetGlobalParent(function));
+   unsigned kind_id = LLVMGetEnumAttributeKindForName(attr_name, attr_len);
+   LLVMAttributeRef attr = LLVMCreateEnumAttribute(context, kind_id, 0);
+   LLVMAddAttributeAtIndex(function, attr_idx, attr);
+#endif
+}
+
 LLVMValueRef
 lp_build_intrinsic(LLVMBuilderRef builder,
const char *name,
LLVMTypeRef ret_type,
LLVMValueRef *args,
unsigned num_args,
-   LLVMAttribute attr)
+   const char *attr_str)
 {
LLVMModuleRef module = LLVMGetGlobalParent(LLVMGetBasicBlockParent(LLVMGetInsertBlock(builder)));
LLVMValueRef function;
@@ -145,10 +189,14 @@ lp_build_intrinsic(LLVMBuilderRef builder,
 
   function = lp_declare_intrinsic(module, name, ret_type, arg_types, num_args);
 
+  if (attr_str) {
+ lp_add_function_attr(function, -1, attr_str, sizeof(attr_str));
+  }
+
   /* NoUnwind indicates that the intrinsic never raises a C++ exception.
* Set it for all intrinsics.
*/
-  LLVMAddFunctionAttr(function, attr | LLVMNoUnwindAttribute);
+  lp_add_function_attr(function, -1, "nounwind", 8);
 
   if (gallivm_debug & GALLIVM_DEBUG_IR) {
  lp_debug_dump_value(function);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.h b/src/gallium/auxiliary/gallivm/lp_bld_intr.h
index 7d80ac2..a058de4 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_intr.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.h
@@ -60,13 +60,24 @@ lp_declare_intrinsic(LLVMModuleRef module,
  LLVMTypeRef *arg_types,
 

[Mesa-dev] [PATCH] gallivm: Fix build after removal of deprecated attribute API

2016-11-07 Thread Tom Stellard
---

Build tested only so far.

 src/gallium/auxiliary/draw/draw_llvm.c|  6 +-
 src/gallium/auxiliary/gallivm/lp_bld_intr.c   | 48 +++-
 src/gallium/auxiliary/gallivm/lp_bld_intr.h   | 13 -
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  4 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 69 ---
 src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c | 24 
 6 files changed, 112 insertions(+), 52 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 5b4e2a1..5d87318 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1568,8 +1568,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant,
LLVMSetFunctionCallConv(variant_func, LLVMCCallConv);
for (i = 0; i < num_arg_types; ++i)
   if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(variant_func, i),
-  LLVMNoAliasAttribute);
+ lp_add_function_attr(variant_func, i + 1, "noalias", 7);
 
context_ptr   = LLVMGetParam(variant_func, 0);
io_ptr= LLVMGetParam(variant_func, 1);
@@ -2193,8 +2192,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm,
 
for (i = 0; i < ARRAY_SIZE(arg_types); ++i)
   if (LLVMGetTypeKind(arg_types[i]) == LLVMPointerTypeKind)
- LLVMAddAttribute(LLVMGetParam(variant_func, i),
-  LLVMNoAliasAttribute);
+ lp_add_function_attr(variant_func, i + 1, "noalias", 7);
 
context_ptr   = LLVMGetParam(variant_func, 0);
input_array   = LLVMGetParam(variant_func, 1);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
index f12e735..55afe6d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
@@ -120,13 +120,53 @@ lp_declare_intrinsic(LLVMModuleRef module,
 }
 
 
+#if HAVE_LLVM < 0x0400
+static LLVMAttribute str_to_attr(const char *attr_name, unsigned attr_len)
+{
+   if (!strncmp("alwaysinline", attr_name, attr_len)) {
+  return LLVMAlwaysInlineAttribute;
+   } else if (!strncmp("byval", attr_name, attr_len)) {
+  return LLVMByValAttribute;
+   } else if (!strncmp("inreg", attr_name, attr_len)) {
+  return LLVMInRegAttribute;
+   } else if (!strncmp("noalias", attr_name, attr_len)) {
+  return LLVMNoAliasAttribute;
+   } else if (!strncmp("readnone", attr_name, attr_len)) {
+  return LLVMReadNoneAttribute;
+   } else if (!strncmp("readonly", attr_name, attr_len)) {
+  return LLVMReadOnlyAttribute;
+   } else {
+  _debug_printf("Unhandled function attribute: %s\n", attr_name);
+  return 0;
+   }
+}
+#endif
+
+void
+lp_add_function_attr(LLVMValueRef function,
+ unsigned attr_idx,
+ const char *attr_name,
+ unsigned attr_len)
+{
+
+#if HAVE_LLVM < 0x0400
+   LLVMAttribute attr = str_to_attr(attr_name, attr_len);
+   LLVMAddFunctionAttr(function, attr);
+#else
+   LLVMContextRef context = LLVMGetModuleContext(LLVMGetGlobalParent(function));
+   unsigned kind_id = LLVMGetEnumAttributeKindForName(attr_name, attr_len);
+   LLVMAttributeRef attr = LLVMCreateEnumAttribute(context, kind_id, 0);
+   LLVMAddAttributeAtIndex(function, attr_idx, attr);
+#endif
+}
+
 LLVMValueRef
 lp_build_intrinsic(LLVMBuilderRef builder,
const char *name,
LLVMTypeRef ret_type,
LLVMValueRef *args,
unsigned num_args,
-   LLVMAttribute attr)
+   const char *attr_str)
 {
LLVMModuleRef module = LLVMGetGlobalParent(LLVMGetBasicBlockParent(LLVMGetInsertBlock(builder)));
LLVMValueRef function;
@@ -145,10 +185,14 @@ lp_build_intrinsic(LLVMBuilderRef builder,
 
  function = lp_declare_intrinsic(module, name, ret_type, arg_types, num_args);
 
+  if (attr_str) {
+ lp_add_function_attr(function, -1, attr_str, strlen(attr_str));
+  }
+
   /* NoUnwind indicates that the intrinsic never raises a C++ exception.
* Set it for all intrinsics.
*/
-  LLVMAddFunctionAttr(function, attr | LLVMNoUnwindAttribute);
+  lp_add_function_attr(function, -1, "nounwind", 8);
 
   if (gallivm_debug & GALLIVM_DEBUG_IR) {
  lp_debug_dump_value(function);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.h b/src/gallium/auxiliary/gallivm/lp_bld_intr.h
index 7d80ac2..b4558dc 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_intr.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.h
@@ -60,13 +60,24 @@ lp_declare_intrinsic(LLVMModuleRef module,
  LLVMTypeRef *arg_types,
  unsigned num_args);
 
+void
+lp_remove_attr(LLVMValueRef value,
+   const char *attr_name,
+   unsigned 

Re: [Mesa-dev] nir/radv: workaround broken kilp support

2016-11-02 Thread Tom Stellard
On Wed, Nov 02, 2016 at 11:26:08AM +1000, Dave Airlie wrote:
> So it appears at least the LLVM 3.9 backend can get confused
> when it gets hit with if (cond) discard type constructs, and we
> have a GLSL optimisation to convert this to discard_if, so I've
> ported that to NIR, and enabled it for radv. It fixes the hangs
> and the tests here.
> 

Does this work correctly with llvm master?

-Tom


> Dave.
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radv: Use new image load/store intrinsic signatures v2

2016-10-13 Thread Tom Stellard
These were changed in LLVM r284024.

v2:
  - Only use float types for vdata of llvm.amdgcn.image.store.  LLVM doesn't
support integer types for this intrinsic.
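A minimal sketch of the two naming schemes that get_image_intr_name() switches between is shown below. It works on suffix strings instead of LLVM types. The helper name image_intr_name() and the example types (v4f32 data, v2i32 coordinates, v8i32 descriptor) are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring get_image_intr_name(): returns the length of
 * the generated name, or -1 if it did not fit into out. */
static int image_intr_name(const char *base, int new_signature,
                           const char *data, const char *coords,
                           const char *rsrc, char *out, size_t out_len)
{
    int n;

    assert(base && coords && out && out_len > 0);

    if (!new_signature) {
        /* LLVM <= 3.9: overloaded on the coordinate type only. */
        n = snprintf(out, out_len, "%s.%s", base, coords);
    } else {
        /* After LLVM r284024: overloaded on data, coords and resource types. */
        assert(data && rsrc);
        n = snprintf(out, out_len, "%s.%s.%s.%s", base, data, coords, rsrc);
    }
    /* snprintf returns the length it wanted to write, so n >= out_len
     * means the name was truncated. */
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

With those example types the new-style name "llvm.amdgcn.image.store.v4f32.v2i32.v8i32" is 41 characters, which no longer fits in the 32-byte intrinsic_name buffers that the patch replaces with 64-byte ones.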
---
 src/amd/common/ac_nir_to_llvm.c | 133 
 1 file changed, 108 insertions(+), 25 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 9c764c7..56814ec 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2296,13 +2296,73 @@ static LLVMValueRef get_image_coords(struct nir_to_llvm_context *ctx,
return res;
 }
 
+static void build_type_name_for_intr(
+LLVMTypeRef type,
+char *buf, unsigned bufsize)
+{
+LLVMTypeRef elem_type = type;
+
+assert(bufsize >= 8);
+
+if (LLVMGetTypeKind(type) == LLVMVectorTypeKind) {
+int ret = snprintf(buf, bufsize, "v%u",
+LLVMGetVectorSize(type));
+if (ret < 0) {
+char *type_name = LLVMPrintTypeToString(type);
+fprintf(stderr, "Error building type name for: %s\n",
+type_name);
+return;
+}
+elem_type = LLVMGetElementType(type);
+buf += ret;
+bufsize -= ret;
+}
+switch (LLVMGetTypeKind(elem_type)) {
+default: break;
+case LLVMIntegerTypeKind:
+snprintf(buf, bufsize, "i%d", LLVMGetIntTypeWidth(elem_type));
+break;
+case LLVMFloatTypeKind:
+snprintf(buf, bufsize, "f32");
+break;
+case LLVMDoubleTypeKind:
+snprintf(buf, bufsize, "f64");
+break;
+}
+}
+
+static void get_image_intr_name(const char *base_name,
+LLVMTypeRef data_type,
+LLVMTypeRef coords_type,
+LLVMTypeRef rsrc_type,
+char *out_name, unsigned out_len)
+{
+char coords_type_name[8];
+
+build_type_name_for_intr(coords_type, coords_type_name,
+sizeof(coords_type_name));
+
+if (HAVE_LLVM <= 0x0309) {
+snprintf(out_name, out_len, "%s.%s", base_name, coords_type_name);
+} else {
+char data_type_name[8];
+char rsrc_type_name[8];
+
+build_type_name_for_intr(data_type, data_type_name,
+sizeof(data_type_name));
+build_type_name_for_intr(rsrc_type, rsrc_type_name,
+sizeof(rsrc_type_name));
+snprintf(out_name, out_len, "%s.%s.%s.%s", base_name,
+ data_type_name, coords_type_name, rsrc_type_name);
+}
+}
+
 static LLVMValueRef visit_image_load(struct nir_to_llvm_context *ctx,
 nir_intrinsic_instr *instr)
 {
LLVMValueRef params[7];
LLVMValueRef res;
-   char intrinsic_name[32];
-   char coords_type[8];
+   char intrinsic_name[64];
const nir_variable *var = instr->variables[0]->var;
const struct glsl_type *type = var->type;
if(instr->variables[0]->deref.child)
@@ -2322,23 +2382,35 @@ static LLVMValueRef visit_image_load(struct nir_to_llvm_context *ctx,
res = trim_vector(ctx, res, instr->dest.ssa.num_components);
res = to_integer(ctx, res);
} else {
-   bool da = glsl_sampler_type_is_array(type) ||
- glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
+   bool is_da = glsl_sampler_type_is_array(type) ||
+glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
bool add_frag_pos = glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_SUBPASS;
+   LLVMValueRef da = is_da ? ctx->i32one : ctx->i32zero;
+   LLVMValueRef glc = LLVMConstInt(ctx->i1, 0, false);
+   LLVMValueRef slc = LLVMConstInt(ctx->i1, 0, false);
 
params[0] = get_image_coords(ctx, instr, add_frag_pos);
params[1] = get_sampler_desc(ctx, instr->variables[0], DESC_IMAGE);
params[2] = LLVMConstInt(ctx->i32, 15, false); /* dmask */
-   params[3] = LLVMConstInt(ctx->i1, 0, false);  /* r128 */
-   params[4] = da ? ctx->i32one : ctx->i32zero; /* da */
-   params[5] = LLVMConstInt(ctx->i1, 0, false);  /* glc */
-   params[6] = LLVMConstInt(ctx->i1, 0, false);  /* slc */
+   if (HAVE_LLVM <= 0x0309) {
+   params[3] = LLVMConstInt(ctx->i1, 0, false);  /* r128 */
+   params[4] = da;
+   params[5] = glc;
+   params[6] = slc;
+   } else {
+

Re: [Mesa-dev] [PATCH 2/2] radv: Use new image load/store intrinsic signatures

2016-10-13 Thread Tom Stellard
On Thu, Oct 13, 2016 at 07:20:30PM +0200, Kai Wasserbäch wrote:
> Dear Tom,
> just FYI: this fails to apply on top of master
> (761388a0eb586b1dcaec063ee561056ed132dc1a). git am chokes on the
> visit_image_store() hunk for me. Attached is a "refreshed" version, which
> applies for me. I hope I didn't butcher anything inadvertently.
> 

Hi,

I just sent rebased patches. Can you try those?

-Tom

> Cheers,
> Kai
> 
> 
> Tom Stellard wrote on 13.10.2016 17:21:
> > These were changed in LLVM r284024.
> > ---
> >  src/amd/common/ac_nir_to_llvm.c | 131
> >  1 file changed, 107 insertions(+), 24 deletions(-)
> > 
> > diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> > index 9c764c7..4fba7d3 100644
> > --- a/src/amd/common/ac_nir_to_llvm.c
> > +++ b/src/amd/common/ac_nir_to_llvm.c
> > @@ -2296,13 +2296,73 @@ static LLVMValueRef get_image_coords(struct nir_to_llvm_context *ctx,
> > return res;
> >  }
> >  
> > +static void build_type_name_for_intr(
> > +LLVMTypeRef type,
> > +char *buf, unsigned bufsize)
> > +{
> > +LLVMTypeRef elem_type = type;
> > +
> > +assert(bufsize >= 8);
> > +
> > +if (LLVMGetTypeKind(type) == LLVMVectorTypeKind) {
> > +int ret = snprintf(buf, bufsize, "v%u",
> > +LLVMGetVectorSize(type));
> > +if (ret < 0) {
> > +char *type_name = LLVMPrintTypeToString(type);
> > +fprintf(stderr, "Error building type name for: %s\n",
> > +type_name);
> > +return;
> > +}
> > +elem_type = LLVMGetElementType(type);
> > +buf += ret;
> > +bufsize -= ret;
> > +}
> > +switch (LLVMGetTypeKind(elem_type)) {
> > +default: break;
> > +case LLVMIntegerTypeKind:
> > +snprintf(buf, bufsize, "i%d", LLVMGetIntTypeWidth(elem_type));
> > +break;
> > +case LLVMFloatTypeKind:
> > +snprintf(buf, bufsize, "f32");
> > +break;
> > +case LLVMDoubleTypeKind:
> > +snprintf(buf, bufsize, "f64");
> > +break;
> > +}
> > +}
> > +
> > +static void get_image_intr_name(const char *base_name,
> > +LLVMTypeRef data_type,
> > +LLVMTypeRef coords_type,
> > +LLVMTypeRef rsrc_type,
> > +char *out_name, unsigned out_len)
> > +{
> > +char coords_type_name[8];
> > +
> > +build_type_name_for_intr(coords_type, coords_type_name,
> > +sizeof(coords_type_name));
> > +
> > +if (HAVE_LLVM <= 0x0309) {
> > +snprintf(out_name, out_len, "%s.%s", base_name, coords_type_name);
> > +} else {
> > +char data_type_name[8];
> > +char rsrc_type_name[8];
> > +
> > +build_type_name_for_intr(data_type, data_type_name,
> > +sizeof(data_type_name));
> > +build_type_name_for_intr(rsrc_type, rsrc_type_name,
> > +sizeof(rsrc_type_name));
> > +snprintf(out_name, out_len, "%s.%s.%s.%s", base_name,
> > + data_type_name, coords_type_name, rsrc_type_name);
> > +}
> > +}
> > +
> >  static LLVMValueRef visit_image_load(struct nir_to_llvm_context *ctx,
> >  nir_intrinsic_instr *instr)
> >  {
> > LLVMValueRef params[7];
> > LLVMValueRef res;
> > -   char intrinsic_name[32];
> > -   char coords_type[8];
> > +   char intrinsic_name[64];
> > const nir_variable *var = instr->variables[0]->var;
> > const struct glsl_type *type = var->type;
> > if(instr->variables[0]->deref.child)
> > @@ -2322,23 +2382,35 @@ static LLVMValueRef visit_image_load(struct nir_to_llvm_context *ctx,
> > res = trim_vector(ctx, res, instr->dest.ssa.num_components);
> > res = to_integer(ctx, res);
> 

[Mesa-dev] [PATCH 1/2] radv: Fix incorrect comment

2016-10-13 Thread Tom Stellard
---
 src/amd/common/ac_nir_to_llvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index e6ff7c8..9c764c7 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2373,8 +2373,8 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
bool da = glsl_sampler_type_is_array(type) ||
  glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
 
-   params[0] = get_src(ctx, instr->src[2]); /* coords */
-   params[1] = get_image_coords(ctx, instr, false);
+   params[0] = get_src(ctx, instr->src[2]);
+   params[1] = get_image_coords(ctx, instr, false); /* coords */
params[2] = get_sampler_desc(ctx, instr->variables[0], DESC_IMAGE);
params[3] = LLVMConstInt(ctx->i32, 15, false); /* dmask */
params[4] = i1false;  /* r128 */
-- 
2.7.4



[Mesa-dev] [PATCH 2/2] radv: Use new image load/store intrinsic signatures

2016-10-13 Thread Tom Stellard
These were changed in LLVM r284024.
---
 src/amd/common/ac_nir_to_llvm.c | 131 
 1 file changed, 107 insertions(+), 24 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 9c764c7..4fba7d3 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2296,13 +2296,73 @@ static LLVMValueRef get_image_coords(struct nir_to_llvm_context *ctx,
return res;
 }
 
+static void build_type_name_for_intr(
+LLVMTypeRef type,
+char *buf, unsigned bufsize)
+{
+LLVMTypeRef elem_type = type;
+
+assert(bufsize >= 8);
+
+if (LLVMGetTypeKind(type) == LLVMVectorTypeKind) {
+int ret = snprintf(buf, bufsize, "v%u",
+LLVMGetVectorSize(type));
+if (ret < 0) {
+char *type_name = LLVMPrintTypeToString(type);
+fprintf(stderr, "Error building type name for: %s\n",
+type_name);
+return;
+}
+elem_type = LLVMGetElementType(type);
+buf += ret;
+bufsize -= ret;
+}
+switch (LLVMGetTypeKind(elem_type)) {
+default: break;
+case LLVMIntegerTypeKind:
+snprintf(buf, bufsize, "i%d", LLVMGetIntTypeWidth(elem_type));
+break;
+case LLVMFloatTypeKind:
+snprintf(buf, bufsize, "f32");
+break;
+case LLVMDoubleTypeKind:
+snprintf(buf, bufsize, "f64");
+break;
+}
+}
+
+static void get_image_intr_name(const char *base_name,
+LLVMTypeRef data_type,
+LLVMTypeRef coords_type,
+LLVMTypeRef rsrc_type,
+char *out_name, unsigned out_len)
+{
+char coords_type_name[8];
+
+build_type_name_for_intr(coords_type, coords_type_name,
+sizeof(coords_type_name));
+
+if (HAVE_LLVM <= 0x0309) {
+snprintf(out_name, out_len, "%s.%s", base_name, coords_type_name);
+} else {
+char data_type_name[8];
+char rsrc_type_name[8];
+
+build_type_name_for_intr(data_type, data_type_name,
+sizeof(data_type_name));
+build_type_name_for_intr(rsrc_type, rsrc_type_name,
+sizeof(rsrc_type_name));
+snprintf(out_name, out_len, "%s.%s.%s.%s", base_name,
+ data_type_name, coords_type_name, rsrc_type_name);
+}
+}
+
 static LLVMValueRef visit_image_load(struct nir_to_llvm_context *ctx,
 nir_intrinsic_instr *instr)
 {
LLVMValueRef params[7];
LLVMValueRef res;
-   char intrinsic_name[32];
-   char coords_type[8];
+   char intrinsic_name[64];
const nir_variable *var = instr->variables[0]->var;
const struct glsl_type *type = var->type;
if(instr->variables[0]->deref.child)
@@ -2322,23 +2382,35 @@ static LLVMValueRef visit_image_load(struct nir_to_llvm_context *ctx,
res = trim_vector(ctx, res, instr->dest.ssa.num_components);
res = to_integer(ctx, res);
} else {
-   bool da = glsl_sampler_type_is_array(type) ||
- glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
+   bool is_da = glsl_sampler_type_is_array(type) ||
+glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
bool add_frag_pos = glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_SUBPASS;
+   LLVMValueRef da = is_da ? ctx->i32one : ctx->i32zero;
+   LLVMValueRef glc = LLVMConstInt(ctx->i1, 0, false);
+   LLVMValueRef slc = LLVMConstInt(ctx->i1, 0, false);
 
params[0] = get_image_coords(ctx, instr, add_frag_pos);
params[1] = get_sampler_desc(ctx, instr->variables[0], DESC_IMAGE);
params[2] = LLVMConstInt(ctx->i32, 15, false); /* dmask */
-   params[3] = LLVMConstInt(ctx->i1, 0, false);  /* r128 */
-   params[4] = da ? ctx->i32one : ctx->i32zero; /* da */
-   params[5] = LLVMConstInt(ctx->i1, 0, false);  /* glc */
-   params[6] = LLVMConstInt(ctx->i1, 0, false);  /* slc */
+   if (HAVE_LLVM <= 0x0309) {
+   params[3] = LLVMConstInt(ctx->i1, 0, false);  /* r128 */
+   params[4] = da;
+   params[5] = glc;
+   params[6] = slc;
+   } else {
+   LLVMValueRef lwe = LLVMConstInt(ctx->i1, 0, false);
+   params[3] = glc;
+   


Re: [Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.

2016-10-11 Thread Tom Stellard
On Tue, Oct 11, 2016 at 03:21:24PM +0200, Nicolai Hähnle wrote:
> On 11.10.2016 07:36, Dave Airlie wrote:
> > On 11 October 2016 at 12:13, Dave Airlie  wrote:
> >> On 11 October 2016 at 11:42, Dave Airlie  wrote:
> >>> On 11 October 2016 at 05:50, Dave Airlie  wrote:
>  On 10 October 2016 at 21:45, Arsenault, Matthew
>   wrote:
> > I don't like adding explicit IR arguments for ABI arguments, especially this
> > one. Adding a special case for the first index feels dirty. The rest of llvm
> > also won't be aware of the specialness of the argument. It would be
> > problematic because bugpoint would eliminate the unused argument and then
> > codegen would have to fail in some way when the argument is missing.
> That's a good point, but is there an alternative without burning two 
> userdata SGPRs?
> 
> One possibility is to define an ABI that says:
> 
> 1. SGPR0/1 points to an extra data region; it is reserved independently 
> from the shader arguments.
> 2. The first 64 bits of that extra data region point to the scratch buffer.
> 3. The main shader code can retrieve SGPR0/1 using an intrinsic.
> 
> This can be made to look somewhat similar to what HSA does.
> 

What if we stored all shader inputs in the 'extra data region', with an
ABI that defined fixed offsets in the 'extra data region' for each
input.

Then as an optimization we could have the compiler map the values that
it needed from the 'extra data region' into user sgprs and communicate
this back to the driver.

This gets us something that works very quickly and still allows us to do
optimizations in the future.

-Tom
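The combination discussed above (an extra data region whose first 64 bits hold the scratch address, plus fixed per-input offsets that the compiler may optionally preload into user SGPRs) could look roughly like this on the driver side. This is a hypothetical sketch: the struct layout beyond the scratch pointer, struct user_sgpr_mapping and fill_user_sgprs() are invented for illustration and are not part of radeonsi, radv or LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the "extra data region" that SGPR0/1 point to.
 * Every input has a fixed offset, so the shader can always load it from memory. */
struct extra_data_region {
    uint64_t scratch_va;       /* first 64 bits: scratch buffer address */
    uint64_t const_buffers_va; /* illustrative inputs with fixed offsets */
    uint64_t images_va;
    uint32_t base_vertex;
    uint32_t start_instance;
};

#define MAX_PRELOADED 8

/* Hypothetical report from the compiler: the inputs it decided to preload
 * into user SGPRs, in SGPR order, as (offset, size) pairs into the region. */
struct user_sgpr_mapping {
    unsigned num_entries;
    struct {
        uint32_t region_offset; /* byte offset into struct extra_data_region */
        uint32_t size_dwords;
    } entry[MAX_PRELOADED];
};

/* Driver side: copy the preloaded inputs into the user SGPR values it emits.
 * Returns the number of SGPRs written (at most max_sgprs). */
static unsigned fill_user_sgprs(const struct extra_data_region *region,
                                const struct user_sgpr_mapping *map,
                                uint32_t *sgprs, unsigned max_sgprs)
{
    const unsigned char *base = (const unsigned char *)region;
    unsigned n = 0;

    assert(map->num_entries <= MAX_PRELOADED);
    for (unsigned i = 0; i < map->num_entries; i++) {
        for (unsigned d = 0; d < map->entry[i].size_dwords; d++) {
            assert(map->entry[i].region_offset + 4 * (d + 1) <= sizeof(*region));
            if (n == max_sgprs)
                return n;
            memcpy(&sgprs[n++], base + map->entry[i].region_offset + 4 * d, 4);
        }
    }
    return n;
}
```

Because every input keeps a fixed offset, a driver could start with an empty mapping (the shader loads everything from the region) and let the compiler add preloads later, which is the incremental path described in the reply.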

> 
>  We should just hardcode the behaviour and switch both radv/radeonsi
>  over in one go?
> 
>  I'll try and code up, using the first 64-bits of the first buffer
>  pointed to by userdata 0/1,
>  to store things.
> >>>
> >>> I've looked at doing a dword fetch from the first two words of the 0/1 userdata.
> >>>
> >>> It's not optimal for Vulkan unfortunately, since the idea I had was that per
> >>> command buffer I just allocate one scratch buffer of the size required at the
> >>> end, and patch it in at the start of the command buffer. However, in the first
> >>> slot I was going to use the push constants/dynamic buffer to store the value,
> >>> and it looks like I need to keep a list of every one of these buffers I emit
> >>> and backpatch them all. It might not be too insane, just a slight bump in the
> >>> keeping it simple.
> >>
> >> I'm probably losing the plot here, but I'm considering a double indirection:
> >>
> >> we load the 64-bit address from the first two dwords, then load the
> >> 64-bit value stored at that address.
> >>
> >> This saves me allocating scratch bo's for secondary command buffers,
> >> and also having to allocate ever-increasing scratch bo's as shaders that
> >> need more scratch get bound to the pipeline.
> >> I'm not sure how much of an effect this should have for GL though.
> >
> > I've posted a patch to this effect to the llvm phabricator.
> >
> > It definitely is cleaner for the radv driver.
> 
> I still think it would be nice to have the level of indirection or 
> whatever one wants to call it as a function attribute. This would allow 
> you to change your mind about e.g. just sticking the scratch pointer 
> directly into SGPR0/1. radeonsi and radv don't have to be identical in 
> that regard.
> 
> Cheers
> Nicolai


[Mesa-dev] [PATCH 1/3] radeonsi: Refactor image store/load intrinsic name creation

2016-10-11 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_shader.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 49d4121..8254cb2 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3738,6 +3738,18 @@ static void load_emit_memory(
emit_data->output[emit_data->chan] = lp_build_gather_values(gallivm, channels, 4);
 }
 
+static void get_image_intr_name(const char *base_name,
+   LLVMTypeRef coords_type,
+   char *out_name, unsigned out_len)
+{
+   char coords_type_name[8];
+
+   build_int_type_name(coords_type, coords_type_name,
+   sizeof(coords_type_name));
+
+   snprintf(out_name, out_len, "%s.%s", base_name, coords_type_name);
+}
+
 static void load_emit(
const struct lp_build_tgsi_action *action,
struct lp_build_tgsi_context *bld_base,
@@ -3748,7 +3760,6 @@ static void load_emit(
LLVMBuilderRef builder = gallivm->builder;
const struct tgsi_full_instruction * inst = emit_data->inst;
char intrinsic_name[32];
-   char coords_type[8];
 
if (inst->Src[0].Register.File == TGSI_FILE_MEMORY) {
load_emit_memory(ctx, emit_data);
@@ -3770,11 +3781,9 @@ static void load_emit(
emit_data->args, emit_data->arg_count,
LLVMReadOnlyAttribute);
} else {
-   build_int_type_name(LLVMTypeOf(emit_data->args[0]),
-   coords_type, sizeof(coords_type));
-
-   snprintf(intrinsic_name, sizeof(intrinsic_name),
-"llvm.amdgcn.image.load.%s", coords_type);
+   get_image_intr_name("llvm.amdgcn.image.load",
+   LLVMTypeOf(emit_data->args[0]),
+   intrinsic_name, sizeof(intrinsic_name));
 
emit_data->output[emit_data->chan] =
lp_build_intrinsic(
@@ -3951,7 +3960,6 @@ static void store_emit(
const struct tgsi_full_instruction * inst = emit_data->inst;
unsigned target = inst->Memory.Texture;
char intrinsic_name[32];
-   char coords_type[8];
 
if (inst->Dst[0].Register.File == TGSI_FILE_MEMORY) {
store_emit_memory(ctx, emit_data);
@@ -3972,10 +3980,9 @@ static void store_emit(
emit_data->dst_type, emit_data->args,
emit_data->arg_count, 0);
} else {
-   build_int_type_name(LLVMTypeOf(emit_data->args[1]),
-   coords_type, sizeof(coords_type));
-   snprintf(intrinsic_name, sizeof(intrinsic_name),
-"llvm.amdgcn.image.store.%s", coords_type);
+   get_image_intr_name("llvm.amdgcn.image.store",
+   LLVMTypeOf(emit_data->args[1]),
+   intrinsic_name, sizeof(intrinsic_name));
 
emit_data->output[emit_data->chan] =
lp_build_intrinsic(
-- 
2.7.4



[Mesa-dev] [PATCH 3/3] radeonsi: Use the new image load/store intrinsic signatures

2016-10-11 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_shader.c | 59 +---
 1 file changed, 46 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 4e07317..1f1fdf2 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3575,16 +3575,29 @@ static void image_append_args(
const struct tgsi_full_instruction *inst = emit_data->inst;
LLVMValueRef i1false = LLVMConstInt(ctx->i1, 0, 0);
LLVMValueRef i1true = LLVMConstInt(ctx->i1, 1, 0);
-
-   emit_data->args[emit_data->arg_count++] = i1false; /* r128 */
-   emit_data->args[emit_data->arg_count++] =
-   tgsi_is_array_image(target) ? i1true : i1false; /* da */
-   if (!atomic) {
-   emit_data->args[emit_data->arg_count++] =
-   inst->Memory.Qualifier & (TGSI_MEMORY_COHERENT | TGSI_MEMORY_VOLATILE) ?
-   i1true : i1false; /* glc */
+   LLVMValueRef r128 = i1false;
+   LLVMValueRef da = tgsi_is_array_image(target) ? i1true : i1false;
+   LLVMValueRef glc =
+   inst->Memory.Qualifier & (TGSI_MEMORY_COHERENT | TGSI_MEMORY_VOLATILE) ?
+   i1true : i1false;
+   LLVMValueRef slc = i1false;
+   LLVMValueRef lwe = i1false;
+
+   if (atomic || (HAVE_LLVM <= 0x0309)) {
+   emit_data->args[emit_data->arg_count++] = r128;
+   emit_data->args[emit_data->arg_count++] = da;
+   if (!atomic) {
+   emit_data->args[emit_data->arg_count++] = glc;
+   }
+   emit_data->args[emit_data->arg_count++] = slc;
+   return;
}
-   emit_data->args[emit_data->arg_count++] = i1false; /* slc */
+
+   /* HAVE_LLVM >= 0x0400 */
+   emit_data->args[emit_data->arg_count++] = glc;
+   emit_data->args[emit_data->arg_count++] = slc;
+   emit_data->args[emit_data->arg_count++] = lwe;
+   emit_data->args[emit_data->arg_count++] = da;
 }
 
 /**
@@ -3761,7 +3774,9 @@ static void load_emit_memory(
 }
 
 static void get_image_intr_name(const char *base_name,
+   LLVMTypeRef data_type,
LLVMTypeRef coords_type,
+   LLVMTypeRef rsrc_type,
char *out_name, unsigned out_len)
 {
char coords_type_name[8];
@@ -3769,7 +3784,21 @@ static void get_image_intr_name(const char *base_name,
build_type_name_for_intr(coords_type, coords_type_name,
sizeof(coords_type_name));
 
+#if HAVE_LLVM <= 0x0309
snprintf(out_name, out_len, "%s.%s", base_name, coords_type_name);
+#else
+   {
+   char data_type_name[8];
+   char rsrc_type_name[8];
+
+   build_type_name_for_intr(data_type, data_type_name,
+   sizeof(data_type_name));
+   build_type_name_for_intr(rsrc_type, rsrc_type_name,
+   sizeof(rsrc_type_name));
+   snprintf(out_name, out_len, "%s.%s.%s.%s", base_name,
+data_type_name, coords_type_name, rsrc_type_name);
+   }
+#endif
 }
 
 static void load_emit(
@@ -3781,7 +3810,7 @@ static void load_emit(
struct gallivm_state *gallivm = bld_base->base.gallivm;
LLVMBuilderRef builder = gallivm->builder;
const struct tgsi_full_instruction * inst = emit_data->inst;
-   char intrinsic_name[32];
+   char intrinsic_name[64];
 
if (inst->Src[0].Register.File == TGSI_FILE_MEMORY) {
load_emit_memory(ctx, emit_data);
@@ -3804,7 +3833,9 @@ static void load_emit(
LLVMReadOnlyAttribute);
} else {
get_image_intr_name("llvm.amdgcn.image.load",
-   LLVMTypeOf(emit_data->args[0]),
+   emit_data->dst_type,/* vdata */
+   LLVMTypeOf(emit_data->args[0]), /* coords */
+   LLVMTypeOf(emit_data->args[1]), /* rsrc */
intrinsic_name, sizeof(intrinsic_name));
 
emit_data->output[emit_data->chan] =
@@ -3981,7 +4012,7 @@ static void store_emit(
LLVMBuilderRef builder = gallivm->builder;
const struct tgsi_full_instruction * inst = emit_data->inst;
unsigned target = inst->Memory.Texture;
-   char intrinsic_name[32];
+   char intrinsic_name[64];
 
if (inst->Dst[0].Register.File == TGSI_FILE_MEMORY) {
store_emit_memory(ctx, emit_data);
@@ -4003,7 +4034,9 @@ static void store_emit(
emit_data->arg_count, 0);
} else {
get_image_intr_name("llvm.amdgcn.image.store",
-   LLVMTypeOf(emit_data->args[1]),
+

[Mesa-dev] [PATCH 2/3] radeonsi: Add function for converting LLVM type to intrinsic string

2016-10-11 Thread Tom Stellard
The existing function only worked for integer types.
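The generalized helper has to produce suffixes such as v4f32 as well as the old i32/vNi32 forms. Its logic can be sketched without LLVM as shown below. struct type_desc, enum elem_kind and type_suffix() are hypothetical stand-ins for the LLVMGetTypeKind()/LLVMGetVectorSize()/LLVMGetIntTypeWidth() queries that the patch uses. Unlike build_type_name_for_intr(), this sketch also reports truncation, because snprintf() returns the length it would have written rather than the number of bytes it stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the element type kinds handled by the patch. */
enum elem_kind { ELEM_INT, ELEM_FLOAT, ELEM_DOUBLE };

struct type_desc {
    unsigned vector_size;  /* 0 for a scalar, N for an <N x elem> vector */
    enum elem_kind kind;
    unsigned int_width;    /* bit width, only meaningful for ELEM_INT */
};

/* Writes the overload suffix ("i32", "v4i32", "v4f32", ...) into buf.
 * Returns 0 on success, -1 on truncation or an unknown element kind. */
static int type_suffix(const struct type_desc *t, char *buf, size_t bufsize)
{
    int n = 0, m;

    assert(t != NULL && buf != NULL && bufsize > 0);

    if (t->vector_size) {
        n = snprintf(buf, bufsize, "v%u", t->vector_size);
        if (n < 0 || (size_t)n >= bufsize)
            return -1;
    }
    switch (t->kind) {
    case ELEM_INT:
        m = snprintf(buf + n, bufsize - n, "i%u", t->int_width);
        break;
    case ELEM_FLOAT:
        m = snprintf(buf + n, bufsize - n, "f32");
        break;
    case ELEM_DOUBLE:
        m = snprintf(buf + n, bufsize - n, "f64");
        break;
    default:
        return -1;
    }
    return (m < 0 || (size_t)m >= bufsize - n) ? -1 : 0;
}
```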
---
 src/gallium/drivers/radeonsi/si_shader.c | 42 
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 8254cb2..4e07317 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3347,17 +3347,39 @@ static LLVMValueRef get_buffer_size(
  * Given the i32 or vNi32 \p type, generate the textual name (e.g. for use with
  * intrinsic names).
  */
-static void build_int_type_name(
+static void build_type_name_for_intr(
LLVMTypeRef type,
char *buf, unsigned bufsize)
 {
-   assert(bufsize >= 6);
+   LLVMTypeRef elem_type = type;
 
-   if (LLVMGetTypeKind(type) == LLVMVectorTypeKind)
-   snprintf(buf, bufsize, "v%ui32",
-LLVMGetVectorSize(type));
-   else
-   strcpy(buf, "i32");
+   assert(bufsize >= 8);
+
+   if (LLVMGetTypeKind(type) == LLVMVectorTypeKind) {
+   int ret = snprintf(buf, bufsize, "v%u",
+   LLVMGetVectorSize(type));
+   if (ret < 0) {
+   char *type_name = LLVMPrintTypeToString(type);
+   fprintf(stderr, "Error building type name for: %s\n",
+   type_name);
+   return;
+   }
+   elem_type = LLVMGetElementType(type);
+   buf += ret;
+   bufsize -= ret;
+   }
+   switch (LLVMGetTypeKind(elem_type)) {
+   default: break;
+   case LLVMIntegerTypeKind:
+   snprintf(buf, bufsize, "i%d", LLVMGetIntTypeWidth(elem_type));
+   break;
+   case LLVMFloatTypeKind:
+   snprintf(buf, bufsize, "f32");
+   break;
+   case LLVMDoubleTypeKind:
+   snprintf(buf, bufsize, "f64");
+   break;
+   }
 }
 
 static void build_tex_intrinsic(const struct lp_build_tgsi_action *action,
@@ -3744,7 +3766,7 @@ static void get_image_intr_name(const char *base_name,
 {
char coords_type_name[8];
 
-   build_int_type_name(coords_type, coords_type_name,
+   build_type_name_for_intr(coords_type, coords_type_name,
sizeof(coords_type_name));
 
snprintf(out_name, out_len, "%s.%s", base_name, coords_type_name);
@@ -4144,7 +4166,7 @@ static void atomic_emit(
} else {
char coords_type[8];
 
-   build_int_type_name(LLVMTypeOf(emit_data->args[1]),
+   build_type_name_for_intr(LLVMTypeOf(emit_data->args[1]),
coords_type, sizeof(coords_type));
snprintf(intrinsic_name, sizeof(intrinsic_name),
 "llvm.amdgcn.image.atomic.%s.%s",
@@ -4918,7 +4940,7 @@ static void build_tex_intrinsic(const struct lp_build_tgsi_action *action,
}
 
/* Add the type and suffixes .c, .o if needed. */
-   build_int_type_name(LLVMTypeOf(emit_data->args[0]), type, sizeof(type));
+   build_type_name_for_intr(LLVMTypeOf(emit_data->args[0]), type, sizeof(type));
sprintf(intr_name, "%s%s%s%s.%s",
name, is_shadow ? ".c" : "", infix,
has_offset ? ".o" : "", type);
-- 
2.7.4



[Mesa-dev] [PATCH 2/2] radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v2

2016-09-13 Thread Tom Stellard
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:

https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md

The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.

The main changes in this patch are:
  - We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
  - Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.

Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically.  However, in Mesa we let the driver do
this work.  The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.
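The driver's side of the new ABI, filling in the grid and block sizes that the CP firmware would otherwise write into the dispatch packet, can be sketched as follows. The struct is a local mirror of the 64-byte hsa_kernel_dispatch_packet_t from the HSA runtime headers, reproduced so the example compiles without them. fill_dispatch_sizes() and its block[]/grid[] convention (work-group size and number of work-groups) are assumptions for illustration, not the code in si_compute.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the HSA kernel dispatch packet layout. */
struct dispatch_packet {
    uint16_t header;
    uint16_t setup;               /* bits 0-1: number of grid dimensions */
    uint16_t workgroup_size_x;
    uint16_t workgroup_size_y;
    uint16_t workgroup_size_z;
    uint16_t reserved0;
    uint32_t grid_size_x;         /* grid sizes count work-items */
    uint32_t grid_size_y;
    uint32_t grid_size_z;
    uint32_t private_segment_size;
    uint32_t group_segment_size;
    uint64_t kernel_object;
    uint64_t kernarg_address;
    uint64_t reserved2;
    uint64_t completion_signal;
};

_Static_assert(sizeof(struct dispatch_packet) == 64,
               "an HSA dispatch packet is 64 bytes");

/* Hypothetical helper: fills only the launch-size fields; a real driver
 * would also set the kernel object, kernarg address and so on. */
static void fill_dispatch_sizes(struct dispatch_packet *pkt,
                                const unsigned block[3],
                                const unsigned grid[3])
{
    memset(pkt, 0, sizeof(*pkt));

    /* Always describe a 3D grid; unused dimensions have size 1. */
    pkt->setup = 3;

    /* Work-group sizes are 16-bit fields in the packet. */
    for (int i = 0; i < 3; i++)
        assert(block[i] >= 1 && block[i] <= UINT16_MAX);
    pkt->workgroup_size_x = (uint16_t)block[0];
    pkt->workgroup_size_y = (uint16_t)block[1];
    pkt->workgroup_size_z = (uint16_t)block[2];

    /* The grid is measured in work-items, so scale by the work-group size. */
    pkt->grid_size_x = block[0] * grid[0];
    pkt->grid_size_y = block[1] * grid[1];
    pkt->grid_size_z = block[2] * grid[2];
}
```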

v2:
  - Add comments explaining why we are setting certain bits of the scratch
resource descriptor.
---
 src/gallium/drivers/radeon/r600_pipe_common.c|   6 +-
 src/gallium/drivers/radeonsi/amd_kernel_code_t.h | 534 +++
 src/gallium/drivers/radeonsi/si_compute.c| 236 +-
 3 files changed, 758 insertions(+), 18 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/amd_kernel_code_t.h

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 6d7cc1b..8f17f36 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -822,7 +822,11 @@ static int r600_get_compute_param(struct pipe_screen *screen,
if (rscreen->family <= CHIP_ARUBA) {
triple = "r600--";
} else {
-   triple = "amdgcn--";
+   if (HAVE_LLVM < 0x0400) {
+   triple = "amdgcn--";
+   } else {
+   triple = "amdgcn--mesa3d";
+   }
}
switch(rscreen->family) {
/* Clang < 3.6 is missing Hainan in its list of
diff --git a/src/gallium/drivers/radeonsi/amd_kernel_code_t.h b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
new file mode 100644
index 000..d0d7809
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
@@ -0,0 +1,534 @@
+/*
+ * Copyright 2015,2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * on the rights to use, copy, modify, merge, publish, distribute, sub
+ * license, and/or sell copies of the Software, and to permit persons to whom
+ * the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDKERNELCODET_H
+#define AMDKERNELCODET_H
+
+//---//
+// AMD Kernel Code, and its dependencies //
+//---//
+
+// Sets val bits for specified mask in specified dst packed instance.
+#define AMD_HSA_BITS_SET(dst, mask, val) \
+  dst &= (~(1 << mask ## _SHIFT) & ~mask); \
+  dst |= (((val) << mask ## _SHIFT) & mask)
+
+// Gets bits for specified mask from specified src packed instance.
+#define AMD_HSA_BITS_GET(src, mask) \
+  ((src & mask) >> mask ## _SHIFT) \
+
+/* Every amd_*_code_t has the following properties, which are composed of
+ * a number of bit fields. Every bit field has a mask (AMD_CODE_PROPERTY_*),
+ * bit width (AMD_CODE_PROPERTY_*_WIDTH, and bit shift amount
+ * (AMD_CODE_PROPERTY_*_SHIFT) for convenient access. Unused bits must be 0.
+ *
+ * (Note that bit fields cannot be used as their layout is
+ * implementation defined in the C standard and so cannot be used to
+ * specify an ABI)
+ */
+enum amd_code_property_mask_t {
+
+  /* Enable the setup of the SGPR user data registers
+   * 
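
The mask, width and shift convention described in the header comment can be illustrated with a small sketch. It assumes a hypothetical 2-bit field. `EXAMPLE_FIELD*` and `EXAMPLE_BITS_*` are illustrative names and are not part of amd_kernel_code_t.h. This SET macro is a single expression and clears only `~mask`. For any field of width 1 or more, `~mask` already covers the `1 << SHIFT` bit that the original macro also clears.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit field in bits 3-4, defined the same way as the
 * header's AMD_CODE_PROPERTY_* fields: a _SHIFT, a _WIDTH and a mask. */
#define EXAMPLE_FIELD_SHIFT 3
#define EXAMPLE_FIELD_WIDTH 2
#define EXAMPLE_FIELD (((1u << EXAMPLE_FIELD_WIDTH) - 1) << EXAMPLE_FIELD_SHIFT)

static_assert(EXAMPLE_FIELD == 0x18u, "mask covers bits 3-4");

/* Simplified equivalents of AMD_HSA_BITS_SET/GET. As in the originals,
 * mask##_SHIFT pastes the mask's name to find its shift constant, so the
 * mask argument must be the bare macro name. */
#define EXAMPLE_BITS_SET(dst, mask, val) \
	((dst) = ((dst) & ~(mask)) | (((val) << mask##_SHIFT) & (mask)))
#define EXAMPLE_BITS_GET(src, mask) (((src) & (mask)) >> mask##_SHIFT)

/* Set one field in a property word and leave every other bit untouched.
 * Values too wide for the field are clipped by the mask. */
static uint32_t example_pack(uint32_t props, uint32_t field_val)
{
	EXAMPLE_BITS_SET(props, EXAMPLE_FIELD, field_val);
	return props;
}
```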

[Mesa-dev] [PATCH 1/2] radeonsi/compute: Add some more debug printfs

2016-09-13 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_compute.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 5041761..a79c224 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -298,6 +298,9 @@ static bool si_switch_compute_shader(struct si_context *sctx,
radeon_emit(cs, config->rsrc1);
radeon_emit(cs, config->rsrc2);
 
+   COMPUTE_DBG(sctx->screen, "COMPUTE_PGM_RSRC1: 0x%08x "
+   "COMPUTE_PGM_RSRC2: 0x%08x\n", config->rsrc1, config->rsrc2);
+
radeon_set_sh_reg(cs, R_00B860_COMPUTE_TMPRING_SIZE,
  S_00B860_WAVES(sctx->scratch_waves)
 | S_00B860_WAVESIZE(config->scratch_bytes_per_wave >> 10));
-- 
2.7.4



[Mesa-dev] [PATCH] radeonsi: Don't use global variables for tess lds

2016-08-26 Thread Tom Stellard
We were allocating global variables for the maximum LDS size
which made the compiler think we were using all of LDS, which
isn't the case.
---
 src/gallium/drivers/radeonsi/si_shader.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 64c367e..5d972cb 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5420,16 +5420,13 @@ static unsigned llvm_get_type_size(LLVMTypeRef type)
 static void declare_tess_lds(struct si_shader_context *ctx)
 {
struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
-   LLVMTypeRef i32 = ctx->radeon_bld.soa.bld_base.uint_bld.elem_type;
-   unsigned lds_size = ctx->screen->b.chip_class >= CIK ? 65536 : 32768;
+   struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base;
+   struct lp_build_context *uint = &bld_base->uint_bld;
 
-   /* The actual size is computed outside of the shader to reduce
-* the number of shader variants. */
-   ctx->lds =
-   LLVMAddGlobalInAddressSpace(gallivm->module,
-   LLVMArrayType(i32, lds_size / 4),
-   "tess_lds",
-   LOCAL_ADDR_SPACE);
+   unsigned lds_size = ctx->screen->b.chip_class >= CIK ? 65536 : 32768;
+   ctx->lds = LLVMBuildIntToPtr(gallivm->builder, uint->zero,
+   LLVMPointerType(LLVMArrayType(ctx->i32, lds_size / 4), LOCAL_ADDR_SPACE),
+   "tess_lds");
 }
 
 static void create_function(struct si_shader_context *ctx)
-- 
2.7.4



Re: [Mesa-dev] [PATCH] radeonsi: initialize and finalize the LLVM function pass manager

2016-08-18 Thread Tom Stellard
On Fri, Aug 12, 2016 at 01:26:08AM +0200, Marek Olšák wrote:
> From: Marek Olšák <marek.ol...@amd.com>
> 
> we should do that allegedly

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index d75311e..e04e26a 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -1918,21 +1918,23 @@ void radeon_llvm_finalize_module(struct radeon_llvm_context *ctx)
>   LLVMAddPromoteMemoryToRegisterPass(gallivm->passmgr);
>  
>   /* Add some optimization passes */
>   LLVMAddScalarReplAggregatesPass(gallivm->passmgr);
>   LLVMAddLICMPass(gallivm->passmgr);
>   LLVMAddAggressiveDCEPass(gallivm->passmgr);
>   LLVMAddCFGSimplificationPass(gallivm->passmgr);
>   LLVMAddInstructionCombiningPass(gallivm->passmgr);
>  
>   /* Run the pass */
> + LLVMInitializeFunctionPassManager(gallivm->passmgr);
>   LLVMRunFunctionPassManager(gallivm->passmgr, ctx->main_fn);
> + LLVMFinalizeFunctionPassManager(gallivm->passmgr);
>  
>   LLVMDisposeBuilder(gallivm->builder);
>   LLVMDisposePassManager(gallivm->passmgr);
>   gallivm_dispose_target_library_info(target_library_info);
>  }
>  
>  void radeon_llvm_dispose(struct radeon_llvm_context *ctx)
>  {
>   LLVMDisposeModule(ctx->soa.bld_base.base.gallivm->module);
>   LLVMContextDispose(ctx->soa.bld_base.base.gallivm->context);
> -- 
> 2.7.4


Re: [Mesa-dev] [PATCH 11/19] gallium/radeon: more descriptive names for LLVM temporaries in debug builds

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:40PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <nicolai.haeh...@amd.com>
> 
This is a great idea.

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>

> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 7b96a58..22ff18e 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -31,20 +31,21 @@
>  #include "gallivm/lp_bld_init.h"
>  #include "gallivm/lp_bld_intr.h"
>  #include "gallivm/lp_bld_misc.h"
>  #include "gallivm/lp_bld_swizzle.h"
>  #include "tgsi/tgsi_info.h"
>  #include "tgsi/tgsi_parse.h"
>  #include "util/u_math.h"
>  #include "util/u_memory.h"
>  #include "util/u_debug.h"
>  
> +#include 
>  #include 
>  #include 
>  
>  LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
> enum tgsi_opcode_type type)
>  {
>   LLVMContextRef ctx = bld_base->base.gallivm->context;
>  
>   switch (type) {
>   case TGSI_TYPE_UNSIGNED:
> @@ -421,20 +422,21 @@ static void emit_declaration(struct lp_build_tgsi_context *bld_base,
>ctx->soa.addr[idx][chan] = si_build_alloca_undef(
>   &ctx->gallivm,
>   ctx->soa.bld_base.uint_bld.elem_type, "");
>   }
>   }
>   break;
>   }
>  
>   case TGSI_FILE_TEMPORARY:
>   {
> + char name[16] = "";
>   LLVMValueRef array_alloca = NULL;
>   unsigned decl_size;
>   first = decl->Range.First;
>   last = decl->Range.Last;
>   decl_size = 4 * ((last - first) + 1);
>   if (decl->Declaration.Array) {
>   unsigned id = decl->Array.ArrayID - 1;
>   if (!ctx->arrays) {
>   int size = bld_base->info->array_max[TGSI_FILE_TEMPORARY];
>   ctx->arrays = CALLOC(size, sizeof(ctx->arrays[0]));
> @@ -458,34 +460,42 @@ static void emit_declaration(struct lp_build_tgsi_context *bld_base,
>   ctx->arrays[id].alloca = array_alloca;
>   }
>   }
>  
>   if (!ctx->temps_count) {
>   ctx->temps_count = bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
>   ctx->temps = MALLOC(TGSI_NUM_CHANNELS * ctx->temps_count * sizeof(LLVMValueRef));
>   }
>   if (!array_alloca) {
>   for (i = 0; i < decl_size; ++i) {
> +#ifdef DEBUG
> + snprintf(name, sizeof(name), "TEMP%d.%c",
> +  first + i / 4, "xyzw"[i % 4]);
> +#endif
>   ctx->temps[first * TGSI_NUM_CHANNELS + i] =
>   si_build_alloca_undef(bld_base->base.gallivm,
>   bld_base->base.vec_type,
> -   "temp");
> +   name);
>   }
>   } else {
>   LLVMValueRef idxs[2] = {
>   bld_base->uint_bld.zero,
>   NULL
>   };
>   for (i = 0; i < decl_size; ++i) {
> +#ifdef DEBUG
> + snprintf(name, sizeof(name), "TEMP%d.%c",
> +  first + i / 4, "xyzw"[i % 4]);
> +#endif
>   idxs[1] = lp_build_const_int32(bld_base->base.gallivm, i);
>   ctx->temps[first * TGSI_NUM_CHANNELS + i] =
> - LLVMBuildGEP(builder, array_alloca, idxs, 2, "temp");
> + LLVMBuildGEP(builder, array_alloca, idxs, 2, name);
>   }
>   }
>   break;
>   }
>   case TGSI_FILE_INPUT:
>   {
>   unsigned idx;
>   for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
>   if (ctx->load_input)
>   ctx->load_input(ctx, idx, decl);
> -- 
> 2.7.4
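
For reference, here is a standalone sketch of the naming scheme this patch adds for debug builds. `example_temp_name` is a hypothetical helper. It uses `%u` because `first` and `i` are unsigned. If a register index is very large, `snprintf` truncates the name instead of overflowing the 16-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Element i of a TEMP[first..last] declaration is channel i % 4 of
 * register first + i / 4, named e.g. "TEMP3.y". */
static void example_temp_name(char *name, size_t size,
                              unsigned first, unsigned i)
{
	snprintf(name, size, "TEMP%u.%c", first + i / 4, "xyzw"[i % 4]);
}
```

For example, with `first = 2`, element 5 is named `TEMP3.y`.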


Re: [Mesa-dev] [PATCH 09/19] gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:38PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <nicolai.haeh...@amd.com>
> 
> We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 41f24d3..e084248 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -352,25 +352,20 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
>   case TGSI_FILE_TEMPORARY:
>   if (reg->Register.Index >= ctx->temps_count)
>   return LLVMGetUndef(tgsi2llvmtype(bld_base, type));
>   ptr = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + swizzle];
>   if (tgsi_type_is_64bit(type)) {
>   ptr2 = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + swizzle + 1];
>   return radeon_llvm_emit_fetch_64bit(bld_base, type,
>LLVMBuildLoad(builder, ptr, ""),
>LLVMBuildLoad(builder, ptr2, ""));
>   }
> - LLVMValueRef array = get_alloca_for_array(bld_base, reg->Register.File, reg->Register.Index);
> - if (array) {
> - return bitcast(bld_base, type, load_value_from_array(bld_base, reg->Register.File, type,
> - swizzle, reg->Register.Index, NULL));
> - }
>   result = LLVMBuildLoad(builder, ptr, "");
>   break;
>  
>   case TGSI_FILE_OUTPUT:
>   ptr = lp_get_output_ptr(bld, reg->Register.Index, swizzle);
>   if (tgsi_type_is_64bit(type)) {
>   ptr2 = lp_get_output_ptr(bld, reg->Register.Index, swizzle + 1);
>   return radeon_llvm_emit_fetch_64bit(bld_base, type,
>LLVMBuildLoad(builder, ptr, ""),
>LLVMBuildLoad(builder, ptr2, ""));
> -- 
> 2.7.4
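
Patches 09 and 10 depend on ctx->temps[] holding one pointer for each channel of every temporary. This includes array elements, which patch 08 turns into getelementptrs into the array alloca. The sketch below shows that flat indexing. It assumes TGSI_NUM_CHANNELS is 4, as in TGSI. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TGSI_NUM_CHANNELS 4 /* x, y, z, w; assumed to match TGSI */

/* Slot of TEMP[index].<swizzle> in the flat temps array. A 64-bit value
 * occupies two adjacent channels, so its second half is slot + 1. */
static unsigned example_temp_slot(unsigned index, unsigned swizzle)
{
	return index * TGSI_NUM_CHANNELS + swizzle;
}

/* Number of 32-bit slots covered by a TEMP[first..last] declaration. */
static unsigned example_decl_size(unsigned first, unsigned last)
{
	return 4 * ((last - first) + 1);
}

/* Following the comment in emit_declaration(): only arrays with more than
 * 16 elements get a single array alloca; smaller ones get one alloca each. */
static bool example_uses_array_alloca(bool is_array, unsigned decl_size)
{
	return is_array && decl_size > 16;
}
```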


Re: [Mesa-dev] [PATCH 10/19] gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:39PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <nicolai.haeh...@amd.com>
> 
> We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index e084248..7b96a58 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -624,30 +624,23 @@ void radeon_llvm_emit_store(struct lp_build_tgsi_context *bld_base,
>   } else {
>   switch(reg->Register.File) {
>   case TGSI_FILE_OUTPUT:
>   temp_ptr = bld->outputs[reg->Register.Index][chan_index];
>   if (tgsi_type_is_64bit(dtype))
>   temp_ptr2 = bld->outputs[reg->Register.Index][chan_index + 1];
>   break;
>  
>   case TGSI_FILE_TEMPORARY:
>   {
> - LLVMValueRef array;
>   if (reg->Register.Index >= ctx->temps_count)
>   continue;
> - array = get_alloca_for_array(bld_base, reg->Register.File, reg->Register.Index);
>  
> - if (array) {
> - store_value_to_array(bld_base, value, reg->Register.File, chan_index, reg->Register.Index,
> - NULL);
> - continue;
> - }
>   temp_ptr = ctx->temps[ TGSI_NUM_CHANNELS * reg->Register.Index + chan_index];
>   if (tgsi_type_is_64bit(dtype))
>   temp_ptr2 = ctx->temps[ TGSI_NUM_CHANNELS * reg->Register.Index + chan_index + 1];
>  
>   break;
>   }
>   default:
>   return;
>   }
>   if (!tgsi_type_is_64bit(dtype))
> -- 
> 2.7.4


Re: [Mesa-dev] [PATCH 08/19] gallium/radeon: clean up emit_declaration for temporaries

2016-08-10 Thread Tom Stellard
On Tue, Aug 09, 2016 at 12:36:37PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <nicolai.haeh...@amd.com>
> 
> In the alloca'd array case, no longer create redundant and unused allocas
> for the individual elements; create getelementptrs instead.

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
> ---
>  .../drivers/radeon/radeon_setup_tgsi_llvm.c| 27 ++
>  1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index d75311e..41f24d3 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -408,81 +408,90 @@ static LLVMValueRef si_build_alloca_undef(struct gallivm_state *gallivm,
>   LLVMValueRef ptr = lp_build_alloca(gallivm, type, name);
>   LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr);
>   return ptr;
>  }
>  
>  static void emit_declaration(struct lp_build_tgsi_context *bld_base,
>const struct tgsi_full_declaration *decl)
>  {
>   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
>   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
> - unsigned first, last, i, idx;
> + unsigned first, last, i;
>   switch(decl->Declaration.File) {
>   case TGSI_FILE_ADDRESS:
>   {
>unsigned idx;
>   for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
>   unsigned chan;
>   for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
>ctx->soa.addr[idx][chan] = si_build_alloca_undef(
>   &ctx->gallivm,
>   ctx->soa.bld_base.uint_bld.elem_type, "");
>   }
>   }
>   break;
>   }
>  
>   case TGSI_FILE_TEMPORARY:
>   {
> + LLVMValueRef array_alloca = NULL;
>   unsigned decl_size;
>   first = decl->Range.First;
>   last = decl->Range.Last;
>   decl_size = 4 * ((last - first) + 1);
>   if (decl->Declaration.Array) {
>   unsigned id = decl->Array.ArrayID - 1;
>   if (!ctx->arrays) {
>   int size = bld_base->info->array_max[TGSI_FILE_TEMPORARY];
>   ctx->arrays = CALLOC(size, sizeof(ctx->arrays[0]));
> - for (i = 0; i < size; ++i) {
> - assert(!ctx->arrays[i].alloca);}
>   }
>  
>   ctx->arrays[id].range = decl->Range;
>  
>   /* If the array is more than 16 elements (each element
>* is 32-bits), then store it in a vector.  Storing the
>* array in a vector will causes the compiler to store
>* the array in registers and access it using indirect
>* addressing.  16 is number of vector elements that
>* LLVM will store in a register.
>* FIXME: We shouldn't need to do this.  LLVM should be
>* smart enough to promote allocas int registers when
>* profitable.
>*/
>   if (decl_size > 16) {
> - ctx->arrays[id].alloca = LLVMBuildAlloca(builder,
> + array_alloca = LLVMBuildAlloca(builder,
>   LLVMArrayType(bld_base->base.vec_type, decl_size),"array");
> + ctx->arrays[id].alloca = array_alloca;
>   }
>   }
> - first = decl->Range.First;
> - last = decl->Range.Last;
> +
>   if (!ctx->temps_count) {
>   ctx->temps_count = bld_base->info->file_max[TGSI_FILE_TEMPORARY] + 1;
>   ctx->temps = MALLOC(TGSI_NUM_CHANNELS * ctx->temps_count * sizeof(LLVMValueRef));
>   }
> - for (idx = first; idx <= last; idx++) {
> - for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
> - ctx->temps[idx * TGSI_NUM_CHANNELS + i] =
> + if (!array_alloca) {
> + for (i = 0; i < decl_size; ++i) {
> + 

[Mesa-dev] [PATCH 1/2] radeonsi/compute: Add some more debug printfs

2016-07-25 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_compute.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 5a40286..949ab1a 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -299,6 +299,9 @@ static bool si_switch_compute_shader(struct si_context *sctx,
radeon_emit(cs, config->rsrc1);
radeon_emit(cs, config->rsrc2);
 
+   COMPUTE_DBG(sctx->screen, "COMPUTE_PGM_RSRC1: 0x%08x "
+   "COMPUTE_PGM_RSRC2: 0x%08x\n", config->rsrc1, config->rsrc2);
+
radeon_set_sh_reg(cs, R_00B860_COMPUTE_TMPRING_SIZE,
  S_00B860_WAVES(sctx->scratch_waves)
 | S_00B860_WAVESIZE(config->scratch_bytes_per_wave >> 10));
-- 
2.7.4



[Mesa-dev] [PATCH 2/2] radeonsi/compute: Use the HSA abi for non-TGSI compute shaders

2016-07-25 Thread Tom Stellard
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:

https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md

The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.

The main changes in this patch are:
  - We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
  - Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.

Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically.  However, in Mesa we let the driver do
this work.  The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.
---
 src/gallium/drivers/radeon/r600_pipe_common.c|   6 +-
 src/gallium/drivers/radeonsi/amd_kernel_code_t.h | 534 +++
 src/gallium/drivers/radeonsi/si_compute.c| 234 +-
 3 files changed, 756 insertions(+), 18 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/amd_kernel_code_t.h

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index cd4908f..9ecf666 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -784,7 +784,11 @@ static int r600_get_compute_param(struct pipe_screen *screen,
if (rscreen->family <= CHIP_ARUBA) {
triple = "r600--";
} else {
-   triple = "amdgcn--";
+   if (HAVE_LLVM < 0x0400) {
+   triple = "amdgcn--";
+   } else {
+   triple = "amdgcn--mesa3d";
+   }
}
switch(rscreen->family) {
/* Clang < 3.6 is missing Hainan in its list of
diff --git a/src/gallium/drivers/radeonsi/amd_kernel_code_t.h b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
new file mode 100644
index 000..d0d7809
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
@@ -0,0 +1,534 @@
+/*
+ * Copyright 2015,2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * on the rights to use, copy, modify, merge, publish, distribute, sub
+ * license, and/or sell copies of the Software, and to permit persons to whom
+ * the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDKERNELCODET_H
+#define AMDKERNELCODET_H
+
+//---//
+// AMD Kernel Code, and its dependencies //
+//---//
+
+// Sets val bits for specified mask in specified dst packed instance.
+#define AMD_HSA_BITS_SET(dst, mask, val) \
+  dst &= (~(1 << mask ## _SHIFT) & ~mask); \
+  dst |= (((val) << mask ## _SHIFT) & mask)
+
+// Gets bits for specified mask from specified src packed instance.
+#define AMD_HSA_BITS_GET(src, mask) \
+  ((src & mask) >> mask ## _SHIFT) \
+
+/* Every amd_*_code_t has the following properties, which are composed of
+ * a number of bit fields. Every bit field has a mask (AMD_CODE_PROPERTY_*),
+ * bit width (AMD_CODE_PROPERTY_*_WIDTH, and bit shift amount
+ * (AMD_CODE_PROPERTY_*_SHIFT) for convenient access. Unused bits must be 0.
+ *
+ * (Note that bit fields cannot be used as their layout is
+ * implementation defined in the C standard and so cannot be used to
+ * specify an ABI)
+ */
+enum amd_code_property_mask_t {
+
+  /* Enable the setup of the SGPR user data registers
+   * (AMD_CODE_PROPERTY_ENABLE_SGPR_*), see documentation of amd_kernel_code_t
+   * for initial register state.
+   *
+   * The 

[Mesa-dev] [PATCH 2/2] clover: Re-order includes in invocation.cpp to fix build

2016-07-20 Thread Tom Stellard
The build was failing because the official CL headers have a few defines, like:

#define cl_khr_gl_sharing 1

These have the same names as some class members of clang's OpenCLOptions class.
If we include the CL headers first, the build breaks because the member
names of this class are replaced by the literal 1.
---
 .../state_trackers/clover/llvm/invocation.cpp  | 24 +++---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index bbd66d4..7b50b02 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -24,13 +24,6 @@
 // OTHER DEALINGS IN THE SOFTWARE.
 //
 
-#include "llvm/codegen.hpp"
-#include "llvm/compat.hpp"
-#include "llvm/invocation.hpp"
-#include "llvm/metadata.hpp"
-#include "llvm/util.hpp"
-#include "util/algorithm.hpp"
-
 #include 
 #include 
 #include 
@@ -45,6 +38,23 @@
 #include 
 #include 
 
+// We need to include internal headers last, because the internal headers
+// include CL headers which have #define's like:
+//
+//#define cl_khr_gl_sharing 1
+//#define cl_khr_icd 1
+//
+// Which will break the compilation of clang/Basic/OpenCLOptions.h
+
+#include "core/error.hpp"
+#include "llvm/codegen.hpp"
+#include "llvm/compat.hpp"
+#include "llvm/invocation.hpp"
+#include "llvm/metadata.hpp"
+#include "llvm/util.hpp"
+#include "util/algorithm.hpp"
+
+
 using namespace clover;
 using namespace clover::llvm;
 
-- 
2.7.4
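
Here is a small C illustration of the include-order problem. It assumes a plain struct as a stand-in for clang's OpenCLOptions class; `example_options` and the helper names are hypothetical. Code compiled before the `#define` keeps the member name, and every later use of the identifier turns into the literal 1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a class whose member shares its name with a
 * CL extension macro. */
struct example_options {
	unsigned cl_khr_gl_sharing;
};

/* Compiled before the macro exists, so the member name is left alone. */
static unsigned example_get_sharing(const struct example_options *opts)
{
	return opts->cl_khr_gl_sharing;
}

/* This is what the CL headers do. From here on the preprocessor rewrites the
 * identifier, so a member declared or accessed below would become "1" and
 * fail to compile. */
#define cl_khr_gl_sharing 1

#define EXAMPLE_STR(x) #x
#define EXAMPLE_XSTR(x) EXAMPLE_STR(x)
```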



[Mesa-dev] [PATCH 1/2] clover: Add missing include v2

2016-07-20 Thread Tom Stellard
There was a patch committed to clang to remove unnecessary includes from
header files, so we now need to explicitly include
clang/Lex/PreprocessorOptions.h

v2:
  - Use <> instead of "" for the include path.
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 4b7de26..bbd66d4 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -39,6 +39,8 @@
 #include 
 
 #include 
+#include 
+
 #include 
 #include 
 #include 
-- 
2.7.4



[Mesa-dev] [PATCH 2/2] clover: Re-order includes in invocation.cpp to fix build

2016-07-19 Thread Tom Stellard
The build was failing because the official CL headers have a few defines, like:

#define cl_khr_gl_sharing 1

These have the same names as some class members of clang's OpenCLOptions class.
If we include the CL headers first, the build breaks because the member
names of this class are replaced by the literal 1.
---
 .../state_trackers/clover/llvm/invocation.cpp  | 24 +++---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 437d75e..81ace64 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -24,13 +24,6 @@
 // OTHER DEALINGS IN THE SOFTWARE.
 //
 
-#include "llvm/codegen.hpp"
-#include "llvm/compat.hpp"
-#include "llvm/invocation.hpp"
-#include "llvm/metadata.hpp"
-#include "llvm/util.hpp"
-#include "util/algorithm.hpp"
-
 #include 
 #include 
 #include 
@@ -45,6 +38,23 @@
 #include 
 #include 
 
+// We need to include internal headers last, because the internal headers
+// include CL headers which have #define's like:
+//
+//#define cl_khr_gl_sharing 1
+//#define cl_khr_icd 1
+//
+// Which will break the compilation of clang/Basic/OpenCLOptions.h
+
+#include "core/error.hpp"
+#include "llvm/codegen.hpp"
+#include "llvm/compat.hpp"
+#include "llvm/invocation.hpp"
+#include "llvm/metadata.hpp"
+#include "llvm/util.hpp"
+#include "util/algorithm.hpp"
+
+
 using namespace clover;
 using namespace clover::llvm;
 
-- 
2.7.4



[Mesa-dev] [PATCH 1/2] clover: Add missing include

2016-07-19 Thread Tom Stellard
There was a patch committed to clang to remove unnecessary includes from
header files, so we now need to explicitly include
clang/Lex/PreprocessorOptions.h
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 4b7de26..437d75e 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -39,6 +39,8 @@
 #include 
 
 #include 
+#include "clang/Lex/PreprocessorOptions.h"
+
 #include 
 #include 
 #include 
-- 
2.7.4



Re: [Mesa-dev] [PATCH 2/5] radeonsi: set dereferenceable attribute on descriptor arrays

2016-07-13 Thread Tom Stellard
On Wed, Jul 13, 2016 at 03:20:55PM -0400, Tom Stellard wrote:
> On Tue, Jul 12, 2016 at 10:52:35PM +0200, Marek Olšák wrote:
> > From: Marek Olšák <marek.ol...@amd.com>
> > 
> > This allows moving the loads arbitrarily in the Sinking pass.
> > 
> > 26002 shaders in 14643 tests
> > Totals:
> > SGPRS: 2080160 -> 2080160 (0.00 %)
> > VGPRS: 798875 -> 797826 (-0.13 %)
> > Spilled SGPRs: 108485 -> 79165 (-27.03 %)
> > Spilled VGPRs: 327 -> 327 (0.00 %)
> > Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread
> > Code Size: 36127192 -> 35559780 (-1.57 %) bytes
> > LDS: 767 -> 767 (0.00 %) blocks
> > Max Waves: 212464 -> 212672 (0.10 %)
> > Wait states: 0 -> 0 (0.00 %)
> > 
> >  PERCENTAGES / App     Shaders SGPRs    VGPRs  SpillSGPR  SpillVGPR  Scratch  CodeSize  MaxWaves  Waits
> >  (unknown)                   4     .        .          .          .        .         .         .      .
> >  0ad                         6     .        .          .          .        .         .         .      .
> >  alien_isolation          2938     .   0.04 %    -8.53 %          .        .   -0.71 %   -0.06 %      .
> >  anholt                     10     .        .          .          .        .         .         .      .
> >  batman_arkham_origins     589     .  -0.58 %   -79.54 %          .        .   -6.72 %    0.57 %      .
> >  bioshock-infinite        1769     .  -0.65 %   -89.32 %          .        .   -4.73 %    0.48 %      .
> >  borderlands2             3968     .  -0.31 %   -51.21 %          .        .   -4.09 %    0.22 %      .
> >  brutal-legend             338     .  -0.03 %    -2.95 %          .        .   -0.06 %         .      .
> >  civilization_beyond..     116     .        .   -14.17 %          .        .   -0.88 %         .      .
> >  counter_strike_glob..    1142     .        .          .          .        .         .         .      .
> >  dirt-showdown             541     .  -0.56 %   -40.14 %          .  -3.45 %   -1.82 %    0.35 %      .
> >  dolphin                    22     .        .          .          .        .    0.16 %         .      .
> >  dota2                    1747     .        .          .          .        .    0.01 %         .      .
> >  europa_universalis_4       76     .  -0.23 %   -42.11 %          .        .   -0.96 %         .      .
> >  f1-2015                   774     .  -0.09 %   -28.89 %          .        .   -2.60 %    0.09 %      .
> >  furmark-0.7.0               4     .        .          .          .        .         .         .      .
> >  gimark-0.7.0               10     .        .          .          .        .         .         .      .
> >  glamor                     16     .        .          .          .        .         .         .      .
> >  humus-celshading            4     .        .          .          .        .         .         .      .
> >  humus-domino                6     .        .          .          .        .         .         .      .
> >  humus-dynamicbranching     24     .   0.71 %          .          .        .    0.29 %   -0.45 %      .
> >  humus-hdr                  10     .        .          .          .        .         .         .      .
> >  humus-portals               2     .        .          .          .        .         .         .      .
> >  humus-volumetricfog..       6     .        .          .          .        .         .         .      .
> >  left_4_dead_2            1762     .        .          .          .        .         .         .      .
> >  metro_2033_redux         2670     .  -0.10 %    -7.15 %          .        .   -0.03 %         .      .
> >  nexuiz                     80     .        .          .          .        .         .         .      .
> >  pixmark-julia-fp32          2     .        .          .          .        .         .         .      .
> >  pixmark-julia-fp64          2     .        .          .          .        .         .         .      .
> >  pixmark-piano-0.7.0         2     .        .          .          .        .         .         .      .
> >  pixmark-volplosion-..       2     .        .          .          .        .         .         .      .
> >  plot3d-0.7.0                8     .        .          .          .        .         .         .

Re: [Mesa-dev] [PATCH 3/5] radeonsi: replace !tbaa with !invariant.load

2016-07-13 Thread Tom Stellard
On Tue, Jul 12, 2016 at 10:52:36PM +0200, Marek Olšák wrote:
> From: Marek Olšák <marek.ol...@amd.com>
> 

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
> no change in generated code thanks to dereferenceable(n)
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 17 +
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index b23c7c6..ee63b95 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -101,10 +101,9 @@ struct si_shader_context
>  
>   LLVMTargetMachineRef tm;
>  
> + unsigned invariant_load_md_kind;
>   unsigned range_md_kind;
> - unsigned tbaa_md_kind;
>   unsigned uniform_md_kind;
> - LLVMValueRef tbaa_const_md;
>   LLVMValueRef empty_md;
>  
>   LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
> @@ -418,7 +417,7 @@ static LLVMValueRef build_indexed_load_const(
>   LLVMValueRef base_ptr, LLVMValueRef index)
>  {
>   LLVMValueRef result = build_indexed_load(ctx, base_ptr, index, true);
> - LLVMSetMetadata(result, ctx->tbaa_md_kind, ctx->tbaa_const_md);
> + LLVMSetMetadata(result, ctx->invariant_load_md_kind, ctx->empty_md);
>   return result;
>  }
>  
> @@ -5315,7 +5314,7 @@ static void si_create_function(struct si_shader_context *ctx,
>   /* The combination of:
>* - ByVal
>* - dereferenceable
> -  * - tbaa
> +  * - invariant.load
>* allows the optimization passes to move loads and reduces
>* SGPR spilling significantly.
>*/
> @@ -5346,21 +5345,15 @@ static void si_create_function(struct si_shader_context *ctx,
>  static void create_meta_data(struct si_shader_context *ctx)
>  {
>   struct gallivm_state *gallivm = ctx->radeon_bld.soa.bld_base.base.gallivm;
> - LLVMValueRef tbaa_const[3];
>  
> + ctx->invariant_load_md_kind = LLVMGetMDKindIDInContext(gallivm->context,
> +"invariant.load", 14);
>   ctx->range_md_kind = LLVMGetMDKindIDInContext(gallivm->context,
>"range", 5);
> - ctx->tbaa_md_kind = LLVMGetMDKindIDInContext(gallivm->context,
> -  "tbaa", 4);
>   ctx->uniform_md_kind = LLVMGetMDKindIDInContext(gallivm->context,
>   "amdgpu.uniform", 14);
>  
>   ctx->empty_md = LLVMMDNodeInContext(gallivm->context, NULL, 0);
> -
> - tbaa_const[0] = LLVMMDStringInContext(gallivm->context, "const", 5);
> - tbaa_const[1] = 0;
> - tbaa_const[2] = lp_build_const_int32(gallivm, 1);
> - ctx->tbaa_const_md = LLVMMDNodeInContext(gallivm->context, tbaa_const, 3);
>  }
>  
>  static void declare_streamout_params(struct si_shader_context *ctx,
> -- 
> 2.7.4
> 
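Each LLVMGetMDKindIDInContext() call in this patch passes the metadata kind name together with a hand-counted length ("invariant.load", 14; "range", 5; "amdgpu.uniform", 14). A minimal sketch of one way to keep the two in sync, assuming a hypothetical STR_AND_LEN() macro that is not part of Mesa or the LLVM C API: it derives the length from the string literal at compile time. md_kind_args_consistent() is likewise a hypothetical stand-in so the sketch builds without LLVM.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in Mesa or LLVM): expands a string literal into
 * the (name, length) argument pair that calls such as
 * LLVMGetMDKindIDInContext(context, name, length) expect.
 * sizeof() counts the terminating NUL, hence the "- 1".
 * Only valid for string literals, never for char pointers. */
#define STR_AND_LEN(s) (s), (unsigned)(sizeof(s) - 1)

/* Stand-in for the real LLVM call so this sketch builds without LLVM:
 * returns 1 when the length argument matches the string, 0 otherwise. */
static int md_kind_args_consistent(const char *name, unsigned len)
{
    assert(name != NULL);
    return strlen(name) == len;
}
```

With the real API a call would read LLVMGetMDKindIDInContext(gallivm->context, STR_AND_LEN("invariant.load")). The macro must not be applied to a pointer variable, because sizeof would then return the pointer size instead of the string length.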
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] radeonsi: set dereferenceable attribute on descriptor arrays

2016-07-13 Thread Tom Stellard
On Tue, Jul 12, 2016 at 10:52:35PM +0200, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This allows moving the loads arbitrarily in the Sinking pass.
> 
> 26002 shaders in 14643 tests
> Totals:
> SGPRS: 2080160 -> 2080160 (0.00 %)
> VGPRS: 798875 -> 797826 (-0.13 %)
> Spilled SGPRs: 108485 -> 79165 (-27.03 %)
> Spilled VGPRs: 327 -> 327 (0.00 %)
> Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread
> Code Size: 36127192 -> 35559780 (-1.57 %) bytes
> LDS: 767 -> 767 (0.00 %) blocks
> Max Waves: 212464 -> 212672 (0.10 %)
> Wait states: 0 -> 0 (0.00 %)
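The percentages in the Totals lines are relative changes, (after - before) / before * 100. A small sketch that reproduces them; the helper name percent_change() is hypothetical, and treating an unchanged zero (as in "Wait states: 0 -> 0") as 0 % is an assumption.

```c
#include <assert.h>

/* Relative change in percent, as printed in the "Totals" lines:
 * (after - before) / before * 100. */
static double percent_change(double before, double after)
{
    if (before == 0.0) {
        /* Assumption: only an unchanged zero (0 -> 0) is meaningful here. */
        assert(after == 0.0);
        return 0.0;
    }
    return (after - before) / before * 100.0;
}
```

For example, Spilled SGPRs gives (79165 - 108485) / 108485 * 100, about -27.03 %, and Code Size (36127192 -> 35559780) gives about -1.57 %.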
> 
>  PERCENTAGES / App      Shaders     SGPRs     VGPRs SpillSGPR SpillVGPR   Scratch  CodeSize  MaxWaves     Waits
>  (unknown)                    4         .         .         .         .         .         .         .         .
>  0ad                          6         .         .         .         .         .         .         .         .
>  alien_isolation           2938         .    0.04 %   -8.53 %         .         .   -0.71 %   -0.06 %         .
>  anholt                      10         .         .         .         .         .         .         .         .
>  batman_arkham_origins      589         .   -0.58 %  -79.54 %         .         .   -6.72 %    0.57 %         .
>  bioshock-infinite         1769         .   -0.65 %  -89.32 %         .         .   -4.73 %    0.48 %         .
>  borderlands2              3968         .   -0.31 %  -51.21 %         .         .   -4.09 %    0.22 %         .
>  brutal-legend              338         .   -0.03 %   -2.95 %         .         .   -0.06 %         .         .
>  civilization_beyond..      116         .         .  -14.17 %         .         .   -0.88 %         .         .
>  counter_strike_glob..     1142         .         .         .         .         .         .         .         .
>  dirt-showdown              541         .   -0.56 %  -40.14 %         .   -3.45 %   -1.82 %    0.35 %         .
>  dolphin                     22         .         .         .         .         .    0.16 %         .         .
>  dota2                     1747         .         .         .         .         .    0.01 %         .         .
>  europa_universalis_4        76         .   -0.23 %  -42.11 %         .         .   -0.96 %         .         .
>  f1-2015                    774         .   -0.09 %  -28.89 %         .         .   -2.60 %    0.09 %         .
>  furmark-0.7.0                4         .         .         .         .         .         .         .         .
>  gimark-0.7.0                10         .         .         .         .         .         .         .         .
>  glamor                      16         .         .         .         .         .         .         .         .
>  humus-celshading             4         .         .         .         .         .         .         .         .
>  humus-domino                 6         .         .         .         .         .         .         .         .
>  humus-dynamicbranching      24         .    0.71 %         .         .         .    0.29 %   -0.45 %         .
>  humus-hdr                   10         .         .         .         .         .         .         .         .
>  humus-portals                2         .         .         .         .         .         .         .         .
>  humus-volumetricfog..        6         .         .         .         .         .         .         .         .
>  left_4_dead_2             1762         .         .         .         .         .         .         .         .
>  metro_2033_redux          2670         .   -0.10 %   -7.15 %         .         .   -0.03 %         .         .
>  nexuiz                      80         .         .         .         .         .         .         .         .
>  pixmark-julia-fp32           2         .         .         .         .         .         .         .         .
>  pixmark-julia-fp64           2         .         .         .         .         .         .         .         .
>  pixmark-piano-0.7.0          2         .         .         .         .         .         .         .         .
>  pixmark-volplosion-..        2         .         .         .         .         .         .         .         .
>  plot3d-0.7.0                 8         .         .         .         .         .         .         .         .
>  portal                     474         .         .         .         .         .         .         .         .
>  sauerbraten                  7         .         .         .         .         .         .         .         .
>  serious_sam_3_bfe          392         .         .  -13.20 %         .         .   -1.81 %         .         .
>  supertuxkart                 4         .         .         .         .         .         .         .         .
>  talos_principle            324         .   -0.21 %  -18.39 %         .         .   -2.73 %    0.14 %         .
>  team_fortress_2            808         .         .         .         .         .         .         .         .
>  tesseract                  430         .    0.08 %  -68.57 %         .         .

Re: [Mesa-dev] [PATCH RFC 1/1] r600, compute: Use vtx #3 for kernel arguments

2016-07-11 Thread Tom Stellard
On Sun, Jun 26, 2016 at 08:40:55PM -0400, Jan Vesely wrote:
> Both explicit and implicit.
> Using vtx 0 (as existing llvm code implies) does not work for dynamic offsets.
> 
> Signed-off-by: Jan Vesely <jan.ves...@rutgers.edu>

I have no idea why vtx#3 works when vtx#0 doesn't; maybe add a comment
explaining why we are using vtx#3.

With that change:

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>

> ---
> Hi,
> 
> I ran into a problem where VTX_READ from a constant buffer would work only
> for index 0. The LLVM code implied that it should work (or maybe they only
> considered constant offsets), but I could not find anything one way or the
> other in the ISA docs.
> 
> Switching to vtx#3 fixed the problem, though I'm not sure if it's the right
> solution.
> 
> thanks,
> Jan
> 
> 
>  src/gallium/drivers/r600/evergreen_compute.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
> index 7f9580c..b351cee 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -369,6 +369,8 @@ static void evergreen_compute_upload_input(struct pipe_context *ctx,
>   ctx->transfer_unmap(ctx, transfer);
>  
>   /* ID=0 is reserved for the parameters */
> + evergreen_cs_set_vertex_buffer(rctx, 3, 0,
> + (struct pipe_resource*)shader->kernel_param);
>   evergreen_cs_set_constant_buffer(rctx, 0, 0, input_size,
>   (struct pipe_resource*)shader->kernel_param);
>  }
> @@ -614,9 +616,9 @@ static void evergreen_set_compute_resources(struct pipe_context *ctx,
>   start, count);
>  
>   for (unsigned i = 0; i < count; i++) {
> - /* The First three vertex buffers are reserved for parameters and
> + /* The First four vertex buffers are reserved for parameters and
>* global buffers. */
> - unsigned vtx_id = 3 + i;
> + unsigned vtx_id = 4 + i;
>   if (resources[i]) {
>   struct r600_resource_global *buffer =
>   (struct r600_resource_global*)
> -- 
> 2.7.4
> 
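With this patch, evergreen_compute_upload_input() binds the kernel parameters to vertex-buffer slot 3, so the per-resource loop in evergreen_set_compute_resources() must start at slot 4 instead of 3 to avoid clobbering them. A sketch of that slot layout; the constant and function names are hypothetical, and the diff does not say what slots 0-2 hold beyond the "reserved" comment.

```c
#include <assert.h>

/* Hypothetical names for the compute vertex-buffer layout implied by the
 * patch: slot 3 carries shader->kernel_param, and global resources start
 * at 4 + i (previously 3 + i). */
enum {
    CS_VTX_SLOT_KERNEL_PARAMS = 3,
    CS_VTX_SLOT_FIRST_GLOBAL  = 4,
};

/* Vertex-buffer slot used for the i-th global resource. */
static unsigned cs_global_vtx_slot(unsigned i)
{
    unsigned slot = CS_VTX_SLOT_FIRST_GLOBAL + i;

    /* Must never alias the kernel-parameter slot. */
    assert(slot != CS_VTX_SLOT_KERNEL_PARAMS);
    return slot;
}
```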


Re: [Mesa-dev] [PATCH] radeonsi: don't interleave R600_DEBUG-enabled shader dumps

2016-07-06 Thread Tom Stellard
On Wed, Jul 06, 2016 at 11:55:03PM +0200, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> Only setting R600_DEBUG doesn't set any debug callback. Conversely, the debug
> callback is only called when R600_DEBUG is set.

I don't get any output from shader-db with this patch.

-Tom

> ---
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index abbe451..059ff70 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -1324,7 +1324,7 @@ static void *si_create_shader_selector(struct pipe_context *ctx,
>   pipe_mutex_init(sel->mutex);
>   util_queue_fence_init(>ready);
>  
> - if (sctx->b.debug.debug_message ||
> + if (r600_can_dump_shader(>b, sel->info.processor) ||
>   !util_queue_is_initialized(>shader_compiler_queue))
>   si_init_shader_selector_async(sel, -1);
>   else
> -- 
> 2.7.4
> 


Re: [Mesa-dev] [PATCH] gallivm: set LLVMNoUnwindAttribute on all intrinsics

2016-07-05 Thread Tom Stellard
On Tue, Jul 05, 2016 at 11:36:03AM +0200, Marek Olšák wrote:
> From: Marek Olšák 
> 
> RadeonSI stats: Mostly 0% difference, but Valley shows a small improvement:
> 

Do you know which intrinsic this made a difference for?  I'm guessing
this is required for all the intrinsics defined in the backend (e.g. in
SIIntrinsics.td), and that this is a good reason to move those into
include/llvm/IR/IntrinsicsAMDGPU.td.

-Tom

>  Application      Files      SGPRs      VGPRs  SpillSGPR  SpillVGPR  Code Size        LDS  Max Waves      Waits
>  unigine_valley     278     0.00 %    -0.29 %     0.00 %     0.00 %     0.01 %     0.00 %     0.17 %     0.00 %
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_intr.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
> index 0a8f996..f12e735 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
> @@ -145,8 +145,10 @@ lp_build_intrinsic(LLVMBuilderRef builder,
>  
>function = lp_declare_intrinsic(module, name, ret_type, arg_types, num_args);
>  
> -  if (attr)
> -  LLVMAddFunctionAttr(function, attr);
> +  /* NoUnwind indicates that the intrinsic never raises a C++ exception.
> +   * Set it for all intrinsics.
> +   */
> +  LLVMAddFunctionAttr(function, attr | LLVMNoUnwindAttribute);
>  
>if (gallivm_debug & GALLIVM_DEBUG_IR) {
>   lp_debug_dump_value(function);
> -- 
> 2.7.4
> 
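The change relies on the old LLVM C API representing function attributes as a bitmask (LLVMAttribute), so NoUnwind can simply be ORed into whatever the caller passed. The behavioral difference is that the old code added nothing when attr was 0, while the new code always adds NoUnwind. A sketch of that difference, using hypothetical bit values rather than the real ones from llvm-c/Core.h:

```c
#include <assert.h>

/* Hypothetical bit values; the real LLVMAttribute values are defined in
 * llvm-c/Core.h of the LLVM versions this patch targets. */
enum hyp_attribute {
    HYP_READNONE_ATTR = 1 << 0,
    HYP_NOUNWIND_ATTR = 1 << 1,
};

/* Old behavior: attributes were only added when the caller passed some. */
static unsigned attrs_added_before(unsigned attr)
{
    unsigned added = 0;

    if (attr)
        added |= attr;
    return added;
}

/* New behavior: NoUnwind is always ORed in, even when attr == 0. */
static unsigned attrs_added_after(unsigned attr)
{
    return attr | HYP_NOUNWIND_ATTR;
}
```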


Re: [Mesa-dev] clover: Implement linking of multiple CL programs.

2016-07-04 Thread Tom Stellard
On Sun, Jul 03, 2016 at 05:51:09PM -0700, Francisco Jerez wrote:
> This series is the result of a long back and forth of patches between
> Serge and me.  Most of the changes are already reviewed by either of
> us, but I'm sending them again over the mailing list just in case
> somebody else wants to give additional feedback.  Patches 1-30 are
> mainly cleaning up and reworking the current LLVM interfacing
> infrastructure in preparation for the following changes.  Patches
> 31-47 is where program compilation is actually split in two stages and
> support for linking multiple CL programs into one executable binary is
> implemented.
> 

I have no objections to this series and will be happy to see it
committed.

-Tom

> You can find the same series in the clover-link-program branch of my
> git tree:
> 
> https://cgit.freedesktop.org/~currojerez/mesa/log/?h=clover-link-program
> 
> [PATCH 01/47] clover: Bump required LLVM version to 3.6.
> [PATCH 02/47] clover/llvm: Drop support for LLVM < 3.6.
> [PATCH 03/47] clover/llvm: Drop dead code.
> [PATCH 04/47] clover/llvm: Collect #ifdef mess into a separate file.
> [PATCH 05/47] clover/llvm: Factor out target string parsing.
> [PATCH 06/47] clover/llvm: Factor out compiler option tokenization.
> [PATCH 07/47] clover/llvm: Refactor compiler instance initialization.
> [PATCH 08/47] clover/llvm: Declare compiler instance at top level and pass down as argument.
> [PATCH 09/47] clover/llvm: Factor out LLVM context init.
> [PATCH 10/47] clover/llvm: Clean up compilation into LLVM IR.
> [PATCH 11/47] clover/llvm: Trivial codestyle clean-up for optimize().
> [PATCH 12/47] clover/llvm: Simplify diagnostic_handler().
> [PATCH 13/47] clover/llvm: Use helper function to abort compilation with error message.
> [PATCH 14/47] clover/llvm: Tidy debug handling.
> [PATCH 15/47] clover/llvm: Move a bunch of utility functions into separate file.
> [PATCH 16/47] clover/llvm: Clean up ELF parsing.
> [PATCH 17/47] clover/llvm: Clean up compile_native().
> [PATCH 18/47] clover/llvm: Factor out duplicated construction of clover::module.
> [PATCH 19/47] clover/llvm: Fold compile_native() call into build_module_native().
> [PATCH 20/47] clover/llvm: Clean up codestyle of get_kernel_args().
> [PATCH 21/47] clover/llvm: Add simplified utility functions for metadata introspection.
> [PATCH 22/47] clover/llvm: Use metadata introspection utils for kernel argument set-up.
> [PATCH 23/47] clover/llvm: Use metadata introspection utils for kernel enumeration.
> [PATCH 24/47] clover/llvm: Clean up bitcode codegen.
> [PATCH 25/47] clover/llvm: Split native codegen and assembly print-out into separate functions.
> [PATCH 26/47] clover/llvm: Define function for bitcode print-out.
> [PATCH 27/47] clover/llvm: Split shared codegen support code into separate file.
> [PATCH 28/47] clover/llvm: Split bitcode codegen into separate file.
> [PATCH 29/47] clover/llvm: Split native codegen into separate file.
> [PATCH 30/47] clover/llvm: Trivial assorted cleanups for invocation.cpp.
> [PATCH 31/47] clover/llvm: Implement library bitcode codegen.
> [PATCH 32/47] clover/llvm: Split compilation and linking.
> [PATCH 33/47] clover/llvm: Implement linkage of multiple clover modules.
> [PATCH 34/47] clover/llvm: Implement the -create-library linker option.
> [PATCH 35/47] clover/tgsi: Move compiler entry point declaration into tgsi directory and namespace.
> [PATCH 36/47] clover/tgsi: Add stub link_program() function.
> [PATCH 37/47] clover: Override ret_object.
> [PATCH 38/47] clover: Move back to using build_error to signal compilation failure.
> [PATCH 39/47] clover: Define error subclass to signal build option parse failure.
> [PATCH 40/47] clover: Change program::build opts argument to std::string.
> [PATCH 41/47] clover: Unify program::build_* into a single method returning a struct.
> [PATCH 42/47] clover: Provide separate program methods for compilation and linking.
> [PATCH 43/47] clover/llvm: Get rid of compile_program_llvm().
> [PATCH 44/47] clover/core: Remove compiler.hpp.
> [PATCH 45/47] clover: Trivial cleanups for api/program.cpp.
> [PATCH 46/47] clover: Add clLinkProgram (CL 1.2).
> [PATCH 47/47] clover/api: Implement clLinkProgram per-device binary presence validation rule.


Re: [Mesa-dev] [PATCH] radeonsi: add a debug flag for unsafe math LLVM optimizations

2016-06-21 Thread Tom Stellard
On Tue, Jun 21, 2016 at 10:42:45PM +0200, Marek Olšák wrote:
> On Tue, Jun 21, 2016 at 9:27 PM, Tom Stellard <t...@stellard.net> wrote:
> > On Tue, Jun 21, 2016 at 08:17:15PM +0200, Marek Olšák wrote:
> >> On Tue, Jun 21, 2016 at 7:21 PM, Tom Stellard <t...@stellard.net> wrote:
> >> > On Mon, Jun 13, 2016 at 06:27:02PM +0200, Marek Olšák wrote:
> >> >> From: Marek Olšák <marek.ol...@amd.com>
> >> >>
> >> >> ---
> >> >>  src/gallium/drivers/radeon/r600_pipe_common.c |  1 +
> >> >>  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
> >> >>  src/gallium/drivers/radeonsi/si_shader.c  | 16 
> >> >>  3 files changed, 18 insertions(+)
> >> >>
> >> >> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> >> >> index fa9f70d..5d4a679 100644
> >> >> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> >> >> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> >> >> @@ -482,6 +482,7 @@ static const struct debug_named_value common_debug_options[] = {
> >> >>   { "sisched", DBG_SI_SCHED, "Enable LLVM SI Machine Instruction Scheduler." },
> >> >>   { "mono", DBG_MONOLITHIC_SHADERS, "Use old-style monolithic shaders compiled on demand" },
> >> >>   { "noce", DBG_NO_CE, "Disable the constant engine"},
> >> >> + { "unsafemath", DBG_UNSAFE_MATH, "Enable unsafe math shader optimizations" },
> >> >>
> >> >>   DEBUG_NAMED_VALUE_END /* must be last */
> >> >>  };
> >> >> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> >> >> index 77dfc4f..263ef5e 100644
> >> >> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> >> >> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> >> >> @@ -99,6 +99,7 @@
> >> >>  #define DBG_SI_SCHED (1llu << 46)
> >> >>  #define DBG_MONOLITHIC_SHADERS   (1llu << 47)
> >> >>  #define DBG_NO_CE(1llu << 48)
> >> >> +#define DBG_UNSAFE_MATH  (1llu << 49)
> >> >>
> >> >>  #define R600_MAP_BUFFER_ALIGNMENT 64
> >> >>  #define R600_MAX_VIEWPORTS16
> >> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >> >> index 6dc4985..bba6a55 100644
> >> >> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >> >> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >> >> @@ -5255,6 +5255,22 @@ static void si_create_function(struct si_shader_context *ctx,
> >> >>   else
> >> >>   LLVMAddAttribute(P, LLVMInRegAttribute);
> >> >>   }
> >> >> +
> >> >> + if (ctx->screen->b.debug_flags & DBG_UNSAFE_MATH) {
> >> >> + /* These were copied from some LLVM test. */
> >> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> >> +"less-precise-fpmad",
> >> >> +"true");
> >> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> >> +"no-infs-fp-math",
> >> >> +"true");
> >> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> >> +"no-nans-fp-math",
> >> >> +"true");
> >> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> >> +"unsafe-fp-math",
> >> >> +"true");
> >> >> + }
> >> >
> >> > You may get better results by also adding the fast-math flags to the
> >> > individual floating-point instructions, but this would be a more
> >> > invasive change.
> >>
> >> Is there sample code showing how to do that?
> >>
> >
> > Something like this
> > https://cgit.freedesktop.org/~tstellar/mesa/commit/?h=fast-math=4fa18fdde3a2b51b2371064bc27729fd1038c219
> >
> > There are more flags than just the UnsafeAlgebra that is used in the
> > patch.
> 
> setUnsafeAlgebra enables all flags, not just UnsafeAlgebra.
> 
> Do the flags apply to whole expression trees or just one instruction?
> 

Just one instruction, but they should apply to expression trees if all
instructions in the tree have the same flags.
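A toy model of the point above, not LLVM's API: fast-math flags live on individual instructions, so rewriting a whole expression tree is only justified when every instruction in the tree carries the required flags. Per Marek's note, setUnsafeAlgebra() also turns on the other flags, which HYP_FMF_FAST models here; all names in the sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction flag bits. HYP_FMF_FAST models
 * setUnsafeAlgebra(), which also sets nnan, ninf, nsz and arcp. */
enum {
    HYP_FMF_NNAN           = 1 << 0,
    HYP_FMF_NINF           = 1 << 1,
    HYP_FMF_NSZ            = 1 << 2,
    HYP_FMF_ARCP           = 1 << 3,
    HYP_FMF_UNSAFE_ALGEBRA = 1 << 4,
    HYP_FMF_FAST = HYP_FMF_NNAN | HYP_FMF_NINF | HYP_FMF_NSZ |
                   HYP_FMF_ARCP | HYP_FMF_UNSAFE_ALGEBRA,
};

/* Every node is an FP instruction; NULL operands stand for
 * non-instruction inputs such as constants or arguments. */
struct hyp_fp_inst {
    unsigned fmf;                    /* flags on this instruction only */
    const struct hyp_fp_inst *op[2];
};

/* True only if this instruction and every instruction feeding it
 * carry all of the requested flags. */
static bool tree_has_flags(const struct hyp_fp_inst *inst, unsigned flags)
{
    if (!inst)
        return true;
    if ((inst->fmf & flags) != flags)
        return false;
    return tree_has_flags(inst->op[0], flags) &&
           tree_has_flags(inst->op[1], flags);
}
```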

-Tom


Re: [Mesa-dev] [PATCH] radeonsi: add a debug flag for unsafe math LLVM optimizations

2016-06-21 Thread Tom Stellard
On Tue, Jun 21, 2016 at 08:17:15PM +0200, Marek Olšák wrote:
> On Tue, Jun 21, 2016 at 7:21 PM, Tom Stellard <t...@stellard.net> wrote:
> > On Mon, Jun 13, 2016 at 06:27:02PM +0200, Marek Olšák wrote:
> >> From: Marek Olšák <marek.ol...@amd.com>
> >>
> >> ---
> >>  src/gallium/drivers/radeon/r600_pipe_common.c |  1 +
> >>  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
> >>  src/gallium/drivers/radeonsi/si_shader.c  | 16 
> >>  3 files changed, 18 insertions(+)
> >>
> >> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> >> index fa9f70d..5d4a679 100644
> >> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> >> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> >> @@ -482,6 +482,7 @@ static const struct debug_named_value common_debug_options[] = {
> >>   { "sisched", DBG_SI_SCHED, "Enable LLVM SI Machine Instruction Scheduler." },
> >>   { "mono", DBG_MONOLITHIC_SHADERS, "Use old-style monolithic shaders compiled on demand" },
> >>   { "noce", DBG_NO_CE, "Disable the constant engine"},
> >> + { "unsafemath", DBG_UNSAFE_MATH, "Enable unsafe math shader optimizations" },
> >>
> >>   DEBUG_NAMED_VALUE_END /* must be last */
> >>  };
> >> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> >> index 77dfc4f..263ef5e 100644
> >> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> >> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> >> @@ -99,6 +99,7 @@
> >>  #define DBG_SI_SCHED (1llu << 46)
> >>  #define DBG_MONOLITHIC_SHADERS   (1llu << 47)
> >>  #define DBG_NO_CE(1llu << 48)
> >> +#define DBG_UNSAFE_MATH  (1llu << 49)
> >>
> >>  #define R600_MAP_BUFFER_ALIGNMENT 64
> >>  #define R600_MAX_VIEWPORTS16
> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >> index 6dc4985..bba6a55 100644
> >> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >> @@ -5255,6 +5255,22 @@ static void si_create_function(struct si_shader_context *ctx,
> >>   else
> >>   LLVMAddAttribute(P, LLVMInRegAttribute);
> >>   }
> >> +
> >> + if (ctx->screen->b.debug_flags & DBG_UNSAFE_MATH) {
> >> + /* These were copied from some LLVM test. */
> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> +"less-precise-fpmad",
> >> +"true");
> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> +"no-infs-fp-math",
> >> +"true");
> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> +"no-nans-fp-math",
> >> +"true");
> >> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> >> +"unsafe-fp-math",
> >> +"true");
> >> + }
> >
> > You may get better results by also adding the fast-math flags to the
> > individual floating-point instructions, but this would be a more
> > invasive change.
> 
> Is there sample code showing how to do that?
> 

Something like this
https://cgit.freedesktop.org/~tstellar/mesa/commit/?h=fast-math=4fa18fdde3a2b51b2371064bc27729fd1038c219

There are more flags than just the UnsafeAlgebra that is used in the
patch.

-Tom

> Thanks,
> Marek


Re: [Mesa-dev] [PATCH] radeonsi: add a debug flag for unsafe math LLVM optimizations

2016-06-21 Thread Tom Stellard
On Mon, Jun 13, 2016 at 06:27:02PM +0200, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/gallium/drivers/radeon/r600_pipe_common.c |  1 +
>  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
>  src/gallium/drivers/radeonsi/si_shader.c  | 16 
>  3 files changed, 18 insertions(+)
> 
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> index fa9f70d..5d4a679 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -482,6 +482,7 @@ static const struct debug_named_value common_debug_options[] = {
>   { "sisched", DBG_SI_SCHED, "Enable LLVM SI Machine Instruction Scheduler." },
>   { "mono", DBG_MONOLITHIC_SHADERS, "Use old-style monolithic shaders compiled on demand" },
>   { "noce", DBG_NO_CE, "Disable the constant engine"},
> + { "unsafemath", DBG_UNSAFE_MATH, "Enable unsafe math shader optimizations" },
>  
>   DEBUG_NAMED_VALUE_END /* must be last */
>  };
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> index 77dfc4f..263ef5e 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -99,6 +99,7 @@
>  #define DBG_SI_SCHED (1llu << 46)
>  #define DBG_MONOLITHIC_SHADERS   (1llu << 47)
>  #define DBG_NO_CE(1llu << 48)
> +#define DBG_UNSAFE_MATH  (1llu << 49)
>  
>  #define R600_MAP_BUFFER_ALIGNMENT 64
>  #define R600_MAX_VIEWPORTS16
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 6dc4985..bba6a55 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -5255,6 +5255,22 @@ static void si_create_function(struct si_shader_context *ctx,
>   else
>   LLVMAddAttribute(P, LLVMInRegAttribute);
>   }
> +
> + if (ctx->screen->b.debug_flags & DBG_UNSAFE_MATH) {
> + /* These were copied from some LLVM test. */
> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> +"less-precise-fpmad",
> +"true");
> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> +"no-infs-fp-math",
> +"true");
> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> +"no-nans-fp-math",
> +"true");
> + LLVMAddTargetDependentFunctionAttr(ctx->radeon_bld.main_fn,
> +"unsafe-fp-math",
> +"true");
> + }

You may get better results by also adding the fast-math flags to the
individual floating-point instructions, but this would be a more
invasive change.

-Tom

>  }
>  
>  static void create_meta_data(struct si_shader_context *ctx)
> -- 
> 2.7.4
> 
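DBG_UNSAFE_MATH is just another bit in the 64-bit debug_flags mask, selected by the name "unsafemath" through the common_debug_options table (assumed here to be read from the R600_DEBUG environment variable). A simplified sketch of how such a name table maps a comma-separated option string to a mask; parse_debug_flags() and struct named_flag are hypothetical stand-ins, and Mesa's real option parsing (e.g. debug_get_flags_option()) may accept a different syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag bits as defined in the r600_pipe_common.h hunk. */
#define DBG_SI_SCHED           (1llu << 46)
#define DBG_MONOLITHIC_SHADERS (1llu << 47)
#define DBG_NO_CE              (1llu << 48)
#define DBG_UNSAFE_MATH        (1llu << 49)

/* Simplified stand-in for Mesa's struct debug_named_value. */
struct named_flag {
    const char *name;
    uint64_t value;
};

static const struct named_flag flags_table[] = {
    { "sisched",    DBG_SI_SCHED },
    { "mono",       DBG_MONOLITHIC_SHADERS },
    { "noce",       DBG_NO_CE },
    { "unsafemath", DBG_UNSAFE_MATH },
};

/* Hypothetical parser: "mono,unsafemath" -> OR of the matching bits.
 * Unknown or empty names are ignored. */
static uint64_t parse_debug_flags(const char *str)
{
    uint64_t mask = 0;

    while (str && *str) {
        size_t len = strcspn(str, ",");   /* length of the current name */

        for (size_t i = 0; i < sizeof(flags_table) / sizeof(flags_table[0]); i++) {
            if (strlen(flags_table[i].name) == len &&
                strncmp(flags_table[i].name, str, len) == 0)
                mask |= flags_table[i].value;
        }
        str += len;
        if (*str == ',')
            str++;                          /* skip the separator */
    }
    return mask;
}
```

A mask built this way from "unsafemath" makes the `ctx->screen->b.debug_flags & DBG_UNSAFE_MATH` check in si_create_function() true.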


[Mesa-dev] [PATCH 1/3] radeon/llvm: Remove uses_temp_indirect_addressing() function

2016-05-25 Thread Tom Stellard
bld->indirect_files is never set, so this function always returns false.
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 24 +-
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index dd86f66..759c674 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -159,13 +159,6 @@ emit_array_fetch(
return result;
 }
 
-static bool uses_temp_indirect_addressing(
-   struct lp_build_tgsi_context *bld_base)
-{
-   struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
-   return (bld->indirect_files & (1 << TGSI_FILE_TEMPORARY));
-}
-
 LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type,
@@ -224,10 +217,6 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
case TGSI_FILE_TEMPORARY:
if (reg->Register.Index >= ctx->temps_count)
return LLVMGetUndef(tgsi2llvmtype(bld_base, type));
-   if (uses_temp_indirect_addressing(bld_base)) {
-   ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, swizzle);
-   break;
-   }
ptr = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + swizzle];
if (type == TGSI_TYPE_DOUBLE) {
ptr2 = ctx->temps[reg->Register.Index * TGSI_NUM_CHANNELS + swizzle + 1];
@@ -312,10 +301,6 @@ static void emit_declaration(
 
ctx->arrays[decl->Array.ArrayID - 1] = decl->Range;
}
-   if (uses_temp_indirect_addressing(bld_base)) {
-   lp_emit_declaration_soa(bld_base, decl);
-   break;
-   }
first = decl->Range.First;
last = decl->Range.Last;
if (!ctx->temps_count) {
@@ -457,10 +442,7 @@ void radeon_llvm_emit_store(
case TGSI_FILE_TEMPORARY:
if (range.First + i >= ctx->temps_count)
continue;
-   if (uses_temp_indirect_addressing(bld_base))
-   temp_ptr = lp_get_temp_ptr_soa(bld, i + range.First, chan_index);
-   else
-   temp_ptr = ctx->temps[(i + range.First) * TGSI_NUM_CHANNELS + chan_index];
+   temp_ptr = ctx->temps[(i + range.First) * TGSI_NUM_CHANNELS + chan_index];
break;
 
default:
@@ -482,10 +464,6 @@ void radeon_llvm_emit_store(
case TGSI_FILE_TEMPORARY:
if (reg->Register.Index >= ctx->temps_count)
continue;
-   if (uses_temp_indirect_addressing(bld_base)) {
-   temp_ptr = NULL;
-   break;
-   }
temp_ptr = ctx->temps[ TGSI_NUM_CHANNELS * reg->Register.Index + chan_index];
if (dtype == TGSI_TYPE_DOUBLE)
temp_ptr2 = ctx->temps[ TGSI_NUM_CHANNELS * reg->Register.Index + chan_index + 1];
-- 
2.0.4



[Mesa-dev] [PATCH 3/3] radeon/llvm: Use alloca instructions for larger arrays

2016-05-25 Thread Tom Stellard
We were storing arrays in vectors, which was leading to some really bad
spill code for large arrays.  alloca instructions are a better fit for
arrays, and LLVM optimizations are more geared toward dealing with
allocas than with vectors.

For arrays that have 16 or fewer 32-bit elements, we will continue to use
vectors, because this will force LLVM to store them in registers and
use indirect register addressing, which is usually faster for small arrays.

In the future we should use allocas for all arrays and teach LLVM
how to store allocas in registers.
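The two rules above can be sketched as a size threshold that chooses between a vector and an alloca, plus a linear search that maps a TEMP register index to the declared array containing it (the same shape as the loop in get_alloca_for_array()). All names are hypothetical, and the sketch takes the 32-bit element count as an input because the message does not spell out how it is derived from a TGSI declaration range.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct tgsi_declaration_range. */
struct hyp_range {
    unsigned first, last;   /* inclusive register indices */
};

/* Threshold from the commit message: arrays with 16 or fewer 32-bit
 * elements stay in vectors; larger ones get an alloca. */
#define HYP_MAX_VECTOR_ELEMENTS 16

static bool array_uses_alloca(unsigned num_32bit_elements)
{
    return num_32bit_elements > HYP_MAX_VECTOR_ELEMENTS;
}

/* Index of the declared array that contains reg_index, or -1 if the
 * register is not part of any array. */
static int find_array(const struct hyp_range *arrays, unsigned num_arrays,
                      unsigned reg_index)
{
    for (unsigned i = 0; i < num_arrays; i++) {
        if (reg_index >= arrays[i].first && reg_index <= arrays[i].last)
            return (int)i;
    }
    return -1;
}
```

A linear search is fine here because a shader typically declares only a handful of arrays; a lookup table indexed by register would only pay off with many arrays.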

This fixes the piglit test:

spec/glsl-1.50/execution/geometry/max-input-component
---
 src/gallium/drivers/radeon/radeon_llvm.h   |   7 +-
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 169 ++---
 2 files changed, 151 insertions(+), 25 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h
index 3e11b36..5b524b6 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -50,6 +50,11 @@ struct radeon_llvm_loop {
LLVMBasicBlockRef endloop_block;
 };
 
+struct radeon_llvm_array {
+   struct tgsi_declaration_range range;
+   LLVMValueRef alloca;
+};
+
 struct radeon_llvm_context {
struct lp_build_tgsi_soa_context soa;
 
@@ -96,7 +101,7 @@ struct radeon_llvm_context {
unsigned loop_depth;
unsigned loop_depth_max;
 
-   struct tgsi_declaration_range *arrays;
+   struct radeon_llvm_array *arrays;
 
LLVMValueRef main_fn;
LLVMTypeRef return_type;
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 93bc307..cb35390 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -83,11 +83,25 @@ static LLVMValueRef emit_swizzle(
 
 static struct tgsi_declaration_range
 get_array_range(struct lp_build_tgsi_context *bld_base,
-   unsigned File, const struct tgsi_ind_register *reg)
+   unsigned File, unsigned reg_index,
+   const struct tgsi_ind_register *reg)
 {
struct radeon_llvm_context * ctx = radeon_llvm_context(bld_base);
 
-   if (File != TGSI_FILE_TEMPORARY || reg->ArrayID == 0 ||
+   if (!reg) {
+   unsigned i;
+   unsigned num_arrays = bld_base->info->array_max[TGSI_FILE_TEMPORARY];
+   for (i = 0; i < num_arrays; i++) {
+   const struct tgsi_declaration_range *range =
+   >arrays[i].range;
+
+   if (reg_index >= range->First && reg_index <= range->Last) {
+   return ctx->arrays[i].range;
+   }
+   }
+   }
+
+   if (File != TGSI_FILE_TEMPORARY || !reg || reg->ArrayID == 0 ||
reg->ArrayID > bld_base->info->array_max[TGSI_FILE_TEMPORARY]) {
struct tgsi_declaration_range range;
range.First = 0;
@@ -95,7 +109,32 @@ get_array_range(struct lp_build_tgsi_context *bld_base,
return range;
}
 
-   return ctx->arrays[reg->ArrayID - 1];
+   return ctx->arrays[reg->ArrayID - 1].range;
+}
+
+static LLVMValueRef get_alloca_for_array(
+   struct lp_build_tgsi_context *bld_base,
+   unsigned file,
+   unsigned index) {
+
+   unsigned i;
+   unsigned num_arrays;
+   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
+
+   if (file != TGSI_FILE_TEMPORARY) {
+   return NULL;
+   }
+
+   num_arrays = bld_base->info->array_max[TGSI_FILE_TEMPORARY];
+   for (i = 0; i < num_arrays; i++) {
+   const struct tgsi_declaration_range *range =
+   &ctx->arrays[i].range;
+
+   if (index >= range->First && index <= range->Last) {
+   return ctx->arrays[i].alloca;
+   }
+   }
+   return NULL;
 }
 
 static LLVMValueRef
@@ -106,6 +145,9 @@ emit_array_index(
 {
struct gallivm_state * gallivm = bld->bld_base.base.gallivm;
 
+   if (!reg) {
+   return lp_build_const_int32(gallivm, offset);
+   }
LLVMValueRef addr = LLVMBuildLoad(gallivm->builder, bld->addr[reg->Index][reg->Swizzle], "");
return LLVMBuildAdd(gallivm->builder, addr, lp_build_const_int32(gallivm, offset), "");
 }
@@ -154,7 +196,7 @@ emit_array_fetch(
tmp_reg.Register.Index = i + range.First;
LLVMValueRef temp = radeon_llvm_emit_fetch(bld_base, &tmp_reg, type, swizzle);
result = LLVMBuildInsertElement(builder, result, temp,
-   lp_build_const_int32(gallivm, i), "");
+   lp_build_const_int32(gallivm, i), "array_vector");
}
return result;
 }
@@ -169,13 +211,35 @@ load_value_from_array(
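
The lookup that get_array_range() and get_alloca_for_array() perform above is a linear scan over the declared TEMP arrays for the one whose [First, Last] range contains the register index. A minimal standalone sketch of that scan, assuming simplified stand-in types: decl_range, array_entry, find_array_for_reg, lookup_alloca and lookup_range are hypothetical names, not Mesa code; alloca_id stands in for the LLVMValueRef alloca, and file_max stands in for the fallback upper bound, which is cut off in the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tgsi_declaration_range. */
struct decl_range {
   unsigned first;
   unsigned last;      /* inclusive, like TGSI's Last */
};

/* Hypothetical stand-in for struct radeon_llvm_array. */
struct array_entry {
   struct decl_range range;
   int alloca_id;      /* stands in for the LLVMValueRef alloca */
};

/* Index of the declared array containing reg_index, or -1 if none. */
static int find_array_for_reg(const struct array_entry *arrays,
                              unsigned num_arrays, unsigned reg_index)
{
   assert(num_arrays == 0 || arrays != NULL);
   for (unsigned i = 0; i < num_arrays; i++) {
      if (reg_index >= arrays[i].range.first &&
          reg_index <= arrays[i].range.last)
         return (int)i;
   }
   return -1;
}

/* Like get_alloca_for_array(): -1 plays the role of NULL. */
static int lookup_alloca(const struct array_entry *arrays,
                         unsigned num_arrays, unsigned reg_index)
{
   int i = find_array_for_reg(arrays, num_arrays, reg_index);
   return i >= 0 ? arrays[i].alloca_id : -1;
}

/* Like the !reg path of get_array_range(): a register outside every
 * declared array falls back to a range starting at 0 (file_max is an
 * assumed upper bound for this sketch). */
static struct decl_range lookup_range(const struct array_entry *arrays,
                                      unsigned num_arrays,
                                      unsigned reg_index, unsigned file_max)
{
   int i = find_array_for_reg(arrays, num_arrays, reg_index);
   if (i >= 0)
      return arrays[i].range;
   struct decl_range whole = { 0, file_max };
   return whole;
}
```

A linear scan seems a reasonable choice here, since a shader typically declares only a handful of arrays.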

[Mesa-dev] [PATCH 2/3] radeon/llvm: Add helpers for loading and storing data from arrays.

2016-05-25 Thread Tom Stellard
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 51 +-
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 759c674..93bc307 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -159,6 +159,43 @@ emit_array_fetch(
return result;
 }
 
+static LLVMValueRef
+load_value_from_array(
+   struct lp_build_tgsi_context *bld_base,
+   unsigned file,
+   enum tgsi_opcode_type type,
+   unsigned swizzle,
+   unsigned reg_index,
+   const struct tgsi_ind_register *reg_indirect)
+{
+   struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   struct tgsi_declaration_range range = get_array_range(bld_base, file, reg_indirect);
+
+   return LLVMBuildExtractElement(builder,
+   emit_array_fetch(bld_base, file, type, range, swizzle),
+   emit_array_index(bld, reg_indirect, reg_index - range.First), "");
+
+}
+
+static LLVMValueRef store_value_to_array(
+   struct lp_build_tgsi_context *bld_base,
+   LLVMValueRef value,
+   unsigned file,
+   unsigned chan_index,
+   unsigned reg_index,
+   const struct tgsi_ind_register *reg_indirect)
+{
+   struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   struct tgsi_declaration_range range = get_array_range(bld_base, file, reg_indirect);
+
+   return LLVMBuildInsertElement(builder,
+   emit_array_fetch(bld_base, file, TGSI_TYPE_FLOAT, range, chan_index),
+   value,  emit_array_index(bld, reg_indirect, reg_index - range.First), "");
+   return NULL;
+}
+
 LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type,
@@ -180,12 +217,8 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
}
 
if (reg->Register.Indirect) {
-   struct tgsi_declaration_range range = get_array_range(bld_base,
-   reg->Register.File, &reg->Indirect);
-   return LLVMBuildExtractElement(builder,
-   emit_array_fetch(bld_base, reg->Register.File, type, range, swizzle),
-   emit_array_index(bld, &reg->Indirect, reg->Register.Index - range.First),
-   "");
+   return load_value_from_array(bld_base, reg->Register.File, type,
+   swizzle, reg->Register.Index, &reg->Indirect);
}
 
switch(reg->Register.File) {
@@ -429,10 +462,8 @@ void radeon_llvm_emit_store(
reg->Register.File, &reg->Indirect);
 
unsigned i, size = range.Last - range.First + 1;
-   LLVMValueRef array = LLVMBuildInsertElement(builder,
-   emit_array_fetch(bld_base, reg->Register.File, TGSI_TYPE_FLOAT, range, chan_index),
-   value,  emit_array_index(bld, &reg->Indirect, reg->Register.Index - range.First), "");
-
+   LLVMValueRef array = store_value_to_array(bld_base, value, reg->Register.File, chan_index,
+   reg->Register.Index, &reg->Indirect);
for (i = 0; i < size; ++i) {
switch(reg->Register.File) {
case TGSI_FILE_OUTPUT:
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
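
The load_value_from_array() and store_value_to_array() helpers in this patch treat the whole declared array as one vector: the element index is reg_index - range.First, plus the address register when the access is indirect, and the store produces a new vector (LLVMBuildInsertElement) that the caller then writes back. A plain-C model of that indexing, assuming a hypothetical fixed-size vector type and sketch_* helper names; the NULL addr_reg case mirrors the constant-index path that the follow-up patch adds to emit_array_index(), and the bounds asserts are this sketch's own choice (an out-of-range LLVM extractelement/insertelement yields poison rather than trapping).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ARRAY_LEN 16   /* hypothetical bound for this sketch */

/* Hypothetical model of one channel of a TGSI array held as a vector. */
struct sketch_vec {
   float e[SKETCH_MAX_ARRAY_LEN];
   unsigned len;
};

/* Element index: constant offset, plus the address register if indirect. */
static unsigned sketch_array_index(const int *addr_reg, unsigned offset)
{
   return addr_reg ? (unsigned)*addr_reg + offset : offset;
}

/* Models LLVMBuildExtractElement(array, index). */
static float sketch_load(const struct sketch_vec *array, unsigned first,
                         unsigned reg_index, const int *addr_reg)
{
   unsigned idx = sketch_array_index(addr_reg, reg_index - first);
   assert(idx < array->len);
   return array->e[idx];
}

/* Models LLVMBuildInsertElement(array, value, index): the input vector
 * is passed by value, so the caller gets a modified copy back. */
static struct sketch_vec sketch_store(struct sketch_vec array, float value,
                                      unsigned first, unsigned reg_index,
                                      const int *addr_reg)
{
   unsigned idx = sketch_array_index(addr_reg, reg_index - first);
   assert(idx < array.len);
   array.e[idx] = value;
   return array;
}
```

Writing the returned copy back element by element corresponds to the `for (i = 0; i < size; ++i)` loop that follows in radeon_llvm_emit_store().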


Re: [Mesa-dev] [PATCH 02/14] Revert "mesa: Build EGL without X11 headers after interop patchset"

2016-05-24 Thread Tom Stellard
On Tue, May 24, 2016 at 03:32:44PM +0100, Emil Velikov wrote:
> From: Emil Velikov <emil.veli...@collabora.com>
> 
> This reverts commit 4e2c9a04354b6b133845b8b93c0c5d34261a91d0.
> 
> The solution was incomplete and fragile. An alternative one is coming
> shortly.

Tested-by: Tom Stellard <thomas.stell...@amd.com>

> ---
>  include/GL/mesa_glinterop.h | 15 +--
>  1 file changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/include/GL/mesa_glinterop.h b/include/GL/mesa_glinterop.h
> index 39822f2..814064d 100644
> --- a/include/GL/mesa_glinterop.h
> +++ b/include/GL/mesa_glinterop.h
> @@ -50,11 +50,7 @@
>  #ifndef MESA_GLINTEROP_H
>  #define MESA_GLINTEROP_H
>  
> -#if defined(MESA_EGL_NO_X11_HEADERS)
> -#include 
> -#else
>  #include 
> -#endif
>  #include 
>  
>  #ifdef __cplusplus
> @@ -223,7 +219,6 @@ typedef struct _mesa_glinterop_export_out {
>  } mesa_glinterop_export_out;
>  
>  
> -#if !defined(MESA_EGL_NO_X11_HEADERS)
>  /**
>   * Query device information.
>   *
> @@ -233,11 +228,9 @@ typedef struct _mesa_glinterop_export_out {
>   *
>   * \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
>   */
> -
>  GLAPI int GLAPIENTRY
>  MesaGLInteropGLXQueryDeviceInfo(Display *dpy, GLXContext context,
>  mesa_glinterop_device_info *out);
> -#endif
>  
>  
>  /**
> @@ -249,7 +242,6 @@ MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
>  mesa_glinterop_device_info *out);
>  
>  
> -#if !defined(MESA_EGL_NO_X11_HEADERS)
>  /**
>   * Create and return a DMABUF handle corresponding to the given OpenGL
>   * object, and return other parameters about the OpenGL object.
> @@ -261,12 +253,10 @@ MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
>   *
>   * \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
>   */
> -
>  GLAPI int GLAPIENTRY
>  MesaGLInteropGLXExportObject(Display *dpy, GLXContext context,
>   const mesa_glinterop_export_in *in,
>   mesa_glinterop_export_out *out);
> -#endif
>  
>  
>  /**
> @@ -278,17 +268,14 @@ MesaGLInteropEGLExportObject(EGLDisplay dpy, EGLContext context,
>   const mesa_glinterop_export_in *in,
>   mesa_glinterop_export_out *out);
>  
> -#if !defined(MESA_EGL_NO_X11_HEADERS)
> +
>  typedef int (APIENTRYP PFNMESAGLINTEROPGLXQUERYDEVICEINFOPROC)(Display *dpy, GLXContext context,
>                                                                 mesa_glinterop_device_info *out);
> -#endif
>  typedef int (APIENTRYP PFNMESAGLINTEROPEGLQUERYDEVICEINFOPROC)(EGLDisplay dpy, EGLContext context,
>                                                                 mesa_glinterop_device_info *out);
> -#if !defined(MESA_EGL_NO_X11_HEADERS)
>  typedef int (APIENTRYP PFNMESAGLINTEROPGLXEXPORTOBJECTPROC)(Display *dpy, GLXContext context,
>                                                              const mesa_glinterop_export_in *in,
>                                                              mesa_glinterop_export_out *out);
> -#endif
>  typedef int (APIENTRYP PFNMESAGLINTEROPEGLEXPORTOBJECTPROC)(EGLDisplay dpy, EGLContext context,
>                                                              const mesa_glinterop_export_in *in,
>                                                              mesa_glinterop_export_out *out);
> -- 
> 2.8.2
> 


Re: [Mesa-dev] [PATCH] mesa: Build EGL without X11 headers after interop patchset

2016-05-20 Thread Tom Stellard
On Fri, May 20, 2016 at 04:29:07PM -0700, Mark Janes wrote:
> Tom Stellard <t...@stellard.net> writes:
> 
> > On Wed, Apr 27, 2016 at 10:33:14PM +, Youry Metlitsky wrote:
> >> ---
> >>  include/GL/mesa_glinterop.h | 15 ++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >> 
> >
> > Hi,
> >
> > This patch breaks the build for me:
> >
> > glxcmds.c:2699:1: error: no previous prototype for
> > 'MesaGLInteropGLXQueryDeviceInfo' [-Werror=missing-prototypes]
> >  MesaGLInteropGLXQueryDeviceInfo(Display *dpy, GLXContext context,
> >   ^
> > glxcmds.c:2723:1: error: no previous prototype for
> > 'MesaGLInteropGLXExportObject' [-Werror=missing-prototypes]
> >  MesaGLInteropGLXExportObject(Display *dpy, GLXContext context,
> >   ^
> >   cc1: some warnings being treated as errors
> >
> > These are my configure args:
> >
> >  ./autogen.sh --disable-dri3 --disable-xvmc
> >  --with-gallium-drivers=swrast --disable-gallium-llvm
> >  --with-egl-platforms=drm --with-dri-drivers=no --enable-texture-float
> 
> Sorry about that!  I pushed the patch at Marek's direction, but I did
> test it on several build configurations.
> 
> Adding " && !defined(MESA_EGL_NO_X11_HEADERS)" to the ifdef above the
> failure seems to fix compilation for those options -- does it work for
> you?
> 

Yes, the attached patch fixes my issue.

-Tom
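
The -Wmissing-prototypes failure above comes from the header hiding a prototype behind one condition while glxcmds.c still compiles the definition: GCC warns when a non-static function is defined with no earlier prototype in scope, and -Werror turns that warning into a build break. Mark's fix keeps the two conditions in sync. A minimal single-file illustration, assuming a hypothetical SKETCH_NO_X11_HEADERS macro and sketch_* functions in place of the real Mesa names:

```c
/* "Header" part: the X11-dependent prototype is only visible when the
 * X11 headers are allowed. */
#if !defined(SKETCH_NO_X11_HEADERS)
int sketch_glx_query_device_info(int dpy);
#endif
int sketch_egl_query_device_info(int dpy);

/* "Implementation" part: guarding the definition with the same
 * condition means it is never compiled without its prototype, so
 * -Wmissing-prototypes has nothing to report. */
#if !defined(SKETCH_NO_X11_HEADERS)
int sketch_glx_query_device_info(int dpy)
{
   return dpy + 1;
}
#endif

int sketch_egl_query_device_info(int dpy)
{
   return dpy + 2;
}
```

Compiling this with `gcc -c -Wmissing-prototypes -Werror -DSKETCH_NO_X11_HEADERS` should be clean; removing only the second `#if`/`#endif` pair should reproduce the "no previous prototype" error.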

> >
> >> diff --git a/include/GL/mesa_glinterop.h b/include/GL/mesa_glinterop.h
> >> index 814064d..39822f2 100644
> >> --- a/include/GL/mesa_glinterop.h
> >> +++ b/include/GL/mesa_glinterop.h
> >> @@ -50,7 +50,11 @@
> >>  #ifndef MESA_GLINTEROP_H
> >>  #define MESA_GLINTEROP_H
> >>  
> >> +#if defined(MESA_EGL_NO_X11_HEADERS)
> >> +#include 
> >> +#else
> >>  #include 
> >> +#endif
> >>  #include 
> >>  
> >>  #ifdef __cplusplus
> >> @@ -219,6 +223,7 @@ typedef struct _mesa_glinterop_export_out {
> >>  } mesa_glinterop_export_out;
> >>  
> >>  
> >> +#if !defined(MESA_EGL_NO_X11_HEADERS)
> >>  /**
> >>   * Query device information.
> >>   *
> >> @@ -228,9 +233,11 @@ typedef struct _mesa_glinterop_export_out {
> >>   *
> >>   * \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
> >>   */
> >> +
> >>  GLAPI int GLAPIENTRY
> >>  MesaGLInteropGLXQueryDeviceInfo(Display *dpy, GLXContext context,
> >>  mesa_glinterop_device_info *out);
> >> +#endif
> >>  
> >>  
> >>  /**
> >> @@ -242,6 +249,7 @@ MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
> >>  mesa_glinterop_device_info *out);
> >>  
> >>  
> >> +#if !defined(MESA_EGL_NO_X11_HEADERS)
> >>  /**
> >>   * Create and return a DMABUF handle corresponding to the given OpenGL
> >>   * object, and return other parameters about the OpenGL object.
> >> @@ -253,10 +261,12 @@ MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
> >>   *
> >>   * \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
> >>   */
> >> +
> >>  GLAPI int GLAPIENTRY
> >>  MesaGLInteropGLXExportObject(Display *dpy, GLXContext context,
> >>   const mesa_glinterop_export_in *in,
> >>   mesa_glinterop_export_out *out);
> >> +#endif
> >>  
> >>  
> >>  /**
> >> @@ -268,14 +278,17 @@ MesaGLInteropEGLExportObject(EGLDisplay dpy, EGLContext context,
> >>   const mesa_glinterop_export_in *in,
> >>   mesa_glinterop_export_out *out);
> >>  
> >> -
> >> +#if !defined(MESA_EGL_NO_X11_HEADERS)
> >>  typedef int (APIENTRYP PFNMESAGLINTEROPGLXQUERYDEVICEINFOPROC)(Display *dpy, GLXContext context,
> >>                                                                 mesa_glinterop_device_info *out);
> >> +#endif
> >>  typedef int (APIENTRYP PFNMESAGLINTEROPEGLQUERYDEVICEINFOPROC)(EGLDisplay dpy, EGLContext context,
> >>                                                                 mesa_glinterop_device_info *out);
> >> +#if !defined(MESA_EGL_NO_X11_HEADERS)
> >>  typedef int (APIENTRYP PFNMESAGLINTEROP

Re: [Mesa-dev] [PATCH] mesa: Build EGL without X11 headers after interop patchset

2016-05-20 Thread Tom Stellard
On Wed, Apr 27, 2016 at 10:33:14PM +, Youry Metlitsky wrote:
> ---
>  include/GL/mesa_glinterop.h | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 

Hi,

This patch breaks the build for me:

glxcmds.c:2699:1: error: no previous prototype for
'MesaGLInteropGLXQueryDeviceInfo' [-Werror=missing-prototypes]
 MesaGLInteropGLXQueryDeviceInfo(Display *dpy, GLXContext context,
  ^
glxcmds.c:2723:1: error: no previous prototype for
'MesaGLInteropGLXExportObject' [-Werror=missing-prototypes]
 MesaGLInteropGLXExportObject(Display *dpy, GLXContext context,
  ^
  cc1: some warnings being treated as errors

These are my configure args:

 ./autogen.sh --disable-dri3 --disable-xvmc
 --with-gallium-drivers=swrast --disable-gallium-llvm
 --with-egl-platforms=drm --with-dri-drivers=no --enable-texture-float

> diff --git a/include/GL/mesa_glinterop.h b/include/GL/mesa_glinterop.h
> index 814064d..39822f2 100644
> --- a/include/GL/mesa_glinterop.h
> +++ b/include/GL/mesa_glinterop.h
> @@ -50,7 +50,11 @@
>  #ifndef MESA_GLINTEROP_H
>  #define MESA_GLINTEROP_H
>  
> +#if defined(MESA_EGL_NO_X11_HEADERS)
> +#include 
> +#else
>  #include 
> +#endif
>  #include 
>  
>  #ifdef __cplusplus
> @@ -219,6 +223,7 @@ typedef struct _mesa_glinterop_export_out {
>  } mesa_glinterop_export_out;
>  
>  
> +#if !defined(MESA_EGL_NO_X11_HEADERS)
>  /**
>   * Query device information.
>   *
> @@ -228,9 +233,11 @@ typedef struct _mesa_glinterop_export_out {
>   *
>   * \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
>   */
> +
>  GLAPI int GLAPIENTRY
>  MesaGLInteropGLXQueryDeviceInfo(Display *dpy, GLXContext context,
>  mesa_glinterop_device_info *out);
> +#endif
>  
>  
>  /**
> @@ -242,6 +249,7 @@ MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
>  mesa_glinterop_device_info *out);
>  
>  
> +#if !defined(MESA_EGL_NO_X11_HEADERS)
>  /**
>   * Create and return a DMABUF handle corresponding to the given OpenGL
>   * object, and return other parameters about the OpenGL object.
> @@ -253,10 +261,12 @@ MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
>   *
>   * \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
>   */
> +
>  GLAPI int GLAPIENTRY
>  MesaGLInteropGLXExportObject(Display *dpy, GLXContext context,
>   const mesa_glinterop_export_in *in,
>   mesa_glinterop_export_out *out);
> +#endif
>  
>  
>  /**
> @@ -268,14 +278,17 @@ MesaGLInteropEGLExportObject(EGLDisplay dpy, EGLContext context,
>   const mesa_glinterop_export_in *in,
>   mesa_glinterop_export_out *out);
>  
> -
> +#if !defined(MESA_EGL_NO_X11_HEADERS)
>  typedef int (APIENTRYP PFNMESAGLINTEROPGLXQUERYDEVICEINFOPROC)(Display *dpy, GLXContext context,
>                                                                 mesa_glinterop_device_info *out);
> +#endif
>  typedef int (APIENTRYP PFNMESAGLINTEROPEGLQUERYDEVICEINFOPROC)(EGLDisplay dpy, EGLContext context,
>                                                                 mesa_glinterop_device_info *out);
> +#if !defined(MESA_EGL_NO_X11_HEADERS)
>  typedef int (APIENTRYP PFNMESAGLINTEROPGLXEXPORTOBJECTPROC)(Display *dpy, GLXContext context,
>                                                              const mesa_glinterop_export_in *in,
>                                                              mesa_glinterop_export_out *out);
> +#endif
>  typedef int (APIENTRYP PFNMESAGLINTEROPEGLEXPORTOBJECTPROC)(EGLDisplay dpy, EGLContext context,
>                                                              const mesa_glinterop_export_in *in,
>                                                              mesa_glinterop_export_out *out);
> -- 
> 2.8.0
> 


Re: [Mesa-dev] [PATCH] r600, compute: create vtx buffer for text + rodata

2016-05-03 Thread Tom Stellard
On Mon, May 02, 2016 at 01:11:18AM -0400, Jan Vesely wrote:
> From: Jan Vesely <jan.ves...@rutgers.edu>
> 
> reserve buffer id 2
> 
> 
> Signed-off-by: Jan Vesely <jan.ves...@rutgers.edu>
> ---
> needs llvm patches to be of use:
> https://github.com/jvesely/llvm/tree/eg-const
> 
> passes program-scope-arrays piglit and fixes all builtin functions that
> are implemented using large tables (AMD Turks)
> 
>  src/gallium/drivers/r600/evergreen_compute.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
> index 334897e..f498007 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -259,9 +259,11 @@ static void *evergreen_create_compute_state(struct pipe_context *ctx,
>   radeon_elf_read(code, header->num_bytes, &shader->binary);
>   r600_create_shader(&shader->bc, &shader->binary, &use_kill);
>  
> + /* Upload code + ROdata */
>   shader->code_bo = r600_compute_buffer_alloc_vram(rctx->screen,
>   shader->bc.ndw * 4);
>   p = r600_buffer_map_sync_with_rings(&rctx->b, shader->code_bo, PIPE_TRANSFER_WRITE);
> + //TODO: use util_memcpy_cpu_to_le32 ?
>   memcpy(p, shader->bc.bytecode, shader->bc.ndw * 4);
>   rctx->b.ws->buffer_unmap(shader->code_bo->buf);
>  #endif
> @@ -612,9 +614,9 @@ static void evergreen_set_compute_resources(struct pipe_context *ctx,
>   start, count);
>  
>   for (unsigned i = 0; i < count; i++) {
> - /* The First two vertex buffers are reserved for parameters and
> + /* The First three vertex buffers are reserved for parameters and
>* global buffers. */
> - unsigned vtx_id = 2 + i;
> + unsigned vtx_id = 3 + i;
>   if (resources[i]) {
>   struct r600_resource_global *buffer =
>   (struct r600_resource_global*)
> @@ -681,9 +683,15 @@ static void evergreen_set_global_binding(struct pipe_context *ctx,
>   *(handles[i]) = util_cpu_to_le32(handle);
>   }
>  
> + /* globals for writing */
>   evergreen_set_rat(rctx->cs_shader_state.shader, 0, pool->bo, 0, pool->size_in_dw * 4);
> + /* globals for reading */
>   evergreen_cs_set_vertex_buffer(rctx, 1, 0,
>   (struct pipe_resource*)pool->bo);
> +
> + /* constants for reading, LLVM puts them in text segment */
> + evergreen_cs_set_vertex_buffer(rctx, 2, 0,
> + (struct pipe_resource*)rctx->cs_shader_state.shader->code_bo);

I see now you are binding the whole shader to the vertex buffer rather
than just the read-only data, which is why you need to emit the
GlobalAddress in LLVM.

If you were to just bind the read-only data you could generate better
code in LLVM, because all the global addresses would just be constant
offsets from the start of the buffer and they could be folded into
other instructions.

However, this would be a little more involved because you would have
to change LLVM to emit the read-only data for R600 into a separate
section.

I think your approach is fine, since it is what radeonsi is doing, and
that makes it easier to share code for this in LLVM.  If you want to
optimize this in the future, you can always do it in a follow-up commit.

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
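
With this patch the compute path reserves three vertex-buffer slots ahead of any user resources: slot 0 for kernel parameters, slot 1 for reading global buffers, and slot 2 for the shader BO (code plus the read-only data that LLVM places in the text segment), so resource i lands in slot 3 + i. A small sketch of that numbering, using a hypothetical enum and helper that do not exist in the r600 driver:

```c
/* Hypothetical names for the reserved compute vertex-buffer slots,
 * following the comments in the patch above. */
enum sketch_cs_vtx_slot {
   SKETCH_CS_VTX_PARAMS      = 0, /* kernel parameters */
   SKETCH_CS_VTX_GLOBALS     = 1, /* global buffers, read path */
   SKETCH_CS_VTX_CODE_RODATA = 2, /* shader code + read-only data */
   SKETCH_CS_VTX_FIRST_USER  = 3  /* first slot for user resources */
};

/* Mirrors "unsigned vtx_id = 3 + i;" in evergreen_set_compute_resources(). */
static unsigned sketch_resource_vtx_id(unsigned i)
{
   return SKETCH_CS_VTX_FIRST_USER + i;
}
```

Naming the slots this way would keep the "3 + i" offset and the reservation comment from drifting apart if another slot is ever reserved.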

>  }
>  
>  /**
> -- 
> 2.7.4
> 


Re: [Mesa-dev] [PATCH] radeonsi: Add config parameter to si_shader_apply_scratch_relocs.

2016-04-21 Thread Tom Stellard
On Thu, Apr 21, 2016 at 06:28:15PM +0200, Bas Nieuwenhuizen wrote:
> shader->config is not updated for compute kernels.
> 
> Signed-off-by: Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>

This fixes compute shaders that use scratch.  Thanks.

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
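
The new parameter matters because the scratch descriptor's stride is derived from whichever config the caller passes: the compute path can pass its own config instead of shader->config, which is not updated for compute kernels. A sketch of how si_shader_apply_scratch_relocs() builds the two descriptor dwords, assuming the field layout of S_008F04_BASE_ADDRESS_HI (16 bits at bit 0) and S_008F04_STRIDE (14 bits at bit 16); the SKETCH_* macros and sketch_config type are hypothetical stand-ins, so check sid.h before relying on the exact masks.

```c
#include <stdint.h>

/* Assumed layout of buffer resource dword 1 (mirrors S_008F04_*). */
#define SKETCH_BASE_ADDRESS_HI(x) (((uint32_t)(x) & 0xFFFFu) << 0)
#define SKETCH_STRIDE(x)          (((uint32_t)(x) & 0x3FFFu) << 16)

/* Hypothetical subset of struct si_shader_config. */
struct sketch_config {
   unsigned scratch_bytes_per_wave;
};

static void sketch_scratch_rsrc(uint64_t scratch_va,
                                const struct sketch_config *config,
                                uint32_t dw[2])
{
   /* Low 32 bits of the scratch address. */
   dw[0] = (uint32_t)scratch_va;
   /* High address bits, plus a per-lane stride (64 lanes per wave). */
   dw[1] = SKETCH_BASE_ADDRESS_HI(scratch_va >> 32)
         | SKETCH_STRIDE(config->scratch_bytes_per_wave / 64);
}
```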

> ---
>  src/gallium/drivers/radeonsi/si_compute.c   | 2 +-
>  src/gallium/drivers/radeonsi/si_shader.c| 3 ++-
>  src/gallium/drivers/radeonsi/si_shader.h| 1 +
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
>  4 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
> index 905c169..7e05be5 100644
> --- a/src/gallium/drivers/radeonsi/si_compute.c
> +++ b/src/gallium/drivers/radeonsi/si_compute.c
> @@ -221,7 +221,7 @@ static bool si_setup_compute_scratch_buffer(struct si_context *sctx,
>   if (sctx->compute_scratch_buffer != shader->scratch_bo && scratch_needed) {
>   uint64_t scratch_va = sctx->compute_scratch_buffer->gpu_address;
>  
> - si_shader_apply_scratch_relocs(sctx, shader, scratch_va);
> + si_shader_apply_scratch_relocs(sctx, shader, config, scratch_va);
>  
>   if (si_shader_binary_upload(sctx->screen, shader))
>   return false;
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 3bf68eb..c48ae3b 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -5394,13 +5394,14 @@ void si_shader_binary_read_config(struct radeon_shader_binary *binary,
>  
>  void si_shader_apply_scratch_relocs(struct si_context *sctx,
>   struct si_shader *shader,
> + struct si_shader_config *config,
>   uint64_t scratch_va)
>  {
>   unsigned i;
>   uint32_t scratch_rsrc_dword0 = scratch_va;
>   uint32_t scratch_rsrc_dword1 =
>   S_008F04_BASE_ADDRESS_HI(scratch_va >> 32)
> - |  S_008F04_STRIDE(shader->config.scratch_bytes_per_wave / 64);
> + |  S_008F04_STRIDE(config->scratch_bytes_per_wave / 64);
>  
>   for (i = 0 ; i < shader->binary.reloc_count; i++) {
>   const struct radeon_shader_reloc *reloc =
> diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
> index 6ea849d..857a682 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.h
> +++ b/src/gallium/drivers/radeonsi/si_shader.h
> @@ -478,6 +478,7 @@ void si_shader_dump(struct si_screen *sscreen, struct si_shader *shader,
>   FILE *f);
>  void si_shader_apply_scratch_relocs(struct si_context *sctx,
>   struct si_shader *shader,
> + struct si_shader_config *config,
>   uint64_t scratch_va);
>  void si_shader_binary_read_config(struct radeon_shader_binary *binary,
> struct si_shader_config *conf,
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index d560aae..49e688a 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -1634,7 +1634,7 @@ static int si_update_scratch_buffer(struct si_context *sctx,
>  
>   assert(sctx->scratch_buffer);
>  
> - si_shader_apply_scratch_relocs(sctx, shader, scratch_va);
> + si_shader_apply_scratch_relocs(sctx, shader, &shader->config, scratch_va);
>  
>   /* Replace the shader bo with a new bo that has the relocs applied. */
>   r = si_shader_binary_upload(sctx->screen, shader);
> -- 
> 2.8.0
> 


Re: [Mesa-dev] [PATCH 2/4] radeonsi: Set range metadata on calls to llvm.SI.tid

2016-04-19 Thread Tom Stellard
On Tue, Apr 19, 2016 at 08:12:08PM +0200, Michael Schellenberger Costa wrote:
> Hi Tom,
> 
> Am 19.04.2016 um 19:52 schrieb Tom Stellard:
> > The range metadata tells LLVM the range of expected values for this intrinsic,
> > so it can do some additional optimizations on the result.
> > ---
> >  src/gallium/drivers/radeonsi/si_shader.c | 29 ++---
> >  1 file changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> > index 3b6d6e9..b4f2a42 100644
> > --- a/src/gallium/drivers/radeonsi/si_shader.c
> > +++ b/src/gallium/drivers/radeonsi/si_shader.c
> > @@ -1114,12 +1114,35 @@ static LLVMValueRef get_sample_id(struct radeon_llvm_context *radeon_bld)
> > SI_PARAM_ANCILLARY, 8, 4);
> >  }
> >  
> > +/**
> > + * Set range metadata on an instruction.  This can only be used on load and
> > + * call instructions.  To specify that an instruction can only produce the values
> > + * 0, 1, 2, you would do set_range_metadata(value, 0, 3);
> > + * \p lo is the minimum value inclusive.
> > + * \p hi is the maximum value exclusive.
> > + */
> > +static void set_range_metadata(LLVMValueRef value, unsigned lo, unsigned hi)
> > +{
> > +   const char *range_md_string = "range";
> > +   LLVMValueRef range_md, md_args[2];
> > +   LLVMTypeRef type = LLVMTypeOf(value);
> > +   LLVMContextRef context = LLVMGetTypeContext(type);
> > +   unsigned md_range_id = LLVMGetMDKindIDInContext(context,
> > +   range_md_string, strlen(range_md_string));
> > +
> > +   md_args[0] = LLVMConstInt(type, lo, false);
> > +   md_args[1] = LLVMConstInt(type, hi, false);
> > +   range_md = LLVMMDNodeInContext(context, md_args, 2);
> > +   LLVMSetMetadata(value, md_range_id, range_md);
> > +}
> > +
> >  static LLVMValueRef get_thread_id(struct si_shader_context *ctx)
> >  {
> > struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
> > -
> > -   return lp_build_intrinsic(gallivm->builder, "llvm.SI.tid", ctx->i32,
> > -  NULL, 0, LLVMReadNoneAttribute);
> > +   LLVMValueRef tid = lp_build_intrinsic(gallivm->builder, "llvm.SI.tid",
> > +   ctx->i32,   NULL, 0, LLVMReadNoneAttribute);
> 
> same here, why not use the helper from patch 1?

This is the helper from patch 1. ;)

-Tom

> Michael
> > +   set_range_metadata(tid, 0, 64);
> > +   return tid;
> >  }
> >  
> >  /**
> 


Re: [Mesa-dev] [PATCH] configure.ac: fix the --disable-llvm-shared-libs build

2016-04-19 Thread Tom Stellard
On Tue, Apr 19, 2016 at 07:39:13PM +0100, Emil Velikov wrote:
> Hi Chuck,
> 
> Thanks for chipping in.
> 
> On 19 April 2016 at 15:47, Chuck Atkins  wrote:
> > This still doesn't quite give what you want.  One can also have an llvm with
> > component shared libs.  So there's three different options for llvm library
> > configurations: a single shared lib, component shared libs, or component
> > static libs.
> Of the three, only the single shared lib and component static libs are supported.
> 
> Personally I'm leaning that we ought to go with the latter only... Esp
> considering the problems that people tend to have with mesa + steam,
> every so often.
> 
> IIRC all the issues that we had with static llvm have been resolved.
> Plus we have great people like Kai who promptly send patches when
> things break (which hasn't happened in a long time).
> 
> Tom, what is your view on the topic - are you ok with us switching
> back to the static one and/or nuking the shared one? IIRC Jose was clear
> that in his view one should just statically link LLVM. I believe that's
> still the case, right Jose?
> 

The shared option is useful for development, because then you don't
need to rebuild mesa every time you make an LLVM change, so I don't want
to remove it.

I don't really have a strong opinion about what should be the default.  Static
libraries have the disadvantage of requiring you to explicitly list the
libraries you need, so if a library changes names or a new dependency is
added, the build will break.

Static library builds also take up a huge amount of disk/memory since
each pipe driver and state tracker has their own copy of LLVM linked in.

-Tom

> Dave, I believe you guys were building with static LLVM and/or a private
> version built explicitly for Mesa. Would shifting to static LLVM be an
> option for you guys?
> 
> >  To ensure you're getting just the static libs or just the
> > shared libs, you'll need to wrap the link lines in -Bstatic or -Bdynamic.
> > Something like (in pseudo code, not actual make or autoconf syntax):
> >
> > if use_llvm
> > then
> >   if use_llvm_shared
> >   then
> > if single_library
> > then
> >   # (whatever logic you need to deal with the different library naming issues)
> >   LLVM_LIBS="-lLLVM" or LLVM_LIBS="-lLLVM-3.7"
> > else # use llvm component libs
> >   LLVM_LIBS="-Wl,--push-state,-Bdynamic `$LLVM_CONFIG --libs $LLVM_COMPONENTS` -Wl,--pop-state"
> > endif
> >   else # use_llvm_static
> > LLVM_LIBS="-Wl,--push-state,-Bstatic `$LLVM_CONFIG --libs $LLVM_COMPONENTS` -Wl,--pop-state"
> >   endif
> >
> So far we have not needed -Bstatic/-Bdynamic, as LLVM only recently
> introduced the experimental option of multiple shared libraries. I
> guess that will lead to confusion as it gets more common (less
> experimental).
> 
> >   LLVM_LIBS="-L`$LLVM_CONFIG --libdir` $LLVM_LIBS"
> > endif
> >
> > - Chuck
> >
> > On Mon, Apr 18, 2016 at 11:01 PM, Jonathan Gray  wrote:
> >>
> >> This patch is still required for master.
> >>
> >> On Sun, Feb 28, 2016 at 02:47:03PM +1100, Jonathan Gray wrote:
> >> > When building with --disable-llvm-shared-libs use llvm-config --libfiles
> >> > instead of --libs so the full path to the .a files is used instead of
> >> > -lname.
> >> >
> I would imagine that the issue here is caused by the LLVM
> libraries not being in the standard location, correct? I.e. they are
> placed somewhere like /usr/lib/llvm while the linker looks only in
> /usr/lib.
> 
> Personally I'm fine with using --libnames when static linking, but
> I'd appreciate it if we could check that different versions of llvm-config
> produce the same (correct) output throughout.
> Based on the llvm-config code, --libnames exists since 3.3.0 (at least),
> so that's great. I'm running 3.7 and it's fine, although I cannot test
> other versions; can anyone else give it a try?
> 
> * OpenCL - 3.5.0
> * r300 - 'any' (3.3.0) LLVM
> This should be demoted to a warning and a big fat note should be added
> in the GL_VERSION string... But that's another topic
> * r600/radeonsi - 3.6.0
> * swr 3.6.0
> * llvmpipe - 3.3.0
> 
> Thanks
> Emil


[Mesa-dev] [PATCH 2/4] radeonsi: Set range metadata on calls to llvm.SI.tid

2016-04-19 Thread Tom Stellard
The range metadata tells LLVM the range of expected values for this intrinsic,
so it can do some additional optimizations on the result.
---
 src/gallium/drivers/radeonsi/si_shader.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 3b6d6e9..b4f2a42 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1114,12 +1114,35 @@ static LLVMValueRef get_sample_id(struct radeon_llvm_context *radeon_bld)
SI_PARAM_ANCILLARY, 8, 4);
 }
 
+/**
+ * Set range metadata on an instruction.  This can only be used on load and
+ * call instructions.  To specify that an instruction can only produce the values
+ * 0, 1, 2, you would do set_range_metadata(value, 0, 3);
+ * \p lo is the minimum value inclusive.
+ * \p hi is the maximum value exclusive.
+ */
+static void set_range_metadata(LLVMValueRef value, unsigned lo, unsigned hi)
+{
+   const char *range_md_string = "range";
+   LLVMValueRef range_md, md_args[2];
+   LLVMTypeRef type = LLVMTypeOf(value);
+   LLVMContextRef context = LLVMGetTypeContext(type);
+   unsigned md_range_id = LLVMGetMDKindIDInContext(context,
+   range_md_string, strlen(range_md_string));
+
+   md_args[0] = LLVMConstInt(type, lo, false);
+   md_args[1] = LLVMConstInt(type, hi, false);
+   range_md = LLVMMDNodeInContext(context, md_args, 2);
+   LLVMSetMetadata(value, md_range_id, range_md);
+}
+
 static LLVMValueRef get_thread_id(struct si_shader_context *ctx)
 {
struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
-
-   return lp_build_intrinsic(gallivm->builder, "llvm.SI.tid", ctx->i32,
-  NULL, 0, LLVMReadNoneAttribute);
+   LLVMValueRef tid = lp_build_intrinsic(gallivm->builder, "llvm.SI.tid",
+   ctx->i32,   NULL, 0, LLVMReadNoneAttribute);
+   set_range_metadata(tid, 0, 64);
+   return tid;
 }
 
 /**
-- 
2.1.0



[Mesa-dev] [PATCH 4/4] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-19 Thread Tom Stellard
The ds_bpermute instruction allows threads to transfer data directly
to or from the vgprs of other threads.  These instructions use the lds
hardware to transfer data, but do not read or write lds memory.
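The data movement can be modeled on the CPU with a minimal C sketch. The sketch assumes the following:

- a 64-lane wavefront with all lanes active;
- each quad stored in consecutive lanes as top-left, top-right, bottom-left, bottom-right, which is what the +1 (DDX) and +2 (DDY) offsets in the patch imply;
- ds_bpermute_b32 taking a byte address, so lane i reads from lane addr / 4. That is why the patch multiplies the thread id by 4.

The `tid & 60` mask matches the `v_and_b32_e32 v2, 60, v2` in the AFTER listing. All function names are hypothetical.

```c
#define WAVE_SIZE 64

/* Model of ds_bpermute_b32 with every lane active: lane i receives the value
 * held by lane (byte_addr[i] / 4) % WAVE_SIZE. */
static void bpermute(float dst[WAVE_SIZE], const unsigned byte_addr[WAVE_SIZE],
                     const float src[WAVE_SIZE])
{
   for (unsigned i = 0; i < WAVE_SIZE; i++)
      dst[i] = src[(byte_addr[i] / 4) % WAVE_SIZE];
}

/* Coarse derivative for every lane: neighbor = 1 gives DDX, neighbor = 2
 * gives DDY, assuming lanes 4k..4k+3 hold one quad as TL, TR, BL, BR. */
static void emulate_ddxy(float result[WAVE_SIZE], const float val[WAVE_SIZE],
                         unsigned neighbor)
{
   unsigned tl_addr[WAVE_SIZE], trbl_addr[WAVE_SIZE];
   float tl[WAVE_SIZE], trbl[WAVE_SIZE];

   for (unsigned tid = 0; tid < WAVE_SIZE; tid++) {
      unsigned tl_tid = tid & 60;            /* top-left lane of the quad */
      tl_addr[tid] = tl_tid * 4;             /* byte address */
      trbl_addr[tid] = (tl_tid + neighbor) * 4;
   }

   /* Cross-lane reads straight from the source values, no shared array. */
   bpermute(tl, tl_addr, val);
   bpermute(trbl, trbl_addr, val);

   for (unsigned tid = 0; tid < WAVE_SIZE; tid++)
      result[tid] = trbl[tid] - tl[tid];
}

/* Example: on the ramp val[i] = i, DDX is 1 and DDY is 2 in every lane. */
static float ddxy_on_ramp(unsigned neighbor, unsigned lane)
{
   float val[WAVE_SIZE], result[WAVE_SIZE];
   for (unsigned i = 0; i < WAVE_SIZE; i++)
      val[i] = (float)i;
   emulate_ddxy(result, val, neighbor);
   return result[lane];
}
```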

DDX BEFORE:                        |  DDX AFTER:
                                   |
v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 60, v2
v_and_b32_e32 v2, 60, v2           |  v_lshlrev_b32_e32 v2, 2, v2
v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
s_waitcnt lgkmcnt(0)               |
v_or_b32_e32 v0, 1, v2             |
v_lshlrev_b32_e32 v0, 2, v0        |
ds_read_b32 v1, v3                 |
ds_read_b32 v0, v0                 |
s_waitcnt lgkmcnt(0)               |
                                   |
LDS: 1 blocks                      |  LDS: 0 blocks
---
 src/gallium/drivers/radeonsi/si_shader.c | 42 +++-
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 2a747f9..d3e445b 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4162,6 +4162,7 @@ static void si_llvm_emit_ddxy(
LLVMValueRef indices[2];
LLVMValueRef store_ptr, load_ptr0, load_ptr1;
LLVMValueRef tl, trbl, result[4];
+   LLVMValueRef tl_tid, trbl_tid;
unsigned swizzle[4];
unsigned c;
int idx;
@@ -4179,20 +4180,24 @@ static void si_llvm_emit_ddxy(
else
mask = TID_MASK_TOP_LEFT;
 
-   indices[1] = LLVMBuildAnd(gallivm->builder, indices[1],
- lp_build_const_int32(gallivm, mask), "");
+   tl_tid = LLVMBuildAnd(gallivm->builder, indices[1],
+   lp_build_const_int32(gallivm, mask), "");
+   indices[1] = tl_tid;
load_ptr0 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
/* for DDX we want to next X pixel, DDY next Y pixel. */
idx = (opcode == TGSI_OPCODE_DDX || opcode == TGSI_OPCODE_DDX_FINE) ? 1 : 2;
-   indices[1] = LLVMBuildAdd(gallivm->builder, indices[1],
+   trbl_tid = LLVMBuildAdd(gallivm->builder, indices[1],
  lp_build_const_int32(gallivm, idx), "");
+   indices[1] = trbl_tid;
load_ptr1 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
for (c = 0; c < 4; ++c) {
unsigned i;
+   LLVMValueRef val;
+   LLVMValueRef args[2];
 
swizzle[c] = tgsi_util_get_full_src_register_swizzle(&inst->Src[0], c);
for (i = 0; i < c; ++i) {
@@ -4204,18 +4209,31 @@ static void si_llvm_emit_ddxy(
if (i != c)
continue;
 
-   LLVMBuildStore(gallivm->builder,
-  LLVMBuildBitCast(gallivm->builder,
-   lp_build_emit_fetch(bld_base, inst, 0, c),
-   ctx->i32, ""),
-  store_ptr);
+   val = LLVMBuildBitCast(gallivm->builder,
+   lp_build_emit_fetch(bld_base, inst, 0, c),
+   ctx->i32, "");
 
-   tl = LLVMBuildLoad(gallivm->builder, load_ptr0, "");
-   tl = LLVMBuildBitCast(gallivm->builder, tl, ctx->f32, "");
+   if ((HAVE_LLVM >= 0x0309) && ctx->screen->b.family >= CHIP_TONGA) {
 
-   trbl = LLVMBuildLoad(gallivm->builder, load_ptr1, "");
-   trbl = LLVMBuildBitCast(gallivm->builder, trbl, ctx->f32, "");
+   args[0] = LLVMBuildMul(gallivm->builder, tl_tid,
+lp_build_const_int32(gallivm, 4), "");
+   args[1] = val;
+   tl = lp_build_intrinsic(gallivm->builder,
+   "llvm.amdgcn.ds.bpermute", ctx->i32,
+   args, 2, LLVMReadNoneAttribute);
 
+   args[0] = LLVMBuildMul(gallivm->builder, trbl_tid,
+lp_build_const_int32(gallivm, 4), "");
+   trbl = lp_build_intrinsic(gallivm->builder,
+   "llvm.amdgcn.ds.bpermute", ctx->i32,
+   args, 2, LLVMReadNoneAttribute);
+   } else {
+   LLVMBuildStore(gallivm->builder, val, store_ptr);
+   tl = LLVMBuildLoad(gallivm->builder, load_ptr0, "");
+   

[Mesa-dev] [PATCH 3/4] radeonsi: Use llvm.amdgcn.mbcnt.* intrinsics instead of llvm.SI.tid v2

2016-04-19 Thread Tom Stellard
We're trying to move to more of the new-style intrinsics, which include
the correct target name and map directly to ISA instructions.

v2:
  - Only do this with LLVM 3.8 and newer.
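This C sketch shows how the two mbcnt intrinsics combine into a thread id. It assumes a 64-lane wavefront and models each instruction as follows:

- v_mbcnt_lo_u32_b32 counts the set mask bits for lanes 0-31 that lie below the current lane, then adds its second operand.
- v_mbcnt_hi_u32_b32 does the same for lanes 32-63.

With an all-ones mask (the -1 operand in the DDX listings of this series), the chained result is simply the lane index. The helpers are hypothetical models, not Mesa or LLVM code.

```c
#include <stdint.h>

/* Portable population count. */
static unsigned popcount32(uint32_t x)
{
   unsigned n = 0;
   while (x) {
      x &= x - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* One 32-bit half of the 64-lane "lanes below me" mask (lane < 64).
 * half = 0 covers lanes 0-31, half = 1 covers lanes 32-63. */
static uint32_t lanes_below(unsigned lane, unsigned half)
{
   uint64_t below = (lane == 0) ? 0 : (UINT64_MAX >> (64 - lane));
   return (uint32_t)(below >> (32 * half));
}

/* Model of v_mbcnt_lo_u32_b32: masked count over lanes 0-31, plus src. */
static uint32_t mbcnt_lo(unsigned lane, uint32_t mask, uint32_t src)
{
   return popcount32(mask & lanes_below(lane, 0)) + src;
}

/* Model of v_mbcnt_hi_u32_b32: masked count over lanes 32-63, plus src. */
static uint32_t mbcnt_hi(unsigned lane, uint32_t mask, uint32_t src)
{
   return popcount32(mask & lanes_below(lane, 1)) + src;
}

/* Thread id as chained in the patch: mbcnt.hi(-1, mbcnt.lo(-1, 0)). */
static uint32_t thread_id(unsigned lane)
{
   return mbcnt_hi(lane, 0xffffffffu, mbcnt_lo(lane, 0xffffffffu, 0));
}
```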
---
 src/gallium/drivers/radeonsi/si_shader.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index b4f2a42..2a747f9 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1139,8 +1139,23 @@ static void set_range_metadata(LLVMValueRef value, unsigned lo, unsigned hi)
 static LLVMValueRef get_thread_id(struct si_shader_context *ctx)
 {
struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
-   LLVMValueRef tid = lp_build_intrinsic(gallivm->builder, "llvm.SI.tid",
+   LLVMValueRef tid;
+
+   if (HAVE_LLVM < 0x0308) {
+   tid = lp_build_intrinsic(gallivm->builder, "llvm.SI.tid",
ctx->i32,   NULL, 0, LLVMReadNoneAttribute);
+   } else {
+   LLVMValueRef tid_args[2];
+   tid_args[0] = lp_build_const_int32(gallivm, 0xffffffff);
+   tid_args[1] = lp_build_const_int32(gallivm, 0);
+   tid_args[1] = lp_build_intrinsic(gallivm->builder,
+   "llvm.amdgcn.mbcnt.lo", ctx->i32,
+   tid_args, 2, LLVMReadNoneAttribute);
+
+   tid = lp_build_intrinsic(gallivm->builder,
+   "llvm.amdgcn.mbcnt.hi", ctx->i32,
+   tid_args, 2, LLVMReadNoneAttribute);
+   }
set_range_metadata(tid, 0, 64);
return tid;
 }
-- 
2.1.0



[Mesa-dev] [PATCH 1/4] radeonsi: Create a helper function for computing the thread id

2016-04-19 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_shader.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index c26960b..3b6d6e9 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1114,6 +1114,14 @@ static LLVMValueRef get_sample_id(struct radeon_llvm_context *radeon_bld)
SI_PARAM_ANCILLARY, 8, 4);
 }
 
+static LLVMValueRef get_thread_id(struct si_shader_context *ctx)
+{
+   struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
+
+   return lp_build_intrinsic(gallivm->builder, "llvm.SI.tid", ctx->i32,
+  NULL, 0, LLVMReadNoneAttribute);
+}
+
 /**
  * Load a dword from a constant buffer.
  */
@@ -1776,8 +1784,7 @@ static void si_llvm_emit_streamout(struct si_shader_context *ctx,
LLVMValueRef so_vtx_count =
unpack_param(ctx, ctx->param_streamout_config, 16, 7);
 
-   LLVMValueRef tid = lp_build_intrinsic(builder, "llvm.SI.tid", ctx->i32,
-  NULL, 0, LLVMReadNoneAttribute);
+   LLVMValueRef tid = get_thread_id(ctx);
 
/* can_emit = tid < so_vtx_count; */
LLVMValueRef can_emit =
@@ -4123,8 +4130,7 @@ static void si_llvm_emit_ddxy(
unsigned mask;
 
indices[0] = bld_base->uint_bld.zero;
-   indices[1] = lp_build_intrinsic(gallivm->builder, "llvm.SI.tid", ctx->i32,
-NULL, 0, LLVMReadNoneAttribute);
+   indices[1] = get_thread_id(ctx);
store_ptr = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
@@ -4195,8 +4201,7 @@ static LLVMValueRef si_llvm_emit_ddxy_interp(
unsigned c;
 
indices[0] = bld_base->uint_bld.zero;
-   indices[1] = lp_build_intrinsic(gallivm->builder, "llvm.SI.tid", ctx->i32,
-   NULL, 0, LLVMReadNoneAttribute);
+   indices[1] = get_thread_id(ctx);
store_ptr = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
-- 
2.1.0



Re: [Mesa-dev] [PATCH 8/9] llvmpipe: Test more vector lengths.

2016-04-18 Thread Tom Stellard
On Mon, Apr 18, 2016 at 10:14:35AM +0100, Jose Fonseca wrote:
> All powers of two up to the native vector length.
> 
> There is actually a bug in lp_build_round for v2, whereby it doesn't
> round to nearest.  Fixing is left to the future, but the test is now
> able to expect it to fail.
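The expected failure comes down to two rounding rules for halfway cases:

- C's roundf() rounds them away from zero.
- Round-half-to-even, which nearbyintf() uses in the default round-to-nearest mode, sends them to the nearest even integer.

This minimal C sketch implements both rules with plain arithmetic (no libm) to show where they disagree. It assumes inputs whose magnitude fits in a long long. The function names are hypothetical.

```c
/* Floor for values whose magnitude fits in a long long. */
static double floor_ll(double x)
{
   long long t = (long long)x;   /* truncates toward zero */
   if ((double)t > x)            /* negative non-integers: step down */
      t -= 1;
   return (double)t;
}

/* Halfway cases go away from zero, like roundf(). */
static double round_half_away(double x)
{
   return x >= 0.0 ? floor_ll(x + 0.5) : -floor_ll(-x + 0.5);
}

/* Halfway cases go to the nearest even integer, like nearbyintf() in the
 * default rounding mode. */
static double round_half_even(double x)
{
   double fl = floor_ll(x);
   double frac = x - fl;
   if (frac < 0.5)
      return fl;
   if (frac > 0.5)
      return fl + 1.0;
   /* exactly halfway: pick whichever neighbor is even */
   return ((long long)fl % 2 == 0) ? fl : fl + 1.0;
}
```

Suppose the reference result follows round-half-to-even. Then only exact halfway inputs such as 0.5, 2.5 or -2.5 can make it differ from roundf(). That is what lets the test predict which inputs are expected to fail.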
> ---
>  src/gallium/drivers/llvmpipe/lp_test_arit.c | 43 -
>  1 file changed, 30 insertions(+), 13 deletions(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_test_arit.c b/src/gallium/drivers/llvmpipe/lp_test_arit.c
> index ba831f3..f3ba5a1 100644
> --- a/src/gallium/drivers/llvmpipe/lp_test_arit.c
> +++ b/src/gallium/drivers/llvmpipe/lp_test_arit.c
> @@ -297,14 +297,16 @@ unary_tests[] = {
>   */
>  static LLVMValueRef
>  build_unary_test_func(struct gallivm_state *gallivm,
> -  const struct unary_test_t *test)
> +  const struct unary_test_t *test,
> +  unsigned length,
> +  const char *test_name)
>  {
> -   struct lp_type type = lp_type_float_vec(32, lp_native_vector_width);
> +   struct lp_type type = lp_type_float_vec(32, length * 32);
> LLVMContextRef context = gallivm->context;
> LLVMModuleRef module = gallivm->module;
> LLVMTypeRef vf32t = lp_build_vec_type(gallivm, type);
> LLVMTypeRef args[2] = { LLVMPointerType(vf32t, 0), LLVMPointerType(vf32t, 0) };
> -   LLVMValueRef func = LLVMAddFunction(module, test->name,
> +   LLVMValueRef func = LLVMAddFunction(module, test_name,
> LLVMFunctionType(LLVMVoidTypeInContext(context),
>  args, Elements(args), 0));
> LLVMValueRef arg0 = LLVMGetParam(func, 0);
> @@ -371,14 +373,15 @@ flush_denorm_to_zero(float val)
>   * Test one LLVM unary arithmetic builder function.
>   */
>  static boolean
> -test_unary(unsigned verbose, FILE *fp, const struct unary_test_t *test)
> +test_unary(unsigned verbose, FILE *fp, const struct unary_test_t *test, unsigned length)
>  {
> +   char test_name[128];
> +   util_snprintf(test_name, sizeof test_name, "%s.v%u", test->name, length);
> struct gallivm_state *gallivm;
> LLVMValueRef test_func;
> unary_func_t test_func_jit;
> boolean success = TRUE;
> int i, j;
> -   int length = lp_native_vector_width / 32;
> float *in, *out;
>  
> in = align_malloc(length * 4, length * 4);
> @@ -391,7 +394,7 @@ test_unary(unsigned verbose, FILE *fp, const struct unary_test_t *test)
>  
> gallivm = gallivm_create("test_module", LLVMGetGlobalContext());

This is not related to this patch, but the C++ equivalent of
LLVMGetGlobalContext() has been removed from LLVM, and I think the C
API may be removed at some point in the future, so these tests should
be migrated to use LLVMContextCreate().

-Tom

>  
> -   test_func = build_unary_test_func(gallivm, test);
> +   test_func = build_unary_test_func(gallivm, test, length, test_name);
>  
> gallivm_compile_module(gallivm);
>  
> @@ -411,6 +414,7 @@ test_unary(unsigned verbose, FILE *fp, const struct unary_test_t *test)
>for (i = 0; i < num_vals; ++i) {
>   float testval, ref;
>   double error, precision;
> + boolean expected_pass = TRUE;
>   bool pass;
>  
>   testval = flush_denorm_to_zero(in[i]);
> @@ -429,14 +433,23 @@ test_unary(unsigned verbose, FILE *fp, const struct unary_test_t *test)
>  continue;
>   }
>  
> - if (!pass || verbose) {
> -printf("%s(%.9g): ref = %.9g, out = %.9g, precision = %f bits, %s\n",
> -  test->name, in[i], ref, out[i], precision,
> -  pass ? "PASS" : "FAIL");
> + if (test->ref ==  && length == 2 && 
> + ref != roundf(testval)) {
> +/* FIXME: The generic (non SSE) path in lp_build_iround, which is
> + * always taken for length==2 regardless of native round support,
> + * does not round to even. */
> +expected_pass = FALSE;
> + }
> +
> + if (pass != expected_pass || verbose) {
> +printf("%s(%.9g): ref = %.9g, out = %.9g, precision = %f bits, %s%s\n",
> +  test_name, in[i], ref, out[i], precision,
> +  pass ? "PASS" : "FAIL",
>  !expected_pass ? (pass ? " (unexpected)" : " (expected)" ): "");
>  fflush(stdout);
>   }
>  
> - if (!pass) {
> + if (pass != expected_pass) {
>  success = FALSE;
>   }
>}
> @@ -458,8 +471,12 @@ test_all(unsigned verbose, FILE *fp)
> int i;
>  
> for (i = 0; i < Elements(unary_tests); ++i) {
> -  if (!test_unary(verbose, fp, &unary_tests[i])) {
> - success = FALSE;
> +  unsigned max_length = lp_native_vector_width / 32;
> +  unsigned length;
> +  for (length = 1; length <= max_length; length *= 2) {
> + if 

[Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-15 Thread Tom Stellard
The ds_bpermute instruction allows threads to transfer data directly
to or from the vgprs of other threads.  These instructions use the lds
hardware to transfer data, but do not read or write lds memory.

DDX BEFORE:                        |  DDX AFTER:
                                   |
v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
s_waitcnt lgkmcnt(0)               |
v_or_b32_e32 v0, 1, v2             |
v_lshlrev_b32_e32 v0, 2, v0        |
ds_read_b32 v1, v3                 |
ds_read_b32 v0, v0                 |
s_waitcnt lgkmcnt(0)               |
                                   |
LDS: 1 blocks                      |  LDS: 0 blocks
---
 src/gallium/drivers/radeonsi/si_shader.c | 51 +---
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 377ff26..c3d03eb 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4117,22 +4117,22 @@ static void si_llvm_emit_ddxy(
LLVMValueRef indices[2];
LLVMValueRef store_ptr, load_ptr0, load_ptr1;
LLVMValueRef tl, trbl, result[4];
-   LLVMValueRef tid_args[2];
+   LLVMValueRef tl_tid, trbl_tid, tid, tid_args[2];
unsigned swizzle[4];
unsigned c;
int idx;
unsigned mask;
-
tid_args[0] = lp_build_const_int32(gallivm, 0xffffffff);
tid_args[1] = bld_base->uint_bld.zero;
tid_args[1] = lp_build_intrinsic(gallivm->builder,
"llvm.amdgcn.mbcnt.lo", ctx->i32,
tid_args, 2, LLVMReadNoneAttribute);
-
-   indices[0] = bld_base->uint_bld.zero;
-   indices[1] = lp_build_intrinsic(gallivm->builder,
+   tid = lp_build_intrinsic(gallivm->builder,
"llvm.amdgcn.mbcnt.hi", ctx->i32,
tid_args, 2, LLVMReadNoneAttribute);
+
+   indices[0] = bld_base->uint_bld.zero;
+   indices[1] = tid;
store_ptr = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
@@ -4143,20 +4143,24 @@ static void si_llvm_emit_ddxy(
else
mask = TID_MASK_TOP_LEFT;
 
-   indices[1] = LLVMBuildAnd(gallivm->builder, indices[1],
- lp_build_const_int32(gallivm, mask), "");
+   tl_tid = LLVMBuildAnd(gallivm->builder, indices[1],
+   lp_build_const_int32(gallivm, mask), "");
+   indices[1] = tl_tid;
load_ptr0 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
/* for DDX we want to next X pixel, DDY next Y pixel. */
idx = (opcode == TGSI_OPCODE_DDX || opcode == TGSI_OPCODE_DDX_FINE) ? 1 : 2;
-   indices[1] = LLVMBuildAdd(gallivm->builder, indices[1],
+   trbl_tid = LLVMBuildAdd(gallivm->builder, indices[1],
  lp_build_const_int32(gallivm, idx), "");
+   indices[1] = trbl_tid;
load_ptr1 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
for (c = 0; c < 4; ++c) {
unsigned i;
+   LLVMValueRef val;
+   LLVMValueRef args[2];
 
swizzle[c] = tgsi_util_get_full_src_register_swizzle(&inst->Src[0], c);
for (i = 0; i < c; ++i) {
@@ -4168,18 +4172,31 @@ static void si_llvm_emit_ddxy(
if (i != c)
continue;
 
-   LLVMBuildStore(gallivm->builder,
-  LLVMBuildBitCast(gallivm->builder,
-   lp_build_emit_fetch(bld_base, inst, 0, c),
-   ctx->i32, ""),
-  store_ptr);
+   val = LLVMBuildBitCast(gallivm->builder,
+   lp_build_emit_fetch(bld_base, inst, 0, c),
+   ctx->i32, "");
 
-   tl = LLVMBuildLoad(gallivm->builder, load_ptr0, "");
-   tl = LLVMBuildBitCast(gallivm->builder, tl, ctx->f32, "");
+   if ((HAVE_LLVM >= 0x0309) && ctx->screen->b.family >= CHIP_TONGA) {
 
-   trbl = LLVMBuildLoad(gallivm->builder, load_ptr1, "");
-   trbl = LLVMBuildBitCast(gallivm->builder, trbl, ctx->f32, "");
+   args[0] = LLVMBuildMul(gallivm->builder, tl_tid,
+ 

[Mesa-dev] [PATCH 1/2] radeonsi: Use llvm.amdgcn.mbcnt.* intrinsics instead of llvm.SI.tid

2016-04-15 Thread Tom Stellard
We're trying to move to more of the new-style intrinsics, which include
the correct target name and map directly to ISA instructions.
---
 src/gallium/drivers/radeonsi/si_shader.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index c26960b..377ff26 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4117,14 +4117,22 @@ static void si_llvm_emit_ddxy(
LLVMValueRef indices[2];
LLVMValueRef store_ptr, load_ptr0, load_ptr1;
LLVMValueRef tl, trbl, result[4];
+   LLVMValueRef tid_args[2];
unsigned swizzle[4];
unsigned c;
int idx;
unsigned mask;
 
+   tid_args[0] = lp_build_const_int32(gallivm, 0xffffffff);
+   tid_args[1] = bld_base->uint_bld.zero;
+   tid_args[1] = lp_build_intrinsic(gallivm->builder,
+   "llvm.amdgcn.mbcnt.lo", ctx->i32,
+   tid_args, 2, LLVMReadNoneAttribute);
+
indices[0] = bld_base->uint_bld.zero;
-   indices[1] = lp_build_intrinsic(gallivm->builder, "llvm.SI.tid", ctx->i32,
-NULL, 0, LLVMReadNoneAttribute);
+   indices[1] = lp_build_intrinsic(gallivm->builder,
+   "llvm.amdgcn.mbcnt.hi", ctx->i32,
+   tid_args, 2, LLVMReadNoneAttribute);
store_ptr = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
-- 
2.1.0



Re: [Mesa-dev] [PATCH] clover: Fix build against LLVM SVN >= r266163

2016-04-13 Thread Tom Stellard
On Wed, Apr 13, 2016 at 03:59:36PM +0900, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daen...@amd.com>
> 
> createInternalizePass now takes a callback instead of a StringSet.
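The shape of this API change can be sketched in plain C. Instead of handing over a list of names to keep, the caller supplies a predicate that is asked about each global. The names below (`preserve_fn`, `preserve_kernels()`, `count_preserved()`) are hypothetical and only mirror the lambda in the patch. In the patch itself, the callback receives an `llvm::GlobalValue` rather than a name string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate type: return true to keep a global externally visible. */
typedef bool (*preserve_fn)(const char *name, const void *user_data);

struct kernel_list {
   const char *const *names;
   size_t count;
};

/* Mirrors the lambda in the patch: keep a global if its name is a kernel's. */
static bool preserve_kernels(const char *name, const void *user_data)
{
   const struct kernel_list *kernels = user_data;
   for (size_t i = 0; i < kernels->count; i++) {
      if (strcmp(name, kernels->names[i]) == 0)
         return true;
   }
   return false;
}

/* Stand-in for the pass: query the predicate for every global and count the
 * ones that stay visible; the rest would be internalized. */
static size_t count_preserved(const char *const *globals, size_t n,
                              preserve_fn keep, const void *user_data)
{
   size_t kept = 0;
   for (size_t i = 0; i < n; i++) {
      if (keep(globals[i], user_data))
         kept++;
   }
   return kept;
}

/* Example module: two kernels plus one helper function. */
static const char *const example_kernels[] = { "vec_add", "vec_mul" };
static const char *const example_globals[] = { "vec_add", "helper", "vec_mul" };
static const struct kernel_list example_list = { example_kernels, 2 };
```

A predicate can consult any state it captures, so the caller no longer has to materialize a separate list of C strings first.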
> 
> Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
> ---
>  src/gallium/state_trackers/clover/llvm/invocation.cpp | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> index 4d11c24..97acd03 100644
> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> @@ -322,6 +322,18 @@ namespace {
>// list of kernel functions to the internalizer.  The internalizer will
>// treat the functions in the list as "main" functions and internalize
>// all of the other functions.
> +#if HAVE_LLVM >= 0x0309
> +  auto preserve_kernels = [=](const llvm::GlobalValue &GV) {
> + for (std::vector<llvm::Function *>::const_iterator I = kernels.begin(),
> +E = kernels.end();
> +I != E; ++I) {
> +llvm::Function *kernel = *I;
> +if (GV.getName() == kernel->getName())
> +   return true;
> + }
> + return false;
> +  };
> +#else
>std::vector<const char *> export_list;
>for (std::vector<llvm::Function *>::const_iterator I = kernels.begin(),
>   E = kernels.end();
> @@ -329,12 +341,17 @@ namespace {
>   llvm::Function *kernel = *I;
>   export_list.push_back(kernel->getName().data());
>}
> +#endif
>  #if HAVE_LLVM < 0x0306
>PM.add(new llvm::DataLayoutPass(mod));
>  #elif HAVE_LLVM < 0x0307
>PM.add(new llvm::DataLayoutPass());
>  #endif
> +#if HAVE_LLVM >= 0x0309
> +  PM.add(llvm::createInternalizePass(preserve_kernels));
> +#else
>PM.add(llvm::createInternalizePass(export_list));
> +#endif
>  
>llvm::PassManagerBuilder PMB;
>PMB.OptLevel = optimization_level;
> -- 
> 2.8.0.rc3
> 


Re: [Mesa-dev] r600/compute: cleanup evergreen_compute.c

2016-04-06 Thread Tom Stellard
On Wed, Apr 06, 2016 at 10:40:50PM +0100, Dave Airlie wrote:
> This probably should have been cleaned up before merging, but we
> were a bit lax with it. This is a bunch of cleanups and changes,
> that make adding ARB_compute_support less of a task.
> 

Acked-by: Tom Stellard <thomas.stell...@amd.com>

> Dave.
> 


Re: [Mesa-dev] [PATCH] clover: Fix build against clang SVN >= r265359

2016-04-05 Thread Tom Stellard
On Tue, Apr 05, 2016 at 03:43:35PM +0900, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> Signed-off-by: Michel Dänzer 

I pushed this, thanks.

-Tom

> ---
>  src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> index 4d11c24..3fb3596 100644
> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> @@ -206,6 +206,9 @@ namespace {
>// http://www.llvm.org/bugs/show_bug.cgi?id=19735
>c.getDiagnosticOpts().ShowCarets = false;
>c.getInvocation().setLangDefaults(c.getLangOpts(), clang::IK_OpenCL,
> +#if HAVE_LLVM >= 0x0309
> +llvm::Triple(triple),
> +#endif
>  clang::LangStandard::lang_opencl11);
>c.createDiagnostics(
>new clang::TextDiagnosticPrinter(
> -- 
> 2.8.0.rc3
> 


Re: [Mesa-dev] [PATCH 01/20] radeonsi: set shader calling conventions

2016-04-04 Thread Tom Stellard
On Sat, Apr 02, 2016 at 03:10:44PM +0200, Bas Nieuwenhuizen wrote:
> Note that old mesa + new LLVM or new mesa + old LLVM breaks
> with this change and the corresponding LLVM change (D18559).
> 
> For LLVM version <= 3.8 we use the old method, but we can't detect
> people using a post 3.8 svn version that is still too old.
> 
> Signed-off-by: Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
> ---
>  src/gallium/drivers/radeon/radeon_llvm_emit.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> index 474154e..7174132 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
> +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> @@ -55,6 +55,13 @@ enum radeon_llvm_shader_type {
>   RADEON_LLVM_SHADER_CS = 3,
>  };
>  
> +enum radeon_llvm_calling_convention {
> + RADEON_LLVM_AMDGPU_VS = 87,
> + RADEON_LLVM_AMDGPU_GS = 88,
> + RADEON_LLVM_AMDGPU_PS = 89,
> + RADEON_LLVM_AMDGPU_CS = 90,
> +};
> +
>  void radeon_llvm_add_attribute(LLVMValueRef F, const char *name, int value)
>  {
>   char str[16];
> @@ -71,27 +78,35 @@ void radeon_llvm_add_attribute(LLVMValueRef F, const char *name, int value)
>  void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
>  {
>   enum radeon_llvm_shader_type llvm_type;
> + enum radeon_llvm_calling_convention calling_conv;
>  

This looks like you will get 'unused variable' warnings with this change.
Probably the easiest thing to do is put (void)variable_name somewhere, but
I'm not sure this is really a big deal.
Either way:

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>
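Here is the (void) idiom mentioned above in a minimal sketch. It assumes a variable that is only read on one side of a compile-time #if, which is where such warnings typically appear. `HAVE_NEW_CALLING_CONV` and `pick_shader_abi()` are hypothetical. The 87-based numbering is only a placeholder echoing the enum in the patch.

```c
/* Hypothetical compile-time switch standing in for an LLVM version check. */
#ifndef HAVE_NEW_CALLING_CONV
#define HAVE_NEW_CALLING_CONV 0
#endif

/* Returns a calling-convention number when the new path is compiled in,
 * otherwise the legacy shader-type value. */
static int pick_shader_abi(int legacy_type)
{
   int calling_conv = 87 + legacy_type; /* placeholder mapping for illustration */

#if HAVE_NEW_CALLING_CONV
   return calling_conv;
#else
   /* calling_conv is never read in this configuration; the cast marks it as
    * intentionally unused and silences unused-variable warnings. */
   (void)calling_conv;
   return legacy_type;
#endif
}
```

A plain `if (HAVE_LLVM >= ...)` on a constant, as in the patch, keeps both branches visible to the compiler. With `#if`, one branch disappears before compilation, which is when the cast becomes useful.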


>   switch (type) {
>   case TGSI_PROCESSOR_VERTEX:
>   case TGSI_PROCESSOR_TESS_CTRL:
>   case TGSI_PROCESSOR_TESS_EVAL:
>   llvm_type = RADEON_LLVM_SHADER_VS;
> + calling_conv = RADEON_LLVM_AMDGPU_VS;
>   break;
>   case TGSI_PROCESSOR_GEOMETRY:
>   llvm_type = RADEON_LLVM_SHADER_GS;
> + calling_conv = RADEON_LLVM_AMDGPU_GS;
>   break;
>   case TGSI_PROCESSOR_FRAGMENT:
>   llvm_type = RADEON_LLVM_SHADER_PS;
> + calling_conv = RADEON_LLVM_AMDGPU_PS;
>   break;
>   case TGSI_PROCESSOR_COMPUTE:
>   llvm_type = RADEON_LLVM_SHADER_CS;
> + calling_conv = RADEON_LLVM_AMDGPU_CS;
>   break;
>   default:
>   assert(0);
>   }
>  
> - radeon_llvm_add_attribute(F, "ShaderType", llvm_type);
> + if (HAVE_LLVM >= 0x309)
> + LLVMSetFunctionCallConv(F, calling_conv);
> + else
> + radeon_llvm_add_attribute(F, "ShaderType", llvm_type);
>  }
>  
>  static void init_r600_target()
> -- 
> 2.7.4
> 


Re: [Mesa-dev] [PATCH] gallium/swr: update rasterizer (532172)

2016-03-23 Thread Tom Stellard
On Wed, Mar 23, 2016 at 04:53:29PM +, Rowley, Timothy O wrote:
> 
> > On Mar 23, 2016, at 12:52 AM, Kenneth Graunke  wrote:
> > 
> > That's an awkward situation we've not run into before.
> > 
> > If the code is going to live in the upstream Mesa git repository, then
> > it seems like the best long term plan is to reverse the workflow: make
> > upstream Mesa the canonical repository, do development upstream, and
> > pull changes from upstream into any internal repositories.
> > 
> > Obviously, that's a huge process change - presumably you have a bunch
> > of people working in some Intel perforce system - but working in the
> > public is very beneficial.  It's also the mark of a true open source
> > project, rather than simply "available source”.
> 
> While that situation would be nice, the swr rasterizer is a subset of an 
> internal project, and what is upstreamed publicly is not just a straight copy 
> of our repository.  Moving the rasterizer’s “home” to Mesa involves 
> some large technical and workflow challenges.
> 

How much testing do you do on the version of swr that's in Mesa?

-Tom


Re: [Mesa-dev] [PATCH 1/2] AMDGPU/SI: Add Polaris support

2016-03-23 Thread Tom Stellard
On Wed, Mar 23, 2016 at 02:41:24PM -0400, Alex Deucher via llvm-commits wrote:
> From: Sonny Jiang 
> 

These two patches need to be squashed, but I can do it before I commit.

LGTM.

-Tom

> Signed-off-by: Sonny Jiang 
> Reviewed-by: Alex Deucher 
> ---
>  lib/Target/AMDGPU/Processors.td | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/lib/Target/AMDGPU/Processors.td b/lib/Target/AMDGPU/Processors.td
> index 32c327d..7c93c6a 100644
> --- a/lib/Target/AMDGPU/Processors.td
> +++ b/lib/Target/AMDGPU/Processors.td
> @@ -148,3 +148,11 @@ def : ProcessorModel<"fiji", SIQuarterSpeedModel,
>  def : ProcessorModel<"stoney", SIQuarterSpeedModel,
>[FeatureVolcanicIslands, FeatureISAVersion8_0_1, FeatureLDSBankCount16]
>  >;
> +
> +def : ProcessorModel<"polaris10", SIQuarterSpeedModel,
> +  [FeatureVolcanicIslands, FeatureISAVersion8_0_1]
> +>;
> +
> +def : ProcessorModel<"polaris11", SIQuarterSpeedModel,
> +  [FeatureVolcanicIslands, FeatureISAVersion8_0_1]
> +>;
> -- 
> 2.5.0
> 
> ___
> llvm-commits mailing list
> llvm-comm...@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2016-02-17 Thread Tom Stellard
On Thu, Feb 18, 2016 at 02:07:25AM +0100, Roland Scheidegger wrote:
> Am 17.02.2016 um 22:09 schrieb Rowley, Timothy O:
> >> On Nov 18, 2015, at 12:34 PM, Emil Velikov
> >>  wrote: I have no objections against
> >> getting this merged, although here are a couple of things that
> >> should be sorted. Some of these are just reiteration from others:
> > 
> > Sorry about the delay responding to this; we’ve been working on a
> > number of the issues you mentioned (plus the usual year-end holidays
> > and other work).
> > 
> >> - First and foremost - please base your work against master. Mesa, 
> >> like most other open-source projects, tries to keep features out
> >> of bugfix releases. As such basing things against 11.0 is not
> >> suitable.
> > 
> > Basing our efforts on a particular Mesa branch was an initial
> > development decision to keep a stable base while we figured out how
> > to build a driver from scratch.  We have now rebased to the Mesa
> > master and periodically merge updates.
> > 
> >> - Further combinatorial explosion of build configurations - with 
> >> internal/external core, swr-arch, etc. Some of these can (should?)
> >> be nuked, although further comments will follow as patch(es) hit
> >> the mailing list.
> > 
> > All the additional swr build options have been removed, leaving swr
> > simply as an additional gallium driver that can be enabled.  The
> > build-time architecture dependence has been addressed by building the
> > swr driver twice (avx and avx2), and having swr_create_screen check
> > the architecture and load the appropriate library.  I’m not
> > completely satisfied with the current solution as since the driver is
> > part of the loaded library we need to link most of mesa into the
> > “driver”.  The fix for this seems to be to just build the core swr
> > rasterizer architecture specific and dlopen/dlsym the fifty or so API
> > entry points.  However this interim solution simplifies things for
> > our users and removes the swr specific options from the general Mesa
> > build system.
> You could use different functions for avx and avx2 code, and plug the
> right ones in at runtime, as you can link them both just fine. It just
> requires that your code containing avx2 code is in a different compile
> unit to the one containing avx-only code. This way you only really have
> separate compiled code for the functions where there's really a
> difference (obviously, this prevents the compiler from using avx2 on its
> own in the shared parts, but I doubt that's a problem). Albeit if you
> have lots of differences scattered around (the worst would probably be
> different structures based on such difference used everywhere...) this
> might not be very practical (at a first glance, didn't look like it at
> least for avx and avx2).

You can set feature flags on a per-function basis now, so it's possible
to have an avx and avx2 function in the same module.  I haven't actually
tried this, though, so I'm not sure how well it's working at the moment.

-Tom
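At the C source level, the same idea can be sketched with the GCC/Clang `target` function attribute and `__builtin_cpu_supports()`. The idea is to give different functions different instruction-set features and choose between them at run time, as Roland suggests. This minimal sketch assumes an x86 GCC or Clang toolchain and falls back to the generic version elsewhere. `sum_generic`, `sum_avx2`, `select_sum` and `HAVE_AVX2_VARIANT` are hypothetical names. The loop bodies are deliberately identical, since the point is the dispatch rather than the vectorization.

```c
#include <stddef.h>

typedef float (*sum_fn)(const float *v, size_t n);

/* Baseline version, built for whatever ISA the compilation unit targets. */
static float sum_generic(const float *v, size_t n)
{
   float s = 0.0f;
   for (size_t i = 0; i < n; i++)
      s += v[i];
   return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may use AVX2 instructions in this function only. */
__attribute__((target("avx2")))
static float sum_avx2(const float *v, size_t n)
{
   float s = 0.0f;
   for (size_t i = 0; i < n; i++)
      s += v[i];
   return s;
}
#define HAVE_AVX2_VARIANT 1
#endif

/* Pick an implementation based on the CPU we are actually running on. */
static sum_fn select_sum(void)
{
#ifdef HAVE_AVX2_VARIANT
   __builtin_cpu_init();
   if (__builtin_cpu_supports("avx2"))
      return sum_avx2;
#endif
   return sum_generic;
}
```

Both variants live in one compilation unit and one library, so nothing has to be built twice or loaded with dlopen() to get the AVX2 path.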

> Though I'm not actually sure how you would do that for c++ template
> code, maybe it doesn't work as easily...
> In any case, so far for llvmpipe we didn't bother (except for the jitted
> code of course) to optimize for newer instruction sets precisely due to
> it being annoying (certainly prevents you from doing "let's just
> optimize this math here in this little inline function when avx is
> available" - so we still have rasterization functions which emulate
> sse41 _mm_mul_epi32 with _mm_mul_epu32 and so on).
> 
> Roland
> 
> 
> > 
> >> - Using llvm's C++ interface, building against multiple LLVM 
> >> versions. If openswr only supports only limited versions of llvm,
> >> then the build should bail out accordingly - more
> >> comments/suggestions as patch(es) hit the ML.
> > 
> > OpenSWR now supports llvm 3.6, 3.7, and 3.8.  We don’t explicitly
> > prevent people from trying to use llvm-svn, though as you say the C++
> > api is not stable so they might encounter problems.
> > 
> >> - Will patches porting core openswr functionality from the
> >> internal tree be part of the public discussions ? The VMWare people
> >> have done a great thing trying to keep things open, and people
> >> have, on the rare occasion, found nitpicks in their patches.
> > 
> > Moving patches from the internal rasterizer tree can be scripted at a
> > top level, but unfortunately that’s the easy bit of keeping the two
> > in sync when changes happen on both sides of the fence.  I can try
> > tracking individual patches up to my git knowledge.
> > 
> >> - And last but not least - please split patches sensibly, for your 
> >> submission and further work). The "Initial public Mesa+SWR"
> >> touches files in quite a few different places.
> > 
> > I’m about to send the patches to the list for review; splitting them
> > into the driver, rasterizer, mesa changes, and build system.
> > 
> >> Mildly related - I'll be resending/merging a 

Re: [Mesa-dev] issues with split llvm libraries and llvmpipe and failing to load library

2016-02-11 Thread Tom Stellard
On Thu, Feb 11, 2016 at 01:59:25PM +1000, Dave Airlie wrote:
> Hey,
> 
> So in Fedora rawhide we are now building llvm 3.7.1 in the "lots of
> little shared libraries" format.
> 

This configuration is only recommended for developers.

See the documentation for BUILD_SHARED_LIBS:BOOL here: 
http://llvm.org/docs/CMake.html

-Tom

> However I'm running into a major problem with the fact that sometimes
> dlclose isn't dropping all the LLVM libraries from the address space
> of the process.
> 
> We have a sequence like this:
> 
> a) X server asks mesa gbm library to init, it loads the
> kms_swrast_dri.so with dlopen(LAZY|GLOBAL). kms_swrast_dri.so is
> linked against a large bunch of LLVM libraries (see below).
> 
> b) gbm discovers it can't do what it wants and dlcloses the library.
> At this point a bunch of the LLVM libraries drop out of the map,
> pretty much everything down to LLVMTarget. Everything from
> LLVMTarget onwards remains loaded and I've no idea how to discover why.
> 
> c) later X tries to load kms_swrast_dri.so for GLX usage, and it
> brings back in all the LLVM libraries that got dropped out, however as
> LLVMObject has never been cleaned up, it has all the command line
> options in it, so we get
> : CommandLine Error: Option 'x86-asm-syntax' registered more than once!
> LLVM ERROR: inconsistency in registered CommandLine options
> and things crash.
> 
> So does anyone have any ideas why the linker isn't dropping all the other
> LLVM libraries, or any workaround for this, apart from just going back
> to the single libLLVM?
> 
> Dave.
> 
> [airlied@f21vm ~]$ ldd /usr/lib64/dri/kms_swrast_dri.so
> linux-vdso.so.1 (0x7ffec17fe000)
> libcrypto.so.10 => /lib64/libcrypto.so.10 (0x7f8a68ac5000)
> libselinux.so.1 => /lib64/libselinux.so.1 (0x7f8a688a3000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f8a68685000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7f8a68481000)
> libexpat.so.1 => /lib64/libexpat.so.1 (0x7f8a68255000)
> libdrm_intel.so.1 => /lib64/libdrm_intel.so.1 (0x7f8a68033000)
> libdrm_nouveau.so.2 => /lib64/libdrm_nouveau.so.2 (0x7f8a67e2b000)
> libdrm_radeon.so.1 => /lib64/libdrm_radeon.so.1 (0x7f8a67c1f000)
> libdrm_amdgpu.so.1 => /lib64/libdrm_amdgpu.so.1 (0x7f8a67a16000)
> libdrm.so.2 => /lib64/libdrm.so.2 (0x7f8a67807000)
> libelf.so.1 => /lib64/libelf.so.1 (0x7f8a675ef000)
> libLLVMAMDGPUCodeGen.so.3.7 => /lib64/libLLVMAMDGPUCodeGen.so.3.7 (0x7f8a672e2000)
> libLLVMAMDGPUAsmParser.so.3.7 => /lib64/libLLVMAMDGPUAsmParser.so.3.7 (0x7f8a670a1000)
> libLLVMAMDGPUUtils.so.3.7 => /lib64/libLLVMAMDGPUUtils.so.3.7 (0x7f8a66e9f000)
> libLLVMAMDGPUDesc.so.3.7 => /lib64/libLLVMAMDGPUDesc.so.3.7 (0x7f8a66b9b000)
> libLLVMAMDGPUInfo.so.3.7 => /lib64/libLLVMAMDGPUInfo.so.3.7 (0x7f8a66999000)
> libLLVMAMDGPUAsmPrinter.so.3.7 => /lib64/libLLVMAMDGPUAsmPrinter.so.3.7 (0x7f8a66774000)
> libLLVMObjCARCOpts.so.3.7 => /lib64/libLLVMObjCARCOpts.so.3.7 (0x7f8a6654e000)
> libLLVMOption.so.3.7 => /lib64/libLLVMOption.so.3.7 (0x7f8a6634)
> libLLVMIRReader.so.3.7 => /lib64/libLLVMIRReader.so.3.7 (0x7f8a6613b000)
> libLLVMAsmParser.so.3.7 => /lib64/libLLVMAsmParser.so.3.7 (0x7f8a65ef2000)
> libLLVMLinker.so.3.7 => /lib64/libLLVMLinker.so.3.7 (0x7f8a65cdc000)
> libLLVMipo.so.3.7 => /lib64/libLLVMipo.so.3.7 (0x7f8a65a4a000)
> libLLVMVectorize.so.3.7 => /lib64/libLLVMVectorize.so.3.7 (0x7f8a657ef000)
> libLLVMAArch64Disassembler.so.3.7 => /lib64/libLLVMAArch64Disassembler.so.3.7 (0x7f8a655d)
> libLLVMAArch64CodeGen.so.3.7 => /lib64/libLLVMAArch64CodeGen.so.3.7 (0x7f8a652b)
> libLLVMAArch64AsmParser.so.3.7 => /lib64/libLLVMAArch64AsmParser.so.3.7 (0x7f8a65058000)
> libLLVMAArch64Desc.so.3.7 => /lib64/libLLVMAArch64Desc.so.3.7 (0x7f8a64dde000)
> libLLVMAArch64Info.so.3.7 => /lib64/libLLVMAArch64Info.so.3.7 (0x7f8a64bdc000)
> libLLVMAArch64AsmPrinter.so.3.7 => /lib64/libLLVMAArch64AsmPrinter.so.3.7 (0x7f8a64979000)
> libLLVMAArch64Utils.so.3.7 => /lib64/libLLVMAArch64Utils.so.3.7 (0x7f8a64768000)
> libLLVMBitWriter.so.3.7 => /lib64/libLLVMBitWriter.so.3.7 (0x7f8a64541000)
> libLLVMX86Disassembler.so.3.7 => /lib64/libLLVMX86Disassembler.so.3.7 (0x7f8a641df000)
> libLLVMX86AsmParser.so.3.7 => /lib64/libLLVMX86AsmParser.so.3.7 (0x7f8a63f4b000)
> libLLVMX86CodeGen.so.3.7 => /lib64/libLLVMX86CodeGen.so.3.7 (0x7f8a63b61000)
> libLLVMSelectionDAG.so.3.7 => /lib64/libLLVMSelectionDAG.so.3.7 (0x7f8a63761000)
> libLLVMAsmPrinter.so.3.7 => /lib64/libLLVMAsmPrinter.so.3.7 (0x7f8a634e6000)
> libLLVMCodeGen.so.3.7 => /lib64/libLLVMCodeGen.so.3.7 (0x7f8a63014000)
> libLLVMScalarOpts.so.3.7 => /lib64/libLLVMScalarOpts.so.3.7 (0x7f8a62c92000)
> 
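Here is a minimal model of the failure above. It assumes, as Dave describes, that the library holding the option registry stays mapped while the library that registers options is unloaded and then loaded again. Nothing here is LLVM code. `register_option()` and `load_backend_library()` are hypothetical names, and only the error text is taken from the log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPTIONS 64

/* Process-wide option registry. In the scenario above it lives in a library
 * that stays mapped, so it is never reset. */
static const char *option_registry[MAX_OPTIONS];
static int option_count;

/* Returns 0 on success, -1 if an option with this name already exists. */
static int register_option(const char *name)
{
   for (int i = 0; i < option_count; i++) {
      if (strcmp(option_registry[i], name) == 0) {
         fprintf(stderr, "CommandLine Error: Option '%s' registered more than once!\n",
                 name);
         return -1;
      }
   }
   assert(option_count < MAX_OPTIONS);
   option_registry[option_count++] = name;
   return 0;
}

/* Stands in for the static constructors that run each time a back-end
 * library is loaded; they register that library's options again. */
static int load_backend_library(void)
{
   return register_option("x86-asm-syntax");
}
```

The first load succeeds. A second load, with no registry teardown in between, fails the same way as step c).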

Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-09 Thread Tom Stellard
On Mon, Feb 08, 2016 at 09:38:32PM +0100, Marek Olšák wrote:
> On Mon, Feb 8, 2016 at 5:08 PM, Tom Stellard <t...@stellard.net> wrote:
> > On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
> >> From: Marek Olšák <marek.ol...@amd.com>
> >>
> >> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> >> but not SI & CI, which can't disable denorms for those instructions.
> >
> > Do you know why this fixes FP16 conversions?  What does the OpenGL
> > spec say about denormal handling?
> 
> Yes, I know why. The patch explains everything as far as I can see
> though. What isn't clear?
> 

I got it now.

> SI & CI: Don't support FP16. FP16 conversions are hardcoded to emit
> and accept FP16 denormals.
> VI: Supports FP16. FP16 denormal support is now configurable and
> affects FP16 conversions as well (shared setting with FP64).
> 
> OpenGL doesn't require denormals. Piglit does. I think this is
> incorrect piglit behavior.
> 
> >
> >> ---
> >>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
> >>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
> >>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
> >>  3 files changed, 29 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >> index a4680ce..3f1db70 100644
> >> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
> >>
> >>   si_shader_binary_read_config(binary, conf, 0);
> >>
> >> + /* Enable 64-bit and 16-bit denormals, because there is no performance
> >> +  * cost.
> >> +  *
> >> +  * If denormals are enabled, all floating-point output modifiers are
> >> +  * ignored.
> >> +  *
> >> +  * Don't enable denormals for 32-bit floats, because:
> >> +  * - Floating-point output modifiers would be ignored by the hw.
> >> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
> >> +  *   have to stop using those.
> >> +  * - SI & CI would be very slow.
> >> +  */
> >> + conf->float_mode |= V_00B028_FP_64_DENORMS;
> >> +
> >
> > Do SI/CI support fp64 denorms?  If so, won't this hurt performance?
> 
> Yes, they do. Fp64 denorms don't hurt performance. Only fp32 denorms
> do on SI & CI.
> 
> >
> > We should tell the compiler we are enabling fp-64 denorms by adding
> > +fp64-denormals to the feature string.  It would also be better to
> > read the float_mode value from the config registers emitted by the
> > compiler.
> 
> Yes, I agree, but LLVM only sets these parameters for compute or even
> HSA-only kernels, not for graphics shaders. We need to set the mode
> for all users _now_, not in 6 months. Last time I looked,
> +fp64-denormals had no effect on graphics shaders.
> 

We should still add +fp64-denormals even if the backend doesn't do
anything with it now.  This will make it easier if we have to use this
feature string to enable a bug fix in the backend,
because we will just be able to update LLVM.

I don't have a problem hard-coding float_mode for now, but once LLVM is
emitting the correct thing, we should pull the value from LLVM.

-Tom

> Marek
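To make the FP16 denormal discussion concrete, here is a software decoder for IEEE 754 half-precision values with an optional flush-to-zero mode. It only models what disabling FP16 denormals does to a value. It says nothing about how VI or SI/CI implement the conversion, and `half_to_float()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 single-precision float. */
static float bits_to_float(uint32_t bits)
{
   float f;
   assert(sizeof(f) == sizeof(bits));
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* Decode an IEEE 754 binary16 value to float. If keep_denorms is false,
 * subnormal inputs are flushed to a signed zero. */
static float half_to_float(uint16_t h, bool keep_denorms)
{
   uint32_t sign = (uint32_t)(h >> 15) << 31;
   uint32_t exp = (h >> 10) & 0x1f;
   uint32_t mant = h & 0x3ff;

   if (exp == 0) {
      /* Zero or subnormal: magnitude is mant * 2^-24 (exact in float). */
      float mag = keep_denorms ? (float)mant / 16777216.0f : 0.0f;
      return sign ? -mag : mag;
   }
   if (exp == 0x1f) {
      /* Infinity or NaN: keep the payload bits. */
      return bits_to_float(sign | 0x7f800000u | (mant << 13));
   }
   /* Normal: rebias the exponent from 15 to 127 and widen the mantissa. */
   return bits_to_float(sign | ((exp + (127 - 15)) << 23) | (mant << 13));
}
```

With denormals kept, the smallest positive half (bit pattern 0x0001) decodes to 2^-24. With denormals flushed, it decodes to 0. A test that checks denormal conversions would notice exactly this difference.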
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: enable denorms for 64-bit and 16-bit floats

2016-02-08 Thread Tom Stellard
On Sat, Feb 06, 2016 at 01:15:42PM +0100, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This fixes FP16 conversion instructions for VI, which has 16-bit floats,
> but not SI & CI, which can't disable denorms for those instructions.

Do you know why this fixes FP16 conversions?  What does the OpenGL
spec say about denormal handling?

> ---
>  src/gallium/drivers/radeonsi/si_shader.c| 14 ++
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 18 --
>  src/gallium/drivers/radeonsi/sid.h  |  3 +++
>  3 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index a4680ce..3f1db70 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -4155,6 +4155,20 @@ int si_compile_llvm(struct si_screen *sscreen,
>  
>   si_shader_binary_read_config(binary, conf, 0);
>  
> + /* Enable 64-bit and 16-bit denormals, because there is no performance
> +  * cost.
> +  *
> +  * If denormals are enabled, all floating-point output modifiers are
> +  * ignored.
> +  *
> +  * Don't enable denormals for 32-bit floats, because:
> +  * - Floating-point output modifiers would be ignored by the hw.
> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
> +  *   have to stop using those.
> +  * - SI & CI would be very slow.
> +  */
> + conf->float_mode |= V_00B028_FP_64_DENORMS;
> +

Do SI/CI support fp64 denorms?  If so, won't this hurt performance?

We should tell the compiler we are enabling fp-64 denorms by adding
+fp64-denormals to the feature string.  It would also be better to
read the float_mode value from the config registers emitted by the
compiler.
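One way to carry the flag suggested above, sketched in plain C, is to append "+fp64-denormals" to the comma-separated feature string. That string could then be passed, for example, as the Features argument of LLVMCreateTargetMachine(). `append_feature()` is a hypothetical helper. Whether the backend honors the feature for graphics shaders is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one "+name" or "-name" entry to a
 * comma-separated LLVM target feature string held in buf.
 * Returns 0 on success, -1 if buf is too small (buf is left truncated). */
static int append_feature(char *buf, size_t size, const char *feature)
{
   size_t len = strlen(buf);
   int n;

   assert(len < size);
   n = snprintf(buf + len, size - len, "%s%s", len ? "," : "", feature);
   return (n >= 0 && (size_t)n < size - len) ? 0 : -1;
}
```

Starting from an empty buffer, the first call stores "+fp64-denormals" as-is. Any later call inserts a comma before the new entry.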


-Tom
>   FREE(binary->config);
>   FREE(binary->global_symbol_offsets);
>   binary->config = NULL;
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index ce795c0..77a4e47 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -124,7 +124,8 @@ static void si_shader_ls(struct si_shader *shader)
>   shader->config.rsrc1 = S_00B528_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B528_SGPRS((num_sgprs - 1) / 8) |
>  S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B528_DX10_CLAMP(1);
> +S_00B528_DX10_CLAMP(1) |
> +S_00B528_FLOAT_MODE(shader->config.float_mode);
>   shader->config.rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
>  S_00B52C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0);
>  }
> @@ -157,7 +158,8 @@ static void si_shader_hs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
>  S_00B428_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B428_SGPRS((num_sgprs - 1) / 8) |
> -S_00B428_DX10_CLAMP(1));
> +S_00B428_DX10_CLAMP(1) |
> +S_00B428_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
>  S_00B42C_USER_SGPR(num_user_sgprs) |
>  S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -203,7 +205,8 @@ static void si_shader_es(struct si_shader *shader)
>  S_00B328_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B328_SGPRS((num_sgprs - 1) / 8) |
>  S_00B328_VGPR_COMP_CNT(vgpr_comp_cnt) |
> -S_00B328_DX10_CLAMP(1));
> +S_00B328_DX10_CLAMP(1) |
> +S_00B328_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B32C_SPI_SHADER_PGM_RSRC2_ES,
>  S_00B32C_USER_SGPR(num_user_sgprs) |
>  S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -292,7 +295,8 @@ static void si_shader_gs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B228_SPI_SHADER_PGM_RSRC1_GS,
>  S_00B228_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  S_00B228_SGPRS((num_sgprs - 1) / 8) |
> -S_00B228_DX10_CLAMP(1));
> +S_00B228_DX10_CLAMP(1) |
> +S_00B228_FLOAT_MODE(shader->config.float_mode));
>   si_pm4_set_reg(pm4, R_00B22C_SPI_SHADER_PGM_RSRC2_GS,
>  S_00B22C_USER_SGPR(num_user_sgprs) |
>  S_00B22C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
> @@ -381,7 +385,8 @@ static void si_shader_vs(struct si_shader *shader, struct si_shader *gs)
>  S_00B128_VGPRS((shader->config.num_vgprs - 1) / 4) |
>  
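The hunks above all follow the same pattern. Each one ORs the per-shader float_mode into the FLOAT_MODE field of the stage's SPI_SHADER_PGM_RSRC1 register, next to the VGPR and SGPR granule counts. The sketch below reproduces that packing with its own macros. The bit positions are assumed for illustration only; the authoritative values are the S_00B*28_* macros in sid.h. `pack_rsrc1()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of SPI_SHADER_PGM_RSRC1_*, for illustration only;
 * the real definitions are the S_00Bx28_* macros in sid.h. */
#define RSRC1_VGPRS(x)      (((uint32_t)(x) & 0x3f) << 0)
#define RSRC1_SGPRS(x)      (((uint32_t)(x) & 0x0f) << 6)
#define RSRC1_FLOAT_MODE(x) (((uint32_t)(x) & 0xff) << 12)
#define RSRC1_DX10_CLAMP(x) (((uint32_t)(x) & 0x01) << 21)

/* Register counts are encoded in granules, (count - 1) / 4 for VGPRs and
 * (count - 1) / 8 for SGPRs, as in the patch. */
static uint32_t pack_rsrc1(unsigned num_vgprs, unsigned num_sgprs, unsigned float_mode)
{
   assert(num_vgprs > 0 && num_sgprs > 0);
   return RSRC1_VGPRS((num_vgprs - 1) / 4) |
          RSRC1_SGPRS((num_sgprs - 1) / 8) |
          RSRC1_DX10_CLAMP(1) |
          RSRC1_FLOAT_MODE(float_mode);
}
```

With 24 VGPRs and 16 SGPRs, the granule fields come out as (24 - 1) / 4 = 5 and (16 - 1) / 8 = 1. The float mode lands in its own field without disturbing them.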

Re: [Mesa-dev] [PATCH 2/3] radeon/llvm: Set the target triple on the module

2016-02-05 Thread Tom Stellard
On Thu, Feb 04, 2016 at 02:42:15PM -0800, Matt Arsenault wrote:
> 
> > On Feb 4, 2016, at 13:02, Tom Stellard <thomas.stell...@amd.com> wrote:
> > 
> > +   LLVMSetTarget(ctx->gallivm.module,
> > +
> > +#if HAVE_LLVM < 0x0306
> > +   "r600--");
> > +#else
> > +   triple);
> > +#endif
> 
> This alone does not set the datalayout, which should also be set here.
> 

Ok, I'll do this in a follow up patch.

-Tom

> -Matt
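Matt's point matters because passes that consult the data layout fall back to LLVM's default little-endian layout when the module has none. Below is a small sketch of how the endianness component of a data layout string is read, following the LLVM LangRef rules: "E" means big-endian, "e" means little-endian, and the default is little-endian when neither is given. `datalayout_is_big_endian()` is a hypothetical helper, not an LLVM API, and the example strings are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the '-'-separated data layout string selects big-endian.
 * A later "e" or "E" token overrides an earlier one; if neither appears
 * (including the empty string), the result is little-endian. */
static bool datalayout_is_big_endian(const char *dl)
{
   bool big = false;

   assert(dl != NULL);
   while (*dl) {
      size_t len = strcspn(dl, "-");
      if (len == 1 && dl[0] == 'E')
         big = true;
      else if (len == 1 && dl[0] == 'e')
         big = false;
      dl += len;
      if (*dl == '-')
         dl++;
   }
   return big;
}
```

An empty layout string reads as little-endian. That is why a pass manager run on a big-endian target without a module data layout can make the wrong assumption.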



Re: [Mesa-dev] [PATCH 2/2] radeonsi: Allow dumping LLVM IR before optimization passes

2016-02-05 Thread Tom Stellard
On Fri, Feb 05, 2016 at 08:55:17AM -0500, Nicolai Hähnle wrote:
> On 04.02.2016 13:52, Tom Stellard wrote:
> > On Thu, Feb 04, 2016 at 09:15:26AM +0100, Nicolai Hähnle wrote:
> >> From: Nicolai Hähnle <nicolai.haeh...@amd.com>
> >>
> >> Set R600_DEBUG=preoptir to dump the LLVM IR before optimization passes,
> >> to allow diagnosing problems caused by optimization passes.
> >>
> >> Note that in order to compile the resulting IR with llc, you will first
> >> have to run at least the mem2reg pass, e.g.
> >>
> >> opt -mem2reg -S < shader.ll | llc -march=amdgcn -mcpu=bonaire
> >>
> >> Signed-off-by: Michel Dänzer <michel.daen...@amd.com> (original patch)
> >> Signed-off-by: Nicolai Hähnle <nicolai.haeh...@amd.com> (w/ debug flag)
> >> ---
> >> Having the option is a good idea, but I prefer to have a separate debug
> >> flag for it so that when you try to analyze bugs in codegen (which in
> >> my experience happens more often) you don't have to worry about
> >> replicating the exact same sequence of optimizations manually via the
> >> command line to reproduce the problem there.
> >>
> >>   src/gallium/drivers/radeon/r600_pipe_common.c |  1 +
> >>   src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
> >>   src/gallium/drivers/radeonsi/si_shader.c  | 16 ++--
> >>   3 files changed, 16 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> >> index c827dbd..a1432ed 100644
> >> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> >> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> >> @@ -393,6 +393,7 @@ static const struct debug_named_value common_debug_options[] = {
> >>{ "noir", DBG_NO_IR, "Don't print the LLVM IR"},
> >>{ "notgsi", DBG_NO_TGSI, "Don't print the TGSI"},
> >>{ "noasm", DBG_NO_ASM, "Don't print disassembled shaders"},
> >> +  { "preoptir", DBG_PREOPT_IR, "Print the LLVM IR before initial optimizations" },
> >>
> >>/* features */
> >>{ "nodma", DBG_NO_ASYNC_DMA, "Disable asynchronous DMA" },
> >> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> >> index c7e4c44..4e36631 100644
> >> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> >> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> >> @@ -71,6 +71,7 @@
> >>   #define DBG_NO_IR    (1 << 12)
> >>   #define DBG_NO_TGSI  (1 << 13)
> >>   #define DBG_NO_ASM   (1 << 14)
> >> +#define DBG_PREOPT_IR (1 << 15)
> >>   /* Bits 21-31 are reserved for the r600g driver. */
> >>   /* features */
> >>   #define DBG_NO_ASYNC_DMA (1llu << 32)
> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >> index 8b524cf..d9ed6b2 100644
> >> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >> @@ -4092,7 +4092,7 @@ int si_compile_llvm(struct si_screen *sscreen,
> >>if (r600_can_dump_shader(&sscreen->b, processor)) {
> >>fprintf(stderr, "radeonsi: Compiling shader %d\n", count);
> >>
> >> -  if (!(sscreen->b.debug_flags & DBG_NO_IR))
> >> +  if (!(sscreen->b.debug_flags & (DBG_NO_IR | DBG_PREOPT_IR)))
> >>LLVMDumpModule(mod);
> >>}
> >>
> >> @@ -4178,6 +4178,12 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen,
> >>si_llvm_export_vs(bld_base, outputs, gsinfo->num_outputs);
> >>
> >>LLVMBuildRetVoid(bld_base->base.gallivm->builder);
> >> +
> >> +  /* Dump LLVM IR before any optimization passes */
> >> +  if (sscreen->b.debug_flags & DBG_PREOPT_IR &&
> >> +  r600_can_dump_shader(&sscreen->b, TGSI_PROCESSOR_GEOMETRY))
> >> +  LLVMDumpModule(bld_base->base.gallivm->module);
> >> +
> >>radeon_llvm_finalize_module(&si_shader_ctx->radeon_bld);
> >>
> >>if (dump)
> >> @@ -4385,9 +4391,15 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
> &g

[Mesa-dev] [PATCH 2/3] radeon/llvm: Set the target triple on the module

2016-02-04 Thread Tom Stellard
---
 src/gallium/drivers/r600/r600_llvm.c| 2 +-
 src/gallium/drivers/radeon/radeon_llvm.h| 3 ++-
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 9 -
 src/gallium/drivers/radeonsi/si_shader.c| 4 ++--
 4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
index 232db13..8db0476 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -784,7 +784,7 @@ LLVMModuleRef r600_tgsi_llvm(
 {
struct tgsi_shader_info shader_info;
struct lp_build_tgsi_context * bld_base = &ctx->soa.bld_base;
-   radeon_llvm_context_init(ctx);
+   radeon_llvm_context_init(ctx, "r600--");
LLVMTypeRef Arguments[32];
unsigned ArgumentsCount = 0;
for (unsigned i = 0; i < ctx->inputs_count; i++)
diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h
index e967ad2..9f7d039 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -158,7 +158,8 @@ void radeon_llvm_emit_prepare_cube_coords(struct lp_build_tgsi_context * bld_bas
  LLVMValueRef *coords_arg,
  LLVMValueRef *derivs_arg);
 
-void radeon_llvm_context_init(struct radeon_llvm_context * ctx);
+void radeon_llvm_context_init(struct radeon_llvm_context * ctx,
+  const char *triple);
 
 void radeon_llvm_create_func(struct radeon_llvm_context * ctx,
  LLVMTypeRef *ParamTypes, unsigned ParamCount);
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 76be376..d03ba4b 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1452,7 +1452,7 @@ static void emit_minmax_int(const struct lp_build_tgsi_action *action,
emit_data->args[1], "");
 }
 
-void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
+void radeon_llvm_context_init(struct radeon_llvm_context * ctx, const char *triple)
 {
struct lp_type type;
 
@@ -1466,6 +1466,13 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
ctx->gallivm.context = LLVMContextCreate();
ctx->gallivm.module = LLVMModuleCreateWithNameInContext("tgsi",
ctx->gallivm.context);
+   LLVMSetTarget(ctx->gallivm.module,
+
+#if HAVE_LLVM < 0x0306
+   "r600--");
+#else
+   triple);
+#endif
ctx->gallivm.builder = LLVMCreateBuilderInContext(ctx->gallivm.context);
 
struct lp_build_tgsi_context * bld_base = &ctx->soa.bld_base;
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 2192b21..6830363 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4133,7 +4133,7 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen,
si_shader_ctx->type = TGSI_PROCESSOR_VERTEX;
si_shader_ctx->is_gs_copy_shader = true;
 
-   radeon_llvm_context_init(&si_shader_ctx->radeon_bld);
+   radeon_llvm_context_init(&si_shader_ctx->radeon_bld, "amdgcn--");
 
create_meta_data(si_shader_ctx);
create_function(si_shader_ctx);
@@ -4276,7 +4276,7 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
assert(shader->nparam == 0);
 
memset(&si_shader_ctx, 0, sizeof(si_shader_ctx));
-   radeon_llvm_context_init(&si_shader_ctx.radeon_bld);
+   radeon_llvm_context_init(&si_shader_ctx.radeon_bld, "amdgcn--");
bld_base = &si_shader_ctx.radeon_bld.soa.bld_base;
 
if (sel->type != PIPE_SHADER_COMPUTE)
-- 
2.0.4
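The triples passed above, "r600--" and "amdgcn--", use LLVM's arch-vendor-os form with empty vendor and OS fields. Below is a hypothetical splitter that only shows how such a string breaks apart. Real code should use llvm::Triple, as the TargetLibraryInfo helper in this thread does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct triple_parts {
   char arch[32];
   char vendor[32];
   char os[32];
};

/* Copy the next '-'-separated field of *s into dst (truncating if needed)
 * and advance *s past the field and its separator. */
static void next_field(const char **s, char *dst, size_t size)
{
   size_t field_len = strcspn(*s, "-");
   size_t copy_len = field_len < size ? field_len : size - 1;

   memcpy(dst, *s, copy_len);
   dst[copy_len] = '\0';
   *s += field_len;
   if (**s == '-')
      (*s)++;
}

/* Split "arch-vendor-os[-env]" into its first three fields; any
 * environment field is ignored in this sketch. */
static void split_triple(const char *triple, struct triple_parts *out)
{
   assert(triple != NULL && out != NULL);
   next_field(&triple, out->arch, sizeof(out->arch));
   next_field(&triple, out->vendor, sizeof(out->vendor));
   next_field(&triple, out->os, sizeof(out->os));
}
```

For "amdgcn--", this yields arch "amdgcn" with empty vendor and OS strings.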



[Mesa-dev] [PATCH 3/3] radeon/llvm: Add TargetLibraryInfo to the pass manager

2016-02-04 Thread Tom Stellard
This will prevent optimization passes from introducing unsupported
library calls.
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index d03ba4b..609da39 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -30,6 +30,7 @@
 #include "gallivm/lp_bld_flow.h"
 #include "gallivm/lp_bld_init.h"
 #include "gallivm/lp_bld_intr.h"
+#include "gallivm/lp_bld_misc.h"
 #include "gallivm/lp_bld_swizzle.h"
 #include "tgsi/tgsi_info.h"
 #include "tgsi/tgsi_parse.h"
@@ -1645,6 +1646,8 @@ void radeon_llvm_create_func(struct radeon_llvm_context * ctx,
 void radeon_llvm_finalize_module(struct radeon_llvm_context * ctx)
 {
struct gallivm_state * gallivm = ctx->soa.bld_base.base.gallivm;
+   const char *triple = LLVMGetTarget(gallivm->module);
+   LLVMTargetLibraryInfoRef target_library_info;
/* End the main function with Return*/
LLVMBuildRetVoid(gallivm->builder);
 
@@ -1652,6 +1655,9 @@ void radeon_llvm_finalize_module(struct radeon_llvm_context * ctx)
ctx->gallivm.passmgr = LLVMCreateFunctionPassManagerForModule(
gallivm->module);
 
+   target_library_info = gallivm_create_target_library_info(triple);
+   LLVMAddTargetLibraryInfo(target_library_info, gallivm->passmgr);
+
/* This pass should eliminate all the load and store instructions */
LLVMAddPromoteMemoryToRegisterPass(gallivm->passmgr);
 
@@ -1667,7 +1673,7 @@ void radeon_llvm_finalize_module(struct radeon_llvm_context * ctx)
 
LLVMDisposeBuilder(gallivm->builder);
LLVMDisposePassManager(gallivm->passmgr);
-
+   gallivm_dispose_target_library_info(target_library_info);
 }
 
 void radeon_llvm_dispose(struct radeon_llvm_context * ctx)
-- 
2.0.4
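Here is a toy model of why the pass manager needs TargetLibraryInfo. A transform that would replace a zero-filling loop with a call to memset() must first ask whether the target has memset() at all. LLVM's loop idiom recognition performs this kind of rewrite. Everything below is a simplified assumption, including the rule that treats r600/amdgcn triples as having no C library, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for a per-target library-function table. Assumption
 * for illustration: GPU triples (r600, amdgcn) have no C library at all,
 * every other triple has all of it. */
static bool target_has_libfunc(const char *triple, const char *func)
{
   assert(triple != NULL && func != NULL);
   if (strncmp(triple, "amdgcn", 6) == 0 || strncmp(triple, "r600", 4) == 0)
      return false;
   return true;
}

/* Toy version of a loop-idiom style rewrite: a zero-fill loop may only be
 * replaced by a memset() call when the target provides memset(). */
static const char *lower_zero_fill_loop(const char *triple)
{
   return target_has_libfunc(triple, "memset") ? "call memset" : "keep loop";
}
```

Without the target check, the rewrite would emit a call the GPU cannot resolve. This is the class of problem the patch description mentions.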



[Mesa-dev] [PATCH 1/3] gallivm: Add helpers for creating and destroying TargetLibraryInfo

2016-02-04 Thread Tom Stellard
This functionality is not exposed via the LLVM C API.
---
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 30 +++
 src/gallium/auxiliary/gallivm/lp_bld_misc.h   |  7 +++
 2 files changed, 37 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 3ee708f..30ef37c 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -61,6 +61,11 @@
 #include 
 #include 
 #include 
+#if HAVE_LLVM >= 0x0307
+#include 
+#else
+#include 
+#endif
 #if HAVE_LLVM < 0x0306
 #include 
 #else
@@ -147,6 +152,31 @@ lp_set_target_options(void)
gallivm_init_llvm_targets();
 }
 
+extern "C"
+LLVMTargetLibraryInfoRef
+gallivm_create_target_library_info(const char *triple)
+{
+   return reinterpret_cast<LLVMTargetLibraryInfoRef>(
+#if HAVE_LLVM < 0x0307
+   new llvm::TargetLibraryInfo(
+#else
+   new llvm::TargetLibraryInfoImpl(
+#endif
+   llvm::Triple(triple)));
+}
+
+extern "C"
+void
+gallivm_dispose_target_library_info(LLVMTargetLibraryInfoRef library_info)
+{
+   delete reinterpret_cast<
+#if HAVE_LLVM < 0x0307
+   llvm::TargetLibraryInfo
+#else
+   llvm::TargetLibraryInfoImpl
+#endif
+   *>(library_info);
+}
 
 extern "C"
 LLVMValueRef
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.h b/src/gallium/auxiliary/gallivm/lp_bld_misc.h
index 86d2f86..30b7b16 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.h
@@ -32,6 +32,7 @@
 
 #include "lp_bld.h"
 #include 
+#include 
 
 
 #ifdef __cplusplus
@@ -44,6 +45,12 @@ struct lp_generated_code;
 extern void
 gallivm_init_llvm_targets(void);
 
+extern LLVMTargetLibraryInfoRef
+gallivm_create_target_library_info(const char *triple);
+
+extern void
+gallivm_dispose_target_library_info(LLVMTargetLibraryInfoRef library_info);
+
 extern void
 lp_set_target_options(void);
 
-- 
2.0.4
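The create/dispose pair above wraps a C++ object behind an opaque C handle. Below is the same idea in plain C, with hypothetical names. Callers only see a pointer to a struct whose layout is private, and one function each owns allocation and release. The C++ version has one extra constraint, which is why its dispose helper repeats the HAVE_LLVM check: delete must be applied to the same class that new created.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* What callers see: a pointer to a struct whose layout is private. */
typedef struct lib_info *lib_info_ref;

/* Private definition; in a real library this would live only in the .c file. */
struct lib_info {
   char triple[64];
};

static lib_info_ref lib_info_create(const char *triple)
{
   lib_info_ref info;

   assert(triple != NULL);
   info = calloc(1, sizeof(*info));
   if (info) /* calloc() zeroed the buffer, so the copy stays terminated */
      strncpy(info->triple, triple, sizeof(info->triple) - 1);
   return info;
}

static const char *lib_info_triple(lib_info_ref info)
{
   assert(info != NULL);
   return info->triple;
}

/* Release must pair with create; free(NULL) is a harmless no-op. */
static void lib_info_dispose(lib_info_ref info)
{
   free(info);
}
```

As in the radeon patch, the caller keeps the handle for as long as the pass manager uses it and disposes of it afterwards.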



Re: [Mesa-dev] [PATCH 2/2] radeonsi: Allow dumping LLVM IR before optimization passes

2016-02-04 Thread Tom Stellard
On Thu, Feb 04, 2016 at 09:15:26AM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> Set R600_DEBUG=preoptir to dump the LLVM IR before optimization passes,
> to allow diagnosing problems caused by optimization passes.
> 
> Note that in order to compile the resulting IR with llc, you will first
> have to run at least the mem2reg pass, e.g.
> 
> opt -mem2reg -S < shader.ll | llc -march=amdgcn -mcpu=bonaire
> 
> Signed-off-by: Michel Dänzer  (original patch)
> Signed-off-by: Nicolai Hähnle  (w/ debug flag)
> ---
> Having the option is a good idea, but I prefer to have a separate debug
> flag for it so that when you try to analyze bugs in codegen (which in
> my experience happens more often) you don't have to worry about
> replicating the exact same sequence of optimizations manually via the
> command line to reproduce the problem there.
> 
>  src/gallium/drivers/radeon/r600_pipe_common.c |  1 +
>  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
>  src/gallium/drivers/radeonsi/si_shader.c  | 16 ++--
>  3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> index c827dbd..a1432ed 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -393,6 +393,7 @@ static const struct debug_named_value common_debug_options[] = {
>   { "noir", DBG_NO_IR, "Don't print the LLVM IR"},
>   { "notgsi", DBG_NO_TGSI, "Don't print the TGSI"},
>   { "noasm", DBG_NO_ASM, "Don't print disassembled shaders"},
> + { "preoptir", DBG_PREOPT_IR, "Print the LLVM IR before initial optimizations" },
>  
>   /* features */
>   { "nodma", DBG_NO_ASYNC_DMA, "Disable asynchronous DMA" },
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> index c7e4c44..4e36631 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -71,6 +71,7 @@
>  #define DBG_NO_IR    (1 << 12)
>  #define DBG_NO_TGSI  (1 << 13)
>  #define DBG_NO_ASM   (1 << 14)
> +#define DBG_PREOPT_IR (1 << 15)
>  /* Bits 21-31 are reserved for the r600g driver. */
>  /* features */
>  #define DBG_NO_ASYNC_DMA (1llu << 32)
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 8b524cf..d9ed6b2 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -4092,7 +4092,7 @@ int si_compile_llvm(struct si_screen *sscreen,
>   if (r600_can_dump_shader(&sscreen->b, processor)) {
>   fprintf(stderr, "radeonsi: Compiling shader %d\n", count);
>  
> - if (!(sscreen->b.debug_flags & DBG_NO_IR))
> + if (!(sscreen->b.debug_flags & (DBG_NO_IR | DBG_PREOPT_IR)))
>   LLVMDumpModule(mod);
>   }
>  
> @@ -4178,6 +4178,12 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen,
>   si_llvm_export_vs(bld_base, outputs, gsinfo->num_outputs);
>  
>   LLVMBuildRetVoid(bld_base->base.gallivm->builder);
> +
> + /* Dump LLVM IR before any optimization passes */
> + if (sscreen->b.debug_flags & DBG_PREOPT_IR &&
> + r600_can_dump_shader(&sscreen->b, TGSI_PROCESSOR_GEOMETRY))
> + LLVMDumpModule(bld_base->base.gallivm->module);
> +
>   radeon_llvm_finalize_module(&si_shader_ctx->radeon_bld);
>  
>   if (dump)
> @@ -4385,9 +4391,15 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
>   }
>  
>   LLVMBuildRetVoid(bld_base->base.gallivm->builder);
> + mod = bld_base->base.gallivm->module;
> +
> + /* Dump LLVM IR before any optimization passes */
> + if (sscreen->b.debug_flags & DBG_PREOPT_IR &&
> + r600_can_dump_shader(&sscreen->b, si_shader_ctx.type))
> + LLVMDumpModule(mod);
> +

Is there any reason not to add the dump in radeon_llvm_finalize_module()
after PromoteMem2Reg has run?  This would make the output readable by llc
and then you would only need to add the dump call in one place.

-Tom

>   radeon_llvm_finalize_module(&si_shader_ctx.radeon_bld);
>  
> - mod = bld_base->base.gallivm->module;
>   r = si_compile_llvm(sscreen, &shader->binary, &shader->config, tm,
>   mod, debug, si_shader_ctx.type);
>   if (r) {
> -- 
> 2.5.0
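The flag logic in the patch is isolated below into a small sketch. With "preoptir" set, the post-optimization dump is skipped and the pre-optimization dump is taken instead. "noir" suppresses the post-optimization dump on its own. The bit values are copied from the patch. The two helper functions are hypothetical, and the r600_can_dump_shader() check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values from the patch; DBG_NO_ASYNC_DMA shows why the mask is 64-bit. */
#define DBG_NO_IR        (1 << 12)
#define DBG_PREOPT_IR    (1 << 15)
#define DBG_NO_ASYNC_DMA (1llu << 32)

/* Post-optimization dump: skipped if either "noir" or "preoptir" is set. */
static bool dump_ir_after_opt(uint64_t debug_flags)
{
   return !(debug_flags & (DBG_NO_IR | DBG_PREOPT_IR));
}

/* Pre-optimization dump: taken only when "preoptir" is set. */
static bool dump_ir_before_opt(uint64_t debug_flags)
{
   return (debug_flags & DBG_PREOPT_IR) != 0;
}
```

At most one of the two dumps fires for any flag combination. Tom's alternative, a single dump after PromoteMem2Reg, would make this split unnecessary.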
> 


Re: [Mesa-dev] [PATCH 3/4] radeonsi: add max waves / SIMD to shader stats (v2)

2016-01-25 Thread Tom Stellard
On Fri, Jan 22, 2016 at 03:18:12PM +0100, Marek Olšák wrote:
> From: Marek Olšák 
> 
> v2: account for LDS usage in PS
> the limit is per SIMD, not per CU
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 54 +---
>  1 file changed, 49 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 1bd617f..33c0db6 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -4001,22 +4001,65 @@ static void si_shader_dump_disassembly(const struct radeon_shader_binary *binary
>  
>  static void si_shader_dump_stats(struct si_screen *sscreen,
>struct si_shader_config *conf,
> +  unsigned num_inputs,
>unsigned code_size,
>struct pipe_debug_callback *debug,
>unsigned processor)
>  {
> + unsigned lds_increment = sscreen->b.chip_class >= CIK ? 512 : 256;
> + unsigned lds_per_wave = 0;
> + unsigned max_simd_waves = 10;
> +
> + /* Compute LDS usage for PS. */
> + if (processor == TGSI_PROCESSOR_FRAGMENT) {
> + /* The minimum usage per wave is (num_inputs * 36). The maximum
> +  * usage is (num_inputs * 36 * 16).
> +  * We can get anything in between and it varies between waves.
> +  *
> +  * Other stages don't know the size at compile time or don't
> +  * allocate LDS per wave, but instead they do it per thread group.
> +  */
> + lds_per_wave = conf->lds_size * lds_increment +
> +align(num_inputs * 36, lds_increment);
> + }
> +
> + /* Compute the per-SIMD wave counts. */
> + if (conf->num_sgprs) {
> + if (sscreen->b.chip_class >= VI)
> + max_simd_waves = MIN2(max_simd_waves, 800 / conf->num_sgprs);
> + else
> + max_simd_waves = MIN2(max_simd_waves, 512 / conf->num_sgprs);
> + }
> +
> + if (conf->num_vgprs)
> + max_simd_waves = MIN2(max_simd_waves, 256 / conf->num_vgprs);
> +
> + /* LDS is 64KB per CU (4 SIMDs), divided into 16KB blocks per SIMD
> +  * that PS can use.
> +  */
> + if (lds_per_wave)
> + max_simd_waves = MIN2(max_simd_waves, 16384 / lds_per_wave);
> +
>   if (r600_can_dump_shader(>b, processor)) {
>   fprintf(stderr, "*** SHADER STATS ***\n"
> - "SGPRS: %d\nVGPRS: %d\nCode Size: %d bytes\nLDS: %d blocks\n"
> - "Scratch: %d bytes per wave\n\n",
> + "SGPRS: %d\n"
> + "VGPRS: %d\n"
> + "Code Size: %d bytes\n"
> + "LDS: %d blocks\n"
> + "Scratch: %d bytes per wave\n"
> + "Max Waves: %d\n"
> + "\n",
>   conf->num_sgprs, conf->num_vgprs, code_size,
> - conf->lds_size, conf->scratch_bytes_per_wave);
> + conf->lds_size, conf->scratch_bytes_per_wave,
> + max_simd_waves);
>   }
>  
>   pipe_debug_message(debug, SHADER_INFO,
> -"Shader Stats: SGPRS: %d VGPRS: %d Code Size: %d LDS: %d Scratch: %d",
> +"Shader Stats: SGPRS: %d VGPRS: %d Code Size: %d "
> +"LDS: %d Scratch: %d Max Waves: %d",
>  conf->num_sgprs, conf->num_vgprs, code_size,
> -conf->lds_size, conf->scratch_bytes_per_wave);
> +conf->lds_size, conf->scratch_bytes_per_wave,
> +max_simd_waves);
>  }
>  
>  void si_shader_dump(struct si_screen *sscreen, struct si_shader *shader,
> @@ -4027,6 +4070,7 @@ void si_shader_dump(struct si_screen *sscreen, struct si_shader *shader,
>   si_shader_dump_disassembly(>binary, debug);
>  
>   si_shader_dump_stats(sscreen, >config,
> +shader->selector->info.num_inputs,

clover is segfaulting here, because shader->selector is NULL for compute
shaders.

-Tom

>shader->binary.code_size, debug, processor);
>  }
>  
> -- 
> 2.1.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

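As a rough cross-check of the arithmetic in the radeonsi stats patch above, here is a minimal Python model of the per-SIMD wave limit. A SIMD runs at most 10 waves, and that number is further limited by 512 SGPRs per SIMD (800 on VI and later), by 256 VGPRs, and, for pixel shaders only, by 16 KB of LDS per SIMD. This is a sketch, not radeonsi code. The ChipClass enum and the function signature are hypothetical. Because of the NULL shader->selector crash reported in the reply, the sketch treats a missing num_inputs (no selector, as with clover compute shaders) as 0 rather than dereferencing it. That guard is an assumption about what a fix might look like, not the fix that landed.

```python
from enum import IntEnum
from typing import Optional


class ChipClass(IntEnum):
    # Hypothetical stand-in for the chip_class enum; only the ordering matters.
    SI = 0
    CIK = 1
    VI = 2


def align(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def max_simd_waves(chip: ChipClass, num_sgprs: int, num_vgprs: int,
                   lds_blocks: int, is_fragment: bool,
                   num_inputs: Optional[int] = None) -> int:
    # No selector (e.g. a compute shader from clover): assume no PS inputs.
    inputs = num_inputs if num_inputs is not None else 0

    lds_increment = 512 if chip >= ChipClass.CIK else 256
    lds_per_wave = 0
    if is_fragment:
        # Minimum PS usage: the declared LDS blocks plus 36 bytes per input.
        lds_per_wave = lds_blocks * lds_increment + align(inputs * 36, lds_increment)

    waves = 10
    if num_sgprs:
        sgpr_budget = 800 if chip >= ChipClass.VI else 512
        waves = min(waves, sgpr_budget // num_sgprs)
    if num_vgprs:
        waves = min(waves, 256 // num_vgprs)
    if lds_per_wave:
        # 64 KB of LDS per CU, i.e. 16 KB per SIMD usable by PS.
        waves = min(waves, 16384 // lds_per_wave)
    return waves
```

For example, take a pixel shader on SI with 8 LDS blocks and 32 inputs. It needs 8 * 256 + align(32 * 36, 256) = 2048 + 1280 = 3328 bytes of LDS per wave, so LDS alone caps it at 16384 // 3328 = 4 waves.
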

Re: [Mesa-dev] [Mesa-dev, v6, 1/3] clover: separate compile and link stages

2016-01-25 Thread Tom Stellard
Hi,

>diff --git a/src/gallium/state_trackers/clover/api/program.cpp b/src/gallium/state_trackers/clover/api/program.cpp
>index 27ca2ef..f8d946e 100644
>--- a/src/gallium/state_trackers/clover/api/program.cpp
>+++ b/src/gallium/state_trackers/clover/api/program.cpp
>@@ -181,13 +181,20 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs,
> 
>validate_build_program_common(prog, num_devs, d_devs, pfn_notify, user_data);
> 
>-   prog.build(devs, opts);
>+   if (prog.has_source) {
>+  prog.compile(devs, opts);
>+  prog.link(devs, opts, { prog });
>+   }
>return CL_SUCCESS;
> } catch (error ) {
>-   if (e.get() == CL_INVALID_COMPILER_OPTIONS)
>-  return CL_INVALID_BUILD_OPTIONS;
>-   if (e.get() == CL_COMPILE_PROGRAM_FAILURE)
>-  return CL_BUILD_PROGRAM_FAILURE;
>+   switch (e.get()) {
>+  case CL_INVALID_COMPILER_OPTIONS:
>+  case CL_INVALID_LINKER_OPTIONS:
>+ return CL_INVALID_BUILD_OPTIONS;
>+  case CL_COMPILE_PROGRAM_FAILURE:
>+  case CL_LINK_PROGRAM_FAILURE:
>+ return CL_BUILD_PROGRAM_FAILURE;
>+   }
>return e.get();
> }
> 
>@@ -224,7 +231,7 @@ clCompileProgram(cl_program d_prog, cl_uint num_devs,
>   range(header_names, num_headers),
>   objs(d_header_progs, num_headers));
> 
>-   prog.build(devs, opts, headers);
>+   prog.compile(devs, opts, headers);
>return CL_SUCCESS;
> 
> } catch (error ) {
>diff --git a/src/gallium/state_trackers/clover/core/compiler.hpp b/src/gallium/state_trackers/clover/core/compiler.hpp
>index 2076417..0d6766a 100644
>--- a/src/gallium/state_trackers/clover/core/compiler.hpp
>+++ b/src/gallium/state_trackers/clover/core/compiler.hpp
>@@ -32,11 +32,16 @@ namespace clover {
> 
>module compile_program_llvm(const std::string ,
>const header_map ,
>-   pipe_shader_ir ir,
>const std::string ,
>const std::string ,
>std::string _log);
> 
>+   module link_program_llvm(const std::vector ,
>+enum pipe_shader_ir ir,
>+const std::string ,
>+const std::string ,
>+std::string _log);
>+
>module compile_program_tgsi(const std::string ,
>std::string _log);
> }
>diff --git a/src/gallium/state_trackers/clover/core/error.hpp b/src/gallium/state_trackers/clover/core/error.hpp
>index 59a5af4..4ec619c 100644
>--- a/src/gallium/state_trackers/clover/core/error.hpp
>+++ b/src/gallium/state_trackers/clover/core/error.hpp
>@@ -72,6 +72,13 @@ namespace clover {
>   }
>};
> 
>+   class link_error : public error {
>+   public:
>+  link_error(const std::string  = "") :
>+ error(CL_LINK_PROGRAM_FAILURE , what) {
>+  }
>+   };
>+
>template
>class invalid_object_error;
> 
>diff --git a/src/gallium/state_trackers/clover/core/program.cpp b/src/gallium/state_trackers/clover/core/program.cpp
>index 6eebd9c..4aa2622 100644
>--- a/src/gallium/state_trackers/clover/core/program.cpp
>+++ b/src/gallium/state_trackers/clover/core/program.cpp
>@@ -40,8 +40,8 @@ program::program(clover::context ,
> }
> 
> void
>-program::build(const ref_vector , const char *opts,
>-   const header_map ) {
>+program::compile(const ref_vector , const std::string ,
>+ const header_map ) {
>if (has_source) {
>   _devices = devs;
> 
>@@ -58,9 +58,7 @@ program::build(const ref_vector , const char *opts,
> auto module = (dev.ir_format() == PIPE_SHADER_IR_TGSI ?
>compile_program_tgsi(_source, log) :
>compile_program_llvm(_source, headers,
>-dev.ir_format(),
>-dev.ir_target(), build_opts(dev),
>-log));
>+dev.ir_target(), opts, log));
> _binaries.insert({ , module });
> _logs.insert({ , log });
>  } catch (const error &) {
>@@ -71,6 +69,39 @@ program::build(const ref_vector , const char *opts,
>}
> }
> 
>+void
>+program::link(const ref_vector , const std::string ,
>+  const ref_vector ) {
>+   _devices = devs;
>+
>+   for (auto  : devs) {
>+  if (dev.ir_format() == PIPE_SHADER_IR_TGSI)
>+ continue;
>+
>+   const std::vector mods = map([&](const program ) {
>+  return prog.binary(dev);
>+   }, progs);
>+
>+  _binaries.erase();
>+  _opts.erase();
>+
>+  _opts.insert({ , opts });
>+
>+  std::string log;
>+
>+  try {
>+ auto module = link_program_llvm(mods,
>+ dev.ir_format(), dev.ir_target(),
>+ opts, log);
>+ _binaries.insert({ , module });
>+ 

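The new clBuildProgram() flow (compile, then link the program against itself) and its status remapping can be modelled in a few lines of Python. This is only a sketch. CLError, build_program() and the callables are hypothetical, and status codes are plain strings instead of cl_int values. The point it illustrates is that each compile-stage or link-stage error is reported with the matching clBuildProgram() status, while any other error passes through unchanged.

```python
from typing import Callable


class CLError(Exception):
    """Hypothetical stand-in for clover::error, carrying a status name."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


# Same mapping as the switch statement in the patched clBuildProgram().
BUILD_STATUS = {
    "CL_INVALID_COMPILER_OPTIONS": "CL_INVALID_BUILD_OPTIONS",
    "CL_INVALID_LINKER_OPTIONS": "CL_INVALID_BUILD_OPTIONS",
    "CL_COMPILE_PROGRAM_FAILURE": "CL_BUILD_PROGRAM_FAILURE",
    "CL_LINK_PROGRAM_FAILURE": "CL_BUILD_PROGRAM_FAILURE",
}


def build_program(has_source: bool,
                  compile_fn: Callable[[], None],
                  link_fn: Callable[[], None]) -> str:
    try:
        if has_source:
            compile_fn()   # models prog.compile(devs, opts)
            link_fn()      # models prog.link(devs, opts, { prog })
        return "CL_SUCCESS"
    except CLError as e:
        # Errors without a build-specific equivalent are returned as-is.
        return BUILD_STATUS.get(e.code, e.code)
```

Because link_fn() only runs after compile_fn() returns, a compile failure never reaches the link step.
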
Re: [Mesa-dev] [PATCH 4/4] radeonsi: change LLVM intrinsics for BREV, CLAMP, EX2

2016-01-22 Thread Tom Stellard
On Fri, Jan 22, 2016 at 03:18:13PM +0100, Marek Olšák wrote:
> From: Marek Olšák <marek.ol...@amd.com>
> 

Reviewed-by: Tom Stellard <thomas.stell...@amd.com>

> Requested by Matt Arsenault.
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index c94f109..76be376 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -1511,12 +1511,14 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>   bld_base->op_actions[TGSI_OPCODE_BFI].emit = emit_bfi;
>   bld_base->op_actions[TGSI_OPCODE_BGNLOOP].emit = bgnloop_emit;
>   bld_base->op_actions[TGSI_OPCODE_BREV].emit = build_tgsi_intrinsic_nomem;
> - bld_base->op_actions[TGSI_OPCODE_BREV].intr_name = "llvm.AMDGPU.brev";
> + bld_base->op_actions[TGSI_OPCODE_BREV].intr_name =
> + HAVE_LLVM >= 0x0308 ? "llvm.bitreverse.i32" : "llvm.AMDGPU.brev";
>   bld_base->op_actions[TGSI_OPCODE_BRK].emit = brk_emit;
>   bld_base->op_actions[TGSI_OPCODE_CEIL].emit = build_tgsi_intrinsic_nomem;
>   bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "llvm.ceil.f32";
>   bld_base->op_actions[TGSI_OPCODE_CLAMP].emit = build_tgsi_intrinsic_nomem;
> - bld_base->op_actions[TGSI_OPCODE_CLAMP].intr_name = "llvm.AMDIL.clamp.";
> + bld_base->op_actions[TGSI_OPCODE_CLAMP].intr_name =
> + HAVE_LLVM >= 0x0308 ? "llvm.AMDGPU.clamp." : "llvm.AMDIL.clamp.";
>   bld_base->op_actions[TGSI_OPCODE_CMP].emit = emit_cmp;
>   bld_base->op_actions[TGSI_OPCODE_CONT].emit = cont_emit;
>   bld_base->op_actions[TGSI_OPCODE_COS].emit = build_tgsi_intrinsic_nomem;
> @@ -1539,7 +1541,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
>   bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
>   bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
>   bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
> - bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";
> + bld_base->op_actions[TGSI_OPCODE_EX2].intr_name =
> + HAVE_LLVM >= 0x0308 ? "llvm.exp2.f32" : "llvm.AMDIL.exp.";
>   bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem;
>   bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32";
>   bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem;
> -- 
> 2.1.4
> 

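The version check in this patch picks an intrinsic name at compile time from HAVE_LLVM, which encodes the LLVM major and minor version as 0xMMmm (0x0308 is LLVM 3.8). Below is a table-driven Python sketch of the same selection. It covers only the three opcodes touched here, and the intrinsic_name() helper is hypothetical.

```python
LLVM_3_8 = 0x0308

# opcode -> (name for LLVM >= 3.8, name for older LLVM), as in the patch.
INTRINSICS = {
    "BREV": ("llvm.bitreverse.i32", "llvm.AMDGPU.brev"),
    "CLAMP": ("llvm.AMDGPU.clamp.", "llvm.AMDIL.clamp."),
    "EX2": ("llvm.exp2.f32", "llvm.AMDIL.exp."),
}


def intrinsic_name(opcode: str, have_llvm: int) -> str:
    """Return the intrinsic for a TGSI opcode given a HAVE_LLVM-style version."""
    new_name, old_name = INTRINSICS[opcode]
    return new_name if have_llvm >= LLVM_3_8 else old_name
```

Two of the three new names (llvm.bitreverse.i32 and llvm.exp2.f32) are target-independent LLVM intrinsics. The clamp intrinsic moves from the AMDIL namespace to AMDGPU.
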

Re: [Mesa-dev] [PATCH 00/23] RadeonSI: Restructuring shader code generation part 2

2016-01-06 Thread Tom Stellard
On Wed, Jan 06, 2016 at 01:41:22PM +0100, Marek Olšák wrote:
> Hi,
> 
> These boring patches focus on restructuring pixel shader output handling and 
> code around si_compile_llvm (config, dumping, etc.). They are mostly code 
> movements and dividing functions into smaller ones, so that they can be 
> re-used by pixel shader epilog compilation code.
> 
> Please review.

These all look OK to me.

-Tom
> 
> Marek


[Mesa-dev] [PATCH shader-db] si-report: Track max waves per CU

2016-01-04 Thread Tom Stellard
---
 si-report.py | 56 ++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/si-report.py b/si-report.py
index ec88112..e717af0 100755
--- a/si-report.py
+++ b/si-report.py
@@ -65,6 +65,12 @@ def get_scratch_str(value, suffixes = True):
 suffix = 'bytes per wave'
 return get_value_str(value, 'Scratch', suffix)
 
+def get_waves_per_cu_str(value, suffixes = True):
+suffix = ''
+if suffixes:
+suffix = 'waves'
+return get_value_str(value, 'Max Waves / CU', suffix)
+
 def calculate_percent_change(b, a):
 if b == 0:
 return 0
@@ -89,15 +95,17 @@ class si_stats:
 self.code_size = 0
 self.lds = 0
 self.scratch = 0
+self.max_waves_per_cu = 0
 
 
 def to_string(self, suffixes = True):
-return "{}{}{}{}{}".format(
+return "{}{}{}{}{}{}".format(
 get_sgpr_str(self.sgprs, suffixes),
 get_vgpr_str(self.vgprs, suffixes),
 get_code_size_str(self.code_size, suffixes),
 get_lds_str(self.lds, suffixes),
-get_scratch_str(self.scratch, suffixes))
+get_scratch_str(self.scratch, suffixes),
+get_waves_per_cu_str(self.max_waves_per_cu, suffixes))
 
 
 def __str__(self):
@@ -109,6 +117,7 @@ class si_stats:
 self.code_size += other.code_size
 self.lds += other.lds
 self.scratch += other.scratch
+self.max_waves_per_cu += other.max_waves_per_cu
 
 def update(self, comp, cmp_fn):
 for name in self.__dict__.keys():
@@ -153,6 +162,48 @@ class si_stats:
 return False
 return True
 
+#TODO: Handle VI+ and take LDS into account.
+def compute_max_waves_per_cu(sgprs, vgprs):
+sgpr_waves = 10
+if sgprs <= 48:
+sgpr_waves = 10
+elif sgprs <= 56:
+sgpr_waves = 9
+elif sgprs <= 64:
+sgpr_waves = 8
+elif sgprs <= 72:
+sgpr_waves = 7
+elif sgprs <= 80:
+sgpr_waves = 6
+elif sgprs <= 96:
+sgpr_waves = 5
+else:
+sgpr_waves = 4
+
+vgpr_waves = 10
+if vgprs <= 24:
+vgpr_waves = 10
+elif vgprs <= 28:
+vgpr_waves = 9
+elif vgprs <= 32:
+vgpr_waves = 8
+elif vgprs <= 36:
+vgpr_waves = 7
+elif vgprs <= 40:
+vgpr_waves = 6
+elif vgprs <= 48:
+vgpr_waves = 5
+elif vgprs <= 64:
+vgpr_waves = 4
+elif vgprs <= 84:
+vgpr_waves = 3
+elif vgprs <= 128:
+vgpr_waves = 2
+else:
+vgpr_waves = 1
+
+return min(sgpr_waves, vgpr_waves)
+
 def get_results(filename):
 file = open(filename, "r")
 lines = file.read().split('\n')
@@ -199,6 +250,7 @@ def get_results(filename):
 current_stats.scratch = int(match.groups()[0])
 continue
 
+current_stats.max_waves_per_cu = compute_max_waves_per_cu(current_stats.sgprs, current_stats.vgprs)
 match = re.search(re_end, line)
 if match:
 results.append(current_stats)
-- 
2.0.4


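The if/elif ladders in compute_max_waves_per_cu() encode two step functions. A table-driven equivalent using the standard bisect module is sketched below. The bounds and wave counts are copied from the patch, so the sketch has the same TODO (no VI handling and no LDS). Note that the radeonsi stats patch earlier in this archive describes the same kind of limit as per SIMD rather than per CU.

```python
import bisect

# Inclusive upper bounds and the wave counts they map to, copied from the
# if/elif ladders in the patch. Each *_WAVES list has one extra trailing
# entry for register counts above the last bound.
SGPR_BOUNDS = [48, 56, 64, 72, 80, 96]
SGPR_WAVES = [10, 9, 8, 7, 6, 5, 4]
VGPR_BOUNDS = [24, 28, 32, 36, 40, 48, 64, 84, 128]
VGPR_WAVES = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]


def compute_max_waves_per_cu(sgprs: int, vgprs: int) -> int:
    # bisect_left finds the first bound >= the count, which matches "<=".
    sgpr_waves = SGPR_WAVES[bisect.bisect_left(SGPR_BOUNDS, sgprs)]
    vgpr_waves = VGPR_WAVES[bisect.bisect_left(VGPR_BOUNDS, vgprs)]
    return min(sgpr_waves, vgpr_waves)
```

Keeping the thresholds as data rather than branches is one way to make it easier to add separate VI tables later. It is a design choice, not a requirement.
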

Re: [Mesa-dev] [shader-db PATCH 1/5] run: create debug contexts

2016-01-04 Thread Tom Stellard
On Wed, Dec 30, 2015 at 09:32:38PM -0500, Nicolai Hähnle wrote:
> For Gallium-based drivers, this is required for receiving shader information
> via debug messages.

Patches 2-5 are

Acked-by: Tom Stellard <thomas.stell...@amd.com>

> ---
>  run.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/run.c b/run.c
> index 82d8c91..685f830 100644
> --- a/run.c
> +++ b/run.c
> @@ -435,6 +435,7 @@ main(int argc, char **argv)
>  EGL_CONTEXT_OPENGL_CORE_PROFILE_BIT_KHR,
>  EGL_CONTEXT_MAJOR_VERSION_KHR, 3,
>  EGL_CONTEXT_MINOR_VERSION_KHR, 2,
> +EGL_CONTEXT_FLAGS_KHR, EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR,
>  EGL_NONE
>  };
>  EGLContext core_ctx = eglCreateContext(egl_dpy, cfg, EGL_NO_CONTEXT,
> -- 
> 2.5.0
> 


Re: [Mesa-dev] New stable-branch 11.0 candidate pushed

2015-12-07 Thread Tom Stellard
On Mon, Dec 07, 2015 at 02:58:54PM +, Emil Velikov wrote:
> Hello list,
> 
> The candidate for the Mesa 11.0.7 is now available. Currently we have:
>  - 43 queued
>  - 24 nominated (outstanding)
>  - and 1 rejected/obsolete patches
> 
> Quite a few meta fixes (affecting i965), some driver fixes for i965,
> nouveau, r600 and llvm. The video encoding for Stoney has been disabled,
> as it requires many invasive changes to get working properly.
> There are also build fixes for DragonFly and other *BSD platforms.
> 
> 
> Take a look at section "Mesa stable queue" for more information.
> 
> 
> Testing
> ---
> The following results are against piglit 4b6848c131c.
> 
> 
> Changes - classic i965(snb)
> ---
> None.
> 
> 
> Changes - swrast classic
> 
> None.
> 
> 
> Changes - gallium softpipe
> --
> None.
> 
> 
> Changes - gallium llvmpipe (LLVM 3.7)
> -
> None.
> 
> 
> Testing reports/general approval
> 
> Any testing reports (or general approval of the state of the branch)
> will be greatly appreciated.
> 
> 
> Trivial merge conflicts
> ---
> commit 5474b52bba196af38f3eaacfce69285e7d768dad
> Author: Dave Airlie <airl...@redhat.com>
> 
> r600: workaround empty geom shader.
> 
> (cherry picked from commit 4f347225752b48f3dc5a59a6be71fe78616252a7)
> 
> 
> The plan is to have 11.0.7 this Wednesday (9th of December).
> 
> If you have any questions or comments that you would like to share
> before the release, please go ahead.
> 
> 
> Cheers,
> Emil
> 
> 
> Mesa stable queue
> -
> 
> Nominated (24)
> ==
> 
> Boyan Ding (1):
>   i915: Add XRGB format to intel_screen_make_configs
> 
> Boyuan Zhang (1):
>   radeon/uvd: uv pitch separation for stoney
> 
> Brian Paul (1):
>   configure: don't try to build gallium DRI drivers if --disable-dri is set
> 
> Emil Velikov (2):
>   i965: store reference to the context within struct brw_fence
>   egl/dri2: expose srgb configs when KHR_gl_colorspace is available
> 
> Ian Romanick (1):
>   meta/generate_mipmap: Work-around GLES 1.x problem with GL_DRAW_FRAMEBUFFER
> 
> Ilia Mirkin (6):
>   freedreno/a4xx: support lod_bias
>   freedreno/a4xx: fix 5_5_5_1 texture sampler format
>   freedreno/a4xx: point regid to "red" even for alpha-only rb formats
>   nvc0/ir: fold postfactor into immediate
>   nv50/ir: deal with loops with no breaks
>   nv50/ir: fix DCE to not generate 96-bit loads
> 
> Jean-Sébastien Pédron (1):
>   ralloc: Use __attribute__((destructor)) instead of atexit(3)
> 
> Jonathan Gray (1):
>   configure: check for python2.7 for PYTHON2
> 
> Kenneth Graunke (2):
>   i965: Fix fragment shader struct inputs.
>   i965: Fix scalar vertex shader struct outputs.
> 
> Marek Olšák (2):
>   radeonsi: fix occlusion queries on Fiji
>   radeonsi: fix a hang due to uninitialized border color registers
> 
> Tom Stellard (6):
>   clover: Call clBuildProgram() notification function when build completes v2
>   gallium/drivers: Add threadsafe wrappers for pipe_context v2
>   clover: Use threadsafe wrappers for pipe_context v2
>   clover: Properly initialize LLVM targets when linking with component libs
>   radeonsi: Rename si_shader::ls_rsrc{1, 2} to si_shader::rsrc{1, 2}
>   radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values
> 

What do I need to do to get this patch into the stable release?

-Tom

> 
> Queued (43)
> ===
> 
> Chris Wilson (1):
>   meta: Compute correct buffer size with SkipRows/SkipPixels
> 
> Daniel Stone (1):
>   egl/wayland: Ignore rects from SwapBuffersWithDamage
> 
> Dave Airlie (4):
>   texgetimage: consolidate 1D array handling code.
>   r600: geometry shader gsvs itemsize workaround
>   r600: rv670 use at least 16es/gs threads
>   r600: workaround empty geom shader.
> 
> Emil Velikov (3):
>   docs: add sha256 checksums for 11.0.6
>   get-pick-list.sh: Require explicit "11.0" for nominating stable patches
>   mesa; add get-extra-pick-list.sh script into bin/
> 
> François Tigeot (1):
>   xmlconfig: Add support for DragonFly
> 
> Ian Romanick (22):
>   mesa: Make bind_vertex_buffer avilable outside varray.c
>   mesa: Refactor update_array_format to make _mesa_update_array_for

Re: [Mesa-dev] Mesa 11.1.0 release candidate 2

2015-11-30 Thread Tom Stellard
On Mon, Nov 30, 2015 at 09:58:28AM +, Emil Velikov wrote:
> The second release candidate for Mesa 11.1.0 is now available.
> 

Hi Emil,

Can we make sure not to do the 11.1.0 release until we have a
fix for the pipe loader bug in clover?  clover is completely
broken in -rc2.

-Tom

> 
> Boyuan Zhang (1):
>   radeon/uvd: uv pitch separation for stoney
> 
> Dave Airlie (1):
>   texgetimage: consolidate 1D array handling code.
> 
> Emil Velikov (12):
>   pipe-loader: link against libloader regardless of libdrm presence
>   loader: unconditionally add AM_CPPFLAGS to libloader_la_CPPFLAGS
>   configure.ac: default to disabled dri3 when --disable-dri is set
>   pipe-loader: fix off-by one error
>   target-hepers: add non inline sw helpers
>   targets: use the non-inline sw helpers
>   pipe-loader: check if winsys.name is non-null prior to strcmp
>   st/dri: fd management cleanups
>   st/xa: fd management cleanups
>   auxiliary/vl/drm: fd management cleanups
>   auxiliary/vl/dri: fd management cleanups
>   Update version to 11.1.0-rc2
> 
> Eric Anholt (2):
>   vc4: Just put USE_VC4_SIMULATOR in DEFINES.
>   vc4: Take precedence over ilo when in simulator mode.
> 
> Ian Romanick (23):
>   mesa: Make bind_vertex_buffer avilable outside varray.c
>   mesa: Refactor update_array_format to make _mesa_update_array_format_public
>   mesa: Refactor enable_vertex_array_attrib to make _mesa_enable_vertex_array_attrib
>   i965: Pass brw_context instead of gl_context to brw_draw_rectlist
>   i965: Use DSA functions for VBOs in brw_meta_fast_clear
>   i965: Use internal functions for buffer object access
>   i965: Don't pollute the buffer object namespace in brw_meta_fast_clear
>   meta: Use DSA functions for PBO in create_texture_for_pbo
>   meta: Use _mesa_NamedBufferData and _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
>   i965: Use _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
>   meta: Don't leave the VBO bound after _mesa_meta_setup_vertex_objects
>   meta: Track VBO using gl_buffer_object instead of GL API object handle
>   meta: Use DSA functions for VBOs in _mesa_meta_setup_vertex_objects
>   meta: Use internal functions for buffer object and VAO access
>   meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects
>   meta: Partially convert _mesa_meta_DrawTex to DSA
>   meta: Track VBO using gl_buffer_object instead of GL API object handle in _mesa_meta_DrawTex
>   meta: Use internal functions for buffer object and VAO access in _mesa_meta_DrawTex
>   meta: Don't pollute the buffer object namespace in _mesa_meta_DrawTex
>   meta/TexSubImage: Don't pollute the buffer object namespace
>   meta: Don't save or restore the VBO binding
>   meta: Don't save or restore the active client texture
>   docs: add missed i965 feature to relnotes
> 
> Igor Gnatenko (1):
>   virgl: pipe_virgl_create_screen is not static
> 
> Ilia Mirkin (10):
>   freedreno/a4xx: only align slices in non-layer_first textures
>   freedreno/a4xx: fix 3d texture setup
>   freedreno/a4xx: fix independent blend
>   freedreno/a4xx: disable blending and alphatest for integer rt0
>   nouveau: use the buffer usage to determine placement when no binding
>   nv50,nvc0: properly handle buffer storage invalidation on dsa buffer
>   nv50/ir: fix (un)spilling of 3-wide results
>   freedreno/a4xx: use a factor of 32767 for snorm8 blending
>   docs: add missed freedreno features to relnotes
>   mesa: support GL_RED/GL_RG in ES2 contexts when driver support exists
> 
> Kenneth Graunke (2):
>   i965: Fix fragment shader struct inputs.
>   i965: Fix scalar vertex shader struct outputs.
> 
> Leo Liu (1):
>   radeon/vce: disable Stoney VCE for 11.0
> 
> Nanley Chery (2):
>   mesa/extensions: Enable overriding permanently enabled extensions
>   mesa/teximage: Fix S3TC regression due to ASTC interaction
> 
> Neil Roberts (1):
>   i965: Handle lum, intensity and missing components in the fast clear
> 
> Nicolai Hähnle (1):
>   radeon: only suspend queries on flush if they haven't been suspended yet
> 
> Timothy Arceri (2):
>   Revert "mesa: return initial value for VALIDATE_STATUS if pipe not bound"
>   glsl: implement recent spec update to SSO validation
> 
> Tom Stellard (2):
>   radeonsi: Rename si_shader::ls_rsrc{1,2} to si_shader::rsrc{1,2}
>   radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values
> 

[Mesa-dev] [PATCH] clover: Handle NULL pipe_loader_device returned by pipe_loader_probe()

2015-11-30 Thread Tom Stellard
pipe_loader_probe() may initialize an entry in the device list to NULL,
while still counting this device in the number of devices that it
returns, so we need to handle this situation.
---

This is the simplest fix possible to get clover working again.  We can
discuss fixing the other issues in clover in a follow-on patch.

src/gallium/state_trackers/clover/core/platform.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/state_trackers/clover/core/platform.cpp b/src/gallium/state_trackers/clover/core/platform.cpp
index 328b71c..871b90e 100644
--- a/src/gallium/state_trackers/clover/core/platform.cpp
+++ b/src/gallium/state_trackers/clover/core/platform.cpp
@@ -31,6 +31,9 @@ platform::platform() : adaptor_range(evals(), devs) {
pipe_loader_probe((), n);
 
for (pipe_loader_device *ldev : ldevs) {
+  if (!ldev) {
+ continue;
+  }
   try {
  devs.push_back(create(*this, ldev));
   } catch (error &) {
-- 
2.0.4


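The failure mode discussed in this thread is easier to follow with a small model of the two-call probe pattern that Francisco describes. The first call only returns a device count. Clover then allocates that many zero-initialized slots, and the second call fills them in. If the count is too high, a slot stays NULL, and that is the slot the `if (!ldev) continue;` guard skips. Everything here is hypothetical: probe() stands in for pipe_loader_probe(), and the device list is made up. The model illustrates the failure mode; it is not the pipe-loader code.

```python
from typing import List, Optional


def probe(devs: Optional[List[Optional[str]]], ndev: int) -> int:
    """Hypothetical probe with the over-count bug described in the thread."""
    found = ["radeonsi"]              # devices that actually probed successfully
    if devs is not None:
        for i, dev in enumerate(found[:ndev]):
            devs[i] = dev
    return len(found) + 1             # bug: also counts an unavailable sw device


def platform_devices() -> List[str]:
    n = probe(None, 0)                # first call: only ask for the count
    ldevs: List[Optional[str]] = [None] * n   # zero-initialized slots
    probe(ldevs, n)                   # second call: fill in what was found
    devs = []
    for ldev in ldevs:
        if ldev is None:              # the guard added by the patch
            continue
        devs.append(ldev)
    return devs
```

Without the None check, the loop would hand an empty slot to the device constructor, which corresponds to the NULL dereference in clover.
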

Re: [Mesa-dev] [PATCH] clover: Handle NULL pipe_loader_device returned by pipe_loader_probe()

2015-11-30 Thread Tom Stellard
On Mon, Nov 30, 2015 at 07:57:32PM +0200, Francisco Jerez wrote:
> Tom Stellard <thomas.stell...@amd.com> writes:
> 
> > pipe_loader_probe() may initialize an entry in the device list to NULL,
> > while still counting this device in the number of devices that it
> > returns, so we need to handle this situation.
> 
> If this is related to the patch you sent last Saturday
> (1448679128-20276-1-git-send-email-thomas.stell...@amd.com), I don't
> think that's what happens.  What happens is that pipe_loader_sw_probe()
> returns an incorrect device count the first time around (one regardless
> of whether the software null device is actually available), so Clover
> allocates and zero-initializes a pointer in the ldevs array for a device
> which is never returned by pipe-loader, and then crashes.
> 
> Please mention in the commit message that this is actually working
> around a pipe-loader bug, but it makes sense to do it anyway because it
> fixes the theoretical race condition you pointed out in your last patch.
> 

Sorry, please disregard this.  I got branches mixed up while working on
this and I thought that this was required in addition to 
1448679128-20276-1-git-send-email-thomas.stell...@amd.com.

-Tom

> > ---
> >
> > This is the simplest fix possible to get clover working again.  We can
> > discuss fixing the other issues in clover in a follow-on patch.
> >
> > src/gallium/state_trackers/clover/core/platform.cpp | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/src/gallium/state_trackers/clover/core/platform.cpp b/src/gallium/state_trackers/clover/core/platform.cpp
> > index 328b71c..871b90e 100644
> > --- a/src/gallium/state_trackers/clover/core/platform.cpp
> > +++ b/src/gallium/state_trackers/clover/core/platform.cpp
> > @@ -31,6 +31,9 @@ platform::platform() : adaptor_range(evals(), devs) {
> > pipe_loader_probe((), n);
> >  
> > for (pipe_loader_device *ldev : ldevs) {
> > +  if (!ldev) {
> > + continue;
> > +  }
> >try {
> 
> Just nitpicking now, but I'd prefer to simplify this even more by doing
> the following here:
> 
> +if (ldev)
> 
> >   devs.push_back(create(*this, ldev));
> >} catch (error &) {
> > -- 
> > 2.0.4





