Re: [Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load

2017-05-26 Thread Nicolai Hähnle

On 22.05.2017 12:18, Marek Olšák wrote:

On Mon, May 22, 2017 at 8:44 AM, Nicolai Hähnle  wrote:

On 19.05.2017 17:54, Marek Olšák wrote:


From: Marek Olšák 

---
  src/amd/common/ac_llvm_build.c   | 50

  src/amd/common/ac_llvm_build.h   |  3 +-
  src/amd/common/ac_nir_to_llvm.c  |  2 +-
  src/gallium/drivers/radeonsi/si_shader.c | 23 +++
  4 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c
b/src/amd/common/ac_llvm_build.c
index 87a1fb7..4b048ba 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context
*ctx,
  LLVMValueRef
  ac_build_buffer_load(struct ac_llvm_context *ctx,
  LLVMValueRef rsrc,
  int num_channels,
  LLVMValueRef vindex,
  LLVMValueRef voffset,
  LLVMValueRef soffset,
  unsigned inst_offset,
  unsigned glc,
  unsigned slc,
-bool readonly_memory)
+bool readonly_memory,
+bool coherent)



I think the series is basically fine, but the use of "coherent" here is
potentially confusing because "coherent" is already used as the memory
qualifier in shaders, and it sets glc.

Are there any buffer loads with readonly_memory && !glc && !vindex &&
!voffset where we aren't allowed to use SMEM? If possible, we could just use
that to control whether llvm.SI.load.const is used.

I seem to recall that there were some subtle differences in bounds checking,
and perhaps those prevent the use of SMEM sometimes. But then I'd rather use
a more explicit parameter name like smem_forbidden or smem_allowed.


readonly_memory is now slightly misleading. It means that there are no
stores/atomics to the same memory in the current shader, and it
enforces READNONE. You need both non-coherent (stores in other shader
invocations, if any, are not coherent with the current shader) and
readonly_memory (no stores in the current shader touch the memory).
Note that stores in the current shader are always coherent, that's why
there must be none.

Also, relying on GLC would be wrong, because GLC is also used to work
around hw bugs, so GLC=1 can be set even when it's OK to use SMEM.


Okay, that answers my question. Please add this explanation about glc=1, 
coherent=false to the comment in the function. For the series:


Reviewed-by: Nicolai Hähnle 




Marek




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load

2017-05-22 Thread Marek Olšák
On Mon, May 22, 2017 at 8:44 AM, Nicolai Hähnle  wrote:
> On 19.05.2017 17:54, Marek Olšák wrote:
>>
>> From: Marek Olšák 
>>
>> ---
>>  src/amd/common/ac_llvm_build.c   | 50
>> 
>>  src/amd/common/ac_llvm_build.h   |  3 +-
>>  src/amd/common/ac_nir_to_llvm.c  |  2 +-
>>  src/gallium/drivers/radeonsi/si_shader.c | 23 +++
>>  4 files changed, 51 insertions(+), 27 deletions(-)
>>
>> diff --git a/src/amd/common/ac_llvm_build.c
>> b/src/amd/common/ac_llvm_build.c
>> index 87a1fb7..4b048ba 100644
>> --- a/src/amd/common/ac_llvm_build.c
>> +++ b/src/amd/common/ac_llvm_build.c
>> @@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context
>> *ctx,
>>  LLVMValueRef
>>  ac_build_buffer_load(struct ac_llvm_context *ctx,
>>  LLVMValueRef rsrc,
>>  int num_channels,
>>  LLVMValueRef vindex,
>>  LLVMValueRef voffset,
>>  LLVMValueRef soffset,
>>  unsigned inst_offset,
>>  unsigned glc,
>>  unsigned slc,
>> -bool readonly_memory)
>> +bool readonly_memory,
>> +bool coherent)
>
>
> I think the series is basically fine, but the use of "coherent" here is
> potentially confusing because "coherent" is already used as the memory
> qualifier in shaders, and it sets glc.
>
> Are there any buffer loads with readonly_memory && !glc && !vindex &&
> !voffset where we aren't allowed to use SMEM? If possible, we could just use
> that to control whether llvm.SI.load.const is used.
>
> I seem to recall that there were some subtle differences in bounds checking,
> and perhaps those prevent the use of SMEM sometimes. But then I'd rather use
> a more explicit parameter name like smem_forbidden or smem_allowed.

readonly_memory is now slightly misleading. It means that there are no
stores/atomics to the same memory in the current shader, and it
enforces READNONE. You need both non-coherent (stores in other shader
invocations, if any, are not coherent with the current shader) and
readonly_memory (no stores in the current shader touch the memory).
Note that stores in the current shader are always coherent, that's why
there must be none.

Also, relying on GLC would be wrong, because GLC is also used to work
around hw bugs, so GLC=1 can be set even when it's OK to use SMEM.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load

2017-05-22 Thread Nicolai Hähnle

On 19.05.2017 17:54, Marek Olšák wrote:

From: Marek Olšák 

---
 src/amd/common/ac_llvm_build.c   | 50 
 src/amd/common/ac_llvm_build.h   |  3 +-
 src/amd/common/ac_nir_to_llvm.c  |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c | 23 +++
 4 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 87a1fb7..4b048ba 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
 LLVMValueRef
 ac_build_buffer_load(struct ac_llvm_context *ctx,
 LLVMValueRef rsrc,
 int num_channels,
 LLVMValueRef vindex,
 LLVMValueRef voffset,
 LLVMValueRef soffset,
 unsigned inst_offset,
 unsigned glc,
 unsigned slc,
-bool readonly_memory)
+bool readonly_memory,
+bool coherent)


I think the series is basically fine, but the use of "coherent" here is 
potentially confusing because "coherent" is already used as the memory 
qualifier in shaders, and it sets glc.


Are there any buffer loads with readonly_memory && !glc && !vindex && 
!voffset where we aren't allowed to use SMEM? If possible, we could just 
use that to control whether llvm.SI.load.const is used.


I seem to recall that there were some subtle differences in bounds 
checking, and perhaps those prevent the use of SMEM sometimes. But then 
I'd rather use a more explicit parameter name like smem_forbidden or 
smem_allowed.


Cheers,
Nicolai



 {
+   LLVMValueRef offset = LLVMConstInt(ctx->i32, inst_offset, 0);
+   if (voffset)
+   offset = LLVMBuildAdd(ctx->builder, offset, voffset, "");
+   if (soffset)
+   offset = LLVMBuildAdd(ctx->builder, offset, soffset, "");
+
+   /* Loads from a shader buffer that has no stores in the same shader
+* and is non-coherent with other shader invocations can use SMEM.
+*/
+   if (readonly_memory && !coherent) {
+   assert(vindex == NULL);
+   assert(glc == 0);
+   assert(slc == 0);
+
+   LLVMValueRef result[4];
+
+   for (int i = 0; i < num_channels; i++) {
+   if (i) {
+   offset = LLVMBuildAdd(ctx->builder, offset,
+ LLVMConstInt(ctx->i32, 4, 0), 
"");
+   }
+   LLVMValueRef args[2] = {rsrc, offset};
+   result[i] = ac_build_intrinsic(ctx, 
"llvm.SI.load.const.v4i32",
+  ctx->f32, args, 2,
+  AC_FUNC_ATTR_READNONE |
+  AC_FUNC_ATTR_LEGACY);
+   }
+   if (num_channels == 1)
+   return result[0];
+
+   if (num_channels == 3)
+   result[num_channels++] = LLVMGetUndef(ctx->f32);
+   return ac_build_gather_values(ctx, result, num_channels);
+   }
+
unsigned func = CLAMP(num_channels, 1, 3) - 1;

LLVMValueRef args[] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
-   LLVMConstInt(ctx->i32, inst_offset, 0),
+   offset,
LLVMConstInt(ctx->i1, glc, 0),
LLVMConstInt(ctx->i1, slc, 0)
};

LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2),
   ctx->v4f32};
const char *type_names[] = {"f32", "v2f32", "v4f32"};
char name[256];

-   if (voffset) {
-   args[2] = LLVMBuildAdd(ctx->builder, args[2], voffset,
-   "");
-   }
-
-   if (soffset) {
-   args[2] = LLVMBuildAdd(ctx->builder, args[2], soffset,
-   "");
-   }
-
snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
 type_names[func]);

return ac_build_intrinsic(ctx, name, types[func], args,
  ARRAY_SIZE(args),
  /* READNONE means writes can't affect it, 
while
   * READONLY means that writes can affect it. 
*/
  readonly_memory && HAVE_LLVM >= 0x0400 ?
  AC_FUNC_ATTR_READNONE :
  AC_FUNC_ATTR_READONLY);
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 0ecbc4a..754461b 100644
--- 

[Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load

2017-05-19 Thread Marek Olšák
From: Marek Olšák 

---
 src/amd/common/ac_llvm_build.c   | 50 
 src/amd/common/ac_llvm_build.h   |  3 +-
 src/amd/common/ac_nir_to_llvm.c  |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c | 23 +++
 4 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 87a1fb7..4b048ba 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
 LLVMValueRef
 ac_build_buffer_load(struct ac_llvm_context *ctx,
 LLVMValueRef rsrc,
 int num_channels,
 LLVMValueRef vindex,
 LLVMValueRef voffset,
 LLVMValueRef soffset,
 unsigned inst_offset,
 unsigned glc,
 unsigned slc,
-bool readonly_memory)
+bool readonly_memory,
+bool coherent)
 {
+   LLVMValueRef offset = LLVMConstInt(ctx->i32, inst_offset, 0);
+   if (voffset)
+   offset = LLVMBuildAdd(ctx->builder, offset, voffset, "");
+   if (soffset)
+   offset = LLVMBuildAdd(ctx->builder, offset, soffset, "");
+
+   /* Loads from a shader buffer that has no stores in the same shader
+* and is non-coherent with other shader invocations can use SMEM.
+*/
+   if (readonly_memory && !coherent) {
+   assert(vindex == NULL);
+   assert(glc == 0);
+   assert(slc == 0);
+
+   LLVMValueRef result[4];
+
+   for (int i = 0; i < num_channels; i++) {
+   if (i) {
+   offset = LLVMBuildAdd(ctx->builder, offset,
+ LLVMConstInt(ctx->i32, 4, 
0), "");
+   }
+   LLVMValueRef args[2] = {rsrc, offset};
+   result[i] = ac_build_intrinsic(ctx, 
"llvm.SI.load.const.v4i32",
+  ctx->f32, args, 2,
+  AC_FUNC_ATTR_READNONE |
+  AC_FUNC_ATTR_LEGACY);
+   }
+   if (num_channels == 1)
+   return result[0];
+
+   if (num_channels == 3)
+   result[num_channels++] = LLVMGetUndef(ctx->f32);
+   return ac_build_gather_values(ctx, result, num_channels);
+   }
+
unsigned func = CLAMP(num_channels, 1, 3) - 1;
 
LLVMValueRef args[] = {
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
-   LLVMConstInt(ctx->i32, inst_offset, 0),
+   offset,
LLVMConstInt(ctx->i1, glc, 0),
LLVMConstInt(ctx->i1, slc, 0)
};
 
LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2),
   ctx->v4f32};
const char *type_names[] = {"f32", "v2f32", "v4f32"};
char name[256];
 
-   if (voffset) {
-   args[2] = LLVMBuildAdd(ctx->builder, args[2], voffset,
-   "");
-   }
-
-   if (soffset) {
-   args[2] = LLVMBuildAdd(ctx->builder, args[2], soffset,
-   "");
-   }
-
snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
 type_names[func]);
 
return ac_build_intrinsic(ctx, name, types[func], args,
  ARRAY_SIZE(args),
  /* READNONE means writes can't affect it, 
while
   * READONLY means that writes can affect it. 
*/
  readonly_memory && HAVE_LLVM >= 0x0400 ?
  AC_FUNC_ATTR_READNONE :
  AC_FUNC_ATTR_READONLY);
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 0ecbc4a..754461b 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -136,21 +136,22 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
 LLVMValueRef
 ac_build_buffer_load(struct ac_llvm_context *ctx,
 LLVMValueRef rsrc,
 int num_channels,
 LLVMValueRef vindex,
 LLVMValueRef voffset,
 LLVMValueRef soffset,
 unsigned inst_offset,
 unsigned glc,
 unsigned slc,
-bool readonly_memory);
+bool readonly_memory,
+bool coherent);
 
 LLVMValueRef