Re: [Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load
On 22.05.2017 12:18, Marek Olšák wrote: On Mon, May 22, 2017 at 8:44 AM, Nicolai Hähnlewrote: On 19.05.2017 17:54, Marek Olšák wrote: From: Marek Olšák --- src/amd/common/ac_llvm_build.c | 50 src/amd/common/ac_llvm_build.h | 3 +- src/amd/common/ac_nir_to_llvm.c | 2 +- src/gallium/drivers/radeonsi/si_shader.c | 23 +++ 4 files changed, 51 insertions(+), 27 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 87a1fb7..4b048ba 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, LLVMValueRef ac_build_buffer_load(struct ac_llvm_context *ctx, LLVMValueRef rsrc, int num_channels, LLVMValueRef vindex, LLVMValueRef voffset, LLVMValueRef soffset, unsigned inst_offset, unsigned glc, unsigned slc, -bool readonly_memory) +bool readonly_memory, +bool coherent) I think the series is basically fine, but the use of "coherent" here is potentially confusing because "coherent" is already used as the memory qualifier in shaders, and it sets glc. Are there any buffer loads with readonly_memory && !glc && !vindex && !voffset where we aren't allowed to use SMEM? If possible, we could just use that to control whether llvm.SI.load.const is used. I seem to recall that there were some subtle differences in bounds checking, and perhaps those prevent the use of SMEM sometimes. But then I'd rather use a more explicit parameter name like smem_forbidden or smem_allowed. readonly_memory is now slightly misleading. It means that there are no stores/atomics to the same memory in the current shader, and it enforces READNONE. You need both non-coherent (stores in other shader invocations, if any, are not coherent with the current shader) and readonly_memory (no stores in the current shader touch the memory). Note that stores in the current shader are always coherent, that's why there must be none. Also, relying on GLC would be wrong, because GLC is also used to work around hw bugs, so GLC=1 can be set even when it's OK to use SMEM. Okay, that answers my question. Please add this explanation about glc=1, coherent=false to the comment in the function. For the series: Reviewed-by: Nicolai Hähnle Marek -- Lerne, wie die Welt wirklich ist, Aber vergiss niemals, wie sie sein sollte. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load
On Mon, May 22, 2017 at 8:44 AM, Nicolai Hähnlewrote: > On 19.05.2017 17:54, Marek Olšák wrote: >> >> From: Marek Olšák >> >> --- >> src/amd/common/ac_llvm_build.c | 50 >> >> src/amd/common/ac_llvm_build.h | 3 +- >> src/amd/common/ac_nir_to_llvm.c | 2 +- >> src/gallium/drivers/radeonsi/si_shader.c | 23 +++ >> 4 files changed, 51 insertions(+), 27 deletions(-) >> >> diff --git a/src/amd/common/ac_llvm_build.c >> b/src/amd/common/ac_llvm_build.c >> index 87a1fb7..4b048ba 100644 >> --- a/src/amd/common/ac_llvm_build.c >> +++ b/src/amd/common/ac_llvm_build.c >> @@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context >> *ctx, >> LLVMValueRef >> ac_build_buffer_load(struct ac_llvm_context *ctx, >> LLVMValueRef rsrc, >> int num_channels, >> LLVMValueRef vindex, >> LLVMValueRef voffset, >> LLVMValueRef soffset, >> unsigned inst_offset, >> unsigned glc, >> unsigned slc, >> -bool readonly_memory) >> +bool readonly_memory, >> +bool coherent) > > > I think the series is basically fine, but the use of "coherent" here is > potentially confusing because "coherent" is already used as the memory > qualifier in shaders, and it sets glc. > > Are there any buffer loads with readonly_memory && !glc && !vindex && > !voffset where we aren't allowed to use SMEM? If possible, we could just use > that to control whether llvm.SI.load.const is used. > > I seem to recall that there were some subtle differences in bounds checking, > and perhaps those prevent the use of SMEM sometimes. But then I'd rather use > a more explicit parameter name like smem_forbidden or smem_allowed. readonly_memory is now slightly misleading. It means that there are no stores/atomics to the same memory in the current shader, and it enforces READNONE. You need both non-coherent (stores in other shader invocations, if any, are not coherent with the current shader) and readonly_memory (no stores in the current shader touch the memory). Note that stores in the current shader are always coherent, that's why there must be none. Also, relying on GLC would be wrong, because GLC is also used to work around hw bugs, so GLC=1 can be set even when it's OK to use SMEM. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load
On 19.05.2017 17:54, Marek Olšák wrote: From: Marek Olšák--- src/amd/common/ac_llvm_build.c | 50 src/amd/common/ac_llvm_build.h | 3 +- src/amd/common/ac_nir_to_llvm.c | 2 +- src/gallium/drivers/radeonsi/si_shader.c | 23 +++ 4 files changed, 51 insertions(+), 27 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 87a1fb7..4b048ba 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, LLVMValueRef ac_build_buffer_load(struct ac_llvm_context *ctx, LLVMValueRef rsrc, int num_channels, LLVMValueRef vindex, LLVMValueRef voffset, LLVMValueRef soffset, unsigned inst_offset, unsigned glc, unsigned slc, -bool readonly_memory) +bool readonly_memory, +bool coherent) I think the series is basically fine, but the use of "coherent" here is potentially confusing because "coherent" is already used as the memory qualifier in shaders, and it sets glc. Are there any buffer loads with readonly_memory && !glc && !vindex && !voffset where we aren't allowed to use SMEM? If possible, we could just use that to control whether llvm.SI.load.const is used. I seem to recall that there were some subtle differences in bounds checking, and perhaps those prevent the use of SMEM sometimes. But then I'd rather use a more explicit parameter name like smem_forbidden or smem_allowed. Cheers, Nicolai { + LLVMValueRef offset = LLVMConstInt(ctx->i32, inst_offset, 0); + if (voffset) + offset = LLVMBuildAdd(ctx->builder, offset, voffset, ""); + if (soffset) + offset = LLVMBuildAdd(ctx->builder, offset, soffset, ""); + + /* Loads from a shader buffer that has no stores in the same shader +* and is non-coherent with other shader invocations can use SMEM. +*/ + if (readonly_memory && !coherent) { + assert(vindex == NULL); + assert(glc == 0); + assert(slc == 0); + + LLVMValueRef result[4]; + + for (int i = 0; i < num_channels; i++) { + if (i) { + offset = LLVMBuildAdd(ctx->builder, offset, + LLVMConstInt(ctx->i32, 4, 0), ""); + } + LLVMValueRef args[2] = {rsrc, offset}; + result[i] = ac_build_intrinsic(ctx, "llvm.SI.load.const.v4i32", + ctx->f32, args, 2, + AC_FUNC_ATTR_READNONE | + AC_FUNC_ATTR_LEGACY); + } + if (num_channels == 1) + return result[0]; + + if (num_channels == 3) + result[num_channels++] = LLVMGetUndef(ctx->f32); + return ac_build_gather_values(ctx, result, num_channels); + } + unsigned func = CLAMP(num_channels, 1, 3) - 1; LLVMValueRef args[] = { LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0), - LLVMConstInt(ctx->i32, inst_offset, 0), + offset, LLVMConstInt(ctx->i1, glc, 0), LLVMConstInt(ctx->i1, slc, 0) }; LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2), ctx->v4f32}; const char *type_names[] = {"f32", "v2f32", "v4f32"}; char name[256]; - if (voffset) { - args[2] = LLVMBuildAdd(ctx->builder, args[2], voffset, - ""); - } - - if (soffset) { - args[2] = LLVMBuildAdd(ctx->builder, args[2], soffset, - ""); - } - snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s", type_names[func]); return ac_build_intrinsic(ctx, name, types[func], args, ARRAY_SIZE(args), /* READNONE means writes can't affect it, while * READONLY means that writes can affect it. */ readonly_memory && HAVE_LLVM >= 0x0400 ? AC_FUNC_ATTR_READNONE : AC_FUNC_ATTR_READONLY); diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h index 0ecbc4a..754461b 100644 ---
[Mesa-dev] [PATCH 1/3] radeonsi: move building llvm.SI.load.const into ac_build_buffer_load
From: Marek Olšák--- src/amd/common/ac_llvm_build.c | 50 src/amd/common/ac_llvm_build.h | 3 +- src/amd/common/ac_nir_to_llvm.c | 2 +- src/gallium/drivers/radeonsi/si_shader.c | 23 +++ 4 files changed, 51 insertions(+), 27 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 87a1fb7..4b048ba 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -626,47 +626,73 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, LLVMValueRef ac_build_buffer_load(struct ac_llvm_context *ctx, LLVMValueRef rsrc, int num_channels, LLVMValueRef vindex, LLVMValueRef voffset, LLVMValueRef soffset, unsigned inst_offset, unsigned glc, unsigned slc, -bool readonly_memory) +bool readonly_memory, +bool coherent) { + LLVMValueRef offset = LLVMConstInt(ctx->i32, inst_offset, 0); + if (voffset) + offset = LLVMBuildAdd(ctx->builder, offset, voffset, ""); + if (soffset) + offset = LLVMBuildAdd(ctx->builder, offset, soffset, ""); + + /* Loads from a shader buffer that has no stores in the same shader +* and is non-coherent with other shader invocations can use SMEM. +*/ + if (readonly_memory && !coherent) { + assert(vindex == NULL); + assert(glc == 0); + assert(slc == 0); + + LLVMValueRef result[4]; + + for (int i = 0; i < num_channels; i++) { + if (i) { + offset = LLVMBuildAdd(ctx->builder, offset, + LLVMConstInt(ctx->i32, 4, 0), ""); + } + LLVMValueRef args[2] = {rsrc, offset}; + result[i] = ac_build_intrinsic(ctx, "llvm.SI.load.const.v4i32", + ctx->f32, args, 2, + AC_FUNC_ATTR_READNONE | + AC_FUNC_ATTR_LEGACY); + } + if (num_channels == 1) + return result[0]; + + if (num_channels == 3) + result[num_channels++] = LLVMGetUndef(ctx->f32); + return ac_build_gather_values(ctx, result, num_channels); + } + unsigned func = CLAMP(num_channels, 1, 3) - 1; LLVMValueRef args[] = { LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0), - LLVMConstInt(ctx->i32, inst_offset, 0), + offset, LLVMConstInt(ctx->i1, glc, 0), LLVMConstInt(ctx->i1, slc, 0) }; LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2), ctx->v4f32}; const char *type_names[] = {"f32", "v2f32", "v4f32"}; char name[256]; - if (voffset) { - args[2] = LLVMBuildAdd(ctx->builder, args[2], voffset, - ""); - } - - if (soffset) { - args[2] = LLVMBuildAdd(ctx->builder, args[2], soffset, - ""); - } - snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s", type_names[func]); return ac_build_intrinsic(ctx, name, types[func], args, ARRAY_SIZE(args), /* READNONE means writes can't affect it, while * READONLY means that writes can affect it. */ readonly_memory && HAVE_LLVM >= 0x0400 ? AC_FUNC_ATTR_READNONE : AC_FUNC_ATTR_READONLY); diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h index 0ecbc4a..754461b 100644 --- a/src/amd/common/ac_llvm_build.h +++ b/src/amd/common/ac_llvm_build.h @@ -136,21 +136,22 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, LLVMValueRef ac_build_buffer_load(struct ac_llvm_context *ctx, LLVMValueRef rsrc, int num_channels, LLVMValueRef vindex, LLVMValueRef voffset, LLVMValueRef soffset, unsigned inst_offset, unsigned glc, unsigned slc, -bool readonly_memory); +bool readonly_memory, +bool coherent); LLVMValueRef