[PATCH] D36171: AMDGPU: Use direct struct returns
arsenm closed this revision. arsenm added a comment. r310527 https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
yaxunl accepted this revision. yaxunl added a comment. This revision is now accepted and ready to land. LGTM. Thanks! https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
arsenm added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7571 + + // XXX: Should this be i64 instead, and should the limit increase? + llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext()); b-sumner wrote: > arsenm wrote: > > b-sumner wrote: > > > What we do here depends on NumRegsLeft when the block is entered and > > > NumRegs. If NumRegsLeft >= NumRegs then we just need 2 adjacent > > > registers. If NumRegsLeft == 1 and NumRegs == 2, then do we pass the low > > > half in a register and the upper half in memory, or all of it in memory? > > > Anyway, I think NumRegsLeft shouldn't be updated until we know it's OK, > > > and then we don't need the min(). > > It's all one or the other. Whether it's passed in memory or not is really > > determined in codegen based on the actual register limit (which is also > > higher than the 16 used here, at least for now). Here selects whether to > > use byval or not. The ABI is slightly different whether it's passed as > > byval or as too many registers. I'm not sure it ever really makes sense to > > use byval yet, so I wasn't trying to be very precise here. > Thanks. Just one more question. If we use memory for an argument, are all > following arguments required to use memory? In that case, the min() is > correct. But if a following argument could use a register, then the amount > to subtract is NumRegs <= NumRegsLeft ? NumRegs : 0. For what this does now, any large aggregates after NumRegsLeft == 0 will use byval. Simple types like int or small structs will still be directly passed arguments. https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
b-sumner added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7571 + + // XXX: Should this be i64 instead, and should the limit increase? + llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext()); arsenm wrote: > b-sumner wrote: > > What we do here depends on NumRegsLeft when the block is entered and > > NumRegs. If NumRegsLeft >= NumRegs then we just need 2 adjacent registers. > > If NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a > > register and the upper half in memory, or all of it in memory? Anyway, I > > think NumRegsLeft shouldn't be updated until we know it's OK, and then we > > don't need the min(). > It's all one or the other. Whether it's passed in memory or not is really > determined in codegen based on the actual register limit (which is also > higher than the 16 used here, at least for now). Here selects whether to use > byval or not. The ABI is slightly different whether it's passed as byval or > as too many registers. I'm not sure it ever really makes sense to use byval > yet, so I wasn't trying to be very precise here. Thanks. Just one more question. If we use memory for an argument, are all following arguments required to use memory? In that case, the min() is correct. But if a following argument could use a register, then the amount to subtract is NumRegs <= NumRegsLeft ? NumRegs : 0. https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
arsenm added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7386 + bool isHomogeneousAggregateBaseType(QualType Ty) const override; + bool isHomogeneousAggregateSmallEnough(const Type *Base, + uint64_t Members) const override; yaxunl wrote: > arsenm wrote: > > yaxunl wrote: > > > Please add descriptions for the above newly added functions. > > I prefer not to put descriptions on overrides since they will just be out > > of date with the declaration > Please add descriptions for the non-override functions and data members above. I've added them to the body Comment at: lib/CodeGen/TargetInfo.cpp:7571 + + // XXX: Should this be i64 instead, and should the limit increase? + llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext()); b-sumner wrote: > What we do here depends on NumRegsLeft when the block is entered and NumRegs. > If NumRegsLeft >= NumRegs then we just need 2 adjacent registers. If > NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a register > and the upper half in memory, or all of it in memory? Anyway, I think > NumRegsLeft shouldn't be updated until we know it's OK, and then we don't > need the min(). It's all one or the other. Whether it's passed in memory or not is really determined in codegen based on the actual register limit (which is also higher than the 16 used here, at least for now). Here selects whether to use byval or not. The ABI is slightly different whether it's passed as byval or as too many registers. I'm not sure it ever really makes sense to use byval yet, so I wasn't trying to be very precise here. https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
b-sumner added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7571 + + // XXX: Should this be i64 instead, and should the limit increase? + llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext()); What we do here depends on NumRegsLeft when the block is entered and NumRegs. If NumRegsLeft >= NumRegs then we just need 2 adjacent registers. If NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a register and the upper half in memory, or all of it in memory? Anyway, I think NumRegsLeft shouldn't be updated until we know it's OK, and then we don't need the min(). https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
arsenm updated this revision to Diff 110272. arsenm added a comment. Fix assert when estimating array registers https://reviews.llvm.org/D36171 Files: lib/CodeGen/TargetInfo.cpp test/CodeGenOpenCL/addr-space-struct-arg.cl test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl test/CodeGenOpenCL/amdgpu-nullptr.cl Index: test/CodeGenOpenCL/amdgpu-nullptr.cl === --- test/CodeGenOpenCL/amdgpu-nullptr.cl +++ test/CodeGenOpenCL/amdgpu-nullptr.cl @@ -511,9 +511,9 @@ // CHECK-LABEL: test_memset_private // CHECK: call void @llvm.memset.p0i8.i64(i8* nonnull {{.*}}, i8 0, i64 40, i32 8, i1 false) -StructTy3 test_memset_private(void) { +void test_memset_private(private StructTy3 *ptr) { StructTy3 S3 = {0, 0, 0, 0, 0}; - return S3; + *ptr = S3; } // Test casting literal 0 to pointer. Index: test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl === --- test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl +++ test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl @@ -2,20 +2,52 @@ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -triple r600-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s -// CHECK-NOT: %struct.single_element_struct_arg = type { i32 } +typedef __attribute__(( ext_vector_type(2) )) char char2; +typedef __attribute__(( ext_vector_type(3) )) char char3; +typedef __attribute__(( ext_vector_type(4) )) char char4; + +typedef __attribute__(( ext_vector_type(2) )) short short2; +typedef __attribute__(( ext_vector_type(3) )) short short3; +typedef __attribute__(( ext_vector_type(4) )) short short4; + +typedef __attribute__(( ext_vector_type(2) )) int int2; +typedef __attribute__(( ext_vector_type(3) )) int int3; +typedef __attribute__(( ext_vector_type(4) )) int int4; +typedef __attribute__(( ext_vector_type(16) )) int int16; +typedef __attribute__(( ext_vector_type(32) )) int int32; + +// CHECK: %struct.empty_struct = type {} +typedef struct empty_struct +{ +} empty_struct; + +// CHECK-NOT: %struct.single_element_struct_arg typedef struct single_element_struct_arg { int i; } single_element_struct_arg_t; +// CHECK-NOT: %struct.nested_single_element_struct_arg +typedef struct nested_single_element_struct_arg +{ + single_element_struct_arg_t i; +} nested_single_element_struct_arg_t; + // CHECK: %struct.struct_arg = type { i32, float, i32 } typedef struct struct_arg { int i1; float f; int i2; } struct_arg_t; +// CHECK: %struct.struct_padding_arg = type { i8, i64 } +typedef struct struct_padding_arg +{ + char i1; + long f; +} struct_padding_arg; + // CHECK: %struct.struct_of_arrays_arg = type { [2 x i32], float, [4 x i32], [3 x float], i32 } typedef struct struct_of_arrays_arg { @@ -35,33 +67,457 @@ int i2; } struct_of_structs_arg_t; -// CHECK-LABEL: @test_single_element_struct_arg -// CHECK: i32 %arg1.coerce -__kernel void test_single_element_struct_arg(single_element_struct_arg_t arg1) +// CHECK: %union.transparent_u = type { i32 } +typedef union { + int b1; + float b2; +} transparent_u __attribute__((__transparent_union__)); + +// CHECK: %struct.single_array_element_struct_arg = type { [4 x i32] } +typedef struct single_array_element_struct_arg +{ +int i[4]; +} single_array_element_struct_arg_t; + +// CHECK: %struct.single_struct_element_struct_arg = type { %struct.inner } +// CHECK: %struct.inner = type { i32, i64 } +typedef struct single_struct_element_struct_arg +{ + struct inner { +int a; +long b; + } s; +} single_struct_element_struct_arg_t; + +// CHECK: %struct.different_size_type_pair +typedef struct different_size_type_pair { + long l; + int i; +} different_size_type_pair; + +// CHECK: %struct.flexible_array = type { i32, [0 x i32] } +typedef struct flexible_array +{ + int i; + int flexible[]; +} flexible_array; + +// CHECK: %struct.struct_arr16 = type { [16 x i32] } +typedef struct struct_arr16 +{ +int arr[16]; +} struct_arr16; + +// CHECK: %struct.struct_arr32 = type { [32 x i32] } +typedef struct struct_arr32 +{ +int arr[32]; +} struct_arr32; + +// CHECK: %struct.struct_arr33 = type { [33 x i32] } +typedef struct struct_arr33 +{ +int arr[33]; +} struct_arr33; + +// CHECK: %struct.struct_char_arr32 = type { [32 x i8] } +typedef struct struct_char_arr32 +{ + char arr[32]; +} struct_char_arr32; + +// CHECK-NOT: %struct.struct_char_x8 +typedef struct struct_char_x8 { + char x, y, z, w; + char a, b, c, d; +} struct_char_x8; + +// CHECK-NOT: %struct.struct_char_x4 +typedef struct struct_char_x4 { + char x, y, z, w; +} struct_char_x4; + +// CHECK-NOT: %struct.struct_char_x3 +typedef struct struct_char_x3 { + char x, y, z; +} struct_char_x3; + +// CHECK-NOT: %struct.struct_char_x2 +typedef struct struct_char_x2 { + char x, y; +} struct_char_x2; + +// CHECK-NOT: %struct.struct_char_x1 +typedef struct struct_char_x1 { + char x; +}
[PATCH] D36171: AMDGPU: Use direct struct returns
yaxunl added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7386 + bool isHomogeneousAggregateBaseType(QualType Ty) const override; + bool isHomogeneousAggregateSmallEnough(const Type *Base, + uint64_t Members) const override; arsenm wrote: > yaxunl wrote: > > Please add descriptions for the above newly added functions. > I prefer not to put descriptions on overrides since they will just be out of > date with the declaration Please add descriptions for the non-override functions and data members above. https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
arsenm added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7386 + bool isHomogeneousAggregateBaseType(QualType Ty) const override; + bool isHomogeneousAggregateSmallEnough(const Type *Base, + uint64_t Members) const override; yaxunl wrote: > Please add descriptions for the above newly added functions. I prefer not to put descriptions on overrides since they will just be out of date with the declaration Comment at: lib/CodeGen/TargetInfo.cpp:7401 +bool AMDGPUABIInfo::isHomogeneousAggregateBaseType(QualType Ty) const { + return true; +} yaxunl wrote: > why do we need this function if it always return true The default is return false https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
yaxunl added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7386 + bool isHomogeneousAggregateBaseType(QualType Ty) const override; + bool isHomogeneousAggregateSmallEnough(const Type *Base, + uint64_t Members) const override; Please add descriptions for the above newly added functions. Comment at: lib/CodeGen/TargetInfo.cpp:7401 +bool AMDGPUABIInfo::isHomogeneousAggregateBaseType(QualType Ty) const { + return true; +} why do we need this function if it always return true https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
b-sumner added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7555 + if (NumRegsLeft > 0) +NumRegsLeft -= (Size + 31) / 32; + Won't NumRegsLeft wrap if size==64 and NumRegsLeft == 1 potentially causing an assert later? https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
arsenm added a comment. ping https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36171: AMDGPU: Use direct struct returns
arsenm created this revision. Herald added subscribers: t-tye, tpr, dstuttard, nhaehnle, wdng, kzhuravl. This is an improvement over always using byval for structs. This will use registers until ~16 are used, and then switch back to byval. This needs more work, since I'm not sure it ever really makes sense to use byval. If the register limit is exceeded, the arguments still end up passed on the stack, but with a different ABI. It also may make sense to base this on number of registers used for non-struct arguments, rather than just arguments that appear first in the argument list. https://reviews.llvm.org/D36171 Files: lib/CodeGen/TargetInfo.cpp test/CodeGenOpenCL/addr-space-struct-arg.cl test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl test/CodeGenOpenCL/amdgpu-nullptr.cl Index: test/CodeGenOpenCL/amdgpu-nullptr.cl === --- test/CodeGenOpenCL/amdgpu-nullptr.cl +++ test/CodeGenOpenCL/amdgpu-nullptr.cl @@ -511,9 +511,9 @@ // CHECK-LABEL: test_memset_private // CHECK: call void @llvm.memset.p0i8.i64(i8* nonnull {{.*}}, i8 0, i64 40, i32 8, i1 false) -StructTy3 test_memset_private(void) { +void test_memset_private(private StructTy3 *ptr) { StructTy3 S3 = {0, 0, 0, 0, 0}; - return S3; + *ptr = S3; } // Test casting literal 0 to pointer. Index: test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl === --- test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl +++ test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl @@ -2,20 +2,52 @@ // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s // RUN: %clang_cc1 -triple r600-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s -// CHECK-NOT: %struct.single_element_struct_arg = type { i32 } +typedef __attribute__(( ext_vector_type(2) )) char char2; +typedef __attribute__(( ext_vector_type(3) )) char char3; +typedef __attribute__(( ext_vector_type(4) )) char char4; + +typedef __attribute__(( ext_vector_type(2) )) short short2; +typedef __attribute__(( ext_vector_type(3) )) short short3; +typedef __attribute__(( ext_vector_type(4) )) short short4; + +typedef __attribute__(( ext_vector_type(2) )) int int2; +typedef __attribute__(( ext_vector_type(3) )) int int3; +typedef __attribute__(( ext_vector_type(4) )) int int4; +typedef __attribute__(( ext_vector_type(16) )) int int16; +typedef __attribute__(( ext_vector_type(32) )) int int32; + +// CHECK: %struct.empty_struct = type {} +typedef struct empty_struct +{ +} empty_struct; + +// CHECK-NOT: %struct.single_element_struct_arg typedef struct single_element_struct_arg { int i; } single_element_struct_arg_t; +// CHECK-NOT: %struct.nested_single_element_struct_arg +typedef struct nested_single_element_struct_arg +{ + single_element_struct_arg_t i; +} nested_single_element_struct_arg_t; + // CHECK: %struct.struct_arg = type { i32, float, i32 } typedef struct struct_arg { int i1; float f; int i2; } struct_arg_t; +// CHECK: %struct.struct_padding_arg = type { i8, i64 } +typedef struct struct_padding_arg +{ + char i1; + long f; +} struct_padding_arg; + // CHECK: %struct.struct_of_arrays_arg = type { [2 x i32], float, [4 x i32], [3 x float], i32 } typedef struct struct_of_arrays_arg { @@ -35,33 +67,454 @@ int i2; } struct_of_structs_arg_t; -// CHECK-LABEL: @test_single_element_struct_arg -// CHECK: i32 %arg1.coerce -__kernel void test_single_element_struct_arg(single_element_struct_arg_t arg1) +// CHECK: %union.transparent_u = type { i32 } +typedef union +{ + int b1; + float b2; +} transparent_u __attribute__((__transparent_union__)); + +// CHECK: %struct.single_array_element_struct_arg = type { [4 x i32] } +typedef struct single_array_element_struct_arg +{ +int i[4]; +} single_array_element_struct_arg_t; + +// CHECK: %struct.single_struct_element_struct_arg = type { %struct.inner } +// CHECK: %struct.inner = type { i32, i64 } +typedef struct single_struct_element_struct_arg +{ + struct inner { +int a; +long b; + } s; +} single_struct_element_struct_arg_t; + +// CHECK: %struct.different_size_type_pair +typedef struct different_size_type_pair { + long l; + int i; +} different_size_type_pair; + +// CHECK: %struct.flexible_array = type { i32, [0 x i32] } +typedef struct flexible_array +{ + int i; + int flexible[]; +} flexible_array; + +// CHECK: %struct.struct_arr16 = type { [16 x i32] } +typedef struct struct_arr16 +{ +int arr[16]; +} struct_arr16; + +// CHECK: %struct.struct_arr32 = type { [32 x i32] } +typedef struct struct_arr32 +{ +int arr[32]; +} struct_arr32; + +// CHECK: %struct.struct_arr33 = type { [33 x i32] } +typedef struct struct_arr33 +{ +int arr[33]; +} struct_arr33; + +// CHECK: %struct.struct_char_arr32 = type { [32 x i8] } +typedef struct struct_char_arr32 +{ + char arr[32]; +} struct_char_arr32; + +// CHECK-NOT: %struct.struct_char_x8 +typedef struct struct_char_x8