[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision.
arsenm added a comment.

r310527


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-09 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl accepted this revision.
yaxunl added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks!


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7571
+
+  // XXX: Should this be i64 instead, and should the limit increase?
+  llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext());

b-sumner wrote:
> arsenm wrote:
> > b-sumner wrote:
> > > What we do here depends on NumRegsLeft when the block is entered and 
> > > NumRegs.  If NumRegsLeft >= NumRegs then we just need 2 adjacent 
> > > registers.  If NumRegsLeft == 1 and NumRegs == 2, then do we pass the low 
> > > half in a register and the upper half in memory, or all of it in memory?  
> > > Anyway, I think NumRegsLeft shouldn't be updated until we know it's OK, 
> > > and then we don't need the min().
> > It's all one or the other. Whether it's passed in memory or not is really 
> > determined in codegen based on the actual register limit (which is also 
> > higher than the 16 used here, at least for now). Here selects whether to 
> > use byval or not. The ABI is slightly different whether it's passed as 
> > byval or as too many registers. I'm not sure it ever really makes sense to 
> > use byval yet, so I wasn't trying to be very precise here.
> Thanks.  Just one more question.  If we use memory for an argument, are all 
> following arguments required to use memory?  In that case, the min() is 
> correct.  But if a following argument could use a register, then the amount 
> to subtract is NumRegs <= NumRegsLeft ? NumRegs : 0.
For what this does now, any large aggregates after NumRegsLeft == 0 will use 
byval. Simple types like int or small structs will still be directly passed 
arguments.


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Brian Sumner via Phabricator via cfe-commits
b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7571
+
+  // XXX: Should this be i64 instead, and should the limit increase?
+  llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext());

arsenm wrote:
> b-sumner wrote:
> > What we do here depends on NumRegsLeft when the block is entered and 
> > NumRegs.  If NumRegsLeft >= NumRegs then we just need 2 adjacent registers. 
> >  If NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a 
> > register and the upper half in memory, or all of it in memory?  Anyway, I 
> > think NumRegsLeft shouldn't be updated until we know it's OK, and then we 
> > don't need the min().
> It's all one or the other. Whether it's passed in memory or not is really 
> determined in codegen based on the actual register limit (which is also 
> higher than the 16 used here, at least for now). Here selects whether to use 
> byval or not. The ABI is slightly different whether it's passed as byval or 
> as too many registers. I'm not sure it ever really makes sense to use byval 
> yet, so I wasn't trying to be very precise here.
Thanks.  Just one more question.  If we use memory for an argument, are all 
following arguments required to use memory?  In that case, the min() is 
correct.  But if a following argument could use a register, then the amount to 
subtract is NumRegs <= NumRegsLeft ? NumRegs : 0.


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7386
+  bool isHomogeneousAggregateBaseType(QualType Ty) const override;
+  bool isHomogeneousAggregateSmallEnough(const Type *Base,
+ uint64_t Members) const override;

yaxunl wrote:
> arsenm wrote:
> > yaxunl wrote:
> > > Please add descriptions for the above newly added functions.
> > I prefer not to put descriptions on overrides since they will just be out 
> > of date with the declaration
> Please add descriptions for the non-override functions and data members above.
I've added them to the body



Comment at: lib/CodeGen/TargetInfo.cpp:7571
+
+  // XXX: Should this be i64 instead, and should the limit increase?
+  llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext());

b-sumner wrote:
> What we do here depends on NumRegsLeft when the block is entered and NumRegs. 
>  If NumRegsLeft >= NumRegs then we just need 2 adjacent registers.  If 
> NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a register 
> and the upper half in memory, or all of it in memory?  Anyway, I think 
> NumRegsLeft shouldn't be updated until we know it's OK, and then we don't 
> need the min().
It's all one or the other. Whether it's passed in memory or not is really 
determined in codegen based on the actual register limit (which is also higher 
than the 16 used here, at least for now). Here selects whether to use byval or 
not. The ABI is slightly different whether it's passed as byval or as too many 
registers. I'm not sure it ever really makes sense to use byval yet, so I 
wasn't trying to be very precise here.


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Brian Sumner via Phabricator via cfe-commits
b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7571
+
+  // XXX: Should this be i64 instead, and should the limit increase?
+  llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext());

What we do here depends on NumRegsLeft when the block is entered and NumRegs.  
If NumRegsLeft >= NumRegs then we just need 2 adjacent registers.  If 
NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a register 
and the upper half in memory, or all of it in memory?  Anyway, I think 
NumRegsLeft shouldn't be updated until we know it's OK, and then we don't need 
the min().


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 110272.
arsenm added a comment.

Fix assert when estimating array registers


https://reviews.llvm.org/D36171

Files:
  lib/CodeGen/TargetInfo.cpp
  test/CodeGenOpenCL/addr-space-struct-arg.cl
  test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
  test/CodeGenOpenCL/amdgpu-nullptr.cl

Index: test/CodeGenOpenCL/amdgpu-nullptr.cl
===
--- test/CodeGenOpenCL/amdgpu-nullptr.cl
+++ test/CodeGenOpenCL/amdgpu-nullptr.cl
@@ -511,9 +511,9 @@
 
 // CHECK-LABEL: test_memset_private
 // CHECK: call void @llvm.memset.p0i8.i64(i8* nonnull {{.*}}, i8 0, i64 40, i32 8, i1 false)
-StructTy3 test_memset_private(void) {
+void test_memset_private(private StructTy3 *ptr) {
   StructTy3 S3 = {0, 0, 0, 0, 0};
-  return S3;
+  *ptr = S3;
 }
 
 // Test casting literal 0 to pointer.
Index: test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
===
--- test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
+++ test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
@@ -2,20 +2,52 @@
 // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -triple r600-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s
 
-// CHECK-NOT: %struct.single_element_struct_arg = type { i32 }
+typedef __attribute__(( ext_vector_type(2) )) char char2;
+typedef __attribute__(( ext_vector_type(3) )) char char3;
+typedef __attribute__(( ext_vector_type(4) )) char char4;
+
+typedef __attribute__(( ext_vector_type(2) )) short short2;
+typedef __attribute__(( ext_vector_type(3) )) short short3;
+typedef __attribute__(( ext_vector_type(4) )) short short4;
+
+typedef __attribute__(( ext_vector_type(2) )) int int2;
+typedef __attribute__(( ext_vector_type(3) )) int int3;
+typedef __attribute__(( ext_vector_type(4) )) int int4;
+typedef __attribute__(( ext_vector_type(16) )) int int16;
+typedef __attribute__(( ext_vector_type(32) )) int int32;
+
+// CHECK: %struct.empty_struct = type {}
+typedef struct empty_struct
+{
+} empty_struct;
+
+// CHECK-NOT: %struct.single_element_struct_arg
 typedef struct single_element_struct_arg
 {
 int i;
 } single_element_struct_arg_t;
 
+// CHECK-NOT: %struct.nested_single_element_struct_arg
+typedef struct nested_single_element_struct_arg
+{
+  single_element_struct_arg_t i;
+} nested_single_element_struct_arg_t;
+
 // CHECK: %struct.struct_arg = type { i32, float, i32 }
 typedef struct struct_arg
 {
 int i1;
 float f;
 int i2;
 } struct_arg_t;
 
+// CHECK: %struct.struct_padding_arg = type { i8, i64 }
+typedef struct struct_padding_arg
+{
+  char i1;
+  long f;
+} struct_padding_arg;
+
 // CHECK: %struct.struct_of_arrays_arg = type { [2 x i32], float, [4 x i32], [3 x float], i32 }
 typedef struct struct_of_arrays_arg
 {
@@ -35,33 +67,457 @@
 int i2;
 } struct_of_structs_arg_t;
 
-// CHECK-LABEL: @test_single_element_struct_arg
-// CHECK: i32 %arg1.coerce
-__kernel void test_single_element_struct_arg(single_element_struct_arg_t arg1)
+// CHECK: %union.transparent_u = type { i32 }
+typedef union
 {
+  int b1;
+  float b2;
+} transparent_u __attribute__((__transparent_union__));
+
+// CHECK: %struct.single_array_element_struct_arg = type { [4 x i32] }
+typedef struct single_array_element_struct_arg
+{
+int i[4];
+} single_array_element_struct_arg_t;
+
+// CHECK: %struct.single_struct_element_struct_arg = type { %struct.inner }
+// CHECK: %struct.inner = type { i32, i64 }
+typedef struct single_struct_element_struct_arg
+{
+  struct inner {
+int a;
+long b;
+  } s;
+} single_struct_element_struct_arg_t;
+
+// CHECK: %struct.different_size_type_pair
+typedef struct different_size_type_pair {
+  long l;
+  int i;
+} different_size_type_pair;
+
+// CHECK: %struct.flexible_array = type { i32, [0 x i32] }
+typedef struct flexible_array
+{
+  int i;
+  int flexible[];
+} flexible_array;
+
+// CHECK: %struct.struct_arr16 = type { [16 x i32] }
+typedef struct struct_arr16
+{
+int arr[16];
+} struct_arr16;
+
+// CHECK: %struct.struct_arr32 = type { [32 x i32] }
+typedef struct struct_arr32
+{
+int arr[32];
+} struct_arr32;
+
+// CHECK: %struct.struct_arr33 = type { [33 x i32] }
+typedef struct struct_arr33
+{
+int arr[33];
+} struct_arr33;
+
+// CHECK: %struct.struct_char_arr32 = type { [32 x i8] }
+typedef struct struct_char_arr32
+{
+  char arr[32];
+} struct_char_arr32;
+
+// CHECK-NOT: %struct.struct_char_x8
+typedef struct struct_char_x8 {
+  char x, y, z, w;
+  char a, b, c, d;
+} struct_char_x8;
+
+// CHECK-NOT: %struct.struct_char_x4
+typedef struct struct_char_x4 {
+  char x, y, z, w;
+} struct_char_x4;
+
+// CHECK-NOT: %struct.struct_char_x3
+typedef struct struct_char_x3 {
+  char x, y, z;
+} struct_char_x3;
+
+// CHECK-NOT: %struct.struct_char_x2
+typedef struct struct_char_x2 {
+  char x, y;
+} struct_char_x2;
+
+// CHECK-NOT: %struct.struct_char_x1
+typedef struct struct_char_x1 {
+  char x;
+} 

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7386
+  bool isHomogeneousAggregateBaseType(QualType Ty) const override;
+  bool isHomogeneousAggregateSmallEnough(const Type *Base,
+ uint64_t Members) const override;

arsenm wrote:
> yaxunl wrote:
> > Please add descriptions for the above newly added functions.
> I prefer not to put descriptions on overrides since they will just be out of 
> date with the declaration
Please add descriptions for the non-override functions and data members above.


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7386
+  bool isHomogeneousAggregateBaseType(QualType Ty) const override;
+  bool isHomogeneousAggregateSmallEnough(const Type *Base,
+ uint64_t Members) const override;

yaxunl wrote:
> Please add descriptions for the above newly added functions.
I prefer not to put descriptions on overrides since they will just be out of 
date with the declaration



Comment at: lib/CodeGen/TargetInfo.cpp:7401
+bool AMDGPUABIInfo::isHomogeneousAggregateBaseType(QualType Ty) const {
+  return true;
+}

yaxunl wrote:
> why do we need this function if it always return true
The default is return false


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7386
+  bool isHomogeneousAggregateBaseType(QualType Ty) const override;
+  bool isHomogeneousAggregateSmallEnough(const Type *Base,
+ uint64_t Members) const override;

Please add descriptions for the above newly added functions.



Comment at: lib/CodeGen/TargetInfo.cpp:7401
+bool AMDGPUABIInfo::isHomogeneousAggregateBaseType(QualType Ty) const {
+  return true;
+}

why do we need this function if it always return true


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-07 Thread Brian Sumner via Phabricator via cfe-commits
b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7555
+  if (NumRegsLeft > 0)
+NumRegsLeft -= (Size + 31) / 32;
+

Won't NumRegsLeft wrap if size==64 and NumRegsLeft == 1 potentially causing an 
assert later?


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-07 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

ping


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision.
Herald added subscribers: t-tye, tpr, dstuttard, nhaehnle, wdng, kzhuravl.

This is an improvement over always using byval for
structs.

  

This will use registers until ~16 are used, and then
switch back to byval. This needs more work, since I'm
not sure it ever really makes sense to use byval. If
the register limit is exceeded, the arguments still
end up passed on the stack, but with a different ABI.
It also may make sense to base this on number of
registers used for non-struct arguments, rather than
just arguments that appear first in the argument list.


https://reviews.llvm.org/D36171

Files:
  lib/CodeGen/TargetInfo.cpp
  test/CodeGenOpenCL/addr-space-struct-arg.cl
  test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
  test/CodeGenOpenCL/amdgpu-nullptr.cl

Index: test/CodeGenOpenCL/amdgpu-nullptr.cl
===
--- test/CodeGenOpenCL/amdgpu-nullptr.cl
+++ test/CodeGenOpenCL/amdgpu-nullptr.cl
@@ -511,9 +511,9 @@
 
 // CHECK-LABEL: test_memset_private
 // CHECK: call void @llvm.memset.p0i8.i64(i8* nonnull {{.*}}, i8 0, i64 40, i32 8, i1 false)
-StructTy3 test_memset_private(void) {
+void test_memset_private(private StructTy3 *ptr) {
   StructTy3 S3 = {0, 0, 0, 0, 0};
-  return S3;
+  *ptr = S3;
 }
 
 // Test casting literal 0 to pointer.
Index: test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
===
--- test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
+++ test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl
@@ -2,20 +2,52 @@
 // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -triple r600-unknown-unknown -S -emit-llvm -o - %s | FileCheck %s
 
-// CHECK-NOT: %struct.single_element_struct_arg = type { i32 }
+typedef __attribute__(( ext_vector_type(2) )) char char2;
+typedef __attribute__(( ext_vector_type(3) )) char char3;
+typedef __attribute__(( ext_vector_type(4) )) char char4;
+
+typedef __attribute__(( ext_vector_type(2) )) short short2;
+typedef __attribute__(( ext_vector_type(3) )) short short3;
+typedef __attribute__(( ext_vector_type(4) )) short short4;
+
+typedef __attribute__(( ext_vector_type(2) )) int int2;
+typedef __attribute__(( ext_vector_type(3) )) int int3;
+typedef __attribute__(( ext_vector_type(4) )) int int4;
+typedef __attribute__(( ext_vector_type(16) )) int int16;
+typedef __attribute__(( ext_vector_type(32) )) int int32;
+
+// CHECK: %struct.empty_struct = type {}
+typedef struct empty_struct
+{
+} empty_struct;
+
+// CHECK-NOT: %struct.single_element_struct_arg
 typedef struct single_element_struct_arg
 {
 int i;
 } single_element_struct_arg_t;
 
+// CHECK-NOT: %struct.nested_single_element_struct_arg
+typedef struct nested_single_element_struct_arg
+{
+  single_element_struct_arg_t i;
+} nested_single_element_struct_arg_t;
+
 // CHECK: %struct.struct_arg = type { i32, float, i32 }
 typedef struct struct_arg
 {
 int i1;
 float f;
 int i2;
 } struct_arg_t;
 
+// CHECK: %struct.struct_padding_arg = type { i8, i64 }
+typedef struct struct_padding_arg
+{
+  char i1;
+  long f;
+} struct_padding_arg;
+
 // CHECK: %struct.struct_of_arrays_arg = type { [2 x i32], float, [4 x i32], [3 x float], i32 }
 typedef struct struct_of_arrays_arg
 {
@@ -35,33 +67,454 @@
 int i2;
 } struct_of_structs_arg_t;
 
-// CHECK-LABEL: @test_single_element_struct_arg
-// CHECK: i32 %arg1.coerce
-__kernel void test_single_element_struct_arg(single_element_struct_arg_t arg1)
+// CHECK: %union.transparent_u = type { i32 }
+typedef union
+{
+  int b1;
+  float b2;
+} transparent_u __attribute__((__transparent_union__));
+
+// CHECK: %struct.single_array_element_struct_arg = type { [4 x i32] }
+typedef struct single_array_element_struct_arg
+{
+int i[4];
+} single_array_element_struct_arg_t;
+
+// CHECK: %struct.single_struct_element_struct_arg = type { %struct.inner }
+// CHECK: %struct.inner = type { i32, i64 }
+typedef struct single_struct_element_struct_arg
+{
+  struct inner {
+int a;
+long b;
+  } s;
+} single_struct_element_struct_arg_t;
+
+// CHECK: %struct.different_size_type_pair
+typedef struct different_size_type_pair {
+  long l;
+  int i;
+} different_size_type_pair;
+
+// CHECK: %struct.flexible_array = type { i32, [0 x i32] }
+typedef struct flexible_array
+{
+  int i;
+  int flexible[];
+} flexible_array;
+
+// CHECK: %struct.struct_arr16 = type { [16 x i32] }
+typedef struct struct_arr16
+{
+int arr[16];
+} struct_arr16;
+
+// CHECK: %struct.struct_arr32 = type { [32 x i32] }
+typedef struct struct_arr32
+{
+int arr[32];
+} struct_arr32;
+
+// CHECK: %struct.struct_arr33 = type { [33 x i32] }
+typedef struct struct_arr33
+{
+int arr[33];
+} struct_arr33;
+
+// CHECK: %struct.struct_char_arr32 = type { [32 x i8] }
+typedef struct struct_char_arr32
+{
+  char arr[32];
+} struct_char_arr32;
+
+// CHECK-NOT: %struct.struct_char_x8
+typedef struct struct_char_x8