[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-06-23 Thread John McCall via Phabricator via cfe-commits
rjmccall accepted this revision.
rjmccall added a comment.
This revision is now accepted and ready to land.

Alright, well, this does look cleaner.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-06-23 Thread James Y Knight via Phabricator via cfe-commits
jyknight added a comment.

Ping. I think this is correct, and would like to commit.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-12 Thread James Y Knight via Phabricator via cfe-commits
jyknight added a comment.

In D97224#2621328, @rjmccall wrote:

> In D97224#2604537, @jyknight wrote:
>
>>> I'm less certain about what to do with the `__atomic_*` builtins
>>
>> The `__atomic` builtins have already been supporting under-aligned pointers 
>> all along -- and that behavior is unchanged by this patch.
>
> How so?  Clang hasn't been passing down an alignment, which means that it's 
> been building `atomicrmw` instructions with the natural alignment for the IR 
> type, which means we've been assuming that all pointers have at least that 
> alignment.  The LLVM documentation even says that `atomicrmw` doesn't allow 
> under-alignment.

We construct a libcall to `__atomic_*` routines in the frontend upon seeing an 
underaligned argument, instead of letting the backend handle it -- there's a 
bunch of code at 
https://github.com/llvm/llvm-project/blob/bc4a5bdce4af82a5522869d8a154e9e15cf809df/clang/lib/CodeGen/CGAtomic.cpp#L790
 to handle that. I'd like to rip most of that out in the future, and just let 
the backend handle it in more cases.

E.g.

  typedef int __attribute__((aligned(1))) unaligned_int;
  bool cmpxchg_u(unaligned_int *x) {
    int expected = 2;
    return __atomic_compare_exchange_n(x, &expected, 2, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  }

generates a libcall to `__atomic_compare_exchange_4` (in IR, generated in the 
Clang code), instead of creating a cmpxchg IR instruction, due to the 
under-alignment. (Although, sigh, I've only just noticed we actually have a 
problem here -- the `__atomic_*_SIZE` libcalls are supposed to require an 
aligned argument -- so we should be using `__atomic_compare_exchange` (without 
size suffix) instead. Gah.)
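
For reference, the two libcall shapes involved look roughly like this (signatures
paraphrased from the documented `__atomic_*` libcall ABI; the exact spellings below
are my paraphrase, not text from the patch):

  #include <stddef.h>

  extern "C" {
  // Sized variant: allowed to assume the pointer is suitably aligned for a
  // 4-byte object -- hence it is the wrong call for an under-aligned pointer.
  bool __atomic_compare_exchange_4(volatile void *ptr, void *expected,
                                   unsigned desired, int success_order,
                                   int failure_order);

  // Generic (size-suffix-free) variant: takes an explicit size and makes no
  // alignment assumption, so it is the appropriate fallback here.
  bool __atomic_compare_exchange(size_t size, volatile void *ptr, void *expected,
                                 void *desired, int success_order,
                                 int failure_order);
  }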


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-11 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In D97224#2604537, @jyknight wrote:

>> I'm less certain about what to do with the `__atomic_*` builtins
>
> The `__atomic` builtins have already been supporting under-aligned pointers 
> all along -- and that behavior is unchanged by this patch.

How so?  Clang hasn't been passing down an alignment, which means that it's 
been building `atomicrmw` instructions with the natural alignment for the IR 
type, which means we've been assuming that all pointers have at least that 
alignment.  The LLVM documentation even says that `atomicrmw` doesn't allow 
under-alignment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-11 Thread James Y Knight via Phabricator via cfe-commits
jyknight updated this revision to Diff 330088.
jyknight added a comment.

Use natural alignment for `_Interlocked*` intrinsics.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224

Files:
  clang/lib/CodeGen/CGAtomic.cpp
  clang/lib/CodeGen/CGBuilder.h
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGExprScalar.cpp
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/test/CodeGen/atomic-ops.c
  clang/test/CodeGen/ms-intrinsics-underaligned.c
  clang/test/CodeGen/ms-intrinsics.c
  clang/test/OpenMP/parallel_reduction_codegen.cpp

Index: clang/test/OpenMP/parallel_reduction_codegen.cpp
===
--- clang/test/OpenMP/parallel_reduction_codegen.cpp
+++ clang/test/OpenMP/parallel_reduction_codegen.cpp
@@ -198,7 +198,7 @@
 // LAMBDA: br label %[[REDUCTION_DONE]]
 // LAMBDA: [[CASE2]]
 // LAMBDA: [[G_PRIV_VAL:%.+]] = load i32, i32* [[G_PRIVATE_ADDR]]
-// LAMBDA: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 4
+// LAMBDA: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 128
 // LAMBDA: br label %[[REDUCTION_DONE]]
 // LAMBDA: [[REDUCTION_DONE]]
 // LAMBDA: ret void
@@ -255,7 +255,7 @@
 // BLOCKS: br label %[[REDUCTION_DONE]]
 // BLOCKS: [[CASE2]]
 // BLOCKS: [[G_PRIV_VAL:%.+]] = load i32, i32* [[G_PRIVATE_ADDR]]
-// BLOCKS: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 4
+// BLOCKS: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 128
 // BLOCKS: br label %[[REDUCTION_DONE]]
 // BLOCKS: [[REDUCTION_DONE]]
 // BLOCKS: ret void
@@ -771,7 +771,7 @@
 // case 2:
 // t_var += t_var_reduction;
 // CHECK: [[T_VAR_PRIV_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[T_VAR_PRIV]]
-// CHECK: atomicrmw add i32* [[T_VAR_REF]], i32 [[T_VAR_PRIV_VAL]] monotonic, align 4
+// CHECK: atomicrmw add i32* [[T_VAR_REF]], i32 [[T_VAR_PRIV_VAL]] monotonic, align 128
 
 // var = var.operator &(var_reduction);
 // CHECK: call void @__kmpc_critical(
@@ -801,7 +801,7 @@
 
 // t_var1 = min(t_var1, t_var1_reduction);
 // CHECK: [[T_VAR1_PRIV_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[T_VAR1_PRIV]]
-// CHECK: atomicrmw min i32* [[T_VAR1_REF]], i32 [[T_VAR1_PRIV_VAL]] monotonic, align 4
+// CHECK: atomicrmw min i32* [[T_VAR1_REF]], i32 [[T_VAR1_PRIV_VAL]] monotonic, align 128
 
 // break;
 // CHECK: br label %[[RED_DONE]]
Index: clang/test/CodeGen/ms-intrinsics.c
===
--- clang/test/CodeGen/ms-intrinsics.c
+++ clang/test/CodeGen/ms-intrinsics.c
@@ -450,10 +450,10 @@
 // CHECK-64: [[EL:%[0-9]+]] = zext i64 %inc1 to i128
 // CHECK-64: [[EHS:%[0-9]+]] = shl nuw i128 [[EH]], 64
 // CHECK-64: [[EXP:%[0-9]+]] = or i128 [[EHS]], [[EL]]
-// CHECK-64: [[ORG:%[0-9]+]] = load i128, i128* [[CNR]], align 16
+// CHECK-64: [[ORG:%[0-9]+]] = load i128, i128* [[CNR]], align 8
 // CHECK-64: [[RES:%[0-9]+]] = cmpxchg volatile i128* [[DST]], i128 [[ORG]], i128 [[EXP]] seq_cst seq_cst, align 16
 // CHECK-64: [[OLD:%[0-9]+]] = extractvalue { i128, i1 } [[RES]], 0
-// CHECK-64: store i128 [[OLD]], i128* [[CNR]], align 16
+// CHECK-64: store i128 [[OLD]], i128* [[CNR]], align 8
 // CHECK-64: [[SUC1:%[0-9]+]] = extractvalue { i128, i1 } [[RES]], 1
 // CHECK-64: [[SUC8:%[0-9]+]] = zext i1 [[SUC1]] to i8
 // CHECK-64: ret i8 [[SUC8]]
Index: clang/test/CodeGen/ms-intrinsics-underaligned.c
===
--- /dev/null
+++ clang/test/CodeGen/ms-intrinsics-underaligned.c
@@ -0,0 +1,99 @@
+// RUN: %clang_cc1 -ffreestanding -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 \
+// RUN: -triple x86_64--windows -Oz -emit-llvm -target-feature +cx16 %s -o - \
+// RUN: | FileCheck %s
+
+// Ensure that we emit _Interlocked atomic operations specifying natural
+// alignment, even when clang's usual alignment derivation would result in a
+// lower alignment value.
+
+// intrin.h needs size_t, but -ffreestanding prevents us from getting it from
+// stddef.h.  Work around it with this typedef.
+typedef __SIZE_TYPE__ size_t;
+
+#include <intrin.h>
+
+#pragma pack(1)
+typedef struct {
+  char a;
+  short b;
+  long c;
+  long long d;
+  void *p;
+} X;
+
+_Static_assert(sizeof(X) == 23, "");
+_Static_assert(__alignof__(X) == 1, "");
+
+// CHECK-LABEL: @test_InterlockedExchangePointer(
+// CHECK:   atomicrmw {{.*}} align 8
+void *test_InterlockedExchangePointer(X *x) {
+  return _InterlockedExchangePointer(&x->p, 0);
+}
+
+// CHECK-LABEL: @test_InterlockedExchange8(
+// CHECK:   atomicrmw {{.*}} align 1
+char test_InterlockedExchange8(X *x) {
+  return _InterlockedExchange8(&x->a, 0);
+}
+
+// CHECK-LABEL: @test_InterlockedExchange16(
+// CHECK:   atomicrmw {{.*}} align 2
+short test_InterlockedExchange16(X *x) {
+  return _InterlockedExchange16(&x->b, 0);
+}
+
+// CHECK-LABEL: 

[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-04 Thread James Y Knight via Phabricator via cfe-commits
jyknight added a comment.

> The idea here is not to "ignore type alignment".  `EmitPointerWithAlignment` 
> will sometimes return an alignment for a pointer that's less than the 
> alignment of the pointee type, e.g. because you're taking the address of a 
> packed struct field.  The critical question is whether the atomic builtins 
> ought to make an effort to honor that reduced alignment, even if it leads to 
> terrible code, or if we should treat the use of the builtin as a user promise 
> that the pointer is actually more aligned than you might think from the 
> information statically available.

Agreed -- that is the question. In general, I'd like to default to basing 
decisions only on the statically-known alignments, because I think that'll 
typically be the best choice for users. Where there's something like a packed 
struct, it's likely that the values will end up under-aligned in fact, not just 
in the compiler's understanding.
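
A quick illustration of the kind of case I mean (my own example, assuming the usual
GCC/Clang packed-struct semantics; it is not taken from the patch):

  #include <cstddef>   // offsetof

  // The packed struct really does place `i` at offset 1 in memory, so &p->i is
  // genuinely under-aligned -- not merely of unknown alignment to the compiler.
  struct __attribute__((packed)) Packed {
    char c;
    int  i;
  };
  static_assert(offsetof(Packed, i) == 1, "i really sits at offset 1");
  static_assert(alignof(Packed) == 1, "the whole struct is byte-aligned");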

> For example, the MSDN documentation for `InterlockedIncrement` says that it 
> requires 32-bit alignment [...].  I would say that all of the Interlocked 
> APIs ought to be read as guaranteeing the natural, full-width alignment for 
> their operation.

I had missed that it was documented in some of the other functions (beyond 
InterlockedCompareExchange128). I'll change the remainder of the _Interlocked 
APIs to assume at least natural alignment.

> I'm less certain about what to do with the `__atomic_*` builtins

The `__atomic` builtins have already been supporting under-aligned pointers all 
along -- and that behavior is unchanged by this patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-04 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In D97224#2604069, @jyknight wrote:

> In D97224#2596410, @rjmccall wrote:
>
>> Do we really consider the existing atomic intrinsics to not impose added 
>> alignment restrictions?  I'm somewhat concerned that we're going to produce 
>> IR here that's either really suboptimal or, worse, unimplementable, just 
>> because we over-interpreted some cue about alignment.  I guess it would only 
>> be a significant problem on a target where types are frequently 
>> under-aligned for what atomics need, which is not typical, or when the user 
>> is doing atomics on a field of something like a packed struct.
>
> For the `__atomic_*` intrinsics, we don't consider those as imposing 
> additional alignment restrictions -- currently, we avoid generating the LLVM 
> IR instruction if it's below natural alignment, since we couldn't specify 
> alignment on the IR instruction. (Now that we have alignment on the LLVM IR 
> operations, I'd like to eventually get rid of that logic from Clang, since 
> it's also handled by LLVM.)

Frontends ultimately have the responsibility of making sure they ultimately 
emit code that follows the platform ABI for atomics.  In most other parts of 
the ABI, we usually find that it is possible (even necessary) to delegate 
*part* of that ABI responsibility down to LLVM — e.g. to emit inline atomic 
sequences, which I suppose frontends could do with inline assembly, but there 
are obvious reasons to prefer a more semantic IR — but that at least in some 
cases, there is information that we cannot pass down and so must handle in the 
frontend.  I am somewhat skeptical that atomics are going to prove an exception 
here.  At the very least, there will always be *some* operations that we have 
to expand into compare-exchange loops in the frontend, for want of a 
sufficiently powerful instruction/intrinsic.  That said, if you find that you 
can actually free Clang from having to make certain decisions here, that's 
great; I just want you to understand that usually we find that there are limits 
to what we can usefully delegate to LLVM, and ultimately the responsibility is 
ours.

> For other intrinsics (e.g. MSVCIntrin::_InterlockedAnd, 
> Builtin::BI__sync_fetch_and_add_4, NVPTX::BI__nvvm_atom_add_gen_i, and the 
> others in those 3 function families), we currently entirely ignore the 
> alignment, and simply assume the argument is naturally-aligned. So, yes, this 
> change could potentially affect the behavior for underaligned types.
>
> So, I could change these to explicitly increase the assumed alignment of 
> their arguments, like I did for InterlockedCompareExchange128. My inclination 
> is not to do so, however...it doesn't seem like a good idea in general to 
> ignore type alignment. But, I'd not be opposed to doing that, if there's a 
> good reason.

The idea here is not to "ignore type alignment".  `EmitPointerWithAlignment` 
will sometimes return an alignment for a pointer that's less than the alignment 
of the pointee type, e.g. because you're taking the address of a packed struct 
field.  The critical question is whether the atomic builtins ought to make an 
effort to honor that reduced alignment, even if it leads to terrible code, or 
if we should treat the use of the builtin as a user promise that the pointer is 
actually more aligned than you might think from the information statically 
available.  That has to be resolved by the semantics of the builtin, and 
unfortunately intrinsic documentation is often spotty on this point.  For 
example, the MSDN documentation for `InterlockedIncrement` says that it 
requires 32-bit alignment, but the documentation for `InterlockedAdd` doesn't.  
It seems extremely unlikely that that is meant to be read as a statement that 
`InterlockedAdd` is actually more permissive.  I would say that all of the 
Interlocked APIs ought to be read as guaranteeing the natural, full-width 
alignment for their operation.  I'm less certain about what to do with the 
`__atomic_*` builtins, because maybe we should take this as an opportunity to 
try to "do the right thing" with under-aligned atomics; on the other hand, that 
assumes that we always *can* "do the right thing", and I don't want Clang to 
start miscompiling or refusing to compile code because we're trying to do 
something above and beyond the normal language semantics.  It might be more 
reasonable to always use the type alignment as a minimum.
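
To make that last option concrete, one way to read "use the type alignment as a
minimum" in CodeGen terms would be something like the following sketch (the helper
names are existing CodeGenFunction/ASTContext APIs, but the combination and the
variable names are only illustrative, not part of this patch):

  // Clamp the alignment we act on to at least the type's ABI alignment.
  Address PtrAddr = CGF.EmitPointerWithAlignment(PtrExpr); // may be < type align
  CharUnits TypeAlign = CGF.getContext().getTypeAlignInChars(ValueTy);
  CharUnits EffectiveAlign = std::max(PtrAddr.getAlignment(), TypeAlign);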


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-04 Thread James Y Knight via Phabricator via cfe-commits
jyknight added a comment.

In D97224#2596410, @rjmccall wrote:

> Do we really consider the existing atomic intrinsics to not impose added 
> alignment restrictions?  I'm somewhat concerned that we're going to produce 
> IR here that's either really suboptimal or, worse, unimplementable, just 
> because we over-interpreted some cue about alignment.  I guess it would only 
> be a significant problem on a target where types are frequently under-aligned 
> for what atomics need, which is not typical, or when the user is doing 
> atomics on a field of something like a packed struct.

For the `__atomic_*` intrinsics, we don't consider those as imposing additional 
alignment restrictions -- currently, we avoid generating the LLVM IR 
instruction if it's below natural alignment, since we couldn't specify 
alignment on the IR instruction. (Now that we have alignment on the LLVM IR 
operations, I'd like to eventually get rid of that logic from Clang, since it's 
also handled by LLVM.)

For other intrinsics (e.g. MSVCIntrin::_InterlockedAnd, 
Builtin::BI__sync_fetch_and_add_4, NVPTX::BI__nvvm_atom_add_gen_i, and the 
others in those 3 function families), we currently entirely ignore the 
alignment, and simply assume the argument is naturally-aligned. So, yes, this 
change could potentially affect the behavior for underaligned types.

So, I could change these to explicitly increase the assumed alignment of their 
arguments, like I did for InterlockedCompareExchange128. My inclination is not 
to do so, however...it doesn't seem like a good idea in general to ignore type 
alignment. But, I'd not be opposed to doing that, if there's a good reason.
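
As a concrete example of the kind of call site whose emitted alignment this could
change (my own illustration, not a test from the patch):

  struct __attribute__((packed)) S { char c; int counter; };

  int bump(S *s) {
    // Previously emitted as if &s->counter were naturally (4-byte) aligned; with
    // the alignment taken from the expression it would be treated as align 1.
    return __sync_fetch_and_add(&s->counter, 1);
  }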


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-04 Thread James Y Knight via Phabricator via cfe-commits
jyknight updated this revision to Diff 328219.
jyknight marked 2 inline comments as done.
jyknight added a comment.

Address review comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224

Files:
  clang/lib/CodeGen/CGAtomic.cpp
  clang/lib/CodeGen/CGBuilder.h
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGExprScalar.cpp
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/test/CodeGen/atomic-ops.c
  clang/test/CodeGen/ms-intrinsics.c
  clang/test/OpenMP/parallel_reduction_codegen.cpp

Index: clang/test/OpenMP/parallel_reduction_codegen.cpp
===
--- clang/test/OpenMP/parallel_reduction_codegen.cpp
+++ clang/test/OpenMP/parallel_reduction_codegen.cpp
@@ -198,7 +198,7 @@
 // LAMBDA: br label %[[REDUCTION_DONE]]
 // LAMBDA: [[CASE2]]
 // LAMBDA: [[G_PRIV_VAL:%.+]] = load i32, i32* [[G_PRIVATE_ADDR]]
-// LAMBDA: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 4
+// LAMBDA: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 128
 // LAMBDA: br label %[[REDUCTION_DONE]]
 // LAMBDA: [[REDUCTION_DONE]]
 // LAMBDA: ret void
@@ -255,7 +255,7 @@
 // BLOCKS: br label %[[REDUCTION_DONE]]
 // BLOCKS: [[CASE2]]
 // BLOCKS: [[G_PRIV_VAL:%.+]] = load i32, i32* [[G_PRIVATE_ADDR]]
-// BLOCKS: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 4
+// BLOCKS: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 128
 // BLOCKS: br label %[[REDUCTION_DONE]]
 // BLOCKS: [[REDUCTION_DONE]]
 // BLOCKS: ret void
@@ -771,7 +771,7 @@
 // case 2:
 // t_var += t_var_reduction;
 // CHECK: [[T_VAR_PRIV_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[T_VAR_PRIV]]
-// CHECK: atomicrmw add i32* [[T_VAR_REF]], i32 [[T_VAR_PRIV_VAL]] monotonic, align 4
+// CHECK: atomicrmw add i32* [[T_VAR_REF]], i32 [[T_VAR_PRIV_VAL]] monotonic, align 128
 
 // var = var.operator &(var_reduction);
 // CHECK: call void @__kmpc_critical(
@@ -801,7 +801,7 @@
 
 // t_var1 = min(t_var1, t_var1_reduction);
 // CHECK: [[T_VAR1_PRIV_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[T_VAR1_PRIV]]
-// CHECK: atomicrmw min i32* [[T_VAR1_REF]], i32 [[T_VAR1_PRIV_VAL]] monotonic, align 4
+// CHECK: atomicrmw min i32* [[T_VAR1_REF]], i32 [[T_VAR1_PRIV_VAL]] monotonic, align 128
 
 // break;
 // CHECK: br label %[[RED_DONE]]
Index: clang/test/CodeGen/ms-intrinsics.c
===
--- clang/test/CodeGen/ms-intrinsics.c
+++ clang/test/CodeGen/ms-intrinsics.c
@@ -450,10 +450,10 @@
 // CHECK-64: [[EL:%[0-9]+]] = zext i64 %inc1 to i128
 // CHECK-64: [[EHS:%[0-9]+]] = shl nuw i128 [[EH]], 64
 // CHECK-64: [[EXP:%[0-9]+]] = or i128 [[EHS]], [[EL]]
-// CHECK-64: [[ORG:%[0-9]+]] = load i128, i128* [[CNR]], align 16
+// CHECK-64: [[ORG:%[0-9]+]] = load i128, i128* [[CNR]], align 8
 // CHECK-64: [[RES:%[0-9]+]] = cmpxchg volatile i128* [[DST]], i128 [[ORG]], i128 [[EXP]] seq_cst seq_cst, align 16
 // CHECK-64: [[OLD:%[0-9]+]] = extractvalue { i128, i1 } [[RES]], 0
-// CHECK-64: store i128 [[OLD]], i128* [[CNR]], align 16
+// CHECK-64: store i128 [[OLD]], i128* [[CNR]], align 8
 // CHECK-64: [[SUC1:%[0-9]+]] = extractvalue { i128, i1 } [[RES]], 1
 // CHECK-64: [[SUC8:%[0-9]+]] = zext i1 [[SUC1]] to i8
 // CHECK-64: ret i8 [[SUC8]]
Index: clang/test/CodeGen/atomic-ops.c
===
--- clang/test/CodeGen/atomic-ops.c
+++ clang/test/CodeGen/atomic-ops.c
@@ -655,9 +655,9 @@
   __atomic_load(&aligned_a, &aligned_b, memory_order_seq_cst);
   // CHECK: store atomic i64 {{.*}}, align 16
   __atomic_store(&aligned_a, &aligned_b, memory_order_seq_cst);
-  // CHECK: atomicrmw xchg i64* {{.*}}, align 8
+  // CHECK: atomicrmw xchg i64* {{.*}}, align 16
   __atomic_exchange(&aligned_a, &aligned_b, &aligned_c, memory_order_seq_cst);
-  // CHECK: cmpxchg weak i64* {{.*}}, align 8
+  // CHECK: cmpxchg weak i64* {{.*}}, align 16
   __atomic_compare_exchange(&aligned_a, &aligned_b, &aligned_c, 1, memory_order_seq_cst, memory_order_seq_cst);
 }
 
Index: clang/lib/CodeGen/CGStmtOpenMP.cpp
===
--- clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -5150,7 +5150,7 @@
 X.getType()->hasSignedIntegerRepresentation());
   }
   llvm::Value *Res =
-  CGF.Builder.CreateAtomicRMW(RMWOp, X.getPointer(CGF), UpdateVal, AO);
+  CGF.Builder.CreateAtomicRMW(RMWOp, X.getAddress(CGF), UpdateVal, AO);
   return std::make_pair(true, RValue::get(Res));
 }
 
Index: clang/lib/CodeGen/CGExprScalar.cpp
===
--- clang/lib/CodeGen/CGExprScalar.cpp
+++ clang/lib/CodeGen/CGExprScalar.cpp
@@ -2430,7 +2430,7 @@
   // For atomic bool increment, we just store true and return it for
   // preincrement, do an atomic swap with true for postincrement
   return 

[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-03-01 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Do we really consider the existing atomic intrinsics to not impose added 
alignment restrictions?  I'm somewhat concerned that we're going to produce IR 
here that's either really suboptimal or, worse, unimplementable, just because 
we over-interpreted some cue about alignment.  I guess it would only be a 
significant problem on a target where types are frequently under-aligned for 
what atomics need, which is not typical, or when the user is doing atomics on a 
field of something like a packed struct.




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:354
   llvm::Type *Int128PtrTy = Int128Ty->getPointerTo();
-  Destination = CGF.Builder.CreateBitCast(Destination, Int128PtrTy);
-  Address ComparandResult(CGF.Builder.CreateBitCast(ComparandPtr, Int128PtrTy),
-  CGF.getContext().toCharUnitsFromBits(128));
+  DestAddr = CGF.Builder.CreateBitCast(DestAddr, Int128PtrTy);
+  ComparandAddr = CGF.Builder.CreateBitCast(ComparandAddr, Int128PtrTy);

Since you're changing this code anyway, please make this do 
`CreateElementBitCast(DestAddr, Int128Ty)` so that it's address-space-correct.

There are a lot of other lines in the patch that would benefit from the same 
thing.
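
Spelled out, the suggested form would look roughly like this (a sketch of the
comment above, not the committed code):

  // An element bitcast keeps the pointer's address space (and alignment) intact.
  DestAddr = CGF.Builder.CreateElementBitCast(DestAddr, Int128Ty);
  ComparandAddr = CGF.Builder.CreateElementBitCast(ComparandAddr, Int128Ty);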



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:3765
 Ptr = Builder.CreateBitCast(Ptr, Int8Ty->getPointerTo(AddrSpace));
+Address PtrAddr(Ptr, CharUnits::One());
 Value *NewVal = Builder.getInt8(1);

This should be using `EmitPointerWithAlignment` instead of assuming an 
alignment of 1.
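
Roughly, i.e. something along these lines (sketch only; the exact argument
expression is an assumption):

  // Derive the alignment from the pointer expression rather than hard-coding 1.
  Address PtrAddr = EmitPointerWithAlignment(E->getArg(0));
  PtrAddr = Builder.CreateElementBitCast(PtrAddr, Int8Ty);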


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97224/new/

https://reviews.llvm.org/D97224



[PATCH] D97224: Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg.

2021-02-22 Thread James Y Knight via Phabricator via cfe-commits
jyknight created this revision.
jyknight added reviewers: gchatelet, jfb, rjmccall.
jyknight requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Following the LLVM change to add an alignment argument to the
IRBuilder calls, switch Clang's CGBuilder variants to take an Address
type. Then, update all callers to pass through the Address.

There is one interesting exception: Microsoft's
InterlockedCompareExchange128 family of functions are documented to
require (and assume) 16-byte alignment, despite the argument type
being only `long long*`.
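
To make the shape of the change concrete, the CGBuilder side amounts to wrappers
along these lines (a sketch with assumed parameter spellings, not a copy of the
diff below):

  // CGBuilder.h (sketch): forward the Address's known alignment to the IRBuilder
  // overloads that now accept an explicit alignment.
  llvm::AtomicRMWInst *CreateAtomicRMW(llvm::AtomicRMWInst::BinOp Op, Address Addr,
                                       llvm::Value *Val,
                                       llvm::AtomicOrdering Ordering) {
    return CGBuilderBaseTy::CreateAtomicRMW(
        Op, Addr.getPointer(), Val, Addr.getAlignment().getAsAlign(), Ordering);
  }

Callers then hand over an `Address` (e.g. `X.getAddress(CGF)` instead of
`X.getPointer(CGF)`), as the CGStmtOpenMP hunk below shows.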


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D97224

Files:
  clang/lib/CodeGen/CGAtomic.cpp
  clang/lib/CodeGen/CGBuilder.h
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGExprScalar.cpp
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/test/CodeGen/atomic-ops.c
  clang/test/CodeGen/ms-intrinsics.c
  clang/test/OpenMP/parallel_reduction_codegen.cpp

Index: clang/test/OpenMP/parallel_reduction_codegen.cpp
===
--- clang/test/OpenMP/parallel_reduction_codegen.cpp
+++ clang/test/OpenMP/parallel_reduction_codegen.cpp
@@ -198,7 +198,7 @@
 // LAMBDA: br label %[[REDUCTION_DONE]]
 // LAMBDA: [[CASE2]]
 // LAMBDA: [[G_PRIV_VAL:%.+]] = load i32, i32* [[G_PRIVATE_ADDR]]
-// LAMBDA: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 4
+// LAMBDA: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 128
 // LAMBDA: br label %[[REDUCTION_DONE]]
 // LAMBDA: [[REDUCTION_DONE]]
 // LAMBDA: ret void
@@ -255,7 +255,7 @@
 // BLOCKS: br label %[[REDUCTION_DONE]]
 // BLOCKS: [[CASE2]]
 // BLOCKS: [[G_PRIV_VAL:%.+]] = load i32, i32* [[G_PRIVATE_ADDR]]
-// BLOCKS: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 4
+// BLOCKS: atomicrmw add i32* [[G_REF]], i32 [[G_PRIV_VAL]] monotonic, align 128
 // BLOCKS: br label %[[REDUCTION_DONE]]
 // BLOCKS: [[REDUCTION_DONE]]
 // BLOCKS: ret void
@@ -771,7 +771,7 @@
 // case 2:
 // t_var += t_var_reduction;
 // CHECK: [[T_VAR_PRIV_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[T_VAR_PRIV]]
-// CHECK: atomicrmw add i32* [[T_VAR_REF]], i32 [[T_VAR_PRIV_VAL]] monotonic, align 4
+// CHECK: atomicrmw add i32* [[T_VAR_REF]], i32 [[T_VAR_PRIV_VAL]] monotonic, align 128
 
 // var = var.operator &(var_reduction);
 // CHECK: call void @__kmpc_critical(
@@ -801,7 +801,7 @@
 
 // t_var1 = min(t_var1, t_var1_reduction);
 // CHECK: [[T_VAR1_PRIV_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[T_VAR1_PRIV]]
-// CHECK: atomicrmw min i32* [[T_VAR1_REF]], i32 [[T_VAR1_PRIV_VAL]] monotonic, align 4
+// CHECK: atomicrmw min i32* [[T_VAR1_REF]], i32 [[T_VAR1_PRIV_VAL]] monotonic, align 128
 
 // break;
 // CHECK: br label %[[RED_DONE]]
Index: clang/test/CodeGen/ms-intrinsics.c
===
--- clang/test/CodeGen/ms-intrinsics.c
+++ clang/test/CodeGen/ms-intrinsics.c
@@ -450,10 +450,10 @@
 // CHECK-64: [[EL:%[0-9]+]] = zext i64 %inc1 to i128
 // CHECK-64: [[EHS:%[0-9]+]] = shl nuw i128 [[EH]], 64
 // CHECK-64: [[EXP:%[0-9]+]] = or i128 [[EHS]], [[EL]]
-// CHECK-64: [[ORG:%[0-9]+]] = load i128, i128* [[CNR]], align 16
+// CHECK-64: [[ORG:%[0-9]+]] = load i128, i128* [[CNR]], align 8
 // CHECK-64: [[RES:%[0-9]+]] = cmpxchg volatile i128* [[DST]], i128 [[ORG]], i128 [[EXP]] seq_cst seq_cst, align 16
 // CHECK-64: [[OLD:%[0-9]+]] = extractvalue { i128, i1 } [[RES]], 0
-// CHECK-64: store i128 [[OLD]], i128* [[CNR]], align 16
+// CHECK-64: store i128 [[OLD]], i128* [[CNR]], align 8
 // CHECK-64: [[SUC1:%[0-9]+]] = extractvalue { i128, i1 } [[RES]], 1
 // CHECK-64: [[SUC8:%[0-9]+]] = zext i1 [[SUC1]] to i8
 // CHECK-64: ret i8 [[SUC8]]
Index: clang/test/CodeGen/atomic-ops.c
===
--- clang/test/CodeGen/atomic-ops.c
+++ clang/test/CodeGen/atomic-ops.c
@@ -655,9 +655,9 @@
   __atomic_load(&aligned_a, &aligned_b, memory_order_seq_cst);
   // CHECK: store atomic i64 {{.*}}, align 16
   __atomic_store(&aligned_a, &aligned_b, memory_order_seq_cst);
-  // CHECK: atomicrmw xchg i64* {{.*}}, align 8
+  // CHECK: atomicrmw xchg i64* {{.*}}, align 16
   __atomic_exchange(&aligned_a, &aligned_b, &aligned_c, memory_order_seq_cst);
-  // CHECK: cmpxchg weak i64* {{.*}}, align 8
+  // CHECK: cmpxchg weak i64* {{.*}}, align 16
   __atomic_compare_exchange(&aligned_a, &aligned_b, &aligned_c, 1, memory_order_seq_cst, memory_order_seq_cst);
 }
 
Index: clang/lib/CodeGen/CGStmtOpenMP.cpp
===
--- clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -5150,7 +5150,7 @@
 X.getType()->hasSignedIntegerRepresentation());
   }
   llvm::Value *Res =
-  CGF.Builder.CreateAtomicRMW(RMWOp, X.getPointer(CGF), UpdateVal, AO);
+  CGF.Builder.CreateAtomicRMW(RMWOp, X.getAddress(CGF), UpdateVal, AO);
   return std::make_pair(true,