[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-05-14 Thread Diogo N. Sampaio via Phabricator via cfe-commits
dnsampaio added a comment.

In D75169#1952159 , @pratlucas wrote:

> > Why not just make half as an argument do the right thing for that case?
>
> That would be the ideal approach, but currently there's a limitation on the 
> backend's calling convention lowering that gets in the way.
>  The lowering of calls in `SelectionDAGBuilder` includes a target-independent 
> step that is responsible for spliting or promoting each argument into "legal 
> registers" and takes place before the targets' calling convention lowering.
>  As `f16` is not a legal type on many of the `AAPCS_VFP` targets, it gets 
> promoted to `f32` before the target's lowering code has a chance to define 
> how to handle it.
>  Ideally, this stpe should only take place if lowering calling conventions 
> after type legalization - there's a FIXME there already capturing that -, but 
> that would involve a major rewriting that would impact multiple targets.
>  Inserting a hacky target-dependent fix in this step also didn't look very 
> good.
>  Do you see other alternatives for handling it? If not, which approach would 
> you suggest?


Would it be possible to pass a `half` argument and fix-it-up at 
`CodeGenPrepare`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-05-05 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-04-20 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.
Herald added a subscriber: danielkiss.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-04-08 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-31 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

> Why not just make half as an argument do the right thing for that case?

That would be the ideal approach, but currently there's a limitation on the 
backend's calling convention lowering that gets in the way.
The lowering of calls in `SelectionDAGBuilder` includes a target-independent 
step that is responsible for spliting or promoting each argument into "legal 
registers" and takes place before the targets' calling convention lowering.
As `f16` is not a legal type on many of the `AAPCS_VFP` targets, it gets 
promoted to `f32` before the target's lowering code has a chance to define how 
to handle it.
Ideally, this stpe should only take place if lowering calling conventions after 
type legalization - there's a FIXME there already capturing that -, but that 
would involve a major rewriting that would impact multiple targets.
Inserting a hacky target-dependent fix in this step also didn't look very good.
Do you see other alternatives for handling it? If not, which approach would you 
suggest?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-18 Thread Mark Lacey via Phabricator via cfe-commits
rudkx resigned from this revision.
rudkx added a comment.

I did some refactoring here years ago but I'm not that familiar with the ABIs 
or the handling in clang.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-18 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In D75169#1928696 , @pratlucas wrote:

> > Oh, wait, AAPCS wants half values to be passed in the *least* significant 
> > bits of a GPR, even on big-endian machines? That's certainly more 
> > convenient, but it's a weird inconsistency with the otherwise iron rule of 
> > the calling convention, which that it's exactly as if you laid all of the 
> > arguments out in memory and then popped the first four 32-bit values off. 
> > We're talking about a calling convention here that literally skips 
> > registers in order to "align" arguments.
> > 
> > Can we not just coerce to i16? Will LLVM not pass an i16 in the 
> > least-significant bits of a register?
>
> Yes, AAPCS specifies that they should go into the LSBs:
>
> > B.2 [...]  If the argument is a Half-precision Floating Point Type its size 
> > is set to 4 bytes as if it had been copied to the least significant bits of 
> > a 32-bit register and the remaining bits filled with unspecified values.
>
> Coercing to i16 solves it for the general case, when the argumetns are going 
> into GPRs, but is not suficient when those arguments are required to go into 
> FP registers - e.g. `-mfloat-abi=hard`.


Why not just make `half` as an argument do the right thing for that case?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-18 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

> Oh, wait, AAPCS wants half values to be passed in the *least* significant 
> bits of a GPR, even on big-endian machines? That's certainly more convenient, 
> but it's a weird inconsistency with the otherwise iron rule of the calling 
> convention, which that it's exactly as if you laid all of the arguments out 
> in memory and then popped the first four 32-bit values off. We're talking 
> about a calling convention here that literally skips registers in order to 
> "align" arguments.
> 
> Can we not just coerce to i16? Will LLVM not pass an i16 in the 
> least-significant bits of a register?

Yes, AAPCS specifies that they should go into the LSBs:

> B.2 [...]  If the argument is a Half-precision Floating Point Type its size 
> is set to 4 bytes as if it had been copied to the least significant bits of a 
> 32-bit register and the remaining bits filled with unspecified values.

Coercing to i16 solves it for the general case, when the argumetns are going 
into GPRs, but is not suficient when those arguments are required to go into FP 
registers - e.g. `-mfloat-abi=hard`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-17 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In D75169#1926545 , @pratlucas wrote:

> Hi @rjmccall,
>  I agree those kind of tweaks do not look good. The issue here, though, is 
> that argument coercion currently ignores the target's endian information when 
> performing coercion through memory.
>  This happens for any type that requires memory coercion, so unfortunately 
> using `[1 x i32]` does not do the trick.


Oh, wait, AAPCS wants half values to be passed in the *least* significant bits 
of a GPR, even on big-endian machines?  That's certainly more convenient, but 
it's a weird inconsistency with the otherwise iron rule of the calling 
convention, which that it's exactly as if you laid all of the arguments out in 
memory and then popped the first four 32-bit values off.  We're talking about a 
calling convention here that literally skips registers in order to "align" 
arguments.

Can we not just coerce to i16?  Will LLVM not pass an i16 in the 
least-significant bits of a register?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-17 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

Hi @rjmccall,
I agree those kind of tweaks do not look good. The issue here, though, is that 
argument coercion currently ignores the target's endian information when 
performing coercion through memory.
This happens for any type that requires memory coercion, so unfortunately using 
`[1 x i32]` does not do the trick.
Let me know if you have any other sugestions for handling this, I'd be glad to 
avoid the `ABIArgInfo` approach.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-16 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Ugh.  I'd hate to introduce yet another weird little tweak to `ABIArgInfo` 
that's used on exactly one target.  For 16-bit composite types, we seem to 
coerce to a type like `[1 x i32]`; would that be okay here?

You don't have a test that checks that you get the IR you want.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-12 Thread Chris Lattner via Phabricator via cfe-commits
lattner resigned from this revision.
lattner added a comment.

I'm not a qualified reviewer for this at this point.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-03-12 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32

2020-02-26 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas created this revision.
Herald added subscribers: cfe-commits, kristof.beyls.
Herald added a project: clang.

Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen. As such promotions
are implemented as coercion through memory in clang, aruments and
returns of those types end up stored on the wrong bits on big-endian
AArch32 - MSBs instead of LSBs.

This patch enforces AAPCS' directions, making sure clang stores those
types on the proper bits during coercion through memory for bit-endian
targets.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75169

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/arm-fp16-arguments.c

Index: clang/test/CodeGen/arm-fp16-arguments.c
===
--- clang/test/CodeGen/arm-fp16-arguments.c
+++ clang/test/CodeGen/arm-fp16-arguments.c
@@ -1,6 +1,9 @@
 // RUN: %clang_cc1 -triple armv7a--none-eabi -target-abi aapcs -mfloat-abi soft -fallow-half-arguments-and-returns -emit-llvm -o - -O2 %s | FileCheck %s --check-prefix=CHECK --check-prefix=SOFT
 // RUN: %clang_cc1 -triple armv7a--none-eabi -target-abi aapcs -mfloat-abi hard -fallow-half-arguments-and-returns -emit-llvm -o - -O2 %s | FileCheck %s --check-prefix=CHECK --check-prefix=HARD
 // RUN: %clang_cc1 -triple armv7a--none-eabi -target-abi aapcs -mfloat-abi soft -fnative-half-arguments-and-returns -emit-llvm -o - -O2 %s | FileCheck %s --check-prefix=NATIVE
+// RUN: %clang_cc1 -triple armebv7a--none-eabi -target-abi aapcs -mfloat-abi soft -fallow-half-arguments-and-returns -emit-llvm -o - -O2 %s | FileCheck %s --check-prefix=CHECK --check-prefix=SOFT
+// RUN: %clang_cc1 -triple armebv7a--none-eabi -target-abi aapcs -mfloat-abi hard -fallow-half-arguments-and-returns -emit-llvm -o - -O2 %s | FileCheck %s --check-prefix=CHECK --check-prefix=HARD
+// RUN: %clang_cc1 -triple armebv7a--none-eabi -target-abi aapcs -mfloat-abi soft -fnative-half-arguments-and-returns -emit-llvm -o - -O2 %s | FileCheck %s --check-prefix=NATIVE
 
 __fp16 g;
 
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -5934,7 +5934,8 @@
 llvm::Type *ResType = IsAAPCS_VFP ?
   llvm::Type::getFloatTy(getVMContext()) :
   llvm::Type::getInt32Ty(getVMContext());
-return ABIArgInfo::getDirect(ResType);
+return ABIArgInfo::getDirect(ResType, /*Offset=*/0, /*Padding=*/nullptr,
+ /*CanBeFlattened=*/true, /*EndianAlign=*/true);
   }
 
   if (!isAggregateTypeForABI(Ty)) {
@@ -6141,7 +6142,8 @@
 llvm::Type *ResType = IsAAPCS_VFP ?
   llvm::Type::getFloatTy(getVMContext()) :
   llvm::Type::getInt32Ty(getVMContext());
-return ABIArgInfo::getDirect(ResType);
+return ABIArgInfo::getDirect(ResType, /*Offset=*/0, /*Padding=*/nullptr,
+/*CanBeFlattened=*/true, /*EndianAlign=*/true);
   }
 
   if (!isAggregateTypeForABI(RetTy)) {
Index: clang/lib/CodeGen/CGCall.cpp
===
--- clang/lib/CodeGen/CGCall.cpp
+++ clang/lib/CodeGen/CGCall.cpp
@@ -1221,6 +1221,7 @@
 /// destination type; in this situation the values of bits which not
 /// present in the src are undefined.
 static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
+  bool EndianAlign,
   CodeGenFunction ) {
   llvm::Type *SrcTy = Src.getElementType();
 
@@ -1261,6 +1262,11 @@
   // Otherwise do coercion through memory. This is stupid, but simple.
   Address Tmp = CreateTempAllocaForCoercion(CGF, Ty, Src.getAlignment());
   Address Casted = CGF.Builder.CreateElementBitCast(Tmp,CGF.Int8Ty);
+  if (EndianAlign && CGF.CGM.getDataLayout().isBigEndian()) {
+// Offset address to match LSBs if endian alignment is required
+auto Offset = CharUnits::fromQuantity(DstSize - SrcSize);
+Casted = CGF.Builder.CreateConstInBoundsByteGEP(Casted, Offset);
+  }
   Address SrcCasted = CGF.Builder.CreateElementBitCast(Src,CGF.Int8Ty);
   CGF.Builder.CreateMemCpy(Casted, SrcCasted,
   llvm::ConstantInt::get(CGF.IntPtrTy, SrcSize),
@@ -1296,6 +1302,7 @@
 static void CreateCoercedStore(llvm::Value *Src,
Address Dst,
bool DstIsVolatile,
+   bool EndianAlign,
CodeGenFunction ) {
   llvm::Type *SrcTy = Src->getType();
   llvm::Type *DstTy = Dst.getType()->getElementType();
@@ -1348,6 +1355,11 @@
 Address Tmp = CreateTempAllocaForCoercion(CGF, SrcTy, Dst.getAlignment());
 CGF.Builder.CreateStore(Src, Tmp);
 Address Casted = CGF.Builder.CreateElementBitCast(Tmp,CGF.Int8Ty);
+if (EndianAlign &&