[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2021-02-08 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added inline comments.



Comment at: llvm/docs/LangRef.rst:1220
+``alignstack()``
+This indicates the alignment that should be considered by the backend when
+assigning this parameter to a stack slot during calling convention

chill wrote:
> rnk wrote:
> > This seems like you are introducing a new meaning to `alignstack`, which, 
> > according to the comments, only affects function SP alignment, not 
> > parameter alignment.
> > 
> > I'm assuming the reason you can't use the regular `align` attribute is that 
> > it is overloaded to mean two things: the alignment of the pointer when 
> > applied to a pointer, and the alignment of the argument memory when that 
> > pointer argument is marked `byval`. If you want to resolve this ambiguity, 
> > it seems like something that should be discussed on llvm-dev with a wider 
> > audience.
> Sorry, I couldn't quite get it. Do you suggest we should use the `align` 
> attribute instead of `alignstack`, provided there are no (major) objections 
> on the llvm-dev list?
> 
> It certainly makes sense to me to use `align` as it already pertains to 
> individual argument alignment (even though it's for pointers only now).
> 
Mostly I think I meant that this will be a big change in the meaning of either 
the `align` or the `alignstack` attributes, and that should be hashed out on 
llvm-dev.

Right now `align` is kind of a hybrid between an optimization annotation 
attribute, like `dereferenceable` or `nonnull`, and an ABI attribute, like 
`byval` or `inreg`. When `align` is used with `byval`, it affects argument 
memory layout. When `byval` is not present, it is just an optimizer hint. IMO, 
ideally, we should separate those two roles.

I should be able to control the alignment of the memory used to pass a pointer 
argument, at the same time that I annotate which low bits of the pointer are 
known to be zero.
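To make the two roles concrete, here is a minimal sketch in the typed-pointer IR syntax of the time; the function names and struct are hypothetical, not from the patch:

```llvm
; Sketch of the two current roles of `align` on parameters.
%struct.S = type { [4 x double] }

; ABI-affecting: combined with `byval`, align 32 constrains the layout of
; the by-value argument memory created for %s.
declare void @f_abi(%struct.S* byval align 32 %s)

; Hint only: without `byval`, align 32 merely tells the optimizer the low
; bits of %p are zero; argument memory layout is unaffected.
declare void @f_hint(double* align 32 %p)
```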


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2021-02-05 Thread Momchil Velikov via Phabricator via cfe-commits
chill added inline comments.



Comment at: llvm/docs/LangRef.rst:1220
+``alignstack()``
+This indicates the alignment that should be considered by the backend when
+assigning this parameter to a stack slot during calling convention

rnk wrote:
> This seems like you are introducing a new meaning to `alignstack`, which, 
> according to the comments, only affects function SP alignment, not parameter 
> alignment.
> 
> I'm assuming the reason you can't use the regular `align` attribute is that 
> it is overloaded to mean two things: the alignment of the pointer when 
> applied to a pointer, and the alignment of the argument memory when that 
> pointer argument is marked `byval`. If you want to resolve this ambiguity, it 
> seems like something that should be discussed on llvm-dev with a wider 
> audience.
Sorry, I couldn't quite get it. Do you suggest we should use the `align` 
attribute instead of `alignstack`, provided there are no (major) objections 
on the llvm-dev list?

It certainly makes sense to me to use `align` as it already pertains to 
individual argument alignment (even though it's for pointers only now).



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903



[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-04-10 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

In D75903#1963382, @pratlucas wrote:

> In regards to compatibility with other compilers, I'm not sure that 
> following what seems to be non-compliant behavior would be the best way to 
> proceed. @rnk and @ostannard, what would be your take on this?


I don't have any familiarity with the prevailing practices for ARM ABI 
compatibility, so I couldn't say. It might be worth checking in with 
stakeholders from the other compilers, i.e. file bugs against both compilers 
and ask for an opinion.




Comment at: llvm/docs/LangRef.rst:1220
+``alignstack()``
+This indicates the alignment that should be considered by the backend when
+assigning this parameter to a stack slot during calling convention

This seems like you are introducing a new meaning to `alignstack`, which, 
according to the comments, only affects function SP alignment, not parameter 
alignment.

I'm assuming the reason you can't use the regular `align` attribute is that it 
is overloaded to mean two things: the alignment of the pointer when applied to 
a pointer, and the alignment of the argument memory when that pointer argument 
is marked `byval`. If you want to resolve this ambiguity, it seems like 
something that should be discussed on llvm-dev with a wider audience.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903





[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-04-06 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

From the AAPCS64's Parameter Passing Rules section 
(https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules), 
I believe the proposed handling is correct. The HFA-related rules described in 
this section are:

  Stage B – Pre-padding and extension of arguments
  [...]
  B.2   If the argument type is an HFA or an HVA, then the argument is used 
unmodified.
  [...]

  Stage C – Assignment of arguments to registers and stack
  [...]
  C.2   If the argument is an HFA or an HVA and there are sufficient 
unallocated SIMD and Floating-point registers (NSRN + number of members <= 8), 
then the argument is allocated to SIMD and Floating-point Registers (with one 
register per member of the HFA or HVA). The NSRN is incremented by the number 
of registers used. The argument has now been allocated.
  C.3   If the argument is an HFA or an HVA then the NSRN is set to 8 and the 
size of the argument is rounded up to the nearest multiple of 8 bytes.
  C.4   If the argument is an HFA, an HVA, a Quad-precision Floating-point or 
Short Vector Type then the NSAA is rounded up to the larger of 8 or the Natural 
Alignment of the argument’s type.
  [...]

As per rule `C.4`, the argument should be allocated at a stack address 
rounded up to the larger of 8 and its Natural Alignment, which is 32 according 
to what is specified by the Composite Types rules in section 5.6 of that same 
document 
(https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#composite-types):

  5.6   Composite Types
  [...]
  - The natural alignment of a composite type is the maximum of each of the 
member alignments of the 'top-level' members of the composite type i.e. before 
any alignment adjustment of the entire composite is applied

In regards to compatibility with other compilers, I'm not sure that 
following what seems to be non-compliant behavior would be the best way to 
proceed. @rnk and @ostannard, what would be your take on this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903





[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-03-30 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

In D75903#1914715, @ostannard wrote:

> This means that `stk1` should be passed as a copy with alignment 16 bytes, 
> putting it at `sp+16`. GCC does this, clang without this patch passes it at 
> `sp+8`, and clang with this patch passes it at `sp+32`.


MSVC puts it at `sp+8`: https://gcc.godbolt.org/z/aAku9j

They allegedly follow this same document. Either way, we need to remain 
compatible.

Given that neither GCC nor MSVC does things the way you are proposing, are you 
sure this is correct? Which compiler does this change make us more compatible 
with?

I don't believe there is an issue from the user's standpoint, since Clang 
copies the entire argument into an appropriately aligned alloca. If it is 
address-taken and stored to, there will not be any faults.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903





[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-03-20 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

Ping.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903





[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-03-11 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas added a comment.

I believe rule B.2 from the AAPCS64 should take precedence over rule B.5 for 
HFA arguments:

  B.2   If the argument type is an HFA or an HVA, then the argument is used 
unmodified.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903





[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-03-10 Thread Oliver Stannard (Linaro) via Phabricator via cfe-commits
ostannard added a comment.

I've not looked at the code in detail yet, but I think this still gets the ABI 
wrong for this example:

  typedef struct {
__attribute__ ((__aligned__(32))) double v[4];
  } TYPE1;
  
  double func(double d0, double d1, double d2, double d3,
  double d4, double d5, double d6, double d7,
  float stk0, TYPE1 stk1) {
return stk1.v[0];
  }

The ABI says 
(https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst, rule 
B.5):

  If the argument is an alignment adjusted type its value is passed as a copy 
of the actual value. The copy will have an alignment defined as follows.
  * For a Fundamental Data Type, the alignment is the natural alignment of that 
type, after any promotions.
  * For a Composite Type, the alignment of the copy will have 8-byte alignment 
if its natural alignment is <= 8 and 16-byte alignment if its natural alignment 
is >= 16.
  The alignment of the copy is used for applying marshaling rules.

This means that `stk1` should be passed as a copy with alignment 16 bytes, 
putting it at `sp+16`. GCC does this, clang without this patch passes it at 
`sp+8`, and clang with this patch passes it at `sp+32`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903





[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-03-10 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas updated this revision to Diff 249360.
pratlucas added a comment.

Clang-format.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75903/new/

https://reviews.llvm.org/D75903

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/aarch64-args-hfa.c
  clang/test/CodeGen/arm64-arguments.c
  llvm/docs/LangRef.rst
  llvm/include/llvm/CodeGen/TargetCallingConv.h
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/include/llvm/IR/Argument.h
  llvm/include/llvm/IR/Attributes.h
  llvm/include/llvm/IR/Attributes.td
  llvm/include/llvm/IR/Function.h
  llvm/include/llvm/IR/InstrTypes.h
  llvm/lib/AsmParser/LLParser.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/lib/IR/Attributes.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/Verifier.cpp
  llvm/lib/Target/AArch64/AArch64CallingConvention.cpp
  llvm/test/Bitcode/compatibility.ll
  llvm/test/CodeGen/AArch64/arm64-abi-hfa-args.ll

Index: llvm/test/CodeGen/AArch64/arm64-abi-hfa-args.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/arm64-abi-hfa-args.ll
@@ -0,0 +1,54 @@
+; RUN: llc < %s -mtriple=arm64-none-eabi | FileCheck %s
+
+; Over-aligned HFA argument placed on register - one element per register
+define double @test_hfa_align_arg_reg([2 x double] alignstack(16) %h.coerce) local_unnamed_addr #0 {
+entry:
+; CHECK-LABEL: test_hfa_align_arg_reg:
+; CHECK-DAG: ret
+
+  %h.coerce.fca.0.extract = extractvalue [2 x double] %h.coerce, 0
+  ret double %h.coerce.fca.0.extract
+}
+
+; Call with over-aligned HFA argument placed on register - one element per register
+define double @test_hfa_align_call_reg() local_unnamed_addr #0 {
+entry:
+; CHECK-LABEL: test_hfa_align_call_reg:
+; CHECK-DAG: fmov	d0, #1.
+; CHECK-DAG: fmov	d1, #2.
+; CHECK-DAG: bl	test_hfa_align_arg_reg
+; CHECK-DAG: ldr	x30, [sp], #16  // 8-byte Folded Reload
+; CHECK-DAG: ret
+
+  %call = call double @test_hfa_align_arg_reg([2 x double] alignstack(16) [double 1.00e+00, double 2.00e+00])
+  ret double %call
+}
+
+; Over-aligned HFA argument placed on stack - stack round up to alignment
+define double @test_hfa_align_arg_stack(double %d0, double %d1, double %d2, double %d3, double %d4, double %d5, double %d6, double %d7, float %f, [2 x double] alignstack(16) %h.coerce) local_unnamed_addr #0 {
+entry:
+; CHECK-LABEL: test_hfa_align_arg_stack:
+; CHECK-DAG: ldr	d0, [sp, #16]
+; CHECK-DAG: ret
+
+  %h.coerce.fca.0.extract = extractvalue [2 x double] %h.coerce, 0
+  ret double %h.coerce.fca.0.extract
+}
+
+; Call with over-aligned HFA argument placed on stack - stack round up to alignment
+define double @test_hfa_align_call_stack() local_unnamed_addr #0 {
+entry:
+; CHECK-LABEL: test_hfa_align_call_stack:
+; CHECK-DAG: mov	x8, #4611686018427387904
+; CHECK-DAG: mov	x9, #4607182418800017408
+; CHECK-DAG: stp	x8, x30, [sp, #24]  // 8-byte Folded Spill
+; CHECK-DAG: str	x9, [sp, #16]
+; CHECK-DAG: bl	test_hfa_align_arg
+; CHECK-DAG: ldr	x30, [sp, #32]  // 8-byte Folded Reload
+; CHECK-DAG: add	sp, sp, #48 // =48
+; CHECK-DAG: ret
+
+  %call = call double @test_hfa_align_arg_stack(double undef, double undef, double undef, double undef, double undef, double undef, double undef, double undef, float undef, [2 x double] alignstack(16) [double 1.00e+00, double 2.00e+00])
+  ret double %call
+}
+
Index: llvm/test/Bitcode/compatibility.ll
===
--- llvm/test/Bitcode/compatibility.ll
+++ llvm/test/Bitcode/compatibility.ll
@@ -548,6 +548,8 @@
 ; CHECK: declare void @f.param.dereferenceable(i8* dereferenceable(4))
 declare void @f.param.dereferenceable_or_null(i8* dereferenceable_or_null(4))
 ; CHECK: declare void @f.param.dereferenceable_or_null(i8* dereferenceable_or_null(4))
+declare void @f.param.stack_align([2 x double] alignstack(16))
+; CHECK: declare void @f.param.stack_align([2 x double] alignstack(16))
 
 ; Functions -- unnamed_addr and local_unnamed_addr
 declare void @f.unnamed_addr() unnamed_addr
Index: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp
===
--- llvm/lib/Target/AArch64/AArch64CallingConvention.cpp
+++ llvm/lib/Target/AArch64/AArch64CallingConvention.cpp
@@ -43,11 +43,16 @@
   const Align StackAlign =
   State.getMachineFunction().getDataLayout().getStackAlignment();
   const Align OrigAlign(ArgFlags.getOrigAlign());
-  const Align Align = std::min(OrigAlign, StackAlign);
+  const Align Alignment = std::min(OrigAlign, StackAlign);
+
+  if (ArgFlags.getStackAlign()) {
+const Align ArgStackAlign(ArgFlags.getStackAlign());
+State.AllocateStack(0, ArgStackAlign.value());
+  }
 
  for (auto &It : PendingMembers) {
 

[PATCH] D75903: [AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS

2020-03-10 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas created this revision.
Herald added subscribers: llvm-commits, cfe-commits, danielkiss, hiraditya, 
kristof.beyls.
Herald added projects: clang, LLVM.
pratlucas added a child revision: D75904: [ARM][CodeGen] Fixing stack alignment 
of HFA arguments on AArch32 PCS.
pratlucas added reviewers: t.p.northover, rnk, olista01, bogner.

This patch makes the handling of over-aligned HFA arguments comply with the
AArch64 PCS when those arguments are placed on the stack. AAPCS64 specifies
that the stacked argument address should be rounded up to the Natural
Alignment of the argument before the argument is copied to memory.

Over-alignment information extracted from language attributes in Clang
was not properly propagated to the backend for those arguments, as it
does not map to the backend's base type alignments. As the standard also
specifies that, when placed in registers, a single FP register should be
allocated for each individual HFA element, type coercion is not suitable
for capturing the alignment constraints.

This patch fixes the alignment of these arguments by capturing their
stack alignment requirements in an IR argument attribute, making sure
this information is available for the calling convention lowering stage.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75903

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/aarch64-args-hfa.c
  clang/test/CodeGen/arm64-arguments.c
  llvm/docs/LangRef.rst
  llvm/include/llvm/CodeGen/TargetCallingConv.h
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/include/llvm/IR/Argument.h
  llvm/include/llvm/IR/Attributes.h
  llvm/include/llvm/IR/Attributes.td
  llvm/include/llvm/IR/Function.h
  llvm/include/llvm/IR/InstrTypes.h
  llvm/lib/AsmParser/LLParser.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/lib/IR/Attributes.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/Verifier.cpp
  llvm/lib/Target/AArch64/AArch64CallingConvention.cpp
  llvm/test/Bitcode/compatibility.ll
  llvm/test/CodeGen/AArch64/arm64-abi-hfa-args.ll
