I have found another ABI compliance bug in the AArch64 backend (arm64-gen.c).
According to AAPCS64, when an argument requires 16-byte alignment and
is passed on the stack, the stack address must be rounded up to the
next 16-byte boundary.
Currently, TCC does not apply this rounding for non-HFA types in
gen_va_arg. It loads the argument directly from the current __stack
position of the va_list and ignores the padding. As a result, va_arg
returns the wrong bytes when a 16-byte-aligned argument follows an
8-byte argument on the stack.
I have checked x86_64-gen.c and riscv64-gen.c, and I did not observe
similar issues in those backends.
----------------------------------
Reproduction Code:
#include <stdarg.h>
#include <stdio.h>
#include <stdint.h>
typedef struct __attribute__((aligned(16))) A16 {
    uint64_t lo;
    uint64_t hi;
} A16;

static int check(int dummy, ...)
{
    va_list ap;
    uint64_t first;
    A16 second;

    va_start(ap, dummy);
    // The first argument takes 8 bytes on the stack
    // (once the argument registers are exhausted).
    first = va_arg(ap, uint64_t);
    // The second argument requires 16-byte alignment.
    // TCC currently reads from offset 8 instead of offset 16 (padding ignored).
    second = va_arg(ap, A16);
    va_end(ap);

    if (first != 0x1122334455667788ULL)
        return 1;
    if (second.lo != 0xaaaaaaaaaaaaaaaaULL ||
        second.hi != 0xbbbbbbbbbbbbbbbbULL)
        return 2;
    return 0;
}
int main(void)
{
    // On Mach-O/Apple Silicon, variadic arguments are passed entirely on
    // the stack, so the uint64_t sits at offset 0 and v at offset 16.
    // On AAPCS64 ELF targets these two arguments travel in x1 and x2/x3,
    // so the stack path is only reached once the GP registers are exhausted.
    A16 v = { 0xaaaaaaaaaaaaaaaaULL, 0xbbbbbbbbbbbbbbbbULL };
    if (check(0, 0x1122334455667788ULL, v) != 0) {
        puts("FAIL");
        return 1;
    }
    puts("OK");
    return 0;
}
------------------------
The patch:
diff --git a/arm64-gen.c b/arm64-gen.c
index 2038aeba..bbe63fa6 100644
--- a/arm64-gen.c
+++ b/arm64-gen.c
@@ -1355,6 +1355,10 @@ ST_FUNC void gen_va_arg(CType *t)
o(0x540000ad); // b.le .+20
#endif
o(0xf9400000 | r1 | r0 << 5); // ldr x(r1),[x(r0)] // __stack
+ if (align == 16) {
+ o(0x91003c00 | r1 | r1 << 5); // add x(r1),x(r1),#15
+ o(0x927cec00 | r1 | r1 << 5); // and x(r1),x(r1),#-16
+ }
o(0x9100001e | r1 << 5 | n << 10); // add x30,x(r1),#(n)
o(0xf900001e | r0 << 5); // str x30,[x(r0)] // __stack
#if !defined(TCC_TARGET_MACHO)
With this change applied, the issue is fixed and the reproducer produces the correct result.
_______________________________________________
Tinycc-devel mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/tinycc-devel