https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95816

            Bug ID: 95816
           Summary: Aarch64 jumps between Hot/Cold sections use possibly
                    clobbered registers x16/x17
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---
            Target: AArch64

When a function is split into two different sections (hot/cold), the assembler
produces a relocation on jumps between the two sections.

The linker is permitted to use a veneer to implement such a relocated jump.

The registers x16 and x17 are reserved for use in those veneers.

Hence the registers x16 and x17 should be treated as clobbered when jumping
between the hot/cold sections in a function.

GCC does not currently do this.
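As a rough illustration of why a veneer may appear at all: the B and BL
instructions carry a signed 26-bit word offset, so a direct branch reaches
roughly +/-128 MiB. The sketch below checks whether a displacement fits in that
range. The helper name jump26_in_range and the macro names are hypothetical,
and range is only the most common reason for a veneer; the linker is not
required to limit veneers to out-of-range targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* B/BL encode a signed 26-bit offset in units of 4 bytes, so the
   reachable byte displacement is [-2^27, 2^27 - 4].  */
#define JUMP26_MIN (-(INT64_C(1) << 27))
#define JUMP26_MAX ((INT64_C(1) << 27) - 4)

/* Hypothetical helper: true if a direct branch located at `from` can
   reach `to` without help.  When this is false the linker has to route
   the branch through a veneer, which may overwrite x16/x17.  */
static bool jump26_in_range(uint64_t from, uint64_t to)
{
    int64_t disp = (int64_t)(to - from);

    /* A64 instructions are 4-byte aligned, so the offset must be too.  */
    if (disp % 4 != 0)
        return false;

    return disp >= JUMP26_MIN && disp <= JUMP26_MAX;
}
```

Because the compiler cannot know at compile time how far apart .text and
.text.unlikely will end up, it cannot assume this check passes at link time.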

We can use the testcase below to demonstrate this (modified from predict-22.c
in the testsuite).

---------
$ aarch64-none-linux-gnu-gcc \
>    predict-22.c \
>    -O2 -w -fPIC -freorder-blocks-and-partition \
>    -c -o predict-22.o

---------
volatile int v;
void bar (void) __attribute__((leaf, cold));
void baz (int *);
void alt (long long);

void
foo (int x, int y, int z)
{
  static int f __attribute__((section ("mysection")));
  register long long k asm ("x16");
  __asm__ ("mov\t%0, 10" : "=r" (k) : "0" (k));
  f = 1;
  if (__builtin_expect (x, 0))
  if (__builtin_expect (y, 0))
  if (__builtin_expect (z, 0))
    {
      f = 2;
      k *= 13;
      bar ();
      v += 1;
      v *= 2;
      v /= 2;
      v -= 1;
      v += 1;
      v *= 2;
      v /= 2;
      v -= 1;
      v += 1;
      v *= 2;
      v /= 2;
      v -= 1;
      v += k;
      v *= 2;
      v /= 2;
      v -= 1;
      v += 1;
      v *= 2;
      v /= 2;
      v -= 1;
      v += 1;
      v *= 2;
      v /= 2;
      v -= 1;
      v += 1;
      v *= 2;
      v /= 2;
      v -= 1;
      v += 1;
      v *= 2;
      v /= 2;
      v -= 1;
      f = 3;
      __builtin_abort ();
    }
  f = k;
  baz (&f);
}

--------

This produces an object file, which is dumped below.
The dump shows that there is an R_AARCH64_JUMP26 relocation on the jump between
the hot/cold sections, and that the value stored in x16 is used after that
jump.



$ aarch64-none-linux-gnu-objdump -dr predict-22.o
predict-22.o:     file format elf64-littleaarch64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   7100003f        cmp     w1, #0x0
   4:   7a401844        ccmp    w2, #0x0, #0x4, ne  // ne = any
   8:   7a401804        ccmp    w0, #0x0, #0x4, ne  // ne = any
   c:   d2800150        mov     x16, #0xa                       // #10
  10:   540000a1        b.ne    24 <foo+0x24>  // b.any
  14:   90000001        adrp    x1, 0 <foo>
                        14: R_AARCH64_ADR_PREL_PG_HI21  .bss
  18:   91000020        add     x0, x1, #0x0
                        18: R_AARCH64_ADD_ABS_LO12_NC   .bss
  1c:   b9000030        str     w16, [x1]
                        1c: R_AARCH64_LDST32_ABS_LO12_NC        .bss
  20:   14000000        b       0 <baz>
                        20: R_AARCH64_JUMP26    baz
  24:   a9bd7bfd        stp     x29, x30, [sp, #-48]!
  28:   910003fd        mov     x29, sp
  2c:   a90153f3        stp     x19, x20, [sp, #16]
  30:   f90013f5        str     x21, [sp, #32]
  34:   14000000        b       0 <foo>         # Here is the relocation.
                        34: R_AARCH64_JUMP26    .text.unlikely

Disassembly of section .text.unlikely:

0000000000000000 <foo.cold>:
   0:   90000015        adrp    x21, 0 <foo.cold>
                        0: R_AARCH64_ADR_PREL_PG_HI21   .bss
   4:   52800053        mov     w19, #0x2                       // #2
   8:   aa1003f4        mov     x20, x16        # Here we try and use the clobbered x16 register.
   c:   b90002b3        str     w19, [x21]
                        c: R_AARCH64_LDST32_ABS_LO12_NC .bss
  10:   94000000        bl      0 <bar>
                        10: R_AARCH64_CALL26    bar
  14:   90000000        adrp    x0, 4 <foo.cold+0x4>
                        14: R_AARCH64_ADR_GOT_PAGE      v
  18:   d28001a3        mov     x3, #0xd                        // #13
  1c:   52800062        mov     w2, #0x3                        // #3
  20:   f9400000        ldr     x0, [x0]
                        20: R_AARCH64_LD64_GOT_LO12_NC  v
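
The branch words in the dump above (14000000 for b, 94000000 for bl) have a
zero offset field because the linker fills it in when it resolves the
R_AARCH64_JUMP26 / R_AARCH64_CALL26 relocation. A minimal sketch of decoding
those words follows; the function names are hypothetical and are not part of
binutils or GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `insn` is an unconditional immediate branch.  Bits [30:26]
   are 0b00101 for both B and BL; bit 31 distinguishes them.  */
static bool is_b_or_bl(uint32_t insn)
{
    return (insn & 0x7c000000u) == 0x14000000u;
}

/* True if `insn` is BL (bit 31 set) rather than B.  */
static bool is_bl(uint32_t insn)
{
    return is_b_or_bl(insn) && (insn >> 31) != 0;
}

/* Byte offset encoded in the imm26 field: sign-extend the low 26 bits
   and scale by the 4-byte instruction size.  */
static int64_t branch_offset(uint32_t insn)
{
    int64_t imm26 = (int64_t)(insn & 0x03ffffffu);

    if (imm26 & 0x02000000)
        imm26 -= 0x04000000;
    return imm26 * 4;
}
```

Applied to the dump, the word at offset 0x34 of foo decodes as a B with offset
0, which is why objdump prints its target as 0 <foo> until the relocation is
applied.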
