** Description changed:

- 
  [Impact]
  
- This bug causes data corruption in the ARM64 code compiled with Scalable 
Vector Extensions (SVE) enabled for the 256-bit SVE processor but executed on 
128-bit SVE processors. 
- Example is AWS workload built for Graviton3, but executed on Graviton4. 
+ This bug causes data corruption in ARM64 code compiled with the Scalable
Vector Extension (SVE) enabled for a 256-bit SVE processor but executed on a
128-bit SVE processor.
+ An example is an AWS workload built for Graviton3 but executed on Graviton4.
  
  When compiling the ~ConstA (bitwise NOT of ConstA) expression used to
  compute an index into the vector, the compiler was actually computing
  -ConstA (minus ConstA), e.g. ~4 produced -4 instead of the correct -5.
  
  Graviton 4 processes a 256-bit vector in two passes. For the second
  pass it runs into this bug when computing indices into the second half
  of the vector and ends up with {-4, -5, -6, -7}, processing the last
  element of the first half twice and never touching the last element of
  the vector.
  
  This data corruption may cause data loss, failing checksums, and
  potentially security issues.
  
  [Test Plan]
  
  I was using a Raspberry Pi 5 for testing, but any other ARM64 platform
  virtual machine will be sufficient.
  
  Install QEMU in noble:
  
  apt install qemu-user-static
  
  Launch lxd vm for the affected release, e.g.
  
  lxc launch ubuntu-daily:jammy tester
  lxc file push test.c tester/home/ubuntu/
  
  Install the affected gcc:
  lxc exec tester -- /bin/sh -c "apt-get update && apt-get install -y gcc-9"
  
  Compile the reproducer[1]:
  lxc exec tester -- /bin/sh -c "gcc-9 -fno-inline -O3 -Wall -fno-strict-aliasing -march=armv8.4-a+sve -o /home/ubuntu/final /home/ubuntu/test.c"
  
  Pull the compiled binary back to the host:
  lxc file pull tester/home/ubuntu/final final
  
  Execute the testcase:
  qemu-aarch64-static -cpu neoverse-n2 ./final
  
  If the bug is fixed, the testcase will output:
  PASS: got 0x00bbbbbb 0x00aaaaaa as expected
  Otherwise it will output:
- ERROR: expected 0x00bbbbbb 0x00aaaaaa but got 0x00bbbbbb 0xaaaaaa00 
+ ERROR: expected 0x00bbbbbb 0x00aaaaaa but got 0x00bbbbbb 0xaaaaaa00
  
  [Where problems could occur]
  
  The issue is a typo in the code that is used to calculate the offset
  into the vector.
  
  The already corrupted data (e.g. checksums) calculated by the affected
  code will not match the values produced after the fix. This may
  cause the end user to rebuild the indices relying on the calculated hash
  values after their workloads are recompiled by the fixed gcc.
  
  [Other info]
  
  Focal fixes will be done through the -pro updates.
  
  I have run the test case and set the bug tasks Invalid for the versions
  that are not affected by this issue.
  
  Affected:
  All gcc-8[2]
  All gcc-9[2]
  All gcc-11[2]
- Noble and down Gcc-12 
+ Noble and down gcc-12
  Noble and down gcc-13
  Noble and down gcc-14
  gcc-15 is not affected
  
+ The fixed packages will be uploaded to the stable PPA[3] created for this 
SRU. 
+ The PPA depends on -security only. The packages will need to be binary-copied 
to -updates and -security. 
+ 
  [1] 
https://bugs.launchpad.net/ubuntu/plucky/+source/gcc-14/+bug/2101084/comments/39
  [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976#c21
- 
+ [3] https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/lp-2101084
  
  Original Description:
  
  [Impact]
  This issue affects SVE vectorization on arm64 platforms, specifically in 
cases where bitwise-not operations are applied during optimization.
  
  [Fix]
  This issue has been resolved by an upstream patch.
  
  commit 78380fd7f743e23dfdf013d68a2f0347e1511550
  Author: Richard Sandiford <[email protected]>
  Date: Tue Mar 4 10:44:35 2025 +0000
  
      Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976]
  
      There was an embarrassing typo in the folding of BIT_NOT_EXPR for
      POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
      how that happened, but it might have been due to the way that
      ~x is implemented as -1 - x internally.
  
      gcc/
              PR tree-optimization/118976
              * fold-const.cc (const_unop): Use ~ rather than - for 
BIT_NOT_EXPR.
              * config/aarch64/aarch64.cc (aarch64_test_sve_folding): New 
function.
              (aarch64_run_selftests): Run it.
  
  [Test Plan]
  1. Launch an instance using the latest generation of Graviton processors 
(Graviton4).
  2. Compile the following code using the command `gcc -O3 
-march=armv8.1-a+sve`:
  
  #include <stdint.h>
  #include <stdio.h>
  
  #ifndef NCOUNTS
  #define NCOUNTS 2
  #endif
  typedef struct {
     uint32_t state[5];
     uint32_t count[NCOUNTS];
     unsigned char buffer[64];
  } SHA1_CTX;
  
  void finalcount_av(SHA1_CTX *restrict ctx, unsigned char *restrict 
finalcount) {
     // ctx->count is:  uint32_t count[2];
     int count_idx;
     for (int i = 0; i < 4*NCOUNTS; i++) {
         count_idx = (4*NCOUNTS - i - 1)/4; // generic but equivalent for 
NCOUNTS==2.
         finalcount[i] = (unsigned char)((ctx->count[count_idx] >> ((3-(i & 3)) 
* 8) ) & 255);
     }
  }
  
  void finalcount_bv(SHA1_CTX *restrict ctx, unsigned char *restrict 
finalcount) {
     for (int i=0; i < 4*NCOUNTS; i += 4) {
         int ci = (4*NCOUNTS - i - 1)/4;
         finalcount[i+0] = (unsigned char)((ctx->count[ci] >> (3 * 8) ) & 255);
         finalcount[i+1] = (unsigned char)((ctx->count[ci] >> (2 * 8) ) & 255);
         finalcount[i+2] = (unsigned char)((ctx->count[ci] >> (1 * 8) ) & 255);
         finalcount[i+3] = (unsigned char)((ctx->count[ci] >> (0 * 8) ) & 255);
     }
  }
  
  int main() {
     unsigned char fa[NCOUNTS*4];
     unsigned char fb[NCOUNTS*4];
     uint32_t *for_print;
     int i;
  
     SHA1_CTX ctx;
     ctx.count[0] = 0xaaaaaa00;
     ctx.count[1] = 0xbbbbbb00;
     if (NCOUNTS >2 ) ctx.count[2] = 0xcccccc00;
     if (NCOUNTS >3 ) ctx.count[3] = 0xdddddd00;
     finalcount_av(&ctx, fa);
     finalcount_bv(&ctx, fb);
  
     int ok = 1;
     for (i=0; i<NCOUNTS*4; i++) {
         ok &= fa[i] == fb[i];
     }
     if (!ok) {
         for_print = (uint32_t*)fb;
         printf("ERROR: expected ");
         for (i=0; i<NCOUNTS; i++) {
             printf("0x%08x ",for_print[i]);
         }
         for_print = (uint32_t*)fa;
         printf("but got ");
         for (i=0; i<NCOUNTS; i++) {
             printf("0x%08x ",for_print[i]);
         }
         printf("\n");
         return 1;
     } else {
         for_print = (uint32_t*)fa;
         printf("PASS: got ");
         for (i=0; i<NCOUNTS; i++) {
             printf("0x%08x ",for_print[i]);
         }
         printf("as expected\n");
         return 0;
     }
  }
  
  3. Verify that the execution output does not contain the string "ERROR".
  
  [Where problems could occur]
  The issue is caused by a typo. If any regressions occur, they are
  expected to affect only the folding of specific expressions in certain
  scenarios, rather than disrupting overall compiler functionality.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2101084

Title:
  GCC produces wrong code for arm64+sve in some cases

To manage notifications about this bug go to:
https://bugs.launchpad.net/gcc/+bug/2101084/+subscriptions

