[Bug 2101084] Re: GCC produces wrong code for arm64+sve in some cases

Vladimir Petko Wed, 17 Dec 2025 00:10:44 -0800

** Changed in: gcc-13 (Ubuntu Plucky)
       Status: Won't Fix => Invalid


** Changed in: gcc-14 (Ubuntu Questing)
       Status: Won't Fix => Invalid

** Changed in: gcc-14 (Ubuntu Resolute)
       Status: New => Invalid

** Changed in: gcc-14 (Ubuntu Plucky)
       Status: Won't Fix => Invalid

** No longer affects: gcc-8 (Ubuntu Resolute)

** No longer affects: gcc-9 (Ubuntu Resolute)

** Description changed:

+ 
+ [Impact]
+ 
+ This bug causes data corruption in ARM64 code that is compiled with the
+ Scalable Vector Extension (SVE) enabled for a 256-bit SVE processor but
+ executed on a 128-bit SVE processor.
+ An example is an AWS workload built for Graviton3 but executed on Graviton4.
+ 
+ When compiling a ~ConstA (bitwise NOT of ConstA) expression to compute an
+ index into the vector, the compiler actually computed -ConstA (minus
+ ConstA); e.g. ~4 produced -4 instead of -5.
+ 
+ Graviton4 processes a 256-bit vector in two passes. In the second pass
+ it hits this bug when computing indices into the second half of the
+ vector and ends up with {-4, -5, -6, -7}, so it processes the last
+ element of the first half twice and never touches the last element of
+ the vector.
+ 
+ This data corruption may cause data loss, failing checksums, and
+ potentially security issues.
+ 
+ [Test Plan]
+ 
+ I used a Raspberry Pi 5 for testing, but any other ARM64 platform or
+ virtual machine will suffice.
+ 
+ Install QEMU in noble:
+ 
+ apt install qemu-user-static
+ 
+ Launch an LXD instance for the affected release, e.g.
+ 
+ lxc launch ubuntu-daily:jammy tester
+ lxc file push test.c tester/home/ubuntu/
+ 
+ Install affected gcc:
+ lxc exec tester -- /bin/sh -c "apt-get update && apt-get install -y gcc-9"
+ 
+ Compile the reproducer[1]:
+ lxc exec tester -- /bin/sh -c "gcc-9 -fno-inline -O3 -Wall -fno-strict-aliasing -march=armv8.4-a+sve -o /home/ubuntu/final /home/ubuntu/test.c"
+ 
+ Fetch the compiled binary:
+ lxc file pull tester/home/ubuntu/final final
+ 
+ Execute the testcase:
+ qemu-aarch64-static -cpu neoverse-n2 ./final
+ 
+ The testcase will output:
+ PASS: got 0x00bbbbbb 0x00aaaaaa as expected
+ if the bug is fixed, and
+ ERROR: expected 0x00bbbbbb 0x00aaaaaa but got 0x00bbbbbb 0xaaaaaa00
+ otherwise.
+ 
+ [Where the problems can occur]
+ 
+ The issue is a typo in the code that calculates an offset into the
+ vector.
+ 
+ Data already corrupted by the affected code (e.g. checksums) will not
+ match the values produced after the fix. End users may therefore need
+ to rebuild any indices that rely on the calculated hash values once
+ their workloads are recompiled with the fixed gcc.
+ 
+ [Other info]
+ 
+ Focal fixes will be done through the -pro updates.
+ 
+ I have run the test case and set Invalid for the versions that are not
+ affected by this issue.
+ 
+ Affected:
+ All gcc-8[2]
+ All gcc-9[2]
+ All gcc-11[2]
+ gcc-12 in Noble and earlier
+ gcc-13 in Noble and earlier
+ gcc-14 in Noble and earlier
+ gcc-15 is not affected
+ 
+ [1] https://bugs.launchpad.net/ubuntu/plucky/+source/gcc-14/+bug/2101084/comments/39
+ [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976#c21
+ 
+ 
+ Original Description:
+ 
  [Impact]
  This issue affects SVE vectorization on arm64 platforms, specifically in cases where bitwise-not operations are applied during optimization.
  
  [Fix]
  This issue has been resolved by an upstream patch.
  
  commit 78380fd7f743e23dfdf013d68a2f0347e1511550
  Author: Richard Sandiford <[email protected]>
  Date: Tue Mar 4 10:44:35 2025 +0000
  
      Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976]
  
      There was an embarrassing typo in the folding of BIT_NOT_EXPR for
      POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
      how that happened, but it might have been due to the way that
      ~x is implemented as -1 - x internally.
  
      gcc/
              PR tree-optimization/118976
              * fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
              * config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
              (aarch64_run_selftests): Run it.
  
  [Test Plan]
  1. Launch an instance using the latest generation of Graviton processors (Graviton4).
  2. Compile the following code using the command `gcc -O3 -march=armv8.1-a+sve`:
  
  #include <stdint.h>
  #include <stdio.h>
  
- 
  #ifndef NCOUNTS
  #define NCOUNTS 2
  #endif
  typedef struct {
-    uint32_t state[5];
-    uint32_t count[NCOUNTS];
-    unsigned char buffer[64];
+    uint32_t state[5];
+    uint32_t count[NCOUNTS];
+    unsigned char buffer[64];
  } SHA1_CTX;
  
- 
  void finalcount_av(SHA1_CTX *restrict ctx, unsigned char *restrict finalcount) {
-    // ctx->count is:  uint32_t count[2];
-    int count_idx;
-    for (int i = 0; i < 4*NCOUNTS; i++) {
-        count_idx = (4*NCOUNTS - i - 1)/4; // generic but equivalent for NCOUNTS==2.
-        finalcount[i] = (unsigned char)((ctx->count[count_idx] >> ((3-(i & 3)) * 8) ) & 255);
-    }
+    // ctx->count is:  uint32_t count[2];
+    int count_idx;
+    for (int i = 0; i < 4*NCOUNTS; i++) {
+        count_idx = (4*NCOUNTS - i - 1)/4; // generic but equivalent for NCOUNTS==2.
+        finalcount[i] = (unsigned char)((ctx->count[count_idx] >> ((3-(i & 3)) * 8) ) & 255);
+    }
  }
  
- 
  void finalcount_bv(SHA1_CTX *restrict ctx, unsigned char *restrict finalcount) {
-    for (int i=0; i < 4*NCOUNTS; i += 4) {
-        int ci = (4*NCOUNTS - i - 1)/4;
-        finalcount[i+0] = (unsigned char)((ctx->count[ci] >> (3 * 8) ) & 255);
-        finalcount[i+1] = (unsigned char)((ctx->count[ci] >> (2 * 8) ) & 255);
-        finalcount[i+2] = (unsigned char)((ctx->count[ci] >> (1 * 8) ) & 255);
-        finalcount[i+3] = (unsigned char)((ctx->count[ci] >> (0 * 8) ) & 255);
-    }
+    for (int i=0; i < 4*NCOUNTS; i += 4) {
+        int ci = (4*NCOUNTS - i - 1)/4;
+        finalcount[i+0] = (unsigned char)((ctx->count[ci] >> (3 * 8) ) & 255);
+        finalcount[i+1] = (unsigned char)((ctx->count[ci] >> (2 * 8) ) & 255);
+        finalcount[i+2] = (unsigned char)((ctx->count[ci] >> (1 * 8) ) & 255);
+        finalcount[i+3] = (unsigned char)((ctx->count[ci] >> (0 * 8) ) & 255);
+    }
  }
  
+ int main() {
+    unsigned char fa[NCOUNTS*4];
+    unsigned char fb[NCOUNTS*4];
+    uint32_t *for_print;
+    int i;
  
- int main() {
-    unsigned char fa[NCOUNTS*4];
-    unsigned char fb[NCOUNTS*4];
-    uint32_t *for_print;
-    int i;
-   
-    SHA1_CTX ctx;
-    ctx.count[0] = 0xaaaaaa00;
-    ctx.count[1] = 0xbbbbbb00;
-    if (NCOUNTS >2 ) ctx.count[2] = 0xcccccc00;
-    if (NCOUNTS >3 ) ctx.count[3] = 0xdddddd00;
-    finalcount_av(&ctx, fa);
-    finalcount_bv(&ctx, fb);
+    SHA1_CTX ctx;
+    ctx.count[0] = 0xaaaaaa00;
+    ctx.count[1] = 0xbbbbbb00;
+    if (NCOUNTS >2 ) ctx.count[2] = 0xcccccc00;
+    if (NCOUNTS >3 ) ctx.count[3] = 0xdddddd00;
+    finalcount_av(&ctx, fa);
+    finalcount_bv(&ctx, fb);
  
- 
-    int ok = 1;
-    for (i=0; i<NCOUNTS*4; i++) {
-        ok &= fa[i] == fb[i];
-    }
-    if (!ok) {
-        for_print = (uint32_t*)fb;
-        printf("ERROR: expected ");
-        for (i=0; i<NCOUNTS; i++) {
-            printf("0x%08x ",for_print[i]);
-        }
-        for_print = (uint32_t*)fa;
-        printf("but got ");
-        for (i=0; i<NCOUNTS; i++) {
-            printf("0x%08x ",for_print[i]);
-        }
-        printf("\n");
-        return 1;
-    } else {
-        for_print = (uint32_t*)fa;
-        printf("PASS: got ");
-        for (i=0; i<NCOUNTS; i++) {
-            printf("0x%08x ",for_print[i]);
-        }
-        printf("as expected\n");
-        return 0;
-    }
+    int ok = 1;
+    for (i=0; i<NCOUNTS*4; i++) {
+        ok &= fa[i] == fb[i];
+    }
+    if (!ok) {
+        for_print = (uint32_t*)fb;
+        printf("ERROR: expected ");
+        for (i=0; i<NCOUNTS; i++) {
+            printf("0x%08x ",for_print[i]);
+        }
+        for_print = (uint32_t*)fa;
+        printf("but got ");
+        for (i=0; i<NCOUNTS; i++) {
+            printf("0x%08x ",for_print[i]);
+        }
+        printf("\n");
+        return 1;
+    } else {
+        for_print = (uint32_t*)fa;
+        printf("PASS: got ");
+        for (i=0; i<NCOUNTS; i++) {
+            printf("0x%08x ",for_print[i]);
+        }
+        printf("as expected\n");
+        return 0;
+    }
  }
  
  3. Verify that the execution output does not contain the string "ERROR".
  
  [Where problems could occur]
  The issue is caused by a typo. If any regressions occur, they are expected to impact only specific partial instructions under certain scenarios, rather than disrupting the overall functionality.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2101084

Title:
  GCC produces wrong code for arm64+sve in some cases

To manage notifications about this bug go to:
https://bugs.launchpad.net/gcc/+bug/2101084/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
