[Bug tree-optimization/114864] [12/13/14/15 regression] wrong code at -O1 with "-fno-tree-dce -fno-tree-fre" on x86_64-linux-gnu

2024-04-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114864

H.J. Lu  changed:

   What|Removed |Added

 CC||ebotcazou at gcc dot gnu.org

--- Comment #2 from H.J. Lu  ---
It is caused by r12-434.

[Bug rtl-optimization/114828] [14 Regression] ICE on valid code at -O1 with "-ftree-pre -fselective-scheduling -fsel-sched-pipelining -fschedule-insns" on x86_64-linux-gnu: Segmentation fault

2024-04-23 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114828

H.J. Lu  changed:

   What|Removed |Added

 Ever confirmed|0   |1
Version|unknown |14.0
 Status|UNCONFIRMED |NEW
Summary|ICE on valid code at -O1|[14 Regression] ICE on
   |with "-ftree-pre|valid code at -O1 with
   |-fselective-scheduling  |"-ftree-pre
   |-fsel-sched-pipelining  |-fselective-scheduling
   |-fschedule-insns" on|-fsel-sched-pipelining
   |x86_64-linux-gnu:   |-fschedule-insns" on
   |Segmentation fault  |x86_64-linux-gnu:
   ||Segmentation fault
 CC||rguenther at suse dot de
   Last reconfirmed||2024-04-23

--- Comment #1 from H.J. Lu  ---
This is caused by r14-4089.

[Bug tree-optimization/114796] [11/12/13/14 Regression] wrong code at -O2 with "-fno-tree-fre -fno-inline -fselective-scheduling2" on x86_64-linux-gnu

2024-04-21 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114796

H.J. Lu  changed:

   What|Removed |Added

Summary|wrong code at -O2 with  |[11/12/13/14 Regression]
   |"-fno-tree-fre -fno-inline  |wrong code at -O2 with
   |-fselective-scheduling2" on |"-fno-tree-fre -fno-inline
   |x86_64-linux-gnu|-fselective-scheduling2" on
   ||x86_64-linux-gnu
   Last reconfirmed||2024-04-21
 CC||abel at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
Version|unknown |14.0
 Ever confirmed|0   |1

--- Comment #1 from H.J. Lu  ---
This is caused by r9-6789.

[Bug tree-optimization/114793] [14 Regression] wrong code at -O1 with "-fschedule-insns2 -fselective-scheduling2" on x86_64-linux-gnu (the generated code hangs)

2024-04-21 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114793

--- Comment #3 from H.J. Lu  ---
(In reply to Zhendong Su from comment #1)
> The following reproducer is different, but perhaps is the same or related.
> 
> Compiler Explorer: https://godbolt.org/z/411rzMP1n
> 
> [588] % gcctk -v
> Using built-in specs.
> COLLECT_GCC=gcctk
> COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/
> x86_64-pc-linux-gnu/14.0.1/lto-wrapper
> Target: x86_64-pc-linux-gnu
> Configured with: ../gcc-trunk/configure --disable-bootstrap
> --enable-checking=yes --prefix=/local/suz-local/software/local/gcc-trunk
> --enable-sanitizers --enable-languages=c,c++ --disable-werror
> --enable-multilib
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 14.0.1 20240421 (experimental) (GCC) 
> [589] % 
> [589] % gcctk -O1 -fno-tree-forwprop -fselective-scheduling2
> -fschedule-insns2 -fsel-sched-pipelining small.c
> [590] % ./a.out
> Aborted
> [591] % 
> [591] % cat small.c
> int printf(const char *, ...);
> int a, d, g, h;
> volatile int b = 1;
> static unsigned c = 1;
> char e, f = 1, i;
> static int j() {
>   int k, l = g, m = 1 << l, n = -e, o = -1 % ((f && 1) ^ i), p = ~n - o;
>   if (m) {
>     int q, s, t, r = 1 % (((1 % f) & (~e | c)) ^ b);
>     q = f;
>     s = i;
>     t = e;
>     f = -b;
>     k = f;
>     d = -1;
>   u:
>     e = 0 & b;
>     if (i > f)
>       if (!b)
>         goto v;
>     if (d > t)
>       __builtin_abort();
>     if (b < 1 || !d || !c) {
>       printf("%d\n", i);
>       f = ((i | b) & (k - r)) << (e << ~t ^ q) << s;
>       goto u;
>     }
>     if (i)
>       f = q;
>   v:
>     i = n & o & l;
>     printf("%ld\n", (long)t);
>   }
>   i = p;
>   return h;
> }
> int main() {
>   for (; a < 3; a++)
>     j();
>   return 0;
> }

This is caused by r14-2524.

[Bug tree-optimization/114793] wrong code at -O1 with "-fschedule-insns2 -fselective-scheduling2" on x86_64-linux-gnu (the generated code hangs)

2024-04-21 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114793

H.J. Lu  changed:

   What|Removed |Added

 CC||jh at suse dot cz
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-04-21
Version|unknown |14.0

--- Comment #2 from H.J. Lu  ---
(In reply to Zhendong Su from comment #0)
> It seems to be a recent regression as it does not reproduce with 13.2 and
> earlier.
> 
> Compiler Explorer: https://godbolt.org/z/b3cc1MqP9
> 
> [538] % gcctk -v
> Using built-in specs.
> COLLECT_GCC=gcctk
> COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/
> x86_64-pc-linux-gnu/14.0.1/lto-wrapper
> Target: x86_64-pc-linux-gnu
> Configured with: ../gcc-trunk/configure --disable-bootstrap
> --enable-checking=yes --prefix=/local/suz-local/software/local/gcc-trunk
> --enable-sanitizers --enable-languages=c,c++ --disable-werror
> --enable-multilib
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 14.0.1 20240421 (experimental) (GCC) 
> [539] % 
> [539] % gcctk -O0 small.c
> [540] % ./a.out
> [541] % 
> [541] % gcctk -O1 -fschedule-insns2 -fselective-scheduling2 small.c
> [542] % timeout -s 9 10 ./a.out
> Killed
> [543] % 
> [543] % cat small.c
> int printf(const char *, ...);
> volatile int a;
> int b, c, d = 1, e, f;
> int main() {
>   int g = 1;
>   for (; b; b -= d)
>     g = e;
>   for (; c < 2; c++) {
>     if (g) {
>       if (!d)
>         printf("%d", f);
>       continue;
>     }
>     a;
>   }
>   return 0;
> }

This is caused by r14-2712.

[Bug tree-optimization/114792] ICE on valid code at -O1 with "-fno-tree-ccp -fno-tree-copy-prop" on x86_64-linux-gnu: in get_loop_body, at cfgloop.cc:903

2024-04-21 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114792

H.J. Lu  changed:

   What|Removed |Added

   Last reconfirmed||2024-04-21
 CC||jh at suse dot cz
 Ever confirmed|0   |1
Version|unknown |14.0
 Status|UNCONFIRMED |NEW

--- Comment #1 from H.J. Lu  ---
It is caused by r14-301.

[Bug gcov-profile/114115] xz-utils segfaults when built with -fprofile-generate (bad interaction between IFUNC and binding?)

2024-04-15 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115

H.J. Lu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #21 from H.J. Lu  ---
Fixed for GCC 14 and GCC 11/12/13 release branches.

[Bug target/114696] ICE: in extract_constrain_insn_cached, at recog.cc:2725 insn does not satisfy its constraints: {*anddi_1} with -mapxf -mx32

2024-04-15 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114696

H.J. Lu  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from H.J. Lu  ---
Fixed.

[Bug gcov-profile/114115] xz-utils segfaults when built with -fprofile-generate (bad interaction between IFUNC and binding?)

2024-04-14 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115

--- Comment #17 from H.J. Lu  ---
(In reply to Jan Hubicka from comment #15)
> > Fixed for GCC 14 so far
> It is simple patch, so backporting is OK after a week in mainline.

These are patches which I am backporting:

https://patchwork.sourceware.org/project/gcc/list/?series=32823

[Bug target/114696] ICE: in extract_constrain_insn_cached, at recog.cc:2725 insn does not satisfy its constraints: {*anddi_1} with -mapxf -mx32

2024-04-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114696

H.J. Lu  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

--- Comment #3 from H.J. Lu  ---
A patch is posted at

https://patchwork.sourceware.org/project/gcc/list/?series=32811

[Bug target/114696] ICE: in extract_constrain_insn_cached, at recog.cc:2725 insn does not satisfy its constraints: {*anddi_1} with -mapxf -mx32

2024-04-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114696

H.J. Lu  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |hjl.tools at gmail dot com

--- Comment #2 from H.J. Lu  ---
Created attachment 57934
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57934&action=edit
A patch

I am testing this.

[Bug target/114696] ICE: in extract_constrain_insn_cached, at recog.cc:2725 insn does not satisfy its constraints: {*anddi_1} with -mapxf -mx32

2024-04-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114696

H.J. Lu  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-04-12
 Status|UNCONFIRMED |NEW

--- Comment #1 from H.J. Lu  ---
The problem is that the APX encoding length for AND exceeds 15 bytes with
-mx32.

[Bug libfortran/114646] libgfortran still doesn't define GTHREAD_USE_WEAK to 0 for newer glibc

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114646

H.J. Lu  changed:

   What|Removed |Added

 Status|RESOLVED|NEW
 Resolution|DUPLICATE   |---

--- Comment #14 from H.J. Lu  ---
This issue is about how libgcc is used by libgfortran, not libgcc itself.

[Bug libfortran/114646] libgfortran still doesn't define GTHREAD_USE_WEAK to 0 for newer glibc

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114646

H.J. Lu  changed:

   What|Removed |Added

 Resolution|DUPLICATE   |---
 Status|RESOLVED|NEW
 Ever confirmed|0   |1
  Component|libgcc  |libfortran
Summary|libgcc's gthr.h still   |libgfortran still doesn't
   |defines GTHREAD_USE_WEAK to |define GTHREAD_USE_WEAK to
   |1 for newer glibc   |0 for newer glibc

[Bug libgcc/114646] libgcc's gthr.h still defines GTHREAD_USE_WEAK to 1 for newer glibc

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114646

--- Comment #10 from H.J. Lu  ---
Created attachment 57906
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57906&action=edit
A patch

I am testing this.

[Bug libgcc/114646] libgcc's gthr.h still defines GTHREAD_USE_WEAK to 1 for newer glibc

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114646

--- Comment #7 from H.J. Lu  ---
r12-5108

commit 80fe172ba9820199c2bbce5d0611ffca27823049
Author: Jonathan Wakely 
Date:   Tue Nov 9 23:45:36 2021 +

libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133]

Since Glibc 2.34 all pthreads symbols are defined directly in libc not
libpthread, and since Glibc 2.32 we have used __libc_single_threaded to
avoid unnecessary locking in single-threaded programs. This means there
is no reason to avoid linking to libpthread now, and so no reason to use
weak symbols defined in gthr-posix.h for all the pthread_xxx functions.

libstdc++-v3/ChangeLog:

PR libstdc++/100748
PR libstdc++/103133
* config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK):
Define for glibc 2.34 and later.

fixed static C++ pthread programs.  libgfortran needs a similar fix.

[Bug libgomp/39176] -static and -fopenmp and io causes segfault

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39176

H.J. Lu  changed:

   What|Removed |Added

 CC||skpgkp2 at gmail dot com
 Status|REOPENED|NEW

--- Comment #11 from H.J. Lu  ---
r12-5108

commit 80fe172ba9820199c2bbce5d0611ffca27823049
Author: Jonathan Wakely 
Date:   Tue Nov 9 23:45:36 2021 +

libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133]

Since Glibc 2.34 all pthreads symbols are defined directly in libc not
libpthread, and since Glibc 2.32 we have used __libc_single_threaded to
avoid unnecessary locking in single-threaded programs. This means there
is no reason to avoid linking to libpthread now, and so no reason to use
weak symbols defined in gthr-posix.h for all the pthread_xxx functions.

libstdc++-v3/ChangeLog:

PR libstdc++/100748
PR libstdc++/103133
* config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK):
Define for glibc 2.34 and later.

fixed static C++ pthread programs.  libgfortran needs a similar fix.

[Bug libgomp/39176] -static and -fopenmp and io causes segfault

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39176

H.J. Lu  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   See Also||https://sourceware.org/bugzilla/show_bug.cgi?id=5784
   Last reconfirmed||2024-04-08
 Resolution|INVALID |---
 Status|RESOLVED|REOPENED

--- Comment #10 from H.J. Lu  ---
Reopened.

[Bug libfortran/114646] libgfortran doesn't work with static libpthread

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114646

H.J. Lu  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-04-08

--- Comment #1 from H.J. Lu  ---
See https://sourceware.org/bugzilla/show_bug.cgi?id=5784#c10 for more info.

[Bug libfortran/114646] New: libgfortran doesn't work with static libpthread

2024-04-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114646

Bug ID: 114646
   Summary: libgfortran doesn't work with static libpthread
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libfortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
  Target Milestone: ---

[hjl@gnu-cfl-3 tmp]$ cat x.f90 
  use omp_lib
  implicit none
  integer, parameter :: NT = 4
  integer :: nThreads(NT)

  print *, 'Call omp_set_dynamic'
!$  call omp_set_dynamic(.false.)
  print *, 'Call omp_set_num_threads'
!$  call omp_set_num_threads(NT)
  print *, 'Now enter the parallel region'

!$omp parallel default(none) shared(nThreads)
  nThreads(omp_get_thread_num()+1) = omp_get_num_threads()
!$omp end parallel

  print*, nThreads

  END
[hjl@gnu-cfl-3 tmp]$ gfortran -static -fopenmp x.f90 
/usr/local/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/13/libgomp.a(target.o): in
function `gomp_target_init.part.0':
(.text+0x4d6): warning: Using 'dlopen' in statically linked applications
requires at runtime the shared libraries from the glibc version used for
linking
[hjl@gnu-cfl-3 tmp]$ ./a.out 
 Call omp_set_dynamic
 Call omp_set_num_threads
 Now enter the parallel region
   4   4   4   4

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
Segmentation fault (core dumped)
[hjl@gnu-cfl-3 tmp]$

[Bug target/114590] [14 Regression] FAIL: gcc.target/i386/apx-ndd-ti-shift.c (test for excess errors)

2024-04-06 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114590

H.J. Lu  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from H.J. Lu  ---
Fixed.

[Bug gcov-profile/114599] [14 Regression] ICE: SIGSEGV in bitmap_set_bit(bitmap_head*, int) (bitmap.cc:975) with -O2 -fcondition-coverage

2024-04-05 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114599

H.J. Lu  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #6 from H.J. Lu  ---
Not fixed.

[Bug gcov-profile/114599] [14 Regression] ICE: SIGSEGV in bitmap_set_bit(bitmap_head*, int) (bitmap.cc:975) with -O2 -fcondition-coverage

2024-04-05 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114599

H.J. Lu  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com

--- Comment #5 from H.J. Lu  ---
Created attachment 57888
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57888&action=edit
A testcase

The bug isn't fixed:

[hjl@gnu-tgl-3 gcc]$
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.misc-tests/gcov-24.c
-fdiagnostics-plain-output -O2 -fcondition-coverage -S -o gcov-24.s
during IPA pass: profile
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.misc-tests/gcov-24.c: In
function ‘do_all_fn_LHASH_DOALL_ARG_arg2’:
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.misc-tests/gcov-24.c:20:1:
internal compiler error: Segmentation fault
0x16ccfa6 crash_signal
/export/gnu/import/git/sources/gcc/gcc/toplev.cc:319
0x180579d hash_table, unsigned int> >::hash_entry,
false, xcallocator>::find_with_hash(gcond* const&, unsigned int)
/export/gnu/import/git/sources/gcc/gcc/hash-table.h:983
0x1804c87 hash_map, unsigned int> >::get(gcond*
const&)
/export/gnu/import/git/sources/gcc/gcc/hash-map.h:191
0x17fdbf8 condition_uid
/export/gnu/import/git/sources/gcc/gcc/tree-profile.cc:370
0x17ff420 find_conditions(function*)
/export/gnu/import/git/sources/gcc/gcc/tree-profile.cc:877
0x158b963 branch_prob(bool)
/export/gnu/import/git/sources/gcc/gcc/profile.cc:1549
0x1802b86 tree_profiling
/export/gnu/import/git/sources/gcc/gcc/tree-profile.cc:1917
0x1803210 execute
/export/gnu/import/git/sources/gcc/gcc/tree-profile.cc:2046
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[hjl@gnu-tgl-3 gcc]$

[Bug target/114590] [14 Regression] FAIL: gcc.target/i386/apx-ndd-ti-shift.c (test for excess errors)

2024-04-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114590

H.J. Lu  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |MOVED
   See Also||https://sourceware.org/bugzilla/show_bug.cgi?id=31606

--- Comment #1 from H.J. Lu  ---
An assembler bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=31606

[Bug target/114590] [14 Regression] FAIL: gcc.target/i386/apx-ndd-ti-shift.c (test for excess errors)

2024-04-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114590

H.J. Lu  changed:

   What|Removed |Added

   Priority|P3  |P2
   Target Milestone|--- |14.0
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-04-04

[Bug target/114590] New: [14 Regression] FAIL: gcc.target/i386/apx-ndd-ti-shift.c (test for excess errors)

2024-04-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114590

Bug ID: 114590
   Summary: [14 Regression] FAIL:
gcc.target/i386/apx-ndd-ti-shift.c (test for excess
errors)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
  Target Milestone: ---
Target: x86-64

On x86-64, r14-9788-gb7bd2ec73d66f7 gave

Executing on host:
/export/build/gnu/tools-build/gcc-x32-gitlab/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-x32-gitlab/build-x86_64-linux/gcc/ 
/export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
   -fdiagnostics-plain-output   -O2  -lm  -o ./apx-ndd-ti-shift.exe(timeout
= 300)
spawn -ignore SIGHUP
/export/build/gnu/tools-build/gcc-x32-gitlab/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-x32-gitlab/build-x86_64-linux/gcc/
/export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
-fdiagnostics-plain-output -O2 -lm -o ./apx-ndd-ti-shift.exe
/tmp/ccVIKjlx.s: Assembler messages:
/tmp/ccVIKjlx.s:13: Error: operand type mismatch for `shld'
/tmp/ccVIKjlx.s:50: Error: operand type mismatch for `shrd'
/tmp/ccVIKjlx.s:91: Error: operand type mismatch for `shrd'
compiler exited with status 1
FAIL: gcc.target/i386/apx-ndd-ti-shift.c (test for excess errors)

[Bug target/114587] -mapxf should define a macro

2024-04-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114587

--- Comment #1 from H.J. Lu  ---
We should define a macro for each APX command-line option.

[Bug target/114587] -mapxf should define a macro

2024-04-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114587

H.J. Lu  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Ever confirmed|0   |1
   Priority|P3  |P2
   Last reconfirmed||2024-04-04
 Status|UNCONFIRMED |NEW

[Bug target/114587] New: -mapxf should define a macro

2024-04-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114587

Bug ID: 114587
   Summary: -mapxf should define a macro
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
  Target Milestone: ---
Target: x86-64

-mapxf should define a macro to indicate APX is enabled.

[Bug gcov-profile/114115] xz-utils segfaults when built with -fprofile-generate (bad interaction between IFUNC and binding?)

2024-04-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115

H.J. Lu  changed:

   What|Removed |Added

  Known to work||14.0

--- Comment #14 from H.J. Lu  ---
Fixed for GCC 14 so far

[Bug lto/114337] LTO symbol table doesn't include builtin functions

2024-03-14 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114337

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |MOVED

--- Comment #4 from H.J. Lu  ---
Will fix it in linker.

[Bug lto/114337] LTO symbol table doesn't include builtin functions

2024-03-14 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114337

--- Comment #1 from H.J. Lu  ---
Maybe linker can deal with it.

[Bug lto/114337] New: LTO symbol table doesn't include builtin functions

2024-03-14 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114337

Bug ID: 114337
   Summary: LTO symbol table doesn't include builtin functions
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
  Target Milestone: ---

[hjl@gnu-cfl-3 pr31482-a]$ cat y.c
#include <stdio.h>
#include <stdlib.h>

void *
foo (size_t n)
{
  printf ("hello\n");
  return malloc (n);
}
[hjl@gnu-cfl-3 pr31482-a]$ gcc -flto -c y.c
[hjl@gnu-cfl-3 pr31482-a]$ nm y.o
 T foo
[hjl@gnu-cfl-3 pr31482-a]$ lto-dump -list y.o
Type   Visibility  Size  Name
function  default 0  puts  
function  default 0  malloc  
function  default 4  foo  

[hjl@gnu-cfl-3 pr31482-a]$ 

This doesn't work with libraries that provide alternative implementations
of standard functions, such as jemalloc, since the linker doesn't know that
the builtin functions are referenced.  Unless GCC can inline these builtin
functions, their symbols should be in the LTO symbol table.

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #12 from H.J. Lu  ---
(In reply to Lukas Grätz from comment #11)
> 
> I applied it, double checked, make distclean, configure, make again.
> 
> But your result seems different. Have you applied Jakub Jelinek's patch to

No.

> save %rbp? I applied both patches. Perhaps there was some subtle
> merge-conflict with the two patches.

Please try just my patch.

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #10 from H.J. Lu  ---
(In reply to Lukas Grätz from comment #9)

> 
> Not on my computer. When I used -g I got:
> 
> 
> no_return_to_caller:
> .LFB0:
>   .loc 1 16 1 view -0
>   .cfi_startproc
>   .loc 1 17 3 view .LVU1
>   .loc 1 18 3 view .LVU2
> .LVL0:
>   .loc 1 18 26 discriminator 1 view .LVU3
>   .loc 1 16 1 is_stmt 0 view .LVU4
>   pushq   %rbp
>   .cfi_def_cfa_offset 16
>   .cfi_offset 6, -16
>   movl    $array+67108860, %eax
>   .loc 1 21 31 view .LVU5
>   xorl    %r13d, %r13d
>   .loc 1 16 1 view .LVU6
> 
> 
> Still no .cfi_undefined 13. In principle, it should also be generated
> without -g, as the rest of .cfi_offset and friends.

Did you apply my patch?  I got

.globl  no_return_to_caller
.type   no_return_to_caller, @function
no_return_to_caller:
.LFB0:
.file 1 "pr38534-1.c"
.loc 1 16 1 view -0
.cfi_startproc
.loc 1 17 3 view .LVU1
.loc 1 18 3 view .LVU2
.LVL0:
.loc 1 18 26 discriminator 1 view .LVU3
.loc 1 16 1 is_stmt 0 view .LVU4
subq    $24, %rsp
.cfi_undefined 15
.cfi_undefined 14
.cfi_undefined 13
.cfi_undefined 12
.cfi_undefined 6
...

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #8 from H.J. Lu  ---
(In reply to Lukas Grätz from comment #7)
> (In reply to H.J. Lu from comment #6)
> > (In reply to Jakub Jelinek from comment #5)
> > > Yeah.  Not to mention, one can call backtrace even if -g0; you just don't
> > > get nice names for the addresses.  Without the patch you get crashes in 
> > > the
> > > unwinder when doing backtrace.
> > 
> > Should we generate REG_CFA_UNDEFINED for unsaved callee-saved registers to
> > help unwinder:
> > 
> > https://patchwork.sourceware.org/project/gcc/list/?series=30327
> 
> Yes. Also for gdb this is needed.
> 
> Perhaps I did something wrong. On my computer, I could get the first patch
> working to save rbp, I also applied the patch which should omit the
> .cfi_undefined. But somehow, I still not get .cfi_undefined for any of the
> examples.
> 
> 
> $ ./gcc/host-x86_64-pc-linux-gnu/gcc/cc1 -O3
> gcc/gcc/testsuite/gcc.target/i386/pr38534-7.c -o pr38534-7.S
> 
> $ cat pr38534-7.S
> [...]
> no_return_to_caller:
> .LFB0:
>   .cfi_startproc
>   pushq   %rbp
>   .cfi_def_cfa_offset 16
>   .cfi_offset 6, -16
>   movl    $array+67108860, %eax
>   xorl    %r13d, %r13d
> [...]
> 
> 
> The ".cfi_undefined 13" is still missing...

It is generated only when -g is used.

[Bug target/114098] _tile_loadconfig doesn't work

2024-02-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

H.J. Lu  changed:

   What|Removed |Added

   Target Milestone|--- |11.5
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from H.J. Lu  ---
Fixed for 11.5, 12.4, 13.3 and 14.

[Bug gcov-profile/114115] xz-utils segfaults when built with -fprofile-generate (bad interaction between IFUNC and binding?)

2024-02-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115

--- Comment #8 from H.J. Lu  ---
A patch is posted at

https://patchwork.sourceware.org/project/gcc/list/?series=31343

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

H.J. Lu  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|hjl.tools at gmail dot com |unassigned at gcc dot gnu.org

--- Comment #6 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #5)
> Yeah.  Not to mention, one can call backtrace even if -g0; you just don't
> get nice names for the addresses.  Without the patch you get crashes in the
> unwinder when doing backtrace.

Should we generate REG_CFA_UNDEFINED for unsaved callee-saved registers to
help the unwinder?  See:

https://patchwork.sourceware.org/project/gcc/list/?series=30327

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #3 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #2)
> Created attachment 57545 [details]
> gcc14-pr114116.patch
> 
> This seems to fix it, so far tested just on the small testcase, back to the
> expected backtrace there.

Should we check -g? Without -g, I don't think we need to save FP.

[Bug gcov-profile/114115] xz-utils segfaults when built with -fprofile-generate (bad interaction between IFUNC and binding?)

2024-02-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115

--- Comment #7 from H.J. Lu  ---
Created attachment 57544
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57544&action=edit
A patch

[Bug gcov-profile/114115] xz-utils segfaults when built with -fprofile-generate (bad interaction between IFUNC and binding?)

2024-02-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-02-26
   Target Milestone|--- |14.0
   Assignee|unassigned at gcc dot gnu.org  |hjl.tools at gmail dot com

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

H.J. Lu  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-26
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |hjl.tools at gmail dot com

[Bug target/114097] Missed register optimization in _Noreturn functions

2024-02-26 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

H.J. Lu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from H.J. Lu  ---
Fixed.

[Bug target/114097] Missed register optimization in _Noreturn functions

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

--- Comment #5 from H.J. Lu  ---
A patch is submitted:

https://patchwork.sourceware.org/project/gcc/list/?series=31294

[Bug target/114098] _tile_loadconfig doesn't work

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

H.J. Lu  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-02-25

--- Comment #2 from H.J. Lu  ---
We should tell GCC that 64 bytes will be accessed by ldtilecfg and sttilecfg.

[Bug target/114098] _tile_loadconfig doesn't work

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

--- Comment #1 from H.J. Lu  ---
The problem is that in

extern __inline void
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_tile_loadconfig (const void *__config)
{
  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
}

only 8 bytes are used.

[Bug target/114098] New: _tile_loadconfig doesn't work

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

Bug ID: 114098
   Summary: _tile_loadconfig doesn't work
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
  Target Milestone: ---
Target: x86-64

[hjl@gnu-cfl-3 amx-1]$ cat foo.c
#include <stdint.h>
#include <immintrin.h>

#define MAX_ROWS 16
#define MAX_COLS 64
#define MAX 1024
#define STRIDE 64

typedef struct __tile_config
{
  uint8_t palette_id;
  uint8_t start_row;
  uint8_t reserved_0[14];
  uint16_t colsb[16];
  uint8_t rows[16];
} __tilecfg;


extern void bar (__tilecfg *tileinfo);

/* Initialize tile config */
static void
init_tile_config (__tilecfg *tileinfo)
{
  int i;
  tileinfo->palette_id = 1;
  tileinfo->start_row = 0;

  for (i = 0; i < 1; ++i)
  {
    tileinfo->colsb[i] = MAX_ROWS;
    tileinfo->rows[i] =  MAX_ROWS;
  }

  for (i = 1; i < 4; ++i)
  {
    tileinfo->colsb[i] = MAX_COLS;
    tileinfo->rows[i] =  MAX_ROWS;
  }

  _tile_loadconfig (tileinfo);
}

void
enable_amx (void)
{
  __tilecfg tile_data = {0};
  init_tile_config (&tile_data);
}
[hjl@gnu-cfl-3 amx-1]$ gcc -S -O2 -mamx-tile foo.c
[hjl@gnu-cfl-3 amx-1]$ cat foo.s
	.file	"foo.c"
	.text
	.p2align 4
	.globl	enable_amx
	.type	enable_amx, @function
enable_amx:
.LFB6615:
	.cfi_startproc
	movl	$1, %eax	<-- tile_data isn't properly initialized.
	movw	%ax, -72(%rsp)
#APP
# 42 "/usr/lib/gcc/x86_64-redhat-linux/13/include/amxtileintrin.h" 1
	ldtilecfg	-72(%rsp)
# 0 "" 2
#NO_APP
	ret
	.cfi_endproc
.LFE6615:
	.size	enable_amx, .-enable_amx
	.ident	"GCC: (GNU) 13.2.1 20231205 (Red Hat 13.2.1-6)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-cfl-3 amx-1]$

[Bug target/114097] Missed register optimization in _Noreturn functions

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

H.J. Lu  changed:

   What|Removed |Added

  Component|c   |target
   Target Milestone|--- |14.0
   Assignee|unassigned at gcc dot gnu.org  |hjl.tools at gmail dot com

[Bug c/114097] Missed register optimization in _Noreturn functions

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

--- Comment #3 from H.J. Lu  ---
Created attachment 57524
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57524&action=edit
A patch

[Bug c/114097] Missed register optimization in _Noreturn functions

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

--- Comment #2 from H.J. Lu  ---
I couldn't find a way to access the _Noreturn info in the backend.

[Bug c/114097] Missed register optimization in _Noreturn functions

2024-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

H.J. Lu  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-25
Version|unknown |14.0
 CC||hjl.tools at gmail dot com
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from H.J. Lu  ---
__attribute__((noreturn)) works in GCC 14:

[hjl@gnu-cfl-3 tmp]$ cat y.c
#include <stdio.h>
#include <setjmp.h>
//_Noreturn
__attribute__((noreturn))
void noret(unsigned A, unsigned B, unsigned C, unsigned D, unsigned E, jmp_buf Jb){

for(;A--;) puts("A");
for(;B--;) puts("B");
for(;C--;) puts("C");
for(;D--;) puts("D");
for(;E--;) puts("E");

longjmp(Jb,1);
}
[hjl@gnu-cfl-3 tmp]$ /usr/gcc-14.0.1-x32/bin/gcc -S -O2 y.c
[hjl@gnu-cfl-3 tmp]$ cat y.s
	.file	"y.c"
	.text
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"A"
.LC1:
	.string	"B"
.LC2:
	.string	"C"
.LC3:
	.string	"D"
.LC4:
	.string	"E"
	.text
	.p2align 4
	.globl	noret
	.type	noret, @function
noret:
.LFB11:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movl	%esi, %r15d
	movl	%edx, %r14d
	movl	%ecx, %r13d
	movl	%r8d, %ebp
	movq	%r9, %r12
	testl	%edi, %edi
	je	.L2
	leal	-1(%rdi), %ebx
	.p2align 4,,10
	.p2align 3
.L3:
	movl	$.LC0, %edi
	call	puts
	subl	$1, %ebx
	jnb	.L3
.L2:
	leal	-1(%r15), %ebx
	testl	%r15d, %r15d
	je	.L4
	.p2align 4,,10
	.p2align 3
.L5:
	movl	$.LC1, %edi
	call	puts
	subl	$1, %ebx
	jnb	.L5
.L4:
	leal	-1(%r14), %ebx
	testl	%r14d, %r14d
	je	.L6
	.p2align 4,,10
	.p2align 3
.L7:
	movl	$.LC2, %edi
	call	puts
	subl	$1, %ebx
	jnb	.L7
.L6:
	leal	-1(%r13), %ebx
	testl	%r13d, %r13d
	je	.L8
	.p2align 4,,10
	.p2align 3
.L9:
	movl	$.LC3, %edi
	call	puts
	subl	$1, %ebx
	jnb	.L9
.L8:
	leal	-1(%rbp), %ebx
	testl	%ebp, %ebp
	je	.L10
	.p2align 4,,10
	.p2align 3
.L11:
	movl	$.LC4, %edi
	call	puts
	subl	$1, %ebx
	jnb	.L11
.L10:
	movl	$1, %esi
	movq	%r12, %rdi
	call	longjmp
	.cfi_endproc
.LFE11:
	.size	noret, .-noret
	.ident	"GCC: (GNU) 14.0.1 20240223 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-cfl-3 tmp]$
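
For reference, the two spellings being compared in this bug can be put side by side. The following is a minimal sketch, assuming a C11 compiler with GNU extensions; `leave_std`, `leave_gnu` and `roundtrip` are hypothetical names, and the example only shows that both declarations behave the same at the language level. It says nothing about which registers a given GCC version saves.

```c
#include <setjmp.h>

static jmp_buf env;
static volatile int last_code;

/* C11 keyword spelling.  */
_Noreturn void
leave_std (int code)
{
  last_code = code;
  longjmp (env, 1);
}

/* GNU attribute spelling, as used in y.c above.  */
__attribute__((noreturn)) void
leave_gnu (int code)
{
  last_code = code;
  longjmp (env, 1);
}

/* Hypothetical driver: calls one of the two functions and returns
   the code it recorded before jumping back here.  */
int
roundtrip (int use_gnu)
{
  if (setjmp (env) == 0)
    {
      if (use_gnu)
        leave_gnu (7);
      else
        leave_std (5);
    }
  return last_code;
}
```

Compiling each variant with -S -O2 and comparing the prologues is one way to check whether callee-saved registers are still being spilled.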

[Bug rtl-optimization/91161] [11/12/13/14 Regression] ICE in begin_move_insn, at sched-ebb.c:175

2024-02-22 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91161

--- Comment #14 from H.J. Lu  ---
(In reply to Andrew Pinski from comment #13)
> I looked into the IR between GCC 12 and GCC 13 (with the added attributes),
> before sched2 there is no difference. So it would good to see what change
> "fixes" this again.

The bug went latent with r13-2726.

[Bug middle-end/113988] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5470

2024-02-19 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113988

--- Comment #13 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #11)
> Though, bet that would mean we punt with -mavx -mno-avx2 on 32-byte copies,
> because there we support just V8SFmode and not V32QImode.

Punting on AVX without AVX2 shouldn't have any meaningful impact on codegen
for real applications.

[Bug target/113912] push2/pop2 generated when stack isn't aligned to 16 bytes

2024-02-18 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113912

H.J. Lu  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from H.J. Lu  ---
Fixed.

[Bug target/113855] [14 Regression] __gcc_nested_func_ptr_{created,deleted} exports from 32-bit libgcc_s.so.1

2024-02-14 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113855

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from H.J. Lu  ---
Fixed.

[Bug target/113912] push2/pop2 generated when stack isn't aligned to 16 bytes

2024-02-13 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113912

H.J. Lu  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Last reconfirmed||2024-02-13
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from H.J. Lu  ---
A patch is at

https://patchwork.sourceware.org/project/gcc/list/?series=30889

[Bug target/113912] New: push2/pop2 generated when stack isn't aligned to 16 bytes

2024-02-13 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113912

Bug ID: 113912
   Summary: push2/pop2 generated when stack isn't aligned to 16
bytes
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
  Target Milestone: ---
Target: x86-64

[hjl@gnu-cfl-3 apx-1]$ cat x.c
extern int bar (int);

void foo ()
{
  int a,b,c,d,e,f,i;
  a = bar (5);
  b = bar (a);
  c = bar (b);
  d = bar (c);
  e = bar (d);
  f = bar (e);
  for (i = 1; i < 10; i++)
  {
    a += bar (a + i) + bar (b + i) +
         bar (c + i) + bar (d + i) +
         bar (e + i) + bar (f + i);
  }
}
[hjl@gnu-cfl-3 apx-1]$ make
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -mapxf
-O2 -mpreferred-stack-boundary=3 -fomit-frame-pointer -S x.c
[hjl@gnu-cfl-3 apx-1]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	pushp	%r15
	.cfi_def_cfa_offset 16
	.cfi_offset 15, -16
	movl	$5, %edi
	push2p	%r13, %r14
	.cfi_def_cfa_offset 32
	.cfi_offset 14, -24
	.cfi_offset 13, -32
	push2p	%rbp, %r12
	.cfi_def_cfa_offset 48
	.cfi_offset 12, -40
	.cfi_offset 6, -48
	pushp	%rbx
	.cfi_def_cfa_offset 56
	.cfi_offset 3, -56
	movl	$1, %ebx
	subq	$8, %rsp
	.cfi_def_cfa_offset 64
	call	bar
	movl	%eax, %edi
	movl	%eax, %r12d
	call	bar
	movl	%eax, %edi
	movl	%eax, %r15d
	call	bar
	movl	%eax, %edi
	movl	%eax, %r14d
	call	bar
	movl	%eax, %edi
	movl	%eax, %r13d
	call	bar
	movl	%eax, %edi
	movl	%eax, (%rsp)
	call	bar
	movl	%eax, 4(%rsp)
	.p2align 4,,10
	.p2align 3
.L2:
	leal	(%r12,%rbx), %edi
	call	bar
	leal	(%r15,%rbx), %edi
	movl	%eax, %ebp
	call	bar
	leal	(%r14,%rbx), %edi
	addl	%eax, %ebp
	call	bar
	leal	0(%r13,%rbx), %edi
	addl	%eax, %ebp
	call	bar
	addl	%ebx, (%rsp), %edi
	addl	%eax, %ebp
	call	bar
	addl	%ebx, 4(%rsp), %edi
	addl	$1, %ebx
	addl	%eax, %ebp
	call	bar
	addl	%eax, %ebp
	addl	%ebp, %r12d
	cmpl	$10, %ebx
	jne	.L2
	addq	$8, %rsp
	.cfi_def_cfa_offset 56
	popp	%rbx
	.cfi_def_cfa_offset 48
	pop2p	%r12, %rbp
	.cfi_restore 12
	.cfi_restore 6
	.cfi_def_cfa_offset 32
	pop2p	%r14, %r13
	.cfi_restore 14
	.cfi_restore 13
	.cfi_def_cfa_offset 16
	popp	%r15
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 14.0.1 20240213 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-cfl-3 apx-1]$ 

With -mpreferred-stack-boundary=3, the incoming stack is only 8-byte aligned.
push2/pop2 shouldn't be generated in this case.
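
The arithmetic behind this can be sketched in a few lines. The helpers below are hypothetical and are not GCC's actual logic; they only encode that -mpreferred-stack-boundary=N requests a 2^N-byte boundary and that a paired push or pop, per the bug title, needs a 16-byte aligned stack.

```c
#include <stdbool.h>

/* -mpreferred-stack-boundary=N asks for a boundary of 2^N bytes.  */
unsigned
boundary_bytes (unsigned n)
{
  return 1u << n;
}

/* Hypothetical predicate: a paired push/pop is only safe when the
   stack is known to be aligned to at least 16 bytes.  */
bool
push2_allowed (unsigned n)
{
  return boundary_bytes (n) >= 16;
}
```

With N = 3 the boundary is 8 bytes, so the predicate is false; the x86-64 default of N = 4 gives 16 bytes.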

[Bug target/113876] ICE: in ix86_expand_epilogue, at config/i386/i386.cc:10101 with -O -mpreferred-stack-boundary=3 -finstrument-functions -mapxf -mcmodel=large

2024-02-13 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113876

H.J. Lu  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from H.J. Lu  ---
Fixed.

[Bug target/113876] ICE: in ix86_expand_epilogue, at config/i386/i386.cc:10101 with -O -mpreferred-stack-boundary=3 -finstrument-functions -mapxf -mcmodel=large

2024-02-13 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113876

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-02-13
   Target Milestone|--- |14.0
 CC||crazylht at gmail dot com
 Ever confirmed|0   |1

--- Comment #2 from H.J. Lu  ---
A patch is at

https://patchwork.sourceware.org/project/gcc/list/?series=30888

[Bug target/113909] gcc.target/i386/pr113689-1.c etc. FAIL

2024-02-13 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113909

--- Comment #1 from H.J. Lu  ---
It fails on Solaris because of:

sol2.h:#undef NO_PROFILE_COUNTERS

Just skip these tests for Solaris.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #38 from H.J. Lu  ---
The new glibc patch set covers both i386 and x86-64:

https://patchwork.sourceware.org/project/glibc/list/?series=30854

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #36 from H.J. Lu  ---
(In reply to Andreas Schwab from comment #35)
> ld.so use its internal malloc only during bootstrapping.

___tls_get_addr always uses the internal malloc.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #34 from H.J. Lu  ---
(In reply to H.J. Lu from comment #33)
> (In reply to H.J. Lu from comment #32)
> > (In reply to Michael Matz from comment #31)
> > > (In reply to H.J. Lu from comment #30)
> > > > (In reply to Michael Matz from comment #29)
> > > > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > > > clearly _does_ so :-)
> > > > 
> > > > ld.so can only call the malloc implementation internal to ld.so.
> > > 
> > > (And string functions for initializing that memory)  If that's ensured
> > > already
> > > everywhere: super.  Because I agree, that this is the best thing to do 
> > > here.
> > > From my perspective this is pure internal implementation details and hence
> > > setting up thread-local areas should not be expected to be interposable by
> > > users.
> > > (a custom allocator that isn't malloc or doesn't interact with it also 
> > > would
> > > work)
> > 
> > Since ia32 ld.so in glibc is compiled with:
> > 
> > Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387
> > 
> > ia32 _dl_tlsdesc_dynamic is OK.
> 
> 387 registers may be an issue.

I checked ld.so.  It doesn't use 387 registers.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #33 from H.J. Lu  ---
(In reply to H.J. Lu from comment #32)
> (In reply to Michael Matz from comment #31)
> > (In reply to H.J. Lu from comment #30)
> > > (In reply to Michael Matz from comment #29)
> > > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > > clearly _does_ so :-)
> > > 
> > > ld.so can only call the malloc implementation internal to ld.so.
> > 
> > (And string functions for initializing that memory)  If that's ensured
> > already
> > everywhere: super.  Because I agree, that this is the best thing to do here.
> > From my perspective this is pure internal implementation details and hence
> > setting up thread-local areas should not be expected to be interposable by
> > users.
> > (a custom allocator that isn't malloc or doesn't interact with it also would
> > work)
> 
> Since ia32 ld.so in glibc is compiled with:
> 
> Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387
> 
> ia32 _dl_tlsdesc_dynamic is OK.

387 registers may be an issue.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #32 from H.J. Lu  ---
(In reply to Michael Matz from comment #31)
> (In reply to H.J. Lu from comment #30)
> > (In reply to Michael Matz from comment #29)
> > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > clearly _does_ so :-)
> > 
> > ld.so can only call the malloc implementation internal to ld.so.
> 
> (And string functions for initializing that memory)  If that's ensured
> already
> everywhere: super.  Because I agree, that this is the best thing to do here.
> From my perspective this is pure internal implementation details and hence
> setting up thread-local areas should not be expected to be interposable by
> users.
> (a custom allocator that isn't malloc or doesn't interact with it also would
> work)

Since ia32 ld.so in glibc is compiled with:

Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387

ia32 _dl_tlsdesc_dynamic is OK.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #30 from H.J. Lu  ---
(In reply to Michael Matz from comment #29)
> It not only can call malloc.  As the backtrace of H.J. shows, it quite
> clearly _does_ so :-)
> 

ld.so can only call the malloc implementation internal to ld.so.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #28 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #27)
> (In reply to H.J. Lu from comment #26)
> > Even if I compile ia32 glibc with -march=skylake, the _dl_tlsdesc_dynamic
> > slow
> > path doesn't touch XMM registers at all.
> 
> I thought Florian said it can call malloc and malloc can be user provided
> and can use SSE2, 387/MMX or whatever other call clobbered registers ia32
> has.

[hjl@gnu-cfl-3 elf]$ readelf -rW ld.so

Relocation section '.rel.dyn' at offset 0x9f8 contains 3 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00032fe0  1a06 R_386_GLOB_DAT 00031ac0   __rseq_offset@@GLIBC_2.35
00032fe4  1f06 R_386_GLOB_DAT 00031ac4   __rseq_size@@GLIBC_2.35
00032b20  002a R_386_IRELATIVE   

Relocation section '.relr.dyn' at offset 0xa10 contains 3 entries:
  12 offsets
00031a60
00032ed0
00032ed8
00032f04
00032f08
00032f0c
00032f10
00032f14
00032f18
00032f1c
00032f20
00032f24
[hjl@gnu-cfl-3 elf]$ 

You can't use another malloc for the ld.so internal usage of malloc/calloc.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #26 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #25)
> (In reply to H.J. Lu from comment #23)
> > > And i386/dl-tlsdesc.S needs to save/restore 387 and SSE regs?
> > 
> > i386 doesn't preserve them in _dl_runtime_resolve nor _dl_tlsdesc_dynamic.
> 
> That is different.  _dl_runtime_resolve happens only at the start of calls
> to functions, if in all supported ia32 ABIs all of i387 state is unsupported
> upon entering functions, then there is no need to save anything.
> While _dl_tlsdesc_dynamic can happen anywhere from within functions and
> doesn't clobber any registers except ax which gets the value, so I think it
> needs to be saved for that case.

I couldn't find a test to show it is needed on i386:

#0  __GI___libc_malloc (bytes=3200) at malloc.c:3294
#1  0xf7fdb771 in malloc (size=) at ../include/rtld-malloc.h:56
#2  allocate_dtv_entry (size=, alignment=4) at dl-tls.c:679
#3  allocate_and_init (map=0xf6e00670) at dl-tls.c:704
#4  tls_get_addr_tail (ti=0xf6e00a30, dtv=0x5655fcd8, the_map=0xf6e00670)
    at dl-tls.c:904
#5  0xf7fdf5d5 in _dl_tlsdesc_dynamic () at ../sysdeps/i386/dl-tlsdesc.S:129
#6  0xf7fb017b in apply_tls (p=0xf7a0037c) at tst-gnu2-tls2mod1.c:26
#7  0x5655769b in access_mod (i=1, sym=0x5655a026 "apply_tls")
    at ../sysdeps/i386/i686/tst-gnu2-tls2-i686.c:55
#8  start (arg=0x0) at ../sysdeps/i386/i686/tst-gnu2-tls2-i686.c:70
#9  0xf7c96207 in start_thread (arg=) at pthread_create.c:447
#10 0xf7d3dc08 in clone3 () at ../sysdeps/unix/sysv/linux/i386/clone3.S:111

Even if I compile ia32 glibc with -march=skylake, the _dl_tlsdesc_dynamic slow
path doesn't touch XMM registers at all.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

H.J. Lu  changed:

   What|Removed |Added

 Resolution|--- |MOVED
 Status|NEW |RESOLVED

--- Comment #24 from H.J. Lu  ---
Moved to glibc.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #23 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #22)
> BTW, does aarch64 dl-tlsdesc.S save SVE/SME register state (I only see fixed
> offsets in there), or are those call-saved?
> What about floating point registers in x86_64/dl-tlsdesc.S?

Floating point registers are preserved with my glibc patch.

> And i386/dl-tlsdesc.S needs to save/restore 387 and SSE regs?

i386 doesn't preserve them in _dl_runtime_resolve nor _dl_tlsdesc_dynamic.

[Bug tree-optimization/113752] [14 Regression] warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=] since r14-261-g0ef3756adf078c

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113752

--- Comment #6 from H.J. Lu  ---
I can reproduce it with r14-8930-g1e94648ab7b370

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #21 from H.J. Lu  ---
(In reply to Florian Weimer from comment #20)
> (In reply to H.J. Lu from comment #19)
> > (In reply to Florian Weimer from comment #9)
> > > (In reply to H.J. Lu from comment #7)
> > > > > The __tls_get_addr call with the default approach potentially needs 
> > > > > to solve
> > > > > the same problem, doesn't it?
> > > > 
> > > > Isn't __tls_get_addr called via the PLT entry?
> > > 
> > > I'm not sure if that matters? Even if the lazy binding trampoline is 
> > > active,
> > > it won't protect the actual call.
> > 
> > Non-GNU2 TLS has
> > 
> > 4000  00010007 R_X86_64_JUMP_SLOT 
> > __tls_get_addr + 1010
> > 
> > which calls _dl_runtime_resolve with lazy binding. _dl_runtime_resolve
> > preserves all caller-saved registers.
> 
> The dynamic linker preserves register contents during lazy binding and
> restores them before calling __tls_get_addr, so it doesn't help with
> __tls_get_addr register usage itself. And lazy binding happens only once per
> process and object, while we need to protect the first call on every thread.

Only the call from _dl_tlsdesc_dynamic isn't protected.  My glibc patch:

https://patchwork.sourceware.org/project/glibc/list/?series=30800

fixes it.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #19 from H.J. Lu  ---
(In reply to Florian Weimer from comment #9)
> (In reply to H.J. Lu from comment #7)
> > > The __tls_get_addr call with the default approach potentially needs to 
> > > solve
> > > the same problem, doesn't it?
> > 
> > Isn't __tls_get_addr called via the PLT entry?
> 
> I'm not sure if that matters? Even if the lazy binding trampoline is active,
> it won't protect the actual call.

Non-GNU2 TLS has

4000  00010007 R_X86_64_JUMP_SLOT 
__tls_get_addr + 1010

which calls _dl_runtime_resolve with lazy binding. _dl_runtime_resolve
preserves all caller-saved registers.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #7 from H.J. Lu  ---
(In reply to Florian Weimer from comment #6)
> > (In reply to H.J. Lu from comment #4)
> > > (In reply to H.J. Lu from comment #3)
> > > > Created attachment 57385 [details]
> > > > A patch
> > > > 
> > > > Try this.
> > > 
> > > This doesn't work properly.  To work around in ld.so, _dl_tlsdesc_dynamic
> > > needs to save and restore ALL registers, which can be expensive.
> 
> Why doesn't this work properly? Is it possible to make it work with a
> different approach?

The clobber must be attached to the TLS descriptor call insn.

> The __tls_get_addr call with the default approach potentially needs to solve
> the same problem, doesn't it?

Isn't __tls_get_addr called via the PLT entry?

> (In reply to Jakub Jelinek from comment #5)
> > Or it could be compiled with options to make sure it doesn't use vector
> > registers etc., and only save/restore if it needs to call into some code
> > where libc can't afford that (say allocate memory).
> 
> We currently call into malloc, which could be a replacement malloc. If GCC
> cannot be fixed, full context switch or elimination of the slow path are our
> best options for a glibc-side fix.

We should open a glibc bug.  I am working on the glibc fix.
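
For readers who want to see the kind of source that ends up going through a TLS descriptor call, a minimal sketch follows. It assumes the file is built as part of a shared library with -fPIC -mtls-dialect=gnu2; `counter` and `bump` are hypothetical names. Built as an ordinary executable it runs the same way but uses a different TLS access model.

```c
/* One copy of this variable exists per thread.  */
static __thread int counter;

/* In a shared library built with -fPIC -mtls-dialect=gnu2, taking the
   address of `counter` here is expected to go through a TLS descriptor
   call, which is the call whose register usage is being discussed.  */
int
bump (void)
{
  return ++counter;
}
```

Inspecting the -S output for a call through a descriptor symbol is one way to confirm which TLS dialect was actually used.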

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-02-11

--- Comment #4 from H.J. Lu  ---
(In reply to H.J. Lu from comment #3)
> Created attachment 57385 [details]
> A patch
> 
> Try this.

This doesn't work properly.  To work around it in ld.so, _dl_tlsdesc_dynamic
needs to save and restore ALL registers, which can be expensive.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #3 from H.J. Lu  ---
Created attachment 57385
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57385&action=edit
A patch

Try this.

[Bug target/113837] Zeroing unused bits in _BitInt can improve codegen

2024-02-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113837

--- Comment #9 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #7)
> (In reply to H.J. Lu from comment #5)
> > (In reply to Jakub Jelinek from comment #1)
> > > Ugh no, please don't.
> > > This is significant ABI change.
> > > First of all, zeroing even for signed _BitInt is very weird, sign 
> > > extension
> > > for that case is more natural, but when _BitInt doesn't have any 
> > > unspecified
> > > bits, everything that computes them will need to compute even the extra
> > > bits.  That is not the case in the current code.
> > 
> > Can we compare zeroing and undefined codegen of unused bits for storing
> > signed _BitInt?
> 
> Not easily, the bitint_info::extended support isn't there yet (as no target
> needed it so far).  See also the discussions about it on IRC and aarch64
> _BitInt support thread (aarch64 wants to have the extra bits unspecified,
> but arm 32 extended).
> 
> > Then implement whatever appropriate in GCC and make it the de facto ABI.
> 
> So what's wrong with
> https://gitlab.com/x86-psABIs/i386-ABI/-/issues/5
> ?  Has it been discussed, or is i386-ABI dead?

i386 psABI is not actively maintained.

> I'd probably go with 32-bit limbs for _BitInt(65) and higher instead of
> 64-bit,
> but under the hood that is how it will be implemented no matter what the ABI
> says,
> whether it is 32-bit limbs or 64-bit limbs only affects a) the alignment b)
> how much is wasted in case of say _BitInt(65) or _BitInt(129) etc. and what
> the sizeof is.
> Even if limbs are 64-bit, the question is about alignment, ia32 has 32-bit
> alignment for long long and double at least when used inside of structs, so
> it would be weird to have different alignment from struct { limb l1, l2; }
> and similar.

Just implement what is appropriate in GCC.  We will document it.
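
The trade-off between 32-bit and 64-bit limbs mentioned in the quoted text comes down to simple rounding. The sketch below uses hypothetical helper names and only models rounding up to whole limbs; the real sizeof also depends on whatever alignment rule the ABI ends up choosing.

```c
#include <stddef.h>

/* Bytes of storage when `bits` value bits are rounded up to whole
   limbs of `limb_bits` bits each.  */
size_t
bitint_storage_bytes (unsigned bits, unsigned limb_bits)
{
  unsigned limbs = (bits + limb_bits - 1) / limb_bits;
  return (size_t) limbs * (limb_bits / 8);
}

/* Bits in the storage that carry no value.  */
unsigned
bitint_padding_bits (unsigned bits, unsigned limb_bits)
{
  return (unsigned) (bitint_storage_bytes (bits, limb_bits) * 8) - bits;
}
```

Under this model _BitInt(65) takes 12 bytes with 32-bit limbs and 16 bytes with 64-bit limbs, and _BitInt(129) takes 20 and 24 bytes respectively.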

[Bug target/113837] Zeroing unused bits in _BitInt can improve codegen

2024-02-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113837

--- Comment #6 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #4)
> (In reply to H.J. Lu from comment #3)
> > (In reply to Jakub Jelinek from comment #2)
> > > OT, what is the state of the ia32 _BitInt ABI?  I'd really like to enable 
> > > it
> > > in GCC 14 even for ia32 (and perhaps -mx32 if you care about that case).
> > 
> > I think we should leave ia32 alone.
> 
> You mean never support C23 on it?

Then implement whatever is appropriate in GCC and make it the de facto ABI.

[Bug target/113837] Zeroing unused bits in _BitInt can improve codegen

2024-02-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113837

--- Comment #5 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #1)
> Ugh no, please don't.
> This is significant ABI change.
> First of all, zeroing even for signed _BitInt is very weird, sign extension
> for that case is more natural, but when _BitInt doesn't have any unspecified
> bits, everything that computes them will need to compute even the extra
> bits.  That is not the case in the current code.

Can we compare zeroing and undefined codegen of unused bits for storing
signed _BitInt?
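
To make the two conventions concrete, here is a small model that does not need _BitInt support at all. It is an assumption-laden sketch: a 5-bit unsigned value is stored in a uint8_t, and all names are hypothetical. The signed case asked about above would use sign extension instead of zeroing, but the shape of the trade-off is the same.

```c
#include <stdbool.h>
#include <stdint.h>

#define VALUE_BITS 5u
#define VALUE_MASK ((1u << VALUE_BITS) - 1u)

/* Convention A: the upper bits are unspecified, so every consumer has
   to mask before comparing.  */
bool
equal_unspecified (uint8_t a, uint8_t b)
{
  return (a & VALUE_MASK) == (b & VALUE_MASK);
}

/* Convention B: the producer zeroes the upper bits once at store time.  */
uint8_t
store_zeroed (unsigned v)
{
  return (uint8_t) (v & VALUE_MASK);
}

/* With convention B the consumer can compare the storage directly.  */
bool
equal_zeroed (uint8_t a, uint8_t b)
{
  return a == b;
}
```

The cost moves from every consumer to every producer, which is why comparing generated code for both conventions is a reasonable request.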

[Bug target/113837] Zeroing unused bits in _BitInt can improve codegen

2024-02-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113837

--- Comment #3 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #2)
> OT, what is the state of the ia32 _BitInt ABI?  I'd really like to enable it
> in GCC 14 even for ia32 (and perhaps -mx32 if you care about that case).

I think we should leave ia32 alone.  x32 uses the same psABI as x86-64.

[Bug target/113837] New: Zeroing unused bits in _BitInt can improve codegen

2024-02-08 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113837

Bug ID: 113837
   Summary: Zeroing unused bits in _BitInt can improve codegen
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
  Target Milestone: ---
Target: x86-64

I opened this x86-64 psABI issue:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/16

[Bug target/113689] [11/12/13/14 Regression] wrong code with -fprofile -mcmodel=large when needing drap register since r11-6548

2024-02-06 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113689

--- Comment #11 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #10)
> 
> Just the second hunk.  I think with sorry call the compilation fails, so what
> you actually emit doesn't matter (one can see it with -pipe, sure).

Done.

[Bug target/113689] [11/12/13/14 Regression] wrong code with -fprofile -mcmodel=large when needing drap register since r11-6548

2024-02-06 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113689

--- Comment #9 from H.J. Lu  ---
Like this?

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f02c6c02ac6..ed0b0e19985 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22785,10 +22785,10 @@ x86_64_select_profile_regnum (bool r11_ok ATTRIBUTE_UNUSED)
  && !REGNO_REG_SET_P (reg_live, i
   return i;

-  sorry ("no register available for profiling %<-mcmodel=large%s%>",
+  sorry ("no register available for profiling %<-mcmodel=large%s%>, use r10",
ix86_cmodel == CM_LARGE_PIC ? " -fPIC" : "");

-  return INVALID_REGNUM;
+  return R10_REG;
 }

 /* Output assembler code to FILE to increment profiler label # LABELNO

[Bug target/113689] [11/12/13/14 Regression] wrong code with -fprofile -mcmodel=large when needing drap register since r11-6548

2024-02-05 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113689

--- Comment #6 from H.J. Lu  ---
Fixed for GCC 14 so far.

[Bug tree-optimization/113752] [14 Regression] warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=]

2024-02-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113752

--- Comment #2 from H.J. Lu  ---
[hjl@gnu-skx-1 gcc]$ cat /tmp/foo.i
char a[10256];
char b;
char *c, *g;
int d, e, f;
int sprintf(char *, char *, ...);
unsigned long strlen(char *);
int h(char *j) {
  if (strlen(j) + strlen(c) + strlen(g) + 32 > 10256)
    return 0;
  sprintf(a, "%s:%s:%d:%d:%d:%c:%s\n", j, c, d, e, f, b, g);
  return 1;
}
void i() { h("wctype"); }
[hjl@gnu-skx-1 gcc]$ ./xgcc -B./ -O3 -Wall -S /tmp/foo.i
/tmp/foo.i: In function ‘i’:
/tmp/foo.i:10:33: warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=]
   10 |   sprintf(a, "%s:%s:%d:%d:%d:%c:%s\n", j, c, d, e, f, b, g);
  | ^~
In function ‘h’,
inlined from ‘i’ at /tmp/foo.i:13:12:
/tmp/foo.i:10:3: note: ‘sprintf’ output between 18 and 20484 bytes into a destination of size 10256
   10 |   sprintf(a, "%s:%s:%d:%d:%d:%c:%s\n", j, c, d, e, f, b, g);
  |   ^
[hjl@gnu-skx-1 gcc]$

[Bug tree-optimization/113752] [14 Regression] warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=]

2024-02-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113752

H.J. Lu  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-02-04
 CC||aldyh at redhat dot com
 Status|UNCONFIRMED |NEW

--- Comment #1 from H.J. Lu  ---
It is caused by r14-261.

[Bug c/113752] New: [14 Regression] warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=]

2024-02-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113752

Bug ID: 113752
   Summary: [14 Regression] warning: ‘%s’ directive writing up to
10218 bytes into a region of size between 0 and 10240
[-Wformat-overflow=]
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
  Target Milestone: ---

Created attachment 57315
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57315&action=edit
A testcase

[hjl@gnu-tgl-2 tmp]$  /usr/gcc-14.0.1-x32-apx/bin/gcc -O3 -S x.i -Wall 
In file included from tests-mbwc/tst_wctype.c:8:
tests-mbwc/tsp_common.c: In function ‘result.constprop.isra’:
tests-mbwc/tsp_common.c:55:24: warning: ‘%s’ directive writing up to 10218
bytes into a region of size between 0 and 10240 [-Wformat-overflow=]
tests-mbwc/tsp_common.c:55:3: note: ‘sprintf’ output between 18 and 20484 bytes
into a destination of size 10256
[hjl@gnu-tgl-2 tmp]$ 

GCC 13 is OK.

[Bug target/113751] New: -mapxf -mfma4 generates wrong assembly code

2024-02-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113751

Bug ID: 113751
   Summary: -mapxf -mfma4 generates wrong assembly code
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
  Target Milestone: ---
Target: x86-64

[hjl@gnu-icx-1 creduce-1]$ cat x.i
struct {
  double a[8]
} a;
double b, c, d;
int e, f, g;
void h() {
  f = e;
  d = a.a[g + 1];
  c = a.a[g] + a.a[g + 3] * (a.a[g + 4] *
 (a.a[g + 5] *
  (a.a[g + 6] * (a.a[g + 7] * a.a[g + 8] + b;
  d += e > a.a[g + 11];
}
[hjl@gnu-icx-1 creduce-1]$
/export/build/gnu/tools-build/gcc-x32-gitlab/release/usr/gcc-14.0.1-x32/bin/gcc
-O3 -mfma4 -mapxf x.i -w -c
/tmp/cchsm1V9.s: Assembler messages:
/tmp/cchsm1V9.s:38: Error: extended GPR cannot be used as base/index for
`vfmaddsd'
[hjl@gnu-icx-1 creduce-1]$

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #9 from H.J. Lu  ---
Many NDD patterns have the same issue.  Here is another testcase:

[hjl@gnu-cfl-3 pr113711]$ cat apx-ndd-length-X.c
/* { dg-do assemble { target { apxf && { ! ia32 } } } } */
/* { dg-options "-mapxf -O2" } */

typedef signed __int128 S;
int o;

S
qux (void)
{
  S z;
  o = __builtin_add_overflow (*(S __seg_fs *) 0x1000, 0x200, &z);
  return z;
}
[hjl@gnu-cfl-3 pr113711]$ make apx-ndd-length-X.o
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -mapxf
-O3 -dp   -c -o apx-ndd-length-X.o apx-ndd-length-X.c
/tmp/cc1eMHh5.s: Assembler messages:
/tmp/cc1eMHh5.s:9: Warning: instruction length of 16 bytes exceeds the limit of
15
[hjl@gnu-cfl-3 pr113711]$ cat apx-ndd-length-Y.c 
/* { dg-do assemble { target { apxf && { ! ia32 } } } } */
/* { dg-options "-mapxf -O2" } */

__thread signed __int128 var;
int o;

signed __int128
qux (void)
{
  signed __int128 z;
  o = __builtin_add_overflow (var, 0x200, &z);
  return z;
}
[hjl@gnu-cfl-3 pr113711]$ make apx-ndd-length-Y.o
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -mapxf
-O3 -dp   -c -o apx-ndd-length-Y.o apx-ndd-length-Y.c
/tmp/ccwvDbZA.s: Assembler messages:
/tmp/ccwvDbZA.s:9: Warning: instruction length of 16 bytes exceeds the limit of
15
[hjl@gnu-cfl-3 pr113711]$ 

We need to examine all NDD patterns to check for invalid memory constraints.
We should find a testcase for each issue we find.

[Bug target/113744] Unnecessary "m" constraint in *adddi_4

2024-02-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113744

--- Comment #1 from H.J. Lu  ---
Other *add patterns may have the same issue.

[Bug target/113744] New: Unnecessary "m" constraint in *adddi_4

2024-02-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113744

Bug ID: 113744
   Summary: Unnecessary "m" constraint in *adddi_4
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com, hongyuw at gcc dot gnu.org,
lingling.kong7 at gmail dot com
  Target Milestone: ---

commit 7abcef725e40589553a079df9258ae094b811751
Author: Kong Lingling 
Date:   Wed Jan 18 17:23:29 2023 +0800

[APX NDD] Support APX NDD for optimization patterns of add

has

@@ -6994,31 +7021,35 @@ (define_insn "*addsi_3_zext"
 (define_insn "*adddi_4"
   [(set (reg FLAGS_REG)
   (compare
-(match_operand:DI 1 "nonimmediate_operand" "0")
-(match_operand:DI 2 "x86_64_immediate_operand" "e")))
-   (clobber (match_scratch:DI 0 "=r"))]
+(match_operand:DI 1 "nonimmediate_operand" "0,rm")
+(match_operand:DI 2 "x86_64_immediate_operand" "e,e")))
+   (clobber (match_scratch:DI 0 "=r,r"))]
   "TARGET_64BIT
&& ix86_match_ccmode (insn, CCGCmode)"

But the peephole which generates *adddi_4 only supports a register as operand 2.

[Bug target/113733] New: Invalid APX TLS code squence

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113733

Bug ID: 113733
   Summary: Invalid APX TLS code squence
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com, hongyuw at gcc dot gnu.org
  Target Milestone: ---
Target: x86-64

Created attachment 57301
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57301&action=edit
A testcase

[hjl@gnu-cfl-3 apx-1]$ make
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -mapxf
-O3 -dp -S x.c
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -mapxf
-O3 -dp   -c -o x.o x.c
/tmp/ccbItraT.s: Assembler messages:
/tmp/ccbItraT.s:29: Error: TLS relocation cannot be used with `add'
make: *** [: x.o] Error 1
[hjl@gnu-cfl-3 apx-1]$ 

This NDD
addq    %rax, a@gottpoff(%rip), %r15

can't be used in TLS code sequence.

[Bug libstdc++/113732] New: [14 Regression] FAIL: g++.dg/modules/hello-1_b.C caused by r14-8710

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113732

Bug ID: 113732
   Summary: [14 Regression] FAIL: g++.dg/modules/hello-1_b.C
caused by r14-8710
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: ppalka at gcc dot gnu.org
  Target Milestone: ---

On x86-64, r14-8710 caused:

FAIL: g++.dg/modules/hello-1_b.C -std=c++17 (internal compiler error: canonical types differ for identical types 'std::tuple_element<__i, std::tuple<_Elements ...> >' and 'std::tuple_element<__i, std::tuple<_Elements ...> >')
FAIL: g++.dg/modules/hello-1_b.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/hello-1_b.C -std=c++2a (internal compiler error: canonical types differ for identical types 'std::tuple_element<__i, std::tuple<_Elements ...> >' and 'std::tuple_element<__i, std::tuple<_Elements ...> >')
FAIL: g++.dg/modules/hello-1_b.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/hello-1_b.C -std=c++2b (internal compiler error: canonical types differ for identical types 'std::tuple_element<__i, std::tuple<_Elements ...> >' and 'std::tuple_element<__i, std::tuple<_Elements ...> >')
FAIL: g++.dg/modules/hello-1_b.C -std=c++2b (test for excess errors)

[Bug target/113729] New: Missing APX NDD optimization

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113729

Bug ID: 113729
   Summary: Missing APX NDD optimization
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com, hongyuw at gcc dot gnu.org
  Target Milestone: ---
Target: x86-64

APX spec has

---
Unlike the merge-upper behavior at a destination register of a typical x86
integer instruction when OSIZE
is 8b or 16b, the NDD register is always zero-uppered
---

But GCC 14 generates:

[hjl@gnu-tgl-3 pr113711]$ cat b.c
extern unsigned char b;

unsigned int
foo (void)
{
  return 200 + b;
}
[hjl@gnu-tgl-3 pr113711]$ make b.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -mapxf
-O3 -S b.c
[hjl@gnu-tgl-3 pr113711]$ cat b.s
.file   "b.c"
.text
.p2align 4
.globl  foo
.type   foo, @function
foo:
.LFB0:
.cfi_startproc
movzbl  b(%rip), %eax
addl    $200, %eax
ret
.cfi_endproc
.LFE0:
.size   foo, .-foo
.ident  "GCC: (GNU) 14.0.1 20240202 (experimental)"
.section.note.G
[hjl@gnu-tgl-3 pr113711]$

addb    $200, b(%rip), %al

should be generated.
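
The difference between the merge-upper and zero-upper behavior described in the APX spec excerpt can be modeled in plain C. This is only an illustrative sketch of the quoted semantics for an 8-bit add; the function names are hypothetical and a 64-bit integer stands in for the destination register.

```c
#include <stdint.h>

/* Model of a legacy 8-bit add: the 8-bit sum is merged into the low
   byte of the destination and the upper 56 bits keep their old value.  */
static uint64_t add8_merge_upper(uint64_t dest, uint8_t src1, uint8_t src2)
{
  uint8_t sum = (uint8_t) (src1 + src2);
  return (dest & ~UINT64_C(0xff)) | sum;
}

/* Model of an NDD 8-bit add: the 8-bit sum is written to the low byte
   and every upper bit of the destination register becomes zero.  */
static uint64_t add8_zero_upper(uint8_t src1, uint8_t src2)
{
  return (uint8_t) (src1 + src2);
}
```

In this model, starting from a destination holding 0x1122334455667788, adding 0x10 and 0x20 yields 0x1122334455667730 with merge-upper and 0x30 with zero-upper. In both cases the sum itself is computed in 8 bits, so it wraps modulo 256.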

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #8 from H.J. Lu  ---
My branch is at

https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pr113711/master

We need to add more tests:

1. All NDD instructions are 15 bytes or less.
2. Use "op imm, mem, reg" whenever possible.

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #7 from H.J. Lu  ---
(In reply to Hongyu Wang from comment #6)
> (In reply to H.J. Lu from comment #5)
> > (In reply to Hongyu Wang from comment #4)
> > > Previously I added 
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;
> > > h=d564198f960a2f5994dde3f6b83d7a62021e49c3
> > > 
> > > to prohibit several *POFF constant usage in NDD add alternative. If 
> > > checking
> > > ADDR_SPACE_GENERIC can avoid the seg prefix usage, we can drop that 
> > > change?
> > 
> > Are there any testcases for this change?
> > 
> 
> Cut and edit from gcc.dg\torture\tls\tls-test.c
> 
> #include 
> __thread int a = 255; 
> __thread int *b;
> int *volatile a_in_other_thread = (int *) 12345;
> 
> void *
> thread_func (void *arg)
> {
>   a_in_other_thread = &a; //Previously it will try to generate addq $a@tpoff, %fs:0, %rax
>   a+=11144; //this was not fixed on trunk as UNSPEC_TPOFF is in mem operand
>   *((int *) arg) = a;
> 
>   return (void *)0;
> }

My patch seems to work.  But we need to add such tests to gcc.target/i386.

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #5 from H.J. Lu  ---
(In reply to Hongyu Wang from comment #4)
> Previously I added 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;
> h=d564198f960a2f5994dde3f6b83d7a62021e49c3
> 
> to prohibit several *POFF constant usage in NDD add alternative. If checking
> ADDR_SPACE_GENERIC can avoid the seg prefix usage, we can drop that change?

Are there any testcases for this change?

> And I'd suggest to use j prefix for all APX related constraints like jf.

Will do.

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

H.J. Lu  changed:

   What|Removed |Added

  Attachment #57288|0   |1
is obsolete||

--- Comment #3 from H.J. Lu  ---
Created attachment 57293
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57293&action=edit
An updated patch

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-02 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #2 from H.J. Lu  ---
Created attachment 57288
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57288&action=edit
Add BN constraint for APX NDD instructions

Since the instruction length of APX NDD instructions:

op imm, mem, reg

may exceed the size limit of 15 bytes, add BN constraint which is a
memory operand when TARGET_APX_NDD is disabled. For all TARGET_APX_NDD
patterns with

op imm, mem, reg

replace m with BN in operand 1 constraint for alternative with immediate
operand 2.

This patch isn't complete. We need to update all relevant TARGET_APX_NDD
patterns.
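
To see how "op imm, mem, reg" can cross the limit, the length can be added up per encoding component. The sketch below is a rough model, not GCC's length computation; the struct and function names are hypothetical, and the byte counts assume a 4-byte EVEX prefix, a 1-byte opcode, ModRM and SIB bytes, a 32-bit displacement, a 32-bit immediate, and an optional 1-byte segment prefix.

```c
#include <stdbool.h>

enum { X86_MAX_INSN_LENGTH = 15 };

/* Byte counts of the pieces of one encoded instruction (hypothetical
   model; each field is the number of bytes that piece occupies).  */
struct insn_parts {
  unsigned seg_prefix; /* 1 if a segment override such as %fs is used */
  unsigned evex;       /* 4 for an EVEX-encoded NDD instruction */
  unsigned opcode;
  unsigned modrm;
  unsigned sib;
  unsigned disp;       /* displacement bytes */
  unsigned imm;        /* immediate bytes */
};

/* Total encoded length in bytes.  */
static unsigned insn_length(const struct insn_parts *p)
{
  return p->seg_prefix + p->evex + p->opcode + p->modrm
         + p->sib + p->disp + p->imm;
}

/* True if the instruction would exceed the architectural limit.  */
static bool insn_too_long(const struct insn_parts *p)
{
  return insn_length(p) > X86_MAX_INSN_LENGTH;
}
```

With these assumed sizes, an NDD add with a segment prefix, SIB byte, disp32 and imm32 comes to 1 + 4 + 1 + 1 + 1 + 4 + 4 = 16 bytes, one over the limit, while the same instruction without the segment prefix is exactly 15 bytes.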
