[COMMITTED] Testsuite: Make dependence on -fdelete-null-pointer-checks explicit
I've checked in these tweaks for various testcases that fail on nios2-elf
without an explicit -fdelete-null-pointer-checks option. This target is
configured to build with that optimization off by default.

-Sandra

commit 04c69d0e61c0f98a010d77a79ab749d5f0aa6b67
Author: Sandra Loosemore
Date:   Sat Jan 8 22:02:13 2022 -0800

    Testsuite: Make dependence on -fdelete-null-pointer-checks explicit

    nios2-elf target defaults to -fno-delete-null-pointer-checks, breaking
    tests that implicitly depend on that optimization. Add the option
    explicitly on these tests.

    2022-01-08  Sandra Loosemore

    gcc/testsuite/
    * g++.dg/cpp0x/constexpr-compare1.C: Add explicit
    -fdelete-null-pointer-checks option.
    * g++.dg/cpp0x/constexpr-compare2.C: Likewise.
    * g++.dg/cpp0x/constexpr-typeid2.C: Likewise.
    * g++.dg/cpp1y/constexpr-94716.C: Likewise.
    * g++.dg/cpp1z/constexpr-compare1.C: Likewise.
    * g++.dg/cpp1z/constexpr-if36.C: Likewise.
    * gcc.dg/init-compare-1.c: Likewise.

    libstdc++-v3/
    * testsuite/18_support/type_info/constexpr.cc: Add explicit
    -fdelete-null-pointer-checks option.
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-compare1.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare1.C
index ad65019..603c6d5 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-compare1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare1.C
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++11 } }
+// { dg-additional-options "-fdelete-null-pointer-checks" }
 extern int a, b;
 static_assert (&a == &a, "");
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C
index b1bc472..5c08dbb 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C
@@ -1,5 +1,6 @@
 // PR c++/69681
 // { dg-do compile { target c++11 } }
+// { dg-additional-options "-fdelete-null-pointer-checks" }
 void f();
 void g();
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-typeid2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-typeid2.C
index 78c6b8e..8ab76f9 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-typeid2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-typeid2.C
@@ -1,5 +1,6 @@
 // PR c++/103600
 // { dg-do compile { target c++11 } }
+// { dg-additional-options "-fdelete-null-pointer-checks" }
 #include
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-94716.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-94716.C
index 90173f3..5ac8720 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-94716.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-94716.C
@@ -1,5 +1,6 @@
 // PR c++/94716
 // { dg-do compile { target c++14 } }
+// { dg-additional-options "-fdelete-null-pointer-checks" }
 template char v = 0;
 static_assert (&v<2> == &v<2>, "");
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-compare1.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-compare1.C
index a53c03c..d40d536 100644
--- a/gcc/testsuite/g++.dg/cpp1z/constexpr-compare1.C
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-compare1.C
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++17 } }
+// { dg-additional-options "-fdelete-null-pointer-checks" }
 inline int a = 0;
 inline int b = 0;
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-if36.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-if36.C
index 4a1b134..e425af2 100644
--- a/gcc/testsuite/g++.dg/cpp1z/constexpr-if36.C
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-if36.C
@@ -3,6 +3,7 @@
 // weakness.
 // { dg-do compile { target c++17 } }
+// { dg-additional-options "-fdelete-null-pointer-checks" }
 extern void weakfn1 (void);
 extern void weakfn2 (void);
diff --git a/gcc/testsuite/gcc.dg/init-compare-1.c b/gcc/testsuite/gcc.dg/init-compare-1.c
index 9208b66..6737c85 100644
--- a/gcc/testsuite/gcc.dg/init-compare-1.c
+++ b/gcc/testsuite/gcc.dg/init-compare-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-additional-options "-fdelete-null-pointer-checks" } */
 extern int a, b;
 int c = &a == &a;
diff --git a/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc b/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc
index 07f4fb6..6fb67b4 100644
--- a/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc
+++ b/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc
@@ -1,5 +1,6 @@
 // { dg-options "-std=gnu++23 -frtti" }
 // { dg-do compile { target c++23 } }
+// { dg-additional-options "-fdelete-null-pointer-checks" }
 #include
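
As a rough illustration of the kind of address comparison these tests exercise, here is a minimal C sketch. The function names are hypothetical and do not come from the testsuite. The GCC manual describes -fdelete-null-pointer-checks as letting the compiler assume that no code or data element resides at address zero; the sketch assumes that this is what allows comparisons like the last one to be folded at compile time.

```c
/* Sketch only: names are illustrative, not taken from the GCC testsuite.  */
static int a, b;

/* Comparing an object's address with itself is always true.  */
int same_object (void) { return &a == &a; }

/* Two distinct objects have distinct addresses.  */
int distinct_objects (void) { return &a != &b; }

/* With -fdelete-null-pointer-checks the compiler may assume that no object
   lives at address zero, so this can be folded to 1 at compile time.  */
int never_null (void) { return &a != 0; }
```

On a hosted target all three functions return 1 at run time either way; the option only affects whether the compiler is allowed to prove that during compilation.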
[PATCH] middle-end: move initialization of stack_limit_rtx [PR103163]
This patch fixes the ICE I reported in PR103163. We were initializing
stack_limit_rtx before the register properties it depends on were set.
I moved it to the same function where stack_pointer_rtx,
frame_pointer_rtx, etc. are initialized.

Besides nios2, where I observed it, this bug was also reported to affect
powerpc. Anybody want to check it there? Otherwise, OK to check in?

-Sandra

commit bd91ec874339f9fd256b2d83de7159f6c11f
Author: Sandra Loosemore
Date:   Sat Jan 8 19:59:26 2022 -0800

    middle-end: move initialization of stack_limit_rtx [PR103163]

    stack_limit_rtx was being initialized before init_reg_modes_target (),
    resulting in the REG expression being created incorrectly and an ICE
    later in compilation.

    2022-01-08  Sandra Loosemore

    PR middle-end/103163

    gcc/
    * emit-rtl.c (init_emit_regs): Initialize stack_limit_rtx here...
    (init_emit_once): ...not here.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index f16..76dbe42 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6097,6 +6097,13 @@ init_emit_regs (void)
   if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
     pic_offset_table_rtx = gen_raw_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);
+  /* Process stack-limiting command-line options.  */
+  if (opt_fstack_limit_symbol_arg != NULL)
+    stack_limit_rtx
+      = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (opt_fstack_limit_symbol_arg));
+  if (opt_fstack_limit_register_no >= 0)
+    stack_limit_rtx = gen_rtx_REG (Pmode, opt_fstack_limit_register_no);
+
   for (i = 0; i < (int) MAX_MACHINE_MODE; i++)
     {
       mode = (machine_mode) i;
@@ -6177,13 +6184,6 @@ init_emit_once (void)
   /* Create the unique rtx's for certain rtx codes and operand values.  */
-  /* Process stack-limiting command-line options.  */
-  if (opt_fstack_limit_symbol_arg != NULL)
-    stack_limit_rtx
-      = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (opt_fstack_limit_symbol_arg));
-  if (opt_fstack_limit_register_no >= 0)
-    stack_limit_rtx = gen_rtx_REG (Pmode, opt_fstack_limit_register_no);
-
   /* Don't use gen_rtx_CONST_INT here since gen_rtx_CONST_INT in this case
      tries to use these variables.  */
   for (i = - MAX_SAVED_CONST_INT; i <= MAX_SAVED_CONST_INT; i++)
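
The ordering problem described in the stack_limit_rtx message can be sketched in plain C. Everything below is a hypothetical stand-in, not GCC code: `struct target_info` plays the role of the per-register data that init_reg_modes_target () sets up, and `make_reg` plays the role of a constructor that snapshots that data when the object is created.

```c
/* Hypothetical stand-ins only; none of these names exist in GCC.  */
enum { NUM_REGS = 4 };

struct target_info { int reg_bits[NUM_REGS]; };   /* zero until set up */
struct reg_obj { int regno; int bits; };

/* Plays the role of the target initialization that must run first.  */
static void init_reg_modes (struct target_info *t)
{
  for (int i = 0; i < NUM_REGS; i++)
    t->reg_bits[i] = 32;
}

/* Snapshots the per-register data at creation time.  */
static struct reg_obj make_reg (const struct target_info *t, int regno)
{
  struct reg_obj r = { regno, t->reg_bits[regno] };
  return r;
}

/* Buggy order: the object is created before the table is filled in.  */
static struct reg_obj create_too_early (struct target_info *t)
{
  struct reg_obj r = make_reg (t, 2);
  init_reg_modes (t);
  return r;
}

/* Fixed order: fill in the table first, then create the object.  */
static struct reg_obj create_after_init (struct target_info *t)
{
  init_reg_modes (t);
  return make_reg (t, 2);
}
```

In the sketch the early object keeps a stale value of 0 even after the table is initialized, which is the same shape of bug as a REG expression created from not-yet-initialized register data.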
Re: [power-ieee128] OPEN CONV
On Sat, Jan 08, 2022 at 02:15:14PM -0500, David Edelsohn wrote:
> On Sat, Jan 8, 2022 at 1:59 PM Michael Meissner wrote:
> >
> > On Sat, Jan 08, 2022 at 03:18:07PM +0100, Jakub Jelinek wrote:
> > > On Sat, Jan 08, 2022 at 03:13:10PM +0100, Thomas Koenig wrote:
> > > >
> > > > On 08.01.22 15:02, Jakub Jelinek via Fortran wrote:
> > > > > Note, as for byteswapping, apparently it wasn't ever working right for
> > > > > the IBM extended real(kind=16) and complex(kind=16).
> > > >
> > > > The lack of bug reports since the conversion feature was introduced in
> > > > 2006, more than 15 years ago, tells us something, I guess...
> > >
> > > powerpc64le was only introduced in GCC 4.8 in 2013, so slightly less
> > > than that, but still.
> > > Either nobody interchanges/shares fortran unformatted data between
> > > powerpc big and little endian, or if they do, they don't use real(kind=16)
> > > or complex(kind=16) in there...
> >
> > I still wish I had had the forethought when we were setting up the LE ABI to
> > change the default 128-bit format to IEEE instead of IBM. But alas, I
> > didn't.
> > You would still need converters between the big endian IBM format and little
> > endian IEEE format, but it would have avoided a lot of the problems where GCC
> > assumes there is only one floating point format for each size.
>
> Mike,
>
> The LE ABI initial target was Power8 and IEEE128 hardware support was
> added to Power9. The ABI was a conscious decision. IEEE 128 was not a
> viable requirement for the LE ABI at the time of the transition.

Yes I know, but my memory is we (the GCC group within IBM) at least knew that IEEE 128-bit was coming towards the end of the ABI definition period. But perhaps not. In any case, it doesn't much matter now, as it is all ancient history.

--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
Re: [power-ieee128] OPEN CONV
On Sat, Jan 8, 2022 at 1:59 PM Michael Meissner wrote:
>
> On Sat, Jan 08, 2022 at 03:18:07PM +0100, Jakub Jelinek wrote:
> > On Sat, Jan 08, 2022 at 03:13:10PM +0100, Thomas Koenig wrote:
> > >
> > > On 08.01.22 15:02, Jakub Jelinek via Fortran wrote:
> > > > Note, as for byteswapping, apparently it wasn't ever working right for
> > > > the IBM extended real(kind=16) and complex(kind=16).
> > >
> > > The lack of bug reports since the conversion feature was introduced in
> > > 2006, more than 15 years ago, tells us something, I guess...
> >
> > powerpc64le was only introduced in GCC 4.8 in 2013, so slightly less
> > than that, but still.
> > Either nobody interchanges/shares fortran unformatted data between
> > powerpc big and little endian, or if they do, they don't use real(kind=16)
> > or complex(kind=16) in there...
>
> I still wish I had had the forethought when we were setting up the LE ABI to
> change the default 128-bit format to IEEE instead of IBM. But alas, I didn't.
> You would still need converters between the big endian IBM format and little
> endian IEEE format, but it would have avoided a lot of the problems where GCC
> assumes there is only one floating point format for each size.

Mike,

The LE ABI initial target was Power8 and IEEE128 hardware support was added to Power9. The ABI was a conscious decision. IEEE 128 was not a viable requirement for the LE ABI at the time of the transition.

Thanks, David
Re: [power-ieee128] OPEN CONV
On Sat, Jan 08, 2022 at 03:18:07PM +0100, Jakub Jelinek wrote:
> On Sat, Jan 08, 2022 at 03:13:10PM +0100, Thomas Koenig wrote:
> >
> > On 08.01.22 15:02, Jakub Jelinek via Fortran wrote:
> > > Note, as for byteswapping, apparently it wasn't ever working right for
> > > the IBM extended real(kind=16) and complex(kind=16).
> >
> > The lack of bug reports since the conversion feature was introduced in
> > 2006, more than 15 years ago, tells us something, I guess...
>
> powerpc64le was only introduced in GCC 4.8 in 2013, so slightly less
> than that, but still.
> Either nobody interchanges/shares fortran unformatted data between
> powerpc big and little endian, or if they do, they don't use real(kind=16)
> or complex(kind=16) in there...

I still wish I had had the forethought when we were setting up the LE ABI to change the default 128-bit format to IEEE instead of IBM. But alas, I didn't. You would still need converters between the big endian IBM format and little endian IEEE format, but it would have avoided a lot of the problems where GCC assumes there is only one floating point format for each size.

--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
Re: [PATCH 1/1] [PATCH] Fix canadian compile for mingw-w64 copies the wrong dlls for mingw-w64 multilibs [PR100427]
On 1/8/2022 2:04 AM, NightStrike via Gcc-patches wrote:

On Thu, Jan 6, 2022, 18:31 cqwrteur via Gcc-patches wrote:

When building GCC hosted on windows with Canadian/native compilation (host==target), the build scripts in GCC would override DLLs with each other. For example, for MinGW-w64, 32-bit DLLs would override 64 bits because build scripts copy them both to /bin. This patch fixes the issue by avoiding copying DLLs with multilibs. However, it would still copy when we do not build multilibs, usually the native build for GCC on windows.

---
 gcc/configure | 26 ++

You should probably not be modifying configure directly.

Umm, the patch modifies libtool.m4 (two instances) and presumably the configure changes are just rebuilds with the autotools.

jeff
Re: [PATCH] gcc: pass-manager: Fix memory leak. [PR jit/63854]
On 1/6/2022 6:53 AM, David Malcolm via Gcc-patches wrote:

On Sun, 2021-12-19 at 22:30 +0100, Marc Nieper-Wißkirchen wrote:

This patch fixes a memory leak in the pass manager. In the existing code, the m_name_to_pass_map is allocated in pass_manager::register_pass_name, but never deallocated. This is fixed by adding a deletion in pass_manager::~pass_manager. Moreover the string keys in m_name_to_pass_map are all dynamically allocated. To free them, this patch adds a new hash trait for string hashes that are to be freed when the corresponding hash entry is removed.

This fix is particularly relevant for using GCC as a library through libgccjit. The memory leak also occurs when libgccjit is instructed to use an external driver.

Before the patch, compiling the hello world example of libgccjit with the external driver under Valgrind shows a loss of 12,611 (48 direct) bytes. After the patch, no memory leaks are reported anymore. (Memory leaks occurring when using the internal driver are mostly in the driver code in gcc/gcc.c and have to be fixed separately.)

The patch has been tested by fully bootstrapping the compiler with the frontends C, C++, Fortran, LTO, ObjC, JIT and running the test suite under a x86_64-pc-linux-gnu host.

Thanks for the patch.

It looks correct to me, given that pass_manager::register_pass_name does an xstrdup and puts the result in the map.

That said:
- I'm not officially a reviewer for this part of gcc (though I probably touched this code last)
- is it cleaner to instead change m_name_to_pass_map's key type from const char * to char *, to convey that the map "owns" the name? That way we probably wouldn't need struct typed_const_free_remove, and (I hope) works better with the type system.

Dave

gcc/ChangeLog:

PR jit/63854
* hash-traits.h (struct typed_const_free_remove): New.
(struct free_string_hash): New.
* pass_manager.h: Use free_string_hash.
* passes.c (pass_manager::register_pass_name): Use free_string_hash.
(pass_manager::~pass_manager): Delete allocated m_name_to_pass_map.

My concern (and what I hadn't had time to dig into) was we initially used nofree_string_hash -- I wanted to make sure there wasn't any path where the name came from the stack (can't be free'd), was saved elsewhere (dangling pointer) and the like. I.e., why were we using nofree_string_hash to begin with? I've never really mucked around with these bits, so the analysis side kept falling off the daily todo list.

If/once you're comfortable with the patch David, then go ahead and apply it on Marc's behalf.

jeff
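
The ownership question raised in this exchange can be sketched in plain C. This is a minimal, hypothetical miniature of a name map that copies and owns its keys; it is not GCC's hash_map or hash-traits API, and all names are made up for illustration. It shows why copying the key on registration makes a stack-allocated name safe, and why the copies then have to be freed when the map is destroyed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a name-to-value map that owns its keys.
   These names are illustrative and are not GCC's hash_map API.  */
enum { MAX_NAMES = 8 };

struct name_map
{
  char *keys[MAX_NAMES];   /* heap copies owned by the map */
  int values[MAX_NAMES];
  size_t count;
};

/* Store a private copy of NAME, so the caller's buffer may live on the
   stack or be reused afterwards.  Returns 0 on success, -1 on failure.  */
static int name_map_register (struct name_map *m, const char *name, int value)
{
  if (m->count == MAX_NAMES)
    return -1;
  size_t len = strlen (name) + 1;
  char *copy = malloc (len);
  if (copy == NULL)
    return -1;
  memcpy (copy, name, len);
  m->keys[m->count] = copy;
  m->values[m->count] = value;
  m->count++;
  return 0;
}

/* Look up NAME; returns a pointer to the value, or NULL if absent.  */
static const int *name_map_get (const struct name_map *m, const char *name)
{
  for (size_t i = 0; i < m->count; i++)
    if (strcmp (m->keys[i], name) == 0)
      return &m->values[i];
  return NULL;
}

/* Free every owned key, which is what a destructor would have to do.  */
static void name_map_destroy (struct name_map *m)
{
  for (size_t i = 0; i < m->count; i++)
    free (m->keys[i]);
  m->count = 0;
}
```

The lookup takes a `const char *` while the map stores `char *` copies, which loosely mirrors the const-versus-owned tension discussed in the thread.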
Re: [power-ieee128] OPEN CONV
On Sat, Jan 08, 2022 at 03:13:10PM +0100, Thomas Koenig wrote:
>
> On 08.01.22 15:02, Jakub Jelinek via Fortran wrote:
> > Note, as for byteswapping, apparently it wasn't ever working right for
> > the IBM extended real(kind=16) and complex(kind=16).
>
> The lack of bug reports since the conversion feature was introduced in
> 2006, more than 15 years ago, tells us something, I guess...

powerpc64le was only introduced in GCC 4.8 in 2013, so slightly less than that, but still. Either nobody interchanges/shares fortran unformatted data between powerpc big and little endian, or if they do, they don't use real(kind=16) or complex(kind=16) in there...

Jakub
Re: [power-ieee128] OPEN CONV
On 08.01.22 15:02, Jakub Jelinek via Fortran wrote:

Note, as for byteswapping, apparently it wasn't ever working right for the IBM extended real(kind=16) and complex(kind=16).

The lack of bug reports since the conversion feature was introduced in 2006, more than 15 years ago, tells us something, I guess...
Re: [power-ieee128] OPEN CONV
On Sat, Jan 08, 2022 at 12:10:56PM +0100, Jakub Jelinek via Gcc-patches wrote:
> One reason for that is that neither conversion is lossless, neither format
> is a subset or superset of the other. Yes, IEEE quad has both much bigger
> exponent range (-16382..16383 vs. -1022..1023) and slightly bigger fixed
> precision (113 vs. 106 bits).
> But IBM extended has that weirdo numerically awful flexible precision where
> certain numbers can have much bigger precision than those 106 bits, up to
> 2048+52 or so. So there is rounding in both directions.
> So, after distros switch to -mabi=ieeelongdouble by default or when people
> use -mabi=ieeelongdouble on their programs, they'd better store that format
> into data files by default, without the need of some magic CONVERT= options,
> env vars or command line options. Only in the case where they need to
> interact with -mabi=ibmlongdouble environments, they need to take some
> action.

Note, as for byteswapping, apparently it wasn't ever working right for the IBM extended real(kind=16) and complex(kind=16). Because unlike IEEE extended or integral types, it seems powerpc*-*-* doesn't actually fully byteswap those between little and big endian.

Proof:
long double a = 0.L;

compiled little endian IBM long double:
        .size   a, 16
a:
        .long   1431655765
        .long   1070945621
        .long   1431655766
        .long   1014322517

compiled big endian IBM long double:
        .size   a, 16
a:
        .long   1070945621
        .long   1431655765
        .long   1014322517
        .long   1431655766

compiled little endian IEEE long double:
        .size   a, 16
a:
        .long   1431655765
        .long   1431655765
        .long   1431655765
        .long   1073567061

compiled big endian IEEE long double:
        .size   a, 16
a:
        .long   1073567061
        .long   1431655765
        .long   1431655765
        .long   1431655765

where the numbers in .long arguments are 32-bit numbers stored in the selected endianness.

Compiled with -mlong-double-64 little endian:
        .size   a, 8
a:
        .long   1431655765
        .long   1070945621

and big endian:
        .size   a, 8
a:
        .long   1070945621
        .long   1431655765

Unless I'm misreading this, for IEEE long double, or double (and I bet float too) byteswapping the whole numbers is what is needed for interoperability between powerpc64{,le}-linux. For IBM long double we'd actually want to byteswap it as 2 real(kind=8) numbers and not one real(kind=16) one, i.e. the numbers are always stored as the more significant double followed by the less significant double in memory.

Jakub
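
The two byte-swapping strategies described in the message above can be sketched in C as follows. This is a minimal illustration with made-up function names; it is not the libgfortran implementation. It assumes a 16-byte buffer holding one real(kind=16) value as raw bytes.

```c
#include <stddef.h>

/* Reverse the N bytes starting at P in place.  */
static void reverse_bytes (unsigned char *p, size_t n)
{
  for (size_t i = 0; i < n / 2; i++)
    {
      unsigned char tmp = p[i];
      p[i] = p[n - 1 - i];
      p[n - 1 - i] = tmp;
    }
}

/* IEEE binary128: one 16-byte quantity, reversed as a whole.  */
static void swap_ieee128 (unsigned char buf[16])
{
  reverse_bytes (buf, 16);
}

/* IBM extended: two 8-byte doubles.  The more significant double stays
   first in memory, and each half is reversed on its own.  */
static void swap_ibm128 (unsigned char buf[16])
{
  reverse_bytes (buf, 8);
  reverse_bytes (buf + 8, 8);
}
```

Feeding the little endian IBM words from the listing (as bytes) through `swap_ibm128` gives the big endian IBM byte sequence, while reversing all 16 bytes at once would put the less significant double first.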
[PATCH] nvptx: Improved support for HFMode including neghf2 and abshf2.
This patch adds more support for _Float16 (HFmode) to the nvptx backend. Currently negation, absolute value and floating point comparisons are implemented by promoting to float (SFmode). This patch adds suitable define_insns to nvptx.md, most conditional on TARGET_SM53 (-misa=sm_53). This patch also adds support for HFmode fused multiply-add.

One subtlety is that neghf2 and abshf2 are implemented by (HImode) bit manipulation operations to update the sign bit. The NVidia PTX ISA documentation for neg.f16 and abs.f16 contains the caution "Future implementations may comply with the IEEE 754 standard by preserving the (NaN) payload and modifying only the sign bit". Given the availability of suitable replacements, I thought it best to provide IEEE 754 compliant implementations. If anyone observes a performance penalty from this choice I'm happy to provide a -ffast-math variant (or revisit this decision).

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a make and make -k check with no new failures. Ok for mainline?

2022-01-08  Roger Sayle

gcc/ChangeLog
* config/nvptx/nvptx.md (*cmphf): New define_insn.
(cstorehf4): New define_expand.
(fmahf4): New define_insn.
(neghf2): New define_insn.
(abshf2): New define_insn.

gcc/testsuite/ChangeLog
* gcc.target/nvptx/float16-3.c: New test case for neghf2.
* gcc.target/nvptx/float16-4.c: New test case for abshf2.
* gcc.target/nvptx/float16-5.c: New test case for fmahf4.
* gcc.target/nvptx/float16-6.c: New test case.
Thanks in advance, Roger -- diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index ce74672..a6046d7 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -779,6 +779,14 @@ "" "%.\\tsetp%c1\\t%0, %2, %3;") +(define_insn "*cmphf" + [(set (match_operand:BI 0 "nvptx_register_operand" "=R") + (match_operator:BI 1 "nvptx_float_comparison_operator" + [(match_operand:HF 2 "nvptx_register_operand" "R") + (match_operand:HF 3 "nvptx_nonmemory_operand" "RF")]))] + "TARGET_SM53" + "%.\\tsetp%c1\\t%0, %2, %3;") + (define_insn "jump" [(set (pc) (label_ref (match_operand 0 "" "")))] @@ -969,6 +977,21 @@ DONE; }) +(define_expand "cstorehf4" + [(set (match_operand:SI 0 "nvptx_register_operand") + (match_operator:SI 1 "nvptx_float_comparison_operator" + [(match_operand:HF 2 "nvptx_register_operand") + (match_operand:HF 3 "nvptx_nonmemory_operand")]))] + "TARGET_SM53" +{ + rtx reg = gen_reg_rtx (BImode); + rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), BImode, + operands[2], operands[3]); + emit_move_insn (reg, cmp); + emit_insn (gen_setccsi_from_bi (operands[0], reg)); + DONE; +}) + ;; Calls (define_insn "call_insn_" @@ -1156,6 +1179,26 @@ "TARGET_SM53" "%.\\tmul.f16\\t%0, %1, %2;") +(define_insn "fmahf4" + [(set (match_operand:HF 0 "nvptx_register_operand" "=R") + (fma:HF (match_operand:HF 1 "nvptx_register_operand" "R") + (match_operand:HF 2 "nvptx_nonmemory_operand" "RF") + (match_operand:HF 3 "nvptx_nonmemory_operand" "RF")))] + "TARGET_SM53" + "%.\\tfma%#.f16\\t%0, %1, %2, %3;") + +(define_insn "neghf2" + [(set (match_operand:HF 0 "nvptx_register_operand" "=R") + (neg:HF (match_operand:HF 1 "nvptx_register_operand" "R")))] + "" + "%.\\txor.b16\\t%0, %1, -32768;") + +(define_insn "abshf2" + [(set (match_operand:HF 0 "nvptx_register_operand" "=R") + (abs:HF (match_operand:HF 1 "nvptx_register_operand" "R")))] + "" + "%.\\tand.b16\\t%0, %1, 32767;") + (define_insn "exp2hf2" [(set (match_operand:HF 0 "nvptx_register_operand" "=R") (unspec:HF 
[(match_operand:HF 1 "nvptx_register_operand" "R")] diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c new file mode 100644 index 000..914282a --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */ + +_Float16 var; + +void neg() +{ + var = -var; +} + +/* { dg-final { scan-assembler "xor.b16" } } */ diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c new file mode 100644 index 000..b11f17a --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */ + +_Float16 var; + +void foo() +{ + var = (var < (_Float16)0.0) ? -var : var; +} + +/* { dg-final { scan-assembler "and.b16" } } */ diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c new file mode 100644 index 000..5fe15ec --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-m
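
The sign-bit approach used by the neghf2 and abshf2 patterns in the nvptx patch above can be sketched on raw binary16 bit patterns in C. The helper names are hypothetical; only the constants (-32768, i.e. 0x8000, for xor.b16 and 32767, i.e. 0x7fff, for and.b16) come from the patch.

```c
#include <stdint.h>

/* Flip only the sign bit (bit 15) of a binary16 bit pattern, like the
   xor.b16 with -32768 (0x8000) in the neghf2 pattern.  */
static uint16_t neg_half_bits (uint16_t bits)
{
  return (uint16_t) (bits ^ UINT16_C (0x8000));
}

/* Clear only the sign bit, like the and.b16 with 32767 (0x7fff) in the
   abshf2 pattern.  */
static uint16_t abs_half_bits (uint16_t bits)
{
  return (uint16_t) (bits & UINT16_C (0x7fff));
}
```

Because only bit 15 changes, a NaN payload in the low bits is left untouched, which is the IEEE 754 behaviour the message says it is aiming for.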
Re: [power-ieee128] OPEN CONV
On Sat, Jan 08, 2022 at 12:00:38PM +0100, Jakub Jelinek via Gcc-patches wrote:
> And IMHO the default like for byte-swapping should be the native
> format, i.e. the one the program actually used.

One reason for that is that neither conversion is lossless, neither format is a subset or superset of the other. Yes, IEEE quad has both much bigger exponent range (-16382..16383 vs. -1022..1023) and slightly bigger fixed precision (113 vs. 106 bits). But IBM extended has that weirdo numerically awful flexible precision where certain numbers can have much bigger precision than those 106 bits, up to 2048+52 or so. So there is rounding in both directions.

So, after distros switch to -mabi=ieeelongdouble by default or when people use -mabi=ieeelongdouble on their programs, they'd better store that format into data files by default, without the need of some magic CONVERT= options, env vars or command line options. Only in the case where they need to interact with -mabi=ibmlongdouble environments, they need to take some action.

Jakub
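
To make the "rounding in both directions" point concrete, here is a small C sketch. It assumes the usual description of IBM extended as a pair of doubles whose sum is the value; the struct and helper names are hypothetical, and the exponent and precision figures are the ones quoted in the message.

```c
/* Sketch of IBM extended ("double-double"): the value is hi + lo.
   The struct name is illustrative only.  */
struct ibm_extended { double hi; double lo; };

/* 1 + 2^-200 fits exactly as a pair of doubles, although writing it as
   one binary significand takes 201 bits, more than IEEE quad's 113.  */
static struct ibm_extended one_plus_tiny (void)
{
  struct ibm_extended v = { 1.0, 0x1p-200 };
  return v;
}

/* Does 2^0 + 2^-K fit exactly in a significand of PRECISION bits?  */
static int significand_fits (int k, int precision)
{
  return k + 1 <= precision;
}

/* Is the binary exponent E inside the range [EMIN, EMAX]?  */
static int exponent_in_range (int e, int emin, int emax)
{
  return e >= emin && e <= emax;
}
```

So 1 + 2^-200 would be rounded when converted to IEEE quad, while a value such as 2^2000 is inside the IEEE quad exponent range (-16382..16383) but outside the IBM one (-1022..1023).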
Re: [Ada] Read directory in Ada.Directories.Start_Search rather than Get_Next_Entry
Hi Pierre-Marie, is this really a good idea? If a directory has millions of files in it (rare, but I've seen it) this may consume a lot of memory. Also, if using a slow medium like a network file system, reading the entire directory contents may take a long time. Finally, you aren't really solving the race condition, you're just making the window smaller, right? After all, if I understand right you are still using readdir, you just use it during a shorter time period.

Best wishes, Duncan.

On 07/01/2022 17:27, Pierre-Marie de Rodat via Gcc-patches wrote:

The Ada.Directories directory search function is changed so the contents of the directory are now read in Start_Search instead of in Get_Next_Entry. Start_Search now stores the result of the directory search in the search object, with Get_Next_Entry returning results from the search object. This differs from the prior implementation where Get_Next_Entry would query the directory directly for the next item using the POSIX readdir function.

The problem with building Get_Next_Entry around the readdir function is that POSIX does not specify the behavior of readdir when files are added or removed from the directory being read. For example: on most systems, deleting files from the folder being read does not impact readdir. However, some systems, like RTEMS and HFS+ volumes on macOS, will return NULL instead of the next item in the directory if the current item returned by readdir is deleted.

To avoid this issue, the contents of the directory are read in Start_Search and the user is given a copy of these results. Consequently, any subsequent modification to the directory does not affect the ability to iterate through the results. This approach is the same taken by the popular fts C functions.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-direct.adb (Search_Data): Remove type.
(Directory_Vectors): New package instantiation.
(Search_State): New type.
(Fetch_Next_Entry): Remove.
(Close): Remove.
(Finalize): Rewritten.
(Full_Name): Ditto.
(Get_Next_Entry): Return next entry from Search results vector rather than querying the directory directly using readdir.
(Kind): Rewritten.
(Modification_Time): Rewritten.
(More_Entries): Use Search state cursor to determine if more entries are available for users to read.
(Simple_Name): Rewritten.
(Size): Rewritten.
(Start_Search_Internal): Rewritten to load the contents of the directory that matches the pattern and filter into the search object.
* libgnat/a-direct.ads (Search_Type): New type.
(Search_Ptr): Ditto.
(Directory_Entry_Type): Rewritten to support new Start_Search procedure.
* libgnat/s-filatt.ads (File_Length_Attr): New function.
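
The snapshot approach described in the quoted announcement can be sketched in C on top of the POSIX opendir/readdir/closedir calls. This is only an illustration of the idea; it is not the Ada runtime code, and the `dir_snapshot` type and `snapshot_*` functions are hypothetical names. All entries are copied when the search starts, and iteration afterwards only walks the in-memory copy.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical search object: all entry names are read up front.  */
struct dir_snapshot
{
  char **names;   /* heap copies of the entry names */
  size_t count;   /* number of names stored */
  size_t next;    /* cursor used by snapshot_next */
};

/* Read every entry of PATH into SNAP.  Returns 0 on success, -1 if the
   directory cannot be opened or memory runs out.  On failure SNAP may
   hold a partial list, so snapshot_end should still be called.  */
static int snapshot_start (struct dir_snapshot *snap, const char *path)
{
  snap->names = NULL;
  snap->count = 0;
  snap->next = 0;

  DIR *dir = opendir (path);
  if (dir == NULL)
    return -1;

  int status = 0;
  struct dirent *ent;
  while ((ent = readdir (dir)) != NULL)
    {
      char **grown = realloc (snap->names, (snap->count + 1) * sizeof *grown);
      if (grown == NULL)
        {
          status = -1;
          break;
        }
      snap->names = grown;

      size_t len = strlen (ent->d_name) + 1;
      char *copy = malloc (len);
      if (copy == NULL)
        {
          status = -1;
          break;
        }
      memcpy (copy, ent->d_name, len);
      snap->names[snap->count++] = copy;
    }
  closedir (dir);
  return status;
}

/* Return the next stored name, or NULL when the snapshot is exhausted.
   Later changes to the directory do not affect this iteration.  */
static const char *snapshot_next (struct dir_snapshot *snap)
{
  return snap->next < snap->count ? snap->names[snap->next++] : NULL;
}

/* Release everything owned by the snapshot.  */
static void snapshot_end (struct dir_snapshot *snap)
{
  for (size_t i = 0; i < snap->count; i++)
    free (snap->names[i]);
  free (snap->names);
  snap->names = NULL;
  snap->count = 0;
  snap->next = 0;
}
```

The sketch also makes the trade-off raised in the reply visible: memory use grows with the number of entries, and the readdir loop still runs, only within a shorter window.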
Re: [power-ieee128] OPEN CONV
On Sat, Jan 08, 2022 at 11:07:24AM +0100, Thomas Koenig wrote:
> I have tried to unravel the different cases here, I count six
> (lumping together the environment variables, the CONVERT specifier
> and -fconvert, and leaving out the byte swapping)
>
> Compiler  Convert  Read action  Write action
>
> IEEE      None     None         None
> IEEE      IEEE     None         None
> IEEE      IBM      IBM->IEEE    IEEE->IBM
>
> IBM       None     None         None
> IBM       IEEE     IEEE->IBM    IBM->IEEE
> IBM       IBM      None         None
>
> From this table, it is clear that the compiler has to inform
> the library about the option it is using, I think it is best
> encoded in the number passed to _gfortran_set_convert.

Whether the compiler is using IEEE or IBM real(kind=16) or complex(kind=16) for a particular spot (which doesn't have to be the same in the whole program) is known to the library by the kind argument it provides to the I/O routines, if it is kind=16, it is IBM, if it is kind=17, it is IEEE. See the patch I've posted, which does one thing when the runtime kind (i.e. abi_kind on the compiler side) is 17 and convert says r16_ibm, and another thing when runtime kind is 16 and convert says r16_ieee. Other cases shouldn't need conversion.

And IMHO the default like for byte-swapping should be the native format, i.e. the one the program actually used.

The only thing that should be encoded in _gfortran_set_convert is -fconvertWHATEVER command line option IMO.

Jakub
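
The decision logic described in the message above can be written down as a small C sketch. The enum and function names are hypothetical and are not libgfortran's; only the kind numbers (16 for IBM, 17 for IEEE) and the r16_ibm / r16_ieee convert settings come from the message.

```c
/* Hypothetical names; only the kind numbers and the two convert settings
   are taken from the discussion.  */
enum r16_convert { R16_NATIVE, R16_IBM, R16_IEEE };
enum r16_action { ACT_NONE, ACT_IBM_TO_IEEE, ACT_IEEE_TO_IBM };

/* Unformatted read: convert from the file format to the in-memory one.  */
static enum r16_action read_action (int kind, enum r16_convert conv)
{
  if (kind == 17 && conv == R16_IBM)
    return ACT_IBM_TO_IEEE;
  if (kind == 16 && conv == R16_IEEE)
    return ACT_IEEE_TO_IBM;
  return ACT_NONE;   /* file format already matches the runtime kind */
}

/* Unformatted write: convert from the in-memory format to the file one.  */
static enum r16_action write_action (int kind, enum r16_convert conv)
{
  if (kind == 17 && conv == R16_IBM)
    return ACT_IEEE_TO_IBM;
  if (kind == 16 && conv == R16_IEEE)
    return ACT_IBM_TO_IEEE;
  return ACT_NONE;
}
```

Under these assumptions the two functions reproduce all six rows of the quoted table from the runtime kind alone, without the compiler having to pass its ABI choice separately.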
Re: [PATCH] x86_64: Improve (interunit) moves from TImode to V1TImode.
On Thu, Jan 6, 2022 at 7:00 PM Roger Sayle wrote:
>
> This patch improves the code generated when moving a 128-bit value
> in TImode, represented by two 64-bit registers, to V1TImode, which
> is a single SSE register.
>
> Currently, the simple move:
> typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
> uv1ti foo(__int128 x) { return (uv1ti)x; }
>
> is always transferred via memory, as:
> foo:    movq    %rdi, -24(%rsp)
>         movq    %rsi, -16(%rsp)
>         movdqa  -24(%rsp), %xmm0
>         ret
>
> with this patch, we now generate (with -msse2):
> foo:    movq    %rdi, %xmm1
>         movq    %rsi, %xmm2
>         punpcklqdq      %xmm2, %xmm1
>         movdqa  %xmm1, %xmm0
>         ret
>
> and with -mavx2:
> foo:    vmovq   %rdi, %xmm1
>         vpinsrq $1, %rsi, %xmm1, %xmm0
>         ret
>
> Even more dramatic is the improvement of zero extended transfers.
>
> uv1ti bar(unsigned char c) { return (uv1ti)(__int128)c; }
>
> Previously generated:
> bar:    movq    $0, -16(%rsp)
>         movzbl  %dil, %eax
>         movq    %rax, -24(%rsp)
>         vmovdqa -24(%rsp), %xmm0
>         ret
>
> Now generates:
> bar:    movzbl  %dil, %edi
>         movq    %rdi, %xmm0
>         ret
>
> My first attempt at this functionality attempted to use a
> simple define_split:
>
> +;; Move TImode to V1TImode via V2DImode instead of memory.
> +(define_split
> +  [(set (match_operand:V1TI 0 "register_operand")
> +        (subreg:V1TI (match_operand:TI 1 "register_operand") 0))]
> +  "TARGET_64BIT && TARGET_SSE2 && can_create_pseudo_p ()"
> +  [(set (match_dup 2) (vec_concat:V2DI (match_dup 3) (match_dup 4)))
> +   (set (match_dup 0) (subreg:V1TI (match_dup 2) 0))]
> +{
> +  operands[2] = gen_reg_rtx (V2DImode);
> +  operands[3] = gen_lowpart (DImode, operands[1]);
> +  operands[4] = gen_highpart (DImode, operands[1]);
> +})
> +
>
> Unfortunately, this triggers very late during the compilation
> preventing some of the simplifications we'd like (in combine).
> For example the foo case above becomes:
>
> foo:    movq    %rsi, -16(%rsp)
>         movq    %rdi, %xmm0
>         movhps  -16(%rsp), %xmm0
>
> transferring half directly, and the other half via memory.
> And for the bar case above, GCC fails to appreciate that
> movq/vmovq clears the high bits, resulting in:
>
> bar:    movzbl  %dil, %eax
>         xorl    %edx, %edx
>         vmovq   %rax, %xmm1
>         vpinsrq $1, %rdx, %xmm1, %xmm0
>         ret
>
> Hence the solution (i.e. this patch) is to add a special case
> to ix86_expand_vector_move for TImode to V1TImode transfers.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures. Ok for mainline?
>
> 2022-01-06  Roger Sayle
>
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_vector_move): Add
> special case for TImode to V1TImode moves, going via V2DImode.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse2-v1ti-mov-1.c: New test case.
> * gcc.target/i386/sse2-v1ti-zext.c: New test case.

OK.

Thanks,
Uros.
Re: [PATCH] gcc: pass-manager: Fix memory leak. [PR jit/63854]
Thanks for replying so quickly!

On Thu, Jan 6, 2022 at 14:53, David Malcolm wrote:

[...]

> Thanks for the patch.
>
> It looks correct to me, given that pass_manager::register_pass_name
> does an xstrdup and puts the result in the map.
>
> That said:
> - I'm not officially a reviewer for this part of gcc (though I probably
> touched this code last)

I am a newcomer to the codebase of GCC and haven't yet been able to figure out whom to contact. I bothered you because the patch is mostly relevant for the libgccjit frontend.

> - is it cleaner to instead change m_name_to_pass_map's key type from
> const char * to char *, to convey that the map "owns" the name? That
> way we probably wouldn't need struct typed_const_free_remove, and (I
> hope) works better with the type system.

The problem with that approach is that we would then need a new version of string_hash in hash-traits.h, say owned_string_hash, which derives from pointer_hash <char> and not pointer_hash <const char>. This would add roughly as much code as struct typed_const_free_remove.

Using the hypothetical owned_string_hash in the definition of m_name_to_pass_map in passes.c would then produce a map taking "char *" strings instead of "const char *" strings. This, however, would then lead to problems in pass_manager::register_pass_name where name is a "const char *" string (coming from outside) but m_name_to_pass_map->get would take a "char *" string.

I don't see how to resolve this without bigger refactoring, so I think my struct typed_const_free_remove approach is less intrusive. This conveys at least that the key isn't changed by the hashmap operations and that it is yet owned (because this is something that typed_const_free_remove presupposes).

Thanks, Marc

[...]
Re: [power-ieee128] OPEN CONV
On 07.01.22 22:48, Jakub Jelinek wrote:

> On Fri, Jan 07, 2022 at 10:40:50PM +0100, Thomas Koenig wrote:
> > One thing that one has to watch out for is a big-endian IBM long double
> > file, so the byte swapping will have to be done before assigning
> > the value.
>
> I've tried to handle that right, i.e. on unformatted read with byte-swapping
> and r16 <-> r17 conversions first do byte-swapping and then r16 <-> r17
> conversions, while for unformatted writes first r16 <-> r17 conversions and
> then byte-swapping.

I have tried to unravel the different cases here, I count six (lumping together the environment variables, the CONVERT specifier and -fconvert, and leaving out the byte swapping):

Compiler  Convert  Read action  Write action
IEEE      None     None         None
IEEE      IEEE     None         None
IEEE      IBM      IBM->IEEE    IEEE->IBM
IBM       None     None         None
IBM       IEEE     IEEE->IBM    IBM->IEEE
IBM       IBM      None         None

From this table, it is clear that the compiler has to inform the library about the option it is using. I think it is best encoded in the number passed to _gfortran_set_convert.

Old programs should continue to run with the new library, so the absence of a call to _gfortran_set_convert, or a call which sets byte swapping, should have the old meaning, i.e. IBM long double. A program which uses IEEE long double should then call _gfortran_set_convert with a suitable argument to let the library know what to do, just in case.

I think this is what I will start working on.

Best regards

Thomas
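The six cases in the table reduce to one rule: a conversion is needed only when a conversion is requested and the requested file format differs from the format the compiler uses. A small sketch of that rule follows. All names in it (real_kind, convert_opt, conversion, io_actions, select_actions) are made up for illustration; this is not libgfortran code, and it says nothing about how the value would actually be encoded in the argument of _gfortran_set_convert.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.
enum class real_kind { ieee, ibm };            // format the compiler uses
enum class convert_opt { none, ieee, ibm };    // CONVERT= / -fconvert / env
enum class conversion { none, ibm_to_ieee, ieee_to_ibm };

struct io_actions {
  conversion on_read;
  conversion on_write;
};

// Encodes the table: byte swapping is deliberately left out, as in the
// table itself.
inline io_actions select_actions(real_kind compiler, convert_opt convert) {
  // With no conversion requested, the file uses the compiler's format.
  real_kind file = compiler;
  if (convert == convert_opt::ieee)
    file = real_kind::ieee;
  else if (convert == convert_opt::ibm)
    file = real_kind::ibm;

  if (file == compiler)
    return {conversion::none, conversion::none};
  if (compiler == real_kind::ieee)  // file is IBM
    return {conversion::ibm_to_ieee, conversion::ieee_to_ibm};
  return {conversion::ieee_to_ibm, conversion::ibm_to_ieee};  // file is IEEE
}

// Checks the two rows of the table that require a conversion.
inline void check_converting_rows() {
  io_actions a = select_actions(real_kind::ieee, convert_opt::ibm);
  assert(a.on_read == conversion::ibm_to_ieee);
  assert(a.on_write == conversion::ieee_to_ibm);

  io_actions b = select_actions(real_kind::ibm, convert_opt::ieee);
  assert(b.on_read == conversion::ieee_to_ibm);
  assert(b.on_write == conversion::ibm_to_ieee);
}
```

When byte swapping is also in effect, the ordering quoted above applies on top of this: swap first and then convert on reads, convert first and then swap on writes.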
Re: [PATCH] gcc: pass-manager: Fix memory leak. [PR jit/63854]
On Thu, 6 Jan 2022 at 14:57, David Malcolm via Jit wrote:

> [...snip...]
>
> > > diff --git a/gcc/passes.c b/gcc/passes.c
> > > index 4bea6ae5b6a..0c70ece5321 100644
> > > --- a/gcc/passes.c
> > > +++ b/gcc/passes.c
> >
> > [...snip...]
> >
> > > @@ -1943,7 +1944,7 @@ pass_manager::dump_profile_report () const
> > >     " |in count |out
> > > prob "
> > >     "|in count |out prob "
> > >     "|size |time |\n");
> > > -
> > > +
> > >    for (int i = 1; i < passes_by_id_size; i++)
> > >      if (profile_record[i].run)
> > >        {
> >
> ...and there's a stray whitespace change here (in
> pass_manager::dump_profile_report), which probably shouldn't be in the
> patch.

There was stray whitespace on that line in the unpatched version of `passes.c`, which my Emacs silently cleaned up. Shall I retain this whitespace, although it probably shouldn't have been there in the first place? Or should I just add a remark about it in the patch notes?

Thanks,

Marc
Re: [PATCH 1/1] [PATCH] Fix canadian compile for mingw-w64 copies the wrong dlls for mingw-w64 multilibs [PR100427]
On Thu, Jan 6, 2022, 18:31 cqwrteur via Gcc-patches wrote:

> When building GCC hosted on windows with Canadian/native compilation
> (host==target), the build scripts in GCC would override DLLs with each
> other. For example, for MinGW-w64, 32-bit DLLs would override 64 bits
> because build scripts copy them both to /bin.
>
> This patch fixes the issue by avoiding copying DLLs with multilibs.
> However, it would still copy when we do not build multilibs, usually the
> native build for GCC on windows.
> ---
> gcc/configure | 26 ++

You should probably not be modifying configure directly.